Regression discontinuity aggregation, with an application to the union effects on inequality

Borusyak, Kolerman-Shemer
We extend the regression discontinuity (RD) design to settings where each unit's treatment status is an average or aggregate across multiple discontinuity events. Such situations arise in many studies where the outcome is measured at a higher level of spatial or temporal aggregation (e.g., by state with district-level discontinuities) or when spillovers from discontinuity events are of interest. We propose two novel estimation procedures - one at the level at which the outcome is measured and the other in the sample of discontinuities - and show that both identify a local average causal effect under continuity assumptions similar to those of standard RD designs. We apply these ideas to study the effect of unionization on inequality in the United States. Using credible variation from close unionization elections at the establishment level, we show that a higher rate of newly unionized workers in a state-by-industry cell reduces wage inequality within the cell.

Basic Information

  • Paper ID: 2501.00428
  • Title: Regression discontinuity aggregation, with an application to the union effects on inequality
  • Authors: Kirill Borusyak (UC Berkeley), Matan Kolerman-Shemer (The Hebrew University of Jerusalem)
  • Classification: econ.EM (Econometrics)
  • Publication Date: December 2024
  • Paper Link: https://arxiv.org/abs/2501.00428

Abstract

This paper extends the regression discontinuity (RD) design to settings where each unit's treatment status is an average or aggregate of multiple discontinuity events. This situation arises in many studies where outcomes are measured at a higher level of spatial or temporal aggregation (e.g., state-level outcomes with district-level discontinuities), or when spillover effects from discontinuity events are of interest. The authors propose two new estimation procedures, one at the level where outcomes are measured and another in the sample of discontinuities, and show that both identify a local average causal effect under continuity assumptions similar to those of standard RD designs. They apply these ideas to the effect of unionization on inequality in the United States: using credible variation from close union elections at the establishment level, they show that a higher share of newly unionized workers in a state-by-industry cell reduces wage inequality within that cell.

Research Background and Motivation

Core Problem

The traditional regression discontinuity (RD) design requires that each unit be exposed to a single discontinuity event. In many empirical studies, however, outcome variables are defined at a higher level of aggregation than the discontinuity events. For example:

  1. Legislative Studies: State-level outcomes depend on election results from multiple single-member districts
  2. Temporal Aggregation: Units are exposed to multiple RD events across multiple periods
  3. Spillover Effects: Each unit is exposed to the discontinuity events (e.g., elections) of multiple neighboring units

Importance of the Problem

Such settings are common in empirical research, spanning political economy, labor economics, public finance, and other fields. The existing literature typically handles them with ad hoc approaches and lacks a unified theoretical framework and optimal estimation methods.

Limitations of Existing Methods

  1. Upper-level Specification: Often fails to include all necessary local linear control variables, losing the bias reduction advantages of RD design
  2. Lower-level Specification: Most use reduced-form estimation without defining coherent causal models
  3. Sample Restrictions: Some studies unnecessarily restrict samples, reducing statistical power

Core Contributions

  1. Theoretical Innovation: Proposes the regression discontinuity aggregation (RDA) framework, extending RD design to aggregated settings
  2. Methodological Contribution: Develops two estimators—an upper-level IV estimator and a lower-level stacked estimator
  3. Theoretical Proof: Demonstrates that both estimators identify the same local average treatment effect under similar continuity assumptions
  4. Empirical Application: Applies the RDA method to study the effects of unionization on inequality in the United States
  5. Policy Implications: Finds that unionization significantly reduces wage inequality within state-industry units

Methodological Details

Problem Definition

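Consider $N$ upper-level units $i$, each containing a set $J_i$ of lower-level sub-units $j$. Sub-unit $j$ is characterized by a running variable $r_j$ and a treatment indicator $z_j = \mathbf{1}[r_j \geq 0]$. The goal is to estimate the causal model:

$$Y_i = \beta X_i + \varepsilon_i$$

where $X_i$ is the upper-level treatment variable, typically defined as a weighted sum of sub-unit treatments with weights $s_j$:

$$X_i = \sum_{j \in J_i} s_j z_j$$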

Model Architecture

1. Upper-level IV Estimator

Constructs instrumental variables using sub-units close to the cutoff:

$$Z_i = \sum_{j \in C_i} s_j z_j$$

where $C_i = \{j \in J_i : |r_j| \leq h\}$ is the set of sub-units within bandwidth $h$ of the cutoff.

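The key innovation is the vector of aggregated RDA control variables:

$$Q_i = \Big(\sum_{j \in C_i} s_j,\ \sum_{j \in C_i} s_j r_j,\ \sum_{j \in C_i} s_j r_j^+\Big)'$$

where $r_j^+$ denotes the positive part of the running variable, $\max(r_j, 0)$, so that the slope may differ on either side of the cutoff.

Estimation specification, with $X_i$ instrumented by $Z_i$ and additional upper-level controls $\tilde{W}_i$:

$$Y_i = \beta X_i + \gamma_0 \sum_{j \in C_i} s_j + \gamma_1 \sum_{j \in C_i} s_j r_j + \gamma_2 \sum_{j \in C_i} s_j r_j^+ + \tilde{\gamma}' \tilde{W}_i + \text{error}_i$$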
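
The specification above can be sketched as a just-identified IV regression in which the RDA controls are partialled out of the instrument. This is a minimal illustration on simulated data, not the authors' implementation: the function name `iv_with_controls`, the uniform running variable, the equal weights, the added intercept, and the noiseless outcome are all assumptions made here so that the true coefficient is recovered exactly.

```python
import numpy as np

def iv_with_controls(y, x, z, controls):
    """Just-identified IV of y on x with instrument z, partialling out controls."""
    def residualize(v):
        coef, *_ = np.linalg.lstsq(controls, v, rcond=None)
        return v - controls @ coef
    z_res = residualize(z)  # instrument net of the controls (Frisch-Waugh-Lovell)
    return float(z_res @ y) / float(z_res @ x)

# Simulated sub-unit data; every value below is hypothetical.
rng = np.random.default_rng(0)
n_units, n_sub, h = 200, 8, 0.1
unit = np.repeat(np.arange(n_units), n_sub)   # i(j): upper-level unit of sub-unit j
r = rng.uniform(-0.5, 0.5, size=unit.size)    # running variable r_j
s = np.full(unit.size, 1.0 / n_sub)           # weights s_j (assumed equal)
z = (r >= 0).astype(float)                    # z_j = 1[r_j >= 0]
close = (np.abs(r) <= h).astype(float)        # membership in C_i

def agg(v):
    """Sum a sub-unit-level array within each upper-level unit."""
    return np.bincount(unit, weights=v, minlength=n_units)

X = agg(s * z)                                # treatment: all sub-units
Z = agg(s * z * close)                        # instrument: close sub-units only
Q = np.column_stack([
    agg(s * close),
    agg(s * r * close),
    agg(s * np.maximum(r, 0.0) * close),      # assumes r_j^+ = max(r_j, 0)
])
controls = np.column_stack([np.ones(n_units), Q])  # intercept is an assumption

true_beta = 2.0
Y = true_beta * X + controls @ np.array([1.0, 0.5, -1.0, 0.3])  # noiseless outcome
beta_hat = iv_with_controls(Y, X, Z, controls)
print(beta_hat)
```

Because the simulated outcome is an exact linear function of the treatment and the controls, the estimate should match `true_beta` up to floating-point error; with noisy data the same function returns the usual IV point estimate, and standard errors would need to be computed separately.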

2. Lower-level Stacked Estimator

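Estimates a fuzzy RD specification in the sample of sub-units close to the cutoff:

$$Y_{i(j)} = \beta X_{i(j)} + \tilde{\gamma}' \tilde{W}_{i(j)} + \lambda' q_j + \text{error}_j$$

where $i(j)$ denotes the upper-level unit containing sub-unit $j$, $X_{i(j)}$ is instrumented by $z_j$, and $q_j = (1, r_j, r_j^+)'$ are the standard RD control variables.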
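
A minimal sketch of the stacked specification follows. It assumes no additional controls, an unweighted regression, and $r_j^+ = \max(r_j, 0)$; the function name `stacked_rd_iv` and the simulated data are hypothetical. Each close sub-unit contributes one row carrying the outcome and treatment of its upper-level unit, so upper-level units with several close sub-units appear repeatedly.

```python
import numpy as np

def stacked_rd_iv(Y, X, unit, r, h):
    """Fuzzy RD in the stacked sample of close sub-units (point estimate only)."""
    keep = np.abs(r) <= h                       # sub-units within the bandwidth
    r_close = r[keep]
    z_close = (r_close >= 0).astype(float)      # instrument z_j
    q = np.column_stack([                       # q_j = (1, r_j, r_j^+)
        np.ones(r_close.size),
        r_close,
        np.maximum(r_close, 0.0),
    ])
    y_stack = Y[unit[keep]]                     # Y_{i(j)}: outcome of the parent unit
    x_stack = X[unit[keep]]                     # X_{i(j)}: treatment of the parent unit
    coef, *_ = np.linalg.lstsq(q, z_close, rcond=None)
    z_res = z_close - q @ coef                  # instrument net of the RD controls
    return float(z_res @ y_stack) / float(z_res @ x_stack)

# Hypothetical simulated data with a noiseless outcome.
rng = np.random.default_rng(1)
n_units, n_sub, h = 300, 6, 0.1
unit = np.repeat(np.arange(n_units), n_sub)
r = rng.uniform(-0.5, 0.5, size=unit.size)
s = np.full(unit.size, 1.0 / n_sub)
X = np.bincount(unit, weights=s * (r >= 0), minlength=n_units)
Y = 3.0 + 2.0 * X
beta_hat = stacked_rd_iv(Y, X, unit, r, h)
print(beta_hat)
```

The sketch returns only a point estimate. Because rows from the same upper-level unit share an outcome, any inference built on it would have to account for that dependence.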

Technical Innovations

1. Theoretical Equivalence

Proposition 1 establishes the numerical equivalence of the upper-level and lower-level estimators: the upper-level IV estimator equals a specific sub-unit level fuzzy RD estimator.
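
One way to see why an upper-level IV estimator can be rewritten at the sub-unit level is the algebraic identity below. It is a sketch in the spirit of the result, not a restatement of Proposition 1: $Y_i^\perp$ and $X_i^\perp$ are notation introduced here for the residuals from regressing $Y_i$ and $X_i$ on the upper-level controls, and the first equality uses the Frisch-Waugh-Lovell form of the just-identified IV estimator.

```latex
\hat{\beta}
= \frac{\sum_i Z_i Y_i^\perp}{\sum_i Z_i X_i^\perp}
= \frac{\sum_i \sum_{j \in C_i} s_j z_j Y_i^\perp}{\sum_i \sum_{j \in C_i} s_j z_j X_i^\perp}
= \frac{\sum_{j : |r_j| \le h} s_j z_j Y_{i(j)}^\perp}{\sum_{j : |r_j| \le h} s_j z_j X_{i(j)}^\perp}
```

The last expression is an $s_j$-weighted IV ratio in the sample of close sub-units, with $z_j$ as the instrument and the residualized upper-level variables as outcome and treatment.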

2. Identification Results

Proposition 2 shows that under standard continuity assumptions, both estimators identify the same local average treatment effect:

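$$\beta_0 = \frac{E\big[s_j \cdot \big(Y_{i(j)}(X_{i(j)}(1, z_{i(j),-j})) - Y_{i(j)}(X_{i(j)}(0, z_{i(j),-j}))\big) \mid r_j = 0\big]}{E\big[s_j \cdot \big(X_{i(j)}(1, z_{i(j),-j}) - X_{i(j)}(0, z_{i(j),-j})\big) \mid r_j = 0\big]}$$

where $z_{i(j),-j}$ denotes the treatments of the other sub-units in unit $i(j)$.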
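
As a sanity check on this estimand, consider the special case (an assumption made here for illustration) in which every upper-level unit contains a single sub-unit with $s_j = 1$, so that $X_{i(j)} = z_j$ and there are no other sub-units. The ratio then collapses to the familiar sharp RD estimand:

```latex
\beta_0
= \frac{E\left[ Y_{i(j)}(1) - Y_{i(j)}(0) \mid r_j = 0 \right]}{E\left[ 1 - 0 \mid r_j = 0 \right]}
= E\left[ Y_{i(j)}(1) - Y_{i(j)}(0) \mid r_j = 0 \right]
```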

3. Bias Reduction Properties

Monte Carlo simulations show that the estimator including aggregated local linear control variables inherits the bias reduction properties of traditional RD methods.
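
The role of the aggregated controls can be illustrated with a small simulation. This is not the paper's Monte Carlo design: the data-generating process below (uniform running variables, equal weights, an outcome that depends linearly on the weighted running variable through a hypothetical coefficient `kappa`) is an assumption chosen so that an IV regression without the RDA controls is biased while the one with them is not.

```python
import numpy as np

def iv(y, x, z, controls):
    """Just-identified IV of y on x with instrument z, partialling out controls."""
    def residualize(v):
        coef, *_ = np.linalg.lstsq(controls, v, rcond=None)
        return v - controls @ coef
    z_res = residualize(z)
    return float(z_res @ y) / float(z_res @ x)

def one_draw(rng, n_units=2000, n_sub=8, h=0.1, beta=1.0, kappa=5.0):
    """Return (error without RDA controls, error with RDA controls) for one sample."""
    unit = np.repeat(np.arange(n_units), n_sub)
    r = rng.uniform(-0.5, 0.5, size=unit.size)
    s = 1.0 / n_sub                                   # equal weights (assumed)
    z = (r >= 0).astype(float)
    close = (np.abs(r) <= h).astype(float)
    agg = lambda v: np.bincount(unit, weights=v, minlength=n_units)
    X = agg(s * z)
    Z = agg(s * z * close)
    Q = np.column_stack([
        agg(s * close),
        agg(s * r * close),
        agg(s * np.maximum(r, 0.0) * close),          # assumes r_j^+ = max(r_j, 0)
    ])
    # Outcome depends smoothly on the running variables (the confounder).
    Y = beta * X + kappa * agg(s * r) + rng.normal(0.0, 0.1, n_units)
    ones = np.ones((n_units, 1))
    without_controls = iv(Y, X, Z, ones)
    with_controls = iv(Y, X, Z, np.column_stack([ones, Q]))
    return without_controls - beta, with_controls - beta

rng = np.random.default_rng(0)
draws = np.array([one_draw(rng) for _ in range(100)])
bias_naive, bias_rda = np.median(draws, axis=0)       # median is robust to IV outliers
print(bias_naive, bias_rda)
```

In this design the instrument is mechanically correlated with the running variable among close sub-units, which is what the aggregated linear terms absorb. The median is used rather than the mean because just-identified IV estimates can have heavy tails.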

Experimental Setup

Dataset

  1. Union Election Data: Establishment-level union election data from the NLRB for 1961-2009
  2. Labor Market Outcomes: Based on decennial census samples from 1960-2010
  3. Supplementary Data: Union density and benefits data from the Current Population Survey (CPS)

Treatment and Instrumental Variables

  • Treatment Variable: $\text{NewUnions}_{sit}$, the share of newly unionized workers in a state-industry unit
  • Instrumental Variable: $Z_{sit}$, the share of workers unionized through close elections (union vote share within 50 ± 10%)
  • RDA Control Variables: Include the share of workers in close elections, average vote margins, etc.
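
The construction of these variables from election-level records can be sketched as follows. The function name `build_cell_variables`, the input names, and the toy numbers are hypothetical; the weight $s_j$ is assumed to be the number of workers covered by election $j$ divided by employment in its cell, the running variable is the union vote share minus 0.5, and $r_j^+$ is taken to be $\max(r_j, 0)$.

```python
import numpy as np

def build_cell_variables(cell, vote_share, workers, cell_employment, h=0.10):
    """Aggregate election-level data to cell-level treatment, instrument, controls."""
    cell = np.asarray(cell)
    n_cells = len(cell_employment)
    r = np.asarray(vote_share, dtype=float) - 0.5        # running variable r_j
    s = np.asarray(workers, dtype=float) / np.asarray(cell_employment, dtype=float)[cell]
    z = (r >= 0).astype(float)                           # union win, following z_j = 1[r_j >= 0]
    close = (np.abs(r) <= h).astype(float)               # close election indicator

    def agg(v):
        return np.bincount(cell, weights=v, minlength=n_cells)

    X = agg(s * z)                                       # share newly unionized
    Z = agg(s * z * close)                               # share unionized via close wins
    Q = np.column_stack([
        agg(s * close),                                  # share of workers in close elections
        agg(s * r * close),                              # weighted vote margin
        agg(s * np.maximum(r, 0.0) * close),             # weighted positive margin
    ])
    return X, Z, Q

# Toy example: two cells, five elections (all numbers hypothetical).
cell = [0, 0, 0, 1, 1]
vote_share = [0.80, 0.55, 0.48, 0.10, 0.58]
workers = [50, 25, 25, 100, 100]
cell_employment = [100, 200]
X, Z, Q = build_cell_variables(cell, vote_share, workers, cell_employment)
print(X, Z, Q, sep="\n")
```

Elections decided by wide margins enter the treatment `X` but not the instrument `Z` or the controls `Q`, which is what restricts the identifying variation to close elections.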

Evaluation Metrics

Five inequality measures:

  1. Log college wage premium
  2. Log 90-10 wage ratio
  3. Gini coefficient
  4. Top 10% income share
  5. Log wage variance

Experimental Results

Main Results

Inequality Effects

For each percentage point increase in the new unionization rate:

  • Gini coefficient decreases by 0.018 (upper-level estimator) / 0.013 (lower-level estimator)
  • 90-10 ratio decreases by 0.46 / 0.27 log points
  • Top 10% share decreases by 0.14 / 0.12 percentage points
  • Log wage variance decreases by 0.025 / 0.021

Wage Distribution Effects

Unionization reduces inequality primarily by compressing high-end wages rather than raising low-end wages:

  • Average wages decline by 0.35 log points
  • Managerial wages decline significantly by 0.92 log points
  • 10th percentile wages increase slightly but insignificantly

Benefit Mechanisms

Unionization significantly increases pension coverage: each new union member corresponds to an increase of 1.48 pension holders, indicating substantial inter-establishment spillover effects.

Historical Contribution Analysis

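A counterfactual analysis asks how inequality would have evolved had new unionization rates remained at their 1960s levels. By this measure, the decline in unionization accounts for a sizable share of the rise in inequality: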

  • Gini coefficient: Union decline explains 34.5% of growth from 1970-2010
  • 90-10 ratio: Explains 33.7% of growth
  • Top 10% share: Explains 38.3% of growth
  • College premium: Explains 60.5% of growth
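
One natural way to read these percentages, offered here as an assumption because this summary does not state the exact definition, is as the gap between actual and counterfactual growth relative to actual growth. With $\Delta I^{\text{actual}}$ the observed change in an inequality measure and $\Delta I^{\text{cf}}$ its change under the counterfactual unionization path (both symbols introduced here):

```latex
\text{Share explained}
= \frac{\Delta I^{\text{actual}} - \Delta I^{\text{cf}}}{\Delta I^{\text{actual}}}
```

Under this reading, a share of 34.5% for the Gini coefficient would mean that counterfactual growth is 65.5% of the growth actually observed.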

Robustness Checks

Results remain robust across multiple specifications:

  • Different bandwidth choices (10% and 15%)
  • Excluding union decertification elections
  • Different fixed effects specifications
  • Weighted and unweighted estimates

Related Work

RD Literature

This paper extends the standard RD design. It differs from multi-score RD designs: those address multiple running variables at a single boundary, whereas RDA addresses the aggregation of many RD shocks.

Shift-Share Literature

The theoretical analysis builds on the shift-share instrumental variable literature, particularly the numerical equivalence results of Borusyak et al. (2022).

Union and Inequality Literature

The paper provides a new causal identification strategy for the effects of unions on inequality, complementing research such as Farber et al. (2021), which relies on selection on observables.

Conclusions and Discussion

Main Conclusions

  1. Methodology: The RDA framework provides a unified theoretical foundation and optimal estimation methods for handling aggregated RD settings
  2. Empirical Findings: Unionization significantly reduces wage inequality, primarily through compressing the upper end of the wage distribution
  3. Policy Implications: Union decline is an important factor in rising U.S. inequality

Limitations

  1. Extrapolation: Estimates rest on local variation from close elections, so extending them to long-term effects requires extrapolation
  2. Aggregation Level: Only considers inequality within state-industry units, not between-unit inequality
  3. Mechanism Identification: The specific mechanisms through which unions affect inequality require further investigation

Future Directions

  1. Extension to other aggregation settings and spillover effect studies
  2. Development of methods addressing endogenous aggregation structures
  3. Exploration of theoretical properties of dynamic RD aggregation

In-Depth Evaluation

Strengths

  1. Theoretical Contribution: Fills a gap in the RD literature for aggregated settings, providing a rigorous theoretical foundation
  2. Methodological Innovation: The two estimators are cleverly designed and inherit desirable properties of traditional RD
  3. Empirical Value: Provides new causal evidence for important policy questions
  4. Broad Applicability: The method applies to a wide range of economic research fields

Weaknesses

  1. Complexity: RDA is more complex to implement compared to standard RD
  2. Assumption Strength: Requires stronger continuity assumptions to handle multiple running variables
  3. Computational Burden: Particularly the lower-level estimator requires handling large numbers of repeated observations

Impact

  1. Academic Contribution: Makes important methodological contributions to econometrics
  2. Policy Relevance: Provides new tools for labor policy and inequality research
  3. Reproducibility: Offers detailed implementation guidance and code

Applicable Scenarios

  1. Legislative studies in political economy
  2. School bond studies in education economics
  3. Spillover effect studies in labor economics
  4. Any economic research involving aggregated RD settings

References

This paper cites important literature in econometrics, labor economics, and political economy, particularly:

  • Borusyak et al. (2022) on shift-share instrumental variables
  • Frandsen (2021) on RD design with union elections
  • Farber et al. (2021) on unions and inequality

Overall Assessment: This is a high-quality econometric methodology paper that not only provides important theoretical contributions but also demonstrates the value of the method through meaningful empirical applications. The RDA framework fills a literature gap and provides more appropriate identification strategies for many economic studies.