
Regression discontinuity aggregation, with an application to the union effects on inequality

Borusyak, Kolerman-Shemer

We extend the regression discontinuity (RD) design to settings where each unit's treatment status is an average or aggregate across multiple discontinuity events. Such situations arise in many studies where the outcome is measured at a higher level of spatial or temporal aggregation (e.g., by state with district-level discontinuities) or when spillovers from discontinuity events are of interest. We propose two novel estimation procedures - one at the level at which the outcome is measured and the other in the sample of discontinuities - and show that both identify a local average causal effect under continuity assumptions similar to those of standard RD designs. We apply these ideas to study the effect of unionization on inequality in the United States. Using credible variation from close unionization elections at the establishment level, we show that a higher rate of newly unionized workers in a state-by-industry cell reduces wage inequality within the cell.


Basic Information

Paper ID: 2501.00428
Title: Regression discontinuity aggregation, with an application to the union effects on inequality
Authors: Kirill Borusyak (UC Berkeley), Matan Kolerman-Shemer (The Hebrew University of Jerusalem)
Classification: econ.EM (Econometrics)
Publication Date: December 2024
Paper Link: https://arxiv.org/abs/2501.00428

Abstract

This paper extends the regression discontinuity (RD) design to settings where each unit's treatment status is an average or aggregate of multiple discontinuity events. Such settings arise in many studies where outcomes are measured at a higher level of spatial or temporal aggregation than the discontinuities (e.g., state-level outcomes with district-level discontinuities), or when spillovers from discontinuity events are of interest. The authors propose two new estimation procedures, one at the level at which outcomes are measured and one in the sample of discontinuities, and show that both identify a local average causal effect under continuity assumptions similar to those of standard RD designs. Applying these ideas to the effect of unionization on inequality in the United States, and using credible variation from close unionization elections at the establishment level, the authors show that a higher share of newly unionized workers in a state-by-industry cell reduces wage inequality within that cell.

Research Background and Motivation

Core Problem

The standard regression discontinuity (RD) design assumes that each unit is exposed to a single discontinuity event. In many empirical studies, however, outcomes are defined at a higher level of aggregation than the discontinuity events. For example:

Legislative Studies: State-level outcomes depend on election results from multiple single-member districts
Temporal Aggregation: Units are exposed to multiple RD events across multiple periods
Spillover Effects: Each unit is exposed to the elections of multiple neighbors

Importance of the Problem

Such settings are extremely common in empirical research, spanning political economy, labor economics, public finance, and other fields. Existing literature typically employs ad hoc approaches to handle these situations, lacking a unified theoretical framework and optimal estimation methods.

Limitations of Existing Methods

Upper-level Specification: Often omits some of the necessary local linear controls, forfeiting the bias-reduction advantages of the RD design
Lower-level Specification: Most studies estimate reduced forms without specifying a coherent causal model
Sample Restrictions: Some studies unnecessarily restrict samples, reducing statistical power

Core Contributions

Theoretical Innovation: Proposes the regression discontinuity aggregation (RDA) framework, extending RD design to aggregated settings
Methodological Contribution: Develops two estimators—an upper-level IV estimator and a lower-level stacked estimator
Theoretical Proof: Demonstrates that both estimators identify the same local average treatment effect under similar continuity assumptions
Empirical Application: Applies the RDA method to study the effects of unionization on inequality in the United States
Policy Implications: Finds that unionization significantly reduces wage inequality within state-industry units

Methodological Details

Problem Definition

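Consider N upper-level units i, each containing a set $J_i$ of lower-level sub-units j. Sub-unit j is characterized by a running variable $r_j$ and a treatment indicator $z_j = \mathbf{1}[r_j \ge 0]$. The goal is to estimate the causal model:

$$Y_i = \beta X_i + \varepsilon_i$$

where $X_i$ is the upper-level treatment variable, typically defined as the weighted aggregate

$$X_i = \sum_{j \in J_i} s_j z_j$$

and $s_j$ is the weight of sub-unit j within its upper-level unit (in the application, its share of the cell's workers).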
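A minimal sketch of this aggregation in Python with NumPy, assuming sub-units are stored as flat arrays and `unit_id` maps each sub-unit to a 0-based upper-level index. The function name and toy inputs are invented for illustration and are not taken from the paper. The cutoff is normalized to 0.

```python
import numpy as np


def aggregate_treatment(unit_id, s, r, n_units):
    """X_i = sum over sub-units j in unit i of s_j * z_j, with z_j = 1[r_j >= 0].

    unit_id[j]: 0-based index of the upper-level unit containing sub-unit j
    s[j]:       weight of sub-unit j within its unit
    r[j]:       running variable of sub-unit j (cutoff normalized to 0)
    """
    z = (np.asarray(r) >= 0).astype(float)            # treatment indicator z_j
    return np.bincount(unit_id, weights=np.asarray(s) * z, minlength=n_units)


# Toy inputs (invented): unit 0 has three sub-units, unit 1 has two.
unit_id = np.array([0, 0, 0, 1, 1])
s = np.array([0.5, 0.3, 0.2, 0.6, 0.4])
r = np.array([0.1, -0.2, 0.0, -0.05, 0.3])
X = aggregate_treatment(unit_id, s, r, n_units=2)
print(X)  # approximately [0.7, 0.4]: unit 0 counts j = 0 and j = 2 (r = 0 is treated)
```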

Model Architecture

1. Upper-level IV Estimator

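The instrument aggregates treatment only over sub-units close to the cutoff:

$$Z_i = \sum_{j \in C_i} s_j z_j$$

where $C_i = \{ j \in J_i : |r_j| \le h \}$ is the set of sub-units within bandwidth h of the cutoff.

The key innovation is the vector of aggregated RDA controls:

$$Q_i = \Big( \sum_{j \in C_i} s_j,\ \sum_{j \in C_i} s_j r_j,\ \sum_{j \in C_i} s_j r_j^+ \Big)'$$

where $r_j^+ = \max(r_j, 0)$ lets the slope in the running variable differ on the two sides of the cutoff.

Estimation specification, with $X_i$ instrumented by $Z_i$:

$$Y_i = \beta X_i + \gamma_0 \sum_{j \in C_i} s_j + \gamma_1 \sum_{j \in C_i} s_j r_j + \gamma_2 \sum_{j \in C_i} s_j r_j^+ + \tilde{\gamma}' \tilde{W}_i + \text{error}_i$$

where $\tilde{W}_i$ collects other upper-level controls.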
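The upper-level estimator can be sketched as a just-identified IV regression at the unit level. This is a minimal illustration, not the authors' code. It assumes NumPy, a cutoff normalized to 0, an intercept as the only element of W̃_i, and a hypothetical simulated design in which outcomes depend linearly on every sub-unit's running variable. The helpers `iv_coef` and `upper_level_rda` are invented names, and standard errors are omitted.

```python
import numpy as np


def iv_coef(y, x, z, controls):
    """Coefficient on x in a just-identified IV of y on x (instrument z) with
    exogenous controls, via the Frisch-Waugh-Lovell form z'My / z'Mx."""
    coef, *_ = np.linalg.lstsq(controls, z, rcond=None)
    z_res = z - controls @ coef              # M z: instrument net of controls
    return (z_res @ y) / (z_res @ x)


def upper_level_rda(unit_id, s, r, y, h, extra_controls=None):
    """Upper-level RDA IV estimate of beta (hypothetical helper, cutoff at 0)."""
    n_units = y.size

    def agg(v):                              # sum a sub-unit quantity within each unit i
        return np.bincount(unit_id, weights=v, minlength=n_units)

    z = (r >= 0).astype(float)               # z_j = 1[r_j >= 0]
    close = (np.abs(r) <= h).astype(float)   # indicator for j in C_i
    x = agg(s * z)                           # X_i: all sub-units
    z_i = agg(s * z * close)                 # Z_i: close sub-units only
    q_i = np.column_stack([agg(s * close),
                           agg(s * r * close),
                           agg(s * np.maximum(r, 0) * close)])   # Q_i
    w = np.ones((n_units, 1))                # W~_i: here just an intercept
    if extra_controls is not None:
        w = np.column_stack([w, extra_controls])
    return iv_coef(y, x, z_i, np.column_stack([q_i, w]))


# Hypothetical simulation: 10 equally weighted sub-units per unit; outcomes
# depend linearly on every sub-unit's running variable (a smooth confounder).
rng = np.random.default_rng(0)
n_units, per_unit, beta, h = 20_000, 10, 2.0, 0.3
unit_id = np.repeat(np.arange(n_units), per_unit)
s = np.full(unit_id.size, 1 / per_unit)
r = rng.uniform(-1, 1, unit_id.size)
x_true = np.bincount(unit_id, weights=s * (r >= 0), minlength=n_units)
confounder = np.bincount(unit_id, weights=s * r, minlength=n_units)
y = beta * x_true + confounder + rng.normal(0, 0.1, n_units)

beta_hat = upper_level_rda(unit_id, s, r, y, h)
print(f"upper-level RDA estimate: {beta_hat:.3f} (true beta = {beta})")
```

In this design, the slope terms in Q_i absorb the linear dependence of Y_i on the close sub-units' running variables. The remaining variation in Z_i therefore comes only from which side of the cutoff each close sub-unit falls on.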

2. Lower-level Stacked Estimator

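Estimates a fuzzy RD specification in the stacked sample of sub-units close to the cutoff (in the application, close elections). Each observation is a sub-unit j, and i(j) denotes the upper-level unit that contains it:

$$Y_{i(j)} = \beta X_{i(j)} + \tilde{\gamma}' \tilde{W}_{i(j)} + \lambda' q_j + \text{error}_j$$

where $X_{i(j)}$ is instrumented by $z_j$, and $q_j = (1, r_j, r_j^+)'$ are the standard RD control variables.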
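The stacked estimator can be sketched under the same assumptions: NumPy, a cutoff at 0, and hypothetical simulated data and helper names. The sketch also assumes that observations are weighted by s_j, which the text does not state. With equal weights, as in this simulation, the weighting is immaterial.

```python
import numpy as np


def weighted_iv_coef(y, x, z, controls, weights):
    """Coefficient on x in a just-identified weighted IV with exogenous controls."""
    sw = np.sqrt(weights)
    yw, xw, zw = y * sw, x * sw, z * sw
    cw = controls * sw[:, None]
    coef, *_ = np.linalg.lstsq(cw, zw, rcond=None)
    z_res = zw - cw @ coef                   # weighted instrument net of q_j
    return (z_res @ yw) / (z_res @ xw)


def stacked_rda(unit_id, s, r, y, h):
    """Lower-level stacked RDA estimate (hypothetical helper, cutoff at 0)."""
    n_units = y.size
    z = (r >= 0).astype(float)
    x = np.bincount(unit_id, weights=s * z, minlength=n_units)   # X_i
    close = np.abs(r) <= h
    i_of_j = unit_id[close]                  # i(j): parent unit of each close sub-unit
    r_c = r[close]
    q = np.column_stack([np.ones(r_c.size), r_c, np.maximum(r_c, 0)])   # q_j
    # Each close sub-unit is one observation carrying its parent's Y and X.
    return weighted_iv_coef(y[i_of_j], x[i_of_j], z[close], q, s[close])


# Same hypothetical design as a simple check: true beta = 2.
rng = np.random.default_rng(0)
n_units, per_unit, beta, h = 20_000, 10, 2.0, 0.3
unit_id = np.repeat(np.arange(n_units), per_unit)
s = np.full(unit_id.size, 1 / per_unit)
r = rng.uniform(-1, 1, unit_id.size)
x_true = np.bincount(unit_id, weights=s * (r >= 0), minlength=n_units)
confounder = np.bincount(unit_id, weights=s * r, minlength=n_units)
y = beta * x_true + confounder + rng.normal(0, 0.1, n_units)

beta_hat = stacked_rda(unit_id, s, r, y, h)
print(f"stacked RDA estimate: {beta_hat:.3f} (true beta = {beta})")
```

The first stage here is the jump in the parent unit's X at sub-unit j's own cutoff. The other sub-units in the same unit enter only as noise, because their running variables are independent of r_j.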

Technical Innovations

1. Theoretical Equivalence

Proposition 1 establishes the numerical equivalence of the upper-level and lower-level estimators: the upper-level IV estimator equals a particular sub-unit-level fuzzy RD estimator.
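The exact sub-unit-level estimator in Proposition 1 is not spelled out in this summary. The sketch below therefore assumes a construction in the spirit of the shift-share equivalence of Borusyak et al. (2022). First, the parent unit's outcome and treatment are residualized on all upper-level controls, including Q_i. Then a sub-unit IV is run with instrument z_j and controls q_j, weighted by s_j. Under that assumption, the two estimates agree to floating-point precision, because the s_j-weighted sum of q_j over close sub-units of unit i is exactly Q_i. The simulated data and the helper `residualize` are hypothetical.

```python
import numpy as np


def residualize(v, controls, weights=None):
    """Residual of v after (weighted) least-squares projection on controls."""
    w = np.ones(len(v)) if weights is None else weights
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(controls * sw[:, None], v * sw, rcond=None)
    return v - controls @ coef


# Hypothetical data: unequal weights s_j that sum to 1 within each unit.
rng = np.random.default_rng(1)
n_units, per_unit, h = 500, 8, 0.3
unit_id = np.repeat(np.arange(n_units), per_unit)
s = rng.dirichlet(np.ones(per_unit), n_units).ravel()
r = rng.uniform(-1, 1, unit_id.size)
z = (r >= 0).astype(float)
close = (np.abs(r) <= h).astype(float)


def agg(v):                                  # sum over sub-units within each unit
    return np.bincount(unit_id, weights=v, minlength=n_units)


x = agg(s * z)
y = 1.5 * x + agg(s * r) + rng.normal(0, 0.1, n_units)

# Upper level: IV with instrument Z_i, controls Q_i and an intercept.
Z = agg(s * z * close)
controls = np.column_stack([agg(s * close), agg(s * r * close),
                            agg(s * np.maximum(r, 0) * close), np.ones(n_units)])
y_perp, x_perp = residualize(y, controls), residualize(x, controls)
beta_upper = (Z @ y_perp) / (Z @ x_perp)

# Lower level: close sub-units, instrument z_j, controls q_j, weights s_j.
idx = np.flatnonzero(close)
i_of_j = unit_id[idx]
q = np.column_stack([np.ones(idx.size), r[idx], np.maximum(r[idx], 0)])
z_tilde = residualize(z[idx], q, weights=s[idx])
beta_lower = (np.sum(s[idx] * z_tilde * y_perp[i_of_j])
              / np.sum(s[idx] * z_tilde * x_perp[i_of_j]))

print(beta_upper, beta_lower)                # identical up to rounding error
```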

2. Identification Results

Proposition 2 shows that under standard continuity assumptions, both estimators identify the same local average treatment effect:

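$$\beta_0 = \frac{E\left[ s_j \left( Y_{i(j)}\big(X_{i(j)}(1, z_{i(j),-j})\big) - Y_{i(j)}\big(X_{i(j)}(0, z_{i(j),-j})\big) \right) \mid r_j = 0 \right]}{E\left[ s_j \left( X_{i(j)}(1, z_{i(j),-j}) - X_{i(j)}(0, z_{i(j),-j}) \right) \mid r_j = 0 \right]}$$

where $z_{i(j),-j}$ denotes the treatment statuses of the other sub-units in unit i(j).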
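A short worked check of what the estimand β0 recovers follows. It assumes potential outcomes that are linear in the aggregate treatment, which is an assumption made here for illustration and not a restriction stated in the text. With a common slope, the ratio collapses to that slope. With unit-specific slopes and a nonnegative first-stage shift, it becomes a weighted average of those slopes.

```latex
% Constant effects: Y_i(x) = \beta x + \varepsilon_i
Y_{i(j)}\big(X_{i(j)}(1, z_{i(j),-j})\big) - Y_{i(j)}\big(X_{i(j)}(0, z_{i(j),-j})\big)
  = \beta \left[ X_{i(j)}(1, z_{i(j),-j}) - X_{i(j)}(0, z_{i(j),-j}) \right]
  \;\Longrightarrow\; \beta_0 = \beta

% Unit-specific slopes: Y_i(x) = \beta_i x + \varepsilon_i, with \Delta X_j \ge 0
\beta_0 = \frac{E[\, s_j \, \Delta X_j \, \beta_{i(j)} \mid r_j = 0 \,]}{E[\, s_j \, \Delta X_j \mid r_j = 0 \,]},
\qquad \Delta X_j = X_{i(j)}(1, z_{i(j),-j}) - X_{i(j)}(0, z_{i(j),-j})
```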

3. Bias Reduction Properties

Monte Carlo simulations show that the estimator including aggregated local linear control variables inherits the bias reduction properties of traditional RD methods.
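A small Monte Carlo in the same spirit is sketched below. It was written for this summary and is not taken from the paper. It compares the upper-level IV estimator with the full aggregated controls Q_i against one that keeps only the share of close sub-units and an intercept. The simulated design (linear confounding through every running variable, bandwidth 0.3, 40 replications) and the helper names are hypothetical assumptions.

```python
import numpy as np


def iv_beta(y, x, z, controls):
    """Coefficient on x in a just-identified IV with exogenous controls (FWL form)."""
    coef, *_ = np.linalg.lstsq(controls, z, rcond=None)
    z_res = z - controls @ coef
    return (z_res @ y) / (z_res @ x)


def simulate_once(rng, n_units=5_000, per_unit=10, beta=2.0, slope=2.0, h=0.3):
    """Return (bias with full Q_i, bias without the slope terms) for one draw."""
    unit_id = np.repeat(np.arange(n_units), per_unit)
    s = np.full(unit_id.size, 1 / per_unit)
    r = rng.uniform(-1, 1, unit_id.size)
    z = (r >= 0).astype(float)
    close = (np.abs(r) <= h).astype(float)

    def agg(v):                              # sum over sub-units within each unit
        return np.bincount(unit_id, weights=v, minlength=n_units)

    x = agg(s * z)
    # Outcomes depend linearly on every running variable: a smooth confounder.
    y = beta * x + slope * agg(s * r) + rng.normal(0, 0.1, n_units)

    Z = agg(s * z * close)
    const = np.ones(n_units)
    share_close = agg(s * close)
    full_q = np.column_stack([const, share_close, agg(s * r * close),
                              agg(s * np.maximum(r, 0) * close)])
    no_slopes = np.column_stack([const, share_close])
    return iv_beta(y, x, Z, full_q) - beta, iv_beta(y, x, Z, no_slopes) - beta


rng = np.random.default_rng(42)
biases = np.array([simulate_once(rng) for _ in range(40)])
bias_with, bias_without = biases.mean(axis=0)
print(f"mean bias with RDA slope controls:    {bias_with:+.3f}")
print(f"mean bias without RDA slope controls: {bias_without:+.3f}")
```

Dropping the aggregated slope terms leaves the instrument correlated with the close sub-units' running variables. In this design, the second estimator should therefore show a clear positive bias, while the first should be close to zero.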

Experimental Setup

Dataset

Union Election Data: Establishment-level union election data from the NLRB for 1961-2009
Labor Market Outcomes: Constructed from decennial census samples, 1960-2010
Supplementary Data: Union density and benefits data from the Current Population Survey (CPS)

Treatment and Instrumental Variables

Treatment Variable: $\text{NewUnions}_{sit}$, the share of newly unionized workers in state-by-industry cell (s, i) in period t
Instrumental Variable: $Z_{sit}$, the share of workers unionized through close elections (union vote share within 10 percentage points of 50%)
RDA Control Variables: Include the share of workers in close elections and their average vote margins, among others

Evaluation Metrics

Five inequality measures:

Log college wage premium
Log 90-10 wage ratio
Gini coefficient
Top 10% income share
Log wage variance

Experimental Results

Main Results

Inequality Effects

For each percentage point increase in the new unionization rate:

Gini coefficient decreases by 0.018 (upper-level estimator) / 0.013 (lower-level estimator)
90-10 ratio decreases by 0.46 / 0.27 log points
Top 10% share decreases by 0.14 / 0.12 percentage points
Log wage variance decreases by 0.025 / 0.021

Wage Distribution Effects

Unionization reduces inequality primarily by compressing high-end wages rather than raising low-end wages:

Average wages decline by 0.35 log points
Managerial wages decline significantly by 0.92 log points
10th percentile wages increase slightly but insignificantly

Benefit Mechanisms

Unionization significantly increases pension coverage: each new union member corresponds to 1.48 additional workers with pension coverage. Because this exceeds one, coverage must extend beyond the newly unionized workers themselves, indicating substantial spillovers across establishments.

Historical Contribution Analysis

A counterfactual that holds new unionization rates at their 1960s levels attributes the following shares of the 1970-2010 growth in inequality to the decline in unionization:

Gini coefficient: 34.5%
90-10 ratio: 33.7%
Top 10% share: 38.3%
College premium: 60.5%

Robustness Checks

Results remain robust across multiple specifications:

Different bandwidth choices (10% and 15%)
Excluding union decertification elections
Different fixed effects specifications
Weighted and unweighted estimates

Related Work

RD Literature

This paper extends the standard RD design and differs from multi-score RD designs. Multi-score RD handles multiple running variables that determine treatment at a single boundary, whereas RDA handles treatments that aggregate many separate RD shocks.

The theoretical analysis builds on the shift-share instrumental variable literature, particularly the numerical equivalence results of Borusyak et al. (2022).

Union and Inequality Literature

Provides a new causal identification strategy for the effects of unions on inequality, complementing research such as Farber et al. (2021) that relies on selection on observables.

Conclusions and Discussion

Main Conclusions

Methodology: The RDA framework provides a unified theoretical foundation and optimal estimation methods for handling aggregated RD settings
Empirical Findings: Unionization significantly reduces wage inequality, primarily through compressing the upper end of the wage distribution
Policy Implications: Union decline is an important factor in rising U.S. inequality

Limitations

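Extrapolation: The estimates rest on local variation from close elections, so extrapolating them to long-term or aggregate effects (as in the historical counterfactuals) requires additional assumptions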
Aggregation Level: Only considers inequality within state-industry units, not between-unit inequality
Mechanism Identification: The specific mechanisms through which unions affect inequality require further investigation

Future Directions

Extension to other aggregation settings and spillover effect studies
Development of methods addressing endogenous aggregation structures
Exploration of theoretical properties of dynamic RD aggregation

In-Depth Evaluation

Strengths

Theoretical Contribution: Fills a gap in the RD literature for aggregated settings, providing a rigorous theoretical foundation
Methodological Innovation: The two estimators are cleverly designed and inherit desirable properties of traditional RD
Empirical Value: Provides new causal evidence for important policy questions
Strong Practicality: The method applies to a wide range of economic research fields

Weaknesses

Complexity: RDA is more complex to implement compared to standard RD
Assumption Strength: Requires stronger continuity assumptions to handle multiple running variables
Computational Burden: The lower-level estimator in particular stacks many repeated observations, since each upper-level unit's outcome appears once for every nearby sub-unit it contains

Impact

Academic Contribution: Makes important methodological contributions to econometrics
Policy Relevance: Provides new tools for labor policy and inequality research
Reproducibility: Offers detailed implementation guidance and code

Applicable Scenarios

Legislative studies in political economy
School bond studies in education economics
Spillover effect studies in labor economics
Any economic research involving aggregated RD settings

References

This paper cites important literature in econometrics, labor economics, and political economy, particularly:

Borusyak et al. (2022) on shift-share instrumental variables
Frandsen (2021) on RD design with union elections
Farber et al. (2021) on unions and inequality

Overall Assessment: This is a high-quality econometric methodology paper that not only provides important theoretical contributions but also demonstrates the value of the method through meaningful empirical applications. The RDA framework fills a literature gap and provides more appropriate identification strategies for many economic studies.