Inference on effect size after multiple hypothesis testing
Dzemski, Okui, Wang
Significant treatment effects are often emphasized when interpreting and summarizing empirical findings in studies that estimate multiple, possibly many, treatment effects. Under this kind of selective reporting, conventional treatment effect estimates may be biased and their corresponding confidence intervals may undercover the true effect sizes. We propose new estimators and confidence intervals that provide valid inferences on the effect sizes of the significant effects after multiple hypothesis testing. Our methods are based on the principle of selective conditional inference and complement a wide range of tests, including step-up tests and bootstrap-based step-down tests. Our approach is scalable, allowing us to study an application with over 370 estimated effects. We justify our procedure for asymptotically normal treatment effect estimators. We provide two empirical examples that demonstrate bias correction and confidence interval adjustments for significant effects. The magnitude and direction of the bias correction depend on the correlation structure of the estimated effects and whether the interpretation of the significant effects depends on the (in)significance of other effects.
In studies estimating multiple treatment effects, statistically significant treatment effects are often emphasized when interpreting and summarizing empirical findings. Under such selective reporting, conventional treatment effect estimates may be biased, and their corresponding confidence intervals may fail to provide adequate coverage of true effect sizes. This paper proposes new estimators and confidence intervals that provide valid inference on effect sizes of significant effects after multiple hypothesis testing. The method is based on the principle of selective conditional inference and applies to a broad range of testing procedures, including step-up tests and bootstrap-based step-down tests. The approach is scalable and can be applied to studies with over 370 estimated effects. The authors establish the validity of the procedure for asymptotically normal treatment effect estimators and provide two empirical examples demonstrating bias correction and confidence interval adjustment for significant effects.
In empirical research across economics, medicine, psychology, and other fields, researchers frequently need to estimate multiple treatment effects. These effects may arise from different outcome variables, intervention types, or population subgroups. Through multiple hypothesis testing procedures, researchers classify these effects as statistically significant or insignificant, then focus on the practical importance of significant effects.
When researchers restrict attention to significant effects, the estimated magnitudes of these effects are subject to selection bias, which invalidates traditional statistical inference methods. Specifically:
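Selection Bias: Effects that pass a significance threshold are selected partly because their estimates happen to be large (the "winner's curse"), so their magnitudes tend to be overestimated
Insufficient Confidence Interval Coverage: Conventional confidence intervals, computed as if no selection had taken place, cover the true effect less often than their nominal level
Lack of Bias Correction: Conventional estimators make no adjustment for selection and are therefore biased for post-selection effect sizes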
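The selection bias described above can be illustrated with a small simulation. This is a sketch under assumptions that are not taken from the paper: every effect has the same hypothetical true value (0.2) and standard error (0.1), the estimates are independent and exactly normal, and significance is decided with a two-sided Bonferroni threshold for m = 20 tests at the 5% level (a critical value of about 3.02).

```python
import random
import statistics

def simulate_winners_curse(true_effect=0.2, se=0.1, m=20, reps=2000,
                           z_crit=3.02, seed=0):
    """Average the estimates that pass a Bonferroni-style threshold.

    All numbers are hypothetical; z_crit is roughly the two-sided
    Bonferroni critical value for m = 20 tests at alpha = 0.05.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(reps):
        for _ in range(m):
            estimate = rng.gauss(true_effect, se)
            # Keep only estimates whose t-statistic clears the threshold.
            if abs(estimate / se) > z_crit:
                selected.append(estimate)
    return statistics.mean(selected), len(selected)

mean_selected, n_selected = simulate_winners_curse()
print(f"true effect: 0.200, mean significant estimate: {mean_selected:.3f} "
      f"(n = {n_selected})")
```

Because a positive estimate must exceed 3.02 standard errors to be selected, the average of the significant estimates sits above the true effect even though each unselected estimate is unbiased.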
The paper argues that avoiding selective summarization and interpretation does not solve the problem but merely shifts the burden of synthesizing results to readers, who still face selective inference issues. Therefore, specialized statistical methods are needed to handle inference after multiple hypothesis testing.
Proposes a new method based on conditional selective inference: Provides valid point estimates and confidence intervals for effect sizes of significant effects after multiple hypothesis testing
Develops efficient computational algorithms: Proposes an algorithm with O(m³ log m) time complexity, enabling the method to scale to applications with hundreds of effects
Establishes asymptotic theory: Proves the asymptotic validity of the procedure under asymptotically normal treatment effect estimators
Provides broad applicability: The method applies to various multiple testing procedures, including step-down and step-up tests
Demonstrates practical value: Two empirical applications illustrate the bias correction and confidence interval adjustment for significant effects
The setting has m treatment effect parameters θ = (θ₁, ..., θₘ)' with estimators θ̂. A multiple hypothesis testing procedure determines the set Ŝ of significant effects, and the goal is valid inference, conditional on selection, on the true effect sizes of the effects in Ŝ.
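To make the idea of conditioning on selection concrete, the following sketch treats the simplest possible case. It is not the paper's general procedure. It assumes a single estimate X ~ N(θ, 1), measured in standard error units, that is reported only when |X| > c, so that the conditional distribution is a truncated normal. The function names and the threshold c = 3.02 are hypothetical. The estimator and interval are obtained by inverting the conditional CDF in θ by bisection.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, accurate in both tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def cond_cdf(x, theta, c):
    """P(X <= x | |X| > c) for X ~ N(theta, 1), assuming x > c."""
    p_selected = norm_cdf(-c - theta) + norm_cdf(theta - c)
    return 1.0 - norm_cdf(theta - x) / p_selected

def solve_theta(x, c, target):
    """Find theta with cond_cdf(x, theta, c) = target by bisection.

    Relies on the conditional CDF being decreasing in theta.
    """
    lo, hi = x - 20.0, x + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cond_cdf(x, mid, c) > target:
            lo = mid  # CDF too large, so theta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conditional_inference(x, c, alpha=0.05):
    """Median-unbiased estimate and equal-tailed interval given |x| > c."""
    if abs(x) <= c:
        raise ValueError("x is not significant at threshold c")
    ax = abs(x)  # handle negative x by symmetry
    lower = solve_theta(ax, c, 1.0 - alpha / 2.0)
    median = solve_theta(ax, c, 0.5)
    upper = solve_theta(ax, c, alpha / 2.0)
    if x < 0:
        return -median, (-upper, -lower)
    return median, (lower, upper)

est, (lo, hi) = conditional_inference(3.2, c=3.02)
print(f"barely significant: x = 3.2, estimate = {est:.2f}, CI = ({lo:.2f}, {hi:.2f})")
est8, (lo8, hi8) = conditional_inference(8.0, c=3.02)
print(f"clearly significant: x = 8.0, estimate = {est8:.2f}, CI = ({lo8:.2f}, {hi8:.2f})")
```

In this toy case, an estimate just above the threshold is pulled toward zero and its interval widens, while an estimate far above the threshold is left essentially unchanged. With several correlated effects and stepwise tests the selection event is more complicated, which is the case the paper addresses.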
Traditional methods require direct computation of the complex selection event X(S). This paper avoids such computation through the following innovation:
Algorithm 2: Computing Conditional Support
(A) Find intervals I by computing all intersections of the linear functions x_{z,h}(x_s)
(B) For each interval I:
i. Find the sorting permutation σ*_I
ii. Compute the interval boundaries ℓ(I) and u(I)
(C) Return ∪_I (I ∩ [ℓ(I), u(I)])
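Steps (A) and (B)(i) rest on a generic fact: a family of linear functions can change its relative order only where two of them intersect, so the sorting permutation is constant on each interval between consecutive intersection points. The sketch below illustrates only that fact. The representation of each function as a pair (a, b) meaning a + b·x, the example coefficients, and the choice of decreasing order are assumptions for illustration. The boundaries ℓ(I) and u(I) of step (B)(ii) are not implemented because the text above does not define them.

```python
from itertools import combinations

def breakpoints(lines):
    """Sorted x-coordinates where two lines a + b*x intersect.

    Exact float comparison is used to deduplicate; a tolerance would be
    needed for numerically close intersections.
    """
    xs = set()
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        if b1 != b2:  # parallel lines never intersect
            xs.add((a2 - a1) / (b1 - b2))
    return sorted(xs)

def orderings_by_interval(lines):
    """Pair each interval between breakpoints with its sorting permutation.

    The permutation lists line indices by decreasing value; it is
    evaluated at one interior point because it is constant on the interval.
    """
    inf = float("inf")
    cuts = [-inf] + breakpoints(lines) + [inf]
    result = []
    for left, right in zip(cuts, cuts[1:]):
        if left == -inf and right == inf:
            x = 0.0
        elif left == -inf:
            x = right - 1.0
        elif right == inf:
            x = left + 1.0
        else:
            x = 0.5 * (left + right)
        perm = sorted(range(len(lines)),
                      key=lambda j: lines[j][0] + lines[j][1] * x,
                      reverse=True)
        result.append(((left, right), perm))
    return result

# Hypothetical example with three lines: x, 1, and 3 - x.
example = [(0.0, 1.0), (1.0, 0.0), (3.0, -1.0)]
for interval, perm in orderings_by_interval(example):
    print(interval, perm)
```

With m lines there are at most m(m − 1)/2 intersection points, and sorting m values on each interval costs O(m log m). This naive count gives O(m³ log m), the same order as the complexity stated for the paper's algorithm, although the source does not spell out how that bound is derived.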
"Response Rate" and "Donation Amount Including Match" are significant across all three procedures
The direction and magnitude of bias correction depend on the correlation structure
For "Donation Amount Including Match," the correction is upward under the Holm and Bonferroni tests, which is related to the insignificance of the highly correlated "Donation Amount Excluding Match"
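For readers unfamiliar with the two procedures named above, the following is a minimal sketch of the textbook Bonferroni and Holm tests operating on p-values. The p-values are hypothetical and are not taken from the application.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return {i for i, p in enumerate(pvalues) if p <= alpha / m}

def holm(pvalues, alpha=0.05):
    """Holm step-down test.

    Visit p-values from smallest to largest, comparing the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k), and stop at the first failure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            rejected.add(i)
        else:
            break
    return rejected

# Hypothetical p-values for four effects.
pvals = [0.001, 0.02, 0.012, 0.3]
print("Bonferroni rejects:", sorted(bonferroni(pvals)))
print("Holm rejects:", sorted(holm(pvals)))
```

Holm always rejects at least the hypotheses that Bonferroni rejects. Whether a given effect is declared significant under Holm can depend on the p-values of the other effects, which is one way the (in)significance of other effects enters the selection event.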
Theorem 4 provides sufficient conditions for conditional confidence intervals to converge to unconditional confidence intervals, with the two methods tending to agree when effects are "highly significant."
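A one-dimensional toy calculation gives some intuition for why conditional and unconditional inference agree for highly significant effects. It does not reproduce the conditions of Theorem 4. It assumes X ~ N(θ, 1) is selected when |X| > c, with a hypothetical threshold c = 3.02. On the selection region the conditional density equals the unconditional density divided by the selection probability, so conditioning has little effect once that probability is close to one.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, accurate in both tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def selection_probability(theta, c):
    """P(|X| > c) for X ~ N(theta, 1)."""
    return norm_cdf(-c - theta) + norm_cdf(theta - c)

c = 3.02  # hypothetical significance threshold in standard error units
for theta in (0.0, 2.0, 4.0, 6.0, 8.0):
    p = selection_probability(theta, c)
    print(f"theta = {theta:.0f}: P(selected) = {p:.6f}")
```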
Method Validity: The proposed conditional inference method performs well in finite samples and captures selection bias even under non-Gaussian settings
Computational Feasibility: The polynomial time complexity of the algorithm enables the method to handle hundreds of effects
Practical Value: Two empirical applications show that the direction and magnitude of bias correction are difficult to anticipate, highlighting the relevance of formal statistical methods
Pre-specification Assumption: The method assumes that the full set of tested hypotheses is known, so it cannot handle cases where insignificant results are hidden
Computational Complexity: Although the algorithm runs in polynomial time, computation may become demanding for very large m
Model Assumptions: The method requires asymptotically normal estimators and a consistently estimable covariance matrix
The paper cites key literature in selective inference and multiple testing, including Lee et al. (2016) on polyhedral methods, Fithian et al. (2017) on conditional selective inference principles, and Romano and Wolf (2005) on multiple testing procedures.