Directional replicability addresses the question of whether an effect studied across $n$ independent studies is present with the same direction in at least $r$ of them, for $r \geq 2$. When the expected direction of the effect is not specified in advance, the state of the art recommends assessing replicability separately by combining one-sided $p$-values for both directions (left and right), and then doubling the smaller of the two resulting combined $p$-values to account for multiple testing. In this work, we show that this multiplicative correction is not always necessary, and give conditions under which it can be safely omitted.
- Paper ID: 2510.11273
- Title: Directional replicability: when can the factor of two be omitted
- Authors: Vera Djordjilović (University of Venice), Tamar Sofer (Harvard Medical School), Jonathan M. Dreyfuss (Harvard Medical School)
- Classification: stat.ME (Statistical Methodology)
- Publication Date: October 13, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.11273
Directional replicability examines whether a certain effect exists in the same direction across at least r out of n independent studies (r ≥ 2). When the expected direction of the effect is not specified a priori, current practice recommends separately combining one-sided p-values for both directions to assess replicability, then multiplying the smaller of the two combined p-values by 2 to correct for multiple testing. This study demonstrates that this multiplicative correction is not always necessary and provides conditions under which this correction can be safely omitted.
- Problem to be Addressed: Statistical testing for assessing the consistency of effect directions across multiple independent studies, particularly determining when the traditional factor-of-two correction can be omitted.
- Problem Significance:
- Low reproducibility of scientific findings is prevalent in medicine, economics, psychology, and other fields
- Formal statistical methods are needed to assess the replicability of research findings
- Directional replicability is more stringent than merely establishing that an effect is present, since it also requires the effect to have the same direction across studies
- Limitations of Existing Methods:
- When the direction is not specified in advance, standard methods always double the smaller of the two combined p-values to correct for multiple testing
- This correction may be overly conservative, reducing the power of the test
- Research Motivation: Through theoretical analysis, determine when the factor-of-two correction can be safely omitted, thereby improving the power of statistical tests.
- Theoretical Results: Proves that when r > (n+1)/2, the factor-of-two correction can be safely omitted when using the Bonferroni method to combine p-values
- Counterexample Construction: For smaller r values, counterexamples demonstrate that the correction factor is necessary
- Boundary Conditions: Identifies when the correction is and is not needed, with a transition region of intermediate r values that requires case-specific analysis
- Practical Guidance: Provides procedures for data-adaptive selection of r values
- Extended Discussion: Explores possible extensions of results to other combination functions
Let θ = (θ₁, ..., θₙ) ∈ ℝⁿ denote the vector of true effect sizes in n studies. Define:
- n₊ = |{i : θᵢ > 0}|: number of positive effects
- n₋ = |{i : θᵢ < 0}|: number of negative effects
Null Hypothesis for r out of n Directional Replicability:
H_{r/n} : n₊ < r ∧ n₋ < r
Corresponding Alternative Hypothesis:
K_{r/n} : n₊ ≥ r ∨ n₋ ≥ r
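As a direct transcription of these definitions, the following sketch checks whether a given effect vector θ lies in H_{r/n}. The helper name `null_holds` is hypothetical and not from the paper.

```python
def null_holds(theta, r):
    """Return True if theta lies in H_{r/n}, i.e. n_plus < r and n_minus < r."""
    n_plus = sum(1 for t in theta if t > 0)   # number of positive effects
    n_minus = sum(1 for t in theta if t < 0)  # number of negative effects
    return n_plus < r and n_minus < r


# With r = 2, one positive and one negative effect is still in the null.
print(null_holds([1.5, -0.7, 0.0], 2))  # True
print(null_holds([1.5, 0.2, -0.7], 2))  # False: two positive effects
```

Note that a θ with one positive and one negative effect remains in H_{2/n}. This is why the "inconsistent" configurations considered below are still null configurations.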
- Basic Setup:
- Assume independent normal estimators: Tᵢ ~ N(θᵢ, 1)
- One-sided p-values: pᵢ = 1 - Φ(Tᵢ), qᵢ = Φ(Tᵢ) = 1 - pᵢ
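- Bonferroni Partial Combined p-values, where p₍₁₎ ≤ ... ≤ p₍ₙ₎ and q₍₁₎ ≤ ... ≤ q₍ₙ₎ denote the ordered one-sided p-values:
- Positive direction: p⁺_{r/n} = (n - r + 1)p₍ᵣ₎
- Negative direction: p⁻_{r/n} = (n - r + 1)q₍ᵣ₎ = (n - r + 1)(1 - p₍ₙ₋ᵣ₊₁₎)
- Traditional Method:
p_{r/n} = 2 min{p⁻_{r/n}, p⁺_{r/n}}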
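The setup above can be sketched in a few lines, assuming Tᵢ ~ N(θᵢ, 1) as stated. The function names are hypothetical. Capping the Bonferroni values at 1 is our addition, a common convention not shown in the formulas above. The upper-tail p-value is computed as Φ(-Tᵢ), which equals 1 - Φ(Tᵢ) but is numerically more accurate for large Tᵢ.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def bonferroni_partial_pvalues(T, r):
    """Return (p_plus, p_minus) for r-out-of-n directional replicability."""
    n = len(T)
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    p = sorted(PHI.cdf(-t) for t in T)  # p_i = 1 - Phi(T_i), small when theta_i > 0
    q = sorted(PHI.cdf(t) for t in T)   # q_i = Phi(T_i), small when theta_i < 0
    m = n - r + 1
    p_plus = min(1.0, m * p[r - 1])     # (n - r + 1) p_(r)
    p_minus = min(1.0, m * q[r - 1])    # (n - r + 1) q_(r)
    return p_plus, p_minus


def traditional_pvalue(T, r):
    """Traditional rule: double the smaller of the two directional p-values."""
    p_plus, p_minus = bonferroni_partial_pvalues(T, r)
    return min(1.0, 2.0 * min(p_plus, p_minus))


# Two clearly positive studies and one clearly negative study, r = 2.
print(bonferroni_partial_pvalues([3.0, 3.0, -3.0], 2))
print(traditional_pvalue([3.0, 3.0, -3.0], 2))
```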
Main Theorem (Theorem 1):
When (n+1)/2 < r ≤ n, p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is a valid p-value for H_{r/n}.
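Here is a minimal sketch of how Theorem 1 changes the decision rule, given the two directional p-values. The function `combined_pvalue` is a hypothetical name. It applies only the sufficient condition of Theorem 1, so it still doubles in cases such as n = 3, r = 2, where Proposition 1 shows doubling is unnecessary. Reporting the direction of the smaller p-value is our illustrative choice, in line with the Type III error discussion below.

```python
def combined_pvalue(p_plus, p_minus, n, r):
    """Combine the two Bonferroni partial conjunction p-values for H_{r/n}.

    Keeps the traditional factor of two unless r > (n + 1) / 2, the sufficient
    condition of Theorem 1. Also reports which direction gave the smaller p-value.
    """
    if not 2 <= r <= n:
        raise ValueError("need 2 <= r <= n")
    # 2r > n + 1 is the integer form of r > (n + 1) / 2.
    factor = 1.0 if 2 * r > n + 1 else 2.0
    # Ties are broken toward "positive"; this is an arbitrary illustrative choice.
    direction = "positive" if p_plus <= p_minus else "negative"
    return min(1.0, factor * min(p_plus, p_minus)), direction


# n = 5, r = 4: 2r = 8 > 6, so no doubling is applied.
print(combined_pvalue(0.02, 0.5, n=5, r=4))
# n = 20, r = 10: 2r = 20 is not > 21, so the factor of two is kept.
print(combined_pvalue(0.02, 0.5, n=20, r=10))
```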
Key Proof Strategy:
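- Order the statistics as T₍₁₎ ≤ ... ≤ T₍ₙ₎. When 2r > n + 1, T₍ᵣ₎ ≥ T₍ₙ₋ᵣ₊₁₎. Since the positive-direction test rejects only if T₍ₙ₋ᵣ₊₁₎ exceeds a positive threshold, and the negative-direction test rejects only if T₍ᵣ₎ falls below the corresponding negative threshold, the two Type I error events are disjoint
- The Type I error probability can therefore be written as c(θ) = Pr_θ(X ≥ r) + Pr_θ(Y ≥ r), where X and Y count the studies with pᵢ ≤ α/(n - r + 1) and qᵢ ≤ α/(n - r + 1), respectively
- By analyzing partial derivatives, the proof shows that c(θ) attains its supremum on the boundary of the null parameter space
- This supremum does not exceed α, so no additional correction is required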
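The disjointness step can be spelled out as follows. This is our own reconstruction from the definitions above, not copied from the paper. The symbols a and z are introduced only for convenience, and we assume a < 1/2 so that z > 0. Note that 2r > n + 1 implies 2r > n, which is what the disjointness argument uses. X and Y are sums of independent Bernoulli indicators.

```latex
% a = \alpha/(n-r+1), \quad z = \Phi^{-1}(1-a) > 0 \ \text{(assuming } a < 1/2\text{)}
\begin{aligned}
p^{+}_{r/n} \le \alpha
  &\iff p_{(r)} \le a
  \iff X := \bigl|\{i : p_i \le a\}\bigr| = \bigl|\{i : T_i \ge z\}\bigr| \ge r,\\
p^{-}_{r/n} \le \alpha
  &\iff q_{(r)} \le a
  \iff Y := \bigl|\{i : q_i \le a\}\bigr| = \bigl|\{i : T_i \le -z\}\bigr| \ge r,\\
X + Y &\le n \quad \text{(no study satisfies both } T_i \ge z \text{ and } T_i \le -z\text{)},\\
2r > n &\implies \{X \ge r\} \cap \{Y \ge r\} = \varnothing,\\
c(\theta) &= \Pr_\theta\bigl(\min\{p^{-}_{r/n}, p^{+}_{r/n}\} \le \alpha\bigr)
           = \Pr_\theta(X \ge r) + \Pr_\theta(Y \ge r),\\
\Pr_\theta(T_i \ge z) &= \Phi(\theta_i - z), \qquad \Pr_\theta(T_i \le -z) = \Phi(-\theta_i - z).
\end{aligned}
```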
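- Setup: n = 20 studies
- Consider two parameter configurations, both lying in H_{r/n}:
- "Consistent": θ⁺ = (∞,...,∞,0,...,0), with the first r-1 entries equal to +∞ and the rest equal to 0
- "Inconsistent": θ* = (∞,...,∞,-∞,...,-∞,0,...,0), with r-1 entries equal to +∞, r-1 entries equal to -∞, and the rest equal to 0
- Evaluation metric: Type I error probability c(θ) of the uncorrected procedure min{p⁻_{r/n}, p⁺_{r/n}}
- Nominal significance level α = 0.1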
Numerical Results Shown in Figure 1:
- For r ∈ {2,...,7}: the Type I error under the inconsistent configuration exceeds both the Type I error under the consistent configuration and α, so the factor of two cannot be omitted
- For r ∈ {8,9,10}: the Type I error under both configurations falls below α
- For r > 10, the condition r > (n+1)/2 = 10.5 holds, so Theorem 1 applies
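Because these configurations place every non-null study deterministically in one tail, c(θ) has a closed form. The sketch below uses our own derivation, not formulas taken from the paper, with a = α/(n - r + 1) as the per-study threshold. For θ*, the procedure rejects iff at least one of the n - 2(r - 1) null studies lands in either tail. For θ⁺, it fails to reject iff no null study lands in the upper tail and fewer than r land in the lower tail. Both function names are hypothetical.

```python
from math import comb


def type1_inconsistent(n, r, alpha):
    """c(theta*) for the uncorrected rule: r-1 entries at +inf, r-1 at -inf, rest 0."""
    a = alpha / (n - r + 1)    # threshold applied to each one-sided p-value
    m = n - 2 * (r - 1)        # number of null studies (theta_i = 0)
    if m < 0:
        raise ValueError("theta* requires 2(r - 1) <= n")
    # Each null study is in the upper or lower tail with probability 2a in total.
    return 1.0 - (1.0 - 2.0 * a) ** m


def type1_consistent(n, r, alpha):
    """c(theta+) for the uncorrected rule: r-1 entries at +inf, rest 0."""
    a = alpha / (n - r + 1)
    N = n - r + 1              # number of null studies
    # Accept iff no null study is in the upper tail and k < r are in the lower tail
    # (each null study: upper w.p. a, lower w.p. a, neither w.p. 1 - 2a).
    accept = sum(comb(N, k) * a**k * (1.0 - 2.0 * a) ** (N - k) for k in range(r))
    return 1.0 - accept


for r in range(2, 11):
    print(r, round(type1_consistent(20, r, 0.1), 4), round(type1_inconsistent(20, r, 0.1), 4))
```

Under these assumptions the output reproduces the qualitative pattern described for Figure 1. The inconsistent configuration exceeds α = 0.1 for r up to 7 and drops below it from r = 8 onward.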
Proposition 1: For n = 3 and r = 2, p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is still a valid p-value, even though this case does not satisfy the condition of Theorem 1.
Proof Highlights:
- Analysis of the partial derivatives shows that c(θ) has no critical points in the feasible region
- A limiting argument shows that the supremum equals α
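Proposition 1 can be checked numerically, though not proved, by computing c(θ) exactly for small n. The sketch enumerates each study's tail outcome (upper, lower, or neither), which is feasible only for small n because there are 3ⁿ terms. It then scans a grid of null configurations θ = (t, -s, 0) with t, s ≥ 0. For n = 3 and r = 2, every null θ has this form up to permutation. The helper `type1_exact`, the level α = 0.05, and the grid are our own choices.

```python
from itertools import product
from statistics import NormalDist

PHI = NormalDist()


def type1_exact(theta, r, alpha):
    """Exact Pr(min{p+, p-} <= alpha) under independent T_i ~ N(theta_i, 1)."""
    n = len(theta)
    a = alpha / (n - r + 1)
    z = PHI.inv_cdf(1.0 - a)
    probs = []
    for t in theta:
        up = PHI.cdf(t - z)    # Pr(T_i >= z), i.e. p_i <= a
        lo = PHI.cdf(-z - t)   # Pr(T_i <= -z), i.e. q_i <= a
        probs.append((up, lo, max(0.0, 1.0 - up - lo)))
    total = 0.0
    for outcome in product(range(3), repeat=n):  # 0 = upper, 1 = lower, 2 = neither
        if outcome.count(0) >= r or outcome.count(1) >= r:
            w = 1.0
            for i, k in enumerate(outcome):
                w *= probs[i][k]
            total += w
    return total


alpha = 0.05
grid = [0.5 * k for k in range(21)]  # 0.0, 0.5, ..., 10.0
worst = max(type1_exact((t, -s, 0.0), 2, alpha) for t in grid for s in grid)
print(worst)
```

On this grid, the largest value approaches α along the edges where t or s is large, and it does not exceed α. This agrees with the statement that the supremum equals α.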
- Sufficient but Not Necessary Condition: The condition r > (n+1)/2 given in Theorem 1 is sufficient but not necessary
- Transition Region: For intermediate r (for example r ∈ {8, 9, 10} when n = 20), the correction may be unnecessary, but this has to be verified case by case
- Type III Error Control: The proposed procedure can control Type III error, allowing post-hoc inference of effect direction
- Replicability Statistical Methods: Survey by Bogomolov and Heller (2023)
- Partial Conjunction Hypothesis Testing: General procedure by Benjamini and Heller (2008)
- Multivariate Normal Mean Testing: Related results by Sasabuchi (1980) and Berger (1989)
- P-value Combination Methods: Work by Owen (2009), Wang et al. (2022), and others
- When r > (n+1)/2, the factor-of-two correction can be safely omitted
- For smaller r values, correction is typically necessary
- Boundary cases require specific analysis
- Results primarily apply to the Bonferroni combination method
- Assumes independence between studies and normal distribution of effect estimates
- Extensions to other combination functions remain to be explored
- Extension to other combination functions such as Šidák, Simes, and Fisher
- Applications in multiple hypothesis testing scenarios
- Generalization to non-normal distribution cases
- Theoretical Rigor: Provides complete mathematical proofs and counterexamples
- Practical Value: Offers clear guidance principles for statistical practice
- Clear Writing: Logical structure and accurate mathematical exposition
- Important Problem: Addresses practical needs in replicability research
- Limited Scope: Primarily applicable to Bonferroni method and normal assumptions
- Boundary Cases: Treatment of critical regions is incomplete
- Practical Application Guidance: Lacks validation on real data
- Theoretical Contribution: Provides new theoretical results for replicability statistics
- Practical Value: Can improve the power of statistical tests
- Extensibility: Lays foundation for development of related methods
- Meta-analysis and systematic reviews
- Multi-center clinical trials
- Cross-laboratory research verification
- Large-scale genetic association studies
- Benjamini, Y. and Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics.
- Bogomolov, M. and Heller, R. (2023). Replicability across multiple studies. Statistical Science.
- Owen, A. B. (2009). Karl Pearson's meta-analysis revisited. Annals of Statistics.
- Sasabuchi, S. (1980). A test of a multivariate normal mean with composite hypotheses. Biometrika.
This paper makes important theoretical contributions to replicability statistics. Through rigorous mathematical analysis, it determines when traditional conservative corrections can be omitted, thereby improving the power of statistical tests. Despite some limitations, its theoretical value and practical significance are substantial.