Directional replicability: when can the factor of two be omitted

Djordjilović, Sofer, Dreyfuss

Directional replicability addresses the question of whether an effect studied across $n$ independent studies is present with the same direction in at least $r$ of them, for $r \geq 2$. When the expected direction of the effect is not specified in advance, the state of the art recommends assessing replicability separately by combining one-sided $p$-values for both directions (left and right), and then doubling the smaller of the two resulting combined $p$-values to account for multiple testing. In this work, we show that this multiplicative correction is not always necessary, and give conditions under which it can be safely omitted.

Directional Replicability: When Can the Factor of Two Be Omitted

Basic Information

Paper ID: 2510.11273
Title: Directional replicability: when can the factor of two be omitted
Authors: Vera Djordjilović (University of Venice), Tamar Sofer (Harvard Medical School), Jonathan M. Dreyfuss (Harvard Medical School)
Classification: stat.ME (Statistical Methodology)
Publication Date: October 13, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.11273

Abstract

Directional replicability examines whether a certain effect exists in the same direction across at least r out of n independent studies (r ≥ 2). When the expected direction of the effect is not specified a priori, current practice recommends separately combining one-sided p-values for both directions to assess replicability, then multiplying the smaller of the two combined p-values by 2 to correct for multiple testing. This study demonstrates that this multiplicative correction is not always necessary and provides conditions under which this correction can be safely omitted.

Research Background and Motivation

Problem to be Addressed: How to test whether an effect has the same direction in at least r of n independent studies, and in particular when the customary factor-of-two correction can be dropped.
Problem Significance:
- Low reproducibility of scientific findings is prevalent in medicine, economics, psychology, and other fields
- Formal statistical methods are needed to assess the replicability of research findings
- Directional replicability is stricter than replicability of a nonzero effect, because the effect must also have the same sign across studies
Limitations of Existing Methods:
- Standard methods always apply a factor-of-two correction to the smaller combined p-value for multiple testing
- This correction may be overly conservative, reducing the power of the test
Research Motivation: Through theoretical analysis, determine when the factor-of-two correction can be safely omitted, thereby improving the power of statistical tests.

Core Contributions

Theoretical Results: Proves that, when p-values are combined with the Bonferroni method and r > (n+1)/2, the factor-of-two correction can be safely omitted
Counterexample Construction: For some smaller values of r, counterexamples show that the uncorrected test exceeds the nominal level, so the correction cannot be dropped in general
Boundary Conditions: Separates the settings in which the correction is needed from those in which it is not, including a case outside the theorem's condition (n = 3, r = 2)
Practical Guidance: Provides procedures for data-adaptive selection of r values
Extended Discussion: Explores possible extensions of results to other combination functions

Methodological Details

Task Definition

Let θ = (θ₁, ..., θₙ) ∈ ℝⁿ denote the vector of true effect sizes in n studies. Define:

n₊ = |{i : θᵢ > 0}|: number of positive effects
n₋ = |{i : θᵢ < 0}|: number of negative effects

Null Hypothesis for r out of n Directional Replicability: H_{r/n} : n₊ < r ∧ n₋ < r

Corresponding Alternative Hypothesis: K_{r/n} : n₊ ≥ r ∨ n₋ ≥ r
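
As a small illustration of these definitions, the following sketch counts positive and negative effects and checks whether a given effect vector lies in the null H_{r/n}. The function name `satisfies_null` and the example vector are hypothetical. In practice θ is unknown, so this only spells out what the hypothesis states.

```python
def satisfies_null(theta, r):
    """Return True if theta lies in H_{r/n}: fewer than r positive
    effects AND fewer than r negative effects."""
    n_pos = sum(1 for t in theta if t > 0)  # n_+
    n_neg = sum(1 for t in theta if t < 0)  # n_-
    return n_pos < r and n_neg < r


# Hypothetical effect vector: two positive, one negative, one zero effect.
theta = [1.2, -0.5, 0.0, 0.3]
print(satisfies_null(theta, r=3))  # null holds: n_+ = 2 < 3 and n_- = 1 < 3
print(satisfies_null(theta, r=2))  # null fails: n_+ = 2 >= 2
```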

Model Architecture

Basic Setup:
- Assume independent normal estimators: Tᵢ ~ N(θᵢ, 1)
- One-sided p-values: pᵢ = 1 - Φ(Tᵢ), qᵢ = Φ(Tᵢ) = 1 - pᵢ
Bonferroni Partial Combined p-values:
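- Positive direction: p⁺_{r/n} = (n - r + 1)p₍ᵣ₎, where p₍ᵣ₎ is the r-th smallest of p₁, ..., pₙ
- Negative direction: p⁻_{r/n} = (n - r + 1)q₍ᵣ₎ = (n - r + 1)(1 - p₍ₙ₋ᵣ₊₁₎), where q₍ᵣ₎ is the r-th smallest of q₁, ..., qₙ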
Traditional Method: p_{r/n} = 2 min{p⁻_{r/n}, p⁺_{r/n}}
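
The setup above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `directional_pvalues` is hypothetical, the negative-direction statistic is taken as (n - r + 1) times the r-th smallest qᵢ (my reading of the setup, given qᵢ = 1 - pᵢ), and capping the results at 1 is an added convention that the summary does not state.

```python
from statistics import NormalDist


def directional_pvalues(t_stats, r):
    """Bonferroni partial-conjunction p-values for both directions,
    plus the traditional doubled p-value (all capped at 1 by assumption)."""
    n = len(t_stats)
    if not 2 <= r <= n:
        raise ValueError("r must satisfy 2 <= r <= n")
    phi = NormalDist().cdf                      # standard normal CDF
    p = sorted(1.0 - phi(t) for t in t_stats)   # right-sided p-values, ascending
    q = sorted(phi(t) for t in t_stats)         # left-sided p-values, ascending
    m = n - r + 1                               # Bonferroni multiplier
    p_plus = min(1.0, m * p[r - 1])             # uses the r-th smallest p_i
    p_minus = min(1.0, m * q[r - 1])            # uses the r-th smallest q_i
    doubled = min(1.0, 2.0 * min(p_plus, p_minus))
    return p_plus, p_minus, doubled


# Hypothetical z-statistics from four studies, testing 3-out-of-4 replicability.
p_plus, p_minus, doubled = directional_pvalues([3.0, 2.5, 2.8, -0.2], r=3)
print(p_plus, p_minus, doubled)
```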

Technical Innovation

Main Theorem (Theorem 1): When (n+1)/2 < r ≤ n, p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is a valid p-value for H_{r/n}.
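
Read as a decision rule, the theorem says the doubling can be skipped exactly when 2r > n + 1. The helper below is a hedged sketch of that rule. The name `combined_pvalue` is hypothetical, it assumes the two directional p-values were obtained with the Bonferroni combination, and it deliberately keeps the factor of two whenever the theorem's condition fails, even in special cases where the correction may be unnecessary.

```python
def combined_pvalue(p_plus, p_minus, n, r):
    """Combine two directional Bonferroni p-values for r-out-of-n replicability.

    Omits the factor of two only under the sufficient condition r > (n + 1) / 2;
    otherwise falls back to the traditional doubled p-value (capped at 1).
    """
    smaller = min(p_plus, p_minus)
    if 2 * r > n + 1:
        return smaller                 # condition of the theorem holds
    return min(1.0, 2.0 * smaller)     # conservative default


print(combined_pvalue(0.03, 0.9, n=5, r=4))  # 2r = 8 > 6, no doubling
print(combined_pvalue(0.03, 0.9, n=5, r=3))  # 2r = 6, not > 6, doubled
```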

Key Proof Strategy:

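When 2r > n + 1, the order statistics satisfy T₍ᵣ₎ ≥ T₍ₙ₋ᵣ₊₁₎, so rejection in the positive direction and rejection in the negative direction cannot occur together: the two Type I error events are disjoint
The Type I error probability can therefore be written as a sum: c(θ) = Pr_θ(X ≥ r) + Pr_θ(Y ≥ r)
Analysis of the partial derivatives shows that c(θ) attains its maximum on the boundary of the null parameter space
The maximum does not exceed α, so no additional correction is required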
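
The disjointness claim in the first step can be spelled out as follows. This is a sketch under the assumption that α/(n - r + 1) < 1/2, so that the critical value z is positive. The symbols m and z are introduced here for convenience, T₍₁₎ ≤ ... ≤ T₍ₙ₎ are the ordered statistics, and p₍ᵣ₎ and q₍ᵣ₎ denote the r-th smallest right-sided and left-sided p-values.

```latex
\begin{align*}
m &= n - r + 1, \qquad z = \Phi^{-1}\!\left(1 - \tfrac{\alpha}{m}\right) > 0,\\
p^{+}_{r/n} \le \alpha &\iff p_{(r)} \le \tfrac{\alpha}{m} \iff T_{(n-r+1)} \ge z,\\
p^{-}_{r/n} \le \alpha &\iff q_{(r)} \le \tfrac{\alpha}{m} \iff T_{(r)} \le -z,\\
2r > n + 1 &\implies r \ge n - r + 1 \implies T_{(r)} \ge T_{(n-r+1)}.
\end{align*}
```

If both rejections occurred at once, we would need -z ≥ T₍ᵣ₎ ≥ T₍ₙ₋ᵣ₊₁₎ ≥ z, which is impossible for z > 0. The probability of rejecting in either direction is therefore the sum of the two one-directional rejection probabilities.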

Experimental Setup

Numerical Verification

Set n = 20 studies
Consider two parameter configurations:
- "Consistent": θ⁺ = (∞,...,∞,0,...,0) (first r-1 positive infinity)
- "Inconsistent": θ* = (∞,...,∞,-∞,...,-∞,0,...,0) (r-1 positive infinity and r-1 negative infinity)

Evaluation Metrics

Type I error probability c(θ)
Nominal significance level α = 0.1

Experimental Results

Main Results

Numerical Results Shown in Figure 1:

For r ∈ {2,...,7}: Type I error under inconsistent configuration exceeds that under consistent configuration and exceeds α
For r ∈ {8,9,10}: Type I error under both configurations falls below α
For r ≥ 11: the condition r > (n+1)/2 = 10.5 holds, so Theorem 1 applies
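
The pattern reported for r ≤ 10 can be checked with a short exact calculation. The sketch below is not the authors' code. It assumes the quantity plotted is the rejection probability of the uncorrected test min{p⁻_{r/n}, p⁺_{r/n}} ≤ α with Bonferroni combination, and that studies with infinite effects are always significant in their own direction and never in the other. The function names are hypothetical.

```python
from math import comb


def type1_consistent(n, r, alpha):
    """Rejection probability at theta+ = (inf x (r-1), 0 x (n-r+1))."""
    m = n - r + 1      # number of zero-effect studies (also the Bonferroni multiplier)
    u = alpha / m      # per-study threshold for p_i or q_i (assumes u < 1/2)
    # A zero-effect study is positive-significant w.p. u, negative-significant
    # w.p. u, and neither w.p. 1 - 2u. The test fails to reject iff no zero-effect
    # study is positive-significant and fewer than r are negative-significant.
    no_reject = sum(
        comb(m, y) * u**y * (1 - 2 * u) ** (m - y) for y in range(min(r, m + 1))
    )
    return 1.0 - no_reject


def type1_inconsistent(n, r, alpha):
    """Rejection probability at theta* = (inf x (r-1), -inf x (r-1), 0 x k)."""
    k = n - 2 * (r - 1)  # number of zero-effect studies
    if k < 0:
        raise ValueError("configuration requires 2(r - 1) <= n")
    u = alpha / (n - r + 1)
    # One significant zero-effect study in either direction triggers a rejection.
    return 1.0 - (1 - 2 * u) ** k


n, alpha = 20, 0.1
for r in range(2, 11):
    print(r, round(type1_consistent(n, r, alpha), 4),
          round(type1_inconsistent(n, r, alpha), 4))
```

Under these assumptions the inconsistent configuration exceeds α for r = 2, ..., 7 and both configurations stay below α for r = 8, 9, 10, which agrees with the description of Figure 1.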

Special Case Analysis

Proposition 1: For n = 3, r = 2, the condition of Theorem 1 fails (r = (n+1)/2), yet the uncorrected p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is still a valid p-value.

Proof Highlights:

Analysis of the partial derivatives shows that c(θ) has no critical points in the feasible region
A limit argument shows that the supremum of c(θ) equals α
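
To see why the supremum cannot be smaller than α in this case, consider the null configuration θ = (a, -a, 0) with a → ∞. This is a worked illustration under the Bonferroni setup with threshold α/(n - r + 1) = α/2, not a reproduction of the paper's proof. In the limit, study 1 is always significant in the positive direction and study 2 always in the negative direction, so each directional test rejects exactly when the third study is significant in that direction.

```latex
\begin{align*}
\lim_{a \to \infty} c(a, -a, 0)
  &= \Pr\!\left(p_3 \le \tfrac{\alpha}{2}\right) + \Pr\!\left(q_3 \le \tfrac{\alpha}{2}\right) \\
  &= \tfrac{\alpha}{2} + \tfrac{\alpha}{2} = \alpha .
\end{align*}
```

The two events are disjoint because p₃ + q₃ = 1 and α/2 < 1/2, so the probabilities add.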

Experimental Findings

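Sufficient but Not Necessary Condition: The condition r > (n+1)/2 in Theorem 1 is sufficient but not necessary, as the case n = 3, r = 2 shows
Transition Region: For intermediate values of r (r ∈ {8, 9, 10} when n = 20 and α = 0.1), the examined configurations do not exceed the nominal level, but Theorem 1 does not cover these cases and they require separate analysis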
Type III Error Control: The proposed procedure can control Type III error, allowing post-hoc inference of effect direction

Related Work

Replicability Statistical Methods: Survey by Bogomolov and Heller (2023)
Partial Conjunction Hypothesis Testing: General procedure by Benjamini and Heller (2008)
Multivariate Normal Mean Testing: Related results by Sasabuchi (1980) and Berger (1989)
P-value Combination Methods: Work by Owen (2009), Wang et al. (2022), and others

Conclusions and Discussion

Main Conclusions

When r > (n+1)/2, the factor-of-two correction can be safely omitted
For smaller r values, correction is typically necessary
Boundary cases require specific analysis

Limitations

Results primarily apply to the Bonferroni combination method
Assumes independence between studies and normal distribution of effect estimates
Extensions to other combination functions remain to be explored

Future Directions

Extension to other combination functions such as Šidák, Simes, and Fisher
Applications in multiple hypothesis testing scenarios
Generalization to non-normal distribution cases

In-Depth Evaluation

Strengths

Theoretical Rigor: Provides complete mathematical proofs and counterexamples
Practical Value: Offers clear guidance principles for statistical practice
Clear Writing: Logical structure and accurate mathematical exposition
Important Problem: Addresses practical needs in replicability research

Weaknesses

Limited Scope: Primarily applicable to Bonferroni method and normal assumptions
Boundary Cases: Values of r at or below (n+1)/2 are only partly characterized
Practical Application Guidance: Lacks validation with more real data

Impact

Theoretical Contribution: Provides new theoretical results for replicability statistics
Practical Value: Can improve the power of statistical tests
Extensibility: Lays foundation for development of related methods

Applicable Scenarios

Meta-analysis and systematic reviews
Multi-center clinical trials
Cross-laboratory research verification
Large-scale genetic association studies

References

Benjamini, Y. and Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics.
Bogomolov, M. and Heller, R. (2023). Replicability across multiple studies. Statistical Science.
Owen, A. B. (2009). Karl Pearson's meta-analysis revisited. Annals of Statistics.
Sasabuchi, S. (1980). A test of a multivariate normal mean with composite hypotheses. Biometrika.

This paper makes important theoretical contributions to replicability statistics. Through rigorous mathematical analysis, it determines when traditional conservative corrections can be omitted, thereby improving the power of statistical tests. Despite some limitations, its theoretical value and practical significance are substantial.