
Directional replicability: when can the factor of two be omitted

Djordjilović, Sofer, Dreyfuss
Directional replicability addresses the question of whether an effect studied across $n$ independent studies is present with the same direction in at least $r$ of them, for $r \geq 2$. When the expected direction of the effect is not specified in advance, the state of the art recommends assessing replicability separately by combining one-sided $p$-values for both directions (left and right), and then doubling the smaller of the two resulting combined $p$-values to account for multiple testing. In this work, we show that this multiplicative correction is not always necessary, and give conditions under which it can be safely omitted.

Directional Replicability: When Can the Factor of Two Be Omitted

Basic Information

  • Paper ID: 2510.11273
  • Title: Directional replicability: when can the factor of two be omitted
  • Authors: Vera Djordjilović (University of Venice), Tamar Sofer (Harvard Medical School), Jonathan M. Dreyfuss (Harvard Medical School)
  • Classification: stat.ME (Statistical Methodology)
  • Publication Date: October 13, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.11273

Abstract

Directional replicability examines whether a certain effect exists in the same direction across at least r out of n independent studies (r ≥ 2). When the expected direction of the effect is not specified a priori, current practice recommends separately combining one-sided p-values for both directions to assess replicability, then multiplying the smaller of the two combined p-values by 2 to correct for multiple testing. This study demonstrates that this multiplicative correction is not always necessary and provides conditions under which this correction can be safely omitted.

Research Background and Motivation

  1. Problem to be Addressed: Statistical testing for assessing the consistency of effect directions across multiple independent studies, particularly determining when the traditional factor-of-two correction can be omitted.
  2. Problem Significance:
    • Low reproducibility of scientific findings is prevalent in medicine, economics, psychology, and other fields
    • Formal statistical methods are needed to assess the replicability of research findings
    • Directional replicability is more stringent than replicability of a nonzero effect, because it also requires the replicated effects to share the same direction
  3. Limitations of Existing Methods:
    • Standard methods always apply a factor-of-two correction to the smaller combined p-value for multiple testing
    • This correction may be overly conservative, reducing the power of the test
  4. Research Motivation: Through theoretical analysis, determine when the factor-of-two correction can be safely omitted, thereby improving the power of statistical tests.

Core Contributions

  1. Theoretical Results: Proves that when r > (n+1)/2, the factor-of-two correction can be safely omitted when using the Bonferroni method to combine p-values
  2. Counterexample Construction: For some smaller values of r, counterexamples show that omitting the correction can push the Type I error above α, so the factor of two is needed
  3. Boundary Conditions: Identifies, in terms of r and n, where the correction is needed and where it can be dropped
  4. Practical Guidance: Provides procedures for data-adaptive selection of r values
  5. Extended Discussion: Explores possible extensions of results to other combination functions

Methodological Details

Task Definition

Let θ = (θ₁, ..., θₙ) ∈ ℝⁿ denote the vector of true effect sizes in n studies. Define:

  • n₊ = |{i : θᵢ > 0}|: number of positive effects
  • n₋ = |{i : θᵢ < 0}|: number of negative effects

Null Hypothesis for r out of n Directional Replicability: H_{r/n} : n₊ < r ∧ n₋ < r

Corresponding Alternative Hypothesis: K_{r/n} : n₊ ≥ r ∨ n₋ ≥ r
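
The hypotheses above depend on θ only through the sign counts n₊ and n₋. A minimal Python sketch, assuming effect sizes are supplied as a plain list of floats; the helper names `sign_counts` and `in_null` are illustrative and not taken from the paper:

```python
def sign_counts(theta):
    """Return (n_plus, n_minus): numbers of strictly positive and strictly negative effects."""
    n_plus = sum(1 for t in theta if t > 0)
    n_minus = sum(1 for t in theta if t < 0)
    return n_plus, n_minus


def in_null(theta, r):
    """True if theta lies in H_{r/n}: fewer than r positive AND fewer than r negative effects."""
    n_plus, n_minus = sign_counts(theta)
    return n_plus < r and n_minus < r


# n = 5 studies, r = 3
print(in_null([1.2, 0.4, -0.3, 0.0, -2.0], r=3))  # True: 2 positive, 2 negative
print(in_null([1.2, 0.4, 0.7, 0.0, -2.0], r=3))   # False: 3 positive, so K_{r/n} holds
```

Zero effects count toward neither n₊ nor n₋.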

Model Setup

  1. Basic Setup:
    • Assume independent normal estimators: Tᵢ ~ N(θᵢ, 1)
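    • One-sided p-values: pᵢ = 1 - Φ(Tᵢ), qᵢ = Φ(Tᵢ) = 1 - pᵢ, where Φ is the standard normal CDF; pᵢ tests for a positive effect and qᵢ for a negative one
  2. Bonferroni Partial Combined p-values:
    • Positive direction: p⁺_{r/n} = (n - r + 1)p₍ᵣ₎, where p₍ᵣ₎ is the r-th smallest of p₁, ..., pₙ
    • Negative direction: p⁻_{r/n} = (n - r + 1)q₍ᵣ₎ = (n - r + 1)(1 - p₍ₙ₋ᵣ₊₁₎), where q₍ᵣ₎ is the r-th smallest of q₁, ..., qₙ
  3. Traditional Method: p_{r/n} = 2 min{p⁻_{r/n}, p⁺_{r/n}}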
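
The quantities above can be computed directly from the z-statistics. The Python sketch below uses only the standard library; the function name `directional_pvalues` is hypothetical, and capping the combined p-values at 1 is an assumed convention (it does not change any rejection at a level α < 1).

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def directional_pvalues(t, r):
    """Bonferroni partial combined p-values for r-out-of-n directional replicability.

    t: z-statistics T_1, ..., T_n, assumed independent with T_i ~ N(theta_i, 1).
    Returns (p_plus, p_minus, p_doubled, p_undoubled).
    """
    n = len(t)
    if not 1 <= r <= n:
        raise ValueError("need 1 <= r <= n")
    p = sorted(PHI.cdf(-x) for x in t)  # right-sided p_i = 1 - Phi(T_i), ascending
    q = sorted(PHI.cdf(x) for x in t)   # left-sided q_i = Phi(T_i), ascending
    m = n - r + 1
    p_plus = min(1.0, m * p[r - 1])   # (n - r + 1) p_(r)
    p_minus = min(1.0, m * q[r - 1])  # (n - r + 1) q_(r) = (n - r + 1)(1 - p_(n-r+1))
    p_doubled = min(1.0, 2 * min(p_plus, p_minus))  # traditional factor-of-two rule
    p_undoubled = min(p_plus, p_minus)               # no correction
    return p_plus, p_minus, p_doubled, p_undoubled


# Four of five studies point upward; n = 5, r = 4
print(directional_pvalues([2.5, 3.1, 2.2, -0.4, 1.9], r=4))
```

In this example p⁺_{r/n} is driven by the fourth-smallest right-sided p-value, Φ(-1.9), while p⁻_{r/n} is capped at 1 because only one statistic points downward. Since r = 4 > (n + 1)/2 = 3, Theorem 1 would allow reporting the undoubled value.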

Technical Innovation

Main Theorem (Theorem 1): When (n+1)/2 < r ≤ n, the uncorrected combination p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is a valid p-value for H_{r/n}.
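
Because r is an integer, the condition of Theorem 1 is easy to misapply at the boundary. This small Python sketch (helper names are illustrative) checks the condition in the integer-safe form 2r > n + 1 and returns the smallest r it covers.

```python
def theorem1_applies(n, r):
    """True when (n + 1) / 2 < r <= n, written as 2r > n + 1 to avoid fractions."""
    return r <= n and 2 * r > n + 1


def min_r_without_doubling(n):
    """Smallest integer r with r > (n + 1) / 2."""
    return (n + 1) // 2 + 1


for n in (3, 4, 5, 20):
    print(n, min_r_without_doubling(n))  # 3 -> 3, 4 -> 3, 5 -> 4, 20 -> 11
```

For n = 3 the theorem starts at r = 3, which is why the case r = 2 needs the separate argument of Proposition 1; for n = 20 it starts at r = 11.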

Key Proof Strategy:

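  1. When 2r > n + 1, the order statistics (T₍₁₎ ≤ ... ≤ T₍ₙ₎) satisfy T₍ᵣ₎ ≥ T₍ₙ₋ᵣ₊₁₎, so the rejection events {p⁺_{r/n} ≤ α} and {p⁻_{r/n} ≤ α} are disjoint
  2. The Type I error probability can therefore be written as c(θ) = Pr_θ(X ≥ r) + Pr_θ(Y ≥ r), where X and Y count the studies with pᵢ ≤ α/(n - r + 1) and qᵢ ≤ α/(n - r + 1), respectively
  3. An analysis of the partial derivatives shows that c(θ) is maximized on the boundary of the null region
  4. This maximum does not exceed α, so no additional correction is required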
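
The disjointness argument can be checked by simulation. The Python sketch below (function name and defaults are illustrative) draws Tᵢ ~ N(θᵢ, 1), forms the counts X and Y against the per-study Bonferroni threshold α/(n - r + 1), and records how often both directions reject. Infinite effects are passed as `math.inf`, an assumed stand-in for the limiting configurations: adding a finite normal draw to infinity leaves it infinite.

```python
import math
import random
from statistics import NormalDist


def simulate_type1(theta, r, alpha, reps=20_000, seed=1):
    """Monte Carlo estimate of c(theta) for the undoubled test min(p+, p-) <= alpha.

    Returns (estimated rejection rate, number of replicates in which both directions reject).
    """
    n = len(theta)
    thr = alpha / (n - r + 1)               # per-study Bonferroni threshold
    crit = NormalDist().inv_cdf(1 - thr)    # p_i <= thr  <=>  T_i >= crit
    rng = random.Random(seed)
    rejections = both = 0
    for _ in range(reps):
        t = [rng.gauss(mu, 1.0) for mu in theta]
        x = sum(ti >= crit for ti in t)     # X: studies significant upward
        y = sum(ti <= -crit for ti in t)    # Y: studies significant downward
        pos, neg = x >= r, y >= r           # p+ <= alpha  <=>  X >= r; likewise for p-
        rejections += pos or neg
        both += pos and neg
    return rejections / reps, both


# n = 7, r = 5 > (n + 1) / 2: boundary point with r - 1 = 4 effects at +infinity
theta = [math.inf] * 4 + [0.0] * 3
rate, both = simulate_type1(theta, r=5, alpha=0.1)
print(rate, both)
```

With 2r > n + 1 the counter `both` stays at zero, as the argument predicts, and the estimated rate should sit near 1 - (1 - α/3)³ ≈ 0.097, below α = 0.1. Rerunning with a small r (for example n = 6, r = 2) shows that both directions can then reject simultaneously.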

Experimental Setup

Numerical Verification

  • Set n = 20 studies
  • Consider two parameter configurations:
    • "Consistent": θ⁺ = (∞,...,∞,0,...,0), with the first r-1 entries equal to +∞ and the remaining n-r+1 equal to 0
    • "Inconsistent": θ* = (∞,...,∞,-∞,...,-∞,0,...,0), with r-1 entries equal to +∞, r-1 equal to -∞, and the remaining n-2r+2 equal to 0

Evaluation Metrics

  • Type I error probability c(θ) of the uncorrected test, which rejects when min{p⁻_{r/n}, p⁺_{r/n}} ≤ α
  • Nominal significance level α = 0.1

Experimental Results

Main Results

Numerical Results Shown in Figure 1:

  • For r ∈ {2,...,7}: the Type I error under the inconsistent configuration exceeds both that under the consistent configuration and α
  • For r ∈ {8,9,10}: the Type I error under both configurations falls below α
  • For r ≥ 11, the condition r > (n+1)/2 = 10.5 holds, so Theorem 1 applies
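
Under the two limiting configurations the Type I error has a closed form. Assuming that each null study (θᵢ = 0) independently lands above the per-study critical value with probability a = α/(n - r + 1), below its negative with probability a, and in between otherwise, while ±∞ studies are significant in their own direction with probability 1, the Python sketch below computes it exactly. It is a reconstruction, not the authors' code; because for r ≤ n/2 the two rejection events can overlap, it computes the probability of their union rather than a sum.

```python
from math import comb


def limiting_type1(n, r, alpha):
    """Limiting Type I error of the undoubled test at theta+ and theta*.

    theta+ = (+inf x (r - 1), 0 x (n - r + 1));
    theta* = (+inf x (r - 1), -inf x (r - 1), 0 x (n - 2r + 2)).
    Returns (c_consistent, c_inconsistent); c_inconsistent is None if 2(r - 1) > n.
    """
    m = n - r + 1
    a = alpha / m  # per-study probability of being significant in a given direction
    # theta+: no rejection iff no null study is significant upward
    # and fewer than r of the m null studies are significant downward.
    keep = sum(comb(m, k) * a**k * (1 - 2 * a) ** (m - k) for k in range(r))
    c_consistent = 1 - keep
    m0 = n - 2 * (r - 1)
    if m0 < 0:
        return c_consistent, None
    # theta*: no rejection iff every null study falls in between, since a single
    # significant null study completes r in its direction.
    c_inconsistent = 1 - (1 - 2 * a) ** m0
    return c_consistent, c_inconsistent


n, alpha = 20, 0.1
for r in range(2, 11):
    cc, ci = limiting_type1(n, r, alpha)
    print(f"r={r:2d}  consistent={cc:.4f}  inconsistent={ci:.4f}  exceeds alpha: {ci > alpha}")
```

For n = 20 and α = 0.1 the inconsistent value exceeds α for r = 2, ..., 7 and falls below it for r = 8, 9, 10, matching the pattern reported above.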

Special Case Analysis

Proposition 1: For n = 3 and r = 2, the condition of Theorem 1 fails (it would require r > 2), yet the uncorrected p_{r/n} = min{p⁻_{r/n}, p⁺_{r/n}} is still a valid p-value.

Proof Highlights:

  • An analysis of the partial derivatives shows that c(θ) has no critical points in the feasible region
  • A limit argument shows that the supremum of c(θ) equals α
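
For n = 3 and r = 2 the null region is small enough to scan numerically: H_{2/3} allows at most one positive and one negative effect, so up to permutation θ = (a, -b, 0) with a, b ≥ 0. The Python sketch below computes c(θ) exactly for the uncorrected test (the rejection events are disjoint because 2r > n) and checks on a grid that it stays at or below α. The grid range and step are arbitrary choices, and a grid scan only illustrates the proposition; it does not prove it.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def at_least_two(u):
    """P(at least 2 of 3 independent events occur), given their probabilities u."""
    u1, u2, u3 = u
    return u1 * u2 + u1 * u3 + u2 * u3 - 2 * u1 * u2 * u3


def c_theta(theta, alpha):
    """Exact rejection probability of min(p+, p-) <= alpha when n = 3, r = 2."""
    crit = PHI.inv_cdf(1 - alpha / 2)           # threshold alpha / (n - r + 1) = alpha / 2
    up = [PHI.cdf(t - crit) for t in theta]     # P(T_i >= crit)
    down = [PHI.cdf(-crit - t) for t in theta]  # P(T_i <= -crit)
    # Disjoint events (2r > n), so the two directional probabilities add.
    return at_least_two(up) + at_least_two(down)


alpha = 0.1
grid = [0.25 * i for i in range(41)]  # 0, 0.25, ..., 10
worst = max(c_theta((a, -b, 0.0), alpha) for a in grid for b in grid)
limit = c_theta((math.inf, 0.0, 0.0), alpha)  # theta = (inf, 0, 0)
print(worst, limit)
```

At θ = (∞, 0, 0) the positive direction contributes α - α²/4 and the negative direction α²/4, so c(θ) = α exactly, consistent with the supremum stated in Proposition 1.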

Experimental Findings

  1. Sufficient but Not Necessary Condition: The condition r > (n+1)/2 given in Theorem 1 is sufficient but not necessary
  2. Transition Region: Between the counterexample regime and the condition of Theorem 1 (for example, r ∈ {8, 9, 10} when n = 20), the correction may be unnecessary, but this has to be established case by case
  3. Type III Error Control: The proposed procedure can control Type III error, allowing post-hoc inference of effect direction

Related Work

  1. Replicability Statistical Methods: Survey by Bogomolov and Heller (2023)
  2. Partial Conjunction Hypothesis Testing: General procedure by Benjamini and Heller (2008)
  3. Multivariate Normal Mean Testing: Related results by Sasabuchi (1980) and Berger (1989)
  4. P-value Combination Methods: Work by Owen (2009), Wang et al. (2022), and others

Conclusions and Discussion

Main Conclusions

  1. When r > (n+1)/2, the factor-of-two correction can be safely omitted
  2. For smaller r values, correction is typically necessary
  3. Boundary cases require specific analysis

Limitations

  1. Results primarily apply to the Bonferroni combination method
  2. Assumes independence between studies and normal distribution of effect estimates
  3. Extensions to other combination functions remain to be explored

Future Directions

  1. Extension to other combination functions such as Šidák, Simes, and Fisher
  2. Applications in multiple hypothesis testing scenarios
  3. Generalization to non-normal distribution cases

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Provides complete mathematical proofs and counterexamples
  2. Practical Value: Offers clear guidance principles for statistical practice
  3. Clear Writing: Logical structure and accurate mathematical exposition
  4. Important Problem: Addresses practical needs in replicability research

Weaknesses

  1. Limited Scope: Primarily applicable to Bonferroni method and normal assumptions
  2. Boundary Cases: Treatment of the transition region between the two regimes (values of r just below (n+1)/2) is incomplete
  3. Practical Application Guidance: Lacks validation with more real data

Impact

  1. Theoretical Contribution: Provides new theoretical results for replicability statistics
  2. Practical Value: Can improve the power of statistical tests
  3. Extensibility: Lays foundation for development of related methods

Applicable Scenarios

  • Meta-analysis and systematic reviews
  • Multi-center clinical trials
  • Cross-laboratory research verification
  • Large-scale genetic association studies

References

  1. Benjamini, Y. and Heller, R. (2008). Screening for partial conjunction hypotheses. Biometrics.
  2. Bogomolov, M. and Heller, R. (2023). Replicability across multiple studies. Statistical Science.
  3. Owen, A. B. (2009). Karl Pearson's meta-analysis revisited. Annals of Statistics.
  4. Sasabuchi, S. (1980). A test of a multivariate normal mean with composite hypotheses. Biometrika.

This paper makes important theoretical contributions to replicability statistics. Through rigorous mathematical analysis, it determines when traditional conservative corrections can be omitted, thereby improving the power of statistical tests. Despite some limitations, its theoretical value and practical significance are substantial.