Most claimed statistical findings in cross-sectional return predictability are likely true

Chen
The false discovery rate (FDR) measures the share of false positives in a set of statistical tests. I develop simple and intuitive bounds on the FDR in cross-sectional predictability publications. The simplest bound requires just a few lines of math and finds $\text{FDR} \le 25\%$ based on summary statistics in eight out of nine previous studies. A more refined bound finds $\text{FDR} \le 9\%$. The FDR is small because randomly selecting accounting ratios produces statistically significant predictability far more often than would occur if there were no predictability. The bounds also reconcile the disparate FDR estimates in the literature.


Basic Information

  • Paper ID: 2206.15365
  • Title: Most claimed statistical findings in cross-sectional return predictability are likely true
  • Author: Andrew Y. Chen (Federal Reserve Board)
  • Classification: q-fin.GN (Quantitative Finance - General Finance)
  • Publication Date: October 2025 (First released on SSRN: August 27, 2021)
  • Paper Link: https://arxiv.org/abs/2206.15365

Abstract

The false discovery rate (FDR) measures the proportion of false positives in statistical testing. This paper develops simple and intuitive FDR bounds for cross-sectional return predictability research. The simplest bound requires only a few lines of mathematical calculation and, based on summary statistics from eight of nine prior studies, finds FDR ≤ 25%. More refined bounds find FDR ≤ 9%. The small FDR arises because randomly selected accounting ratios produce statistically significant predictability at frequencies far exceeding those expected under the null hypothesis of no predictability. These bounds also reconcile disagreements between different FDR estimates in the literature.

Research Background and Motivation

Problem Background

Researchers have discovered hundreds of cross-sectional stock return predictors, a richness that raises concerns about multiple testing problems. Intuitively, if researchers conduct many tests, some tests may be statistically significant purely by chance even under the null hypothesis of no predictability.

Core Issues

  1. Multiple Testing Problem: Large numbers of factor discoveries may lead to false positive results
  2. FDR Estimation Disagreement: Existing literature exhibits enormous variation in FDR estimates, ranging from nearly 0% to over 45%
  3. Publication Bias: Statistically significant results are more likely to be published, affecting true FDR estimates
  4. Methodological Controversy: Different research teams using different methods reach drastically different conclusions

Research Importance

Accurately estimating FDR is crucial for understanding the credibility of the financial anomalies literature, directly affecting investment strategy formulation and academic research direction.

Core Contributions

  1. Simple and Intuitive FDR Bounds: Proposes the "Easy Bound" method, requiring only a few lines of mathematical calculation to estimate the FDR upper bound
  2. Visual Bound Method: Develops "Visual Bound," providing tighter FDR bounds through histogram decomposition
  3. Literature Reconciliation: Unifies explanations for vastly different FDR estimates in existing literature, finding that disagreements stem primarily from interpretation differences rather than data differences
  4. Empirical Findings: Demonstrates that randomly selected accounting ratios produce significant predictability far more often than would occur under the null of no predictability, providing empirical support for a small FDR

Methodology Details

Task Definition

Define the predictive ability of cross-sectional signal $i$ through $\bar{r}_i$, typically obtained by constructing a long-short portfolio based on $i$ and calculating the sample mean return. The null hypothesis is $E(\bar{r}_i) = 0$.

Core Framework

1. Basic Setup

  • $t_i \equiv \bar{r}_i / SE_i$ is the t-statistic
  • Under the null hypothesis: $t_i \mid \text{null}_i \sim \text{Normal}(0,1)$
  • Discovery definition: $|t_i| > 2$ (corresponding to the 5% significance level)
  • FDR definition: $FDR_{|t|>2} \equiv \Pr(\text{null}_i \mid |t_i| > 2)$
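
A minimal sketch of the setup, assuming a plain i.i.d. standard error (the standard deviation divided by the square root of the sample size). The function names and the return series are hypothetical and used only for illustration. The second function shows that under a standard normal null, $\Pr(|t_i| > 2)$ is about 4.55%, which is why 5% serves as a safe upper limit on the null tail probability.

```python
import math
import statistics


def t_stat(returns):
    """t-statistic of the mean return: r_bar / SE, with SE = sd / sqrt(n).

    Assumes i.i.d. returns; this is an illustrative simplification.
    """
    n = len(returns)
    r_bar = statistics.mean(returns)
    se = statistics.stdev(returns) / math.sqrt(n)
    return r_bar / se


def null_tail_prob(cutoff=2.0):
    """Pr(|Z| > cutoff) for a standard normal Z."""
    return math.erfc(cutoff / math.sqrt(2))


# Hypothetical monthly long-short returns, for illustration only.
returns = [0.01, 0.03, -0.01, 0.02, 0.00]
print(round(t_stat(returns), 3))      # 1.414
print(round(null_tail_prob(2.0), 4))  # 0.0455
```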

2. Easy Bound Method

Applying Bayes' rule yields: $FDR_{|t|>2} = \frac{\Pr(|t_i| > 2 \mid \text{null}_i) \Pr(\text{null}_i)}{\Pr(|t_i| > 2)} \leq \frac{5\%}{\Pr(|t_i| > 2)}$

This bound is intuitively straightforward: if the tail probability under the null hypothesis (numerator) cannot explain the observed tail probability (denominator), then FDR must be small.
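
The Easy Bound can be sketched in a few lines. The function name `easy_bound` and the cap at 1 are assumptions added here for illustration; the 20% share is the figure the summary reports for data mining studies.

```python
def easy_bound(share_significant, null_tail=0.05):
    """Upper bound on the FDR among |t| > 2 findings.

    share_significant: observed Pr(|t_i| > 2) across all tested signals.
    null_tail: upper limit on Pr(|t_i| > 2 | null), 5% in the text.
    """
    if not 0.0 < share_significant <= 1.0:
        raise ValueError("share_significant must be in (0, 1]")
    # An FDR cannot exceed 1, so the bound is capped there.
    return min(1.0, null_tail / share_significant)


print(easy_bound(0.20))  # 0.25
```

The bound is only informative when the observed share of significant results is well above 5%; at a share of 5% or less it returns 1.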

3. Visual Bound Method

Tightens the bound by bounding $\Pr(\text{null}_i)$ from the data. A standard normal falls in $|t_i| < 0.5$ with probability of about 0.38, so: $\Pr(|t_i| < 0.5) \geq (0.38)\Pr(\text{null}_i)$

Combining yields a tighter bound: $FDR_{|t|>2} \leq \left[\frac{5\%}{\Pr(|t_i| > 2)}\right]\left[\frac{\Pr(|t_i| < 0.5)}{0.38}\right]$
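
A minimal sketch of the combined bound follows. The function name, the cap on the null share at 1, and the example shares are assumptions for illustration. The null probability of $|t_i| < 0.5$ is computed from the normal distribution (about 0.383) rather than hard-coded as the rounded 0.38, so results can differ slightly from hand calculations.

```python
import math


def visual_bound(share_significant, share_small,
                 null_tail=0.05, small_cutoff=0.5):
    """Upper bound on the FDR using the share of small t-statistics.

    share_significant: observed Pr(|t_i| > 2).
    share_small: observed Pr(|t_i| < small_cutoff).
    """
    # Pr(|Z| < small_cutoff) for a standard normal Z, about 0.38 at 0.5.
    null_small = math.erf(small_cutoff / math.sqrt(2))
    # Bound on Pr(null): the null cannot account for more small
    # t-statistics than the data contain. Capped at 1 (an added safeguard).
    null_share_bound = min(1.0, share_small / null_small)
    return min(1.0, null_tail / share_significant * null_share_bound)


# Hypothetical shares, for illustration only.
print(round(visual_bound(0.30, 0.25), 3))
```

Because the second factor never exceeds 1 here, this bound is never looser than the Easy Bound for the same share of significant results.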

Technical Innovations

1. Addressing Publication Bias

  • Uses data mining studies as worst-case scenarios
  • Estimates the distribution of unpublished results through conservative extrapolation
  • Avoids direct dependence on published literature statistics

2. Histogram Decomposition Method

Decomposes the t-statistic histogram into null and alternative components: $\Pr(|t_i| \in b) = \Pr(|t_i| \in b \mid \text{null}_i)\Pr(\text{null}_i) + \Pr(|t_i| \in b \mid \text{alt}_i)\Pr(\text{alt}_i)$

Estimates the FDR upper bound by constraining the null component to not exceed the data component.

3. Algorithm 1: Visual Bound Estimation

  1. Plot the histogram of $|t_i|$ for data mining signals
  2. Plot the maximum null distribution histogram that still fits the data interior
  3. Draw a vertical line at 2.0; the ratio of null area to data area to the right of this line estimates the FDR bound
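
The three steps above are graphical, but their logic can be approximated numerically. The sketch below is one possible reading, not the paper's implementation: it takes "the maximum null histogram that still fits the data interior" to mean the largest null weight for which no null bar below 2.0 rises above the corresponding data bar. The function names, the bin width of 0.5, and the simulated t-statistics are all assumptions. It also uses the exact normal tail (about 4.55%) instead of the rounded 5%.

```python
import math
import random


def null_mass(lo, hi):
    """Pr(lo <= |Z| < hi) for a standard normal Z."""
    return math.erf(hi / math.sqrt(2)) - math.erf(lo / math.sqrt(2))


def histogram_fdr_bound(t_stats, bin_width=0.5, cutoff=2.0):
    """Numerical stand-in for the three plotting steps (an interpretation)."""
    abs_t = [abs(t) for t in t_stats]
    n = len(abs_t)
    if n == 0:
        raise ValueError("t_stats is empty")
    data_tail = sum(x > cutoff for x in abs_t) / n
    if data_tail == 0:
        raise ValueError("no discoveries beyond the cutoff")

    # Steps 1-2: compare data bars with null bars on [0, cutoff) and keep
    # the largest null weight that leaves every null bar under the data bar.
    null_share = 1.0
    n_bins = int(round(cutoff / bin_width))
    for b in range(n_bins):
        lo, hi = b * bin_width, (b + 1) * bin_width
        data_share = sum(lo <= x < hi for x in abs_t) / n
        null_share = min(null_share, data_share / null_mass(lo, hi))

    # Step 3: ratio of null area to data area to the right of the cutoff.
    null_tail = null_share * (1.0 - math.erf(cutoff / math.sqrt(2)))
    return null_tail / data_tail


# Hypothetical data: 70% nulls, 30% signals with more dispersed t-statistics.
rng = random.Random(0)
t_stats = [rng.gauss(0, 1) if rng.random() < 0.7 else rng.gauss(0, 3)
           for _ in range(20000)]
print(round(histogram_fdr_bound(t_stats), 3))
```

Finer bins can tighten the bound but make it more sensitive to sampling noise in sparsely populated bins, so the bin width is a judgment call in this sketch.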

Experimental Setup

Datasets

  1. Data Mining Studies:
    • Yan and Zheng (2017): 18,000 accounting ratios
    • Chordia, Goyal, and Saretto (2020): approximately 200 accounting variables
    • Chen, Lopez-Lira, and Zimmermann (2025): 29,000 signals
  2. Meta-Research Data:
    • Green, Hand, Zhang (2013)
    • Chen, Zimmermann (2020): 77 published predictive factors
    • Harvey, Liu, Zhu (2016)
    • McLean, Pontiff (2016)
    • Jensen, Kelly, Pedersen (2021)
    • Jacobs, Muller (2020)

Evaluation Metrics

  • FDR Bounds: Upper bound estimates of false discovery rate
  • Significance Proportion: Proportion of signals with $|t_i| > 2$
  • Small t-statistic Proportion: Proportion of signals with $|t_i| < 0.5$

Implementation Details

  • Uses equal-weighted and value-weighted portfolios
  • Considers different factor model adjustments (CAPM, FF3, FF3+momentum)
  • Employs Fama-French clustered bootstrap for standard error calculation

Experimental Results

Main Results

1. Easy Bound Results

Based on eight of nine studies, FDR ≤ 25%:

  • At least 20% of accounting ratios in data mining studies produce $|t_i| > 2$
  • Applying the formula yields: $FDR_{|t|>2} \leq 5\%/0.20 = 25\%$

2. Visual Bound Results

More precise estimates using CLZ data:

  • Of 29,000 signals, 9,700 satisfy $|t_i| > 2$, and 6,300 satisfy $|t_i| < 0.5$
  • Yields: $FDR_{|t|>2} \leq 8.5\%$, meaning at least 91.5% of findings are true
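
As a check on the arithmetic, plugging the CLZ counts into the Visual Bound formula gives the following, with shares rounded to three decimals:

$$\Pr(|t_i| > 2) \approx \frac{9{,}700}{29{,}000} \approx 0.334, \qquad \Pr(|t_i| < 0.5) \approx \frac{6{,}300}{29{,}000} \approx 0.217$$

$$FDR_{|t|>2} \leq \left[\frac{0.05}{0.334}\right]\left[\frac{0.217}{0.38}\right] \approx 0.150 \times 0.572 \approx 0.085$$

The first factor alone is the Easy Bound for this dataset (about 15%). The second factor, an upper bound of about 57% on the share of nulls, is what brings the bound down to 8.5%.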

3. Results for Different Specifications

Weighting        Factor Adjustment   FDR Upper Bound   Significance Proportion
Equal-weighted   Raw returns         8.6%              32.7%
Equal-weighted   FF3                 7.3%              34.9%
Value-weighted   CAPM                19.0%             17.9%
Value-weighted   FF3+momentum        41.7%             10.5%
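
To see how much the reported bounds gain from the small t-statistic information, the sketch below computes the Easy Bound implied by each row's significance proportion and compares it with the reported FDR upper bound. The tuple layout and variable names are assumptions for illustration; the numbers are the ones in the table.

```python
def easy_bound(share_significant, null_tail=0.05):
    """Easy Bound: null tail probability over the observed tail share."""
    return min(1.0, null_tail / share_significant)


# (weighting, factor adjustment, reported FDR bound, significance proportion)
rows = [
    ("Equal-weighted", "Raw returns", 0.086, 0.327),
    ("Equal-weighted", "FF3", 0.073, 0.349),
    ("Value-weighted", "CAPM", 0.190, 0.179),
    ("Value-weighted", "FF3+momentum", 0.417, 0.105),
]

for weighting, adjustment, reported, share in rows:
    implied = easy_bound(share)
    print(f"{weighting:<15} {adjustment:<13} "
          f"easy={implied:.1%} reported={reported:.1%}")
```

In every row the reported bound sits below the implied Easy Bound, and both bounds loosen as the significance proportion falls.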

Ablation Studies

  1. Weighting Impact: Value-weighting significantly reduces significance proportion and increases FDR bounds
  2. Factor Adjustment Impact: FF3+momentum adjustment has the largest effect on value-weighted portfolios
  3. Dataset Robustness: Data mining results from three independent research teams are consistent

Literature Reconciliation Analysis

  1. Harvey, Liu, Zhu (2016): Reinterprets findings to show FDR of only 12%, contrary to the original claim that "most findings are false"
  2. Harvey and Liu (2020): The 0.1% of "true" strategies actually corresponds to selecting the most extreme value-weighted FF3+momentum specification
  3. Chordia, Goyal, Saretto (2020): The 45% FDR estimate stems from ignoring information about small t-statistics in calibration

Related Work

FDR Methodology Literature

  • Benjamini and Hochberg (1995): Classical FDR control methods
  • Storey (2002): Direct FDR estimation methods
  • Sorić (1989): Earliest FDR concepts

Financial Anomalies Literature

  • Green, Hand, Zhang (2013): Survey of cross-sectional return prediction
  • McLean and Pontiff (2016): Out-of-sample decay studies
  • Chen and Zimmermann (2022): Open-source cross-sectional asset pricing

Multiple Testing Applications in Finance

  • Harvey, Liu, Zhu (2016): Multiple testing problems in financial economics
  • Chen (2024): Discussion on whether t-statistic thresholds need to be raised

Conclusions and Discussion

Main Conclusions

  1. Small FDR: At least 75% of claimed findings in cross-sectional predictability literature are true (FDR ≤ 25%)
  2. More Precise Estimates: Considering information about small t-statistics, at least 91% of findings are true (FDR ≤ 9%)
  3. Literature Reconciliation: Different FDR estimates stem primarily from interpretation differences rather than data or methodological differences
  4. Empirical Support: High significance rates of random accounting ratios provide direct evidence for small FDR

Limitations

  1. Statistical vs. Economic Significance: "True findings" refer only to statistical significance and non-zero alpha, not considering transaction costs, information costs, and other economic factors
  2. Out-of-Sample Performance: Statistical truth does not equate to economic feasibility
  3. Structural Changes: Insufficient consideration of market structure changes' impact on predictability
  4. Data Mining Assumptions: Assumes the research process does not produce higher false discovery rates than random data mining

Future Directions

  1. Economic Significance: Combine transaction costs and market frictions to assess economic value
  2. Dynamic FDR: Consider time-varying predictability and market conditions
  3. Causal Inference: Extend from predictive relationships to causal relationships
  4. Machine Learning Methods: FDR control in high-dimensional settings

In-Depth Evaluation

Strengths

  1. Method Simplicity: The Easy Bound method is extremely simple, requiring only summary statistics for calculation
  2. Strong Intuitiveness: Visual Bound provides intuitive histogram decomposition explanations
  3. Empirical Robustness: Based on consistent results from multiple independent research teams
  4. Literature Contribution: Successfully reconciles long-standing disagreements in FDR estimates
  5. Solid Theory: Based on fundamental probability principles with rigorous mathematical derivations

Weaknesses

  1. Conservative Bounds: Bound methods may be overly conservative; true FDR may be smaller
  2. Independence Assumptions: While claiming not to require independence, correlation still affects estimation precision
  3. Data Dependence: Results depend on the quality and representativeness of specific data mining studies
  4. Temporal Stability: Insufficient discussion of FDR changes over time
  5. Economic Interpretation: Lacks in-depth discussion of the relationship between statistical and economic significance

Impact

  1. Academic Value: Provides important statistical credibility assessment for financial anomalies literature
  2. Practical Significance: Offers investors and regulators reference points for factor validity
  3. Methodological Contribution: Simple and effective FDR bound methods can be generalized to other fields
  4. Policy Impact: Influences understanding of financial market efficiency and anomaly persistence

Applicable Scenarios

  1. Academic Research: Assessing statistical credibility of newly discovered factors
  2. Investment Practice: Screening investment strategies with statistical support
  3. Regulatory Policy: Evaluating systematic risk of market anomalies
  4. Risk Management: Understanding the statistical foundation of factor exposures

References

This paper cites 22 important references covering core and cutting-edge research in FDR methodology, financial anomaly discovery, and multiple testing control, providing a solid theoretical foundation and empirical support for the research.


Overall Assessment: This paper makes an important contribution to financial econometrics. It addresses a long-standing controversy with simple methods and offers new tools for judging the statistical credibility of the financial anomalies literature.