Most claimed statistical findings in cross-sectional return predictability are likely true

Chen

The false discovery rate (FDR) measures the share of false positives in a set of statistical tests. I develop simple and intuitive bounds on the FDR in cross-sectional predictability publications. The simplest bound requires just a few lines of math and finds $\text{FDR} \le 25\%$ based on summary statistics in eight out of nine previous studies. A more refined bound finds $\text{FDR} \le 9\%$. The FDR is small because randomly selecting accounting ratios produces statistically significant predictability far more often than would occur if there were no predictability. The bounds also reconcile the disparate FDR estimates in the literature.

Basic Information

Paper ID: 2206.15365
Title: Most claimed statistical findings in cross-sectional return predictability are likely true
Author: Andrew Y. Chen (Federal Reserve Board)
Classification: q-fin.GN (Quantitative Finance - General Finance)
Publication Date: October 2025 (First released on SSRN: August 27, 2021)
Paper Link: https://arxiv.org/abs/2206.15365

Abstract

The false discovery rate (FDR) measures the proportion of false positives in statistical testing. This paper develops simple and intuitive FDR bounds for cross-sectional return predictability research. The simplest bound requires only a few lines of mathematical calculation and, based on summary statistics from eight of nine prior studies, finds FDR ≤ 25%. More refined bounds find FDR ≤ 9%. The small FDR arises because randomly selected accounting ratios produce statistically significant predictability at frequencies far exceeding those expected under the null hypothesis of no predictability. These bounds also reconcile disagreements between different FDR estimates in the literature.

Research Background and Motivation

Problem Background

Researchers have discovered hundreds of cross-sectional stock return predictors, a richness that raises concerns about multiple testing problems. Intuitively, if researchers conduct many tests, some tests may be statistically significant purely by chance even under the null hypothesis of no predictability.

Core Issues

Multiple Testing Problem: Large numbers of factor discoveries may lead to false positive results
FDR Estimation Disagreement: Existing literature exhibits enormous variation in FDR estimates, ranging from nearly 0% to over 45%
Publication Bias: Statistically significant results are more likely to be published, affecting true FDR estimates
Methodological Controversy: Different research teams using different methods reach drastically different conclusions

Research Importance

Accurately estimating FDR is crucial for understanding the credibility of the financial anomalies literature, directly affecting investment strategy formulation and academic research direction.

Core Contributions

Simple and Intuitive FDR Bounds: Proposes the "Easy Bound" method, requiring only a few lines of mathematical calculation to estimate the FDR upper bound
Visual Bound Method: Develops "Visual Bound," providing tighter FDR bounds through histogram decomposition
Literature Reconciliation: Unifies explanations for vastly different FDR estimates in existing literature, finding that disagreements stem primarily from interpretation differences rather than data differences
Empirical Findings: Demonstrates that randomly selected accounting ratios produce significant predictability far more often than the roughly 5% expected under the null of no predictability, providing empirical support for a small FDR

Methodology Details

Task Definition

The predictive ability of cross-sectional signal $i$ is measured by $\bar{r}_i$, typically the sample mean return of a long-short portfolio formed on signal $i$. The null hypothesis is $E(\bar{r}_i) = 0$.

Core Framework

1. Basic Setup

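$t_i \equiv \bar{r}_i / SE_i$ is the t-statistic, where $SE_i$ is the standard error of $\bar{r}_i$
Under the null hypothesis: $t_i | null_i \sim Normal(0,1)$
Discovery definition: $|t_i| > 2$ (corresponding approximately to the 5% significance level)
FDR definition: $FDR_{|t|>2} \equiv Pr(null_i | |t_i| > 2)$, the probability that a signal is null given that it counts as a discovery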
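
The setup above can be sketched numerically. The snippet below is a minimal illustration, assuming a hypothetical list of monthly long-short returns and the plain standard error of the mean (the paper's own standard-error procedure may differ). The function name `t_statistic` and the example data are ours, not the paper's. It also checks how close the standard normal tail beyond 2 is to the 5% figure used in the text.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def t_statistic(returns):
    """t-stat of the mean return: r_bar / SE, with SE = sample std / sqrt(n)."""
    n = len(returns)
    return mean(returns) / (stdev(returns) / sqrt(n))


# Probabilities for |t| under the null t ~ Normal(0, 1).
std_normal = NormalDist()
null_tail = 2 * (1 - std_normal.cdf(2.0))   # Pr(|t| > 2 | null)
null_center = 2 * std_normal.cdf(0.5) - 1   # Pr(|t| < 0.5 | null)

# Hypothetical monthly long-short returns (in percent), for illustration only.
example_returns = [1.2, -0.4, 0.9, 0.3, -0.1, 0.8, 0.5, -0.6, 1.1, 0.2]
example_t = t_statistic(example_returns)

print(f"Pr(|t| > 2 | null)   = {null_tail:.4f}")
print(f"Pr(|t| < 0.5 | null) = {null_center:.4f}")
print(f"example t-statistic  = {example_t:.2f}")
```

The exact normal tail beyond 2 is slightly below 5%, so rounding it up to 5% keeps any bound built on it conservative.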

2. Easy Bound Method

Applying Bayes' rule, then using $Pr(|t_i| > 2|null_i) \approx 5\%$ and $Pr(null_i) \leq 1$, yields: $FDR_{|t|>2} = \frac{Pr(|t_i| > 2|null_i) Pr(null_i)}{Pr(|t_i| > 2)} \leq \frac{5\%}{Pr(|t_i| > 2)}$

This bound is intuitively straightforward: if the tail probability under the null hypothesis (numerator) cannot explain the observed tail probability (denominator), then FDR must be small.
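
As a minimal sketch of this calculation, the function below divides the null tail probability by the observed share of significant signals. The function name `easy_bound` and the cap at 1 are our own choices. The 20% input is the share of significant accounting ratios reported for data mining studies in this summary.

```python
def easy_bound(share_significant, null_tail=0.05):
    """Upper bound on the FDR among |t| > 2 findings, using Pr(null) <= 1."""
    if not 0 < share_significant <= 1:
        raise ValueError("share_significant must be in (0, 1]")
    # An FDR is a probability, so the bound is capped at 1.
    return min(1.0, null_tail / share_significant)


# If 20% of mined signals have |t| > 2, at most a quarter of them can be false.
bound = easy_bound(0.20)
print(f"FDR bound: {bound:.0%}")
```

When the share of significant signals is no larger than the null tail probability, the bound is uninformative and the function returns 1.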

3. Visual Bound Method

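Tightens the bound by estimating $Pr(null_i)$ from data. Under the null, $Pr(|t_i| < 0.5 | null_i) \approx 0.38$, so the observed share of small t-statistics is at least the null contribution: $Pr(|t_i| < 0.5) \geq (0.38)Pr(null_i)$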

Combining yields a tighter bound: $FDR_{|t|>2} \leq \left[\frac{5\%}{Pr(|t_i| > 2)}\right]\left[\frac{Pr(|t_i| < 0.5)}{0.38}\right]$
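
The algebra behind the combined bound can be written out as follows. This is a sketch of the steps implied by the formulas above, not a quotation from the paper. Here $alt_i$ denotes the event that signal $i$ is not null, and 0.38 is the standard normal probability of $|t_i| < 0.5$.

```latex
\begin{align*}
Pr(|t_i| < 0.5)
  &= Pr(|t_i| < 0.5 \mid null_i)\,Pr(null_i)
   + Pr(|t_i| < 0.5 \mid alt_i)\,Pr(alt_i) \\
  &\geq (0.38)\,Pr(null_i)
  \quad\Longrightarrow\quad
  Pr(null_i) \leq \frac{Pr(|t_i| < 0.5)}{0.38}, \\[4pt]
FDR_{|t|>2}
  &= \frac{Pr(|t_i| > 2 \mid null_i)\,Pr(null_i)}{Pr(|t_i| > 2)}
  \leq \left[\frac{5\%}{Pr(|t_i| > 2)}\right]
       \left[\frac{Pr(|t_i| < 0.5)}{0.38}\right].
\end{align*}
```

The first inequality drops the non-negative alternative term. The second substitutes the resulting cap on $Pr(null_i)$ into Bayes' rule.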

Technical Innovations

1. Addressing Publication Bias

Uses data mining studies as worst-case scenarios
Estimates the distribution of unpublished results through conservative extrapolation
Avoids direct dependence on published literature statistics

2. Histogram Decomposition Method

Decomposes the t-statistic histogram into null and alternative components: $Pr(|t_i| \in b) = Pr(|t_i| \in b | null_i)Pr(null_i) + Pr(|t_i| \in b | alt_i)Pr(alt_i)$

Because the alternative component is non-negative, the null component cannot exceed the data histogram in any bin $b$. This constraint caps $Pr(null_i)$ and therefore bounds the FDR from above.

3. Algorithm 1: Visual Bound Estimation

Plot the histogram of $|t_i|$ for data mining signals
Plot the largest scaled null histogram that stays at or below the data histogram
Draw a vertical line at 2.0; the ratio of null area to data area to the right of this line estimates the FDR bound
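
The three steps above can be sketched in code without any plotting. The version below is a minimal illustration under our own assumptions: bins of width 0.5, the null scaling fitted only on bins below 2.0, and the exact standard normal tail in place of the rounded 5%. The function names and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def null_prob(lo, hi):
    """Pr(lo <= |t| < hi) when t ~ Normal(0, 1)."""
    return 2 * (STD_NORMAL.cdf(hi) - STD_NORMAL.cdf(lo))


def visual_bound(abs_t, threshold=2.0, bin_width=0.5):
    """Scale the null histogram as high as the data allows, then compare tails."""
    n = len(abs_t)
    n_bins = int(round(threshold / bin_width))
    null_share = 1.0  # Pr(null) can never exceed 1
    for k in range(n_bins):
        lo, hi = k * bin_width, (k + 1) * bin_width
        # Step 1: height of the data histogram in this bin.
        data_freq = sum(lo <= t < hi for t in abs_t) / n
        # Step 2: the scaled null bar must not rise above the data bar.
        null_share = min(null_share, data_freq / null_prob(lo, hi))
    # Step 3: ratio of null area to data area to the right of the threshold.
    data_tail = sum(t > threshold for t in abs_t) / n
    if data_tail == 0:
        raise ValueError("no signals beyond the threshold")
    null_tail = null_share * 2 * (1 - STD_NORMAL.cdf(threshold))
    return null_tail / data_tail


# Hypothetical mix: half true nulls, half signals with an expected t-stat of 3.
rng = random.Random(0)
sim_t = [abs(rng.gauss(0, 1)) for _ in range(10_000)]
sim_t += [abs(rng.gauss(3, 1)) for _ in range(10_000)]
sim_bound = visual_bound(sim_t)
print(f"visual bound on simulated data: {sim_bound:.1%}")
```

Restricting the fit to bins below the threshold avoids sparsely populated tail bins, where an empty bin in a small sample would force the null share to zero.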

Experimental Setup

Datasets

Data Mining Studies:
- Yan and Zheng (2017): 18,000 accounting ratios
- Chordia, Goyal, and Saretto (2020): approximately 200 accounting variables
- Chen, Lopez-Lira, and Zimmermann (2025): 29,000 signals
Meta-Research Data:
- Green, Hand, Zhang (2013)
- Chen, Zimmermann (2020): 77 published predictive factors
- Harvey, Liu, Zhu (2016)
- McLean, Pontiff (2016)
- Jensen, Kelly, Pedersen (2021)
- Jacobs, Muller (2020)

Evaluation Metrics

FDR Bounds: Upper bound estimates of false discovery rate
Significance Proportion: Proportion of signals with $|t_i| > 2$
Small t-statistic Proportion: Proportion of signals with $|t_i| < 0.5$

Implementation Details

Uses equal-weighted and value-weighted portfolios
Considers different factor model adjustments (CAPM, FF3, FF3+momentum)
Employs Fama-French clustered bootstrap for standard error calculation

Experimental Results

Main Results

1. Easy Bound Results

Based on eight of nine studies, FDR ≤ 25%:

At least 20% of accounting ratios in data mining studies produce $|t_i| > 2$
Applying the formula yields: $FDR_{|t|>2} \leq 5\%/0.20 = 25\%$

2. Visual Bound Results

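More precise estimates using the Chen, Lopez-Lira, and Zimmermann (CLZ) data:

Of 29,000 signals, 9,700 (about 33%) satisfy $|t_i| > 2$, and 6,300 (about 22%) satisfy $|t_i| < 0.5$
Plugging these shares into the refined bound yields $FDR_{|t|>2} \leq 8.5\%$, meaning at least 91.5% of findings are true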
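
The arithmetic behind this figure can be checked with the counts above. The sketch below is a minimal illustration that uses the rounded constants 5% and 0.38 from the formulas in this summary. The function name `refined_bound` is ours.

```python
def refined_bound(n_signals, n_significant, n_small,
                  null_tail=0.05, null_small=0.38):
    """Easy bound multiplied by the data-implied cap on Pr(null)."""
    share_significant = n_significant / n_signals   # Pr(|t| > 2)
    share_small = n_small / n_signals               # Pr(|t| < 0.5)
    easy = null_tail / share_significant
    null_share_cap = min(1.0, share_small / null_small)
    return easy * null_share_cap


clz_bound = refined_bound(n_signals=29_000, n_significant=9_700, n_small=6_300)
print(f"FDR bound: {clz_bound:.1%}")
```

With these inputs the Easy Bound alone is about 15%, and the cap on the null share of about 57% brings the bound down to roughly 8.5%.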

3. Results for Different Specifications

Weighting	Factor Adjustment	FDR Upper Bound	Significance Proportion
Equal-weighted	Raw returns	8.6%	32.7%
Equal-weighted	FF3	7.3%	34.9%
Value-weighted	CAPM	19.0%	17.9%
Value-weighted	FF3+momentum	41.7%	10.5%
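
As a rough cross-check on the table, the Easy Bound can be recomputed from the significance proportion in each row. The sketch below assumes that the "FDR Upper Bound" column reports the refined bound, which this summary does not state explicitly. The variable names are ours.

```python
# (weighting, adjustment): (reported FDR bound, share with |t| > 2), from the table above.
specs = {
    ("Equal-weighted", "Raw returns"): (0.086, 0.327),
    ("Equal-weighted", "FF3"): (0.073, 0.349),
    ("Value-weighted", "CAPM"): (0.190, 0.179),
    ("Value-weighted", "FF3+momentum"): (0.417, 0.105),
}

NULL_TAIL = 0.05  # Pr(|t| > 2 | null), as used in the Easy Bound

# Easy Bound per row: null tail divided by the share of significant signals.
easy_bounds = {
    spec: min(1.0, NULL_TAIL / share_significant)
    for spec, (_, share_significant) in specs.items()
}

for spec, (reported, _) in specs.items():
    weighting, adjustment = spec
    print(f"{weighting:<16}{adjustment:<14}"
          f"reported {reported:.1%}  easy {easy_bounds[spec]:.1%}")
```

In every row the reported bound lies below the recomputed Easy Bound, which is what one would expect if the reported figures also use information on small t-statistics.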

Ablation Studies

Weighting Impact: In the specifications reported above, value-weighting lowers the significance proportion (from roughly 33-35% to 10-18%) and raises the FDR bounds (from under 9% to 19-42%)
Factor Adjustment Impact: FF3+momentum adjustment has the largest effect on value-weighted portfolios
Dataset Robustness: Data mining results from three independent research teams are consistent

Literature Reconciliation Analysis

Harvey, Liu, Zhu (2016): Reinterprets findings to show FDR of only 12%, contrary to the original claim that "most findings are false"
Harvey and Liu (2020): The 0.1% of "true" strategies actually corresponds to selecting the most extreme value-weighted FF3+momentum specification
Chordia, Goyal, Saretto (2020): The 45% FDR estimate stems from ignoring information about small t-statistics in calibration

Related Work

FDR Methodology Literature

Benjamini and Hochberg (1995): Classical FDR control methods
Storey (2002): Direct FDR estimation methods
Sorić (1989): Earliest FDR concepts

Financial Anomalies Literature

Green, Hand, Zhang (2013): Survey of cross-sectional return prediction
McLean and Pontiff (2016): Out-of-sample decay studies
Chen and Zimmermann (2022): Open-source cross-sectional asset pricing

Multiple Testing Applications in Finance

Harvey, Liu, Zhu (2016): Multiple testing problems in financial economics
Chen (2024): Discussion on whether t-statistic thresholds need to be raised

Conclusions and Discussion

Main Conclusions

Small FDR: At least 75% of claimed findings in cross-sectional predictability literature are true (FDR ≤ 25%)
More Precise Estimates: Considering information about small t-statistics, at least 91% of findings are true (FDR ≤ 9%)
Literature Reconciliation: Different FDR estimates stem primarily from interpretation differences rather than data or methodological differences
Empirical Support: High significance rates of random accounting ratios provide direct evidence for small FDR

Limitations

Statistical vs. Economic Significance: "True findings" refer only to statistical significance and non-zero alpha, not considering transaction costs, information costs, and other economic factors
Out-of-Sample Performance: Statistical truth does not equate to economic feasibility
Structural Changes: Gives limited consideration to how changes in market structure affect predictability
Data Mining Assumptions: Assumes the research process does not produce higher false discovery rates than random data mining

Future Directions

Economic Significance: Combine transaction costs and market frictions to assess economic value
Dynamic FDR: Consider time-varying predictability and market conditions
Causal Inference: Extend from predictive relationships to causal relationships
Machine Learning Methods: FDR control in high-dimensional settings

In-Depth Evaluation

Strengths

Method Simplicity: The Easy Bound method is extremely simple, requiring only summary statistics for calculation
Strong Intuitiveness: Visual Bound provides intuitive histogram decomposition explanations
Empirical Robustness: Based on consistent results from multiple independent research teams
Literature Contribution: Successfully reconciles long-standing disagreements in FDR estimates
Solid Theory: Based on fundamental probability principles with rigorous mathematical derivations

Weaknesses

Conservative Bounds: Bound methods may be overly conservative; true FDR may be smaller
Independence Assumptions: Although the bounds do not require independence across signals, correlation among signals still affects estimation precision
Data Dependence: Results depend on the quality and representativeness of specific data mining studies
Temporal Stability: Insufficient discussion of FDR changes over time
Economic Interpretation: Lacks in-depth discussion of the relationship between statistical and economic significance

Impact

Academic Value: Provides important statistical credibility assessment for financial anomalies literature
Practical Significance: Offers investors and regulators reference points for factor validity
Methodological Contribution: Simple and effective FDR bound methods can be generalized to other fields
Policy Impact: Influences understanding of financial market efficiency and anomaly persistence

Applicable Scenarios

Academic Research: Assessing statistical credibility of newly discovered factors
Investment Practice: Screening investment strategies with statistical support
Regulatory Policy: Evaluating systematic risk of market anomalies
Risk Management: Understanding the statistical foundation of factor exposures

References

This paper cites 22 important references covering core and cutting-edge research in FDR methodology, financial anomaly discovery, and multiple testing control, providing a solid theoretical foundation and empirical support for the research.

Overall Assessment: This paper is an important contribution to financial econometrics. It addresses a long-standing controversy with simple methods and provides new tools for assessing the statistical credibility of the financial anomalies literature.