The study of associations and their causal explanations is a central research activity whose methodology varies tremendously across fields. Even within specialized subfields, comparisons across textbooks and journals reveal that the basics are subject to considerable variation and controversy. This variation is often obscured by the singular viewpoints presented within textbooks and journal guidelines, which may be deceptively written as if the norms they adopt are unchallenged. Furthermore, human limitations and the vastness within fields imply that no one can have expertise across all subfields and that interpretations will be severely constrained by the limitations of studies of human populations.
The present chapter outlines an approach to statistical methods that attempts to recognize these problems from the start, rather than assume they are absent as in the claims of 'statistical significance' and 'confidence' ordinarily attached to statistical tests and interval estimates. It does so by grounding models and statistics in data description, and treating inferences from them as speculations based on assumptions that cannot be fully validated or checked using the analysis data.
Statistical Methods: Basic Concepts, Interpretations, and Cautions
- Paper ID: 2508.10168
- Title: Statistical methods: Basic concepts, interpretations, and cautions
- Author: Sander Greenland (Professor Emeritus, Department of Epidemiology and Statistics, UCLA)
- Classification: stat.ME math.ST stat.TH
- Publication Date: August 25, 2025
- Paper Type: Chapter in the Third Edition of the Epidemiology Handbook
- Paper Link: https://arxiv.org/abs/2508.10168
This paper addresses the application of statistical methods in association studies and causal interpretation, highlighting the enormous methodological variations across different fields and considerable disagreement even within specialized subfields. Traditional statistical methods assume ideal conditions (such as purely random sampling and fully randomized experiments), but these assumptions are often unmet in actual population studies. The author proposes a new interpretive framework for statistical methods, conceptualizing statistical inference as conjecture based on unverifiable assumptions rather than definitive conclusions, thereby avoiding misuse of concepts such as "statistical significance" and "confidence."
- Severe methodological disagreement: Significant discrepancies and controversies exist regarding fundamental statistical concepts across different fields, textbooks, and journals
- Idealized assumption conditions: Traditional statistical methods assume ideal random sampling or random allocation conditions that are difficult to satisfy in actual research
- Widespread misunderstandings: Surveys show that most users cannot correctly define or interpret P-values, significance tests, and confidence intervals
- Overconfidence problem: Statistical results are frequently misinterpreted as definitive answers rather than conjectures based on assumptions
- Provide a more realistic and cautious interpretive framework for statistical methods
- Reduce overconfidence and misunderstandings in statistical inference
- Reposition statistical methods as data description tools rather than authoritative arbiters of scientific inference
- Emphasize the importance of assumption verification and uncertainty assessment
- Redefine statistical inference: Reinterpret P-values as measures of data compatibility with assumed models, rather than probabilities of hypotheses
- Introduce compatibility interval concept: Replace "confidence interval" with "compatibility interval" to avoid misleading "confidence" terminology
- Introduce S-value (surprisal): Use binary surprisal values (-log₂(p)) as information measures, providing more intuitive P-value interpretation
- Emphasize assumption dependence: Systematically elucidate the sensitivity of statistical results to auxiliary assumptions and uncertainty
- Integrate multiple methodologies: Advocate for treating frequentist and Bayesian approaches as complementary perspectives for evidence synthesis
- Traditional definition: A model typically refers to an equation expressing the functional relationship between measured variables and other variables
- This paper's definition: Model M comprises the complete set of assumptions about data generation process behavior, including target hypothesis H and auxiliary assumptions A
Traditional P-value definition:

$$p = \Pr(T \ge t \mid H, A)$$

where T is the test statistic, t is its observed value, H is the target hypothesis, and A represents the auxiliary assumptions.
Reinterpretation: The P-value represents the degree of data compatibility with the model, ranging from 0 (complete incompatibility) to 1 (complete compatibility).
The S-value, s = -log₂(p), is measured in bits of information, which gives a more intuitive interpretation:
- S = 4.6 bits (p = 0.041) is slightly less surprising than obtaining all heads in 5 fair coin flips (S = 5 bits, p = 1/32)
- S = 0 indicates no information; larger S values indicate greater incompatibility
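
The conversion from a P-value to an S-value is a one-line calculation. A minimal sketch, assuming only the definition s = -log₂(p) given above (the function name `s_value` is illustrative, not taken from the paper):

```python
import math


def s_value(p: float) -> float:
    """Binary surprisal (S-value) in bits: s = -log2(p) = log2(1/p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # log2(1/p) rather than -log2(p) so that p = 1 yields 0.0, not -0.0
    return math.log2(1.0 / p)


# Each halving of p adds one bit; p = 1/32 matches 5 heads in 5 fair tosses.
for p in (1.0, 0.5, 0.05, 1 / 32):
    print(f"p = {p:.4f} -> S = {s_value(p):.2f} bits")
```

On this scale the conventional cutoff p = 0.05 corresponds to a little over 4 bits, that is, about as surprising as four heads in four fair tosses.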
For a chosen level α, the (1 − α) compatibility interval contains all parameter values with p > α; the name avoids the misleading connotations of "confidence."
- Semantic transformation: Shift from decision-oriented to descriptive language
- Information-theoretic perspective: Introduce information theory concepts to quantify statistical evidence
- Assumption transparency: Explicitly distinguish between target and auxiliary hypotheses
- Multi-method integration: Treat different statistical schools as complementary perspectives
The author uses a hypothetical dataset on the relationship between cannabis use and mental health to demonstrate the methodology:
Data Structure:
- Sample size: 600 individuals (480 non-users, 120 cannabis users)
- Outcome variable: Mental illness diagnosis (binary)
- Observed association: Diagnosis rate 8.3% for users, 3.3% for non-users
Computational Results:
- Risk difference (RD) = 0.050 (5%)
- Risk ratio (RR) = 2.5
- Odds ratio (OR) = 2.6
- Pearson χ² = 5.79
- Approximate P-value = 0.016, exact P-value = 0.041
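
The descriptive measures and the approximate P-value above can be checked from a 2×2 table. The sketch below assumes counts of 10 diagnosed among 120 users and 16 diagnosed among 480 non-users; these counts are inferred from the stated group sizes and rates rather than quoted from the paper, and the variable names are illustrative. It uses the identity that the upper tail of a chi-squared variate with 1 degree of freedom equals erfc(√(x/2)).

```python
import math

# Assumed counts, inferred from the stated sizes and rates (8.3% and 3.3%)
a, n1 = 10, 120   # diagnosed users, all users
c, n0 = 16, 480   # diagnosed non-users, all non-users
b, d = n1 - a, n0 - c   # undiagnosed users, undiagnosed non-users

risk1, risk0 = a / n1, c / n0
rd = risk1 - risk0                 # risk difference
rr = risk1 / risk0                 # risk ratio
odds_ratio = (a * d) / (b * c)     # sample odds ratio

# Pearson chi-squared statistic for a 2x2 table (no continuity correction)
n = n1 + n0
chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))

# Upper-tail probability of a chi-squared variate with 1 df
p_approx = math.erfc(math.sqrt(chi2 / 2))

print(f"RD = {rd:.3f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}")
print(f"Pearson chi2 = {chi2:.2f}, approximate P = {p_approx:.3f}")
```

The chi-squared P-value relies on a large-sample approximation, which is one reason it differs from the exact P-value reported alongside it.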
- Compatibility measure: P-value as an indicator of data-hypothesis compatibility
- Information content: S-value quantifies statistical evidence information
- Interval estimation: Compatibility interval provides parameter range estimates
- Hypothesis comparison: P-value function comparison across different hypothesis values
- H₀: OR = 1, exact P-value = 0.041 (S = 4.6 bits)
- H₁: OR = 2, exact P-value = 0.644 (S = 0.6 bits)
- 95% compatibility interval for the OR: (1.04, 6.36)
Traditional interpretation: OR = 1 is "rejected" at the α = 0.05 level, and the result is declared "statistically significant."
New framework interpretation:
- OR = 1 shows lower compatibility with data (p = 0.041)
- OR = 2 shows high compatibility with data (p = 0.644)
- OR = 6 is more compatible with data than OR = 1 (p = 0.070 > 0.041)
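
Comparing P-values across hypothesized odds ratios amounts to evaluating a P-value function. The sketch below is one way to do this, not necessarily the paper's computation. It assumes counts of 10 diagnosed among 120 users and 16 among 480 non-users (inferred from the stated rates), conditions on all table margins so that the number of diagnosed users follows a noncentral hypergeometric distribution, and takes the two-sided exact P-value as twice the smaller tail, which is one of several conventions. All function and constant names are illustrative.

```python
import math

# Assumed counts, inferred from the stated sizes and rates
CASES_USERS, USERS, NONUSERS, CASES = 10, 120, 480, 26


def conditional_pmf(psi: float) -> dict[int, float]:
    """Distribution of diagnosed users given all margins, under OR = psi."""
    lo = max(0, CASES - NONUSERS)
    hi = min(USERS, CASES)
    # Work with log weights to keep the binomial coefficients manageable
    logw = {
        x: math.log(math.comb(USERS, x))
        + math.log(math.comb(NONUSERS, CASES - x))
        + x * math.log(psi)
        for x in range(lo, hi + 1)
    }
    top = max(logw.values())
    weights = {x: math.exp(v - top) for x, v in logw.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def exact_p(psi: float, observed: int = CASES_USERS) -> float:
    """Two-sided exact P-value: twice the smaller tail, capped at 1."""
    pmf = conditional_pmf(psi)
    lower = sum(p for x, p in pmf.items() if x <= observed)
    upper = sum(p for x, p in pmf.items() if x >= observed)
    return min(1.0, 2 * min(lower, upper))


def crossing(alpha: float, inside: float, outside: float) -> float:
    """Bisect (on the log scale) between an OR with p > alpha and one with
    p <= alpha to locate a compatibility limit."""
    while abs(math.log(outside) - math.log(inside)) > 1e-6:
        mid = math.sqrt(inside * outside)
        if exact_p(mid) > alpha:
            inside = mid
        else:
            outside = mid
    return inside


for psi in (1, 2, 6):
    print(f"OR = {psi}: exact P = {exact_p(psi):.3f}")

# Start from an OR near the point estimate, where p is well above 0.05
lower_limit = crossing(0.05, inside=2.6, outside=0.1)
upper_limit = crossing(0.05, inside=2.6, outside=50.0)
print(f"95% compatibility limits: {lower_limit:.2f}, {upper_limit:.2f}")
```

Under these assumptions the sketch should land close to the reported values (0.041, 0.644, 0.070 and limits near 1.04 and 6.36). A different two-sided convention, such as summing all outcomes no more probable than the observed one, would give somewhat different numbers.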
| Method | P-value | S-value (bits) | Interpretation |
|---|---|---|---|
| Pearson χ² | 0.016 | 5.97 | Approximate method |
| Fisher exact | 0.041 | 4.61 | Exact method |
| Wald approximation | Large deviation | n/a | Inaccurate for sparse data |
Through the cannabis use case, the author demonstrates:
- Assumption dependence: Results heavily depend on auxiliary assumptions (e.g., random sampling, no interference)
- Confounding factors: Age, medical history, other drug use, etc., may confound true associations
- Measurement error: Effects of self-reported use and diagnostic accuracy
- Selection bias: Selective participation may affect generalizability of results
- P-value origins: Traceable to early 18th century; Pearson (1900) and Fisher (1934) established theoretical foundations
- Significance concept: "Statistical significance" emerged in the 1880s
- Controversy history: Early criticism by Boring (1919); Pearson (1906) identified misunderstanding problems
The author cites extensive recent literature supporting statistical reform:
- Amrhein et al. (2019): Call to "retire" statistical significance
- McShane et al. (2019, 2024): Advocate moving beyond binary decisions
- Wasserstein et al. (2019): Editorial in The American Statistician on moving beyond "p < 0.05," following the 2016 ASA statement on P-values
- Bayesian methods: Provide probability statements about parameters but depend on prior distributions
- Causal inference: Modern causal inference frameworks by Pearl, Hernán & Robins
- Multiple comparisons: Bonferroni adjustment and alternative methods
- Robust statistics: Computationally intensive methods such as bootstrap
- Statistical method limitations: Traditional methods based on strict assumptions often violated in practical applications
- Language importance: Terms like "significance" and "confidence" cause systematic misunderstandings
- Inferential caution: Statistical results should be viewed as conjectures based on assumptions rather than definitive conclusions
- Method integration: Different statistical methods should be used as complementary tools
- Reporting improvements:
  - Provide P-value functions rather than single P-values
  - Use compatibility intervals instead of confidence intervals
  - Explicitly list critical assumptions
- Interpretive framework:
  - Avoid binary "accept/reject" language
  - Emphasize assumption dependence of results
  - Consider practical significance alongside statistical significance
- Method selection:
  - Use exact methods rather than large-sample approximations
  - Conduct sensitivity analyses
  - Integrate multiple evidence sources
- Learning curve: New framework requires fundamental reform in statistical education
- Computational complexity: Some recommended methods are computationally more complex
- Journal resistance: Existing publication conventions may hinder adoption
- Communication challenges: Explanation to non-statisticians becomes more difficult
- Educational reform: Statistical teaching requires fundamental reform starting from basic concepts
- Software development: Statistical software supporting new interpretive frameworks is needed
- Standard setting: Updates to academic journal and regulatory institution standards
- Interdisciplinary collaboration: Promote cooperation between statisticians and domain experts
- Theoretical depth: Provides profound philosophical reflection on statistical inference
- Strong practicality: Offers concrete methodological and interpretive recommendations
- Sufficient evidence: Extensive literature citations support arguments
- Clear writing: Complex concepts explained clearly with vivid examples
- S-value introduction: Novel information-theoretic perspective on P-value interpretation
- Compatibility framework: Systematic terminology and conceptual reform
- Multi-method integration: Unified perspective across different statistical schools
- Assumption stratification: Explicit distinction between target and auxiliary hypotheses
- Implementation challenges: Reforming existing statistical practice faces enormous resistance
- Computational burden: Some recommended methods increase computational complexity
- Transition difficulties: Coexistence of old and new frameworks may cause confusion
- Dissemination difficulty: Requires substantial educational and training investment
- Paradigm shift: May drive major reform in fundamental statistical concepts
- Cross-disciplinary influence: Affects all disciplines employing statistical methods
- Educational innovation: Promotes fundamental reform in statistical education
- Reduce misunderstandings: Helps reduce misinterpretation of statistical results
- Improve quality: Promotes more cautious and accurate scientific inference
- Policy-making: Improves quality of decisions based on statistical evidence
- Scientific research: All research fields based on statistical inference
- Medical research: Clinical trials and epidemiological studies
- Social sciences: Psychology, economics, and other empirical research
- Regulatory decision-making: Drug approval, policy evaluation, etc.
This paper cites numerous important references, including:
Classical Literature:
- Pearson, K. (1900). Early theoretical foundations of statistical testing
- Fisher, R.A. (1934). Foundational work in modern statistical inference theory
- Neyman, J. (1977). Frequentist statistical theory
Modern Criticism:
- Amrhein, V., et al. (2019). Statistical significance retirement movement
- Wasserstein, R.L., et al. (2019). Editorial on moving beyond "p < 0.05," following the 2016 ASA statement on P-values
- McShane, B.B., et al. (2019, 2024). Moving beyond binary statistical decisions
Methodological Development:
- Pearl, J. (2009). Causal inference theory
- Hernán, M.A., Robins, J.M. (2025). Modern epidemiological methods
- Gelman, A., et al. (2013). Bayesian data analysis
Summary: This is a theoretically and practically significant paper on statistical methodology. Drawing on deep statistical expertise and rich applied experience, the author systematically critiques problems in the traditional statistical inference framework and proposes more cautious and realistic alternatives. While implementation faces challenges, its principles hold important value for improving scientific research quality.