Cumulants, Moments and Selection: The Connection Between Evolution and Statistics

Ahmed, Goodgold, Kothari et al.
Cumulants and moments are closely related to the basic mathematics of continuous and discrete selection (respectively). These relationships generalize Fisher's fundamental theorem of natural selection and also make clear some of its limitations. The relationship between cumulants and continuous selection is especially intuitive and also provides an alternative way to understand cumulants. We show that a similarly simple relationship exists between moments and discrete selection. In more complex scenarios, we show that thinking of selection over discrete generations has significant advantages. For a simple mutation model, we find exact solutions for the equilibrium moments of the fitness distribution. These solutions are surprisingly simple and have some interesting implications including: a necessary and sufficient condition for mutation-selection balance, a very simple formula for mean fitness and the fact that the shape of the equilibrium fitness distribution is determined solely by mutation (whereas the scale is determined by the starting fitness distribution).

Basic Information

  • Paper ID: 2510.14917
  • Title: Cumulants, Moments and Selection: The Connection Between Evolution and Statistics
  • Authors: Hasan Ahmed, Deena Goodgold, Khushali Kothari, Rustom Antia (Emory University)
  • Classification: q-bio.PE (Population and Evolution)
  • Corresponding Author: Rustom Antia (rantia@emory.edu)
  • Paper Link: https://arxiv.org/abs/2510.14917

Abstract

This paper shows that cumulants and moments are closely tied to the mathematics of continuous and discrete selection, respectively. These relationships generalize Fisher's fundamental theorem of natural selection and clarify its limitations. The link between cumulants and continuous selection is particularly intuitive and offers a new way to understand cumulants. The authors show that moments have an analogous simple relationship with discrete selection, and that in more complex scenarios thinking in terms of discrete generations has significant advantages. For a simple mutation model, they derive exact solutions for the equilibrium moments of the fitness distribution. These imply a necessary and sufficient condition for mutation-selection balance, a simple formula for mean fitness, and the result that the shape of the equilibrium fitness distribution is determined entirely by mutation (while the scale is determined by the initial fitness distribution).

Research Background and Motivation

Core Problem

This research aims to establish mathematical connections between the concepts of cumulants/moments in statistics and the concept of selection in evolutionary biology, connections that are crucial for understanding both selection mechanisms and statistical concepts.

Significance

  1. Interdisciplinary Value: The relationship applies not only to evolutionary biology but also to epidemiology (susceptible depletion), economics, and immune memory decay
  2. Theoretical Advancement: Generalizes Fisher's fundamental theorem of natural selection and reveals its limitations
  3. Practical Value: Provides precise mathematical tools for complex evolutionary scenarios

Existing Limitations

  1. Fisher's theorem applies only to instantaneous changes and is unsuitable for describing biological evolution that inherently involves discrete generations
  2. The continuous growth rate r creates mathematical difficulties in extreme cases (r→-∞ when R→0)
  3. Lack of simple exact solutions for complex scenarios such as mutation-selection balance

Core Contributions

  1. Established exact relationships between cumulants and continuous selection: Proved that the rate of change of the i-th cumulant of fitness equals the (i+1)-th cumulant
  2. Discovered corresponding relationships between moments and discrete selection: Derived exact formulas for moment evolution under discrete selection
  3. Generalized Fisher's fundamental theorem: Clarified its applicability conditions and limitations
  4. Provided exact solutions for mutation-selection models: Obtained simple closed-form solutions for equilibrium moments
  5. Revealed structural properties of fitness distributions: Proved that the shape of the equilibrium distribution is determined solely by mutation, with scale determined by initial conditions

Methodology Details

Theoretical Framework

Continuous Selection and Cumulants (r-model)

When fitness is measured by the Malthusian parameter r (exponential growth rate), cumulants and selection exhibit an intuitive relationship:

$$\frac{dK_i(r)}{dt} = K_{i+1}(r)$$

where $K_i(r)$ is the i-th cumulant of the fitness distribution. This implies:

  • Rate of change of mean fitness = variance of fitness (2nd cumulant)
  • Rate of change of the variance = 3rd cumulant (unstandardized skewness)
  • Rate of change of the 3rd cumulant = 4th cumulant (unstandardized excess kurtosis)
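The relationship can be checked numerically. The following is a minimal sketch, assuming a population of a few discrete types whose growth rates and starting frequencies are made up for illustration; it compares a finite-difference derivative of the mean and variance with the next cumulant up.

```python
import math

def cumulants(values, weights):
    """First three cumulants of a discrete distribution (weights sum to 1)."""
    mean = sum(w * v for v, w in zip(values, weights))
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    third = sum(w * (v - mean) ** 3 for v, w in zip(values, weights))
    return mean, var, third

def frequencies(r, f0, t):
    """Type frequencies at time t when type j grows as exp(r_j * t)."""
    grown = [f * math.exp(rj * t) for rj, f in zip(r, f0)]
    total = sum(grown)
    return [g / total for g in grown]

# Hypothetical Malthusian parameters and initial frequencies (illustration only).
r = [-0.3, 0.0, 0.1, 0.5]
f0 = [0.4, 0.3, 0.2, 0.1]
t, h = 1.0, 1e-5

k_before = cumulants(r, frequencies(r, f0, t - h))
k_now = cumulants(r, frequencies(r, f0, t))
k_after = cumulants(r, frequencies(r, f0, t + h))

# Central finite differences of the 1st and 2nd cumulants.
d_mean = (k_after[0] - k_before[0]) / (2 * h)
d_var = (k_after[1] - k_before[1]) / (2 * h)

print("dK1/dt vs K2:", d_mean, k_now[1])
print("dK2/dt vs K3:", d_var, k_now[2])
```

The two numbers on each printed line should agree up to finite-difference error, whatever growth rates and frequencies are chosen.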

Discrete Selection and Moments (R-model)

When fitness is measured by the multiplication factor R ($R = e^{r \cdot \Delta t}$), moment evolution follows:

$$M_{i,t+1}(R) = \frac{M_{i+1,t}(R)}{M_{1,t}(R)}$$

where $M_{i,t}(R)$ is the i-th raw moment of the fitness distribution at time t.
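A small numerical check of this recursion, as a sketch with hypothetical fitness values and frequencies: one generation of selection reweights each type by its fitness R, and the raw moments of the reweighted distribution are compared with the ratio of moments before selection.

```python
def raw_moment(values, weights, i):
    """i-th raw moment of a discrete distribution (weights sum to 1)."""
    return sum(w * v ** i for v, w in zip(values, weights))

def select(values, weights):
    """One generation of selection: reweight each type by its fitness R."""
    mean = raw_moment(values, weights, 1)
    return [w * v / mean for v, w in zip(values, weights)]

# Hypothetical multiplication factors and frequencies (illustration only).
R = [0.2, 0.8, 1.0, 1.5]
f_t = [0.1, 0.4, 0.3, 0.2]
f_next = select(R, f_t)

for i in range(1, 5):
    lhs = raw_moment(R, f_next, i)                            # M_{i,t+1}
    rhs = raw_moment(R, f_t, i + 1) / raw_moment(R, f_t, 1)   # M_{i+1,t} / M_{1,t}
    print(i, lhs, rhs)
```

Because reweighting by R multiplies every term of the i-th moment by R, the identity is exact for any starting distribution, not only at equilibrium.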

Mutation-Selection Model

Model Setup

Offspring fitness is determined by the following probability model:

  • r-model: $r_i = r_i^* - x \cdot y$
  • R-model: $R_i = R_i^* \cdot e^{-x \cdot y}$

where x is a binomial random variable (whether a deleterious mutation occurs) and y is the mutation effect size.

Exact Equilibrium Solutions

For the R-model, equilibrium moments have surprisingly simple forms:

Mean Fitness: $M_1(R) = \max(R) \cdot p$

Higher-order Moments: $M_i(R) = \frac{\max(R)^i \cdot p^i}{\prod_{j=1}^{i-1} M_j(e^{-x \cdot y})}$

where p is the probability of no deleterious mutation and $\max(R)$ is the maximum fitness in the initial population.
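One way to see where these forms come from is sketched below. It is a reconstruction rather than the paper's own derivation, and it assumes that moments are measured after mutation, that the mutation factor $e^{-x \cdot y}$ is independent of parental fitness, and that the fittest class is never lost.

```latex
% Selection step (R-model) followed by an independent mutation factor e^{-x y}:
M_{i,t+1}(R) = \frac{M_{i+1,t}(R)}{M_{1,t}(R)} \, M_i\!\left(e^{-x \cdot y}\right)

% At equilibrium M_{i,t+1}(R) = M_{i,t}(R) = M_i(R), so
M_{i+1}(R) = \frac{M_1(R) \, M_i(R)}{M_i\!\left(e^{-x \cdot y}\right)}

% The fittest class has fitness \max(R) and stays unmutated with probability p.
% Its frequency is multiplied by \max(R) \, p / M_1(R) each generation,
% which is stationary only if
M_1(R) = \max(R) \cdot p

% Iterating the recursion starting from M_1(R) gives
M_i(R) = \frac{\max(R)^i \cdot p^i}{\prod_{j=1}^{i-1} M_j\!\left(e^{-x \cdot y}\right)}
```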

Key Insights

  1. Mutation-selection balance condition: $p > 0$ is necessary
  2. Distribution structure: The shape of the equilibrium distribution is determined entirely by the mutation effect distribution; $\max(R)$ serves only as a scale parameter
  3. Coefficient of variation: $CV(R) = \sqrt{\frac{1}{M_1(e^{-x \cdot y})} - 1}$, which follows from $CV(R)^2 = M_2(R)/M_1(R)^2 - 1$ together with $M_2(R) = M_1(R)^2 / M_1(e^{-x \cdot y})$

Experimental Setup

Simulation Parameters

The authors conducted detailed simulations based on influenza virus parameters:

  • Population Size: 1 million individuals, 4000 generations
  • Mutation Rate: 0.2 (based on influenza mutation rate)
  • Mutation Effects: Gamma distribution (α=1, β=2.85)
  • Maintenance Mechanism: Population doubled when below 500,000
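A much smaller version of such a simulation can be sketched as follows. This is not the authors' code: it assumes fitness-proportional resampling at constant population size instead of the doubling mechanism, reads the Gamma(α=1, β=2.85) effect distribution as an exponential with rate 2.85, and uses a hypothetical population of 10,000 over 200 generations to keep the run short.

```python
import math
import random

def simulate(pop_size=10_000, generations=200, mutation_prob=0.2,
             effect_rate=2.85, seed=1):
    """Return the mean fitness (measured after mutation) in each generation."""
    rng = random.Random(seed)
    pop = [1.0] * pop_size  # assumption: everyone starts at max(R) = 1
    means = []
    for _ in range(generations):
        # Selection: parents are drawn with probability proportional to R.
        parents = rng.choices(pop, weights=pop, k=pop_size)
        # Mutation: with probability mutation_prob, multiply R by exp(-y),
        # where y is exponential with rate effect_rate (Gamma with shape 1).
        pop = [
            R * math.exp(-rng.expovariate(effect_rate))
            if rng.random() < mutation_prob else R
            for R in parents
        ]
        means.append(sum(pop) / pop_size)
    return means

means = simulate()
late_mean = sum(means[-50:]) / 50
print(f"mean fitness over the last 50 generations: {late_mean:.3f}")
```

With these settings the late-generation mean should sit close to $\max(R) \cdot p = 0.8$. Mutations of very small effect equilibrate slowly, so a short run approaches the equilibrium value from above, which may be one reason the full simulation runs for thousands of generations.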

Comparative Species Parameters

The study compared mutation patterns across three species:

  1. E. coli: $\lambda = 0.001$, $M_1(e^{-z}) = 0.969$
  2. Humans: $\lambda = 2.1$, $M_1(e^{-z}) = 0.991$
  3. Influenza A: $\lambda = 0.223$, $M_1(e^{-z}) = 0.761$
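These parameters can be turned into equilibrium quantities under a stated assumption. The sketch below assumes that λ is the mean of a Poisson-distributed number of deleterious mutations per generation (so $p = e^{-\lambda}$) and that $M_1(e^{-z})$ is the mean fitness factor of a single mutation; neither reading is confirmed by the summary, although the first reproduces the influenza value $p = 0.8$ quoted in the results. The coefficient of variation uses $CV^2 = 1/M_1(e^{-x \cdot y}) - 1$, which follows from the equilibrium expressions for $M_1(R)$ and $M_2(R)$.

```python
import math

# (lambda, M_1(e^{-z})) per species, taken from the list above.
species = {
    "E. coli": (0.001, 0.969),
    "Humans": (2.1, 0.991),
    "Influenza A": (0.223, 0.761),
}

def no_mutation_probability(lam):
    """P(no deleterious mutation), assuming a Poisson(lam) mutation count."""
    return math.exp(-lam)

def equilibrium_cv(lam, single_mutation_factor):
    """CV of equilibrium fitness, assuming independent multiplicative effects.

    For a Poisson number of independent mutations, the mean total factor is
    exp(lam * (M_1(e^{-z}) - 1)).
    """
    total_factor = math.exp(lam * (single_mutation_factor - 1.0))
    return math.sqrt(1.0 / total_factor - 1.0)

for name, (lam, factor) in species.items():
    p = no_mutation_probability(lam)
    cv = equilibrium_cv(lam, factor)
    print(f"{name}: p = {p:.3f}, CV = {cv:.4f}")
```

Under these assumptions, equilibrium mean fitness relative to the fittest class equals p, so the humans row implies a much larger mutational load than the other two.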

Experimental Results

Main Findings

Superiority of the R-model

The R-model's theoretical predictions match the simulation results to about three significant figures:

| Statistic | Simulation | Theory |
| --- | --- | --- |
| Mean | 0.800 | 0.8 |
| Variance | 0.0351 | 0.0351 |
| Unstandardized Skewness | -0.00757 | -0.00757 |
| Unstandardized Excess Kurtosis | 0.000952 | 0.000951 |
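The theory values can be recomputed from the equilibrium moment formula. The sketch below assumes a Bernoulli mutation indicator with probability 0.2, reads Gamma(α=1, β=2.85) as an exponential with rate 2.85, and sets $\max(R) = 1$; these readings are assumptions, chosen because they are consistent with the simulation parameters listed earlier.

```python
def mutation_moment(j, mutation_prob=0.2, effect_rate=2.85):
    """M_j(e^{-x*y}) for Bernoulli x and exponential y with the given rate."""
    # E[e^{-j*y}] = rate / (rate + j) for an exponential y.
    return (1 - mutation_prob) + mutation_prob * effect_rate / (effect_rate + j)

def equilibrium_raw_moments(n, max_R=1.0, mutation_prob=0.2, effect_rate=2.85):
    """First n equilibrium raw moments M_1(R)..M_n(R) of the R-model."""
    p = 1 - mutation_prob
    moments, denom = [], 1.0  # denom holds the product of M_j(e^{-x*y}), j < i
    for i in range(1, n + 1):
        moments.append((max_R * p) ** i / denom)
        denom *= mutation_moment(i, mutation_prob, effect_rate)
    return moments

m1, m2, m3, m4 = equilibrium_raw_moments(4)

# Standard conversions from raw moments to cumulants.
variance = m2 - m1**2
k3 = m3 - 3 * m1 * m2 + 2 * m1**3
k4 = m4 - 4 * m1 * m3 - 3 * m2**2 + 12 * m1**2 * m2 - 6 * m1**4

print(f"mean={m1:.3g} variance={variance:.3g} K3={k3:.3g} K4={k4:.3g}")
```

Changing `max_R` rescales every moment by a power of `max_R` without altering ratios such as the coefficient of variation, which illustrates the statement that the initial maximum fitness acts only as a scale parameter.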

Limitations of the r-model

The r-model's equilibrium condition $\frac{dK_i(r)}{dt} \approx -K_i(-x \cdot y)$ only approximately holds, with significant discrepancies between theory and simulation.

Cross-species Comparison

Different species exhibit markedly different mutation patterns:

  • Influenza: $p = 0.8$, reflecting the trade-off between replication accuracy and speed
  • E. coli: $p \approx 1$, high-fidelity replication
  • Humans: Multicellularity substantially reduces the p value

Limitations of Fisher's Theorem

Fisher's theorem strictly holds only under the following conditions:

  1. Fitness is measured in r and instantaneous changes are considered
  2. When measured in R, only when parental mean fitness = 1 or variance = 0
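The second condition follows directly from the discrete moment recursion $M_{i,t+1}(R) = M_{i+1,t}(R) / M_{1,t}(R)$ with $i = 1$, as this short worked step shows (selection only, no mutation):

```latex
\Delta M_1(R) = M_{1,t+1}(R) - M_{1,t}(R)
             = \frac{M_{2,t}(R)}{M_{1,t}(R)} - M_{1,t}(R)
             = \frac{M_{2,t}(R) - M_{1,t}(R)^2}{M_{1,t}(R)}
             = \frac{\operatorname{Var}_t(R)}{M_{1,t}(R)}
```

The change in mean fitness therefore equals the variance only when $M_{1,t}(R) = 1$ or when the variance is zero; otherwise it is the variance divided by the parental mean fitness.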

Theoretical Foundations

  1. Hansen (1992): First noted the relationship between cumulants and selection
  2. Gerrish & Sniegowski (2012): Extended related theory
  3. Haldane's Load Theory: Provided the basis for deriving the first two moments

Application Domains

The theoretical framework has been applied to:

  • Vaccine efficacy heterogeneity studies
  • Economic evolutionary theory
  • Immune memory dynamics
  • Cellular lineage selection measurement

Conclusions and Discussion

Main Conclusions

  1. Statistical-Evolutionary Connection: Established exact mathematical relationships between cumulants/moments and selection processes
  2. Discrete Advantages: The R-model is more applicable than the r-model for handling complex scenarios
  3. Equilibrium Structure: Under mutation-selection balance, distribution shape is determined by mutation while scale is determined by initial conditions
  4. Practical Formulas: Provided simple formulas for calculating mean fitness and coefficient of variation

Limitations

  1. Genetic Fitness: The study focuses on genetic fitness rather than actual offspring numbers
  2. Simplified Assumptions: Does not consider beneficial mutations, short-term selection, and other complex factors
  3. Distribution Derivation: Only moments are obtained; exact probability distributions are not derived
  4. Extreme Cases: Does not address theoretical cases where $\max(R)$ is unbounded and $p = 0$

Future Directions

  1. Quantify deviations of complex systems from theoretical formulas through controlled experiments and simulations
  2. Derive exact probability distributions from moments
  3. Explore the impact of recombination on the theoretical framework
  4. Investigate beneficial mutations and frequency-dependent selection

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: Systematically links statistical concepts (cumulants and moments) to evolutionary theory, building on the relationship first noted by Hansen (1992)
  2. Mathematical Rigor: Provides exact mathematical derivations and proofs
  3. Practical Value: Formulas are simple and practical, easy to apply
  4. Interdisciplinary Significance: Provides a unified theoretical framework for multiple fields
  5. Experimental Validation: Simulation results closely match the R-model's theoretical predictions

Weaknesses

  1. Biological Realism: Some assumptions (such as constant growth rate) are not sufficiently realistic biologically
  2. Application Scope: Theory primarily applies to simple mutation-selection scenarios
  3. Distribution Completeness: Cannot fully determine probability distributions from moments
  4. Complex Scenario Handling: Insufficient consideration of epistasis, frequency-dependent selection, and other complexities

Impact

  1. Theoretical Contribution: Provides new mathematical tools for evolutionary theory
  2. Methodological Value: The R-model approach may become a standard tool for studying discrete evolutionary processes
  3. Application Prospects: Has direct application value in viral evolution, drug resistance research, and related fields
  4. Educational Value: Provides intuitive biological explanations for understanding cumulants and moments

Applicable Scenarios

  1. Viral Evolution: Particularly suitable for studying rapid evolution of RNA viruses
  2. Drug Resistance Research: Can be used to predict the spread of drug-resistant mutations
  3. Synthetic Biology: Guides the design of artificial evolution systems
  4. Epidemiology: Analyzes dynamic changes in pathogen fitness distributions

References

Key references include:

  1. Hansen, T.F. (1992). Selection in asexual populations: An extension of the fundamental theorem
  2. Gerrish, P.J. & Sniegowski, P.D. (2012). Real time forecasting of near-future evolution
  3. Galeota-Sprung, B. et al. (2020). Mutational Load and the Functional Fraction of the Human Genome
  4. Elena, S.F. et al. (1998). Distribution of fitness effects caused by random insertion mutations in Escherichia coli

By establishing a mathematical bridge between statistics and evolutionary biology, this paper not only advances theoretical evolutionary biology but also provides new perspectives for understanding statistical concepts. The proposed R-model framework demonstrates significant advantages in addressing discrete generational evolution problems and possesses important theoretical value and practical application prospects.