Reconstructing evolutionary histories and estimating the rate of evolution from molecular sequence data is of central importance in evolutionary biology and infectious disease research. We introduce a flexible Bayesian phylogenetic inference framework that accommodates changing evolutionary rates over time by modeling sequence character substitution processes as inhomogeneous continuous-time Markov chains (ICTMCs) acting along the unknown phylogeny, where the rate remains as an unknown, positive and integrable function of time. The integral of the rate function appears in the finite-time transition probabilities of the ICTMCs that must be efficiently computed for all branches of the phylogeny to evaluate the observed data likelihood. Circumventing computational challenges that arise from a fully nonparametric function, we successfully parameterize the rate function as piecewise constant with a large number of epochs that we call the polyepoch clock model. This makes the transition probability computation relatively inexpensive and continues to flexibly capture rate change over time. We employ a Gaussian Markov random field prior to achieve temporal smoothing of the estimated rate function. Hamiltonian Monte Carlo sampling enabled by scalable gradient evaluation under this model makes our framework computationally efficient. We assess the performance of the polyepoch clock model in recovering the true timescales and rates through simulations under two different evolutionary scenarios. We then apply the polyepoch clock model to examine the rates of West Nile virus, Dengue virus and influenza A/H3N2 evolution, and estimate the time-varying rate of SARS-CoV-2 spread in Europe in 2020.
- Paper ID: 2510.11982
- Title: Inhomogeneous continuous-time Markov chains to infer flexible time-varying evolutionary rates
- Authors: Pratyusa Datta (UCLA), Philippe Lemey (KU Leuven), Marc A. Suchard (UCLA)
- Classification: stat.ME (Statistics - Methodology), q-bio.PE (Quantitative Biology - Populations and Evolution)
- Publication Date: October 13, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.11982
This paper proposes a flexible Bayesian phylogenetic inference framework that accommodates time-varying evolutionary rates by modeling sequence character substitution processes as inhomogeneous continuous-time Markov chains (ICTMCs). The method parameterizes the evolutionary rate as a piecewise constant function over a large number of epochs (the polyepoch clock model), which keeps transition probability calculations relatively inexpensive while still capturing rate variation flexibly. A Gaussian Markov random field prior smooths the estimated rate function over time, and Hamiltonian Monte Carlo sampling with scalable gradient evaluation makes inference computationally efficient.
The central problem in phylogenetics is reconstructing evolutionary history from molecular sequence data and estimating evolutionary rates. Traditional methods assume evolutionary rates remain constant over time, but this assumption does not hold for rapidly evolving organisms such as viruses.
- Evolutionary biology relevance: Accurate estimation of time-varying evolutionary rates is crucial for understanding mechanisms of biological diversification
- Infectious disease research value: Viral genome sequences accumulate significant genetic changes over short timescales, requiring real-time analytical capabilities
- Timescale dependence: Research demonstrates that viral evolutionary rate estimates are heavily dependent on the sampling time framework
- Homogeneous CTMC assumption: Traditional methods assume substitution processes on branches follow homogeneous continuous-time Markov chains
- Fixed rate variation patterns: Existing relaxed clock models make fixed assumptions about rate variation patterns
- Computational complexity: Fully nonparametric functional approaches face computational challenges
Develop a flexible framework capable of directly modeling evolutionary rates as time functions, overcoming the limitations of homogeneous CTMC assumptions, and providing more accurate evolutionary rate estimates for rapidly evolving viruses and similar organisms.
- Theoretical innovation: First systematic introduction of inhomogeneous continuous-time Markov chains (ICTMCs) to phylogenetic inference
- Methodological breakthrough: Proposes the polyepoch clock model, which parameterizes the rate function as piecewise constant across a large number of epochs
- Computational optimization: Develops linear-time-complexity gradient evaluation algorithm combined with HMC for efficient sampling
- Prior design: Employs appropriate Gaussian Markov random field priors to ensure propriety of posterior distributions
- Empirical validation: Validates method effectiveness on multiple viral datasets, including SARS-CoV-2 transmission analysis
Input: N aligned molecular sequences with sampling time information
Output: Phylogenetic tree, time-varying evolutionary rate trajectory, divergence time estimates
Constraints: Rate function must be positive and integrable
For an inhomogeneous CTMC, the infinitesimal generator matrix is a function of time: $Q(t) = f(t)\,Q$, where:
- Q: Time-independent base infinitesimal generator matrix
- f(t): Unknown positive integrable rate function
Finite-time transition probability matrix:
$$P(t_0, t) = \exp\left[\int_{t_0}^{t} f(\tau)\,d\tau \cdot Q\right]$$
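To make the formula concrete, here is a minimal Python sketch that evaluates $\exp(sQ)$, where $s$ is the integrated rate over the interval. The Jukes-Cantor generator, the two-step example rate, and the helper names (`mat_exp`, `transition_matrix`) are illustrative assumptions, not taken from the paper or from BEAST X. The matrix exponential uses a plain scaling-and-squaring Taylor series, chosen only to keep the example dependency-free.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    # Halve the matrix until its norm is below 1, so the series converges fast.
    squarings = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    scale = 2.0 ** squarings
    scaled = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling: exp(A) = exp(A / 2^k)^(2^k).
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_matrix(q, integrated_rate):
    """P(t0, t) = exp(s * Q), with s the integral of f over [t0, t]."""
    return mat_exp([[integrated_rate * x for x in row] for row in q])

# Assumed base generator: Jukes-Cantor, one expected substitution per unit of s.
JC69 = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

# Assumed example rate: f = 0.2 on [0, 1) and f = 0.5 on [1, 3).
s = 0.2 * 1.0 + 0.5 * 2.0
P = transition_matrix(JC69, s)
```

Only the integral $s$ enters the computation, so any two rate functions with the same integral over a branch give the same transition matrix for that branch.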
Parameterizes rate function as piecewise constant:
$$f(t) = \theta_m, \quad w_m \le t < w_{m-1}, \quad m = 1, \dots, M$$
where $w_M < \cdots < w_1$ are time grid points and $\theta = (\theta_1, \dots, \theta_{M+1})$ is the rate parameter vector.
For a branch connecting node i to pa(i), the expected number of substitutions is:
$$b_i = \theta_{q+1}\,(w_q - t_{\mathrm{pa}(i)}) + \sum_{m=p}^{q-1} \theta_{m+1}\,(w_m - w_{m+1}) + \theta_p\,(t_i - w_p)$$
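The branch-length formula is the integral of the piecewise constant rate between the parent time and the child time. The sketch below computes it by summing, over epochs, the rate times the overlap between the epoch and the branch. It assumes that the first epoch extends to $+\infty$ and the last to $-\infty$ (that is, $w_0 = +\infty$ and $w_{M+1} = -\infty$), which the summary does not state; the function name and the numbers are hypothetical.

```python
import math

def expected_substitutions(t_parent, t_child, grid, theta):
    """Integrate a piecewise constant rate over the branch [t_parent, t_child].

    grid  : grid points in decreasing order, w_1 > w_2 > ... > w_M
    theta : M + 1 rates; theta[m - 1] applies on [w_m, w_{m-1})
            (assumed convention: w_0 = +inf and w_{M+1} = -inf)
    """
    if len(theta) != len(grid) + 1:
        raise ValueError("need exactly one more rate than grid points")
    if t_parent > t_child:
        raise ValueError("the parent must not be later than the child")
    edges = [math.inf] + list(grid) + [-math.inf]  # edges[m] == w_m
    total = 0.0
    for m in range(1, len(theta) + 1):
        lower = max(edges[m], t_parent)
        upper = min(edges[m - 1], t_child)
        if upper > lower:  # the branch overlaps epoch m
            total += theta[m - 1] * (upper - lower)
    return total

# Hypothetical example: three grid points, four epochs.
grid = [3.0, 2.0, 1.0]
theta = [0.4, 0.3, 0.2, 0.1]
b = expected_substitutions(0.5, 3.5, grid, theta)
```

In this example the child lies in epoch $p = 1$ and the parent in epoch $q + 1 = 4$, so the three terms of the formula contribute $0.1 \times 0.5$, $0.3 \times 1 + 0.2 \times 1$, and $0.4 \times 0.5$, for a total of about 0.75.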
Prior Design:
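- Gaussian Markov random field prior on $\zeta_m = \log \theta_m$
- First-order differences: $\zeta_{m+1} - \zeta_m \mid \tau \sim N(0, d_m/\tau)$
- Proper prior: $P(\zeta \mid \tau) \propto \tau^{M/2} \exp\left[-\frac{\tau}{2}\, \zeta' (D_w - \rho W)\, \zeta\right]$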
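A small sketch can show why $\rho < 1$ matters for the proper prior above. It assumes the usual conditional autoregressive reading of the notation, in which $W$ holds weights linking neighbouring epochs and $D_w$ is the diagonal matrix of its row sums. The summary does not define either matrix, so this structure, the unit default weights, and the function names are assumptions.

```python
import math

def car_quadratic_form(zeta, rho, weights=None):
    """zeta' (D_w - rho * W) zeta for a chain in which only adjacent epochs are linked.

    weights[i] is the assumed link weight between epochs i and i + 1 (default 1).
    """
    n = len(zeta)
    if weights is None:
        weights = [1.0] * (n - 1)
    degree = [0.0] * n  # diagonal of D_w: sum of the weights touching each epoch
    cross = 0.0         # sum over neighbouring pairs of w * zeta_i * zeta_{i+1}
    for i, w in enumerate(weights):
        degree[i] += w
        degree[i + 1] += w
        cross += w * zeta[i] * zeta[i + 1]
    return sum(d * z * z for d, z in zip(degree, zeta)) - 2.0 * rho * cross

def log_prior_kernel(zeta, tau, rho, weights=None):
    """Unnormalized log density, following the formula with M = len(zeta) - 1."""
    m = len(zeta) - 1
    return 0.5 * m * math.log(tau) - 0.5 * tau * car_quadratic_form(zeta, rho, weights)

flat = [1.0, 1.0, 1.0]
improper = car_quadratic_form(flat, rho=1.0)  # no penalty on the overall level
proper = car_quadratic_form(flat, rho=0.5)    # the overall level is now penalized
```

With $\rho = 1$ and unit weights the quadratic form reduces to the sum of squared first differences, so shifting every $\zeta_m$ by the same constant costs nothing and the density cannot be normalized. Any $\rho < 1$ adds a penalty on the overall level, which is what makes the prior proper.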
Posterior Sampling: Uses Hamiltonian Monte Carlo with gradient computation via chain rule:
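$$\frac{\partial \log P(\theta, \tau, \rho, Q, \alpha, F \mid Y)}{\partial \theta_m} = \sum_{i=1}^{2N-2} \frac{\partial \log P}{\partial b_i}\,\frac{\partial b_i}{\partial \theta_m}$$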
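The second factor in the chain rule has a simple form: because each branch length is linear in the rates, $\partial b_i / \partial \theta_m$ is the amount of time branch $i$ spends in epoch $m$. The sketch below accumulates the sum under that observation. The branch-length gradients $\partial \log P / \partial b_i$ are taken as given inputs here (the numbers are made up), and the epoch convention $w_0 = +\infty$, $w_{M+1} = -\infty$ is an assumption.

```python
import math

def epoch_overlaps(t_parent, t_child, grid):
    """d b_i / d theta_m for m = 1..M+1: time the branch spends in each epoch."""
    edges = [math.inf] + list(grid) + [-math.inf]  # edges[m] == w_m
    return [max(0.0, min(edges[m - 1], t_child) - max(edges[m], t_parent))
            for m in range(1, len(edges))]

def gradient_wrt_rates(branches, branch_gradients, grid):
    """Accumulate sum_i (d log P / d b_i) * (d b_i / d theta_m) over all branches."""
    grad = [0.0] * (len(grid) + 1)
    for (t_parent, t_child), g in zip(branches, branch_gradients):
        for m, overlap in enumerate(epoch_overlaps(t_parent, t_child, grid)):
            grad[m] += g * overlap
    return grad

# Hypothetical inputs: two branches as (parent time, child time) pairs and
# made-up values for d log P / d b_i.
grid = [3.0, 2.0, 1.0]
branches = [(0.5, 3.5), (1.5, 2.5)]
branch_gradients = [2.0, -1.0]
grad = gradient_wrt_rates(branches, branch_gradients, grid)
```

This naive double loop touches every branch-epoch pair, which is consistent with an $NM$ term in the cost once the branch-length gradients are available; it is not a reproduction of the paper's algorithm.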
- Propriety guarantee: Ensures propriety of GMRF prior by introducing parameter ρ<1
- Gradient optimization: Develops gradient computation with $O(NCS^2 + NM)$ complexity, significantly better than the traditional $O(N^2 C S^2)$ approach
- Flexible grid design: Supports equally-spaced or adaptive grid point configurations
- Multi-scale modeling: Handles different timescales from weeks to centuries
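Cheap gradients matter because Hamiltonian Monte Carlo, the posterior sampler named above, evaluates the gradient of the log posterior at every leapfrog step. The following generic leapfrog integrator is a sketch of that mechanism only. It is not the BEAST X implementation, and the standard normal target is a stand-in chosen so the example stays self-contained.

```python
def leapfrog(position, momentum, grad_log_density, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog scheme (unit mass matrix)."""
    q = list(position)
    p = list(momentum)
    g = grad_log_density(q)
    for _ in range(n_steps):
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]  # half step in momentum
        q = [qi + step_size * pi for qi, pi in zip(q, p)]        # full step in position
        g = grad_log_density(q)                                  # one gradient call per step
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]  # second half step
    return q, p

def hamiltonian(q, p, log_density):
    """Potential energy (negative log density) plus kinetic energy."""
    return -log_density(q) + 0.5 * sum(pi * pi for pi in p)

# Stand-in target: standard normal, log density -|q|^2 / 2 up to a constant.
def log_density(q):
    return -0.5 * sum(x * x for x in q)

def grad_log_density(q):
    return [-x for x in q]

q0, p0 = [1.0, 0.0], [0.0, 1.0]
q1, p1 = leapfrog(q0, p0, grad_log_density, step_size=0.1, n_steps=10)
energy_error = hamiltonian(q1, p1, log_density) - hamiltonian(q0, p0, log_density)
```

A full sampler would draw a fresh momentum before each trajectory and accept the proposal with probability $\min(1, e^{-\text{energy\_error}})$. Each trajectory costs `n_steps` gradient evaluations, so the cost of one gradient sets the cost of sampling.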
- Simulated Data:
- Strict clock model simulation
- Log-linear clock model simulation ($f(t) = e^{-4.5 - 0.05t}$)
- Real Viral Datasets:
- West Nile Virus: 104 complete genomes (1999-2007)
- Dengue Virus Type 3: 352 sequences (1972-2010)
- Seasonal Influenza A/H3N2: 402 sequences (1968-2010)
- SARS-CoV-2: 3,959 genomes (2020 European data)
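For the log-linear simulation scenario listed above, the integrated rate has a closed form, which makes it easy to check how well a piecewise constant function with many epochs can track a smooth rate. The sketch below compares the exact integral of $f(t) = e^{-4.5 - 0.05t}$ with a piecewise constant approximation that uses the midpoint value in each epoch. The interval $[0, 50]$, the epoch count, and the midpoint rule are illustrative assumptions; in the actual model the epoch rates are estimated from data, not read off a known curve.

```python
import math

def log_linear_rate(t, intercept=-4.5, slope=-0.05):
    """f(t) = exp(intercept + slope * t), the simulation's rate function."""
    return math.exp(intercept + slope * t)

def integrated_log_linear_rate(t0, t1, intercept=-4.5, slope=-0.05):
    """Exact integral of f over [t0, t1]."""
    return (math.exp(intercept + slope * t1) - math.exp(intercept + slope * t0)) / slope

def piecewise_constant_integral(t0, t1, n_epochs, rate=log_linear_rate):
    """Integral of a piecewise constant stand-in that takes the midpoint rate per epoch."""
    width = (t1 - t0) / n_epochs
    return sum(rate(t0 + (k + 0.5) * width) * width for k in range(n_epochs))

exact = integrated_log_linear_rate(0.0, 50.0)
approx = piecewise_constant_integral(0.0, 50.0, n_epochs=60)
relative_error = abs(approx - exact) / exact
```

With 60 equal-width epochs the relative error of the integrated rate is well below 0.1% in this setup, which illustrates why a fine grid loses little flexibility relative to a smooth rate function.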
- Posterior median and 95% Bayesian credible intervals of evolutionary rate trajectories
- Accuracy of time to most recent common ancestor (tMRCA) estimates
- Log marginal likelihood (model comparison)
- Effective sample size (ESS)
- Strict clock model
- Random local clock model
- Log-linear clock model
- BEAST X software package implementation
- MCMC iterations: 3-40 million
- Number of grid points: 60-360 time periods
- GMRF precision prior: Gamma(0.001, 0.001)
- Strict clock scenario: The polyepoch model accurately recovers the constant rate and yields precise tMRCA estimates
- Log-linear scenario: Accurately recovers true rate trajectories in data-rich regions with slight overestimation at root
West Nile Virus:
- Relatively constant rate trajectory ($\approx 5 \times 10^{-4}$ subst./site/yr)
- tMRCA: 1998 [1997, 1999]
- Strict clock model fits better (log marginal likelihood difference ≈27)
Dengue Virus:
- Strong time-varying pattern: 10-fold rate decrease 1995-2000, 10-fold increase 2003-2009
- Polyepoch model outperforms random local clock (log marginal likelihood improvement ≈220)
- tMRCA: 1972 [1963, 1973]
Seasonal Influenza A/H3N2:
- Pronounced seasonal pattern: peak December-February
- Increased peak heights post-2001
- Posterior ρ = 0.26 [0.07, 0.58], avoiding over-smoothing
SARS-CoV-2 European Transmission:
- 90% reduction in spatial spread rate during March 2020 lockdown
- 9-fold rate increase after summer reopening
- Negative correlation with effective population size
- Grid density impact: More periods provide higher temporal resolution
- Prior sensitivity: GMRF precision prior selection has limited impact on results
- Propriety parameter ρ: Critical for detecting seasonal patterns
- Timescale dependence confirmation: Multiple viruses show significant time-varying rate patterns
- Epidemiological associations: Rate changes highly consistent with real-world intervention measures
- Computational efficiency: Gradient optimization enables large-scale data analysis
- Relaxed clock models: Random effects, local clocks, etc.
- Time-dependent models: Power-law decay, change-point models
- Nonparametric methods: Gaussian processes, spline functions
- Theoretical rigor: Solid mathematical foundation based on ICTMC
- Computational feasibility: Avoids computational difficulties of Gaussian process integration
- Flexibility: Handles arbitrary complex rate variation patterns
- Scalability: Linear time complexity supports large-scale data
- Method effectiveness: The polyepoch clock model successfully captures time-varying evolutionary rates
- Biological significance: Reveals complex temporal dynamics of viral evolutionary rates
- Practical value: Provides real-time analytical tools for infectious disease surveillance
- Root uncertainty: Lack of calibration points leads to large uncertainty in root rate estimates
- Computational complexity: Despite optimization, still requires substantial MCMC iterations
- Grid selection: Requires prior knowledge to guide grid point configuration
- Model selection: Lacks automatic method for determining optimal number of periods
- Bivariate CAR models: Joint modeling of rates and effective population size
- Adaptive grids: Develop data-driven grid selection methods
- Multi-locus extension: Handle heterogeneity in whole-genome data
- Real-time inference: Develop online update algorithms
- Theoretical innovation: First systematic introduction of ICTMC to phylogenetics with solid theoretical foundation
- Methodology: Piecewise constant parameterization balances flexibility and computational feasibility
- Computational optimization: Linear-time gradient algorithm is important technical contribution
- Comprehensive validation: Thorough verification across simulations and multiple real datasets
- Biological insights: Reveals important temporal dynamics characteristics of viral evolution
- Prior sensitivity: GMRF prior propriety requires careful tuning of ρ parameter
- Model complexity: High-dimensional parameter space may cause convergence issues
- Interpretation challenges: Biological interpretation of complex time-varying patterns requires further investigation
- Computational resources: Large-scale data analysis still requires substantial computational resources
- Methodological contribution: Provides new theoretical framework for phylogenetic clock models
- Software implementation: BEAST X integration ensures broad applicability
- Interdisciplinary value: Successful application of statistical methods to biological problems
- Real-time monitoring: Provides important tools for infectious disease outbreak response
- Rapidly evolving viruses: RNA viruses, influenza viruses, etc.
- Epidemic monitoring: Real-time tracking of pathogen transmission dynamics
- Evolutionary biology: Studying temporal patterns of adaptive evolution
- Paleontology: Analyzing evolutionary rate changes over long timescales
The paper cites foundational literature from phylogenetics, Bayesian inference, and Markov processes, including Felsenstein's pruning algorithm, the relaxed clock models of Drummond et al., and Rue and Held's Gaussian Markov random field theory.
Overall Assessment: This is a high-quality methodological paper with significant contributions in theoretical innovation, technical implementation, and practical application. The polyepoch clock model provides a new tool for phylogenetic inference that is particularly suited to rapidly evolving organisms. The paper features rigorous mathematical derivations, well-designed experiments, and convincing results, and is expected to have an important impact on phylogenetics and infectious disease research.