2025-11-16T12:28:12.323029

Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo

Hofstadler, Latuszynski, Roberts et al.
We consider adaptive increasingly rare Markov chain Monte Carlo (MCMC) algorithms, which are adaptive MCMC methods, where the adaptation concerning the "past'' happens less and less frequently over time. Under a contraction assumption with respect to a Wasserstein-like function we deduce upper bounds of the convergence rate of Monte Carlo sums taking a renormalisation factor into account that is "almost'' the one that appears in a law of the iterated logarithm. We demonstrate the applicability of our results by considering different settings, among which are those of simultaneous geometric and uniform ergodicity. All proofs are carried out on an augmented state space, including the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, some technical assumptions, like diminishing adaptation, are not needed.
academic

Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo

Basic Information

  • Paper ID: 2402.12122
  • Title: Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo
  • Authors: Julian Hofstadler (University of Bath), Krzysztof Latuszyński (University of Warwick), Gareth O. Roberts (University of Warwick), Daniel Rudolf (University of Passau)
  • Classification: math.NA cs.NA math.PR math.ST stat.TH
  • Publication Date: October 14, 2025 (arXiv version)
  • Paper Link: https://arxiv.org/abs/2402.12122

Abstract

This paper investigates the adaptive increasingly rare Markov chain Monte Carlo (AIR MCMC) algorithm, a class of adaptive MCMC methods where adaptations to the "past" become increasingly sparse over time. Under contraction assumptions on Wasserstein-like functions, the authors derive convergence rate upper bounds for Monte Carlo summation, accounting for renormalization factors that appear "almost" in the law of iterated logarithm. The paper demonstrates the applicability of results by considering different settings including simultaneous geometric ergodicity and uniform ergodicity. All proofs are conducted on augmented state spaces, with classical non-augmented settings as special cases. Compared to other adaptive MCMC limit theories, certain technical assumptions such as decreasing adaptation are not required.

Research Background and Motivation

Problem Definition

A ubiquitous challenge in computational statistics is approximating expectations: ν(f)=Xf(x)ν(dx)\nu(f) = \int_X f(x)\nu(dx) where ν\nu is the target distribution and f:XRf: X \to \mathbb{R} is an integrable function of interest.

Research Motivation

  1. Difficulty of Direct Sampling: When direct sampling from ν\nu is impossible or computationally infeasible (e.g., when the density contains an unknown normalization constant), alternative methods are necessary.
  2. Challenges in Adaptive MCMC: Traditional adaptive MCMC methods update single-step transition mechanisms by considering the entire history, resulting in non-Markovian processes that complicate mathematical analysis.
  3. Need for Simplified Technical Assumptions: Existing adaptive MCMC theory typically requires technical assumptions (such as decreasing adaptation), limiting the applicability of methods.

Limitations of Existing Approaches

  • The non-Markovian nature of adaptive MCMC leads to complex proof techniques
  • Strict technical conditions are needed to guarantee convergence
  • Lack of results on convergence of renormalized Monte Carlo summation

Core Contributions

  1. Proposed AIR MCMC Theoretical Framework: Established almost sure convergence rate theory for AIR algorithms under Wasserstein contraction assumptions.
  2. Improved Convergence Rates: Obtained convergence rates of the form r(n)=n(logn)1/2+εr(n) = \sqrt{n}(\log n)^{1/2+\varepsilon} or r(n)=n1/2+εr(n) = n^{1/2+\varepsilon}, approaching optimal rates of the law of iterated logarithm.
  3. Simplified Technical Assumptions: Eliminated the need for traditional technical assumptions such as decreasing adaptation, expanding the applicability of the method.
  4. Augmented State Space Analysis: Conducted analysis on augmented state spaces Y=X×ΦY = X \times \Phi, encompassing classical non-augmented settings as special cases.
  5. Broad Applicability: Results apply to multiple settings including simultaneous geometric ergodicity and uniform ergodicity.

Methodology Details

AIR MCMC Algorithm Definition

Given parameter β>0\beta > 0, set kj=jβk_j = \lceil j^\beta \rceil, performing adaptation only at specific time points: Tm=j=1mkjT_m = \sum_{j=1}^m k_j

Key observation: For any β>0\beta > 0, there exist constants cβ,Cβc_\beta, C_\beta such that: cβm1+βTmCβm1+βc_\beta m^{1+\beta} \leq T_m \leq C_\beta m^{1+\beta}

This implies that adaptation frequency decreases over time.

Core Technical Framework

1. Wasserstein-like Functions

For distance-like functions d:Y×YR+d: Y \times Y \to \mathbb{R}_+, define: W(μ1,μ2):=infξC(μ1,μ2)Y2d(x,y)ξ(dx,dy)W(\mu_1, \mu_2) := \inf_{\xi \in C(\mu_1,\mu_2)} \int_{Y^2} d(x,y)\xi(dx,dy)

2. Main Assumptions (Assumption 3.1)

For each γI\gamma \in I, assume:

  • πγ\pi_\gamma is the invariant distribution of PγP_\gamma
  • τ(Pγ)M\tau(P_\gamma) \leq M and τ(Pγk0)τ\tau(P_\gamma^{k_0}) \leq \tau

where M[1,)M \in [1,\infty), τ[0,1)\tau \in [0,1), k0Nk_0 \in \mathbb{N} are independent of γ\gamma.

3. Poisson Equation Solution

For function h:YRh: Y \to \mathbb{R} and γI\gamma \in I, the solution to the Poisson equation is: uγ(y)==0(Pγf(y)πγ(f))u_\gamma(y) = \sum_{\ell=0}^{\infty}(P_\gamma^\ell f(y) - \pi_\gamma(f))

Martingale Approximation Technique

Decompose Monte Carlo summation using the Poisson equation: j=1n(h(Yj)πΓj1(h))=Mn+Rm+bounded terms\sum_{j=1}^n (h(Y_j) - \pi_{\Gamma_{j-1}}(h)) = M_n + R_m + \text{bounded terms}

where:

  • MnM_n: martingale term
  • RmR_m: remainder term, significantly simplified for AIR algorithms

Main Theoretical Results

Theorem 3.5 (Case β1\beta \geq 1)

Under bounded eccentricity assumptions, for any ε>0\varepsilon > 0: limn1n(logn)1/2+εj=1n(f(Xj)ν(f))=0a.s.\lim_{n \to \infty} \frac{1}{\sqrt{n}(\log n)^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}

Theorem 3.6 (Case β(0,1)\beta \in (0,1))

For ε>11+β12\varepsilon > \frac{1}{1+\beta} - \frac{1}{2}: limn1n1/2+εj=1n(f(Xj)ν(f))=0a.s.\lim_{n \to \infty} \frac{1}{n^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}

Theorem 3.11 (Lyapunov Condition)

Under assumptions of Lyapunov function existence, for ε>max{0,11+β+1p12}\varepsilon > \max\{0, \frac{1}{1+\beta} + \frac{1}{p} - \frac{1}{2}\}: limn1n1/2+εj=1n(f(Xj)ν(f))=0a.s.\lim_{n \to \infty} \frac{1}{n^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}

Application Examples

1. Uniform Ergodicity Setting

Using the trivial metric d(y1,y2)=1{y1y2}d(y_1, y_2) = \mathbf{1}_{\{y_1 \neq y_2\}}, where WW corresponds to total variation distance.

Corollary 4.5: For bounded functions ff, under β1\beta \geq 1 and ε>0\varepsilon > 0: 1nj=1n(f(Xj(ω))ν(f))(logn)1/2+εnC(ω)\left|\frac{1}{n}\sum_{j=1}^n (f(X_j(\omega)) - \nu(f))\right| \leq \frac{(\log n)^{1/2+\varepsilon}}{\sqrt{n}} C(\omega)

2. Geometric Ergodicity Setting

Consider drift-small set conditions (Assumption 4.7), using weighted metric: dq(y1,y2)=1{y1y2}(Vq(y1)+Vq(y2))d_q(y_1, y_2) = \mathbf{1}_{\{y_1 \neq y_2\}}(V^q(y_1) + V^q(y_2))

3. Weak Harris Ergodicity

Using distance-like functions: d~q(y1,y2)=d(y1,y2)(1+Vq(y1)+Vq(y2))\tilde{d}_q(y_1, y_2) = \sqrt{d(y_1, y_2)(1 + V^q(y_1) + V^q(y_2))}

Technical Innovations

1. Simplified Remainder Control

The key advantage of the AIR algorithm is that most difficult terms in the remainder RmR_m cancel out, yielding: Rmn1/(1+β)constant|R_m| \leq n^{1/(1+\beta)} \cdot \text{constant}

2. No Decreasing Adaptation Required

Unlike traditional methods, the assumption ΓnΓn10\|\Gamma_n - \Gamma_{n-1}\| \to 0 is not needed.

3. Augmented State Space Treatment

Through the setting Y=X×ΦY = X \times \Phi, uniformly handle complex cases such as multimodal distributions.

Experimental Verification

The paper is primarily theoretical analysis, with results verified through:

1. Concrete Algorithm Instances

  • Adaptive random walk Metropolis algorithm
  • Adaptive stereographic MCMC algorithm
  • Preconditioned Crank-Nicolson (pCN) algorithm

2. Numerical Comparison References

Cites numerical experiments from CLR18, showing that AIR algorithm performance for β[1,2]\beta \in [1,2] is comparable to purely adaptive methods.

Classical Adaptive MCMC Theory

  • Law of Large Numbers: HST01, AR05, AM06, RR07, SV10, FMP11, PHL20
  • Central Limit Theorem: AM06, SV10
  • Convergence to Correct Target Measure: RR07, FMP11

Quantitative Ergodicity Results

  • AA07, AW15: Show P(Xn)νtvC/n\|P(X_n \in \cdot) - \nu\|_{tv} \leq C/n
  • AW15, CLR18: Mean squared error bounds showing 1/n1/n order convergence rates

Uniqueness of This Paper's Contributions

  1. Path Convergence Bounds: Unlike existing expected error bounds, provides almost sure path convergence
  2. Wasserstein Contraction Setting: Extends traditional uniform/geometric ergodicity framework
  3. Near-Optimal Rates: Convergence rates approach theoretical optimality of the law of iterated logarithm

Conclusions and Discussion

Main Conclusions

  1. AIR MCMC algorithm exhibits good almost sure convergence properties under Wasserstein contraction assumptions
  2. Convergence rates approach theoretical optimality, of the form n(logn)1/2+ε\sqrt{n}(\log n)^{1/2+\varepsilon}
  3. Technical assumptions are significantly simplified compared to traditional methods

Limitations

  1. Uniformity Requirements: Assumption 3.1 requires all bounds to be uniform in γ\gamma, which is restrictive
  2. Small β\beta Regime: When β(0,1)\beta \in (0,1), convergence rates deteriorate, requiring additional assumptions for improvement
  3. Purely Adaptive Algorithm: The purely adaptive case β=0\beta = 0 requires further investigation

Future Directions

  1. Weakening Uniformity Assumptions: May relax Assumption 3.1 under stochastic approximation algorithm frameworks
  2. Extension to Pure Adaptation: Utilize techniques from SV10 to handle the β=0\beta = 0 case
  3. Improvement in Small β\beta Regime: Develop techniques to handle β(0,1)\beta \in (0,1) without additional assumptions

In-Depth Evaluation

Strengths

  1. Theoretical Depth: Establishes complete AIR MCMC theory under Wasserstein contraction framework
  2. Technical Innovation: Cleverly exploits AIR structure to simplify remainder control in martingale approximation
  3. Broad Applicability: Covers uniform, geometric, and weak Harris ergodicity settings
  4. Practical Value: Provides path convergence bounds with practical guidance for single simulations

Weaknesses

  1. Restrictive Assumptions: Uniformity assumptions may be difficult to verify in practical applications
  2. Small β\beta Treatment: Requires additional Lipschitz and decaying adaptation conditions
  3. Limited Numerical Verification: Primarily theoretical analysis with insufficient numerical experiments

Impact

  1. Theoretical Contribution: Provides solid theoretical foundation for AIR MCMC
  2. Methodological Value: Wasserstein contraction methods may inspire analysis of other algorithms
  3. Practical Prospects: Path convergence bounds have important implications for MCMC diagnostics and stopping criteria

Applicable Scenarios

  1. High-Dimensional Statistical Inference: Suitable for sampling from complex posterior distributions
  2. Multimodal Distributions: Handles multimodality through augmented state space
  3. Computationally Constrained Resources: AIR algorithm reduces adaptation frequency, saving computational costs

References

The paper includes 34 important references covering major developments in adaptive MCMC theory, particularly:

  • CLR18: Original proposal of AIR algorithm
  • AM06, SV10: Classical adaptive MCMC theory
  • HMS11: Theoretical foundations of Wasserstein contraction methods
  • PHL20: Augmented state space methods