Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo

Hofstadler, Latuszynski, Roberts et al.

We consider adaptive increasingly rare Markov chain Monte Carlo (MCMC) algorithms, which are adaptive MCMC methods, where the adaptation concerning the "past" happens less and less frequently over time. Under a contraction assumption with respect to a Wasserstein-like function we deduce upper bounds of the convergence rate of Monte Carlo sums taking a renormalisation factor into account that is "almost" the one that appears in a law of the iterated logarithm. We demonstrate the applicability of our results by considering different settings, among which are those of simultaneous geometric and uniform ergodicity. All proofs are carried out on an augmented state space, including the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, some technical assumptions, like diminishing adaptation, are not needed.

Basic Information

Paper ID: 2402.12122
Title: Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo
Authors: Julian Hofstadler (University of Bath), Krzysztof Latuszyński (University of Warwick), Gareth O. Roberts (University of Warwick), Daniel Rudolf (University of Passau)
Classification: math.NA cs.NA math.PR math.ST stat.TH
Publication Date: October 14, 2025 (arXiv version)
Paper Link: https://arxiv.org/abs/2402.12122

Abstract

This paper investigates adaptive increasingly rare Markov chain Monte Carlo (AIR MCMC) algorithms, a class of adaptive MCMC methods in which adaptation based on the "past" happens less and less frequently over time. Under a contraction assumption with respect to a Wasserstein-like function, the authors derive upper bounds on the convergence rate of Monte Carlo sums, with a renormalisation factor that is "almost" the one appearing in a law of the iterated logarithm. The paper demonstrates the applicability of the results in several settings, including simultaneous geometric ergodicity and uniform ergodicity. All proofs are carried out on an augmented state space, which includes the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, certain technical assumptions such as diminishing adaptation are not required.

Research Background and Motivation

Problem Definition

A ubiquitous challenge in computational statistics is approximating expectations: $\nu(f) = \int_X f(x)\nu(dx)$ where $\nu$ is the target distribution and $f: X \to \mathbb{R}$ is an integrable function of interest.

Research Motivation

Difficulty of Direct Sampling: When direct sampling from $\nu$ is impossible or computationally infeasible (e.g., when the density contains an unknown normalization constant), alternative methods are necessary.
Challenges in Adaptive MCMC: Traditional adaptive MCMC methods update single-step transition mechanisms by considering the entire history, resulting in non-Markovian processes that complicate mathematical analysis.
Need for Simplified Technical Assumptions: Existing adaptive MCMC theory typically requires technical assumptions (such as diminishing adaptation), limiting the applicability of methods.

Limitations of Existing Approaches

The non-Markovian nature of adaptive MCMC leads to complex proof techniques
Strict technical conditions are needed to guarantee convergence
Lack of results on the convergence of renormalised Monte Carlo sums

Core Contributions

Proposed AIR MCMC Theoretical Framework: Established almost sure convergence rate theory for AIR algorithms under Wasserstein contraction assumptions.
Improved Convergence Rates: Obtained almost sure convergence with normalising factors of the form $r(n) = \sqrt{n}(\log n)^{1/2+\varepsilon}$ or $r(n) = n^{1/2+\varepsilon}$, the former being close to the normalisation in the law of the iterated logarithm.
Simplified Technical Assumptions: Eliminated the need for technical assumptions such as diminishing adaptation, expanding the applicability of the method.
Augmented State Space Analysis: Conducted analysis on an augmented state space $Y = X \times \Phi$, encompassing the classical non-augmented setting as a special case.
Broad Applicability: Results apply to multiple settings including simultaneous geometric ergodicity and uniform ergodicity.

Methodology Details

AIR MCMC Algorithm Definition

Given a parameter $\beta > 0$, set $k_j = \lceil j^\beta \rceil$ and adapt only at the times $T_m = \sum_{j=1}^m k_j$, so that $k_m$ steps pass between the $(m-1)$-th and the $m$-th adaptation.

Key observation: For any $\beta > 0$, there exist constants $0 < c_\beta \leq C_\beta$ such that: $c_\beta m^{1+\beta} \leq T_m \leq C_\beta m^{1+\beta}$

Since the gaps $k_j$ grow with $j$, adaptation becomes increasingly rare over time.
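The schedule above can be sketched in a few lines. This is an illustrative helper, not code from the paper; the function name `air_adaptation_times` and the choice $\beta = 1$ are assumptions made for the example. For $\beta = 1$ one has $T_m = m(m+1)/2$, so the ratio $T_m / m^{1+\beta}$ stays between $1/2$ and $1$.

```python
import math


def air_adaptation_times(beta: float, num_adaptations: int) -> list[int]:
    """Return [T_1, ..., T_m] with T_m = sum_{j<=m} ceil(j**beta)."""
    times, total = [], 0
    for j in range(1, num_adaptations + 1):
        total += math.ceil(j ** beta)  # k_j, the gap before the j-th adaptation
        times.append(total)
    return times


beta = 1.0  # illustrative choice
times = air_adaptation_times(beta, 1000)

# Ratios T_m / m^(1+beta); the key observation says they stay in [c_beta, C_beta].
ratios = [t / m ** (1 + beta) for m, t in enumerate(times, start=1)]

# Gaps between consecutive adaptation times; these are k_2, k_3, ...
gaps = [b - a for a, b in zip(times, times[1:])]
```

Here `gaps` is non-decreasing, which is the sense in which adaptation becomes rarer as the chain runs.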

Core Technical Framework

1. Wasserstein-like Functions

For a distance-like function $d: Y \times Y \to \mathbb{R}_+$, define: $W(\mu_1, \mu_2) := \inf_{\xi \in C(\mu_1,\mu_2)} \int_{Y^2} d(x,y)\xi(dx,dy)$, where $C(\mu_1,\mu_2)$ denotes the set of couplings of $\mu_1$ and $\mu_2$.

2. Main Assumptions (Assumption 3.1)

For each $\gamma \in I$ , assume:

$\pi_\gamma$ is the invariant distribution of $P_\gamma$
$\tau(P_\gamma) \leq M$ and $\tau(P_\gamma^{k_0}) \leq \tau$

where $M \in [1,\infty)$, $\tau \in [0,1)$ and $k_0 \in \mathbb{N}$ are independent of $\gamma$.

3. Poisson Equation Solution

For a function $h: Y \to \mathbb{R}$ and $\gamma \in I$, a solution of the Poisson equation $u_\gamma - P_\gamma u_\gamma = h - \pi_\gamma(h)$ is: $u_\gamma(y) = \sum_{\ell=0}^{\infty}(P_\gamma^\ell h(y) - \pi_\gamma(h))$
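As a sanity check of the series formula, the following sketch computes a truncated version of the sum for a small finite-state chain and verifies the Poisson equation $u - Pu = h - \pi(h)$ numerically. The 3-state matrix `P`, the function `h`, and all helper names are made up for illustration; a finite state space is an assumption of the sketch and is far simpler than the setting of the paper.

```python
def mat_vec(P, v):
    """Apply the kernel to a function: (P v)(i) = sum_j P[i][j] * v[j]."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def stationary(P, iters=2000):
    """Approximate the invariant distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def poisson_solution(P, h, terms=2000):
    """Truncated series u = sum_{l < terms} (P^l h - pi(h))."""
    pi = stationary(P)
    pi_h = sum(w * x for w, x in zip(pi, h))
    term = [x - pi_h for x in h]      # l = 0 summand
    u = [0.0] * len(h)
    for _ in range(terms):
        u = [a + b for a, b in zip(u, term)]
        term = mat_vec(P, term)       # next summand, since P maps constants to themselves
    return u, pi_h


# Hypothetical 3-state transition matrix (rows sum to one) and test function.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
h = [1.0, -2.0, 4.0]

u, pi_h = poisson_solution(P, h)
Pu = mat_vec(P, u)
# Residual of the Poisson equation u - P u = h - pi(h); should be ~0.
residual = [ui - pui - (hi - pi_h) for ui, pui, hi in zip(u, Pu, h)]
```

The truncation is harmless here because the summands decay geometrically for this strictly positive matrix.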

Martingale Approximation Technique

Using the Poisson equation, the Monte Carlo sum is decomposed as: $\sum_{j=1}^n (h(Y_j) - \pi_{\Gamma_{j-1}}(h)) = M_n + R_m + \text{bounded terms}$

where:

$M_n$: martingale term
$R_m$: remainder term, which simplifies significantly for AIR algorithms
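To see where the martingale and the bounded terms come from, it may help to write out the decomposition for a single fixed kernel $P$ with invariant distribution $\pi$ and Poisson solution $u$, so that $u - Pu = h - \pi(h)$. This is the classical non-adaptive identity, given as an illustration rather than the paper's adaptive statement; the chain $(Y_j)_{j \geq 0}$ started at $Y_0$ is an assumption of the sketch.

```latex
\begin{align*}
\sum_{j=1}^n \bigl(h(Y_j) - \pi(h)\bigr)
  &= \sum_{j=1}^n \bigl(u(Y_j) - Pu(Y_j)\bigr) \\
  &= \underbrace{\sum_{j=1}^n \bigl(u(Y_j) - Pu(Y_{j-1})\bigr)}_{=:\,M_n}
     + Pu(Y_0) - Pu(Y_n).
\end{align*}
```

Each summand of $M_n$ has conditional mean zero given the past, and the two boundary terms are bounded whenever $Pu$ is. When the kernel changes only at adaptation times, the telescoping presumably breaks only at those times, which would be consistent with a remainder indexed by the number $m$ of adaptations.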

Main Theoretical Results

Theorem 3.5 (Case $\beta \geq 1$)

Under bounded eccentricity assumptions, for any $\varepsilon > 0$ : $\lim_{n \to \infty} \frac{1}{\sqrt{n}(\log n)^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}$

Theorem 3.6 (Case $\beta \in (0,1)$)

For $\varepsilon > \frac{1}{1+\beta} - \frac{1}{2}$ : $\lim_{n \to \infty} \frac{1}{n^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}$

Theorem 3.11 (Lyapunov Condition)

Under assumptions of Lyapunov function existence, for $\varepsilon > \max\{0, \frac{1}{1+\beta} + \frac{1}{p} - \frac{1}{2}\}$ : $\lim_{n \to \infty} \frac{1}{n^{1/2+\varepsilon}} \sum_{j=1}^n (f(X_j) - \nu(f)) = 0 \quad \text{a.s.}$
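The thresholds on $\varepsilon$ in Theorems 3.5, 3.6 and 3.11 are simple arithmetic in $\beta$ (and $p$), which the following sketch evaluates. The function names are hypothetical, and `p` is just the parameter appearing in Theorem 3.11 as quoted above; this summary does not define it further.

```python
def epsilon_threshold(beta: float) -> float:
    """Lower bound on epsilon in Theorems 3.5 / 3.6 (strict inequality).

    For beta >= 1 any epsilon > 0 works (with normaliser
    sqrt(n) * (log n)^(1/2 + epsilon)); for beta in (0, 1) the bound is
    1/(1 + beta) - 1/2 (with normaliser n^(1/2 + epsilon)).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return 0.0 if beta >= 1 else 1.0 / (1.0 + beta) - 0.5


def epsilon_threshold_lyapunov(beta: float, p: float) -> float:
    """Lower bound on epsilon in Theorem 3.11 (strict inequality)."""
    if beta <= 0 or p <= 0:
        raise ValueError("beta and p must be positive")
    return max(0.0, 1.0 / (1.0 + beta) + 1.0 / p - 0.5)


# Example: beta = 0.5 requires epsilon > 1/6, i.e. a normaliser n^a with a > 2/3.
example = epsilon_threshold(0.5)
```

For $\beta \in (0,1)$ the resulting exponent $1/2 + \varepsilon$ must exceed $1/(1+\beta)$, which is the same order as the remainder bound stated under Technical Innovations.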

Application Examples

1. Uniform Ergodicity Setting

This setting uses the trivial metric $d(y_1, y_2) = \mathbf{1}_{\{y_1 \neq y_2\}}$, for which $W$ coincides with the total variation distance.
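On a finite state space the link between the trivial metric and total variation can be checked directly. The sketch below builds the standard maximal coupling of two probability vectors and confirms that its cost under $d(y_1, y_2) = \mathbf{1}_{\{y_1 \neq y_2\}}$ equals $\frac{1}{2}\sum_i |p_i - q_i|$. The vectors `p`, `q` and the helper names are made up for illustration; the code only exhibits a coupling attaining the value, and that no coupling does better is the usual coupling inequality.

```python
def total_variation(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def maximal_coupling(p, q):
    """Joint distribution xi with marginals p, q and maximal diagonal mass."""
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    mass = 1.0 - sum(overlap)          # mass that cannot be matched
    xi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xi[i][i] = overlap[i]          # keep both coordinates equal where possible
    if mass > 1e-12:
        for i in range(n):
            for j in range(n):
                # Spread the leftover mass as a product of the normalised residuals.
                xi[i][j] += (p[i] - overlap[i]) * (q[j] - overlap[j]) / mass
    return xi


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
xi = maximal_coupling(p, q)

# Cost of the coupling under the trivial metric: total off-diagonal mass.
cost = sum(xi[i][j] for i in range(3) for j in range(3) if i != j)
row_marginal = [sum(row) for row in xi]
col_marginal = [sum(xi[i][j] for i in range(3)) for j in range(3)]
```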

Corollary 4.5: For bounded functions $f$, with $\beta \geq 1$ and $\varepsilon > 0$: $\left|\frac{1}{n}\sum_{j=1}^n (f(X_j(\omega)) - \nu(f))\right| \leq \frac{(\log n)^{1/2+\varepsilon}}{\sqrt{n}} C(\omega)$

2. Geometric Ergodicity Setting

Consider drift-small set conditions (Assumption 4.7), using weighted metric: $d_q(y_1, y_2) = \mathbf{1}_{\{y_1 \neq y_2\}}(V^q(y_1) + V^q(y_2))$

3. Weak Harris Ergodicity

Using distance-like functions: $\tilde{d}_q(y_1, y_2) = \sqrt{d(y_1, y_2)(1 + V^q(y_1) + V^q(y_2))}$

Technical Innovations

1. Simplified Remainder Control

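The key advantage of the AIR algorithm is that the most difficult terms in the remainder $R_m$ cancel out, yielding: $|R_m| \leq \text{constant} \cdot n^{1/(1+\beta)}$. This is the order of the number $m$ of adaptations performed up to time $n$, since $T_m$ grows like $m^{1+\beta}$.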
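The order $n^{1/(1+\beta)}$ can be related to the adaptation schedule by counting how many adaptation times $T_m$ fall at or below $n$. The sketch below does this count directly; the function name `adaptations_up_to` is hypothetical, and the example uses $\beta = 1$, where $T_m = m(m+1)/2$ and hence $m \approx \sqrt{2n}$.

```python
import math


def adaptations_up_to(n: int, beta: float) -> int:
    """Number m of adaptation times T_m = sum_{j<=m} ceil(j**beta) with T_m <= n."""
    m, total = 0, 0
    while True:
        step = math.ceil((m + 1) ** beta)  # k_{m+1}
        if total + step > n:
            return m
        total += step
        m += 1


beta = 1.0  # illustrative choice
n = 10 ** 6
m = adaptations_up_to(n, beta)
# Compare the count with the order n^(1/(1+beta)) appearing in the remainder bound.
ratio = m / n ** (1.0 / (1.0 + beta))
```

For this choice only 1413 of the first million steps are adaptation steps, and `ratio` is close to $\sqrt{2}$.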

2. No Diminishing Adaptation Required

Unlike traditional methods, the assumption $\|\Gamma_n - \Gamma_{n-1}\| \to 0$ is not needed.

3. Augmented State Space Treatment

Working on the augmented space $Y = X \times \Phi$ allows complex cases, such as multimodal target distributions, to be handled within a single framework.

Experimental Verification

The paper is primarily theoretical; its results are illustrated through:

1. Concrete Algorithm Instances

Adaptive random walk Metropolis algorithm
Adaptive stereographic MCMC algorithm
Preconditioned Crank-Nicolson (pCN) algorithm
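To make the first of these instances concrete, here is a minimal one-dimensional sketch of a random walk Metropolis sampler whose proposal scale is adapted only at the times $T_m$. It is not the paper's algorithm or code: the function name `air_rwm`, the rule of setting the scale to 2.38 times the running sample standard deviation (a common heuristic), the clamping of the scale to $[0.1, 10]$, and the standard normal target are all assumptions made for illustration.

```python
import math
import random


def air_rwm(log_density, x0, n_steps, beta=1.0, seed=0):
    """Random walk Metropolis with proposal scale adapted at times T_m only."""
    rng = random.Random(seed)
    x, scale = x0, 1.0
    chain, adapt_times = [], []
    mean, m2 = 0.0, 0.0                     # running mean / sum of squares (Welford)
    j, next_adapt = 1, math.ceil(1 ** beta)  # next_adapt = T_1
    for n in range(1, n_steps + 1):
        proposal = x + scale * rng.gauss(0.0, 1.0)
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal                     # accept
        chain.append(x)
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n == next_adapt:                  # adaptation time T_j
            if n > 1:
                std = math.sqrt(m2 / (n - 1))
                # Heuristic scale, clamped so the parameter stays in a bounded set.
                scale = min(max(2.38 * std, 0.1), 10.0)
            adapt_times.append(n)
            j += 1
            next_adapt += math.ceil(j ** beta)  # T_{j} = T_{j-1} + k_j
    return chain, adapt_times


# Standard normal target (up to an additive constant in the log density).
chain, adapt_times = air_rwm(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
sample_mean = sum(chain) / len(chain)
```

Between adaptation times the chain is an ordinary Metropolis chain with a fixed kernel, which is the structural feature the summary attributes to AIR algorithms.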

2. Numerical Comparison References

The paper cites numerical experiments from CLR18 indicating that the performance of AIR algorithms with $\beta \in [1,2]$ is comparable to that of purely adaptive methods.

Related Work

Classical Adaptive MCMC Theory

Law of Large Numbers: HST01, AR05, AM06, RR07, SV10, FMP11, PHL20
Central Limit Theorem: AM06, SV10
Convergence to Correct Target Measure: RR07, FMP11

Quantitative Ergodicity Results

AA07, AW15: Show $\|P(X_n \in \cdot) - \nu\|_{tv} \leq C/n$
AW15, CLR18: Mean squared error bounds showing $1/n$ order convergence rates

Uniqueness of This Paper's Contributions

Pathwise Convergence Bounds: Unlike existing bounds on expected errors, the results hold almost surely along individual sample paths
Wasserstein Contraction Setting: Extends the traditional uniform/geometric ergodicity framework
Near-Optimal Rates: The normalisation $\sqrt{n}(\log n)^{1/2+\varepsilon}$ differs from the law of the iterated logarithm normalisation $\sqrt{n \log\log n}$ only by logarithmic factors

Conclusions and Discussion

Main Conclusions

AIR MCMC algorithms admit almost sure convergence rates for Monte Carlo sums under Wasserstein contraction assumptions
For $\beta \geq 1$, the normalisation $\sqrt{n}(\log n)^{1/2+\varepsilon}$ is close to that of the law of the iterated logarithm
Technical assumptions are simplified compared to traditional adaptive MCMC theory; in particular, diminishing adaptation is not required

Limitations

Uniformity Requirements: Assumption 3.1 requires all bounds to be uniform in $\gamma$, which is restrictive
Small $\beta$ Regime: When $\beta \in (0,1)$, convergence rates deteriorate, requiring additional assumptions for improvement
Purely Adaptive Algorithm: The purely adaptive case $\beta = 0$ requires further investigation

Future Directions

Weakening Uniformity Assumptions: Assumption 3.1 might be relaxed within stochastic approximation frameworks
Extension to Pure Adaptation: Utilize techniques from SV10 to handle the $\beta = 0$ case
Improvement in Small $\beta$ Regime: Develop techniques to handle $\beta \in (0,1)$ without additional assumptions

In-Depth Evaluation

Strengths

Theoretical Depth: Establishes almost sure convergence theory for AIR MCMC within a Wasserstein contraction framework
Technical Innovation: Cleverly exploits AIR structure to simplify remainder control in martingale approximation
Broad Applicability: Covers uniform, geometric, and weak Harris ergodicity settings
Practical Value: Provides pathwise convergence bounds, which are relevant when only a single simulation run is available

Weaknesses

Restrictive Assumptions: Uniformity assumptions may be difficult to verify in practical applications
Small $\beta$ Treatment: Requires additional Lipschitz and decaying adaptation conditions
Limited Numerical Verification: Primarily theoretical analysis with insufficient numerical experiments

Impact

Theoretical Contribution: Provides solid theoretical foundation for AIR MCMC
Methodological Value: Wasserstein contraction methods may inspire analysis of other algorithms
Practical Prospects: Path convergence bounds have important implications for MCMC diagnostics and stopping criteria

Applicable Scenarios

High-Dimensional Statistical Inference: Suitable for sampling from complex posterior distributions
Multimodal Distributions: Handles multimodality through augmented state space
Computationally Constrained Resources: AIR algorithm reduces adaptation frequency, saving computational costs

References

The paper includes 34 important references covering major developments in adaptive MCMC theory, particularly:

CLR18: Original proposal of AIR algorithm
AM06, SV10: Classical adaptive MCMC theory
HMS11: Theoretical foundations of Wasserstein contraction methods
PHL20: Augmented state space methods