2025-11-15T16:40:12.095592

Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation

Butyrin, Moulines, Naumov et al.

In this paper, we refine the Berry-Esseen bounds for the multivariate normal approximation of Polyak-Ruppert averaged iterates arising from the linear stochastic approximation (LSA) algorithm with decreasing step size. We consider the normal approximation by the Gaussian distribution with covariance matrix predicted by the Polyak-Juditsky central limit theorem and establish the rate up to order $n^{-1/3}$ in convex distance, where $n$ is the number of samples used in the algorithm. We also prove a non-asymptotic validity of the multiplier bootstrap procedure for approximating the distribution of the rescaled error of the averaged LSA estimator. We establish approximation rates of order up to $1/\sqrt{n}$ for the latter distribution, which significantly improves upon the previous results obtained by Samsonov et al. (2024).

academic

Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation

基本信息

论文ID: 2510.12375
标题: Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation
作者: Bogdan Butyrin, Eric Moulines, Alexey Naumov, Sergey Samsonov, Qi-Man Shao, Zhuo-Song Zhang
分类: stat.ML cs.LG math.OC math.PR math.ST stat.TH
发表时间: October 15, 2025 (arXiv preprint)
论文链接: https://arxiv.org/abs/2510.12375

摘要

本文改进了线性随机逼近(LSA)算法中Polyak-Ruppert平均迭代的多元正态逼近的Berry-Esseen界。研究考虑了由Polyak-Juditsky中心极限定理预测的协方差矩阵的高斯分布正态逼近，在凸距离意义下建立了 $n^{-1/3}$ 阶的收敛率，其中 $n$ 是算法使用的样本数。同时证明了乘子bootstrap程序对平均LSA估计器重标误差分布逼近的非渐近有效性，建立了 $1/\sqrt{n}$ 阶的逼近率，显著改进了Samsonov等人(2024)的先前结果。

研究背景与动机

问题背景

线性随机逼近(LSA)算法是统计学和机器学习中的基础方法，用于逼近线性方程组 $\bar{A}\theta^* = \bar{b}$ 的唯一解，其中 $\bar{A} \in \mathbb{R}^{d \times d}$ 是非退化矩阵。该算法基于观测序列 $\{(A(Z_k), b(Z_k))\}_{k \in \mathbb{N}}$ 进行迭代更新。

核心挑战

分布逼近精度：现有的正态逼近结果收敛率较慢，限制了实际应用中置信区间的构造精度
协方差矩阵估计：渐近协方差矩阵 $\Sigma_\infty$ 在实践中未知，需要有效的估计和逼近方法
Bootstrap有效性：传统bootstrap方法在在线学习算法中面临理论和实践挑战

研究动机

提高LSA算法中心极限定理的收敛率分析
发展更精确的bootstrap置信区间构造方法
为强化学习中的时序差分(TD)学习等应用提供理论支撑

核心贡献

改进的矩界：建立了 $\sqrt{n}(\bar{\theta}_n - \theta^*)$ 的高阶矩界，对于 $p \geq 2$ 得到： $E^{1/p}[\|\bar{\theta}_n - \theta^*\|^p] \lesssim \frac{\sqrt{p}\sqrt{\text{Tr}\Sigma_\infty}}{\sqrt{n}} + \frac{p^{3/2}}{n^{5/6}}$
Berry-Esseen界的改进：在凸距离意义下建立了 $n^{-1/3}$ 阶的正态逼近率，改进了先前 $n^{-1/4}$ 的结果
乘子Bootstrap的非渐近分析：证明了bootstrap程序的有效性，逼近率达到 $n^{-1/2}$ ，显著优于现有结果
技术创新：通过选择合适的协方差矩阵 $\Sigma_n$ 而非 $\Sigma_\infty$ 进行逼近，避免了直接的高斯比较步骤

方法详解

任务定义

考虑LSA迭代： $\theta_k = \theta_{k-1} - \alpha_k(A(Z_k)\theta_{k-1} - b(Z_k)), \quad k \geq 1$ $\bar{\theta}_n = n^{-1}\sum_{k=0}^{n-1}\theta_k, \quad n \geq 1$

目标是分析 $\sqrt{n}(\bar{\theta}_n - \theta^*)$ 的分布逼近性质。

核心技术框架

1. 误差分解技术

将LSA误差分解为瞬态项和波动项： $\theta_k - \theta^* = \tilde{\theta}_k^{(tr)} + \tilde{\theta}_k^{(fl)}$ 其中：

$\tilde{\theta}_k^{(tr)} = \Gamma_{1:k}(\theta_0 - \theta^*)$ （瞬态项）
$\tilde{\theta}_k^{(fl)} = -\sum_{\ell=1}^k \alpha_\ell \Gamma_{\ell+1:k}\varepsilon_\ell$ （波动项）

2. 扰动展开方法

进一步分解波动项： $\tilde{\theta}_k^{(fl)} = J_k^{(0)} + H_k^{(0)}$ 其中 $J_k^{(0)}$ 是线性主导项， $H_k^{(0)}$ 是高阶余项。

3. 随机化浓度不等式

应用Shao-Zhang框架，将统计量表示为： $\sqrt{n}\Sigma_n^{-1/2}(\bar{\theta}_n - \theta^*) = W + D$ 其中 $W$ 是线性统计量， $D$ 是非线性余项。

步长选择策略

采用多项式衰减步长： $\alpha_k = \frac{c_0}{(k + k_0)^\gamma}, \quad \gamma \in (1/2, 1)$

关键发现：最优收敛率在 $\gamma = 2/3$ 时达到。

实验设置

理论验证框架

论文主要是理论分析，通过以下方式验证结果：

假设条件：
- A1: 独立同分布观测序列
- A2: Hurwitz矩阵条件和有界噪声
- A3: 步长条件
- A4-A5: 样本大小和技术条件
应用场景：时序差分(TD)学习算法作为重要特例

评价指标

凸距离： $\rho_{Conv}(\mu, \nu) = \sup_{B \in Conv(\mathbb{R}^d)}|\mu(B) - \nu(B)|$
收敛率：以 $n$ 的幂次表示的逼近精度

实验结果

主要理论结果

定理1：矩界

在适当假设下，对于 $p \geq 2$ ： $E^{1/p}[\|\bar{\theta}_n - \theta^*\|^p] \leq \frac{C_{1,1}\sqrt{p}\sqrt{\text{Tr}\Sigma_n}}{\sqrt{n}} + \Delta^{(fl)}(n,p,\gamma) + \frac{C_{1,5}\|\theta_0 - \theta^*\|}{n}$

定理2：高斯逼近

$\rho_{Conv}(\sqrt{n}(\bar{\theta}_n - \theta^*), \Sigma_n^{1/2}\eta) \leq \frac{C_{2,1}}{\sqrt{n}} + \frac{C_{2,2}}{n^{\gamma/2}} + C_{2,3}\phi_n + \frac{C_{2,4}\|\theta_0 - \theta^*\|}{n}$

定理3：渐近逼近

$\rho_{Conv}(\sqrt{n}(\bar{\theta}_n - \theta^*), \Sigma_\infty^{1/2}\eta) \leq \frac{C_{2,1}}{\sqrt{n}} + \frac{C_{2,2}}{n^{\gamma/2}} + C_{2,3}\phi_n + \frac{C_{2,4}\|\theta_0 - \theta^*\|}{n} + \frac{C_3}{n^{1-\gamma}}$

定理4：Bootstrap逼近

以 $1-1/n$ 的概率： $\sup_{B \in Conv(\mathbb{R}^d)}|P^b(\sqrt{n}(\bar{\theta}_n^b - \bar{\theta}_n) \in B) - P(\sqrt{n}(\bar{\theta}_n - \theta^*) \in B)| \leq \frac{C_4\|\theta_0 - \theta^*\| + \Delta_{4,1}}{\sqrt{n}} + \frac{\Delta_{4,2}}{n^{\gamma/2}} + \Delta_{4,3}\phi_n + \frac{\Delta_{4,4}}{n}$

收敛率比较

方法	收敛率	参考文献
本文(Berry-Esseen)	$n^{-1/3}$	-
Samsonov et al. (2024)	$n^{-1/4}$	41
本文(Bootstrap)	$n^{-1/2}$	-
Wu et al. (2024) TD	$n^{-1/3}$	54

结论与讨论

主要结论

最优收敛率：在凸距离下达到 $n^{-1/3}$ 的Berry-Esseen界，这在LSA设置下是最优的
Bootstrap改进：bootstrap逼近率达到 $n^{-1/2}$ ，显著优于现有 $n^{-1/4}$ 的结果
技术突破：通过选择 $\Sigma_n$ 而非 $\Sigma_\infty$ 作为逼近协方差，避免了技术障碍

局限性

独立性假设：仅考虑i.i.d.噪声，Markov噪声情况留作未来工作
步长约束：需要特定形式的多项式衰减步长
维数依赖：常数中包含维数依赖项，高维情况下可能较大

未来方向

Markov扩展：将结果推广到Markov噪声设置
下界匹配：在 $\gamma \in (1/2, 2/3)$ 区间建立匹配的下界
实际应用：在具体的强化学习和优化问题中验证理论预测

深度评价

优点

理论深度：提供了LSA领域最精确的收敛率分析，技术难度高
方法创新：巧妙地选择逼近协方差矩阵，突破了现有方法的局限
结果最优：多个结果达到或接近理论最优界
证明技巧：结合了多种先进的概率论工具，技术路线清晰

不足

实验验证：作为纯理论论文，缺少数值实验验证理论预测
实用性：常数依赖复杂，实际应用中的表现需要进一步研究
适用范围：假设条件较强，实际问题中的适用性有限

影响力

学术价值：为随机逼近理论提供了重要进展，预期会被广泛引用
应用前景：为强化学习、在线优化等领域提供了理论基础
方法论贡献：技术方法可能启发其他非线性统计问题的研究

适用场景

强化学习中的策略评估(TD学习)
在线凸优化算法分析
需要精确置信区间的统计推断问题
大规模机器学习中的理论分析

参考文献

Polyak, B. T., & Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM journal on control and optimization.
Shao, Q. M., & Zhang, Z. S. (2022). Berry–Esseen bounds for multivariate nonlinear statistics with applications to M-estimators and stochastic gradient descent algorithms. Bernoulli.
Fang, Y., Xu, J., & Yang, L. (2018). Online bootstrap confidence intervals for the stochastic gradient descent estimator. Journal of Machine Learning Research.
Durmus, A., Moulines, E., Naumov, A., & Samsonov, S. (2025). Finite-time high-probability bounds for Polyak–Ruppert averaged iterates of linear stochastic approximation. Mathematics of Operations Research.

Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation

Improved Central Limit Theorem and Bootstrap Approximations for Linear Stochastic Approximation

基本信息

摘要

研究背景与动机

问题背景

核心挑战

研究动机

核心贡献

方法详解

任务定义

核心技术框架

1. 误差分解技术

2. 扰动展开方法

3. 随机化浓度不等式

步长选择策略

实验设置

理论验证框架

评价指标

实验结果

主要理论结果

定理1：矩界

定理2：高斯逼近

定理3：渐近逼近

定理4：Bootstrap逼近

收敛率比较

相关工作

LSA理论发展

Bootstrap方法

技术工具

结论与讨论

主要结论

局限性

未来方向

深度评价

优点

不足

影响力

适用场景

参考文献