
Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature

Pischke

We define a stochastic variant of the proximal point algorithm in the general setting of nonlinear (separable) Hadamard spaces for approximating zeros of the mean of a stochastically perturbed monotone vector field and prove its convergence under a suitable strong monotonicity assumption, together with a probabilistic independence assumption and a separability assumption on the tangent spaces. As a particular case, our results transfer previous work by P. Bianchi on that method in Hilbert spaces for the first time to Hadamard manifolds. Moreover, our convergence proof is fully effective and allows for the construction of explicit rates of convergence for the iteration towards the (unique) solution both in mean and almost surely. These rates are moreover highly uniform, being independent of most data surrounding the iteration, space or distribution. In that generality, these rates are novel already in the context of Hilbert spaces. Linear nonasymptotic guarantees under additional second-moment conditions on the Yosida approximates and special cases of stochastic convex minimization are discussed.



Basic information

Paper ID: 2510.10697
Title: Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature
Author: Nicholas Pischke (University of Bath)
Categories: math.OC (Optimization and Control), cs.LG (Machine Learning)
Published: October 14, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.10697

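Abstract

The paper defines a stochastic variant of the proximal point algorithm in the general nonlinear setting of separable Hadamard spaces, for approximating zeros of the mean of a stochastically perturbed monotone vector field. Convergence is proved under a suitable strong monotonicity assumption, a probabilistic independence assumption, and a separability assumption on the tangent spaces. As a special case, the results transfer P. Bianchi's work on the method in Hilbert spaces to Hadamard manifolds for the first time. The convergence proof is fully effective and allows explicit rates of convergence towards the unique solution to be constructed, both in mean and almost surely. These rates are highly uniform, being independent of most data surrounding the iteration, the space, or the distribution.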

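Research background and motivation

Problem addressed:
- Solving stochastic optimization problems in nonlinear metric spaces: $\min_{x \in X} \int f(\xi, x) \, d\mu(\xi)$
- Extending the stochastic proximal point algorithm from Hilbert spaces to more general metric spaces of nonpositive curvature
Why the problem matters:
- Stochastic approximation is a core problem in machine learning and optimization
- Optimization over nonlinear spaces is widely used in machine learning (e.g. manifold learning)
- Existing theory is largely confined to Hilbert spaces and lacks a foundation for nonlinear spaces
Limitations of existing approaches:
- Bianchi's work applies only to Hilbert spaces
- Explicit convergence rate analysis is lacking
- The theory of stochastic proximal point algorithms in nonlinear spaces is incomplete
Research motivation:
- Extend the mature Hilbert space theory to CAT(0) spaces and Hadamard manifolds
- Provide explicit, uniform convergence rates
- Establish a theoretical foundation for stochastic optimization in nonlinear spaces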

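Core contributions

Theoretical generalization: extends the stochastic proximal point algorithm from Hilbert spaces to separable Hadamard spaces for the first time
Convergence analysis: proves strong convergence under a strong monotonicity assumption, both in mean and almost surely
Explicit convergence rates: constructs highly uniform explicit rates that are independent of most parameters of the iteration
Technical innovation: develops a theory of stochastic monotone vector fields in metric spaces, together with the Aumann-Sturm integral
Scope of application: covers Hilbert spaces and Hadamard manifolds as special cases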

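Method details

Task definition

Given a probability space $(E, \mathcal{E}, \mu)$ and a separable Hadamard space $X$, consider a stochastic monotone vector field $A: E \times X \to 2^{TX}$ with $A(s, x) \subseteq T_x X$. The goal is to find a zero of the mean operator $\bar{A}(x) := \int A(s, x) \, d\mu(s)$.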

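Algorithm structure

Stochastic proximal point algorithm (SPPA): $x_{n+1} := J_{\lambda_n}(\xi_{n+1}, x_n)$

where:

$x_0 \in X$ is the initial point
$(\lambda_n) \subseteq (0, \infty)$ is the parameter sequence, with $(\lambda_n) \in \ell^2_+ \setminus \ell^1_+$ (square-summable but not summable)
$(\xi_{n+1})$ is a sequence of independent, identically distributed random variables with distribution $\mu$
$J_\lambda(s, x) := \{z \in X \mid \frac{1}{\lambda}\log_z x \in A(s, z)\}$ is the resolvent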
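
As a minimal sketch of the iteration, assume the simplest Hilbert space case $X = \mathbb{R}$ with the toy field $A(s, z) = \{z - s\}$, the gradient of $z \mapsto \frac{1}{2}(z - s)^2$, which is strongly monotone with modulus 1. Here $\log_z x = x - z$, so the resolvent has the closed form $J_\lambda(s, x) = (x + \lambda s)/(1 + \lambda)$, and the zero of the mean field is the mean of $\mu$. The toy field and the names `resolvent`, `sppa`, and `harmonic_step` are illustrative assumptions, not taken from the paper.

```python
import random


def resolvent(lam, s, x):
    """Resolvent J_lam(s, x) for the toy field A(s, z) = {z - s} on the real line.

    Solves (x - z) / lam = z - s for z, which is the Euclidean form of
    the inclusion (1/lam) * log_z x in A(s, z).
    """
    return (x + lam * s) / (1.0 + lam)


def harmonic_step(n):
    """lambda_n = 1 / (n + 1): square-summable but not summable."""
    return 1.0 / (n + 1)


def sppa(x0, samples, step):
    """Run x_{n+1} = J_{lambda_n}(xi_{n+1}, x_n) and return all iterates."""
    xs = [x0]
    for n, xi in enumerate(samples):
        xs.append(resolvent(step(n), xi, xs[-1]))
    return xs


if __name__ == "__main__":
    rng = random.Random(0)
    mean = 3.0  # zero of the mean field x - E[xi] in this toy case
    samples = [rng.gauss(mean, 1.0) for _ in range(20000)]
    iterates = sppa(10.0, samples, harmonic_step)
    print("distance to the zero after 20000 steps:", abs(iterates[-1] - mean))
```

With `harmonic_step`, the iterates of this toy case reduce to the running average $x_n = (x_0 + \xi_1 + \dots + \xi_n)/(n + 1)$, which makes the convergence to the mean easy to check by hand.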

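Key technical components

Geometric structure of the metric space:
- CAT(0) spaces: geodesic metric spaces satisfying the nonpositive curvature condition; the complete ones are the Hadamard spaces
- Tangent space $T_x X$: constructed from Aleksandrov angles and the Euclidean cone
- Quasi-inner product: $g_x(t\gamma, s\eta) := ts\cos\angle_x(\gamma, \eta)$
Monotone vector field: for $(x, u), (y, v) \in A$: $g_x(u, \log_x y) \leq -g_y(v, \log_y x)$
Strong monotonicity (parameter $\alpha > 0$): $g_x(u, \log_x y) \leq -g_y(v, \log_y x) - \alpha d^2(x, y)$
Yosida approximation: $A_\lambda(s, x) := \frac{1}{\lambda}\log_{J_\lambda(s,x)} x$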
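
For orientation, these definitions can be checked against the familiar Hilbert space case. Assuming $X = H$ is a Hilbert space, so that each $T_x X$ is identified with $H$, $\log_x y = y - x$, $g_x$ is the inner product $\langle \cdot, \cdot \rangle$, and $d(x, y) = \|x - y\|$, they reduce to the standard notions for monotone operators:

$$
\begin{aligned}
g_x(u, \log_x y) \leq -g_y(v, \log_y x) &\iff \langle u, y - x\rangle + \langle v, x - y\rangle \leq 0 \iff \langle u - v, x - y\rangle \geq 0, \\
g_x(u, \log_x y) \leq -g_y(v, \log_y x) - \alpha d^2(x, y) &\iff \langle u - v, x - y\rangle \geq \alpha \|x - y\|^2, \\
z \in J_\lambda(s, x) &\iff \tfrac{1}{\lambda}(x - z) \in A(s, z) \iff x \in z + \lambda A(s, z), \\
A_\lambda(s, x) &= \tfrac{1}{\lambda}\bigl(x - J_\lambda(s, x)\bigr).
\end{aligned}
$$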

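Technical innovations

Probability theory in metric spaces: uses Sturm's integration theory to set up a theory of random variables with values in metric spaces
Aumann-Sturm integral: extends the Aumann integral to set-valued maps in metric spaces
Stochastic quasi-Fejér monotonicity: establishes two key inequalities that control the stochastic behaviour of the iteration
Independence assumption: introduces the condition $E_n[g_{x^*}(\phi^*(\xi_{n+1}), \log_{x^*} x_n)] = 0$ to handle technical difficulties specific to nonlinear spaces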

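Theoretical analysis

Key assumptions

(A0) Parameter conditions: $(\lambda_n) \in \ell^2_+ \setminus \ell^1_+$, and $(\xi_{n+1})$ is independent and identically distributed
(A1) Strong monotonicity: $A(s, \cdot)$ is strongly monotone with modulus $\alpha(s) > 0$, and $\int \alpha \, d\mu > 0$
(A2) Existence of a zero: there exists a unique zero $x^* \in ZA^{(2)}$
(A3) Independence: $E_n[g_{x^*}(\phi^*(\xi_{n+1}), \log_{x^*} x_n)] = 0$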
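
To see what (A3) asks for, it may help to check it in the Hilbert space case. The following is only a sketch under assumptions that this summary does not state: $E_n$ is read as the conditional expectation given $\xi_1, \dots, \xi_n$, and $\phi^*$ as a sufficiently integrable selection of $A(\cdot, x^*)$ with $\int \phi^* \, d\mu = 0$. Since $x_n$ depends only on $\xi_1, \dots, \xi_n$, while $\xi_{n+1}$ is independent of them with distribution $\mu$:

$$
E_n\bigl[g_{x^*}(\phi^*(\xi_{n+1}), \log_{x^*} x_n)\bigr]
= E_n\bigl[\langle \phi^*(\xi_{n+1}), x_n - x^* \rangle\bigr]
= \Bigl\langle \int \phi^* \, d\mu, \; x_n - x^* \Bigr\rangle
= 0.
$$

The middle step uses linearity of the inner product to move the expectation inside. The quasi-inner product $g_{x^*}$ on a general Hadamard space need not have this linearity, which is presumably why the identity is imposed as an assumption there.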

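Main theorem

Theorem 4.7 (main convergence result): under assumptions (A0)-(A3), the stochastic proximal point algorithm satisfies:

Convergence in mean: $E[d^2(x_n, x^*)] \to 0$
Almost sure convergence: $d^2(x_n, x^*) \to 0$ a.s.
Explicit rate of convergence: $\forall \varepsilon > 0, \forall n \geq \rho(\varepsilon): E[d^2(x_n, x^*)] < \varepsilon$, where $\rho(\varepsilon) := \theta(\chi(\varepsilon/2c), 2D/\varepsilon)$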
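
The rate $\rho$ is built from $\theta$, $\chi$, $c$, and $D$, which this summary does not define, so the sketch below does not compute it. As a rough numerical illustration only, it estimates $E[d^2(x_n, x^*)]$ by Monte Carlo in an assumed toy case: $X = \mathbb{R}$, $A(s, z) = \{z - s\}$, Gaussian samples, and $\lambda_n = 1/(n + 1)$. In that case $x_n$ is a running average, so the mean-square error has the closed form $((x_0 - x^*)^2 + n\sigma^2)/(n + 1)^2$ to compare against. All function names are hypothetical.

```python
import random


def sppa_path(x0, mean, sigma, n_steps, rng):
    """One SPPA run for A(s, z) = {z - s} on the real line, lambda_n = 1/(n+1)."""
    x, path = x0, [x0]
    for n in range(n_steps):
        lam = 1.0 / (n + 1)
        xi = rng.gauss(mean, sigma)
        x = (x + lam * xi) / (1.0 + lam)  # closed-form resolvent step
        path.append(x)
    return path


def mean_square_error(x0, mean, sigma, n_steps, n_runs, seed=0):
    """Monte Carlo estimate of E[d^2(x_n, x*)] for n = 0..n_steps, with x* = mean."""
    rng = random.Random(seed)
    totals = [0.0] * (n_steps + 1)
    for _ in range(n_runs):
        for n, x in enumerate(sppa_path(x0, mean, sigma, n_steps, rng)):
            totals[n] += (x - mean) ** 2
    return [t / n_runs for t in totals]


def exact_mse(x0, mean, sigma, n):
    """Closed form for this toy case, where x_n = (x0 + xi_1 + ... + xi_n) / (n + 1)."""
    return ((x0 - mean) ** 2 + n * sigma ** 2) / (n + 1) ** 2


def first_index_below(errors, eps):
    """Smallest n from which every recorded error stays below eps, or None."""
    last_bad = max((n for n, e in enumerate(errors) if e >= eps), default=-1)
    return last_bad + 1 if last_bad + 1 < len(errors) else None


if __name__ == "__main__":
    errors = mean_square_error(x0=2.0, mean=0.0, sigma=1.0, n_steps=200, n_runs=2000)
    print("estimated vs exact error at n = 200:", errors[200], exact_mse(2.0, 0.0, 1.0, 200))
    print("empirical index for eps = 0.05:", first_index_below(errors, 0.05))
```

The index returned by `first_index_below` is an empirical stand-in observed on one toy distribution. Unlike the theorem's $\rho(\varepsilon)$, it carries no guarantee and no uniformity across spaces or distributions.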