Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature

Pischke

We define a stochastic variant of the proximal point algorithm in the general setting of nonlinear (separable) Hadamard spaces for approximating zeros of the mean of a stochastically perturbed monotone vector field and prove its convergence under a suitable strong monotonicity assumption, together with a probabilistic independence assumption and a separability assumption on the tangent spaces. As a particular case, our results transfer previous work by P. Bianchi on that method in Hilbert spaces for the first time to Hadamard manifolds. Moreover, our convergence proof is fully effective and allows for the construction of explicit rates of convergence for the iteration towards the (unique) solution both in mean and almost surely. These rates are moreover highly uniform, being independent of most data surrounding the iteration, space or distribution. In that generality, these rates are novel already in the context of Hilbert spaces. Linear nonasymptotic guarantees under additional second-moment conditions on the Yosida approximates and special cases of stochastic convex minimization are discussed.

Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature

Basic information

Paper ID: 2510.10697
Title: Mean-square and linear convergence of a stochastic proximal point algorithm in metric spaces of nonpositive curvature
Author: Nicholas Pischke (University of Bath)
Categories: math.OC (Optimization and Control), cs.LG (Machine Learning)
Published: October 14, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.10697

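Abstract

This paper defines a stochastic variant of the proximal point algorithm in the general nonlinear setting of separable Hadamard spaces, for approximating zeros of the mean of a stochastically perturbed monotone vector field. Convergence is proved under a suitable strong monotonicity assumption, a probabilistic independence assumption, and a separability assumption on the tangent spaces. As a special case, the results transfer previous work by P. Bianchi on this method in Hilbert spaces to Hadamard manifolds for the first time. The convergence proof is fully effective and allows explicit rates of convergence of the iteration towards the unique solution to be constructed, both in mean and almost surely. These rates are highly uniform, being independent of most of the data surrounding the iteration, the space, or the distribution.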

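Background and motivation

Problems addressed:
- Solving stochastic optimization problems in nonlinear metric spaces: $\min_{x \in X} \int f(\xi, x) \, d\mu(\xi)$
- Generalizing the stochastic proximal point algorithm from Hilbert spaces to the broader class of metric spaces of nonpositive curvature
Why the problem matters:
- Stochastic approximation is a central topic in machine learning and optimization
- Optimization over nonlinear spaces is widely used in machine learning (for example, manifold learning)
- Existing theory is largely restricted to Hilbert spaces, and theoretical foundations for nonlinear spaces are lacking
Limitations of existing approaches:
- Bianchi's work applies only to Hilbert spaces
- Explicit convergence rate analyses are lacking
- The theory of stochastic proximal point algorithms in nonlinear spaces is incomplete
Research motivation:
- Generalize the mature Hilbert space theory to CAT(0) spaces and Hadamard manifolds
- Provide an explicit and uniform convergence rate analysis
- Establish theoretical foundations for stochastic optimization in nonlinear spaces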

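Core contributions

Theoretical generalization: extends the stochastic proximal point algorithm from Hilbert spaces to separable Hadamard spaces for the first time
Convergence analysis: proves strong convergence under a strong monotonicity assumption, both in mean and almost surely
Explicit convergence rates: constructs explicit, highly uniform rates of convergence that are independent of most of the parameters of the iteration
Technical innovation: develops a theory of stochastic monotone vector fields in metric spaces, together with the Aumann-Sturm integral
Scope of application: covers Hilbert spaces and Hadamard manifolds as special cases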

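Methodology in detail

Task definition

Given a probability space $(E, \mathcal{E}, \mu)$ and a separable Hadamard space $X$, consider a stochastic monotone vector field $A: E \times X \to 2^{TX}$, where $A(s, x) \subseteq T_x X$. The goal is to find a zero of the mean operator $\bar{A}(x) := \int A(s, x) \, d\mu(s)$.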

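Algorithm structure

Stochastic proximal point algorithm (SPPA): $x_{n+1} := J_{\lambda_n}(\xi_{n+1}, x_n)$

where:

$x_0 \in X$ is the initial point
$(\lambda_n) \subseteq (0, \infty)$ is a parameter sequence satisfying $(\lambda_n) \in \ell^2_+ \setminus \ell^1_+$ (square-summable but not summable)
$(\xi_{n+1})$ is a sequence of independent, identically distributed random variables with distribution $\mu$
$J_\lambda(s, x) := \{z \in X \mid \frac{1}{\lambda}\log_z x \in A(s, z)\}$ is the resolvent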
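To illustrate the iteration, here is a minimal sketch in the simplest Hilbert-space special case, the real line, where $\log_z x = x - z$. It assumes the toy vector field $A(s, z) = \alpha (z - s)$ (the gradient of $\frac{\alpha}{2}(z - s)^2$), for which the resolvent has the closed form $J_\lambda(s, x) = (x + \lambda \alpha s)/(1 + \lambda \alpha)$ and the zero of the mean operator is the mean of $\mu$. The function names `resolvent` and `sppa`, the Gaussian distribution, and the step sizes $\lambda_n = 1/(n+1)$ are all hypothetical choices made for this example; none of them are taken from the paper.

```python
import random


def resolvent(lam, xi, x, alpha=1.0):
    """Closed-form J_lam(xi, x) for the toy field A(xi, z) = alpha * (z - xi) on the real line.

    Solves (x - z) / lam = alpha * (z - xi) for z.
    """
    return (x + lam * alpha * xi) / (1.0 + lam * alpha)


def sppa(x0, samples, alpha=1.0):
    """Run x_{n+1} = J_{lam_n}(xi_{n+1}, x_n) with the assumed step sizes lam_n = 1 / (n + 1)."""
    x = float(x0)
    for n, xi in enumerate(samples):
        lam = 1.0 / (n + 1)  # square-summable but not summable
        x = resolvent(lam, xi, x, alpha)
    return x


# Toy data: i.i.d. Gaussian samples whose mean is the unique zero of the mean operator.
rng = random.Random(0)
true_mean = 3.0
samples = [rng.gauss(true_mean, 1.0) for _ in range(20000)]

x_final = sppa(0.0, samples)
error = abs(x_final - true_mean)
print(f"final iterate: {x_final:.4f}, distance to zero of mean operator: {error:.4f}")
```

In a genuinely nonlinear Hadamard space the resolvent generally has no closed form and each step would require solving an inner problem, so this sketch only conveys the shape of the iteration.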

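Key technical components

Geometric structure of the metric space:
- CAT(0) spaces: geodesic metric spaces satisfying a nonpositive curvature condition; complete CAT(0) spaces are Hadamard spaces
- Tangent space $T_x X$: constructed via Aleksandrov angles and the Euclidean cone
- Quasi-inner product: $g_x(t\gamma, s\eta) := ts\cos\angle_x(\gamma, \eta)$
Monotone vector field: for all $(x, u), (y, v) \in A$, the following holds: $g_x(u, \log_x y) \leq -g_y(v, \log_y x)$
Strong monotonicity (with parameter $\alpha > 0$): $g_x(u, \log_x y) \leq -g_y(v, \log_y x) - \alpha d^2(x, y)$
Yosida approximate: $A_\lambda(s, x) := \frac{1}{\lambda}\log_{J_\lambda(s,x)} x$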
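As a sanity check on these definitions, it may help to specialize them to a Hilbert space $H$. The identifications used below ($T_x H \cong H$, $\log_x y = y - x$, $g_x(u, v) = \langle u, v \rangle$) are the standard ones and are an assumption of this sketch rather than something spelled out in the summary. Under them, the conditions reduce to the familiar notions from monotone operator theory:

```latex
\begin{align*}
\text{monotone:} \quad
  & \langle u,\, y - x \rangle \le -\langle v,\, x - y \rangle
    \iff \langle v - u,\, y - x \rangle \ge 0, \\
\text{strongly monotone:} \quad
  & \langle v - u,\, y - x \rangle \ge \alpha \, \| x - y \|^2, \\
\text{resolvent:} \quad
  & z \in J_\lambda(s, x)
    \iff \tfrac{1}{\lambda}(x - z) \in A(s, z)
    \iff z \in \bigl(\mathrm{Id} + \lambda A(s, \cdot)\bigr)^{-1}(x), \\
\text{Yosida approximate:} \quad
  & A_\lambda(s, x) = \tfrac{1}{\lambda}\bigl(x - J_\lambda(s, x)\bigr).
\end{align*}
```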

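Technical innovations

Probability theory in metric spaces: builds a theory of random variables on metric spaces using Sturm's integration theory
Aumann-Sturm integral: generalizes the Aumann integral to set-valued maps in metric spaces
Stochastic quasi-Fejér monotonicity: establishes two key inequalities that control the stochastic behaviour of the iteration
Independence assumption: introduces the condition $E_n[g_{x^*}(\phi^*(\xi_{n+1}), \log_{x^*} x_n)] = 0$ to handle technical difficulties specific to nonlinear spaces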

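Theoretical analysis

Main assumptions

(A0) Parameter conditions: $(\lambda_n) \in \ell^2_+ \setminus \ell^1_+$, and $(\xi_{n+1})$ is independent and identically distributed
(A1) Strong monotonicity: $A(s, \cdot)$ is strongly monotone with modulus $\alpha(s) > 0$, and $\int \alpha \, d\mu > 0$
(A2) Existence of a zero: there exists a unique zero $x^* \in ZA^{(2)}$
(A3) Independence: $E_n[g_{x^*}(\phi^*(\xi_{n+1}), \log_{x^*} x_n)] = 0$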
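To make the step-size condition in (A0) concrete, one standard choice is $\lambda_n = 1/(n+1)$. This is an illustrative assumption, not a choice prescribed by the paper. It is square-summable but not summable:

```latex
\[
\sum_{n=0}^{\infty} \lambda_n^2 = \sum_{n=0}^{\infty} \frac{1}{(n+1)^2} = \frac{\pi^2}{6} < \infty,
\qquad
\sum_{n=0}^{\infty} \lambda_n = \sum_{n=0}^{\infty} \frac{1}{n+1} = \infty .
\]
```

More generally, $\lambda_n = c/(n+1)^p$ with $c > 0$ and $p \in (1/2, 1]$ satisfies the same condition, whereas a constant step size does not, since it is not square-summable.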

Main theorem

Theorem 4.7 (main convergence result): under assumptions (A0)-(A3), the stochastic proximal point algorithm satisfies:

Convergence in mean: $E[d^2(x_n, x^*)] \to 0$
Almost sure convergence: $d^2(x_n, x^*) \to 0$ a.s.
Explicit convergence rate: $\forall \varepsilon > 0, \forall n \geq \rho(\varepsilon): E[d^2(x_n, x^*)] < \varepsilon$, where $\rho(\varepsilon) := \theta(\chi(\varepsilon/2c), 2D/\varepsilon)$
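To see what a rate of this kind looks like in the simplest possible instance, consider a toy model that is not taken from the paper: $X = \mathbb{R}$, $A(s, x) = x - s$, i.i.d. samples $\xi_i$ with mean $m$ and variance $\sigma^2$, a deterministic starting point $x_0$, and step sizes $\lambda_n = 1/(n+1)$. Here $x^* = m$ and the mean-square error can be computed in closed form:

```latex
\begin{align*}
x_{n+1} &= \frac{x_n + \lambda_n \xi_{n+1}}{1 + \lambda_n}
         = \frac{(n+1)\,x_n + \xi_{n+1}}{n+2}
\quad\Longrightarrow\quad
x_n = \frac{x_0 + \sum_{i=1}^{n} \xi_i}{n+1}, \\
E\bigl[d^2(x_n, x^*)\bigr] &= E\bigl[(x_n - m)^2\bigr]
  = \frac{(x_0 - m)^2 + n\sigma^2}{(n+1)^2}
  \le \frac{(x_0 - m)^2 + \sigma^2}{n+1}.
\end{align*}
```

In this toy case $E[d^2(x_n, x^*)] < \varepsilon$ holds for all $n \geq \bigl((x_0 - m)^2 + \sigma^2\bigr)/\varepsilon$. This $O(1/n)$ bound is specific to the example and should not be read as the paper's $\rho$, whose ingredients $\theta$, $\chi$, $c$ and $D$ are defined in the paper itself.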