2025-11-16T17:31:12.997131

On the convergence of stochastic variance reduced gradient for linear inverse problems

Jin, Zhou
Stochastic variance reduced gradient (SVRG) is an accelerated version of stochastic gradient descent based on variance reduction, and is promising for solving large-scale inverse problems. In this work, we analyze SVRG and a regularized version that incorporates a priori knowledge of the problem, for solving linear inverse problems in Hilbert spaces. We prove that, with suitable constant step size schedules and regularity conditions, the regularized SVRG can achieve optimal convergence rates in terms of the noise level without any early stopping rules, and standard SVRG is also optimal for problems with nonsmooth solutions under a priori stopping rules. The analysis is based on an explicit error recursion and suitable prior estimates on the inner loop updates with respect to the anchor point. Numerical experiments are provided to complement the theoretical analysis.
academic

On the convergence of stochastic variance reduced gradient for linear inverse problems

Basic Information

  • Paper ID: 2510.14759
  • Title: On the convergence of stochastic variance reduced gradient for linear inverse problems
  • Authors: Bangti Jin, Zehui Zhou (Department of Mathematics, Chinese University of Hong Kong)
  • Classification: math.NA cs.NA math.OC
  • Publication Date: October 16, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.14759

Abstract

Stochastic Variance Reduced Gradient (SVRG) is an accelerated version of stochastic gradient descent based on variance reduction, showing promise for solving large-scale inverse problems. This paper analyzes SVRG and its regularized version incorporating prior knowledge for solving linear inverse problems in Hilbert spaces. The study proves that under appropriate constant step-size schedules and regularity conditions, regularized SVRG achieves optimal convergence rates with respect to noise level without requiring any early stopping rule; standard SVRG is also optimal for non-smooth solutions under a priori stopping rules. The analysis is based on explicit error recursions and appropriate a priori estimates regarding inner-loop updates around anchor points.

Research Background and Motivation

Problem Description

This paper studies linear inverse problems in Hilbert spaces: Ax=yA_\dagger x = y_\dagger

where:

  • A:XY=Y1××YnA_\dagger: X \to Y = Y_1 \times \cdots \times Y_n is the system operator
  • xXx \in X is the unknown signal, yYy_\dagger \in Y is the exact data
  • Only noisy data yδ=y+ξy^\delta = y_\dagger + \xi is available, with noise level δ=ξY\delta = \|\xi\|_Y

Research Motivation

  1. Large-scale problem demands: Linear inverse problems arise widely in practical applications such as computed tomography and positron emission tomography, with massive data scales
  2. Limitations of existing methods: Traditional iterative methods suffer from low computational efficiency on large-scale problems
  3. SVRG advantages: Stochastic variance reduced gradient methods possess excellent scalability, but their theoretical analysis in inverse problems remains incomplete
  4. Regularization requirements: Standard SVRG requires early stopping rules for regularization, while incorporation of prior knowledge may improve this situation

Core Contributions

  1. Refined theoretical analysis: Establishes complete convergence theory for SVRG and regularized SVRG (rSVRG) solving linear inverse problems
  2. Optimal convergence rates: Proves that both methods achieve optimal convergence rate O(δ2ν/(1+2ν))O(\delta^{2\nu/(1+2\nu)}) under appropriate conditions
  3. Regularization properties: rSVRG possesses intrinsic regularization mechanism without requiring early stopping; standard SVRG also exhibits regularization properties under a priori stopping
  4. Expected and uniform convergence: Establishes convergence rates in both expected and uniform senses, extending existing results
  5. Relaxed conditions: Compared to existing work, conditions for SVRG optimality are more relaxed

Methodology Details

Problem Formulation

Consider the optimization problem: J(x)=12nAxyδY2=1ni=1nfi(x)J(x) = \frac{1}{2n}\|A_\dagger x - y^\delta\|_Y^2 = \frac{1}{n}\sum_{i=1}^n f_i(x) where fi(x)=12A,ixyiδYi2f_i(x) = \frac{1}{2}\|A_{\dagger,i}x - y^\delta_i\|_{Y_i}^2

Algorithm Architecture

Standard SVRG (Algorithm 1)

Initialization: x₀^δ = x₀, frequency M, step-size {ηₖ}
for K = 0,1,... do
    Compute gₖ = J'(x_{KM}^δ) = (1/n)A_†*(A_†x_{KM}^δ - y^δ)
    for t = 0,1,...,M-1 do
        Randomly sample i_{KM+t} ∈ {1,...,n}
        Update x_{KM+t+1}^δ = x_{KM+t}^δ - η_{KM+t}(A*_{i_{KM+t}}A_{i_{KM+t}}(x_{KM+t}^δ - x_{KM}^δ) + gₖ)
    end
end

Regularized SVRG (Algorithm 2)

Replaces operator AA_\dagger with approximate operator AA obtained through truncated singular value decomposition: A()=j=1Jσjϕj,ψjA(\cdot) = \sum_{j=1}^J \sigma_j\langle\phi_j, \cdot\rangle\psi_j where singular values satisfying σjaδb\sigma_j \geq a\delta^b are retained.

Core Assumptions (Assumption 2.1)

  1. Step-size condition: ηj=c0L1\eta_j = c_0 \leq L^{-1}, where L=max1inAi2L = \max_{1\leq i\leq n}\|A_i\|^2
  2. Source condition: There exist ν>0\nu > 0 and wN(A)w \in N(A_\dagger)^\perp such that xx0=Bνwx_\dagger - x_0 = B_\dagger^\nu w
  3. Operator approximation: When a>0a > 0, AA is constructed via truncated SVD, retaining singular values σjaδb\sigma_j \geq a\delta^b

Technical Innovations

  1. Error decomposition strategy: Decomposes error into bias and variance components with precise estimation for each
  2. Anchor point analysis: Establishes critical a priori estimates by analyzing inner-loop updates relative to anchor points
  3. Unified framework: Provides a unified theoretical framework for handling both standard SVRG and regularized SVRG

Experimental Setup

Datasets

Three standard inverse problems from the Regutools package:

  • s-phillips: Mildly ill-posed problem
  • s-gravity: Severely ill-posed problem
  • s-shaw: Severely ill-posed problem

All problems are discretized into finite-dimensional linear systems with n=m=1000n = m = 1000.

Experimental Parameters

  • Exact solution generation: x=(AA)νxe1(AA)νxex_\dagger = \|(A_\dagger^*A_\dagger)^\nu x_e\|_{\ell^\infty}^{-1}(A_\dagger^*A_\dagger)^\nu x_e
  • Noise setting: yiδ=y,i+ϵyξiy^\delta_i = y_{\dagger,i} + \epsilon\|y_\dagger\|_{\ell^\infty}\xi_i, ξiN(0,1)\xi_i \sim \mathcal{N}(0,1)
  • Step-size: Landweber method uses c0=A2c_0 = \|A_\dagger\|^{-2}, (r)SVRG uses c0=O(c)c_0 = O(c), where c=L1c = L^{-1}
  • Frequency: M=2nM = 2n
  • Maximum iterations: 10510^5 rounds

Comparison Methods

  • Landweber Method (LM): Classical iterative regularization method with discrepancy principle stopping
  • Standard SVRG: Uses optimal error point stopping
  • Regularized SVRG (rSVRG): Uses theoretically guided stopping criterion

Experimental Results

Main Theoretical Result (Theorem 2.1)

Under Assumption 2.1, there exists a constant cc^* independent of k,n,δk, n, \delta such that:

Expected convergence rate:

\delta^{2\nu/(1+2\nu)}, & a > 0 \\ n^{-1/2}\sqrt{k}\delta, & a = 0 \end{cases}$$ **Uniform convergence rate**: $$\|e_k^\delta\| \leq \sqrt{n}c^*k^{-1/2+\max(1/2-\nu,0)} + c^*\begin{cases} \delta^{2\nu/(1+2\nu)}, & a > 0 \\ n^{-1/2}\sqrt{k}\delta, & a = 0 \end{cases}$$ ### Optimality Results (Corollary 2.1) - **rSVRG**: Achieves optimal rate $O(\delta^{2\nu/(1+2\nu)})$ without early stopping - **SVRG**: Achieves optimality for $\nu \in (0,1/2]$ under a priori stopping $k(\delta) = O(\delta^{-2/(1+2\nu)})$ ### Numerical Experimental Results Experimental results across different regularity parameters $\nu$ and noise levels $\epsilon$ show: 1. **rSVRG advantages**: Achieves comparable accuracy to Landweber method in all test cases with significantly fewer iterations 2. **SVRG performance**: Performs well in low regularity cases, but requires smaller step-sizes for high regularity solutions 3. **Convergence behavior**: Higher noise levels require fewer iterations, consistent with theoretical predictions 4. **Plateau effect**: rSVRG's final error is typically lower than the other two methods Specific numerical results are shown in Tables 1-3. For example, for the s-phillips problem: - When $\nu=0, \epsilon=1e-3$, rSVRG achieves relative error of $1.93e-2$ with only 102.825 iterations - In comparison, Landweber method requires 758 iterations to achieve the same accuracy ## Related Work ### Stochastic Optimization Methods - **SGD-type methods**: Applications of stochastic gradient descent and variants in inverse problems - **Variance reduction techniques**: Development of variance reduction methods such as SVRG and SAGA ### Inverse Problem Theory - **Regularization theory**: Tikhonov regularization and iterative regularization methods - **Source conditions**: Standard assumptions characterizing solution smoothness - **Optimal convergence rates**: Minimax optimality under noise settings ### Contribution Position of This Work Compared to work by Jin et al. (2022) and Jin & Chen (2025): - More relaxed conditions: More practical requirements for SVRG convergence - More complete analysis: Provides both expected and uniform convergence rates - More practical methods: rSVRG requires no early stopping rule ## Conclusions and Discussion ### Main Conclusions 1. **Theoretical completeness**: Establishes a complete theoretical framework for SVRG and rSVRG solving linear inverse problems 2. **Optimality**: Both methods achieve minimax optimal convergence rates under appropriate conditions 3. **Practicality**: rSVRG possesses intrinsic regularization, making it more suitable for practical applications 4. **Improved conditions**: Relaxes convergence conditions compared to existing work ### Limitations 1. **Noise level dependence**: Methods require known noise level $\delta$ to construct operator $A$ and select stopping criteria 2. **Parameter selection**: Practical selection of parameters $a, b$ requires heuristic techniques 3. **Linear restriction**: Current analysis applies only to linear inverse problems 4. **Computational complexity**: Each outer loop requires full gradient computation, which may be expensive in some cases ### Future Directions 1. **Adaptive methods**: Develop adaptive versions not depending on known noise level 2. **Nonlinear extensions**: Extend theory to nonlinear inverse problems 3. **Practical applications**: Verify methods on specific imaging and signal processing problems 4. **Computational optimization**: Study strategies for reducing computational complexity ## In-Depth Evaluation ### Strengths 1. **Rigorous theory**: Deep and meticulous mathematical analysis with advanced proof techniques 2. **Complete results**: Provides both expected and uniform convergence rates, filling theoretical gaps 3. **Practical methods**: rSVRG's no-early-stopping feature makes it more suitable for practical applications 4. **Improved conditions**: Significantly relaxes convergence conditions compared to existing work 5. **Sufficient experiments**: Numerical experiments verify theoretical predictions and demonstrate method advantages ### Weaknesses 1. **High technical threshold**: Proof process is extremely complex, making understanding and verification difficult 2. **Parameter sensitivity**: Method performance is relatively sensitive to parameter selection 3. **Application limitations**: Requirement of known noise level restricts practical application scope 4. **Computational cost**: Full gradient computation may offset advantages of stochastic methods ### Impact 1. **Theoretical contribution**: Provides solid theoretical foundation for stochastic optimization applications in inverse problems 2. **Method guidance**: Offers new effective approaches for solving large-scale inverse problems 3. **Research promotion**: May stimulate more research on stochastic regularization methods 4. **Practical value**: Potential applications in medical imaging, geophysical exploration, and other fields ### Applicable Scenarios 1. **Large-scale linear inverse problems**: Especially imaging problems with massive data volumes 2. **Known prior information**: Cases where appropriate approximate operators can be constructed 3. **Estimable noise level**: Applications where data noise level can be reasonably estimated 4. **Sufficient computational resources**: Environments capable of bearing full gradient computation costs ## References The paper cites 62 related references, primarily including: - Classical stochastic optimization literature: Johnson & Zhang (2013), Bottou et al. (2018) - Inverse problem theory: Engl et al. (1996), Herman et al. (1978) - Related convergence analysis: Jin et al. (2022), Jin & Chen (2025) - Application background: Hansen (2007), Kereta et al. (2021) --- This paper achieves a good balance between theoretical depth and practical utility, providing important theoretical guidance and practical methods for solving large-scale linear inverse problems. Despite some limitations, its contributions are significant for advancing development in this field.