Stochastic Variance Reduced Gradient (SVRG) is an accelerated version of stochastic gradient descent based on variance reduction, showing promise for solving large-scale inverse problems. This paper analyzes SVRG and its regularized version incorporating prior knowledge for solving linear inverse problems in Hilbert spaces. The study proves that under appropriate constant step-size schedules and regularity conditions, regularized SVRG achieves optimal convergence rates with respect to noise level without requiring any early stopping rule; standard SVRG is also optimal for non-smooth solutions under a priori stopping rules. The analysis is based on explicit error recursions and appropriate a priori estimates regarding inner-loop updates around anchor points.
This paper studies linear inverse problems in Hilbert spaces:
where:
Consider the optimization problem: where
Initialization: x₀^δ = x₀, frequency M, step-size {ηₖ}
for K = 0,1,... do
Compute gₖ = J'(x_{KM}^δ) = (1/n)A_†*(A_†x_{KM}^δ - y^δ)
for t = 0,1,...,M-1 do
Randomly sample i_{KM+t} ∈ {1,...,n}
Update x_{KM+t+1}^δ = x_{KM+t}^δ - η_{KM+t}(A*_{i_{KM+t}}A_{i_{KM+t}}(x_{KM+t}^δ - x_{KM}^δ) + gₖ)
end
end
Replaces operator with approximate operator obtained through truncated singular value decomposition: where singular values satisfying are retained.
Three standard inverse problems from the Regutools package:
All problems are discretized into finite-dimensional linear systems with .
Under Assumption 2.1, there exists a constant independent of such that:
Expected convergence rate:
\delta^{2\nu/(1+2\nu)}, & a > 0 \\ n^{-1/2}\sqrt{k}\delta, & a = 0 \end{cases}$$ **Uniform convergence rate**: $$\|e_k^\delta\| \leq \sqrt{n}c^*k^{-1/2+\max(1/2-\nu,0)} + c^*\begin{cases} \delta^{2\nu/(1+2\nu)}, & a > 0 \\ n^{-1/2}\sqrt{k}\delta, & a = 0 \end{cases}$$ ### Optimality Results (Corollary 2.1) - **rSVRG**: Achieves optimal rate $O(\delta^{2\nu/(1+2\nu)})$ without early stopping - **SVRG**: Achieves optimality for $\nu \in (0,1/2]$ under a priori stopping $k(\delta) = O(\delta^{-2/(1+2\nu)})$ ### Numerical Experimental Results Experimental results across different regularity parameters $\nu$ and noise levels $\epsilon$ show: 1. **rSVRG advantages**: Achieves comparable accuracy to Landweber method in all test cases with significantly fewer iterations 2. **SVRG performance**: Performs well in low regularity cases, but requires smaller step-sizes for high regularity solutions 3. **Convergence behavior**: Higher noise levels require fewer iterations, consistent with theoretical predictions 4. **Plateau effect**: rSVRG's final error is typically lower than the other two methods Specific numerical results are shown in Tables 1-3. For example, for the s-phillips problem: - When $\nu=0, \epsilon=1e-3$, rSVRG achieves relative error of $1.93e-2$ with only 102.825 iterations - In comparison, Landweber method requires 758 iterations to achieve the same accuracy ## Related Work ### Stochastic Optimization Methods - **SGD-type methods**: Applications of stochastic gradient descent and variants in inverse problems - **Variance reduction techniques**: Development of variance reduction methods such as SVRG and SAGA ### Inverse Problem Theory - **Regularization theory**: Tikhonov regularization and iterative regularization methods - **Source conditions**: Standard assumptions characterizing solution smoothness - **Optimal convergence rates**: Minimax optimality under noise settings ### Contribution Position of This Work Compared to work by Jin et al. (2022) and Jin & Chen (2025): - More relaxed conditions: More practical requirements for SVRG convergence - More complete analysis: Provides both expected and uniform convergence rates - More practical methods: rSVRG requires no early stopping rule ## Conclusions and Discussion ### Main Conclusions 1. **Theoretical completeness**: Establishes a complete theoretical framework for SVRG and rSVRG solving linear inverse problems 2. **Optimality**: Both methods achieve minimax optimal convergence rates under appropriate conditions 3. **Practicality**: rSVRG possesses intrinsic regularization, making it more suitable for practical applications 4. **Improved conditions**: Relaxes convergence conditions compared to existing work ### Limitations 1. **Noise level dependence**: Methods require known noise level $\delta$ to construct operator $A$ and select stopping criteria 2. **Parameter selection**: Practical selection of parameters $a, b$ requires heuristic techniques 3. **Linear restriction**: Current analysis applies only to linear inverse problems 4. **Computational complexity**: Each outer loop requires full gradient computation, which may be expensive in some cases ### Future Directions 1. **Adaptive methods**: Develop adaptive versions not depending on known noise level 2. **Nonlinear extensions**: Extend theory to nonlinear inverse problems 3. **Practical applications**: Verify methods on specific imaging and signal processing problems 4. **Computational optimization**: Study strategies for reducing computational complexity ## In-Depth Evaluation ### Strengths 1. **Rigorous theory**: Deep and meticulous mathematical analysis with advanced proof techniques 2. **Complete results**: Provides both expected and uniform convergence rates, filling theoretical gaps 3. **Practical methods**: rSVRG's no-early-stopping feature makes it more suitable for practical applications 4. **Improved conditions**: Significantly relaxes convergence conditions compared to existing work 5. **Sufficient experiments**: Numerical experiments verify theoretical predictions and demonstrate method advantages ### Weaknesses 1. **High technical threshold**: Proof process is extremely complex, making understanding and verification difficult 2. **Parameter sensitivity**: Method performance is relatively sensitive to parameter selection 3. **Application limitations**: Requirement of known noise level restricts practical application scope 4. **Computational cost**: Full gradient computation may offset advantages of stochastic methods ### Impact 1. **Theoretical contribution**: Provides solid theoretical foundation for stochastic optimization applications in inverse problems 2. **Method guidance**: Offers new effective approaches for solving large-scale inverse problems 3. **Research promotion**: May stimulate more research on stochastic regularization methods 4. **Practical value**: Potential applications in medical imaging, geophysical exploration, and other fields ### Applicable Scenarios 1. **Large-scale linear inverse problems**: Especially imaging problems with massive data volumes 2. **Known prior information**: Cases where appropriate approximate operators can be constructed 3. **Estimable noise level**: Applications where data noise level can be reasonably estimated 4. **Sufficient computational resources**: Environments capable of bearing full gradient computation costs ## References The paper cites 62 related references, primarily including: - Classical stochastic optimization literature: Johnson & Zhang (2013), Bottou et al. (2018) - Inverse problem theory: Engl et al. (1996), Herman et al. (1978) - Related convergence analysis: Jin et al. (2022), Jin & Chen (2025) - Application background: Hansen (2007), Kereta et al. (2021) --- This paper achieves a good balance between theoretical depth and practical utility, providing important theoretical guidance and practical methods for solving large-scale linear inverse problems. Despite some limitations, its contributions are significant for advancing development in this field.