Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation
Qiao, Maros
We propose and study Sparse Polyak, a variant of Polyak's adaptive step size, designed to solve high-dimensional statistical estimation problems where the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak step size performs poorly, requiring an increasing number of iterations to achieve optimal statistical precision, even when the problem remains well conditioned and/or the achievable precision itself does not degrade with problem size. We trace this limitation to a mismatch in how smoothness is measured: in high dimensions, it is no longer effective to estimate the Lipschitz smoothness constant. Instead, it is more appropriate to estimate the smoothness restricted to specific directions relevant to the problem (restricted Lipschitz smoothness constant). Sparse Polyak overcomes this issue by modifying the step size to estimate the restricted Lipschitz smoothness constant. We support our approach with both theoretical analysis and numerical experiments, demonstrating its improved performance.
This paper proposes and investigates Sparse Polyak, a variant of the Polyak adaptive step size specifically designed for high-dimensional statistical estimation problems where problem dimensionality grows much faster than sample size. In this regime, standard Polyak step sizes perform poorly, requiring increasingly more iterations to achieve optimal statistical accuracy—even when the problem remains well-conditioned and/or the achievable accuracy itself does not degrade with problem scale. The paper attributes this limitation to a mismatch in how smoothness is measured: in high dimensions, estimating the global Lipschitz smoothness constant becomes ineffective. Instead, it is more appropriate to estimate smoothness restricted to specific directions relevant to the problem (restricted Lipschitz smoothness constant). Sparse Polyak overcomes this issue by modifying the step size to estimate the restricted Lipschitz smoothness constant.
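The gap between global and restricted smoothness can be illustrated numerically. The sketch below is not from the paper: it assumes a least-squares loss f(θ) = ||Xθ - y||² / (2n) with a standard Gaussian design X, for which the global smoothness constant is the largest eigenvalue of XᵀX/n. The restricted constant is only approximated, by taking the largest value over randomly sampled s-column submatrices, which gives a lower bound rather than the exact quantity. The function names and the problem sizes are hypothetical choices for illustration.

```python
import numpy as np


def global_smoothness(X):
    """Largest eigenvalue of X^T X / n (global smoothness of least squares)."""
    n = X.shape[0]
    return np.linalg.norm(X, 2) ** 2 / n


def sampled_restricted_smoothness(X, s, num_subsets=200, seed=0):
    """Monte Carlo lower bound on smoothness restricted to s-sparse directions.

    Assumption of this sketch: random column subsets stand in for the
    combinatorial maximum over all supports of size s.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best = 0.0
    for _ in range(num_subsets):
        cols = rng.choice(d, size=s, replace=False)
        best = max(best, np.linalg.norm(X[:, cols], 2) ** 2 / n)
    return best


# Fixed sample size and sparsity, growing dimension.
n, s = 100, 5
rng = np.random.default_rng(0)
for d in (200, 800, 3200):
    X = rng.standard_normal((n, d))
    print(d, round(global_smoothness(X), 2),
          round(sampled_restricted_smoothness(X, s), 2))
```

Under these assumptions the global constant should grow with d while the sampled restricted constant stays roughly flat, which is the mismatch the summary describes.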
High-dimensional challenges: In high-dimensional settings, traditional Lipschitz smoothness constant estimation fails, leading to overly conservative step size selection
Performance degradation: Standard Polyak step sizes show significant performance deterioration as problem dimensionality increases, even when problem conditioning remains unchanged
Missing rate invariance: Existing methods fail to maintain convergence guarantees equivalent to fixed step size IHT
Iterative Hard Thresholding (IHT) algorithms perform well in high-dimensional sparse recovery but require knowledge of the restricted Lipschitz smoothness constant L̄ (often abbreviated RSS, for restricted strong smoothness)
Existing adaptive step size methods lack theoretical guarantees and perform poorly in practice in high-dimensional settings
There is a need for a method that adaptively adjusts step sizes while maintaining rate invariance
First high-dimensional adaptive step size rule: Proposes the first adaptive step size rule that performs well in high-dimensional settings and maintains rate invariance
Theoretical innovation: Identifies the fundamental problem of smoothness measurement in high dimensions and proposes estimating restricted Lipschitz smoothness constants rather than global constants
Convergence guarantees: Establishes linear convergence rates comparable to those of the best known fixed step sizes, up to optimal statistical accuracy
Broad applicability: Provides theoretical guarantees for multiple statistical models (logistic regression, linear regression, matrix regression, etc.)
Support recovery: Provides support recovery guarantees under signal-to-noise ratio conditions
Input: function f, target function value f̂, sparsity parameter s, number of iterations T
Initialize: θ_0 ∈ R^d, ||θ_0||_0 ≤ s
for t = 0 to T-1 do:
    Compute step size: γ_t = max{f(θ_t) - f̂, 0} / (5 ||HT_s(∇f(θ_t))||²)
    Update: θ_{t+1} = HT_s(θ_t - γ_t ∇f(θ_t))
end for
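The listing above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `hard_threshold` and `sparse_polyak_iht` are hypothetical, the early exit on a zero gradient is a guard added here, and the demo problem (noiseless sparse least squares with a Gaussian design, so that the target value f̂ can be taken as 0) is an assumption chosen for convenience. The sketch also assumes s does not exceed the dimension.

```python
import numpy as np


def hard_threshold(v, s):
    """HT_s: keep the s largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argpartition(np.abs(v), -s)[-s:]
        out[keep] = v[keep]
    return out


def sparse_polyak_iht(f, grad, f_hat, s, theta0, T):
    """Run T iterations of IHT with the Sparse Polyak step size."""
    theta = hard_threshold(np.asarray(theta0, dtype=float), s)
    for _ in range(T):
        g = grad(theta)
        # Squared norm of the s largest gradient entries, scaled by 5
        # as in the listing.
        denom = 5.0 * np.sum(hard_threshold(g, s) ** 2)
        if denom == 0.0:  # guard added for this sketch; not in the listing
            break
        gamma = max(f(theta) - f_hat, 0.0) / denom
        theta = hard_threshold(theta - gamma * g, s)
    return theta


# Hypothetical demo: noiseless sparse least squares, so f_hat = 0.
rng = np.random.default_rng(0)
n, d, s_true, s = 200, 1000, 5, 15
X = rng.standard_normal((n, d))
theta_star = np.zeros(d)
theta_star[:s_true] = 1.0
y = X @ theta_star

f = lambda th: 0.5 * np.mean((X @ th - y) ** 2)
grad = lambda th: X.T @ (X @ th - y) / n

theta = sparse_polyak_iht(f, grad, f_hat=0.0, s=s, theta0=np.zeros(d), T=300)
print("nonzeros:", np.count_nonzero(theta))
print("estimation error:", np.linalg.norm(theta - theta_star))
```

The only difference from a classical Polyak step is the denominator: the gradient norm is taken over its s largest entries rather than over all d coordinates, so the step size does not shrink as the ambient dimension grows.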
Corollary 1 (Support Recovery):
Under the signal-to-noise ratio condition |θ̂|_min ≥ 7||HT_s(∇f(θ̂))||/μ̄, the algorithm can accurately recover the support set.
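The condition in Corollary 1 can be checked numerically once its ingredients are available. The helper below is a hypothetical sketch: it reads |θ̂|_min as the smallest nonzero magnitude of θ̂ and treats μ̄ as a user-supplied positive constant, since this summary does not define either quantity precisely. The name `snr_condition_holds` is not from the paper.

```python
import numpy as np


def snr_condition_holds(theta_hat, grad_at_theta_hat, s, mu_bar):
    """Check |theta_hat|_min >= 7 * ||HT_s(grad f(theta_hat))|| / mu_bar.

    Assumptions of this sketch: |theta_hat|_min is the smallest nonzero
    magnitude of theta_hat, and mu_bar > 0 is supplied by the caller.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    g_abs = np.abs(np.asarray(grad_at_theta_hat, dtype=float))
    support = theta_hat != 0
    if not support.any():
        return False
    theta_min = np.min(np.abs(theta_hat[support]))
    # The norm of HT_s(g) is the norm of the s largest-magnitude entries of g.
    top_s_norm = np.sqrt(np.sum(np.sort(g_abs)[-s:] ** 2))
    return bool(theta_min >= 7.0 * top_s_norm / mu_bar)


# Small hand-made example: weak gradient at theta_hat, strong signal.
print(snr_condition_holds([2.0, 0.0, -1.0, 0.0], [0.01, 0.02, 0.0, 0.01], s=2, mu_bar=1.0))
```

A larger gradient at θ̂ (more noise) or a smaller μ̄ makes the right-hand side larger, so weaker nonzero coefficients fail the check.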
Loh & Wainwright (2015) - High-dimensional statistics theory
Malitsky & Mishchenko (2020) - Modern adaptive methods
Overall Assessment: This is a high-quality theoretical paper that proposes an innovative solution to an important problem in high-dimensional optimization. The theoretical analysis is rigorous, experimental validation is comprehensive, and it makes significant contributions to the field. While there are some technical limitations, overall it represents important progress in this research area.