2025-11-10T03:06:05.923380

Revisit First-order Methods for Geodesically Convex Optimization

Shu, Jiang, Shi et al.
In a seminal work of Zhang and Sra, gradient descent methods for geodesically convex optimization were comprehensively studied. In particular, Zhang and Sra derived a comparison inequality that relates the iterative points in the optimization process. Since their seminal work, numerous follow-ups have studied different downstream usages of their comparison lemma. In this work, we introduce the concept of quasilinearization to optimization, presenting a novel framework for analyzing geodesically convex optimization. By leveraging this technique, we establish state-of-the-art convergence rates -- for both deterministic and stochastic settings -- under weaker assumptions than previously required. The technique of quasilinearization may prove valuable for other non-Euclidean optimization problems.
academic

Revisit First-order Methods for Geodesically Convex Optimization

Basic Information

  • Paper ID: 2504.06814
  • Title: Revisit First-order Methods for Geodesically Convex Optimization
  • Authors: Yunlu Shu, Jiaxin Jiang, Lei Shi, Tianyu Wang (Fudan University)
  • Classification: math.OC (Mathematical Optimization and Control)
  • Publication Date: October 16, 2025 (arXiv v4)
  • Paper Link: https://arxiv.org/abs/2504.06814

Abstract

This paper revisits first-order methods for geodesically convex optimization. Zhang and Sra conducted a comprehensive study of gradient descent methods for geodesically convex optimization in their seminal work, particularly deriving comparison inequalities for iterative points in the optimization process. This paper introduces the concept of quasilinearization to the optimization domain and proposes a novel framework for analyzing geodesically convex optimization. By leveraging this technique, state-of-the-art convergence rates are established for both deterministic and stochastic settings under weaker assumptions than previously required. The quasilinearization technique may prove valuable for other non-Euclidean optimization problems.

Research Background and Motivation

Problem Definition

This paper studies optimization problems on Hadamard manifolds: minxMf(x)\min_{x \in M} f(x) where M is a Hadamard manifold equipped with a Riemannian metric g.

Research Motivation

  1. Limitations of Existing Methods: The classical approach by Zhang and Sra relies on two strong assumptions:
    • (A1) Uniform lower bound on sectional curvature (CBB condition)
    • (A2) Prior upper bound on trajectory diameter
  2. Practical Issues: Many important Hadamard manifolds do not satisfy the CBB condition, such as warped product manifolds whose curvature may tend to negative infinity.
  3. Core Challenge: How to maintain state-of-the-art convergence rates while removing assumptions (A1) and (A2)?

Core Contributions

  1. Introduction of Quasilinearization Framework: First application of Berg and Nikolaev's quasilinearization concept to optimization analysis
  2. Removal of Strong Assumptions: Establishing convergence guarantees without requiring curvature lower bounds and bounded domain assumptions
  3. Deterministic Optimization: Achieving O(1/t) convergence rate for geodesically convex functions
  4. Stochastic Optimization: Achieving Õ(1/√t) convergence rate for smooth geodesically convex functions
  5. Theoretical Breakthrough: Providing an affirmative answer to Question (Q), demonstrating that optimal convergence rates can be maintained under weaker assumptions

Methodology Details

Quasilinearization Inner Product

For any two ordered geodesic segments xy\overrightarrow{xy} and zw\overrightarrow{zw} on manifold M, the quasilinearization inner product is defined as:

xy,zw=xyzwcosq(xy,zw)\langle\overrightarrow{xy}, \overrightarrow{zw}\rangle = |\overrightarrow{xy}||\overrightarrow{zw}|\cos_q(\overrightarrow{xy}, \overrightarrow{zw})

where: cosq(xy,zw)=xw2+yz2xz2yw22xyzw\cos_q(\overrightarrow{xy}, \overrightarrow{zw}) = \frac{|\overrightarrow{xw}|^2 + |\overrightarrow{yz}|^2 - |\overrightarrow{xz}|^2 - |\overrightarrow{yw}|^2}{2|\overrightarrow{xy}||\overrightarrow{zw}|}

Quasi-Convexity Definition

A function f is q-convex if: f(x)f(y)+yExpy(gradf(y)),yx+μ2d2(x,y)f(x) \geq f(y) + \langle\overrightarrow{y\text{Exp}_y(\text{grad}f(y))}, \overrightarrow{yx}\rangle + \frac{\mu}{2}d^2(x,y)

Proximal Gradient Algorithm

The core algorithm employs implicit proximal updates: xt=Expxt+1(ηgradf(xt+1))x_t = \text{Exp}_{x_{t+1}}(\eta \text{grad}f(x_{t+1}))

equivalent to solving: xt+1=argminz{f(z)+12ηd(xt,z)2}x_{t+1} = \arg\min_z \left\{f(z) + \frac{1}{2\eta}d(x_t, z)^2\right\}

Theoretical Analysis

Main Theorems

Theorem 1 (Deterministic Case): Let f be a geodesically convex function on Hadamard manifold M. The proximal gradient algorithm satisfies: f(xt)f(x)x0x2ηtf(x_t) - f(x^*) \leq \frac{|\overrightarrow{x_0x^*}|^2}{\eta t}

Theorem 2 (Stochastic Case): Under bounded variance assumptions, the stochastic proximal gradient algorithm with step size ηt=12Lt\eta_t = \frac{1}{2L\sqrt{t}} satisfies: 1t=1Tαtt=1Tαt(EF(xt)F(x))x0x22t=1Tαt+σ2log(T+1)t=1Tαt\frac{1}{\sum_{t=1}^T \alpha_t}\sum_{t=1}^T \alpha_t(\mathbb{E}F(x_t) - F(x^*)) \leq \frac{|\overrightarrow{x_0x^*}|^2}{2\sum_{t=1}^T \alpha_t} + \frac{\sigma^2 \log(T+1)}{\sum_{t=1}^T \alpha_t}

Key Technical Insights

  1. Advantages of Quasilinearization:
    • Applicable to all Hadamard manifolds without requiring curvature lower bounds
    • Preserves algebraic properties similar to Euclidean spaces
    • Naturally compatible with geodesic convexity
  2. Proof Techniques:
    • Establishing relationships between quasilinearization inner products and standard inner products via Lemma 2
    • Handling iterative inequalities through telescoping summation techniques
    • Cleverly circumventing limitations of traditional triangle comparison theorems

Experimental Setup and Results

Comparative Analysis

The paper provides a tabular comparison of different methods' assumptions and convergence rates:

MethodRequires Curvature Lower Bound?Requires Bounded Domain?Requires Solving Complex Equations?Convergence Rate
Zhang & SraYesYesNoO(t⁻¹)
Liu et al.NoYesYesO†(t⁻²)
This WorkNoNoNoO(t⁻¹)

Implementation Details

The paper provides efficient implementation methods for the proximal operator:

  • Solving strongly geodesically convex subproblems via gradient descent
  • Warm-start strategy to improve computational efficiency
  • Convergence guarantee: f(zt)f(z)(1μ4L0)t(f(z0)f(z))f(z_t) - f(z^*) \leq (1-\frac{\mu}{4L_0})^t(f(z_0) - f(z^*))

Historical Development

  1. Classical Work: Bonnabel (2013) and Zhang & Sra (2016) established the foundational analytical framework
  2. Subsequent Research: Multiple works attempted to relax assumptions but none fully resolved Question (Q)
  3. Technical Trajectory: Traditional methods rely on Toponogov triangle comparison theorem; this paper opens a new path via quasilinearization

Technical Comparison

  • Traditional Methods: Dependent on sectional curvature lower bounds, complex analysis
  • This Work: Utilizing quasilinearization, weaker assumptions, simpler analysis
  • Computational Complexity: Avoids solving complex equations involving Exp⁻¹ and Γ

Conclusions and Discussion

Main Conclusions

  1. Successfully resolves a decade-long open problem Question (Q)
  2. Establishes optimal convergence rates under the weakest assumptions
  3. Provides new analytical tools for non-Euclidean optimization

Limitations

  1. Still requires Hadamard manifold structure (non-positive curvature)
  2. Proximal operator may require numerical solution
  3. Constant factors may not be optimal

Future Directions

  1. Extension to non-smooth optimization problems
  2. Investigation of acceleration methods
  3. Application to specific machine learning problems

In-Depth Evaluation

Strengths

  1. Theoretical Breakthrough: Resolves an important open problem in the field
  2. Methodological Innovation: Introduction of quasilinearization technique is pioneering
  3. Weakest Assumptions: Achieves optimal convergence rates under minimal assumptions
  4. Elegant Analysis: Proofs are more direct compared to traditional methods

Weaknesses

  1. Insufficient Experimental Validation: Lacks numerical experiments to verify theoretical results
  2. Limited Application Scenarios: Primarily focuses on theoretical analysis with insufficient practical application demonstrations
  3. Constant Analysis: Does not provide precise estimates of convergence constants

Impact

  1. Theoretical Value: Makes significant contributions to Riemannian optimization theory
  2. Methodological Significance: Quasilinearization technique may influence broader non-Euclidean optimization
  3. Practical Potential: Provides stronger theoretical guarantees for manifold optimization in practical applications

Applicable Scenarios

  1. Manifold-constrained optimization in machine learning
  2. Geodesic problems in computational geometry
  3. Numerical solution of partial differential equations
  4. Equilibrium computation in economics

References

The paper cites 61 related references, primarily including:

  • Berg & Nikolaev (2008): Original work on quasilinearization
  • Zhang & Sra (2016): Classical analysis of geodesically convex optimization
  • Bonnabel (2013): Stochastic gradient descent on Riemannian manifolds
  • Multiple recent works on Hadamard manifold optimization

Overall Assessment: This is a high-quality theoretical paper that successfully resolves an important open problem in geodesically convex optimization through the introduction of quasilinearization techniques. While lacking numerical experiments, its theoretical contributions are significant, providing new analytical frameworks and tools for non-Euclidean optimization.