2025-11-10T03:06:05.923380

Revisit First-order Methods for Geodesically Convex Optimization

Shu, Jiang, Shi et al.

In a seminal work of Zhang and Sra, gradient descent methods for geodesically convex optimization were comprehensively studied. In particular, Zhang and Sra derived a comparison inequality that relates the iterative points in the optimization process. Since their seminal work, numerous follow-ups have studied different downstream usages of their comparison lemma. In this work, we introduce the concept of quasilinearization to optimization, presenting a novel framework for analyzing geodesically convex optimization. By leveraging this technique, we establish state-of-the-art convergence rates -- for both deterministic and stochastic settings -- under weaker assumptions than previously required. The technique of quasilinearization may prove valuable for other non-Euclidean optimization problems.

academic

Revisit First-order Methods for Geodesically Convex Optimization

Basic Information

Paper ID: 2504.06814
Title: Revisit First-order Methods for Geodesically Convex Optimization
Authors: Yunlu Shu, Jiaxin Jiang, Lei Shi, Tianyu Wang (Fudan University)
Classification: math.OC (Mathematical Optimization and Control)
Publication Date: October 16, 2025 (arXiv v4)
Paper Link: https://arxiv.org/abs/2504.06814

Abstract

This paper revisits first-order methods for geodesically convex optimization. Zhang and Sra conducted a comprehensive study of gradient descent methods for geodesically convex optimization in their seminal work, particularly deriving comparison inequalities for iterative points in the optimization process. This paper introduces the concept of quasilinearization to the optimization domain and proposes a novel framework for analyzing geodesically convex optimization. By leveraging this technique, state-of-the-art convergence rates are established for both deterministic and stochastic settings under weaker assumptions than previously required. The quasilinearization technique may prove valuable for other non-Euclidean optimization problems.

Research Background and Motivation

Problem Definition

This paper studies optimization problems on Hadamard manifolds: $\min_{x \in M} f(x)$ where M is a Hadamard manifold equipped with a Riemannian metric g.

Research Motivation

Limitations of Existing Methods: The classical approach by Zhang and Sra relies on two strong assumptions:
- (A1) Uniform lower bound on sectional curvature (CBB condition)
- (A2) Prior upper bound on trajectory diameter
Practical Issues: Many important Hadamard manifolds do not satisfy the CBB condition, such as warped product manifolds whose curvature may tend to negative infinity.
Core Challenge: How to maintain state-of-the-art convergence rates while removing assumptions (A1) and (A2)?

Core Contributions

Introduction of Quasilinearization Framework: First application of Berg and Nikolaev's quasilinearization concept to optimization analysis
Removal of Strong Assumptions: Establishing convergence guarantees without requiring curvature lower bounds and bounded domain assumptions
Deterministic Optimization: Achieving O(1/t) convergence rate for geodesically convex functions
Stochastic Optimization: Achieving Õ(1/√t) convergence rate for smooth geodesically convex functions
Theoretical Breakthrough: Providing an affirmative answer to Question (Q), demonstrating that optimal convergence rates can be maintained under weaker assumptions

Methodology Details

Quasilinearization Inner Product

For any two ordered geodesic segments $\overrightarrow{xy}$ and $\overrightarrow{zw}$ on manifold M, the quasilinearization inner product is defined as:

$\langle\overrightarrow{xy}, \overrightarrow{zw}\rangle = |\overrightarrow{xy}||\overrightarrow{zw}|\cos_q(\overrightarrow{xy}, \overrightarrow{zw})$

where: $\cos_q(\overrightarrow{xy}, \overrightarrow{zw}) = \frac{|\overrightarrow{xw}|^2 + |\overrightarrow{yz}|^2 - |\overrightarrow{xz}|^2 - |\overrightarrow{yw}|^2}{2|\overrightarrow{xy}||\overrightarrow{zw}|}$

Quasi-Convexity Definition

A function f is q-convex if: $f(x) \geq f(y) + \langle\overrightarrow{y\text{Exp}_y(\text{grad}f(y))}, \overrightarrow{yx}\rangle + \frac{\mu}{2}d^2(x,y)$

Proximal Gradient Algorithm

The core algorithm employs implicit proximal updates: $x_t = \text{Exp}_{x_{t+1}}(\eta \text{grad}f(x_{t+1}))$

equivalent to solving: $x_{t+1} = \arg\min_z \left\{f(z) + \frac{1}{2\eta}d(x_t, z)^2\right\}$

Theoretical Analysis

Main Theorems

Theorem 1 (Deterministic Case): Let f be a geodesically convex function on Hadamard manifold M. The proximal gradient algorithm satisfies: $f(x_t) - f(x^*) \leq \frac{|\overrightarrow{x_0x^*}|^2}{\eta t}$

Theorem 2 (Stochastic Case): Under bounded variance assumptions, the stochastic proximal gradient algorithm with step size $\eta_t = \frac{1}{2L\sqrt{t}}$ satisfies: $\frac{1}{\sum_{t=1}^T \alpha_t}\sum_{t=1}^T \alpha_t(\mathbb{E}F(x_t) - F(x^*)) \leq \frac{|\overrightarrow{x_0x^*}|^2}{2\sum_{t=1}^T \alpha_t} + \frac{\sigma^2 \log(T+1)}{\sum_{t=1}^T \alpha_t}$

Key Technical Insights

Advantages of Quasilinearization:
- Applicable to all Hadamard manifolds without requiring curvature lower bounds
- Preserves algebraic properties similar to Euclidean spaces
- Naturally compatible with geodesic convexity
Proof Techniques:
- Establishing relationships between quasilinearization inner products and standard inner products via Lemma 2
- Handling iterative inequalities through telescoping summation techniques
- Cleverly circumventing limitations of traditional triangle comparison theorems

Experimental Setup and Results

Comparative Analysis

The paper provides a tabular comparison of different methods' assumptions and convergence rates:

Method	Requires Curvature Lower Bound?	Requires Bounded Domain?	Requires Solving Complex Equations?	Convergence Rate
Zhang & Sra	Yes	Yes	No	O(t⁻¹)
Liu et al.	No	Yes	Yes	O†(t⁻²)
This Work	No	No	No	O(t⁻¹)

Implementation Details

The paper provides efficient implementation methods for the proximal operator:

Solving strongly geodesically convex subproblems via gradient descent
Warm-start strategy to improve computational efficiency
Convergence guarantee: $f(z_t) - f(z^*) \leq (1-\frac{\mu}{4L_0})^t(f(z_0) - f(z^*))$

Historical Development

Classical Work: Bonnabel (2013) and Zhang & Sra (2016) established the foundational analytical framework
Subsequent Research: Multiple works attempted to relax assumptions but none fully resolved Question (Q)
Technical Trajectory: Traditional methods rely on Toponogov triangle comparison theorem; this paper opens a new path via quasilinearization

Technical Comparison

Traditional Methods: Dependent on sectional curvature lower bounds, complex analysis
This Work: Utilizing quasilinearization, weaker assumptions, simpler analysis
Computational Complexity: Avoids solving complex equations involving Exp⁻¹ and Γ

Conclusions and Discussion

Main Conclusions

Successfully resolves a decade-long open problem Question (Q)
Establishes optimal convergence rates under the weakest assumptions
Provides new analytical tools for non-Euclidean optimization

Limitations

Still requires Hadamard manifold structure (non-positive curvature)
Proximal operator may require numerical solution
Constant factors may not be optimal

Future Directions

Extension to non-smooth optimization problems
Investigation of acceleration methods
Application to specific machine learning problems

In-Depth Evaluation

Strengths

Theoretical Breakthrough: Resolves an important open problem in the field
Methodological Innovation: Introduction of quasilinearization technique is pioneering
Weakest Assumptions: Achieves optimal convergence rates under minimal assumptions
Elegant Analysis: Proofs are more direct compared to traditional methods

Weaknesses

Insufficient Experimental Validation: Lacks numerical experiments to verify theoretical results
Limited Application Scenarios: Primarily focuses on theoretical analysis with insufficient practical application demonstrations
Constant Analysis: Does not provide precise estimates of convergence constants

Impact

Theoretical Value: Makes significant contributions to Riemannian optimization theory
Methodological Significance: Quasilinearization technique may influence broader non-Euclidean optimization
Practical Potential: Provides stronger theoretical guarantees for manifold optimization in practical applications

Applicable Scenarios

Manifold-constrained optimization in machine learning
Geodesic problems in computational geometry
Numerical solution of partial differential equations
Equilibrium computation in economics

References

The paper cites 61 related references, primarily including:

Berg & Nikolaev (2008): Original work on quasilinearization
Zhang & Sra (2016): Classical analysis of geodesically convex optimization
Bonnabel (2013): Stochastic gradient descent on Riemannian manifolds
Multiple recent works on Hadamard manifold optimization

Overall Assessment: This is a high-quality theoretical paper that successfully resolves an important open problem in geodesically convex optimization through the introduction of quasilinearization techniques. While lacking numerical experiments, its theoretical contributions are significant, providing new analytical frameworks and tools for non-Euclidean optimization.