2025-11-17T22:04:13.678417

A Stochastic Algorithm for Searching Saddle Points with Convergence Guarantee

Shi, Zhang, Du

Saddle points provide a hierarchical view of the energy landscape, revealing transition pathways and interconnected basins of attraction, and offering insight into the global structure, metastability, and possible collective mechanisms of the underlying system. In this work, we propose a stochastic saddle-search algorithm to circumvent exact derivative and Hessian evaluations that have been used in implementing traditional and deterministic saddle dynamics. At each iteration, the algorithm uses a stochastic eigenvector-search method, based on a stochastic Hessian, to approximate the unstable directions, followed by a stochastic gradient update with reflections in the approximate unstable direction to advance toward the saddle point. We carry out rigorous numerical analysis to establish the almost sure convergence for the stochastic eigenvector search and local almost sure convergence with an $O(1/n)$ rate for the saddle search, and present a theoretical guarantee to ensure the high-probability identification of the saddle point when the initial point is sufficiently close. Numerical experiments, including the application to a neural network loss landscape and a Landau-de Gennes type model for nematic liquid crystal, demonstrate the practical applicability and the ability for escaping from "bad" areas of the algorithm.

academic

A Stochastic Algorithm for Searching Saddle Points with Convergence Guarantee

Basic Information

Paper ID: 2510.14144
Title: A Stochastic Algorithm for Searching Saddle Points with Convergence Guarantee
Authors: Baoming Shi (Columbia University), Lei Zhang (Peking University), Qiang Du (Columbia University)
Classification: math.NA, cs.NA (Numerical Analysis)
Publication Date: October 15, 2024
Paper Link: https://arxiv.org/abs/2510.14144

Abstract

Saddle points provide a hierarchical perspective on energy landscapes, revealing transition pathways and interconnected basins of attraction, offering insights into understanding the global structure, metastability, and possible collective mechanisms of systems. This paper proposes a stochastic saddle point search algorithm that avoids exact derivative and Hessian matrix evaluations required in traditional deterministic saddle point dynamics. The algorithm employs a stochastic eigenvector search method based on random Hessian at each iteration to approximate unstable directions, then advances toward the saddle point through stochastic gradient updates with reflections along the approximate unstable directions. The authors conduct rigorous numerical analysis, establishing almost sure convergence of the stochastic eigenvector search and local almost sure convergence of saddle point search with convergence rate O(1/n), and provide theoretical guarantees to ensure high-probability identification of saddle points when the initial point is sufficiently close.

Research Background and Motivation

Problem Background

Saddle point search is of significant importance in multiple scientific domains, including:

Materials Science and Chemistry: Understanding critical nucleation in phase transitions and transition pathways
Liquid Crystal Physics: Analyzing defect configurations
Biology: Protein folding research
Deep Learning: Neural network loss landscape analysis

Limitations of Existing Methods

Traditional saddle point search algorithms fall into two main categories:

Path-Finding Methods: Such as the string method, searching for minimum energy paths
Surface Walking Methods: Such as gentlest ascent dynamics, dimer method, and high-index saddle point dynamics (HiSD)

The main limitations of these methods include:

Requiring exact computation of gradients and Hessian matrices, which is computationally expensive
Gradients/Hessians being unavailable or difficult to obtain in certain applications
Lacking rigorous theoretical analysis for stochastic versions

Research Motivation

This paper aims to develop a stochastic saddle point search algorithm that can:

Avoid exact derivative and Hessian evaluations
Provide rigorous convergence theoretical guarantees
Demonstrate good performance and escape capability in practical applications

Core Contributions

First Proposal of a stochastic saddle point search algorithm with convergence guarantees, filling a theoretical gap in this field
Establishment of a Complete Theoretical Framework:
- Almost sure convergence of stochastic eigenvector search
- Local almost sure convergence of saddle point search with convergence rate O(1/n)
- Theoretical guarantees for high-probability convergence
Provision of Multiple Convergence Results:
- Global convergence when unstable space is known
- Local convergence when unstable space is unknown
- Convergence analysis under inexact eigenvector computation
Verification of Algorithm Practicality: Demonstrating algorithm effectiveness through practical applications such as neural network loss landscapes and liquid crystal models

Method Details

Problem Definition

Given an objective function $f(x): \mathbb{R}^d \to \mathbb{R}$ , find its index-k saddle point $x^*$ satisfying:

$\nabla f(x^*) = 0$
$\nabla^2 f(x^*)$ has k negative eigenvalues and (d-k) positive eigenvalues

Algorithm Architecture

1. Case with Known Unstable Space

For problems with convex-concave structure: $\min_{x_{V^⊥} \in V^⊥} \max_{x_V \in V} f(x_V + x_{V^⊥})$

The stochastic saddle point dynamics are:

x_V(n+1) = x_V(n) + \alpha(n)P_V\nabla f(x_V(n) + x_{V^⊥}(n);\omega(n)) \\ x_{V^⊥}(n+1) = x_{V^⊥}(n) - \alpha(n)(I-P_V)\nabla f(x_V(n) + x_{V^⊥}(n);\omega(n)) \end{cases}$$ where $P_V = \sum_{i=1}^k v_i v_i^T$ is the orthogonal projection onto the unstable subspace V. #### 2. Case with Unknown Unstable Space The algorithm contains two main components: **Stochastic Eigenvector Search**: $$\hat{v}(n+1) = v(n) - \alpha(n)(I-v(n)v(n)^T)H(\omega(n))v(n)$$ $$v(n+1) = \frac{\hat{v}(n+1)}{\|\hat{v}(n+1)\|_2}$$ **Stochastic Saddle Point Update**: $$x(n+1) = x(n) - \alpha(n)P_{\tilde{V}}(x(n))\nabla f(x(n);\omega(n))$$ where $P_{\tilde{V}} = I - 2\sum_{i=1}^k \tilde{v}_i\tilde{v}_i^T$, and $\{\tilde{v}_i\}$ are approximate unstable eigenvectors. ### Technical Innovations 1. **Stochastic Eigenvector Search**: Extends classical stochastic PCA methods to handle repeated negative eigenvalues 2. **Projection Operator Design**: Cleverly combines ascent and descent directions to achieve saddle point search 3. **Theoretical Analysis Framework**: Establishes a complete theoretical system for convergence of stochastic algorithms 4. **Fault Tolerance**: Algorithm is robust to inexact eigenvector computations ## Experimental Setup ### Datasets and Test Problems 1. **Müller-Brown Potential**: Two-dimensional chemical potential function, standard saddle point search benchmark 2. **Butterfly Energy Landscape**: Tests algorithm's ability to escape "bad" regions 3. **Neural Network Loss Landscape**: Linear neural networks, depth H=5, dimensions dx=10, dy=4 4. **Landau-de Gennes Energy Functional**: Nematic liquid crystal model, discretized via finite differences ### Evaluation Metrics - Convergence error: $\|x(n) - x^*\|_2^2$ - Gradient norm: $\|\nabla f(x(n))\|_2^2$ - Convergence rate verification ### Implementation Details - Step size strategy: $\alpha(n) = \gamma/(n+m)^p$, where $p \in (1/2, 1]$ - Stochastic gradient: Gaussian perturbation $\nabla f(x;\omega) = \nabla f(x) + \sigma\xi$, $\xi \sim N(0,I)$ - Tolerance settings: $\epsilon_v$ for eigenvector search, $\epsilon_x$ for saddle point search ## Experimental Results ### Main Results #### Müller-Brown Potential Experiment - Using decaying step size $\alpha(n) = 0.01/(n+100)$, the algorithm converges to the target saddle point - From iteration $10^2$ to $10^5$, error decreases from $10^{-3}$ to $10^{-6}$, verifying O(1/n) convergence rate - Constant step size leads to oscillations without exact convergence #### Butterfly Energy Landscape - Stochastic algorithm successfully escapes the basin boundary that deterministic algorithms cannot cross - Demonstrates the ability of stochastic noise to help the algorithm explore a broader space #### Neural Network Loss Landscape - Successfully locates degenerate saddle points with 16 negative eigenvalues - Performs well across different dataset scales (N=100 and N=10000) - Verifies algorithm effectiveness in high-dimensional degenerate cases #### Landau-de Gennes Model - Successfully finds index-1 boundary twist saddle point connecting two stable diagonal states - Observes empirical convergence rate faster than theoretical O(1/n) - Demonstrates practical benefits of variance reduction effects ### Convergence Verification All experiments verify the theoretically predicted O(1/n) convergence rate, with some cases exhibiting faster convergence due to variance reduction effects. ## Theoretical Analysis ### Convergence Theorems #### Theorem 1: Global Convergence with Known Unstable Space Under strong convex-concave assumptions, the stochastic saddle point search algorithm converges almost surely to the unique saddle point. #### Theorem 2: Stochastic Eigenvector Search Convergence Under appropriate assumptions, the limit point of stochastic eigenvector search almost surely lies in the eigenspace of the Hessian matrix. #### Theorem 3: Local High-Probability Convergence When the initial point is sufficiently close to the target saddle point and step size is sufficiently small, the algorithm converges to the saddle point with high probability at convergence rate O(1/n). ### Key Assumptions 1. **Regularity Assumption**: $\nabla f$ is Lipschitz continuous and bounded 2. **Unbiasedness Assumption**: $E[\nabla f(x,\omega)] = \nabla f(x)$ 3. **Local Property Assumption**: Hessian eigenvalues satisfy gap conditions in the neighborhood of the saddle point ## Related Work ### Deterministic Saddle Point Search Methods - **String Method**: Searches for minimum energy paths - **Dimer Method**: Uses two-point approximation to estimate unstable directions - **High-Index Saddle Point Dynamics (HiSD)**: Simultaneously searches for multiple unstable directions ### Stochastic Optimization Theory - **Stochastic Gradient Descent (SGD)**: Primarily focuses on minimization problems - **Stochastic PCA Methods**: Stochastic approximation of principal component analysis - **Saddle Point Escape Theory**: Theoretical analysis of SGD avoiding saddle points ### Innovations of This Paper 1. First to provide rigorous convergence analysis for stochastic saddle point search 2. Addresses the challenging problem of unknown unstable directions 3. Establishes a complete theoretical framework from local to global convergence ## Conclusions and Discussion ### Main Conclusions 1. Proposes the first stochastic saddle point search algorithm with convergence guarantees 2. Establishes complete convergence theory from global to local convergence 3. Verifies algorithm effectiveness in multiple practical applications 4. Demonstrates advantages of stochasticity in escaping "bad" regions ### Limitations 1. **Local Convergence**: Only guarantees local convergence for general objective functions 2. **Initial Condition Requirements**: Requires initial point sufficiently close to target saddle point 3. **Parameter Tuning**: Requires careful selection of step size and tolerance parameters 4. **Computational Complexity**: Although avoiding exact Hessian computation, still requires multiple eigenvector searches ### Future Directions 1. **Nonlinear Constraints**: Extension to saddle point search on manifolds 2. **Convergence Rate Improvement**: Study adaptive step sizes and variance reduction techniques 3. **Global Convergence**: Explore global convergence in more general cases 4. **Parallelization**: Develop parallel versions for ultra-high-dimensional problems ## In-Depth Evaluation ### Strengths 1. **Outstanding Theoretical Contribution**: Fills the gap in theoretical analysis of stochastic saddle point search 2. **Clever Method Design**: Ingeniously combines stochastic eigenvector search and gradient reflection 3. **Rigorous and Complete Analysis**: Complete theoretical system from simple to complex cases 4. **Sufficient Experimental Verification**: Covers practical applications across multiple domains 5. **Clear Writing**: Clear logical structure and accurate mathematical exposition ### Weaknesses 1. **Limited Practicality**: Local convergence property restricts algorithm applicability 2. **Parameter Sensitivity**: Algorithm performance is relatively sensitive to parameter selection 3. **Computational Overhead**: Eigenvector search still incurs certain computational cost 4. **Convergence Radius**: Theoretical convergence radius may be relatively small ### Impact 1. **Academic Value**: Establishes foundation for stochastic saddle point search theory 2. **Application Prospects**: Has application potential in machine learning, materials science, and other fields 3. **Methodological Contribution**: Provides theoretical framework for analyzing stochastic saddle point algorithms 4. **Follow-up Research**: Provides foundation for further improvements and extensions ### Applicable Scenarios 1. **High-Dimensional Optimization**: Saddle point analysis in neural network training 2. **Physical Simulation**: Phase transition research in materials science 3. **Chemical Computation**: Molecular reaction pathway calculation 4. **Engineering Applications**: Critical point analysis in structural optimization ## References The paper cites 75 relevant references, covering important works in saddle point search, stochastic optimization, numerical analysis, and other domains, providing a solid theoretical foundation for the research. --- **Overall Evaluation**: This is a high-quality theoretical numerical analysis paper that provides for the first time rigorous convergence analysis for stochastic saddle point search. While limited by local convergence, its theoretical contributions and methodological innovations possess significant academic value and application prospects.