2025-11-23T20:28:23.967320

Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization

Xu, Li

Wasserstein gradient flows have become a central tool for optimization problems over probability measures. A natural numerical approach is forward-Euler time discretization. We show, however, that even in the simple case where the energy functional is the Kullback-Leibler (KL) divergence against a smooth target density, forward-Euler can fail dramatically: the scheme does not converge to the gradient flow, despite the fact that the first variation $\nabla\frac{Î´F}{Î´Ï}$ remains formally well defined at every step. We identify the root cause as a loss of regularity induced by the discretization, and prove that a suitable regularization of the functional restores the necessary smoothness, making forward-Euler a viable solver that converges in discrete time to the global minimizer.

academic

Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization

Basic Information

Paper ID: 2509.13260
Title: Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization
Authors: Yewei Xu, Qin Li (University of Wisconsin-Madison)
Classification: math.NA cs.NA math.OC
Publication Date: 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2509.13260

Abstract

Wasserstein gradient flows have become a fundamental tool for optimization problems on probability measures. Forward Euler time discretization is a natural numerical method. However, this paper proves that even in the simple case where the energy functional is the Kullback-Leibler (KL) divergence with respect to a smooth target density, the forward Euler method fails dramatically: the scheme does not converge to the gradient flow, despite the first variation $\nabla\frac{\delta F}{\delta \rho}$ remaining formally well-defined at each step. The authors identify the root cause as regularity loss induced by discretization and prove that appropriate regularization of the functional can restore the necessary smoothness, making forward Euler a viable solver that converges to the global minimum in discrete time.

Research Background and Motivation

Problem Background

Optimization on Probability Measure Spaces: Minimizing functionals $F[\rho]$ on probability measure spaces $P(Ω)$ arises widely in machine learning and statistical physics
Wasserstein Gradient Flows: Analogous to gradient descent in Euclidean spaces, gradient flows under the Wasserstein metric provide a natural framework for probability measure optimization
Numerical Implementation Challenges: Numerical solution of gradient flow PDEs requires time discretization, with forward Euler being the most intuitive choice

Core Problem

Although forward Euler performs well for classical PDEs, does it remain effective for Wasserstein gradient flows? Particularly for fundamental functionals such as KL divergence.

Research Motivation

Forward Euler is widely used in engineering applications due to its simplicity
Existing theoretical analyses focus primarily on implicit methods (e.g., JKO schemes)
Lack of deep understanding of failure mechanisms for explicit methods

Core Contributions

Theoretical Finding: Proves structural incompatibility of forward Euler with Wasserstein gradient flows
Failure Mechanism: Identifies regularity loss as the fundamental cause of method failure
Counterexample Construction: Provides two concrete counterexamples demonstrating qualitative and quantitative failures of forward Euler
Regularization Solution: Proposes regularized KL functional that restores forward Euler's validity
Convergence Guarantees: Proves convergence and error bounds for the regularized method

Methodology Details

Problem Formulation

Consider the optimization problem on probability measure spaces: $\rho_{opt} = \arg\min_{\rho \in P(Ω)} F[\rho]$

The corresponding Wasserstein gradient flow is: $\partial_t \rho_t = \nabla \cdot \left(\rho_t \nabla \frac{\delta F}{\delta \rho}\bigg|_{\rho_t}\right)$

Forward Euler discretization: $\rho^{n+1} = (T_n)_\# \rho^n, \quad T_n(x) = x - h_n \nabla \frac{\delta F}{\delta \rho}\bigg|_{\rho^n}(x)$

Regularity Theory Framework

Three Concepts of Differentiability

First Variation (FV): Derivative in the linear measure space
Wasserstein Differentiability (W-differentiability): Geometric derivative based on W₂ metric
Lions Differentiability (L-differentiability): Derivative defined through random variable lifting

Regularity Hierarchy

$\text{Smooth FV} \Rightarrow \text{Continuous L-differentiability} \Rightarrow \text{W-differentiability}$

Key Observation: $S_F^W \subset S_F^f$ , i.e., there exist $\rho \in S_F^f \setminus S_F^W$ where the first variation is computable but not W-differentiable.

Failure Mechanism Analysis

Regularity Loss Theorem

Theorem 3.4: Let $F[\rho] = KL[\rho|e^{-U}]$ with $U \in C^∞$ . If $\rho_0 = e^{-V_0}$ and $V_0 \in C^{m+2}$ , then after one forward Euler step, $V_1 \in C^m$ , i.e., two derivatives are lost.

Counterexample Construction

Counterexample 1 (Non-injectivity): Target distribution $\rho^* = e^{-U}$ with $U(x) = \frac{x^2}{2} + \frac{x^4}{4}$ , initial distribution as standard Gaussian. Non-injectivity of the pushforward map $T(x) = x - hx^3$ leads to density discontinuities.

Counterexample 2 (Derivative Depletion): Piecewise initial distribution produces jump discontinuities after forward Euler step, with KL divergence bounded below by $> 0.019$ .

Regularization Solution

Regularized KL Functional

$F^ε[\rho] = KL^ε[\rho|\rho^*] = \int_C \left(U(x) + \ln((φ_ε * \rho)(x))\right) d\rho(x)$

where $φ_ε(x) = \exp(-\frac{\|x\|_2^2}{2ε})$ is a Gaussian kernel.

Smoothness Recovery

Theorem 4.3: Under Assumption 4.1, $F^ε$ is both L-differentiable and W-differentiable on $P_2(C)$ , with uniform gradients: $\nabla_W F^ε[\rho] = \partial_ρ F^ε[\rho] = \nabla \frac{\delta F^ε}{\delta \rho}\bigg|_ρ$

Projected Gradient Descent

$\rho^{n+1} = \text{proj}_C\left(\left(\text{Id} - h_n \nabla \frac{\delta F^ε}{\delta \rho}\bigg|_{\rho^n}\right)_\# \rho^n\right)$

Experimental Setup

Theoretical Verification Experiments

Counterexample 2 Numerical Verification: Computing KL divergence evolution using explicit formulas
Step Size Independence: Testing three step sizes $h = 0.1, 0.01, 0.001$
Convergence Lower Bound: Verifying theoretical lower bound of 0.019

Regularized Method Experiments

Computational Domain: Ball domain $C = B_3(0) \subset \mathbb{R}^2$
Target Potential: Correlated quadratic form $U(x) = \frac{1}{2}x^⊤Ax$
Particle Count: $N = 2000$
Regularization Parameter: $ε = 0.1$
Step Size: $h = 0.05$ , 100 iterations

Experimental Results

Forward Euler Failure Verification

KL divergence exhibits nearly identical behavior across different step sizes, confirming step-size-independent failure
Numerical results align with theoretical lower bound of 0.019
Confirms structural failure of forward Euler

Regularized Method Performance

Energy decreases monotonically, consistent with theoretical predictions
Initial exponential convergence validates strong convexity property
Particle distribution successfully converges to target distribution
Method remains within constraint domain

Key Findings

Regularity loss is the fundamental cause of forward Euler failure, not numerical error
Regularization effectively restores necessary smoothness
Projected gradient descent performs stably on bounded domains

Wasserstein Gradient Flow Theory

Foundational Theory: Pioneering work by Ambrosio-Gigli-Savaré establishing theoretical framework
Implicit Methods: JKO scheme and its Γ-convergence properties
Explicit Methods: λ-dissipative framework by Cavagnari-Savaré-Sodini

Numerical Methods

Particle Methods: Interacting particle systems and ensemble methods
Blob Methods: Density estimation techniques related to this paper's regularization scheme
Variational Methods: Numerical solution based on optimal transport

Positioning of This Work

This paper fills the gap in theoretical analysis of explicit methods, particularly providing deep understanding of forward Euler failure mechanisms.

Conclusions and Discussion

Main Conclusions

Forward Euler method exhibits structural incompatibility with Wasserstein gradient flows
Regularity loss is the fundamental cause of failure
Appropriate functional regularization can restore method validity

Limitations

Discretization Error: Rigorous O(h) accuracy error analysis remains to be established
Regularization Parameter: Relationship between minimizers of $F^ε$ and original KL requires further investigation
Convexity Loss: Regularization may destroy geodesic convexity of original functional

Future Directions

Establish complete error analysis for regularized method
Study convergence as regularization parameter $ε \to 0$
Extend to more general classes of functionals

In-Depth Evaluation

Strengths

Theoretical Depth: Deeply reveals essential mechanisms of numerical method failure
Counterexample Construction: Provides concrete, verifiable failure cases
Solution Provided: Not only identifies problems but provides effective solutions
Mathematical Rigor: Rigorous theoretical analysis with complete proofs

Weaknesses

Practical Limitations: Regularized method primarily applicable to bounded domains
Parameter Selection: Lacks guidance for choosing regularization parameters
Computational Complexity: Does not discuss additional computational costs from regularization

Impact

Theoretical Contribution: Provides important theoretical insights for numerical methods of Wasserstein gradient flows
Practical Value: Offers solutions to numerical stability issues in practical applications
Methodology: Establishes theoretical framework for analyzing such problems

Applicable Scenarios

Probability measure optimization problems
Distribution learning in machine learning
Non-equilibrium evolution in statistical physics
Optimal transport applications in image processing and computer vision

References

This paper cites 41 relevant references covering important works in optimal transport theory, Wasserstein gradient flows, numerical analysis and other fields, providing solid theoretical foundation for the research.

Technical Summary:

Central role of regularity in Wasserstein gradient flows
Structural limitations of forward Euler method
Effectiveness of Gaussian regularization
Convergence guarantees of projected gradient descent