2025-11-23T20:28:23.967320

Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization

Xu, Li
Wasserstein gradient flows have become a central tool for optimization problems over probability measures. A natural numerical approach is forward-Euler time discretization. We show, however, that even in the simple case where the energy functional is the Kullback-Leibler (KL) divergence against a smooth target density, forward-Euler can fail dramatically: the scheme does not converge to the gradient flow, despite the fact that the first variation $\nabla\frac{δF}{δρ}$ remains formally well defined at every step. We identify the root cause as a loss of regularity induced by the discretization, and prove that a suitable regularization of the functional restores the necessary smoothness, making forward-Euler a viable solver that converges in discrete time to the global minimizer.
academic

Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization

Basic Information

  • Paper ID: 2509.13260
  • Title: Forward Euler for Wasserstein Gradient Flows: Breakdown and Regularization
  • Authors: Yewei Xu, Qin Li (University of Wisconsin-Madison)
  • Classification: math.NA cs.NA math.OC
  • Publication Date: 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2509.13260

Abstract

Wasserstein gradient flows have become a fundamental tool for optimization problems on probability measures. Forward Euler time discretization is a natural numerical method. However, this paper proves that even in the simple case where the energy functional is the Kullback-Leibler (KL) divergence with respect to a smooth target density, the forward Euler method fails dramatically: the scheme does not converge to the gradient flow, despite the first variation δFδρ\nabla\frac{\delta F}{\delta \rho} remaining formally well-defined at each step. The authors identify the root cause as regularity loss induced by discretization and prove that appropriate regularization of the functional can restore the necessary smoothness, making forward Euler a viable solver that converges to the global minimum in discrete time.

Research Background and Motivation

Problem Background

  1. Optimization on Probability Measure Spaces: Minimizing functionals F[ρ]F[\rho] on probability measure spaces P(Ω)P(Ω) arises widely in machine learning and statistical physics
  2. Wasserstein Gradient Flows: Analogous to gradient descent in Euclidean spaces, gradient flows under the Wasserstein metric provide a natural framework for probability measure optimization
  3. Numerical Implementation Challenges: Numerical solution of gradient flow PDEs requires time discretization, with forward Euler being the most intuitive choice

Core Problem

Although forward Euler performs well for classical PDEs, does it remain effective for Wasserstein gradient flows? Particularly for fundamental functionals such as KL divergence.

Research Motivation

  • Forward Euler is widely used in engineering applications due to its simplicity
  • Existing theoretical analyses focus primarily on implicit methods (e.g., JKO schemes)
  • Lack of deep understanding of failure mechanisms for explicit methods

Core Contributions

  1. Theoretical Finding: Proves structural incompatibility of forward Euler with Wasserstein gradient flows
  2. Failure Mechanism: Identifies regularity loss as the fundamental cause of method failure
  3. Counterexample Construction: Provides two concrete counterexamples demonstrating qualitative and quantitative failures of forward Euler
  4. Regularization Solution: Proposes regularized KL functional that restores forward Euler's validity
  5. Convergence Guarantees: Proves convergence and error bounds for the regularized method

Methodology Details

Problem Formulation

Consider the optimization problem on probability measure spaces: ρopt=argminρP(Ω)F[ρ]\rho_{opt} = \arg\min_{\rho \in P(Ω)} F[\rho]

The corresponding Wasserstein gradient flow is: tρt=(ρtδFδρρt)\partial_t \rho_t = \nabla \cdot \left(\rho_t \nabla \frac{\delta F}{\delta \rho}\bigg|_{\rho_t}\right)

Forward Euler discretization: ρn+1=(Tn)#ρn,Tn(x)=xhnδFδρρn(x)\rho^{n+1} = (T_n)_\# \rho^n, \quad T_n(x) = x - h_n \nabla \frac{\delta F}{\delta \rho}\bigg|_{\rho^n}(x)

Regularity Theory Framework

Three Concepts of Differentiability

  1. First Variation (FV): Derivative in the linear measure space
  2. Wasserstein Differentiability (W-differentiability): Geometric derivative based on W₂ metric
  3. Lions Differentiability (L-differentiability): Derivative defined through random variable lifting

Regularity Hierarchy

Smooth FVContinuous L-differentiabilityW-differentiability\text{Smooth FV} \Rightarrow \text{Continuous L-differentiability} \Rightarrow \text{W-differentiability}

Key Observation: SFWSFfS_F^W \subset S_F^f, i.e., there exist ρSFfSFW\rho \in S_F^f \setminus S_F^W where the first variation is computable but not W-differentiable.

Failure Mechanism Analysis

Regularity Loss Theorem

Theorem 3.4: Let F[ρ]=KL[ρeU]F[\rho] = KL[\rho|e^{-U}] with UCU \in C^∞. If ρ0=eV0\rho_0 = e^{-V_0} and V0Cm+2V_0 \in C^{m+2}, then after one forward Euler step, V1CmV_1 \in C^m, i.e., two derivatives are lost.

Counterexample Construction

Counterexample 1 (Non-injectivity): Target distribution ρ=eU\rho^* = e^{-U} with U(x)=x22+x44U(x) = \frac{x^2}{2} + \frac{x^4}{4}, initial distribution as standard Gaussian. Non-injectivity of the pushforward map T(x)=xhx3T(x) = x - hx^3 leads to density discontinuities.

Counterexample 2 (Derivative Depletion): Piecewise initial distribution produces jump discontinuities after forward Euler step, with KL divergence bounded below by >0.019> 0.019.

Regularization Solution

Regularized KL Functional

Fε[ρ]=KLε[ρρ]=C(U(x)+ln((φερ)(x)))dρ(x)F^ε[\rho] = KL^ε[\rho|\rho^*] = \int_C \left(U(x) + \ln((φ_ε * \rho)(x))\right) d\rho(x)

where φε(x)=exp(x222ε)φ_ε(x) = \exp(-\frac{\|x\|_2^2}{2ε}) is a Gaussian kernel.

Smoothness Recovery

Theorem 4.3: Under Assumption 4.1, FεF^ε is both L-differentiable and W-differentiable on P2(C)P_2(C), with uniform gradients: WFε[ρ]=ρFε[ρ]=δFεδρρ\nabla_W F^ε[\rho] = \partial_ρ F^ε[\rho] = \nabla \frac{\delta F^ε}{\delta \rho}\bigg|_ρ

Projected Gradient Descent

ρn+1=projC((IdhnδFεδρρn)#ρn)\rho^{n+1} = \text{proj}_C\left(\left(\text{Id} - h_n \nabla \frac{\delta F^ε}{\delta \rho}\bigg|_{\rho^n}\right)_\# \rho^n\right)

Experimental Setup

Theoretical Verification Experiments

  • Counterexample 2 Numerical Verification: Computing KL divergence evolution using explicit formulas
  • Step Size Independence: Testing three step sizes h=0.1,0.01,0.001h = 0.1, 0.01, 0.001
  • Convergence Lower Bound: Verifying theoretical lower bound of 0.019

Regularized Method Experiments

  • Computational Domain: Ball domain C=B3(0)R2C = B_3(0) \subset \mathbb{R}^2
  • Target Potential: Correlated quadratic form U(x)=12xAxU(x) = \frac{1}{2}x^⊤Ax
  • Particle Count: N=2000N = 2000
  • Regularization Parameter: ε=0.1ε = 0.1
  • Step Size: h=0.05h = 0.05, 100 iterations

Experimental Results

Forward Euler Failure Verification

  • KL divergence exhibits nearly identical behavior across different step sizes, confirming step-size-independent failure
  • Numerical results align with theoretical lower bound of 0.019
  • Confirms structural failure of forward Euler

Regularized Method Performance

  • Energy decreases monotonically, consistent with theoretical predictions
  • Initial exponential convergence validates strong convexity property
  • Particle distribution successfully converges to target distribution
  • Method remains within constraint domain

Key Findings

  1. Regularity loss is the fundamental cause of forward Euler failure, not numerical error
  2. Regularization effectively restores necessary smoothness
  3. Projected gradient descent performs stably on bounded domains

Wasserstein Gradient Flow Theory

  • Foundational Theory: Pioneering work by Ambrosio-Gigli-Savaré establishing theoretical framework
  • Implicit Methods: JKO scheme and its Γ-convergence properties
  • Explicit Methods: λ-dissipative framework by Cavagnari-Savaré-Sodini

Numerical Methods

  • Particle Methods: Interacting particle systems and ensemble methods
  • Blob Methods: Density estimation techniques related to this paper's regularization scheme
  • Variational Methods: Numerical solution based on optimal transport

Positioning of This Work

This paper fills the gap in theoretical analysis of explicit methods, particularly providing deep understanding of forward Euler failure mechanisms.

Conclusions and Discussion

Main Conclusions

  1. Forward Euler method exhibits structural incompatibility with Wasserstein gradient flows
  2. Regularity loss is the fundamental cause of failure
  3. Appropriate functional regularization can restore method validity

Limitations

  1. Discretization Error: Rigorous O(h) accuracy error analysis remains to be established
  2. Regularization Parameter: Relationship between minimizers of FεF^ε and original KL requires further investigation
  3. Convexity Loss: Regularization may destroy geodesic convexity of original functional

Future Directions

  1. Establish complete error analysis for regularized method
  2. Study convergence as regularization parameter ε0ε \to 0
  3. Extend to more general classes of functionals

In-Depth Evaluation

Strengths

  1. Theoretical Depth: Deeply reveals essential mechanisms of numerical method failure
  2. Counterexample Construction: Provides concrete, verifiable failure cases
  3. Solution Provided: Not only identifies problems but provides effective solutions
  4. Mathematical Rigor: Rigorous theoretical analysis with complete proofs

Weaknesses

  1. Practical Limitations: Regularized method primarily applicable to bounded domains
  2. Parameter Selection: Lacks guidance for choosing regularization parameters
  3. Computational Complexity: Does not discuss additional computational costs from regularization

Impact

  1. Theoretical Contribution: Provides important theoretical insights for numerical methods of Wasserstein gradient flows
  2. Practical Value: Offers solutions to numerical stability issues in practical applications
  3. Methodology: Establishes theoretical framework for analyzing such problems

Applicable Scenarios

  • Probability measure optimization problems
  • Distribution learning in machine learning
  • Non-equilibrium evolution in statistical physics
  • Optimal transport applications in image processing and computer vision

References

This paper cites 41 relevant references covering important works in optimal transport theory, Wasserstein gradient flows, numerical analysis and other fields, providing solid theoretical foundation for the research.


Technical Summary:

  • Central role of regularity in Wasserstein gradient flows
  • Structural limitations of forward Euler method
  • Effectiveness of Gaussian regularization
  • Convergence guarantees of projected gradient descent