Given two measures $μ$, $ν$ on Rd that satisfy Carleman's condition, we provide a numerical scheme to approximate as closely as desired the total variation distance between $μ$ and $ν$. It consists of solving a sequence (hierarchy) of convex relaxations whose associated sequence of optimal values converges to the total variation distance, an additional illustration of the versatility of the Moment-SOS hierarchy. Indeed each relaxation in the hierarchy is a semidefinite program whose size increases with the number of involved moments. It has an optimal solution which is a couple of degree-2n pseudo-moments which converge, as n grows, to moments of the Hahn-Jordan decomposition of $μ$-$ν$.
- Paper ID: 2401.01086
- Title: A hierarchy of convex relaxations for the total variation distance
- Author: Jean B. Lasserre
- Classification: math.OC (Optimization and Control)
- Publication Date: January 2024 (arXiv v3: October 16, 2025)
- Paper Link: https://arxiv.org/abs/2401.01086
This paper proposes a numerical scheme to approximate arbitrarily accurately the total variation distance between two measures μ and ν satisfying the Carleman condition. The scheme solves a sequence of (hierarchical) convex relaxation problems whose optimal values converge to the total variation distance, further demonstrating the versatility of the Moment-SOS hierarchy. Each relaxation in the hierarchy is a semidefinite programming problem whose size increases with the number of moments involved, with optimal solutions—a pair of pseudo-moments of degree 2n—that converge to the moments of the Hahn-Jordan decomposition of μ-ν as n grows.
Numerical computation of the total variation (TV) distance is an important and challenging problem with broad applications in:
- Statistical Testing: Used for homogeneity and independence testing
- Distributionally Robust Optimization: Defining uncertainty sets
- Data Science and Machine Learning: Distance metrics between measures
- Empirical Estimator Issues: Empirical estimators based on random samples are often inconsistent, particularly for TV distance
- Computational Challenges: TV distance is equivalent to Wasserstein distance with a "bad" cost function c(x,y) = 1_{x≠y}, which is difficult to compute efficiently
- Large Function Space: The space of bounded measurable functions in the standard dual formulation is too large for effective evaluation
- Compact Support Restriction: Existing methods typically require compact support
Existing contributions focus mainly on providing analytical bounds for specific distribution classes, lacking a general numerical computation method. This paper aims to provide a practical computational scheme under relatively weak assumptions.
- Numerical Computation Scheme: Proposes a sequence of semidefinite programming relaxations based on the Moment-SOS hierarchy that can approximate TV distance with arbitrary precision
- Theoretical Guarantees: Proves monotonicity and convergence of the relaxation sequence, with optimal values converging to the true TV distance from below
- Non-compact Support Handling: The method applies to measures with non-compact support, requiring only the Carleman condition
- Exact Solutions for Special Cases: For atomic measures on the real line, exact solutions are obtained when the relaxation degree n ≥ maxm₁,m₂
- Dual Theory: Provides dual semidefinite programming formulations, revealing how to strengthen the classical TV distance dual formula by restricting to polynomials and adding penalty terms
Given two finite Borel measures μ, ν ∈ M(ℝᵈ)₊, compute their total variation distance:
∥μ−ν∥TV=supf{∫fdμ−∫fdν:∥f∥∞≤1}
Assumption 2.5:
- All moments of μ and ν are finite
- μ and ν satisfy the condition: ∫exp(c∣xi∣)dμ<∞, for some c > 0 and all i = 1,...,d
Reformulate TV distance as an infinite-dimensional LP:
τ=infϕ+,ϕ−∈M(Rd)+{ϕ+(1)+ϕ−(1):ϕ+−ϕ−=μ−ν}
Add domination constraints to obtain an equivalent problem:
ρ=infϕ+,ϕ−∈M(Rd)+{ϕ+(1)+ϕ−(1):ϕ+−ϕ−=μ−ν;ϕ+≤μ;ϕ−≤ν}
Using Lemma 2.2, domination constraints are equivalent to moment matrix conditions:
ϕ≤μ⇔Mn(ϕ)⪯Mn(μ),∀n∈N
The n-th level relaxation problem:
ρn=minϕ,ψ{ϕ(1)+ψ(1):ϕα−ψα=μα−να,∀α∈N2nd;0⪯Mn(ϕ)⪯Mn(μ);0⪯Mn(ψ)⪯Mn(ν)}
- Critical Role of Domination Constraints: Although redundant in the infinite-dimensional formulation, they serve as a crucial compactification tool in the relaxation scheme
- Clever Use of Carleman Condition: Ensures measure uniqueness, making moment constraints equivalent to domination constraints
- Natural Emergence of Hahn-Jordan Decomposition: Optimal solutions converge to the moments of the Hahn-Jordan decomposition of μ-ν
- Polynomial Restriction in Dual Problem: Handles the ‖f‖∞ ≤ 1 constraint by restricting to polynomial space and adding penalty terms
- Software: GloptiPoly 3 for polynomial optimization
- Solver: SeDuMi 1.03 semidefinite programming solver
- Platform: HP Elitebook Ubuntu 24 notebook
- Disjoint supports: X = {-1.0, 0.0, 1.0, 2.0}, Y = {-0.7, 0.3, 1.3, 2.3}
- One common point: X ∩ Y = {-1.0}
- Near-atomic measures: Testing numerical stability
- Gaussian distributions N(m₁,σ₁) and N(m₂,σ₂) with different means and variances
- Moment numbers from 2n = 4 to 2n = 8
- Proximity of relaxation optimal values ρₙ to the true TV distance
- Convergence rate and numerical stability
- Computation time (all results completed within 0.35 seconds)
For atomic measures on the real line, exact solutions are obtained when n ≥ maxm₁,m₂:
- Example 1: μ = δ₀, ν = δₑ, ρ₁ = 2 (exact)
- Example 2: Four atoms with one common point, ρ₄ = 1.499 ≈ 1.5
- Example 3: Disjoint atoms, ρ₄ = 1.9999 ≈ 2
| (m₁,σ₁) | (m₂,σ₂) | ρ₁ | ρ₂ | ρ₃ | ρ₄ |
|---|
| (0,0.1) | (1,0.1) | 1.9231 | 1.9936 | 1.9991 | 1.9997 |
| (0,0.2) | (1,0.2) | 1.7241 | 1.9049 | 1.9376 | 1.939 |
| (0,0.5) | (1,0.5) | 1.0000 | 1.0000 | 1.1653 | 1.1897 |
- ρ₁ is completely consistent with analytical lower bounds in reference 1
- Significant improvement starting from n=2, with better results at n=3,4
- Behavior approaches mutually singular measures with small variance (TV distance close to 2)
Theorem 3.1 guarantees:
- Each relaxation has an optimal solution
- ρₙ ↑ ‖μ-ν‖_ converges monotonically
- Optimal solutions converge to the moments of the Hahn-Jordan decomposition
- Empirical Estimators: Distance estimation based on samples, but TV distance estimators are often inconsistent
- Analytical Bounds: Providing upper and lower bounds for specific distribution classes (e.g., high-dimensional Gaussians, Gaussian mixtures)
- Inequality Methods: Pinsker inequality, Hellinger distance bounds
- Optimal Transport: Specialized algorithms for Kantorovich metrics (e.g., Sinkhorn algorithm)
- Generality: Applies to general measures satisfying the Carleman condition
- Non-compact Support: Does not require compact support
- Guaranteed Convergence: Theoretical guarantee of monotonic convergence
- Practicality: Can handle both closed-form moments and empirical moments
- Provides a general numerical scheme for TV distance computation
- Achieves arbitrary precision approximation under relatively weak assumptions
- Each relaxation provides a guaranteed lower bound
- Exact solutions are obtainable for discrete measures
- Dimension Sensitivity: The method is sensitive to dimension, currently limited to small-dimensional problems
- Semidefinite Programming Solver Limitations: Cannot be expected to solve highly relaxed problems
- Numerical Precision: May encounter numerical issues when atoms are too close
- Sample Size Requirements: Requires sufficiently large samples when using empirical moments
- Provide computationally cheaper alternative lower bounds
- Extend to high-dimensional problem handling
- Improve numerical stability
- Comparative studies with other distance metrics
- Theoretical Rigor: Complete convergence proofs and dual theory
- Method Novelty: First application of Moment-SOS hierarchy to TV distance computation
- Practical Value: Can handle both closed-form and sample-based data
- Exactness Guarantees: Exact solutions obtainable for specific cases (atomic measures)
- Computational Complexity: Semidefinite programming computational complexity limits application scale
- Limited Experiments: Testing primarily on low-dimensional and simple distributions
- Insufficient Comparison with Existing Methods: Lacks systematic comparison with other TV distance computation methods
- Theoretical Contribution: Provides a new theoretical framework for TV distance computation
- Methodological Value: Demonstrates the potential of Moment-SOS hierarchy in probabilistic metric computation
- Practical Application: Provides new tools for fields such as distributionally robust optimization
- Exact Computation Requirements: Need for theoretically guaranteed TV distance lower bounds
- Low-Dimensional Problems: Applications with relatively low dimensions
- Special Distributions: Gaussian, exponential distributions and their mixtures
- Discrete Distributions: Atomic measures with finite support
The paper cites 28 related references covering multiple fields including optimal transport, moment problems, semidefinite programming, and probabilistic metrics. Particularly, the author's own series of works on the Moment-SOS hierarchy 21,24,26 form the theoretical foundation of this paper.