2025-11-22T06:46:16.153487

A hierarchy of convex relaxations for the total variation distance

Lasserre

Given two measures $Î¼$, $Î½$ on Rd that satisfy Carleman's condition, we provide a numerical scheme to approximate as closely as desired the total variation distance between $Î¼$ and $Î½$. It consists of solving a sequence (hierarchy) of convex relaxations whose associated sequence of optimal values converges to the total variation distance, an additional illustration of the versatility of the Moment-SOS hierarchy. Indeed each relaxation in the hierarchy is a semidefinite program whose size increases with the number of involved moments. It has an optimal solution which is a couple of degree-2n pseudo-moments which converge, as n grows, to moments of the Hahn-Jordan decomposition of $Î¼$-$Î½$.

academic

A hierarchy of convex relaxations for the total variation distance

Basic Information

Paper ID: 2401.01086
Title: A hierarchy of convex relaxations for the total variation distance
Author: Jean B. Lasserre
Classification: math.OC (Optimization and Control)
Publication Date: January 2024 (arXiv v3: October 16, 2025)
Paper Link: https://arxiv.org/abs/2401.01086

Abstract

This paper proposes a numerical scheme to approximate arbitrarily accurately the total variation distance between two measures μ and ν satisfying the Carleman condition. The scheme solves a sequence of (hierarchical) convex relaxation problems whose optimal values converge to the total variation distance, further demonstrating the versatility of the Moment-SOS hierarchy. Each relaxation in the hierarchy is a semidefinite programming problem whose size increases with the number of moments involved, with optimal solutions—a pair of pseudo-moments of degree 2n—that converge to the moments of the Hahn-Jordan decomposition of μ-ν as n grows.

Research Background and Motivation

Problem Importance

Numerical computation of the total variation (TV) distance is an important and challenging problem with broad applications in:

Statistical Testing: Used for homogeneity and independence testing
Distributionally Robust Optimization: Defining uncertainty sets
Data Science and Machine Learning: Distance metrics between measures

Limitations of Existing Methods

Empirical Estimator Issues: Empirical estimators based on random samples are often inconsistent, particularly for TV distance
Computational Challenges: TV distance is equivalent to Wasserstein distance with a "bad" cost function c(x,y) = 1_{x≠y}, which is difficult to compute efficiently
Large Function Space: The space of bounded measurable functions in the standard dual formulation is too large for effective evaluation
Compact Support Restriction: Existing methods typically require compact support

Research Motivation

Existing contributions focus mainly on providing analytical bounds for specific distribution classes, lacking a general numerical computation method. This paper aims to provide a practical computational scheme under relatively weak assumptions.

Core Contributions

Numerical Computation Scheme: Proposes a sequence of semidefinite programming relaxations based on the Moment-SOS hierarchy that can approximate TV distance with arbitrary precision
Theoretical Guarantees: Proves monotonicity and convergence of the relaxation sequence, with optimal values converging to the true TV distance from below
Non-compact Support Handling: The method applies to measures with non-compact support, requiring only the Carleman condition
Exact Solutions for Special Cases: For atomic measures on the real line, exact solutions are obtained when the relaxation degree n ≥ maxm₁,m₂
Dual Theory: Provides dual semidefinite programming formulations, revealing how to strengthen the classical TV distance dual formula by restricting to polynomials and adding penalty terms

Methodology Details

Problem Definition

Given two finite Borel measures μ, ν ∈ M(ℝᵈ)₊, compute their total variation distance: $\|\mu - \nu\|_{TV} = \sup_f \left\{\int f d\mu - \int f d\nu : \|f\|_\infty \leq 1\right\}$

Core Assumptions

Assumption 2.5:

All moments of μ and ν are finite
μ and ν satisfy the condition: $\int \exp(c|x_i|) d\mu < \infty$ , for some c > 0 and all i = 1,...,d

Method Architecture

1. Infinite-Dimensional Linear Programming Reformulation

Reformulate TV distance as an infinite-dimensional LP: $\tau = \inf_{\phi_+,\phi_- \in M(\mathbb{R}^d)_+} \{\phi_+(1) + \phi_-(1) : \phi_+ - \phi_- = \mu - \nu\}$

2. Key Constraint Enhancement

Add domination constraints to obtain an equivalent problem: $\rho = \inf_{\phi_+,\phi_- \in M(\mathbb{R}^d)_+} \{\phi_+(1) + \phi_-(1) : \phi_+ - \phi_- = \mu - \nu; \phi_+ \leq \mu; \phi_- \leq \nu\}$

3. Moment Condition Conversion

Using Lemma 2.2, domination constraints are equivalent to moment matrix conditions: $\phi \leq \mu \Leftrightarrow M_n(\phi) \preceq M_n(\mu), \forall n \in \mathbb{N}$

4. Semidefinite Programming Relaxation

The n-th level relaxation problem: $\rho_n = \min_{\phi,\psi} \{\phi(1) + \psi(1) : \phi_\alpha - \psi_\alpha = \mu_\alpha - \nu_\alpha, \forall \alpha \in \mathbb{N}^d_{2n};$ $0 \preceq M_n(\phi) \preceq M_n(\mu); 0 \preceq M_n(\psi) \preceq M_n(\nu)\}$

Technical Innovations

Critical Role of Domination Constraints: Although redundant in the infinite-dimensional formulation, they serve as a crucial compactification tool in the relaxation scheme
Clever Use of Carleman Condition: Ensures measure uniqueness, making moment constraints equivalent to domination constraints
Natural Emergence of Hahn-Jordan Decomposition: Optimal solutions converge to the moments of the Hahn-Jordan decomposition of μ-ν
Polynomial Restriction in Dual Problem: Handles the ‖f‖∞ ≤ 1 constraint by restricting to polynomial space and adding penalty terms

Experimental Setup

Implementation Tools

Software: GloptiPoly 3 for polynomial optimization
Solver: SeDuMi 1.03 semidefinite programming solver
Platform: HP Elitebook Ubuntu 24 notebook

Test Cases

1. Discrete Measures

Disjoint supports: X = {-1.0, 0.0, 1.0, 2.0}, Y = {-0.7, 0.3, 1.3, 2.3}
One common point: X ∩ Y = {-1.0}
Near-atomic measures: Testing numerical stability

2. Gaussian Measures

Gaussian distributions N(m₁,σ₁) and N(m₂,σ₂) with different means and variances
Moment numbers from 2n = 4 to 2n = 8

Evaluation Metrics

Proximity of relaxation optimal values ρₙ to the true TV distance
Convergence rate and numerical stability
Computation time (all results completed within 0.35 seconds)

Experimental Results

Main Results

1. Theoretical Verification (Theorem 3.4)

For atomic measures on the real line, exact solutions are obtained when n ≥ maxm₁,m₂:

Example 1: μ = δ₀, ν = δₑ, ρ₁ = 2 (exact)
Example 2: Four atoms with one common point, ρ₄ = 1.499 ≈ 1.5
Example 3: Disjoint atoms, ρ₄ = 1.9999 ≈ 2

2. Gaussian Measure Results

(m₁,σ₁)	(m₂,σ₂)	ρ₁	ρ₂	ρ₃	ρ₄
(0,0.1)	(1,0.1)	1.9231	1.9936	1.9991	1.9997
(0,0.2)	(1,0.2)	1.7241	1.9049	1.9376	1.939
(0,0.5)	(1,0.5)	1.0000	1.0000	1.1653	1.1897

3. Key Findings

ρ₁ is completely consistent with analytical lower bounds in reference 1
Significant improvement starting from n=2, with better results at n=3,4
Behavior approaches mutually singular measures with small variance (TV distance close to 2)

Convergence Analysis

Theorem 3.1 guarantees:

Each relaxation has an optimal solution
ρₙ ↑ ‖μ-ν‖_ converges monotonically
Optimal solutions converge to the moments of the Hahn-Jordan decomposition

Main Research Directions

Empirical Estimators: Distance estimation based on samples, but TV distance estimators are often inconsistent
Analytical Bounds: Providing upper and lower bounds for specific distribution classes (e.g., high-dimensional Gaussians, Gaussian mixtures)
Inequality Methods: Pinsker inequality, Hellinger distance bounds
Optimal Transport: Specialized algorithms for Kantorovich metrics (e.g., Sinkhorn algorithm)

Advantages of This Work

Generality: Applies to general measures satisfying the Carleman condition
Non-compact Support: Does not require compact support
Guaranteed Convergence: Theoretical guarantee of monotonic convergence
Practicality: Can handle both closed-form moments and empirical moments

Conclusions and Discussion

Main Conclusions

Provides a general numerical scheme for TV distance computation
Achieves arbitrary precision approximation under relatively weak assumptions
Each relaxation provides a guaranteed lower bound
Exact solutions are obtainable for discrete measures

Limitations

Dimension Sensitivity: The method is sensitive to dimension, currently limited to small-dimensional problems
Semidefinite Programming Solver Limitations: Cannot be expected to solve highly relaxed problems
Numerical Precision: May encounter numerical issues when atoms are too close
Sample Size Requirements: Requires sufficiently large samples when using empirical moments

Future Directions

Provide computationally cheaper alternative lower bounds
Extend to high-dimensional problem handling
Improve numerical stability
Comparative studies with other distance metrics

In-Depth Evaluation

Strengths

Theoretical Rigor: Complete convergence proofs and dual theory
Method Novelty: First application of Moment-SOS hierarchy to TV distance computation
Practical Value: Can handle both closed-form and sample-based data
Exactness Guarantees: Exact solutions obtainable for specific cases (atomic measures)

Weaknesses

Computational Complexity: Semidefinite programming computational complexity limits application scale
Limited Experiments: Testing primarily on low-dimensional and simple distributions
Insufficient Comparison with Existing Methods: Lacks systematic comparison with other TV distance computation methods

Impact

Theoretical Contribution: Provides a new theoretical framework for TV distance computation
Methodological Value: Demonstrates the potential of Moment-SOS hierarchy in probabilistic metric computation
Practical Application: Provides new tools for fields such as distributionally robust optimization

Applicable Scenarios

Exact Computation Requirements: Need for theoretically guaranteed TV distance lower bounds
Low-Dimensional Problems: Applications with relatively low dimensions
Special Distributions: Gaussian, exponential distributions and their mixtures
Discrete Distributions: Atomic measures with finite support

References

The paper cites 28 related references covering multiple fields including optimal transport, moment problems, semidefinite programming, and probabilistic metrics. Particularly, the author's own series of works on the Moment-SOS hierarchy 21,24,26 form the theoretical foundation of this paper.