2025-11-22T06:46:16.153487

A hierarchy of convex relaxations for the total variation distance

Lasserre
Given two measures $μ$, $ν$ on Rd that satisfy Carleman's condition, we provide a numerical scheme to approximate as closely as desired the total variation distance between $μ$ and $ν$. It consists of solving a sequence (hierarchy) of convex relaxations whose associated sequence of optimal values converges to the total variation distance, an additional illustration of the versatility of the Moment-SOS hierarchy. Indeed each relaxation in the hierarchy is a semidefinite program whose size increases with the number of involved moments. It has an optimal solution which is a couple of degree-2n pseudo-moments which converge, as n grows, to moments of the Hahn-Jordan decomposition of $μ$-$ν$.
academic

A hierarchy of convex relaxations for the total variation distance

Basic Information

  • Paper ID: 2401.01086
  • Title: A hierarchy of convex relaxations for the total variation distance
  • Author: Jean B. Lasserre
  • Classification: math.OC (Optimization and Control)
  • Publication Date: January 2024 (arXiv v3: October 16, 2025)
  • Paper Link: https://arxiv.org/abs/2401.01086

Abstract

This paper proposes a numerical scheme to approximate arbitrarily accurately the total variation distance between two measures μ and ν satisfying the Carleman condition. The scheme solves a sequence of (hierarchical) convex relaxation problems whose optimal values converge to the total variation distance, further demonstrating the versatility of the Moment-SOS hierarchy. Each relaxation in the hierarchy is a semidefinite programming problem whose size increases with the number of moments involved, with optimal solutions—a pair of pseudo-moments of degree 2n—that converge to the moments of the Hahn-Jordan decomposition of μ-ν as n grows.

Research Background and Motivation

Problem Importance

Numerical computation of the total variation (TV) distance is an important and challenging problem with broad applications in:

  1. Statistical Testing: Used for homogeneity and independence testing
  2. Distributionally Robust Optimization: Defining uncertainty sets
  3. Data Science and Machine Learning: Distance metrics between measures

Limitations of Existing Methods

  1. Empirical Estimator Issues: Empirical estimators based on random samples are often inconsistent, particularly for TV distance
  2. Computational Challenges: TV distance is equivalent to Wasserstein distance with a "bad" cost function c(x,y) = 1_{x≠y}, which is difficult to compute efficiently
  3. Large Function Space: The space of bounded measurable functions in the standard dual formulation is too large for effective evaluation
  4. Compact Support Restriction: Existing methods typically require compact support

Research Motivation

Existing contributions focus mainly on providing analytical bounds for specific distribution classes, lacking a general numerical computation method. This paper aims to provide a practical computational scheme under relatively weak assumptions.

Core Contributions

  1. Numerical Computation Scheme: Proposes a sequence of semidefinite programming relaxations based on the Moment-SOS hierarchy that can approximate TV distance with arbitrary precision
  2. Theoretical Guarantees: Proves monotonicity and convergence of the relaxation sequence, with optimal values converging to the true TV distance from below
  3. Non-compact Support Handling: The method applies to measures with non-compact support, requiring only the Carleman condition
  4. Exact Solutions for Special Cases: For atomic measures on the real line, exact solutions are obtained when the relaxation degree n ≥ maxm₁,m₂
  5. Dual Theory: Provides dual semidefinite programming formulations, revealing how to strengthen the classical TV distance dual formula by restricting to polynomials and adding penalty terms

Methodology Details

Problem Definition

Given two finite Borel measures μ, ν ∈ M(ℝᵈ)₊, compute their total variation distance: μνTV=supf{fdμfdν:f1}\|\mu - \nu\|_{TV} = \sup_f \left\{\int f d\mu - \int f d\nu : \|f\|_\infty \leq 1\right\}

Core Assumptions

Assumption 2.5:

  1. All moments of μ and ν are finite
  2. μ and ν satisfy the condition: exp(cxi)dμ<\int \exp(c|x_i|) d\mu < \infty, for some c > 0 and all i = 1,...,d

Method Architecture

1. Infinite-Dimensional Linear Programming Reformulation

Reformulate TV distance as an infinite-dimensional LP: τ=infϕ+,ϕM(Rd)+{ϕ+(1)+ϕ(1):ϕ+ϕ=μν}\tau = \inf_{\phi_+,\phi_- \in M(\mathbb{R}^d)_+} \{\phi_+(1) + \phi_-(1) : \phi_+ - \phi_- = \mu - \nu\}

2. Key Constraint Enhancement

Add domination constraints to obtain an equivalent problem: ρ=infϕ+,ϕM(Rd)+{ϕ+(1)+ϕ(1):ϕ+ϕ=μν;ϕ+μ;ϕν}\rho = \inf_{\phi_+,\phi_- \in M(\mathbb{R}^d)_+} \{\phi_+(1) + \phi_-(1) : \phi_+ - \phi_- = \mu - \nu; \phi_+ \leq \mu; \phi_- \leq \nu\}

3. Moment Condition Conversion

Using Lemma 2.2, domination constraints are equivalent to moment matrix conditions: ϕμMn(ϕ)Mn(μ),nN\phi \leq \mu \Leftrightarrow M_n(\phi) \preceq M_n(\mu), \forall n \in \mathbb{N}

4. Semidefinite Programming Relaxation

The n-th level relaxation problem: ρn=minϕ,ψ{ϕ(1)+ψ(1):ϕαψα=μανα,αN2nd;\rho_n = \min_{\phi,\psi} \{\phi(1) + \psi(1) : \phi_\alpha - \psi_\alpha = \mu_\alpha - \nu_\alpha, \forall \alpha \in \mathbb{N}^d_{2n};0Mn(ϕ)Mn(μ);0Mn(ψ)Mn(ν)}0 \preceq M_n(\phi) \preceq M_n(\mu); 0 \preceq M_n(\psi) \preceq M_n(\nu)\}

Technical Innovations

  1. Critical Role of Domination Constraints: Although redundant in the infinite-dimensional formulation, they serve as a crucial compactification tool in the relaxation scheme
  2. Clever Use of Carleman Condition: Ensures measure uniqueness, making moment constraints equivalent to domination constraints
  3. Natural Emergence of Hahn-Jordan Decomposition: Optimal solutions converge to the moments of the Hahn-Jordan decomposition of μ-ν
  4. Polynomial Restriction in Dual Problem: Handles the ‖f‖∞ ≤ 1 constraint by restricting to polynomial space and adding penalty terms

Experimental Setup

Implementation Tools

  • Software: GloptiPoly 3 for polynomial optimization
  • Solver: SeDuMi 1.03 semidefinite programming solver
  • Platform: HP Elitebook Ubuntu 24 notebook

Test Cases

1. Discrete Measures

  • Disjoint supports: X = {-1.0, 0.0, 1.0, 2.0}, Y = {-0.7, 0.3, 1.3, 2.3}
  • One common point: X ∩ Y = {-1.0}
  • Near-atomic measures: Testing numerical stability

2. Gaussian Measures

  • Gaussian distributions N(m₁,σ₁) and N(m₂,σ₂) with different means and variances
  • Moment numbers from 2n = 4 to 2n = 8

Evaluation Metrics

  • Proximity of relaxation optimal values ρₙ to the true TV distance
  • Convergence rate and numerical stability
  • Computation time (all results completed within 0.35 seconds)

Experimental Results

Main Results

1. Theoretical Verification (Theorem 3.4)

For atomic measures on the real line, exact solutions are obtained when n ≥ maxm₁,m₂:

  • Example 1: μ = δ₀, ν = δₑ, ρ₁ = 2 (exact)
  • Example 2: Four atoms with one common point, ρ₄ = 1.499 ≈ 1.5
  • Example 3: Disjoint atoms, ρ₄ = 1.9999 ≈ 2

2. Gaussian Measure Results

(m₁,σ₁)(m₂,σ₂)ρ₁ρ₂ρ₃ρ₄
(0,0.1)(1,0.1)1.92311.99361.99911.9997
(0,0.2)(1,0.2)1.72411.90491.93761.939
(0,0.5)(1,0.5)1.00001.00001.16531.1897

3. Key Findings

  • ρ₁ is completely consistent with analytical lower bounds in reference 1
  • Significant improvement starting from n=2, with better results at n=3,4
  • Behavior approaches mutually singular measures with small variance (TV distance close to 2)

Convergence Analysis

Theorem 3.1 guarantees:

  1. Each relaxation has an optimal solution
  2. ρₙ ↑ ‖μ-ν‖_ converges monotonically
  3. Optimal solutions converge to the moments of the Hahn-Jordan decomposition

Main Research Directions

  1. Empirical Estimators: Distance estimation based on samples, but TV distance estimators are often inconsistent
  2. Analytical Bounds: Providing upper and lower bounds for specific distribution classes (e.g., high-dimensional Gaussians, Gaussian mixtures)
  3. Inequality Methods: Pinsker inequality, Hellinger distance bounds
  4. Optimal Transport: Specialized algorithms for Kantorovich metrics (e.g., Sinkhorn algorithm)

Advantages of This Work

  1. Generality: Applies to general measures satisfying the Carleman condition
  2. Non-compact Support: Does not require compact support
  3. Guaranteed Convergence: Theoretical guarantee of monotonic convergence
  4. Practicality: Can handle both closed-form moments and empirical moments

Conclusions and Discussion

Main Conclusions

  1. Provides a general numerical scheme for TV distance computation
  2. Achieves arbitrary precision approximation under relatively weak assumptions
  3. Each relaxation provides a guaranteed lower bound
  4. Exact solutions are obtainable for discrete measures

Limitations

  1. Dimension Sensitivity: The method is sensitive to dimension, currently limited to small-dimensional problems
  2. Semidefinite Programming Solver Limitations: Cannot be expected to solve highly relaxed problems
  3. Numerical Precision: May encounter numerical issues when atoms are too close
  4. Sample Size Requirements: Requires sufficiently large samples when using empirical moments

Future Directions

  1. Provide computationally cheaper alternative lower bounds
  2. Extend to high-dimensional problem handling
  3. Improve numerical stability
  4. Comparative studies with other distance metrics

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Complete convergence proofs and dual theory
  2. Method Novelty: First application of Moment-SOS hierarchy to TV distance computation
  3. Practical Value: Can handle both closed-form and sample-based data
  4. Exactness Guarantees: Exact solutions obtainable for specific cases (atomic measures)

Weaknesses

  1. Computational Complexity: Semidefinite programming computational complexity limits application scale
  2. Limited Experiments: Testing primarily on low-dimensional and simple distributions
  3. Insufficient Comparison with Existing Methods: Lacks systematic comparison with other TV distance computation methods

Impact

  1. Theoretical Contribution: Provides a new theoretical framework for TV distance computation
  2. Methodological Value: Demonstrates the potential of Moment-SOS hierarchy in probabilistic metric computation
  3. Practical Application: Provides new tools for fields such as distributionally robust optimization

Applicable Scenarios

  1. Exact Computation Requirements: Need for theoretically guaranteed TV distance lower bounds
  2. Low-Dimensional Problems: Applications with relatively low dimensions
  3. Special Distributions: Gaussian, exponential distributions and their mixtures
  4. Discrete Distributions: Atomic measures with finite support

References

The paper cites 28 related references covering multiple fields including optimal transport, moment problems, semidefinite programming, and probabilistic metrics. Particularly, the author's own series of works on the Moment-SOS hierarchy 21,24,26 form the theoretical foundation of this paper.