2025-11-22T21:43:16.336737

A Martingale Kernel Two-Sample Test

Chatterjee, Ramdas
The Maximum Mean Discrepancy (MMD) is a widely used multivariate distance metric for two-sample testing. The standard MMD test statistic has an intractable null distribution typically requiring costly resampling or permutation approaches for calibration. In this work we leverage a martingale interpretation of the estimated squared MMD to propose martingale MMD (mMMD), a quadratic-time statistic which has a limiting standard Gaussian distribution under the null. Moreover we show that the test is consistent against any fixed alternative and for large sample sizes, mMMD offers substantial computational savings over the standard MMD test, with only a minor loss in power.
academic

A Martingale Kernel Two-Sample Test

Basic Information

  • Paper ID: 2510.11853
  • Title: A Martingale Kernel Two-Sample Test
  • Authors: Anirban Chatterjee (University of Chicago), Aaditya Ramdas (Carnegie Mellon University)
  • Classification: stat.ME, math.ST, stat.TH
  • Publication Date: October 13, 2025
  • Paper Link: https://arxiv.org/abs/2510.11853

Abstract

Maximum Mean Discrepancy (MMD) is a widely-used multivariate distance metric in two-sample testing. The standard MMD test statistic has an intractable null distribution, typically requiring expensive resampling or permutation methods for calibration. This paper leverages the martingale interpretation of the squared MMD estimator to propose martingale MMD (mMMD)—a quadratic-time statistic with an asymptotically standard Gaussian distribution under the null hypothesis. Furthermore, we prove that the test is consistent against any fixed alternative hypothesis. For large sample sizes, mMMD provides significant computational savings compared to standard MMD testing with minimal power loss.

Research Background and Motivation

Problem Description

Two-sample testing is a classical problem in statistics, aiming to test whether two distributions P and Q are equal based on independent samples: H0:P=QvsH1:PQH_0: P = Q \quad \text{vs} \quad H_1: P \neq Q

Limitations of Existing Methods

  1. Parametric methods: Often fail under model misspecification or on non-Euclidean data
  2. Classical nonparametric methods: Primarily applicable to univariate data; multivariate extensions are challenging
  3. Standard MMD testing: The null distribution is a sum of infinitely weighted χ² variables with weights depending on unknown distributions, requiring computationally intensive resampling or permutation methods

Research Motivation

  • MMD demonstrates excellent performance in detecting distributional differences in general domains via kernel methods
  • Determining the threshold τα is a key practical challenge in MMD testing
  • Existing moment-based parametric approximations lack consistency or accuracy guarantees
  • There is a need for an efficient alternative method with a tractable null distribution

Core Contributions

  1. Proposes mMMD test: A novel MMD variant based on martingale structure with standard Gaussian null distribution
  2. Theoretical guarantees:
    • Proves asymptotic normality under the null hypothesis (Theorems 2, 3)
    • Establishes consistency against fixed alternatives (Theorems 6, 7)
    • Provides distributional convergence under alternatives (Theorem 8)
  3. Computational efficiency: Avoids resampling while maintaining O(n²) complexity with significantly reduced actual runtime
  4. Extended applications:
    • Multi-kernel testing (mmMMD)
    • Generalized statistic family Tn,γ, encompassing standard MMD and mMMD as special cases

Methodology Details

Task Definition

Given independent samples from two distributions P and Q on a metric space X:

  • Xn = {X₁, ..., Xn} ~ P
  • Yn = {Y₁, ..., Yn} ~ Q

Objective: Test H₀: P = Q vs H₁: P ≠ Q

Core Idea: Martingale Structure

Key observation: A modified form of the squared MMD estimator possesses a martingale structure.

Witness function approach:

  • Theoretically optimal witness function: f₀ = (νP - νQ)/‖νP - νQ‖K
  • For each 2 ≤ i ≤ n, estimate using historical data: f^i=1ij=1i1[K(Xj,)K(Yj,)]\hat{f}_i = \frac{1}{i}\sum_{j=1}^{i-1}[K(X_j, \cdot) - K(Y_j, \cdot)]

mMMD Statistic

Tn:=1ni=2nf^i,K(Xi,)K(Yi,)KT_n := \frac{1}{n}\sum_{i=2}^n \langle \hat{f}_i, K(X_i, \cdot) - K(Y_i, \cdot) \rangle_K

Using the kernel trick, this simplifies to: Tn=1ni=2n1ij=1i1[K(Xi,Xj)K(Xi,Yj)K(Xj,Yi)+K(Yi,Yj)]T_n = \frac{1}{n}\sum_{i=2}^n \frac{1}{i}\sum_{j=1}^{i-1}[K(X_i, X_j) - K(X_i, Y_j) - K(X_j, Y_i) + K(Y_i, Y_j)]

Standardized Statistic

To achieve asymptotic normality, define the variance estimator: σn2:=1n2i=2n(1ij=1i1K(Xi,Xj)K(Xi,Yj)K(Xj,Yi)+K(Yi,Yj))2\sigma_n^2 := \frac{1}{n^2}\sum_{i=2}^n \left(\frac{1}{i}\sum_{j=1}^{i-1}K(X_i, X_j) - K(X_i, Y_j) - K(X_j, Y_i) + K(Y_i, Y_j)\right)^2

Final test statistic: ηn=Tn/σn\eta_n = T_n/\sigma_n

Testing Rule

Ψn:=1{ηn>z1α}\Psi_n := \mathbf{1}\{\eta_n > z_{1-\alpha}\} where z₁₋α is the (1-α)-quantile of the standard normal distribution.

Technical Innovations

  1. Martingale structure identification: First identification of martingale difference sequences in MMD estimators
  2. Resampling avoidance: Directly obtains standard Gaussian distribution via martingale central limit theorem
  3. Dimension independence: Under appropriate conditions, null distribution is independent of data dimensionality
  4. Unified framework: The Tn,γ family unifies multiple MMD variants

Experimental Setup

Theoretical Verification Experiments

Null distribution verification:

  • Dimensions: d ∈ {10, 100, 250, 500}
  • Data distributions: Nd(0d, Id) and td(10)
  • Kernels: Gaussian and Laplacian (median heuristic bandwidth)
  • Sample size: n = 200, 2000 repetitions

Power Comparison Experiments

Setup:

  • P = Nd(0d, Id), Q = Nd(μd,j,ε, Id)
  • Configurations: (d,j,ε) = (10,5,0.3), (50,5,0.3), (100,5,0.5)
  • Comparison methods: Standard MMD, Linear-time MMD (LMMD), Block MMD (BMMD), Cross MMD (xMMD), BetMMD

Real Data Experiments

MNIST dataset:

  • 5 digit pair comparisons with gradually increasing overlap
  • 100 samples per group, 100 repetitions
  • Significance level: α = 0.05

Multi-kernel Experiments

Configuration:

  • mmMMD Gauss: 3 Gaussian kernels, bandwidths (1,2,4)λmed
  • mmMMD Laplace: 3 Laplacian kernels, same bandwidths
  • mmMMD Mixed: Mixed Gaussian and Laplacian kernels

Experimental Results

Null Distribution Verification

  • Main finding: The empirical distribution of ηn closely matches the standard Gaussian distribution across all settings
  • Robustness: Results demonstrate robustness to data distribution, kernel choice, and dimensionality
  • Comparative advantage: Contrasts sharply with the complex null distribution of standard MMD

Power Comparison

Method(10,5,0.3)(50,5,0.3)(100,5,0.5)
mMMD0.850.780.82
MMD0.920.850.89
xMMD0.830.760.80
BMMD0.650.580.62
LMMD0.450.380.42

Key findings:

  • mMMD power is comparable to standard MMD, outperforming other computationally efficient variants
  • Performance comparable to xMMD while avoiding sample splitting

Computational Efficiency

Sample SizemMMDMMDLMMDBMMDxMMD
1000.0008±0.00070.0817±0.00780.0007±0.00030.0006±0.00030.0004±0.0001
2000.0026±0.00100.3150±0.02270.0023±0.00100.0020±0.00080.0011±0.0007
3000.0072±0.00230.8335±0.05010.0058±0.00200.0050±0.00200.0022±0.0013

Results: mMMD is approximately 100 times faster than standard MMD, comparable to other efficient methods.

MNIST Experiment Results

  • Trend: Power decreases across all methods as group index increases (overlap increases)
  • Performance ranking: mMMD and xMMD > BMMD > LMMD
  • Practical significance: Validates theoretical advantages on real data

Development of Kernel Two-Sample Testing

  1. Early methods: Conservative approaches based on large deviation bounds
  2. Spectral methods: Spectral approximation by Gretton et al. (2009), requiring strong assumptions
  3. Incomplete U-statistics: Linear-time MMD, block MMD, etc.
  4. Sample splitting strategies: Kübler et al. (2022), Shekhar et al. (2022)

Advantages Relative to This Work

  • Theoretical completeness: Establishes distributional theory under both null and alternative hypotheses
  • Computational efficiency: Avoids computational burden of permutation testing
  • Practicality: No sample splitting required, preserves complete sample information

Conclusions and Discussion

Main Conclusions

  1. Theoretical contribution: First utilization of martingale structure to construct MMD test with standard Gaussian null distribution
  2. Practical value: Significantly reduces computational cost while maintaining good statistical performance
  3. Extensibility: Framework extends to multi-kernel settings and more general statistic families

Limitations

  1. Theoretical limitations:
    • Median heuristic bandwidth selection lacks theoretical support
    • Minimax optimality for γ > 1/2 remains undetermined
  2. Practical limitations:
    • Still requires O(n²) computational complexity
    • Power slightly lower than standard MMD in some settings

Future Directions

  1. Theoretical extensions:
    • Theoretical guarantees for data-dependent kernels
    • Applicability to more general kernel functions
    • Complete characterization of minimax optimality
  2. Method improvements:
    • Integration with kernel approximation techniques to reduce complexity
    • Extension to independence testing
    • Applications to distance-based testing

In-Depth Evaluation

Strengths

  1. Strong novelty: Martingale perspective represents a novel contribution to MMD research
  2. Theoretical rigor: Complete asymptotic theory including Berry-Esseen type convergence rates
  3. High practical value: Addresses the computational bottleneck of MMD testing
  4. Comprehensive experiments: Full assessment from theoretical verification to real applications
  5. Clear presentation: Good balance between technical details and intuitive explanation

Weaknesses

  1. Theoretical gaps: Incomplete theoretical analysis of data-dependent bandwidth
  2. Power loss: Power is indeed lower than standard MMD in some cases
  3. Limited scope: Primarily validated on Euclidean spaces
  4. Computational complexity: Still O(n²), no fundamental improvement achieved

Impact

  1. Academic value: Provides new perspective for MMD theory, potentially inspiring more martingale-based methods
  2. Practical value: Directly applicable to large-scale two-sample testing tasks
  3. Reproducibility: Method is simple and clear, easy to implement and verify
  4. Extensibility: Framework has good potential for extension

Applicable Scenarios

  1. Large-scale data: Computational efficiency advantages are pronounced
  2. High-dimensional data: Dimension-independent null distribution property is advantageous
  3. Real-time applications: Avoids computational demands of permutation testing
  4. Multi-kernel scenarios: mmMMD has advantages when kernel selection is uncertain

References

  1. Gretton, A., et al. (2012a). A kernel two-sample test. JMLR, 13(1), 723-773.
  2. Shekhar, S., Kim, I., & Ramdas, A. (2022). A permutation-free kernel two-sample test. NeurIPS, 35, 18168-18180.
  3. Li, T. & Yuan, M. (2024). On the optimality of Gaussian kernel based nonparametric tests against smooth alternatives. JMLR, 25(334), 1-62.
  4. Fan, X. & Shao, Q. M. (2018). Berry–Esseen bounds for self-normalized martingales. Communications in Mathematics and Statistics, 6(1), 13-27.

Summary: This is a high-quality theoretical statistics paper that provides a novel solution to the classical MMD testing problem through clever identification of martingale structure. The theoretical contributions are solid, experimental validation is comprehensive, and it possesses significant academic and practical value.