2025-11-25T02:22:17.580847

Optimal Bounds for Tyler's M-Estimator for Elliptical Distributions

Lau, Ramachandran

A fundamental problem in statistics is estimating the shape matrix of an Elliptical distribution. This generalizes the familiar problem of Gaussian covariance estimation, for which the sample covariance achieves optimal estimation error. For Elliptical distributions, Tyler proposed a natural M-estimator and showed strong statistical properties in the asymptotic regime, independent of the underlying distribution. Numerical experiments show that this estimator performs very well, and that Tyler's iterative procedure converges quickly to the estimator. Franks and Moitra recently provided the first distribution-free error bounds in the finite sample setting, as well as the first rigorous convergence analysis of Tyler's iterative procedure. However, their results exceed the sample complexity of the Gaussian setting by a $\log^{2} d$ factor. We close this gap by proving optimal sample threshold and error bounds for Tyler's M-estimator for all Elliptical distributions, fully matching the Gaussian result. Moreover, we recover the algorithmic convergence even at this lower sample threshold. Our approach builds on the operator scaling connection of Franks and Moitra by introducing a novel pseudorandom condition, which we call $\infty$-expansion. We show that Elliptical distributions satisfy $\infty$-expansion at the optimal sample threshold, and then prove a novel scaling result for inputs satisfying this condition.

Basic Information

Paper ID: 2510.13751
Title: Optimal Bounds for Tyler's M-Estimator for Elliptical Distributions
Authors: Lap Chi Lau (University of Waterloo), Akshay Ramachandran (University of British Columbia)
Categories: math.ST cs.LG stat.TH
Date: May 2025 (arXiv preprint)
Link: https://arxiv.org/abs/2510.13751

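Summary

Estimating the shape matrix of an elliptical distribution is a fundamental problem in statistics that generalizes Gaussian covariance estimation. Tyler proposed a natural M-estimator and proved strong statistical properties in the asymptotic regime. Franks and Moitra recently gave the first distribution-free error bounds in the finite-sample setting, but their sample complexity exceeds that of the Gaussian case by a $\log^2 d$ factor. By introducing a new pseudorandom condition, $\infty$-expansion, this paper proves optimal sample threshold and error bounds for Tyler's M-estimator that fully match the Gaussian result, and recovers algorithmic convergence at the lower sample threshold.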

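Research Background and Motivation

Problem Background

Core problem: estimating the shape matrix of an elliptical distribution, an important generalization of covariance estimation for high-dimensional distributions.
Practical significance:
- Elliptical distributions include important special cases such as the multivariate Gaussian and the multivariate t-distribution.
- For heavy-tailed distributions the covariance matrix may not exist, but the shape matrix still captures the geometry of the distribution.
- They are widely used in finance, signal processing, and other fields.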

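Limitations of Existing Methods

Sample covariance: performs poorly on heavy-tailed distributions, whose population covariance may not even exist.
Theoretical gaps for Tyler's estimator:
- Tyler (1987) gave only asymptotic guarantees.
- The finite-sample bounds of Franks and Moitra (2020) carry an extra $\log^2 d$ factor.
- Their sample complexity is $n \gtrsim d\log^2 d$, above the optimal $n \gtrsim d$ of the Gaussian case.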

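Research Motivation

The paper asks: can Tyler's estimator achieve the same optimal guarantees on elliptical distributions as the sample covariance achieves for Gaussian covariance estimation, or is shape estimation inherently harder?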

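Core Contributions

Optimal sample complexity: Tyler's M-estimator achieves relative operator-norm error $\varepsilon$ once $n \gtrsim \frac{d}{\varepsilon^2}$.
Optimal error bounds: the bounds match the lower bound for the Gaussian case, so the result is tight.
Algorithmic convergence: linear convergence of Tyler's iterative procedure is recovered at the optimal sample threshold $n \gtrsim d$.
New theoretical tool: the $\infty$-expansion condition, which gives a stronger handle on the analysis of frame scaling.
Technical innovation: two key components of the Franks-Moitra approach are improved, removing the extra logarithmic factors.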

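Method Details

Task Definition

Input: $n$ samples $x_1, \ldots, x_n \in \mathbb{R}^d$ from an elliptical distribution $E(\Sigma, u)$
Output: an estimate $\hat{\Sigma}$ of the shape matrix $\Sigma$
Goal: minimize the relative operator-norm error $\|I_d - \Sigma^{1/2}\hat{\Sigma}^{-1}\Sigma^{1/2}\|_{op}$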
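
The error metric can be computed directly. Below is a minimal NumPy sketch, assuming both matrices are symmetric positive definite and already normalized to a common scale (for instance trace $d$), since a shape matrix is only defined up to a scalar factor; `relative_op_error` is a hypothetical helper name.

```python
import numpy as np


def relative_op_error(sigma: np.ndarray, sigma_hat: np.ndarray) -> float:
    """Compute ||I_d - Sigma^{1/2} Sigma_hat^{-1} Sigma^{1/2}||_op.

    Assumes both inputs are symmetric positive definite and on the same scale.
    """
    # Symmetric square root of sigma from its eigendecomposition
    w, q = np.linalg.eigh(sigma)
    sqrt_sigma = (q * np.sqrt(w)) @ q.T
    # Whitened estimate; equals I_d exactly when sigma_hat == sigma
    m = sqrt_sigma @ np.linalg.solve(sigma_hat, sqrt_sigma)
    d = sigma.shape[0]
    # ord=2 gives the spectral (operator) norm, i.e. the largest singular value
    return float(np.linalg.norm(np.eye(d) - m, ord=2))
```

For example, the estimate $\hat\Sigma = 2\Sigma$ gives $\Sigma^{1/2}\hat\Sigma^{-1}\Sigma^{1/2} = \tfrac12 I_d$ and hence error $1/2$.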

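Elliptical Distributions and Tyler's Estimator

Elliptical distribution: $X := \Sigma^{1/2}V \cdot u$, where $V \sim S^{d-1}$ is a uniformly random unit vector and $u \in \mathbb{R}$ is an independent scalar random variable.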
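
The definition translates directly into a sampler. In this sketch the uniform direction $V$ is obtained by normalizing a standard Gaussian vector and the law of $u$ is passed in as a callable; `sample_elliptical` and its `radial` argument are hypothetical names. Choosing $u$ distributed as $\|g\|_2$ with $g \sim N(0, I_d)$ recovers $N(0, \Sigma)$, while a standard Cauchy $u$ gives a heavy-tailed law with the same shape matrix but no finite covariance.

```python
import numpy as np


def sample_elliptical(sigma, n, radial, rng):
    """Draw n samples X = Sigma^{1/2} V u, returned as rows of an (n, d) array.

    `radial(n)` is a hypothetical user-supplied callable returning n draws of
    the scalar u; it is sampled independently of the directions V.
    """
    d = sigma.shape[0]
    # Uniform direction on S^{d-1}: normalize standard Gaussian vectors
    g = rng.standard_normal((n, d))
    v = g / np.linalg.norm(g, axis=1, keepdims=True)
    u = radial(n)
    # Symmetric square root Sigma^{1/2} via the eigendecomposition
    w, q = np.linalg.eigh(sigma)
    sqrt_sigma = (q * np.sqrt(w)) @ q.T
    # Row j is (Sigma^{1/2} v_j)^T scaled by u_j
    return (v @ sqrt_sigma) * u[:, None]


rng = np.random.default_rng(0)
sigma = np.diag([1.0, 2.0, 3.0])
# u ~ ||g||_2 with g ~ N(0, I_3): the samples are exactly N(0, Sigma)
gauss = sample_elliptical(sigma, 20000, lambda n: np.sqrt(rng.chisquare(3, n)), rng)
# u standard Cauchy: same shape matrix, no finite covariance
heavy = sample_elliptical(sigma, 20000, lambda n: rng.standard_cauchy(n), rng)
```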

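Tyler's M-estimator: the unique solution $\hat{\Sigma}$ of $\frac{d}{n}\sum_{j=1}^n \frac{x_jx_j^T}{x_j^T\hat{\Sigma}^{-1}x_j} = \hat{\Sigma}, \quad \text{Tr}[\hat{\Sigma}] = d$ (a solution exists and is unique when $n > d$ and the samples are in general position).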
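
Tyler's procedure solves this equation by fixed-point iteration: evaluate the left-hand side at the current estimate, then rescale to trace $d$. A minimal NumPy sketch, assuming samples stored as rows, the identity as starting point, and a simple change-based stopping rule; `tyler_estimator`, `n_iter` and `tol` are illustrative names and defaults, not taken from the paper.

```python
import numpy as np


def tyler_estimator(x, n_iter=200, tol=1e-10):
    """Tyler's fixed-point iteration; x is an (n, d) array whose rows are samples.

    Assumes n > d and samples in general position, so the fixed point exists.
    """
    n, d = x.shape
    sigma = np.eye(d)  # assumed starting point Sigma^(0) = I_d
    for _ in range(n_iter):
        # Quadratic forms x_j^T Sigma^{-1} x_j for all samples at once
        q = np.einsum("ij,ij->i", x, np.linalg.solve(sigma, x.T).T)
        # Left-hand side of the fixed-point equation at the current Sigma
        new = (d / n) * (x.T / q) @ x
        # Renormalize so that Tr[Sigma] = d
        new *= d / np.trace(new)
        if np.linalg.norm(new - sigma) < tol:
            return new
        sigma = new
    return sigma
```

Multiplying any sample by a nonzero constant leaves its term $x_jx_j^T/(x_j^T\hat\Sigma^{-1}x_j)$ unchanged, so the estimate depends only on the directions of the samples, which is the source of its distribution-free behavior.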

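Core Technical Framework

1. Frame Scaling Connection

Tyler's estimator is equivalent to a frame scaling problem:

Frame: $V = (v_1, \ldots, v_n) \in \mathbb{R}^{d \times n}$, the matrix with columns $v_j \in \mathbb{R}^d$
Goal: find a left scaling $L \in \mathbb{R}^{d \times d}$ and a right scaling $R \in \text{diag}(n)$ (an $n \times n$ diagonal matrix) such that $V' = LVR$ satisfies:
- Isotropy: $V'V'^T = \frac{s(V')}{d}I_d$
- Equal norms: $\|v'_j\|_2^2 = \frac{s(V')}{n}$ for every $j$

Here $s(V) := \sum_{j=1}^n \|v_j\|_2^2 = \text{Tr}[VV^T]$ is the total squared norm of the frame.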
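
The equivalence can be checked numerically. Assuming the frame is the sample matrix ($v_j = x_j$), take $L = \hat\Sigma^{-1/2}$ and $R_{jj} = 1/\|\hat\Sigma^{-1/2}x_j\|_2$: every column of $V'$ is then a unit vector, so $s(V') = n$, and the fixed-point equation gives $V'V'^T = \hat\Sigma^{-1/2}(\tfrac{n}{d}\hat\Sigma)\hat\Sigma^{-1/2} = \tfrac{n}{d}I_d$. The sketch below verifies both properties; `tyler_fixed_point` and `scaled_frame` are hypothetical helper names.

```python
import numpy as np


def tyler_fixed_point(x, n_iter=500):
    # Tyler's iteration with Tr = d; rows of x are samples
    n, d = x.shape
    sigma = np.eye(d)
    for _ in range(n_iter):
        q = np.einsum("ij,ij->i", x, np.linalg.solve(sigma, x.T).T)
        sigma = (d / n) * (x.T / q) @ x
        sigma *= d / np.trace(sigma)
    return sigma


def scaled_frame(x):
    """Return V' = L V R for the frame V = x^T, with L = Sigma_hat^{-1/2}."""
    sigma_hat = tyler_fixed_point(x)
    w, q = np.linalg.eigh(sigma_hat)
    left = (q / np.sqrt(w)) @ q.T          # L = Sigma_hat^{-1/2}
    lv = left @ x.T                        # columns L x_j, shape (d, n)
    r = 1.0 / np.linalg.norm(lv, axis=0)   # diagonal of R: 1 / ||L x_j||_2
    return lv * r                          # scales column j by R_jj
```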

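2. ∞-Expansion Condition

Definition: A frame $V$ satisfies $(1-\lambda)$-$\infty$-expansion if $\left\|\sum_{j=1}^n y_j v_j v_j^T\right\|_{op} \leq \frac{(1-\lambda)s(V)}{d}$ for all $y \perp \mathbf{1}_n$ with $\|y\|_\infty \leq 1$.

This is a stronger condition than quantum expansion. The key changes are:

- The constraint on $y$ changes from $\|y\|_2 \leq 1$ to $\|y\|_\infty \leq 1$.
- The output is measured in the operator norm instead of the Frobenius norm.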
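
Checking $\infty$-expansion exactly means maximizing a convex function of $y$ over a polytope, which is combinatorial in general. The sketch below uses a heuristic that is not taken from the paper: alternating maximization. Because $-y$ is feasible whenever $y$ is, the supremum of the operator norm equals the supremum of the largest eigenvalue; for a fixed unit $z$ the best $y$ is found by sorting the values $\langle v_j, z\rangle^2$, and for a fixed $y$ the best $z$ is a top eigenvector. Each step can only increase the objective, but the result is only a lower bound on the supremum, so it can expose a frame that violates the condition without certifying one that satisfies it. Function and parameter names are hypothetical.

```python
import numpy as np


def inf_expansion_lower_bound(v, n_restarts=5, n_iter=50, seed=0):
    """Lower bound on d * sup_y ||sum_j y_j v_j v_j^T||_op / s(V).

    v is a (d, n) frame; y ranges over vectors orthogonal to the all-ones
    vector with ||y||_inf <= 1. A value above 1 - lambda shows the frame
    fails (1 - lambda)-infinity-expansion; a smaller value proves nothing.
    """
    d, n = v.shape
    s = np.sum(v ** 2)  # s(V): total squared norm of the frame
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_restarts):
        z = rng.standard_normal(d)
        z /= np.linalg.norm(z)
        for _ in range(n_iter):
            # For fixed unit z, sum_j y_j <v_j, z>^2 is linear in y: the optimum is
            # +1 on the largest values, -1 on the smallest, 0 on the median if n is odd
            a = (z @ v) ** 2
            order = np.argsort(a)
            y = np.zeros(n)
            y[order[: n // 2]] = -1.0
            y[order[n - n // 2:]] = 1.0
            # For fixed y, the best z is a top eigenvector of sum_j y_j v_j v_j^T
            w, q = np.linalg.eigh((v * y) @ v.T)
            best = max(best, w[-1])
            z = q[:, -1]
    return d * best / s
```

A frame split evenly between two orthogonal directions scores $1$ (no expansion at all), whereas random unit vectors with $n \gg d$ score noticeably below $1$.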

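3. Pseudorandom Condition

Definition: A frame $V$ is $(\alpha_{\min}, \alpha_{\max}, \beta)$-pseudorandom if for every subset $B \subseteq [n]$ with $|B| = \beta n$: $\beta\frac{\alpha_{\min}}{d}I_d \preceq V_BV_B^T \preceq \beta\frac{\alpha_{\max}}{d}I_d$, where $V_B$ is the submatrix of $V$ formed by the columns indexed by $B$.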
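
The condition quantifies over all $\binom{n}{\beta n}$ subsets, so it cannot be checked exhaustively except for tiny $n$. The sketch below only probes random subsets and reads the definition literally, converting each extreme eigenvalue of $V_BV_B^T$ into the implied $\alpha = d\,\lambda/\beta$; the result is an inner estimate, since an adversarial subset may do worse. `pseudorandom_probe` and its parameters are hypothetical.

```python
import numpy as np


def pseudorandom_probe(v, beta, n_trials=200, seed=0):
    """Range of alpha values implied by random subsets B with |B| = beta * n.

    Reads the definition literally: alpha = d * eig(V_B V_B^T) / beta. Only
    random subsets are tried, so the true alpha_min may be smaller and the
    true alpha_max larger than the returned values.
    """
    d, n = v.shape
    k = int(round(beta * n))
    rng = np.random.default_rng(seed)
    lo, hi = np.inf, -np.inf
    for _ in range(n_trials):
        b = rng.choice(n, size=k, replace=False)  # random subset B
        vb = v[:, b]                              # columns of V indexed by B
        w = np.linalg.eigvalsh(vb @ vb.T)         # eigenvalues in ascending order
        lo, hi = min(lo, w[0]), max(hi, w[-1])
    return d * lo / beta, d * hi / beta
```

For a frame made of four copies of each standard basis vector of $\mathbb{R}^3$, $\beta = 1$ gives $\alpha_{\min} = \alpha_{\max} = 12$, while smaller subsets can miss a basis direction entirely and drive $\alpha_{\min}$ to $0$.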

Main Theoretical Results

Theorem 1.1 (Sample complexity): If $n \gtrsim \frac{d}{\varepsilon^2}$ and $\varepsilon$ is at most a small constant, then with probability at least $1 - \exp(-\Omega(\varepsilon^2 n))$ Tyler's M-estimator satisfies $\|I_d - \Sigma^{1/2}\hat{\Sigma}^{-1}\Sigma^{1/2}\|_{op} \leq \varepsilon$.
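
Rearranging $n \gtrsim d/\varepsilon^2$, Theorem 1.1 predicts an error of order $\sqrt{d/n}$ regardless of the radial law. The small experiment below is an illustration, not a reproduction of the paper's results: it assumes a diagonal shape matrix with trace $d$ (so it matches the estimator's normalization) and Cauchy-distributed radial factors, and prints the error next to $\sqrt{d/n}$. All names are hypothetical.

```python
import numpy as np


def tyler(x, n_iter=300):
    # Tyler fixed-point iteration with Tr = d; rows of x are samples
    n, d = x.shape
    sigma = np.eye(d)
    for _ in range(n_iter):
        q = np.einsum("ij,ij->i", x, np.linalg.solve(sigma, x.T).T)
        sigma = (d / n) * (x.T / q) @ x
        sigma *= d / np.trace(sigma)
    return sigma


def rel_op_error(sigma, sigma_hat):
    # ||I_d - Sigma^{1/2} Sigma_hat^{-1} Sigma^{1/2}||_op
    w, q = np.linalg.eigh(sigma)
    r = (q * np.sqrt(w)) @ q.T
    return float(np.linalg.norm(np.eye(len(w)) - r @ np.linalg.solve(sigma_hat, r), 2))


def heavy_tail_experiment(d=5, sizes=(100, 400, 1600, 6400), seed=0):
    rng = np.random.default_rng(seed)
    # Assumed shape matrix with trace d, matching Tyler's normalization
    sigma = np.diag(np.linspace(0.5, 1.5, d))
    root = np.diag(np.sqrt(np.diag(sigma)))
    results = []
    for n in sizes:
        # Gaussian vector times an independent Cauchy scalar: elliptical, no covariance
        x = (rng.standard_normal((n, d)) @ root) * rng.standard_cauchy(n)[:, None]
        results.append((n, rel_op_error(sigma, tyler(x)), np.sqrt(d / n)))
    return results


for n, err, ref in heavy_tail_experiment():
    print(f"n={n:5d}  error={err:.3f}  sqrt(d/n)={ref:.3f}")
```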

Theorem 1.2 (Algorithmic convergence): If $n \gtrsim d$, the $T$-th iterate $\Sigma^{(T)}$ of Tyler's iterative procedure satisfies $\|I_d - \hat{\Sigma}^{1/2}(\Sigma^{(T)})^{-1}\hat{\Sigma}^{1/2}\|_F \leq \delta$ for some $T \lesssim |\log \det \Sigma| + d + \log(1/\delta)$.
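
The convergence statement can be observed numerically. In this sketch the exact estimator $\hat\Sigma$ is approximated by running the iteration much longer, the iteration starts from $\Sigma^{(0)} = I_d$ (an assumption; the summary does not state the initialization), and the Frobenius error from Theorem 1.2 is recorded at every step. Linear convergence shows up as errors that eventually shrink by a roughly constant factor per iteration. The data are an assumed anisotropic multivariate t sample, and all names are hypothetical.

```python
import numpy as np


def tyler_step(x, sigma):
    # One step of Tyler's iteration followed by renormalization to Tr = d
    n, d = x.shape
    q = np.einsum("ij,ij->i", x, np.linalg.solve(sigma, x.T).T)
    new = (d / n) * (x.T / q) @ x
    return new * (d / np.trace(new))


def convergence_errors(x, n_steps=60, n_ref=1000):
    """Frobenius errors ||I_d - Sigma_hat^{1/2} (Sigma^(t))^{-1} Sigma_hat^{1/2}||_F."""
    d = x.shape[1]
    # Stand-in for the exact estimator: iterate far beyond n_steps
    ref = np.eye(d)
    for _ in range(n_ref):
        ref = tyler_step(x, ref)
    w, q = np.linalg.eigh(ref)
    root = (q * np.sqrt(w)) @ q.T
    sigma = np.eye(d)  # assumed initialization Sigma^(0) = I_d
    errors = []
    for _ in range(n_steps):
        sigma = tyler_step(x, sigma)
        m = root @ np.linalg.solve(sigma, root)
        errors.append(float(np.linalg.norm(np.eye(d) - m, "fro")))
    return errors


rng = np.random.default_rng(0)
d, n = 6, 120
# Shape diag(1, ..., 6); multivariate t with 2 degrees of freedom, up to a constant factor
x = rng.standard_normal((n, d)) * np.sqrt(np.arange(1, d + 1)) / np.sqrt(rng.chisquare(2, n))[:, None]
errs = convergence_errors(x)
```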