2025-11-25T18:25:18.428479

Structured covariance estimation via tensor-train decomposition

Patarusau, Puchkin, Rakhuba et al.

We consider a problem of covariance estimation from a sample of i.i.d. high-dimensional random vectors. To avoid the curse of dimensionality we impose an additional assumption on the structure of the covariance matrix $Î£$. To be more precise we study the case when $Î£$ can be approximated by a sum of double Kronecker products of smaller matrices in a tensor train (TT) format. Our setup naturally extends widely known Kronecker sum and CANDECOMP/PARAFAC models but admits richer interaction across modes. We suggest an iterative polynomial time algorithm based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to Tucker-2 hybrid structure. We derive non-asymptotic dimension-free bounds on the accuracy of covariance estimation taking into account hidden Kronecker product and tensor train structures. The efficiency of our approach is illustrated with numerical experiments.

academic

Structured Covariance Estimation via Tensor-Train Decomposition

Basic Information

Paper ID: 2510.08174
Title: Structured Covariance Estimation via Tensor-Train Decomposition
Authors: Artsiom Patarusau, Nikita Puchkin, Maxim Rakhuba, Fedor Noskov (HSE University)
Classification: math.ST (Statistics Theory)
Publication Date: October 15, 2025
Paper Link: https://arxiv.org/abs/2510.08174v2

Abstract

This paper investigates the problem of estimating covariance matrices from independent and identically distributed samples of high-dimensional random vectors. To circumvent the curse of dimensionality, the authors impose additional structural assumptions on the covariance matrix Σ, specifically studying the case where Σ can be approximated as a sum of double Kronecker products of smaller matrices in tensor-train (TT) format. This setting naturally extends the well-known Kronecker sum and CANDECOMP/PARAFAC models while allowing richer interactions between modes. The authors propose polynomial-time iterative algorithms based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to Tucker-2 hybrid structures, and derive non-asymptotic dimension-free convergence bounds that account for the hidden Kronecker product and tensor-train structures.

Research Background and Motivation

Problem Definition

Given independent and identically distributed centered random vectors $X, X_1, \ldots, X_n \in \mathbb{R}^d$ , the goal is to estimate their covariance matrix $\Sigma = \mathbb{E}[XX^T] \in \mathbb{R}^{d \times d}$ .

Research Motivation

Curse of Dimensionality: In high-dimensional settings, the classical sample covariance estimator $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n X_i X_i^T$ suffers from the curse of dimensionality, with performance degrading sharply as $d$ increases.
Necessity of Structural Assumptions: To overcome this challenge, statisticians typically impose additional structural assumptions on $\Sigma$ to exploit data structure and reduce the total number of unknown parameters.
Limitations of Existing Methods:
- Kronecker product models $\Sigma = \Phi \otimes \Psi$ are overly simplistic
- Kronecker sum models $\Sigma = \sum_{k=1}^K \Phi_k \otimes \Psi_k$ lack sufficient flexibility
- CANDECOMP/PARAFAC models face NP-hardness computationally

Novelty of This Work

The authors propose a tensor-train format covariance model: $\Sigma = \sum_{j=1}^J \sum_{k=1}^K U_j \otimes V_{jk} \otimes W_k$ where $U_j \in \mathbb{R}^{p \times p}$ , $V_{jk} \in \mathbb{R}^{q \times q}$ , $W_k \in \mathbb{R}^{r \times r}$ , and $pqr = d$ .

Core Contributions

Novel Covariance Model: Proposes a covariance structure based on tensor-train decomposition that naturally extends Kronecker sum and CANDECOMP/PARAFAC models while allowing richer inter-modal interactions.
Efficient Algorithm: Designs the HardTTh algorithm (Hard Tensor Train Thresholding) based on TT-SVD and HOOI adapted to Tucker-2 hybrid structures, with computational complexity $O((J+K)Td_1d_2d_3)$ .
Theoretical Guarantees: Establishes non-asymptotic, dimension-free convergence bounds—the first dimension-free theoretical results for TT-structured tensor estimation.
Practical Validation: Numerical experiments verify the method's effectiveness and demonstrate the necessity of iterative refinement.

Methodology Details

Task Definition

Input: Independent and identically distributed samples $X_1, \ldots, X_n \in \mathbb{R}^{pqr}$ Output: Estimate $\tilde{\Sigma}$ of the covariance matrix $\Sigma$ Constraint: $\Sigma$ has TT structure, expressible as $\Sigma = \sum_{j=1}^J \sum_{k=1}^K U_j \otimes V_{jk} \otimes W_k$

Model Architecture

Tensor Reshaping and Decomposition

Reshaping Operation: Reshape the covariance matrix $\Sigma \in \mathbb{R}^{pqr \times pqr}$ into a third-order tensor $\mathcal{R}(\Sigma) \in \mathbb{R}^{p^2 \times q^2 \times r^2}$
TT Decomposition Representation: $\mathcal{R}(\Sigma) = \sum_{j=1}^J \sum_{k=1}^K \text{vec}(U_j) \otimes \text{vec}(V_{jk}) \otimes \text{vec}(W_k)$
Compact Form: $\mathcal{R}(\Sigma) = U \times_1 V \times_3 W$ where $U \in O_{p^2,J}$ , $V \in O_{r^2,K}$ , $W \in \mathbb{R}^{J \times q^2 \times K}$

HardTTh Algorithm

Algorithm 1: HardTTh

Input: Tensor Y ∈ ℝ^{d₁×d₂×d₃}, TT ranks (J,K), number of iterations T
Output: TT approximation T̂ = Û ×₁ V̂ ×₃ Ŵ

1. Compute truncated SVD of m₁(Y): Û₀, Σ₀,₁, Ũ₀ = SVD_J(m₁(Y))
2. Compute truncated SVD of m₃(Û₀ᵀ ×₁ Y): V̂₀, Σ₀,₂, Ṽ₀ = SVD_K(m₃(Û₀ᵀ ×₁ Y))

for t = 1, ..., T do:
3. Ût, Σt,₁, Ũt = SVD_J(m₁(V̂ₜ₋₁ᵀ ×₃ Y))
4. V̂t, Σt,₂, Ṽt = SVD_K(m₃(Ûtᵀ ×₁ Y))

5. Set Û = ÛT, V̂ = V̂T, Ŵ = Ûᵀ ×₁ V̂ᵀ ×₃ Y

Technical Innovations

Tucker-2 Hybrid Structure: Unlike standard Tucker decomposition requiring three orthogonal factors, the TT structure requires only two, reducing computational complexity.
Iterative Refinement Strategy: Progressively improves estimation accuracy through alternating optimization of modal subspaces.
Hard Thresholding: Uses hard thresholding rather than soft thresholding, avoiding the NP-hardness of tensor nuclear norm approximation.

Experimental Setup

Data Generation Model

TT Ranks: $J = 7, K = 9$
Dimensions: $p = q = r = 10$ , total dimension $d = 1000$
Generation Process:
- Generate random symmetric matrices $A_j \in \mathbb{R}^{p \times p}$ , $B_{jk} \in \mathbb{R}^{q \times q}$ , $C_k \in \mathbb{R}^{r \times r}$
- Random vectors defined as: $\sum_{j=1}^J \sum_{k=1}^K A_j \times_1 B_{jk} \times_2 C_k \times_3 E_{ijk}$
- where $E_{ijk}$ is a standard Gaussian tensor

Evaluation Metrics

Relative Error: $\|\hat{S} - \Sigma\|_F / \|\Sigma\|_F$

Comparison Methods

Sample Mean: Sample covariance estimator
TT-HOSVD: HardTTh algorithm without iterations ( $T=0$ )
Tucker: Standard Tucker decomposition
Tucker+HOOI: Tucker decomposition with HOOI iterations
PRLS: Modified regularized least squares

Implementation Details

Number of iterations: $T = 10$
PRLS parameters: Tuned on logarithmic scale grid for $\lambda_1, \lambda_2$
Experiment repetitions: 16-32 repetitions per setting

Experimental Results

Main Results

Sample Size	Sample Mean	TT-HOSVD	HardTTh	Tucker	Tucker+HOOI	PRLS
n=500	1.22±0.02	0.269±0.008	0.238±0.013	0.252±0.007	0.240±0.013	0.238±0.017
n=2000	0.611±0.009	0.154±0.006	0.082±0.005	0.150±0.005	0.082±0.005	0.216±0.012
n=4000	0.430±0.007	0.105±0.008	0.054±0.002	0.105±0.007	0.054±0.002	0.217±0.015

Key Findings

Necessity of Iterations: HardTTh shows significant improvement over TT-HOSVD, particularly at n=2000 where relative error decreases from 0.154 to 0.082.
Convergence Behavior:
- At n=500: $\sin\Theta(\text{Im}\hat{U}_0, \text{Im}U^*) \approx 1$ , $\sin\Theta(\text{Im}\hat{U}_T, \text{Im}U^*) \approx 1$
- At n=2000: $\sin\Theta(\text{Im}\hat{U}_0, \text{Im}U^*) \approx 1$ , $\sin\Theta(\text{Im}\hat{U}_T, \text{Im}U^*) = 0.33±0.08$
Computational Efficiency: HardTTh has moderate time complexity, faster than full Tucker decomposition but slower than TT-HOSVD.

Theoretical Validation

Experiments confirm the necessity of theoretical conditions: when singular value conditions are not satisfied (e.g., n=500), the algorithm cannot effectively recover subspaces; when conditions are satisfied (e.g., n≥2000), iterations significantly improve performance.

Kronecker Product Models

Single Kronecker Product: Werner et al. (2008) proposed the $\Sigma = \Phi \otimes \Psi$ model
Kronecker Sum: Tsiligkaridis & Hero (2013), Puchkin & Rakhuba (2024) studied the $\Sigma = \sum_k \Phi_k \otimes \Psi_k$ model

Tensor Decomposition Methods

CP Decomposition: Faces computational complexity issues
Tucker Decomposition: Zhang & Xia (2018) and others established dimension-dependent bounds
TT Decomposition: Oseledets (2011) proposed it; this paper is the first to apply it to covariance estimation

Theoretical Progress

Dimension-Dependent Bounds: Most existing results depend on ambient dimension
Dimension-Free Bounds: Limited to simple cases; this paper extends to TT structures

Conclusions and Discussion

Main Conclusions

Model Advantages: The TT-format covariance model provides richer structure than traditional Kronecker models while maintaining computational feasibility.
Algorithm Effectiveness: The HardTTh algorithm achieves polynomial-time complexity and significantly improves estimation quality through iterations.
Theoretical Guarantees: Establishes the first dimension-free convergence bounds for TT-structured estimation, with variance term: $\tilde{v} = 96\omega\|\Sigma\|\sqrt{\frac{Jr_1^2(\Sigma) + JKr_2^2(\Sigma) + Kr_3^2(\Sigma) + \log(48/\delta)}{n}}$

Limitations

Singular Value Conditions: The algorithm requires $\sigma_J(m_1(\mathcal{R}(\Sigma))) \gtrsim \|\Sigma\|\sqrt{r_2^2(\Sigma)r_3^2(\Sigma)/n}$ , which is stronger than theoretically optimal conditions.
Noise Structure: Theoretical analysis assumes specific noise structure, differing from homogeneous noise.
Parameter Selection: Choosing TT ranks $(J,K)$ requires prior knowledge or data-driven methods.

Future Directions

Debiasing Methods: Develop debiasing techniques for non-homogeneous noise.
Adaptive Rank Selection: Establish theoretically guaranteed rank selection methods.
Extended Applications: Extend the method to other structured matrix estimation problems.

In-Depth Evaluation

Strengths

Theoretical Innovation: First to provide dimension-free theoretical bounds for TT-structured covariance estimation, filling an important theoretical gap.
Practical Methodology: HardTTh algorithm has reasonable computational complexity and avoids NP-hardness.
Comprehensive Experiments: Validates effectiveness through multiple comparison methods and varying sample sizes.
Thorough Analysis: Provides detailed theoretical analysis and algorithm convergence studies.

Weaknesses

Strong Conditions: Theoretical conditions are stricter than known lower bounds, indicating a statistical-computational gap.
Model Limitations: Only applicable to covariance matrices well-approximated by TT format.
Parameter Sensitivity: Performance depends on correct selection of TT rank parameters.

Impact

Theoretical Contribution: Provides new theoretical tools for tensor methods in high-dimensional statistics.
Practical Value: Potential applications in multi-modal data analysis, signal processing, and related fields.
Methodological Significance: Demonstrates effective application of tensor decomposition techniques to statistical estimation problems.

Applicable Scenarios

Multi-Modal Data: Images, videos, and other data with natural tensor structure
Spatio-Temporal Data: Covariance estimation for data with temporal-spatial structure
High-Dimensional Financial Data: Structured covariance modeling of asset returns
Sensor Networks: Covariance estimation for multi-sensor data

References

Werner, K., Jansson, M., & Stoica, P. (2008). On estimation of covariance matrices with Kronecker product structure.
Tsiligkaridis, T., & Hero, A. O. (2013). Covariance estimation in high dimensions via Kronecker product expansions.
Zhang, A., & Xia, D. (2018). Tensor SVD: Statistical and computational limits.
Puchkin, N., & Rakhuba, M. (2024). Dimension-free structured covariance estimation.
Oseledets, I. V. (2011). Tensor-train decomposition.