2025-11-25T18:25:18.428479

Structured covariance estimation via tensor-train decomposition

Patarusau, Puchkin, Rakhuba et al.
We consider a problem of covariance estimation from a sample of i.i.d. high-dimensional random vectors. To avoid the curse of dimensionality we impose an additional assumption on the structure of the covariance matrix $Σ$. To be more precise we study the case when $Σ$ can be approximated by a sum of double Kronecker products of smaller matrices in a tensor train (TT) format. Our setup naturally extends widely known Kronecker sum and CANDECOMP/PARAFAC models but admits richer interaction across modes. We suggest an iterative polynomial time algorithm based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to Tucker-2 hybrid structure. We derive non-asymptotic dimension-free bounds on the accuracy of covariance estimation taking into account hidden Kronecker product and tensor train structures. The efficiency of our approach is illustrated with numerical experiments.
academic

Structured Covariance Estimation via Tensor-Train Decomposition

Basic Information

  • Paper ID: 2510.08174
  • Title: Structured Covariance Estimation via Tensor-Train Decomposition
  • Authors: Artsiom Patarusau, Nikita Puchkin, Maxim Rakhuba, Fedor Noskov (HSE University)
  • Classification: math.ST (Statistics Theory)
  • Publication Date: October 15, 2025
  • Paper Link: https://arxiv.org/abs/2510.08174v2

Abstract

This paper investigates the problem of estimating covariance matrices from independent and identically distributed samples of high-dimensional random vectors. To circumvent the curse of dimensionality, the authors impose additional structural assumptions on the covariance matrix Σ, specifically studying the case where Σ can be approximated as a sum of double Kronecker products of smaller matrices in tensor-train (TT) format. This setting naturally extends the well-known Kronecker sum and CANDECOMP/PARAFAC models while allowing richer interactions between modes. The authors propose polynomial-time iterative algorithms based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to Tucker-2 hybrid structures, and derive non-asymptotic dimension-free convergence bounds that account for the hidden Kronecker product and tensor-train structures.

Research Background and Motivation

Problem Definition

Given independent and identically distributed centered random vectors X,X1,,XnRdX, X_1, \ldots, X_n \in \mathbb{R}^d, the goal is to estimate their covariance matrix Σ=E[XXT]Rd×d\Sigma = \mathbb{E}[XX^T] \in \mathbb{R}^{d \times d}.

Research Motivation

  1. Curse of Dimensionality: In high-dimensional settings, the classical sample covariance estimator Σ^=1ni=1nXiXiT\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n X_i X_i^T suffers from the curse of dimensionality, with performance degrading sharply as dd increases.
  2. Necessity of Structural Assumptions: To overcome this challenge, statisticians typically impose additional structural assumptions on Σ\Sigma to exploit data structure and reduce the total number of unknown parameters.
  3. Limitations of Existing Methods:
    • Kronecker product models Σ=ΦΨ\Sigma = \Phi \otimes \Psi are overly simplistic
    • Kronecker sum models Σ=k=1KΦkΨk\Sigma = \sum_{k=1}^K \Phi_k \otimes \Psi_k lack sufficient flexibility
    • CANDECOMP/PARAFAC models face NP-hardness computationally

Novelty of This Work

The authors propose a tensor-train format covariance model: Σ=j=1Jk=1KUjVjkWk\Sigma = \sum_{j=1}^J \sum_{k=1}^K U_j \otimes V_{jk} \otimes W_k where UjRp×pU_j \in \mathbb{R}^{p \times p}, VjkRq×qV_{jk} \in \mathbb{R}^{q \times q}, WkRr×rW_k \in \mathbb{R}^{r \times r}, and pqr=dpqr = d.

Core Contributions

  1. Novel Covariance Model: Proposes a covariance structure based on tensor-train decomposition that naturally extends Kronecker sum and CANDECOMP/PARAFAC models while allowing richer inter-modal interactions.
  2. Efficient Algorithm: Designs the HardTTh algorithm (Hard Tensor Train Thresholding) based on TT-SVD and HOOI adapted to Tucker-2 hybrid structures, with computational complexity O((J+K)Td1d2d3)O((J+K)Td_1d_2d_3).
  3. Theoretical Guarantees: Establishes non-asymptotic, dimension-free convergence bounds—the first dimension-free theoretical results for TT-structured tensor estimation.
  4. Practical Validation: Numerical experiments verify the method's effectiveness and demonstrate the necessity of iterative refinement.

Methodology Details

Task Definition

Input: Independent and identically distributed samples X1,,XnRpqrX_1, \ldots, X_n \in \mathbb{R}^{pqr}Output: Estimate Σ~\tilde{\Sigma} of the covariance matrix Σ\SigmaConstraint: Σ\Sigma has TT structure, expressible as Σ=j=1Jk=1KUjVjkWk\Sigma = \sum_{j=1}^J \sum_{k=1}^K U_j \otimes V_{jk} \otimes W_k

Model Architecture

Tensor Reshaping and Decomposition

  1. Reshaping Operation: Reshape the covariance matrix ΣRpqr×pqr\Sigma \in \mathbb{R}^{pqr \times pqr} into a third-order tensor R(Σ)Rp2×q2×r2\mathcal{R}(\Sigma) \in \mathbb{R}^{p^2 \times q^2 \times r^2}
  2. TT Decomposition Representation: R(Σ)=j=1Jk=1Kvec(Uj)vec(Vjk)vec(Wk)\mathcal{R}(\Sigma) = \sum_{j=1}^J \sum_{k=1}^K \text{vec}(U_j) \otimes \text{vec}(V_{jk}) \otimes \text{vec}(W_k)
  3. Compact Form: R(Σ)=U×1V×3W\mathcal{R}(\Sigma) = U \times_1 V \times_3 W where UOp2,JU \in O_{p^2,J}, VOr2,KV \in O_{r^2,K}, WRJ×q2×KW \in \mathbb{R}^{J \times q^2 \times K}

HardTTh Algorithm

Algorithm 1: HardTTh

Input: Tensor Y ∈ ℝ^{d₁×d₂×d₃}, TT ranks (J,K), number of iterations T
Output: TT approximation T̂ = Û ×₁ V̂ ×₃ Ŵ

1. Compute truncated SVD of m₁(Y): Û₀, Σ₀,₁, Ũ₀ = SVD_J(m₁(Y))
2. Compute truncated SVD of m₃(Û₀ᵀ ×₁ Y): V̂₀, Σ₀,₂, Ṽ₀ = SVD_K(m₃(Û₀ᵀ ×₁ Y))

for t = 1, ..., T do:
3. Ût, Σt,₁, Ũt = SVD_J(m₁(V̂ₜ₋₁ᵀ ×₃ Y))
4. V̂t, Σt,₂, Ṽt = SVD_K(m₃(Ûtᵀ ×₁ Y))

5. Set Û = ÛT, V̂ = V̂T, Ŵ = Ûᵀ ×₁ V̂ᵀ ×₃ Y

Technical Innovations

  1. Tucker-2 Hybrid Structure: Unlike standard Tucker decomposition requiring three orthogonal factors, the TT structure requires only two, reducing computational complexity.
  2. Iterative Refinement Strategy: Progressively improves estimation accuracy through alternating optimization of modal subspaces.
  3. Hard Thresholding: Uses hard thresholding rather than soft thresholding, avoiding the NP-hardness of tensor nuclear norm approximation.

Experimental Setup

Data Generation Model

  • TT Ranks: J=7,K=9J = 7, K = 9
  • Dimensions: p=q=r=10p = q = r = 10, total dimension d=1000d = 1000
  • Generation Process:
    • Generate random symmetric matrices AjRp×pA_j \in \mathbb{R}^{p \times p}, BjkRq×qB_{jk} \in \mathbb{R}^{q \times q}, CkRr×rC_k \in \mathbb{R}^{r \times r}
    • Random vectors defined as: j=1Jk=1KAj×1Bjk×2Ck×3Eijk\sum_{j=1}^J \sum_{k=1}^K A_j \times_1 B_{jk} \times_2 C_k \times_3 E_{ijk}
    • where EijkE_{ijk} is a standard Gaussian tensor

Evaluation Metrics

Relative Error: S^ΣF/ΣF\|\hat{S} - \Sigma\|_F / \|\Sigma\|_F

Comparison Methods

  1. Sample Mean: Sample covariance estimator
  2. TT-HOSVD: HardTTh algorithm without iterations (T=0T=0)
  3. Tucker: Standard Tucker decomposition
  4. Tucker+HOOI: Tucker decomposition with HOOI iterations
  5. PRLS: Modified regularized least squares

Implementation Details

  • Number of iterations: T=10T = 10
  • PRLS parameters: Tuned on logarithmic scale grid for λ1,λ2\lambda_1, \lambda_2
  • Experiment repetitions: 16-32 repetitions per setting

Experimental Results

Main Results

Sample SizeSample MeanTT-HOSVDHardTThTuckerTucker+HOOIPRLS
n=5001.22±0.020.269±0.0080.238±0.0130.252±0.0070.240±0.0130.238±0.017
n=20000.611±0.0090.154±0.0060.082±0.0050.150±0.0050.082±0.0050.216±0.012
n=40000.430±0.0070.105±0.0080.054±0.0020.105±0.0070.054±0.0020.217±0.015

Key Findings

  1. Necessity of Iterations: HardTTh shows significant improvement over TT-HOSVD, particularly at n=2000 where relative error decreases from 0.154 to 0.082.
  2. Convergence Behavior:
    • At n=500: sinΘ(ImU^0,ImU)1\sin\Theta(\text{Im}\hat{U}_0, \text{Im}U^*) \approx 1, sinΘ(ImU^T,ImU)1\sin\Theta(\text{Im}\hat{U}_T, \text{Im}U^*) \approx 1
    • At n=2000: sinΘ(ImU^0,ImU)1\sin\Theta(\text{Im}\hat{U}_0, \text{Im}U^*) \approx 1, sinΘ(ImU^T,ImU)=0.33±0.08\sin\Theta(\text{Im}\hat{U}_T, \text{Im}U^*) = 0.33±0.08
  3. Computational Efficiency: HardTTh has moderate time complexity, faster than full Tucker decomposition but slower than TT-HOSVD.

Theoretical Validation

Experiments confirm the necessity of theoretical conditions: when singular value conditions are not satisfied (e.g., n=500), the algorithm cannot effectively recover subspaces; when conditions are satisfied (e.g., n≥2000), iterations significantly improve performance.

Kronecker Product Models

  • Single Kronecker Product: Werner et al. (2008) proposed the Σ=ΦΨ\Sigma = \Phi \otimes \Psi model
  • Kronecker Sum: Tsiligkaridis & Hero (2013), Puchkin & Rakhuba (2024) studied the Σ=kΦkΨk\Sigma = \sum_k \Phi_k \otimes \Psi_k model

Tensor Decomposition Methods

  • CP Decomposition: Faces computational complexity issues
  • Tucker Decomposition: Zhang & Xia (2018) and others established dimension-dependent bounds
  • TT Decomposition: Oseledets (2011) proposed it; this paper is the first to apply it to covariance estimation

Theoretical Progress

  • Dimension-Dependent Bounds: Most existing results depend on ambient dimension
  • Dimension-Free Bounds: Limited to simple cases; this paper extends to TT structures

Conclusions and Discussion

Main Conclusions

  1. Model Advantages: The TT-format covariance model provides richer structure than traditional Kronecker models while maintaining computational feasibility.
  2. Algorithm Effectiveness: The HardTTh algorithm achieves polynomial-time complexity and significantly improves estimation quality through iterations.
  3. Theoretical Guarantees: Establishes the first dimension-free convergence bounds for TT-structured estimation, with variance term: v~=96ωΣJr12(Σ)+JKr22(Σ)+Kr32(Σ)+log(48/δ)n\tilde{v} = 96\omega\|\Sigma\|\sqrt{\frac{Jr_1^2(\Sigma) + JKr_2^2(\Sigma) + Kr_3^2(\Sigma) + \log(48/\delta)}{n}}

Limitations

  1. Singular Value Conditions: The algorithm requires σJ(m1(R(Σ)))Σr22(Σ)r32(Σ)/n\sigma_J(m_1(\mathcal{R}(\Sigma))) \gtrsim \|\Sigma\|\sqrt{r_2^2(\Sigma)r_3^2(\Sigma)/n}, which is stronger than theoretically optimal conditions.
  2. Noise Structure: Theoretical analysis assumes specific noise structure, differing from homogeneous noise.
  3. Parameter Selection: Choosing TT ranks (J,K)(J,K) requires prior knowledge or data-driven methods.

Future Directions

  1. Debiasing Methods: Develop debiasing techniques for non-homogeneous noise.
  2. Adaptive Rank Selection: Establish theoretically guaranteed rank selection methods.
  3. Extended Applications: Extend the method to other structured matrix estimation problems.

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: First to provide dimension-free theoretical bounds for TT-structured covariance estimation, filling an important theoretical gap.
  2. Practical Methodology: HardTTh algorithm has reasonable computational complexity and avoids NP-hardness.
  3. Comprehensive Experiments: Validates effectiveness through multiple comparison methods and varying sample sizes.
  4. Thorough Analysis: Provides detailed theoretical analysis and algorithm convergence studies.

Weaknesses

  1. Strong Conditions: Theoretical conditions are stricter than known lower bounds, indicating a statistical-computational gap.
  2. Model Limitations: Only applicable to covariance matrices well-approximated by TT format.
  3. Parameter Sensitivity: Performance depends on correct selection of TT rank parameters.

Impact

  1. Theoretical Contribution: Provides new theoretical tools for tensor methods in high-dimensional statistics.
  2. Practical Value: Potential applications in multi-modal data analysis, signal processing, and related fields.
  3. Methodological Significance: Demonstrates effective application of tensor decomposition techniques to statistical estimation problems.

Applicable Scenarios

  1. Multi-Modal Data: Images, videos, and other data with natural tensor structure
  2. Spatio-Temporal Data: Covariance estimation for data with temporal-spatial structure
  3. High-Dimensional Financial Data: Structured covariance modeling of asset returns
  4. Sensor Networks: Covariance estimation for multi-sensor data

References

  1. Werner, K., Jansson, M., & Stoica, P. (2008). On estimation of covariance matrices with Kronecker product structure.
  2. Tsiligkaridis, T., & Hero, A. O. (2013). Covariance estimation in high dimensions via Kronecker product expansions.
  3. Zhang, A., & Xia, D. (2018). Tensor SVD: Statistical and computational limits.
  4. Puchkin, N., & Rakhuba, M. (2024). Dimension-free structured covariance estimation.
  5. Oseledets, I. V. (2011). Tensor-train decomposition.