
A Spatio-temporal CP decomposition analysis of New England region in the US

Sanogo
Spatio-temporal data consist of measurements of one or more raster fields such as weather, traffic volume, crime rate, or disease incidence. Advances in modern technology have increased the amount of information available for this type of data, hence the rise of multidimensional data. In this paper we take advantage of the multidimensional structure of the data as well as its temporal and spatial structure. We use the NCAR Climate Data Gateway website, which provides data discovery and access services for global and regional climate model data. The daily values of total precipitation (prec), maximum temperature (tmax), and minimum temperature (tmin) are combined to create a multidimensional array called a tensor. We propose a spatio-temporal principal component analysis to initialize the CP decomposition components, taking full advantage of the spatial and temporal structure of the data in the initialization step. The performance of our method is tested via comparison with the most popular initialization methods. We also run a clustering analysis to further demonstrate the performance of our analysis.

Basic Information

  • Paper ID: 2510.10322
  • Title: A Spatio-temporal CP decomposition analysis of New England region in the US
  • Author: Fatoumata Sanogo (Bates College Mathematics Department)
  • Classification: stat.AP cs.NA math.NA
  • Publication Date: October 11, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10322

Abstract

Spatio-temporal data comprise measurements of one or more gridded fields, such as weather, traffic flow, crime rates, or disease incidence. Advances in modern technology have increased the volume of available information in such datasets, resulting in multidimensional data. This paper leverages the multidimensional structure of data along with its temporal and spatial characteristics. Using global and regional climate model data provided by the NCAR Climate Data Gateway website, the authors construct a multidimensional data tensor by combining daily values of total precipitation (prec), maximum temperature (tmax), and minimum temperature (tmin). The paper proposes spatio-temporal principal component analysis to initialize CP decomposition components, fully exploiting the spatial and temporal structure of the data during the initialization step of CP component analysis.

Research Background and Motivation

  1. Problem to be Addressed: Traditional tensor decomposition methods (such as CP decomposition) lack initialization strategies tailored to spatio-temporal correlations when applied to spatio-temporal climate data, resulting in poor factor identifiability and low reconstruction accuracy.
  2. Problem Significance:
    • Global climate change leads to frequent extreme weather events, necessitating more reliable prediction and diagnostic tools
    • Numerical Earth system models face challenges of lengthy computation times and exponential growth in data dimensionality
    • Statistical and machine learning methods are needed to complement physics-based models
  3. Limitations of Existing Methods:
    • Although PCA can extract dominant variance modes, it processes variables independently and imposes orthogonality constraints, lacking physical interpretability
    • Random initialization and HOSVD initialization do not account for the inherent structure of spatio-temporal data
    • Existing tensor decomposition methods have limited applications in climate research
  4. Research Motivation: Develop CP decomposition initialization strategies that specifically exploit the spatio-temporal correlations in climate data to improve factor identifiability and reconstruction accuracy.

Core Contributions

  1. Proposed a novel initialization procedure: Enhances the reconstruction quality and interpretability of CP decomposition by leveraging spatio-temporal correlations
  2. Constructed empirical evaluation on NCAR precipitation and temperature datasets: Provides benchmark comparisons with common initialization methods
  3. Performed clustering analysis: Demonstrates the interpretive value and model performance of CP-derived factors
  4. Provided a theoretical framework for spatio-temporal tensor decomposition: Offers a scalable analytical framework for climate data analysis

Methodology Details

Task Definition

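Given a three-dimensional tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, where $I$ is the temporal dimension, $J$ is the spatial dimension, and $K$ is the variable dimension, the objective is to find the rank-$R$ CP decomposition that best approximates it:

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r = [[\mathbf{A}, \mathbf{B}, \mathbf{C}]]$$

Here $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r$ are the $r$-th columns of the factor matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, and $\circ$ denotes the vector outer product.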
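The sum of rank-one terms in the definition above can be checked numerically. The following is a minimal NumPy sketch on toy dimensions (not the paper's data); the function name `cp_reconstruct` is hypothetical.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a tensor from CP factors A (I x R), B (J x R), C (K x R)."""
    # Sum over r of the outer products a_r o b_r o c_r.
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
I, J, K, R = 5, 4, 3, 2          # toy sizes chosen for illustration only
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X = cp_reconstruct(A, B, C)

# The same tensor assembled term by term from the R rank-one components.
X_sum = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```

Both constructions give the same array, which is what the bracket notation for the three factor matrices abbreviates.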

Model Architecture

1. Spatio-temporal Principal Component Analysis (STPCA)

  • Data Transformation: Converts the data matrix into a multivariate functional data set through Fourier basis transformation: $\phi_0(t) = \frac{1}{\sqrt{T}}, \quad \phi_{2j-1}(t) = \sqrt{\frac{2}{T}}\sin\left(\frac{2\pi j t}{T}\right), \quad \phi_{2j}(t) = \sqrt{\frac{2}{T}}\cos\left(\frac{2\pi j t}{T}\right)$
  • Spatial Weight Matrix: Combines Moran's index with a spatial weight matrix $\mathbf{W}$ to obtain the spatial correlation matrix
  • Feature Extraction: Extracts eigenvalues that can be either positive or negative along with their corresponding spatio-temporal principal components
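As an illustration of the Fourier basis in the data transformation step, the sketch below evaluates the basis functions on a uniform grid, checks that they are orthonormal on $[0, T]$, and projects a toy seasonal signal onto them. The period `T = 365`, the number of sine/cosine pairs, and the signal are assumptions made for the example, not settings reported for the paper.

```python
import numpy as np

def fourier_basis(T, n_pairs, n_grid=2000):
    """Evaluate phi_0 and the first n_pairs sine/cosine pairs on [0, T)."""
    t = np.linspace(0.0, T, n_grid, endpoint=False)
    cols = [np.full_like(t, 1.0 / np.sqrt(T))]                         # phi_0
    for j in range(1, n_pairs + 1):
        cols.append(np.sqrt(2.0 / T) * np.sin(2 * np.pi * j * t / T))  # phi_{2j-1}
        cols.append(np.sqrt(2.0 / T) * np.cos(2 * np.pi * j * t / T))  # phi_{2j}
    return t, np.column_stack(cols)

T = 365.0                                   # assumed period (one year of days)
t, Phi = fourier_basis(T, n_pairs=3)
dt = T / len(t)

# Riemann-sum approximation of the L2 inner products between basis functions.
gram = Phi.T @ Phi * dt

# Basis coefficients of a toy annual cycle: constant level plus one cosine.
signal = 10.0 + 8.0 * np.cos(2 * np.pi * t / T)
coef = Phi.T @ signal * dt
```

Because the basis is orthonormal, `gram` is numerically the identity and only the coefficients of $\phi_0$ and $\phi_2$ are non-zero for this signal.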

2. CP Decomposition Optimization

Optimizes factor matrices using Alternating Least Squares (ALS):

  • Fixes two factor matrices while updating the third by solving a linear least-squares problem, cycling through the modes until convergence
  • Uses STPCA results for initialization rather than random initialization or HOSVD initialization
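The ALS loop can be sketched in a few lines of NumPy. This is a generic textbook ALS for a three-way CP model, not the MATLAB Tensor Toolbox routine used in the paper; the `init` argument is where STPCA, HOSVD, or random factors would be supplied, and all function names are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (m x R), (n x R) -> (m*n x R)."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def unfold(X, mode):
    """Mode-n unfolding with the remaining axes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, init, n_iter=50):
    """Plain ALS for a 3-way CP model, started from the factor list `init`."""
    A, B, C = (np.array(F, dtype=float) for F in init)
    for _ in range(n_iter):
        # Each update is the least-squares solution with the other two fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

rng = np.random.default_rng(0)
true = [rng.standard_normal((n, 2)) for n in (8, 6, 3)]   # toy rank-2 factors
X = np.einsum("ir,jr,kr->ijk", *true)
init = [rng.standard_normal((n, 2)) for n in X.shape]     # stand-in initializer

def rel_err(factors):
    """Relative reconstruction error of a factor triple against X."""
    X_hat = np.einsum("ir,jr,kr->ijk", *factors)
    return np.linalg.norm(X_hat - X) / np.linalg.norm(X)

err_before = rel_err(init)
err_after = rel_err(cp_als(X, init))
```

Since each block update minimizes the fit exactly, the error never increases from one sweep to the next; the starting point determines which local solution the iteration reaches, which is why initialization matters.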

3. K-means Clustering

Applies K-means clustering to the extracted factor matrices:

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{G},\mathbf{S},\mathbf{T}} \|\mathbf{X}_1 - \mathbf{T}\mathbf{A}(\mathbf{S} \odot \mathbf{B})^T\|_F^2 + \lambda\|\mathbf{A} - \mathbf{G}\mathbf{S}\|_F^2 + \eta(\|\mathbf{B}\|_F^2 + \|\mathbf{C}\|_F^2)$$
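The sketch below shows only the simplest reading of this step: standard Lloyd-style K-means applied to the rows of one factor matrix. It does not implement the joint objective written above, and the factor matrix here is synthetic; the function name and data are assumptions for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to lose all its points.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
# Hypothetical spatial factor matrix (J x R): two groups of grid cells
# with clearly different loadings on the R = 2 components.
B = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (30, 2))])
labels, centers = kmeans(B, k=2)
```

Each row of a factor matrix describes one time step, grid point, or variable in terms of the $R$ components, so clustering rows groups entities with similar component loadings.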

Technical Innovations

  1. Spatio-temporal Structure-Aware Initialization: Explicitly incorporates spatio-temporal correlations into the CP decomposition initialization process, which the paper presents as a first
  2. Multi-scale Feature Extraction: Simultaneously captures temporal and spatial patterns through Fourier transformation and spatial weight matrices
  3. Elimination of Additional Diagonalization Steps: Unlike TASD methods, avoids the SimDiag step, improving computational efficiency
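Moran's index with a spatial weight matrix, mentioned in the STPCA step and in item 2 above, can be illustrated on a regular grid. The sketch assumes binary rook-contiguity weights (cells sharing an edge are neighbours); this summary does not say which weight matrix the paper uses, so that choice and the two test fields are assumptions.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary adjacency for a regular grid: cells sharing an edge are neighbours."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:                       # neighbour below
                W[i, i + n_cols] = W[i + n_cols, i] = 1.0
            if c + 1 < n_cols:                       # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W

def morans_i(x, W):
    """Moran's I = (n / sum(W)) * (z' W z) / (z' z), with z the centered field."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

n_rows, n_cols = 31, 34                              # grid size quoted in this summary
W = rook_weights(n_rows, n_cols)
rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)

smooth = cols.astype(float)                          # west-east gradient
checker = ((rows + cols) % 2).astype(float)          # alternating values

I_smooth = morans_i(smooth, W)
I_checker = morans_i(checker, W)
```

A smooth field gives a strongly positive index and an alternating field a negative one, which is consistent with the positive and negative eigenvalues noted in the feature extraction step.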

Experimental Setup

Dataset

  • Data Source: NA-CORDEX dataset from NCAR Climate Data Gateway
  • Temporal Range: January 1, 1979 to December 31, 2014 (13,149 days)
  • Spatial Range: New England region of the United States (Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut)
  • Spatial Resolution: 0.22° (approximately 25 kilometers), 31×34 grid cells (total of 1,054 grid points)
  • Variables: Total precipitation (prec), maximum temperature (tmax), minimum temperature (tmin)
  • Tensor Dimensions: $\mathcal{X} \in \mathbb{R}^{13149 \times 1054 \times 3}$
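A quick arithmetic check of these sizes shows how much a low-rank CP model compresses the data. The 8-byte floating-point storage is an assumption; the dimensions and ranks are the ones quoted in this summary.

```python
I, J, K = 13149, 1054, 3          # days, grid points, variables (from the text)
assert 31 * 34 == J               # grid cells per time step

n_entries = I * J * K             # values in the dense tensor
dense_mb = n_entries * 8 / 1e6    # size in MB, assuming 8-byte floats

# A rank-R CP model stores three factor matrices: I x R, J x R, and K x R.
params = {R: R * (I + J + K) for R in (2, 3)}
for R, n_params in params.items():
    ratio = n_entries / n_params
    print(f"R={R}: {n_params} parameters, about {ratio:.0f}x fewer than the dense tensor")
```

The dense tensor holds roughly 41.6 million values, while the rank-2 and rank-3 models use 28,412 and 42,618 parameters respectively.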

Evaluation Metrics

  1. Reconstruction Relative Error: $\frac{\|\mathcal{X}_{\text{estimate}} - \mathcal{X}\|_2}{\|\mathcal{X}\|_2}$
  2. Silhouette Coefficient: $\frac{b-a}{\max(a,b)}$, where $a$ is the mean intra-cluster distance and $b$ is the mean distance to the nearest neighboring cluster
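Both metrics are straightforward to compute. The sketch below is a plain NumPy version under two assumptions: the tensor norm is taken over all entries (the Frobenius norm), and the silhouette uses Euclidean distance averaged over all points. The function names and toy inputs are hypothetical.

```python
import numpy as np

def relative_error(X_est, X):
    """Norm of the residual divided by the norm of the data, over all entries."""
    return np.linalg.norm(X_est - X) / np.linalg.norm(X)

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points, Euclidean distance."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                                  # singleton cluster scores 0
        a = D[i, same].sum() / (same.sum() - 1)       # mean distance within own cluster
        b = min(D[i, labels == c].mean()              # mean distance to nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

X = np.full((2, 2, 2), 2.0)
err = relative_error(1.1 * X, X)                      # a uniform 10% overshoot

pts = np.array([[0.0], [1.0], [10.0], [11.0]])
lab = np.array([0, 0, 1, 1])
sil = silhouette_score(pts, lab)
```

Values of the silhouette close to 1 indicate compact, well-separated clusters, which is how the clustering tables in the results are read.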

Comparison Methods

  1. HOSVD+CPD: CP decomposition initialized with Higher-Order Singular Value Decomposition
  2. Random+CPD: CP decomposition with random initialization
  3. STPCA+CPD: The proposed method
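The two baseline initializations can be sketched as follows. The HOSVD variant here takes the leading left singular vectors of each mode unfolding, which is one common way to build such a starting point; the exact routine and settings used in the paper are not given in this summary, so treat both functions as illustrative.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: the chosen axis becomes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def random_init(shape, rank, seed=0):
    """Gaussian random factor matrices, one per mode."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n, rank)) for n in shape]

def hosvd_init(X, rank):
    """Leading `rank` left singular vectors of each mode unfolding."""
    factors = []
    for mode in range(X.ndim):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])
    return factors

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 3))     # toy tensor, not the climate data
R = 2
init_random = random_init(X.shape, R)
init_hosvd = hosvd_init(X, R)
```

Neither baseline uses the ordering of time steps or the adjacency of grid cells, which is the gap the STPCA initialization is meant to fill. Note that this HOSVD variant requires the rank not to exceed the smallest mode size, here the three variables.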

Implementation Details

  • CP decomposition rank: R = 2, 3
  • Clustering analysis k-value range: 2-12
  • Comparative experiments conducted using MATLAB Tensor Toolbox

Experimental Results

Main Results

Reconstruction Error Comparison

| Initialization Method | Relative Error (Rank=2) | Relative Error (Rank=3) |
|---|---|---|
| HOSVD | 0.4928 | 0.3832 |
| Random | 0.4930 | 0.3849 |
| STPCA | 0.4910 | 0.3810 |

The STPCA method achieves the lowest relative reconstruction error at both ranks, although the differences between the three methods appear only in the third decimal place.
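To put the reported errors in perspective, the relative reduction achieved by STPCA over each baseline can be computed directly from the table values. This is simple arithmetic on the numbers above, not an additional result from the paper.

```python
errors = {
    "HOSVD":  {2: 0.4928, 3: 0.3832},
    "Random": {2: 0.4930, 3: 0.3849},
    "STPCA":  {2: 0.4910, 3: 0.3810},
}

gains = {}
for rank in (2, 3):
    for method in ("HOSVD", "Random"):
        base = errors[method][rank]
        # Fraction by which STPCA lowers the baseline's relative error.
        gains[(method, rank)] = (base - errors["STPCA"][rank]) / base
        print(f"rank {rank}, vs {method}: {100 * gains[(method, rank)]:.2f}% lower error")
```

The reductions range from roughly 0.4% to 1.0% of the baseline error, so the reconstruction advantage is consistent but modest; the larger differences appear in the clustering results.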

Clustering Performance Comparison

Silhouette Coefficients at Rank=2:

| Initialization Method | Mode 1 Silhouette | Optimal k | Mode 2 Silhouette | Optimal k |
|---|---|---|---|---|
| HOSVD | 0.6484 | 2 | 0.5872 | 2 |
| Random | 0.658 | 2 | 0.6 | 2 |
| STPCA | 0.7990 | 2 | 0.6184 | 4 |

Silhouette Coefficients at Rank=3:

| Initialization Method | Mode 1 Silhouette | Optimal k | Mode 2 Silhouette | Optimal k |
|---|---|---|---|---|
| HOSVD | 0.4932 | 3 | 0.6528 | 2 |
| Random | 0.513 | 3 | 0.648 | 2 |
| STPCA | 0.6456 | 2 | 0.6721 | 2 |

Experimental Findings

  1. Spatio-temporal Correlation Analysis:
    • Precipitation exhibits weak spatial and temporal correlations
    • Maximum and minimum temperatures demonstrate strong spatio-temporal correlations, particularly pronounced in spring and autumn seasons
    • Temperature variables exhibit highly similar autocorrelation function shapes
  2. Performance Improvement: STPCA initialization outperforms traditional methods across all tested configurations
  3. Computational Efficiency: The STPCA method avoids additional diagonalization steps, resulting in faster computation

Related Work

  1. Tensor Decomposition Methods: CP decomposition was first introduced by Hitchcock (1927) and later developed by Carroll and Chang (1970) and Harshman (1970)
  2. Spatial PCA: Principal component analysis methods that account for spatial autocorrelation
  3. Climate Data Analysis: Applications of Empirical Orthogonal Function (EOF) analysis in climate science
  4. Deep Learning Methods: Applications of convolutional neural networks and graph neural networks in climate modeling

Conclusions and Discussion

Main Conclusions

  1. The proposed STPCA+CPD method outperforms traditional initialization methods in both reconstruction accuracy and clustering performance
  2. Explicitly leveraging spatio-temporal dependencies significantly improves CP decomposition performance
  3. This framework provides a scalable solution for analyzing multivariate climate datasets

Limitations

  1. Validation has been conducted only on climate data from the New England region; generalization capability requires further verification
  2. Only decompositions with 2 and 3 components have been considered; higher-rank cases require further investigation
  3. The choice of spatial weight matrix may influence results; more in-depth sensitivity analysis is needed

Future Directions

  1. Integrate deep learning architectures to capture complex spatio-temporal dynamics
  2. Investigate more robust spatio-temporal tensor decomposition schemes
  3. Extend the tensor framework to prediction and downscaling applications

In-Depth Evaluation

Strengths

  1. Methodological Innovation: Explicitly incorporates spatio-temporal correlations into CP decomposition initialization, with clear theoretical motivation
  2. Experimental Coverage: Conducts comparative experiments and clustering analysis on real climate data
  3. Convincing Results: Achieves consistent performance improvements across both evaluation metrics
  4. Practical Value: Provides new tools and perspectives for climate data analysis

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical analysis of convergence and statistical guarantees
  2. Limited Experimental Scale: Validation conducted only in a single region and with limited decomposition ranks
  3. Parameter Sensitivity: Insufficient discussion of the impact of spatial weight matrix and Fourier basis number selection
  4. Computational Complexity: Lacks detailed computational complexity analysis

Impact

  1. Academic Contribution: Provides a new initialization strategy for tensor decomposition of spatio-temporal data
  2. Application Value: Possesses potential applications in climate science, environmental monitoring, and related fields
  3. Reproducibility: Provides detailed experimental settings, though code has not been publicly released

Applicable Scenarios

  1. Large-scale spatio-temporal climate data analysis
  2. Pattern recognition in environmental monitoring data
  3. Multivariate data dimensionality reduction requiring consideration of spatio-temporal correlations
  4. Regional analysis in climate change research

References

  • Hitchcock, F.L. (1927). The expression of a tensor or a polyadic as a sum of products.
  • Carroll, J.D., Chang, J. (1970). Analysis of individual differences in multidimensional scaling.
  • Harshman, R. (1970). Foundations of the PARAFAC procedure.
  • Krzyśko, M., et al. (2024). Spatio-temporal principal component analysis.