
A Spatio-temporal CP decomposition analysis of New England region in the US

Sanogo

Spatio-temporal data consist of measurements of one or more raster fields such as weather, traffic volume, crime rate, or disease incidence. Advances in modern technology have increased the amount of information available for this type of data, hence the rise of multidimensional data. In this paper we take advantage of the multidimensional structure of the data as well as its temporal and spatial structure. We use the NCAR Climate Data Gateway website, which provides data discovery and access services for global and regional climate model data. The daily values of total precipitation (prec), maximum temperature (tmax), and minimum temperature (tmin) are combined to create a multidimensional array called a tensor. We propose a spatio-temporal principal component analysis to initialize the CP decomposition components, taking full advantage of the spatial and temporal structure of the data in the initialization step. The performance of our method is tested via comparison with the most popular initialization methods. We also run a clustering analysis to further show the performance of our analysis.

Basic Information

Paper ID: 2510.10322
Title: A Spatio-temporal CP decomposition analysis of New England region in the US
Author: Fatoumata Sanogo (Bates College Mathematics Department)
Classification: stat.AP cs.NA math.NA
Publication Date: October 11, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.10322

Abstract

Spatio-temporal data comprise measurements of one or more gridded fields, such as weather, traffic flow, crime rates, or disease incidence. Advances in modern technology have increased the volume of available information in such datasets, resulting in multidimensional data. This paper leverages the multidimensional structure of data along with its temporal and spatial characteristics. Using global and regional climate model data provided by the NCAR Climate Data Gateway website, the authors construct a multidimensional data tensor by combining daily values of total precipitation (prec), maximum temperature (tmax), and minimum temperature (tmin). The paper proposes spatio-temporal principal component analysis to initialize CP decomposition components, fully exploiting the spatial and temporal structure of the data during the initialization step of CP component analysis.

Research Background and Motivation

Problem to be Addressed: When applied to spatio-temporal climate data, traditional tensor decomposition methods (such as CP decomposition) lack initialization strategies tailored to spatio-temporal correlations, which results in poor factor identifiability and low reconstruction accuracy.
Problem Significance:
- Global climate change leads to frequent extreme weather events, necessitating more reliable prediction and diagnostic tools
- Numerical Earth system models face challenges of lengthy computation times and exponential growth in data dimensionality
- Statistical and machine learning methods are needed to complement physics-based models
Limitations of Existing Methods:
- Although PCA can extract dominant variance modes, it processes variables independently and imposes orthogonality constraints, lacking physical interpretability
- Random initialization and HOSVD initialization do not account for the inherent structure of spatio-temporal data
- Existing tensor decomposition methods have limited applications in climate research
Research Motivation: Develop CP decomposition initialization strategies that specifically exploit the spatio-temporal correlations in climate data to improve factor identifiability and reconstruction accuracy.

Core Contributions

Proposed a novel initialization procedure: Enhances the reconstruction quality and interpretability of CP decomposition by leveraging spatio-temporal correlations
Conducted an empirical evaluation on NCAR precipitation and temperature datasets: Provides benchmark comparisons with common initialization methods
Performed clustering analysis: Demonstrates the interpretive value and model performance of CP-derived factors
Provided a theoretical framework for spatio-temporal tensor decomposition: Offers a scalable analytical framework for climate data analysis

Methodology Details

Task Definition

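Given a three-dimensional tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, where $I$ is the temporal dimension, $J$ is the spatial dimension, and $K$ is the variable dimension, the objective is to find the best rank-$R$ CP approximation: $\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r = [[\mathbf{A}, \mathbf{B}, \mathbf{C}]]$, where $\circ$ denotes the vector outer product and $\mathbf{a}_r$, $\mathbf{b}_r$, $\mathbf{c}_r$ are the $r$-th columns of the factor matrices $\mathbf{A} \in \mathbb{R}^{I \times R}$, $\mathbf{B} \in \mathbb{R}^{J \times R}$, and $\mathbf{C} \in \mathbb{R}^{K \times R}$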
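The bracket notation $[[\mathbf{A}, \mathbf{B}, \mathbf{C}]]$ can be illustrated with a minimal NumPy sketch. The helper name `cp_to_tensor` and the tiny sizes are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def cp_to_tensor(A, B, C):
    """Build [[A, B, C]]: the sum over r of the outer products a_r o b_r o c_r."""
    # A: (I, R), B: (J, R), C: (K, R); the result has shape (I, J, K).
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Tiny hypothetical sizes (time x space x variable) and rank.
rng = np.random.default_rng(0)
I, J, K, R = 5, 4, 3, 2
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

X = cp_to_tensor(A, B, C)

# The same tensor assembled term by term from R rank-one outer products.
X_loop = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)
```

Both constructions give the same array, which is the sense in which a rank-$R$ CP model is a sum of $R$ rank-one tensors.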

Model Architecture

1. Spatio-temporal Principal Component Analysis (STPCA)

Data Transformation: Converts the data matrix into a multivariate functional data set through Fourier basis transformation: $\phi_0(t) = \frac{1}{\sqrt{T}}, \quad \phi_{2j-1}(t) = \sqrt{\frac{2}{T}}\sin\left(\frac{2\pi j t}{T}\right), \quad \phi_{2j}(t) = \sqrt{\frac{2}{T}}\cos\left(\frac{2\pi j t}{T}\right)$
Spatial Weight Matrix: Employs Moran's index combined with spatial weight matrix $\mathbf{W}$ to obtain the spatial correlation matrix
Feature Extraction: Extracts the eigenvalues, which can be positive or negative, together with their corresponding spatio-temporal principal components
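The Fourier basis in the data transformation step can be checked numerically. The sketch below evaluates $\phi_0, \phi_1, \ldots$ on a grid over $[0, T]$, confirms that they are orthonormal, and projects a toy seasonal signal onto them. The function name `fourier_basis`, the period `T = 365`, and the toy signal are assumptions for illustration only.

```python
import numpy as np

def fourier_basis(t, T, n_pairs):
    """Columns are phi_0, phi_1 (sin), phi_2 (cos), ..., phi_{2 * n_pairs}."""
    t = np.asarray(t, dtype=float)
    cols = [np.full_like(t, 1.0 / np.sqrt(T))]
    for j in range(1, n_pairs + 1):
        cols.append(np.sqrt(2.0 / T) * np.sin(2 * np.pi * j * t / T))
        cols.append(np.sqrt(2.0 / T) * np.cos(2 * np.pi * j * t / T))
    return np.column_stack(cols)

T = 365.0                          # assumed period, e.g. one year of daily data
n = 2000                           # number of quadrature points
t = (np.arange(n) + 0.5) * T / n   # midpoints of n equal sub-intervals of [0, T]
dt = T / n

Phi = fourier_basis(t, T, n_pairs=3)

# Midpoint-rule approximation of the inner products <phi_a, phi_b> on [0, T].
gram = Phi.T @ Phi * dt

# Basis coefficients of a toy signal: a mean level plus one annual harmonic.
y = 10.0 + 8.0 * np.sin(2 * np.pi * t / T)
coef = Phi.T @ y * dt
```

The Gram matrix is numerically the identity, and only the coefficients for $\phi_0$ and $\phi_1$ are non-zero for this signal, so a few coefficients per location can summarize a smooth seasonal curve.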

2. CP Decomposition Optimization

Optimizes factor matrices using Alternating Least Squares (ALS):

Fixes two factor matrices while updating the remaining factor matrix by solving a linear least-squares problem, then cycles through the three modes until convergence
Uses STPCA results for initialization rather than random initialization or HOSVD initialization
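A minimal NumPy sketch of ALS for a three-way CP model with a user-supplied initialization is given here. It is not the MATLAB Tensor Toolbox routine used in the paper; the function name `cp_als`, the stopping rule, and the toy tensor are assumptions. The `init` argument is the slot where STPCA, HOSVD, or random factors would be passed.

```python
import numpy as np

def cp_als(X, init, n_iter=200, tol=1e-10):
    """Alternating least squares for a 3-way CP model, started from `init`."""
    A, B, C = (M.copy() for M in init)
    norm_X = np.linalg.norm(X)
    history = []
    for _ in range(n_iter):
        # Each update is the exact least-squares solution for one factor
        # matrix with the other two held fixed.
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        history.append(np.linalg.norm(X - X_hat) / norm_X)
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return (A, B, C), history

# Toy example: an exactly rank-2 tensor and a random starting point.
rng = np.random.default_rng(0)
shape, R = (6, 5, 4), 2
true_factors = [rng.standard_normal((n, R)) for n in shape]
X = np.einsum("ir,jr,kr->ijk", *true_factors)
init = [rng.standard_normal((n, R)) for n in shape]

factors, history = cp_als(X, init)
```

Because every update solves its subproblem exactly, the relative error in `history` never increases from one sweep to the next. The final error can still depend on the starting point, which is why the choice of initialization matters.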

3. K-means Clustering

Applies K-means clustering to the extracted factor matrices: $\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{G},\mathbf{S},\mathbf{T}} \|\mathbf{X}_1 - \mathbf{T}\mathbf{A}(\mathbf{S} \odot \mathbf{B})^T\|_F^2 + \lambda\|\mathbf{A} - \mathbf{G}\mathbf{S}\|_F^2 + \eta(\|\mathbf{B}\|_F^2 + \|\mathbf{C}\|_F^2)$
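As a simpler point of reference than the joint objective above, the sketch below runs plain K-means (Lloyd's algorithm) on the rows of a factor matrix. This is a swapped-in simplification: it does not include the regularization terms or the coupling between factorization and clustering. The function name `kmeans` and the toy factor matrix are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the rows of `points`."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = points[labels == c]
            if len(members):          # keep the old center if a cluster empties
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical rank-2 factor matrix whose rows fall into two separated groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
group_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
factor_rows = np.vstack([group_a, group_b])

labels, centers = kmeans(factor_rows, k=2)
```

In the paper's setting the rows would be the rows of a CP factor matrix, for example one row per grid cell for the spatial mode.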

Technical Innovations

Spatio-temporal Structure-Aware Initialization: First explicitly incorporates spatio-temporal correlations into the CP decomposition initialization process
Multi-scale Feature Extraction: Simultaneously captures temporal and spatial patterns through Fourier transformation and spatial weight matrices
Elimination of Additional Diagonalization Steps: Avoids the SimDiag step compared to TASD methods, improving computational efficiency
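The role of the spatial weight matrix $\mathbf{W}$ can be made concrete with the scalar global Moran's I statistic, $I = \frac{n}{S_0} \cdot \frac{\mathbf{z}^T \mathbf{W} \mathbf{z}}{\mathbf{z}^T \mathbf{z}}$, where $\mathbf{z}$ holds the centered values and $S_0$ is the sum of all weights. The binary chain adjacency and the toy values below are assumptions for illustration; the paper's actual weight matrix and its construction of a spatial correlation matrix are not reproduced here.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I of values x under spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()            # centered values
    s0 = W.sum()                # sum of all spatial weights
    return (len(x) / s0) * (z @ W @ z) / (z @ z)

# Six sites on a line; neighbors are adjacent sites (binary weights).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

smooth = np.arange(1, n + 1)                   # values vary gradually in space
checker = np.array([1, -1, 1, -1, 1, -1])      # values alternate between sites

i_smooth = morans_i(smooth, W)     # positive: neighbors are alike
i_checker = morans_i(checker, W)   # negative: neighbors are unlike
```

For these toy inputs the smooth field gives $I = 0.6$ and the alternating field gives $I = -1$, matching the intuition that positive values signal spatial clustering and negative values signal spatial dispersion.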

Experimental Setup

Dataset

Data Source: NA-CORDEX dataset from NCAR Climate Data Gateway
Temporal Range: January 1, 1979 to December 31, 2014 (13,149 days)
Spatial Range: New England region of the United States (Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut)
Spatial Resolution: 0.22° (approximately 25 kilometers), 31×34 grid cells (total of 1,054 grid points)
Variables: Total precipitation (prec), maximum temperature (tmax), minimum temperature (tmin)
Tensor Dimensions: $\mathcal{X} \in \mathbb{R}^{13149 \times 1054 \times 3}$
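The tensor dimensions can be sanity-checked with a few lines of standard-library Python. The sketch assumes one record per calendar day starting on January 1, 1979, and 8-byte floating-point storage; both are assumptions for illustration.

```python
from datetime import date, timedelta

start = date(1979, 1, 1)
n_days = 13149

# Last day of an inclusive daily series with n_days entries.
end = start + timedelta(days=n_days - 1)

n_cells = 31 * 34                  # grid cells in the study region
shape = (n_days, n_cells, 3)       # time x space x variable

n_entries = shape[0] * shape[1] * shape[2]
size_mb = n_entries * 8 / 1e6      # assuming float64 storage

print(end, n_cells, n_entries, round(size_mb, 1))
```

A daily series of 13,149 days that starts on January 1, 1979 ends on December 31, 2014, and the 31×34 grid gives the 1,054 spatial points of the second mode. At roughly 333 MB in double precision, the full tensor fits in memory on an ordinary workstation.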

Evaluation Metrics

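Reconstruction Relative Error: $\frac{\|\mathcal{X}_{\text{estimate}} - \mathcal{X}\|_2}{\|\mathcal{X}\|_2}$, where $\mathcal{X}_{\text{estimate}}$ is the tensor rebuilt from the estimated CP factors
Silhouette Coefficient: $\frac{b-a}{\max(a,b)}$, where $a$ is the mean distance from a point to the other points in its own cluster and $b$ is the mean distance from that point to the points of the nearest other cluster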
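Both metrics can be sketched in NumPy as follows. The norm is taken over all tensor entries (the Frobenius norm), which is an assumption about how $\|\cdot\|_2$ is meant here, and the silhouette is averaged over points with the usual convention that a singleton cluster scores 0. Function names and toy inputs are illustrative.

```python
import numpy as np

def relative_error(X_est, X):
    """Norm of the residual over the norm of the data, over all entries."""
    return np.linalg.norm((X_est - X).ravel()) / np.linalg.norm(X.ravel())

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                               # singleton cluster: score 0
        a = D[i, own].sum() / (own.sum() - 1)      # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy check of the relative error: every entry is off by 10 percent.
X = np.ones((2, 2, 2))
err = relative_error(0.9 * X, X)

# Toy check of the silhouette: two tight, well-separated groups on a line.
points = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette_score(points, labels)
```

For the four-point example, the end points have $a = 1$, $b = 10.5$ and the inner points have $a = 1$, $b = 9.5$, so the mean silhouette is close to 0.9, indicating well-separated clusters.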

Comparison Methods

HOSVD+CPD: CP decomposition initialized with Higher-Order Singular Value Decomposition
Random+CPD: CP decomposition with random initialization
STPCA+CPD: The proposed method
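The two baseline initializations can be sketched as follows. This is a minimal NumPy illustration, not the MATLAB Tensor Toolbox code used in the paper; the helper names `unfold`, `hosvd_init`, and `random_init` are assumptions. The HOSVD variant requires the rank to be no larger than the smallest mode size, which holds for ranks 2 and 3 with three variables.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X so that the chosen mode indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd_init(X, R):
    """Leading R left singular vectors of each mode unfolding."""
    factors = []
    for mode in range(X.ndim):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :R])
    return factors

def random_init(shape, R, seed=0):
    """Independent standard normal entries for every factor matrix."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n, R)) for n in shape]

# Small hypothetical tensor (time x space x variable).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
R = 2

hosvd_factors = hosvd_init(X, R)
random_factors = random_init(X.shape, R)
```

Either list of factor matrices can serve as the starting point of an ALS run. Neither uses the ordering of the time axis or the neighborhood structure of the grid, which is the gap the STPCA initialization is meant to fill.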

Implementation Details

CP decomposition rank: R = 2, 3
Clustering analysis k-value range: 2-12
Comparative experiments conducted using MATLAB Tensor Toolbox

Experimental Results

Main Results

Reconstruction Error Comparison

Initialization Method	Relative Error (Rank=2)	Relative Error (Rank=3)
HOSVD	0.4928	0.3832
Random	0.4930	0.3849
STPCA	0.4910	0.3810

The STPCA method achieves the lowest reconstruction relative error under both rank settings.
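To put the size of these differences in perspective, the short sketch below computes the absolute and relative gap between STPCA and the better of the two baselines, using only the values in the reconstruction-error table above. The variable names are illustrative.

```python
# Relative reconstruction errors copied from the table above.
errors = {
    2: {"HOSVD": 0.4928, "Random": 0.4930, "STPCA": 0.4910},
    3: {"HOSVD": 0.3832, "Random": 0.3849, "STPCA": 0.3810},
}

gains = {}
for rank, row in errors.items():
    best_baseline = min(row["HOSVD"], row["Random"])
    abs_gain = best_baseline - row["STPCA"]          # absolute reduction in error
    rel_gain = 100.0 * abs_gain / best_baseline      # reduction in percent
    gains[rank] = (abs_gain, rel_gain)
    print(f"rank {rank}: absolute gain {abs_gain:.4f}, relative gain {rel_gain:.2f}%")
```

The absolute gaps are 0.0018 at rank 2 and 0.0022 at rank 3, which is less than 1% of the baseline error in both cases, so the reconstruction advantage is consistent but small.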

Clustering Performance Comparison

Silhouette Coefficients at Rank=2:

Initialization Method	Mode 1 Silhouette	Mode 1 Optimal k	Mode 2 Silhouette	Mode 2 Optimal k
HOSVD	0.6484	2	0.5872	2
Random	0.658	2	0.6	2
STPCA	0.7990	2	0.6184	4

Silhouette Coefficients at Rank=3:

Initialization Method	Mode 1 Silhouette	Mode 1 Optimal k	Mode 2 Silhouette	Mode 2 Optimal k
HOSVD	0.4932	3	0.6528	2
Random	0.513	3	0.648	2
STPCA	0.6456	2	0.6721	2

Experimental Findings

Spatio-temporal Correlation Analysis:
- Precipitation exhibits weak spatial and temporal correlations
- Maximum and minimum temperatures demonstrate strong spatio-temporal correlations, particularly pronounced in spring and autumn seasons
- Temperature variables exhibit highly similar autocorrelation function shapes
Performance Improvement: STPCA initialization outperforms traditional methods across all tested configurations
Computational Efficiency: The STPCA method avoids additional diagonalization steps, which is expected to reduce computation

Related Work

Tensor Decomposition Methods: CP decomposition was first introduced by Hitchcock (1927) and later developed by Carroll and Chang (1970) and Harshman (1970)
Spatial PCA: Principal component analysis methods that account for spatial autocorrelation
Climate Data Analysis: Applications of Empirical Orthogonal Function (EOF) analysis in climate science
Deep Learning Methods: Applications of convolutional neural networks and graph neural networks in climate modeling

Conclusions and Discussion

Main Conclusions

The proposed STPCA+CPD method outperforms traditional initialization methods in both reconstruction accuracy and clustering performance
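Explicitly leveraging spatio-temporal dependencies improves CP decomposition in the reported experiments: reconstruction error drops only slightly (by about 0.002 at both ranks), while Mode 1 silhouette coefficients rise by more than 0.13 and Mode 2 silhouette coefficients by about 0.02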
This framework provides a scalable solution for analyzing multivariate climate datasets

Limitations

Validation has been conducted only on climate data from the New England region; generalization capability requires further verification
Only decompositions with 2 and 3 components have been considered; higher-rank cases require further investigation
The choice of spatial weight matrix may influence results; more in-depth sensitivity analysis is needed

Future Directions

Integrate deep learning architectures to capture complex spatio-temporal dynamics
Investigate more robust spatio-temporal tensor decomposition schemes
Extend the tensor framework to prediction and downscaling applications

In-Depth Evaluation

Strengths

Methodological Innovation: First explicitly incorporates spatio-temporal correlations into CP decomposition initialization with clear theoretical motivation
Experimental Comprehensiveness: Conducts comprehensive comparative experiments and clustering analysis on real climate data
Result Convincingness: Achieves consistent performance improvements across multiple evaluation metrics
Practical Value: Provides new tools and perspectives for climate data analysis

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical analysis of convergence and statistical guarantees
Limited Experimental Scale: Validation conducted only in a single region and with limited decomposition ranks
Parameter Sensitivity: Insufficient discussion of the impact of spatial weight matrix and Fourier basis number selection
Computational Complexity: Lacks detailed computational complexity analysis

Impact

Academic Contribution: Provides a new initialization strategy for tensor decomposition of spatio-temporal data
Application Value: Possesses potential applications in climate science, environmental monitoring, and related fields
Reproducibility: Provides detailed experimental settings, though code has not been publicly released

Applicable Scenarios

Large-scale spatio-temporal climate data analysis
Pattern recognition in environmental monitoring data
Multivariate data dimensionality reduction requiring consideration of spatio-temporal correlations
Regional analysis in climate change research

References

Hitchcock, F.L. (1927). The expression of a tensor or a polyadic as a sum of products.
Carroll, J.D., Chang, J. (1970). Analysis of individual differences in multidimensional scaling.
Harshman, R. (1970). Foundations of the PARAFAC procedure.
Krzyśko, M., et al. (2024). Spatio-temporal principal component analysis.
