STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging

Bhowmick, Ramanathan, Aakur

Time series data often contain latent temporal structure (transitions between locally stationary regimes, repeated motifs, and bursts of variability) that is rarely leveraged in standard representation learning pipelines. Existing models typically operate on raw or fixed-window sequences, treating all time steps as equally informative, which leads to inefficiencies, poor robustness, and limited scalability in long or noisy sequences. We propose STaTS, a lightweight, unsupervised framework for Structure-Aware Temporal Summarization that adaptively compresses both univariate and multivariate time series into compact, information-preserving token sequences. STaTS detects change points across multiple temporal resolutions using a BIC-based statistical divergence criterion, then summarizes each segment using simple functions like the mean or generative models such as GMMs. This process achieves up to 30x sequence compression while retaining core temporal dynamics. STaTS operates as a model-agnostic preprocessor and can be integrated with existing unsupervised time series encoders without retraining. Extensive experiments on 150+ datasets, including classification tasks on the UCR-85, UCR-128, and UEA-30 archives, and forecasting on ETTh1, ETTh2, ETTm1, and Electricity, demonstrate that STaTS enables 85-90% of the full-model performance while offering dramatic reductions in computational cost. Moreover, STaTS improves robustness under noise and preserves discriminative structure, outperforming uniform and clustering-based compression baselines. These results position STaTS as a principled, general-purpose solution for efficient, structure-aware time series modeling.

Basic Information

Paper ID: 2510.09593
Title: STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging
Authors: Disharee Bhowmick, Ranjith Ramanathan, Sathyanarayanan N. Aakur
Classification: cs.LG (Machine Learning), cs.CV (Computer Vision)
Publication Date: October 2025
Paper Link: https://arxiv.org/abs/2510.09593

Abstract

Temporal sequence data typically contains latent temporal structures, such as transitions between local stationary states, recurring patterns, and variability bursts, yet these structures are rarely exploited in standard representation learning pipelines. Existing models typically process raw or fixed-window sequences, treating all time steps as equally important, which leads to inefficiency, poor robustness, and limited scalability in long or noisy sequences. This paper proposes STaTS, a lightweight unsupervised framework for structure-aware temporal sequence summarization, capable of adaptively compressing univariate and multivariate time series into compact, information-preserving token sequences.

Research Background and Motivation

Problem Definition

Temporal sequence data is ubiquitous in finance, Internet of Things, healthcare, and other domains. With advances in sensing technology, the length and complexity of recorded time series have grown rapidly, imposing enormous computational demands on machine learning-based sequence understanding frameworks.

Limitations of Existing Methods

Traditional Methods: Approaches such as PAA (Piecewise Aggregate Approximation), SAX (Symbolic Aggregate approXimation), and DTW (Dynamic Time Warping) achieve effective summarization but rely on uniform windowing or rigid symbolic encoding, ignoring dynamic variations in signal complexity.
Deep Learning Methods: Methods like TS2Vec and TS-TCC process complete sequences or apply sliding windows without considering semantic changes, resulting in redundancy, computational overhead, and misalignment between model tokenization and true signal transitions.

Research Motivation

Existing methods suffer from the following issues:

Fixed window strategies may over-segment stable regions while under-segmenting complex regions
Under noisy conditions, uniformly processed inputs tend to amplify spurious patterns and reduce generalization ability
Lack of structure awareness leads to inefficiency and error propagation

Core Contributions

Proposes STaTS Framework: A structure-aware tokenization framework based on BIC-driven change detection criteria that identifies statistically coherent segments across multiple temporal scales
Modular Lightweight Summarization Pipeline: Compresses time series by over 30× while preserving significant patterns, enabling efficient downstream modeling
Model-Agnostic Unsupervised Method: Requires no architectural changes or gradient-based tuning, directly compatible with existing time series encoders (e.g., TS2Vec)
Unified Interface: Applicable to classification, forecasting, and robustness tasks, serving as a general-purpose time series summarization preprocessing tool
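Because summarization runs before the encoder, integration reduces to a data-preparation step. The sketch below assumes some summarization step has already produced one token array per series; since those arrays generally differ in length, they are padded into a single batch. The name `pad_batch` and the choice of NaN as padding value are illustrative assumptions, not details taken from the paper, and the downstream encoder would need to mask or ignore the padded positions.

```python
import numpy as np

def pad_batch(token_seqs, pad_value=np.nan):
    """Stack variable-length (T'_i, d) token sequences into one (N, T'_max, d) array.

    Hypothetical helper: NaN padding is an assumption made for this sketch.
    """
    seqs = [np.asarray(s, dtype=float) for s in token_seqs]
    t_max = max(len(s) for s in seqs)
    d = seqs[0].shape[1]
    batch = np.full((len(seqs), t_max, d), pad_value)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = s  # real tokens first, padding afterwards
    return batch

# Two summarized series with 3 and 5 tokens, d = 2 channels each.
batch = pad_batch([np.ones((3, 2)), np.zeros((5, 2))])
```

The resulting array can be handed to any encoder that accepts `(N, T, d)` input, which is what "model-agnostic" amounts to in practice.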

Methodology Details

Task Definition

Given a multivariate time series $X \in \mathbb{R}^{T \times d}$ (where $T$ is the number of time steps and $d$ is the dimensionality), the objective is to transform $X$ into a shorter sequence $\tilde{X} \in \mathbb{R}^{T' \times d}$ where $T' \ll T$, while preserving the underlying structure required for downstream tasks.

Model Architecture

1. Tokenization Phase

Multi-Scale Coherence Detection:

Uses BIC (Bayesian Information Criterion) to assess statistical similarity between adjacent time windows
For adjacent windows $x_1, x_2 \in \mathbb{R}^{\delta \times d}$, compute:

$\Delta BIC = -2(\ell_{joint} - \ell_{sep}) + k \log(2\delta)$

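where $\Sigma_1$, $\Sigma_2$, and $\Sigma_{12}$ are the sample covariances of $x_1$, $x_2$, and their concatenation, and (dropping additive constants, which cancel in the difference):

$\ell_{sep} = -\frac{\delta}{2}(\log|\Sigma_1| + \log|\Sigma_2|)$
$\ell_{joint} = -\delta \log|\Sigma_{12}|$
$k = d + \frac{d(d+1)}{2}$ (number of free parameters of a full-covariance Gaussian: $d$ for the mean and $\frac{d(d+1)}{2}$ for the covariance)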
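The criterion above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: covariances are maximum-likelihood estimates (`bias=True`), a small ridge `eps` is added for stability, and the function names `log_det_cov` and `delta_bic` are made up for this example.

```python
import numpy as np

def log_det_cov(window, eps=1e-6):
    """Log-determinant of the regularized ML covariance of an (n, d) window."""
    d = window.shape[1]
    cov = np.atleast_2d(np.cov(window, rowvar=False, bias=True))
    return np.linalg.slogdet(cov + eps * np.eye(d))[1]

def delta_bic(x1, x2, eps=1e-6):
    """Delta-BIC between two adjacent windows of equal length delta."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.ndim == 1:  # treat univariate input as d = 1
        x1, x2 = x1[:, None], x2[:, None]
    delta, d = x1.shape
    ell_sep = -0.5 * delta * (log_det_cov(x1, eps) + log_det_cov(x2, eps))
    ell_joint = -delta * log_det_cov(np.vstack([x1, x2]), eps)
    k = d + d * (d + 1) / 2  # mean plus full-covariance parameters
    return -2.0 * (ell_joint - ell_sep) + k * np.log(2 * delta)
```

With this formula the first term grows when a single Gaussian fits the merged window poorly, so larger values suggest a boundary between the two windows.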

Global Objective Function: $L_{BIC}(\{S_i\}) = \sum_{i=1}^{T'} \left(-\frac{|S_i|}{2}\log|\Sigma_i| + \frac{k}{2}\log|S_i|\right)$
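For a concrete reading of the global objective, the following sketch evaluates $L_{BIC}$ for a given list of boundaries. It only scores a segmentation; it does not search for one, and whether the paper optimizes this quantity directly is not stated here. The function name `bic_objective` and the ridge `eps` are assumptions of this example.

```python
import numpy as np

def bic_objective(X, boundaries, eps=1e-6):
    """Evaluate L_BIC for the segmentation of X induced by `boundaries`."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    T, d = X.shape
    k = d + d * (d + 1) / 2
    # Keep only interior, unique cut points and add the two ends.
    cuts = [0] + sorted(b for b in set(boundaries) if 0 < b < T) + [T]
    total = 0.0
    for start, stop in zip(cuts[:-1], cuts[1:]):
        seg = X[start:stop]
        n = len(seg)
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        logdet = np.linalg.slogdet(cov + eps * np.eye(d))[1]
        total += -0.5 * n * logdet + 0.5 * k * np.log(n)
    return total
```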

Multi-Scale Evaluation:

Evaluates statistical coherence at each $\delta$ value within a predefined range
Identifies candidate segmentation points using adaptive threshold $\mu_\delta + \alpha \cdot \sigma_\delta$
Eliminates redundant detections through non-maximum suppression
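The three steps above can be sketched as follows. This is an illustrative reading, not the authors' code: scores are computed at every position for each $\delta$, thresholded at $\mu_\delta + \alpha \sigma_\delta$, and merged with a greedy non-maximum suppression. Ranking candidates from different scales by their z-score is an assumption made here so that scales are comparable; all function names are hypothetical.

```python
import numpy as np

def _logdet(w, eps=1e-6):
    cov = np.atleast_2d(np.cov(w, rowvar=False, bias=True))
    return np.linalg.slogdet(cov + eps * np.eye(w.shape[1]))[1]

def delta_bic(x1, x2):
    delta, d = x1.shape
    ell_sep = -0.5 * delta * (_logdet(x1) + _logdet(x2))
    ell_joint = -delta * _logdet(np.vstack([x1, x2]))
    k = d + d * (d + 1) / 2
    return -2.0 * (ell_joint - ell_sep) + k * np.log(2 * delta)

def candidates_at_scale(X, delta, alpha=2.0):
    """Return (position, z-score) pairs whose score exceeds mu + alpha * sigma."""
    T = len(X)
    positions = np.arange(delta, T - delta + 1)
    if len(positions) == 0:
        return []
    scores = np.array([delta_bic(X[t - delta:t], X[t:t + delta]) for t in positions])
    mu, sigma = scores.mean(), scores.std()
    if sigma == 0:
        return []
    z = (scores - mu) / sigma
    return [(int(t), float(zi)) for t, zi in zip(positions, z) if zi > alpha]

def non_max_suppression(cands, s_min=20):
    """Greedily keep the strongest candidates that are at least s_min apart."""
    kept = []
    for t, _ in sorted(cands, key=lambda c: -c[1]):
        if all(abs(t - u) >= s_min for u in kept):
            kept.append(t)
    return sorted(kept)

def detect_change_points(X, deltas=(10, 20, 40), alpha=2.0, s_min=20):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    cands = []
    for delta in deltas:
        cands.extend(candidates_at_scale(X, delta, alpha))
    return non_max_suppression(cands, s_min)
```

On a toy signal with one mean shift, `detect_change_points` is expected to report a point near the shift, possibly alongside a few weaker detections at the smallest scale.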

2. Summarization Phase

Summarization Function: $\phi(S_i) = \frac{1}{|S_i|} \sum_{t=\tau_{i-1}}^{\tau_i-1} x_t$

Uses mean pooling as the default summarization operation, capturing first-order statistical properties of segments.
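As a small illustration of $\phi$, the sketch below turns a series and a list of change points $\tau_i$ into one mean token per segment, with segment $S_i$ covering indices $\tau_{i-1}$ to $\tau_i - 1$. The function name `summarize_mean` is an assumption of this example.

```python
import numpy as np

def summarize_mean(X, change_points):
    """Mean-pool each segment [tau_{i-1}, tau_i) of X into a single token."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    T = len(X)
    # Interior, unique boundaries plus the two ends of the series.
    bounds = [0] + sorted(t for t in set(change_points) if 0 < t < T) + [T]
    return np.stack([X[a:b].mean(axis=0) for a, b in zip(bounds[:-1], bounds[1:])])

tokens = summarize_mean([1, 1, 1, 5, 5, 9], change_points=[3, 5])
# tokens has shape (3, 1) with values 1.0, 5.0, 9.0
```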

Technical Innovations

Adaptive Segmentation: Unlike fixed window methods, STaTS dynamically adjusts segment boundaries based on local statistical changes
Multivariate Extension: Naturally extends to multivariate time series through full covariance matrices
Multi-Scale Detection: Detects changes at different temporal resolutions, capturing both short-term bursts and long-term gradual changes
Statistical Validity: Under a multivariate Gaussian assumption with the segment covariance treated as known, the segment mean is a sufficient statistic for the segment's mean parameter

Experimental Setup

Datasets

Univariate Classification: UCR-128 (128 datasets) and UCR-85 (85 datasets)
Multivariate Classification: UEA-30 (30 datasets)
Multivariate Forecasting: ETTh1, ETTh2, ETTm1, Electricity

Evaluation Metrics

Classification Tasks: Average accuracy and average ranking
Forecasting Tasks: Normalized Mean Squared Error (nMSE)
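For readers unfamiliar with the ranking metric, the sketch below computes average accuracy and average rank from a datasets-by-models accuracy matrix, with rank 1 as best and ties sharing the mean of their ranks. The tie-handling rule and the toy numbers are assumptions of this example, not values from the paper.

```python
import numpy as np

def average_accuracy_and_rank(acc):
    """acc has shape (n_datasets, n_models); returns per-model mean accuracy and mean rank."""
    acc = np.asarray(acc, dtype=float)
    # For each dataset n and model i, count models j that beat or tie model i.
    better = (acc[:, None, :] > acc[:, :, None]).sum(axis=2)
    tied = (acc[:, None, :] == acc[:, :, None]).sum(axis=2)  # includes the model itself
    ranks = 1 + better + (tied - 1) / 2
    return acc.mean(axis=0), ranks.mean(axis=0)

# Toy accuracies for 2 datasets and 3 models (made-up numbers).
mean_acc, mean_rank = average_accuracy_and_rank([[0.9, 0.8, 0.7],
                                                 [0.6, 0.8, 0.8]])
```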

Baseline Methods

Classification Baselines: T-Loss, TNC, TS-TCC, TST, DTW, TS2Vec
Compression Variants: TS2Vec (uniform), TS2Vec (GMM)
Forecasting Baselines: Informer, TCN
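To make the uniform compression variant concrete, here is one plausible construction: split the series into equally sized chunks and mean-pool each one, so the token count matches a structure-aware summary of the same length. How the paper builds its uniform baseline is not specified in this summary, so the function below is an assumption.

```python
import numpy as np

def uniform_summarize(X, n_tokens):
    """Mean-pool X into n_tokens near-equal chunks, ignoring signal structure."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n_tokens = max(1, min(n_tokens, len(X)))  # avoid empty chunks
    chunks = np.array_split(X, n_tokens, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])

uniform_tokens = uniform_summarize(np.arange(6), n_tokens=3)
# chunks are [0, 1], [2, 3], [4, 5], giving tokens 0.5, 2.5, 4.5
```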

Implementation Details

Window size range: $\delta \in \{5, 10, \ldots, 500\}$
Threshold parameter: $\alpha = 2$
Minimum separation distance: $s_{min} = 20$
Numerical stability: Covariance regularization $\epsilon = 10^{-6}$
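The covariance regularization matters because flat or nearly flat windows have a singular covariance, whose log-determinant is not finite. The sketch below shows the effect of adding $\epsilon I$ before taking the log-determinant. The helper name `regularized_logdet` is an assumption of this example, as is the reading that $\epsilon$ is added to the diagonal.

```python
import numpy as np

def regularized_logdet(window, eps=1e-6):
    """log|Sigma + eps * I| for the ML covariance of an (n, d) window."""
    window = np.asarray(window, dtype=float)
    if window.ndim == 1:
        window = window[:, None]
    d = window.shape[1]
    cov = np.atleast_2d(np.cov(window, rowvar=False, bias=True))
    return np.linalg.slogdet(cov + eps * np.eye(d))[1]

flat = np.ones((10, 2))  # constant window, so the covariance is all zeros
raw_cov = np.atleast_2d(np.cov(flat, rowvar=False, bias=True))
raw_logdet = np.linalg.slogdet(raw_cov)[1]  # not finite for a singular matrix
safe_logdet = regularized_logdet(flat)      # equals d * log(eps) here
```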

Experimental Results

Main Results

Univariate Classification Performance

Model	UCR-85 Accuracy	UCR-85 Ranking	UCR-128 Accuracy	UCR-128 Ranking	Avg Length (UCR-85/UCR-128)
TS2Vec (ori)	0.829	1.99	0.829	2.02	424.4/534.5
TS2Vec (mean)	0.739	4.82	0.741	4.39	12.1/12.9
TS2Vec (uniform)	0.621	8.21	0.616	8.10	12.1/12.9
TS2Vec (GMM)	0.655	7.35	0.664	6.92	60.7/73.2

Key Findings:

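STaTS (the TS2Vec (mean) row) achieves 33× compression while maintaining approximately 90% of the original accuracy (0.739 vs. 0.829 on UCR-85)
Outperforms the uniform (0.621) and GMM (0.655) baselines on UCR-85 by roughly 8 to 12 accuracy points, even though the GMM variant keeps about five times as many tokens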
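The headline figures can be checked against the table with simple arithmetic. The sketch assumes that the two numbers in the Avg Length column refer to UCR-85 and UCR-128 respectively, which the table does not state explicitly. The ratio of average lengths computed this way need not equal the reported 33×, which may be averaged per dataset.

```python
# Values copied from the univariate classification table.
acc_ori = {"UCR-85": 0.829, "UCR-128": 0.829}
acc_stats = {"UCR-85": 0.739, "UCR-128": 0.741}
len_ori = {"UCR-85": 424.4, "UCR-128": 534.5}   # assumed column order
len_stats = {"UCR-85": 12.1, "UCR-128": 12.9}   # assumed column order

retained = {k: acc_stats[k] / acc_ori[k] for k in acc_ori}
length_ratio = {k: len_ori[k] / len_stats[k] for k in len_ori}

for name in acc_ori:
    print(f"{name}: retains {retained[name]:.1%} of accuracy, "
          f"average length shrinks {length_ratio[name]:.1f}x")
```

Under that assumption, the retained accuracy comes out at about 89% on both archives, and the ratio of average lengths is about 35 on UCR-85 and about 41 on UCR-128.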

Noise Robustness

Model	UCR-85 (Noisy)	UCR-128 (Noisy)
TS2Vec (ori)	0.336	0.412
TS2Vec (mean)	0.581	0.603
TS2Vec (uniform)	0.475	0.485
TS2Vec (GMM)	0.505	0.522

Important Finding: Under noisy conditions, STaTS not only beats the other compression variants but also outperforms the full-resolution model by a wide margin (0.581 vs. 0.336 on UCR-85 and 0.603 vs. 0.412 on UCR-128).
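One way to read this robustness gain is as an averaging effect. The derivation below is a sketch under an assumption not stated in this summary, namely that the noise is additive, zero-mean, and independent across time steps with variance $\sigma^2$, so that $x_t = s_t + \varepsilon_t$. Mean pooling over a segment $S_i$ then gives

$$\phi(S_i) = \frac{1}{|S_i|}\sum_{t \in S_i} s_t + \frac{1}{|S_i|}\sum_{t \in S_i} \varepsilon_t, \qquad \operatorname{Var}\left[\frac{1}{|S_i|}\sum_{t \in S_i} \varepsilon_t\right] = \frac{\sigma^2}{|S_i|}$$

so the noise standard deviation in each token shrinks by a factor of $\sqrt{|S_i|}$, while the signal term is preserved as long as $s_t$ is roughly constant within the segment. That second condition is what boundaries placed at statistical change points are meant to provide.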

Multivariate Classification

TS2Vec (mean): Accuracy 0.622, Ranking 4.70, 20× compression
Outperforms all compression variants while maintaining competitive performance with the original model

Time Series Forecasting

In long-term forecasting (H=720), STaTS matches or exceeds the original TS2Vec on multiple datasets while achieving 15× compression.

Ablation Studies

Segmentation Strategy Comparison: Statistical segmentation > GMM segmentation > uniform segmentation
Multi-Scale Evaluation: Multi-scale detection outperforms single-scale approaches
Summarization Functions: Mean pooling performs best across most tasks

Case Analysis

Qualitative analysis indicates that STaTS tracks the true signal trend more closely in long-term forecasting and reduces oscillation artifacts, most visibly at the longest horizon (H=720).

Related Work

Time Series Classification

Classical Methods: Shapelets, BOSS and other symbol-based approaches
Deep Learning: FCN, ResNet, InceptionTime
Ensemble Methods: HIVE-COTE

Time Series Forecasting

Early Breakthroughs: Sequence-to-sequence LSTM
Modern Methods: DeepAR, N-BEATS, Temporal Fusion Transformer
Attention Mechanisms: Informer and other sparse attention methods

Time Series Summarization

Traditional Methods: PAA, SAX (limited to univariate, fixed-length)
Modern Methods: TICC (computationally expensive, requires optimization solving)
STaTS Advantages: Lightweight, model-agnostic, multivariate support

Conclusions and Discussion

Main Conclusions

STaTS achieves efficient structure-aware time series compression, delivering up to 30× compression while maintaining 85-90% of original performance
Demonstrates excellent performance under noisy conditions, providing implicit denoising effects
As a model-agnostic preprocessor, seamlessly integrates into existing frameworks

Limitations

Statistical Assumptions: Assumes local statistical coherence within segments, potentially underperforming on dynamic ruptures or chaotic systems
Non-End-to-End: Does not use gradient-based feedback to adapt compression strategies
Parameter Sensitivity: Requires tuning of window size range and threshold parameters

Future Directions

Online/Streaming Settings: Extension to real-time summarization and edge deployment
Multimodal Data: Integration into sensor networks or hierarchical data like videos
Adaptive Learning: End-to-end adaptive learning systems under distribution shift or concept drift

In-Depth Evaluation

Strengths

Methodological Innovation: First application of multi-scale BIC criteria to adaptive segmentation of multivariate time series
Experimental Comprehensiveness: Extensive evaluation on 150+ datasets covering both classification and forecasting tasks
Practical Value: Significant computational efficiency gains (up to 30× compression) at a cost of roughly 10-15% of full-model performance
Robustness: Excellent performance under noisy conditions demonstrates practical applicability

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical guarantees regarding when and why STaTS outperforms other methods
Parameter Selection: Lacks systematic guidance for choosing multiple hyperparameters
Limited Applicability Scope: Applicability to highly irregular or non-stationary time series insufficiently validated
Computational Complexity Analysis: Lacks detailed time complexity analysis

Impact

Academic Contribution: Provides a new statistical perspective on time series compression
Practical Value: Directly applicable to resource-constrained environments and large-scale time series processing
Reproducibility: Clear method description and sufficient implementation details

Applicable Scenarios

Long Sequence Processing: Particularly suitable for irregular-length time series
Noisy Environments: Excellent performance in high-noise scenarios
Resource-Constrained Settings: Suitable for edge devices or real-time systems with limited computational resources
Preprocessing Tool: Serves as a general-purpose preprocessor for existing time series models

References

The paper cites important works in time series analysis, representation learning, and statistical signal processing, including:

Classical Time Series Methods: PAA, SAX, DTW
Deep Learning Methods: TS2Vec, TS-TCC, InceptionTime
Statistical Segmentation Methods: BIC, TICC
Forecasting Models: Informer, N-BEATS, Temporal Fusion Transformer

Overall Assessment: STaTS is a well-validated and practically useful contribution. It addresses structure-aware time series compression with a simple statistical criterion and is evaluated on 150+ datasets covering classification and forecasting. Its main open issues are the missing theoretical guarantees, complexity analysis, and hyperparameter guidance noted under Weaknesses.