STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging

Bhowmick, Ramanathan, Aakur
Time series data often contain latent temporal structure, such as transitions between locally stationary regimes, repeated motifs, and bursts of variability, that is rarely leveraged in standard representation learning pipelines. Existing models typically operate on raw or fixed-window sequences and treat all time steps as equally informative, which leads to inefficiency, poor robustness, and limited scalability on long or noisy sequences. We propose STaTS, a lightweight, unsupervised framework for Structure-Aware Temporal Summarization that adaptively compresses both univariate and multivariate time series into compact, information-preserving token sequences. STaTS detects change points across multiple temporal resolutions using a BIC-based statistical divergence criterion, then summarizes each segment using simple functions such as the mean or generative models such as GMMs. This process achieves up to 30x sequence compression while retaining core temporal dynamics. STaTS operates as a model-agnostic preprocessor and can be integrated with existing unsupervised time series encoders without retraining. Extensive experiments on 150+ datasets, including classification on the UCR-85, UCR-128, and UEA-30 archives and forecasting on ETTh1, ETTh2, ETTm1, and Electricity, demonstrate that STaTS retains 85-90% of full-model performance while dramatically reducing computational cost. Moreover, STaTS improves robustness under noise and preserves discriminative structure, outperforming uniform and clustering-based compression baselines. These results position STaTS as a principled, general-purpose solution for efficient, structure-aware time series modeling.

Basic Information

  • Paper ID: 2510.09593
  • Title: STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging
  • Authors: Disharee Bhowmick, Ranjith Ramanathan, Sathyanarayanan N. Aakur
  • Classification: cs.LG (Machine Learning), cs.CV (Computer Vision)
  • Publication Date: October 2025
  • Paper Link: https://arxiv.org/abs/2510.09593

Abstract

Time series data typically contain latent temporal structure, such as transitions between locally stationary regimes, recurring patterns, and bursts of variability, yet this structure is rarely exploited in standard representation learning pipelines. Existing models typically process raw or fixed-window sequences and treat all time steps as equally important, which leads to inefficiency, poor robustness, and limited scalability on long or noisy sequences. This paper proposes STaTS, a lightweight unsupervised framework for structure-aware temporal sequence summarization that adaptively compresses univariate and multivariate time series into compact, information-preserving token sequences.

Research Background and Motivation

Problem Definition

Time series data are ubiquitous in finance, the Internet of Things (IoT), healthcare, and other domains. As sensing technology advances, recorded time series are growing rapidly in length and complexity, placing heavy computational demands on machine learning frameworks for sequence understanding.

Limitations of Existing Methods

  1. Traditional Methods: Approaches such as PAA (Piecewise Aggregate Approximation), SAX (Symbolic Aggregate approXimation), and DTW (Dynamic Time Warping)-based methods are effective for summarizing and comparing series, but they rely on uniform windowing or rigid symbolic encoding and ignore dynamic variations in signal complexity.
  2. Deep Learning Methods: Methods such as TS2Vec and TS-TCC process complete sequences or apply sliding windows without regard to where the signal actually changes, resulting in redundancy, computational overhead, and misalignment between model tokenization and true signal transitions.

Research Motivation

Existing methods suffer from the following issues:

  • Fixed-window strategies may over-segment stable regions while under-segmenting complex ones
  • Under noisy conditions, uniformly processed inputs tend to amplify spurious patterns and reduce generalization
  • A lack of structure awareness leads to inefficiency and error propagation

Core Contributions

  1. Proposes the STaTS Framework: A structure-aware tokenization framework based on a BIC-driven change-detection criterion that identifies statistically coherent segments across multiple temporal scales
  2. Modular, Lightweight Summarization Pipeline: Compresses time series by roughly 30× while preserving salient patterns, enabling efficient downstream modeling
  3. Model-Agnostic Unsupervised Method: Requires no architectural changes or gradient-based tuning, directly compatible with existing time series encoders (e.g., TS2Vec)
  4. Unified Interface: Applicable to classification, forecasting, and robustness tasks, serving as a general-purpose time series summarization preprocessing tool

Methodology Details

Task Definition

Given a multivariate time series $X \in \mathbb{R}^{T \times d}$ (where $T$ is the number of time steps and $d$ is the dimensionality), the objective is to transform $X$ into a shorter sequence $\tilde{X} \in \mathbb{R}^{T' \times d}$ with $T' \ll T$, while preserving the underlying structure required for downstream tasks.

Model Architecture

1. Tokenization Phase

Multi-Scale Coherence Detection:

  • Uses BIC (Bayesian Information Criterion) to assess statistical similarity between adjacent time windows
  • For adjacent windows $x_1, x_2 \in \mathbb{R}^{\delta \times d}$, compute:

$$\Delta\mathrm{BIC} = -2(\ell_{\text{joint}} - \ell_{\text{sep}}) + k \log(2\delta)$$

where:

  • $\ell_{\text{sep}} = -\frac{\delta}{2}\left(\log|\Sigma_1| + \log|\Sigma_2|\right)$, with $\Sigma_1$ and $\Sigma_2$ the sample covariances of $x_1$ and $x_2$
  • $\ell_{\text{joint}} = -\delta \log|\Sigma_{12}|$, with $\Sigma_{12}$ the sample covariance of the concatenated window $[x_1; x_2]$
  • $k = d + \frac{d(d+1)}{2}$ (number of free parameters in the full-covariance Gaussian model)
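A minimal NumPy sketch of the ΔBIC score above, for illustration only. It assumes $\Sigma_1$, $\Sigma_2$, and $\Sigma_{12}$ are maximum-likelihood sample covariances regularized with the $\epsilon = 10^{-6}$ ridge listed under the implementation details; `delta_bic` and `log_det_cov` are hypothetical names, not the authors' code. The Gaussian log-likelihood constants are dropped because both models cover the same $2\delta$ samples, so they cancel in $\ell_{\text{joint}} - \ell_{\text{sep}}$.

```python
import numpy as np


def log_det_cov(x: np.ndarray, eps: float = 1e-6) -> float:
    """log|Sigma| for the regularized ML covariance of x (shape n x d)."""
    d = x.shape[1]
    # bias=True divides by n (maximum-likelihood estimate); eps * I keeps Sigma invertible.
    cov = np.cov(x, rowvar=False, bias=True).reshape(d, d) + eps * np.eye(d)
    return np.linalg.slogdet(cov)[1]


def delta_bic(x1: np.ndarray, x2: np.ndarray, eps: float = 1e-6) -> float:
    """Delta-BIC between adjacent windows x1, x2, each of shape (delta, d)."""
    delta, d = x1.shape
    k = d + d * (d + 1) / 2  # free parameters of a full-covariance Gaussian
    ll_sep = -delta / 2 * (log_det_cov(x1, eps) + log_det_cov(x2, eps))
    ll_joint = -delta * log_det_cov(np.vstack([x1, x2]), eps)
    return -2 * (ll_joint - ll_sep) + k * np.log(2 * delta)
```

For two identical windows the likelihood term vanishes and only the penalty $k \log(2\delta)$ remains; a shift in mean or covariance between the windows drives the score up.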

Global Objective Function: $L_{\mathrm{BIC}}(\{S_i\}) = \sum_{i=1}^{T'} \left(-\frac{|S_i|}{2}\log|\Sigma_i| + \frac{k}{2}\log|S_i|\right)$, where $\Sigma_i$ is the sample covariance of segment $S_i$
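The objective can be evaluated directly for any candidate segmentation. The sketch below follows the formula exactly as written, taking $\Sigma_i$ to be the $\epsilon$-regularized maximum-likelihood covariance of segment $S_i$ and adding $\tau_0 = 0$ and $\tau_{T'} = T$ as outer boundaries; both are assumptions, and `bic_objective` is a hypothetical name.

```python
import numpy as np


def bic_objective(x: np.ndarray, boundaries: list, eps: float = 1e-6) -> float:
    """Evaluate L_BIC({S_i}) for x (T x d) split at the given interior change points.

    boundaries: strictly increasing interior indices tau_1 < ... < tau_{T'-1}.
    """
    T, d = x.shape
    k = d + d * (d + 1) / 2
    edges = [0, *boundaries, T]  # assumed convention: tau_0 = 0, tau_{T'} = T
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        seg = x[start:end]
        n = seg.shape[0]  # |S_i|
        cov = np.cov(seg, rowvar=False, bias=True).reshape(d, d) + eps * np.eye(d)
        logdet = np.linalg.slogdet(cov)[1]  # log|Sigma_i|
        total += -n / 2 * logdet + k / 2 * np.log(n)
    return total
```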

Multi-Scale Evaluation:

  • Evaluates statistical coherence at each window size $\delta$ within a predefined range
  • Identifies candidate segmentation points using the adaptive threshold $\mu_\delta + \alpha \sigma_\delta$, where $\mu_\delta$ and $\sigma_\delta$ are the mean and standard deviation of the $\Delta\mathrm{BIC}$ scores at scale $\delta$
  • Eliminates redundant detections through non-maximum suppression
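The three steps above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it scores every boundary $t$ with the windows $x[t-\delta:t]$ and $x[t:t+\delta]$, keeps scores above $\mu_\delta + \alpha\sigma_\delta$, and ranks the surviving candidates by their per-scale z-score before non-maximum suppression with minimum separation $s_{\min}$. The z-score ranking is an assumption made so that raw ΔBIC values, which grow with $\delta$, can be compared across scales. Because $k\log(2\delta)$ is constant at a fixed scale, it shifts $\mu_\delta$ but leaves each score's margin over the threshold unchanged. All function names are hypothetical.

```python
import numpy as np


def delta_bic(x1, x2, eps=1e-6):
    """Delta-BIC between adjacent windows x1, x2 of shape (delta, d)."""
    delta, d = x1.shape
    k = d + d * (d + 1) / 2

    def logdet(x):
        # Regularized maximum-likelihood covariance, then log|Sigma|.
        cov = np.cov(x, rowvar=False, bias=True).reshape(d, d) + eps * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    ll_sep = -delta / 2 * (logdet(x1) + logdet(x2))
    ll_joint = -delta * logdet(np.vstack([x1, x2]))
    return -2 * (ll_joint - ll_sep) + k * np.log(2 * delta)


def candidates_at_scale(x, delta, alpha=2.0):
    """Boundaries whose score exceeds mu_delta + alpha * sigma_delta, with their z-scores."""
    ts = np.arange(delta, x.shape[0] - delta + 1)
    scores = np.array([delta_bic(x[t - delta:t], x[t:t + delta]) for t in ts])
    mu, sigma = scores.mean(), scores.std()
    keep = scores > mu + alpha * sigma
    # Assumption: z-scores make candidates from different scales comparable.
    z = (scores - mu) / (sigma if sigma > 0 else 1.0)
    return ts[keep], z[keep]


def multiscale_change_points(x, deltas, alpha=2.0, s_min=20):
    """Pool candidates over all scales, then apply non-maximum suppression."""
    cand_t, cand_z = [], []
    for delta in deltas:
        if 2 * delta > x.shape[0]:
            continue  # both windows must fit inside the series
        t, z = candidates_at_scale(x, delta, alpha)
        cand_t.extend(t.tolist())
        cand_z.extend(z.tolist())
    selected = []
    # Visit candidates from strongest to weakest; drop any within s_min of a kept one.
    for i in np.argsort(cand_z)[::-1]:
        if all(abs(cand_t[i] - s) >= s_min for s in selected):
            selected.append(cand_t[i])
    return sorted(selected)
```

On a toy series with a mean shift at $t = 100$, the returned change points should include one near 100; at small $\delta$, noise can still produce occasional spurious detections elsewhere.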

2. Summarization Phase

Summarization Function: $\phi(S_i) = \frac{1}{|S_i|} \sum_{t=\tau_{i-1}}^{\tau_i - 1} x_t$, where $\tau_{i-1}$ and $\tau_i$ are consecutive change points, so $|S_i| = \tau_i - \tau_{i-1}$

STaTS uses mean pooling as its default summarization operation, capturing the first-order statistics of each segment.
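Given the detected change points, mean pooling reduces a $T \times d$ series to a $T' \times d$ token sequence. A minimal sketch, assuming the interior boundaries are strictly increasing and that $\tau_0 = 0$ and $\tau_{T'} = T$; `summarize_segments` is a hypothetical helper.

```python
import numpy as np


def summarize_segments(x: np.ndarray, boundaries: list) -> np.ndarray:
    """phi(S_i): mean of x over each segment [tau_{i-1}, tau_i); returns shape (T', d)."""
    # Assumes strictly increasing interior boundaries, so no segment is empty.
    edges = [0, *sorted(boundaries), x.shape[0]]
    return np.stack([x[a:b].mean(axis=0) for a, b in zip(edges[:-1], edges[1:])])


x = np.arange(12, dtype=float).reshape(6, 2)  # T = 6, d = 2
tokens = summarize_segments(x, [2, 5])        # segments: rows 0-1, 2-4, 5
```

Here `tokens` has shape `(3, 2)`: the six time steps collapse into three tokens, one per segment.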

Technical Innovations

  1. Adaptive Segmentation: Unlike fixed-window methods, STaTS dynamically adjusts segment boundaries based on local statistical changes
  2. Multivariate Extension: Naturally extends to multivariate time series through full covariance matrices
  3. Multi-Scale Detection: Detects changes at different temporal resolutions, capturing both short-term bursts and long-term gradual changes
  4. Statistical Validity: Under a multivariate Gaussian model with known within-segment covariance, the segment mean is a sufficient statistic for the segment's mean

Experimental Setup

Datasets

  1. Univariate Classification: UCR-128 (128 datasets) and UCR-85 (85 datasets)
  2. Multivariate Classification: UEA-30 (30 datasets)
  3. Multivariate Forecasting: ETTh1, ETTh2, ETTm1, Electricity

Evaluation Metrics

  • Classification Tasks: Average accuracy and average ranking
  • Forecasting Tasks: Normalized Mean Squared Error (nMSE)

Baseline Methods

  • Classification Baselines: T-Loss, TNC, TS-TCC, TST, DTW, TS2Vec
  • Compression Variants: TS2Vec (uniform), TS2Vec (GMM)
  • Forecasting Baselines: Informer, TCN
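The construction of the compression variants is not detailed here. Because TS2Vec (uniform) has the same average compressed length as TS2Vec (mean) in the UCR results (12.1 / 12.9), one plausible reading is equal-width windows with mean pooling and the same number of tokens. The sketch below implements that assumed variant; `uniform_summarize` is a hypothetical name.

```python
import numpy as np


def uniform_summarize(x: np.ndarray, n_segments: int) -> np.ndarray:
    """Assumed uniform baseline: split x into near-equal chunks and mean-pool each."""
    # np.array_split tolerates lengths that are not divisible by n_segments.
    chunks = np.array_split(x, n_segments, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])


x = np.arange(10, dtype=float).reshape(10, 1)
tokens = uniform_summarize(x, 3)  # chunk sizes 4, 3, 3
```

Unlike STaTS, these split points ignore where the signal changes, so a chunk can straddle a regime boundary and average two regimes together.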

Implementation Details

  • Window size range: $\delta \in \{5, 10, \ldots, 500\}$
  • Threshold parameter: $\alpha = 2$
  • Minimum separation distance: $s_{\min} = 20$
  • Numerical stability: covariance regularization $\epsilon = 10^{-6}$
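The hyperparameters above can be gathered in a single configuration object. This sketch assumes $\delta$ steps by 5 from 5 to 500, since the listed range does not state the step; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STaTSConfig:
    """Hypothetical container for the reported STaTS hyperparameters."""
    deltas: tuple = tuple(range(5, 501, 5))  # assumed step of 5: {5, 10, ..., 500}
    alpha: float = 2.0   # threshold: mu_delta + alpha * sigma_delta
    s_min: int = 20      # minimum separation for non-maximum suppression
    eps: float = 1e-6    # covariance regularization


cfg = STaTSConfig()
```

Under this assumption, each series is scanned at 100 window sizes, which dominates the cost of the tokenization phase.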

Experimental Results

Main Results

Univariate Classification Performance

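Model            | UCR-85 Accuracy | UCR-85 Avg. Rank | UCR-128 Accuracy | UCR-128 Avg. Rank | Avg. Length (UCR-85 / UCR-128)
TS2Vec (ori)     | 0.829           | 1.99             | 0.829            | 2.02              | 424.4 / 534.5
TS2Vec (mean)    | 0.739           | 4.82             | 0.741            | 4.39              | 12.1 / 12.9
TS2Vec (uniform) | 0.621           | 8.21             | 0.616            | 8.10              | 12.1 / 12.9
TS2Vec (GMM)     | 0.655           | 7.35             | 0.664            | 6.92              | 60.7 / 73.2

TS2Vec (ori) is trained on the full-length series; TS2Vec (mean) uses STaTS summaries with mean pooling; TS2Vec (uniform) and TS2Vec (GMM) are the uniform and GMM-based compression baselines.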

Key Findings:

  • STaTS achieves 33× compression while maintaining approximately 90% of original performance
  • Clearly outperforms the uniform and GMM compression baselines (0.739 vs. 0.621 and 0.655 on UCR-85)
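The "approximately 90%" figure follows directly from the table: dividing the compressed model's accuracy by the full-resolution accuracy gives the share of performance retained.

```python
# Accuracies from the UCR table: TS2Vec (ori) vs. TS2Vec (mean).
full = {"UCR-85": 0.829, "UCR-128": 0.829}
compressed = {"UCR-85": 0.739, "UCR-128": 0.741}

# Fraction of full-resolution accuracy retained after compression.
retention = {name: compressed[name] / full[name] for name in full}
for name, r in retention.items():
    print(f"{name}: {r:.1%}")
# UCR-85: 89.1%
# UCR-128: 89.4%
```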

Noise Robustness

Model            | UCR-85 Accuracy (Noisy) | UCR-128 Accuracy (Noisy)
TS2Vec (ori)     | 0.336                   | 0.412
TS2Vec (mean)    | 0.581                   | 0.603
TS2Vec (uniform) | 0.475                   | 0.485
TS2Vec (GMM)     | 0.505                   | 0.522

Important Finding: Under noisy conditions, STaTS not only stays ahead of the other compression variants but also clearly outperforms the full-resolution model (0.581 vs. 0.336 on UCR-85 and 0.603 vs. 0.412 on UCR-128).
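Comparing each model's clean and noisy accuracy makes the robustness gap concrete. The values are taken from the two UCR tables; the noise protocol itself is not described here.

```python
# Accuracies copied from the clean and noisy UCR tables.
clean = {"ori": {"UCR-85": 0.829, "UCR-128": 0.829},
         "mean": {"UCR-85": 0.739, "UCR-128": 0.741}}
noisy = {"ori": {"UCR-85": 0.336, "UCR-128": 0.412},
         "mean": {"UCR-85": 0.581, "UCR-128": 0.603}}

# Absolute accuracy lost when moving from clean to noisy inputs.
drops = {(model, archive): clean[model][archive] - noisy[model][archive]
         for model in clean for archive in clean[model]}

for (model, archive), drop in drops.items():
    print(f"TS2Vec ({model}), {archive}: -{drop:.3f}")
```

The full-resolution model loses about 0.42 to 0.49 accuracy under noise, while the STaTS variant loses about 0.14 to 0.16.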

Multivariate Classification

  • TS2Vec (mean): Accuracy 0.622, Ranking 4.70, 20× compression
  • Outperforms all compression variants while remaining competitive with the original model

Time Series Forecasting

In long-term forecasting (H=720), STaTS matches or exceeds the original TS2Vec on multiple datasets while achieving 15× compression.

Ablation Studies

  1. Segmentation Strategy Comparison: Statistical segmentation > GMM segmentation > uniform segmentation
  2. Multi-Scale Evaluation: Multi-scale detection outperforms single-scale approaches
  3. Summarization Functions: Mean pooling performs best across most tasks

Case Analysis

Qualitative analysis shows that STaTS tracks the true signal trend more closely in long-term forecasting and reduces oscillation artifacts, with the clearest gains at the longest horizon (H=720).

Related Work

Time Series Classification

  • Classical Methods: Shapelets, BOSS and other symbol-based approaches
  • Deep Learning: FCN, ResNet, InceptionTime
  • Ensemble Methods: HIVE-COTE

Time Series Forecasting

  • Early Breakthroughs: Sequence-to-sequence LSTM
  • Modern Methods: DeepAR, N-BEATS, Temporal Fusion Transformer
  • Attention Mechanisms: Informer and other sparse attention methods

Time Series Summarization

  • Traditional Methods: PAA, SAX (limited to univariate series and fixed-length windows)
  • Modern Methods: TICC (computationally expensive, as it requires solving an iterative optimization problem)
  • STaTS Advantages: Lightweight, model-agnostic, multivariate support

Conclusions and Discussion

Main Conclusions

  1. STaTS achieves efficient structure-aware time series compression, delivering 30× compression while maintaining 85-90% of original performance
  2. Performs strongly under noisy conditions, suggesting an implicit denoising effect
  3. As a model-agnostic preprocessor, seamlessly integrates into existing frameworks

Limitations

  1. Statistical Assumptions: Assumes local statistical coherence within segments, so it may underperform on abrupt, rapidly changing dynamics or chaotic systems
  2. Non-End-to-End: Does not use gradient-based feedback to adapt compression strategies
  3. Parameter Sensitivity: Requires tuning of window size range and threshold parameters

Future Directions

  1. Online/Streaming Settings: Extension to real-time summarization and edge deployment
  2. Multimodal Data: Integration into sensor networks or hierarchical data like videos
  3. Adaptive Learning: End-to-end adaptive learning systems under distribution shift or concept drift

In-Depth Evaluation

Strengths

  1. Methodological Innovation: Applies a multi-scale BIC criterion to adaptive segmentation of multivariate time series as a general-purpose preprocessing step for representation learning
  2. Experimental Comprehensiveness: Extensive evaluation on 150+ datasets covering both classification and forecasting tasks
  3. Practical Value: Significant computational efficiency gains (30× compression) with minimal performance loss
  4. Robustness: Excellent performance under noisy conditions demonstrates practical applicability

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical guarantees regarding when and why STaTS outperforms other methods
  2. Parameter Selection: Lacks systematic guidance for choosing multiple hyperparameters
  3. Limited Applicability Scope: Applicability to highly irregular or non-stationary time series insufficiently validated
  4. Computational Complexity Analysis: Lacks detailed time complexity analysis

Impact

  1. Academic Contribution: Provides a new statistical perspective on time series compression
  2. Practical Value: Directly applicable to resource-constrained environments and large-scale time series processing
  3. Reproducibility: Clear method description and sufficient implementation details

Applicable Scenarios

  1. Long Sequence Processing: Particularly suitable for long time series, including collections with irregular lengths
  2. Noisy Environments: Excellent performance in high-noise scenarios
  3. Resource-Constrained Settings: Suitable for edge devices or real-time systems with limited computational resources
  4. Preprocessing Tool: Serves as a general-purpose preprocessor for existing time series models

References

The paper cites important works in time series analysis, representation learning, and statistical signal processing, including:

  • Classical Time Series Methods: PAA, SAX, DTW
  • Deep Learning Methods: TS2Vec, TS-TCC, InceptionTime
  • Statistical Segmentation Methods: BIC, TICC
  • Forecasting Models: Informer, N-BEATS, Temporal Fusion Transformer

Overall Assessment: This is a high-quality time series processing paper with a clear statistical foundation, thorough experimental validation, and strong practical value, although its theoretical analysis remains limited. The proposed STaTS method fills an important gap in structure-aware time series compression and makes a significant contribution to the time series analysis field.