Time series data often contain latent temporal structure (transitions between locally stationary regimes, repeated motifs, and bursts of variability) that is rarely leveraged in standard representation learning pipelines. Existing models typically operate on raw or fixed-window sequences, treating all time steps as equally informative, which leads to inefficiencies, poor robustness, and limited scalability in long or noisy sequences. We propose STaTS, a lightweight, unsupervised framework for Structure-Aware Temporal Summarization that adaptively compresses both univariate and multivariate time series into compact, information-preserving token sequences. STaTS detects change points across multiple temporal resolutions using a BIC-based statistical divergence criterion, then summarizes each segment using simple functions like the mean or generative models such as GMMs. This process achieves up to 30x sequence compression while retaining core temporal dynamics. STaTS operates as a model-agnostic preprocessor and can be integrated with existing unsupervised time series encoders without retraining. Extensive experiments on 150+ datasets, including classification tasks on the UCR-85, UCR-128, and UEA-30 archives, and forecasting on ETTh1, ETTh2, ETTm1, and Electricity, demonstrate that STaTS enables 85-90% of the full-model performance while offering dramatic reductions in computational cost. Moreover, STaTS improves robustness under noise and preserves discriminative structure, outperforming uniform and clustering-based compression baselines. These results position STaTS as a principled, general-purpose solution for efficient, structure-aware time series modeling.
- Paper ID: 2510.09593
- Title: STaTS: Structure-Aware Temporal Sequence Summarization via Statistical Window Merging
- Authors: Disharee Bhowmick, Ranjith Ramanathan, Sathyanarayanan N. Aakur
- Classification: cs.LG (Machine Learning), cs.CV (Computer Vision)
- Publication Date: October 2025
- Paper Link: https://arxiv.org/abs/2510.09593
Temporal sequence data typically contains latent temporal structures, such as transitions between local stationary states, recurring patterns, and variability bursts, yet these structures are rarely exploited in standard representation learning pipelines. Existing models typically process raw or fixed-window sequences, treating all time steps as equally important, which leads to inefficiency, poor robustness, and limited scalability in long or noisy sequences. This paper proposes STaTS, a lightweight unsupervised framework for structure-aware temporal sequence summarization, capable of adaptively compressing univariate and multivariate time series into compact, information-preserving token sequences.
Temporal sequence data is ubiquitous in finance, Internet of Things, healthcare, and other domains. With advances in sensing technology, the length and complexity of recorded time series have grown rapidly, imposing enormous computational demands on machine learning-based sequence understanding frameworks.
- Traditional Methods: Approaches such as PAA (Piecewise Aggregate Approximation), SAX (Symbolic Aggregate approXimation), and DTW (Dynamic Time Warping) achieve effective summarization but rely on uniform windowing or rigid symbolic encoding, ignoring dynamic variations in signal complexity.
- Deep Learning Methods: Methods like TS2Vec and TS-TCC process complete sequences or apply sliding windows without considering semantic changes, resulting in redundancy, computational overhead, and misalignment between model tokenization and true signal transitions.
Existing methods suffer from the following issues:
- Fixed window strategies may over-segment stable regions while under-segmenting complex regions
- Under noisy conditions, uniformly processed inputs tend to amplify spurious patterns and reduce generalization ability
- Lack of structure awareness leads to inefficiency and error propagation
- Proposes STaTS Framework: A structure-aware tokenization framework based on BIC-driven change detection criteria that identifies statistically coherent segments across multiple temporal scales
- Modular Lightweight Summarization Pipeline: Compresses time series by over 30× while preserving significant patterns, enabling efficient downstream modeling
- Model-Agnostic Unsupervised Method: Requires no architectural changes or gradient-based tuning, directly compatible with existing time series encoders (e.g., TS2Vec)
- Unified Interface: Applicable to classification, forecasting, and robustness tasks, serving as a general-purpose time series summarization preprocessing tool
Given a multivariate time series $X \in \mathbb{R}^{T \times d}$ (where $T$ is the number of time steps and $d$ is the dimensionality), the objective is to transform $X$ into a shorter sequence $\tilde{X} \in \mathbb{R}^{T' \times d}$ where $T' \ll T$, while preserving the underlying structure required for downstream tasks.
Multi-Scale Coherence Detection:
- Uses BIC (Bayesian Information Criterion) to assess statistical similarity between adjacent time windows
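- For adjacent windows $x_1, x_2 \in \mathbb{R}^{\delta \times d}$, compute:
$$\Delta\mathrm{BIC} = -2\,(\ell_{\text{joint}} - \ell_{\text{sep}}) + k \log(2\delta)$$
where:
- $\ell_{\text{sep}} = -\frac{\delta}{2}\left(\log|\Sigma_1| + \log|\Sigma_2|\right)$
- $\ell_{\text{joint}} = -\delta \log|\Sigma_{12}|$
- $k = d + \frac{d(d+1)}{2}$ (number of free parameters in the full covariance model)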
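The ΔBIC test above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `gaussian_loglik_term` and `delta_bic` are hypothetical, the covariance is assumed to be the maximum-likelihood estimate, and the ridge term `eps` follows the covariance regularization value listed in the experimental setup.

```python
import numpy as np

def gaussian_loglik_term(x, eps=1e-6):
    """Return -(n/2) * log|Sigma| for a window x of shape (n, d).

    Sigma is the maximum-likelihood covariance of x plus a small ridge
    eps * I for numerical stability (an assumption of this sketch).
    """
    n, d = x.shape
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * logdet

def delta_bic(x1, x2, eps=1e-6):
    """Delta-BIC between two adjacent windows, each of shape (delta, d)."""
    delta, d = x1.shape
    k = d + d * (d + 1) / 2  # free parameters: mean plus full covariance
    l_sep = gaussian_loglik_term(x1, eps) + gaussian_loglik_term(x2, eps)
    l_joint = gaussian_loglik_term(np.vstack([x1, x2]), eps)
    return -2.0 * (l_joint - l_sep) + k * np.log(2 * delta)

# Usage: two windows from the same regime versus a mean-shifted window.
rng = np.random.default_rng(0)
same_a = rng.normal(size=(50, 2))
same_b = rng.normal(size=(50, 2))
shifted = rng.normal(loc=5.0, size=(50, 2))
print(delta_bic(same_a, same_b), delta_bic(same_a, shifted))
```

With the sign convention as written, the score grows when a single joint Gaussian fits the two windows worse than two separate Gaussians, so larger values point toward a candidate boundary.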
Global Objective Function:
$$\mathcal{L}_{\mathrm{BIC}}(\{S_i\}) = \sum_{i=1}^{T'} \left( -\frac{|S_i|}{2} \log|\Sigma_i| + \frac{k}{2} \log|S_i| \right)$$
Multi-Scale Evaluation:
- Evaluates statistical coherence at each δ value within a predefined range
- Identifies candidate segmentation points using adaptive threshold $\mu_\delta + \alpha \cdot \sigma_\delta$
- Eliminates redundant detections through non-maximum suppression
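The thresholding and suppression steps can be illustrated for a single window size as follows. This is a sketch under stated assumptions: `pick_change_points` is a hypothetical name, `scores` stands for the per-position ΔBIC values at one scale δ, and the greedy highest-score-first suppression is one common way to realize non-maximum suppression. How candidates from different scales are merged is not specified in this summary and is left out.

```python
import numpy as np

def pick_change_points(scores, alpha=2.0, s_min=20):
    """Select change points from a 1-D array of divergence scores.

    1. Keep positions whose score exceeds mean + alpha * std.
    2. Greedy non-maximum suppression: visit candidates from highest to
       lowest score and drop any that lies within s_min of a kept one.
    """
    scores = np.asarray(scores, dtype=float)
    threshold = scores.mean() + alpha * scores.std()
    candidates = np.flatnonzero(scores > threshold)
    order = candidates[np.argsort(scores[candidates])[::-1]]
    kept = []
    for idx in order:
        if all(abs(idx - j) >= s_min for j in kept):
            kept.append(int(idx))
    return sorted(kept)

# Usage: two nearby peaks (50 and 55) collapse to the stronger one.
demo = np.zeros(200)
demo[50], demo[55], demo[150] = 10.0, 8.0, 9.0
print(pick_change_points(demo))  # [50, 150]
```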
Summarization Function:
$$\phi(S_i) = \frac{1}{|S_i|} \sum_{t=\tau_{i-1}}^{\tau_i - 1} x_t$$
Uses mean pooling as the default summarization operation, capturing first-order statistical properties of segments.
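Mean pooling over segments is straightforward to sketch. The helper below is illustrative only: `summarize_mean` is a hypothetical name, and `boundaries` is assumed to hold the interior change points, with each segment covering the half-open range from one boundary up to, but not including, the next.

```python
import numpy as np

def summarize_mean(x, boundaries):
    """Compress x of shape (T, d) into one mean token per segment.

    boundaries: interior change points; segments are
    [0, b1), [b1, b2), ..., [bn, T).
    Returns an array of shape (number_of_segments, d).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:  # treat a univariate series as shape (T, 1)
        x = x[:, None]
    T = x.shape[0]
    edges = [0] + sorted(b for b in set(boundaries) if 0 < b < T) + [T]
    return np.stack([x[a:b].mean(axis=0) for a, b in zip(edges[:-1], edges[1:])])

# Usage: ten samples split at t = 4 give two tokens.
tokens = summarize_mean(np.arange(10), [4])
print(tokens.ravel())  # [1.5 6.5]
```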
- Adaptive Segmentation: Unlike fixed window methods, STaTS dynamically adjusts segment boundaries based on local statistical changes
- Multivariate Extension: Naturally extends to multivariate time series through full covariance matrices
- Multi-Scale Detection: Detects changes at different temporal resolutions, capturing both short-term bursts and long-term gradual changes
- Statistical Validity: Under multivariate Gaussian assumptions, segment means are sufficient statistics
- Univariate Classification: UCR-128 (128 datasets) and UCR-85 (85 datasets)
- Multivariate Classification: UEA-30 (30 datasets)
- Multivariate Forecasting: ETTh1, ETTh2, ETTm1, Electricity
- Classification Tasks: Average accuracy and average ranking
- Forecasting Tasks: Normalized Mean Squared Error (nMSE)
- Classification Baselines: T-Loss, TNC, TS-TCC, TST, DTW, TS2Vec
- Compression Variants: TS2Vec (uniform), TS2Vec (GMM)
- Forecasting Baselines: Informer, TCN
- Window size range: $\delta \in \{5, 10, \ldots, 500\}$
- Threshold parameter: $\alpha = 2$
- Minimum separation distance: $s_{\min} = 20$
- Numerical stability: Covariance regularization $\epsilon = 10^{-6}$
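For reference, these settings could be collected in a small configuration object. The class and field names are hypothetical, and the step of 5 between window sizes is an assumption read off the first two elements of the listed range.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatsConfig:
    """Hypothetical container for the hyperparameters listed above."""
    # Window sizes 5, 10, ..., 500 (step of 5 is an assumption)
    window_sizes: tuple = tuple(range(5, 501, 5))
    alpha: float = 2.0   # threshold multiplier in mu + alpha * sigma
    s_min: int = 20      # minimum separation between change points
    eps: float = 1e-6    # covariance regularization

cfg = StatsConfig()
print(len(cfg.window_sizes), cfg.window_sizes[0], cfg.window_sizes[-1])  # 100 5 500
```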
| Model | UCR-85 Accuracy | UCR-85 Ranking | UCR-128 Accuracy | UCR-128 Ranking | Avg Length |
|---|---|---|---|---|---|
| TS2Vec (ori) | 0.829 | 1.99 | 0.829 | 2.02 | 424.4/534.5 |
| TS2Vec (mean) | 0.739 | 4.82 | 0.741 | 4.39 | 12.1/12.9 |
| TS2Vec (uniform) | 0.621 | 8.21 | 0.616 | 8.10 | 12.1/12.9 |
| TS2Vec (GMM) | 0.655 | 7.35 | 0.664 | 6.92 | 60.7/73.2 |
Key Findings:
- STaTS with mean summarization (the TS2Vec (mean) row) achieves 33× compression while maintaining approximately 90% of original performance (0.739 vs. 0.829 accuracy on UCR-85)
- Significantly outperforms uniform segmentation and GMM baselines
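The retained-accuracy figure can be checked directly against the table. The snippet below is a back-of-the-envelope calculation only; it assumes the two values in the Avg Length column refer to UCR-85 and UCR-128 respectively, and that TS2Vec (mean) is the STaTS variant.

```python
# (accuracy, average sequence length) taken from the table rows
full = {"UCR-85": (0.829, 424.4), "UCR-128": (0.829, 534.5)}
compressed = {"UCR-85": (0.739, 12.1), "UCR-128": (0.741, 12.9)}

def retained_and_ratio(full_row, compressed_row):
    """Return (fraction of accuracy retained, ratio of average lengths)."""
    (acc_f, len_f), (acc_c, len_c) = full_row, compressed_row
    return acc_c / acc_f, len_f / len_c

for name in full:
    retained, ratio = retained_and_ratio(full[name], compressed[name])
    print(f"{name}: retained {retained:.3f}, length ratio {ratio:.1f}")
# UCR-85: retained 0.891, length ratio 35.1
# UCR-128: retained 0.894, length ratio 41.4
```

The ratio of average lengths is not the same quantity as the average of per-dataset compression ratios, so it need not match the reported compression figure exactly.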
| Model | UCR-85 (Noisy) | UCR-128 (Noisy) |
|---|---|---|
| TS2Vec (ori) | 0.336 | 0.412 |
| TS2Vec (mean) | 0.581 | 0.603 |
| TS2Vec (uniform) | 0.475 | 0.485 |
| TS2Vec (GMM) | 0.505 | 0.522 |
Important Finding: Under noisy conditions, the STaTS-compressed model (TS2Vec (mean)) outperforms the full-resolution TS2Vec by a wide margin: 0.581 vs. 0.336 on UCR-85 and 0.603 vs. 0.412 on UCR-128.
- TS2Vec (mean): Accuracy 0.622, Ranking 4.70, 20× compression
- Outperforms all compression variants while maintaining competitive performance with the original model
In long-term forecasting (H=720), STaTS matches or exceeds the original TS2Vec on multiple datasets while achieving 15× compression.
- Segmentation Strategy Comparison: Statistical segmentation > GMM segmentation > uniform segmentation
- Multi-Scale Evaluation: Multi-scale detection outperforms single-scale approaches
- Summarization Functions: Mean pooling performs best across most tasks
Qualitative analysis suggests that STaTS tracks true signal trends more closely in long-term forecasting and reduces oscillation artifacts, most visibly at the longest forecasting horizon (H=720).
- Classical Methods: Shapelets, BOSS and other symbol-based approaches
- Deep Learning: FCN, ResNet, InceptionTime
- Ensemble Methods: HIVE-COTE
- Early Breakthroughs: Sequence-to-sequence LSTM
- Modern Methods: DeepAR, N-BEATS, Temporal Fusion Transformer
- Attention Mechanisms: Informer and other sparse attention methods
- Traditional Methods: PAA, SAX (limited to univariate, fixed-length)
- Modern Methods: TICC (computationally expensive, requires optimization solving)
- STaTS Advantages: Lightweight, model-agnostic, multivariate support
- STaTS achieves efficient structure-aware time series compression, delivering 30× compression while maintaining 85-90% of original performance
- Demonstrates excellent performance under noisy conditions, providing implicit denoising effects
- As a model-agnostic preprocessor, seamlessly integrates into existing frameworks
- Statistical Assumptions: Assumes local statistical coherence within segments, potentially underperforming on dynamic ruptures or chaotic systems
- Non-End-to-End: Does not use gradient-based feedback to adapt compression strategies
- Parameter Sensitivity: Requires tuning of window size range and threshold parameters
- Online/Streaming Settings: Extension to real-time summarization and edge deployment
- Multimodal Data: Integration into sensor networks or hierarchical data like videos
- Adaptive Learning: End-to-end adaptive learning systems under distribution shift or concept drift
- Methodological Innovation: First application of multi-scale BIC criteria to adaptive segmentation of multivariate time series
- Experimental Comprehensiveness: Extensive evaluation on 150+ datasets covering both classification and forecasting tasks
- Practical Value: Significant computational efficiency gains (30× compression) with minimal performance loss
- Robustness: Excellent performance under noisy conditions demonstrates practical applicability
- Insufficient Theoretical Analysis: Lacks theoretical guarantees regarding when and why STaTS outperforms other methods
- Parameter Selection: Lacks systematic guidance for choosing multiple hyperparameters
- Limited Applicability Scope: Applicability to highly irregular or non-stationary time series insufficiently validated
- Computational Complexity Analysis: Lacks detailed time complexity analysis
- Academic Contribution: Provides a new statistical perspective on time series compression
- Practical Value: Directly applicable to resource-constrained environments and large-scale time series processing
- Reproducibility: Clear method description and sufficient implementation details
- Long Sequence Processing: Particularly suitable for irregular-length time series
- Noisy Environments: Excellent performance in high-noise scenarios
- Resource-Constrained Settings: Suitable for edge devices or real-time systems with limited computational resources
- Preprocessing Tool: Serves as a general-purpose preprocessor for existing time series models
The paper cites important works in time series analysis, representation learning, and statistical signal processing, including:
- Classical Time Series Methods: PAA, SAX, DTW
- Deep Learning Methods: TS2Vec, TS-TCC, InceptionTime
- Statistical Segmentation Methods: BIC, TICC
- Forecasting Models: Informer, N-BEATS, Temporal Fusion Transformer
Overall Assessment: This is a high-quality time series processing paper that demonstrates excellence in theoretical foundation, experimental validation, and practical value. The proposed STaTS method fills an important gap in structure-aware time series compression and makes significant contributions to the time series analysis field.