2025-11-25T18:04:18.517311

COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing

Shang, Chang
Reconstruction-based methods are a dominant paradigm in time series anomaly detection (TSAD); however, their near-universal reliance on Mean Squared Error (MSE) loss results in statistically flawed reconstruction residuals. This fundamental weakness leads to noisy, unstable anomaly scores with a poor signal-to-noise ratio, hindering reliable detection. To address this, we propose Constrained Gaussian-Noise Optimization and Smoothing (COGNOS), a universal, model-agnostic enhancement framework that tackles this issue at its source. COGNOS introduces a novel Gaussian-White Noise Regularization strategy during training, which directly constrains the model's output residuals to conform to a Gaussian white noise distribution. This engineered statistical property creates the ideal precondition for our second contribution: a Kalman Smoothing Post-processor that provably operates as a statistically optimal estimator to denoise the raw anomaly scores. The synergy between these two components allows COGNOS to robustly separate the true anomaly signal from random fluctuations. Extensive experiments demonstrate that COGNOS is highly effective, delivering an average F-score uplift of 57.9% when applied to 12 diverse backbone models across multiple real-world benchmark datasets. Our work reveals that directly regularizing output statistics is a powerful and generalizable strategy for significantly improving anomaly detection systems.

Basic Information

  • Paper ID: 2511.06894
  • Title: COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing
  • Authors: Wenlong Shang, Peng Chang (Beijing University of Technology)
  • Categories: cs.LG cs.AI
  • Submission Date: November 10, 2025 (arXiv)
  • Paper Link: https://arxiv.org/abs/2511.06894

Abstract

This paper addresses a fundamental issue in reconstruction-based methods for time series anomaly detection (TSAD): statistically defective reconstruction residuals caused by MSE loss. We propose the COGNOS framework, which directly constrains model output residuals to follow a Gaussian white noise distribution through Gaussian white noise regularization (GWNR) during training, combined with a Kalman smoothing post-processor for optimal denoising. Across 12 different backbone models and multiple real-world datasets, COGNOS achieves an average F-score improvement of 57.9%, demonstrating that direct regularization of output statistical properties is a powerful and generalizable strategy.

Research Background and Motivation

1. Core Problem

Time series anomaly detection is critical in industrial manufacturing monitoring, financial system security, and IT infrastructure maintenance. Reconstruction-based self-supervised methods have become the mainstream paradigm but suffer from fundamental defects:

  • Statistically defective residuals: Reconstruction residuals from standard MSE training exhibit undesirable statistical properties (non-Gaussian, temporal correlations)
  • Low signal-to-noise ratio: Original anomaly scores are noisy and unstable, making it difficult to distinguish true anomalies from random fluctuations
  • Incomplete modeling: Models fail to fully separate deterministic patterns from random noise

2. Problem Significance

As shown in Figure 1, standard MSE-trained Transformers on the SWaT dataset exhibit three critical issues:

  • Anomaly scores are highly noisy with poor signal-to-noise ratio
  • Q-Q plots reveal strongly non-Gaussian residuals
  • Autocorrelation plots show significant temporal correlations in residuals

These statistical defects directly impact anomaly detection performance, resulting in high false positive and false negative rates.

3. Limitations of Existing Methods

  • Contrastive learning methods: While capable of learning more discriminative representations, they are typically coupled with specific architectures and do not directly address the statistical properties of final residuals
  • Filtering and regularization techniques:
    • Methods integrating filters create new hybrid architectures lacking generality
    • Latent space regularization (e.g., SVD, periodicity consistency) does not directly act on output residuals
  • Lack of theoretically optimal post-processing solutions

4. Research Motivation

This paper addresses the problem at its source: directly engineering the statistical properties of output residuals to create ideal preconditions for subsequent optimal denoising.

Core Contributions

  1. Proposes Gaussian White Noise Regularization (GWNR) strategy: First to directly constrain reconstruction residuals to follow Gaussian white noise distribution, fundamentally different from existing representation-focused contrastive methods
  2. Designs Kalman smoothing post-processor: Works synergistically with GWNR to achieve theoretically optimal denoising by leveraging engineered residual properties, significantly improving anomaly score stability
  3. Demonstrates model-agnostic effectiveness:
    • Universal enhancement framework applicable to any reconstruction model
    • Average F-score improvement of 57.9% across 12 different architectures (attention-based, time-frequency fusion, CNN-MLP)
    • Validation on 4 real-world benchmark datasets (MSL, SMAP, SWaT, PSM)
  4. Reveals new improvement direction: Provides empirical evidence that direct regularization of output statistical properties is more effective than traditional architectural or representation improvements

Method Details

Task Definition

Input: Multivariate time series $\mathbf{x} \in \mathbb{R}^{L \times D}$ (length $L$, dimension $D$)
Training: Learn data manifold using only normal data
Output: Anomaly score for each time point to identify deviations from normal patterns
Objective: Generate high signal-to-noise ratio, statistically optimal anomaly scores

Model Architecture

COGNOS is a two-stage framework (Figure 2):

Stage 1: Training Phase - Gaussian White Noise Regularization (GWNR)

Overall objective function: $L_{Total} = L_{AWL}(L_{MSE}, L_{MMD}, L_{ACF})$

where Automatic Weighted Loss (AWL) dynamically balances three components.
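
The exact form of AWL is not given in this summary. As a rough sketch only, assuming it follows the uncertainty-based weighting of Kendall, Gal, and Cipolla (2018) listed under Key References, each loss term can be paired with a learnable log-variance $s_i$ and combined as $\sum_i \tfrac{1}{2} e^{-s_i} L_i + \tfrac{1}{2} s_i$. The function name and the log-variance parameterization below are illustrative assumptions, not the authors' code.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine loss terms using one (assumed learnable) log-variance per term.

    losses   -- e.g. [L_MSE, L_MMD, L_ACF] as plain floats
    log_vars -- s_i = log(sigma_i^2), one per loss term (hypothetical parameters)
    """
    if len(losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per loss term")
    total = 0.0
    for loss, s in zip(losses, log_vars):
        # exp(-s) down-weights a term whose uncertainty is high;
        # the additive 0.5 * s keeps the weights from collapsing to zero.
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total
```

With all log-variances at zero the terms are weighted equally (each by 0.5); in a real training loop the `log_vars` would be optimized together with the model weights.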

1. Reconstruction Loss ($L_{MSE}$): $L_{MSE} = \frac{1}{|R|}\sum_{r \in R} r^2$ where $R = \mathbf{x} - \hat{\mathbf{x}}$ is the reconstruction residual, ensuring high-fidelity reconstruction.

2. Gaussianity Regularization ($L_{MMD}$): Uses Maximum Mean Discrepancy (MMD) to constrain the residual distribution to approximate the target Gaussian distribution $\mathcal{N}(0, \sigma^{*2})$, where $S$ denotes a set of samples drawn from that target:

$$L_{MMD} = \frac{1}{|R|^2}\sum_{p_i,p_j \in R}\kappa(p_i, p_j) + \frac{1}{|S|^2}\sum_{q_i,q_j \in S}\kappa(q_i, q_j) - \frac{2}{|R||S|}\sum_{p_i \in R, q_j \in S}\kappa(p_i, q_j)$$

Kernel function uses multi-bandwidth RBF: $\kappa(a,b) = \sum_{j=1}^M \exp\left(-\frac{\|a-b\|^2}{2(B_j\sigma^*)^2}\right)$

Bandwidth multipliers $\{B_j\} = \{0.1, 0.5, 1.0, 2.0, 5.0\}$, $\sigma^* = e^\omega$ (learnable parameter).

Innovation points:

  • Non-parametric method with strong robustness
  • Adaptively learns noise level
  • Penalizes systematic bias and complex structures
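
The MMD term above can be illustrated with a small, dependency-free sketch. It assumes residuals are flattened to scalars and treats $\sigma^*$ as a fixed number rather than a learned parameter; the function names and the `gaussian_samples` helper are hypothetical, and a training implementation would use a differentiable tensor library instead.

```python
import math
import random

BANDWIDTH_MULTIPLIERS = (0.1, 0.5, 1.0, 2.0, 5.0)  # {B_j} as listed above


def multi_rbf(a, b, sigma_star):
    """Multi-bandwidth RBF kernel for two scalar residuals."""
    sq_dist = (a - b) ** 2
    return sum(
        math.exp(-sq_dist / (2.0 * (b_j * sigma_star) ** 2))
        for b_j in BANDWIDTH_MULTIPLIERS
    )


def mean_kernel(xs, ys, sigma_star):
    """Average kernel value over all pairs (x, y)."""
    total = sum(multi_rbf(x, y, sigma_star) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_loss(residuals, target, sigma_star):
    """Biased estimate of squared MMD between residuals R and target samples S."""
    return (
        mean_kernel(residuals, residuals, sigma_star)
        + mean_kernel(target, target, sigma_star)
        - 2.0 * mean_kernel(residuals, target, sigma_star)
    )


def gaussian_samples(n, sigma_star, seed=0):
    """Hypothetical helper: draw the target set S from N(0, sigma_star^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_star) for _ in range(n)]
```

Residuals with a systematic offset, for example `[x + 3.0 for x in gaussian_samples(100, 1.0, seed=1)]`, yield a larger `mmd_loss` against `gaussian_samples(100, 1.0)` than unshifted ones, which is the sense in which the term penalizes bias.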

3. White Noise Regularization ($L_{ACF}$): Penalizes temporal correlations by summing squared autocorrelation coefficients for the first 10 lags:

$$L_{ACF} = \sum_{k \in N_{lag}} \mathbb{E}_{b,d}[(\rho_{k,b,d})^2]$$

where autocorrelation coefficient at lag $k$: $$\rho_{k,b,d} = \frac{\sum_{l=k+1}^L (r_{b,l,d} - \mu_{b,d})(r_{b,l-k,d} - \mu_{b,d})}{\sum_{l=1}^L (r_{b,l,d} - \mu_{b,d})^2}$$

Design rationale: Empirical observation shows most significant correlations occur at early lags; $N_{lag}=\{1,...,10\}$ balances effectiveness and computational cost.
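
As a minimal sketch of the autocorrelation penalty, the function below evaluates the sum of squared $\rho_k$ for a single residual series, that is, one batch element $b$ and one channel $d$. Averaging over $b$ and $d$ is left to the caller; the function name and the handling of constant input are assumptions for illustration.

```python
def acf_penalty(residuals, max_lag=10):
    """Sum of squared autocorrelation coefficients for lags 1..max_lag."""
    n = len(residuals)
    if n <= max_lag:
        raise ValueError("series must be longer than max_lag")
    mu = sum(residuals) / n
    centered = [r - mu for r in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        # Constant residuals: autocorrelation is undefined; treated as zero here.
        return 0.0
    penalty = 0.0
    for k in range(1, max_lag + 1):
        # Zero-based version of the sum over l = k+1..L in the formula above.
        rho_k = sum(centered[l] * centered[l - k] for l in range(k, n)) / denom
        penalty += rho_k ** 2
    return penalty
```

A strongly periodic residual such as `[1.0, -1.0] * 10` gives a large penalty, whereas white noise gives a value close to zero, which is the behavior the regularizer rewards.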

Stage 2: Inference Phase - Kalman Smoothing Post-Processor

Theoretical foundation: For a linear state-space model, the Kalman filter is the provably optimal (minimum mean-square-error) estimator when the noise process is zero-mean, uncorrelated (white noise), and Gaussian. GWNR is designed so that the residuals approximately satisfy exactly these conditions.
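
For one channel, the post-processor of this stage can be sketched as a scalar random-walk Kalman filter ($F = H = 1$, $Q_p = \lambda R_m$) followed by a Rauch-Tung-Striebel backward pass. This is an illustrative sketch, not the authors' implementation: the function names and the initial state `s0` and covariance `p0` are assumptions, since the summary does not state how the filter is initialized.

```python
def kalman_rts_smooth(residuals, r_m, lam, s0=0.0, p0=None):
    """Smooth one channel of residuals; returns the smoothed states s_{t|T}."""
    if r_m <= 0.0 or lam <= 0.0:
        raise ValueError("r_m and lam must be positive")
    if not residuals:
        return []
    q_p = lam * r_m                      # process noise Q_p = lambda * R_m
    s_prev = s0                          # assumed initial state
    p_prev = r_m if p0 is None else p0   # assumed initial covariance
    s_pred, p_pred, s_filt, p_filt = [], [], [], []
    for r in residuals:
        # Prediction step (F = 1): state carried over, covariance grows by Q_p.
        s_minus = s_prev
        p_minus = p_prev + q_p
        # Update step (H = 1): blend prediction and observation via the gain.
        gain = p_minus / (p_minus + r_m)
        s_prev = s_minus + gain * (r - s_minus)
        p_prev = (1.0 - gain) * p_minus
        s_pred.append(s_minus)
        p_pred.append(p_minus)
        s_filt.append(s_prev)
        p_filt.append(p_prev)
    # Backward RTS pass: refine each estimate using information from the future.
    s_smooth = list(s_filt)
    for t in range(len(residuals) - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        s_smooth[t] = s_filt[t] + g * (s_smooth[t + 1] - s_pred[t + 1])
    return s_smooth


def anomaly_scores(residuals, r_m, lam):
    """Per-time-step score: the squared smoothed state."""
    return [s * s for s in kalman_rts_smooth(residuals, r_m, lam)]
```

A smaller `lam` shrinks the process noise relative to the measurement noise and therefore smooths more aggressively; a larger `lam` lets the estimate track the raw residual more closely.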

State space model:

$$\begin{cases} s_t = Fs_{t-1} + w_t, & w_t \sim \mathcal{N}(0, Q_p) \\ r_t = Hs_t + v_t, & v_t \sim \mathcal{N}(0, R_m) \end{cases}$$

where:

- $s_t$: latent "true" anomaly state
- $r_t$: observed raw residual
- $F=I, H=I$: simple random walk model
- $R_m$: empirically estimated from training set residual variance
- $Q_p = \lambda R_m$: $\lambda$ is bias-variance trade-off hyperparameter

**Forward Kalman filtering**:

1. Prediction step:

   $$\begin{cases} \hat{s}_{t|t-1} = F\hat{s}_{t-1|t-1} \\ P_{t|t-1} = FP_{t-1|t-1}F^T + Q_p \end{cases}$$

2. Update step:

   $$\begin{cases} K_t = P_{t|t-1}H^T(HP_{t|t-1}H^T + R_m)^{-1} \\ \hat{s}_{t|t} = \hat{s}_{t|t-1} + K_t(r_t - H\hat{s}_{t|t-1}) \\ P_{t|t} = (I - K_tH)P_{t|t-1} \end{cases}$$

**Backward RTS smoothing**:

Backward propagation from $t=T-1$ to $0$:

$$G_t = P_{t|t}F^T(P_{t+1|t})^{-1}$$

$$\hat{s}_{t|T} = \hat{s}_{t|t} + G_t(\hat{s}_{t+1|T} - \hat{s}_{t+1|t})$$

The term $(\hat{s}_{t+1|T} - \hat{s}_{t+1|t})$ represents new information gained from future data.

**Final anomaly score**:

$$\text{Anomaly Score}_t = (\hat{s}_{t|T})^2$$

Each channel is processed independently, then multivariate scores are aggregated.

### Technical Innovation Points

1. **Direct output regularization vs. latent space regularization**:
   - Traditional methods (e.g., Floss) constrain latent representations
   - COGNOS directly acts on final output residuals
   - More directly addresses anomaly score quality
2. **Synergistic design**:
   - GWNR creates ideal statistical conditions
   - Kalman smoothing is theoretically optimal under these conditions
   - Two components form powerful synergy
3. **Model-agnostic nature**:
   - Does not modify backbone architecture
   - Plug-and-play integration with any reconstruction model
   - Universal enhancement framework
4. **Theoretical guarantees**:
   - Kalman filter optimality has mathematical proof
   - Prerequisite conditions engineered through GWNR
   - Not a heuristic method

## Experimental Setup

### Datasets

Four widely-adopted real-world benchmark datasets:

| Dataset | Dimensions | Training | Validation | Testing | Category |
|---------|-----------|----------|-----------|---------|----------|
| **MSL** | 55 | 44,653 | 11,664 | 73,729 | Spacecraft |
| **SMAP** | 25 | 108,146 | 27,037 | 427,617 | Spacecraft |
| **SWaT** | 51 | 396,000 | 99,000 | 449,919 | Water treatment |
| **PSM** | 25 | 105,984 | 26,497 | 87,841 | Server |

- **MSL/SMAP**: Expert-annotated ISA reports from Mars Science Laboratory and Soil Moisture Active Passive satellite
- **PSM**: Anonymized monitoring data from eBay internal multi-application server nodes
- **SWaT**: Small-scale fully functional water treatment testbed designed by Singapore's Public Utilities Board

### Evaluation Metrics

Two time series-specific evaluation strategies:

1. **Point-Adjustment strategy**: If any point within a segment is identified, the entire anomalous segment is considered detected
2. **Affiliation Metrics**: Extend precision and recall by measuring temporal distance, insensitive to minor temporal misalignments

Reported metrics:

- **Average Precision (AP)**
- **Average Recall (AR)**
- **Average F-score (AF)**

### Comparison Methods

**12 backbone models** spanning multiple architectural paradigms:

1. **Attention models**: AnomalyTransformer, Autoformer, PatchTST, Pyraformer, Transformer, iTransformer
2. **Time-frequency fusion models**: TimesNet, TimeMixer, FiLM
3. **CNN-MLP models**: MICN, LightTS, DLinear

**Baseline comparisons**:

- Vanilla MSE: Standard MSE training and inference
- Floss: Regularization method enforcing periodicity consistency in latent representation space

### Implementation Details

- **Hardware**: AMD EPYC 7002 CPU (48GB RAM) + NVIDIA RTX 4090 GPU (24GB VRAM)
- **Software**: Python 3.10, PyTorch 2.3.0, CUDA 12.1, Ubuntu 22.04
- **Hyperparameters**:
  - Sequence length: 100
  - $d_{model}$: 128, $d_{MLP}$: 128
  - Number of layers: 3, Top-k: 3
  - Learning rate: $10^{-4}$
  - Batch size: 128
  - Training epochs: 10 (MSL/SMAP/PSM), 3 (SWaT)
- **Critical hyperparameter $\lambda$**:
  - MSL/SMAP/PSM: 1.0 (short-duration anomalies prevalent)
  - SWaT: 0.1 (long-duration anomalies prevalent)
- **Random seed**: 2021 (ensures reproducibility)

## Experimental Results

### Main Results

**Tables 1-2 key findings**:

1. **Significant overall improvement**:
   - Average F-score improvement across 12 backbone models: **57.9%**
   - Consistent improvements across all tested architectures and datasets
2. **Improvements by architecture**:
   - Attention models: average +62.5%
   - Time-frequency fusion models: average +50.7%
   - CNN-MLP models: average +42.6%
3. **Specific cases** (Table 1):
   - **FiLM**: Maximum improvement 95.4% (PSM dataset)
   - **DLinear**: Minimum but still significant improvement 37.4%
   - **Transformer on SWaT**: F-score improved from 0.426 to 0.847 (+98.8%)
4. **Cross-dataset performance** (Tables 1-2 average):
   - SWaT: 0.596→0.869 (+45.8%)
   - MSL: 0.535→0.944 (+76.4%)
   - PSM: 0.714→0.910 (+27.5%)
   - SMAP: 0.489→0.824 (+68.5%)

### Ablation Study

**Table 3 key findings** (average on MSL and PSM datasets):

| Configuration | Average F-score | Relative Decrease from COGNOS |
|---------------|-----------------|-------------------------------|
| **COGNOS (complete)** | **0.927** | - |
| w/GWNR+MA | 0.882 | -4.9% |
| w/GWNR+LP | 0.857 | -7.5% |
| w/o GWNR+KS | 0.875 | -5.6% |
| w/GWNR+w/o Filter | 0.683 | -26.3% |
| w/o GWNR+w/o Filter | 0.714 | -23.0% |

**Key insights**:

1. **Superiority of Kalman smoother**:
   - Replacement with moving average (MA): 4.9% performance drop
   - Replacement with low-pass filter (LP): 7.5% performance drop
   - Heuristic filters cannot achieve theoretical optimality
2. **Fundamental role of GWNR**:
   - Removing GWNR while keeping KS: 5.6% performance drop
   - Demonstrates importance of statistical property engineering
   - Residual quality directly impacts post-processing effectiveness
3. **Synergistic effects**:
   - Complete COGNOS significantly outperforms any single component
   - Validates necessity of two-stage design

### Comparison with Other Methods

**Table 4: COGNOS vs Floss** (representative backbones)

TimesNet on PSM example:

- MSE baseline: AF=0.833
- Floss: AF=0.743 (-10.8%)
- **COGNOS**: AF=0.942 (+13.1%)

Transformer on SWaT example:

- MSE baseline: AF=0.426
- Floss: AF=0.398 (-6.6%)
- **COGNOS**: AF=0.847 (+98.8%)

**Key advantages**:

- Floss sometimes performs worse than baseline
- COGNOS significantly outperforms both in all cases
- Proves superiority of direct output regularization over latent space regularization

### Case Analysis

**Figures 3 and 14: Anomaly score visualization**

**SWaT dataset (Transformer backbone)**:

- **Vanilla**: Scores fluctuate dramatically in normal regions with extreme noise
- **COGNOS**: Scores are stable, anomalous regions clearly stand out
- Signal-to-noise ratio significantly improved

**PSM dataset (LightTS backbone)**:

- **Vanilla**: Still contains numerous false peaks on log scale
- **COGNOS**: Anomalous events maintain high scores, normal regions stable and low

**Statistical property improvements** (Figures 4 and 6-11):

FiLM on PSM example:

- **Q-Q plot**: Variance reduced from $10^6$ to $10^2$ (4 orders of magnitude)
- **ACF plot**: All lag autocorrelation coefficients fall within 95% confidence interval
- Residual distribution closer to theoretical Gaussian line

### Hyperparameter Sensitivity

**Figure 5: Impact of $\lambda$ on performance**

Test range: $\lambda \in \{0.1, 0.3, 0.5, 0.7, 1.0, 3.0, 5.0, 10.0\}$

**Findings**:

- **Broad stable interval**: Performance stable for $\lambda \in [0.3, 5.0]$
- **MSL dataset**: Lower $\lambda$ (e.g., 0.1) shows slight performance decrease (over-smoothing)
- **SWaT dataset**: Lower $\lambda$ (0.1) performs best (long-duration anomalies)
- **Practicality**: Performance insensitive to $\lambda$, easy to tune

## Related Work

### Time Series Anomaly Detection Models

1. **Reconstruction method evolution**:
   - Classical: Autoencoder, LSTM
   - Advanced: Transformer architectures (AnomalyTransformer)
   - Time-frequency fusion: TimesNet, FiLM
   - Latest: Frequency patching (CATCH), graph neural networks
2. **Contrastive learning direction**:
   - Temporal neighborhood sampling (TNC)
   - Cross-view prediction (TS-TCC)
   - Hierarchical contrast (TS2Vec)
   - Limitations: Main innovations in architecture or latent space, not directly addressing residual statistics

### Filtering and Regularization Techniques

1. **Integrated filters**:
   - Deep filter preprocessing inputs
   - Kalman filter hybrid architectures (KalmanAE)
   - Limitations: Create new architectures, not universal enhancement
2. **Regularization methods**:
   - SVD-constrained feature learning (SVD-AE)
   - Periodicity consistency (Floss)
   - Limitations: Act on latent representations, not final output

### COGNOS's Uniqueness

- **Paradigm shift**: Direct regularization of output residual statistical properties
- **Theoretical foundation**: Leverages Kalman filter optimality theory
- **Universality**: Model-agnostic, enhances any reconstruction method
- **Synergistic design**: Regularization and post-processing tightly coupled

## Conclusions and Discussion

### Main Conclusions

1. **Core finding**: Reconstruction models trained with MSE produce statistically defective residuals, which is the fundamental bottleneck in anomaly detection performance
2. **Effective solution**: COGNOS addresses the problem at its source through two-stage strategy:
   - GWNR engineers ideal statistical properties
   - Kalman smoothing achieves theoretically optimal denoising
3. **Universality verification**: Consistent large improvements across 12 different architectures and 4 real datasets (average +57.9%) prove method generality
4. **New research direction**: Direct regularization of output statistical properties is a more powerful strategy than architectural innovation or representation learning

### Limitations

1. **Univariate processing**:
   - Currently applies Kalman smoothing independently to each channel
   - Does not exploit cross-channel dependencies in multivariate time series
   - May lose some information
2. **Hyperparameter $\lambda$**:
   - While insensitive to $\lambda$, still requires adjustment based on anomaly duration characteristics
   - Short-duration anomalies (MSL) need higher $\lambda$
   - Long-duration anomalies (SWaT) need lower $\lambda$
3. **Computational overhead**:
   - Training phase adds MMD and ACF computation
   - Inference phase requires two Kalman passes
   - While paper doesn't report detailed timing, theoretically has additional cost
4. **Theoretical assumptions**:
   - Kalman filter assumes linear dynamics
   - Complex nonlinear anomaly patterns may require extensions

### Future Directions

Paper explicitly proposes:

1. **Multivariate extension**:
   - Develop multivariate Kalman smoothing considering cross-channel correlations
   - Possibly using vector autoregressive (VAR) state space models
2. **Video anomaly detection**:
   - Extend framework to higher-dimensional data
   - Joint spatial-temporal modeling
3. **Implicit directions**:
   - Nonlinear filters (extended Kalman filter, unscented Kalman filter)
   - Adaptive $\lambda$ learning
   - Combination with other enhancement techniques

## In-Depth Evaluation

### Strengths

1. **Theoretical innovation (9/10)**:
   - First systematic application of statistical signal processing theory to deep anomaly detection
   - Synergistic design of engineering prerequisites + theoretically optimal post-processing is highly innovative
   - Provides new perspective by re-examining problem from statistical angle
2. **Method universality (10/10)**:
   - Truly model-agnostic framework, plug-and-play
   - Validated across 12 different architectures spanning multiple paradigms
   - No backbone modification required, extremely practical
3. **Experimental sufficiency (9/10)**:
   - 4 real datasets covering multiple application domains
   - 12 backbone models with strong representativeness
   - Thorough ablation studies clearly showing component contributions
   - Comprehensive visualizations (residual statistics, anomaly score comparisons)
   - Complete hyperparameter sensitivity analysis
4. **Result convincingness (10/10)**:
   - 57.9% average improvement is highly significant
   - Consistent improvements across all backbones and datasets
   - Clear statistical significance (Tables 11-12 provide detailed values)
   - Visualizations intuitively demonstrate improvements
5. **Writing clarity (9/10)**:
   - Problem motivation clearly articulated (Figure 1 powerfully demonstrates issue)
   - Method description detailed, mathematical derivations complete
   - Experimental setup transparent, appendix provides all details
   - Logical flow, easy to understand

### Shortcomings

1. **Missing computational cost analysis (important)**:
   - No reported training and inference time overhead
   - Complexity of MMD and ACF computation not discussed
   - Lacking efficiency comparison with baseline
   - Practical deployment feasibility unclear
2. **Multivariate modeling limitations (moderate)**:
   - Univariate Kalman smoothing ignores inter-channel dependencies
   - Potentially suboptimal for strongly coupled multivariate systems
   - While results already excellent, theoretical improvement space exists
3. **Insufficient hyperparameter selection guidance (minor)**:
   - $\lambda$ selection depends on prior knowledge (anomaly duration)
   - Lacks automatic $\lambda$ selection strategy
   - While sensitivity is low, still requires manual tuning
4. **Limited comparison with latest methods (minor)**:
   - Only compared with Floss
   - Lacking detailed comparison with other recent regularization methods (e.g., SVD-AE)
   - While backbone models are recent, comparison baselines relatively limited
5. **Limited theoretical analysis depth (minor)**:
   - While leveraging Kalman filter optimality, convergence analysis not provided
   - Theoretical explanation for GWNR effectiveness insufficient
   - MMD loss convergence properties not discussed

### Impact Assessment

1. **Contribution to field (high)**:
   - Pioneering application of signal processing theory to deep anomaly detection
   - Provides new research paradigm: direct output statistical regularization
   - May inspire more statistics-driven deep learning methods
2. **Practical value (high)**:
   - Plug-and-play nature enables easy integration into existing systems
   - Significant performance improvements directly translate to practical value
   - Direct application potential in critical domains (industrial monitoring, financial security, etc.)
3. **Reproducibility (high)**:
   - Uses public datasets and open-source backbone models
   - Detailed hyperparameter settings (Table 6)
   - Complete experimental details in appendix
   - Fixed random seed
   - Only caveat: Paper doesn't mention code open-sourcing plans
4. **Academic impact prediction**:
   - Likely to become new baseline for time series anomaly detection
   - 57.9% improvement sufficient to attract widespread attention
   - May spawn follow-up work: multivariate extensions, nonlinear filters, other task applications

### Applicable Scenarios

**Most suitable scenarios**:

1. **Industrial monitoring systems**:
   - Sensor data anomaly detection
   - Equipment fault prediction
   - Quality control
2. **IT infrastructure**:
   - Server performance monitoring (e.g., PSM dataset)
   - Network traffic anomaly detection
   - System log analysis
3. **Aerospace**:
   - Spacecraft telemetry monitoring (e.g., MSL/SMAP)
   - Aircraft health management
   - Critical mission systems
4. **Financial systems**:
   - Transaction anomaly detection
   - Fraud identification
   - Risk monitoring

**Constraints**:

1. **Requires training data**: Self-supervised method, needs sufficient normal data
2. **Real-time requirements**: If computational overhead is significant, may not suit ultra-low latency scenarios
3. **Anomaly types**: Primarily targets point and segment anomalies; collective anomalies may require adjustments

### Potential Extension Directions

1. **Technical extensions**:
   - Multivariate state space models
   - Nonlinear filters (particle filtering, neural network-enhanced Kalman filtering)
   - Online learning and adaptive regularization
2. **Application extensions**:
   - Video anomaly detection (authors already mentioned)
   - Audio anomaly detection
   - Medical signal monitoring (ECG, EEG)
3. **Theoretical extensions**:
   - Convergence and generalization bound analysis
   - Extensions for non-Gaussian noise distributions
   - Integration with causal inference

## Key References

1. **Kalman, R. E. (1960)**. A new approach to linear filtering and prediction problems.
   - Original Kalman filter paper, theoretical foundation
2. **Rauch, H. E., Tung, F., & Striebel, C. T. (1965)**. Maximum likelihood estimates of linear dynamic systems.
   - RTS smoother
3. **Xu et al. (2022)**. Anomaly Transformer. ICLR.
   - Representative Transformer anomaly detection method
4. **Yang et al. (2023)**. Floss: Frequency domain regularization.
   - Main comparison method
5. **Kendall, Gal, & Cipolla (2018)**. Multi-task learning using uncertainty to weigh losses. CVPR.
   - Automatic weighted loss
6. **Huet, Navarro, & Rossi (2022)**. Local evaluation of time series anomaly detection algorithms. KDD.
   - Affiliation metrics

## Summary

COGNOS is a high-quality research work that successfully combines classical signal processing theory with modern deep learning, providing a novel and effective solution for time series anomaly detection. Its core innovation lies in re-examining the problem from a statistical perspective, achieving theoretically optimal post-processing by engineering ideal prerequisite conditions. The 57.9% average performance improvement and consistent improvements across 12 models fully demonstrate the method's effectiveness and universality. Despite some limitations (univariate processing, unknown computational costs), the strengths far outweigh the weaknesses. This work not only provides a practical enhancement framework but, more importantly, opens a new research direction that may have profound impact on time series analysis. For critical applications requiring highly reliable anomaly detection (industrial, aerospace, financial sectors), COGNOS provides a plug-and-play solution with significant performance gains and high practical value.