Glitch noise classification in KAGRA O3GK observing data using unsupervised machine learning

Oshino, Sakai, Meyer-Conde et al.

Gravitational wave interferometers are disrupted by various types of nonstationary noise, referred to as glitch noise, that affect data analysis and interferometer sensitivity. The accurate identification and classification of glitch noise are essential for improving the reliability of gravitational wave observations. In this study, we demonstrated the effectiveness of unsupervised machine learning for classifying images with nonstationary noise in the KAGRA O3GK data. Using a variational autoencoder (VAE) combined with spectral clustering, we identified eight distinct glitch noise categories. The latent variables obtained from VAE were dimensionally compressed, visualized in three-dimensional space, and classified using spectral clustering to better understand the glitch noise characteristics of KAGRA during the O3GK period. Our results highlight the potential of unsupervised learning for efficient glitch noise classification, which may in turn facilitate interferometer upgrades and the development of future third-generation gravitational wave observatories.

Basic Information

Paper ID: 2510.14291
Title: Glitch noise classification in KAGRA O3GK observing data using unsupervised machine learning
Authors: Shoichi Oshino, Yusuke Sakai, Marco Meyer-Conde, Takashi Uchiyama, Yousuke Itoh, Yutaka Shikano, Yoshikazu Terada, Hirotaka Takahashi
Categories: gr-qc (General Relativity and Quantum Cosmology), astro-ph.IM (Instrumentation and Methods for Astrophysics)
Publication Date: October 16, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.14291

Abstract

Gravitational wave interferometers are subject to interference from various types of non-stationary noise (referred to as glitch noise), which affects data analysis and interferometer sensitivity. Accurate identification and classification of glitch noise are crucial for improving the reliability of gravitational wave observations. This study demonstrates the effectiveness of unsupervised machine learning in classifying non-stationary noise images in KAGRA O3GK data. Using a Variational Autoencoder (VAE) combined with spectral clustering, eight distinct glitch noise categories are identified. Latent variables obtained from the VAE are compressed through dimensionality reduction, visualized in three-dimensional space, and classified using spectral clustering to better understand the characteristics of glitch noise in KAGRA during O3GK.

Research Background and Motivation

Problem Definition

Gravitational wave detectors experience interference from various environmental and instrumental transient noise sources during observations, such as ground vibration, lightning, suspension control signals, and laser fluctuations. These non-stationary, non-Gaussian noise events are termed "glitches" and mix with gravitational wave data, affecting data analysis quality.

Problem Significance

The importance of glitch noise detection and classification manifests in three aspects:

Signal Separation: Glitch detection techniques can separate glitch noise from gravitational waves produced by astrophysical phenomena
Source Identification: Glitch classification techniques help identify the sources of glitch noise
Performance Enhancement: Identifying glitch noise sources facilitates their elimination, increasing the amount of data available for analysis and improving interferometer sensitivity

Limitations of Existing Methods

Although the LIGO Gravity Spy project achieved high-precision supervised learning classification of 22 glitch types through citizen scientist-annotated training data, this approach faces the following challenges on KAGRA:

Lack of Manual Annotation: Unlike LIGO with the Gravity Spy project, KAGRA has no citizen scientist effort to classify and annotate glitches manually
Interferometer Differences: KAGRA and LIGO have different interferometer configurations, and identical glitch noise may manifest differently
Sensitivity Differences: KAGRA and LIGO interferometers have different sensitivities, potentially leading to differences in glitch noise characteristics

Research Motivation

Based on the above challenges, this study is the first to focus on using unsupervised learning methods to classify glitch noise in KAGRA O3GK data, addressing the problem of lacking annotated data.

Core Contributions

First Application of Unsupervised Learning to KAGRA Data: Validates the effectiveness and generalization capability of VAE architecture in KAGRA glitch noise classification
Establishment of Complete Unsupervised Classification Framework: Proposes a complete pipeline from data preprocessing to final classification, including VAE feature extraction, UMAP dimensionality reduction visualization, and spectral clustering classification
Identification of KAGRA-Specific Glitch Noise Types: Identifies 8 distinct glitch noise categories in O3GK data, establishing a baseline for KAGRA's noise characteristics
Provision of Practical Noise Analysis Tools: Provides effective glitch noise analysis methods for future KAGRA upgrades and development of third-generation gravitational wave observatories

Methodology Details

Task Definition

Input: Strain data time series from KAGRA O3GK observations
Output: Classification labels for glitch noise events (8 categories)
Constraint: Unsupervised learning environment without manually annotated data

Model Architecture

1. Data Preprocessing Pipeline

Omicron Trigger Detection: Uses Omicron software to identify transient noise events from strain data, generating GPS timestamp database
Q-transform: Applies Omega Scan pipeline to create time-frequency spectrograms with four time windows (0.5s, 1.0s, 2.0s, 4.0s)
Image Processing: Rescales the original 800×600 pixel images to 224×224 pixels, converts them to grayscale, and stacks the four time windows to form 4×224×224 input data
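The image-processing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the helper names (`to_grayscale`, `resize_nearest`, `stack_windows`), the BT.601 grayscale weights, and the nearest-neighbour interpolation are all assumptions, since the summary does not say how the rescaling or grayscale conversion is done. Random arrays stand in for the four Omega Scan images.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale of an (H, W, 3) array (BT.601 weights, assumed)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of an (H, W) array to (size, size) (assumed method)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def stack_windows(images, size=224):
    """Stack one spectrogram image per time window into a (C, size, size) array."""
    return np.stack([resize_nearest(to_grayscale(im), size) for im in images])

# Hypothetical stand-ins for the four images (0.5 s, 1.0 s, 2.0 s, 4.0 s windows),
# each 800 pixels wide and 600 pixels high with values in [0, 1).
rng = np.random.default_rng(0)
windows = [rng.random((600, 800, 3)) for _ in range(4)]
x = stack_windows(windows)
print(x.shape)  # (4, 224, 224)
```

Each channel of the result corresponds to one time window, so a single glitch is presented to the encoder at four timescales at once.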

2. VAE Architecture Design

Encoder Structure:

Input: 4-channel image (4, 224, 224)
EncoderBlock(64, ks=7, s=2, p=3) + Max-pooling
EncoderBlock(128, ks=3, s=2, p=1)
EncoderBlock(256, ks=3, s=2, p=1)
EncoderBlock(512, ks=3, s=2, p=1)
Adaptive average pooling layer
Linear layer outputting latent variable z ∈ R^dz

Decoder Structure:

Input: Latent variable z
Linear layer: R^dz → R^(dz×7×7)
Batch normalization + ReLU + Upsampling
Four DecoderBlock layers progressively reconstructing the image
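As a quick consistency check on the layer list, the spatial size after each encoder stage can be computed with the standard convolution output formula. The sketch below assumes that each EncoderBlock reduces resolution through a single strided convolution with the listed kernel size, stride, and padding, and that the max-pooling layer uses a 3×3 window with stride 2 and padding 1; the pooling parameters are not given in the summary and are an assumption here.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding); the max-pooling parameters are assumed.
layers = [
    ("EncoderBlock(64)", 7, 2, 3),
    ("MaxPool (assumed 3/2/1)", 3, 2, 1),
    ("EncoderBlock(128)", 3, 2, 1),
    ("EncoderBlock(256)", 3, 2, 1),
    ("EncoderBlock(512)", 3, 2, 1),
]

size = 224
sizes = [size]
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    sizes.append(size)
    print(f"{name:26s} -> {size} x {size}")
```

Under these assumptions the feature map shrinks from 224 to 112, 56, 28, 14, and finally 7 pixels per side, which matches the 7×7 grid that the decoder's linear layer starts from.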

3. UMAP Dimensionality Reduction Visualization

Uses UMAP to reduce high-dimensional latent variables to 3D space for visualization:

Distance Metric: Euclidean distance
Number of Neighbors: k = 10
Compactness Parameter: δ = 0.05
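For readers who want to reproduce this step with the umap-learn package, the settings above would map onto `umap.UMAP` keyword arguments roughly as follows. Treating the compactness parameter δ as umap-learn's `min_dist` is an assumption, and `latents` is a hypothetical array of VAE latent vectors.

```python
# Keyword arguments for umap.UMAP (umap-learn package).
umap_kwargs = {
    "n_components": 3,      # 3D embedding for visualization
    "n_neighbors": 10,      # k = 10
    "min_dist": 0.05,       # assumed to correspond to δ = 0.05
    "metric": "euclidean",  # Euclidean distance
}

# With umap-learn installed and `latents` an (n_samples, 512) array:
# import umap
# embedding = umap.UMAP(**umap_kwargs).fit_transform(latents)  # (n_samples, 3)
```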

4. Spectral Clustering Classification

Uses a Gaussian kernel function to compute the adjacency matrix: $a_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$

Employs the median heuristic to select σ²: $\sigma^2_{MH} = \text{Median}\{\lVert x_i - x_j \rVert^2 \mid 1 \leq i < j \leq n\}$
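The two formulas translate directly into NumPy. The sketch below builds the adjacency matrix with the median-heuristic bandwidth and then computes a spectral embedding from the symmetrically normalized affinity, following the standard normalized spectral clustering recipe; whether the paper uses exactly this normalization is an assumption. The function names and the toy two-blob data are illustrative only, and a final k-means step on the embedding rows (omitted here) would produce the cluster labels.

```python
import numpy as np

def gaussian_affinity(x):
    """a_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), sigma^2 from the median heuristic."""
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)          # index pairs with i < j
    sigma2 = np.median(sq_dist[iu])
    return np.exp(-sq_dist / (2.0 * sigma2)), sigma2

def spectral_embedding(a, n_clusters):
    """Row-normalized leading eigenvectors of D^-1/2 A D^-1/2 (assumed normalization)."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    m = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(m)                # eigenvalues in ascending order
    emb = vecs[:, -n_clusters:]                # keep the n_clusters largest
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Toy data: two well-separated blobs in 3D (stand-ins for the reduced latent points).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(3.0, 0.1, (20, 3))])
a, sigma2 = gaussian_affinity(points)
emb = spectral_embedding(a, n_clusters=2)
print(a.shape, emb.shape)  # (40, 40) (40, 2)
```

The dense pairwise computation is only practical for small samples like this toy set; it is meant to show the formulas, not to scale to the full event list.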

Technical Innovations

Multi-timescale Feature Fusion: Captures glitch noise characteristics at different timescales by stacking spectrograms from four different time windows
High-dimensional Latent Space: Employs 512-dimensional latent variables, providing stronger expressiveness compared to traditional low-dimensional representations
Spectral Clustering Optimization: Compared to k-means++, spectral clustering better handles non-convex data distributions, suitable for complex glitch noise patterns

Experimental Setup

Dataset

Data Source: KAGRA O3GK observational data, approximately 178 hours
Detection Parameters: Peak frequency 10-2048 Hz, signal-to-noise ratio > 7.5
Number of Glitch Events: 45,345 glitch noise events, detection rate 4.63 events/minute
Data Split: 80% training set, 20% test set

Evaluation Metrics

Davies-Bouldin Index (DBI): Evaluates clustering quality; values closer to 0 indicate better segmentation
Silhouette Coefficient: Quantifies sample conformity with assigned clusters; values close to 1 indicate tight and well-separated clustering
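To make the first metric concrete, here is a small NumPy sketch of the Davies-Bouldin index in its usual form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. The function name and the toy data are illustrative assumptions; the paper's own implementation is not described in this summary.

```python
import numpy as np

def davies_bouldin(x, labels):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d_ij."""
    ids = np.unique(labels)
    centroids = np.array([x[labels == c].mean(axis=0) for c in ids])
    # s_i: mean distance of cluster members to their own centroid
    scatter = np.array([
        np.linalg.norm(x[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    # d_ij: distance between centroids
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(sep, np.inf)              # exclude i == j from the maximum
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()

# Toy data: two tight blobs, scored with a matching and a mismatched labelling.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, (50, 3)), rng.normal(5.0, 0.1, (50, 3))])
good = np.repeat([0, 1], 50)                   # labels that follow the blobs
bad = np.tile([0, 1], 50)                      # labels that mix the blobs
print(davies_bouldin(x, good) < davies_bouldin(x, bad))  # True
```

A labelling that follows the real structure yields compact, well-separated clusters and therefore a value near 0, which is why the index can be scanned over candidate cluster counts to pick the best one.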

Comparison Methods

k-means++: Serves as baseline clustering method for comparison

Implementation Details

VAE Hyperparameters: Latent variable dimension 512, batch size 96, training epochs 100, learning rate 5×10⁻⁴
Optimizer: Adam optimizer
Number of Clusters: Tests 4-12 cluster numbers

Experimental Results

Main Results

Clustering Quality Assessment

Optimal Number of Clusters: Based on DBI evaluation, spectral clustering achieves best performance with 8 categories
Method Comparison: Spectral clustering significantly outperforms k-means++ in DBI evaluation, with the latter showing continuous DBI decline as cluster numbers increase
Silhouette Coefficient Validation: Silhouette coefficient results align with DBI assessment, confirming the reasonableness of 8 clusters

Glitch Noise Classification Results

Eight identified glitch noise categories and their distribution:

Category	Count (Percentage)	Noise Shape	Description
0	621 (1.4%)	Middle line	Central linear structure
1	294 (0.6%)	Lower line	Bottom linear structure
2	35925 (79.2%)	Blips	Teardrop-shaped, most common type
3	44 (0.1%)	Complex	Complex shape
4	4016 (8.9%)	Blip & Line	Vertical line plus horizontal line
5	4358 (9.6%)	Separated Blips	Separated blips
6	60 (0.1%)	Loud	Loud noise
7	27 (0.1%)	Scattered Light	Scattered light

Key Findings

Dominant Noise Type: Category #2 (Blips) accounts for 79.2% of total noise, representing the most common glitch noise during KAGRA O3GK
LIGO Comparison: KAGRA-identified glitch types (8 types) are fewer than LIGO Gravity Spy project's 22 types, possibly related to KAGRA's lower sensitivity during O3GK
Noise Characteristics: Successfully identified "Scattered Light" type similar to LIGO, validating method effectiveness

Visualization Analysis

UMAP 3D visualization shows:

Glitch noise exhibits clear clustering structure
Contains several small clusters and 1-2 large clusters
Obvious differences in segmentation effects under different cluster number settings

Related Work

Gravitational Wave Glitch Detection Field

Gravity Spy Project: LIGO-developed supervised learning glitch classification system achieving high-precision classification of 22 glitch types through citizen scientist annotation
KAGRA Noise Analysis: Previous research primarily focused on preliminary noise understanding of O3GK data, lacking systematic classification methods

Unsupervised Learning Applications

Sakai et al.'s Work: First application of VAE+UMAP+clustering method to Gravity Spy data; this paper represents the first application and validation of this method on KAGRA data

Technical Methods

VAE Applications in Astrophysics: Increasing applications of variational autoencoders in astrophysical data analysis
Spectral Clustering: Superior to traditional clustering methods in handling complex data distributions

Conclusions and Discussion

Main Conclusions

Method Effectiveness: Unsupervised learning methods successfully applied to KAGRA data, with VAE architecture demonstrating good generalization capability across different datasets
Noise Characteristic Identification: Eight distinct glitch noise categories identified in O3GK data, establishing baseline for KAGRA noise characteristics
Practical Value: Provides effective analysis tools for KAGRA upgrades and development of future third-generation gravitational wave observatories

Limitations

Data Constraints: Uses only O3GK period data with relatively short time span (178 hours)
Sensitivity Impact: KAGRA's lower sensitivity during O3GK may mask some weak glitch noise types
Missing Validation: Lacks comparison with expert manual classification results

Future Directions

O4 Data Application: Apply the same method to current O4 observational data, studying the impact of interferometer configuration changes on glitch noise topology
Real-time Analysis: Develop real-time glitch noise clustering systems utilizing UMAP's incremental learning capability
Multi-detector Fusion: Extend to glitch noise analysis in LIGO-Virgo-KAGRA joint network

In-Depth Evaluation

Strengths

Methodological Innovation: First successful application of mature unsupervised learning framework to KAGRA data, addressing the practical problem of lacking annotated data
Technical Completeness: Provides complete technical pipeline from raw data to final classification with strong reproducibility
Experimental Sufficiency: Validates results through multiple evaluation metrics (DBI, silhouette coefficient) and comparison methods
Practical Value: Provides practical tools and methods for noise analysis in gravitational wave detectors

Weaknesses

Validation Limitations: Lacks comparison with manual expert classification, making it difficult to assess classification accuracy
Parameter Sensitivity: Insufficient sensitivity analysis for UMAP and spectral clustering parameter selection
Physical Interpretation: Insufficient analysis of physical origins of glitch noise, primarily focusing on morphological features

Impact

Academic Contribution: Provides new unsupervised learning paradigm for gravitational wave data analysis field
Practical Value: Directly serves KAGRA detector performance optimization and data quality improvement
Scalability: Method demonstrates good scalability, applicable to other gravitational wave detectors

Applicable Scenarios

New Detector Commissioning: Suitable for newly built gravitational wave detectors lacking historical annotated data
Noise Monitoring: Can be used for real-time noise monitoring and classification during detector operation
Detector Upgrades: Provides tools for analyzing noise characteristic changes following detector upgrades

References

Key references cited in the paper include:

Zevin et al. (2017, 2024): Core literature of Gravity Spy project
Sakai et al. (2022, 2024): Pioneering work on unsupervised learning in gravitational wave glitch classification
Kingma and Welling (2013): Original variational autoencoder paper
McInnes et al. (2018): UMAP dimensionality reduction method
von Luxburg (2007): Classical tutorial on spectral clustering

Overall Assessment: This is a technically solid, application-oriented high-quality paper that successfully addresses the practical problem of glitch noise classification in the KAGRA detector. While relatively limited in theoretical innovation, its practical value and contribution to the gravitational wave detection field are significant. The paper's methodology is rigorous, experimental design is reasonable, and it provides valuable reference for related research.