Gravitational wave interferometers are disrupted by various types of nonstationary noise, referred to as glitch noise, that affect data analysis and interferometer sensitivity. The accurate identification and classification of glitch noise are essential for improving the reliability of gravitational wave observations. In this study, we demonstrated the effectiveness of unsupervised machine learning for classifying images with nonstationary noise in the KAGRA O3GK data. Using a variational autoencoder (VAE) combined with spectral clustering, we identified eight distinct glitch noise categories. The latent variables obtained from the VAE were dimensionally compressed, visualized in three-dimensional space, and classified using spectral clustering to better understand the glitch noise characteristics of KAGRA during the O3GK period. Our results highlight the potential of unsupervised learning for efficient glitch noise classification, which may in turn facilitate interferometer upgrades and the development of future third-generation gravitational wave observatories.
- Paper ID: 2510.14291
- Title: Glitch noise classification in KAGRA O3GK observing data using unsupervised machine learning
- Authors: Shoichi Oshino, Yusuke Sakai, Marco Meyer-Conde, Takashi Uchiyama, Yousuke Itoh, Yutaka Shikano, Yoshikazu Terada, Hirotaka Takahashi
- Categories: gr-qc (General Relativity and Quantum Cosmology), astro-ph.IM (Instrumentation and Methods for Astrophysics)
- Publication Date: October 16, 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.14291
Gravitational wave interferometers are subject to interference from various types of non-stationary noise (referred to as glitch noise), which affects data analysis and interferometer sensitivity. Accurate identification and classification of glitch noise is crucial for improving the reliability of gravitational wave observations. This study demonstrates the effectiveness of unsupervised machine learning in classifying non-stationary noise images in KAGRA O3GK data. Using a Variational Autoencoder (VAE) combined with spectral clustering, eight distinct glitch noise categories are identified. Latent variables obtained from the VAE are compressed through dimensionality reduction, visualized in three-dimensional space, and classified using spectral clustering to better understand the characteristics of glitch noise in KAGRA during O3GK.
Gravitational wave detectors experience interference from various environmental and instrumental transient noise sources during observations, such as ground vibration, lightning, suspension control signals, and laser fluctuations. These non-stationary, non-Gaussian noise events are termed "glitches" and mix with gravitational wave data, affecting data analysis quality.
The importance of glitch noise detection and classification manifests in three aspects:
- Signal Separation: Glitch detection techniques can separate glitch noise from gravitational waves produced by astrophysical phenomena
- Source Identification: Glitch classification techniques help identify the sources of glitch noise
- Performance Enhancement: Identifying glitch noise sources facilitates their elimination, increasing the amount of data available for analysis and improving interferometer sensitivity
Although the LIGO Gravity Spy project achieved high-precision supervised learning classification of 22 glitch types through citizen scientist-annotated training data, this approach faces the following challenges on KAGRA:
- Lack of Manual Annotation: KAGRA has no citizen-science program comparable to Gravity Spy to classify and annotate glitches manually
- Interferometer Differences: KAGRA and LIGO have different interferometer configurations, and identical glitch noise may manifest differently
- Sensitivity Differences: KAGRA and LIGO interferometers have different sensitivities, potentially leading to differences in glitch noise characteristics
Based on the above challenges, this study is the first to focus on using unsupervised learning methods to classify glitch noise in KAGRA O3GK data, addressing the problem of lacking annotated data.
- First Application of Unsupervised Learning to KAGRA Data: Validates the effectiveness and generalization capability of VAE architecture in KAGRA glitch noise classification
- Establishment of Complete Unsupervised Classification Framework: Proposes a complete pipeline from data preprocessing to final classification, including VAE feature extraction, UMAP dimensionality reduction visualization, and spectral clustering classification
- Identification of KAGRA-Specific Glitch Noise Types: Identifies 8 distinct glitch noise categories in O3GK data, establishing a baseline for KAGRA's noise characteristics
- Provision of Practical Noise Analysis Tools: Provides effective glitch noise analysis methods for future KAGRA upgrades and development of third-generation gravitational wave observatories
Input: Strain data time series from KAGRA O3GK observations
Output: Classification labels for glitch noise events (8 categories)
Constraint: Unsupervised learning environment without manually annotated data
- Omicron Trigger Detection: Uses the Omicron software to identify transient noise events in the strain data, producing a database of GPS timestamps
- Q-transform: Applies the Omega Scan pipeline to create time-frequency spectrograms over four time windows (0.5 s, 1.0 s, 2.0 s, 4.0 s)
- Image Processing: Rescales the original 800×600 pixel images to 224×224 pixels, converts them to grayscale, and stacks the four time windows into a 4×224×224 input
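The image-processing step can be sketched as follows. This is a minimal illustration, not the authors' code: the nearest-neighbour resize, the channel-mean grayscale conversion, the scaling to [0, 1], and the helper names (`resize_nearest`, `to_grayscale`, `build_input`) are all assumptions, and synthetic arrays stand in for the real Omega Scan images.

```python
import numpy as np

WINDOWS_S = (0.5, 1.0, 2.0, 4.0)  # time windows listed in the text
TARGET = 224                       # side length of the network input


def resize_nearest(img: np.ndarray, size: int = TARGET) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Average the colour channels of an (H, W, 3) image (assumed conversion)."""
    return rgb.astype(np.float32).mean(axis=2)


def build_input(spectrograms: list[np.ndarray]) -> np.ndarray:
    """Turn one spectrogram per time window into a (4, 224, 224) array in [0, 1]."""
    if len(spectrograms) != len(WINDOWS_S):
        raise ValueError("expected one spectrogram per time window")
    channels = [to_grayscale(resize_nearest(s)) / 255.0 for s in spectrograms]
    return np.stack(channels, axis=0)


# Synthetic stand-ins for four 800x600 RGB images (600 rows, 800 columns).
rng = np.random.default_rng(0)
fake = [rng.integers(0, 256, size=(600, 800, 3), dtype=np.uint8) for _ in WINDOWS_S]
x = build_input(fake)
print(x.shape)
```

Each of the four channels then carries the same glitch at a different time resolution, which is what the network sees as a single sample.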
Encoder Structure:
- Input: 4-channel image (4, 224, 224)
- EncoderBlock(64, ks=7, s=2, p=3) + Max-pooling
- EncoderBlock(128, ks=3, s=2, p=1)
- EncoderBlock(256, ks=3, s=2, p=1)
- EncoderBlock(512, ks=3, s=2, p=1)
- Adaptive average pooling layer
- Linear layer outputting latent variable z ∈ R^dz
Decoder Structure:
- Input: Latent variable z
- Linear layer: R^dz → R^(dz×7×7)
- Batch normalization + ReLU + Upsampling
- Four DecoderBlock layers progressively reconstructing the image
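A quick way to see why the decoder starts from a 7×7 grid is to trace the spatial size through the encoder. The sketch below applies the standard convolution output-size formula to the listed kernel, stride, and padding values. The max-pooling settings (ks=3, s=2, p=1) are not given in the summary and are an assumption, as is the premise that the upsampling step and each DecoderBlock double the resolution.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); the max-pool settings are an assumption.
ENCODER_LAYERS = [
    ("EncoderBlock(64)", 7, 2, 3),
    ("MaxPool (assumed ks=3, s=2, p=1)", 3, 2, 1),
    ("EncoderBlock(128)", 3, 2, 1),
    ("EncoderBlock(256)", 3, 2, 1),
    ("EncoderBlock(512)", 3, 2, 1),
]

size = 224
encoder_sizes = [size]
for name, k, s, p in ENCODER_LAYERS:
    size = conv_out(size, k, s, p)
    encoder_sizes.append(size)
    print(f"{name:35s} -> {size}x{size}")

# Decoder: start from the 7x7 seed, then assume one 2x upsampling
# followed by four DecoderBlocks that each double the resolution.
decoder_size = 7 * 2 * 2**4
print("decoder output side:", decoder_size)
```

Under these assumptions the encoder reduces 224 to 7 in five halvings, and the decoder's five doublings bring the 7×7 seed back to 224×224.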
Uses UMAP to reduce high-dimensional latent variables to 3D space for visualization:
- Distance Metric: Euclidean distance
- Number of Neighbors: k = 10
- Compactness Parameter: δ = 0.05
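These settings might map onto the `umap-learn` package roughly as shown below. The mapping is an assumption: in particular, treating the compactness parameter δ as `min_dist` is a guess based on how that keyword behaves, and `embed_3d` is a hypothetical helper name.

```python
# Assumed mapping of the listed settings onto umap-learn keyword arguments.
UMAP_PARAMS = {
    "n_components": 3,      # 3D embedding for visualization
    "n_neighbors": 10,      # k = 10
    "min_dist": 0.05,       # assumed to correspond to delta = 0.05
    "metric": "euclidean",  # Euclidean distance
}


def embed_3d(latents, random_state=0):
    """Project (n_samples, d_z) latent vectors to 3D with UMAP."""
    import umap  # imported lazily so the settings can be inspected without it

    reducer = umap.UMAP(random_state=random_state, **UMAP_PARAMS)
    return reducer.fit_transform(latents)
```

Calling `embed_3d(z)` on an array of latent vectors would return an array of shape `(n_samples, 3)` suitable for a 3D scatter plot.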
Uses Gaussian kernel function to compute adjacency matrix:
$$a_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
Employs median heuristic method to select σ²:
$$\sigma_{\mathrm{MH}}^2 = \mathrm{Median}\left\{\|x_i - x_j\|^2 \mid 1 \le i < j \le n\right\}$$
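The Gaussian affinity and the median heuristic can be illustrated with a small dependency-free sketch. It uses only the standard library and three toy points. The function names are illustrative, and a real pipeline would apply the same formulas to the embedded latent vectors with an array library.

```python
import math
from itertools import combinations
from statistics import median


def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def median_heuristic_sigma2(points):
    """sigma^2 = median of the squared distances over all pairs i < j."""
    return median(sq_dist(x, y) for x, y in combinations(points, 2))


def gaussian_affinity(points, sigma2=None):
    """Affinity matrix a_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    if sigma2 is None:
        sigma2 = median_heuristic_sigma2(points)
    return [[math.exp(-sq_dist(x, y) / (2.0 * sigma2)) for y in points]
            for x in points]


# Toy points whose pairwise squared distances are 1, 4 and 5.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
sigma2 = median_heuristic_sigma2(points)
A = gaussian_affinity(points, sigma2)
```

The resulting matrix is symmetric with ones on the diagonal, and it is the input that spectral clustering turns into a graph Laplacian.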
- Multi-timescale Feature Fusion: Captures glitch noise characteristics at different timescales by stacking spectrograms from four different time windows
- High-dimensional Latent Space: Employs 512-dimensional latent variables, providing stronger expressiveness compared to traditional low-dimensional representations
- Spectral Clustering Optimization: Compared to k-means++, spectral clustering better handles non-convex data distributions, suitable for complex glitch noise patterns
- Data Source: KAGRA O3GK observational data, approximately 178 hours
- Detection Parameters: Peak frequency 10-2048 Hz, signal-to-noise ratio > 7.5
- Number of Glitch Events: 45,345 glitch noise events, detection rate 4.63 events/minute
- Data Split: 80% training set, 20% test set
- Davies-Bouldin Index (DBI): Evaluates clustering quality; values closer to 0 indicate better segmentation
- Silhouette Coefficient: Quantifies sample conformity with assigned clusters; values close to 1 indicate tight and well-separated clustering
- k-means++: Serves as baseline clustering method for comparison
- VAE Hyperparameters: Latent variable dimension 512, batch size 96, training epochs 100, learning rate 5×10⁻⁴
- Optimizer: Adam optimizer
- Number of Clusters: Tests cluster counts from 4 to 12
- Optimal Number of Clusters: Based on DBI evaluation, spectral clustering achieves best performance with 8 categories
- Method Comparison: Spectral clustering significantly outperforms k-means++ in DBI evaluation, with the latter showing continuous DBI decline as cluster numbers increase
- Silhouette Coefficient Validation: Silhouette coefficient results align with DBI assessment, confirming the reasonableness of 8 clusters
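Choosing the cluster count by DBI amounts to scoring each candidate labeling and keeping the lowest value. The sketch below implements the usual Davies-Bouldin definition with the standard library and applies it to toy 2D points. The hand-written labelings stand in for clustering output, and `best_cluster_count` is a hypothetical helper, not part of the paper's pipeline.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst (s_i + s_j) / d_ij."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(ps) for lab, ps in clusters.items()}
    # s_i: mean distance of a cluster's members to its centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


def best_cluster_count(points, labelings):
    """Return the k with the lowest DBI; `labelings` maps k -> list of labels."""
    scores = {k: davies_bouldin(points, labels) for k, labels in labelings.items()}
    return min(scores, key=scores.get), scores


# Toy data with three well-separated pairs of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (0.0, 10.0), (0.0, 11.0)]
labelings = {
    2: [0, 0, 1, 1, 0, 0],  # merges two distant pairs into one cluster
    3: [0, 0, 1, 1, 2, 2],  # one cluster per pair
}
best_k, scores = best_cluster_count(pts, labelings)
print(best_k, scores)
```

On this toy set the three-cluster labeling scores lower than the two-cluster one. In the study the same comparison would run over cluster counts from 4 to 12.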
Eight identified glitch noise categories and their distribution:
| Category | Count (Percentage) | Noise Shape | Description |
|---|---|---|---|
| 0 | 621 (1.4%) | Middle line | Central linear structure |
| 1 | 294 (0.6%) | Lower line | Bottom linear structure |
| 2 | 35925 (79.2%) | Blips | Teardrop-shaped, most common type |
| 3 | 44 (0.1%) | Complex | Complex shape |
| 4 | 4016 (8.9%) | Blip & Line | Vertical line plus horizontal line |
| 5 | 4358 (9.6%) | Separated Blips | Separated blips |
| 6 | 60 (0.1%) | Loud | Loud noise |
| 7 | 27 (0.1%) | Scattered Light | Scattered light |
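As a quick consistency check, the per-category shares can be recomputed from the counts in the table. The counts are taken from the table, and the only assumption is that each share is the count divided by the total number of events.

```python
# Counts per category as listed in the table.
counts = {0: 621, 1: 294, 2: 35925, 3: 44, 4: 4016, 5: 4358, 6: 60, 7: 27}

total = sum(counts.values())
shares = {cat: 100.0 * n / total for cat, n in counts.items()}

for cat, n in counts.items():
    print(f"category {cat}: {n:6d} events, {shares[cat]:6.2f}%")
print("total events:", total)
```

The counts sum to the 45,345 events reported for the dataset.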
- Dominant Noise Type: Category #2 (Blips) accounts for 79.2% of total noise, representing the most common glitch noise during KAGRA O3GK
- LIGO Comparison: KAGRA-identified glitch types (8 types) are fewer than LIGO Gravity Spy project's 22 types, possibly related to KAGRA's lower sensitivity during O3GK
- Noise Characteristics: Successfully identified "Scattered Light" type similar to LIGO, validating method effectiveness
UMAP 3D visualization shows:
- Glitch noise exhibits clear clustering structure
- Contains several small clusters and 1-2 large clusters
- The segmentation differs visibly depending on the chosen number of clusters
- Gravity Spy Project: LIGO-developed supervised learning glitch classification system achieving high-precision classification of 22 glitch types through citizen scientist annotation
- KAGRA Noise Analysis: Previous research primarily focused on preliminary noise understanding of O3GK data, lacking systematic classification methods
- Sakai et al.'s Work: First application of VAE+UMAP+clustering method to Gravity Spy data; this paper represents the first application and validation of this method on KAGRA data
- VAE Applications in Astrophysics: Increasing applications of variational autoencoders in astrophysical data analysis
- Spectral Clustering: Superior to traditional clustering methods in handling complex data distributions
- Method Effectiveness: Unsupervised learning methods successfully applied to KAGRA data, with VAE architecture demonstrating good generalization capability across different datasets
- Noise Characteristic Identification: Eight distinct glitch noise categories identified in O3GK data, establishing baseline for KAGRA noise characteristics
- Practical Value: Provides effective analysis tools for KAGRA upgrades and development of future third-generation gravitational wave observatories
- Data Constraints: Uses only O3GK period data with relatively short time span (178 hours)
- Sensitivity Impact: KAGRA's lower sensitivity during O3GK may mask some weak glitch noise types
- Missing Validation: Lacks comparison with expert manual classification results
- O4 Data Application: Apply the same method to current O4 observational data, studying the impact of interferometer configuration changes on glitch noise topology
- Real-time Analysis: Develop real-time glitch noise clustering systems utilizing UMAP's incremental learning capability
- Multi-detector Fusion: Extend to glitch noise analysis in LIGO-Virgo-KAGRA joint network
- Methodological Innovation: First successful application of mature unsupervised learning framework to KAGRA data, addressing the practical problem of lacking annotated data
- Technical Completeness: Provides complete technical pipeline from raw data to final classification with strong reproducibility
- Experimental Sufficiency: Validates results through multiple evaluation metrics (DBI, silhouette coefficient) and comparison methods
- Practical Value: Provides practical tools and methods for noise analysis in gravitational wave detectors
- Validation Limitations: Lacks comparison with manual expert classification, making it difficult to assess classification accuracy
- Parameter Sensitivity: Insufficient sensitivity analysis for UMAP and spectral clustering parameter selection
- Physical Interpretation: Insufficient analysis of physical origins of glitch noise, primarily focusing on morphological features
- Academic Contribution: Provides new unsupervised learning paradigm for gravitational wave data analysis field
- Practical Value: Directly serves KAGRA detector performance optimization and data quality improvement
- Scalability: Method demonstrates good scalability, applicable to other gravitational wave detectors
- New Detector Commissioning: Suitable for newly built gravitational wave detectors lacking historical annotated data
- Noise Monitoring: Can be used for real-time noise monitoring and classification during detector operation
- Detector Upgrades: Provides tools for analyzing noise characteristic changes following detector upgrades
Key references cited in the paper include:
- Zevin et al. (2017, 2024): Core literature of Gravity Spy project
- Sakai et al. (2022, 2024): Pioneering work on unsupervised learning in gravitational wave glitch classification
- Kingma and Welling (2013): Original variational autoencoder paper
- McInnes et al. (2018): UMAP dimensionality reduction method
- von Luxburg (2007): Classical tutorial on spectral clustering
Overall Assessment: This is a technically solid, application-oriented high-quality paper that successfully addresses the practical problem of glitch noise classification in the KAGRA detector. While relatively limited in theoretical innovation, its practical value and contribution to the gravitational wave detection field are significant. The paper's methodology is rigorous, experimental design is reasonable, and it provides valuable reference for related research.