Detecting wide binaries using machine learning algorithms
Ashesh, Kaur, Aashish
We present a machine learning (ML) framework for the detection of wide binary star systems using Gaia DR3 data. By training supervised ML models on established wide binary catalogues, we efficiently classify wide binaries and employ clustering and nearest neighbour search to pair candidate systems. Our approach incorporates data preprocessing techniques such as SMOTE, correlation analysis, and PCA, and achieves high accuracy and recall in the task of wide binary classification. The resulting publicly available code enables rapid, scalable, and customizable analysis of wide binaries, complementing conventional analyses and providing a valuable resource for future astrophysical studies.
Title: Detecting wide binaries using machine learning algorithms
Authors: Amoy Ashesh (Indian Institute of Technology Patna & Trinity College Dublin), Harsimran Kaur (Indian Institute of Technology Patna), Sandeep Aashish (Indian Institute of Technology Patna)
This paper proposes a machine learning framework for detecting wide binary systems using Gaia DR3 data. By training supervised machine learning models on established wide binary catalogs, the researchers efficiently classify wide binaries and employ clustering and nearest neighbor search to pair candidate systems. The methodology integrates data preprocessing techniques including SMOTE, correlation analysis, and PCA, and achieves high accuracy and recall in wide binary classification. The publicly available code enables rapid, scalable, and customizable analysis of wide binaries, complementing traditional analytical methods and providing a resource for future astrophysical research.
Wide binary systems are pairs of gravitationally bound stars with separations ranging from thousands to tens of thousands of astronomical units. These systems probe the low-acceleration regime, which makes them ideal laboratories for testing modified gravity theories and searching for deviations from standard gravity.
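To make "low-acceleration" concrete, the following minimal sketch evaluates the Newtonian acceleration G M / r^2 that a companion star produces at the separations quoted above. The one-solar-mass companion is an illustrative assumption, and the comparison scale of about 1.2e-10 m/s^2 (a value often quoted in MOND-type proposals) is included only for orientation; neither number is taken from the paper.

```python
# Physical constants in SI units.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
AU = 1.495978707e11  # astronomical unit, m

# Assumed comparison scale (not from the paper), m/s^2.
A0_COMPARISON = 1.2e-10


def newtonian_acceleration(companion_mass_msun, separation_au):
    """Acceleration (m/s^2) of a star due to a companion at the given separation."""
    r = separation_au * AU
    return G * companion_mass_msun * M_SUN / r**2


def below_comparison_scale(companion_mass_msun, separation_au):
    """True if the Newtonian acceleration is below the assumed comparison scale."""
    return newtonian_acceleration(companion_mass_msun, separation_au) < A0_COMPARISON
```

Under these assumptions, a solar-mass companion at 10,000 AU yields an acceleration of about 6e-11 m/s^2, while the same pair at 1,000 AU sits a factor of 100 higher, so only the widest systems fall below the comparison scale.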
Computational Complexity: Traditional statistical methods rely on Monte Carlo simulations and complex probabilistic analysis, which incur high computational costs.
Noise and Contamination: Identifying truly gravitationally bound pairs and detecting their dynamical anomalies is complicated by noise, contamination, and the scale of the data.
Chance Alignments: As separation increases, the number of chance alignments grows, making accurate identification harder.
Machine learning methods offer a scalable alternative: they can efficiently separate binary systems from a noisy background population and pair them through clustering and nearest neighbor search, providing tools for the search for new physics.
Input: Stellar records from raw Gaia DR3 data
Output: Binary classification labels (wide binary system membership) + binary pairing
Constraint: Supervised learning based on the wide binary catalog established by El-Badry et al.
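The supervised setup above can be sketched as a labeling step: each stellar record receives a positive label if it appears in the reference wide binary catalog and a negative label otherwise. This is a minimal illustration only; the function name, the integer source identifiers, and the representation of the catalog as a list of (primary, secondary) identifier pairs are all hypothetical and not taken from the paper's code.

```python
def label_sources(source_ids, catalog_pairs):
    """Return 1 for each source that belongs to a catalogued pair, else 0.

    source_ids    -- iterable of identifiers for the stars to label (hypothetical)
    catalog_pairs -- iterable of (primary_id, secondary_id) tuples (hypothetical)
    """
    # Collect every star that appears as either component of a known pair.
    members = set()
    for primary_id, secondary_id in catalog_pairs:
        members.add(primary_id)
        members.add(secondary_id)
    # Preserve the input order so labels line up with the feature rows.
    return [1 if source_id in members else 0 for source_id in source_ids]
```

For example, `label_sources([11, 22, 33, 44], [(22, 44)])` marks the second and fourth records as wide binary members. These labels serve as the training target for the classifier, while the pairing itself is a separate downstream step.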
The original class distribution is highly imbalanced (494,664 majority-class samples vs 5,336 minority-class samples). SMOTE generates synthetic minority-class samples by interpolating between existing minority samples and their nearest neighbors, which significantly improves model performance.
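The interpolation idea behind SMOTE can be sketched in a few lines of standard-library Python. This is a simplified illustration of the technique described by Chawla et al. (2002), not the implementation used in the paper: the function name, the default of five neighbors, and the fixed seed are assumptions, and a production pipeline would normally rely on an established library rather than this brute-force neighbor search.

```python
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points on segments joining minority samples.

    minority -- list of equal-length numeric tuples (minority-class features)
    k        -- number of nearest minority neighbors considered per sample
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbor = minority[rng.choice(others[:k])]
        # A random fraction in [0, 1) places the new point between the two samples.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region already occupied by the minority data. Oversampling is usually applied to the training split only, so that evaluation metrics are computed on real samples.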
Clustering is performed first to partition the candidates and shrink the search space, followed by nearest neighbor search within each cluster. This avoids comparing every star with every other star and so effectively reduces the O(n²) pairing complexity.
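A minimal sketch of the cluster-then-search idea follows. It assumes cluster labels are already available from some clustering step and uses a brute-force search inside each cluster; the function names and the mutual nearest neighbor criterion for accepting a pair are illustrative assumptions, not the paper's stated procedure.

```python
import math
from collections import defaultdict


def nearest_neighbor_within_clusters(points, labels):
    """Map each point index to the index of its nearest neighbor in the same cluster.

    Points that are alone in their cluster have no entry in the result.
    """
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    nearest = {}
    for members in clusters.values():
        # Distances are only evaluated between members of the same cluster.
        for i in members:
            best, best_dist = None, math.inf
            for j in members:
                if i == j:
                    continue
                dist = math.dist(points[i], points[j])
                if dist < best_dist:
                    best, best_dist = j, dist
            if best is not None:
                nearest[i] = best
    return nearest


def mutual_pairs(nearest):
    """Keep (i, j) with i < j only when each point is the other's nearest neighbor."""
    return sorted(
        (i, j) for i, j in nearest.items() if i < j and nearest.get(j) == i
    )
```

With n points split into clusters of sizes n_c, the number of distance evaluations drops from n(n - 1) to the sum of n_c(n_c - 1) over clusters. The trade-off is that a true pair whose members fall into different clusters cannot be recovered, so the clustering features and granularity matter.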
El-Badry et al. (2021) - Foundational work on wide binary catalog construction
Chawla et al. (2002) - Original SMOTE technique paper
Breiman (2001) - Random Forest algorithm
Baron (2019) - Survey of machine learning applications in astronomy
Overall Assessment: This is a technically solid and highly practical application paper. The authors successfully apply machine learning techniques to a specific astrophysical problem, achieving significant performance improvements. While relatively limited in theoretical innovation, its open-source tools and systematic methodology make substantial contributions to the field. This work provides an important foundation for subsequent gravity theory testing and anomalous wide binary detection.