Selecting Clusters and Protoclusters via Stellar Mass Density: I. Method and tests on Mock HSC-SSP catalogs

Vicentin, Araya-Araya, Sodré et al.
We present an algorithm designed to identify galaxy (proto)clusters in wide-area photometric surveys by first selecting their dominant galaxy, i.e., the Brightest Cluster Galaxy (BCG) or protoBCG, through the local stellar mass density traced by massive galaxies. We focus on its application to the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) Wide Survey to detect candidates up to $\rm z \sim 2$. In this work, we apply the method to mock galaxy catalogs that replicate the observational constraints of the HSC-SSP Wide Survey. We derive functions that describe the probability of a massive galaxy being the dominant galaxy in a structure as a function of its stellar mass density contrast within a given redshift interval. We show that galaxies with probabilities greater than 50\% yield a sample of BCGs/protoBCGs with $\gtrsim 65\%$ purity, where most of the contamination arises from galaxies in massive groups below our cluster threshold. Using the same threshold, the resulting (proto)cluster sample achieves 80\% purity and 50\% completeness for halos with $M_{\rm{halo}} \geq 10^{14} \ M_{\odot}$, reaching nearly 100\% completeness for $M_{\rm{halo}} \geq 10^{14.5} \ M_{\odot}$. We also assign probabilistic membership to surrounding galaxies based on stellar mass and distance to the dominant galaxy, from which we define the cluster richness as the number of galaxies more likely to be true members than contaminants. This allows us to derive a halo mass-richness relation. In a companion paper, we apply the algorithm to the HSC-SSP data and compare our catalog with others based on different cluster-finding techniques and X-ray detections.

Basic Information

  • Paper ID: 2510.10735
  • Title: Selecting Clusters and Protoclusters via Stellar Mass Density: I. Method and tests on Mock HSC-SSP catalogs
  • Authors: Marcelo C. Vicentin, Pablo Araya-Araya, Laerte Sodré Jr., Michael A. Strauss
  • Classification: astro-ph.CO astro-ph.GA
  • Publication Date: October 14, 2025 (Draft version)
  • Paper Link: https://arxiv.org/abs/2510.10735

Abstract

This paper presents an algorithm for identifying galaxy (proto)clusters through stellar mass density. The method first selects dominant galaxies (i.e., brightest cluster galaxies, BCGs, or proto-BCGs) using the local stellar mass density traced by massive galaxies. The work targets the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) Wide Survey, with the aim of detecting candidates up to z ~ 2. By applying the method to mock galaxy catalogs that replicate the observational constraints of the HSC-SSP Wide Survey, the authors derive functions describing the probability that a massive galaxy is the dominant galaxy of a structure, as a function of its stellar mass density contrast within a given redshift interval. Galaxies with probability > 50% yield a BCG/proto-BCG sample with ≳65% purity, with most contamination coming from galaxies in massive groups below the cluster threshold. Using the same threshold, the resulting (proto)cluster sample achieves 80% purity and 50% completeness for halos with Mhalo ≥ 10^14 M⊙, with completeness approaching 100% for Mhalo ≥ 10^14.5 M⊙.

Research Background and Motivation

Scientific Questions

Galaxy clusters are the largest gravitationally bound structures in the universe, tracing the densest regions on cosmic scales. Their identification is crucial for understanding large-scale structure evolution and galaxy evolution mechanisms. However, identifying (proto)clusters at high redshift (z > 1) faces significant challenges:

  1. Evolutionary Status: Within the redshift range 1 < z < 2, a substantial fraction of structures are still forming, termed protoclusters, and remain unrelaxed
  2. Observational Limitations: Spectroscopic samples at high redshift are sparse and inhomogeneous, posing challenges for photometric redshift estimation and red sequence color calibration
  3. Spatial Distribution: Galaxies in protoclusters may be dispersed across several comoving megaparsec-scale regions

Limitations of Existing Methods

Existing cluster detection algorithms primarily include:

  • redMaPPer and CAMIRA: Focus on detecting the "red sequence" in the color-magnitude diagram of galaxy clusters
  • Other Methods: Rely solely on galaxy distribution in angular separation and photometric redshift

These methods achieve success rates exceeding 60% at z < 1, but performance degrades at high redshift because the red sequence is often not well-defined.

Research Innovations

The proposed method offers the following advantages:

  1. Color-Independent: Does not directly depend on galaxy color assumptions
  2. Dominant Galaxy Priority: Identifies dominant galaxies first rather than overall structural features
  3. High-Redshift Applicability: Particularly suitable for high-redshift regions where protoclusters are more common

Core Contributions

  1. Novel Galaxy Cluster Detection Algorithm: A new method for identifying dominant galaxies based on stellar mass density contrast
  2. Probabilistic Model: Derives functions describing the probability that a massive galaxy is the dominant galaxy of a structure, given its stellar mass density contrast
  3. Algorithm Validation: Verifies the algorithm on PCcones simulated data, achieving >65% BCG/proto-BCG purity
  4. Member Selection Method: Provides probabilistic member selection based on stellar mass and distance to the dominant galaxy
  5. Mass-Richness Relation: Establishes halo mass-richness relations for mass estimation

Methodology Details

Task Definition

  • Input: Galaxy catalog from a photometric survey (position, photometric redshift, stellar mass)
  • Output: (Proto)cluster candidates with dominant galaxies, member galaxies, and richness
  • Constraints: Applicable to HSC-SSP Wide Survey observational limits; detection redshift range 0.1 < z < 2

Algorithm Architecture

1. Dominant Galaxy Candidate Pre-selection

For each pre-selected massive dominant galaxy candidate (i), compute the stellar mass within a cylindrical volume centered on this candidate:

  • Cylinder Definition: Radius r = 1 Mpc, height corresponding to comoving distance within the redshift slice
  • Redshift Slice: $\Delta z_i = [z_i - \sigma_z(1 + z_i),\ z_i + \sigma_z(1 + z_i)]$
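
As a rough illustration of the cylinder geometry, the sketch below computes the redshift slice and its line-of-sight comoving depth. It is not the authors' code: the flat ΛCDM parameters (H0 = 70 km/s/Mpc, Ωm = 0.3), the trapezoid integration, the value of σz passed in, and all function names are assumptions made for this example.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def redshift_slice(z, sigma_z):
    """Return (z_lo, z_hi) = z -/+ sigma_z * (1 + z)."""
    half_width = sigma_z * (1.0 + z)
    return z - half_width, z + half_width


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=2000):
    """Line-of-sight comoving distance in Mpc.

    Assumes a flat LCDM cosmology (an assumption of this sketch) and
    integrates c / H(z) with the trapezoid rule.
    """
    if z <= 0.0:
        return 0.0

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for k in range(1, steps):
        total += inv_e(k * dz)
    return C_KM_S / h0 * total * dz


def slice_depth(z, sigma_z, **cosmo):
    """Comoving depth of the cylinder, i.e. d_c over the redshift slice."""
    z_lo, z_hi = redshift_slice(z, sigma_z)
    return comoving_distance(z_hi, **cosmo) - comoving_distance(z_lo, **cosmo)
```

For example, `slice_depth(1.0, 0.05)` gives the cylinder height in Mpc for a candidate at z = 1 with a hypothetical σz = 0.05.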

2. Density Contrast Calculation

Divide the cylindrical volume into three equally-spaced concentric annuli, applying a weight factor based on the inverse of projected radial distance:

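$$\hat{\rho}_i = \frac{\sum_{j=1}^{3} \dfrac{M_{\star,i,j}^{\rm tot}}{\pi\,(r_j^2 - r_{j-1}^2)\,d_c(\Delta z_i)} \left(\dfrac{r_j}{\rm cMpc}\right)^{-w}}{\sum_{j=1}^{3} \left(\dfrac{r_j}{\rm cMpc}\right)^{-w}}$$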

where w = 0.8 is the optimized weight.

Density contrast is defined as:

δρi = (ρ̂i - ρ̄) / ρ̄
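
The weighted density and the contrast can be sketched in a few lines. This is an illustrative reading of the formulas above, not the published implementation: the function names are hypothetical, the annulus masses are assumed to be precomputed, and the mean density ρ̄ is passed in because its estimation is not described here.

```python
import math


def weighted_density(annulus_masses, depth, r_max=1.0, w=0.8):
    """Weighted mean stellar mass density over equally spaced annuli.

    annulus_masses[j-1] is the total stellar mass in annulus j,
    bounded by r_{j-1} and r_j; depth is d_c(Delta z_i).
    Each annulus density is weighted by r_j ** (-w).
    """
    n = len(annulus_masses)
    edges = [r_max * j / n for j in range(n + 1)]  # r_0 = 0, ..., r_n = r_max
    numerator = 0.0
    weight_sum = 0.0
    for j in range(1, n + 1):
        volume = math.pi * (edges[j] ** 2 - edges[j - 1] ** 2) * depth
        weight = edges[j] ** (-w)
        numerator += annulus_masses[j - 1] / volume * weight
        weight_sum += weight
    return numerator / weight_sum


def density_contrast(rho_hat, rho_mean):
    """delta = (rho_hat - rho_mean) / rho_mean."""
    return (rho_hat - rho_mean) / rho_mean
```

Because the weights fall with radius, the same stellar mass raises the density more when it sits in the innermost annulus than in the outermost one, which is the intended effect of the inverse-radius weighting.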

3. Probabilistic Modeling

Construct a probabilistic model based on the density contrast distribution, using a modified sigmoid function:

f(δρ; a,b,c,d) = a / {1 + exp[-b(δρ - c)]} + d
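
A minimal sketch of evaluating this function and applying a probability cut follows. The parameter values used in the usage note are placeholders chosen for illustration; the fitted values per redshift interval are not given in this summary, and the function names are hypothetical.

```python
import math


def p_dominant(delta_rho, a, b, c, d):
    """Modified sigmoid: a / (1 + exp(-b * (delta_rho - c))) + d."""
    x = -b * (delta_rho - c)
    if x > 700.0:
        # exp(x) would overflow a float; the sigmoid term is effectively 0.
        return d
    return a / (1.0 + math.exp(x)) + d


def select_dominant(contrasts, params, threshold=0.5):
    """Return indices of candidates whose probability exceeds the threshold."""
    return [
        i for i, delta in enumerate(contrasts)
        if p_dominant(delta, *params) > threshold
    ]
```

With the placeholder parameters `(a, b, c, d) = (0.9, 1.0, 5.0, 0.05)`, the curve rises from 0.05 at very low contrast toward 0.95 at very high contrast, and crosses 0.5 at δρ = c.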

Member Selection Algorithm

  1. Feature Selection: Based on photometric stellar mass and distance to the dominant galaxy
  2. Probability Calculation: P(Member | M★,phot, d_dominant, z_dominant) vs P(Cont | M★,phot, d_dominant, z_dominant)
  3. Member Determination: Galaxies with P(Member) > P(Cont) are classified as members
  4. Richness Definition: Number of galaxies satisfying the condition
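
The decision rule in these steps reduces to a comparison and a count, the count being the richness defined in the abstract. The sketch below assumes the two probabilities have already been evaluated for each neighboring galaxy; how they are derived from the mocks is not reproduced here, and the `Neighbor` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    galaxy_id: int
    p_member: float  # P(Member | M_star_phot, d_dominant, z_dominant)
    p_cont: float    # P(Cont | M_star_phot, d_dominant, z_dominant)


def select_members(neighbors):
    """Keep galaxies more likely to be members than contaminants."""
    return [g for g in neighbors if g.p_member > g.p_cont]


def richness(neighbors):
    """Richness: number of galaxies with P(Member) > P(Cont)."""
    return len(select_members(neighbors))
```

Ties (equal probabilities) are excluded here because the rule is stated as a strict inequality.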

Experimental Setup

Datasets

PCcones Mock Light Cones:

  • Based on the L-GALAXIES semi-analytic model run on the Millennium simulation
  • 10 light cones, each 36 deg²
  • Simulates HSC-SSP Wide Survey observational constraints
  • Includes grizY filters + W1/W2 infrared data

Observational Verification Data:

  • CAMIRA HSC-SSP wide cluster candidates
  • Wen & Han 2021 HSC-SSP wide catalog
  • redMaPPer cluster candidates

Evaluation Metrics

  • Completeness: Fraction of true dominant galaxies that the algorithm recovers
  • Purity: Fraction of selected candidates that are true dominant galaxies
  • Contamination Rate: Proportion of different contamination sources
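
For concreteness, purity and completeness can be computed from two sets of identifiers, as in the sketch below. The function name and the use of integer IDs are assumptions for illustration; in the mocks, the truth set would come from the simulation's halo information.

```python
def purity_completeness(selected_ids, true_ids):
    """Return (purity, completeness) for a selected sample.

    purity       = true positives / number selected
    completeness = true positives / number of true objects
    """
    selected = set(selected_ids)
    truth = set(true_ids)
    true_positives = len(selected & truth)
    purity = true_positives / len(selected) if selected else float("nan")
    completeness = true_positives / len(truth) if truth else float("nan")
    return purity, completeness
```

Selecting four candidates of which two are among three true dominant galaxies gives a purity of 0.5 and a completeness of 2/3.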

Implementation Details

  • Redshift Intervals: 6 photometric redshift bins [0.1,0.45), [0.45,0.7), [0.7,1.05), [1.05,1.3), [1.3,1.5), [1.5,2)
  • Mass Thresholds: log(M★,phot/M⊙) = 11, 11, 11, 10.5, 10.5, 10.5 for respective redshift bins
  • Structure Definition: Mhalo ≥ 10^14 M⊙ as cluster threshold
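
The redshift bins and mass thresholds above amount to a small lookup table. The sketch below is one way to encode it; the names are hypothetical, and treating the threshold as inclusive (≥) is an assumption, since the summary only lists the threshold values.

```python
import bisect

# Bin edges for the six half-open photometric redshift intervals.
Z_EDGES = [0.1, 0.45, 0.7, 1.05, 1.3, 1.5, 2.0]
# Minimum log10(M_star_phot / M_sun) for candidates in each bin.
LOG_MSTAR_MIN = [11.0, 11.0, 11.0, 10.5, 10.5, 10.5]


def redshift_bin(z_phot):
    """Return the bin index (0-5) for z_phot, or None if outside [0.1, 2)."""
    if not (Z_EDGES[0] <= z_phot < Z_EDGES[-1]):
        return None
    return bisect.bisect_right(Z_EDGES, z_phot) - 1


def is_candidate(z_phot, log_mstar):
    """True if the galaxy passes the mass threshold of its redshift bin."""
    k = redshift_bin(z_phot)
    return k is not None and log_mstar >= LOG_MSTAR_MIN[k]
```

A galaxy with log(M★,phot/M⊙) = 10.6 passes at z = 1.2 but not at z = 0.5, where the threshold is 11.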

Experimental Results

Main Results

BCG/Proto-BCG Identification Performance

  • Pdominant > 50% Threshold: BCG/proto-BCG purity ≳ 65%
  • Primary Contamination Source: Galaxies in massive groups (below cluster threshold)
  • Redshift Evolution: BCG selection fraction decreases from 52% (low z) to 33% (z > 1.5)

Cluster Detection Performance

  • Pdominant > 0.5:
    • Mhalo ≥ 10^14 M⊙: 80% purity, 50% completeness
    • Mhalo ≥ 10^14.5 M⊙: Completeness approaching 100%
  • Pdominant ≥ 0.8: Purity increases to ~95%

Mass-Richness Relations

Establishes halo mass-richness relations for different redshift intervals:

  • Adopts a log-linear relation: log(Mhalo/M⊙) = α × λ + β, where λ is the richness
  • Slope α range: 0.022-0.053
  • Intercept β range: 13.140-13.769
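
As a worked illustration of the log-linear relation, the sketch below converts between λ (taken here to be the member count, i.e. the richness) and log halo mass. The values α = 0.03 and β = 13.5 in the usage note are placeholders picked from within the quoted ranges, not fitted values for any specific redshift interval, and the function names are hypothetical.

```python
def log_halo_mass(lam, alpha, beta):
    """log10(M_halo / M_sun) = alpha * lambda + beta."""
    return alpha * lam + beta


def richness_for_log_mass(log_mass, alpha, beta):
    """Invert the relation: richness at which the given log mass is reached."""
    return (log_mass - beta) / alpha
```

With the placeholder values, a structure with λ = 20 maps to log(Mhalo/M⊙) ≈ 14.1, and the 10^14 M⊙ cluster threshold corresponds to λ ≈ 16.7.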

Consistency Verification

Comparison with observational catalogs shows:

  • BCG Properties: i-band magnitude, r-i color, stellar mass consistent with observations
  • Spatial Distribution: Velocity dispersion and radial profiles consistent with redMaPPer (KS test p=0.245)

Traditional Methods

  1. Red Sequence Methods: redMaPPer, CAMIRA detect based on red sequence of passive galaxies
  2. Density Methods: Friends-of-Friends methods based on galaxy spatial and redshift distribution
  3. Multi-wavelength Methods: X-ray, SZ effect detection

Innovations in This Work

Compared to traditional methods, this algorithm:

  • Does not depend on red sequence assumptions, applicable at high redshift
  • Prioritizes identifying dominant galaxies rather than overall structures
  • Combines stellar mass density information to improve identification accuracy

Conclusions and Discussion

Main Conclusions

  1. Algorithm Effectiveness: The new algorithm identifies (proto)clusters at z < 2 in mock catalogs
  2. Performance Metrics: At Pdominant > 50%, reaches 80% purity and 50% completeness for Mhalo ≥ 10^14 M⊙
  3. High-Redshift Applicability: Particularly suitable for high-redshift regions dominated by protoclusters
  4. Member Selection: The probabilistic member selection method determines cluster richness

Limitations

  1. Photometric Precision Dependence: Photometric redshift and stellar mass estimation accuracy decrease at high redshift
  2. W1/W2 Coverage: Reduced infrared band coverage at high redshift affects performance
  3. Projection Effects: Projection effects at high redshift may increase contamination
  4. Mass Threshold: Choice of 10^14 M⊙ threshold may affect protocluster definition

Future Directions

  1. Real Data Application: Apply the algorithm to actual HSC-SSP observational data
  2. Multi-wavelength Integration: Combine multi-wavelength information (e.g., X-ray) for verification
  3. Method Generalization: Adapt to other survey projects (DES, LSST, etc.)
  4. High-Redshift Optimization: Further optimize the algorithm for z > 1.5 regions

In-Depth Evaluation

Strengths

  1. Methodological Innovation: Proposes a unique "dominant galaxy priority" identification strategy
  2. Comprehensive Experiments: Thorough validation on detailed simulated data
  3. Strong Performance: Achieves good balance between purity and completeness
  4. High Applicability: Particularly suitable for high-redshift protocluster detection
  5. Clear Writing: Detailed method description and well-designed experiments

Weaknesses

  1. Simulation Dependence: Results primarily based on simulated data; real-world application effectiveness remains to be verified
  2. Parameter Tuning: Selection of multiple empirical parameters (e.g., w=0.8) lacks theoretical justification
  3. Computational Complexity: Insufficient analysis of computational cost for density calculations
  4. Systematic Error Robustness: Limited analysis of robustness to photometric systematic errors

Impact

  1. Academic Value: Provides methodological contribution to galaxy cluster detection
  2. Practical Value: Promising for applications in upcoming large surveys
  3. Reproducibility: Detailed method description facilitates reproduction and improvement
  4. Field Advancement: Provides powerful tools for high-redshift galaxy cluster research

Applicable Scenarios

  1. Large Photometric Surveys: HSC-SSP, DES, LSST, etc.
  2. High-Redshift Detection: Particularly suitable for searching protoclusters at z > 1
  3. Spectroscopic Follow-up: Provides target selection for spectroscopic instruments like PFS
  4. Cosmological Research: Large-scale structure evolution and galaxy evolution studies

References

The paper cites extensive relevant research, primarily including:

  • Cluster detection algorithms: Rykoff et al. (2014), Oguri (2014)
  • HSC-SSP survey: Aihara et al. (2018, 2022)
  • Simulation work: Henriques et al. (2015), Springel (2005)
  • Protocluster research: Overzier (2016), Toshikawa et al. (2018)

Summary: This is a methodology paper that proposes a cluster detection algorithm based on stellar mass density and validates it on mock catalogs. The method is particularly suited to high-redshift protocluster detection and is relevant for upcoming large survey projects. Its main contributions are the dominant-galaxy-first selection strategy and its systematic verification.