
Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors

Robles, Sagar, Yankelevich et al.
NOvA is a long-baseline neutrino oscillation experiment that detects neutrino particles from the NuMI beam at Fermilab. Before data from this experiment can be used in analyses, raw hits in the detector must be matched to their source particles, and the type of each particle must be identified. This task has commonly been done using a mix of traditional clustering approaches and convolutional neural networks (CNNs). Due to the construction of the detector, the data is presented as two sparse 2D images: an XZ and a YZ view of the detector, rather than a 3D representation. We propose a point set neural network that operates on the sparse matrices with an operation that mixes information from both views. Our model uses less than 10% of the memory required using previous methods while achieving a 96.8% AUC score, a higher score than obtained when both views are processed independently (85.4%).

Basic Information

  • Paper ID: 2510.09659
  • Title: Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors
  • Authors: Edgar E. Robles, Dikshant Sagar, Alejandro Yankelevich, Jianming Bian, Pierre Baldi (University of California, Irvine) for the NOvA Collaboration
  • Classification: cs.LG (Machine Learning), hep-ex (High Energy Physics - Experiment)
  • Publication Date: October 7, 2025 (Preprint)
  • Paper Link: https://arxiv.org/abs/2510.09659v1

Abstract

NOvA is a long-baseline neutrino oscillation experiment designed to detect neutrino particles from Fermilab's NuMI beam. Before experimental data can be used for analysis, raw hit signals in the detector must be matched to their source particles, and the type of each particle must be identified. Traditionally, this task has been accomplished through a combination of conventional clustering methods and convolutional neural networks (CNNs). Due to the detector's structural characteristics, data is presented as two sparse 2D images—the XZ and YZ views of the detector—rather than as a 3D representation. This paper proposes a point set neural network that operates on sparse matrices and processes data through operations that fuse information from both views. The model uses less than 10% of the memory of previous methods while achieving an AUC score of 96.8%, surpassing the 85.4% score obtained when processing the two views independently.

Research Background and Motivation

Problem Definition

This work addresses two tasks on hits recorded by the NOvA neutrino detector:

  1. Instance Segmentation: Matching raw hit signals in the detector to corresponding source particles and separating different particle trajectories (prongs)
  2. Semantic Segmentation: Identifying the type of each particle (e.g., muons, electrons, protons, photons, pions, etc.)

Problem Significance

  • NOvA is an important neutrino physics experiment requiring processing of large volumes of sparse data
  • Accurate particle identification and segmentation form the foundation for subsequent physics analysis
  • Traditional methods face bottlenecks in computational resources and accuracy

Limitations of Existing Methods

  1. Traditional CNN Approaches: Require converting sparse matrices to dense matrices, resulting in high memory consumption
  2. Independent View Processing: Existing methods process XZ and YZ views through separate CNNs or treat each view as an image channel, failing to effectively fuse cross-view information
  3. Computational Efficiency: Even sparse convolution frameworks such as MinkowskiEngine still require approximate convolutions to conserve memory

Research Motivation

The unique construction of the NOvA detector restricts data representation to two 2D planes rather than complete 3D representation. Existing methods fail to fully exploit complementary information across views. This work aims to design an efficient neural network architecture capable of effectively fusing multi-view information.

Core Contributions

  1. Proposed Heterogeneous Point Set Transformers (HPST): First extension of point set transformers to multi-view particle detector data processing
  2. Designed Heterogeneous Attention Mechanism: Innovatively implemented cross-view information fusion, enabling information flow between different views
  3. Significantly Improved Performance and Efficiency:
    • AUC improved from 85.4% to 96.8%
    • Memory usage reduced to less than 10% of previous methods
  4. Provided Complete Multi-task Learning Framework: Simultaneously addresses instance and semantic segmentation tasks

Methodology Details

Task Definition

Given the NOvA detector dataset X containing N samples, each sample X^(i) represents a particle detection event. Each event is divided into M=2 views (XZ and YZ), with each view X^(i,j) containing a variable number of detections K^(i,j). Each detection is described by coordinates x_k^(i,j) ∈ R^c and values v_k^(i,j) ∈ R^d.

Objectives:

  • Instance Segmentation: Group detection points into different particle trajectories
  • Semantic Segmentation: Assign particle type labels to each detection point
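
The notation above maps naturally onto a small container type. The following is a minimal sketch, not the authors' data format: the class names `View` and `Event`, the field names, and the toy numbers are all hypothetical, and it assumes c = 2 coordinates and d = 1 value per hit.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class View:
    """One detector view X^(i,j) with a variable number of hits K^(i,j)."""
    coords: np.ndarray  # shape (K, c): hit coordinates x_k
    values: np.ndarray  # shape (K, d): hit values v_k

    def __post_init__(self):
        if self.coords.shape[0] != self.values.shape[0]:
            raise ValueError("coords and values must describe the same hits")


@dataclass
class Event:
    """One event X^(i), split into M = 2 views."""
    xz: View
    yz: View

    def num_hits(self) -> int:
        return self.xz.coords.shape[0] + self.yz.coords.shape[0]


# Toy event (made-up numbers): 3 hits in the XZ view, 2 hits in the YZ view.
event = Event(
    xz=View(coords=np.array([[3.0, 10.0], [4.0, 11.0], [5.0, 12.0]]),
            values=np.array([[0.8], [1.1], [0.9]])),
    yz=View(coords=np.array([[7.0, 10.0], [7.0, 12.0]]),
            values=np.array([[1.0], [0.7]])),
)
print(event.num_hits())  # 5
```

Because each view stores only its own hits, the two views can hold different numbers of points, which is what "variable number of detections" means in practice.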

Model Architecture

Overall Architecture Design

HPST adopts a UNet-like encoder-decoder structure:

  • Encoder: n stages, each containing m attention blocks followed by pooling operations
  • Decoder: n stages, each followed by unpooling operations and skip connections
  • Feature Dimensions: Progressively doubled during encoding stages, progressively halved during decoding stages
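
As a small illustration of the doubling and halving rule, the sketch below lists per-stage feature widths. It assumes the decoder simply mirrors the encoder; the function name `stage_widths` and the base width of 128 are illustrative choices, not values confirmed for the final model.

```python
def stage_widths(base_dim, n_stages):
    """Return (encoder_widths, decoder_widths) under a doubling/halving rule."""
    # Each encoder stage doubles the feature width of the previous one.
    encoder = [base_dim * 2 ** s for s in range(n_stages)]
    # Assumption: the decoder mirrors the encoder, halving at each stage.
    decoder = list(reversed(encoder))
    return encoder, decoder


enc, dec = stage_widths(128, 3)
print(enc)  # [128, 256, 512]
print(dec)  # [512, 256, 128]
```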

Heterogeneous Attention Mechanism

The core innovation lies in the heterogeneous attention mechanism, comprising:

  1. Intra-view Attention: Conventional self-attention mechanism processing points within the same view
  2. Inter-view Attention: Key component for cross-view information fusion

Inter-view Attention Computation:

  • Query: Q_k^(i,j'→j), the query of point k for attention from view j' to view j
  • Key-Value: K_{k'}^(i,j'→j) and V_{k'}^(i,j'→j), the corresponding keys and values
  • Attention Weights: w_{kk'}^(i,j'→j) = (Q_k^(i,j'→j))^T K_{k'}^(i,j'→j)
  • Output: h'_k^(i,j) = Σ_{k'} softmax(w_{kk'}^(i,j'→j)) V_{k'}^(i,j'→j)
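
The weight and output formulas above can be sketched in NumPy as follows. This is a simplified illustration rather than the paper's implementation: the projection matrices `Wq`, `Wk`, `Wv` and the function names are hypothetical, attention is computed densely over all points of the other view instead of over a k-nearest-neighbor graph, and no scaling or positional terms are applied because the summary does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def inter_view_attention(feat_j, feat_jp, Wq, Wk, Wv):
    """Update points of view j using keys and values from view j'.

    feat_j  : (K_j,  f) features of the points receiving information
    feat_jp : (K_j', f) features of the points in the other view
    Wq, Wk, Wv : (f, h) hypothetical projection matrices
    """
    Q = feat_j @ Wq                     # (K_j,  h) one query per point k
    K = feat_jp @ Wk                    # (K_j', h) one key per point k'
    V = feat_jp @ Wv                    # (K_j', h) one value per point k'
    w = Q @ K.T                         # (K_j, K_j') weights w_{kk'} = Q_k^T K_{k'}
    return softmax(w, axis=1) @ V       # sum over k' of softmax(w_{kk'}) V_{k'}


rng = np.random.default_rng(0)
xz_feat = rng.normal(size=(4, 3))       # 4 points in one view
yz_feat = rng.normal(size=(6, 3))       # 6 points in the other view
Wq, Wk, Wv = (rng.normal(size=(3, 5)) for _ in range(3))
print(inter_view_attention(xz_feat, yz_feat, Wq, Wk, Wv).shape)  # (4, 5)
```

Each output row is a convex combination of value vectors from the other view, which is how information from one view reaches points in the other.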

Distance Definition and Graph Construction

  • Intra-view Distance: d_{jj}(x_k^(i,j), x_{k'}^(i,j)), the distance between points in the same view
  • Inter-view Distance: d_{jj'}(x_k^(i,j), x_{k'}^(i,j')), the distance between points in different views
  • Graph connections constructed based on k-nearest neighbors
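
A brute-force version of the neighbor search might look like the sketch below. The summary does not give the exact distance functions, so both are assumptions: the intra-view distance is taken to be Euclidean, and the inter-view distance compares only the z coordinate (stored in the last column), since z is the one axis the XZ and YZ views share. The function names are hypothetical.

```python
import numpy as np


def intra_view_dist(a, b):
    """Assumed intra-view distance: Euclidean distance in the 2D view plane."""
    return np.linalg.norm(a - b, axis=-1)


def inter_view_dist(a, b):
    """Assumed inter-view distance: gap along the shared z axis (last column)."""
    return np.abs(a[..., -1] - b[..., -1])


def knn_edges(src, dst, k, dist):
    """For every point in src, return the indices of its k nearest points in dst."""
    d = dist(src[:, None, :], dst[None, :, :])   # (N_src, N_dst) pairwise distances
    k = min(k, dst.shape[0])                     # events can have fewer than k hits
    return np.argsort(d, axis=1)[:, :k]


xz = np.array([[0.0, 0.0], [0.0, 5.0], [0.0, 9.0]])   # columns: (x, z)
yz = np.array([[1.0, 1.0], [1.0, 6.0], [1.0, 8.0]])   # columns: (y, z)

print(knn_edges(xz, xz, 2, intra_view_dist))  # each point's own index comes first
print(knn_edges(xz, yz, 1, inter_view_dist))  # nearest YZ hit along z
```

Note that the intra-view search returns each point as its own nearest neighbor (distance 0); whether self-edges are kept is a design choice this sketch leaves open.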

Pooling and Unpooling

  • Pooling: Voxel pooling method creating grids within the same view and averaging point values within grid cells
  • Unpooling: Using skip connections to upsample points to previous coordinates
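
Voxel pooling and the matching unpooling step can be sketched with NumPy as below. This is an illustration under assumptions: the function name `voxel_pool` and the cell size are hypothetical, and averaging the coordinates of pooled points (in addition to their values) is a choice made here, not something the summary states.

```python
import numpy as np


def voxel_pool(coords, values, cell):
    """Average all points of one view that fall into the same grid cell.

    Returns pooled coordinates, pooled values, and for every input point the
    index of the pooled point it was merged into (needed for unpooling).
    """
    cells = np.floor_divide(coords, cell).astype(np.int64)      # grid cell per point
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                                # flat across NumPy versions
    counts = np.bincount(inverse, minlength=len(uniq)).astype(float)

    pooled_values = np.zeros((len(uniq), values.shape[1]))
    np.add.at(pooled_values, inverse, values)                    # sum values per cell
    pooled_coords = np.zeros((len(uniq), coords.shape[1]))
    np.add.at(pooled_coords, inverse, coords)                    # sum coords per cell
    return pooled_coords / counts[:, None], pooled_values / counts[:, None], inverse


coords = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
values = np.array([[1.0], [3.0], [10.0]])
pc, pv, inv = voxel_pool(coords, values, cell=2)
print(pv)       # [[ 2.] [10.]]: the first two points share a cell
print(pv[inv])  # unpooling: copy each pooled value back to its original points
```

Indexing the pooled array with the stored assignment restores one row per original point, which can then be combined with the encoder features carried by the skip connection.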

Technical Innovations

  1. Cross-view Information Fusion: First implementation of effective multi-view point cloud attention mechanism in particle physics
  2. Efficient Sparse Data Processing: Direct operation on point cloud representations, avoiding sparse-to-dense matrix conversion
  3. Multi-scale Feature Learning: Implementing local-to-global information mixing through UNet architecture
  4. Joint Optimization Framework: Unified handling of segmentation and classification tasks

Experimental Setup

Dataset

  • Data Source: Neutrino interaction simulation data generated by the NOvA collaboration
  • Data Scale: 9,246,712 events
  • Data Characteristics:
    • Average 70 hit points per event
    • Image dimensions: 2×80×100
    • Highly sparse data distribution
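
The figures above give a rough sense of how sparse an event is. The arithmetic below assumes that each hit occupies exactly one cell of the 2×80×100 image, and that a sparse representation stores two coordinates and one value per hit as 4-byte numbers; both are simplifying assumptions for illustration and say nothing about the measured memory usage of any model.

```python
views, rows, cols = 2, 80, 100
avg_hits = 70

dense_cells = views * rows * cols           # cells in the dense image pair
occupancy = avg_hits / dense_cells          # fraction of cells holding a hit

bytes_per_number = 4                        # assumption: 32-bit storage
dense_bytes = dense_cells * bytes_per_number
sparse_bytes = avg_hits * (2 + 1) * bytes_per_number  # 2 coordinates + 1 value per hit

print(dense_cells)                 # 16000
print(f"{occupancy:.4%}")          # 0.4375%
print(dense_bytes, sparse_bytes)   # 64000 840
```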

Evaluation Metrics

  1. Classification Performance:
    • AUC (Area Under Curve)
    • OVR AUC (One-vs-Rest AUC)
  2. Segmentation Performance:
    • Efficiency/Recall: Proportion of correctly identified particle trajectories
    • Purity/Precision: Accuracy of predicted trajectories
    • Segmentation accuracy
  3. Computational Efficiency:
    • Memory usage (MiB)
    • Processing time per sample (seconds)
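
Efficiency and purity can be computed per trajectory from hit-level assignments. The sketch below assumes a hit-based definition (shared hits divided by true hits for efficiency, shared hits divided by predicted hits for purity) and assumes the matching between a true and a predicted trajectory is already known; the function name and toy labels are hypothetical.

```python
import numpy as np


def efficiency_purity(true_ids, pred_ids, true_label, pred_label):
    """Hit-level efficiency (recall) and purity (precision) for one matched pair."""
    true_mask = true_ids == true_label
    pred_mask = pred_ids == pred_label
    overlap = np.sum(true_mask & pred_mask)            # hits shared by both
    efficiency = overlap / true_mask.sum() if true_mask.any() else 0.0
    purity = overlap / pred_mask.sum() if pred_mask.any() else 0.0
    return float(efficiency), float(purity)


true_ids = np.array([0, 0, 0, 1, 1])   # ground-truth trajectory per hit
pred_ids = np.array([0, 0, 1, 1, 1])   # predicted trajectory per hit

eff, pur = efficiency_purity(true_ids, pred_ids, true_label=0, pred_label=0)
print(round(eff, 3), round(pur, 3))    # 0.667 1.0
```

In the toy example, trajectory 0 loses one of its three hits to trajectory 1, so its efficiency drops while its purity stays perfect.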

Comparison Methods

  1. Mask R-CNN: Region-based convolutional neural network
  2. GAT: Graph Attention Network baseline
  3. HPST: Heterogeneous point set transformer proposed in this work

Implementation Details

  • Hardware Environment: Intel Xeon E5-2640 v4 @ 2.40GHz, 503G RAM, 4×NVIDIA Titan V
  • Hyperparameter Search:
    • Neighbor connections: {4, 8}
    • Network stages: {2, 3, 4}
    • Embedding dimensions: {128, 256, 512}
    • Learning rate: 1e-4 to 1e-1
  • Training Settings:
    • Hyperparameter search: 8 epochs, 1% data
    • Final training: 24 epochs

Experimental Results

Main Results

Model    Memory Usage (MiB)    Time per Sample (s)    OVR AUC    Segmentation Accuracy
R-CNN    282.4 ± 37.43         265.33 ± 2.01          0.732      0.343
GAT      29.8 ± 0.40           1.74 ± 0.001           0.854      0.659
HPST     34.7 ± 1.00           7.05 ± 0.001           0.968      0.835

Key Findings:

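  • HPST achieves the highest OVR AUC (0.968) and segmentation accuracy (0.835) of the three models, while GAT remains slightly lighter in memory (29.8 vs. 34.7 MiB) and faster per sample (1.74 vs. 7.05 s)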
  • Compared to independent view processing (85.4% AUC), HPST's cross-view fusion improves AUC to 96.8%
  • Memory usage is approximately 12% of Mask R-CNN
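
The memory ratio and AUC gain quoted above follow directly from the reported central values, as the short check below shows. The dictionary layout and key names are hypothetical; the numbers are the ones reported in the results.

```python
# Central values from the reported results (uncertainties omitted).
results = {
    "Mask R-CNN": {"memory_mib": 282.4, "ovr_auc": 0.732},
    "GAT":        {"memory_mib": 29.8,  "ovr_auc": 0.854},
    "HPST":       {"memory_mib": 34.7,  "ovr_auc": 0.968},
}

memory_ratio = results["HPST"]["memory_mib"] / results["Mask R-CNN"]["memory_mib"]
auc_gain = results["HPST"]["ovr_auc"] - results["GAT"]["ovr_auc"]

print(f"HPST memory relative to Mask R-CNN: {memory_ratio:.1%}")  # 12.3%
print(f"OVR AUC gain over GAT: {auc_gain:.3f}")                   # 0.114
```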

Performance Analysis by Particle Type

Efficiency:

  • Muons: 0.95 (best)
  • Electrons: 0.93
  • Protons: 0.82
  • Photons: 0.75
  • Pions: 0.71 (most challenging)

Purity:

  • Muons: 0.90
  • Electrons: 0.88
  • Protons: 0.78
  • Photons: 0.72
  • Pions: 0.69

Analysis: Primary particle types (muons and electrons) show the best segmentation performance, while secondary particles are more challenging due to fewer hit points.

Case Study

The paper presents a typical neutrino interaction event including:

  • Primary electron cascade
  • Multiple secondary particles
  • Comparison of HPST predictions with ground truth labels showing good classification performance, with minor confusion only on secondary particles with very few hit points

Related Work

Machine Learning in Particle Physics

  1. Traditional Methods: Clustering algorithms combined with hand-crafted features
  2. CNN Applications:
    • Aurisano et al.'s neutrino event classifier
    • Baldi et al.'s energy reconstruction regression CNN
    • Psihas et al.'s context-enhanced particle identification

Sparse Data Processing

  1. Sparse Convolution: Frameworks like MinkowskiEngine
  2. Point Cloud Methods: Point Transformer applications in 3D vision
  3. Graph Neural Networks: GAT and similar approaches on irregular data

Multi-view Learning

Existing NOvA methods primarily employ independent CNN processing or channel fusion. This work is the first to implement true cross-view attention mechanisms.

Conclusions and Discussion

Main Conclusions

  1. HPST is Effective: HPST successfully addresses segmentation and classification of multi-view particle detector data
  2. Cross-view Fusion is Critical: Significant performance improvements over independent processing through inter-view information fusion
  3. Superior Computational Efficiency: Substantial memory reduction while improving performance

Limitations

  1. Data Dependency: Efficiency advantages of sparse representation may diminish with higher data density
  2. Computational Complexity: Point set operation complexity may increase with point count, potentially slowing the algorithm
  3. Domain Specificity: Method designed for NOvA's specific dual-view structure

Future Directions

  1. Extension to other multi-view particle detector experiments
  2. Exploration of more complex cross-view attention mechanisms
  3. Integration of physics priors for further performance enhancement

In-Depth Evaluation

Strengths

  1. Strong Innovation: First application of heterogeneous attention mechanisms to particle physics data processing
  2. High Practical Value: Significant performance improvements and efficiency gains are important for actual experiments
  3. Comprehensive Experiments: Thorough comparative experiments and detailed performance analysis
  4. Clear Writing: Accurate technical descriptions and clear architecture diagrams

Weaknesses

  1. Limited Theoretical Analysis: Lacks in-depth theoretical analysis of why cross-view attention is effective
  2. Insufficient Ablation Studies: Incomplete analysis of specific contributions from different components (e.g., distance definitions, attention mechanisms)
  3. Limited Generalization Validation: Verification only on NOvA data; lacks validation on other similar tasks

Impact

  1. Academic Value: Provides novel solutions for multi-view sparse data processing
  2. Practical Value: Directly applicable to NOvA experiment data processing pipelines
  3. Inspirational Significance: Provides reference for data processing in other particle physics experiments

Applicable Scenarios

  1. Multi-view particle detector data processing
  2. 2D multi-view reconstruction of sparse 3D data
  3. Point cloud analysis tasks requiring cross-view information fusion
  4. Large-scale scientific data processing with limited computational resources

References

The paper cites important works from particle physics, machine learning, and computer vision domains, including technical reports related to the NOvA experiment, applications of deep learning in science, and classical papers on graph neural networks and attention mechanisms. Particularly noteworthy are citations to related technologies such as MinkowskiEngine, Mask R-CNN, and Graph Attention Networks, reflecting the authors' deep understanding of the field's current state.