NOvA is a long-baseline neutrino oscillation experiment that detects neutrino particles from the NuMI beam at Fermilab. Before data from this experiment can be used in analyses, raw hits in the detector must be matched to their source particles, and the type of each particle must be identified. This task has commonly been done using a mix of traditional clustering approaches and convolutional neural networks (CNNs). Due to the construction of the detector, the data is presented as two sparse 2D images: an XZ and a YZ view of the detector, rather than a 3D representation. We propose a point set neural network that operates on the sparse matrices with an operation that mixes information from both views. Our model uses less than 10% of the memory required using previous methods while achieving a 96.8% AUC score, a higher score than obtained when both views are processed independently (85.4%).
Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors
- Paper ID: 2510.09659
- Title: Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors
- Authors: Edgar E. Robles, Dikshant Sagar, Alejandro Yankelevich, Jianming Bian, Pierre Baldi (University of California, Irvine) for the NOvA Collaboration
- Classification: cs.LG (Machine Learning), hep-ex (High Energy Physics - Experiment)
- Publication Date: October 7, 2025 (Preprint)
- Paper Link: https://arxiv.org/abs/2510.09659v1
NOvA is a long-baseline neutrino oscillation experiment designed to detect neutrino particles from Fermilab's NuMI beam. Before experimental data can be used for analysis, raw hit signals in the detector must be matched to their source particles, and the type of each particle must be identified. Traditionally, this task has been accomplished through a combination of conventional clustering methods and convolutional neural networks (CNNs). Due to the detector's structural characteristics, data is presented as two sparse 2D images—the XZ and YZ views of the detector—rather than as a 3D representation. This paper proposes a point set neural network that operates on sparse matrices and processes data through operations that fuse information from both views. The model uses less than 10% of the memory of previous methods while achieving an AUC score of 96.8%, surpassing the 85.4% score obtained when processing the two views independently.
This research addresses particle trajectory segmentation and classification in the NOvA neutrino experiment, specifically:
- Instance Segmentation: Matching raw hit signals in the detector to corresponding source particles and separating different particle trajectories (prongs)
- Semantic Segmentation: Identifying the type of each particle (e.g., muons, electrons, protons, photons, pions, etc.)
- NOvA is an important neutrino physics experiment requiring processing of large volumes of sparse data
- Accurate particle identification and segmentation form the foundation for subsequent physics analysis
- Traditional methods face bottlenecks in computational resources and accuracy
- Traditional CNN Approaches: Require converting sparse matrices to dense matrices, resulting in high memory consumption
- Independent View Processing: Existing methods process XZ and YZ views through separate CNNs or treat each view as an image channel, failing to effectively fuse cross-view information
- Computational Efficiency: Even sparse convolution frameworks such as MinkowskiEngine still require approximate convolutions to conserve memory
The construction of the NOvA detector restricts the data to two 2D projections rather than a complete 3D representation. Existing methods fail to fully exploit the complementary information across views. This work aims to design an efficient neural network architecture that fuses multi-view information effectively.
- Proposed Heterogeneous Point Set Transformers (HPST): First extension of point set transformers to multi-view particle detector data processing
- Designed Heterogeneous Attention Mechanism: Fuses cross-view information by letting attention pass information between the two views
- Significantly Improved Performance and Efficiency:
  - AUC improved from 85.4% to 96.8%
  - Memory usage reduced to less than 10% of previous methods
- Provided Complete Multi-task Learning Framework: Simultaneously addresses instance and semantic segmentation tasks
Given the NOvA detector dataset X containing N samples, each sample X^(i) represents a particle detection event. Each event is divided into M=2 views (XZ and YZ), with each view X^(i,j) containing a variable number of detections K^(i,j). Each detection is described by coordinates x_k^(i,j) ∈ R^c and values v_k^(i,j) ∈ R^d.
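To make the notation concrete, here is a minimal sketch of how one event could be held in memory. The names `View`, `Event`, `coords`, and `values` are illustrative assumptions rather than the authors' data format, and the choices c = 2 and d = 1 are made only for this toy example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class View:
    """One 2D projection X^(i,j): a variable-length set of detections."""
    coords: List[Tuple[float, ...]]  # x_k in R^c (assumed c = 2 here)
    values: List[Tuple[float, ...]]  # v_k in R^d (assumed d = 1 here)

    def __post_init__(self) -> None:
        if len(self.coords) != len(self.values):
            raise ValueError("each detection needs one coordinate and one value")

    def __len__(self) -> int:
        # K^(i,j): the number of detections in this view
        return len(self.coords)


@dataclass
class Event:
    """One sample X^(i) made of M = 2 views."""
    xz: View
    yz: View


# Hypothetical toy event: 3 hits in the XZ view, 2 hits in the YZ view.
event = Event(
    xz=View(coords=[(10.0, 4.0), (11.0, 4.0), (12.0, 5.0)],
            values=[(0.8,), (1.1,), (0.5,)]),
    yz=View(coords=[(10.0, 7.0), (12.0, 7.0)],
            values=[(0.9,), (0.4,)]),
)
print(len(event.xz), len(event.yz))
```

Because the two views hold different numbers of detections, a point-set layout like this one needs no padding to a fixed image size.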
Objectives:
- Instance Segmentation: Group detection points into different particle trajectories
- Semantic Segmentation: Assign particle type labels to each detection point
HPST adopts a UNet-like encoder-decoder structure:
- Encoder: n stages, each containing m attention blocks followed by pooling operations
- Decoder: n stages, each followed by unpooling operations and skip connections
- Feature Dimensions: Progressively doubled during encoding stages, progressively halved during decoding stages
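The doubling and halving of feature dimensions can be sketched as a small helper. The function name `stage_widths` and the exact alignment between encoder and decoder stages are assumptions for illustration; the summary only states that widths double while encoding and halve while decoding.

```python
from typing import List, Tuple


def stage_widths(base_dim: int, n_stages: int) -> Tuple[List[int], List[int]]:
    """Return (encoder widths, decoder widths) for a UNet-like layout.

    ASSUMPTION: the decoder mirrors the encoder stage by stage.
    """
    encoder = [base_dim * 2 ** stage for stage in range(n_stages)]
    decoder = encoder[::-1]  # halve the width again on the way back up
    return encoder, decoder


encoder, decoder = stage_widths(base_dim=128, n_stages=3)
print(encoder)  # [128, 256, 512]
print(decoder)  # [512, 256, 128]
```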
The core innovation lies in the heterogeneous attention mechanism, comprising:
- Intra-view Attention: Conventional self-attention mechanism processing points within the same view
- Inter-view Attention: Key component for cross-view information fusion
Inter-view Attention Computation:
- Query: Q_k^(i,j'→j), the query of point k for attention from view j' to view j
- Key-Value: K_{k'}^(i,j'→j) and V_{k'}^(i,j'→j), the corresponding keys and values of points k'
- Attention Weights: w_{kk'}^(i,j'→j) = (Q_k^(i,j'→j))^T K_{k'}^(i,j'→j)
- Output: h'_k^(i,j) = Σ_{k'} softmax(w_{kk'}^(i,j'→j)) V_{k'}^(i,j'→j)
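The attention formulas above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it reads the formulas as point k lying in the target view j and k' ranging over points in the source view j', it assumes the queries, keys, and values have already been projected, and the `neighbors` argument (which source points each target point attends to) is a hypothetical stand-in for the model's graph connections. No scaling factor is applied because none appears in the formulas.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def inter_view_attention(
    queries: List[Vector],       # Q_k, one per point k in the target view j
    keys: List[Vector],          # K_k', one per point k' in the source view j'
    values: List[Vector],        # V_k', one per point k' in the source view j'
    neighbors: List[List[int]],  # neighbors[k] = indices k' attended to by k
) -> List[Vector]:
    """Compute h'_k = sum over k' of softmax(Q_k^T K_k') * V_k'."""
    dim = len(values[0])
    outputs = []
    for q, nbrs in zip(queries, neighbors):
        if not nbrs:  # no cross-view neighbors: contribute nothing
            outputs.append([0.0] * dim)
            continue
        weights = softmax([dot(q, keys[j]) for j in nbrs])
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, nbrs)) for d in range(dim)]
        )
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
out = inter_view_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
    neighbors=[[0, 1]],
)
print(out)
```

Intra-view attention follows the same pattern with queries, keys, and values all taken from a single view.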
- Intra-view Distance: d_{jj}(x_k^(i,j), x_{k'}^(i,j)), the distance between points in the same view
- Inter-view Distance: d_{jj'}(x_k^(i,j), x_{k'}^(i,j')), the distance between points in different views
- Graph connections constructed based on k-nearest neighbors
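A k-nearest-neighbor graph over either distance can be sketched as follows. The summary does not say how the inter-view distance is defined, so `shared_axis_distance` below is purely an assumption for illustration: it compares points only along z, the one axis the XZ and YZ views have in common. The function names and the `(transverse, z)` point layout are likewise hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed layout: (transverse coordinate, z)


def knn_edges(
    targets: Sequence[Point],
    sources: Sequence[Point],
    distance: Callable[[Point, Point], float],
    k: int,
) -> List[List[int]]:
    """For each target point, return the indices of its k nearest source points."""
    edges = []
    for t in targets:
        ranked = sorted(range(len(sources)), key=lambda s: distance(t, sources[s]))
        edges.append(ranked[:k])
    return edges


def intra_view_distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points of the same view."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def shared_axis_distance(a: Point, b: Point) -> float:
    """ASSUMED inter-view distance: the gap along z, the axis both views share."""
    return abs(a[1] - b[1])


xz_hits = [(3.0, 10.0), (4.0, 20.0)]
yz_hits = [(7.0, 11.0), (8.0, 19.0), (9.0, 30.0)]

cross_edges = knn_edges(xz_hits, yz_hits, shared_axis_distance, k=2)
print(cross_edges)  # [[0, 1], [1, 0]]
```

When `targets` and `sources` are the same view, each point is its own nearest neighbor at distance zero, so it appears in its own neighbor list.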
- Pooling: Voxel pooling method creating grids within the same view and averaging point values within grid cells
- Unpooling: Using skip connections to upsample points to previous coordinates
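Voxel pooling and its inverse can be sketched for a single view as below. The averaging of values within grid cells follows the description above; placing the pooled point at the mean position of its members, the square cells, and the names `voxel_pool` and `unpool` are assumptions. How the unpooled features are merged with the skip connection is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def voxel_pool(
    coords: List[Point], values: List[List[float]], cell_size: float
) -> Tuple[List[Point], List[List[float]], List[int]]:
    """Average all points of one view that fall into the same square grid cell."""
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (u, z) in enumerate(coords):
        cells[(int(u // cell_size), int(z // cell_size))].append(idx)

    pooled_coords: List[Point] = []
    pooled_values: List[List[float]] = []
    assignment = [0] * len(coords)  # assignment[k] = pooled index of point k
    for new_idx, members in enumerate(cells.values()):
        n = len(members)
        # ASSUMPTION: the pooled point sits at the mean position of its members.
        pooled_coords.append((
            sum(coords[m][0] for m in members) / n,
            sum(coords[m][1] for m in members) / n,
        ))
        dim = len(values[members[0]])
        pooled_values.append(
            [sum(values[m][d] for m in members) / n for d in range(dim)]
        )
        for m in members:
            assignment[m] = new_idx
    return pooled_coords, pooled_values, assignment


def unpool(pooled_values: List[List[float]], assignment: List[int]) -> List[List[float]]:
    """Copy each pooled feature back to the original points it was built from."""
    return [list(pooled_values[a]) for a in assignment]


coords = [(0.5, 0.5), (1.5, 0.5), (5.0, 5.0)]
values = [[1.0], [3.0], [10.0]]
p_coords, p_values, assignment = voxel_pool(coords, values, cell_size=2.0)
print(p_coords)    # [(1.0, 0.5), (5.0, 5.0)]
print(p_values)    # [[2.0], [10.0]]
print(unpool(p_values, assignment))  # [[2.0], [2.0], [10.0]]
```

Keeping the `assignment` list from the pooling step is what lets the decoder restore features at the exact coordinates the encoder saw.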
- Cross-view Information Fusion: First implementation of an effective multi-view point cloud attention mechanism in particle physics
- Efficient Sparse Data Processing: Direct operation on point cloud representations, avoiding sparse-to-dense matrix conversion
- Multi-scale Feature Learning: Implementing local-to-global information mixing through UNet architecture
- Joint Optimization Framework: Unified handling of segmentation and classification tasks
- Data Source: Neutrino interaction simulation data generated by the NOvA collaboration
- Data Scale: 9,246,712 events
- Data Characteristics:
  - Average 70 hit points per event
  - Image dimensions: 2×80×100
  - Highly sparse data distribution
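A short worked calculation shows how sparse these events are. The image dimensions and average hit count come from the figures above; the storage model (4-byte numbers, three numbers per sparse hit) is an assumption chosen only to illustrate the scale, and it is not meant to reproduce the memory measurements reported for the models.

```python
views, height, width = 2, 80, 100  # image dimensions quoted above
avg_hits = 70                      # average hits per event quoted above

dense_cells = views * height * width
occupancy = avg_hits / dense_cells

# ASSUMPTION: 4-byte numbers, and each sparse hit stores (row, column, value).
bytes_per_number = 4
dense_bytes = dense_cells * bytes_per_number
sparse_bytes = avg_hits * 3 * bytes_per_number

print(f"cells: {dense_cells}, occupancy: {occupancy:.4%}")
# cells: 16000, occupancy: 0.4375%
print(f"dense: {dense_bytes} B, sparse: {sparse_bytes} B")
# dense: 64000 B, sparse: 840 B
```

Under these assumptions fewer than one cell in two hundred carries a hit, so a dense image spends almost all of its storage on zeros.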
- Classification Performance:
  - AUC (Area Under Curve)
  - OVR AUC (One-vs-Rest AUC)
- Segmentation Performance:
  - Efficiency/Recall: Fraction of true particle trajectories that are correctly identified
  - Purity/Precision: Fraction of predicted trajectories that are correct
  - Segmentation accuracy
- Computational Efficiency:
  - Memory usage (MiB)
  - Processing time per sample (seconds)
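Efficiency and purity correspond to recall and precision, which can be sketched per particle class as below. Counting at the level of individual hits is an assumption; the summary does not state whether the paper scores hits or whole trajectories. The function name and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def efficiency_and_purity(
    true_labels: Sequence[str], pred_labels: Sequence[str], target: str
) -> Tuple[float, float]:
    """Return (efficiency, purity) for one class, counted over hits."""
    correct = sum(
        t == target and p == target for t, p in zip(true_labels, pred_labels)
    )
    n_true = sum(t == target for t in true_labels)
    n_pred = sum(p == target for p in pred_labels)
    efficiency = correct / n_true if n_true else 0.0  # recall
    purity = correct / n_pred if n_pred else 0.0      # precision
    return efficiency, purity


# Toy example: one pion hit is mislabeled as a muon.
truth = ["muon", "muon", "muon", "muon", "pion", "pion"]
preds = ["muon", "muon", "muon", "muon", "muon", "pion"]

print(efficiency_and_purity(truth, preds, "muon"))  # (1.0, 0.8)
print(efficiency_and_purity(truth, preds, "pion"))  # (0.5, 1.0)
```

The toy case shows why both numbers are reported: the mislabeled hit lowers muon purity and pion efficiency while leaving the other two values at 1.0.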
- Mask R-CNN: Region-based convolutional neural network
- GAT (Graph Attention Network): Graph neural network that applies attention over each node's neighbors
- HPST: Heterogeneous point set transformer proposed in this work
- Hardware Environment: Intel Xeon E5-2640 v4 @ 2.40GHz, 503G RAM, 4×NVIDIA Titan V
- Hyperparameter Search:
  - Neighbor connections: {4, 8}
  - Network stages: {2, 3, 4}
  - Embedding dimensions: {128, 256, 512}
  - Learning rate: 1e-4 to 1e-1
- Training Settings:
  - Hyperparameter search: 8 epochs, 1% data
  - Final training: 24 epochs
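The search space above can be enumerated in a few lines. The discrete choices and the learning-rate range come from the list; whether the authors used a full grid or random sampling is not stated, so the exhaustive grid and the log-uniform learning-rate draw below are assumptions, as are all variable names.

```python
import itertools
import math
import random

neighbor_counts = [4, 8]
network_stages = [2, 3, 4]
embedding_dims = [128, 256, 512]

# Every combination of the three discrete choices: 2 * 3 * 3 = 18 configurations.
grid = list(itertools.product(neighbor_counts, network_stages, embedding_dims))


def sample_learning_rate(
    rng: random.Random, low: float = 1e-4, high: float = 1e-1
) -> float:
    """ASSUMPTION: log-uniform draw; the summary only gives the range."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [
    {"neighbors": k, "stages": s, "embed_dim": d, "lr": sample_learning_rate(rng)}
    for k, s, d in grid
]
print(len(trials))  # 18
```

Running each trial for 8 epochs on 1% of the data, as listed above, keeps the cost of exploring all 18 configurations small relative to the final 24-epoch run.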
| Model | Memory Usage (MiB) | Time per Sample (s) | OVR AUC | Segmentation Accuracy |
|---|---|---|---|---|
| Mask R-CNN | 282.4±37.43 | 265.33±2.01 | 0.732 | 0.343 |
| GAT | 29.8±0.40 | 1.74±0.001 | 0.854 | 0.659 |
| HPST | 34.7±1.00 | 7.05±0.001 | 0.968 | 0.835 |
Key Findings:
- HPST outperforms both baselines on OVR AUC and segmentation accuracy; GAT uses slightly less memory (29.8 vs. 34.7 MiB) and less time per sample (1.74 vs. 7.05 s)
- Compared to independent view processing (85.4% AUC), HPST's cross-view fusion improves AUC to 96.8%
- Memory usage is approximately 12% of Mask R-CNN
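The ratios behind these findings can be checked directly from the table. The numbers below are copied from the Mask R-CNN, GAT, and HPST rows; only the dictionary layout is an addition.

```python
# Mean values copied from the results table.
memory_mib = {"Mask R-CNN": 282.4, "GAT": 29.8, "HPST": 34.7}
time_s = {"Mask R-CNN": 265.33, "GAT": 1.74, "HPST": 7.05}
ovr_auc = {"Mask R-CNN": 0.732, "GAT": 0.854, "HPST": 0.968}

mem_vs_rcnn = memory_mib["HPST"] / memory_mib["Mask R-CNN"]
time_vs_rcnn = time_s["HPST"] / time_s["Mask R-CNN"]
mem_vs_gat = memory_mib["HPST"] / memory_mib["GAT"]
time_vs_gat = time_s["HPST"] / time_s["GAT"]
auc_gain_vs_gat = ovr_auc["HPST"] - ovr_auc["GAT"]

print(f"HPST memory vs Mask R-CNN: {mem_vs_rcnn:.1%}")
print(f"HPST time vs Mask R-CNN:   {time_vs_rcnn:.1%}")
print(f"HPST memory vs GAT:        {mem_vs_gat:.2f}x")
print(f"HPST time vs GAT:          {time_vs_gat:.2f}x")
print(f"OVR AUC gain vs GAT:       {auc_gain_vs_gat:.3f}")
```

HPST needs about 12.3% of Mask R-CNN's memory and about 2.7% of its time per sample. Against GAT it costs roughly 1.16 times the memory and 4.05 times the time, in exchange for an OVR AUC gain of 0.114.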
Efficiency:
- Muons: 0.95 (best)
- Electrons: 0.93
- Protons: 0.82
- Photons: 0.75
- Pions: 0.71 (most challenging)
Purity:
- Muons: 0.90
- Electrons: 0.88
- Protons: 0.78
- Photons: 0.72
- Pions: 0.69
Analysis: Primary particle types (muons and electrons) show the best segmentation performance, while secondary particles are more challenging due to fewer hit points.
The paper presents a typical neutrino interaction event including:
- Primary electron cascade
- Multiple secondary particles
- Comparison of HPST predictions with ground truth labels showing good classification performance, with minor confusion only on secondary particles with very few hit points
- Traditional Methods: Clustering algorithms combined with hand-crafted features
- CNN Applications:
  - Aurisano et al.'s neutrino event classifier
  - Baldi et al.'s energy reconstruction regression CNN
  - Psihas et al.'s context-enhanced particle identification
- Sparse Convolution: Frameworks like MinkowskiEngine
- Point Cloud Methods: Point Transformers applications in 3D vision
- Graph Neural Networks: GAT and similar approaches on irregular data
Existing NOvA methods primarily employ independent CNN processing or channel fusion. This work is the first to implement true cross-view attention mechanisms.
- HPST is Effective: HPST successfully addresses segmentation and classification of multi-view particle detector data
- Cross-view Fusion is Critical: Significant performance improvements over independent processing through inter-view information fusion
- Superior Computational Efficiency: Substantial memory reduction while improving performance
- Data Dependency: Efficiency advantages of sparse representation may diminish with higher data density
- Computational Complexity: Point set operation complexity may increase with point count, potentially slowing the algorithm
- Domain Specificity: Method designed for NOvA's specific dual-view structure
- Extension to other multi-view particle detector experiments
- Exploration of more complex cross-view attention mechanisms
- Integration of physics priors for further performance enhancement
- Strong Innovation: First application of heterogeneous attention mechanisms to particle physics data processing
- High Practical Value: Significant performance improvements and efficiency gains are important for actual experiments
- Comprehensive Experiments: Thorough comparative experiments and detailed performance analysis
- Clear Writing: Accurate technical descriptions and clear architecture diagrams
- Limited Theoretical Analysis: Lacks in-depth theoretical analysis of why cross-view attention is effective
- Insufficient Ablation Studies: Incomplete analysis of specific contributions from different components (e.g., distance definitions, attention mechanisms)
- Limited Generalization Validation: Verification only on NOvA data; lacks validation on other similar tasks
- Academic Value: Provides novel solutions for multi-view sparse data processing
- Practical Value: Directly applicable to NOvA experiment data processing pipelines
- Inspirational Significance: Provides reference for data processing in other particle physics experiments
- Multi-view particle detector data processing
- 2D multi-view reconstruction of sparse 3D data
- Point cloud analysis tasks requiring cross-view information fusion
- Large-scale scientific data processing with limited computational resources
The paper cites important works from particle physics, machine learning, and computer vision domains, including technical reports related to the NOvA experiment, applications of deep learning in science, and classical papers on graph neural networks and attention mechanisms. Particularly noteworthy are citations to related technologies such as MinkowskiEngine, Mask R-CNN, and Graph Attention Networks, reflecting the authors' deep understanding of the field's current state.