Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors

Robles, Sagar, Yankelevich et al.

NOvA is a long-baseline neutrino oscillation experiment that detects neutrino particles from the NuMI beam at Fermilab. Before data from this experiment can be used in analyses, raw hits in the detector must be matched to their source particles, and the type of each particle must be identified. This task has commonly been done using a mix of traditional clustering approaches and convolutional neural networks (CNNs). Due to the construction of the detector, the data is presented as two sparse 2D images: an XZ and a YZ view of the detector, rather than a 3D representation. We propose a point set neural network that operates on the sparse matrices with an operation that mixes information from both views. Our model uses less than 10% of the memory required using previous methods while achieving a 96.8% AUC score, a higher score than obtained when both views are processed independently (85.4%).

Basic Information

Paper ID: 2510.09659
Title: Heterogeneous Point Set Transformers for Segmentation of Multiple View Particle Detectors
Authors: Edgar E. Robles, Dikshant Sagar, Alejandro Yankelevich, Jianming Bian, Pierre Baldi (University of California, Irvine) for the NOvA Collaboration
Classification: cs.LG (Machine Learning), hep-ex (High Energy Physics - Experiment)
Publication Date: October 7, 2025 (Preprint)
Paper Link: https://arxiv.org/abs/2510.09659v1

Abstract

NOvA is a long-baseline neutrino oscillation experiment designed to detect neutrino particles from Fermilab's NuMI beam. Before experimental data can be used for analysis, raw hit signals in the detector must be matched to their source particles, and the type of each particle must be identified. Traditionally, this task has been accomplished through a combination of conventional clustering methods and convolutional neural networks (CNNs). Due to the detector's structural characteristics, data is presented as two sparse 2D images—the XZ and YZ views of the detector—rather than as a 3D representation. This paper proposes a point set neural network that operates on sparse matrices and processes data through operations that fuse information from both views. The model uses less than 10% of the memory of previous methods while achieving an AUC score of 96.8%, surpassing the 85.4% score obtained when processing the two views independently.

Research Background and Motivation

Problem Definition

The work addresses two tasks on hit data from the NOvA neutrino experiment, trajectory segmentation and particle classification:

Instance Segmentation: Matching raw hit signals in the detector to corresponding source particles and separating different particle trajectories (prongs)
Semantic Segmentation: Identifying the type of each particle (e.g., muons, electrons, protons, photons, pions, etc.)

Problem Significance

NOvA is an important neutrino physics experiment requiring processing of large volumes of sparse data
Accurate particle identification and segmentation form the foundation for subsequent physics analysis
Traditional methods face bottlenecks in computational resources and accuracy

Limitations of Existing Methods

Traditional CNN Approaches: Require converting sparse matrices to dense matrices, resulting in high memory consumption
Independent View Processing: Existing methods process XZ and YZ views through separate CNNs or treat each view as an image channel, failing to effectively fuse cross-view information
Computational Efficiency: Even sparse convolution frameworks such as MinkowskiEngine still rely on approximate convolutions to conserve memory

Research Motivation

The construction of the NOvA detector restricts the data to two 2D projections (XZ and YZ) rather than a complete 3D representation. Existing methods do not fully exploit the complementary information in the two views. This work aims to design an efficient neural network architecture that fuses information across views.

Core Contributions

Proposed Heterogeneous Point Set Transformers (HPST): First extension of point set transformers to multi-view particle detector data processing
Designed Heterogeneous Attention Mechanism: Attention between points in different views, so that information flows between the XZ and YZ views
Significantly Improved Performance and Efficiency:
- AUC improved from 85.4% to 96.8%
- Memory usage reduced to less than 10% of previous methods
Provided Complete Multi-task Learning Framework: Simultaneously addresses instance and semantic segmentation tasks

Methodology Details

Task Definition

Given the NOvA detector dataset X containing N samples, each sample X^(i) represents a particle detection event. Each event is divided into M=2 views (XZ and YZ), with each view X^(i,j) containing a variable number of detections K^(i,j). Each detection is described by coordinates x_k^(i,j) ∈ R^c and values v_k^(i,j) ∈ R^d.

Objectives:

Instance Segmentation: Group detection points into different particle trajectories
Semantic Segmentation: Assign particle type labels to each detection point
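
The data layout in the task definition can be sketched as a pair of point sets per event. This is only an illustration: the class names View and Event, the helper make_view, and the toy numbers are hypothetical, and c = 2 coordinates with d = 1 value per hit is an assumption rather than something stated in the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class View:
    coords: np.ndarray  # shape (K, c): one coordinate vector x_k per hit
    values: np.ndarray  # shape (K, d): one value vector v_k per hit


@dataclass
class Event:
    xz: View  # view j = 0
    yz: View  # view j = 1


def make_view(coords, values):
    """Build a view, checking that every hit has both coordinates and values."""
    coords = np.asarray(coords, dtype=np.float32)
    values = np.asarray(values, dtype=np.float32)
    if coords.shape[0] != values.shape[0]:
        raise ValueError("coords and values must describe the same hits")
    return View(coords, values)


# Toy event: the two views hold different numbers of hits (K varies per view).
event = Event(
    xz=make_view([[3, 10], [4, 11], [5, 11]], [[0.8], [1.2], [0.5]]),
    yz=make_view([[3, 40], [5, 41]], [[0.9], [0.4]]),
)
print(event.xz.coords.shape, event.yz.coords.shape)
```

Storing only the hits, rather than a dense image per view, is what allows the hit count to differ between views and between events.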

Model Architecture

Overall Architecture Design

HPST adopts a UNet-like encoder-decoder structure:

Encoder: n stages, each containing m attention blocks followed by pooling operations
Decoder: n stages, each followed by unpooling operations and skip connections
Feature Dimensions: Progressively doubled during encoding stages, progressively halved during decoding stages
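
As a small illustration of the width schedule described above, the sketch below lists the feature dimension at each stage. The function name stage_widths is hypothetical, and the base width of 128 with 3 stages is just one combination from the hyperparameter search ranges, not the reported final configuration.

```python
def stage_widths(base_dim, n_stages):
    """Feature width per stage: doubled while encoding, halved while decoding."""
    encoder = [base_dim * 2 ** s for s in range(n_stages)]
    decoder = encoder[::-1]  # the decoder mirrors the encoder
    return encoder, decoder


encoder, decoder = stage_widths(base_dim=128, n_stages=3)
print(encoder)  # [128, 256, 512]
print(decoder)  # [512, 256, 128]
```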

Heterogeneous Attention Mechanism

The core innovation lies in the heterogeneous attention mechanism, comprising:

Intra-view Attention: Conventional self-attention mechanism processing points within the same view
Inter-view Attention: Key component for cross-view information fusion

Inter-view Attention Computation:

Query: Q_k^(i,j'→j), the query vector of point k in the target view j, used to attend to points in the source view j'
Key-Value: K_{k'}^(i,j'→j) and V_{k'}^(i,j'→j), the key and value vectors of point k' in view j'
Attention Weights: w_{kk'}^(i,j'→j) = (Q_k^(i,j'→j))^T K_{k'}^(i,j'→j)
Output: h'_k^(i,j) = Σ_{k'} softmax(w_{kk'}^(i,j'→j)) V_{k'}^(i,j'→j)
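
The inter-view attention formulas can be sketched in NumPy for a single head. This is a minimal illustration under assumptions, not the authors' implementation: the projection matrices Wq, Wk, Wv, the fixed-size neighbor index array, and the function names are hypothetical, and no scaling factor is applied because the formulas above do not show one.

```python
import numpy as np


def softmax(w, axis=-1):
    """Numerically stable softmax along one axis."""
    w = w - w.max(axis=axis, keepdims=True)
    e = np.exp(w)
    return e / e.sum(axis=axis, keepdims=True)


def inter_view_attention(feat_j, feat_jp, neighbors, Wq, Wk, Wv):
    """Update points of view j from their neighbors in view j'.

    feat_j:    (K_j, d)  features of the target view j
    feat_jp:   (K_jp, d) features of the source view j'
    neighbors: (K_j, n)  integer indices into feat_jp (assumed given)
    """
    Q = feat_j @ Wq    # queries come from the target view
    K = feat_jp @ Wk   # keys come from the source view
    V = feat_jp @ Wv   # values come from the source view
    K_nb = K[neighbors]                   # (K_j, n, h)
    V_nb = V[neighbors]                   # (K_j, n, h_v)
    w = np.einsum("kh,knh->kn", Q, K_nb)  # w_kk' = Q_k^T K_k'
    a = softmax(w, axis=1)                # normalize over the neighbors k'
    return np.einsum("kn,knh->kh", a, V_nb)


rng = np.random.default_rng(0)
d, h = 4, 8
feat_xz = rng.normal(size=(3, d))
feat_yz = rng.normal(size=(5, d))
neighbors = np.array([[0, 1], [1, 2], [3, 4]])
Wq, Wk, Wv = (rng.normal(size=(d, h)) for _ in range(3))
out = inter_view_attention(feat_xz, feat_yz, neighbors, Wq, Wk, Wv)
print(out.shape)  # (3, 8)
```

Swapping the roles of the two feature arrays gives the update in the opposite direction, so each view can receive information from the other.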

Distance Definition and Graph Construction

Intra-view Distance: d(x_k^(i,j), x_{k'}^(i,j)), the distance between points in the same view
Inter-view Distance: d_{jj'}(x_k^(i,j), x_{k'}^(i,j')), the distance between points in different views
Graph connections constructed based on k-nearest neighbors
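
A k-nearest-neighbor graph over these distances can be sketched as follows. The summary does not give the exact distance definitions, so this example makes two loudly stated assumptions: the intra-view distance is Euclidean in the view's own coordinates, and the inter-view distance is the absolute difference along the z axis that the XZ and YZ views share. All function names are hypothetical.

```python
import numpy as np


def intra_view_dist(x):
    """Pairwise Euclidean distance inside one view (assumed metric)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def inter_view_dist(x_j, x_jp, z_axis=1):
    """Distance between views along the shared z axis (assumed metric)."""
    return np.abs(x_j[:, None, z_axis] - x_jp[None, :, z_axis])


def knn_indices(dist, k):
    """For each row of a distance matrix, indices of the k closest columns."""
    k = min(k, dist.shape[1])
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


xz = np.array([[10.0, 3.0], [11.0, 4.0], [11.0, 9.0]])  # columns: (x, z)
yz = np.array([[40.0, 3.0], [41.0, 8.0]])               # columns: (y, z)

intra = knn_indices(intra_view_dist(xz), k=2)      # includes the point itself
cross = knn_indices(inter_view_dist(xz, yz), k=1)  # XZ hit -> nearest YZ hit
print(intra.tolist())  # [[0, 1], [1, 0], [2, 1]]
print(cross.tolist())  # [[0], [0], [1]]
```

The index arrays produced here have the shape expected by a neighbor-restricted attention layer: one row per query point and one column per neighbor.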

Pooling and Unpooling

Pooling: Voxel pooling method creating grids within the same view and averaging point values within grid cells
Unpooling: Using skip connections to upsample points to previous coordinates
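
Voxel pooling and its inverse can be sketched in NumPy as below. The grid size, the use of mean coordinates for pooled points, and the concatenation of skip features during unpooling are assumptions made for illustration; the function names voxel_pool and unpool are hypothetical.

```python
import numpy as np


def voxel_pool(coords, feats, cell_size):
    """Average the coordinates and features of points sharing a grid cell."""
    cells = np.floor(coords / cell_size).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # point index -> pooled cell index
    n_cells = int(inverse.max()) + 1
    counts = np.bincount(inverse, minlength=n_cells)[:, None]
    pooled_coords = np.zeros((n_cells, coords.shape[1]))
    pooled_feats = np.zeros((n_cells, feats.shape[1]))
    np.add.at(pooled_coords, inverse, coords)  # sum per cell
    np.add.at(pooled_feats, inverse, feats)
    return pooled_coords / counts, pooled_feats / counts, inverse


def unpool(pooled_feats, inverse, skip_feats):
    """Copy pooled features back to the original points and append skip features."""
    return np.concatenate([pooled_feats[inverse], skip_feats], axis=1)


coords = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]])
feats = np.array([[1.0], [3.0], [10.0]])
p_coords, p_feats, inverse = voxel_pool(coords, feats, cell_size=2.0)
print(p_feats.tolist())                          # [[2.0], [10.0]]
print(unpool(p_feats, inverse, feats).tolist())  # [[2.0, 1.0], [2.0, 3.0], [10.0, 10.0]]
```

Keeping the inverse index from pooling is what lets unpooling return every feature to its original coordinate.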

Technical Innovations

Cross-view Information Fusion: First implementation of effective multi-view point cloud attention mechanism in particle physics
Efficient Sparse Data Processing: Direct operation on point cloud representations, avoiding sparse-to-dense matrix conversion
Multi-scale Feature Learning: Implementing local-to-global information mixing through UNet architecture
Joint Optimization Framework: Unified handling of segmentation and classification tasks

Experimental Setup

Dataset

Data Source: Neutrino interaction simulation data generated by the NOvA collaboration
Data Scale: 9,246,712 events
Data Characteristics:
- Average 70 hit points per event
- Image dimensions: 2×80×100
- Highly sparse data distribution
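
The sparsity implied by these numbers can be checked with a short calculation. The occupancy follows directly from the figures above; the byte counts additionally assume 32-bit floats and a point layout of two coordinates plus one value per hit, which is an illustrative assumption.

```python
views, height, width = 2, 80, 100
avg_hits = 70

pixels = views * height * width  # 16000 pixels per event
occupancy = avg_hits / pixels    # fraction of pixels that contain a hit
print(f"{occupancy:.2%}")        # 0.44%

# Assumed storage: float32 everywhere, (2 coordinates + 1 value) per hit.
dense_bytes = pixels * 4
sparse_bytes = avg_hits * (2 + 1) * 4
print(dense_bytes, sparse_bytes)  # 64000 840
```

With fewer than 1% of pixels occupied on average, a dense image spends almost all of its memory on empty cells.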

Evaluation Metrics

Classification Performance:
- AUC (Area Under Curve)
- OVR AUC (One-vs-Rest AUC)
Segmentation Performance:
- Efficiency/Recall: Fraction of the hits that truly belong to a particle type that the model assigns to that type
- Purity/Precision: Fraction of the hits the model assigns to a particle type that truly belong to it
- Segmentation accuracy
Computational Efficiency:
- Memory usage (MiB)
- Processing time per sample (seconds)
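
The classification and segmentation metrics can be sketched with NumPy alone. This sketch assumes per-hit integer labels, computes efficiency and purity as per-class recall and precision, and macro-averages the one-vs-rest AUC over classes; the averaging choice and all function names are assumptions, since the summary does not specify them.

```python
import numpy as np


def efficiency_purity(y_true, y_pred, label):
    """Per-class recall (efficiency) and precision (purity) over hits."""
    true_mask = y_true == label
    pred_mask = y_pred == label
    tp = np.sum(true_mask & pred_mask)
    eff = tp / true_mask.sum() if true_mask.any() else float("nan")
    pur = tp / pred_mask.sum() if pred_mask.any() else float("nan")
    return float(eff), float(pur)


def binary_auc(is_pos, score):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos, neg = score[is_pos], score[~is_pos]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def ovr_auc(y_true, probs):
    """Macro-averaged one-vs-rest AUC over classes present in y_true."""
    aucs = []
    for c in range(probs.shape[1]):
        is_pos = y_true == c
        if is_pos.any() and (~is_pos).any():
            aucs.append(binary_auc(is_pos, probs[:, c]))
    return float(np.mean(aucs))


y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2])
print(efficiency_purity(y_true, y_pred, label=0))  # (0.5, 1.0)
print(ovr_auc(y_true, np.eye(3)[y_true]))          # 1.0 for perfect scores
```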

Comparison Methods

Mask R-CNN: Region-based convolutional neural network
GAT (Graph Attention Networks): Graph attention networks
HPST: Heterogeneous point set transformer proposed in this work

Implementation Details

Hardware Environment: Intel Xeon E5-2640 v4 @ 2.40GHz, 503G RAM, 4×NVIDIA Titan V
Hyperparameter Search:
- Neighbor connections: {4, 8}
- Network stages: {2, 3, 4}
- Embedding dimensions: {128, 256, 512}
- Learning rate: 1e-4 to 1e-1
Training Settings:
- Hyperparameter search: 8 epochs, 1% data
- Final training: 24 epochs
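
The size of this search space can be illustrated by enumerating the discrete options. The summary does not say how the learning rate was sampled or which search strategy was used, so the log-uniform sampler below is an assumption, and the names grid and sample_lr are hypothetical.

```python
import itertools
import math
import random

neighbor_options = [4, 8]
stage_options = [2, 3, 4]
embedding_options = [128, 256, 512]

# Every combination of the discrete hyperparameters.
grid = list(itertools.product(neighbor_options, stage_options, embedding_options))
print(len(grid))  # 18


def sample_lr(rng, low=1e-4, high=1e-1):
    """Draw a learning rate log-uniformly between low and high (assumed scheme)."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)
trial = {"config": grid[0], "lr": sample_lr(rng)}
print(trial["config"])  # (4, 2, 128)
```

Running each trial for 8 epochs on 1% of the data, as stated above, keeps a search over these 18 discrete combinations affordable.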

Experimental Results

Main Results

Model	Memory Usage (MiB)	Time per Sample (s)	OVR AUC	Segmentation Accuracy
Mask R-CNN	282.4±37.43	265.33±2.01	0.732	0.343
GAT	29.8±0.40	1.74±0.001	0.854	0.659
HPST	34.7±1.00	7.05±0.001	0.968	0.835

Key Findings:

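HPST achieves the highest OVR AUC (0.968) and segmentation accuracy (0.835) of the three models
Compared to independent view processing (85.4% AUC, matching the GAT baseline in the table), HPST's cross-view fusion improves AUC to 96.8%
HPST's memory usage is approximately 12% of Mask R-CNN's (34.7 vs. 282.4 MiB); GAT is slightly lighter (29.8 MiB) and faster per sample (1.74 s vs. 7.05 s)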
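
The ratios behind these findings can be recomputed from the table's mean values. The dictionary below simply restates those numbers; the variable names are illustrative.

```python
memory_mib = {"Mask R-CNN": 282.4, "GAT": 29.8, "HPST": 34.7}
time_s = {"Mask R-CNN": 265.33, "GAT": 1.74, "HPST": 7.05}

hpst_vs_rcnn = memory_mib["HPST"] / memory_mib["Mask R-CNN"]
gat_vs_rcnn = memory_mib["GAT"] / memory_mib["Mask R-CNN"]
hpst_vs_gat_time = time_s["HPST"] / time_s["GAT"]

print(f"{hpst_vs_rcnn:.1%}")      # 12.3%
print(f"{gat_vs_rcnn:.1%}")       # 10.6%
print(f"{hpst_vs_gat_time:.2f}")  # 4.05
```

On these numbers HPST trades a small memory increase and roughly four times the per-sample time of GAT for its higher AUC and segmentation accuracy.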

Performance Analysis by Particle Type

Efficiency:

Muons: 0.95 (best)
Electrons: 0.93
Protons: 0.82
Photons: 0.75
Pions: 0.71 (most challenging)

Purity:

Muons: 0.90
Electrons: 0.88
Protons: 0.78
Photons: 0.72
Pions: 0.69

Analysis: Primary particle types (muons and electrons) show the best segmentation performance, while secondary particles are more challenging due to fewer hit points.

Case Study

The paper presents a typical neutrino interaction event including:

Primary electron cascade
Multiple secondary particles
Comparison of HPST predictions with ground truth labels showing good classification performance, with minor confusion only on secondary particles with very few hit points

Related Work

Machine Learning in Particle Physics

Traditional Methods: Clustering algorithms combined with hand-crafted features
CNN Applications:
- Aurisano et al.'s neutrino event classifier
- Baldi et al.'s energy reconstruction regression CNN
- Psihas et al.'s context-enhanced particle identification

Sparse Data Processing

Sparse Convolution: Frameworks like MinkowskiEngine
Point Cloud Methods: Point Transformers applications in 3D vision
Graph Neural Networks: GAT and similar approaches on irregular data

Multi-view Learning

Existing NOvA methods primarily process each view with an independent CNN or stack the two views as image channels. This work instead applies attention directly between points in different views.

Conclusions and Discussion

Main Conclusions

HPST is Effective: HPST successfully addresses segmentation and classification of multi-view particle detector data
Cross-view Fusion is Critical: Significant performance improvements over independent processing through inter-view information fusion
Superior Computational Efficiency: Substantial memory reduction while improving performance

Limitations

Data Dependency: Efficiency advantages of sparse representation may diminish with higher data density
Computational Complexity: Point set operation complexity may increase with point count, potentially slowing the algorithm
Domain Specificity: Method designed for NOvA's specific dual-view structure

Future Directions

Extension to other multi-view particle detector experiments
Exploration of more complex cross-view attention mechanisms
Integration of physics priors for further performance enhancement

In-Depth Evaluation

Strengths

Strong Innovation: First application of heterogeneous attention mechanisms to particle physics data processing
High Practical Value: Significant performance improvements and efficiency gains are important for actual experiments
Comprehensive Experiments: Thorough comparative experiments and detailed performance analysis
Clear Writing: Accurate technical descriptions and clear architecture diagrams

Weaknesses

Limited Theoretical Analysis: Lacks in-depth theoretical analysis of why cross-view attention is effective
Insufficient Ablation Studies: Incomplete analysis of specific contributions from different components (e.g., distance definitions, attention mechanisms)
Limited Generalization Validation: Verification only on NOvA data; lacks validation on other similar tasks

Impact

Academic Value: Provides novel solutions for multi-view sparse data processing
Practical Value: Directly applicable to NOvA experiment data processing pipelines
Inspirational Significance: Provides reference for data processing in other particle physics experiments

Applicable Scenarios

Multi-view particle detector data processing
2D multi-view reconstruction of sparse 3D data
Point cloud analysis tasks requiring cross-view information fusion
Large-scale scientific data processing with limited computational resources

References

The paper cites work from particle physics, machine learning, and computer vision, including technical reports from the NOvA experiment, applications of deep learning in science, and foundational papers on graph neural networks and attention mechanisms. It also cites the tools and baselines discussed above: MinkowskiEngine, Mask R-CNN, and Graph Attention Networks.