Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation
Sui, Lichau, Lefèvre et al.
Recent studies of multimodal industrial anomaly detection (IAD) based on 3D point clouds and RGB images have highlighted the importance of exploiting the redundancy and complementarity among modalities for accurate classification and segmentation. However, multimodal IAD in practical production lines remains a work in progress. The costs and benefits of introducing new modalities must be weighed while ensuring compatibility with current processes. Existing quality control processes combine rapid in-line inspections, such as optical and infrared imaging, with high-resolution but time-consuming near-line characterization techniques, such as industrial CT and electron microscopy, to manually or semi-automatically locate and analyze defects in the production of Li-ion batteries and composite materials. Given cost and time limitations, only a subset of samples can be inspected by all in-line and near-line methods; the remaining samples are evaluated through only one or two forms of in-line inspection. To fully exploit the data for deep learning-driven automatic defect detection, models must be able to leverage multimodal training data and handle incomplete modalities during inference. In this paper, we propose CMDIAD, a Cross-Modal Distillation framework for IAD, to demonstrate the feasibility of a Multi-modal Training, Few-modal Inference (MTFI) pipeline. Our findings show that the MTFI pipeline utilizes incomplete multimodal information more effectively than training and inferring with a single modality alone. Moreover, we investigate the reasons behind the asymmetric performance improvement observed when point clouds or RGB images serve as the main modality for inference. This provides a foundation for constructing future multimodal datasets with additional modalities from manufacturing scenarios.
This paper addresses a practical challenge in industrial anomaly detection: in real production lines, inspecting every sample with every modality is infeasible because of cost and time constraints. The authors propose the CMDIAD framework, which implements a Multi-modal Training, Few-modal Inference (MTFI) pipeline. Through cross-modal knowledge distillation, the model exploits complete multimodal data during training and, at inference, uses only a subset of modalities while outperforming models trained and tested on a single modality.
In industrial anomaly detection, existing multimodal methods typically require all modalities during both training and inference. In real production environments, however:
Cost Constraints: High-resolution near-line characterization techniques (e.g., industrial CT, electron microscopy) are expensive and time-consuming
Practical Limitations: Only a subset of samples can be inspected with all modalities, while most samples are assessed through only one or two rapid in-line inspection methods
Insufficient Data Utilization: Existing methods cannot use multimodal training data to improve inference when only one modality is available
First Incomplete Multimodal IAD: To the authors' knowledge, this is the first work addressing industrial anomaly detection with incomplete multimodal data
CMDIAD Framework: Proposes a novel multimodal IAD framework based on cross-modal distillation, enabling multimodal training with few-modal inference
MTFI Pipeline: Demonstrates the feasibility and effectiveness of the multimodal training, few-modal inference pipeline
Modal Correlation Analysis: Provides in-depth analysis of information transfer mechanisms between different modalities, offering guidance for future dataset construction
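The MTFI idea can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes synthetic feature vectors in place of real RGB and Point-MAE features, substitutes a closed-form ridge regression for whatever learned cross-modal mapping CMDIAD uses, and scores anomalies PatchCore-style as the nearest-neighbor distance to a memory bank of nominal fused features. All names (`fit_hallucination`, `knn_score`, `score_rgb_only`) are hypothetical. During training both modalities are available; at inference only RGB is measured and the point-cloud features are "hallucinated" from it.

```python
import numpy as np

def fit_hallucination(src, tgt, ridge=1e-3):
    """Ridge regression: W minimizing ||src @ W - tgt||^2 + ridge * ||W||^2."""
    d = src.shape[1]
    return np.linalg.solve(src.T @ src + ridge * np.eye(d), src.T @ tgt)

def knn_score(bank, query):
    """Anomaly score = Euclidean distance from each query row to its nearest
    row in the memory bank of nominal training features."""
    d2 = ((query[:, None, :] - bank[None, :, :]) ** 2).sum(axis=-1)  # (n_query, n_bank)
    return np.sqrt(d2.min(axis=1))

rng = np.random.default_rng(0)
n_train, d_rgb, d_pc = 200, 16, 8

# Hypothetical nominal training set: point-cloud features are a noisy linear
# function of RGB features, so one modality partly predicts the other.
true_map = rng.normal(size=(d_rgb, d_pc)) / np.sqrt(d_rgb)
rgb_train = rng.normal(size=(n_train, d_rgb))
pc_train = rgb_train @ true_map + 0.01 * rng.normal(size=(n_train, d_pc))

# Multimodal training: both modalities are available for these samples.
W = fit_hallucination(rgb_train, pc_train)  # RGB -> point-cloud hallucination map
bank = np.hstack([rgb_train, pc_train])     # memory bank of fused nominal features

# Few-modal inference: only RGB is measured; point-cloud features are hallucinated.
def score_rgb_only(rgb):
    pc_hat = rgb @ W
    return knn_score(bank, np.hstack([rgb, pc_hat]))

normal = rng.normal(size=(5, d_rgb))
anomalous = normal + 8.0  # synthetic shift away from the nominal distribution
print(score_rgb_only(normal).mean(), score_rgb_only(anomalous).mean())
```

A linear map keeps the example small and exactly solvable; how well a hallucinated modality helps depends on how much of it is predictable from the observed one, which is one way to read the asymmetry between RGB-to-point-cloud and point-cloud-to-RGB transfer.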
Point Cloud Feature Extraction: Uses Point-MAE to extract point cloud features and obtains RGB-aligned feature maps through farthest point sampling (FPS) and inverse distance weighting (IDW) interpolation
Cross-Modal Hallucination Generation: Learns cross-modal mappings to generate "hallucinated" features of missing modalities during inference
Multi-Path Distillation Strategy: Offers three distillation variants operating at different levels, trading off computational cost against performance
Asymmetric Performance Analysis: Provides in-depth analysis of performance differences across different distillation directions and their underlying causes
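The FPS-and-IDW step that turns per-group point features into an RGB-aligned map can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: it assumes an organized point cloud with one xyz coordinate per image pixel (as MVTec 3D-AD provides) and uses random vectors as stand-ins for Point-MAE group features. The neighbor count `k=3` and distance exponent `power=2` are illustrative choices, not values taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: each new center is the point farthest from all centers chosen so far."""
    selected = [start]
    dist = np.linalg.norm(points - points[start], axis=1)  # distance to nearest chosen center
    for _ in range(1, k):
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

def idw_interpolate(centers, center_feats, queries, k=3, power=2.0, eps=1e-8):
    """Inverse-distance-weighted interpolation from the k nearest centers."""
    d = np.linalg.norm(queries[:, None, :] - centers[None, :, :], axis=-1)  # (n_query, n_center)
    nn = np.argsort(d, axis=1)[:, :k]                                       # k nearest centers
    nd = np.take_along_axis(d, nn, axis=1)                                  # their distances
    wts = 1.0 / (nd ** power + eps)
    wts /= wts.sum(axis=1, keepdims=True)                                   # weights sum to 1
    return (wts[..., None] * center_feats[nn]).sum(axis=1)                  # (n_query, C)

# Hypothetical organized point cloud: one xyz per pixel of an h x w image.
h, w = 16, 16
ys, xs = np.mgrid[0:h, 0:w]
xyz = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).reshape(-1, 3).astype(float)

idx = farthest_point_sampling(xyz, 32)
centers = xyz[idx]
feats = np.random.default_rng(0).normal(size=(32, 4))  # stand-in for Point-MAE group features
fmap = idw_interpolate(centers, feats, xyz).reshape(h, w, -1)  # pixel-aligned (h, w, 4) map
```

Because every pixel carries its own 3D coordinate, interpolating in 3D and then reshaping to the image grid yields a point-cloud feature map that lines up pixel by pixel with the RGB feature map. At an FPS center the interpolated feature essentially equals that center's own feature, since its distance weight dominates.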
The paper cites 67 references, mainly covering:
Classical methods in industrial anomaly detection (PatchCore, M3DM, etc.)
Related work on cross-modal knowledge distillation
Foundational methods in 3D point cloud processing and multimodal learning
Original papers of important datasets such as MVTec 3D-AD
Overall Assessment: This is a high-quality paper addressing a practical industrial problem. The proposed CMDIAD framework has clear theoretical and practical value. Its theoretical analysis and its validation in real production scenarios could be strengthened, but its novelty and practicality make it an important contribution to the field.