Reconstruction-based methods, particularly those leveraging autoencoders, have been widely adopted for anomaly detection in brain MRI. Unlike most existing works, which try to improve accuracy through architectural or algorithmic innovations, we tackle this task from an image quality assessment (IQA) perspective, an under-explored direction in the field. Because conventional metrics such as ℓ1 are limited in capturing the nuanced differences in reconstructed images for medical anomaly detection, we propose fusion quality, a novel metric that integrates the structure-level sensitivity of the Structural Similarity Index Measure (SSIM) with the pixel-level precision of ℓ1. The metric offers a more comprehensive assessment of reconstruction quality, considering intensity (the subtractive property of ℓ1 and the divisive property of SSIM), contrast, and structural similarity. Furthermore, the proposed metric makes subtle regional variations more impactful in the final assessment. Considering the inherent divisive properties of SSIM, we also design an average intensity ratio (AIR)-based data transformation that amplifies the divisive discrepancies between normal and abnormal regions, thereby enhancing anomaly detection. Fusing these two components yields our IQA approach. Experimental results on two distinct brain MRI datasets show that our IQA approach significantly enhances medical anomaly detection performance when integrated with state-of-the-art baselines.
- Paper ID: 2408.08228
- Title: Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective
- Authors: Zixuan Pan, Jun Xia, Zheyu Yan, Guoyue Xu, Yifan Qin, Xueyang Li, Yawen Wu, Zhenge Jia, Jianxu Chen, Yiyu Shi
- Classification: eess.IV cs.CV
- Publication Date: August 2024 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2408.08228
This paper revisits the task of anomaly detection in brain MRI from the perspective of image quality assessment (IQA). Addressing the limitations of traditional ℓ1 loss in capturing subtle differences in reconstructed images, the authors propose a fusion quality metric that cleverly combines the structural-level sensitivity of the Structural Similarity Index (SSIM) with the pixel-level precision of ℓ1. This metric provides more comprehensive reconstruction quality assessment across three dimensions: intensity, contrast, and structural similarity. Furthermore, considering the inherent divisive nature of SSIM, a data transformation based on Average Intensity Ratio (AIR) is designed to amplify differences between normal and anomalous regions. Experimental results demonstrate that this IQA-based approach significantly improves medical anomaly detection performance.
Brain MRI anomaly detection (e.g., tumor identification) is an important task in medical image analysis. Traditional supervised learning methods require large amounts of annotated data, while obtaining precise annotations of medical images (such as tumor segmentation masks) is both difficult and expensive.
- Scarcity of Annotated Data: Medical image annotation requires professional expertise, is costly, and time-consuming
- Limitations of Existing Methods: Reconstruction-based anomaly detection methods primarily focus on architectural and algorithmic innovations while neglecting the importance of reconstruction quality assessment metrics
- Insufficient Evaluation Metrics: Traditional ℓ1 loss assumes pixel independence and ignores spatial relationships, making it difficult to capture subtle anomalies
As shown in Figure 1, even with identical reconstruction results, using SSIM to compute anomaly maps better identifies tumor regions compared to ℓ1 loss, which motivates the necessity of reconsidering anomaly detection from an IQA perspective.
- First IQA Perspective: Introduces image quality assessment into medical anomaly detection and proposes a fusion quality loss
- Novel Evaluation Metric: Combines the advantages of SSIM and ℓ1 loss to provide more comprehensive reconstruction quality assessment
- Data Transformation Strategy: Designs an AIR-based transformation to amplify differences between normal and anomalous regions
- Significant Performance Improvement: Relative DICE improvements of 20.32% on BraTS21 T2 and 21.41% on MSLUB T2 for pDDPM-IQA over pDDPM (per the T2 results table)
- Good Generalization: The method is applicable to different modalities and different baseline models
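Given a normal dataset $X^n = \{x_i^n\}_{i=1}^{N}$, train a reconstruction model $f_\theta(\cdot)$:
$$\min_\theta \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_{\text{train}}\left(x_i^n, \hat{x}_i^n\right), \quad \hat{x}_i^n = f_\theta\left({x_i^n}'\right)$$
At test time, the anomaly score map is defined as:
$$\Lambda_j = \mathcal{L}_{\text{test}}\left(x_j^a, \hat{x}_j^a\right), \quad \hat{x}_j^a = f_{\theta^*}\left({x_j^a}'\right)$$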
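The test-time scoring rule above can be sketched in a few lines. This is only an illustration under stated assumptions: `corrupt` and `model` are hypothetical callables standing in for the input perturbation and the trained reconstruction model $f_{\theta^*}$, images are plain nested lists, and the per-pixel ℓ1 residual is used as one possible choice of $\mathcal{L}_{\text{test}}$.

```python
from typing import Callable, List

Image = List[List[float]]


def l1_map(x: Image, x_hat: Image) -> Image:
    """Per-pixel absolute residual |x - x_hat| (one possible L_test)."""
    return [[abs(a - b) for a, b in zip(row_x, row_h)]
            for row_x, row_h in zip(x, x_hat)]


def anomaly_map(x: Image,
                corrupt: Callable[[Image], Image],
                model: Callable[[Image], Image]) -> Image:
    """Score map: compare the input with its reconstruction."""
    x_prime = corrupt(x)      # x' in the formulation (hypothetical perturbation)
    x_hat = model(x_prime)    # reconstruction from the trained model
    return l1_map(x, x_hat)


# Toy stand-ins (assumptions): identity corruption and a "model" that
# always reconstructs a uniform healthy intensity of 0.2.
identity = lambda img: img
healthy_model = lambda img: [[0.2 for _ in row] for row in img]

image = [[0.2, 0.2, 0.2],
         [0.2, 0.9, 0.2],   # bright pixel plays the role of a lesion
         [0.2, 0.2, 0.2]]
scores = anomaly_map(image, identity, healthy_model)
```

In this toy case only the bright centre pixel receives a non-zero score, which is the behaviour a reconstruction-based detector relies on.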
SSIM assesses three dimensions: luminance, contrast, and structure:
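$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$
$$\text{SSIM}(x,y) = l(x,y) \cdot c(x,y) \cdot s(x,y)$$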
Local SSIM loss:
$$\mathcal{L}_{\text{SSIM}}(x,\hat{x}) = \frac{1 - \frac{1}{K}\sum_{k=1}^{K}\text{SSIM}(x_k,\hat{x}_k)}{2}$$
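The three SSIM terms and the local SSIM loss can be written out directly from the formulas above. The sketch below is a plain-Python illustration, not the authors' implementation: the constants `C1`, `C2`, `C3` use the common SSIM defaults for intensities in [0, 1] (an assumption, since the summary does not state them), and patches are passed in as flat lists rather than extracted with a sliding window.

```python
import math

# Assumed constants: common SSIM defaults for intensities scaled to [0, 1].
C1 = 0.01 ** 2
C2 = 0.03 ** 2
C3 = C2 / 2


def mean(values):
    return sum(values) / len(values)


def ssim_patch(x, y):
    """SSIM of two equally sized patches given as flat lists."""
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov_xy = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    luminance = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    contrast = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    structure = (cov_xy + C3) / (sd_x * sd_y + C3)
    return luminance * contrast * structure


def ssim_loss(patches_x, patches_y):
    """(1 - mean SSIM over K patches) / 2, which lies in [0, 1]."""
    scores = [ssim_patch(p, q) for p, q in zip(patches_x, patches_y)]
    return (1 - mean(scores)) / 2


patch = [0.1, 0.4, 0.6, 0.9]
flipped = [0.9, 0.6, 0.4, 0.1]   # same mean and variance, reversed structure
loss_same = ssim_loss([patch], [patch])
loss_flipped = ssim_loss([patch], [flipped])
```

The flipped patch has the same mean and variance as the original, so the luminance and contrast terms stay at 1 and only the structure term drives the loss towards 1. An ℓ1 comparison of per-patch averages would not see this difference.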
Combining the advantages of SSIM and ℓ1 loss:
$$\mathcal{L}_{\text{FQ}} = \alpha \mathcal{L}_{\text{SSIM}} + (1-\alpha) \mathcal{L}_{\ell_1}, \quad \alpha \in [0,1]$$
where α = 0.84, selected based on recommendations from prior research [21].
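As a small worked example of the weighting, the fusion is a convex combination of the two loss values. The function name `fusion_quality` and the example loss values are illustrative assumptions; only the formula and α = 0.84 come from the text.

```python
def fusion_quality(l_ssim: float, l_l1: float, alpha: float = 0.84) -> float:
    """Convex combination alpha * L_SSIM + (1 - alpha) * L_l1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l_ssim + (1.0 - alpha) * l_l1


# Hypothetical loss values for one image: 0.84 * 0.30 + 0.16 * 0.05 = 0.26
example = fusion_quality(l_ssim=0.30, l_l1=0.05)
only_l1 = fusion_quality(0.30, 0.05, alpha=0.0)    # reduces to the l1 term
only_ssim = fusion_quality(0.30, 0.05, alpha=1.0)  # reduces to the SSIM term
```

With α = 0.84 the SSIM term dominates, and the ℓ1 term contributes a smaller pixel-level correction.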
$$\text{AIR}(X) = \frac{(\mu_{X^a} + \mu_{X^n}) + |\mu_{X^a} - \mu_{X^n}|}{(\mu_{X^a} + \mu_{X^n}) - |\mu_{X^a} - \mu_{X^n}|}$$
where $\mu_{X^a}$ and $\mu_{X^n}$ are the average pixel intensities of anomalous and normal regions, respectively.
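Since $(a+b)+|a-b| = 2\max(a,b)$ and $(a+b)-|a-b| = 2\min(a,b)$, the AIR is simply the larger region mean divided by the smaller one, so it is always at least 1. A minimal sketch, with illustrative mean values that are not taken from the paper:

```python
def air(mu_a: float, mu_n: float) -> float:
    """Average intensity ratio of anomalous vs. normal region means."""
    total = mu_a + mu_n
    gap = abs(mu_a - mu_n)
    return (total + gap) / (total - gap)


# Illustrative means (assumptions): anomalous 0.6, normal 0.3.
ratio = air(0.6, 0.3)
same_as_max_over_min = max(0.6, 0.3) / min(0.6, 0.3)
```

Both expressions evaluate to approximately 2, and the ratio is symmetric in its two arguments.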
Based on statistical analysis of four modalities in the BraTS dataset:
- $0 < \mu_{X^n} < \mu_{X^a} < 1$ holds across all modalities
- $\mu_{X^n} > 0.5$ in T1, FLAIR, and T1-CE
- $\mu_{X^a} < 0.5$ in T2
The transformation function is designed as:
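$$p(x) = x \cdot \mathbb{I}(\mu_{X^n} \leq 0.5) + (1 - x) \cdot \mathbb{I}(0.5 < \mu_{X^n})$$
This transformation ensures $\text{AIR}(\bar{X}) \geq \text{AIR}(X)$, where $\bar{X}$ denotes the transformed data.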
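The effect of the transformation can be checked numerically. In the sketch below the pixel values are made up, and `mu_n` is assumed to be a per-modality statistic known in advance (the summary derives it from dataset statistics). When both region means lie above 0.5, inverting intensities moves them below 0.5, where the same absolute gap corresponds to a larger ratio.

```python
def mean(values):
    return sum(values) / len(values)


def air(mu_a, mu_n):
    """Average intensity ratio, equal to max(mu_a, mu_n) / min(mu_a, mu_n)."""
    total, gap = mu_a + mu_n, abs(mu_a - mu_n)
    return (total + gap) / (total - gap)


def transform(pixels, mu_n):
    """p(x): keep x when mu_n <= 0.5, otherwise use 1 - x."""
    if mu_n > 0.5:
        return [1.0 - v for v in pixels]
    return list(pixels)


# Made-up intensities with 0.5 < mu_n < mu_a < 1 (the inverted case).
normal = [0.55, 0.60, 0.65]     # mean 0.6
abnormal = [0.75, 0.80, 0.85]   # mean 0.8
mu_n = mean(normal)

air_before = air(mean(abnormal), mu_n)             # 0.8 / 0.6, about 1.33
air_after = air(mean(transform(abnormal, mu_n)),
                mean(transform(normal, mu_n)))     # 0.4 / 0.2, about 2.0

# T2-like case with both means below 0.5: the data is left unchanged.
unchanged = transform([0.2, 0.4], mu_n=0.3)
```

The inequality holds here because $\mu(1-\mu)$ is decreasing for $\mu > 0.5$, so $\mu_{X^n}(1-\mu_{X^n}) \geq \mu_{X^a}(1-\mu_{X^a})$, which rearranges to $(1-\mu_{X^n})/(1-\mu_{X^a}) \geq \mu_{X^a}/\mu_{X^n}$.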
- Multi-dimensional Quality Assessment: Fuses pixel-level (ℓ1) and structure-level (SSIM) information
- Adaptive Weighting Mechanism: SSIM's divisive (ratio-based) form makes subtle regional variations count for more in the final assessment
- Data-Driven Preprocessing: Transformation strategy designed based on dataset statistical properties
- End-to-End Optimization: Unified use of fusion quality loss in both training and inference phases
- BraTS21: 1,251 brain tumor MRI scans with four modalities (T1, T1-CE, T2, FLAIR)
- MSLUB: 30 multiple sclerosis patients' T1, T2, FLAIR scans
- IXI: 560 healthy brain T1-T2 scan pairs
- Cross-dataset Setting: Training on IXI healthy data, testing on BraTS21 and MSLUB
- In-dataset Setting: Five-fold cross-validation on BraTS21 FLAIR and T1-CE
- Preprocessing: Resampling, skull stripping, registration
- DICE Coefficient: Measures segmentation accuracy
- AUPRC: Area under the Precision-Recall curve
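Both metrics can be illustrated on a handful of pixels. The sketch below uses flat lists of scores and binary labels; AUPRC is computed as step-wise average precision, which is one common estimator and an assumption here, since the summary does not say how the paper computes it. The 0.5 threshold is likewise only for illustration.

```python
def dice(pred, target):
    """DICE = 2 |P ∩ G| / (|P| + |G|) for binary masks given as 0/1 lists."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2 * intersection / total


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive pixel."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


scores = [0.9, 0.8, 0.3, 0.1]   # anomaly scores for four pixels
labels = [1, 0, 1, 0]           # ground-truth lesion mask
pred = [1 if s > 0.5 else 0 for s in scores]   # illustrative threshold

dice_value = dice(pred, labels)                # 2 * 1 / (2 + 2) = 0.5
ap_value = average_precision(scores, labels)   # (1/1 + 2/3) / 2 = 5/6
```

DICE depends on the chosen threshold, whereas average precision summarises the ranking of scores across all thresholds, which is why the two are reported together.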
Nine baseline methods: Thresh, AE, VAE, SVAE, DAE, f-AnoGAN, DDPM, mDDPM, and pDDPM.
- Optimizer: Adam, learning rate 1e-4, batch size 32
- Training epochs: 1,600
- Noise levels: 500 for BraTS21 (T2), 750 for others
- Post-processing: Median filtering (kernel size 5) + brain mask erosion (3 iterations)
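The post-processing step listed above (median filtering followed by brain mask erosion) can be sketched without external libraries. The border handling (edge replication) and the 4-connected erosion neighbourhood are assumptions, as the summary specifies only the kernel size and the number of iterations.

```python
from statistics import median


def median_filter(img, k=5):
    """k x k median filter with edge replication at the borders."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in range(-r, r + 1) for dj in range(-r, r + 1)]
            out[i][j] = median(window)
    return out


def erode(mask, iterations=3):
    """Binary erosion with a 4-connected cross; outside counts as background."""
    h, w = len(mask), len(mask[0])
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        mask = [[1 if mask[i][j] and all(
                     0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj]
                     for di, dj in neighbours) else 0
                 for j in range(w)] for i in range(h)]
    return mask


def postprocess(score, brain_mask):
    smooth = median_filter(score, 5)
    core = erode(brain_mask, 3)
    return [[s if m else 0.0 for s, m in zip(rs, rm)]
            for rs, rm in zip(smooth, core)]


# Toy 9x9 score map: a 5x5 high-score blob plus one isolated spike.
score = [[1.0 if 2 <= i <= 6 and 2 <= j <= 6 else 0.0 for j in range(9)]
         for i in range(9)]
score[1][7] = 1.0
brain = [[1] * 9 for _ in range(9)]

core_mask = erode(brain, 3)
cleaned = postprocess(score, brain)
```

In this toy example the isolated spike is removed by the median filter, and three erosion iterations shrink the 9×9 mask to its central 3×3 core, which suppresses responses near the brain boundary.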
Results on T2 modality in cross-dataset setting:
| Method | BraTS21 (T2) DICE % | BraTS21 (T2) AUPRC % | MSLUB (T2) DICE % | MSLUB (T2) AUPRC % |
|---|---|---|---|---|
| pDDPM | 49.41±0.66 | 54.76±0.83 | 10.65±1.05 | 10.37±0.51 |
| pDDPM-IQA | 59.45±0.37 | 62.99±0.37 | 12.93±0.67 | 11.51±0.50 |
| Relative Improvement | +20.32% | +15.03% | +21.41% | +10.99% |
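The relative improvement row follows from the mean values in the two rows above it, computed as (improved − baseline) / baseline. The short check below reproduces the four percentages from the reported means; the dictionary keys are just labels chosen for this example.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain in percent: (improved - baseline) / baseline * 100."""
    return (improved - baseline) / baseline * 100


# Mean values from the table: (pDDPM, pDDPM-IQA).
results = {
    "BraTS21 DICE": (49.41, 59.45),
    "BraTS21 AUPRC": (54.76, 62.99),
    "MSLUB DICE": (10.65, 12.93),
    "MSLUB AUPRC": (10.37, 11.51),
}
gains = {name: round(relative_improvement(base, new), 2)
         for name, (base, new) in results.items()}
```

The computed gains are 20.32, 15.03, 21.41, and 10.99 percent, matching the table.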
pDDPM-IQA achieves significant improvements across multiple modalities including BraTS T1, MSLUB T1, BraTS FLAIR, and T1-CE (p < 0.05).
- LFQ Only: Shows significant improvement over baseline
- LFQ + AIR: Further performance enhancement
- Both components work synergistically for optimal results
Applying the IQA method to the DDPM baseline (DDPM-IQA) achieves consistent performance improvements across all tested datasets and modalities.
Sensitivity analysis of the α parameter shows that the method remains robust across α values: the default α = 0.84 performs well even though it is not tuned to be optimal for each setting.
Figure 3 presents qualitative results, showing that pDDPM-IQA generates more precise anomaly maps with clearer tumor localization, sharper boundaries, and fewer false positives compared to other methods.
- Autoencoder Methods: AE, VAE suffer from reconstruction blur
- Improvement Strategies: Vector Quantized VAE, adversarial autoencoders, denoising autoencoders
- GAN Methods: AnoGAN, f-AnoGAN, but with stability issues
- Diffusion Models: anoDDPM, pDDPM, mDDPM and recent advances
- SSIM replacing ℓ2 loss in industrial defect detection
- Latent space SSIM loss
- Ensemble SSIM methods
First to combine SSIM with ℓ1 loss throughout the entire training and inference process in medical anomaly detection.
- IQA Perspective is Effective: Significantly improves anomaly detection performance from an image quality assessment perspective
- Fusion Strategy is Superior: Fusion quality loss combining SSIM and ℓ1 outperforms single metrics
- Data Transformation is Important: AIR-based transformation effectively amplifies differences between normal and anomalous regions
- Broad Applicability: The method is effective across multiple modalities and baselines
- Fixed Parameters: α = 0.84 is not optimized for different settings
- Transformation Specificity: AIR transformation is designed based on specific dataset statistics
- Computational Complexity: SSIM computation adds computational overhead
- Insufficient Theoretical Analysis: Lacks theoretical convergence analysis of fusion quality loss
- Novel Metrics Exploration: Research better anomaly capture metrics than current fusion quality loss
- Adaptive Weighting: Design mechanisms for dynamically adjusting α
- Theoretical Analysis: Provide theoretical guarantees for fusion loss
- Extended Applications: Generalize to other medical imaging tasks
- Novel Perspective: First systematic study of medical anomaly detection from an IQA perspective
- Simple and Effective Method: Well-designed fusion quality loss with straightforward implementation
- Comprehensive Experiments: Full validation across multiple datasets, modalities, and baselines
- Significant Performance Gains: Relative DICE improvements of about 20% on the T2 benchmarks, with practical value
- Good Generalization: Applicable to different architectures and modalities
- Weak Theoretical Foundation: Lacks in-depth theoretical analysis of why SSIM + ℓ1 combination is effective
- Subjective Parameter Selection: α = 0.84 selection lacks sufficient verification
- Missing Computational Cost Analysis: No reported additional computational time overhead
- AIR Transformation Limitations: Transformation strategy overly dependent on specific dataset statistics
- Incomplete Comparisons: Lacks comparison with other IQA metrics (e.g., LPIPS)
- Academic Value: Opens new research direction in medical anomaly detection
- Practical Value: Significant performance improvements with clinical application potential
- Method Generality: Generalizable to other medical imaging tasks
- Reproducibility: Code implementation provided for easy reproduction and extension
- Medical Anomaly Detection: Brain tumors, multiple sclerosis, and other disease detection
- Unsupervised Learning: Medical imaging tasks with scarce annotated data
- Quality Assessment: Medical image reconstruction quality evaluation
- Method Improvement: Performance enhancement of existing reconstruction-based methods
The paper cites 42 relevant references covering deep learning, medical image analysis, anomaly detection, image quality assessment, and other important works across multiple domains, providing a solid theoretical foundation for the research.
Overall Assessment: This is an innovative and practically valuable work in the field of medical anomaly detection. By introducing an IQA perspective and cleverly combining SSIM and ℓ1 loss, it achieves significant performance improvements across multiple datasets. Although there are certain limitations in theoretical analysis and parameter selection, its pioneering research approach and strong experimental results make it an important contribution to the field.