Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective

Pan, Xia, Yan et al.
Reconstruction-based methods, particularly those leveraging autoencoders, have been widely adopted for anomaly detection in brain MRI. Whereas most existing works try to improve accuracy through architectural or algorithmic innovations, we tackle the task from an image quality assessment (IQA) perspective, an under-explored direction in the field. Because conventional metrics such as ℓ1 are limited in capturing the nuanced differences in reconstructed images for medical anomaly detection, we propose fusion quality, a novel metric that integrates the structure-level sensitivity of the Structural Similarity Index Measure (SSIM) with the pixel-level precision of ℓ1. The metric offers a more comprehensive assessment of reconstruction quality, considering intensity (the subtractive property of ℓ1 and the divisive property of SSIM), contrast, and structural similarity. Furthermore, the proposed metric makes subtle regional variations more impactful in the final assessment. Considering the inherent divisive properties of SSIM, we also design an average intensity ratio (AIR)-based data transformation that amplifies the divisive discrepancies between normal and abnormal regions, thereby enhancing anomaly detection. By fusing these two components, we devise the IQA approach. Experimental results on two distinct brain MRI datasets show that our IQA approach significantly enhances medical anomaly detection performance when integrated with state-of-the-art baselines.
Basic Information

  • Paper ID: 2408.08228
  • Title: Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective
  • Authors: Zixuan Pan, Jun Xia, Zheyu Yan, Guoyue Xu, Yifan Qin, Xueyang Li, Yawen Wu, Zhenge Jia, Jianxu Chen, Yiyu Shi
  • Classification: eess.IV cs.CV
  • Publication Date: August 2024 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2408.08228

Abstract

This paper revisits the task of anomaly detection in brain MRI from the perspective of image quality assessment (IQA). Addressing the limitations of traditional ℓ1 loss in capturing subtle differences in reconstructed images, the authors propose a fusion quality metric that cleverly combines the structural-level sensitivity of the Structural Similarity Index (SSIM) with the pixel-level precision of ℓ1. This metric provides more comprehensive reconstruction quality assessment across three dimensions: intensity, contrast, and structural similarity. Furthermore, considering the inherent divisive nature of SSIM, a data transformation based on Average Intensity Ratio (AIR) is designed to amplify differences between normal and anomalous regions. Experimental results demonstrate that this IQA-based approach significantly improves medical anomaly detection performance.

Research Background and Motivation

Problem Definition

Brain MRI anomaly detection (e.g., tumor identification) is an important task in medical image analysis. Traditional supervised learning methods require large amounts of annotated data, while obtaining precise annotations of medical images (such as tumor segmentation masks) is both difficult and expensive.

Research Motivation

  1. Scarcity of Annotated Data: Medical image annotation requires professional expertise and is both costly and time-consuming
  2. Limitations of Existing Methods: Reconstruction-based anomaly detection methods primarily focus on architectural and algorithmic innovations while neglecting the importance of reconstruction quality assessment metrics
  3. Insufficient Evaluation Metrics: Traditional ℓ1 loss assumes pixel independence and ignores spatial relationships, making it difficult to capture subtle anomalies

Core Observation

As shown in Figure 1, even with identical reconstruction results, computing the anomaly map with SSIM identifies tumor regions better than computing it with ℓ1 loss. This observation motivates reconsidering anomaly detection from an IQA perspective.

Core Contributions

  1. First IQA Perspective: Introduces image quality assessment into medical anomaly detection and proposes a fusion quality loss
  2. Novel Evaluation Metric: Combines the advantages of SSIM and ℓ1 loss to provide more comprehensive reconstruction quality assessment
  3. Data Transformation Strategy: Designs an AIR-based transformation to amplify differences between normal and anomalous regions
  4. Significant Performance Improvement: 20.32% relative DICE improvement on BraTS21 T2 and 21.41% on MSLUB T2 over the pDDPM baseline
  5. Good Generalization: The method is applicable to different modalities and different baseline models

Methodology Details

Task Definition

Given a normal dataset $X^n = \{x^n_i \in X^n\}^N_{i=1}$, train a reconstruction model $f_\theta(\cdot)$:

$$\min_\theta \frac{1}{N}\sum_{i=1}^N L_{train}(x^n_i, \hat{x}^n_i), \quad \hat{x}^n_i = f_\theta(x^{n'}_i)$$

At test time, the anomaly score map is defined as:

$$\Lambda_j = L_{test}(x^a_j, \hat{x}^a_j), \quad \hat{x}^a_j = f^*_\theta(x^{a'}_j)$$

Fusion Quality Loss

SSIM Loss Design

SSIM assesses three dimensions: luminance, contrast, and structure:

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu^2_x + \mu^2_y + C_1}, \quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma^2_x + \sigma^2_y + C_2}, \quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

$$SSIM(x,y) = l(x,y) \cdot c(x,y) \cdot s(x,y)$$

Local SSIM loss:

$$L_{SSIM}(x, \hat{x}) = \frac{1 - \frac{1}{K}\sum^K_{k=1} SSIM(x_k, \hat{x}_k)}{2}$$

Fusion Quality Loss

Combining the advantages of SSIM and ℓ1 loss:

$$L_{FQ} = \alpha L_{SSIM} + (1-\alpha) L_{\ell_1}, \quad \alpha \in [0,1]$$

where α = 0.84, selected based on recommendations from prior research [21].
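The loss above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes images scaled to [0, 1], uniform 7×7 windows without padding, the common constants C1 = 0.01² and C2 = 0.03², and C3 = C2/2 (which collapses the contrast and structure terms into a single factor). The function names `ssim_map`, `fusion_quality_loss`, and `fusion_quality_map` are hypothetical, and averaging ℓ1 over the same window to obtain a per-location map is also an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim_map(x, y, win=7, c1=0.01**2, c2=0.03**2):
    """SSIM per sliding window (uniform weights, no padding), assuming C3 = C2 / 2."""
    xw = sliding_window_view(x, (win, win))
    yw = sliding_window_view(y, (win, win))
    mu_x = xw.mean(axis=(-2, -1))
    mu_y = yw.mean(axis=(-2, -1))
    var_x = xw.var(axis=(-2, -1))
    var_y = yw.var(axis=(-2, -1))
    cov = (xw * yw).mean(axis=(-2, -1)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return num / den


def fusion_quality_loss(x, x_hat, alpha=0.84, win=7):
    """Scalar L_FQ = alpha * L_SSIM + (1 - alpha) * L_l1."""
    l_ssim = (1.0 - ssim_map(x, x_hat, win).mean()) / 2.0
    l_l1 = np.abs(x - x_hat).mean()
    return float(alpha * l_ssim + (1.0 - alpha) * l_l1)


def fusion_quality_map(x, x_hat, alpha=0.84, win=7):
    """Per-window score map (assumption: l1 is averaged over the same window as SSIM)."""
    s = ssim_map(x, x_hat, win)
    l1_local = sliding_window_view(np.abs(x - x_hat), (win, win)).mean(axis=(-2, -1))
    return alpha * (1.0 - s) / 2.0 + (1.0 - alpha) * l1_local
```

With a perfect reconstruction the loss is zero up to floating-point error, and windows that lie inside a corrupted region receive higher scores than untouched windows.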

Average Intensity Ratio (AIR) Data Transformation

AIR Definition

$$AIR(X) = \frac{(\mu^a_X + \mu^n_X) + |\mu^a_X - \mu^n_X|}{(\mu^a_X + \mu^n_X) - |\mu^a_X - \mu^n_X|}$$

where $\mu^a_X$ and $\mu^n_X$ are the average pixel intensities of anomalous and normal regions, respectively.
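The definition simplifies to a ratio of the larger mean to the smaller one, because for positive a and b we have (a + b) + |a − b| = 2 max(a, b) and (a + b) − |a − b| = 2 min(a, b). A short worked derivation follows; the numeric values are illustrative assumptions, not dataset statistics.

```latex
\begin{aligned}
AIR(X) &= \frac{2\max(\mu^a_X, \mu^n_X)}{2\min(\mu^a_X, \mu^n_X)}
        = \frac{\max(\mu^a_X, \mu^n_X)}{\min(\mu^a_X, \mu^n_X)} \ge 1, \\
\text{e.g. } \mu^a_X = 0.6,\ \mu^n_X = 0.3:\quad
AIR(X) &= \frac{0.9 + 0.3}{0.9 - 0.3} = \frac{1.2}{0.6} = 2.
\end{aligned}
```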

Transformation Strategy

Based on statistical analysis of four modalities in the BraTS dataset:

  • $0 < \mu^n_X < \mu^a_X < 1$ holds across all modalities
  • $\mu^n_X > 0.5$ in T1, FLAIR, and T1-CE
  • $\mu^a_X < 0.5$ in T2

The transformation function is designed as:

$$p(x) = x \cdot I(\mu^n_X \le 0.5) + (1-x) \cdot I(0.5 < \mu^n_X)$$

This transformation ensures $AIR(\bar{X}) \ge AIR(X)$.
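A small numerical check can make the inequality concrete. For 0 < μ^n < μ^a < 1, inverting intensities changes the ratio from μ^a/μ^n to (1 − μ^n)/(1 − μ^a), which is at least as large exactly when μ^a + μ^n ≥ 1; this holds whenever μ^n > 0.5. The sketch below uses toy constant-intensity images; the helper names `air` and `air_transform` and the ground-truth mask used for the analysis are hypothetical.

```python
import numpy as np


def air(image, anomaly_mask):
    """Average intensity ratio: larger regional mean divided by the smaller one."""
    mu_a = image[anomaly_mask].mean()
    mu_n = image[~anomaly_mask].mean()
    return max(mu_a, mu_n) / min(mu_a, mu_n)


def air_transform(image, mu_n):
    """p(x) = x if mu_n <= 0.5, otherwise 1 - x (intensity inversion)."""
    return image if mu_n <= 0.5 else 1.0 - image


# Toy example: bright normal tissue (0.6) with a brighter anomaly (0.8).
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
bright = np.where(mask, 0.8, 0.6)
bright_t = air_transform(bright, mu_n=0.6)   # inverted because mu_n > 0.5

# Toy example: dark normal tissue (0.3) with a brighter anomaly (0.45).
dark = np.where(mask, 0.45, 0.3)
dark_t = air_transform(dark, mu_n=0.3)       # left unchanged because mu_n <= 0.5

print(air(bright, mask), air(bright_t, mask))
print(air(dark, mask), air(dark_t, mask))
```

In the first case the ratio rises from about 1.33 to about 2; in the second it is unchanged, so the transformed AIR is never lower than the original in either branch.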

Technical Innovations

  1. Multi-dimensional Quality Assessment: Fuses pixel-level (ℓ1) and structure-level (SSIM) information
  2. Implicit Regional Weighting: SSIM's divisive (ratio-based) terms make subtle regional variations weigh more heavily in the final assessment
  3. Data-Driven Preprocessing: Transformation strategy designed based on dataset statistical properties
  4. End-to-End Optimization: Unified use of fusion quality loss in both training and inference phases

Experimental Setup

Datasets

  1. BraTS21: 1,251 brain tumor MRI scans with four modalities (T1, T1-CE, T2, FLAIR)
  2. MSLUB: T1, T2, and FLAIR scans from 30 multiple sclerosis patients
  3. IXI: 560 healthy brain T1-T2 scan pairs

Experimental Configuration

  • Cross-dataset Setting: Training on IXI healthy data, testing on BraTS21 and MSLUB
  • In-dataset Setting: Five-fold cross-validation on BraTS21 FLAIR and T1-CE
  • Preprocessing: Resampling, skull stripping, registration

Evaluation Metrics

  • DICE Coefficient: Measures segmentation accuracy
  • AUPRC: Area under the Precision-Recall curve
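Both metrics can be sketched in a few lines of NumPy. The DICE function below operates on binary masks; the `average_precision` function is a step-wise approximation of AUPRC that assumes no tied scores and at least one positive pixel. The function names are hypothetical, and how the paper chooses the binarization threshold for DICE is not stated here, so thresholding is left to the caller.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """DICE = 2 |P ∩ T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


def average_precision(scores, labels):
    """Mean precision at the rank of each positive (step-wise AUPRC estimate)."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")        # highest score first
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())
```

For example, `dice(score_map > t, tumor_mask)` evaluates a thresholded anomaly map, where `score_map`, `t`, and `tumor_mask` are placeholders supplied by the caller.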

Baseline Methods

Nine baseline methods: Thresh, AE, VAE, SVAE, DAE, f-AnoGAN, DDPM, mDDPM, and pDDPM.

Implementation Details

  • Optimizer: Adam, learning rate 1e-4, batch size 32
  • Training epochs: 1,600
  • Noise levels: 500 for BraTS21 (T2), 750 for others
  • Post-processing: Median filtering (kernel size 5) + brain mask erosion (3 iterations)
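The post-processing step can be sketched with SciPy's `ndimage` module. The order of operations (filter first, then mask with the eroded brain mask) and the default cross-shaped erosion structure are assumptions, and `postprocess` is a hypothetical helper name; the summary only states the kernel size and the number of erosion iterations.

```python
import numpy as np
from scipy.ndimage import binary_erosion, median_filter


def postprocess(anomaly_map, brain_mask, kernel_size=5, erosion_iters=3):
    """Median-filter the anomaly map, then zero out everything outside the eroded mask."""
    smoothed = median_filter(anomaly_map, size=kernel_size)
    eroded = binary_erosion(np.asarray(brain_mask, dtype=bool), iterations=erosion_iters)
    return smoothed * eroded
```

Median filtering suppresses isolated high-score pixels, while eroding the brain mask discards the band near the brain boundary where reconstruction errors are often unrelated to pathology.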

Experimental Results

Main Results

Results on T2 modality in cross-dataset setting:

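| Method | BraTS21 (T2) DICE % | BraTS21 (T2) AUPRC % | MSLUB (T2) DICE % | MSLUB (T2) AUPRC % |
|---|---|---|---|---|
| pDDPM | 49.41±0.66 | 54.76±0.83 | 10.65±1.05 | 10.37±0.51 |
| pDDPM-IQA | 59.45±0.37 | 62.99±0.37 | 12.93±0.67 | 11.51±0.50 |
| Relative Improvement | +20.32% | +15.03% | +21.41% | +10.99% |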
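The relative improvement row follows directly from the mean values in the two rows above it, computed as (improved − baseline) / baseline. A short check, using only the means reported in the table (the dictionary layout and function name are illustrative):

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (pDDPM mean, pDDPM-IQA mean) taken from the table above.
results = {
    "BraTS21 T2 DICE": (49.41, 59.45),
    "BraTS21 T2 AUPRC": (54.76, 62.99),
    "MSLUB T2 DICE": (10.65, 12.93),
    "MSLUB T2 AUPRC": (10.37, 11.51),
}

for name, (base, iqa) in results.items():
    print(f"{name}: +{relative_improvement(base, iqa):.2f}%")
```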

Ablation Studies

Multi-modality Performance Verification

pDDPM-IQA achieves significant improvements across multiple modalities including BraTS T1, MSLUB T1, BraTS FLAIR, and T1-CE (p < 0.05).

Component Contribution Analysis

  • $L_{FQ}$ Only: Shows significant improvement over baseline
  • $L_{FQ}$ + AIR: Further performance enhancement
  • Both components work synergistically for optimal results

Generalization Verification

Applying the IQA method to the DDPM baseline (DDPM-IQA) achieves consistent performance improvements across all tested datasets and modalities.

Parameter Sensitivity

Sensitivity analysis of the α parameter shows that the method remains robust across α values, so the default α = 0.84, although not tuned to be optimal, still performs well.

Case Analysis

Figure 3 presents qualitative results, showing that pDDPM-IQA generates more precise anomaly maps with clearer tumor localization, sharper boundaries, and fewer false positives compared to other methods.

Related Work

Reconstruction-Based Anomaly Detection

  1. Autoencoder Methods: AE, VAE suffer from reconstruction blur
  2. Improvement Strategies: Vector Quantized VAE, adversarial autoencoders, denoising autoencoders
  3. GAN Methods: AnoGAN, f-AnoGAN, but with stability issues
  4. Diffusion Models: anoDDPM, pDDPM, mDDPM and recent advances

Evaluation Metrics Research

  • SSIM replacing ℓ2 loss in industrial defect detection
  • Latent space SSIM loss
  • Ensemble SSIM methods

Novel Contribution of This Work

First to combine SSIM with ℓ1 loss throughout the entire training and inference process in medical anomaly detection.

Conclusions and Discussion

Main Conclusions

  1. IQA Perspective is Effective: Significantly improves anomaly detection performance from an image quality assessment perspective
  2. Fusion Strategy is Superior: Fusion quality loss combining SSIM and ℓ1 outperforms single metrics
  3. Data Transformation is Important: AIR-based transformation effectively amplifies differences between normal and anomalous regions
  4. Broad Applicability: The method is effective across multiple modalities and baselines

Limitations

  1. Fixed Parameters: α = 0.84 is not optimized for different settings
  2. Transformation Specificity: AIR transformation is designed based on specific dataset statistics
  3. Computational Complexity: SSIM computation adds computational overhead
  4. Insufficient Theoretical Analysis: Lacks theoretical convergence analysis of fusion quality loss

Future Directions

  1. Novel Metrics Exploration: Explore metrics that capture anomalies better than the current fusion quality loss
  2. Adaptive Weighting: Design mechanisms for dynamically adjusting α
  3. Theoretical Analysis: Provide theoretical guarantees for fusion loss
  4. Extended Applications: Generalize to other medical imaging tasks

In-Depth Evaluation

Strengths

  1. Novel Perspective: First systematic study of medical anomaly detection from an IQA perspective
  2. Simple and Effective Method: Well-designed fusion quality loss with straightforward implementation
  3. Comprehensive Experiments: Full validation across multiple datasets, modalities, and baselines
  4. Significant Performance Gains: Relative improvements of about 11% to 21% over pDDPM on T2 (see Main Results), with practical value
  5. Good Generalization: Applicable to different architectures and modalities

Weaknesses

  1. Weak Theoretical Foundation: Lacks in-depth theoretical analysis of why SSIM + ℓ1 combination is effective
  2. Subjective Parameter Selection: α = 0.84 selection lacks sufficient verification
  3. Missing Computational Cost Analysis: No reported additional computational time overhead
  4. AIR Transformation Limitations: Transformation strategy overly dependent on specific dataset statistics
  5. Incomplete Comparisons: Lacks comparison with other IQA metrics (e.g., LPIPS)

Impact

  1. Academic Value: Opens new research direction in medical anomaly detection
  2. Practical Value: Significant performance improvements with clinical application potential
  3. Method Generality: Generalizable to other medical imaging tasks
  4. Reproducibility: Code implementation provided for easy reproduction and extension

Applicable Scenarios

  1. Medical Anomaly Detection: Brain tumors, multiple sclerosis, and other disease detection
  2. Unsupervised Learning: Medical imaging tasks with scarce annotated data
  3. Quality Assessment: Medical image reconstruction quality evaluation
  4. Method Improvement: Performance enhancement of existing reconstruction-based methods

References

The paper cites 42 relevant references covering deep learning, medical image analysis, anomaly detection, image quality assessment, and other important works across multiple domains, providing a solid theoretical foundation for the research.


Overall Assessment: This is an innovative and practically valuable work in the field of medical anomaly detection. By introducing an IQA perspective and cleverly combining SSIM and ℓ1 loss, it achieves significant performance improvements across multiple datasets. Although there are certain limitations in theoretical analysis and parameter selection, its pioneering research approach and strong experimental results make it an important contribution to the field.