Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective

Pan, Xia, Yan et al.

Reconstruction-based methods, particularly those leveraging autoencoders, have been widely adopted for the anomaly detection task in brain MRI. Unlike most existing works, which try to improve task accuracy through architectural or algorithmic innovations, we tackle this task from an image quality assessment (IQA) perspective, an under-explored direction in the field. Because conventional metrics such as ℓ1 are limited in capturing the nuanced differences in reconstructed images for medical anomaly detection, we propose fusion quality, a novel metric that integrates the structure-level sensitivity of the Structural Similarity Index Measure (SSIM) with the pixel-level precision of ℓ1. The metric offers a more comprehensive assessment of reconstruction quality, considering intensity (the subtractive property of ℓ1 and the divisive property of SSIM), contrast, and structural similarity. Furthermore, the proposed metric makes subtle regional variations more impactful in the final assessment. Considering the inherent divisive properties of SSIM, we also design an average intensity ratio (AIR)-based data transformation that amplifies the divisive discrepancies between normal and abnormal regions, thereby enhancing anomaly detection. By fusing these two components, we devise the IQA approach. Experimental results on two distinct brain MRI datasets show that our IQA approach significantly enhances medical anomaly detection performance when integrated with state-of-the-art baselines.

Basic Information

Paper ID: 2408.08228
Title: Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective
Authors: Zixuan Pan, Jun Xia, Zheyu Yan, Guoyue Xu, Yifan Qin, Xueyang Li, Yawen Wu, Zhenge Jia, Jianxu Chen, Yiyu Shi
Classification: eess.IV cs.CV
Publication Date: August 2024 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2408.08228

Abstract

This paper revisits the task of anomaly detection in brain MRI from the perspective of image quality assessment (IQA). Addressing the limitations of traditional ℓ1 loss in capturing subtle differences in reconstructed images, the authors propose a fusion quality metric that cleverly combines the structural-level sensitivity of the Structural Similarity Index (SSIM) with the pixel-level precision of ℓ1. This metric provides more comprehensive reconstruction quality assessment across three dimensions: intensity, contrast, and structural similarity. Furthermore, considering the inherent divisive nature of SSIM, a data transformation based on Average Intensity Ratio (AIR) is designed to amplify differences between normal and anomalous regions. Experimental results demonstrate that this IQA-based approach significantly improves medical anomaly detection performance.

Research Background and Motivation

Problem Definition

Brain MRI anomaly detection (e.g., tumor identification) is an important task in medical image analysis. Traditional supervised learning methods require large amounts of annotated data, while obtaining precise annotations of medical images (such as tumor segmentation masks) is both difficult and expensive.

Research Motivation

Scarcity of Annotated Data: Medical image annotation requires professional expertise and is both costly and time-consuming
Limitations of Existing Methods: Reconstruction-based anomaly detection methods primarily focus on architectural and algorithmic innovations while neglecting the importance of reconstruction quality assessment metrics
Insufficient Evaluation Metrics: Traditional ℓ1 loss assumes pixel independence and ignores spatial relationships, making it difficult to capture subtle anomalies

Core Observation

As shown in Figure 1, even with identical reconstruction results, anomaly maps computed with SSIM identify tumor regions better than those computed with ℓ1 loss. This observation motivates reconsidering anomaly detection from an IQA perspective.

Core Contributions

First IQA Perspective: Introduces image quality assessment into medical anomaly detection and proposes a fusion quality loss
Novel Evaluation Metric: Combines the advantages of SSIM and ℓ1 loss to provide more comprehensive reconstruction quality assessment
Data Transformation Strategy: Designs an AIR-based transformation to amplify differences between normal and anomalous regions
Significant Performance Improvement: 20.32% relative DICE improvement on BraTS21 T2 and 21.41% on MSLUB T2 over the pDDPM baseline
Good Generalization: The method is applicable to different modalities and different baseline models

Methodology Details

Task Definition

Given a normal dataset $X^n = \{x^n_i \in X^n\}^N_{i=1}$, train a reconstruction model $f_θ(·)$ by solving: $\min_θ \frac{1}{N}\sum_{i=1}^N L_{train}(x^n_i, \hat{x}^n_i), \quad \hat{x}^n_i = f_θ(x^{n'}_i)$

At test time, the anomaly score map is defined as: $Λ_j = L_{test}(x^a_j, \hat{x}^a_j), \quad \hat{x}^a_j = f^*_θ(x^{a'}_j)$

Fusion Quality Loss

SSIM Loss Design

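SSIM assesses three dimensions, luminance, contrast, and structure: $l(x,y) = \frac{2μ_xμ_y + C_1}{μ^2_x + μ^2_y + C_1}, \quad c(x,y) = \frac{2σ_xσ_y + C_2}{σ^2_x + σ^2_y + C_2}, \quad s(x,y) = \frac{σ_{xy} + C_3}{σ_xσ_y + C_3}$

Here $μ_x, μ_y$ are the local means, $σ_x, σ_y$ the local standard deviations, $σ_{xy}$ the local covariance, and $C_1, C_2, C_3$ small constants that stabilize the divisions. The three terms are multiplied:

$SSIM(x,y) = l(x,y) · c(x,y) · s(x,y)$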

Local SSIM loss, averaged over $K$ local windows $x_k$ and $\hat{x}_k$: $L_{SSIM}(x, \hat{x}) = \frac{1-\frac{1}{K}\sum^K_{k=1}SSIM(x_k, \hat{x}_k)}{2}$
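As a rough illustration of the local SSIM loss, the following plain-Python sketch slides a square window over two small images, multiplies the three terms per window, and averages the result. The window size, the stride of 1, the uniform (non-Gaussian) weighting, and the constants `C1`, `C2`, `C3` are assumptions chosen for readability, not settings confirmed by the paper.

```python
# Assumed stabilizing constants (common defaults for intensities in [0, 1]).
C1, C2 = 0.01 ** 2, 0.03 ** 2
C3 = C2 / 2


def mean(values):
    return sum(values) / len(values)


def ssim_window(x, y):
    """SSIM of two flat, equally sized windows: l(x, y) * c(x, y) * s(x, y)."""
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    sd_x, sd_y = var_x ** 0.5, var_y ** 0.5
    luminance = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    contrast = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    structure = (cov + C3) / (sd_x * sd_y + C3)
    return luminance * contrast * structure


def windows(image, size):
    """Yield every size x size patch (stride 1) as a flat list."""
    height, width = len(image), len(image[0])
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            yield [image[r][c]
                   for r in range(top, top + size)
                   for c in range(left, left + size)]


def ssim_loss(x, x_hat, size=3):
    """(1 - mean local SSIM) / 2 over all windows."""
    scores = [ssim_window(p, q)
              for p, q in zip(windows(x, size), windows(x_hat, size))]
    return (1 - mean(scores)) / 2


# Toy 4x4 "scan" with a bright 2x2 blob, and a flat reconstruction that misses it.
scan = [[0.2, 0.2, 0.2, 0.2],
        [0.2, 0.8, 0.8, 0.2],
        [0.2, 0.8, 0.8, 0.2],
        [0.2, 0.2, 0.2, 0.2]]
flat = [[0.2] * 4 for _ in range(4)]
print(ssim_loss(scan, scan))  # close to 0: perfect reconstruction
print(ssim_loss(scan, flat))  # much larger: the blob was not reconstructed
```

In the toy case the flat reconstruction has zero local variance, so the contrast term collapses and the loss is large even though most pixels match exactly. This is the kind of structural sensitivity the summary attributes to SSIM.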

Fusion Quality Loss

Combining the advantages of SSIM and ℓ1 loss: $L_{FQ} = αL_{SSIM} + (1-α)L_{ℓ1}, \quad α ∈ [0,1]$

where α = 0.84, selected based on recommendations from prior research [21].
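The weighting can be sketched in a few lines of Python. The helper names `l1_loss` and `fusion_quality` are hypothetical, and the SSIM loss value is a placeholder assumed to be computed elsewhere; only the convex combination and the default α = 0.84 come from the text above.

```python
def l1_loss(x, x_hat):
    """Mean absolute error over two flat lists of pixel intensities."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def fusion_quality(l_ssim, l_l1, alpha=0.84):
    """L_FQ = alpha * L_SSIM + (1 - alpha) * L_l1, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l_ssim + (1.0 - alpha) * l_l1


x = [0.2, 0.8, 0.8, 0.2]
x_hat = [0.2, 0.6, 0.8, 0.2]
l_l1 = l1_loss(x, x_hat)  # approximately 0.05
l_ssim = 0.30             # placeholder value, assumed to come from an SSIM loss
print(fusion_quality(l_ssim, l_l1))  # approximately 0.84 * 0.30 + 0.16 * 0.05 = 0.26
```

With α = 0.84 the SSIM term dominates, while the ℓ1 term still contributes a pixel-level correction.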

Average Intensity Ratio (AIR) Data Transformation

AIR Definition

$AIR(X) = \frac{(μ^a_X + μ^n_X) + |μ^a_X - μ^n_X|}{(μ^a_X + μ^n_X) - |μ^a_X - μ^n_X|}$

where $μ^a_X$ and $μ^n_X$ are the average pixel intensities of anomalous and normal regions, respectively.
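Because $(a + b) + |a - b| = 2\max(a, b)$ and $(a + b) - |a - b| = 2\min(a, b)$, AIR is simply the larger of the two region means divided by the smaller, so it is always at least 1. A minimal Python sketch follows; it assumes a flat list of intensities with a binary anomaly mask, and the helper names are hypothetical. The summary defines AIR at dataset level, so pooling the pixels of a single image here is a simplification.

```python
def region_means(image, mask):
    """Return (mu_a, mu_n): mean intensity of anomalous and normal pixels."""
    anomalous = [v for v, m in zip(image, mask) if m]
    normal = [v for v, m in zip(image, mask) if not m]
    return sum(anomalous) / len(anomalous), sum(normal) / len(normal)


def air(image, mask):
    """Average intensity ratio, equal to max(mu_a, mu_n) / min(mu_a, mu_n)."""
    mu_a, mu_n = region_means(image, mask)
    total, gap = mu_a + mu_n, abs(mu_a - mu_n)
    return (total + gap) / (total - gap)


image = [0.2, 0.2, 0.8, 0.8]
mask = [0, 0, 1, 1]        # 1 marks anomalous pixels
print(air(image, mask))    # approximately 0.8 / 0.2 = 4
```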

Transformation Strategy

Based on statistical analysis of four modalities in the BraTS dataset:

$0 < μ^n_X < μ^a_X < 1$ holds across all modalities
$μ^n_X > 0.5$ in T1, FLAIR, and T1-CE
$μ^a_X < 0.5$ in T2

The transformation function is designed as: $p(x) = x · I(μ^n_X ≤ 0.5) + (1-x) · I(0.5 < μ^n_X)$

This transformation ensures $AIR(\bar{X}) ≥ AIR(X)$, where $\bar{X}$ denotes the transformed data $p(X)$.
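The inequality can be checked directly. For $0 < μ^n_X < μ^a_X < 1$ we have $AIR(X) = μ^a_X / μ^n_X$, and intensity inversion gives $AIR(1 - X) = (1 - μ^n_X)/(1 - μ^a_X)$. The inverted ratio is at least as large exactly when $(μ^a_X - μ^n_X)(μ^a_X + μ^n_X - 1) ≥ 0$, that is, when $μ^a_X + μ^n_X ≥ 1$, which holds whenever $μ^n_X > 0.5$. The sketch below illustrates this on toy data; the helper names and the flat-list image representation are assumptions.

```python
def air_from_means(mu_a, mu_n):
    """AIR computed from the two region means."""
    total, gap = mu_a + mu_n, abs(mu_a - mu_n)
    return (total + gap) / (total - gap)


def region_means(image, mask):
    """Return (mu_a, mu_n) for a flat image and a binary anomaly mask."""
    anomalous = [v for v, m in zip(image, mask) if m]
    normal = [v for v, m in zip(image, mask) if not m]
    return sum(anomalous) / len(anomalous), sum(normal) / len(normal)


def transform(image, mu_n):
    """p(x) = x if mu_n <= 0.5, else 1 - x (intensity inversion)."""
    if mu_n <= 0.5:
        return list(image)
    return [1.0 - v for v in image]


mask = [0, 0, 1, 1]
bright = [0.6, 0.6, 0.9, 0.9]   # mu_n = 0.6 > 0.5, so the data is inverted
dark = [0.2, 0.2, 0.4, 0.4]     # mu_n = 0.2 <= 0.5, so the data is unchanged

for image in (bright, dark):
    mu_a, mu_n = region_means(image, mask)
    before = air_from_means(mu_a, mu_n)
    after = air_from_means(*region_means(transform(image, mu_n), mask))
    print(before, after)  # after is never smaller than before
```

In the first toy case the ratio grows from about 1.5 to about 4; in the second the data is left untouched, so the ratio stays at about 2.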

Technical Innovations

Multi-dimensional Quality Assessment: Fuses pixel-level (ℓ1) and structure-level (SSIM) information
Adaptive Weighting Mechanism: SSIM's divisive nature makes structural relationships more important
Data-Driven Preprocessing: Transformation strategy designed based on dataset statistical properties
End-to-End Optimization: Unified use of fusion quality loss in both training and inference phases

Experimental Setup

Datasets

BraTS21: 1,251 brain tumor MRI scans with four modalities (T1, T1-CE, T2, FLAIR)
MSLUB: 30 multiple sclerosis patients' T1, T2, FLAIR scans
IXI: 560 healthy brain T1-T2 scan pairs

Experimental Configuration

Cross-dataset Setting: Training on IXI healthy data, testing on BraTS21 and MSLUB
In-dataset Setting: Five-fold cross-validation on BraTS21 FLAIR and T1-CE
Preprocessing: Resampling, skull stripping, registration

Evaluation Metrics

DICE Coefficient: Measures segmentation accuracy
AUPRC: Area under the Precision-Recall curve
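For reference, both metrics can be computed from a flattened anomaly score map and a binary ground-truth mask. The sketch below is an assumption-laden illustration: it uses the step-wise average-precision estimate of the precision-recall area, does not treat tied scores specially, and leaves the DICE threshold as a free parameter. The summary does not state how the threshold is chosen.

```python
def dice(pred, target):
    """DICE = 2 |P ∩ G| / (|P| + |G|) for flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(pred) + sum(target)
    return 2 * intersection / total if total else 1.0


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / positives


scores = [0.9, 0.8, 0.3, 0.1]   # anomaly scores per pixel
labels = [1, 0, 1, 0]           # ground-truth anomaly mask
threshold = 0.5                 # hypothetical operating point
pred = [int(s >= threshold) for s in scores]
print(dice(pred, labels))                 # 2 * 1 / (2 + 2) = 0.5
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2, about 0.833
```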

Baseline Methods

Nine baseline methods: Thresh, AE, VAE, SVAE, DAE, f-AnoGAN, DDPM, mDDPM, and pDDPM.

Implementation Details

Optimizer: Adam, learning rate 1e-4, batch size 32
Training epochs: 1,600
Noise levels: 500 for BraTS21 (T2), 750 for others
Post-processing: Median filtering (kernel size 5) + brain mask erosion (3 iterations)
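The post-processing step can be sketched in plain Python as follows. Only the kernel size of 5 and the 3 erosion iterations come from the list above. The border handling, the 4-connected structuring element, and the order of operations (filter first, then mask) are assumptions made for illustration.

```python
from statistics import median


def median_filter(image, kernel=5):
    """2D median filter; windows are truncated at the image border (assumed)."""
    height, width = len(image), len(image[0])
    radius = kernel // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            patch = [image[r][c]
                     for r in range(max(0, i - radius), min(height, i + radius + 1))
                     for c in range(max(0, j - radius), min(width, j + radius + 1))]
            out[i][j] = median(patch)
    return out


def erode(mask, iterations=3):
    """Binary erosion with a 4-connected cross; outside the image counts as 0."""
    height, width = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [[int(bool(mask[i][j])
                     and 0 < i < height - 1 and 0 < j < width - 1
                     and mask[i - 1][j] and mask[i + 1][j]
                     and mask[i][j - 1] and mask[i][j + 1])
                 for j in range(width)]
                for i in range(height)]
    return mask


def postprocess(anomaly_map, brain_mask, kernel=5, iterations=3):
    """Smooth the anomaly map, then keep only pixels inside the eroded mask."""
    smoothed = median_filter(anomaly_map, kernel)
    eroded = erode(brain_mask, iterations)
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(smoothed, eroded)]


# A single-pixel spike is removed by the median filter.
spike = [[0.0] * 9 for _ in range(9)]
spike[4][4] = 1.0
full_mask = [[1] * 9 for _ in range(9)]
print(postprocess(spike, full_mask)[4][4])  # 0.0
```

Eroding the brain mask suppresses the high residuals that reconstruction methods tend to produce at the brain boundary, and the median filter removes isolated single-pixel responses.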

Experimental Results

Main Results

Results on T2 modality in cross-dataset setting:

Method	BraTS21 (T2) DICE %	BraTS21 (T2) AUPRC %	MSLUB (T2) DICE %	MSLUB (T2) AUPRC %
pDDPM	49.41±0.66	54.76±0.83	10.65±1.05	10.37±0.51
pDDPM-IQA	59.45±0.37	62.99±0.37	12.93±0.67	11.51±0.50
Relative Improvement	+20.32%	+15.03%	+21.41%	+10.99%
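The relative improvements in the last row follow from the mean scores as (improved − baseline) / baseline. A short Python check, using only the means from the table and a hypothetical helper name:

```python
def relative_improvement(baseline, improved):
    """Relative gain over the baseline, in percent."""
    return (improved - baseline) / baseline * 100


# (pDDPM mean, pDDPM-IQA mean) taken from the table above.
results = {
    "BraTS21 T2 DICE": (49.41, 59.45),
    "BraTS21 T2 AUPRC": (54.76, 62.99),
    "MSLUB T2 DICE": (10.65, 12.93),
    "MSLUB T2 AUPRC": (10.37, 11.51),
}
for name, (baseline, improved) in results.items():
    print(f"{name}: +{relative_improvement(baseline, improved):.2f}%")
```

Note that these are relative gains. The absolute DICE gain on BraTS21 T2 is about 10 percentage points.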

Ablation Studies

Multi-modality Performance Verification

pDDPM-IQA achieves significant improvements across multiple modalities including BraTS T1, MSLUB T1, BraTS FLAIR, and T1-CE (p < 0.05).

Component Contribution Analysis

$L_{FQ}$ Only: Shows significant improvement over baseline
$L_{FQ}$ + AIR: Further performance enhancement
Both components work synergistically for optimal results

Generalization Verification

Applying the IQA method to the DDPM baseline (DDPM-IQA) achieves consistent performance improvements across all tested datasets and modalities.

Parameter Sensitivity

Sensitivity analysis of the α parameter shows that performance remains robust even though the default α = 0.84 is not necessarily optimal.

Case Analysis

Figure 3 presents qualitative results, showing that pDDPM-IQA generates more precise anomaly maps with clearer tumor localization, sharper boundaries, and fewer false positives compared to other methods.

Related Work

Reconstruction-Based Anomaly Detection

Autoencoder Methods: AE, VAE suffer from reconstruction blur
Improvement Strategies: Vector Quantized VAE, adversarial autoencoders, denoising autoencoders
GAN Methods: AnoGAN, f-AnoGAN, but with stability issues
Diffusion Models: anoDDPM, pDDPM, mDDPM and recent advances

Evaluation Metrics Research

SSIM replacing ℓ2 loss in industrial defect detection
Latent space SSIM loss
Ensemble SSIM methods

Novel Contribution of This Work

First to combine SSIM with ℓ1 loss throughout the entire training and inference process in medical anomaly detection.

Conclusions and Discussion

Main Conclusions

IQA Perspective is Effective: Significantly improves anomaly detection performance from an image quality assessment perspective
Fusion Strategy is Superior: Fusion quality loss combining SSIM and ℓ1 outperforms single metrics
Data Transformation is Important: AIR-based transformation effectively amplifies differences between normal and anomalous regions
Broad Applicability: The method is effective across multiple modalities and baselines

Limitations

Fixed Parameters: α = 0.84 is not optimized for different settings
Transformation Specificity: AIR transformation is designed based on specific dataset statistics
Computational Complexity: SSIM computation adds computational overhead
Insufficient Theoretical Analysis: Lacks theoretical convergence analysis of fusion quality loss

Future Directions

Novel Metrics Exploration: Investigate metrics that capture anomalies better than the current fusion quality loss
Adaptive Weighting: Design mechanisms for dynamically adjusting α
Theoretical Analysis: Provide theoretical guarantees for fusion loss
Extended Applications: Generalize to other medical imaging tasks

In-Depth Evaluation

Strengths

Novel Perspective: First systematic study of medical anomaly detection from an IQA perspective
Simple and Effective Method: Well-designed fusion quality loss with straightforward implementation
Comprehensive Experiments: Full validation across multiple datasets, modalities, and baselines
Significant Performance Gains: Relative DICE improvements of about 20% on T2 over pDDPM (20.32% on BraTS21, 21.41% on MSLUB), which is of practical value
Good Generalization: Applicable to different architectures and modalities

Weaknesses

Weak Theoretical Foundation: Lacks in-depth theoretical analysis of why the SSIM + ℓ1 combination is effective
Subjective Parameter Selection: The choice of α = 0.84 lacks sufficient verification
Missing Computational Cost Analysis: The additional computation time is not reported
AIR Transformation Limitations: The transformation strategy depends heavily on specific dataset statistics
Incomplete Comparisons: Lacks comparison with other IQA metrics (e.g., LPIPS)

Impact

Academic Value: Opens new research direction in medical anomaly detection
Practical Value: Significant performance improvements with clinical application potential
Method Generality: Generalizable to other medical imaging tasks
Reproducibility: Code implementation provided for easy reproduction and extension

Applicable Scenarios

Medical Anomaly Detection: Brain tumors, multiple sclerosis, and other disease detection
Unsupervised Learning: Medical imaging tasks with scarce annotated data
Quality Assessment: Medical image reconstruction quality evaluation
Method Improvement: Performance enhancement of existing reconstruction-based methods

References

The paper cites 42 relevant references covering deep learning, medical image analysis, anomaly detection, image quality assessment, and other important works across multiple domains, providing a solid theoretical foundation for the research.

Overall Assessment: This is an innovative and practically valuable work in the field of medical anomaly detection. By introducing an IQA perspective and cleverly combining SSIM and ℓ1 loss, it achieves significant performance improvements across multiple datasets. Although there are certain limitations in theoretical analysis and parameter selection, its pioneering research approach and strong experimental results make it an important contribution to the field.