2025-11-24T19:34:16.534360

Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective

Pan, Xia, Yan et al.

Reconstruction-based methods, particularly those leveraging autoencoders, have been widely adopted for anomaly detection in brain MRI. Whereas most existing works try to improve task accuracy through architectural or algorithmic innovations, we tackle the task from an image quality assessment (IQA) perspective, an under-explored direction in the field. Because conventional metrics such as ℓ1 are limited in capturing the nuanced differences in reconstructed images for medical anomaly detection, we propose fusion quality, a novel metric that integrates the structure-level sensitivity of the Structural Similarity Index Measure (SSIM) with the pixel-level precision of ℓ1. The metric offers a more comprehensive assessment of reconstruction quality, considering intensity (the subtractive property of ℓ1 and the divisive property of SSIM), contrast, and structural similarity. Furthermore, the proposed metric makes subtle regional variations more impactful in the final assessment. Considering the inherent divisive property of SSIM, we also design an average intensity ratio (AIR)-based data transformation that amplifies the divisive discrepancies between normal and abnormal regions, thereby enhancing anomaly detection. Fusing these two components yields our IQA approach. Experimental results on two distinct brain MRI datasets show that our IQA approach significantly enhances medical anomaly detection performance when integrated with state-of-the-art baselines.

Basic Information

Paper ID: 2408.08228
Title: Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective
Authors: Zixuan Pan, Jun Xia, Zheyu Yan, Guoyue Xu, Yifan Qin, Xueyang Li, Yawen Wu, Zhenge Jia, Jianxu Chen, Yiyu Shi
Categories: eess.IV cs.CV
Published: August 2024 (arXiv preprint)
Paper link: https://arxiv.org/abs/2408.08228

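Summary

This paper revisits anomaly detection in brain MRI from an image quality assessment (IQA) perspective. To address the limitations of the conventional ℓ1 loss in capturing subtle differences in reconstructed images, it proposes the fusion quality metric, which combines the structure-level sensitivity of the Structural Similarity Index Measure (SSIM) with the pixel-level precision of ℓ1. The metric assesses reconstruction quality along three dimensions: intensity, contrast, and structural similarity. In addition, because SSIM is inherently divisive, the authors design an average intensity ratio (AIR)-based data transformation that amplifies the differences between normal and abnormal regions. Experiments show that the IQA approach significantly improves medical anomaly detection performance.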

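Background and Motivation

Problem Definition

Anomaly detection in brain MRI (for example, tumor identification) is an important task in medical image analysis. Conventional supervised methods need large amounts of labeled data, yet precise annotations for medical images (such as tumor segmentation masks) are difficult and expensive to obtain.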

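Motivation

Scarce labeled data: annotating medical images requires expert knowledge and is costly and time-consuming
Limits of existing methods: reconstruction-based anomaly detection has focused on architectural and algorithmic innovation and overlooked the importance of the metric used to assess reconstruction quality
Inadequate evaluation metrics: the conventional ℓ1 loss treats pixels as independent and ignores spatial relationships, so it struggles to capture subtle anomalies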

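Key Observation

As Figure 1 of the paper shows, even for the same reconstruction, an anomaly map computed with SSIM identifies the tumor region better than one computed with the ℓ1 loss. This observation motivates rethinking anomaly detection from an IQA perspective.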

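Main Contributions

An IQA perspective: brings image quality assessment, an under-explored direction, into medical anomaly detection and proposes the fusion quality loss
A new evaluation metric: combines the strengths of SSIM and the ℓ1 loss for a more comprehensive assessment of reconstruction quality
Data transformation strategy: designs an AIR-based transformation that amplifies the differences between normal and abnormal regions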
Substantial performance gains: relative to the pDDPM baseline, DICE improves by 20.32% on BraTS21 (T2) and by 21.41% on MSLUB (T2)
Good generalization: the approach applies to different modalities and different baseline models

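Method in Detail

Task Definition

Given a dataset of normal images $X^n = \{x^n_i\}_{i=1}^N$, a reconstruction model $f_\theta(\cdot)$ is trained by solving: $\min_\theta \frac{1}{N}\sum_{i=1}^N L_{train}(x^n_i, \hat{x}^n_i), \quad \hat{x}^n_i = f_\theta(x^{n'}_i)$

At test time, the anomaly score map is defined as: $\Lambda_j = L_{test}(x^a_j, \hat{x}^a_j), \quad \hat{x}^a_j = f^*_\theta(x^{a'}_j)$, where $f^*_\theta$ is the trained model and the primed symbols $x^{n'}_i$ and $x^{a'}_j$ denote the inputs fed to the model for $x^n_i$ and $x^a_j$.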
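As a minimal sketch of the test-time step, the anomaly score map can be computed pixel by pixel once a reconstruction is available. The function name `anomaly_map`, the nested-list image format, and the default choice of an absolute difference for $L_{test}$ are assumptions made for illustration; the reconstruction model itself is not shown.

```python
def anomaly_map(image, reconstruction, pixel_loss=lambda a, b: abs(a - b)):
    """Return a per-pixel anomaly score map for 2D images given as nested lists.

    pixel_loss stands in for L_test; the default is the l1 (absolute) error.
    """
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same shape")
    return [
        [pixel_loss(a, b) for a, b in zip(row, rec_row)]
        for row, rec_row in zip(image, reconstruction)
    ]


# Toy example: the model reproduces the dark background but not the bright spot
x = [[0.1, 0.1], [0.1, 0.9]]
x_hat = [[0.1, 0.1], [0.1, 0.2]]
scores = anomaly_map(x, x_hat)
```

Pixels that the model fails to reconstruct receive high scores, which is why the choice of $L_{test}$ directly shapes the anomaly map.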

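Fusion Quality Loss

SSIM Loss Design

SSIM evaluates three components, luminance, contrast, and structure: $l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu^2_x + \mu^2_y + C_1}, \quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma^2_x + \sigma^2_y + C_2}, \quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$

$SSIM(x,y) = l(x,y) \cdot c(x,y) \cdot s(x,y)$

Here $\mu$, $\sigma$, and $\sigma_{xy}$ are the local means, standard deviations, and covariance, and $C_1$, $C_2$, $C_3$ are small stabilizing constants.

Local SSIM loss: $L_{SSIM}(x, \hat{x}) = \frac{1-\frac{1}{K}\sum^K_{k=1}SSIM(x_k, \hat{x}_k)}{2}$, where $x_k$ and $\hat{x}_k$ are the $k$-th of $K$ local windows.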
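The formulas above can be sketched in plain Python for patches stored as flat lists. This is an illustration rather than the authors' implementation: the constants $C_1 = 0.01^2$ and $C_2 = 0.03^2$ (for intensities in $[0, 1]$) and $C_3 = C_2/2$ are the usual SSIM defaults and are assumed here, as is the way the image is split into $K$ windows.

```python
import math


def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM between two equally sized patches given as flat lists in [0, 1]."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    c3 = c2 / 2  # common convention, assumed here
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure


def ssim_loss(patches_x, patches_y):
    """L_SSIM = (1 - mean SSIM over the K patch pairs) / 2."""
    k = len(patches_x)
    mean_ssim = sum(ssim(p, q) for p, q in zip(patches_x, patches_y)) / k
    return (1 - mean_ssim) / 2


patch = [0.2, 0.4, 0.6, 0.8]
same = ssim_loss([patch], [patch])            # identical patches: loss close to 0
flipped = ssim_loss([patch], [patch[::-1]])   # reversed structure: loss above 0.5
```

Because SSIM lies in $[-1, 1]$, dividing $1 - SSIM$ by 2 keeps the loss in $[0, 1]$.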

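Fusion Quality Loss Formulation

Combining the strengths of SSIM and the ℓ1 loss: $L_{FQ} = \alpha L_{SSIM} + (1-\alpha) L_{\ell_1}, \quad \alpha \in [0,1]$

Here $\alpha = 0.84$, a value chosen following the recommendation of prior work [21].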
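A small worked example of the weighting, assuming the two component losses have already been computed. The function name `fusion_quality` and the input values are hypothetical; only the formula and $\alpha = 0.84$ come from the text.

```python
def fusion_quality(l_ssim, l_l1, alpha=0.84):
    """L_FQ = alpha * L_SSIM + (1 - alpha) * L_l1, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l_ssim + (1.0 - alpha) * l_l1


# Hypothetical loss values for one reconstruction:
# 0.84 * 0.30 + 0.16 * 0.05 = 0.252 + 0.008 = 0.26
value = fusion_quality(l_ssim=0.30, l_l1=0.05)
```

With $\alpha = 0.84$ the SSIM term dominates, while the ℓ1 term still contributes a pixel-level correction.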

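Average Intensity Ratio (AIR) Data Transformation

AIR Definition

$AIR(X) = \frac{(\mu^a_X + \mu^n_X) + |\mu^a_X - \mu^n_X|}{(\mu^a_X + \mu^n_X) - |\mu^a_X - \mu^n_X|}$

where $\mu^a_X$ and $\mu^n_X$ are the mean pixel intensities of the abnormal and normal regions, respectively.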
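Since $(a + b) + |a - b| = 2\max(a, b)$ and $(a + b) - |a - b| = 2\min(a, b)$, AIR is simply the larger region mean divided by the smaller one. The sketch below computes it from a flat list of pixels and a boolean anomaly mask; the function name `air` and the input layout are assumptions for illustration.

```python
def air(pixels, mask):
    """Average intensity ratio of a flat list of pixels.

    mask[i] is True where pixel i is anomalous and False where it is normal.
    """
    abnormal = [p for p, m in zip(pixels, mask) if m]
    normal = [p for p, m in zip(pixels, mask) if not m]
    if not abnormal or not normal:
        raise ValueError("both regions must be non-empty")
    mu_a = sum(abnormal) / len(abnormal)
    mu_n = sum(normal) / len(normal)
    total, gap = mu_a + mu_n, abs(mu_a - mu_n)
    return (total + gap) / (total - gap)


pixels = [0.2, 0.2, 0.4, 0.4]
mask = [False, False, True, True]
ratio = air(pixels, mask)  # mu_a = 0.4, mu_n = 0.2, so the ratio is about 2
```

A larger AIR means the two regions differ more in a divisive sense, which is the kind of discrepancy the luminance term of SSIM responds to.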

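Transformation Strategy

A statistical analysis of the four modalities in the BraTS dataset gives:

$0 < \mu^n_X < \mu^a_X < 1$ holds for all modalities
$\mu^n_X > 0.5$ for T1, FLAIR, and T1-CE
$\mu^a_X < 0.5$ for T2

The transformation function is designed as: $p(x) = x \cdot I(\mu^n_X \leq 0.5) + (1-x) \cdot I(0.5 < \mu^n_X)$, where $I(\cdot)$ is the indicator function.

This transformation ensures that $AIR(\bar{X}) \geq AIR(X)$, where $\bar{X}$ denotes the transformed data.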
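One way to see why the inequality holds: inverting intensities maps each region mean $\mu$ to $1 - \mu$, so for $0.5 < \mu^n_X < \mu^a_X$ the ratio changes from $\mu^a_X / \mu^n_X$ to $(1 - \mu^n_X)/(1 - \mu^a_X)$, and the latter is at least as large exactly when $\mu^a_X + \mu^n_X \geq 1$, which is guaranteed when both means exceed 0.5. The sketch below checks this numerically; the helper names and the region means are hypothetical.

```python
def transform(pixels, mu_n):
    """p(x) = x if mu_n <= 0.5, otherwise 1 - x (intensities assumed in [0, 1])."""
    if mu_n <= 0.5:
        return list(pixels)
    return [1.0 - p for p in pixels]


def ratio(mu_a, mu_n):
    """AIR written as max / min of the two region means."""
    return max(mu_a, mu_n) / min(mu_a, mu_n)


# Hypothetical region means with 0.5 < mu_n < mu_a < 1
mu_n, mu_a = 0.6, 0.8
before = ratio(mu_a, mu_n)          # 0.8 / 0.6
after = ratio(1 - mu_a, 1 - mu_n)   # 0.4 / 0.2 after inversion
```

When $\mu^n_X \leq 0.5$ the data are left unchanged, so the ratio is trivially preserved in that branch.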

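Technical Innovations

Multi-dimensional quality assessment: fuses pixel-level (ℓ1) and structure-level (SSIM) information
Adaptive weighting mechanism: the divisive property of SSIM gives structural relationships more weight
Data-driven preprocessing: the transformation strategy is designed from dataset statistics
End-to-end optimization: the fusion quality loss is used consistently in both training and inference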

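Experimental Setup

Datasets

BraTS21: 1,251 brain tumor MRI scans covering four modalities (T1, T1-CE, T2, FLAIR)
MSLUB: T1, T2, and FLAIR scans from 30 multiple sclerosis patients
IXI: 560 pairs of T1 and T2 scans of healthy brains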

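Evaluation Settings

Cross-dataset setting: train on healthy IXI data, test on BraTS21 and MSLUB
Intra-dataset setting: five-fold cross-validation on the FLAIR and T1-CE modalities of BraTS21
Preprocessing: resampling, skull stripping, and registration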

Evaluation Metrics

DICE coefficient: measures segmentation accuracy
AUPRC: area under the precision-recall curve
Compared Methods

Nine baselines: Thresh, AE, VAE, SVAE, DAE, f-AnoGAN, DDPM, mDDPM, and pDDPM

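Implementation Details

Optimizer: Adam, learning rate 1e-4, batch size 32
Training length: 1,600 epochs
Noise level: 500 for BraTS21 (T2), 750 otherwise
Post-processing: median filtering (kernel size 5) followed by brain mask erosion (3 iterations)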
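The post-processing step might look like the following plain-Python sketch for a 2D anomaly map. Several details are assumptions not stated in the text: windows are clipped at the image border, erosion uses a 4-neighbour cross with pixels outside the image counted as background, and the eroded brain mask is used to zero out scores near the brain boundary.

```python
from statistics import median


def median_filter(img, size=5):
    """Median filter on a 2D list; windows are clipped at the image border."""
    h, w, r = len(img), len(img[0]), size // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                img[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def erode(mask, iterations=3):
    """Binary erosion with a 4-neighbour cross; outside the image counts as 0."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        mask = [
            [
                bool(
                    mask[i][j]
                    and i > 0 and mask[i - 1][j]
                    and i < h - 1 and mask[i + 1][j]
                    and j > 0 and mask[i][j - 1]
                    and j < w - 1 and mask[i][j + 1]
                )
                for j in range(w)
            ]
            for i in range(h)
        ]
    return mask


def postprocess(score_map, brain_mask):
    """Smooth the anomaly map, then keep scores only inside the eroded mask."""
    smooth = median_filter(score_map, size=5)
    inner = erode(brain_mask, iterations=3)
    return [
        [s if m else 0.0 for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(smooth, inner)
    ]
```

The median filter suppresses isolated high-score pixels, and eroding the mask discards the brain boundary, where reconstruction errors tend to be large regardless of pathology.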

Experimental Results

Main Results

Results on the T2 modality in the cross-dataset setting:

Method	BraTS21 (T2) DICE %	BraTS21 (T2) AUPRC %	MSLUB (T2) DICE %	MSLUB (T2) AUPRC %
pDDPM	49.41±0.66	54.76±0.83	10.65±1.05	10.37±0.51
pDDPM-IQA	59.45±0.37	62.99±0.37	12.93±0.67	11.51±0.50
Relative improvement	+20.32%	+15.03%	+21.41%	+10.99%
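The relative improvement row can be reproduced from the two rows of means above it. The short check below uses only the mean values from the table; the function name `relative_gain` is hypothetical.

```python
def relative_gain(baseline, improved):
    """Relative improvement in percent: 100 * (improved - baseline) / baseline."""
    return 100.0 * (improved - baseline) / baseline


# Mean scores copied from the table (pDDPM vs. pDDPM-IQA)
rows = {
    "BraTS21 DICE": (49.41, 59.45),
    "BraTS21 AUPRC": (54.76, 62.99),
    "MSLUB DICE": (10.65, 12.93),
    "MSLUB AUPRC": (10.37, 11.51),
}
gains = {name: round(relative_gain(b, i), 2) for name, (b, i) in rows.items()}
```

For BraTS21 DICE, for instance, $(59.45 - 49.41) / 49.41 \approx 0.2032$, which matches the +20.32% reported in the table.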