MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation
Luo, Xu, Huang et al.
Multi-modal brain tumor segmentation is critical for clinical diagnosis, and it requires accurate identification of distinct internal anatomical subregions. While recent prompt-based segmentation paradigms enable interactive experiences for clinicians, existing methods ignore cross-modal correlations and rely on labor-intensive category-specific prompts, limiting their applicability in real-world scenarios. To address these issues, we propose the MSM-Seg framework for multi-modal brain tumor segmentation. MSM-Seg introduces a novel dual-memory segmentation paradigm that synergistically integrates multi-modal and inter-slice information with an efficient category-agnostic prompt for brain tumor understanding. To this end, we first devise a modality-and-slice memory attention (MSMA) to exploit the cross-modal and inter-slice relationships among the input scans. Then, we propose a multi-scale category-agnostic prompt encoder (MCP-Encoder) to provide tumor region guidance for decoding. Moreover, we devise a modality-adaptive fusion decoder (MF-Decoder) that leverages the complementary decoding information across different modalities to improve segmentation accuracy. Extensive experiments on different MRI datasets demonstrate that our MSM-Seg framework outperforms state-of-the-art methods in multi-modal metastases and glioma tumor segmentation. The code is available at https://github.com/xq141839/MSM-Seg.
Multi-modal brain tumor segmentation is critical for clinical diagnosis, requiring accurate identification of different internal anatomical sub-regions. Although recent prompt-based segmentation paradigms provide interactive experiences for clinicians, existing methods overlook cross-modal correlations and rely on labor-intensive category-specific prompts, limiting their applicability in practical scenarios. To address these issues, this paper proposes the MSM-Seg framework for multi-modal brain tumor segmentation. MSM-Seg introduces a novel dual-memory segmentation paradigm that synergistically integrates cross-modal and inter-slice information with efficient category-agnostic prompting for brain tumor understanding.
Complexity of Multi-Modal Brain Tumor Segmentation: The task requires simultaneous identification of heterogeneous tumor components, including the contrast-enhancing core, necrotic regions, and peritumoral edema, each of which provides distinct clinical biomarkers for tumor grading and treatment planning.
Limitations of Existing Methods:
Classical 3D multi-modal segmentation frameworks are constrained by computational inefficiency inherent to volumetric processing
Existing methods neglect the natural sequential relationships between adjacent slices
Methods like SAM2 rely on category-specific annotations as prompts, requiring labor-intensive manual annotation
Existing approaches typically process different MRI modalities independently or through simple prior connections, failing to fully exploit rich complementary information across modalities
Different MRI modalities exhibit strong complementary relationships: FLAIR sequences excel at displaying peritumoral edema and high-signal lesions, while T1c sequences provide contrast-enhanced visualization of active tumor regions and blood-brain barrier disruption. This complementary relationship motivates the development of a unified framework capable of effectively capturing cross-modal relationships and spatial continuity.
Proposes a Dual-Memory Segmentation Paradigm: Leverages cross-modal and inter-slice relationships in input scans to achieve comprehensive understanding of tumor sub-regions
Designs Modality-and-Slice Memory Attention Mechanism (MSMA): Efficiently utilizes cross-modal and inter-slice relationships to enhance multi-modal feature representation
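Develops Multi-Scale Category-Agnostic Prompt Encoder (MCP-Encoder) and Modality-Adaptive Fusion Decoder (MF-Decoder): The MCP-Encoder provides tumor region guidance for decoding, and the MF-Decoder leverages complementary decoding information across modalities to improve segmentation accuracy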
Achieves Significant Performance Improvements: Surpasses existing state-of-the-art segmentation methods on glioma and metastatic tumor datasets
Given multi-modal MRI scans {X_{t,m}}, where t ∈ {1,...,T} denotes the slice index and m ∈ {1,...,M} denotes the modality index, the objective is to generate accurate brain tumor segmentation masks identifying three hierarchical regions: Enhancing Tumor (ET), Tumor Core (TC), and Whole Tumor (WT).
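As a minimal illustration of the three hierarchical regions, the sketch below derives nested ET, TC, and WT masks from a per-pixel label map. The integer label values and the helper name nested_regions are assumptions made for illustration; the datasets used in the paper may encode sub-regions differently.

```python
# Hypothetical label convention (an assumption, not taken from the paper):
# 0 = background, 1 = necrotic core, 2 = peritumoral edema, 3 = enhancing tumor.
NECROTIC, EDEMA, ENHANCING = 1, 2, 3


def nested_regions(label_map):
    """Convert a 2D label map (list of rows) into nested ET/TC/WT binary masks."""
    et = [[int(v == ENHANCING) for v in row] for row in label_map]
    tc = [[int(v in (NECROTIC, ENHANCING)) for v in row] for row in label_map]
    wt = [[int(v in (NECROTIC, EDEMA, ENHANCING)) for v in row] for row in label_map]
    return {"ET": et, "TC": tc, "WT": wt}


# Toy 3 x 4 slice; values are arbitrary.
label_map = [
    [0, 2, 2, 0],
    [2, 1, 3, 2],
    [0, 2, 3, 0],
]
regions = nested_regions(label_map)
```

Under this assumed convention every ET pixel is also a TC pixel, and every TC pixel is also a WT pixel, which is what "hierarchical" refers to here.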
The core idea is to establish progressive memory integration that gradually refines the model's understanding of the entire tumor structure. Given an input slice X_{t,m}, the model maintains a latent state S_{t,m} ∈ R^{C×H×W} with the update rule:
For each modality m at slice t, the decoder receives the memory-enhanced embeddings Z_{t,m} and the corresponding tumor guidance P_{t,m}. The prompt embeddings are fused with the memory-enhanced embeddings through element-wise addition:
H_{t,m} = Z_{t,m} ⊕ P_{t,m}
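The fusion step can be sketched in a few lines, assuming that ⊕ denotes element-wise addition and that Z_{t,m} and P_{t,m} share the same C×H×W shape. Plain nested lists stand in for tensors, and the function name fuse_embeddings is hypothetical.

```python
def fuse_embeddings(z, p):
    """Element-wise addition of two C x H x W nested lists of equal shape."""
    if len(z) != len(p):
        raise ValueError("channel counts differ")
    return [
        [[zv + pv for zv, pv in zip(z_row, p_row)] for z_row, p_row in zip(z_ch, p_ch)]
        for z_ch, p_ch in zip(z, p)
    ]


# Toy tensors with C=2, H=2, W=2; values are arbitrary.
z = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
p = [[[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]
h = fuse_embeddings(z, p)
```

Addition keeps the channel count unchanged, so the fused embedding has the same shape as either input.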
The decoder then generates modality-specific predictions:
Ŷ_{t,m} = P_pd(H_{t,m}) ⊗ P_mlp(E_{t,m})
The final segmentation mask is obtained through an adaptive weighting strategy:
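The weighting formula itself is not reproduced in this summary, so the following is only one plausible sketch of adaptive weighting, not the authors' formulation. It assumes each modality m contributes a scalar score, that the scores are normalized with a softmax, and that the final map is the weighted sum of the per-modality predictions Ŷ_{t,m}. The names softmax, fuse_modalities, and scores are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(predictions, scores):
    """Weighted sum of M per-modality H x W prediction maps.

    predictions: list of M maps, each a list of rows of floats
    scores: list of M per-modality scores (assumed, not from the paper)
    """
    weights = softmax(scores)
    height, width = len(predictions[0]), len(predictions[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for w, pred in zip(weights, predictions):
        for i in range(height):
            for j in range(width):
                fused[i][j] += w * pred[i][j]
    return fused, weights


# Two toy modalities with equal scores, so each receives weight 0.5.
preds = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
fused, weights = fuse_modalities(preds, [0.0, 0.0])
```

In a trained model the scores would presumably be predicted per modality (or per pixel) rather than fixed, letting the more informative modality dominate each region.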
BraTS-METS: Brain metastatic tumor segmentation dataset containing 652 multi-contrast MRI examinations covering four modalities: T1, T1c, T2, and FLAIR
BraTS-AGPT: Adult post-treatment glioma segmentation dataset containing 1,349 cases, focusing on segmentation of residual or recurrent gliomas following therapeutic intervention
Early research adopted U-shaped encoder-decoder frameworks with 3D CNNs. Recent methods integrate 3D CNNs with Vision Transformers to capture local spatial patterns and global contextual information. Current research explores replacing ViT with Vision Mamba and RWKV to model long-range dependencies with linear computational complexity.
Memory mechanisms are widely applied in video object segmentation tasks. SAM2 introduces complex memory buffers and memory attention mechanisms to enhance prediction consistency across sequential slices in volumetric scans. Subsequent works such as ReSurgSAM2 and Medical SAM2 optimize memory buffer storage and similarity metrics.
MSM-Seg effectively integrates cross-modal and inter-slice information through a dual-memory segmentation paradigm, combined with category-agnostic prompt design, achieving significant performance improvements in multi-modal brain tumor segmentation tasks and providing an efficient and practical solution for clinical applications.
This work offers a new technical paradigm for multi-modal medical image segmentation. The dual-memory mechanism and category-agnostic prompt design have broad application potential in the medical image analysis field.
The paper cites 45 relevant references covering key works in multi-modal segmentation, Vision Transformers, SAM series methods, and other critical domains, providing a solid theoretical foundation for this research.