Deep Attention-guided Adaptive Subsampling

Shankaranarayana, Roy, Sudhakar et al.

Although deep neural networks have provided impressive gains in performance, these improvements often come at the cost of increased computational complexity and expense. In many cases, such as 3D volume or video classification tasks, not all slices or frames are necessary due to inherent redundancies. To address this issue, we propose a novel learnable subsampling framework that can be integrated into any neural network architecture. Subsampling, being a non-differentiable operation, poses significant challenges for direct adaptation into deep learning models. While some works have proposed solutions using the Gumbel-max trick to overcome the problem of non-differentiability, they fall short in a crucial aspect: they are only task-adaptive and not input-adaptive. Once the sampling mechanism is learned, it remains static and does not adjust to different inputs, making it unsuitable for real-world applications. To this end, we propose an attention-guided sampling module that adapts to inputs even during inference. This dynamic adaptation results in performance gains and reduces complexity in deep neural network models. We demonstrate the effectiveness of our method on 3D medical imaging datasets from MedMNIST3D as well as two ultrasound video datasets for classification tasks, one of them being a challenging in-house dataset collected under real-world clinical conditions.


Basic Information

Paper ID: 2510.12376
Title: Deep Attention-guided Adaptive Subsampling
Authors: Sharath M Shankaranarayana, Soumava Kumar Roy, Prasad Sudhakar, Chandan Aladahalli (GE Healthcare, Bangalore, India)
Categories: cs.CV, cs.AI, cs.LG
Published: October 14, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2510.12376v1

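Abstract

Deep neural networks have delivered significant performance gains, but these improvements often come at the cost of increased computational complexity and expense. In many cases, such as 3D volume or video classification tasks, not all slices or frames are needed because of inherent redundancy. To address this, the authors propose a novel learnable subsampling framework that can be integrated into any neural network architecture. The framework uses an attention-guided sampling module that adapts dynamically to the input during inference, which improves performance and reduces the complexity of deep neural network models.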

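Background and Motivation

Core Problems

Computational efficiency: deep neural networks incur very high computational cost when processing high-dimensional data such as videos and volumetric scans
Data redundancy: 3D medical images and videos contain a large amount of redundant information, and not every frame or slice is useful for the final task
Limits of existing sampling strategies: conventional uniform sampling and hand-crafted heuristics cannot identify and prioritize the most salient information

Shortcomings of Existing Methods

Deep Probabilistic Subsampling (DPS): effective, but it learns a fixed, content-agnostic policy
Active Deep Probabilistic Subsampling (ADPS): introduces instance-level adaptivity, but conditions only on the components already sampled and does not directly use the input features themselves
Static behavior: once training is complete, the sampling mechanism of existing methods stays fixed and cannot adapt to different inputs

Motivation

To address these limitations, the paper proposes a dynamic sampling framework that is both task-adaptive and input-adaptive, adjusting its sampling strategy to each specific input at inference time.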

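Key Contributions

A novel plug-and-play neural sampling module: a module for dynamic sampling of 3D volumes and videos that adapts to the input at inference time, making it both task-adaptive and input-adaptive
Comprehensive validation: the framework is evaluated on eight medical imaging datasets, comprising six MedMNIST3D datasets, one public ultrasound video dataset, and one proprietary dataset collected in a clinical setting
End-to-end trainable framework: the Gumbel-Softmax reparameterization trick keeps discrete sample selection differentiable end to end
Interpretability: the sampling matrix is produced as an explicit output, which makes the sampling process controllable and interpretable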

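Method

Task Definition

Given a batch of sequences with $T$ frames each, $X \in \mathbb{R}^{B \times T \times C \times H \times W}$, the goal is to learn a sampling function $S_\theta$ that selects a subset of $k$ frames, where $k \ll T$.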
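To make the shapes in the task definition concrete, the following minimal NumPy sketch applies a sampling matrix of shape $B \times k \times T$ (the form of the matrix $P$ that the method outputs) to a batch $X$. The function name `apply_sampling_matrix` and the chosen frame indices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def apply_sampling_matrix(x, p):
    """Select k frames per sequence.

    x: array of shape (B, T, C, H, W)
    p: array of shape (B, k, T); each row weights the T frames
       (one-hot rows pick exactly one frame each)
    returns: array of shape (B, k, C, H, W)
    """
    return np.einsum("bkt,btchw->bkchw", p, x)

rng = np.random.default_rng(0)
B, T, C, H, W, k = 2, 8, 1, 4, 4, 3
x = rng.normal(size=(B, T, C, H, W))

# Hypothetical hard selection: frames 1, 4 and 6 for every sequence.
idx = np.array([1, 4, 6])
p = np.zeros((B, k, T))
p[:, np.arange(k), idx] = 1.0

x_sub = apply_sampling_matrix(x, p)  # shape (B, k, C, H, W)
```

With one-hot rows, the matrix product is equivalent to plain indexing (`x[:, idx]`); writing it as a product is what allows a soft relaxation of `p` to carry gradients during training.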

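Model Architecture

1. Lightweight feature extraction

The feature extraction module uses several parallel paths to compute a rich representation of the input sequence:

Temporal dynamics: computes inter-frame variance over the spatial and channel dimensions
Anatomical boundary detection: applies a bank of Sobel and Laplacian kernels to compute edge magnitudes
Feature aggregation: concatenates the extracted features into a combined representation $F \in \mathbb{R}^{B \times T \times d}$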
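The three feature paths above can be sketched roughly as follows. This is an illustration under assumptions rather than the authors' implementation: the summary does not say how the inter-frame variance is defined or how edge responses are pooled, so the sketch uses the variance of the difference to the previous frame and the mean edge magnitude per frame, giving $d = 3$. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def filter2d(x, kernel):
    """Valid-mode 3x3 cross-correlation over the last two axes of x."""
    windows = sliding_window_view(x, kernel.shape, axis=(-2, -1))
    return np.einsum("...ij,ij->...", windows, kernel)

def extract_features(x):
    """x: (B, T, C, H, W) -> per-frame features of shape (B, T, 3)."""
    # Temporal dynamics (assumed form): variance of the difference to the
    # previous frame over channel and spatial axes; frame 0 is compared
    # with itself, so its value is zero.
    diff = x - np.concatenate([x[:, :1], x[:, :-1]], axis=1)
    temporal_var = diff.var(axis=(2, 3, 4))

    # Boundary cues: mean Sobel gradient magnitude and mean |Laplacian|.
    gx, gy = filter2d(x, SOBEL_X), filter2d(x, SOBEL_Y)
    sobel_mag = np.sqrt(gx**2 + gy**2).mean(axis=(2, 3, 4))
    lap_mag = np.abs(filter2d(x, LAPLACIAN)).mean(axis=(2, 3, 4))

    # Aggregation: concatenate the descriptors along the feature axis.
    return np.stack([temporal_var, sobel_mag, lap_mag], axis=-1)
```

Because every descriptor is a cheap fixed filter followed by a reduction, the cost of this stage stays small relative to the downstream classifier, which is the point of keeping the extractor lightweight.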

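2. Multi-head attention layer

The aggregated feature tensor $F$ is passed through a multi-head attention layer to produce the final sampling logits:

$s_h = \text{Softplus}(\text{MLP}_h(F))$

$A_h^{(:,j,:)} = a_{\text{base}} \odot s_h^{(:,j)}$

$A = \frac{1}{H} \sum_{h=1}^{H} A_h$

where $H$ is the number of attention heads (not the frame height that appears in the shape of $X$) and $s_h \in \mathbb{R}^{B \times k}$ is the head-specific scale factor.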
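The three equations above can be traced shape by shape in a short NumPy sketch. Several details are assumptions made only so the example runs: the summary does not say how the base logits $a_{\text{base}}$ are obtained or how each head's MLP maps $F$ to a $B \times k$ tensor, so the sketch uses a linear projection of $F$ for the former and temporal mean pooling followed by a two-layer MLP for the latter. The names `sampling_logits`, `w_base`, and `head_weights` are hypothetical.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); output is always positive.
    return np.logaddexp(0.0, z)

def sampling_logits(F, w_base, head_weights):
    """F: (B, T, d); w_base: (d,); head_weights: list of H pairs (W1, W2)
    with W1: (d, m) and W2: (m, k). Returns logits A of shape (B, k, T)."""
    a_base = F @ w_base            # (B, T): assumed per-frame base logits
    pooled = F.mean(axis=1)        # (B, d): assumed temporal pooling
    heads = []
    for W1, W2 in head_weights:
        hidden = np.maximum(pooled @ W1, 0.0)        # ReLU hidden layer
        s_h = softplus(hidden @ W2)                  # (B, k) scale factors
        A_h = s_h[:, :, None] * a_base[:, None, :]   # (B, k, T)
        heads.append(A_h)
    return np.mean(heads, axis=0)  # average over the H heads
```

Since Softplus is strictly positive, each head rescales the base logits without flipping their sign, so the $k$ rows of $A$ share a frame ranking within a head and differ in how sharply they express it.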

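3. Differentiable Gumbel-Softmax sampling

To allow end-to-end training, the Gumbel-Softmax trick is used for differentiable sampling:

Adaptive temperature scaling: $\tau = \tau_0 \cdot (0.5 + \sigma(\text{MLP}_{\text{temp}}(F)))$, which keeps $\tau$ within $(0.5\,\tau_0,\ 1.5\,\tau_0)$ because the sigmoid $\sigma$ lies in $(0, 1)$

Sampling: $G_{b,j,t} \sim \text{Gumbel}(0,1)$, $P_{\text{soft}} = \text{Softmax}_t\left(\frac{A + G}{\tau}\right)$

A straight-through estimator (STE) preserves differentiability, and the result is the sampling matrix $P \in \mathbb{R}^{B \times k \times T}$.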
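The forward pass of this sampling step can be sketched in NumPy as below. It is a minimal illustration under assumptions: `temp_logit` stands in for the output of the temperature MLP (taken here to be one scalar per sequence), each of the $k$ rows is sampled independently so duplicate frames are not prevented, and NumPy has no automatic differentiation, so the straight-through estimator is only described in the comments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gumbel_softmax_sample(A, temp_logit, tau0=1.0, rng=None):
    """A: (B, k, T) logits; temp_logit: (B,) hypothetical temperature-MLP output.
    Returns (p_hard, p_soft), both of shape (B, k, T)."""
    rng = np.random.default_rng() if rng is None else rng
    # Adaptive temperature, bounded in (0.5 * tau0, 1.5 * tau0).
    tau = tau0 * (0.5 + sigmoid(temp_logit))[:, None, None]
    # Perturb the logits with Gumbel(0, 1) noise and relax with a softmax over t.
    G = rng.gumbel(loc=0.0, scale=1.0, size=A.shape)
    z = (A + G) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    p_soft = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Hard one-hot selection used in the forward pass.
    p_hard = np.zeros_like(p_soft)
    np.put_along_axis(p_hard, p_soft.argmax(axis=-1)[..., None], 1.0, axis=-1)
    # In an autodiff framework the straight-through output would be
    # p_hard + p_soft - stop_gradient(p_soft): hard values forward,
    # soft gradients backward.
    return p_hard, p_soft
```

A lower temperature pushes `p_soft` toward one-hot rows (exploitation), while a higher one spreads probability across frames (exploration), which is the trade-off the adaptive temperature is meant to control.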

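Technical Innovations

Dynamic input adaptation: unlike the static policy of DPS, DAS (Deep Attention-guided Adaptive Subsampling) adjusts its sampling strategy to the content of each input
Lightweight design: in contrast to the multi-stage procedure of ADPS, DAS uses a lightweight single-pass module
Adaptive temperature mechanism: dynamically controls the trade-off between exploration and exploitation
Multi-cue feature fusion: combines temporal dynamics with spatial structure information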

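Experimental Setup

Datasets

MedMNIST3D: six 3D volumetric datasets (Organ, Nodule, Adrenal, Fracture, Vessel, Synapse), covering classification tasks that range from multi-organ recognition to pathology detection
Breast Ultrasound Video (BUSV): a public breast ultrasound video dataset, used as a binary classification benchmark for breast lesions
In-house gastric antrum dataset: a proprietary clinical ultrasound video dataset collected in a real hospital setting, with five classes of gastric content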