Federated Dropout: Convergence Analysis and Resource Allocation

Xie, Wen, Liu et al.

Federated Dropout is an efficient technique to overcome both communication and computation bottlenecks for deploying federated learning at the network edge. In each training round, an edge device only needs to update and transmit a sub-model, which is generated by the typical method of dropout in deep learning, and thus effectively reduces the per-round latency. However, the theoretical convergence analysis for Federated Dropout is still lacking in the literature, particularly regarding the quantitative influence of dropout rate on convergence. To address this issue, by using the Taylor expansion method, we mathematically show that the gradient variance increases with a scaling factor of $\gamma/(1-\gamma)$, with $\gamma\in [0, \theta)$ denoting the dropout rate and $\theta$ being the maximum dropout rate ensuring the loss function reduction. Based on the above approximation, we provide the convergence analysis for Federated Dropout. Specifically, it is shown that a larger dropout rate of each device leads to a slower convergence rate. This provides a theoretical foundation for reducing the convergence latency by making a tradeoff between the per-round latency and the overall rounds till convergence. Moreover, a low-complexity algorithm is proposed to jointly optimize the dropout rate and the bandwidth allocation for minimizing the loss function in all rounds under a given per-round latency and limited network resources. Finally, numerical results are provided to verify the effectiveness of the proposed algorithm.

Basic Information

Paper ID: 2501.00379
Title: Federated Dropout: Convergence Analysis and Resource Allocation
Authors: Sijing Xie, Dingzhu Wen, Xiaonan Liu, Changsheng You, Tharmalingam Ratnarajah, Kaibin Huang
Categories: cs.LG cs.IT math.IT
Published: December 31, 2024
Paper link: https://arxiv.org/abs/2501.00379

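Abstract

Federated Dropout is an effective technique for overcoming the communication and computation bottlenecks of deploying federated learning at the network edge. In each training round, an edge device only needs to update and transmit a sub-model generated by the standard dropout method from deep learning, which reduces the per-round latency. However, the literature still lacks a theoretical convergence analysis of Federated Dropout, in particular of the quantitative effect of the dropout rate on convergence. To address this, the paper uses the Taylor expansion method to show mathematically that the gradient variance grows with a scaling factor of $\gamma/(1-\gamma)$, where $\gamma\in[0,\theta)$ is the dropout rate and $\theta$ is the maximum dropout rate that still ensures the loss function decreases. Based on this approximation, the paper provides a convergence analysis of Federated Dropout and shows that a larger per-device dropout rate leads to slower convergence. This gives a theoretical basis for reducing the convergence latency by trading off the per-round latency against the total number of rounds to convergence.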

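Research Background and Motivation

Problem Background

Surging demand for edge AI: the explosion of mobile data is driving AI deployment at the network edge, and federated edge learning (FEEL) has become a promising technique for realizing edge AI
Limited computation resources: edge devices face severe computation constraints, while modern deep neural networks (DNNs) and large language models (LLMs) require substantial computing power
Limitations of existing methods:
- Communication-efficient methods (gradient compression, device scheduling, etc.) mainly address the communication bottleneck
- Model pruning methods still incur heavy communication overhead early in training and usually reduce the model's representational capacity
- Neither approach fundamentally reduces the computation overhead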

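Research Motivation

Theoretical gap: the FedDrop framework is practical but lacks a rigorous theoretical convergence analysis
Optimization need: theoretical guidance is needed for the joint design of dropout rates and resource allocation
Practical application: provide a theoretical foundation and practical algorithms for federated learning in resource-constrained environments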

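Core Contributions

Convergence analysis:
- Uses a Taylor expansion to prove that the subnet gradient vector is a bounded-variance estimate of the original DNN gradient vector
- Proves mathematically that the gradient variance scales with $\gamma/(1-\gamma)$
- Establishes a quantitative relationship between the dropout rate and the convergence rate
Per-round loss minimization:
- Based on the theoretical analysis, characterizes the learning loss reduction in an arbitrary round
- Maximizes the learning loss reduction subject to constraints on system bandwidth, task completion latency, and device energy budgets
Joint optimization algorithm:
- Proposes a joint design of adaptive dropout rates and bandwidth allocation
- Obtains closed-form solutions via the KKT conditions
- Algorithm complexity is only $O(K^2)$
Performance evaluation:
- Runs numerical experiments in both underfitting and overfitting scenarios
- Verifies the correctness of the theoretical analysis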

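Method Details

Task Definition

Input: K edge devices, where each device k holds a local dataset $D_k$
Objective: minimize the global loss function $F(w) = \sum_{k=1}^K \frac{|D_k|}{|D|} f_k(\hat{w}_k; D_k)$, where $\hat{w}_k$ is the dropout-generated subnet for device k and $f_k$ is the local loss function of device k.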
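The weighting in the global loss can be illustrated with a short NumPy sketch. It assumes that $|D|$ is the total number of samples across all devices, which the summary does not state explicitly; the function name `global_loss` and the example numbers are hypothetical.

```python
import numpy as np

def global_loss(local_losses, dataset_sizes):
    """Data-size-weighted sum of local losses: sum_k |D_k|/|D| * f_k."""
    losses = np.asarray(local_losses, dtype=float)
    sizes = np.asarray(dataset_sizes, dtype=float)
    weights = sizes / sizes.sum()  # assumption: |D| = sum_k |D_k|
    return float(np.dot(weights, losses))

# Three hypothetical devices holding 100, 300 and 600 samples.
loss = global_loss([0.9, 0.5, 0.2], [100, 300, 600])
```

Devices with more local data contribute proportionally more to the objective, so the last device dominates the weighted sum here.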

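Model Architecture

1. Federated Dropout Framework

The FedDrop framework consists of five steps:

Generation: the server generates a subnet for each device
Push: each device downloads its subnet
Computation: each device updates its subnet using local data
Pull: each device uploads its updated subnet
Aggregation: the server aggregates all subnets to update the global model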
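The five steps can be sketched on a toy least-squares model. This is a minimal illustration under several assumptions that are not taken from the paper: devices upload one gradient per round rather than a trained subnet, the server aggregates with data-size weights, and dropped coordinates receive no update. The function `feddrop_round` and all data below are hypothetical.

```python
import numpy as np

def feddrop_round(w, devices, gammas, lr, rng):
    """One round on a least-squares toy model; devices is a list of (X, y)."""
    total = sum(len(y) for _, y in devices)
    update = np.zeros_like(w)
    for (X, y), gamma in zip(devices, gammas):
        # Generation: the server draws a scaled dropout mask for this device.
        mask = (rng.random(w.shape) >= gamma) / (1.0 - gamma)
        # Push: the device receives the subnet w * mask (element-wise).
        w_hat = w * mask
        # Computation: gradient of 0.5 * mean((X w_hat - y)^2) w.r.t. w_hat,
        # restricted to kept coordinates (dropped ones are not trained).
        grad = X.T @ (X @ w_hat - y) / len(y)
        grad = grad * (mask > 0)
        # Pull + aggregation: upload, then weight by local dataset share.
        update += (len(y) / total) * grad
    return w - lr * update

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
devices = []
for n in (40, 60):
    X = rng.normal(size=(n, 4))
    devices.append((X, X @ w_true))

w = np.zeros(4)
for _ in range(200):
    w = feddrop_round(w, devices, gammas=[0.2, 0.2], lr=0.05, rng=rng)
```

With all dropout rates set to 0 the mask is all ones and the round reduces to plain weighted gradient descent, which is a convenient check on the sketch.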

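2. Dropout Mechanism

For device k with dropout rate $\gamma_k$, the subnet is defined as $\hat{w}_k = w \circ m_k$, where the j-th element of the dropout mask $m_k$ is $m_{k,j} = \begin{cases} \frac{1}{1-\gamma_k}, & \text{with probability } 1-\gamma_k \\ 0, & \text{with probability } \gamma_k \end{cases}$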
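A minimal NumPy sketch of this mask, with a Monte Carlo check that its entries have mean close to 1 and variance close to $\gamma_k/(1-\gamma_k)$. The helper name `sample_dropout_mask`, the all-ones parameter vector, and the sample size are illustrative assumptions.

```python
import numpy as np

def sample_dropout_mask(num_params, gamma, rng):
    """Entries are 1/(1-gamma) with prob. 1-gamma and 0 with prob. gamma."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    keep = rng.random(num_params) >= gamma  # True with probability 1 - gamma
    return keep / (1.0 - gamma)

rng = np.random.default_rng(0)
gamma = 0.3
mask = sample_dropout_mask(200_000, gamma, rng)

w = np.ones(200_000)          # placeholder global parameters
w_hat = w * mask              # subnet: element-wise product of w and the mask

empirical_mean = mask.mean()  # should be close to 1
empirical_var = mask.var()    # should be close to gamma / (1 - gamma)
```

The $1/(1-\gamma_k)$ scaling is what keeps the masked parameters unbiased in expectation; the price is a variance that grows as the dropout rate increases.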

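3. Latency and Energy Models

Total per-round latency (downlink communication, local computation, and uplink communication): $T_{k,t} = T^{com,dl}_{k,t} + T^{cmp}_{k,t} + T^{com,ul}_{k,t}$

Total energy consumption: $E_{k,t} = E^{com,ul}_{k,t} + E^{cmp}_{k,t} + \xi_k$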

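Technical Innovations

1. Gradient Variance Bound

Lemma 1: Under the stated assumptions, the subnet gradient vector is a bounded-variance estimate, with expectation $E_{m_k^{(t)}}[\hat{g}_k(\hat{w}_k^{(t)})] = \tilde{g}_k(w^{(t)})$ and variance $D_{m_k^{(t)}}[\hat{g}_k(\hat{w}_k^{(t)})] \leq (AG)^2 \cdot \frac{\gamma_{k,t}}{1-\gamma_{k,t}}$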
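As a sanity check on where the factor $\gamma/(1-\gamma)$ comes from, the variance of a single mask entry can be computed directly from the mask definition. This covers only the per-element mask variance and is not the paper's Taylor-expansion argument for Lemma 1; the time index t is dropped for brevity.

$E[m_{k,j}] = (1-\gamma_k)\cdot\frac{1}{1-\gamma_k} + \gamma_k\cdot 0 = 1$

$E[m_{k,j}^2] = (1-\gamma_k)\cdot\frac{1}{(1-\gamma_k)^2} + \gamma_k\cdot 0 = \frac{1}{1-\gamma_k}$

$D[m_{k,j}] = E[m_{k,j}^2] - (E[m_{k,j}])^2 = \frac{1}{1-\gamma_k} - 1 = \frac{\gamma_k}{1-\gamma_k}$

The factor is 0 at $\gamma_k = 0$, equals 0.25 at $\gamma_k = 0.2$ and 1 at $\gamma_k = 0.5$, and grows without bound as $\gamma_k \to 1$, which is consistent with larger dropout rates slowing convergence.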