Towards Robust Knowledge Removal in Federated Learning with High Data Heterogeneity

Santi, Salami, Calderara

Nowadays, there is an abundance of portable devices capable of collecting large amounts of data and with decent computational power. This has opened the possibility of training AI models in a distributed manner while preserving the participating clients' privacy. However, because of privacy regulations and safety requirements, eliminating a client's contribution to the model on request has become mandatory. The cleansing process must satisfy specific efficacy and time requirements. In recent years, research efforts have produced several knowledge removal methods, but these require multiple communication rounds between the data holders and the process coordinator. This can leave the system without an effective model until the end of the removal process, which can result in a disservice to the system users. In this paper, we introduce an innovative solution based on Task Arithmetic and the Neural Tangent Kernel to rapidly remove a client's influence from a model.

Basic Information

Paper ID: 2510.13606
Title: Towards Robust Knowledge Removal in Federated Learning with High Data Heterogeneity
Authors: Riccardo Santi, Riccardo Salami, Simone Calderara (University of Modena and Reggio Emilia, Italy)
Classification: cs.LG (Machine Learning)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13606v1

Abstract

Portable devices now combine substantial data collection capacity with decent computational power, which makes it feasible to train AI models in a distributed manner while protecting the privacy of participating clients. However, privacy regulations and safety requirements make it mandatory to remove a client's contribution from the model on request. The cleansing process must satisfy specific efficacy and time requirements. Recent research has produced several knowledge removal methods, but they require multiple rounds of communication between the data holders and the process coordinator, so no effective model may be available until the removal process concludes, which disrupts service for system users. This paper proposes a solution based on Task Arithmetic and the Neural Tangent Kernel (NTK) for rapidly removing a client's influence from a model.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is Federated Unlearning (FU): rapidly and effectively removing the contribution of specific clients to the global model in federated learning environments while maintaining model performance and privacy protection.

Problem Significance

Regulatory Compliance: Privacy regulations such as GDPR and CCPA require support for the "right to be forgotten"
Security Requirements: Need to remove contributions from malicious or contaminated client data
Sensitive Domains: Patient data revocation requirements in healthcare and similar fields
Service Continuity: Traditional methods require multiple communication rounds, resulting in prolonged model unavailability

Limitations of Existing Methods

FedEraser and similar methods need multiple communication rounds to produce an effective cleaned model
Models are unavailable during the unlearning process, causing service interruptions
Insufficient robustness in high data heterogeneity environments

Research Motivation

Propose a method that completes client unlearning in a single communication round, minimizing service interruption while maintaining good performance under high data heterogeneity.

Core Contributions

Propose SATA Method: A novel federated unlearning method based on Task Arithmetic and Neural Tangent Kernel, enabling client unlearning within single-round communication
Innovative Dual Task Vector Mechanism: Each client maintains two independent task vectors, where the standalone task vector is specifically designed for unlearning operations
NTK-Enhanced Task Arithmetic: Leverages Neural Tangent Kernel training to improve task vector decoupling and reduce inter-task interference
Comprehensive Experimental Validation: Comparative evaluation with multiple baseline methods on Cars-196 and Resisc45 datasets, demonstrating method effectiveness

Methodology Details

Task Definition

Input:

Pre-trained model parameters θ₀
Local datasets of K clients {D₁, D₂, ..., Dₖ}
Target client for unlearning tgt

Output:

Cleaned global model θ̂clean with target client influence removed
Preserved model performance on non-target client data

Constraints:

Complete unlearning within single-round communication
Protect client privacy
Maintain model performance on non-target client data

Model Architecture

1. Dual Task Vector Mechanism

Each client k maintains two independent task vectors:

Primary Task Vector τₖ: Participates in distributed training process and contributes to global model computation
Standalone Task Vector τₖˢᵃ: Remains isolated, uncontaminated by other client information, specifically designed for future unlearning operations

2. Task Arithmetic Framework

In task arithmetic, the task vector τₜ = θₜ - θ₀ captures the parameter change produced by fine-tuning the pre-trained weights θ₀ on task t. Several task vectors can be combined into a single model:

θnew = θ₀ + ∑ᵢ₌₁ᵀ λᵢτᵢ

where λᵢ are scalar weight coefficients.
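The combination rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Params` container, the helper names `task_vector` and `merge`, and all numeric values are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def task_vector(theta_t: Params, theta_0: Params) -> Params:
    """tau_t = theta_t - theta_0, computed entry by entry."""
    return {
        name: [a - b for a, b in zip(theta_t[name], theta_0[name])]
        for name in theta_0
    }


def merge(theta_0: Params, taus: List[Params], lambdas: List[float]) -> Params:
    """theta_new = theta_0 + sum_i lambda_i * tau_i."""
    merged = {name: list(values) for name, values in theta_0.items()}
    for tau, lam in zip(taus, lambdas):
        for name in merged:
            merged[name] = [m + lam * t for m, t in zip(merged[name], tau[name])]
    return merged


# Toy values, chosen only for illustration.
theta_0 = {"w": [0.0, 1.0], "b": [0.5]}
theta_a = {"w": [1.0, 1.0], "b": [0.5]}  # weights after fine-tuning on task A
theta_b = {"w": [0.0, 3.0], "b": [1.5]}  # weights after fine-tuning on task B

tau_a = task_vector(theta_a, theta_0)
tau_b = task_vector(theta_b, theta_0)
theta_new = merge(theta_0, [tau_a, tau_b], [0.5, 0.5])
print(theta_new)
```

With equal coefficients the merged model sits halfway along each task vector; in practice the λᵢ are tuned rather than fixed.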

3. Unlearning Operation

To unlearn the target client tgt, its standalone task vector is subtracted from the global model θ̂:

θ̂clean = θ̂ - λtgt τ_tgt^sa

where τ_tgt^sa is the standalone task vector τₖˢᵃ of client k = tgt.
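As a rough sketch of this subtraction, assume the global model was built from the primary task vectors and that the target client's standalone vector is available where the subtraction is performed. The client names, vector values, and helper names below are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y element-wise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def unlearn(theta_hat: Vector, tau_sa_tgt: Vector, lam_tgt: float) -> Vector:
    """theta_clean = theta_hat - lam_tgt * tau_sa_tgt."""
    return axpy(-lam_tgt, tau_sa_tgt, theta_hat)


theta_0: Vector = [0.0, 0.0, 0.0]

# Primary task vectors (toy values) used to build the global model.
taus: Dict[str, Vector] = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 2.0, 0.0],
    "C": [0.0, 0.0, 4.0],
}

# Global model: theta_hat = theta_0 + sum_k 0.5 * tau_k.
theta_hat = theta_0
for tau in taus.values():
    theta_hat = axpy(0.5, tau, theta_hat)

# Standalone vector of the target client "B" (toy values). It is trained in
# isolation, so it need not coincide with the primary vector of "B".
tau_sa_b: Vector = [0.0, 2.0, 0.5]

theta_clean = unlearn(theta_hat, tau_sa_b, lam_tgt=0.5)
print(theta_hat, theta_clean)
```

Because the standalone vector generally differs from the primary one, the subtraction is an approximation of removing the client, and λtgt controls how aggressively it is applied.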

4. NTK Enhancement

In the Neural Tangent Kernel (NTK) regime, which holds exactly in the infinite-width limit, a network's learning dynamics are described by its first-order Taylor expansion around the initial parameters θ₀:

flin(x; θ) = f(x; θ₀) + (θ - θ₀)ᵀ∇θf(x; θ₀)

Training in the NTK regime improves task vector decoupling, with the final model expressible as:

flin(x; θᵣ₋₁ + ∑ₖ₌₁ᴷ λₖτₖ - λtgt τ_tgt^sa) = f(x; θᵣ₋₁) + (∑ₖ₌₁ᴷ λₖτₖ - λtgt τ_tgt^sa)ᵀ∇θf(x; θᵣ₋₁)
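The practical consequence of the linearized form is that each task vector contributes an independent additive term to the output, so subtracting one vector removes exactly its term. The sketch below checks this numerically on a tiny three-parameter model. The model, its parameter values, and the helper names are assumptions for illustration only; a toy model this small is not in the infinite-width regime, so the example shows the algebra of linearization rather than NTK training itself.

```python
import math
from typing import List

Vector = List[float]


def f(x: float, theta: Vector) -> float:
    """Toy nonlinear model (an assumption): theta[2] * tanh(theta[0] * x + theta[1])."""
    return theta[2] * math.tanh(theta[0] * x + theta[1])


def grad_f(x: float, theta: Vector) -> Vector:
    """Analytic gradient of f with respect to theta."""
    h = math.tanh(theta[0] * x + theta[1])
    dh = 1.0 - h * h
    return [theta[2] * dh * x, theta[2] * dh, h]


def f_lin(x: float, theta: Vector, theta_0: Vector) -> float:
    """First-order Taylor expansion of f around theta_0."""
    g = grad_f(x, theta_0)
    return f(x, theta_0) + sum(
        (t - t0) * gi for t, t0, gi in zip(theta, theta_0, g)
    )


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of any number of vectors."""
    return [sum(parts) for parts in zip(*vectors)]


theta_0 = [0.5, -0.2, 1.0]
tau_1 = [0.3, 0.0, -0.1]
tau_2 = [-0.2, 0.4, 0.2]
x = 0.7
base = f(x, theta_0)

# Linearized model: the joint effect equals the sum of the individual effects.
effect_1 = f_lin(x, add(theta_0, tau_1), theta_0) - base
effect_2 = f_lin(x, add(theta_0, tau_2), theta_0) - base
effect_both = f_lin(x, add(theta_0, tau_1, tau_2), theta_0) - base
additive_in_lin = math.isclose(effect_both, effect_1 + effect_2, abs_tol=1e-12)

# Original nonlinear model: the same identity does not hold in general.
nonlinear_gap = (f(x, add(theta_0, tau_1, tau_2)) - base) - (
    (f(x, add(theta_0, tau_1)) - base) + (f(x, add(theta_0, tau_2)) - base)
)
print(additive_in_lin, nonlinear_gap)
```

The nonzero gap for the nonlinear model is the interference between task vectors that linearized training is meant to reduce.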

Technical Innovations

Single-Round Unlearning: Unlike traditional methods requiring multiple communication rounds, SATA completes unlearning within a single round
Standalone Task Vector Design: Avoids retraining requirements by maintaining independent task vectors
NTK Enhancement: Improves decoupling between task vectors, reducing the impact of unlearning operations on other client contributions
Theoretical Foundation: Provides solid theoretical basis based on task arithmetic with interpretable unlearning mechanisms

Experimental Setup

Datasets

Cars-196: Vehicle image dataset containing 196 classes corresponding to vehicle brands, models, and years
Resisc45: Remote sensing image dataset containing 45 classes

Both datasets employ non-IID partitioning using Dirichlet distribution, with parameter β controlling data skewness (smaller β indicates more skewed distribution).
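A common way to build such a split is to draw, for every class, a Dirichlet(β, ..., β) vector of client proportions and divide that class's samples accordingly. The sketch below follows this label-skew recipe using only the standard library. Whether the paper uses exactly this variant is an assumption, and the function name and arguments are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def dirichlet_partition(
    labels: List[int], num_clients: int, beta: float, seed: int = 0
) -> Dict[int, List[int]]:
    """Split sample indices across clients with per-class Dirichlet(beta) proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients: Dict[int, List[int]] = {k: [] for k in range(num_clients)}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of Gamma(beta, 1) draws divided by their sum.
        gammas = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against all draws underflowing to 0
        start = 0
        cumulative = 0.0
        for k in range(num_clients):
            cumulative += gammas[k] / total
            # The last client takes the remainder so that every index is assigned.
            if k == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[k].extend(indices[start:end])
            start = end
    return clients


labels = [i % 5 for i in range(200)]  # toy dataset: 5 classes, 40 samples each
for beta in (0.05, 0.5, 100.0):
    parts = dirichlet_partition(labels, num_clients=4, beta=beta, seed=1)
    print(beta, [len(p) for p in parts.values()])
```

Small β concentrates each class on very few clients, while large β approaches an even split.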

Evaluation Metrics

Global Model Accuracy: Classification accuracy on test set
Target Client Unlearning Effect: Accuracy on target client test data (lower is better)
Target Client Training Data Unlearning: Accuracy on target client training data (lower is better)
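The three metrics are plain classification accuracies measured on different splits. A minimal sketch, assuming a hypothetical `predict` function that maps a feature vector to a class label and toy data invented for the example:

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features, label)
Predictor = Callable[[Sequence[float]], int]


def accuracy(predict: Predictor, data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not data:
        raise ValueError("empty evaluation set")
    correct = sum(1 for features, label in data if predict(features) == label)
    return correct / len(data)


def unlearning_report(
    predict: Predictor,
    global_test: Sequence[Example],
    target_test: Sequence[Example],
    target_train: Sequence[Example],
) -> Dict[str, float]:
    """Collect the three accuracies used to judge an unlearned model."""
    return {
        "global_acc": accuracy(predict, global_test),         # higher is better
        "target_test_acc": accuracy(predict, target_test),    # lower is better
        "target_train_acc": accuracy(predict, target_train),  # lower is better
    }


def toy_predict(features: Sequence[float]) -> int:
    """Stand-in classifier: class 1 when the first feature is positive."""
    return int(features[0] > 0)


report = unlearning_report(
    toy_predict,
    global_test=[([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 1)],
    target_test=[([1.0], 0), ([-1.0], 1)],
    target_train=[([3.0], 1), ([-3.0], 1)],
)
print(report)
```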

Comparison Methods

Train From Scratch (TFS): Retrains from the pre-trained weights without the target client (upper-bound baseline)
Continue to Train (CTT): Continues training without the target client, relying on catastrophic forgetting to erase its contribution
FedEraser: A well-known FU method that reconstructs the global model from stored historical client updates

Implementation Details

Model: ViT-B/16 based on OpenAI CLIP with frozen classification head
Optimizer: AdamW
Experimental Setup:
- Resisc45: 3 FL rounds + 3 FU rounds + extended PU rounds
- Cars-196: 10 FL rounds + 10 FU rounds + 5 PU rounds
Hyperparameters: λtgt and learning rate optimized through grid search

Experimental Results

Main Results

Unlearning Effect (Table 1)

SATA NTK reaches a much lower target client test accuracy (lower is better) than the competing methods in every setting:

Resisc45 Dataset:

β=0.05: 9.96% during FU phase vs FedEraser's 56.79%
β=0.1: 31.69% during FU phase vs FedEraser's 80.10%
β=0.5: 14.29% during FU phase vs FedEraser's 89.95%

Cars-196 Dataset:

β=0.05: 1.48% during FU phase vs FedEraser's 56.04%
β=0.1: 6.36% during FU phase vs FedEraser's 58.32%
β=0.5: 0.27% during FU phase vs FedEraser's 69.93%

Global Model Performance (Table 2)

While SATA demonstrates superior unlearning effects, global model accuracy is slightly lower than other methods, particularly during FU phase:

Performance Degradation Analysis:

More pronounced performance degradation in high heterogeneity (low β) environments
Performance recovers to near other methods' levels after PU phase

Ablation Studies

NTK Effect Verification (Tables 3-4)

Comparing effects with and without NTK training:

SATA vs SATA NTK: NTK training consistently improves unlearning performance
SAFA vs SAFA NTK: SAFA (Stand Alone FedAvg) achieves higher global accuracy but slightly inferior unlearning effects

Different Unlearning Strategy Comparison

θ₀ + ∑ᵢ≠tgt λᵢτᵢˢᵃ: Using only standalone task vectors of remaining clients
θ̂ - λtgt τ_tgt^sa: Subtracting target client contribution from global model (SATA method)

Results demonstrate SATA method's superior unlearning effects.
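The two strategies only differ when the primary and standalone vectors differ, as the following toy comparison shows. All values and helper names are assumptions for illustration; the example says nothing about which strategy performs better on real data.

```python
from typing import Dict, List, Optional

Vector = List[float]


def weighted_sum(
    theta_0: Vector,
    taus: Dict[str, Vector],
    lambdas: Dict[str, float],
    skip: Optional[str] = None,
) -> Vector:
    """theta_0 + sum_i lambda_i * tau_i, optionally leaving one client out."""
    out = list(theta_0)
    for client, tau in taus.items():
        if client == skip:
            continue
        out = [o + lambdas[client] * t for o, t in zip(out, tau)]
    return out


theta_0: Vector = [0.0, 0.0]
lambdas = {"A": 0.5, "B": 0.5}
tau = {"A": [2.0, 0.0], "B": [0.0, 4.0]}     # primary vectors (toy values)
tau_sa = {"A": [1.0, 0.0], "B": [0.0, 2.0]}  # standalone vectors (toy values)
tgt = "B"

theta_hat = weighted_sum(theta_0, tau, lambdas)

# Strategy 1: rebuild from the standalone vectors of the remaining clients.
rebuilt = weighted_sum(theta_0, tau_sa, lambdas, skip=tgt)

# Strategy 2 (SATA): subtract the target's standalone vector from the global model.
subtracted = [h - lambdas[tgt] * t for h, t in zip(theta_hat, tau_sa[tgt])]

print(rebuilt, subtracted)
```

In this toy setting the rebuilt model discards everything the global model learned beyond the standalone vectors, whereas the subtraction keeps the remaining clients' primary contribution intact.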

Case Analysis

Visualization results in Figure 1 reveal:

SATA achieves lowest accuracy on target client
Although global accuracy decreases, rapid recovery occurs during PU phase
Higher β values (lower data heterogeneity) yield better method performance

Experimental Findings

Effectiveness of Single-Round Unlearning: SATA successfully achieves effective unlearning within single-round communication
Importance of NTK: NTK training significantly enhances task arithmetic effects
Impact of Data Heterogeneity: High heterogeneity environments present greater challenges for the method
Rapid Recovery Capability: PU phase enables quick model performance recovery

Related Work

Federated Learning Algorithms

FedAvg: Fundamental parameter averaging aggregation method
FedProx: Introduces proximal term for heterogeneity handling
SCAFFOLD: Uses control variates to mitigate client drift
FedDC: Adjusts updates through drift estimation and correction
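For reference, the FedAvg aggregation step is a sample-weighted average of the client parameters. A minimal sketch with flat parameter lists; the function name and the toy values are assumptions:

```python
from typing import List

Vector = List[float]


def fedavg(client_params: List[Vector], num_samples: List[int]) -> Vector:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("clients must hold at least one sample in total")
    dim = len(client_params[0])
    return [
        sum(n * params[i] for params, n in zip(client_params, num_samples)) / total
        for i in range(dim)
    ]


# Two toy clients: the second holds three times as much data as the first.
global_params = fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

FedProx, SCAFFOLD, and FedDC modify the local update that precedes this step and are not sketched here.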

Machine Unlearning

Centralized Unlearning: Traditional machine unlearning methods unsuitable for federated settings
Federated Unlearning: FedEraser, FedRecover, FedRecovery and similar methods

Task Arithmetic

Linear operation framework for pre-trained model editing
Theoretical foundations of NTK-enhanced task arithmetic

Conclusions and Discussion

Main Conclusions

Proposes the first effective method enabling federated unlearning within single-round communication
Theoretical framework based on task arithmetic and NTK provides good interpretability
Validates method effectiveness across multiple data heterogeneity settings
Significantly reduces service interruption time during unlearning process

Limitations

Heterogeneity Sensitivity: Performance is limited in high Dirichlet coefficient (low heterogeneity) settings
Global Performance Degradation: Global model accuracy decreases during unlearning process
Dual Vector Overhead: Requires maintaining additional standalone task vectors, increasing storage and computational costs
Hyperparameter Sensitivity: Parameters such as λtgt require careful tuning

Future Directions

Address performance limitations in high Dirichlet coefficient settings
Explore adaptability in other modalities and federated settings
Further optimize global model performance preservation
Investigate adaptive hyperparameter selection methods

In-Depth Evaluation

Strengths

Strong Innovation: First to achieve single-round federated unlearning, addressing critical practical problems
Solid Theoretical Foundation: Based on robust theoretical foundations of task arithmetic and NTK
High Practical Value: Significantly reduces service interruption time, improving system availability
Comprehensive Experiments: Thorough evaluation across multiple datasets and heterogeneity settings
Method Simplicity: Core concepts are intuitive and straightforward, facilitating understanding and implementation

Weaknesses

Performance Trade-off: Clear trade-off exists between unlearning effect and global performance
Heterogeneity Limitations: Suboptimal performance in certain heterogeneity settings
Resource Overhead: Dual task vector mechanism introduces additional storage and computational costs
Insufficient Theoretical Analysis: Lacks in-depth analysis of method convergence and theoretical guarantees

Impact

Academic Contribution: Provides new research direction for federated unlearning field
Practical Value: Addresses critical practical deployment issues with important application prospects
Technical Inspiration: Application of task arithmetic in federated learning offers valuable insights

Applicable Scenarios

Time-Sensitive Systems: Real-time services requiring rapid unlearning response
High-Frequency Unlearning Requirements: Dynamic environments frequently requiring client removal
Resource-Abundant Environments: Systems capable of bearing dual vector storage overhead
Low-to-Medium Heterogeneity Environments: Federated learning scenarios with relatively uniform data distribution

References

This paper cites 34 relevant references covering multiple related domains including federated learning, machine unlearning, and task arithmetic, providing comprehensive theoretical foundation and comparison baselines.

Overall Assessment: This is an important contribution to the federated unlearning field, proposing a single-round unlearning method that addresses critical practical problems. Despite certain limitations, its innovation and practical value make it a significant advance in this domain.