
Towards Robust Knowledge Removal in Federated Learning with High Data Heterogeneity

Santi, Salami, Calderara
Nowadays, there is an abundance of portable devices capable of collecting large amounts of data and with decent computational power. This opened the possibility to train AI models in a distributed manner, preserving the participating clients' privacy. However, because of privacy regulations and safety requirements, elimination upon necessity of a client contribution to the model has become mandatory. The cleansing process must satisfy specific efficacy and time requirements. In recent years, research efforts have produced several knowledge removal methods, but these require multiple communication rounds between the data holders and the process coordinator. This can cause the unavailability of an effective model up to the end of the removal process, which can result in a disservice to the system users. In this paper, we introduce an innovative solution based on Task Arithmetic and the Neural Tangent Kernel, to rapidly remove a client's influence from a model.

Towards Robust Knowledge Removal in Federated Learning with High Data Heterogeneity

Basic Information

  • Paper ID: 2510.13606
  • Title: Towards Robust Knowledge Removal in Federated Learning with High Data Heterogeneity
  • Authors: Riccardo Santi, Riccardo Salami, Simone Calderara (University of Modena and Reggio Emilia, Italy)
  • Classification: cs.LG (Machine Learning)
  • Publication Date: October 15, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.13606v1

Abstract

As portable devices gain computational power and data collection capacity, AI models can be trained in a distributed manner while protecting the privacy of participating clients. However, privacy regulations and safety requirements make it mandatory to remove a client's contribution from a model when necessary. The removal process must satisfy specific efficacy and time requirements. Recent research has produced several knowledge removal methods, but they require multiple communication rounds between the data holders and the process coordinator, so an effective model may be unavailable until the removal process ends, which degrades service for system users. The paper proposes a solution based on Task Arithmetic and the Neural Tangent Kernel (NTK) that rapidly removes a client's influence from a model.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is Federated Unlearning (FU): rapidly and effectively removing the contribution of specific clients to the global model in federated learning environments while maintaining model performance and privacy protection.

Problem Significance

  1. Regulatory Compliance: Privacy regulations such as GDPR and CCPA require support for the "right to be forgotten"
  2. Security Requirements: Need to remove contributions from malicious or contaminated client data
  3. Sensitive Domains: Patient data revocation requirements in healthcare and similar fields
  4. Service Continuity: Traditional methods require multiple communication rounds, resulting in prolonged model unavailability

Limitations of Existing Methods

  • FedEraser and similar methods require multiple communication rounds to produce an effective cleaned model
  • Models are unavailable during the unlearning process, causing service interruptions
  • Insufficient robustness in high data heterogeneity environments

Research Motivation

The goal is a method that completes client unlearning in a single communication round, minimizing service interruption while maintaining good performance under high data heterogeneity.

Core Contributions

  1. Propose SATA Method: A novel federated unlearning method based on Task Arithmetic and Neural Tangent Kernel, enabling client unlearning within single-round communication
  2. Innovative Dual Task Vector Mechanism: Each client maintains two independent task vectors, where the standalone task vector is specifically designed for unlearning operations
  3. NTK-Enhanced Task Arithmetic: Leverages Neural Tangent Kernel training to improve task vector decoupling and reduce inter-task interference
  4. Comprehensive Experimental Validation: Comparative evaluation with multiple baseline methods on Cars-196 and Resisc45 datasets, demonstrating method effectiveness

Methodology Details

Task Definition

Input:

  • Pre-trained model parameters θ₀
  • Local datasets of K clients {D₁, D₂, ..., D_K}
  • Target client for unlearning tgt

Output:

  • Cleaned global model θ̂clean with target client influence removed
  • Preserved model performance on non-target client data

Constraints:

  • Complete unlearning within single-round communication
  • Protect client privacy
  • Maintain model performance on non-target client data

Model Architecture

1. Dual Task Vector Mechanism

Each client k maintains two independent task vectors:

  • Primary Task Vector τₖ: Participates in distributed training process and contributes to global model computation
  • Standalone Task Vector τₖˢᵃ: Remains isolated, uncontaminated by other client information, specifically designed for future unlearning operations

2. Task Arithmetic Framework

In task arithmetic, the task vector τₜ = θₜ - θ₀ is the parameter change produced by fine-tuning the pre-trained weights θ₀ on task t. Multiple task vectors are combined by adding a weighted sum of them to the pre-trained weights:

θnew = θ₀ + ∑ᵢ₌₁ᵀ λᵢτᵢ

where λᵢ are scalar weight coefficients.
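
The combination rule above can be sketched in a few lines of plain Python. This is a minimal illustration with flat parameter lists and made-up numbers, not the authors' implementation; the helper names `task_vector` and `combine` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def task_vector(theta_t: Sequence[float], theta_0: Sequence[float]) -> Vector:
    """tau_t = theta_t - theta_0, computed element-wise."""
    return [t - p for t, p in zip(theta_t, theta_0)]


def combine(theta_0: Sequence[float],
            taus: Sequence[Sequence[float]],
            lambdas: Sequence[float]) -> Vector:
    """theta_new = theta_0 + sum_i lambda_i * tau_i."""
    theta_new = list(theta_0)
    for lam, tau in zip(lambdas, taus):
        for j, value in enumerate(tau):
            theta_new[j] += lam * value
    return theta_new


# Toy weights (illustrative values only).
theta_0 = [0.0, 1.0, 2.0]   # pre-trained parameters
theta_a = [1.0, 1.0, 2.0]   # after fine-tuning on task A
theta_b = [0.0, 3.0, 2.0]   # after fine-tuning on task B

taus = [task_vector(theta_a, theta_0), task_vector(theta_b, theta_0)]
theta_new = combine(theta_0, taus, lambdas=[0.5, 0.5])
print(theta_new)
```

In a real model each θ would be the full set of network weights; the arithmetic is the same per parameter.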

3. Unlearning Operation

When unlearning target client tgt is required, simply subtract its standalone task vector from the global model:

θ̂clean = θ̂ - λtgt τtgtˢᵃ
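
A minimal sketch of this subtraction, assuming flat parameter lists and an idealized case in which each client's standalone task vector equals the vector that was merged into the global model. In practice the two differ, so the removal is approximate. The function name `unlearn` and all numbers are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def unlearn(theta_hat: Sequence[float],
            tau_sa_tgt: Sequence[float],
            lambda_tgt: float) -> List[float]:
    """theta_clean = theta_hat - lambda_tgt * tau_tgt^sa."""
    return [p - lambda_tgt * t for p, t in zip(theta_hat, tau_sa_tgt)]


theta_0 = [0.0, 0.0, 0.0]
# Hypothetical task vectors for three clients (toy values).
taus: Dict[str, List[float]] = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 2.0, 0.0],
    "c": [0.0, 0.0, 4.0],
}
lam = 0.5

# Global model: theta_0 plus the weighted sum of all client vectors.
theta_hat = [theta_0[j] + sum(lam * tau[j] for tau in taus.values())
             for j in range(len(theta_0))]

# Remove client "b" with one local operation, no extra communication round.
theta_clean = unlearn(theta_hat, taus["b"], lam)
print(theta_hat, theta_clean)
```

The coordinate touched only by client "b" returns to its pre-trained value while the other coordinates are unchanged, which is the behaviour the method aims for when task vectors are well decoupled.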

4. NTK Enhancement

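The method relies on the Neural Tangent Kernel (NTK) view, in which a neural network's learning dynamics become linear in the parameters in the infinite-width limit. The linearized model is the first-order Taylor expansion of the network around θ₀: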

flin(x; θ) = f(x; θ₀) + (θ - θ₀)ᵀ∇θf(x; θ₀)

Training in the NTK regime improves task vector decoupling, with the final model expressible as:

flin(x; θᵣ₋₁ + ∑ₖ₌₁ᴷ λₖτₖ - λtgt τtgtˢᵃ) = f(x; θᵣ₋₁) + (∑ₖ₌₁ᴷ λₖτₖ - λtgt τtgtˢᵃ)ᵀ∇θf(x; θᵣ₋₁)
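
The practical point of the linearized form is that each task vector contributes an independent additive term to the output, so subtracting one vector removes exactly its term. The sketch below checks this numerically on a toy model; the model `tanh(θ·x)`, the helper names, and the numbers are all assumptions chosen for illustration and have nothing to do with the ViT used in the paper.

```python
import math
from typing import List, Sequence


def f(x: Sequence[float], theta: Sequence[float]) -> float:
    """Toy nonlinear model (illustrative assumption): tanh(theta . x)."""
    return math.tanh(sum(t * xi for t, xi in zip(theta, x)))


def grad_f(x: Sequence[float], theta: Sequence[float]) -> List[float]:
    """Analytic gradient of f with respect to theta."""
    slope = 1.0 - f(x, theta) ** 2
    return [slope * xi for xi in x]


def f_lin(x: Sequence[float], theta: Sequence[float],
          theta_ref: Sequence[float]) -> float:
    """First-order expansion of f around theta_ref, evaluated at theta."""
    g = grad_f(x, theta_ref)
    return f(x, theta_ref) + sum((t - r) * gi
                                 for t, r, gi in zip(theta, theta_ref, g))


def add(theta: Sequence[float], tau: Sequence[float],
        scale: float = 1.0) -> List[float]:
    """theta + scale * tau, element-wise."""
    return [t + scale * v for t, v in zip(theta, tau)]


x = [1.0, 2.0]
theta_ref = [0.1, -0.2]
tau_a = [0.3, 0.0]   # stands in for the remaining clients
tau_b = [0.0, 0.4]   # stands in for the target client

base = f(x, theta_ref)
effect_a = f_lin(x, add(theta_ref, tau_a), theta_ref) - base
effect_b = f_lin(x, add(theta_ref, tau_b), theta_ref) - base
merged = add(add(theta_ref, tau_a), tau_b)
effect_ab = f_lin(x, merged, theta_ref) - base
after_removal = f_lin(x, add(merged, tau_b, scale=-1.0), theta_ref) - base

print(effect_ab, effect_a + effect_b, after_removal)
```

For the original nonlinear `f` the effects are not additive in general, which is why training in the linearized regime is expected to reduce interference between task vectors.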

Technical Innovations

  1. Single-Round Unlearning: Unlike traditional methods requiring multiple communication rounds, SATA completes unlearning within a single round
  2. Standalone Task Vector Design: Avoids retraining requirements by maintaining independent task vectors
  3. NTK Enhancement: Improves decoupling between task vectors, reducing the impact of unlearning operations on other client contributions
  4. Theoretical Foundation: Provides solid theoretical basis based on task arithmetic with interpretable unlearning mechanisms

Experimental Setup

Datasets

  1. Cars-196: Vehicle image dataset containing 196 classes corresponding to vehicle brands, models, and years
  2. Resisc45: Remote sensing image dataset containing 45 classes

Both datasets are partitioned non-IID across clients using a Dirichlet distribution. The parameter β controls the skew: a smaller β gives a more skewed, more heterogeneous split.
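
The summary does not spell out the partitioning procedure. A common label-skew recipe, shown here as a hedged sketch, draws per-class client proportions from Dir(β) and splits each class accordingly. The function name `dirichlet_partition`, the seed, and the toy labels are assumptions; the Dirichlet sample is built from normalized Gamma draws using only the standard library.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_partition(labels: Sequence[int], num_clients: int,
                        beta: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients with per-class Dir(beta) proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Normalized Gamma(beta, 1) draws are a Dirichlet(beta, ..., beta) sample.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total == 0.0:
            # Degenerate draw (possible for tiny beta): one client takes the class.
            draws = [0.0] * num_clients
            draws[rng.randrange(num_clients)] = 1.0
            total = 1.0
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += draws[k] / total
            if k == num_clients - 1:
                end = len(idxs)  # last client takes whatever remains
            else:
                end = min(len(idxs), max(start, round(cumulative * len(idxs))))
            clients[k].extend(idxs[start:end])
            start = end
    return clients


labels = [i % 5 for i in range(100)]          # 5 toy classes, 20 samples each
parts = dirichlet_partition(labels, num_clients=4, beta=0.1)
print([len(p) for p in parts])
```

With a small β most of each class tends to land on one or two clients; raising β moves the split toward uniform.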

Evaluation Metrics

  1. Global Model Accuracy: Classification accuracy on test set
  2. Target Client Unlearning Effect: Accuracy on target client test data (lower is better)
  3. Target Client Training Data Unlearning: Accuracy on target client training data (lower is better)
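
All three metrics are plain classification accuracy measured on different splits. A minimal sketch, assuming a hypothetical `predict` callable and toy data (neither comes from the paper):

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def accuracy(predict: Callable[[Sequence[float]], int],
             data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if not data:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(1 for x, y in data if predict(x) == y)
    return correct / len(data)


def unlearning_report(predict: Callable[[Sequence[float]], int],
                      global_test: Sequence[Example],
                      target_test: Sequence[Example],
                      target_train: Sequence[Example]) -> Dict[str, float]:
    """Global accuracy should stay high; both target accuracies should drop."""
    return {
        "global_test_acc": accuracy(predict, global_test),
        "target_test_acc": accuracy(predict, target_test),
        "target_train_acc": accuracy(predict, target_train),
    }


def toy_predict(x: Sequence[float]) -> int:
    """Stand-in classifier: label 1 if the first feature is positive."""
    return int(x[0] > 0)


report = unlearning_report(
    toy_predict,
    global_test=[([1.0], 1), ([-1.0], 0), ([2.0], 0), ([-2.0], 0)],
    target_test=[([1.0], 0), ([-1.0], 1)],
    target_train=[([1.0], 0), ([3.0], 1)],
)
print(report)
```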

Comparison Methods

  1. Train From Scratch (TFS): Retraining from the pre-trained weights without the target client (upper-bound baseline)
  2. Continue to Train (CTT): Training continues without the target client, relying on catastrophic forgetting to erase its influence
  3. FedEraser: The best-known FU method, which reconstructs the global model from historical client updates

Implementation Details

  • Model: ViT-B/16 based on OpenAI CLIP with frozen classification head
  • Optimizer: AdamW
  • Experimental Setup:
    • Resisc45: 3 FL rounds + 3 FU rounds + extended PU rounds
    • Cars-196: 10 FL rounds + 10 FU rounds + 5 PU rounds
  • Hyperparameters: λtgt and learning rate optimized through grid search

Experimental Results

Main Results

Unlearning Effect (Table 1)

SATA NTK reaches far lower accuracy on the target client's test set (lower is better) than the competing methods across all settings. The comparison with FedEraser:

Resisc45 Dataset:

  • β=0.05: 9.96% during FU phase vs FedEraser's 56.79%
  • β=0.1: 31.69% during FU phase vs FedEraser's 80.10%
  • β=0.5: 14.29% during FU phase vs FedEraser's 89.95%

Cars-196 Dataset:

  • β=0.05: 1.48% during FU phase vs FedEraser's 56.04%
  • β=0.1: 6.36% during FU phase vs FedEraser's 58.32%
  • β=0.5: 0.27% during FU phase vs FedEraser's 69.93%

Global Model Performance (Table 2)

SATA unlearns more effectively, but its global model accuracy is slightly lower than that of the other methods, particularly during the FU phase.

Performance Degradation Analysis:

  • More pronounced performance degradation in high heterogeneity (low β) environments
  • Performance recovers to near other methods' levels after PU phase

Ablation Studies

NTK Effect Verification (Tables 3-4)

Comparing effects with and without NTK training:

  • SATA vs SATA NTK: NTK training consistently improves unlearning performance
  • SAFA vs SAFA NTK: SAFA (Stand Alone FedAvg) achieves higher global accuracy but slightly inferior unlearning effects

Different Unlearning Strategy Comparison

  1. θ₀ + ∑ᵢ≠tgt λᵢτᵢˢᵃ: Using only standalone task vectors of remaining clients
  2. θ̂ - λtgt τtgtˢᵃ: Subtracting target client contribution from global model (SATA method)

The results show that the second strategy, the one SATA uses, unlearns more effectively.
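
The two strategies coincide only when every client's merged vector equals its standalone vector. The sketch below uses toy numbers and hypothetical helper names to show how they diverge once the global model contains information that the standalone vectors do not capture; it says nothing about which one performs better in practice.

```python
from typing import Dict, List, Sequence


def rebuild_from_standalone(theta_0: Sequence[float],
                            standalone: Dict[str, Sequence[float]],
                            lambdas: Dict[str, float],
                            tgt: str) -> List[float]:
    """Strategy 1: theta_0 + sum over i != tgt of lambda_i * tau_i^sa."""
    theta = list(theta_0)
    for cid, tau in standalone.items():
        if cid == tgt:
            continue
        for j, value in enumerate(tau):
            theta[j] += lambdas[cid] * value
    return theta


def subtract_target(theta_hat: Sequence[float],
                    standalone: Dict[str, Sequence[float]],
                    lambdas: Dict[str, float],
                    tgt: str) -> List[float]:
    """Strategy 2 (SATA): theta_hat - lambda_tgt * tau_tgt^sa."""
    return [p - lambdas[tgt] * v for p, v in zip(theta_hat, standalone[tgt])]


theta_0 = [0.0, 0.0]
lambdas = {"a": 0.5, "b": 0.5}
# Vectors merged during federated training (toy values); client "a" has
# absorbed some shared information in its second coordinate.
primary = {"a": [1.0, 0.5], "b": [0.0, 1.0]}
# Standalone vectors, trained in isolation (toy values).
standalone = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

theta_hat = [theta_0[j] + sum(lambdas[c] * primary[c][j] for c in primary)
             for j in range(len(theta_0))]

s1 = rebuild_from_standalone(theta_0, standalone, lambdas, tgt="b")
s2 = subtract_target(theta_hat, standalone, lambdas, tgt="b")
print(s1, s2)
```

Strategy 1 discards everything the global model learned collaboratively, while strategy 2 keeps it and removes only the target's isolated contribution.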

Case Analysis

Visualization results in Figure 1 reveal:

  • SATA achieves lowest accuracy on target client
  • Although global accuracy decreases, rapid recovery occurs during PU phase
  • Higher β values (lower data heterogeneity) yield better method performance

Experimental Findings

  1. Effectiveness of Single-Round Unlearning: SATA successfully achieves effective unlearning within single-round communication
  2. Importance of NTK: NTK training significantly enhances task arithmetic effects
  3. Impact of Data Heterogeneity: High heterogeneity environments present greater challenges for the method
  4. Rapid Recovery Capability: PU phase enables quick model performance recovery

Related Work

Federated Learning Algorithms

  • FedAvg: Fundamental parameter averaging aggregation method
  • FedProx: Introduces proximal term for heterogeneity handling
  • SCAFFOLD: Uses control variates to mitigate client drift
  • FedDC: Adjusts updates through drift estimation and correction
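
For reference, the FedAvg aggregation step is a weighted average of client parameters, with weights proportional to local dataset size. A minimal sketch with flat parameter lists and toy values (the function name `fedavg` is illustrative):

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum((n / total) * params[j]
            for params, n in zip(client_params, client_sizes))
        for j in range(dim)
    ]


# Two clients holding 1 and 3 samples respectively (toy values).
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```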

Machine Unlearning

  • Centralized Unlearning: Traditional machine unlearning methods unsuitable for federated settings
  • Federated Unlearning: FedEraser, FedRecover, FedRecovery and similar methods

Task Arithmetic

  • Linear operation framework for pre-trained model editing
  • Theoretical foundations of NTK-enhanced task arithmetic

Conclusions and Discussion

Main Conclusions

  1. Proposes the first effective method enabling federated unlearning within single-round communication
  2. Theoretical framework based on task arithmetic and NTK provides good interpretability
  3. Validates method effectiveness across multiple data heterogeneity settings
  4. Significantly reduces service interruption time during unlearning process

Limitations

  1. Dirichlet Coefficient Sensitivity: Performance is limited in high Dirichlet coefficient (low heterogeneity) settings
  2. Global Performance Degradation: Global model accuracy decreases during unlearning process
  3. Dual Vector Overhead: Requires maintaining additional standalone task vectors, increasing storage and computational costs
  4. Hyperparameter Sensitivity: Parameters such as λtgt require careful tuning

Future Directions

  1. Address performance limitations in high Dirichlet coefficient settings
  2. Explore adaptability in other modalities and federated settings
  3. Further optimize global model performance preservation
  4. Investigate adaptive hyperparameter selection methods

In-Depth Evaluation

Strengths

  1. Strong Innovation: First to achieve single-round federated unlearning, addressing critical practical problems
  2. Solid Theoretical Foundation: Based on robust theoretical foundations of task arithmetic and NTK
  3. High Practical Value: Significantly reduces service interruption time, improving system availability
  4. Comprehensive Experiments: Thorough evaluation across multiple datasets and heterogeneity settings
  5. Method Simplicity: Core concepts are intuitive and straightforward, facilitating understanding and implementation

Weaknesses

  1. Performance Trade-off: Clear trade-off exists between unlearning effect and global performance
  2. Heterogeneity Limitations: Suboptimal performance in certain heterogeneity settings
  3. Resource Overhead: Dual task vector mechanism introduces additional storage and computational costs
  4. Insufficient Theoretical Analysis: Lacks in-depth analysis of method convergence and theoretical guarantees

Impact

  1. Academic Contribution: Provides new research direction for federated unlearning field
  2. Practical Value: Addresses critical practical deployment issues with important application prospects
  3. Technical Inspiration: Application of task arithmetic in federated learning offers valuable insights

Applicable Scenarios

  1. Time-Sensitive Systems: Real-time services requiring rapid unlearning response
  2. High-Frequency Unlearning Requirements: Dynamic environments frequently requiring client removal
  3. Resource-Abundant Environments: Systems capable of bearing dual vector storage overhead
  4. Low-to-Medium Heterogeneity Environments: Federated learning scenarios with relatively uniform data distribution

References

This paper cites 34 relevant references covering multiple related domains including federated learning, machine unlearning, and task arithmetic, providing comprehensive theoretical foundation and comparison baselines.


Overall Assessment: This is an important contribution to the federated unlearning field, proposing a single-round unlearning method that addresses critical practical problems. Despite certain limitations, its innovation and practical value make it a significant advance in this domain.