FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment

Li, Bidkhori
We introduce a novel framework for Federated Class Incremental Learning, called Federated Gaussian Task Embedding and Alignment (FedGTEA). FedGTEA is designed to capture task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. At the client side, the Cardinality-Agnostic Task Encoder (CATE) produces Gaussian-distributed task embeddings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. Importantly, CATE maintains a fixed parameter size regardless of the number of tasks, which ensures scalability across long task sequences. On the server side, FedGTEA utilizes the 2-Wasserstein distance to measure inter-task gaps between Gaussian embeddings. We formulate the Wasserstein loss to enforce inter-task separation. This probabilistic formulation not only enhances representation learning but also preserves task-level privacy by avoiding the direct transmission of latent embeddings, aligning with the privacy constraints in federated learning. Extensive empirical evaluations on popular datasets demonstrate that FedGTEA achieves superior classification performance and significantly mitigates forgetting, consistently outperforming strong existing baselines.

Basic Information

  • Paper ID: 2510.12927
  • Title: FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment
  • Authors: Haolin Li, Hoda Bidkhori (George Mason University)
  • Classification: cs.LG stat.ML
  • Conference: AISTATS 2026, Tangier, Morocco
  • Paper Link: https://arxiv.org/abs/2510.12927

Abstract

This paper proposes a novel federated class-incremental learning framework, FedGTEA (Federated Gaussian Task Embedding and Alignment). The framework captures task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. On the client side, a Cardinality-Agnostic Task Encoder (CATE) generates Gaussian-distributed task embeddings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. A key characteristic of CATE is that its parameter count stays fixed regardless of the number of tasks, ensuring scalability for long task sequences. On the server side, FedGTEA uses the 2-Wasserstein distance to measure gaps between the Gaussian task embeddings and enforces task separation through a Wasserstein loss. This probabilistic formulation not only enhances representation learning but also protects task-level privacy by avoiding direct transmission of latent embeddings.

Research Background and Motivation

Problem Definition

Federated Class-Incremental Learning (FCIL) is a hybrid of federated learning (FL) and class-incremental learning (CIL), requiring simultaneous solutions to three core challenges:

  1. Catastrophic Forgetting: Occurs both during local client updates and global aggregation
  2. Statistical Heterogeneity: Data distributions across clients are typically not independent and identically distributed (non-IID)
  3. Task Context Ambiguity: Lack of task identity at test time leads to semantic drift and performance degradation

Research Motivation

Existing FCIL methods primarily focus on data-level feature utilization while neglecting the importance of task-level context. As shown in Figure 1, the same input can call for different answers under different tasks (e.g., "What is this object?" vs. "What is the background color?"), so the model needs task-level contextual information to respond correctly. Effective use of task context in FCIL nevertheless remains underexplored.

Limitations of Existing Methods

  • Most methods focus on memory-based data-level feature utilization
  • Prompt learning methods, while incorporating task knowledge, suffer from increased memory usage and computational overhead
  • Lack of parameter-efficient task encoder design

Core Contributions

  1. Proposes FedGTEA Algorithm: Effectively captures task-level knowledge in FCIL in a scalable and robust manner, introducing a Cardinality-Agnostic Task Encoder (CATE) on the client side to generate task embeddings modeled as Gaussian random variables, and leveraging 2-Wasserstein distance on the server side to promote task separation.
  2. Designs CATE Module: Capable of inferring task embeddings from data batches of arbitrary size with cardinality-agnostic properties. By modeling embeddings as Gaussian random variables, the server can quantify inter-task distances using the 2-Wasserstein metric.
  3. Server-side Optimization Framework: First performs initial model aggregation using FedAvg principles, then formulates an optimization problem containing three loss components: knowledge distillation loss, Wasserstein loss, and anchor loss.
  4. Experimental Validation: Achieves superior accuracy and forgetting performance compared to strong baselines (AC-GAN + FedAvg/FedProx, GLFC, FedCIL, FLwF-2T) on multiple benchmark datasets.

Methodology Details

Task Definition

The FCIL system consists of N clients and a central server, processing a global task sequence T = {T¹, T², ..., Tᵀ}. Each client Cₖ collects a local dataset Dᵗₖ ⊂ Tᵗ during task Tᵗ. The objective is to find global parameters θᵗₘ that minimize the loss across all observed tasks and all clients.

Model Architecture

Client-side Model

The client model contains two core components:

1. Cardinality-Agnostic Task Encoder (CATE)

  • Designed as a fully connected neural network that, given a batch B = (x₁, x₂, ..., x_b) of arbitrary size b, outputs a d-dimensional task embedding by averaging the per-sample outputs:
    E_B = (1/b) ∑ᵢ₌₁ᵇ CATE(xᵢ) ∈ ℝᵈ
    
  • Parameter count does not grow with the number of tasks, ensuring scalability for long task sequences

2. AC-GAN Module

  • The discriminator has a real/fake head and a classification head
  • The classification head fuses data features F and task embeddings E for prediction
  • The generator G synthesizes images for replay
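
The mean-pooling rule that makes CATE cardinality-agnostic can be illustrated with a minimal NumPy sketch. The layer widths, the ReLU activation, and the names `cate_per_sample` and `task_embedding` are assumptions made for illustration; the summary only states that CATE is a fully connected network whose per-sample outputs are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened input dimension, hidden width, embedding dimension d.
IN_DIM, HIDDEN, EMB_DIM = 32, 16, 8

# Fixed parameter set: its size depends only on the layer widths,
# never on the batch size or on the number of tasks seen so far.
W1 = rng.normal(scale=0.1, size=(IN_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, EMB_DIM))
b2 = np.zeros(EMB_DIM)


def cate_per_sample(x):
    """Map each row of x (shape [b, IN_DIM]) to a d-dimensional vector."""
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer (assumed activation)
    return h @ W2 + b2


def task_embedding(batch):
    """Average per-sample outputs, giving one embedding for any batch size b."""
    return cate_per_sample(batch).mean(axis=0)


small = rng.normal(size=(4, IN_DIM))
large = rng.normal(size=(256, IN_DIM))
e_small = task_embedding(small)
e_large = task_embedding(large)
# Shuffling the batch leaves the embedding unchanged (up to float rounding).
e_shuffled = task_embedding(large[rng.permutation(len(large))])
```

Because the pooling is a plain average, the output has shape (d,) for a batch of 4 or of 256, and it does not depend on the order of samples within the batch.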

Gaussian Task Embedding

Task embeddings are modeled as Gaussian random variables:

  • Global: Eᵗ ~ N(μᵗ, Σᵗ)
  • Client-specific: Eᵗₖ ~ N(μᵗₖ, Σᵗₖ)
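
The summary does not say how the parameters (μ, Σ) are obtained. One plausible reading, sketched below as an assumption rather than the paper's procedure, is that a client estimates them as the empirical mean and covariance of the embeddings it computes over many batches. The function name `fit_gaussian` and the ridge term `eps` are hypothetical.

```python
import numpy as np


def fit_gaussian(embeddings, eps=1e-6):
    """Estimate (mu, Sigma) from an [n, d] array of task embeddings.

    eps adds a small ridge to the diagonal so Sigma stays positive definite;
    this regularizer is an assumption of the sketch.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    mu = embeddings.mean(axis=0)
    sigma = np.cov(embeddings, rowvar=False) + eps * np.eye(embeddings.shape[1])
    return mu, sigma


rng = np.random.default_rng(0)
# Stand-in for embeddings produced by a task encoder on 200 batches, d = 4.
batch_embeddings = rng.normal(loc=1.0, scale=0.5, size=(200, 4))
mu, sigma = fit_gaussian(batch_embeddings)
```

In this sketch only the two summary statistics would be shared, not the raw embeddings, which is one way to read the privacy claim.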

Server-side Aggregation and Regularization

Initial Model Aggregation

Follows FedAvg principles:

θ̂ᵗₘ = ∑ₖ₌₁ᴺ wₖθᵗₖ

where weights wₖ are proportional to the number of local data points |Dᵗₖ|.
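
The aggregation step can be sketched in a few lines of NumPy. Model parameters are treated here as flattened vectors, and the weights are normalized to sum to one, which is the usual FedAvg convention; both choices, and the name `fedavg`, are assumptions of this illustration.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors.

    client_params: list of 1-D arrays (flattened model parameters).
    client_sizes: number of local data points per client, |D_k^t|.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()      # w_k, normalized to sum to 1
    stacked = np.stack(client_params)  # shape [N, P]
    return weights @ stacked           # sum_k w_k * theta_k


params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 100, 200]
theta_hat = fedavg(params, sizes)  # weights are 0.25, 0.25, 0.5
```

With these toy inputs the third client holds half of the data, so it contributes half of the aggregated vector.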

Model Regularization and Integration

Server loss contains three components:

Lserver = αLKD + βLWasserstein + γLanchor

1. Knowledge Distillation Loss:

LKD = ∑(x,y)∈Aᵀ KL(θᵀ⁻¹ₘ(x)∥θ(x))

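2. Wasserstein Loss: Uses the 2-Wasserstein distance to measure inter-task distances. For two Gaussian distributions m₁ = N(μ₁, Σ₁) and m₂ = N(μ₂, Σ₂), the squared distance has the closed form:

W²₂(m₁,m₂) = ∥μ₁ - μ₂∥²₂ + tr(Σ₁ + Σ₂ - 2(Σ₁^(1/2)Σ₂Σ₁^(1/2))^(1/2))

The Wasserstein loss is the reciprocal of the summed pairwise distances between the task Gaussians Nᵢ, so minimizing it pushes the task embeddings apart:

LWasserstein = [∑₁≤ᵢ<ⱼ≤ᵀ W²₂(Nᵢ,Nⱼ)]⁻¹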
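
The closed form and the reciprocal loss above can be checked numerically with a short NumPy sketch. The matrix square root is computed by eigendecomposition, which is one standard choice for symmetric positive semidefinite matrices. The helper names and the small `eps` in the denominator are assumptions of this illustration.

```python
import numpy as np
from itertools import combinations


def psd_sqrt(mat):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def w2_squared(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    root1 = psd_sqrt(sigma1)
    cross = psd_sqrt(root1 @ sigma2 @ root1)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.trace(sigma1 + sigma2 - 2.0 * cross)
    return float(mean_term + cov_term)


def wasserstein_loss(gaussians, eps=1e-8):
    """Reciprocal of the summed pairwise squared distances.

    eps avoids division by zero when all Gaussians coincide; it is an
    assumption of this sketch, not something stated in the text.
    """
    total = sum(w2_squared(*a, *b) for a, b in combinations(gaussians, 2))
    return 1.0 / (total + eps)


g1 = (np.zeros(2), np.eye(2))
g2 = (np.array([3.0, 4.0]), np.eye(2))
g3 = (np.zeros(2), np.diag([4.0, 9.0]))
d12 = w2_squared(*g1, *g2)  # equal covariances: reduces to ||mu1 - mu2||^2
d13 = w2_squared(*g1, *g3)  # equal means, diagonal covariances
loss = wasserstein_loss([g1, g2, g3])
```

When the covariances are equal the trace term vanishes and only the distance between means remains; when the means are equal and the covariances are diagonal, the trace term reduces to the squared differences of the standard deviations.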

3. Anchor Loss:

Lanchor = ∥θ - θ̂ᵗₘ∥₂
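
How the three terms combine can be sketched as follows. Treating the model outputs θ(x) as softmax class probabilities computed from logits is an assumption of this illustration, as are the function names; the weights default to the reported α=0.3, β=0.3, γ=0.4.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kd_loss(teacher_logits, student_logits, eps=1e-12):
    """Sum over samples of KL(teacher || student) on predicted class probabilities."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def anchor_loss(theta, theta_hat):
    """Euclidean distance between current parameters and the aggregated model."""
    return float(np.linalg.norm(theta - theta_hat))


def server_loss(l_kd, l_wasserstein, l_anchor, alpha=0.3, beta=0.3, gamma=0.4):
    """Weighted sum of the three terms, using the reported alpha, beta, gamma."""
    return alpha * l_kd + beta * l_wasserstein + gamma * l_anchor


# Made-up logits for two samples and three classes.
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
student = np.array([[1.5, 0.7, -0.8], [0.0, 0.4, 2.5]])
theta_hat = np.array([0.0, 0.0, 0.0])
theta = np.array([0.3, -0.4, 0.0])

l_kd = kd_loss(teacher, student)
l_anchor = anchor_loss(theta, theta_hat)  # sqrt(0.09 + 0.16) = 0.5
total = server_loss(l_kd, l_wasserstein=1.0 / 60.0, l_anchor=l_anchor)
```

The distillation term is zero when the current model reproduces the previous model's predictions, and the anchor term is zero when the parameters have not moved away from the aggregated model.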

Technical Innovations

  1. Cardinality-Agnostic Design: CATE can handle input batches of arbitrary size, providing better robustness and adaptability
  2. Gaussian Modeling: Modeling task embeddings as Gaussian random variables enables the use of Wasserstein distance for inter-task distance measurement
  3. Privacy Protection: Protects task-level privacy by avoiding direct transmission of latent embeddings
  4. Multi-level Regularization: Comprehensive loss function combining knowledge distillation, task separation, and model stability

Experimental Setup

Datasets

Three standard FCIL datasets are used:

  • CIFAR-10: 10 classes, 60,000 instances
  • CIFAR-100 iCaRL Split: Randomly split according to iCaRL principles
  • CIFAR-100 Superclass Split: 20 superclasses, each containing 5 semantically related classes

Task Sequence Configuration

  • Sequence 1 (CIFAR-10): 5 clients, 5 tasks, 2 classes per task
  • Sequence 2 (CIFAR-100): 10 clients, 10 tasks, 10 classes per task
  • Sequence 3 (CIFAR-100 Superclass): 10 clients, 20 tasks, 5 semantically related classes per task
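
The class-to-task partition for the first two sequences can be sketched as a seeded random split. The function name `split_classes_into_tasks` and the seed are hypothetical, and the exact class ordering used in the paper is not given in this summary.

```python
import numpy as np


def split_classes_into_tasks(num_classes, num_tasks, seed=0, shuffle=True):
    """Partition class ids 0..num_classes-1 into equally sized tasks."""
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    class_ids = np.arange(num_classes)
    if shuffle:
        class_ids = np.random.default_rng(seed).permutation(class_ids)
    return [sorted(chunk.tolist()) for chunk in np.split(class_ids, num_tasks)]


seq1 = split_classes_into_tasks(10, 5)    # Sequence 1: 5 tasks, 2 classes each
seq2 = split_classes_into_tasks(100, 10)  # Sequence 2: 10 tasks, 10 classes each
# Sequence 3 groups classes by superclass rather than at random, so a random
# split like this one would not reproduce it.
```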

Evaluation Metrics

  • Average Accuracy: Final test accuracy across all observed tasks
  • Average Forgetting: Gap between peak accuracy and final accuracy for each task
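
Both metrics can be computed from a matrix of per-task accuracies, as in the sketch below. Two conventions here are assumptions: the peak for a task is taken over every step since the task was introduced, and the last task is excluded from the forgetting average because nothing was learned after it. The accuracy values are made up for illustration.

```python
import numpy as np


def average_accuracy(acc):
    """Mean accuracy over all tasks after the final task has been learned."""
    return float(acc[-1].mean())


def average_forgetting(acc):
    """Mean gap between each earlier task's peak accuracy and its final accuracy."""
    num_tasks = acc.shape[0]
    gaps = []
    for j in range(num_tasks - 1):  # the last task has had no chance to be forgotten
        peak = acc[j:, j].max()     # best accuracy on task j since it was introduced
        gaps.append(peak - acc[-1, j])
    return float(np.mean(gaps))


# acc[i, j]: accuracy (%) on task j after training through task i; made-up values.
acc = np.array([
    [90.0,  0.0,  0.0],
    [70.0, 85.0,  0.0],
    [60.0, 75.0, 80.0],
])
avg_acc = average_accuracy(acc)       # (60 + 75 + 80) / 3
avg_forget = average_forgetting(acc)  # ((90 - 60) + (85 - 75)) / 2
```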

Comparison Methods

  • FL Baselines: FedAvg, FedProx
  • CIL Methods: iCaRL, DER
  • FCIL Methods: FLwF-2T, FedCIL, GLFC
  • Enhanced Baselines: AC-GAN + FedAvg/FedProx

Implementation Details

  • Optimizer: Adam
  • Batch size: 64
  • CIFAR-10: Learning rate 1×10⁻⁴, 60 global communication rounds, 100 local iterations per round
  • CIFAR-100: Learning rate 1×10⁻³, 40 global communication rounds, 400 local iterations per round
  • Hyperparameters: α=0.3, β=0.3, γ=0.4

Experimental Results

Main Results

Model      Sequence 1: CIFAR-10      Sequence 2: CIFAR-100     Sequence 3: CIFAR-100 Superclass
           Accuracy↑   Forgetting↓   Accuracy↑   Forgetting↓   Accuracy↑   Forgetting↓
FedAvg     26.2±2.6    8.5±1.7       23.4±2.9    9.2±1.9       23.7±2.5    13.2±1.6
FedProx    26.1±1.8    8.6±1.3       24.1±1.9    8.4±2.0       23.1±1.9    14.5±2.3
GLFC       35.7±1.1    6.3±0.9       33.1±0.6    10.7±1.8      33.6±1.7    11.2±2.2
FedCIL     32.4±1.9    6.9±1.9       31.5±0.4    7.4±1.2       31.2±1.6    10.8±2.0
FedGTEA    37.1±0.7    4.5±0.5       35.9±0.6    6.6±1.7       35.1±1.2    8.6±1.4

Key Findings

  1. Sequence 1: FedGTEA achieves the highest accuracy (37.1±0.7) and the only forgetting rate below 5% (4.5±0.5)
  2. Sequence 2: FedGTEA obtains the best accuracy (35.9±0.6) and the lowest forgetting rate (6.6±1.7)
  3. Sequence 3: FedGTEA performs best in both accuracy (35.1±1.2) and forgetting rate (8.6±1.4)

Ablation Study

Model Variant            Sequence 1: CIFAR-10      Sequence 2: CIFAR-100     Sequence 3: CIFAR-100 Superclass
                         Accuracy↑   Forgetting↓   Accuracy↑   Forgetting↓   Accuracy↑   Forgetting↓
w/o CATE & Wasserstein   32.6±0.5    7.1±0.7       32.2±0.5    8.1±1.1       31.7±0.7    10.5±0.9
w/o Wasserstein          34.1±0.7    5.8±0.4       33.3±0.4    8.8±0.7       32.2±0.3    10.3±0.3
w/o Anchor               30.2±1.3    6.9±1.4       32.5±0.4    8.1±0.3       31.0±0.4    10.8±0.2
w/o Distillation         32.3±1.5    8.7±1.1       31.9±0.6    10.9±1.6      31.4±1.1    12.2±2.4
Complete FedGTEA         37.1±0.7    4.5±0.5       35.9±0.6    6.6±1.7       35.1±1.2    8.6±1.4

Ablation Study Analysis

  • Distillation Loss: Removing it raises the forgetting rate on every sequence (from 8.6 to 12.2 on CIFAR-100 superclass), demonstrating its importance for retaining prior knowledge
  • Anchor Loss: Removing it lowers CIFAR-10 accuracy by 6.9 percentage points (from 37.1 to 30.2), indicating its necessity for stabilizing discriminative feature representation
  • CATE and Wasserstein Loss: Removing both lowers accuracy by 3.4 to 4.5 percentage points across the three sequences, supporting the effectiveness of the task encoder and task separation mechanism
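
The arithmetic behind these comparisons can be reproduced directly from the mean values in the ablation table. The sketch below uses only the reported means for Sequence 1 and Sequence 3 and ignores the standard deviations, so it says nothing about statistical significance; the dictionary layout is an illustrative choice.

```python
# (accuracy, forgetting) means copied from the ablation table.
full = {"seq1": (37.1, 4.5), "seq3": (35.1, 8.6)}
variants = {
    "w/o CATE & Wasserstein": {"seq1": (32.6, 7.1), "seq3": (31.7, 10.5)},
    "w/o Wasserstein": {"seq1": (34.1, 5.8), "seq3": (32.2, 10.3)},
    "w/o Anchor": {"seq1": (30.2, 6.9), "seq3": (31.0, 10.8)},
    "w/o Distillation": {"seq1": (32.3, 8.7), "seq3": (31.4, 12.2)},
}

# For each variant: (accuracy drop, forgetting increase) relative to the full model.
deltas = {
    name: {
        seq: (full[seq][0] - acc, forget - full[seq][1])
        for seq, (acc, forget) in rows.items()
    }
    for name, rows in variants.items()
}

for name, rows in deltas.items():
    for seq, (acc_drop, forget_rise) in rows.items():
        print(f"{name:24s} {seq}: accuracy -{acc_drop:.1f}, forgetting +{forget_rise:.1f}")
```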

Related Work

Class-Incremental Learning

CIL methods are primarily categorized into three types:

  1. Replay Methods: Such as iCaRL and GEM, maintaining sample buffers
  2. Regularization Methods: Constraining parameter updates through knowledge distillation
  3. Prompt Methods: Such as L2P and DualPrompt, learning context vector pools

Federated Learning

Main aggregation strategies include FedAvg and FedProx, addressing statistical heterogeneity through weighted averaging and regularization, respectively.

Federated Class-Incremental Learning

Existing FCIL methods are categorized as:

  1. Replay Methods: Using local sample buffers or generative replay
  2. Regularization and Distillation Methods: Transferring knowledge through knowledge distillation
  3. Prompt Methods: Storing prompt pools on clients to encode task context

Conclusions and Discussion

Main Conclusions

FedGTEA achieves effective modeling of task-level knowledge in FCIL by introducing a cardinality-agnostic task encoder and Wasserstein distance regularization, outperforming existing methods in both accuracy and forgetting performance.

Limitations

  1. Computational Complexity: The 2-Wasserstein distance requires matrix square roots whose cost is cubic in the embedding dimension (O(d³) for d-dimensional embeddings), which may become a bottleneck for high-dimensional embeddings
  2. Hyperparameter Sensitivity: The weights of three loss components require careful tuning
  3. Limited Evaluation Scope: Evaluation is restricted to image classification tasks; applicability to other domains remains unknown

Future Directions

  1. Explore more efficient Wasserstein distance computation methods
  2. Investigate adaptive hyperparameter adjustment strategies
  3. Extend to other modalities and task types

In-Depth Evaluation

Strengths

  1. Strong Novelty: First systematic modeling of task-level knowledge in FCIL, proposing cardinality-agnostic task encoder design
  2. Solid Theoretical Foundation: Using properties of 2-Wasserstein distance provides rigorous theoretical support for task separation
  3. Comprehensive Experiments: Full evaluation across multiple datasets and settings, with ablation studies validating the effectiveness of each component
  4. Privacy Protection: Protects task-level privacy by avoiding direct embedding transmission

Weaknesses

  1. Computational Overhead: Wasserstein distance computation and matrix operations may introduce additional computational costs
  2. Parameter Tuning: Balancing multiple hyperparameters requires substantial tuning effort
  3. Insufficient Generalization Verification: Validation limited to CIFAR datasets; lacks experiments on larger and more diverse datasets

Impact

This work introduces a new perspective of task-level modeling to the FCIL field, potentially inspiring more research focusing on task context. The cardinality-agnostic design and privacy protection features make it promising for practical applications.

Applicable Scenarios

  • Federated systems requiring long-term learning of new classes
  • Distributed learning scenarios with high privacy requirements
  • Environments with significant variations in client data distributions

References

The paper cites important works in FCIL, CIL, and FL domains, including classical methods such as FedAvg, iCaRL, and AC-GAN, as well as recent FCIL research including FedCIL and GLFC, providing a solid theoretical foundation for this research.