FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment

Li, Bidkhori

We introduce a novel framework for Federated Class Incremental Learning, called Federated Gaussian Task Embedding and Alignment (FedGTEA). FedGTEA is designed to capture task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. At the client side, the Cardinality-Agnostic Task Encoder (CATE) produces Gaussian-distributed task embeddings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. Importantly, CATE maintains a fixed parameter size regardless of the number of tasks, which ensures scalability across long task sequences. On the server side, FedGTEA utilizes the 2-Wasserstein distance to measure inter-task gaps between Gaussian embeddings. We formulate the Wasserstein loss to enforce inter-task separation. This probabilistic formulation not only enhances representation learning but also preserves task-level privacy by avoiding the direct transmission of latent embeddings, aligning with the privacy constraints in federated learning. Extensive empirical evaluations on popular datasets demonstrate that FedGTEA achieves superior classification performance and significantly mitigates forgetting, consistently outperforming strong existing baselines.

Basic Information

Paper ID: 2510.12927
Title: FedGTEA: Federated Class-Incremental Learning with Gaussian Task Embedding and Alignment
Authors: Haolin Li, Hoda Bidkhori (George Mason University)
Classification: cs.LG stat.ML
Conference: AISTATS 2026, Tangier, Morocco
Paper Link: https://arxiv.org/abs/2510.12927

Abstract

This paper proposes FedGTEA (Federated Gaussian Task Embedding and Alignment), a federated class-incremental learning framework that captures task-specific knowledge and model uncertainty in a scalable and communication-efficient manner. On the client side, a Cardinality-Agnostic Task Encoder (CATE) generates Gaussian-distributed task embeddings that encode task knowledge, address statistical heterogeneity, and quantify data uncertainty. CATE keeps a fixed parameter count regardless of the number of tasks, which ensures scalability for long task sequences. On the server side, FedGTEA uses the 2-Wasserstein distance to measure gaps between the Gaussian task embeddings and enforces task separation through a Wasserstein loss. This probabilistic formulation not only enhances representation learning but also protects task-level privacy by avoiding direct transmission of latent embeddings.

Research Background and Motivation

Problem Definition

Federated Class-Incremental Learning (FCIL) is a hybrid of federated learning (FL) and class-incremental learning (CIL), requiring simultaneous solutions to three core challenges:

Catastrophic Forgetting: Occurs both during local client updates and global aggregation
Statistical Heterogeneity: Data distributions across clients are typically non-IID (not independent and identically distributed)
Task Context Ambiguity: Lack of task identity at test time leads to semantic drift and performance degradation

Research Motivation

Existing FCIL methods primarily exploit data-level features while neglecting task-level context. As shown in Figure 1 of the paper, the same input can require different answers under different tasks (e.g., "What is this object?" vs. "What is the background color?"), so the model needs task-level contextual information to decide which answer is wanted. Effective use of task context in FCIL therefore remains relatively underexplored.

Limitations of Existing Methods

Most methods focus on memory-based data-level feature utilization
Prompt learning methods, while incorporating task knowledge, suffer from increased memory usage and computational overhead
Lack of parameter-efficient task encoder design

Core Contributions

Proposes FedGTEA Algorithm: Effectively captures task-level knowledge in FCIL in a scalable and robust manner, introducing a Cardinality-Agnostic Task Encoder (CATE) on the client side to generate task embeddings modeled as Gaussian random variables, and leveraging 2-Wasserstein distance on the server side to promote task separation.
Designs CATE Module: Capable of inferring task embeddings from data batches of arbitrary size with cardinality-agnostic properties. By modeling embeddings as Gaussian random variables, the server can quantify inter-task distances using the 2-Wasserstein metric.
Server-side Optimization Framework: First performs initial model aggregation using FedAvg principles, then formulates an optimization problem containing three loss components: knowledge distillation loss, Wasserstein loss, and anchor loss.
Experimental Validation: Achieves superior accuracy and forgetting performance compared to strong baselines (AC-GAN + FedAvg/FedProx, GLFC, FedCIL, FLwF-2T) on multiple benchmark datasets.

Methodology Details

Task Definition

The FCIL system consists of N clients and a central server, processing a global task sequence T = {T¹, T², ..., Tᵀ}. Each client Cₖ collects a local dataset Dᵗₖ ⊂ Tᵗ during task Tᵗ. The objective is to find global parameters θᵗₘ that minimize the loss across all observed tasks and all clients.

Model Architecture

Client-side Model

The client model contains two core components:

1. Cardinality-Agnostic Task Encoder (CATE)

Designed as a fully connected neural network that, given a batch B = (x₁, x₂, ..., x_b) of arbitrary size b, outputs a d-dimensional task embedding:
```
E_B = (1/b) ∑ᵢ₌₁ᵇ CATE(xᵢ) ∈ ℝᵈ
```
The parameter count does not grow with the number of tasks, ensuring scalability for long task sequences.
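
The averaging step is what makes the encoder cardinality-agnostic: each sample is encoded on its own and the results are averaged, so the output dimension does not depend on the batch size. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The two-layer network, the `tanh` activation, the layer sizes, and the names `init_cate` and `cate_embed` are all hypothetical.

```python
import numpy as np

def init_cate(in_dim, hidden_dim, embed_dim, seed=0):
    """Create weights for a small fully connected encoder (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, embed_dim)),
        "b2": np.zeros(embed_dim),
    }

def cate_embed(params, batch):
    """Encode every sample, then average: E_B = (1/b) * sum_i CATE(x_i)."""
    hidden = np.tanh(batch @ params["W1"] + params["b1"])  # shape (b, hidden_dim)
    per_sample = hidden @ params["W2"] + params["b2"]      # shape (b, d)
    return per_sample.mean(axis=0)                         # shape (d,)

params = init_cate(in_dim=32, hidden_dim=16, embed_dim=8)
rng = np.random.default_rng(1)
small_batch = rng.normal(size=(4, 32))    # b = 4
large_batch = rng.normal(size=(64, 32))   # b = 64

# Both calls return a vector of length d = 8, whatever the batch size.
print(cate_embed(params, small_batch).shape)
print(cate_embed(params, large_batch).shape)
```

Because the pooling is a mean, the embedding is also invariant to the order of samples within the batch, and the parameter dictionary stays the same size however many tasks are processed.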

2. AC-GAN Module

Discriminator contains Real/Fake head and classification head
Classification head fuses data features F and task embeddings E for prediction
Generator G synthesizes images for replay

Gaussian Task Embedding

Task embeddings are modeled as Gaussian random variables:

Global: Eᵗ ~ N(μᵗ, Σᵗ)
Client-specific: Eᵗₖ ~ N(μᵗₖ, Σᵗₖ)
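
This summary does not state how the Gaussian parameters are obtained. One simple possibility, shown here purely as an assumption, is to take the sample mean and sample covariance of several batch-level embeddings for a task. The helper name `fit_gaussian`, the ridge term `eps`, and the synthetic data are illustrative.

```python
import numpy as np

def fit_gaussian(embeddings, eps=1e-6):
    """Return (mu, Sigma) estimated from an (n, d) array of batch-level embeddings."""
    embeddings = np.asarray(embeddings, dtype=float)
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu
    sigma = centered.T @ centered / max(len(embeddings) - 1, 1)
    # A small ridge keeps Sigma positive definite when n is small (an assumption).
    return mu, sigma + eps * np.eye(embeddings.shape[1])

rng = np.random.default_rng(0)
task_embeddings = rng.normal(loc=2.0, scale=0.5, size=(200, 4))  # synthetic stand-in
mu, sigma = fit_gaussian(task_embeddings)
print(mu.shape, sigma.shape)
```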

Server-side Aggregation and Regularization

Initial Model Aggregation

Follows FedAvg principles:

θ̂ᵗₘ = ∑ₖ₌₁ᴺ wₖθᵗₖ

where weights wₖ are proportional to the number of local data points |Dᵗₖ|.
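
The aggregation rule can be sketched in a few lines. This is a generic FedAvg-style weighted average, assuming each client reports its parameters as a dictionary of NumPy arrays; the function name `fedavg` and the toy values are illustrative.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average client parameter dicts with weights w_k = |D_k| / sum_j |D_j|."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]
    return {
        name: sum(w * params[name] for w, params in zip(weights, client_params))
        for name in client_params[0]
    }

# Two toy clients holding 100 and 300 local samples, so w = (0.25, 0.75).
clients = [
    {"w": np.array([1.0, 1.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 5.0]), "b": np.array([4.0])},
]
global_params = fedavg(clients, client_sizes=[100, 300])
print(global_params)
```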

Model Regularization and Integration

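The server loss combines three components, weighted by α, β, and γ:

L_server = α·L_KD + β·L_Wasserstein + γ·L_anchor

1. Knowledge Distillation Loss:

L_KD = ∑_{(x,y)∈Aᵀ} KL(θᵀ⁻¹ₘ(x) ∥ θ(x))

2. Wasserstein Loss: Uses the 2-Wasserstein distance to measure inter-task distances. For two Gaussian distributions m₁ = N(μ₁, Σ₁) and m₂ = N(μ₂, Σ₂), the squared distance has the closed form:

W₂²(m₁, m₂) = ∥μ₁ - μ₂∥₂² + tr(Σ₁ + Σ₂ - 2(Σ₁^(1/2) Σ₂ Σ₁^(1/2))^(1/2))

The Wasserstein loss is the inverse of the summed pairwise distances, where Nᵢ denotes the Gaussian embedding of task i, so minimizing it pushes the task embeddings apart:

L_Wasserstein = [∑_{1≤i<j≤T} W₂²(Nᵢ, Nⱼ)]⁻¹

3. Anchor Loss: Keeps the optimized parameters θ close to the FedAvg aggregate θ̂ᵗₘ:

L_anchor = ∥θ - θ̂ᵗₘ∥₂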
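
The three terms can be sketched numerically as follows. This is a hedged illustration, not the paper's code. It assumes model outputs are rows of class probabilities, parameters are flat vectors, and each task Gaussian is given as a `(mu, Sigma)` pair. All function names are hypothetical, and the default weights are the α, β, γ values listed under Implementation Details.

```python
import numpy as np

def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def w2_squared(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    root1 = sqrtm_psd(sigma1)
    cross = sqrtm_psd(root1 @ sigma2 @ root1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross))

def wasserstein_loss(gaussians):
    """Inverse of the summed squared distances over all task pairs i < j."""
    total = sum(
        w2_squared(*gaussians[i], *gaussians[j])
        for i in range(len(gaussians))
        for j in range(i + 1, len(gaussians))
    )
    return 1.0 / total

def kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Sum over samples of KL(teacher || student) on class-probability rows."""
    t = np.clip(teacher_probs, eps, 1.0)
    s = np.clip(student_probs, eps, 1.0)
    return float(np.sum(t * (np.log(t) - np.log(s))))

def anchor_loss(theta, theta_hat):
    """Euclidean distance between current and aggregated parameters."""
    return float(np.linalg.norm(theta - theta_hat))

def server_loss(l_kd, l_w, l_anchor, alpha=0.3, beta=0.3, gamma=0.4):
    """Weighted sum of the three components."""
    return alpha * l_kd + beta * l_w + gamma * l_anchor

# Toy check with two isotropic Gaussians in 2 dimensions:
# ||mu1 - mu2||^2 = 25 and the trace term is 2 * (1 - 2)^2 = 2.
g1 = (np.zeros(2), np.eye(2))
g2 = (np.array([3.0, 4.0]), 4.0 * np.eye(2))
print(w2_squared(*g1, *g2))
print(wasserstein_loss([g1, g2]))
```

Since the Wasserstein term is an inverse, it shrinks as the task Gaussians move apart, which is how minimizing the server loss encourages task separation.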

Technical Innovations

Cardinality-Agnostic Design: CATE can handle input batches of arbitrary size, providing better robustness and adaptability
Gaussian Modeling: Modeling task embeddings as Gaussian random variables enables the use of Wasserstein distance for inter-task distance measurement
Privacy Protection: Protects task-level privacy by avoiding direct transmission of latent embeddings
Multi-level Regularization: Comprehensive loss function combining knowledge distillation, task separation, and model stability

Experimental Setup

Datasets

Three standard FCIL datasets are used:

CIFAR-10: 10 classes, 60,000 instances
CIFAR-100 iCaRL Split: Randomly split according to iCaRL principles
CIFAR-100 Superclass Split: 20 semantically related superclasses, each containing 5 classes

Task Sequence Configuration

Sequence 1 (CIFAR-10): 5 clients, 5 tasks, 2 classes per task
Sequence 2 (CIFAR-100): 10 clients, 10 tasks, 10 classes per task
Sequence 3 (CIFAR-100 Superclass): 10 clients, 20 tasks, 5 semantically related classes per task
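
The task counts above follow from dividing the class set evenly. The sketch below partitions consecutive class ids, which is a simplification: the iCaRL split shuffles class order, and the superclass split groups semantically related classes rather than consecutive ids. The helper name `split_classes` is illustrative.

```python
def split_classes(num_classes, classes_per_task):
    """Partition class ids 0..num_classes-1 into consecutive, disjoint tasks."""
    if num_classes % classes_per_task != 0:
        raise ValueError("num_classes must be divisible by classes_per_task")
    return [
        list(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    ]

sequence_1 = split_classes(10, 2)    # CIFAR-10: 5 tasks of 2 classes
sequence_2 = split_classes(100, 10)  # CIFAR-100: 10 tasks of 10 classes
sequence_3 = split_classes(100, 5)   # CIFAR-100 superclass: 20 tasks of 5 classes
print(len(sequence_1), len(sequence_2), len(sequence_3))
```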

Evaluation Metrics

Average Accuracy: Mean final test accuracy over all observed tasks
Average Forgetting: Gap between peak accuracy and final accuracy for each task, averaged over tasks
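
Both metrics can be computed from a table of per-task accuracies recorded after each training stage. The sketch below assumes a common convention in which the peak is taken over stages before the final one and the last task is excluded from forgetting. The summary does not spell out these details, and the accuracy values are made up.

```python
def average_accuracy(acc):
    """acc[i][j] is the accuracy on task j after training on task i (j <= i)."""
    final = acc[-1]
    return sum(final) / len(final)

def average_forgetting(acc):
    """Mean over earlier tasks of (peak accuracy before the end - final accuracy)."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    gaps = []
    for j in range(num_tasks - 1):
        peak = max(acc[i][j] for i in range(j, num_tasks - 1))
        gaps.append(peak - acc[-1][j])
    return sum(gaps) / len(gaps)

# Made-up accuracies (%) for three tasks; row i is measured after training task i.
acc_history = [
    [90.0],
    [80.0, 88.0],
    [70.0, 75.0, 86.0],
]
print(average_accuracy(acc_history))    # (70 + 75 + 86) / 3 = 77.0
print(average_forgetting(acc_history))  # ((90 - 70) + (88 - 75)) / 2 = 16.5
```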

Comparison Methods

FL Baselines: FedAvg, FedProx
CIL Methods: iCaRL, DER
FCIL Methods: FLwF-2T, FedCIL, GLFC
Enhanced Baselines: AC-GAN + FedAvg/FedProx

Implementation Details

Optimizer: Adam
Batch size: 64
CIFAR-10: Learning rate 1×10⁻⁴, 60 global communication rounds, 100 local iterations per round
CIFAR-100: Learning rate 1×10⁻³, 40 global communication rounds, 400 local iterations per round
Hyperparameters: α=0.3, β=0.3, γ=0.4

Experimental Results

Main Results

Model	Seq. 1 (CIFAR-10) Accuracy↑ (%)	Seq. 1 (CIFAR-10) Forgetting↓ (%)	Seq. 2 (CIFAR-100) Accuracy↑ (%)	Seq. 2 (CIFAR-100) Forgetting↓ (%)	Seq. 3 (CIFAR-100 Superclass) Accuracy↑ (%)	Seq. 3 (CIFAR-100 Superclass) Forgetting↓ (%)
FedAvg	26.2±2.6	8.5±1.7	23.4±2.9	9.2±1.9	23.7±2.5	13.2±1.6
FedProx	26.1±1.8	8.6±1.3	24.1±1.9	8.4±2.0	23.1±1.9	14.5±2.3
GLFC	35.7±1.1	6.3±0.9	33.1±0.6	10.7±1.8	33.6±1.7	11.2±2.2
FedCIL	32.4±1.9	6.9±1.9	31.5±0.4	7.4±1.2	31.2±1.6	10.8±2.0
FedGTEA	37.1±0.7	4.5±0.5	35.9±0.6	6.6±1.7	35.1±1.2	8.6±1.4

Key Findings

Sequence 1: FedGTEA achieves the highest accuracy (37.1±0.7) and the only forgetting rate below 5% (4.5±0.5)
Sequence 2: FedGTEA obtains the best accuracy (35.9±0.6) and the lowest forgetting rate (6.6±1.7)
Sequence 3: FedGTEA performs best in both accuracy (35.1±1.2) and forgetting rate (8.6±1.4)

Ablation Study

Model Variant	Seq. 1 (CIFAR-10) Accuracy↑ (%)	Seq. 1 (CIFAR-10) Forgetting↓ (%)	Seq. 2 (CIFAR-100) Accuracy↑ (%)	Seq. 2 (CIFAR-100) Forgetting↓ (%)	Seq. 3 (CIFAR-100 Superclass) Accuracy↑ (%)	Seq. 3 (CIFAR-100 Superclass) Forgetting↓ (%)
w/o CATE & Wasserstein	32.6±0.5	7.1±0.7	32.2±0.5	8.1±1.1	31.7±0.7	10.5±0.9
w/o Wasserstein	34.1±0.7	5.8±0.4	33.3±0.4	8.8±0.7	32.2±0.3	10.3±0.3
w/o Anchor	30.2±1.3	6.9±1.4	32.5±0.4	8.1±0.3	31.0±0.4	10.8±0.2
w/o Distillation	32.3±1.5	8.7±1.1	31.9±0.6	10.9±1.6	31.4±1.1	12.2±2.4
Complete FedGTEA	37.1±0.7	4.5±0.5	35.9±0.6	6.6±1.7	35.1±1.2	8.6±1.4

Ablation Study Analysis

Distillation Loss: Removing it significantly increases forgetting rate (from 8.6 to 12.2 on CIFAR-100 superclass), demonstrating its importance for retaining prior knowledge
Anchor Loss: Removing it substantially decreases accuracy (a drop of nearly 7 percentage points on CIFAR-10, from 37.1 to 30.2), indicating its necessity for stabilizing discriminative feature representation
CATE and Wasserstein Loss: Removing both lowers accuracy by 3.4 to 4.5 percentage points across the three sequences (e.g., from 37.1 to 32.6 on CIFAR-10), validating the effectiveness of the task encoder and task separation mechanism
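
The differences discussed in this analysis follow directly from the ablation table. A short script makes the arithmetic explicit. The mean values are copied from the table (standard deviations omitted); the dictionary layout and the helper name `deltas` are illustrative.

```python
# (accuracy, forgetting) means per sequence, copied from the ablation table.
FULL = {"seq1": (37.1, 4.5), "seq2": (35.9, 6.6), "seq3": (35.1, 8.6)}
ABLATIONS = {
    "w/o CATE & Wasserstein": {"seq1": (32.6, 7.1), "seq2": (32.2, 8.1), "seq3": (31.7, 10.5)},
    "w/o Wasserstein": {"seq1": (34.1, 5.8), "seq2": (33.3, 8.8), "seq3": (32.2, 10.3)},
    "w/o Anchor": {"seq1": (30.2, 6.9), "seq2": (32.5, 8.1), "seq3": (31.0, 10.8)},
    "w/o Distillation": {"seq1": (32.3, 8.7), "seq2": (31.9, 10.9), "seq3": (31.4, 12.2)},
}

def deltas(variant):
    """Return {sequence: (accuracy drop, forgetting increase)} in percentage points."""
    return {
        seq: (round(FULL[seq][0] - acc, 1), round(forg - FULL[seq][1], 1))
        for seq, (acc, forg) in ABLATIONS[variant].items()
    }

for name in ABLATIONS:
    print(name, deltas(name))
```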

Related Work

Class-Incremental Learning

CIL methods are primarily categorized into three types:

Replay Methods: Such as iCaRL and GEM, maintaining sample buffers
Regularization Methods: Constraining parameter updates through knowledge distillation
Prompt Methods: Such as L2P and DualPrompt, learning context vector pools

Federated Learning

The main aggregation strategies are FedAvg, which averages client models weighted by local data size, and FedProx, which adds a proximal regularization term to each client's objective to limit drift under statistical heterogeneity.

Federated Class-Incremental Learning

Existing FCIL methods are categorized as:

Replay Methods: Using local sample buffers or generative replay
Regularization and Distillation Methods: Transferring knowledge through knowledge distillation
Prompt Methods: Storing prompt pools on clients to encode task context

Conclusions and Discussion

Main Conclusions

FedGTEA achieves effective modeling of task-level knowledge in FCIL by introducing a cardinality-agnostic task encoder and Wasserstein distance regularization, outperforming existing methods in both accuracy and forgetting performance.

Limitations

Computational Complexity: Computing the 2-Wasserstein distance between full-covariance Gaussians requires matrix square roots, which cost O(d³) in the embedding dimension d and may become a bottleneck for high-dimensional embeddings
Hyperparameter Sensitivity: The weights of three loss components require careful tuning
Limited Evaluation Scope: Evaluation is restricted to image classification tasks; applicability to other domains remains unknown
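
One well-known way around the cubic cost, offered here as an illustration and not as something the paper is reported to do, is to restrict the covariances to be diagonal. The matrix square roots then become elementwise square roots, and the squared distance reduces to ∥μ₁ - μ₂∥₂² + ∥σ₁ - σ₂∥₂², where σ is the vector of standard deviations. The cost is O(d). The function name `w2_squared_diag` is hypothetical.

```python
import numpy as np

def w2_squared_diag(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance for Gaussians with diagonal covariances.

    var1 and var2 hold the diagonal entries (variances) of each covariance.
    """
    mu1, var1, mu2, var2 = (np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    std_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + std_term)

# Means differ by (3, 4); standard deviations are 1 vs. 2 in both dimensions.
print(w2_squared_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [4.0, 4.0]))  # 25 + 2 = 27.0
```

The trade-off is that a diagonal model ignores correlations between embedding dimensions, so it captures less of the uncertainty structure than a full covariance matrix.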

Future Directions

Explore more efficient Wasserstein distance computation methods
Investigate adaptive hyperparameter adjustment strategies
Extend to other modalities and task types

In-Depth Evaluation

Strengths

Strong Novelty: First systematic modeling of task-level knowledge in FCIL, proposing cardinality-agnostic task encoder design
Solid Theoretical Foundation: Using properties of 2-Wasserstein distance provides rigorous theoretical support for task separation
Comprehensive Experiments: Full evaluation across multiple datasets and settings, with ablation studies validating the effectiveness of each component
Privacy Protection: Protects task-level privacy by avoiding direct embedding transmission

Weaknesses

Computational Overhead: Wasserstein distance computation and matrix operations may introduce additional computational costs
Parameter Tuning: Balancing multiple hyperparameters requires substantial tuning effort
Insufficient Generalization Verification: Validation limited to CIFAR datasets; lacks experiments on larger and more diverse datasets

Impact

This work introduces a new perspective of task-level modeling to the FCIL field, potentially inspiring more research focusing on task context. The cardinality-agnostic design and privacy protection features make it promising for practical applications.

Applicable Scenarios

Federated systems requiring long-term learning of new classes
Distributed learning scenarios with high privacy requirements
Environments with significant variations in client data distributions

References

The paper cites important works in FCIL, CIL, and FL domains, including classical methods such as FedAvg, iCaRL, and AC-GAN, as well as recent FCIL research including FedCIL and GLFC, providing a solid theoretical foundation for this research.