Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation

Zhou, Zhu, Cui et al.

Continual Test-Time Adaptation (CTTA) aims to quickly fine-tune the model during the test phase so that it can adapt to multiple unknown downstream domain distributions without pre-acquiring downstream domain data. To this end, existing advanced CTTA methods mainly reduce the catastrophic forgetting of historical knowledge caused by irregular switching of downstream domain data by restoring the initial model or reusing historical models. However, these methods are usually accompanied by serious insufficient learning of new knowledge and interference from potentially harmful historical knowledge, resulting in severe performance degradation. To this end, we propose a class-aware domain Knowledge Fusion and Fission method for continual test-time adaptation, called KFF, which adaptively expands and merges class-aware domain knowledge in old and new domains according to the test-time data from different domains, where discriminative historical knowledge can be dynamically accumulated. Specifically, considering the huge domain gap within streaming data, a domain Knowledge FIssion (KFI) module is designed to adaptively separate new domain knowledge from a paired class-aware domain prompt pool, alleviating the impact of negative knowledge brought by old domains that are distinct from the current domain. Besides, to avoid the cumulative computation and storage overheads from continuously fissioning new knowledge, a domain Knowledge FUsion (KFU) module is further designed to merge the fissioned new knowledge into the existing knowledge pool with minimal cost, where a greedy knowledge dynamic merging strategy is designed to improve the compatibility of new and old knowledge while keeping the computational efficiency. Extensive experiments on the ImageNet-C dataset verify the effectiveness of our proposed method against other methods.

Basic Information

Paper ID: 2510.12150
Title: Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation
Authors: Jiahuan Zhou, Chao Zhu, Zhenyu Cui, Zichen Liu, Xu Zou, Gang Hua
Category: cs.CV (Computer Vision)
Conference: NeurIPS 2025 (39th Conference on Neural Information Processing Systems)
Paper Link: https://arxiv.org/abs/2510.12150

Abstract

This paper proposes KFF, a class-aware domain knowledge fusion and fission method for the Continual Test-Time Adaptation (CTTA) problem. A Knowledge Fission (KFI) module adaptively separates new domain knowledge to avoid negative interference from historical domains, and a Knowledge Fusion (KFU) module merges the fissioned knowledge back into the existing knowledge pool at minimal cost. On ImageNet-C, KFF reaches a 34.8% error rate, 5.1 percentage points lower than the state-of-the-art method DPCore (39.9%).

Research Background and Motivation

Problem Definition

Continual Test-Time Adaptation (CTTA) aims to enable pre-trained models to rapidly adapt to multiple unknown downstream domain distributions during the test phase without prior access to downstream domain data. This presents a more challenging problem compared to traditional Test-Time Adaptation (TTA).

Core Challenges

Catastrophic Forgetting: Irregular domain data switching leads to catastrophic forgetting of historical knowledge
Insufficient New Knowledge Learning: Existing methods often fail to sufficiently learn new knowledge while preserving historical knowledge
Harmful Historical Knowledge Interference: Knowledge conflicts between different domains disrupt gradient optimization directions

Limitations of Existing Methods

Regularization Methods: Preserve historical knowledge through regularization but suppress new knowledge learning
Parameter Reset Methods: Avoid forgetting by restoring initial models but lose useful historical knowledge
Model Fusion Methods: Select and fuse historical model parameters but suffer from domain conflicts and unbounded storage overhead

Core Contributions

Proposes KFF Framework: The first class-aware domain knowledge fusion and fission framework capable of dynamically accumulating discriminative historical knowledge
Designs KFI Module: A knowledge fission module that adaptively separates new domain knowledge, reducing negative knowledge interference across domains
Develops KFU Module: A knowledge fusion module that merges knowledge through greedy strategies, balancing effectiveness and efficiency
Achieves State-of-the-Art Performance: Achieves a 34.8% error rate on ImageNet-C, 5.1 percentage points lower than DPCore
Provides Theoretical Analysis: Theoretical guarantees based on well-separated clustering assumptions

Method Details

Task Definition

Given source domain training data $D_S = \{Y_S, X_S\}$ and test data streams from different domain distributions $D_T = \{X_T\}_{T=1}^N$, the model $f_θ$ must process test batches $B_T^j = \{x_t\}_{t=0}^b$ online, with the objective of adapting to target domains while maintaining performance on historical domains.

Model Architecture

Overall Framework

The KFF framework contains two core modules:

Knowledge Fission (KFI) Module: Dynamically fissions class-aware domain knowledge
Knowledge Fusion (KFU) Module: Merges fissioned knowledge into the existing knowledge pool

Knowledge Fission Module (KFI)

Class-level Knowledge Fission:

Uses cosine similarity $s_{t,i} = \text{sim}(\tilde{y}_t, y_i)$ to evaluate the matching degree between pseudo-labels and prompt keys
Selects candidate prompts with $s_{t,i} > γ_c$ and uses them in a weighted manner:

$$P_t = \sum_{i=0}^{N_c} w_i P_i^c, \quad w_i = \frac{\exp(s_{t,i}/\tau_c)}{\sum_{k} \exp(s_{t,k}/\tau_c)}$$

If no candidate prompts exist, fissions new prompts for test samples
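The class-level selection rule above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the flat-list prompt representation, and the choice to normalize the softmax over the candidate prompts only are all assumptions, since the summary does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def select_class_prompt(pseudo_label, keys, prompts, gamma_c, tau_c):
    """Combine class prompts whose key similarity exceeds gamma_c.

    Returns (combined_prompt, weights). When no key passes the threshold,
    returns (None, {}) to signal that a new prompt should be fissioned.
    """
    sims = {i: cosine(pseudo_label, key) for i, key in enumerate(keys)}
    candidates = {i: s for i, s in sims.items() if s > gamma_c}
    if not candidates:
        return None, {}
    # Softmax over candidates (assumption); subtract the max for stability.
    s_max = max(candidates.values())
    exps = {i: math.exp((s - s_max) / tau_c) for i, s in candidates.items()}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    dim = len(prompts[0])
    combined = [sum(w * prompts[i][d] for i, w in weights.items()) for d in range(dim)]
    return combined, weights
```

A caller would treat the `None` return as the fission branch and create a fresh prompt for that sample.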

Domain-level Knowledge Fission:

Uses test batch statistical features $Γ_T^j = \{μ, σ\}$ as input keys
Selects candidate prompts based on Euclidean distance: $d_i = \|Γ_T^j - Γ_i\|_2 < γ_d$
Merges through distance-weighted combination:

$$P^d = \sum_{i=0}^{N_d} w_i P_i^d, \quad w_i = \frac{\exp(-d_i/\tau_d)}{\sum_{k} \exp(-d_k/\tau_d)}$$
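A minimal sketch of the domain-level rule follows, again in standard-library Python. It assumes the batch key is the concatenation of per-dimension feature means and standard deviations, that the softmax runs over candidate prompts only, and that an empty candidate set triggers fission; the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

def batch_statistics(features):
    """Concatenate per-dimension mean and standard deviation of a batch."""
    columns = list(zip(*features))
    return [fmean(c) for c in columns] + [pstdev(c) for c in columns]

def select_domain_prompt(batch_stats, keys, prompts, gamma_d, tau_d):
    """Combine domain prompts whose key lies within gamma_d of the batch key.

    Returns (combined_prompt, weights), or (None, {}) when no key is close
    enough and a new domain prompt should be fissioned (assumption).
    """
    dists = {i: math.dist(batch_stats, key) for i, key in enumerate(keys)}
    candidates = {i: d for i, d in dists.items() if d < gamma_d}
    if not candidates:
        return None, {}
    # Softmax over negative distances; shift by the minimum for stability.
    d_min = min(candidates.values())
    exps = {i: math.exp(-(d - d_min) / tau_d) for i, d in candidates.items()}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    dim = len(prompts[0])
    combined = [sum(w * prompts[i][k] for i, w in weights.items()) for k in range(dim)]
    return combined, weights
```

Closer keys receive larger weights, so a batch that sits on top of a stored key reuses that prompt almost unchanged.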

Knowledge Fusion Module (KFU)

Class-level Knowledge Fusion:

Uses entropy threshold $γ_h$ to control prompt pool updates
Directly adds newly fissioned prompts to the pool
For combined prompts, updates original prompts by weight:

$$P_{c_i}^* = \frac{1}{b} \sum_{t=0}^{b} \left[ w_{ti} P_t^* + (1 - w_{ti}) P_i^c \right]$$

Uses Minimum Spanning Tree (MST) algorithm to cluster and fuse prompts to control pool size
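The class-level update and the pool-size control can be illustrated as follows. The update function follows the formula above, averaging over the samples that used the prompt. The summary does not say how the MST is turned into clusters, so the second function is an assumption: it runs Kruskal's algorithm until the desired number of components remains (equivalent to cutting the longest MST edges) and replaces each cluster with its mean. All names are hypothetical.

```python
import math

def update_class_prompt(old_prompt, tuned_prompts, weights):
    """Average of w_t * P_t* + (1 - w_t) * P_i over the samples t."""
    b = len(tuned_prompts)
    return [
        sum(w * p[d] + (1.0 - w) * old_prompt[d]
            for p, w in zip(tuned_prompts, weights)) / b
        for d in range(len(old_prompt))
    ]

def mst_merge(prompts, max_size):
    """Shrink a prompt pool to max_size (>= 1) clusters via Kruskal's algorithm."""
    n = len(prompts)
    if n <= max_size:
        return [list(p) for p in prompts]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(prompts[i], prompts[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # accept the shortest edge joining two clusters
            components -= 1
            if components == max_size:
                break
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(prompts[i])
    # Replace each cluster with the mean of its members (assumption).
    return [[sum(col) / len(members) for col in zip(*members)]
            for members in clusters.values()]
```

Stopping Kruskal early keeps the sketch short; a weighted mean or a different linkage could be substituted without changing the overall structure.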

Domain-level Knowledge Fusion:

Directly adds new prompts to the domain prompt pool
Updates combined prompts by weight: $P_{d_i}^* = w_i P_d^* + (1-w_i) P_i^d$
Fuses nearest neighbor prompt pairs when pool is full
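The domain-level fusion steps can be sketched in the same style. The interpolation follows the update formula above; the capacity handling is an assumption, since the summary does not state whether "nearest" is measured between keys or prompts, nor how a pair is fused. Here the closest pair of keys is found by Euclidean distance and both the keys and the prompts are averaged. Function names are hypothetical.

```python
import math

def update_domain_prompt(old_prompt, tuned_prompt, w):
    """Interpolate: w * P_d* + (1 - w) * P_i."""
    return [w * t + (1.0 - w) * o for t, o in zip(tuned_prompt, old_prompt)]

def add_with_capacity(keys, prompts, new_key, new_prompt, max_size):
    """Append a new (key, prompt) pair, fusing the closest pair if over capacity."""
    keys = [list(k) for k in keys] + [list(new_key)]
    prompts = [list(p) for p in prompts] + [list(new_prompt)]
    if len(keys) <= max_size:
        return keys, prompts
    # Locate the two entries whose keys are nearest to each other.
    _, i, j = min(
        (math.dist(keys[a], keys[b]), a, b)
        for a in range(len(keys)) for b in range(a + 1, len(keys))
    )
    merged_key = [(x + y) / 2.0 for x, y in zip(keys[i], keys[j])]
    merged_prompt = [(x + y) / 2.0 for x, y in zip(prompts[i], prompts[j])]
    kept = [idx for idx in range(len(keys)) if idx not in (i, j)]
    keys = [keys[idx] for idx in kept] + [merged_key]
    prompts = [prompts[idx] for idx in kept] + [merged_prompt]
    return keys, prompts
```

Because each insertion triggers at most one merge, the pool never exceeds `max_size` entries after the call returns.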

Loss Function Design

Employs a two-level loss function:

$$L = L_d + a \cdot L_c$$

Where:

Domain alignment loss: $L_d = \|μ_s - μ_T^j(P)\|_2 + α\|σ_s - σ_T^j(P)\|_2$
Instance-level entropy loss: $L_c = (1/b) Σ_{t=0}^b H(\hat{y}_t)$
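The two-level loss can be written out directly from the definitions above. The sketch below uses plain Python lists and is not differentiable; a real implementation would use an autograd framework. The function names and the argument layout are assumptions, and the entropy uses the natural logarithm.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def domain_alignment_loss(mu_s, sigma_s, mu_t, sigma_t, alpha):
    """L_d = ||mu_s - mu_t||_2 + alpha * ||sigma_s - sigma_t||_2."""
    return math.dist(mu_s, mu_t) + alpha * math.dist(sigma_s, sigma_t)

def kff_loss(mu_s, sigma_s, mu_t, sigma_t, batch_probs, alpha, a):
    """L = L_d + a * L_c, with L_c the mean entropy over the batch."""
    l_d = domain_alignment_loss(mu_s, sigma_s, mu_t, sigma_t, alpha)
    l_c = sum(entropy(p) for p in batch_probs) / len(batch_probs)
    return l_d + a * l_c
```

Here `mu_t` and `sigma_t` stand for the prompted batch statistics $μ_T^j(P)$ and $σ_T^j(P)$, and `mu_s`, `sigma_s` for the stored source statistics.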

Experimental Setup

Datasets

ImageNet-to-ImageNet-C: 15 corruption types at maximum severity level 5
CIFAR100-to-CIFAR100-C: Same setup
CIFAR10-to-CIFAR10-C: Same setup

Evaluation Metrics

Classification error rate (%) as primary metric
Number of learnable parameters, memory usage, and computation time as efficiency metrics

Comparison Methods

TTA Methods: TENT, SAR, POEM
CTTA Methods: CoTTA, VDP, RoTTA, C-MAE, ROID, ViDA, CoLA, PALM, DPCore

Implementation Details

Backbone Network: ViT-B/16
Optimizer: AdamW with domain prompt learning rate 0.1 and class prompt learning rate 0.001
Batch Size: 64
Domain prompt length: 8, class prompt length: 1
Key Hyperparameters: $γ_d=25, γ_c=0.005, γ_h=2, N_d=20, N_c=100$
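For reference, the reported settings can be collected in a single configuration object. The values come from the list above; the class name, the field names, and the reading of $N_d$ and $N_c$ as prompt-pool capacities are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KFFConfig:
    """Hypothetical container for the reported ImageNet-C settings."""
    backbone: str = "ViT-B/16"
    optimizer: str = "AdamW"
    domain_prompt_lr: float = 0.1
    class_prompt_lr: float = 0.001
    batch_size: int = 64
    domain_prompt_length: int = 8
    class_prompt_length: int = 1
    gamma_d: float = 25.0    # domain-key distance threshold
    gamma_c: float = 0.005   # class-key similarity threshold
    gamma_h: float = 2.0     # entropy threshold for pool updates
    n_domain_prompts: int = 20   # N_d, read here as pool capacity (assumption)
    n_class_prompts: int = 100   # N_c, read here as pool capacity (assumption)

config = KFFConfig()
```

Freezing the dataclass keeps the reported values from being modified by accident during an experiment run.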

Experimental Results

Main Results

Non-Repeating Domain Setting:

ImageNet-C: 34.8% vs DPCore's 39.9%, a reduction of 5.1 percentage points
CIFAR100-C: 22.5% vs DPCore's 25.1%, a reduction of 2.6 percentage points
CIFAR10-C: 12.4% vs DPCore's 15.4%, a reduction of 3.0 percentage points

Repeating Domain Setting (10 rounds):

ImageNet-C average error rate: 34.5% vs DPCore's 44.4%, a reduction of 9.9 percentage points
Performance remains stable across multiple rounds, validating method robustness

Efficiency Analysis

Introduces only 0.09M learnable parameters (approximately 0.1% of total model parameters)
In repeating domain settings, DPCore uses approximately 5 times more parameters than this method by round 10
Computational overhead comparable to DPCore but with significantly superior performance

Ablation Study

Component contribution analysis:

Domain prompts only + KFI + KFU: 39.5%
Class prompts only + KFI + KFU: 50.9%
Dual prompts + KFU without KFI: 62.9% (severe performance degradation)
Dual prompts + KFI without KFU: 36.9%
Complete method: 34.8%

Results demonstrate that each component is indispensable, with the KFI module being most critical for performance improvement.

Visualization Analysis

Attention Map Analysis: The method concentrates attention on discriminative regions related to classes
t-SNE Analysis: Domain prompt keys and test batch statistical features form well-separated clusters
Class Distribution Analysis: Class prompts effectively map different classes to corresponding prompts

Theoretical Analysis

Well-Separated Clustering Assumption

Assumes test batches can be naturally partitioned into N well-separated clusters based on feature representations, with a threshold θ such that:

$$\forall i \neq j: \quad \max_{B, B' \in C_i} d(B, B') < \theta < \min_{B \in C_i,\, B' \in C_j} d(B, B')$$
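Whether a given partition of batch features satisfies this condition can be checked numerically. The sketch below assumes Euclidean distance for $d$ and represents each cluster as a list of feature vectors; both choices, and the function names, are illustrative. A valid threshold θ exists exactly when the largest within-cluster distance is smaller than the smallest between-cluster distance.

```python
import math
from itertools import combinations

def separation_gap(clusters):
    """Return (max intra-cluster distance, min inter-cluster distance)."""
    max_intra = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    min_inter = min(
        (math.dist(a, b)
         for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2),
        default=math.inf,
    )
    return max_intra, min_inter

def is_well_separated(clusters):
    """True when some threshold theta fits strictly between the two values."""
    max_intra, min_inter = separation_gap(clusters)
    return max_intra < min_inter
```

Any θ strictly between the two returned values satisfies the inequality for every pair of clusters.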

Theoretical Guarantees

Lemma A.1: The KFI mechanism correctly assigns all batches to prompts of the same cluster
Lemma A.2: The KFU mechanism only fuses prompts within the same cluster
Proposition A.3: The KFF method correctly assigns all batches to prompts of the same cluster

Under the well-separated clustering assumption, the analysis guarantees correct prompt assignment, and the t-SNE visualizations in the experiments support the assumption empirically.

Related Work

Test-Time Adaptation (TTA)

Early methods primarily use self-supervised losses such as entropy minimization and consistency maximization
Limitations: Assume static target domains and cannot handle dynamic domain shifts

Continual Test-Time Adaptation (CTTA)

Regularization Methods: EATA and EcoTTA mitigate error accumulation through regularization
Reset Methods: ERSK and CoTTA use weight resets to combat catastrophic forgetting
Prompt Learning Methods: VDP, SVDP, and DPCore leverage minimal parameters to learn domain-specific knowledge

Prompt Learning

Extended from NLP to computer vision
Existing methods primarily focus on domain-level knowledge, overlooking class-level information shared across domains

Conclusions and Discussion

Main Conclusions

The KFF framework effectively addresses domain conflict issues in CTTA
Class-aware design better leverages cross-domain shared knowledge
Knowledge fission and fusion mechanisms balance effectiveness and efficiency
Achieves significant performance improvements across multiple benchmark datasets

Limitations

Source Domain Dependency: Requires access to source domain statistics, presenting challenges in privacy-constrained scenarios
Synthetic Corruptions: Primarily validated on artificially designed corruptions; robustness to real-world distribution shifts requires further verification
Computational Overhead: While relatively efficient, remains challenging on resource-constrained devices
Hyperparameter Sensitivity: Requires careful tuning of key hyperparameters for different datasets

Future Directions

Explore adaptation methods without source domain statistical information
Validate method robustness on real-world datasets
Further optimize computational efficiency
Investigate adaptive hyperparameter adjustment mechanisms

In-Depth Evaluation

Strengths

Strong Novelty: First to propose class-aware knowledge fission and fusion framework, addressing the important domain conflict problem
Theoretical Support: Provides theoretical analysis based on well-separated clustering assumptions
Comprehensive Experiments: Conducts thorough comparative experiments and ablation studies across multiple datasets
Superior Efficiency: Achieves best performance while maintaining computational efficiency
Clear Visualization: Provides intuitive method explanations through attention maps and t-SNE visualizations

Weaknesses

Assumption Limitations: Well-separated clustering assumptions may not always hold in practical applications
Evaluation Limitations: Primarily evaluated on synthetic corruptions, lacking validation in real-world scenarios
Source Domain Dependency: Requirement for source domain statistics limits method applicability
Hyperparameter Complexity: Involves multiple hyperparameters requiring careful tuning

Impact

Academic Contribution: Provides new perspectives for CTTA research, expected to attract widespread attention
Practical Value: Has application potential in scenarios requiring continuous adaptation such as autonomous driving and medical imaging
Reproducibility: Authors commit to open-sourcing code, facilitating method dissemination

Applicable Scenarios

Computer vision tasks requiring continuous adaptation to multiple domain shifts
Edge computing scenarios with parameter efficiency requirements
Applications with access to limited source domain statistics
Structured environments with relatively predictable domain changes

This paper makes important contributions to the CTTA field, effectively addressing domain conflict issues through innovative knowledge fission and fusion mechanisms, achieving significant performance improvements while maintaining computational efficiency. Despite certain limitations, its core ideas and technical innovations provide valuable references for related research.