2025-11-24T09:58:18.212416

Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation

Zhou, Zhu, Cui et al.
Continual Test-Time Adaptation (CTTA) aims to quickly fine-tune the model during the test phase so that it can adapt to multiple unknown downstream domain distributions without pre-acquiring downstream domain data. To this end, existing advanced CTTA methods mainly reduce the catastrophic forgetting of historical knowledge caused by irregular switching of downstream domain data by restoring the initial model or reusing historical models. However, these methods are usually accompanied by serious insufficient learning of new knowledge and interference from potentially harmful historical knowledge, resulting in severe performance degradation. To this end, we propose a class-aware domain Knowledge Fusion and Fission method for continual test-time adaptation, called KFF, which adaptively expands and merges class-aware domain knowledge in old and new domains according to the test-time data from different domains, where discriminative historical knowledge can be dynamically accumulated. Specifically, considering the huge domain gap within streaming data, a domain Knowledge FIssion (KFI) module is designed to adaptively separate new domain knowledge from a paired class-aware domain prompt pool, alleviating the impact of negative knowledge brought by old domains that are distinct from the current domain. Besides, to avoid the cumulative computation and storage overheads from continuously fissioning new knowledge, a domain Knowledge FUsion (KFU) module is further designed to merge the fissioned new knowledge into the existing knowledge pool with minimal cost, where a greedy knowledge dynamic merging strategy is designed to improve the compatibility of new and old knowledge while keeping the computational efficiency. Extensive experiments on the ImageNet-C dataset verify the effectiveness of our proposed method against other methods.

Basic Information

  • Paper ID: 2510.12150
  • Title: Class-aware Domain Knowledge Fusion and Fission for Continual Test-Time Adaptation
  • Authors: Jiahuan Zhou, Chao Zhu, Zhenyu Cui, Zichen Liu, Xu Zou, Gang Hua
  • Category: cs.CV (Computer Vision)
  • Conference: NeurIPS 2025 (39th Conference on Neural Information Processing Systems)
  • Paper Link: https://arxiv.org/abs/2510.12150

Abstract

This paper proposes KFF, a class-aware domain knowledge fusion and fission method for the Continual Test-Time Adaptation (CTTA) problem. A Knowledge Fission (KFI) module adaptively separates new domain knowledge to avoid negative interference from historical domains, and a Knowledge Fusion (KFU) module merges the fissioned knowledge into the existing knowledge pool at minimal cost. On ImageNet-C, KFF reduces the error rate by 5.1 percentage points relative to the previous state-of-the-art method, DPCore.

Research Background and Motivation

Problem Definition

Continual Test-Time Adaptation (CTTA) aims to enable pre-trained models to rapidly adapt to multiple unknown downstream domain distributions during the test phase without prior access to downstream domain data. This is more challenging than traditional Test-Time Adaptation (TTA), which assumes a static target domain.

Core Challenges

  1. Catastrophic Forgetting: Irregular domain data switching leads to catastrophic forgetting of historical knowledge
  2. Insufficient New Knowledge Learning: Existing methods often fail to sufficiently learn new knowledge while preserving historical knowledge
  3. Harmful Historical Knowledge Interference: Knowledge conflicts between different domains disrupt gradient optimization directions

Limitations of Existing Methods

  • Regularization Methods: Preserve historical knowledge through regularization but suppress new knowledge learning
  • Parameter Reset Methods: Avoid forgetting by restoring initial models but lose useful historical knowledge
  • Model Fusion Methods: Select and fuse historical model parameters but suffer from domain conflicts and unbounded storage overhead

Core Contributions

  1. Proposes KFF Framework: The first class-aware domain knowledge fusion and fission framework capable of dynamically accumulating discriminative historical knowledge
  2. Designs KFI Module: A knowledge fission module that adaptively separates new domain knowledge, reducing negative knowledge interference across domains
  3. Develops KFU Module: A knowledge fusion module that merges knowledge through greedy strategies, balancing effectiveness and efficiency
  4. Achieves State-of-the-Art Performance: Reaches a 34.8% error rate on ImageNet-C, 5.1 percentage points lower than DPCore
  5. Provides Theoretical Analysis: Theoretical guarantees based on well-separated clustering assumptions

Method Details

Task Definition

Given source-domain training data D_S = {Y_S, X_S} and a stream of test data from different domain distributions D_T = {X_T}_{T=1}^N, the model f_θ must process test batches B_T^j = {x_t}_{t=0}^b online, with the objective of adapting to each target domain while maintaining performance on historical domains.

Model Architecture

Overall Framework

The KFF framework contains two core modules:

  • Knowledge Fission (KFI) Module: Dynamically fissions class-aware domain knowledge
  • Knowledge Fusion (KFU) Module: Merges fissioned knowledge into the existing knowledge pool

Knowledge Fission Module (KFI)

Class-level Knowledge Fission:

  • Uses cosine similarity s_{t,i} = sim(ỹ_t, y_i) to evaluate the matching degree between pseudo-labels and prompt keys
  • Selects candidate prompts with s_{t,i} > γ_c and uses them in a weighted manner:
P_t = Σ_{i=0}^{N_c} w_i P_i^c, w_i = exp(s_{t,i}/τ_c) / Σ exp(s_{t,i}/τ_c)
  • If no candidate prompts exist, fissions new prompts for test samples
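
The class-level selection rule can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-based data layout, and the choice to normalise the softmax over the candidate set only are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def select_class_prompt(pseudo_label, keys, prompts, gamma_c, tau_c):
    """Return (combined_prompt, weights), or (None, {}) when no key matches.

    A (None, {}) result signals that a new prompt should be fissioned.
    Assumption: the softmax is normalised over the candidates only.
    """
    sims = {i: cosine(pseudo_label, k) for i, k in enumerate(keys)}
    cand = {i: s for i, s in sims.items() if s > gamma_c}
    if not cand:
        return None, {}
    s_max = max(cand.values())  # subtract the max for numerical stability
    exps = {i: math.exp((s - s_max) / tau_c) for i, s in cand.items()}
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}
    dim = len(prompts[0])
    combined = [sum(w * prompts[i][d] for i, w in weights.items())
                for d in range(dim)]
    return combined, weights
```

A caller would treat a `None` result as the trigger for creating a new prompt for the current test sample, which corresponds to the fission step in the last bullet above.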

Domain-level Knowledge Fission:

  • Uses test batch statistical features Γ_T^j = {μ, σ} as input keys
  • Selects candidate prompts based on Euclidean distance: d_i = ||Γ_T^j - Γ_i||_2 < γ_d
  • Merges through distance-weighted combination:
P^d = Σ_{i=0}^{N_d} w_i P_i^d, w_i = exp(-d_i/τ_d) / Σ exp(-d_i/τ_d)
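
As a rough sketch of the domain-level lookup, the snippet below computes batch statistics and the distance-based weights. Several details are assumptions made for illustration: μ and σ are concatenated into a single key vector, σ is the population standard deviation, and the weights are normalised over the candidate set only. The function names are hypothetical.

```python
import math

def batch_statistics(features):
    """Concatenate per-dimension mean and (population) std of a feature batch."""
    n = len(features)
    cols = list(zip(*features))
    mu = [sum(col) / n for col in cols]
    sigma = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
             for col, m in zip(cols, mu)]
    return mu + sigma

def domain_weights(batch_stats, keys, gamma_d, tau_d):
    """Return {index: weight} for keys within distance gamma_d of the batch.

    An empty dict means no stored domain prompt is close enough.
    """
    dists = {i: math.dist(batch_stats, k) for i, k in enumerate(keys)}
    cand = {i: d for i, d in dists.items() if d < gamma_d}
    if not cand:
        return {}
    d_min = min(cand.values())  # shift by the minimum for numerical stability
    exps = {i: math.exp(-(d - d_min) / tau_d) for i, d in cand.items()}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}
```

The combined prompt P^d is then the weighted sum of the stored prompts using these weights, exactly as in the class-level case.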

Knowledge Fusion Module (KFU)

Class-level Knowledge Fusion:

  • Uses entropy threshold γ_h to control prompt pool updates
  • Directly adds newly fissioned prompts to the pool
  • For combined prompts, updates original prompts by weight:
P_{c_i}^* = (1/b) Σ_{t=0}^b [w_{ti} P_t^* + (1-w_{ti}) P_i^c]
  • Uses Minimum Spanning Tree (MST) algorithm to cluster and fuse prompts to control pool size
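
One way to read the MST step is as Kruskal-style (single-linkage) clustering: add the shortest edges of the spanning tree until the number of connected components fits the pool budget, then fuse each component. The sketch below follows that reading. The stopping rule, the Euclidean distance between prompts, and fusion by simple averaging are assumptions made for illustration; the summary does not specify them.

```python
import math
from itertools import combinations

def mst_cluster_and_fuse(prompts, max_size):
    """Fuse prompts down to at most max_size entries via Kruskal-style clustering."""
    n = len(prompts)
    if n <= max_size:
        return [list(p) for p in prompts]

    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest, as Kruskal's algorithm does.
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: math.dist(prompts[e[0]], prompts[e[1]]))
    components = n
    for i, j in edges:
        if components <= max_size:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it belongs to the MST
            parent[ri] = rj
            components -= 1

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(prompts[i])
    # Assumption: each cluster is fused by averaging its members.
    return [[sum(col) / len(group) for col in zip(*group)]
            for group in clusters.values()]
```

Stopping early leaves the longest MST edges uncut, so only prompts that are close to each other end up fused.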

Domain-level Knowledge Fusion:

  • Directly adds new prompts to the domain prompt pool
  • Updates combined prompts by weight: P_{d_i}^* = w_i P_d^* + (1 - w_i) P_i^d
  • Fuses nearest neighbor prompt pairs when pool is full
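
The two domain-level operations above can be sketched as follows. The weighted update mirrors the formula in the list. The pair-fusion helper assumes that "nearest" is measured by Euclidean distance between prompt keys and that a pair is fused by averaging both the keys and the prompts; neither detail is confirmed by the summary, and the function names are hypothetical.

```python
import math
from itertools import combinations

def update_prompt(old_prompt, adapted_prompt, w):
    """P_{d_i}^* = w * P_d^* + (1 - w) * P_i^d, applied element-wise."""
    return [w * a + (1.0 - w) * o for a, o in zip(adapted_prompt, old_prompt)]

def average(u, v):
    """Element-wise mean of two vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def fuse_nearest_pair(keys, prompts):
    """Merge the two pool entries with the closest keys; the pool shrinks by one."""
    i, j = min(combinations(range(len(keys)), 2),
               key=lambda p: math.dist(keys[p[0]], keys[p[1]]))
    keep = [k for k in range(len(keys)) if k not in (i, j)]
    new_keys = [keys[k] for k in keep] + [average(keys[i], keys[j])]
    new_prompts = [prompts[k] for k in keep] + [average(prompts[i], prompts[j])]
    return new_keys, new_prompts
```

Calling `fuse_nearest_pair` whenever the pool exceeds its capacity keeps storage bounded at the configured size.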

Loss Function Design

Employs a two-level loss function:

L = L_d + a·L_c

Where:

  • Domain alignment loss: L_d = ||μ_s - μ_T^j(P)||_2 + α||σ_s - σ_T^j(P)||_2
  • Instance-level entropy loss: L_c = (1/b) Σ_{t=0}^b H(ŷ_t)
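
The two-level objective can be evaluated numerically as in the sketch below. It is an illustration only: it works on plain Python lists rather than differentiable tensors, takes the prompted batch statistics as ready-made inputs, and averages the entropy over the samples in the batch. The function and parameter names are hypothetical.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy H(y) of one predicted class distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def kff_loss(mu_s, sigma_s, mu_t, sigma_t, batch_probs, alpha, a):
    """L = L_d + a * L_c for one test batch.

    mu_s, sigma_s: source-domain feature statistics.
    mu_t, sigma_t: statistics of the current batch computed with the prompt.
    batch_probs:   per-sample softmax outputs.
    """
    l_d = math.dist(mu_s, mu_t) + alpha * math.dist(sigma_s, sigma_t)
    l_c = sum(entropy(p) for p in batch_probs) / len(batch_probs)
    return l_d + a * l_c
```

In a real adaptation loop the same expression would be built from framework tensors so that gradients can flow back into the prompts.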

Experimental Setup

Datasets

  • ImageNet-to-ImageNet-C: 15 corruption types at maximum severity level 5
  • CIFAR100-to-CIFAR100-C: Same setup
  • CIFAR10-to-CIFAR10-C: Same setup

Evaluation Metrics

  • Classification error rate (%) as primary metric
  • Number of learnable parameters, memory usage, and computation time as efficiency metrics

Comparison Methods

  • TTA Methods: TENT, SAR, POEM
  • CTTA Methods: CoTTA, VDP, RoTTA, C-MAE, ROID, ViDA, CoLA, PALM, DPCore

Implementation Details

  • Backbone Network: ViT-B/16
  • Optimizer: AdamW with domain prompt learning rate 0.1 and class prompt learning rate 0.001
  • Batch Size: 64
  • Domain prompt length: 8, class prompt length: 1
  • Key Hyperparameters: γ_d = 25, γ_c = 0.005, γ_h = 2, N_d = 20, N_c = 100

Experimental Results

Main Results

Non-Repeating Domain Setting:

  • ImageNet-C: 34.8% error vs. DPCore's 39.9%, a reduction of 5.1 percentage points
  • CIFAR100-C: 22.5% error vs. DPCore's 25.1%, a reduction of 2.6 percentage points
  • CIFAR10-C: 12.4% error vs. DPCore's 15.4%, a reduction of 3.0 percentage points
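
The gaps above are absolute differences in error rate. The short helper below, written for this summary rather than taken from the paper, also computes the relative reduction; for ImageNet-C, for example, the gap of 5.1 points corresponds to a relative error reduction of roughly 12.8%.

```python
def error_gap(baseline_err, method_err):
    """Return (absolute gap in percentage points, relative reduction in %)."""
    absolute = baseline_err - method_err
    relative = 100.0 * absolute / baseline_err
    return absolute, relative

# Error rates (%) as reported in the list above: (DPCore, KFF).
results = {
    "ImageNet-C": (39.9, 34.8),
    "CIFAR100-C": (25.1, 22.5),
    "CIFAR10-C": (15.4, 12.4),
}

for name, (baseline, ours) in results.items():
    absolute, relative = error_gap(baseline, ours)
    print(f"{name}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```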

Repeating Domain Setting (10 rounds):

  • ImageNet-C average error rate: 34.5% vs. DPCore's 44.4%, a reduction of 9.9 percentage points
  • Performance remains stable across multiple rounds, validating method robustness

Efficiency Analysis

  • Introduces only 0.09M learnable parameters (approximately 0.1% of total model parameters)
  • In repeating domain settings, DPCore uses approximately 5 times more parameters than this method by round 10
  • Computational overhead comparable to DPCore but with significantly superior performance

Ablation Study

Component contribution analysis:

  • Domain prompts only + KFI + KFU: 39.5%
  • Class prompts only + KFI + KFU: 50.9%
  • Dual prompts without KFI + KFU: 62.9% (severe performance degradation)
  • Dual prompts + KFI without KFU: 36.9%
  • Complete method: 34.8%

Results demonstrate that each component is indispensable, with the KFI module being most critical for performance improvement.

Visualization Analysis

  • Attention Map Analysis: The method concentrates attention on discriminative regions related to classes
  • t-SNE Analysis: Domain prompt keys and test batch statistical features form well-separated clusters
  • Class Distribution Analysis: Class prompts effectively map different classes to corresponding prompts

Theoretical Analysis

Well-Separated Clustering Assumption

Assumes test batches can be naturally partitioned into N well-separated clusters based on feature representations, with a threshold θ such that:

∀i≠j, max_{B,B'∈C_i} d(B,B') < θ < min_{B∈C_i,B'∈C_j} d(B,B')
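
The assumption can be checked directly on a labelled set of batch representations. The sketch below is illustrative: it represents each batch as a feature vector, takes d to be Euclidean distance, and requires at least two clusters. The function name is hypothetical.

```python
import math
from itertools import combinations

def separation_threshold(clusters, dist=math.dist):
    """Return a valid threshold θ if the clusters are well separated, else None.

    clusters: list of clusters, each a list of batch feature vectors.
    Requires at least two clusters.
    """
    # Largest distance between two batches that share a cluster.
    max_intra = max((dist(x, y)
                     for c in clusters
                     for x, y in combinations(c, 2)), default=0.0)
    # Smallest distance between two batches from different clusters.
    min_inter = min(dist(x, y)
                    for ci, cj in combinations(clusters, 2)
                    for x in ci for y in cj)
    if max_intra < min_inter:
        return (max_intra + min_inter) / 2  # any value strictly between works
    return None
```

When the function returns `None`, no single threshold can separate same-cluster pairs from cross-cluster pairs, so the premise of the guarantees below does not hold for that data.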

Theoretical Guarantees

  • Lemma A.1: The KFI mechanism correctly assigns all batches to prompts of the same cluster
  • Lemma A.2: The KFU mechanism only fuses prompts within the same cluster
  • Proposition A.3: The KFF method correctly assigns all batches to prompts of the same cluster

Under the well-separated clustering assumption, the analysis guarantees that batches are assigned to prompts of the correct cluster. The t-SNE visualizations in the experiments support this assumption empirically.

Related Work

Test-Time Adaptation (TTA)

  • Early methods primarily use self-supervised losses such as entropy minimization and consistency maximization
  • Limitations: Assume static target domains and cannot handle dynamic domain shifts

Continual Test-Time Adaptation (CTTA)

  • Regularization Methods: EATA and EcoTTA mitigate error accumulation through regularization
  • Reset Methods: ERSK and CoTTA use weight resets to combat catastrophic forgetting
  • Prompt Learning Methods: VDP, SVDP, and DPCore leverage minimal parameters to learn domain-specific knowledge

Prompt Learning

  • Extended from NLP to computer vision
  • Existing methods primarily focus on domain-level knowledge, overlooking class-level information shared across domains

Conclusions and Discussion

Main Conclusions

  1. The KFF framework effectively addresses domain conflict issues in CTTA
  2. Class-aware design better leverages cross-domain shared knowledge
  3. Knowledge fission and fusion mechanisms balance effectiveness and efficiency
  4. Achieves significant performance improvements across multiple benchmark datasets

Limitations

  1. Source Domain Dependency: Requires access to source domain statistics, presenting challenges in privacy-constrained scenarios
  2. Synthetic Corruptions: Primarily validated on artificially designed corruptions; robustness to real-world distribution shifts requires further verification
  3. Computational Overhead: While relatively efficient, remains challenging on resource-constrained devices
  4. Hyperparameter Sensitivity: Requires careful tuning of key hyperparameters for different datasets

Future Directions

  1. Explore adaptation methods without source domain statistical information
  2. Validate method robustness on real-world datasets
  3. Further optimize computational efficiency
  4. Investigate adaptive hyperparameter adjustment mechanisms

In-Depth Evaluation

Strengths

  1. Strong Novelty: First to propose class-aware knowledge fission and fusion framework, addressing the important domain conflict problem
  2. Theoretical Support: Provides theoretical analysis based on well-separated clustering assumptions
  3. Comprehensive Experiments: Conducts thorough comparative experiments and ablation studies across multiple datasets
  4. Superior Efficiency: Achieves best performance while maintaining computational efficiency
  5. Clear Visualization: Provides intuitive method explanations through attention maps and t-SNE visualizations

Weaknesses

  1. Assumption Limitations: Well-separated clustering assumptions may not always hold in practical applications
  2. Evaluation Limitations: Primarily evaluated on synthetic corruptions, lacking validation in real-world scenarios
  3. Source Domain Dependency: Requirement for source domain statistics limits method applicability
  4. Hyperparameter Complexity: Involves multiple hyperparameters requiring careful tuning

Impact

  1. Academic Contribution: Provides new perspectives for CTTA research, expected to attract widespread attention
  2. Practical Value: Has application potential in scenarios requiring continuous adaptation such as autonomous driving and medical imaging
  3. Reproducibility: Authors commit to open-sourcing code, facilitating method dissemination

Applicable Scenarios

  • Computer vision tasks requiring continuous adaptation to multiple domain shifts
  • Edge computing scenarios with parameter efficiency requirements
  • Applications with access to limited source domain statistics
  • Structured environments with relatively predictable domain changes

This paper makes important contributions to the CTTA field, effectively addressing domain conflict issues through innovative knowledge fission and fusion mechanisms, achieving significant performance improvements while maintaining computational efficiency. Despite certain limitations, its core ideas and technical innovations provide valuable references for related research.