2025-11-22T20:19:15.981080

Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL

Wu, Zhao, Chen et al.
Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from minimal examples without forgetting prior knowledge, a task complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods often struggle with generalization due to their reliance on limited datasets. While diffusion models offer a path for data augmentation, their direct application can lead to semantic misalignment or ineffective guidance. This paper introduces Diffusion-Classifier Synergy (DCS), a novel framework that establishes a mutual boosting loop between diffusion model and FSCIL classifier. DCS utilizes a reward-aligned learning strategy, where a dynamic, multi-faceted reward function derived from the classifier's state directs the diffusion model. This reward system operates at two levels: the feature level ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching, while the logits level promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms. This co-evolutionary process, where generated images refine the classifier and an improved classifier state yields better reward signals, demonstrably achieves state-of-the-art performance on FSCIL benchmarks, significantly enhancing both knowledge retention and new class learning.


Basic Information

  • Paper ID: 2510.03608
  • Title: Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL
  • Authors: Ruitao Wu, Yifan Zhao, Guangyao Chen, Jia Li
  • Category: cs.CV
  • Conference: NeurIPS 2025
  • Paper Link: https://arxiv.org/abs/2510.03608

Abstract

Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from extremely limited samples while avoiding catastrophic forgetting of prior knowledge. The task is complicated by the stability-plasticity dilemma and by data scarcity. Current FSCIL methods generalize poorly because they rely on limited datasets. While diffusion models offer a pathway for data augmentation, applying them directly may lead to semantic misalignment or ineffective guidance. This paper proposes the Diffusion-Classifier Synergy (DCS) framework, which establishes a mutual boosting loop between a diffusion model and the FSCIL classifier. DCS employs a reward-aligned learning strategy, guiding the diffusion model through a dynamic, multifaceted reward function derived from the classifier's state. The reward system operates at two levels: at the feature level, prototype-anchored maximum mean discrepancy and dimension-wise variance matching ensure semantic consistency and diversity; at the logits level, confidence recalibration and cross-session confusion-aware mechanisms promote exploratory image generation and enhance inter-class discriminability. In this co-evolutionary process, generated images refine the classifier and the improved classifier state yields better reward signals; the authors report state-of-the-art performance on FSCIL benchmarks, with improvements in both knowledge retention and new-class learning.

Research Background and Motivation

Problem Definition

Few-Shot Class-Incremental Learning (FSCIL) is a highly challenging task requiring models to:

  1. Sequential Learning: Learn new classes from continuous data streams
  2. Few-Shot Constraint: New classes have only limited training samples (typically 5-shot)
  3. Avoid Forgetting: Maintain knowledge of previously learned classes

Core Challenges

  1. Stability-Plasticity Dilemma: Balancing between learning new knowledge and retaining old knowledge
  2. Data Scarcity: Extremely limited samples for new classes lead to unreliable empirical risk minimization
  3. Insufficient Generalization: Existing methods over-rely on limited initial datasets

Limitations of Existing Methods

Existing approaches, including those that apply diffusion models directly for data augmentation, suffer from two main issues:

  1. Semantic Misalignment and Insufficient Diversity: Images generated directly by diffusion models may drift semantically from the target class or lack diversity
  2. Missing Feedback Mechanism: The diffusion model has no way to adjust its outputs according to the current classifier state

Core Contributions

  1. Proposes DCS Framework: First to establish a mutual boosting loop between diffusion models and FSCIL classifiers, implementing reward-aligned generation through the DAS algorithm
  2. Multi-Level Reward Design: Designs multifaceted reward functions operating at feature and logits levels
    • Feature Level: Ensures semantic consistency and promotes intra-class diversity
    • Logits Level: Guides generation of exploratory, generalizable intra-class images and enhances inter-class discriminability
  3. State-of-the-Art Performance: Achieves state-of-the-art results on FSCIL benchmark datasets with significant improvements in old class retention and new class learning

Method Details

Task Definition

FSCIL involves sequential learning from a continuous data stream $D_{train} = \{D^t_{train}\}_{t=0}^{T}$, where:

  • Each session $t$ introduces training samples $(x_i, y_i)$ from a new class set $C_t$ that is disjoint from the class sets of all other sessions
  • The base session ($t=0$) has abundant data, while incremental sessions ($t>0$) follow the N-way K-shot format
  • After training on $D^t_{train}$, the model is evaluated on all seen classes $C^t_{seen} = \bigcup_{s=0}^{t} C_s$
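The session structure above can be sketched as follows. This is a minimal illustration that assumes integer class labels ordered so the base classes come first; the helper names (`build_sessions`, `seen_classes`, `sample_k_shot`) are hypothetical, not from the paper. The example split (60 base classes plus 8 sessions of 5-way 5-shot) is the commonly used CIFAR-100 protocol.

```python
import numpy as np

def build_sessions(n_base, n_way, n_sessions):
    """Return class-id lists: session 0 holds the base classes,
    each later session holds n_way new classes disjoint from all others."""
    sessions = [list(range(n_base))]
    start = n_base
    for _ in range(n_sessions):
        sessions.append(list(range(start, start + n_way)))
        start += n_way
    return sessions

def seen_classes(sessions, t):
    """C^t_seen: the union of the class sets of sessions 0..t."""
    return sorted(set().union(*sessions[: t + 1]))

def sample_k_shot(labels, classes, k, rng):
    """Pick k training indices per class (the K-shot constraint)."""
    idx = []
    for c in classes:
        pool = np.flatnonzero(labels == c)
        idx.extend(rng.choice(pool, size=k, replace=False).tolist())
    return np.array(idx)

sessions = build_sessions(n_base=60, n_way=5, n_sessions=8)
print(len(sessions), len(seen_classes(sessions, 8)))  # 9 100
```

After the last session the model must classify all 100 classes, even though each of the 40 incremental classes contributed only 5 training images.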

Model Architecture

Mutual Boosting Loop Mechanism

The core idea of DCS is establishing bidirectional feedback between diffusion model and classifier:

  1. Reward Computation: Compute reward components $R_i$ from the outputs of the classifier $\sigma_\theta$ (parameters $\theta$) on the images $D(x;\phi)$ generated by the diffusion model $D$ (parameters $\phi$)
  2. Diffusion Model Optimization: $\phi^* = \arg\max_\phi \sum_i R_i(\sigma_\theta(D(x;\phi)))$
  3. Classifier Improvement: $\theta^* = \arg\min_\theta L_{cls}(\sigma_\theta; x \cup D(x;\phi^*), y)$, i.e. the classifier is trained on the real images $x$ together with images generated by the updated diffusion model
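To make the alternation concrete, here is a toy sketch of the loop on 2-D features. Everything in it is an assumption for illustration only: the "diffusion model" is a Gaussian sampler whose mean `phi` is its only parameter, the classifier is a nearest-prototype softmax, the reward is the mean log-probability of the target class (a stand-in for $\sum_i R_i$), and `phi` is improved by random-search hill climbing rather than the DAS algorithm used in the paper. All function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def classifier_log_probs(z, prototypes):
    # Toy sigma_theta: logits are negative squared distances to class prototypes.
    d2 = ((z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return log_softmax(-d2)

def generate(phi, n, rng, std=0.5):
    # Toy stand-in for D(x; phi): Gaussian samples centred at phi.
    return phi + std * rng.standard_normal((n, phi.shape[0]))

def reward(phi, prototypes, c, seed, n=16):
    # Step 1 stand-in for sum_i R_i: mean log p(y_c | generated sample).
    z = generate(phi, n, np.random.default_rng(seed))
    return classifier_log_probs(z, prototypes)[:, c].mean()

def mutual_boosting(real_x, prototypes, c, rounds=5, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    phi = real_x.mean(axis=0)          # start the generator at the few-shot mean
    prototypes = prototypes.copy()     # do not mutate the caller's array
    for _ in range(rounds):
        # Step 2: improve phi against the current, fixed classifier.
        # Reusing one seed gives common random numbers, so candidates compare fairly.
        best = reward(phi, prototypes, c, seed)
        for _ in range(steps):
            cand = phi + 0.1 * rng.standard_normal(phi.shape)
            r = reward(cand, prototypes, c, seed)
            if r > best:
                phi, best = cand, r
        # Step 3: "retrain" the classifier on real + generated samples
        # (here: recompute the class-c prototype from both).
        gen = generate(phi, 30, rng)
        prototypes[c] = np.concatenate([real_x, gen]).mean(axis=0)
    return phi, prototypes
```

Because the prototypes change after every round, the reward landscape the generator optimizes also shifts; a real system would replace both toys with a full classifier and a diffusion sampler.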

Feature-Level Reward Design

1. Prototype-Anchored Maximum Mean Discrepancy Reward (R_PAMMD)

$$R_{PAMMD}(x_{gen}, I^{(c,N)}_{gen}) = -\alpha \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} k(z_i,z_j) + \beta \frac{1}{N}\sum_{i=1}^{N} k(z_i,\mu_c)$$

Where:

  • First term (diversity): Encourages differences among generated images
  • Second term (consistency): Ensures semantic consistency with class prototype
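  • $k(\cdot,\cdot)$ is a positive definite kernel function, $z_i$ is the feature of the $i$-th image in the set $I^{(c,N)}_{gen}$ of $N$ images generated for class $c$, $\mu_c$ is the class prototype, and $\alpha$, $\beta$ weight the two terms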
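A minimal NumPy sketch of $R_{PAMMD}$, assuming a Gaussian RBF kernel (one common positive definite choice; the kernel is not specified here) and that `z` holds the features of the $N$ generated images for class $c$. The defaults for `alpha`, `beta`, and `bandwidth` are illustrative, not the paper's settings.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    # Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 h^2)), which is positive definite.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def r_pammd(z, mu_c, alpha=1.0, beta=1.0, bandwidth=1.0):
    n = z.shape[0]
    # Diversity term: mean pairwise similarity among generated features (penalized).
    diversity = rbf_kernel(z, z, bandwidth).sum() / n ** 2
    # Consistency term: mean similarity of each generated feature to the prototype.
    consistency = rbf_kernel(z, mu_c[None, :], bandwidth).sum() / n
    return -alpha * diversity + beta * consistency

mu = np.zeros(2)
collapsed = np.zeros((4, 2))  # every sample sits exactly on the prototype
spread = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(round(r_pammd(collapsed, mu), 4), round(r_pammd(spread, mu), 4))  # 0.0 0.1388
```

The spread set scores higher: it gives up some prototype similarity but loses much more of the pairwise-similarity penalty, which is exactly the diversity/consistency trade-off the two terms encode.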

2. Variance Matching Reward (R_VM)

$$R_{VM}(x_{gen}, I^{(c,N)}_{gen}) = -\sum_{d=1}^{D} \left(v^d_{gen} - v^d_{real}\right)^2$$

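Here $v^d_{gen}$ and $v^d_{real}$ are the variances of feature dimension $d$ over the generated and real images of class $c$, respectively. The reward reaches its maximum of zero when the per-dimension variances match, which keeps the spread of the generated feature distribution consistent with that of the real images.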
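The variance-matching reward reduces to a few lines. This sketch assumes both inputs are feature matrices of shape (samples, D) and uses NumPy's default population variance (`ddof=0`); with only 5 real shots the variance estimate is noisy, and how the paper estimates $v^d_{real}$ is not specified here.

```python
import numpy as np

def r_vm(z_gen, z_real):
    # Per-dimension variances v^d over generated and real features.
    v_gen = z_gen.var(axis=0)
    v_real = z_real.var(axis=0)
    # Negative squared mismatch summed over the D dimensions; 0 is the maximum.
    return -float(((v_gen - v_real) ** 2).sum())
```

A generator that collapses to near-identical images (zero variance in every dimension) is penalized by the full squared real variance, so this term complements the diversity term of $R_{PAMMD}$.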

Logits-Level Reward Design

1. Recalibrated Confidence Reward (R_RC)

$$R_{RC}(x_{gen}, y_c) = \log \hat{p}(y_c \mid x_{gen}; T)$$

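Here the temperature $T$ is adapted to the classifier's raw confidence $\hat{p}_c(y_c \mid x_{gen})$:

$$T(x_{gen}) = T_{base} + T_{scale} \cdot \frac{\hat{p}_c(y_c \mid x_{gen}) - 1/N_c}{1 - 1/N_c}$$

The fraction rescales the raw confidence so that chance level ($1/N_c$, with $N_c$ the number of candidate classes) maps to 0 and full confidence maps to 1. Images the classifier already finds easy therefore receive a higher temperature, which flattens the softmax and, for a correctly classified image, lowers its reward; this nudges the generator toward less prototypical, more exploratory samples.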
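A NumPy sketch of the adaptive temperature and $R_{RC}$ for a single generated image, assuming `logits` is the classifier's logit vector over the $N_c$ candidate classes and that the raw confidence is the softmax at temperature 1. The values of `t_base` and `t_scale` are illustrative; keeping `t_base > t_scale / (N_c - 1)` guarantees a positive temperature.

```python
import numpy as np

def log_softmax(logits, T=1.0):
    s = logits / T
    s = s - s.max()  # shift for numerical stability
    return s - np.log(np.exp(s).sum())

def adaptive_temperature(logits, c, t_base=1.0, t_scale=0.5):
    n_c = logits.shape[0]
    p_raw = np.exp(log_softmax(logits))[c]  # raw confidence at T = 1
    # Rescale so chance level (1/N_c) maps to 0 and certainty maps to 1.
    return t_base + t_scale * (p_raw - 1.0 / n_c) / (1.0 - 1.0 / n_c)

def r_rc(logits, c, t_base=1.0, t_scale=0.5):
    T = adaptive_temperature(logits, c, t_base, t_scale)
    return log_softmax(logits, T)[c]  # log p_hat(y_c | x_gen; T)
```

For uniform logits over 4 classes the temperature stays at `t_base` and the reward equals $\log(1/4)$; for confident logits such as `[10, 0, 0, 0]` the temperature rises to roughly 1.5, so the recalibrated reward is lower than the raw log-probability.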

2. Cross-Session Confusion-Aware Reward (R_CSCA)

$$R_{CSCA}(x_{gen}, y_c) = \sum_{y \in C} w_y(x_{gen}) \log \hat{p}(y \mid x_{gen}; T_s)$$

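Here the dynamic weights are

$$w_{y_t}(x_{gen}) = \frac{1}{1 + \gamma \cdot d_{cos}(x_{gen}, \mu_t)}$$

where $d_{cos}$ is the cosine distance and $\mu_t$ is the prototype of class $y_t$, so classes whose prototypes lie close to the generated image (the classes it is most easily confused with) receive the largest weights.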
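The confusion-aware reward can be sketched the same way. Assumptions: `z` is the feature of one generated image, `prototypes` holds one prototype per class in $C$ (row $y$ is $\mu_y$), `logits` are the classifier's logits over those classes, and `gamma` and `t_s` are illustrative values; how the paper indexes prototypes by session is not reproduced here.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def csca_weights(z, prototypes, gamma=1.0):
    # w_y = 1 / (1 + gamma * d_cos(z, mu_y)): nearby prototypes get weights close to 1.
    d = np.array([cosine_distance(z, mu) for mu in prototypes])
    return 1.0 / (1.0 + gamma * d)

def r_csca(z, logits, prototypes, gamma=1.0, t_s=1.0):
    s = logits / t_s
    log_p = s - s.max() - np.log(np.exp(s - s.max()).sum())  # log p_hat(y | x_gen; T_s)
    return float((csca_weights(z, prototypes, gamma) * log_p).sum())
```

Maximizing $\sum_y w_y \log p_y$ over a probability vector gives $p_y \propto w_y$, so this reward favors images whose predicted distribution spreads mass over the nearby, confusable classes; that is consistent with the hard, boundary-near samples described under Generation Quality Analysis.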

Technical Innovations

  1. Bidirectional Feedback Mechanism: First to achieve co-evolution between diffusion model and classifier
  2. Multi-Level Reward Design: Simultaneous optimization in both feature and decision spaces
  3. Adaptive Temperature Adjustment: Dynamically adjusts reward smoothness based on classifier confidence
  4. Confusion-Aware Generation: Actively generates hard samples to enhance inter-class discriminability

Experimental Setup

Datasets

  • CIFAR-100: 60 base classes, 40 incremental classes (8 sessions, 5-way 5-shot)
  • miniImageNet: 60 base classes, 40 incremental classes (8 sessions, 5-way 5-shot)
  • CUB-200: 100 base classes, 100 incremental classes (10 sessions, 10-way 5-shot)

Evaluation Metrics

  • Session Accuracy: Top-1 accuracy on all classes seen so far, measured after each learning session
  • Average Accuracy: Mean of the session accuracies from the base session to the current session
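The two metrics amount to the following, assuming predictions and labels are integer arrays for a test set restricted to the classes seen so far; the function names are hypothetical.

```python
import numpy as np

def session_accuracy(preds, labels):
    # Top-1 accuracy over test samples of all classes seen up to this session.
    return float((np.asarray(preds) == np.asarray(labels)).mean())

def average_accuracy(session_accs):
    # Mean of the per-session accuracies from the base session to the current one.
    return float(np.mean(session_accs))
```

Because later sessions evaluate on more classes, each session accuracy covers a harder task than the one before, and the average weights all sessions equally.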

Baseline Methods

Baselines include mainstream FSCIL methods such as TOPIC, CEC, FACT, TEEN, SAVC, DyCR, ALFSCIL, OrCo, and ADBS.

Implementation Details

  • Diffusion Model: Stable Diffusion 3.5 Medium
  • Image Generation: 30 images per class in the base session; in each incremental session, 30 images per new class and 10 images per old class
  • Backbone Network: ResNet-18 (CUB-200), ResNet-12 (miniImageNet, CIFAR-100)
  • Optimizer: SGD with momentum 0.9, weight decay 0.0005

Experimental Results

Main Results

miniImageNet Dataset:

  • DCS average accuracy: 68.14%
  • Best baseline (OrCo): 66.90%
  • Improvement: +1.24 percentage points

CUB-200 Dataset:

  • DCS average accuracy: 69.73%
  • Best baseline (SAVC): 69.35%
  • Improvement: +0.38 percentage points

CIFAR-100 Dataset:

  • DCS average accuracy: 66.36%
  • Best baseline (ALFSCIL): 66.75%
  • Difference: -0.39 percentage points (DCS trails ALFSCIL on this dataset)

Ablation Study

Ablation studies on CIFAR-100 show each component's contribution (cumulative gains as rewards are added in turn):

  • R_PAMMD only: +1.24%
  • +R_VM: +1.86%
  • +R_RC: +3.50%
  • +R_CSCA (complete DCS): +5.64%

Logits-level rewards contribute more: the two feature-level rewards together yield +1.86, whereas adding R_RC and R_CSCA raises the total gain to +5.64, a further 3.78 points.

Generation Quality Analysis

  • FID and CLIP Score: Feature-level rewards substantially improve FID (lower is better) and CLIP scores
  • CLIP Score Enhancement: R_RC achieves the best CLIP scores
  • Strategic Degradation: R_CSCA deliberately trades image-quality scores for hard samples near decision boundaries

Experimental Findings

  1. Efficiency Advantage: DCS reaches performance comparable to large-scale generation approaches while generating relatively few images (30 per class in the base session)
  2. Component Synergy: All reward components contribute positively to final performance
  3. Cross-Dataset Consistency: Reward design shows consistent performance across different datasets

Class-Incremental Learning

  • Data Replay Methods: Store or generate previous task data
  • Network Expansion Methods: Dynamically adjust model architecture
  • Parameter Regularization Methods: Constrain parameter updates within a fixed network structure

Few-Shot Class-Incremental Learning

  • Dynamic Network Methods: Maintain feature space relationships through architecture adjustment
  • Meta-Learning Methods: Introduce meta-learning concepts
  • Feature Space Methods: Enhance feature space robustness through virtual class instances
  • Pre-trained Model Methods: Leverage vision-language models like CLIP

Diffusion Models for Image Classification

  • Large-Scale Data Augmentation: Synthesize additional training data to improve classifiers
  • Conditioning Mechanisms: Enhance semantic control and sample diversity
  • Domain-Specific Applications: Few-shot learning or continual learning scenarios

Conclusions and Discussion

Main Conclusions

  1. DCS successfully establishes synergistic mechanisms between diffusion models and FSCIL classifiers
  2. Multi-level reward design effectively addresses semantic alignment and diversity issues
  3. Achieves state-of-the-art average accuracy on miniImageNet and CUB-200 and competitive results on CIFAR-100

Limitations

  1. Dependence on Pre-trained Models: Performance relies on high-quality pre-trained diffusion models
  2. Domain-Specific Constraints: Performance may degrade in specialized domains with insufficient diffusion model training data
  3. Computational Complexity: Multi-component reward systems and iterative boosting loops increase tuning and computational burden

Future Directions

  1. Explore more efficient reward computation methods
  2. Investigate applicability in more specialized domains
  3. Develop lightweight framework variants

In-Depth Evaluation

Strengths

  1. Strong Novelty: First to propose a mutual boosting mechanism between diffusion models and classifiers for FSCIL
  2. Well-Designed Techniques: The multi-level reward design is comprehensive, and each component is well motivated
  3. Comprehensive Experiments: Full evaluation on multiple standard datasets with detailed ablation studies
  4. Consistent Performance Gains: Higher average accuracy than the best baseline on miniImageNet (+1.24 points) and CUB-200 (+0.38 points), with competitive results on CIFAR-100

Weaknesses

  1. Computational Overhead: Generation process and multiple reward computations increase training time and resource requirements
  2. Hyperparameter Sensitivity: Multiple reward component weights require careful tuning
  3. Limited Generalization Verification: Primarily validated in computer vision; applicability to other domains unknown
  4. Limited Theoretical Analysis: Lacks theoretical guarantees on convergence and stability

Impact

  1. Academic Value: Provides new research directions and technical pathways for FSCIL field
  2. Practical Value: Potentially applicable to resource-constrained continual learning scenarios
  3. Reproducibility: The paper reports implementation details and hyperparameter settings

Applicable Scenarios

  1. Continual Learning Systems: Applications requiring continuous learning of new classes
  2. Resource-Constrained Environments: Scenarios where storing large amounts of historical data is infeasible
  3. Few-Shot Learning: Domain applications with scarce new class samples

References

The paper cites 82 references spanning class-incremental learning, few-shot learning, and diffusion models.