2025-11-22T20:19:15.981080

Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL

Wu, Zhao, Chen et al.

Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from minimal examples without forgetting prior knowledge, a task complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods often struggle with generalization due to their reliance on limited datasets. While diffusion models offer a path for data augmentation, their direct application can lead to semantic misalignment or ineffective guidance. This paper introduces Diffusion-Classifier Synergy (DCS), a novel framework that establishes a mutual boosting loop between diffusion model and FSCIL classifier. DCS utilizes a reward-aligned learning strategy, where a dynamic, multi-faceted reward function derived from the classifier's state directs the diffusion model. This reward system operates at two levels: the feature level ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching, while the logits level promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms. This co-evolutionary process, where generated images refine the classifier and an improved classifier state yields better reward signals, demonstrably achieves state-of-the-art performance on FSCIL benchmarks, significantly enhancing both knowledge retention and new class learning.

Basic Information

Paper ID: 2510.03608
Title: Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL
Authors: Ruitao Wu, Yifan Zhao, Guangyao Chen, Jia Li
Category: cs.CV
Conference: NeurIPS 2025
Paper Link: https://arxiv.org/abs/2510.03608

Abstract

Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from extremely limited samples while avoiding catastrophic forgetting of prior knowledge. The task is complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods struggle with generalization because they rely on limited datasets. While diffusion models offer a pathway for data augmentation, applying them directly may lead to semantic misalignment or ineffective guidance. This paper proposes the Diffusion-Classifier Synergy (DCS) framework, which establishes a mutual boosting loop between the diffusion model and the FSCIL classifier. DCS employs a reward-aligned learning strategy that guides the diffusion model through a dynamic, multi-faceted reward function derived from the classifier's state. The reward system operates at two levels: at the feature level, prototype-anchored maximum mean discrepancy and dimension-wise variance matching ensure semantic consistency and diversity; at the logits level, confidence recalibration and cross-session confusion-aware mechanisms promote exploratory image generation and enhance inter-class discriminability. Through this co-evolutionary process, generated images refine the classifier, while the improved classifier state produces better reward signals, achieving state-of-the-art performance on FSCIL benchmarks with significant improvements in knowledge retention and new class learning.

Research Background and Motivation

Problem Definition

Few-Shot Class-Incremental Learning (FSCIL) is a highly challenging task requiring models to:

Sequential Learning: Learn new classes from continuous data streams
Few-Shot Constraint: New classes have only limited training samples (typically 5-shot)
Avoid Forgetting: Maintain knowledge of previously learned classes

Core Challenges

Stability-Plasticity Dilemma: Balancing between learning new knowledge and retaining old knowledge
Data Scarcity: Extremely limited samples for new classes lead to unreliable empirical risk minimization
Insufficient Generalization: Existing methods over-rely on limited initial datasets

Limitations of Existing Methods

Existing approaches that apply diffusion models directly to FSCIL data augmentation suffer from two main issues:

Semantic Misalignment and Insufficient Diversity: Images generated directly by diffusion models may exhibit semantic bias or limited diversity
Missing Feedback Mechanism: The diffusion model has no mechanism for adjusting its outputs to the current classifier state

Core Contributions

Proposes DCS Framework: First to establish a mutual boosting loop between diffusion models and FSCIL classifiers, implementing reward-aligned generation through the DAS algorithm
Multi-Level Reward Design: Designs multifaceted reward functions operating at feature and logits levels
- Feature Level: Ensures semantic consistency and promotes intra-class diversity
- Logits Level: Guides generation of exploratory, generalizable intra-class images and enhances inter-class discriminability
State-of-the-Art Performance: Achieves state-of-the-art results on FSCIL benchmark datasets with significant improvements in old class retention and new class learning

Method Details

Task Definition

FSCIL involves sequential learning from a continuous data stream $D_{train} = \{D^t_{train}\}^T_{t=0}$, where:

Each session $t$ introduces training samples $(x_i, y_i)$ from a new, disjoint class set $C_t$
The base session ($t=0$) has abundant data, while incremental sessions ($t>0$) follow the N-way K-shot format
After training on $D^t_{train}$, the model is evaluated on all seen classes $C^t_{seen} = \bigcup^t_{s=0} C_s$
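The session structure above can be sketched as follows. This is a minimal illustration with toy numbers (10 classes, 4 base classes, 2-way sessions); the function names `build_sessions` and `seen_classes` are hypothetical and not taken from the paper.

```python
def build_sessions(num_classes, num_base, n_way):
    """Split class ids 0..num_classes-1 into a base session plus N-way incremental sessions."""
    remaining = num_classes - num_base
    if remaining % n_way != 0:
        raise ValueError("incremental classes must divide evenly into N-way sessions")
    sessions = [list(range(num_base))]  # session 0: base classes (abundant data)
    for start in range(num_base, num_classes, n_way):
        sessions.append(list(range(start, start + n_way)))  # disjoint class set C_t
    return sessions


def seen_classes(sessions, t):
    """C_seen^t: the union of the class sets from session 0 through session t."""
    return [c for session in sessions[: t + 1] for c in session]


# Toy configuration, chosen only for illustration.
sessions = build_sessions(num_classes=10, num_base=4, n_way=2)
print(sessions)
print(seen_classes(sessions, 2))
```

After session 2 in this toy setup, evaluation covers the four base classes plus the four classes introduced in sessions 1 and 2.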

Model Architecture

Mutual Boosting Loop Mechanism

The core idea of DCS is to establish bidirectional feedback between the diffusion model and the classifier:

Reward Computation: Compute multiple reward components $R_i$ from the outputs of classifier $\sigma$ (parameters $\theta$) on generated images $D(x;\phi)$, where $D$ denotes the diffusion model with parameters $\phi$
Diffusion Model Optimization: $\phi^* = \arg\max_\phi \sum_i R_i(\sigma_\theta(D(x;\phi)))$
Classifier Improvement: $\theta^* = \arg\min_\theta L_{cls}(\sigma_\theta; x \cup D(x;\phi^*), y)$
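The control flow of one pass through this loop can be sketched with toy stand-ins. This is only a structural illustration under strong assumptions: the "classifier" is a single 1-D prototype value, the "diffusion model" is a function returning fixed candidates, and reward alignment is replaced by simple best-of-n selection, which is a swapped-in simplification and not the paper's generation procedure. All names (`boosting_round`, `closeness_reward`, `fit_mean`) are hypothetical.

```python
def boosting_round(classifier, real_set, sample_candidates, reward_fns, fit, keep):
    """One pass: score candidates with classifier-derived rewards, keep the best, refit."""
    candidates = sample_candidates()
    ranked = sorted(
        candidates,
        key=lambda x: sum(reward(classifier, x) for reward in reward_fns),
        reverse=True,
    )
    selected = ranked[:keep]                    # stand-in for reward-aligned generation
    new_classifier = fit(real_set + selected)   # classifier update on real + generated data
    return new_classifier, selected


def closeness_reward(prototype, x):
    """Toy reward: higher when the sample is closer to the current prototype."""
    return -abs(x - prototype)


def fit_mean(data):
    """Toy classifier fit: the prototype is the mean of the training data."""
    return sum(data) / len(data)


real_set = [1.0, 1.2]
classifier = 1.1  # toy classifier state before the round
classifier, selected = boosting_round(
    classifier, real_set, lambda: [0.9, 5.0, 1.1, -3.0],
    [closeness_reward], fit_mean, keep=2,
)
print(selected, classifier)
```

The point of the sketch is the dependency direction: the rewards are computed from the current classifier state, and the refit classifier would in turn change the rewards in the next round.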

Feature-Level Reward Design

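1. Prototype-Anchored Maximum Mean Discrepancy Reward ($R_{PAMMD}$)

$R_{PAMMD}(x_{gen}, I^{(c,N)}_{gen}) = -\alpha \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N k(z_i,z_j) + \beta \frac{1}{N}\sum_{i=1}^N k(z_i,\mu_c)$

Where:

First term (diversity): Penalizes pairwise kernel similarity among the generated images, which encourages them to differ from one another
Second term (consistency): Rewards kernel similarity to the class prototype, which keeps the images semantically consistent with the class
$k(\cdot,\cdot)$ is a positive definite kernel function, $\mu_c$ is the prototype of class $c$, $z_i$ is the classifier feature of the $i$-th of the $N$ generated images, and $\alpha$, $\beta$ weight the two terms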
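A minimal sketch of this reward in plain Python follows. The RBF kernel, its bandwidth, and the default values of `alpha` and `beta` are assumptions made for illustration; the summary only states that $k$ is positive definite. The names `rbf_kernel` and `pammd_reward` are hypothetical.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel, one common positive definite choice (assumed here)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def pammd_reward(features, prototype, alpha=1.0, beta=1.0, kernel=rbf_kernel):
    """-alpha * mean pairwise similarity + beta * mean similarity to the prototype."""
    n = len(features)
    pairwise = sum(kernel(zi, zj) for zi in features for zj in features) / (n * n)
    anchor = sum(kernel(zi, prototype) for zi in features) / n
    return -alpha * pairwise + beta * anchor


prototype = (0.0, 0.0)
collapsed = pammd_reward([(0.0, 0.0), (0.0, 0.0)], prototype)   # identical samples
spread = pammd_reward([(-0.5, 0.0), (0.5, 0.0)], prototype)     # diverse, near prototype
far_away = pammd_reward([(5.0, 0.0), (5.0, 0.0)], prototype)    # off-class samples
print(collapsed, spread, far_away)
```

With these toy features, the diverse set near the prototype scores highest, the collapsed set scores zero, and the set far from the prototype scores lowest, which is the trade-off the two terms are meant to encode.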

2. Variance Matching Reward ($R_{VM}$)

$R_{VM}(x_{gen}, I^{(c,N)}_{gen}) = -\sum_{d=1}^D (v^d_{gen} - v^d_{real})^2$

Keeps the feature distribution consistent by matching the variance of generated-image features ($v^d_{gen}$) to that of real-image features ($v^d_{real}$) in each of the $D$ feature dimensions.
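The variance-matching term can be sketched as below. The use of population variance (dividing by the sample count) is an assumption, since the summary does not specify the estimator; the function names are hypothetical.

```python
def dimension_variances(features):
    """Per-dimension population variance of a list of equal-length feature vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in features) / n for d in range(dims)]


def vm_reward(gen_features, real_features):
    """Negative squared gap between generated and real per-dimension variances."""
    v_gen = dimension_variances(gen_features)
    v_real = dimension_variances(real_features)
    return -sum((g - r) ** 2 for g, r in zip(v_gen, v_real))


real = [(0.0, 0.0), (2.0, 4.0)]          # per-dimension variances: 1 and 4
matched = [(10.0, 10.0), (12.0, 14.0)]   # shifted, but same spread
collapsed = [(1.0, 2.0), (1.0, 2.0)]     # zero variance in both dimensions
print(vm_reward(matched, real), vm_reward(collapsed, real))
```

Note that the reward depends only on spread, not on location: the shifted set matches perfectly, while the collapsed set is penalized. Location is handled separately by the prototype-anchored term.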

Logits-Level Reward Design

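1. Recalibrated Confidence Reward ($R_{RC}$)

$R_{RC}(x_{gen}, y_c) = \log(\hat{p}(y_c|x_{gen};T))$

Where the temperature $T$ is adjusted adaptively from the classifier's raw confidence $\hat{p}_c(y_c|x_{gen})$:

$T(x_{gen}) = T_{base} + T_{scale} \cdot \frac{\hat{p}_c(y_c|x_{gen}) - 1/N_c}{1 - 1/N_c}$

The normalized term is 0 when the raw confidence equals $1/N_c$ and 1 when it equals 1, so for positive $T_{scale}$ the temperature ranges from $T_{base}$ to $T_{base} + T_{scale}$ and grows with confidence.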
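A small sketch of the adaptive temperature and the resulting reward is given below. It assumes that the raw confidence is the softmax probability at temperature 1 and that $N_c$ is the number of classes in the softmax; the values of `t_base` and `t_scale` are placeholders, not the paper's settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled softmax."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(raw_conf, num_classes, t_base=1.0, t_scale=1.0):
    """T = T_base + T_scale * (p - 1/N_c) / (1 - 1/N_c)."""
    chance = 1.0 / num_classes
    return t_base + t_scale * (raw_conf - chance) / (1.0 - chance)


def rc_reward(logits, target, t_base=1.0, t_scale=1.0):
    """Log-probability of the target class under the recalibrated temperature."""
    raw_conf = softmax(logits)[target]  # assumed: raw confidence uses T = 1
    temperature = adaptive_temperature(raw_conf, len(logits), t_base, t_scale)
    return math.log(softmax(logits, temperature)[target])


confident_logits = [8.0, 0.0, 0.0, 0.0]
raw_log_prob = math.log(softmax(confident_logits)[0])
print(raw_log_prob, rc_reward(confident_logits, target=0))
```

For an already confident sample, the higher temperature flattens the distribution, so the recalibrated reward is lower than the raw log-probability. This limits the payoff for generating only trivially easy images.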

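2. Cross-Session Confusion-Aware Reward ($R_{CSCA}$)

$R_{CSCA}(x_{gen}, y_c) = \sum_{y \in C} w_y(x_{gen}) \log(\hat{p}(y|x_{gen};T_s))$

Where the dynamic weights are:

$w_{y_t}(x_{gen}) = \frac{1}{1 + \gamma \cdot d_{cos}(x_{gen}, \mu_t)}$

Here $d_{cos}$ is the cosine distance, $\mu_t$ is the prototype of class $y_t$, and $\gamma$ controls how quickly the weight decays with distance, so classes whose prototypes lie closer to the generated image receive larger weights.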
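The weighting scheme can be illustrated with a short sketch that follows the two formulas exactly as written above. It assumes that $d_{cos}$ is one minus cosine similarity computed on feature vectors, and it takes the class set $C$ as a caller-supplied dictionary of prototypes; the values of `gamma` and `temperature` are placeholders and all function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition of d_cos)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def confusion_weight(feature, prototype, gamma=1.0):
    """w = 1 / (1 + gamma * d_cos(feature, prototype))."""
    return 1.0 / (1.0 + gamma * cosine_distance(feature, prototype))


def csca_reward(feature, logits, prototypes, temperature=2.0, gamma=1.0):
    """Weighted sum of temperature-scaled log-probabilities over the classes in `prototypes`."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return sum(
        confusion_weight(feature, mu, gamma) * (scaled[y] - log_norm)
        for y, mu in prototypes.items()
    )


feature = (1.0, 0.0)
prototypes = {0: (2.0, 0.0), 1: (0.0, 1.0)}  # class index -> prototype (toy values)
print(confusion_weight(feature, prototypes[0]), confusion_weight(feature, prototypes[1]))
print(csca_reward(feature, [0.0, 0.0], prototypes))
```

In the toy case, the prototype pointing in the same direction as the feature gets weight 1, while the orthogonal prototype gets weight 0.5, so nearby (more confusable) classes dominate the sum.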

Technical Innovations

Bidirectional Feedback Mechanism: First to achieve co-evolution between diffusion model and classifier
Multi-Level Reward Design: Simultaneous optimization in both feature and decision spaces
Adaptive Temperature Adjustment: Dynamically adjusts reward smoothness based on classifier confidence
Confusion-Aware Generation: Actively generates hard samples to enhance inter-class discriminability

Experimental Setup

Datasets

CIFAR-100: 60 base classes, 40 incremental classes (5-way 5-shot over 8 incremental sessions)
miniImageNet: 60 base classes, 40 incremental classes (5-way 5-shot over 8 incremental sessions)
CUB-200: 100 base classes, 100 incremental classes (10-way 5-shot over 10 incremental sessions)

Evaluation Metrics

Session Accuracy: Accuracy on all classes seen so far, measured after training on each session
Average Accuracy: Mean of the session accuracies from the base session through the current session
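As a small worked example of the two metrics, the sketch below computes the average accuracy up to each session from a list of per-session accuracies. The accuracy values are made up for illustration and are not results from the paper.

```python
def average_accuracy(session_accuracies):
    """Mean of the per-session accuracies from the base session to the current one."""
    return sum(session_accuracies) / len(session_accuracies)


def running_average(session_accuracies):
    """Average accuracy after each session, i.e. over sessions 0..t for every t."""
    return [
        average_accuracy(session_accuracies[: t + 1])
        for t in range(len(session_accuracies))
    ]


# Hypothetical per-session accuracies (percent), for illustration only.
accuracies = [80.0, 70.0, 60.0]
print(average_accuracy(accuracies))
print(running_average(accuracies))
```

Because session accuracy usually falls as more classes are added, the average accuracy is dominated by the early sessions and declines more slowly than the final-session accuracy.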

Baseline Methods

Includes mainstream FSCIL methods: TOPIC, CEC, FACT, TEEN, SAVC, DyCR, ALFSCIL, OrCo, ADBS, etc.

Implementation Details

Diffusion Model: Stable Diffusion 3.5 Medium
Image Generation: 30 images per class in the base session; in incremental sessions, 30 images per new class and 10 per old class
Backbone Network: ResNet-18 (CUB-200), ResNet-12 (miniImageNet, CIFAR-100)
Optimizer: SGD with momentum 0.9, weight decay 0.0005

Experimental Results

Main Results

miniImageNet Dataset:

DCS average accuracy: 68.14%
Best baseline (OrCo): 66.90%
Improvement: +1.24 percentage points

CUB-200 Dataset:

DCS average accuracy: 69.73%
Best baseline (SAVC): 69.35%
Improvement: +0.38 percentage points

CIFAR-100 Dataset:

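DCS average accuracy: 66.36%
Best baseline (ALFSCIL): 66.75%
Difference: -0.39 percentage points (as reported here, DCS does not exceed the best baseline on this dataset)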

Ablation Study

Ablation studies on CIFAR-100 show component contributions:

R_PAMMD only: +1.24%
+R_VM: +1.86%
+R_RC: +3.50%
+R_CSCA (complete DCS): +5.64%

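Reading the rows as cumulative gains, the two feature-level rewards together contribute 1.86 points, while adding the two logits-level rewards contributes a further 3.78 points (from 1.86 to 5.64). This indicates that the logits-level rewards matter more for performance.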
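The per-component contributions can be derived from the ablation rows with simple arithmetic. The sketch below assumes the rows are cumulative, with each row adding one reward to the previous configuration, which is how the "+" prefixes read but is not stated explicitly.

```python
# Cumulative gains (percentage points) copied from the ablation rows above.
cumulative = [("R_PAMMD", 1.24), ("R_VM", 1.86), ("R_RC", 3.50), ("R_CSCA", 5.64)]

marginal = {}
previous = 0.0
for name, gain in cumulative:
    marginal[name] = round(gain - previous, 2)  # gain attributable to this component
    previous = gain

feature_level = round(marginal["R_PAMMD"] + marginal["R_VM"], 2)
logits_level = round(marginal["R_RC"] + marginal["R_CSCA"], 2)
print(marginal)
print(feature_level, logits_level)
```

Under this reading, R_CSCA is the largest single contributor and R_VM the smallest.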

Generation Quality Analysis

FID Improvement: Feature-level rewards significantly improve FID and CLIP scores
CLIP Score Enhancement: R_RC achieves best CLIP scores
Strategic Degradation: R_CSCA intentionally reduces generation quality to produce hard samples near decision boundaries

Experimental Findings

Efficiency Advantage: DCS matches the performance of large-scale generation while using only a small number of generated images
Component Synergy: All reward components contribute positively to final performance
Cross-Dataset Consistency: Reward design shows consistent performance across different datasets

Related Work

Class-Incremental Learning

Data Replay Methods: Store or generate previous task data
Network Expansion Methods: Dynamically adjust model architecture
Parameter Regularization Methods: Adjust parameters with fixed network structure

Few-Shot Class-Incremental Learning

Dynamic Network Methods: Maintain feature space relationships through architecture adjustment
Meta-Learning Methods: Introduce meta-learning concepts
Feature Space Methods: Enhance feature space robustness through virtual class instances
Pre-trained Model Methods: Leverage vision-language models like CLIP

Diffusion Models for Image Classification

Large-Scale Data Augmentation: Synthesize additional training data to improve classifiers
Conditioning Mechanisms: Enhance semantic control and sample diversity
Domain-Specific Applications: Few-shot learning or continual learning scenarios

Conclusions and Discussion

Main Conclusions

DCS successfully establishes synergistic mechanisms between diffusion models and FSCIL classifiers
Multi-level reward design effectively addresses semantic alignment and diversity issues
Achieves state-of-the-art performance on standard FSCIL benchmarks

Limitations

Dependence on Pre-trained Models: Performance relies on high-quality pre-trained diffusion models
Domain-Specific Constraints: Performance may degrade in specialized domains with insufficient diffusion model training data
Computational Complexity: Multi-component reward systems and iterative boosting loops increase tuning and computational burden

Future Directions

Explore more efficient reward computation methods
Investigate applicability in more specialized domains
Develop lightweight framework variants

In-Depth Evaluation

Strengths

Strong Novelty: First to propose a mutual boosting mechanism between diffusion models and FSCIL classifiers
Well-Designed Techniques: The multi-level reward design covers both the feature and logits levels, and each component has a clear motivation
Comprehensive Experiments: Evaluation on three standard datasets with detailed ablation studies
Performance Gains: Improves average accuracy over the best baseline by 1.24 points on miniImageNet and 0.38 points on CUB-200

Weaknesses

Computational Overhead: Generation process and multiple reward computations increase training time and resource requirements
Hyperparameter Sensitivity: Multiple reward component weights require careful tuning
Limited Generalization Verification: Primarily validated in computer vision; applicability to other domains unknown
Limited Theoretical Analysis: Lacks theoretical guarantees on convergence and stability

Impact

Academic Value: Provides new research directions and technical pathways for the FSCIL field
Practical Value: Potentially applicable to resource-constrained continual learning scenarios
Reproducibility: Reports implementation details and hyperparameter settings

Applicable Scenarios

Continual Learning Systems: Applications requiring continuous learning of new classes
Resource-Constrained Environments: Scenarios where storing large amounts of historical data is infeasible
Few-Shot Learning: Domain applications with scarce new class samples

References

The paper cites 82 relevant references covering multiple related fields including class-incremental learning, few-shot learning, and diffusion models, providing solid theoretical foundation and technical support for the research.