Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from minimal examples without forgetting prior knowledge, a task complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods often struggle with generalization due to their reliance on limited datasets. While diffusion models offer a path for data augmentation, their direct application can lead to semantic misalignment or ineffective guidance. This paper introduces Diffusion-Classifier Synergy (DCS), a novel framework that establishes a mutual boosting loop between diffusion model and FSCIL classifier. DCS utilizes a reward-aligned learning strategy, where a dynamic, multi-faceted reward function derived from the classifier's state directs the diffusion model. This reward system operates at two levels: the feature level ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching, while the logits level promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms. This co-evolutionary process, where generated images refine the classifier and an improved classifier state yields better reward signals, demonstrably achieves state-of-the-art performance on FSCIL benchmarks, significantly enhancing both knowledge retention and new class learning.
- Paper ID: 2510.03608
- Title: Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL
- Authors: Ruitao Wu, Yifan Zhao, Guangyao Chen, Jia Li
- Category: cs.CV
- Conference: NeurIPS 2025
- Paper Link: https://arxiv.org/abs/2510.03608
Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from extremely limited samples while avoiding catastrophic forgetting of prior knowledge. This task is complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods struggle with generalization due to their reliance on limited datasets. While diffusion models offer pathways for data augmentation, their direct application may lead to semantic misalignment or ineffective guidance. This paper proposes the Diffusion-Classifier Synergy (DCS) framework, which establishes a mutual boosting loop between diffusion models and FSCIL classifiers. DCS employs a reward-aligned learning strategy, guiding the diffusion model through dynamic, multifaceted reward functions derived from the classifier's state. The reward system operates at two levels: at the feature level, prototype-anchored maximum mean discrepancy and dimension-wise variance matching ensure semantic consistency and diversity; at the logits level, confidence recalibration and cross-session confusion-aware mechanisms promote exploratory image generation and enhance inter-class discriminability. Through this co-evolutionary process, generated images improve the classifier while improved classifier states produce better reward signals, achieving state-of-the-art performance on FSCIL benchmarks with significant improvements in knowledge retention and new class learning.
Few-Shot Class-Incremental Learning (FSCIL) is a highly challenging task requiring models to:
- Sequential Learning: Learn new classes from continuous data streams
- Few-Shot Constraint: New classes have only limited training samples (typically 5-shot)
- Avoid Forgetting: Maintain knowledge of previously learned classes
Core challenges:
- Stability-Plasticity Dilemma: Balancing the acquisition of new knowledge against the retention of old knowledge
- Data Scarcity: Extremely limited samples for new classes lead to unreliable empirical risk minimization
- Insufficient Generalization: Existing methods over-rely on limited initial datasets
Directly applying diffusion models to augment FSCIL training data raises two main issues:
- Semantic Misalignment and Insufficient Diversity: Images generated directly by diffusion models may exhibit semantic bias or limited diversity
- Missing Feedback Mechanism: There is no mechanism for the diffusion model to adjust its outputs based on the current classifier state
Main contributions:
- Proposes the DCS Framework: The first to establish a mutual boosting loop between diffusion models and FSCIL classifiers, implementing reward-aligned generation through the DAS algorithm
- Multi-Level Reward Design: Designs multifaceted reward functions operating at feature and logits levels
  - Feature Level: Ensures semantic consistency and promotes intra-class diversity
  - Logits Level: Guides generation of exploratory, generalizable intra-class images and enhances inter-class discriminability
- State-of-the-Art Performance: Achieves state-of-the-art results on FSCIL benchmark datasets with significant improvements in old class retention and new class learning
FSCIL involves sequential learning from a data stream $\mathcal{D}_{train} = \{\mathcal{D}_{train}^{t}\}_{t=0}^{T}$, where:
- Each session $t$ introduces training samples $(x_i, y_i)$ from a new class set $\mathcal{C}^{t}$ that is disjoint from the classes of all other sessions
- The base session ($t = 0$) has abundant data, while incremental sessions ($t > 0$) follow the N-way K-shot format
- After training on $\mathcal{D}_{train}^{t}$, the model is evaluated on all seen classes $\mathcal{C}_{seen}^{t} = \bigcup_{s=0}^{t} \mathcal{C}^{s}$
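As a concrete illustration of this protocol, the sketch below builds a session schedule and the seen-class set $\mathcal{C}_{seen}^{t}$. It assumes integer class ids and the common CIFAR-100 FSCIL split (60 base classes followed by 8 sessions of 5-way 5-shot); the `Session`, `build_sessions`, and `seen_classes` names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Session:
    index: int                    # t = 0 is the base session
    new_classes: Tuple[int, ...]  # C^t, disjoint from every other session's classes
    shots: Optional[int]          # None = all available data (base session); K otherwise


def build_sessions(num_base: int, way: int, shot: int, num_sessions: int) -> List[Session]:
    """Assign consecutive class ids to a base session followed by N-way K-shot sessions."""
    sessions = [Session(0, tuple(range(num_base)), None)]
    start = num_base
    for t in range(1, num_sessions + 1):
        sessions.append(Session(t, tuple(range(start, start + way)), shot))
        start += way
    return sessions


def seen_classes(sessions: List[Session], t: int) -> Set[int]:
    """C_seen^t: union of C^s for s = 0..t, i.e. the label space evaluated after session t."""
    return {c for s in sessions[: t + 1] for c in s.new_classes}


# CIFAR-100-style split: 60 base classes, then 8 sessions of 5-way 5-shot.
sessions = build_sessions(num_base=60, way=5, shot=5, num_sessions=8)
print(len(seen_classes(sessions, 0)), len(seen_classes(sessions, 8)))  # prints: 60 100
```

Keeping each session's classes as an explicit tuple makes the disjointness requirement easy to check and makes evaluation after session $t$ a simple union.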
The core idea of DCS is to establish bidirectional feedback between the diffusion model and the classifier:
- Reward Computation: Compute multiple reward components $R_i$ from the outputs of the classifier $\sigma$ (parameters $\theta$) on the generated images $D(x;\phi)$
- Diffusion Model Optimization:
$$\phi^* = \arg\max_{\phi} \sum_i R_i\left(\sigma_\theta(D(x;\phi))\right)$$
- Classifier Improvement:
$$\theta^* = \arg\min_{\theta} \mathcal{L}_{cls}\left(\sigma_\theta;\, x \cup D(x;\phi^*),\, y\right)$$
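The two objectives above alternate: the generator is pushed toward higher total reward under the current classifier, then the classifier is retrained on real plus generated data. A minimal sketch of that alternation follows, with every component passed in as a function. The toy instantiation (1-D samples, a mean-prototype classifier, a hill-climbing generator update) is purely hypothetical and stands in for Stable Diffusion, the reward-aligned sampling procedure, and the FSCIL backbone.

```python
from typing import Any, Callable, List, Sequence, Tuple


def mutual_boosting_loop(
    phi: Any,
    theta: Any,
    real_data: Sequence[Any],
    generate: Callable[[Any, Sequence[Any]], List[Any]],
    reward_fns: Sequence[Callable[[Any, List[Any]], float]],
    update_generator: Callable[[Any, Callable[[Any], float]], Any],
    update_classifier: Callable[[Any, List[Any]], Any],
    rounds: int,
) -> Tuple[Any, Any, List[float]]:
    """Alternate the two DCS objectives (illustrative sketch, not the authors' code).

    generate(phi, x)         -> generated samples D(x; phi)
    reward_fns[i](theta, g)  -> R_i scored from classifier sigma_theta's outputs on g
    update_generator(phi, f) -> phi moved toward higher f (a step toward argmax over phi)
    update_classifier(th, d) -> theta refit on dataset d (a step toward argmin of L_cls)
    """

    def total_reward(p: Any, th: Any) -> float:
        generated = generate(p, real_data)
        return sum(r(th, generated) for r in reward_fns)

    history: List[float] = []
    for _ in range(rounds):
        # Generator step: the reward is computed with the *current* classifier.
        phi = update_generator(phi, lambda p, th=theta: total_reward(p, th))
        # Classifier step: fit on real data plus samples from the updated generator.
        theta = update_classifier(theta, list(real_data) + generate(phi, real_data))
        # Log the updated generator's reward under the improved classifier.
        history.append(total_reward(phi, theta))
    return phi, theta, history


def toy_demo(rounds: int = 10) -> Tuple[float, float, List[float]]:
    # Hypothetical toy: one class of 1-D "images", theta is the class prototype (mean),
    # phi is the single value the generator emits, reward = -(distance to prototype)^2.
    return mutual_boosting_loop(
        phi=5.0,
        theta=0.0,
        real_data=[1.0, 2.0, 3.0],
        generate=lambda p, x: [p],
        reward_fns=[lambda th, g: -sum((v - th) ** 2 for v in g)],
        update_generator=lambda p, f: max((p - 0.5, p, p + 0.5), key=f),
        update_classifier=lambda th, d: sum(d) / len(d),
        rounds=rounds,
    )
```

In the toy run, the generator and the prototype settle at the same value (2.0) after a few rounds and the logged reward never decreases, which is the mutual boosting behavior in miniature.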
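Feature-level rewards:
1. Prototype-Anchored Maximum Mean Discrepancy Reward (R_PAMMD)
$$R_{PAMMD}(x_{gen}, I_{gen}(c,N)) = -\frac{\alpha}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} k(z_i, z_j) + \frac{\beta}{N}\sum_{i=1}^{N} k(z_i, \mu_c)$$
Where:
- First term (diversity): Penalizes pairwise similarity among the $N$ generated images, encouraging them to differ
- Second term (consistency): Rewards similarity of each generated image to the class prototype, ensuring semantic consistency
- $k(\cdot,\cdot)$ is a positive definite kernel function, $z_i$ is the feature of the $i$-th generated image in $I_{gen}(c,N)$, and $\mu_c$ is the prototype of class $c$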
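One way to see why this reward is called prototype-anchored MMD: the squared MMD between the generated features and a point mass at $\mu_c$ expands to $\frac{1}{N^2}\sum_{i,j} k(z_i,z_j) - \frac{2}{N}\sum_i k(z_i,\mu_c) + k(\mu_c,\mu_c)$, so with $\alpha = 1$ and $\beta = 2$ the reward equals $k(\mu_c,\mu_c) - \mathrm{MMD}^2$. The plain-Python sketch below computes R_PAMMD under an assumed Gaussian RBF kernel (the kernel and bandwidth used in the paper are not stated here); the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian RBF kernel (positive definite); an assumed choice of k."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def pammd_reward(z: Sequence[Vector], mu_c: Vector, alpha: float, beta: float,
                 bandwidth: float = 1.0) -> float:
    """R_PAMMD for the features z_1..z_N of N generated images of class c."""
    n = len(z)
    # Diversity term: mean pairwise similarity among generated features, penalized.
    pairwise = sum(rbf_kernel(zi, zj, bandwidth) for zi in z for zj in z) / n ** 2
    # Consistency term: mean similarity of each generated feature to the prototype.
    anchor = sum(rbf_kernel(zi, mu_c, bandwidth) for zi in z) / n
    return -alpha * pairwise + beta * anchor


def mmd_sq_to_prototype(z: Sequence[Vector], mu_c: Vector, bandwidth: float = 1.0) -> float:
    """Empirical squared MMD between {z_i} and a point mass at mu_c (for comparison)."""
    n = len(z)
    return (sum(rbf_kernel(a, b, bandwidth) for a in z for b in z) / n ** 2
            - 2.0 * sum(rbf_kernel(a, mu_c, bandwidth) for a in z) / n
            + rbf_kernel(mu_c, mu_c, bandwidth))
```

With $\beta = 0$ only the diversity term remains, so a spread-out feature set scores higher than a collapsed one; increasing $\beta$ pulls the set back toward the prototype.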
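2. Variance Matching Reward (R_VM)
$$R_{VM}(x_{gen}, I_{gen}(c,N)) = -\sum_{d=1}^{D}\left(v_{gen}^{d} - v_{real}^{d}\right)^2$$
Maintains feature-distribution consistency by matching, dimension by dimension, the variance $v_{gen}^{d}$ of the generated images' features to the variance $v_{real}^{d}$ of the real images' features.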
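A plain-Python sketch of R_VM follows. It assumes features arrive as equal-length lists (one per image) and uses the population variance of each dimension; the choice of variance estimator is an assumption, since the summary does not specify it. Function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence

Features = Sequence[Sequence[float]]  # N images x D feature dimensions


def dimension_variances(features: Features) -> List[float]:
    """Population variance of each feature dimension across the images (assumed estimator)."""
    return [pvariance(column) for column in zip(*features)]


def variance_matching_reward(gen_features: Features, real_features: Features) -> float:
    """R_VM = -sum_d (v_gen^d - v_real^d)^2."""
    v_gen = dimension_variances(gen_features)
    v_real = dimension_variances(real_features)
    return -sum((g - r) ** 2 for g, r in zip(v_gen, v_real))
```

A collapsed set of generated features (zero variance everywhere) is penalized by the full squared variance of the real features, which is how R_VM discourages mode collapse.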
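Logits-level rewards:
1. Recalibrated Confidence Reward (R_RC)
$$R_{RC}(x_{gen}, y_c) = \log \hat{p}(y_c \mid x_{gen}; T)$$
Where the temperature $T$ is adjusted adaptively based on the classifier's raw confidence $\hat{p}_c(y_c \mid x_{gen})$:
$$T(x_{gen}) = T_{base} + T_{scale} \cdot \frac{\hat{p}_c(y_c \mid x_{gen}) - 1/N_c}{1 - 1/N_c}$$
The fraction rescales the raw confidence so that chance level ($1/N_c$, with $N_c$ candidate classes) maps to 0 and full confidence maps to 1; more confident predictions therefore receive a higher temperature.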
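The sketch below implements R_RC as described, under two assumptions not stated in the summary: the raw confidence is the softmax probability at temperature 1, and $N_c$ is the number of logits. Because the normalized confidence can be as low as $-1/(N_c - 1)$, $T$ stays positive only if $T_{base} > T_{scale}/(N_c - 1)$, which the code checks. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [v / temperature for v in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(logits: Sequence[float], target: int,
                         t_base: float, t_scale: float) -> float:
    n_c = len(logits)                   # assumption: N_c = number of classes scored
    raw_conf = softmax(logits)[target]  # assumption: raw confidence = softmax at T = 1
    # Rescale so chance level (1/N_c) maps to 0 and certainty maps to 1.
    normalized = (raw_conf - 1.0 / n_c) / (1.0 - 1.0 / n_c)
    t = t_base + t_scale * normalized
    if t <= 0:
        raise ValueError("temperature must stay positive; need t_base > t_scale / (N_c - 1)")
    return t


def recalibrated_confidence_reward(logits: Sequence[float], target: int,
                                   t_base: float, t_scale: float) -> float:
    """R_RC = log p_hat(y_c | x_gen; T(x_gen))."""
    t = adaptive_temperature(logits, target, t_base, t_scale)
    return math.log(softmax(logits, t)[target])
```

A uniform prediction keeps $T$ at $T_{base}$, while a confident one raises $T$; for a target that is already the top class, the higher temperature lowers its tempered probability, so images the classifier finds easy saturate the reward less quickly.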
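2. Cross-Session Confusion-Aware Reward (R_CSCA)
$$R_{CSCA}(x_{gen}, y_c) = \sum_{y \in \mathcal{C}} w_y(x_{gen}) \log \hat{p}(y \mid x_{gen}; T_s)$$
Where the dynamic weights are:
$$w_y^{t}(x_{gen}) = \frac{1}{1 + \gamma \cdot d_{\cos}(x_{gen}, \mu_t)}$$
with $d_{\cos}$ the cosine distance and $\gamma$ a scaling factor, so the weight approaches 1 for prototypes close to the generated image and decays as the cosine distance grows.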
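A sketch of R_CSCA under stated assumptions: each class $y$ has a feature prototype (the summary writes $\mu_t$, read here as the prototype associated with class $y$), $d_{\cos}$ is one minus cosine similarity, and class ids index the logit vector. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def confusion_weights(feature: Vector, prototypes: Dict[int, Vector],
                      gamma: float) -> Dict[int, float]:
    # w_y = 1 / (1 + gamma * d_cos(feature, mu_y)): 1 at the prototype, decaying with distance.
    return {y: 1.0 / (1.0 + gamma * cosine_distance(feature, mu)) for y, mu in prototypes.items()}


def cross_session_confusion_reward(feature: Vector, logits: Sequence[float],
                                   prototypes: Dict[int, Vector], gamma: float,
                                   t_s: float) -> float:
    """R_CSCA = sum_y w_y(x_gen) * log p_hat(y | x_gen; T_s), over classes with prototypes."""
    weights = confusion_weights(feature, prototypes, gamma)
    log_probs = log_softmax(logits, t_s)
    return sum(weights[y] * log_probs[y] for y in prototypes)
```

Because the weights depend on feature-space proximity, classes whose prototypes (possibly from earlier sessions) lie near the generated image dominate the sum.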
Key technical innovations:
- Bidirectional Feedback Mechanism: First to achieve co-evolution between the diffusion model and the classifier
- Multi-Level Reward Design: Simultaneous optimization in both feature and decision spaces
- Adaptive Temperature Adjustment: Dynamically adjusts reward smoothness based on classifier confidence
- Confusion-Aware Generation: Actively generates hard samples to enhance inter-class discriminability
Datasets:
- CIFAR-100: 60 base classes, 40 incremental classes (8 sessions, 5-way 5-shot)
- miniImageNet: 60 base classes, 40 incremental classes (8 sessions, 5-way 5-shot)
- CUB-200: 100 base classes, 100 incremental classes (10 sessions, 10-way 5-shot)
Evaluation metrics:
- Session Accuracy: Accuracy on the test samples of all classes seen so far, measured after each learning session
- Average Accuracy: Mean of the session accuracies from the base session to the current one
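The two metrics can be sketched as follows, assuming predictions and labels are integer class ids covering the test samples of all seen classes; the function names are hypothetical.

```python
from typing import List, Sequence


def session_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Accuracy over the test samples of every class seen so far (C_seen^t)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def running_average_accuracy(session_accs: Sequence[float]) -> List[float]:
    """Entry t is the mean of the session accuracies from session 0 through session t."""
    out: List[float] = []
    total = 0.0
    for t, acc in enumerate(session_accs):
        total += acc
        out.append(total / (t + 1))
    return out
```

The last entry of `running_average_accuracy` corresponds to the average accuracy over the full session sequence.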
Baselines include mainstream FSCIL methods: TOPIC, CEC, FACT, TEEN, SAVC, DyCR, ALFSCIL, OrCo, ADBS, etc.
Implementation details:
- Diffusion Model: Stable Diffusion 3.5 Medium
- Image Generation: 30 images per class in the base session; in each incremental session, 30 images per new class and 10 per old class
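To make the generation budget concrete, the sketch below counts synthetic images per session. It assumes that "old classes" in a session means every class learned in earlier sessions and uses the standard CIFAR-100 split (60 base classes, 8 sessions of 5-way); the function name is hypothetical.

```python
from typing import List


def generated_images_per_session(num_base: int, way: int, num_sessions: int,
                                 base_per_class: int = 30, new_per_class: int = 30,
                                 old_per_class: int = 10) -> List[int]:
    """Number of synthetic images per session under the per-class budgets listed above.

    Assumption: "old classes" in session t means every class learned in sessions 0..t-1.
    """
    budgets = [num_base * base_per_class]
    seen = num_base
    for _ in range(num_sessions):
        budgets.append(way * new_per_class + seen * old_per_class)
        seen += way
    return budgets


print(generated_images_per_session(60, 5, 8))
# [1800, 750, 800, 850, 900, 950, 1000, 1050, 1100]
```

Under these assumptions the per-session budget grows linearly with the number of old classes, since each one receives 10 replay images.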
- Backbone Network: ResNet-18 (CUB-200), ResNet-12 (miniImageNet, CIFAR-100)
- Optimizer: SGD with momentum 0.9, weight decay 0.0005
miniImageNet Dataset:
- DCS average accuracy: 68.14%
- Best baseline (OrCo): 66.90%
- Improvement: +1.24%
CUB-200 Dataset:
- DCS average accuracy: 69.73%
- Best baseline (SAVC): 69.35%
- Improvement: +0.38%
CIFAR-100 Dataset:
- DCS average accuracy: 66.36%
- Best baseline (ALFSCIL): 66.75%
- Difference: -0.39% (DCS trails ALFSCIL on average accuracy)
Ablation studies on CIFAR-100 show the cumulative gain as each reward component is added:
- R_PAMMD only: +1.24%
- +R_VM: +1.86%
- +R_RC: +3.50%
- +R_CSCA (complete DCS): +5.64%
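Results indicate that logits-level rewards are more critical for performance: R_RC and R_CSCA together add 3.78% on top of the feature-level rewards (5.64% vs. 1.86%), whereas R_PAMMD and R_VM together contribute 1.86%.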
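Reading the ablation rows as cumulative totals (an interpretation of the "+" notation), the marginal contribution of each component is the difference between consecutive rows. The sketch below computes it; note that marginal gains depend on the order in which components are added, so they are not an order-independent attribution.

```python
from typing import List, Tuple

# Cumulative gains on CIFAR-100 as reported in the ablation above (percentage points).
CUMULATIVE = [("R_PAMMD", 1.24), ("+R_VM", 1.86), ("+R_RC", 3.50), ("+R_CSCA", 5.64)]


def marginal_gains(steps: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Gain contributed by each component, given cumulative totals in the order added."""
    gains: List[Tuple[str, float]] = []
    previous = 0.0
    for name, total in steps:
        gains.append((name, round(total - previous, 2)))  # round away float noise
        previous = total
    return gains


print(marginal_gains(CUMULATIVE))
# [('R_PAMMD', 1.24), ('+R_VM', 0.62), ('+R_RC', 1.64), ('+R_CSCA', 2.14)]
```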
Generation quality analysis:
- FID Improvement: Feature-level rewards significantly improve FID and CLIP scores
- CLIP Score Enhancement: R_RC achieves the best CLIP score
- Strategic Degradation: R_CSCA intentionally lowers generation quality to produce hard samples near decision boundaries
- Efficiency Advantage: DCS matches the performance of large-scale generation while generating only a small number of images
- Component Synergy: All reward components contribute positively to final performance
- Cross-Dataset Consistency: Reward design shows consistent performance across different datasets
Related work:
- Data Replay Methods: Store or generate data from previous tasks
- Network Expansion Methods: Dynamically adjust model architecture
- Parameter Regularization Methods: Adjust parameters with fixed network structure
- Dynamic Network Methods: Maintain feature space relationships through architecture adjustment
- Meta-Learning Methods: Introduce meta-learning concepts
- Feature Space Methods: Enhance feature space robustness through virtual class instances
- Pre-trained Model Methods: Leverage vision-language models like CLIP
Diffusion-based data augmentation:
- Large-Scale Data Augmentation: Synthesize additional training data to improve classifiers
- Conditioning Mechanisms: Enhance semantic control and sample diversity
- Domain-Specific Applications: Few-shot learning or continual learning scenarios
Conclusions:
- DCS successfully establishes a synergistic mechanism between diffusion models and FSCIL classifiers
- Multi-level reward design effectively addresses semantic alignment and diversity issues
- Achieves state-of-the-art performance on standard FSCIL benchmarks
Limitations:
- Dependence on Pre-trained Models: Performance relies on high-quality pre-trained diffusion models
- Domain-Specific Constraints: Performance may degrade in specialized domains with insufficient diffusion model training data
- Computational Complexity: Multi-component reward systems and iterative boosting loops increase tuning and computational burden
Future work:
- Explore more efficient reward computation methods
- Investigate applicability in more specialized domains
- Develop lightweight framework variants
Strengths:
- Strong Novelty: First to propose a mutual boosting mechanism between diffusion models and FSCIL classifiers
- Well-Designed Techniques: The multi-level reward design is comprehensive, and its components build on established tools such as MMD and temperature scaling
- Comprehensive Experiments: Full evaluation on multiple standard datasets with detailed ablation studies
- Performance Gains: Higher average accuracy than the strongest baselines on miniImageNet (+1.24%) and CUB-200 (+0.38%)
Weaknesses:
- Computational Overhead: Generation process and multiple reward computations increase training time and resource requirements
- Hyperparameter Sensitivity: Multiple reward component weights require careful tuning
- Limited Generalization Verification: Primarily validated in computer vision; applicability to other domains unknown
- Limited Theoretical Analysis: Lacks theoretical guarantees on convergence and stability
Impact:
- Academic Value: Opens new research directions and technical pathways for the FSCIL field
- Practical Value: Potentially applicable to resource-constrained continual learning scenarios
- Reproducibility: Reports detailed implementation settings and hyperparameters
Applicable scenarios:
- Continual Learning Systems: Applications requiring continuous learning of new classes
- Resource-Constrained Environments: Scenarios where storing large amounts of historical data is infeasible
- Few-Shot Learning: Domain applications with scarce new class samples
The paper cites 82 references spanning class-incremental learning, few-shot learning, and diffusion models, which ground the work in these related fields.