Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance
Zhong, Jiang, Tao et al.
Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. We refer to this misalignment as noise shift. Through empirical analysis, we demonstrate that noise shift is widespread in modern diffusion models and exhibits a systematic bias, leading to sub-optimal generation due to both out-of-distribution generalization and inaccurate denoising updates. To address this problem, we propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. We further introduce a classifier-free variant of NAG, which jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, thereby eliminating the need for external classifiers. Extensive experiments, including ImageNet generation and various supervised fine-tuning tasks, show that NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. This paper identifies a long-overlooked but prevalent issue in such models: the mismatch between predefined noise levels and the actual noise levels encoded in intermediate states during sampling. The authors term this mismatch "noise shift." Through empirical analysis, the authors demonstrate that noise shift is widespread in modern diffusion models and exhibits systematic bias, leading to out-of-distribution artifacts and inaccurate denoising updates, thereby producing suboptimal generation results. To address this issue, the authors propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly guides the sampling trajectory to maintain consistency with predefined noise schedules.
Denoising generative models such as diffusion models and flow models have achieved significant success in visual generation tasks like image synthesis and video generation. The core principle of these models is to iteratively recover the target sample from pure noise. However, during iterative sampling, the model inevitably accumulates errors from multiple sources.
The authors find that a key manifestation of these cumulative errors is that the noise level inherently encoded in intermediate states may deviate from the predefined schedule. This phenomenon, termed "noise shift," has long been overlooked by the community but is actually both widespread and rooted in the collective effects of various error sources.
Identification of Noise Shift: First systematic identification and analysis of the widespread but long-overlooked noise shift problem in denoising generative models
Proposal of NAG Method: Design of Noise Awareness Guidance (NAG) to mitigate the noise shift problem
Development of Classifier-Free Variant: Introduction of a classifier-free variant of NAG, jointly training noise-conditional and noise-unconditional models through noise-condition dropout
Comprehensive Experimental Validation: Verification of NAG's effectiveness and generality on ImageNet generation and supervised fine-tuning tasks
For noise level ( t \in [0,T] ), the continuous-time stochastic interpolation is defined as:
x_t = \alpha_t x_0 + \sigma_t \epsilon
where ( \alpha_0 = \sigma_T = 1 ), ( \alpha_T = \sigma_0 = 0 ), ( \alpha_t ) is monotonically decreasing, and ( \sigma_t ) is monotonically increasing.
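As a minimal sketch of this interpolation, the snippet below assumes a linear schedule ( \alpha_t = 1 - t/T ), ( \sigma_t = t/T ), which satisfies the boundary and monotonicity conditions above but is only one possible choice. The function names `linear_schedule` and `interpolate` are illustrative and do not come from the paper.

```python
import random

def linear_schedule(t, T=1.0):
    """Return (alpha_t, sigma_t) for an assumed linear schedule on [0, T]."""
    return 1.0 - t / T, t / T

def interpolate(x0, t, rng, T=1.0):
    """Form x_t = alpha_t * x0 + sigma_t * eps with eps drawn from N(0, I)."""
    alpha_t, sigma_t = linear_schedule(t, T)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [alpha_t * xi + sigma_t * ei for xi, ei in zip(x0, eps)]

rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
print(interpolate(x0, 0.0, rng))  # equals x0: no noise at t = 0
print(interpolate(x0, 0.5, rng))  # equal weights on data and noise
```

At ( t = T ) the same function returns pure Gaussian noise, since ( \alpha_T = 0 ) and ( \sigma_T = 1 ).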
The cumulative error ( e ) can be viewed as an additional Gaussian perturbation applied to ( x_t ): ( \hat{x}_t = x_t + e ), where ( e \sim \mathcal{N}(0, \sigma_e^2 I) ).
This perturbation increases the effective variance from ( \sigma_t^2 ) to ( \sigma_t^2 + \sigma_e^2 ), making the perturbed state appear as if sampled at a shifted noise level ( t' = t + \delta ):
\sigma_{t+\delta}^2 = \sigma_t^2 + \sigma_e^2
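The variance-matching relation can be illustrated numerically. The sketch below assumes the simple schedule ( \sigma_t = t ) on ( [0, 1] ), so that the shifted level is ( t' = \sqrt{\sigma_t^2 + \sigma_e^2} ). The values of `t` and `sigma_e` are arbitrary, and the helper names are hypothetical.

```python
import math
import random
import statistics

def sigma_linear(t):
    """Assumed schedule: sigma_t = t on [0, 1]."""
    return t

def shifted_noise_level(t, sigma_e):
    """Solve sigma_{t'}^2 = sigma_t^2 + sigma_e^2 for t' under sigma_t = t."""
    return math.sqrt(sigma_linear(t) ** 2 + sigma_e ** 2)

t, sigma_e = 0.3, 0.4
t_shifted = shifted_noise_level(t, sigma_e)
print(t_shifted, t_shifted - t)  # t' is about 0.5, so delta is about 0.2

# Monte Carlo check: the noise part sigma_t * eps + e has std close to sigma_{t'}.
rng = random.Random(0)
noise = [sigma_linear(t) * rng.gauss(0, 1) + rng.gauss(0, sigma_e)
         for _ in range(20000)]
empirical_std = statistics.pstdev(noise)
print(empirical_std)
```

Under this schedule the state nominally at ( t = 0.3 ) carries the noise of a state at roughly ( t = 0.5 ), which is the kind of mismatch the term noise shift describes.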
Statement 1: When the error variance ( \sigma_e^2 ) is small, the first-order approximation of the shift ( \delta ) is:
\delta \approx \frac{\sqrt{\sigma_t^2 + \sigma_e^2} - \sigma_t}{\dot{\sigma}_t}
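One way to see where this approximation comes from, assuming ( \sigma_t ) is differentiable in ( t ) with ( \dot{\sigma}_t > 0 ), is to linearize the schedule around ( t ) and substitute into the variance-matching condition. The last line is a further simplification that holds only when ( \sigma_e \ll \sigma_t ); it is a standard Taylor expansion rather than a result quoted from the paper.

```latex
\begin{align*}
\sigma_{t+\delta} &\approx \sigma_t + \dot{\sigma}_t\,\delta
  && \text{(first-order expansion in } \delta\text{)} \\
\sigma_t + \dot{\sigma}_t\,\delta &\approx \sqrt{\sigma_t^2 + \sigma_e^2}
  && \text{(from } \sigma_{t+\delta}^2 = \sigma_t^2 + \sigma_e^2\text{)} \\
\delta &\approx \frac{\sqrt{\sigma_t^2 + \sigma_e^2} - \sigma_t}{\dot{\sigma}_t} \\
\delta &\approx \frac{\sigma_e^2}{2\,\sigma_t\,\dot{\sigma}_t}
  && \text{(using } \sqrt{1+u} \approx 1 + u/2 \text{ with } u = \sigma_e^2/\sigma_t^2\text{)}
\end{align*}
```

Because ( \sigma_t ) is increasing, ( \delta ) is positive: the perturbed state looks noisier than the schedule assumes.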
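Using ( p_t(t|x) \propto p_t(x|t)/p_t(x) ), the gradient of the implicit noise predictor is ( \nabla_x \log p_t(t|x) = s(x|t) - s(x) ). Adding this gradient to the conditional score with weight ( w_{nag} ) gives the score-mixing rule: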
s_{w_{nag}}(x \mid t) = (w_{nag} + 1)\, s(x \mid t) - w_{nag}\, s(x)
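The mixing rule is a pointwise linear combination of two score estimates, which can be sketched as follows. The function name `nag_score` and the example score vectors are hypothetical; in practice the two inputs would come from the noise-conditional and noise-unconditional passes of a trained network.

```python
def nag_score(score_cond, score_uncond, w_nag):
    """Mix scores elementwise: (w_nag + 1) * s(x|t) - w_nag * s(x)."""
    return [(w_nag + 1.0) * sc - w_nag * su
            for sc, su in zip(score_cond, score_uncond)]

s_cond = [0.5, -1.0]    # hypothetical s(x | t) from the noise-conditional model
s_uncond = [0.2, -0.4]  # hypothetical s(x) from the noise-unconditional model

print(nag_score(s_cond, s_uncond, 0.0))  # w_nag = 0 recovers the conditional score
print(nag_score(s_cond, s_uncond, 1.0))  # extrapolates away from the unconditional score
```

Larger values of `w_nag` push the update further along the direction ( s(x \mid t) - s(x) ), in the same way the guidance scale behaves in CFG.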
The classifier-free variant follows the training strategy of CFG: the noise condition ( t ) is randomly dropped with a fixed probability during training, so that a single model shares weights between the noise-conditional and noise-unconditional objectives.
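A minimal sketch of the dropout step, assuming the dropped condition is represented by a placeholder value. Here `NULL_T`, `drop_noise_condition`, and the drop probability are illustrative choices rather than details from the paper; a real model would more likely map the placeholder to a learned null embedding.

```python
import random

NULL_T = None  # hypothetical placeholder meaning "no noise condition"

def drop_noise_condition(t_batch, p_drop, rng):
    """Replace each noise level with NULL_T independently with probability p_drop."""
    return [NULL_T if rng.random() < p_drop else t for t in t_batch]

rng = random.Random(0)
t_batch = [0.1, 0.4, 0.7, 0.9]
# Some entries may be replaced by NULL_T; the rest are passed through unchanged.
print(drop_noise_condition(t_batch, 0.1, rng))
```

Samples whose condition is dropped train the noise-unconditional branch, and the remaining samples train the noise-conditional branch, so both scores needed at sampling time come from one set of weights.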
Classifier Guidance: Using external classifiers for conditional generation
Classifier-Free Guidance (CFG): Achieving guidance through mixing conditional and unconditional models
Domain Guidance (DoG): Guidance method specifically designed for fine-tuning scenarios
This paper's NAG is the first method to explicitly use the noise level itself as a guidance signal, directly enhancing alignment with the expected noise condition.
The authors hope this work will draw attention to the widespread training-inference mismatch in denoising generation and encourage research in the following directions:
Theoretical or empirical analysis of the noise shift problem
Building generative models robust to inference-phase shifts
Exploring the boundaries of high-quality generation
The paper cites important works in related fields such as diffusion models, flow models, and guidance techniques, including:
Ho et al. (2020): Original DDPM paper
Peebles & Xie (2023): DiT architecture
Ma et al. (2024): SiT architecture
Ho & Salimans (2021): Classifier-free guidance
Dhariwal & Nichol (2021): Classifier guidance
Overall Evaluation: This is a high-quality research paper that identifies an important but overlooked problem in denoising generative models, proposes a simple and effective solution, and validates the method's effectiveness and generality through comprehensive experiments. This work has significant academic value and practical implications for the field of diffusion models.