CADE 2.5 - ZeResFDG: Frequency-Decoupled, Rescaled and Zero-Projected Guidance for SD/SDXL Latent Diffusion Models
Rychkovskiy, GPT-5
We introduce CADE 2.5 (Comfy Adaptive Detail Enhancer), a sampler-level guidance stack for SD/SDXL latent diffusion models. The central module, ZeResFDG, unifies (i) frequency-decoupled guidance that reweights low- and high-frequency components of the guidance signal, (ii) energy rescaling that matches the per-sample magnitude of the guided prediction to the positive branch, and (iii) zero-projection that removes the component parallel to the unconditional direction. A lightweight spectral EMA with hysteresis switches between a conservative and a detail-seeking mode as structure crystallizes during sampling. Across SD/SDXL samplers, ZeResFDG improves sharpness, prompt adherence, and artifact control at moderate guidance scales without any retraining. In addition, we employ a training-free inference-time stabilizer, QSilk Micrograin Stabilizer (quantile clamp + depth/edge-gated micro-detail injection), which improves robustness and yields natural high-frequency micro-texture at high resolutions with negligible overhead. For completeness we note that the same rule is compatible with alternative parameterizations (e.g., velocity), which we briefly discuss in the Appendix; however, this paper focuses on SD/SDXL latent diffusion models.
This paper proposes CADE 2.5 (Comfy Adaptive Detail Enhancer), a sampler-level guidance stack for SD/SDXL latent diffusion models. The core module, ZeResFDG, unifies three techniques: (1) frequency-decoupled guidance, which reweights the low- and high-frequency components of the guidance signal; (2) energy rescaling, which matches the per-sample magnitude of the guided prediction to the positive branch; and (3) zero projection, which removes the component parallel to the unconditional direction. A lightweight spectral EMA with hysteresis switches between a conservative mode and a detail-seeking mode as structure crystallizes during sampling. The method improves sharpness, prompt adherence, and artifact control at moderate guidance scales without retraining.
Latent diffusion models (such as SD/SDXL), while capable of generating high-fidelity images, exhibit quality degradation at large classifier-free guidance (CFG) scales, manifesting as oversaturation, color shifts, or texture artifacts. Reducing CFG to avoid these effects often sacrifices clarity and prompt adherence.
In practice, this forces users to trade sharpness and prompt adherence against artifact control, which limits how useful the models are.
Existing guidance-modification methods show some effectiveness, but they lack a unified framework that addresses frequency-component processing, energy matching, and directional drift together.
This work aims to provide a compact sampler-side solution that reshapes the guidance signal itself to address these issues while remaining training-free.
Proposed the ZeResFDG unified framework: Combines frequency decoupling, energy rescaling, and zero projection in one guidance module
Designed an adaptive mode-switching mechanism: Dynamically switches between conservative and detail-seeking modes based on spectral EMA and hysteresis
Developed QSilk Micrograin Stabilizer: A training-free inference-time stabilizer that improves robustness and produces natural microtextures at high resolution
Implemented a plug-and-play sampler wrapper: Integrates into existing SD/SDXL pipelines without retraining
Noted cross-parameterization compatibility: The same rule is compatible with alternative parameterizations (e.g., velocity), which the paper discusses briefly in its Appendix
Given conditional prediction y_c and unconditional prediction y_u, standard CFG forms y_cfg = y_u + s(y_c - y_u), where s > 0 is the guidance scale. The objective is to reduce artifacts at high CFG scales while maintaining prompt adherence.
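As a minimal sketch of the rule above, the following NumPy code implements standard CFG together with one possible form of the per-sample energy rescaling described earlier. The function names, the array shapes, and the use of standard deviation as the energy measure are assumptions made for illustration; the paper's exact rescaling rule may differ.

```python
import numpy as np

def cfg(y_c, y_u, s):
    """Standard classifier-free guidance: y_cfg = y_u + s * (y_c - y_u)."""
    return y_u + s * (y_c - y_u)

def rescale_to_positive(y_guided, y_c, eps=1e-8):
    """Match each sample's standard deviation to that of the positive branch y_c.

    Assumption: "energy" is measured as the per-sample standard deviation
    over all non-batch axes; a norm-based variant would work the same way.
    """
    axes = tuple(range(1, y_guided.ndim))  # every axis except the batch axis
    std_guided = y_guided.std(axis=axes, keepdims=True)
    std_pos = y_c.std(axis=axes, keepdims=True)
    return y_guided * (std_pos / (std_guided + eps))

# Toy latents with shape (batch, channels, height, width); values are random.
rng = np.random.default_rng(0)
y_c = rng.normal(size=(2, 4, 8, 8))
y_u = rng.normal(size=(2, 4, 8, 8))

y_cfg = cfg(y_c, y_u, s=7.5)             # guided prediction at a high scale
y_res = rescale_to_positive(y_cfg, y_c)  # same direction, energy of y_c
```

At s = 1 the rule reduces to y_c. In this toy example the magnitude of y_cfg grows with s, and the rescaling step brings each sample back to the magnitude of the positive branch without changing its direction.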
Monitor the high-frequency ratio r_HF = ∥Δ_h∥²/(∥Δ_ℓ∥² + ∥Δ_h∥²), where Δ_ℓ and Δ_h denote the low- and high-frequency components of the guidance signal, and track its EMA ρ. Switch between the conservative mode (CFGZeroFD) and the detail-seeking mode (RescaleFDG) using two thresholds (τ_lo, τ_hi) with hysteresis.
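One way to realize this monitoring loop is sketched below. It assumes that Δ_ℓ and Δ_h come from an ideal FFT low-pass split, and that the switch moves to the detail-seeking mode when ρ rises to τ_hi and back to the conservative mode when ρ falls to τ_lo. The cutoff, thresholds, EMA coefficient, switching direction, and all names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def split_frequencies(delta, cutoff=0.25):
    """Split delta (..., H, W) into low- and high-frequency parts.

    Assumption: an ideal radial low-pass mask in the 2D FFT domain, with
    `cutoff` in cycles per pixel (hypothetical value).
    """
    h, w = delta.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spec = np.fft.fft2(delta, axes=(-2, -1))
    low = np.fft.ifft2(spec * mask, axes=(-2, -1)).real
    return low, delta - low

def hf_ratio(delta_low, delta_high, eps=1e-12):
    """r_HF = ||delta_h||^2 / (||delta_l||^2 + ||delta_h||^2)."""
    e_lo = float(np.sum(delta_low**2))
    e_hi = float(np.sum(delta_high**2))
    return e_hi / (e_lo + e_hi + eps)

class ModeSwitch:
    """EMA of r_HF with two-threshold hysteresis (illustrative defaults)."""

    def __init__(self, tau_lo=0.45, tau_hi=0.60, beta=0.8):
        assert tau_lo < tau_hi
        self.tau_lo, self.tau_hi, self.beta = tau_lo, tau_hi, beta
        self.rho = None              # EMA state, initialized on first update
        self.mode = "conservative"   # assumed starting mode

    def update(self, r_hf):
        if self.rho is None:
            self.rho = r_hf
        else:
            self.rho = self.beta * self.rho + (1.0 - self.beta) * r_hf
        # Hysteresis: the mode changes only when rho crosses the far threshold.
        if self.mode == "conservative" and self.rho >= self.tau_hi:
            self.mode = "detail"
        elif self.mode == "detail" and self.rho <= self.tau_lo:
            self.mode = "conservative"
        return self.mode

# Toy usage: one guidance signal per sampling step (random data here).
rng = np.random.default_rng(0)
switch = ModeSwitch()
for step in range(5):
    delta = rng.normal(size=(1, 4, 16, 16))
    d_lo, d_hi = split_frequencies(delta)
    mode = switch.update(hf_ratio(d_lo, d_hi))
```

The gap between the two thresholds is what prevents rapid toggling: while ρ sits between τ_lo and τ_hi, the current mode is kept.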
Experiments use the SDXL model at 672×944 resolution with final output at 3688×5192. Testing includes different SDXL models targeting photography and anime styles.
Experiments demonstrate CADE 2.5 improvements across multiple aspects:
Anime-Style Portraits: Clearer lines, better color and lighting effects, significant enhancement of eye, nose, and lip details, without flickering artifacts
Photorealistic Portraits: Enhanced microtextures while maintaining global tone, reduced eye artifacts, richer hair details, more natural skin tone and microtextures
High-Frequency Details: Significant enhancement of microtextures in lips, nose, neck and other regions
The paper provides detailed visual comparisons showing ZeResFDG significantly improves microtexture quality and reduces typical high-CFG artifacts (oversaturation, halo effects) while maintaining global composition and tone.
The paper highlights its complementarity with the Adaptive Projection Guidance (APG) framework of Sadat et al. (2025). APG decomposes classifier-free guidance into parallel and orthogonal components; this work extends that perspective by adding rescaling and zero-projection terms tailored to SD/SDXL.
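The parallel/orthogonal split mentioned above can be illustrated with a short NumPy sketch. The reference direction is passed in as an argument because this summary does not fix it for APG; the example call uses the unconditional prediction, matching the zero-projection description. The function name and shapes are assumptions for illustration and are not taken from either paper.

```python
import numpy as np

def decompose(delta, ref, eps=1e-12):
    """Split delta into parts parallel and orthogonal to ref, per sample."""
    axes = tuple(range(1, delta.ndim))  # every axis except the batch axis
    coef = (delta * ref).sum(axis=axes, keepdims=True) / (
        (ref * ref).sum(axis=axes, keepdims=True) + eps
    )
    parallel = coef * ref
    return parallel, delta - parallel

# Toy latents with shape (batch, channels, height, width); values are random.
rng = np.random.default_rng(0)
y_c = rng.normal(size=(2, 4, 8, 8))
y_u = rng.normal(size=(2, 4, 8, 8))

# Drop the part of the guidance signal that is parallel to y_u.
par, orth = decompose(y_c - y_u, y_u)
```

With y_u as the reference, the orthogonal part equals y_c - α·y_u with α = ⟨y_c, y_u⟩/∥y_u∥², so removing the parallel component of y_c - y_u is the same as removing the parallel component of y_c itself.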
CADE 2.5 successfully addresses quality degradation of SD/SDXL models at high CFG scales through the ZeResFDG framework, significantly improving image quality while maintaining training-free characteristics.
The paper cites important works in the field, including:
Attention-guided methods such as SAG/PAG
Related research on the APG framework
Foundational theory on diffusion model guidance mechanisms
Optimization techniques widely used in practice
Overall Assessment: This is a technically strong engineering paper. It is limited in theoretical depth and in the breadth of its evaluation, but its practical value is high: it offers effective, training-free improvements for diffusion model applications, and the visible quality gains make it promising for practical deployment.