2025-11-25T10:52:16.800785

Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models

Li, Yan

This paper investigates score-based diffusion models when the underlying target distribution is concentrated on or near low-dimensional manifolds within the higher-dimensional space in which they formally reside, a common characteristic of natural image distributions. Despite previous efforts to understand the data generation process of diffusion models, existing theoretical support remains highly suboptimal in the presence of low-dimensional structure, which we strengthen in this paper. For the popular Denoising Diffusion Probabilistic Model (DDPM), we find that the dependency of the error incurred within each denoising step on the ambient dimension $d$ is in general unavoidable. We further identify a unique design of coefficients that yields a convergence rate of order $O(k^{2}/\sqrt{T})$ (up to log factors), where $k$ is the intrinsic dimension of the target distribution and $T$ is the number of steps. This represents the first theoretical demonstration that the DDPM sampler can adapt to unknown low-dimensional structures in the target distribution, highlighting the critical importance of coefficient design. All of this is achieved by a novel set of analysis tools that characterize the algorithmic dynamics in a more deterministic manner.

academic

Basic information

Paper ID: 2405.14861
Title: Adapting to Unknown Low-Dimensional Structures in Score-Based Diffusion Models
Authors: Gen Li (The Chinese University of Hong Kong), Yuling Yan (University of Wisconsin-Madison)
Categories: cs.LG cs.AI math.ST stat.ML stat.TH
Published: January 3, 2025 (arXiv v2 dated December 31, 2024)
Paper link: https://arxiv.org/abs/2405.14861

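Abstract

This paper studies score-based diffusion models when the target distribution is concentrated on or near a low-dimensional manifold within a high-dimensional space, a common characteristic of natural image distributions. Despite earlier efforts to understand the data generation process of diffusion models, existing theoretical support remains highly suboptimal in the presence of low-dimensional structure. For the popular Denoising Diffusion Probabilistic Model (DDPM), the authors find that the dependence of the error incurred in each denoising step on the ambient dimension d is in general unavoidable. They further identify a unique coefficient design that yields a convergence rate of order $O(k^2/\sqrt{T})$ (up to log factors), where k is the intrinsic dimension of the target distribution and T is the number of steps. This is the first theoretical demonstration that the DDPM sampler can adapt to unknown low-dimensional structure in the target distribution, and it highlights the critical importance of coefficient design.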

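Background and motivation

Problem definition

Diffusion models excel at generating high-quality images, audio, and text, but existing theoretical analyses leave a significant gap between theory and practice. Specifically:

Gap between theoretical predictions and practical performance: existing theory indicates that poly(d)/ε² steps are needed to reach accuracy ε, where d is the dimension of the problem. In practice, however, CIFAR-10 (d = 32×32×3) needs only 50 steps and ImageNet only 250 steps to generate good samples.
Prevalence of low-dimensional structure: natural image distributions are typically concentrated on or near low-dimensional manifolds in a high-dimensional space, yet existing theory fails to exploit this structure.
Overlooked importance of coefficient design: existing analyses do not adequately recognize how much the choice of coefficients in DDPM matters.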

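Limitations of existing approaches

Dimension dependence: the best existing result (Benton et al. 2023) still depends linearly on the ambient dimension d.
Insufficient use of low-dimensional structure: De Bortoli (2022) considers low-dimensional manifolds, but the error bound still depends linearly on the ambient dimension d and exponentially on the diameter of the manifold.
Limited analysis tools: existing analysis techniques cannot handle the low-dimensional setting effectively.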

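Core contributions

First dimension-adaptive theory: the paper proves that the DDPM sampler adapts to unknown low-dimensional structure, with a convergence rate of $O(k^2/\sqrt{T})$ (up to log factors), where k is the intrinsic dimension rather than the ambient dimension d.
Unique coefficient design: it identifies the unique coefficient design $\eta_t^* = 1-\alpha_t$ and $(\sigma_t^*)^2 = \frac{(1-\alpha_t)(\alpha_t-\bar{\alpha}_t)}{1-\bar{\alpha}_t}$ , under which no denoising step incurs a discretization error proportional to the ambient dimension d.
Novel analysis tools: it develops a new set of analysis tools that characterize the algorithmic dynamics in a more deterministic manner, including the identification of high-probability sets and a technique for relating conditional densities.
Proof of uniqueness of the coefficient design: it proves that the proposed coefficient choice is unique in a certain sense, and that deviating from it leads to an error proportional to the ambient dimension d.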

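Method details

Task definition

Consider the DDPM forward process: $X_t = \sqrt{1-\beta_t}X_{t-1} + \sqrt{\beta_t}W_t \quad (t=1,\ldots,T)$

where $X_0 \sim p_{data}$ and $W_t \sim N(0,I_d)$ .

The reverse process is: $Y_{t-1} = \frac{1}{\sqrt{\alpha_t}}(Y_t + \eta_t s_t(Y_t) + \sigma_t Z_t) \quad (t=T,\ldots,1)$

where $Y_T \sim N(0,I_d)$ , $\alpha_t = 1-\beta_t$ , $Z_t \sim N(0,I_d)$ is fresh Gaussian noise, and $s_t(\cdot)$ is the learned score function.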
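The two updates above can be written as single-step functions. The following is a minimal NumPy sketch, not the authors' implementation: the function names `forward_step` and `reverse_step`, and the convention of passing the score as a callable `score_t`, are assumptions made for illustration.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One forward (noising) step: X_t = sqrt(1 - beta_t) X_{t-1} + sqrt(beta_t) W_t."""
    w = rng.standard_normal(x_prev.shape)  # W_t ~ N(0, I_d)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * w

def reverse_step(y_t, score_t, alpha_t, eta_t, sigma_t, rng):
    """One reverse (denoising) step:
    Y_{t-1} = (Y_t + eta_t * s_t(Y_t) + sigma_t * Z_t) / sqrt(alpha_t).
    `score_t` is a hypothetical callable standing in for the learned score s_t.
    """
    z = rng.standard_normal(y_t.shape)  # Z_t ~ N(0, I_d)
    return (y_t + eta_t * score_t(y_t) + sigma_t * z) / np.sqrt(alpha_t)
```

Note that the noise term sits inside the division by $\sqrt{\alpha_t}$, so the variance actually injected at each step is $\sigma_t^2/\alpha_t$ rather than $\sigma_t^2$.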

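Key assumptions and setup

Characterizing low-dimensional structure

The intrinsic dimension is characterized through ε-nets and covering numbers:

For $\varepsilon = T^{-c_\varepsilon}$ , the intrinsic dimension k is defined to satisfy $\log N_\varepsilon(\mathcal{X}) \leq C_{cover}k\log T$ , where $N_\varepsilon(\mathcal{X})$ is the ε-covering number of the support $\mathcal{X}$ .
The support is bounded: $\sup_{x\in\mathcal{X}}\|x\|_2 \leq R = T^{c_R}$

Learning rate schedule

The analysis adopts a specific learning rate schedule: $\beta_1 = \frac{1}{T^{c_0}}, \quad \beta_{t+1} = \frac{c_1\log T}{T}\min\left\{\beta_1\left(1+\frac{c_1\log T}{T}\right)^t, 1\right\}$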
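To make the schedule concrete, here is a short sketch that evaluates the formula above for all t. The function name `theory_schedule` and the default values `c0=2.0` and `c1=4.0` are illustrative assumptions; the summary does not state which constants the paper uses.

```python
import numpy as np

def theory_schedule(T, c0=2.0, c1=4.0):
    """Return betas[0..T-1] holding beta_1..beta_T from the schedule above.
    c0 and c1 are placeholder constants chosen only for illustration.
    """
    step = c1 * np.log(T) / T          # the factor c1 * log(T) / T
    betas = np.empty(T)
    betas[0] = T ** (-c0)              # beta_1 = 1 / T^{c0}
    t = np.arange(1, T)                # exponent t for beta_{t+1}, t = 1..T-1
    betas[1:] = step * np.minimum(betas[0] * (1.0 + step) ** t, 1.0)
    return betas
```

For example, `theory_schedule(1000)` returns 1000 values; after $\beta_1$, every value is capped at $c_1\log T/T$, which the min in the formula enforces once the geometric growth saturates.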

Core technical innovations

1. Optimal coefficient design

The key finding is a specific choice of coefficients: $\eta_t^* = 1-\alpha_t, \quad (\sigma_t^*)^2 = \frac{(1-\alpha_t)(\alpha_t-\bar{\alpha}_t)}{1-\bar{\alpha}_t}$

where $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$ .
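The coefficients can be computed directly from any sequence of $\beta_t$. The sketch below is illustrative only; the function name `optimal_coefficients` and the array layout (index `i` holds step `t = i + 1`) are assumptions.

```python
import numpy as np

def optimal_coefficients(betas):
    """Compute alpha_t, alpha_bar_t, eta_t^* and (sigma_t^*)^2 for t = 1..T.
    Array index i corresponds to step t = i + 1.
    """
    betas = np.asarray(betas, dtype=float)
    alphas = 1.0 - betas                      # alpha_t = 1 - beta_t
    alpha_bar = np.cumprod(alphas)            # alpha_bar_t = prod_{i<=t} alpha_i
    eta_star = 1.0 - alphas                   # eta_t^* = 1 - alpha_t
    sigma2_star = (1.0 - alphas) * (alphas - alpha_bar) / (1.0 - alpha_bar)
    return alphas, alpha_bar, eta_star, sigma2_star
```

Since $\alpha_t - \bar{\alpha}_t = \alpha_t(1-\bar{\alpha}_{t-1})$, the injected variance $(\sigma_t^*)^2/\alpha_t$ equals $\beta_t(1-\bar{\alpha}_{t-1})/(1-\bar{\alpha}_t)$, which is strictly smaller than the value $1-\alpha_t$ used by the common alternative $\sigma_t^2 = 1-\alpha_t$. At $t=1$ the formula gives $(\sigma_1^*)^2 = 0$ because $\bar{\alpha}_1 = \alpha_1$.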

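2. Analysis framework

The analysis starts by bounding the total variation distance through a sum of per-step KL divergences: $TV^2(q_1,p_1) \leq \frac{1}{2}KL(p_{X_T}\|p_{Y_T}) + \frac{1}{2}\sum_{t=2}^T \mathbb{E}_{x_t\sim q_t}[KL(p_{X_{t-1}|X_t}(\cdot|x_t)\|p_{Y_{t-1}|Y_t}(\cdot|x_t))]$

Here $q_t$ denotes the law of $X_t$ and $p_t$ the law of $Y_t$ , so it suffices to control the error incurred in each individual denoising step.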

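3. Identifying high-probability sets

Define the typical set: $\mathcal{T}_t = \{\sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\omega : x_0 \in \cup_{i\in\mathcal{I}}B_i, \omega \in \mathcal{G}\}$

where $\mathcal{G}$ is a high-probability set of Gaussian vectors and $\mathcal{I}$ is the index set of the cover elements $B_i$ that carry high probability.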

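Experimental setup

Dataset

A degenerate Gaussian distribution $p_{data} = N(0,I_k)$ is used as a tractable example, where $I_k \in \mathbb{R}^{d \times d}$ is a diagonal matrix whose first k diagonal entries are 1 and whose remaining entries are 0.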
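One reason this example is tractable is that the score is available in closed form. With $X_t = \sqrt{\bar{\alpha}_t}X_0 + \sqrt{1-\bar{\alpha}_t}\,\overline{W}_t$, the covariance of $X_t$ is $\bar{\alpha}_t I_k + (1-\bar{\alpha}_t)I_d$, which is diagonal with entries 1 on the first k coordinates and $1-\bar{\alpha}_t$ elsewhere. The sketch below evaluates the resulting score $-\Sigma_t^{-1}x$; the function name `degenerate_gaussian_score` is an assumption, and the summary does not say whether the paper's experiment uses this exact score or a learned one.

```python
import numpy as np

def degenerate_gaussian_score(x, alpha_bar_t, k):
    """Exact score of X_t when X_0 ~ N(0, I_k) embedded in R^d.
    Per-coordinate variance of X_t: 1 on the first k coordinates,
    1 - alpha_bar_t on the remaining d - k coordinates.
    """
    x = np.asarray(x, dtype=float)
    var = np.full(x.shape[-1], 1.0 - alpha_bar_t)
    var[:k] = 1.0
    return -x / var  # gradient of the log-density of N(0, diag(var))
```

For instance, with `alpha_bar_t = 0.75` and `k = 2`, an all-ones input in dimension 5 is mapped to `[-1, -1, -4, -4, -4]`: the off-manifold coordinates are pulled toward zero much more strongly.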

Evaluation metrics

Total variation distance $TV(q_1,p_1)$
KL divergence $KL(q_1\|p_1)$

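Methods compared

Two coefficient designs are compared:

Proposed design: $\eta_t = \eta_t^*$ , $\sigma_t = \sigma_t^*$ (Equation 2.4)
Baseline design: $\eta_t = \sigma_t^2 = 1-\alpha_t$ (a design commonly used in theoretical analyses)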

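Implementation details

The intrinsic dimension is fixed at k = 8
The ambient dimension d varies from 10 to 1000
Number of steps T ∈ {100, 200, 500, 1000}
The learning rate schedule of Ho et al. (2020), which is common in practice, is used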
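Because the target here is Gaussian and its exact score is linear, every $Y_t$ stays Gaussian with a diagonal covariance, so the per-coordinate variance can be tracked in closed form and $KL(q_1\|p_1)$ computed without sampling. The sketch below does this for both coefficient designs. It is not the authors' code: the function name `kl_after_sampling`, the use of the exact score (no score estimation error), and the linear schedule from 1e-4 to 0.02 in the usage example (the endpoints Ho et al. (2020) use for T = 1000) are all assumptions made for illustration.

```python
import numpy as np

def kl_after_sampling(d, k, betas, optimal=True):
    """KL(law of X_1 || law of Y_1) for p_data = N(0, I_k) in R^d,
    assuming the exact score is used in the reverse process.
    Only two coordinate types exist: on-manifold (first k) and off-manifold.
    """
    alphas = 1.0 - np.asarray(betas, dtype=float)
    alpha_bar = np.cumprod(alphas)
    T = len(alphas)
    v_on = np.ones(T)            # Var of X_t, on-manifold coordinates
    v_off = 1.0 - alpha_bar      # Var of X_t, off-manifold coordinates
    u_on, u_off = 1.0, 1.0       # Var of Y_T, since Y_T ~ N(0, I_d)
    for t in range(T, 1, -1):    # reverse steps t = T, ..., 2 produce Y_1
        i = t - 1
        a, ab = alphas[i], alpha_bar[i]
        eta = 1.0 - a
        sigma2 = (1.0 - a) * (a - ab) / (1.0 - ab) if optimal else 1.0 - a
        # Score of N(0, v) is -y / v, so Y_{t-1} = ((1 - eta/v) Y_t + sigma Z) / sqrt(a)
        u_on = ((1.0 - eta / v_on[i]) ** 2 * u_on + sigma2) / a
        u_off = ((1.0 - eta / v_off[i]) ** 2 * u_off + sigma2) / a

    def kl_1d(v, u):             # KL(N(0, v) || N(0, u))
        r = v / u
        return 0.5 * (r - 1.0 - np.log(r))

    return k * kl_1d(v_on[0], u_on) + (d - k) * kl_1d(v_off[0], u_off)

# Usage: compare the two designs across ambient dimensions.
betas = np.linspace(1e-4, 0.02, 1000)
for d in (10, 100, 1000):
    print(d, kl_after_sampling(d, 8, betas, True), kl_after_sampling(d, 8, betas, False))
```

By construction the result is k times an on-manifold term plus (d - k) times an off-manifold term. With the optimal coefficients the off-manifold variance recursion reproduces $1-\bar{\alpha}_{t-1}$ exactly whenever it starts from $1-\bar{\alpha}_t$, so the off-manifold term is driven only by the small initial mismatch at $t = T$; with the baseline, each step adds an excess variance of $(1-\alpha_t)^2/((1-\bar{\alpha}_t)\alpha_t)$ in every off-manifold coordinate.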

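Experimental results

Main results

The experiments confirm the theoretical predictions:

Proposed design: the error is independent of the ambient dimension d and stays at a low level
Baseline design: the error increases markedly as the ambient dimension d grows

Specific numbers:

At d = 1000, the error of the proposed design stays in the range 10⁻⁴ to 10⁻²
The error of the baseline design grows to the range 10⁻¹ to 10⁰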

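Dimension dependence analysis

The experiments clearly show the different behavior of the two designs:

Dimension independence: the proposed design shows an error that does not depend on d for every value of T
Linear growth: the baseline design shows an error that grows approximately linearly in d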

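Experimental findings

The choice of coefficient design is crucial for adapting to low-dimensional structure
Even with a relatively small number of steps, the right coefficient design improves performance significantly
The theoretical predictions agree closely with the experimental results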

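Theoretical analysis

Main theoretical results

Theorem 1 (convergence analysis)

Under the optimal coefficient choice: $TV(q_1,p_1) \leq C\frac{(k+\log d)^2\log^3 T}{\sqrt{T}} + C\varepsilon_{score}\log T$

where the first term is the discretization error and the second term is the score matching error.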
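As a quick worked reading of the bound, consider how many steps are needed for the discretization term alone to fall below a target accuracy $\delta$. Dropping the constant C, the $\log^3 T$ factor, and the score error term (this simplification is ours, not a statement from the paper), the requirement becomes:

```latex
\frac{(k+\log d)^2}{\sqrt{T}} \le \delta
\quad\Longleftrightarrow\quad
T \ge \frac{(k+\log d)^4}{\delta^2}.
```

Up to log factors, the step count therefore scales polynomially in the intrinsic dimension k and only polylogarithmically in the ambient dimension d.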

Theorem 2 (uniqueness of the coefficient design)

For the target distribution $p_{data} = N(0,I_k)$ , any choice that deviates from the optimal coefficients leads to: $\mathbb{E}_{x_t\sim q_t}[KL(p_{X_{t-1}|X_t}(\cdot|x_t)\|p_{Y_{t-1}|Y_t}(\cdot|x_t))] \geq \frac{d}{4}(\eta_t-\eta_t^*)^2 + \frac{d}{40}\left(\frac{(\sigma_t^*)^2}{\sigma_t^2}-1\right)^2$
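To see what this lower bound means for a concrete alternative, it can be evaluated at the design $\eta_t = \sigma_t^2 = 1-\alpha_t$ that is common in theoretical analyses. The substitution below is a worked example under the definitions $\eta_t^* = 1-\alpha_t$ and $(\sigma_t^*)^2 = \frac{(1-\alpha_t)(\alpha_t-\bar{\alpha}_t)}{1-\bar{\alpha}_t}$; it is our own arithmetic, not a formula quoted from the paper.

```latex
\eta_t - \eta_t^* = 0,
\qquad
\frac{(\sigma_t^*)^2}{\sigma_t^2} - 1
  = \frac{\alpha_t-\bar{\alpha}_t}{1-\bar{\alpha}_t} - 1
  = -\frac{1-\alpha_t}{1-\bar{\alpha}_t},
\qquad\text{so the bound reads}\qquad
\frac{d}{40}\left(\frac{1-\alpha_t}{1-\bar{\alpha}_t}\right)^2 .
```

The drift coefficient already matches, so the first term vanishes, but the variance mismatch alone leaves a per-step error that is proportional to the ambient dimension d.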