2025-11-16T08:55:12.135200

On Convolutions, Intrinsic Dimension, and Diffusion Models

Leung, Hosseinzadeh, Loaiza-Ganem

The manifold hypothesis asserts that data of interest in high-dimensional ambient spaces, such as image data, lies on unknown low-dimensional submanifolds. Diffusion models (DMs) -- which operate by convolving data with progressively larger amounts of Gaussian noise and then learning to revert this process -- have risen to prominence as the most performant generative models, and are known to be able to learn distributions with low-dimensional support. For a given datum in one of these submanifolds, we should thus intuitively expect DMs to have implicitly learned its corresponding local intrinsic dimension (LID), i.e. the dimension of the submanifold it belongs to. Kamkari et al. (2024b) recently showed that this is indeed the case by linking this LID to the rate of change of the log marginal densities of the DM with respect to the amount of added noise, resulting in an LID estimator known as FLIPD. LID estimators such as FLIPD have a plethora of uses: among others, they quantify the complexity of a given datum and can be used to detect outliers, adversarial examples, and AI-generated text. FLIPD achieves state-of-the-art performance at LID estimation, yet its theoretical underpinnings are incomplete since Kamkari et al. (2024b) only proved its correctness under the highly unrealistic assumption of affine submanifolds. In this work we bridge this gap by formally proving the correctness of FLIPD under realistic assumptions. Additionally, we show that an analogous result holds when Gaussian convolutions are replaced with uniform ones, and discuss the relevance of this result.

Basic Information

Paper ID: 2506.20705
Title: On Convolutions, Intrinsic Dimension, and Diffusion Models
Authors: Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem (Layer 6 AI)
Categories: cs.LG cs.AI stat.ML
Published in: Transactions on Machine Learning Research (10/2025)
Paper link: https://arxiv.org/abs/2506.20705

Research Background and Motivation

Problem Definition

The core problem this paper addresses is providing a rigorous theoretical foundation for the FLIPD (Flow-based Local Intrinsic Dimension) estimator. Specifically:

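Theoretical gap: although FLIPD, proposed by Kamkari et al., performs well in practice, its correctness had only been proven under the unrealistic assumption of affine submanifolds
Practical need: FLIPD's correctness must be established on general embedded submanifolds so that its theoretical foundation matches how it is used in practice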

Significance

Local intrinsic dimension (LID) estimation has important applications in machine learning:

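Complexity quantification: effectively quantifies image complexity
Anomaly detection: detects outliers, adversarial examples, and AI-generated text
Generalization prediction: LID estimates of neural network representations can predict generalization performance
Memorization detection: identifies memorization in models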

Limitations of Existing Methods

Traditional LID estimators suffer from the following problems:

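High computational cost: they rely on pairwise distance computations and scale poorly with both dataset size and ambient dimension
Curse of dimensionality: their performance degrades in high-dimensional spaces
Incomplete theory: even FLIPD, despite its strong performance, rested on an incomplete theoretical foundation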

Core Contributions

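Theoretical completion: formally proves the correctness of FLIPD under realistic assumptions, extending the guarantee from affine submanifolds to general smooth embedded submanifolds
Extended result: proves that an analogous result still holds when Gaussian convolutions are replaced with uniform ones
Mathematical rigor: provides complete proofs, including intricate differential-geometric analysis
Practical value: gives a theoretical guarantee for the reliability of FLIPD in practical applications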

Method Details

Core Theoretical Result

The core of the paper is a proof that the following key identity holds under general conditions:

$\text{LID}(x) = D + \lim_{\delta \to -\infty} \frac{\partial}{\partial \delta} \log \varrho_N(x, \delta)$

where:

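$\varrho_N(x, \delta)$ is the density, evaluated at $x$, of the data distribution convolved with Gaussian noise whose log standard deviation is $\delta$
$D$ is the ambient dimension
$\delta \to -\infty$ corresponds to the vanishing-noise limit, since the standard deviation $e^{\delta} \to 0$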
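As a minimal numerical sketch of this identity (not the FLIPD estimator itself, which obtains the $\delta$-derivative from a trained diffusion model rather than from a known density), consider a case where $\varrho_N$ can be computed by quadrature: the unit circle in $\mathbb{R}^2$ ($d = 1$, $D = 2$) carrying the uniform density. The evaluation point, grid size, finite-difference step, and helper names are illustrative assumptions.

```python
import math


def log_rho_gaussian_circle(delta: float, n: int = 20000) -> float:
    """log rho_N(x, delta) at x = (1, 0) for the uniform density on the unit circle.

    Noise is N(0, sigma^2 I_2) with sigma = exp(delta). Since
    |x - (cos t, sin t)|^2 = 2 - 2 cos t, the convolution equals
    (1 / 2pi) * integral over t of exp(-(1 - cos t) / sigma^2) / (2 pi sigma^2).
    """
    sigma2 = math.exp(2.0 * delta)
    # On a uniform periodic grid, the plain average approximates (1 / 2pi) * integral dt.
    total = sum(math.exp(-(1.0 - math.cos(2.0 * math.pi * k / n)) / sigma2) for k in range(n))
    # The k = 0 term is exp(0) = 1, so the sum cannot underflow to zero.
    log_avg = math.log(total / n)
    return log_avg - math.log(2.0 * math.pi * sigma2)


def lid_from_log_density(log_rho, delta: float, ambient_dim: int, h: float = 1e-3) -> float:
    """Return D + d/d(delta) log rho(x, delta), using a central finite difference."""
    slope = (log_rho(delta + h) - log_rho(delta - h)) / (2.0 * h)
    return ambient_dim + slope


for delta in (-1.0, -3.0, -5.0):
    est = lid_from_log_density(log_rho_gaussian_circle, delta, ambient_dim=2)
    print(f"delta = {delta:5.1f}   LID estimate = {est:.5f}")
```

The printed estimates should approach $d = 1$ as $\delta$ decreases; at larger noise levels the estimate deviates from 1 because the noise no longer sees the circle as locally flat.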

Main Theorems

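Theorem 1 (Gaussian case): Let $M$ be a smooth $d$-dimensional embedded submanifold of $\mathbb{R}^D$ and let $p$ be a probability density function on $M$. For $x \in M$, if $p$ is continuous at $x$, $p(x) > 0$, and a finite second-moment condition holds, then: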

$\lim_{\delta \to -\infty} \frac{\partial}{\partial \delta} \log \varrho_N(x, \delta) = d - D$
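Theorem 1 covers curved manifolds and non-uniform densities, which can be sanity-checked in a case where the convolution has a closed form. The sketch below assumes a von Mises-Fisher density with concentration $\kappa = 5$ on the unit sphere $S^2 \subset \mathbb{R}^3$ ($d = 2$, $D = 3$), evaluated at the north pole; these choices and the function names are illustrative, not taken from the paper. Substituting $u = \cos\theta$ reduces the convolution integral to an elementary one.

```python
import math


def log_rho_vmf_sphere(delta: float, kappa: float = 5.0) -> float:
    """Closed-form log rho_N(x, delta) at the north pole x = (0, 0, 1).

    Data: von Mises-Fisher density on the unit sphere S^2 in R^3 (d = 2, D = 3),
    p(x') = kappa / (4 pi sinh kappa) * exp(kappa * x'_3), w.r.t. surface area.
    Noise: N(0, sigma^2 I_3) with sigma = exp(delta).
    """
    inv_s2 = math.exp(-2.0 * delta)  # 1 / sigma^2
    # With u = cos(theta), |x - x'|^2 = 2 - 2u and the sphere integral becomes
    # rho = (2 pi sigma^2)^(-3/2) * kappa / (2 sinh kappa)
    #       * (e^kappa - e^(-kappa - 2 / sigma^2)) / (kappa + 1 / sigma^2).
    return (
        -1.5 * math.log(2.0 * math.pi)
        - 3.0 * delta
        + math.log(kappa / (2.0 * math.sinh(kappa)))
        + kappa
        + math.log1p(-math.exp(-2.0 * kappa - 2.0 * inv_s2))
        - math.log(kappa + inv_s2)
    )


def d_log_rho(log_rho, delta: float, h: float = 1e-4) -> float:
    """Central finite difference of log rho with respect to delta."""
    return (log_rho(delta + h) - log_rho(delta - h)) / (2.0 * h)


d, D = 2, 3
for delta in (-1.0, -4.0, -7.0):
    slope = d_log_rho(log_rho_vmf_sphere, delta)
    print(f"delta = {delta:5.1f}   slope = {slope:+.5f}   (d - D = {d - D})")
```

Ignoring an exponentially small term, the exact slope here is $-3 + 2/(1 + \kappa e^{2\delta})$, which tends to $d - D = -1$ as $\delta \to -\infty$ even though $p$ is non-uniform.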

Theorem 2 (uniform case): An analogous result holds when the Gaussian convolution is replaced by a uniform one:

$\lim_{\delta \to -\infty} \frac{\partial}{\partial \delta} \log \varrho_U(x, \delta) = d - D$
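Theorem 2 can be illustrated on the unit circle in $\mathbb{R}^2$ ($d = 1$, $D = 2$) with the uniform density. The sketch assumes that the uniform kernel is the uniform density on a disk of radius $e^{\delta}$ (the shape is an assumption here); then $\varrho_U$ is simply the probability mass of the arc inside the disk divided by the disk's area. Function names are hypothetical.

```python
import math


def log_rho_uniform_circle(delta: float) -> float:
    """log rho_U(x, delta) at x = (1, 0) on the unit circle in R^2 (d = 1, D = 2).

    Assumes p is uniform on the circle (density 1 / (2 pi) per unit arc length)
    and the noise is uniform on the disk of radius r = exp(delta), with r < 2.
    """
    r = math.exp(delta)
    if r >= 2.0:
        raise ValueError("the arc-length formula below assumes r < 2")
    # Circle points within distance r of x satisfy 2 |sin(t / 2)| <= r, i.e. an arc
    # of length 4 * arcsin(r / 2). Its probability mass divided by the disk area
    # pi * r^2 is the convolved density.
    arc_mass = 4.0 * math.asin(r / 2.0) / (2.0 * math.pi)
    return math.log(arc_mass) - math.log(math.pi * r * r)


def d_log_rho(log_rho, delta: float, h: float = 1e-4) -> float:
    """Central finite difference of log rho with respect to delta."""
    return (log_rho(delta + h) - log_rho(delta - h)) / (2.0 * h)


for delta in (0.0, -3.0, -6.0):
    print(f"delta = {delta:5.1f}   slope = {d_log_rho(log_rho_uniform_circle, delta):+.5f}")
```

Analytically the slope is $\frac{r/2}{\arcsin(r/2)\sqrt{1 - r^2/4}} - 2$ with $r = e^{\delta}$, which tends to $1 - 2 = d - D = -1$ as $r \to 0$.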

Proof Idea

The key idea of the proof is to exploit factorization properties of the Gaussian and uniform densities:

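Gaussian case: use the relation $N_D(x-x'; 0, \delta) = (2\pi)^{\frac{d-D}{2}} e^{\delta(d-D)} N_d(x-x'; 0, \delta)$, where both kernels share the same exponent and differ only in their normalizing constants
Uniform case: use the analogous factorization $U_D(x;\mu, \delta) = C_D^U (C_d^U)^{-1} e^{\delta(d-D)} U_d(x;\mu, \delta)$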
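Limit analysis: by the Gaussian factorization, $\varrho_N(x, \delta) = (2\pi)^{\frac{d-D}{2}} e^{\delta(d-D)} \int_M p(x') N_d(x-x'; 0, \delta) \, \mathrm{d}x'$, and the remaining integral tends to $p(x) > 0$ as $\delta \to -\infty$, so $\log \varrho_N(x, \delta) = (d-D)\delta + O(1)$; careful differential-geometric analysis is then needed to show that the derivative itself, not just the function, has limit $d - D$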
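The two factorizations can be checked numerically. The sketch below assumes that $N_k(\cdot; 0, \delta)$ is the isotropic Gaussian with standard deviation $e^{\delta}$ normalized as a density on $\mathbb{R}^k$, and that $U_k(\cdot; \mu, \delta)$ is the uniform density on the $k$-ball of radius $e^{\delta}$ with $C_k^U = \Gamma(k/2 + 1)/\pi^{k/2}$, the reciprocal volume of the unit $k$-ball; the paper's exact conventions may differ. Because only the Euclidean norm of the argument enters each kernel, the $D$- and $d$-dimensional versions can be evaluated on the same vector.

```python
import math


def gauss_kernel(y, delta: float, k: int) -> float:
    """Isotropic Gaussian with std exp(delta), normalized as a density on R^k.

    Only the Euclidean norm of y enters the exponent, so y may have any length.
    """
    s2 = math.exp(2.0 * delta)
    sq = sum(c * c for c in y)
    return (2.0 * math.pi * s2) ** (-k / 2.0) * math.exp(-sq / (2.0 * s2))


def inv_unit_ball_volume(k: int) -> float:
    """Assumed C_k^U: reciprocal of the volume of the unit ball in R^k."""
    return math.gamma(k / 2.0 + 1.0) / math.pi ** (k / 2.0)


def uniform_kernel(y, mu, delta: float, k: int) -> float:
    """Uniform density on the k-ball of radius exp(delta) around mu (assumed form)."""
    inside = math.dist(y, mu) <= math.exp(delta)
    return inv_unit_ball_volume(k) * math.exp(-k * delta) if inside else 0.0


D, d, delta = 5, 2, -1.3
y = [0.1, -0.05, 0.02, 0.0, 0.03]  # arbitrary point inside the radius-exp(delta) ball
mu = [0.0] * D

gauss_lhs = gauss_kernel(y, delta, D)
gauss_rhs = (2 * math.pi) ** ((d - D) / 2) * math.exp(delta * (d - D)) * gauss_kernel(y, delta, d)

unif_lhs = uniform_kernel(y, mu, delta, D)
unif_rhs = (inv_unit_ball_volume(D) / inv_unit_ball_volume(d)
            * math.exp(delta * (d - D)) * uniform_kernel(y, mu, delta, d))

print(gauss_lhs, gauss_rhs)
print(unif_lhs, unif_rhs)
```

In both cases the factor $e^{\delta(d-D)}$ is what produces the $d - D$ term once the logarithm is differentiated with respect to $\delta$.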