
A Connection Between Score Matching and Local Intrinsic Dimension

Yeats, Jacobson, Hannan et al.

The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has been a historically challenging task. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios. We show that the LID is a lower bound on the denoising score matching loss, motivating use of the denoising score matching loss as a LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and memory footprint under increasing problem size and quantization level.

Basic Information

Paper ID: 2510.12975
Title: A Connection Between Score Matching and Local Intrinsic Dimension
Authors: Eric Yeats, Aaron Jacobson, Darryl Hannan, Yiran Jia, Timothy Doster, Henry Kvinge, Scott Mahan (PNNL, UNC Chapel Hill, UC San Diego)
Classification: cs.LG stat.ML
Publication Venue/Date: Accepted at 3rd SPIGM Workshop @ NeurIPS 2025
Paper Link: https://arxiv.org/abs/2510.12975

Abstract

Local Intrinsic Dimension (LID) is a fundamental quantity in signal processing and learning theory; however, quantifying the LID of high-dimensional complex data has historically been a challenging task. Recent research has found that diffusion models capture data LID through the spectral properties of their score estimates and the rate of change in density estimation under various noise perturbations. While these methods can accurately quantify LID, they require multiple forward passes through diffusion models or gradient computations, which limits their applicability in computationally and memory-constrained scenarios.

This paper demonstrates that LID serves as a lower bound for denoising score matching loss, thereby providing theoretical justification for using denoising score matching loss as an LID estimator. Furthermore, the authors prove that the equivalent implicit score matching loss also approximates LID through normal dimension and is closely related to the recent LID estimator FLIPD. Experiments on manifold benchmarks and Stable Diffusion 3.5 demonstrate that denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and memory efficiency as problem scale and quantization levels increase.

Research Background and Motivation

Problem Definition

High-dimensional data typically exhibits low-dimensional structure. This observation, known as the manifold hypothesis, is a core assumption in machine learning. Local Intrinsic Dimension (LID) is a fundamental quantity that captures this low-dimensional structure: for a point x, the LID is the number of local dimensions required to losslessly encode the data in a neighborhood of x.

Significance

Signal Processing Implications: LID determines the boundaries of (local) compressibility of distributions
Deep Learning Value: Lower LID improves statistical efficiency of learning, making learning and generalization easier
Practical Applications: Widely applied in engineering tasks such as anomaly detection, clustering, and segmentation

Limitations of Existing Methods

Non-parametric Methods: Require substantial sampled data, are strongly influenced by hyperparameter selection, and fail to generalize in low-data settings
Parametric Methods: While leveraging deep generative models for scalability, LIDL requires multiple generative models, and FLIPD and normal bundle methods require gradient computation or numerous forward passes

Research Motivation

Existing parametric LID estimation methods have limitations in computational and memory efficiency, particularly in large-scale applications. This paper aims to discover a more efficient and scalable LID estimation method.

Core Contributions

Theoretical Contribution: Proves that denoising score matching loss has LID as a lower bound, providing theoretical foundation for its use as a scalable LID estimator
Method Connection: Establishes close relationships between score matching loss and current leading estimators (FLIPD and normal bundle methods)
Experimental Validation: Experiments on manifold benchmarks and Stable Diffusion 3.5/2.0 demonstrate that denoising score matching loss is a highly competitive LID estimator
Practical Advantages: Demonstrates superior scalability in memory consumption and quantization consistency

Methodology Details

Task Definition

Given a point x sampled from a d-dimensional data manifold M⊂Rⁿ, estimate its local intrinsic dimension d. Input consists of high-dimensional data points, with output being the corresponding LID estimate.

Core Theory

Theorem 3.1: Denoising Score Matching Loss Lower Bound

For a random variable x sampled from a d-dimensional manifold M and sufficiently small σ > 0 (that is, in the limit σ→0⁺):

E_x[L_DSM(x,σ,θ)] ≥ d

where denoising score matching loss is defined as:

E_x[L_DSM(x,σ,θ)] := E_{x~p(x), ε~N(0,I)} [ σ² ||ε/σ + s_θ(x+σε)||² ]

Proof Strategy:

Decompose noise ε into tangent space and normal space components
Tangent space components: expected squared error for each dimension is approximately 1
Normal space components: expected squared error is approximately 0 due to manifold structure
Summation yields LID as lower bound
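
The proof strategy above can be checked numerically on a toy case. The following is a minimal sketch, not the authors' implementation: it assumes data drawn from a standard Gaussian on the first d coordinate axes of Rⁿ (a flat manifold with zero curvature), and it replaces the trained network s_θ with the exact score of the noised distribution. The names subspace_score and dsm_loss are hypothetical. In this toy case the expected loss works out to d/(1+σ²), which approaches d as σ→0⁺: the normal components of the error cancel and each tangent component contributes roughly 1.

```python
import random


def subspace_score(x_noisy, d, sigma):
    """Exact score of N(0, I_d) on the first d axes after adding N(0, sigma^2 I_n) noise.

    Assumption for illustration: this analytic score stands in for a trained s_theta.
    """
    var_tangent = 1.0 + sigma ** 2  # data variance plus noise variance
    var_normal = sigma ** 2         # noise variance only
    return [-v / (var_tangent if i < d else var_normal)
            for i, v in enumerate(x_noisy)]


def dsm_loss(x, score_fn, sigma, rng):
    """One-sample denoising loss sigma^2 * ||eps/sigma + s(x + sigma*eps)||^2."""
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_noisy = [xi + sigma * ei for xi, ei in zip(x, eps)]
    score = score_fn(x_noisy)
    # sigma^2 * ||eps/sigma + s||^2 is the same as ||eps + sigma*s||^2
    return sum((ei + sigma * si) ** 2 for ei, si in zip(eps, score))


rng = random.Random(0)
n, d, sigma, num_samples = 16, 4, 0.05, 5000

total = 0.0
for _ in range(num_samples):
    # Sample a point on the d-dimensional subspace embedded in R^n.
    x = [rng.gauss(0.0, 1.0) if i < d else 0.0 for i in range(n)]
    total += dsm_loss(x, lambda z: subspace_score(z, d, sigma), sigma, rng)

lid_estimate = total / num_samples
print(f"DSM-loss LID estimate: {lid_estimate:.3f} (true d = {d})")
```

Each sample needs a single score evaluation and no gradient of the score, which is the property the paper relies on for memory efficiency.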

Theorem 3.3: Implicit Score Matching Loss Lower Bound

E_{x̃}[L_ISM(x̃,σ,θ)] ≥ -(n-d)

This indicates that implicit score matching loss has negative normal dimension as a lower bound.

Connections to Existing Methods

Relationship with FLIPD

FLIPD computation at point x is:

FLIPD(x,σ,θ) := L_ISM(x,σ,θ) + (σ²/2) ||s_θ(x)||² + n

Through Theorem 3.3, it can be proven that:

E_{x̃}[FLIPD(x̃,σ,θ)] ≥ d
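
As a rough illustration, FLIPD can be evaluated in closed form on a flat toy manifold. The sketch below assumes the variance-exploding form n + σ²(tr ∇s + ||s||²) published by Kamkari et al., which coincides with the expression above if L_ISM is taken to be σ²(tr ∇s + ½||s||²); that normalization is an assumption here, and the paper's own definition may differ. The data model (a standard Gaussian on the first d axes) and the function name flipd_gaussian_subspace are hypothetical, and the score is analytic rather than learned.

```python
import random


def flipd_gaussian_subspace(x, d, sigma):
    """FLIPD at a clean point x for N(0, I_d) on the first d axes of R^n.

    Assumed form: n + sigma^2 * (trace of score Jacobian + ||score||^2).
    """
    n = len(x)
    var = [1.0 + sigma ** 2 if i < d else sigma ** 2 for i in range(n)]
    score = [-xi / vi for xi, vi in zip(x, var)]
    # The score is linear in x, so its Jacobian is diagonal with entries -1/var_i.
    trace_jacobian = -sum(1.0 / vi for vi in var)
    squared_norm = sum(si * si for si in score)
    return n + sigma ** 2 * (trace_jacobian + squared_norm)


rng = random.Random(0)
n, d, sigma = 16, 4, 0.05
x = [rng.gauss(0.0, 1.0) if i < d else 0.0 for i in range(n)]

flipd_value = flipd_gaussian_subspace(x, d, sigma)
print(f"FLIPD estimate: {flipd_value:.3f} (true d = {d})")
```

Here the Jacobian trace is available analytically. With a neural score model it has to be obtained through automatic differentiation or a stochastic trace estimator, which is the gradient cost that the denoising loss avoids.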

Relationship with Normal Bundle Method

The normal bundle method computes the singular values of an m×n matrix of score estimates, while the proposed error bundle method computes the eigenvalues of the Gram matrix of denoising error vectors. The denoising loss equals the trace of that Gram matrix (the sum of its eigenvalues) and remains accurate with small samples.
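
The trace relation can be written out explicitly. The layout of the error matrix E (one row per noise draw) and the 1/m normalization below are assumptions made for illustration; the paper may arrange or scale these quantities differently.

```latex
E = \begin{bmatrix} e_1^\top \\ \vdots \\ e_m^\top \end{bmatrix} \in \mathbb{R}^{m \times n},
\qquad
e_j = \varepsilon_j + \sigma\, s_\theta(x + \sigma \varepsilon_j),
\qquad
\operatorname{tr}\!\left( \tfrac{1}{m} E E^\top \right)
= \sum_{i=1}^{m} \lambda_i
= \frac{1}{m} \sum_{j=1}^{m} \lVert e_j \rVert^2
```

Here λ_i are the eigenvalues of the m×m Gram matrix (1/m)EEᵀ, and the right-hand side is the Monte Carlo estimate of the denoising loss at x. Computing the trace therefore requires no eigendecomposition, whereas counting individual eigenvalues or singular values does.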

Experimental Setup

Datasets

Manifolds with known LID from scikit-dimension package:

Hyperspheres and hyperballs with d=16, n=64
HyperTwinPeaks with d=128, n=256
Clifford torus and nonlinear manifolds with d=32, n=128

Model Architecture

DiT (Diffusion Transformer): patch size=4, hidden dim=128, 16 attention heads, 8 layers
MLP: with skip connections, similar to architecture used in FLIPD

Evaluation Metrics

Primary Metric: Mean Absolute Error (MAE) between true LID and estimated LID
Secondary Metrics: Peak GPU memory usage, performance changes after quantization

Comparison Methods

Non-parametric Methods: MLE, TwoNN, ESS
Parametric Methods: FLIPD
Noise Levels: σ = 0.01, 0.02, 0.05

Experimental Results

Main Results

Manifold Benchmark Experiments

Key Findings from Table 1:

DiT Architecture:
- Denoising loss method average MAE: 2.21 (σ=0.05)
- FLIPD average MAE: 23.05 (σ=0.05)
- Significant differences on high-dimensional high-curvature manifolds
MLP Architecture:
- Denoising loss method average MAE: 7.27 (σ=0.05)
- FLIPD average MAE: 11.11 (σ=0.05)
- FLIPD performs better with the MLP than with the DiT (11.11 vs. 23.05) but still trails the denoising loss
Non-parametric Methods:
- ESS performs best among the non-parametric methods: MAE 7.12 (k=100)
- Severe performance degradation on high-dimensional manifolds

Scalability Experiments

Figure 2 Results:

Both parametric methods maintain low MAE as manifold dimension increases
FLIPD memory usage grows rapidly due to gradient computation
Denoising loss method shows slow memory growth

Stable Diffusion Experiments

SD 3.5 Experimental Findings

Correlation: FLIPD and denoising loss estimates are highly correlated
Numerical Differences: FLIPD typically provides higher LID estimates
Quantization Stability: Denoising loss shows smaller changes after quantization
Memory Efficiency: Denoising loss peak memory approximately 60% of FLIPD

SD 2.0 Experiments

Similar high correlation patterns
FLIPD produces negative values at high noise levels (invalid estimates)
Attributed to high Lipschitz constant of U-Net architecture

Ablation Studies

Experiments with different σ values reveal:

σ=0.05 typically yields best performance
Smaller σ values may cause numerical instability
DiT architecture is more robust to σ selection

Related Work

Non-parametric LID Estimation

MLE Method: Fits Poisson distribution parameters via maximum likelihood
TwoNN Method: Analyzes ratio of second and first nearest neighbor distances
ESS Method: Measures simplex volume skewness formed by points and their neighbors
Fractal Dimension Method: Handles self-similar or fractal structure data
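
To make the nearest-neighbor idea concrete, here is a minimal sketch of a TwoNN-style estimator. It uses the maximum-likelihood form N / Σ log(r₂/r₁) rather than the linear fit used in some implementations, a brute-force neighbor search, and a hypothetical test set (a uniform 2-dimensional square zero-padded into R⁵). It is not the scikit-dimension code used in the paper's benchmark, and the function name twonn_lid is an assumption.

```python
import math
import random


def twonn_lid(points):
    """TwoNN-style estimate: N / sum(log(r2 / r1)) over all points.

    r1 and r2 are the distances to the first and second nearest neighbors.
    Brute-force O(N^2) search; assumes no duplicate points.
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1  # new nearest; old nearest becomes second
            elif dist < r2:
                r2 = dist
        log_ratios.append(math.log(r2 / r1))
    return len(points) / sum(log_ratios)


rng = random.Random(0)
true_dim, ambient_dim, num_points = 2, 5, 400
# Uniform samples on a 2-D square, zero-padded into R^5.
points = [
    [rng.random() if k < true_dim else 0.0 for k in range(ambient_dim)]
    for _ in range(num_points)
]

lid_hat = twonn_lid(points)
print(f"TwoNN estimate: {lid_hat:.2f} (true dimension = {true_dim})")
```

The estimate depends on having enough samples that each point's two nearest neighbors lie at a scale where the density is roughly constant, which is one reason such methods degrade on high-dimensional manifolds with limited data.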

Parametric LID Estimation

LIDL: Uses ensemble of models with normalizing flows
Normal Bundle Method: Counts singular values of score estimate matrix
FLIPD: Uses Fokker-Planck equation, requires single diffusion model

Conclusions and Discussion

Main Conclusions

Denoising score matching loss provides theoretically grounded lower bound for LID
The method achieves good balance between accuracy and computational efficiency
Possesses deep theoretical connections with existing state-of-the-art methods

Theoretical Insights

Constant Term Interpretation: C_DSM equals negative value of average data LID
Multi-scale Training: Training at each scale can be viewed as identifying average LID of that specific noise manifold
Likelihood Computation: May attribute higher likelihood to higher learned normal dimension

Limitations

Experiments use only single H100 GPU, without leveraging distributed computing
Quantization limited to half precision
Does not include "knee point search" in LID curves
Theoretical assumptions require σ sufficiently small and negligible manifold curvature

Future Directions

Extend to larger-scale distributed experiments
Study performance under more extreme quantization conditions
Develop adaptive σ selection strategies
Explore applications on more complex manifold structures

In-Depth Evaluation

Strengths

Solid Theoretical Contribution: Provides rigorous mathematical proofs establishing fundamental connections between score matching and LID
Simple and Efficient Method: Requires no gradient computation or multiple forward passes, with high computational efficiency
Comprehensive Experiments: Covers synthetic manifolds, real data, and large-scale models
High Practical Value: Shows clear advantages in memory-constrained scenarios

Weaknesses

Theoretical Assumption Limitations: Requires conditions that σ be sufficiently small and manifold curvature be negligible
Architecture Dependency: Performance varies across different neural network architectures
Parameter Sensitivity: σ selection significantly impacts results
Limited Verification Scope: Primarily validated on relatively simple synthetic manifolds

Impact

Theoretical Value: Provides new perspective for understanding diffusion models and manifold learning
Practical Significance: Offers viable solution for large-scale LID estimation
Methodological Contribution: Demonstrates how to extract geometric information from training loss

Applicable Scenarios

Large-scale Data Analysis: Computationally and memory-constrained scenarios
Real-time LID Estimation: Applications requiring rapid response
Pre-trained Diffusion Models: Direct utilization of existing models for LID estimation
Manifold Learning Research: Tool for understanding data geometric structure

References

The paper cites multiple important related works, including:

Vincent (2011): Connection between score matching and denoising autoencoders
Hyvärinen (2005): Foundational theory of score matching
Kamkari et al. (2024): FLIPD method
Stanczuk et al. (2024): Normal bundle method
Related literature on diffusion models and flow matching

Overall Assessment: This is an excellent paper combining theory and practice, providing new theoretical perspectives and practical methods for LID estimation. While certain technical details could be improved, its core contributions hold significant value for understanding the geometric properties of diffusion models and advancing LID estimation methods.