A Connection Between Score Matching and Local Intrinsic Dimension

Yeats, Jacobson, Hannan et al.
The local intrinsic dimension (LID) of data is a fundamental quantity in signal processing and learning theory, but quantifying the LID of high-dimensional, complex data has been a historically challenging task. Recent works have discovered that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can accurately quantify LID, they require either many forward passes of the diffusion model or use of gradient computation, limiting their applicability in compute- and memory-constrained scenarios. We show that the LID is a lower bound on the denoising score matching loss, motivating use of the denoising score matching loss as a LID estimator. Moreover, we show that the equivalent implicit score matching loss also approximates LID via the normal dimension and is closely related to a recent LID estimator, FLIPD. Our experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, achieving superior accuracy and memory footprint under increasing problem size and quantization level.

Basic Information

  • Paper ID: 2510.12975
  • Title: A Connection Between Score Matching and Local Intrinsic Dimension
  • Authors: Eric Yeats, Aaron Jacobson, Darryl Hannan, Yiran Jia, Timothy Doster, Henry Kvinge, Scott Mahan (PNNL, UNC Chapel Hill, UC San Diego)
  • Classification: cs.LG stat.ML
  • Publication Venue/Date: Accepted at 3rd SPIGM Workshop @ NeurIPS 2025
  • Paper Link: https://arxiv.org/abs/2510.12975

Abstract

Local Intrinsic Dimension (LID) is a fundamental quantity in signal processing and learning theory; however, quantifying the LID of high-dimensional, complex data has historically been challenging. Recent research has found that diffusion models capture the LID of data through the spectra of their score estimates and through the rate of change of their density estimates under various noise perturbations. While these methods can quantify LID accurately, they require either many forward passes of the diffusion model or gradient computation, which limits their applicability in compute- and memory-constrained scenarios.

This paper shows that the LID is a lower bound on the denoising score matching loss, which motivates using that loss as an LID estimator. The authors further show that the equivalent implicit score matching loss approximates LID via the normal dimension and is closely related to the recent LID estimator FLIPD. Experiments on a manifold benchmark and with Stable Diffusion 3.5 indicate that the denoising score matching loss is a highly competitive and scalable LID estimator, with better accuracy and memory footprint as problem size and quantization level increase.

Research Background and Motivation

Problem Definition

High-dimensional data typically exhibits low-dimensional structure; this observation, known as the manifold hypothesis, is a core assumption in machine learning. Local Intrinsic Dimension (LID) is a fundamental quantity that captures this low-dimensional structure. For a point x, the LID is the number of dimensions required to losslessly encode data in the neighborhood of x.

Significance

  1. Signal Processing Implications: LID determines the limits of (local) compressibility of a distribution
  2. Deep Learning Value: Lower LID improves statistical efficiency of learning, making learning and generalization easier
  3. Practical Applications: Widely applied in engineering tasks such as anomaly detection, clustering, and segmentation

Limitations of Existing Methods

  1. Non-parametric Methods: Require substantial sampled data, are strongly influenced by hyperparameter selection, and fail to generalize in low-data settings
  2. Parametric Methods: While leveraging deep generative models for scalability, LIDL requires multiple generative models, and FLIPD and normal bundle methods require gradient computation or numerous forward passes

Research Motivation

Existing parametric LID estimation methods have limitations in computational and memory efficiency, particularly in large-scale applications. This paper aims to discover a more efficient and scalable LID estimation method.

Core Contributions

  1. Theoretical Contribution: Proves that denoising score matching loss has LID as a lower bound, providing theoretical foundation for its use as a scalable LID estimator
  2. Method Connection: Establishes close relationships between score matching loss and current leading estimators (FLIPD and normal bundle methods)
  3. Experimental Validation: Experiments on manifold benchmarks and Stable Diffusion 3.5/2.0 demonstrate that denoising score matching loss is a highly competitive LID estimator
  4. Practical Advantages: Demonstrates superior scalability in memory consumption and quantization consistency

Methodology Details

Task Definition

Given a point x sampled from a d-dimensional data manifold M⊂Rⁿ, estimate its local intrinsic dimension d. Input consists of high-dimensional data points, with output being the corresponding LID estimate.

Core Theory

Theorem 3.1: Denoising Score Matching Loss Lower Bound

For a random variable x sampled from a d-dimensional manifold M and a sufficiently small noise level σ > 0:

E_x[L_DSM(x,σ,θ)] ≥ d

where denoising score matching loss is defined as:

E_x[L_DSM(x,σ,θ)] := E_{x~p(x),ε~N(0,I)} σ²||ε/σ + s_θ(x+σε)||²

Proof Strategy:

  1. Decompose noise ε into tangent space and normal space components
  2. Tangent space components: expected squared error for each dimension is approximately 1
  3. Normal space components: expected squared error is approximately 0 due to manifold structure
  4. Summation yields LID as lower bound
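
The decomposition in the proof strategy can be checked numerically on a toy case. The sketch below is not the paper's experiment: it assumes a flat manifold (a d-dimensional linear subspace of Rⁿ carrying a standard Gaussian), so that the score of the noised density has a closed form, and it uses that exact score in place of a trained s_θ. The function name and default values are illustrative only.

```python
import numpy as np

def dsm_loss_on_flat_manifold(n=64, d=16, sigma=0.05, num_samples=20000, seed=0):
    """Monte Carlo DSM loss with the exact score of a noised flat Gaussian manifold."""
    rng = np.random.default_rng(seed)

    # Assumed toy data: standard Gaussian on the first d coordinates (tangent
    # directions), exactly zero on the remaining n - d (normal directions).
    x = np.zeros((num_samples, n))
    x[:, :d] = rng.standard_normal((num_samples, d))

    eps = rng.standard_normal((num_samples, n))
    x_tilde = x + sigma * eps

    # The noised density is N(0, diag(var)), so its score is -x_tilde / var.
    var = np.full(n, sigma**2)
    var[:d] += 1.0
    score = -x_tilde / var

    # sigma^2 * ||eps / sigma + s||^2 is the same quantity as ||eps + sigma * s||^2.
    residual = eps + sigma * score
    tangent_part = np.sum(residual[:, :d] ** 2, axis=1).mean()
    normal_part = np.sum(residual[:, d:] ** 2, axis=1).mean()
    return tangent_part + normal_part, tangent_part, normal_part

total, tangent_part, normal_part = dsm_loss_on_flat_manifold()
print(f"DSM loss: {total:.2f} (tangent {tangent_part:.2f}, normal {normal_part:.2e})")
```

In this toy setting each tangent direction contributes 1/(1+σ²) in expectation, so the total sits slightly below d for finite σ and approaches d as σ shrinks, while the normal part vanishes up to floating-point error.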

Theorem 3.3: Implicit Score Matching Loss Lower Bound

E_{x̃}[L_ISM(x̃,σ,θ)] ≥ -(n-d)

This indicates that implicit score matching loss has negative normal dimension as a lower bound.

Connections to Existing Methods

Relationship with FLIPD

FLIPD computation at point x is:

FLIPD(x,σ,θ) := L_ISM(x,σ,θ) + σ²/2||s_θ(x)||² + n

Through Theorem 3.3, it can be proven that:

E_{x̃}[FLIPD(x̃,σ,θ)] ≥ d
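
The step from Theorem 3.3 to this bound can be written out using only the definition of FLIPD given above and the fact that a squared norm is non-negative. The following is a reading of those two formulas in the summary's notation, with the score evaluated at the noised point x̃, rather than a reproduction of the paper's proof:

```latex
\begin{align*}
\mathbb{E}_{\tilde{x}}\big[\mathrm{FLIPD}(\tilde{x},\sigma,\theta)\big]
  &= \mathbb{E}_{\tilde{x}}\big[L_{\mathrm{ISM}}(\tilde{x},\sigma,\theta)\big]
     + \frac{\sigma^{2}}{2}\,\mathbb{E}_{\tilde{x}}\big[\lVert s_\theta(\tilde{x})\rVert^{2}\big]
     + n \\
  &\ge -(n-d) + 0 + n \\
  &= d .
\end{align*}
```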

Relationship with Normal Bundle Method

The normal bundle method computes the singular values of an m×n matrix of score estimates, while the proposed error bundle method computes the eigenvalues of the Gram matrix of error vectors. The denoising loss equals the trace of this Gram matrix (the sum of its eigenvalues) and remains accurate with small samples.
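
The two views can be compared side by side on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: it reuses a flat Gaussian manifold with a closed-form score, places the query point at the origin so the tangential score is negligible, and picks the normal dimension from the largest ratio between consecutive singular values, which is a simple heuristic chosen here. All names and defaults are hypothetical.

```python
import numpy as np

def bundle_lid_estimates(n=32, d=8, sigma=0.05, m=128, seed=0):
    """Compare a singular-value count with a Gram-matrix trace on a flat toy manifold."""
    rng = np.random.default_rng(seed)

    # Assumed setup: data ~ N(0, I_d) on the first d coordinates, zero elsewhere.
    # The query point is the origin, where the tangential score is negligible.
    x = np.zeros(n)
    eps = rng.standard_normal((m, n))
    x_tilde = x + sigma * eps

    # Exact score of the noised density N(0, diag(var)).
    var = np.full(n, sigma**2)
    var[:d] += 1.0
    scores = -x_tilde / var  # shape (m, n)

    # Normal-bundle view: large singular values span the normal directions.
    singular_values = np.linalg.svd(scores, compute_uv=False)  # descending order
    gap_index = int(np.argmax(singular_values[:-1] / singular_values[1:]))
    normal_dim = gap_index + 1
    lid_from_singular_values = n - normal_dim

    # Error-vector view: trace of the Gram matrix = mean squared error norm.
    errors = eps + sigma * scores
    gram = errors @ errors.T / m  # shape (m, m)
    lid_from_trace = float(np.trace(gram))
    return lid_from_singular_values, lid_from_trace

print(bundle_lid_estimates())
```

The trace needs no spectral decomposition at all: it is the average squared norm of the error vectors, which is the denoising loss evaluated at the query point.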

Experimental Setup

Datasets

Manifolds with known LID from scikit-dimension package:

  • Hyperspheres and hyperballs with d=16, n=64
  • HyperTwinPeaks with d=128, n=256
  • Clifford torus and nonlinear manifolds with d=32, n=128
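
For intuition about what such a benchmark manifold looks like, a d-dimensional hypersphere can be embedded in Rⁿ as sketched below. This is a hand-rolled NumPy stand-in, not the scikit-dimension generator used in the paper; the function name, the zero-padding and the random rotation are assumptions made for illustration.

```python
import numpy as np

def sample_embedded_hypersphere(num_points, d=16, n=64, seed=0):
    """Sample a unit d-sphere (known LID = d) rotated into n ambient dimensions."""
    if n < d + 1:
        raise ValueError("a d-sphere needs at least d + 1 ambient dimensions")
    rng = np.random.default_rng(seed)

    # Normalized Gaussian vectors are uniform on the unit d-sphere in R^(d+1).
    g = rng.standard_normal((num_points, d + 1))
    sphere = g / np.linalg.norm(g, axis=1, keepdims=True)

    # Zero-pad to n ambient dimensions.
    padded = np.zeros((num_points, n))
    padded[:, : d + 1] = sphere

    # Apply a random orthogonal map (Q factor of a Gaussian matrix) so the
    # manifold is not axis-aligned.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return padded @ q.T

points = sample_embedded_hypersphere(500)
print(points.shape)
```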

Model Architecture

  1. DiT (Diffusion Transformer): patch size=4, hidden dim=128, 16 attention heads, 8 layers
  2. MLP: with skip connections, similar to architecture used in FLIPD

Evaluation Metrics

  • Primary Metric: Mean Absolute Error (MAE) between true LID and estimated LID
  • Secondary Metrics: Peak GPU memory usage, performance changes after quantization

Comparison Methods

  • Non-parametric Methods: MLE, TwoNN, ESS
  • Parametric Methods: FLIPD
  • Noise Levels: σ = 0.01, 0.02, 0.05

Experimental Results

Main Results

Manifold Benchmark Experiments

Key Findings from Table 1:

  1. DiT Architecture:
    • Denoising loss method average MAE: 2.21 (σ=0.05)
    • FLIPD average MAE: 23.05 (σ=0.05)
    • Significant differences on high-dimensional high-curvature manifolds
  2. MLP Architecture:
    • Denoising loss method average MAE: 7.27 (σ=0.05)
    • FLIPD average MAE: 11.11 (σ=0.05)
    • FLIPD performs better with the MLP than with the DiT (MAE 11.11 vs. 23.05) but still trails the denoising loss (7.27)
  3. Non-parametric Methods:
    • ESS performs best among the non-parametric methods: MAE 7.12 (k=100)
    • Severe performance degradation on high-dimensional manifolds

Scalability Experiments

Figure 2 Results:

  • Both parametric methods maintain low MAE as manifold dimension increases
  • FLIPD memory usage grows rapidly due to gradient computation
  • Denoising loss method shows slow memory growth

Stable Diffusion Experiments

SD 3.5 Experimental Findings

  1. Correlation: FLIPD and denoising loss estimates are highly correlated
  2. Numerical Differences: FLIPD typically provides higher LID estimates
  3. Quantization Stability: Denoising loss shows smaller changes after quantization
  4. Memory Efficiency: Denoising loss peak memory approximately 60% of FLIPD

SD 2.0 Experiments

  • Similar high correlation patterns
  • FLIPD produces negative values at high noise levels (invalid estimates)
  • Attributed to high Lipschitz constant of U-Net architecture

Ablation Studies

Experiments with different σ values reveal:

  • σ=0.05 typically yields best performance
  • Smaller σ values may cause numerical instability
  • DiT architecture is more robust to σ selection

Non-parametric LID Estimation

  • MLE Method: Models the number of neighbors within a small radius as a Poisson process and fits the dimension by maximum likelihood
  • TwoNN Method: Analyzes ratio of second and first nearest neighbor distances
  • ESS Method: Measures simplex volume skewness formed by points and their neighbors
  • Fractal Dimension Method: Handles self-similar or fractal structure data
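
As a concrete example of the non-parametric family, the nearest-neighbor MLE estimator can be sketched in a few lines. This follows the standard Levina-Bickel form, in which the local estimate is the inverse of the mean log-ratio between the k-th neighbor distance and the closer neighbor distances. It is a brute-force NumPy illustration with hypothetical names, not the implementation benchmarked in the paper.

```python
import numpy as np

def mle_lid(data, k=20):
    """Per-point nearest-neighbor MLE estimates of local intrinsic dimension."""
    # Brute-force pairwise Euclidean distances (fine for a few thousand points).
    sq_norms = np.sum(data**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    np.fill_diagonal(sq_dists, 0.0)
    dists = np.sqrt(sq_dists)

    # After sorting each row, column 0 is the point itself; keep neighbors 1..k.
    knn = np.sort(dists, axis=1)[:, 1 : k + 1]

    # Inverse mean of log(T_k / T_j) over the k - 1 closer neighbors.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    return (k - 1) / np.sum(log_ratios, axis=1)

# Toy check: a 3-dimensional Gaussian zero-padded into 10 ambient dimensions.
rng = np.random.default_rng(0)
toy = np.zeros((1500, 10))
toy[:, :3] = rng.standard_normal((1500, 3))
print(f"mean MLE estimate: {mle_lid(toy).mean():.2f}")
```

The estimate depends on the neighborhood size k and on having enough samples near each query point, which is the hyperparameter and data sensitivity noted for non-parametric methods under Limitations of Existing Methods.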

Parametric LID Estimation

  • LIDL: Uses ensemble of models with normalizing flows
  • Normal Bundle Method: Counts singular values of score estimate matrix
  • FLIPD: Uses Fokker-Planck equation, requires single diffusion model

Conclusions and Discussion

Main Conclusions

  1. Denoising score matching loss provides theoretically grounded lower bound for LID
  2. The method achieves good balance between accuracy and computational efficiency
  3. Possesses deep theoretical connections with existing state-of-the-art methods

Theoretical Insights

  1. Constant Term Interpretation: C_DSM equals the negative of the average data LID
  2. Multi-scale Training: Training at each noise scale can be viewed as identifying the average LID of the noise manifold at that scale
  3. Likelihood Computation: The model may attribute higher likelihood where the learned normal dimension is higher

Limitations

  1. Experiments use only a single H100 GPU and do not leverage distributed computing
  2. Quantization is limited to half precision
  3. The evaluation does not include a "knee point search" on LID curves
  4. The theory assumes that σ is sufficiently small and that manifold curvature is negligible

Future Directions

  1. Extend to larger-scale distributed experiments
  2. Study performance under more extreme quantization conditions
  3. Develop adaptive σ selection strategies
  4. Explore applications on more complex manifold structures

In-Depth Evaluation

Strengths

  1. Solid Theoretical Contribution: Provides rigorous mathematical proofs establishing fundamental connections between score matching and LID
  2. Simple and Efficient Method: Requires no gradient computation or multiple forward passes, with high computational efficiency
  3. Comprehensive Experiments: Covers synthetic manifolds, real data, and large-scale models
  4. High Practical Value: Shows clear advantages in memory-constrained scenarios

Weaknesses

  1. Theoretical Assumption Limitations: Requires conditions that σ be sufficiently small and manifold curvature be negligible
  2. Architecture Dependency: Performance varies across different neural network architectures
  3. Parameter Sensitivity: σ selection significantly impacts results
  4. Limited Verification Scope: Primarily validated on relatively simple synthetic manifolds

Impact

  1. Theoretical Value: Provides new perspective for understanding diffusion models and manifold learning
  2. Practical Significance: Offers viable solution for large-scale LID estimation
  3. Methodological Contribution: Demonstrates how to extract geometric information from training loss

Applicable Scenarios

  1. Large-scale Data Analysis: Computationally and memory-constrained scenarios
  2. Real-time LID Estimation: Applications requiring rapid response
  3. Pre-trained Diffusion Models: Direct utilization of existing models for LID estimation
  4. Manifold Learning Research: Tool for understanding data geometric structure

References

The paper cites multiple important related works, including:

  • Vincent (2011): Connection between score matching and denoising autoencoders
  • Hyvärinen (2005): Foundational theory of score matching
  • Kamkari et al. (2024): FLIPD method
  • Stanczuk et al. (2024): Normal bundle method
  • Related literature on diffusion models and flow matching

Overall Assessment: This is an excellent paper combining theory and practice, providing new theoretical perspectives and practical methods for LID estimation. While certain technical details could be improved, its core contributions hold significant value for understanding the geometric properties of diffusion models and advancing LID estimation methods.