Hypothesis testing for the dimension of random geometric graph

Yuan, Yu
Random geometric graphs (RGGs) offer a powerful tool for analyzing the geometric and dependence structures in real-world networks. For example, it has been observed that RGGs are a good model for protein-protein interaction networks. In RGGs, nodes are randomly distributed over an $m$-dimensional metric space, and edges connect the nodes if and only if their distance is less than some threshold. When fitting RGGs to real-world networks, the first step is probably to input or estimate the dimension $m$. However, it is not clear whether the prespecified dimension is equal to the true dimension. In this paper, we investigate this problem using hypothesis testing. Under the null hypothesis, the dimension is equal to a specific value, while the alternative hypothesis asserts the dimension is not equal to that value. We propose the first statistical test. Under the null hypothesis, the proposed test statistic converges in law to the standard normal distribution, and under the alternative hypothesis, the test statistic is unbounded in probability. We derive the asymptotic distribution by leveraging the asymptotic theory of degenerate U-statistics with kernel function dependent on the number of nodes. This approach differs significantly from prevailing methods used in network hypothesis testing problems. Moreover, we also propose an efficient approach to compute the test statistic based on the adjacency matrix. Simulation studies show that the proposed test performs well. We also apply the proposed test to multiple real-world networks to test their dimensions.

Basic Information

  • Paper ID: 2510.11844
  • Title: Hypothesis testing for the dimension of random geometric graph
  • Authors: Mingao Yuan, Feng Yu (The University of Texas at El Paso)
  • Classification: stat.ME (Statistics - Methodology)
  • Publication Date: October 13, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.11844

Abstract

Random geometric graphs (RGGs) provide powerful tools for analyzing geometric and dependency structures in real-world networks. In RGGs, nodes are randomly distributed in an m-dimensional metric space and connected by edges if and only if the distance between nodes is below a certain threshold. When fitting RGGs to real networks, a primary step is to input or estimate the dimension m. However, it remains unclear whether the preset dimension equals the true dimension. This paper addresses this question through hypothesis testing: the null hypothesis states that the dimension equals a specific value, while the alternative hypothesis states that the dimension differs from that value. The authors propose the first statistical testing method, where the test statistic converges in distribution to a standard normal distribution under the null hypothesis and becomes unbounded in probability under the alternative hypothesis.

Research Background and Motivation

Problem Definition

  1. Core Problem: When fitting random geometric graphs to real networks, how can one verify whether the preset or estimated dimension m equals the true dimension?
  2. Practical Need: In existing research, researchers typically assume dimension values directly (e.g., assuming m=2,3,4 in protein interaction networks), but lack statistical verification methods
  3. Application Importance: RGGs are widely applied in protein interaction networks, social networks, brain networks, and other domains

Research Motivation

  1. Methodological Gap: This is the first hypothesis testing method for RGG dimension
  2. Theoretical Challenge: Requires handling asymptotic theory of degenerate U-statistics with kernel functions dependent on network size
  3. Practical Value: Provides rigorous dimension verification tools for network analysis

Core Contributions

  1. Novel Method: Proposes the first statistical method for hypothesis testing of random geometric graph dimension
  2. Theoretical Innovation:
    • Establishes asymptotic distribution of test statistic based on degenerate U-statistics theory
    • Kernel function depends on the number of nodes n, a case not covered by standard U-statistics theory
  3. Computational Efficiency: Provides efficient computation methods based on adjacency matrices, avoiding multiple nested loops
  4. Theoretical Guarantees:
    • Test statistic converges to standard normal distribution under null hypothesis
    • Test power approaches 1 under alternative hypothesis
  5. Empirical Verification: Validates method effectiveness on simulated data and 6 real networks

Methodology Details

Task Definition

Given the adjacency matrix A of a network drawn from G_n(m, r_n) (an RGG with n nodes in dimension m and connection radius r_n), test the hypotheses:

  • H_0: m = m_0 (null hypothesis: dimension equals preset value m_0)
  • H_1: m ≠ m_0 (alternative hypothesis: dimension differs from m_0)

Random Geometric Graph Model

Definition: Nodes X_1, ..., X_n are independent and uniformly distributed on the unit hypercube [0,1]^m. Distance is the ℓ∞ distance on the torus (each coordinate wraps around):

d(X_i, X_j) = max_{1≤k≤m} {min{|X_{ik} - X_{jk}|, 1 - |X_{ik} - X_{jk}|}}

Nodes i and j are connected by an edge when d(X_i, X_j) ≤ r_n.
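The model definition above can be illustrated with a minimal Python sampler. It is a sketch under the stated assumptions: nodes are uniform on the torus [0,1]^m, distance is the toroidal ℓ∞ distance, and an edge is placed iff the distance is at most r. The helper names `torus_linf_distance` and `sample_rgg` are hypothetical and not taken from the paper's code.

```python
import random

def torus_linf_distance(x, y):
    """Toroidal l-infinity distance: max over coordinates of min(|x_k - y_k|, 1 - |x_k - y_k|)."""
    return max(min(abs(a - b), 1 - abs(a - b)) for a, b in zip(x, y))

def sample_rgg(n, m, r, seed=None):
    """Sample an RGG G_n(m, r) on the unit torus [0,1]^m.

    Returns (A, points): A is an n x n symmetric 0/1 adjacency matrix with zero
    diagonal, and points holds the n latent positions X_i.
    """
    rng = random.Random(seed)
    # Latent positions: i.i.d. uniform on [0,1]^m
    points = [[rng.random() for _ in range(m)] for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge iff the toroidal distance is at most the radius r
            if torus_linf_distance(points[i], points[j]) <= r:
                A[i][j] = A[j][i] = 1
    return A, points
```

For example, `A, points = sample_rgg(100, 2, 0.1, seed=1)` draws one network at a size and radius within the simulation grid. The pairwise loop costs O(n² m), which is adequate for networks of a few hundred nodes.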

Test Statistic Construction

The core statistic D_n is defined as:

D_n = Σ_{i≠j≠k} A_{ij}A_{jk}A_{ki} - (3/4)^{m_0} Σ_{i≠j≠k} A_{ij}A_{ik}

Design Rationale:

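  • First term counts triangles: the sum runs over ordered triples of distinct nodes, so each triangle is counted 6 times
  • Second term counts two-paths (wedges) j-i-k, scaled by (3/4)^{m_0}, the probability that a wedge closes into a triangle under H_0, so it has the same expectation as the first term under the null
  • Under H_0, D_n is centered at 0; under H_1 the closing probability is (3/4)^m ≠ (3/4)^{m_0}, so D_n deviates systematically from 0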
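The (3/4)^{m_0} factor can be motivated by a short calculation. This is a sketch that assumes r_n ≤ 1/4, which holds for large n since r_n = o(1). Under that assumption |U_ℓ − V_ℓ| ≤ 2r_n ≤ 1/2, so the torus wrap-around can be ignored when comparing X_j and X_k. After rescaling by r_n, the two corner triangles {|u − v| > 1} of the square [−1,1]² have total area 1.

```latex
\begin{aligned}
&\text{Given } A_{ij}=A_{ik}=1:\quad U_\ell = X_{j\ell}-X_{i\ell},\ V_\ell = X_{k\ell}-X_{i\ell}\ (\mathrm{mod}\ 1),\ \ell=1,\dots,m,
  \text{ are independent } \mathrm{Unif}[-r_n,r_n]. \\
&\Pr\big(|U_\ell-V_\ell|\le r_n\big) = 1-\frac{\operatorname{area}\{(u,v)\in[-1,1]^2 : |u-v|>1\}}{4} = 1-\frac{1}{4} = \frac{3}{4}. \\
&\Pr\big(A_{jk}=1 \mid A_{ij}=A_{ik}=1\big) = \prod_{\ell=1}^{m}\Pr\big(|U_\ell-V_\ell|\le r_n\big) = \left(\tfrac{3}{4}\right)^{m}. \\
&\mathbb{E}\big[A_{ij}A_{jk}A_{ki}\big] = \left(\tfrac{3}{4}\right)^{m}\mathbb{E}\big[A_{ij}A_{ik}\big]
  \;\Longrightarrow\; \mathbb{E}[D_n] = 0 \text{ when } m = m_0.
\end{aligned}
```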

Asymptotic Distribution Theory

Main Theorem: Suppose r_n = o(1) and n r_n^m = ω(1). Under the null hypothesis H_0,

√2 · D_n / (n² σ̂_{n2}) ⇒ N(0,1)

where the variance estimator σ̂²_{n2} is a linear combination of five statistics S_1 through S_5.
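Once the standardized statistic from the theorem has been computed, the two-sided decision rule and p-value follow from the standard normal limit. This is a minimal sketch that assumes `t_stat` is that already-standardized value. Computing σ̂_{n2} itself needs the S_1 to S_5 formulas from the paper, which are not reproduced here. `two_sided_normal_test` is a hypothetical helper name.

```python
import math

def two_sided_normal_test(t_stat: float, alpha: float = 0.05) -> tuple[float, bool]:
    """Two-sided test of H_0 based on the N(0,1) limit of the standardized statistic.

    Returns (p_value, reject_H0).
    """
    # p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)), with Phi the standard normal CDF
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return p_value, p_value < alpha
```

For instance, `two_sided_normal_test(2.5)` gives a p-value of about 0.012, so H_0 is rejected at α = 0.05. This is equivalent to rejecting whenever |t| exceeds roughly 1.96.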

Technical Innovations

  1. Degenerate U-statistics Handling:
    • Expresses D_n as a degenerate U-statistic
    • Handles the non-standard case where the kernel function depends on n
    • Applies the asymptotic theory of Fan & Li (1996)
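  2. Matrix Computation Optimization: D_n and the variance components are expressed through matrix products, for example
    D_n = tr(A³) + 2tr(A) - (3/4)^{m_0}(1^T(A² - A)1 + 2tr(A))
    S_1 = 1^T[A² ⊙ A² ⊙ A - A² ⊙ A]1
    where 1 is the all-ones vector and ⊙ is the entrywise (Hadamard) product. For a simple graph A has zero diagonal, so tr(A) = 0 and the 2tr(A) terms vanish. This avoids O(n⁴) nested-loop computation.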
  3. Power Analysis: Under the alternative hypothesis, the test statistic has order Θ(n√(r_n^m)). Since n√(r_n^m) = √(n · n r_n^m) and n r_n^m = ω(1), this order diverges, so the test power approaches 1
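The matrix identity for D_n can be checked against the direct triple-sum definition. The sketch below is pure Python so that it stays self-contained, and the helper names are hypothetical. A practical implementation would call optimized or sparse matrix routines instead. It relies on a simple graph having zero diagonal, so the 2tr(A) terms drop out.

```python
import itertools

def matmul(X, Y):
    """Naive dense matrix product of two lists-of-lists."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(X))]

def D_n_loops(A, m0):
    """Direct definition: sums over ordered triples (i, j, k) of distinct nodes."""
    n = len(A)
    triangles = wedges = 0
    for i, j, k in itertools.permutations(range(n), 3):
        triangles += A[i][j] * A[j][k] * A[k][i]
        wedges += A[i][j] * A[i][k]
    return triangles - 0.75 ** m0 * wedges

def D_n_matrix(A, m0):
    """Matrix form tr(A^3) - (3/4)^m0 * 1^T (A^2 - A) 1, valid when diag(A) = 0."""
    n = len(A)
    A2 = matmul(A, A)
    # tr(A^3) = sum_i (A^2 A)_ii = sum_{i,j} (A^2)_ij * A_ji
    tr_A3 = sum(A2[i][j] * A[j][i] for i in range(n) for j in range(n))
    # 1^T (A^2 - A) 1 = sum_i d_i^2 - sum_i d_i, where d_i is the degree of node i
    wedge_count = sum(A2[i][j] - A[i][j] for i in range(n) for j in range(n))
    return tr_A3 - 0.75 ** m0 * wedge_count
```

On the triangle graph K_3 with m_0 = 1, both functions return 6 − 0.75 · 6 = 1.5. In this naive form both are O(n³). The benefit of the matrix form is that it maps directly onto fast dense or sparse matrix multiplication.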

Experimental Setup

Simulation Experiments

  1. Parameter Settings:
    • Network size: n ∈ {40, 50, 60, 70, 100, 130}
    • Connection radius: r_n ∈ {0.09, 0.10, 0.11, 0.27, 0.29, 0.31}
    • Dimension: m ∈ {1, 2, 3}
    • Significance level: α = 0.05
  2. Experimental Design:
    • Type I error: generate 1000 networks under the null hypothesis and record the rejection rate
    • Test power: generate 1000 networks under the alternative hypothesis and record the rejection rate

Real Data

Tested on 6 real networks:

  1. Cheminformatics Networks (4): ENZYMES series, nodes as compounds
  2. Brain Network (1): macaque-rhesus-brain-2, nodes as brain regions
  3. Social Network (1): reptilia-tortoise-network-bsv, tortoise social network

Evaluation Metrics

  1. Type I Error Rate: Probability of rejecting null hypothesis when true
  2. Test Power: Probability of rejecting null hypothesis when alternative is true
  3. p-value: Used for dimension inference on real networks

Experimental Results

Simulation Results

Type I Error Control:

  • Empirical Type I error rates across all settings range from 0.040 to 0.064, close to the nominal level 0.05
  • Indicates that the asymptotic normal approximation performs well in finite samples
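One way to judge how close 0.040 to 0.064 is to 0.05 is to compute the Monte Carlo error of a rejection rate estimated from 1000 replications. This rough sketch assumes the normal approximation to the binomial. `mc_band` is a hypothetical helper.

```python
import math

def mc_band(alpha: float = 0.05, reps: int = 1000, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band for an empirical rejection rate from `reps`
    independent replications when the true rate is `alpha`."""
    # Binomial standard error of a proportion estimated from `reps` trials
    se = math.sqrt(alpha * (1 - alpha) / reps)
    return alpha - z * se, alpha + z * se
```

`mc_band()` returns approximately (0.0365, 0.0635). The reported rates therefore lie within, or just at the edge of, the sampling variability expected from 1000 replications.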

Test Power:

  • H_0: m = 1: power against m = 2 ranges from 0.920 to 1.000; power against m = 3 ranges from 0.645 to 0.997
  • H_0: m = 2: power against m = 1 is consistently 1.000; power against m = 3 ranges from 0.927 to 1.000
  • Power increases with n and r_n, consistent with theoretical predictions

Real Network Results

Network | n | Density | Inferred dimension | p-value
ENZYMES-g147 | 40 | 0.210 | m = 2 | 0.696
ENZYMES-g196 | 50 | 0.138 | m = 3 | 0.653
ENZYMES-g532 | 74 | 0.085 | m = 5 | 0.140
macaque-rhesus-brain-2 | 91 | 0.152 | m = 3 | 0.161
reptilia-tortoise-network-bsv | 136 | 0.040 | m = 4 | 0.162

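Key Finding: Different networks exhibit different dimensions (from m = 2 to m = 5), emphasizing the importance of dimension testing. All reported p-values exceed 0.05, so the listed dimension is not rejected at the 5% level for any of these networks.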

Related Work

Random Geometric Graph Theory

  1. Classical Literature: Foundational work by Penrose et al.
  2. Recent Developments: Survey by Duchemin & De Castro (2023)
  3. Dimension Estimation: Consistent estimation methods by Atamanchuk et al. (2024)

Network Hypothesis Testing

  1. Graph Structure Testing: Gao & Lafferty (2017), Jin et al. (2018)
  2. Community Structure Testing: Lei (2016), Yuan et al. (2022)
  3. Paper Innovation: First hypothesis testing for geometric graph dimension

Application Domains

  1. Biological Networks: Applications in protein networks by Higham et al. (2008)
  2. Brain Networks: Functional connectivity network analysis
  3. Social Networks: Opinion propagation and spatial distribution modeling

Conclusions and Discussion

Main Conclusions

  1. Theoretical Contribution: Establishes complete theoretical framework for RGG dimension hypothesis testing
  2. Method Validity: Simulation and empirical results verify method reliability
  3. Practical Value: Provides important statistical tools for network analysis

Limitations

  1. Model Assumptions:
    • Assumes nodes uniformly distributed on unit hypercube
    • Uses specific distance metric function
    • Requires sparse networks (r_n = o(1))
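  2. Computational Complexity: The matrix formulas avoid nested loops but still rely on products such as A² and A³, which take O(n³) time and O(n²) memory with dense matrices, so ultra-large-scale networks may remain challenging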
  3. Dimension Range: Primarily validated in low-dimensional cases; high-dimensional performance requires further investigation

Future Directions

  1. Model Extension: Consider non-uniform distributions, other distance metrics
  2. High-Dimensional Cases: Develop testing methods for high-dimensional RGGs
  3. Multiple Testing: Methods for simultaneously testing multiple dimension values
  4. Bayesian Methods: Develop Bayesian inference methods for dimension

In-Depth Evaluation

Strengths

  1. Theoretical Rigor:
    • Based on solid U-statistics theory
    • Complete asymptotic analysis and power study
    • Rigorous mathematical proofs
  2. Methodological Innovation:
    • First RGG dimension testing method
    • Clever test statistic design
    • Efficient computational implementation
  3. Comprehensive Experiments:
    • Sufficient simulation verification
    • Diverse real network testing
    • Detailed performance analysis
  4. Practical Value:
    • Addresses practical needs
    • Easy to implement and apply
    • Lays foundation for subsequent research

Weaknesses

  1. Application Scope:
    • Only applicable to sparse networks
    • Sensitive to model assumptions
    • Real networks may not fully conform to RGG model
  2. Method Limitations:
    • Only enables two-sided testing
    • Does not account for estimation error effects
    • Robustness to outliers insufficiently studied
  3. Experimental Depth:
    • Relatively limited number of real networks
    • Lacks comparison with other dimension estimation methods
    • Insufficient analysis of method failure cases

Impact

  1. Academic Value:
    • Fills important methodological gap
    • Provides new tools for network analysis
    • May catalyze related research directions
  2. Practical Significance:
    • Direct applications in bioinformatics, social network analysis
    • Improves scientific rigor of network modeling
    • Provides statistical basis for model selection
  3. Reproducibility:
    • Provides detailed computational formulas
    • Clear algorithm description
    • Facilitates software implementation

Applicable Scenarios

  1. Biological Networks: Dimension verification for protein interaction networks
  2. Social Networks: Dimension selection for spatial embedding models
  3. Brain Networks: Geometric structure analysis of functional connectivity networks
  4. Communication Networks: Topology analysis of wireless sensor networks

References

This paper cites 40 references covering random geometric graph theory, network analysis, and statistical theory, providing a solid theoretical foundation. Key references include Fan & Li (1996) on U-statistics theory, Higham et al. (2008) on protein network applications, and recent related survey articles.


Overall Assessment: This is a high-quality statistical methodology paper with excellent performance in theoretical innovation, method design, and experimental verification. Despite some limitations, it makes important contributions to the network analysis field with significant academic value and practical significance.