2025-11-13T21:58:11.125664

Hypothesis testing for the dimension of random geometric graph

Yuan, Yu

Random geometric graphs (RGGs) offer a powerful tool for analyzing the geometric and dependence structures in real-world networks. For example, it has been observed that RGGs are a good model for protein-protein interaction networks. In RGGs, nodes are randomly distributed over an $m$-dimensional metric space, and edges connect the nodes if and only if their distance is less than some threshold. When fitting RGGs to real-world networks, the first step is probably to input or estimate the dimension $m$. However, it is not clear whether the prespecified dimension is equal to the true dimension. In this paper, we investigate this problem using hypothesis testing. Under the null hypothesis, the dimension is equal to a specific value, while the alternative hypothesis asserts the dimension is not equal to that value. We propose the first statistical test. Under the null hypothesis, the proposed test statistic converges in law to the standard normal distribution, and under the alternative hypothesis, the test statistic is unbounded in probability. We derive the asymptotic distribution by leveraging the asymptotic theory of degenerate U-statistics with kernel function dependent on the number of nodes. This approach differs significantly from prevailing methods used in network hypothesis testing problems. Moreover, we also propose an efficient approach to compute the test statistic based on the adjacency matrix. Simulation studies show that the proposed test performs well. We also apply the proposed test to multiple real-world networks to test their dimensions.

academic

Hypothesis testing for the dimension of random geometric graph

基本信息

论文ID: 2510.11844
标题: Hypothesis testing for the dimension of random geometric graph
作者: Mingao Yuan, Feng Yu (The University of Texas at El Paso)
分类: stat.ME (Statistics - Methodology)
发表时间: 2025年10月13日 (arXiv预印本)
论文链接: https://arxiv.org/abs/2510.11844

摘要

随机几何图(RGGs)为分析真实网络中的几何和依赖结构提供了强大工具。在RGGs中，节点随机分布在m维度量空间中，当且仅当节点间距离小于某个阈值时才通过边连接。在将RGGs拟合到真实网络时，首要步骤是输入或估计维度m。然而，预设维度是否等于真实维度并不明确。本文通过假设检验研究这一问题：零假设为维度等于特定值，备择假设为维度不等于该值。作者提出了首个统计检验方法，在零假设下检验统计量依分布收敛到标准正态分布，在备择假设下检验统计量在概率意义下无界。

研究背景与动机

问题定义

核心问题: 在将随机几何图拟合到真实网络时，如何验证预设或估计的维度m是否等于真实维度
实际需求: 现有研究中，研究者通常直接假设维度值（如蛋白质相互作用网络中假设m=2,3,4），但缺乏统计验证方法
应用重要性: RGGs广泛应用于蛋白质相互作用网络、社交网络、脑网络等多个领域

研究动机

方法论空白: 这是首个针对RGG维度的假设检验方法
理论挑战: 需要处理退化U统计量的渐近理论，其核函数依赖于网络规模
实用价值: 为网络分析提供rigorous的维度验证工具

核心贡献

首创性方法: 提出了第一个用于随机几何图维度假设检验的统计方法
理论创新:
- 基于退化U统计量理论建立了检验统计量的渐近分布
- 核函数依赖于样本量n，不同于标准U统计量理论
计算效率: 提供了基于邻接矩阵的高效计算方法，避免了多重嵌套循环
理论保证:
- 零假设下统计量收敛到标准正态分布
- 备择假设下检验功效趋于1
实证验证: 在模拟数据和6个真实网络上验证了方法的有效性

方法详解

任务定义

给定网络邻接矩阵A ~ G_n(m, r_n)，检验假设：

H_0: m = m_0 (零假设：维度等于预设值m_0)
H_1: m ≠ m_0 (备择假设：维度不等于m_0)

随机几何图模型

定义: 在单位正方形0,1^m上，节点X_i独立均匀分布，距离定义为：

d(X_i, X_j) = max_{1≤k≤m} {min{|X_{ik} - X_{jk}|, 1 - |X_{ik} - X_{jk}|}}

当d(X_i, X_j) ≤ r_n时，节点i和j之间存在边。

检验统计量构造

核心统计量D_n定义为：

D_n = Σ_{i≠j≠k} A_{ij}A_{jk}A_{ki} - (3/4)^{m_0} Σ_{i≠j≠k} A_{ij}A_{ik}

设计思想:

第一项计算网络中三角形数量
第二项是零假设下的期望修正
在零假设下D_n ≈ 0，在备择假设下D_n显著偏离0

渐近分布理论

主要定理: 在条件r_n = o(1)和nr_n^m = ω(1)下，零假设H_0下：

√(2D_n)/(n²σ̂_{n2}) ⇒ N(0,1)

其中方差估计量σ̂²_由五个统计量S_1至S_5的线性组合给出。

技术创新点

退化U统计量处理:
- 将D_n表示为退化U统计量形式
- 处理核函数依赖于n的非标准情况
- 应用Fan-Li (1996)的渐近理论

矩阵计算优化:

D_n = tr(A³) + 2tr(A) - (3/4)^{m_0}(1^T(A² - A)1 + 2tr(A))
S_1 = 1^T[A² ⊙ A² ⊙ A - A² ⊙ A]1

避免了O(n⁴)的嵌套循环计算

功效分析: 备择假设下统计量的阶为Θ(n√(r_n^m))，保证检验功效趋于1

实验设置

模拟实验

参数设置:
- 网络规模: n ∈ {40, 50, 60, 70, 100, 130}
- 连接半径: r_n ∈ {0.09, 0.10, 0.11, 0.27, 0.29, 0.31}
- 维度: m ∈ {1, 2, 3}
- 显著性水平: α = 0.05
实验设计:
- 第一类错误：生成1000个零假设下的网络
- 检验功效：生成1000个备择假设下的网络

真实数据

测试了6个真实网络：

化学信息学网络 (4个): ENZYMES系列，节点为化合物
脑网络 (1个): macaque-rhesus-brain-2，节点为脑区域
社交网络 (1个): reptilia-tortoise-network-bsv，乌龟社交网络

评价指标

第一类错误率: 零假设为真时拒绝的概率
检验功效: 备择假设为真时拒绝零假设的概率
p值: 用于实际网络的维度推断

实验结果

模拟结果

第一类错误控制:

所有设置下经验第一类错误率在0.040-0.064之间，接近名义水平0.05
表明渐近正态分布近似在有限样本下表现良好

检验功效:

H_0: m=1时，m=2的功效在0.920-1.000之间，m=3的功效在0.645-0.997之间
H_0: m=2时，m=1的功效恒为1.000，m=3的功效在0.927-1.000之间
功效随n和r_n增加而提高，符合理论预期

真实网络结果

网络	n	密度	推断维度	p值
ENZYMES-g147	40	0.210	m=2	0.696
ENZYMES-g196	50	0.138	m=3	0.653
ENZYMES-g532	74	0.085	m=5	0.140
macaque-rhesus-brain-2	91	0.152	m=3	0.161
reptilia-tortoise-network-bsv	136	0.040	m=4	0.162