Bootstrap tests for almost goodness-of-fit

Baíllo, Cárcamo

We introduce the \textit{almost goodness-of-fit} test, a procedure to assess whether a (parametric) model provides a good representation of the probability distribution generating the observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{ G(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$, we consider the testing problem \[ H_0: \| F - G(\boldsymbol{\theta}_F) \|_p \geq \epsilon \quad \text{vs} \quad H_1: \| F - G(\boldsymbol{\theta}_F) \|_p < \epsilon, \] where $\epsilon>0$ is a margin of error and $G(\boldsymbol{\theta}_F)$ denotes a representative of $F$ within the parametric class. The approximate model is determined via an M-estimator of the parameters. The objective is the approximate validation of a distribution or an entire parametric family up to a pre-specified threshold value. The methodology also quantifies the percentage improvement of the proposed model relative to a non-informative (constant) benchmark. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated model. We present two consistent, easy-to-implement, and flexible bootstrap schemes to carry out the test. The performance of the proposal is illustrated through simulation studies and real-data applications.

Basic information

Paper ID: 2410.20918
Title: Bootstrap tests for almost goodness-of-fit
Authors: Amparo Baíllo (Universidad Autónoma de Madrid), Javier Cárcamo (Universidad del País Vasco)
Categories: stat.ME (Statistical Methodology), math.ST (Mathematical Statistics), stat.AP (Applied Statistics), stat.TH (Statistics Theory)
Published: October 15, 2025 (arXiv preprint)
Paper link: https://arxiv.org/abs/2410.20918

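Abstract

This paper introduces the "almost goodness-of-fit" (AGoF) test, which assesses whether a parametric model represents the probability distribution of the observed sample well. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{G(\theta) : \theta \in \Theta\}$, it considers the testing problem $H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon$, where $\epsilon > 0$ is the margin of error and $G(\theta_F)$ denotes the representative of $F$ within the parametric class. The approximate model is determined by an M-estimator, and two consistent, easy-to-implement bootstrap schemes are provided to carry out the test.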

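Research background and motivation

Problem background

Classical goodness-of-fit tests share a fundamental problem: they place the statement "the model is a reasonable approximation of the data" in the null hypothesis $H_0$. They can therefore only provide statistical evidence of lack of fit, never evidence of actual goodness of fit.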

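Research motivation

Limitations of classical GoF tests: classical methods can only reject a model; they cannot validate its adequacy
Practical need: in practice, what matters is whether the model is "good enough", not whether it is exactly correct
Importance of approximate modeling: few models describe real data perfectly, so some degree of deviation must be tolerated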

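Shortcomings of existing methods

Kolmogorov-Smirnov-type statistics have complicated, non-Gaussian limiting distributions when parameters are estimated
Bootstrap methods are typically inconsistent when estimating the sup-norm
There is no unified framework for the approximate validation of parametric families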

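Core contributions

Proposes the AGoF testing framework: placing "approximate fit" in the alternative hypothesis makes it possible to provide statistical evidence for the model's adequacy
Uses the $L^p$ distance: compared with the traditional supremum norm, the $L^p$ norm has better theoretical properties and computational advantages
Develops two bootstrap schemes: proves their consistency and provides practical implementation algorithms
Introduces the AGoF statistic: quantifies the percentage improvement of the model relative to a non-informative benchmark
Provides a complete theoretical analysis: asymptotic distributions, bootstrap consistency, and related theoretical guarantees

Hadamard differentiability: for $1 < p < \infty$, the $L^p$ norm is Hadamard differentiable, which allows the functional delta method to be applied
Gaussian limit: under general assumptions, the asymptotic distribution is Gaussian
Bootstrap consistency: under suitable conditions, the standard bootstrap estimator is consistent
Flexibility: the value of $p$ can be tuned to control sensitivity to the tails of the distribution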
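
To make the $L^p$ statistic concrete, the following minimal Python sketch approximates $\|F_n - G(\hat{\theta}_n)\|_p$ for an exponential model fitted by maximum likelihood (one example of an M-estimator). The function names, the midpoint-rule integration, the truncation of the integration range, and the Weibull parameterization used in the demo are illustrative assumptions, not the authors' implementation.

```python
import bisect
import math
import random


def lp_distance(sample, model_cdf, p, lower, upper, grid_size=4000):
    """Midpoint-rule approximation of ||F_n - G||_p over [lower, upper].

    F_n is the empirical distribution function of `sample`; `model_cdf`
    is the fitted model G. The range [lower, upper] is an assumption and
    should cover the region where F_n and G differ noticeably.
    """
    xs = sorted(sample)
    n = len(xs)
    step = (upper - lower) / grid_size
    total = 0.0
    for i in range(grid_size):
        x = lower + (i + 0.5) * step
        ecdf = bisect.bisect_right(xs, x) / n  # F_n(x)
        total += abs(ecdf - model_cdf(x)) ** p * step
    return total ** (1.0 / p)


def fit_exponential_rate(sample):
    """Maximum-likelihood rate of an exponential model: 1 / sample mean."""
    return len(sample) / sum(sample)


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: Weibull with scale 1 and shape 2 (assumed parameterization).
    data = [rng.weibullvariate(1.0, 2.0) for _ in range(100)]
    rate = fit_exponential_rate(data)
    upper = max(data) + 20.0 / rate  # model tail beyond this point is negligible
    for p in (1, 2, 4):
        print(p, lp_distance(data, exponential_cdf(rate), p, 0.0, upper))
```

Running the demo for several values of $p$ gives a feel for how the choice of $p$ changes the distance between the same data and the same fitted model.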

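2. Theoretical framework

A complete asymptotic theory is established, including:

Weak convergence of the empirical process in $L^p$ spaces
Limiting distribution of the process with estimated parameters
Consistency of the bootstrap process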

For $p = 1$: $T(F,G(\theta_F),1) = \int_{C_{\theta_F}} |G_{\theta_F}| + \int_{\mathbb{R}\setminus C_{\theta_F}} G_{\theta_F}\,\text{sgn}(F-G(\theta_F))$
For $1 < p < \infty$: $T(F,G(\theta_F),p) = \frac{1}{\|F-G(\theta_F)\|_p^{p-1}} \int G_{\theta_F}\, |F-G(\theta_F)|^{p-1}\,\text{sgn}(F-G(\theta_F))$
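
The $1 < p < \infty$ expression has the form of a directional derivative of the $L^p$ norm. As a hedged reading (assuming $G_{\theta_F}$ plays the role of the direction $h$, presumably the limit of the estimated empirical process, which this summary does not define), the standard chain-rule calculation for $f \neq 0$ is:

```latex
\[
\frac{d}{dt}\,\|f + t h\|_p \,\Big|_{t=0}
= \frac{1}{p}\Big(\int |f|^p\Big)^{\frac{1}{p}-1} \int p\,|f|^{p-1}\,\text{sgn}(f)\,h
= \frac{1}{\|f\|_p^{\,p-1}} \int h\,|f|^{p-1}\,\text{sgn}(f).
\]
```

Setting $f = F - G(\theta_F)$ and $h = G_{\theta_F}$ gives the same form as $T(F,G(\theta_F),p)$. The derivative is linear in $h$, which is consistent with a Gaussian limit whenever $F \neq G(\theta_F)$.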

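Corollary 1: normality condition

Necessary and sufficient conditions for the limiting distribution to be normal:

$p = 1$: the contact set $C_{\theta_F} = \{F = G(\theta_F)\}$ has Lebesgue measure zero
$1 < p < \infty$: $F \neq G(\theta_F)$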

Sample sizes: $n = 30, 50, 100, 500$
Bootstrap replications: $B = 2000$
Significance level: $\alpha = 0.05$
Monte Carlo replications: 1000

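Test scenarios

Weibull vs. exponential model: $p = 1$, true distribution Weibull(2,1)
Gaussian mixture vs. normal model: $p = 2$, true distribution a two-component Gaussian mixture
Negative binomial vs. Poisson model: $p = 1$, discrete case
Kumaraswamy vs. Beta model: $p = 1$, bounded support
Student t vs. normal model: $p = 4$, heavy-tailed case
Log-normal vs. Gamma model: $p = 1$, skewed case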

The two bootstrap methods

Bootstrap 1: quantile-based method; reject $H_0$ when $2\|F_n - G(\hat{\theta}_n)\|_p - \hat{\epsilon}^*(\alpha) < \epsilon$
Bootstrap 2: normal-approximation method; reject $H_0$ when $\|F_n - G(\hat{\theta}_n)\|_p - \hat{\sigma}_{\text{boot}}z_\alpha < \epsilon$
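
The two rejection rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: $\hat{\epsilon}^*(\alpha)$ is taken to be the $\alpha$-quantile of the bootstrap distances $\|F_n^* - G(\hat{\theta}_n^*)\|_p$, $\hat{\sigma}_{\text{boot}}$ their sample standard deviation, and $z_\alpha$ the $\alpha$-quantile of the standard normal (negative for $\alpha = 0.05$). The exponential model, the function names, the grid-based integration, and the value of $\epsilon$ in the demo are all hypothetical choices.

```python
import bisect
import math
import random
from statistics import NormalDist, stdev


def lp_distance(sample, rate, p, grid_size=1000):
    """Midpoint-rule approximation of ||F_n - G(rate)||_p, exponential model."""
    xs = sorted(sample)
    n = len(xs)
    upper = xs[-1] + 20.0 / rate  # model tail beyond this point is negligible
    step = upper / grid_size
    total = 0.0
    for i in range(grid_size):
        x = (i + 0.5) * step
        ecdf = bisect.bisect_right(xs, x) / n
        model = 1.0 - math.exp(-rate * x)
        total += abs(ecdf - model) ** p * step
    return total ** (1.0 / p)


def bootstrap_distances(sample, p, n_boot, rng):
    """Resample with replacement, refit the model, recompute the distance."""
    n = len(sample)
    distances = []
    for _ in range(n_boot):
        boot = [sample[rng.randrange(n)] for _ in range(n)]
        rate = n / sum(boot)  # exponential MLE on the resample
        distances.append(lp_distance(boot, rate, p))
    return distances


def agof_tests(sample, eps, p=1, alpha=0.05, n_boot=200, seed=0):
    """Apply both rejection rules; True means H0 (distance >= eps) is rejected."""
    rng = random.Random(seed)
    rate = len(sample) / sum(sample)
    t_n = lp_distance(sample, rate, p)
    boot = sorted(bootstrap_distances(sample, p, n_boot, rng))

    # Bootstrap 1 (assumed reading): alpha-quantile of the bootstrap distances.
    q_alpha = boot[max(0, math.ceil(alpha * n_boot) - 1)]
    reject_1 = 2.0 * t_n - q_alpha < eps

    # Bootstrap 2 (assumed reading): z_alpha is the alpha-quantile of N(0, 1).
    z_alpha = NormalDist().inv_cdf(alpha)
    sigma_boot = stdev(boot)
    reject_2 = t_n - sigma_boot * z_alpha < eps

    return {
        "statistic": t_n,
        "reject_bootstrap_1": reject_1,
        "reject_bootstrap_2": reject_2,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: Weibull with scale 1 and shape 2, exponential model, p = 1.
    data = [rng.weibullvariate(1.0, 2.0) for _ in range(100)]
    print(agof_tests(data, eps=0.2))
```

Under these readings, both rules compare an upper confidence bound for the distance with $\epsilon$, so rejecting $H_0$ is evidence that the model is within the margin of error. The demo uses 200 resamples only to keep the run short; the simulations summarized here used $B = 2000$.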