
Bootstrap tests for almost goodness-of-fit

Baíllo, Cárcamo

We introduce the \textit{almost goodness-of-fit} test, a procedure to assess whether a (parametric) model provides a good representation of the probability distribution generating the observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{ G(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$, we consider the testing problem \[ H_0: \| F - G(\boldsymbol{\theta}_F) \|_p \geq \epsilon\quad \text{vs} \quad H_1: \| F - G(\boldsymbol{\theta}_F) \|_p < \epsilon, \] where $\epsilon>0$ is a margin of error and $G(\boldsymbol{\theta}_F)$ denotes a representative of $F$ within the parametric class. The approximate model is determined via an M-estimator of the parameters. The methodology also quantifies the percentage improvement of the proposed model relative to a non-informative (constant) benchmark. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated model. We present two consistent, easy-to-implement, and flexible bootstrap schemes to carry out the test. The performance of the proposal is illustrated through simulation studies and real-data applications.


Basic Information

Paper ID: 2410.20918
Title: Bootstrap tests for almost goodness-of-fit
Authors: Amparo Baíllo (Universidad Autónoma de Madrid), Javier Cárcamo (Universidad del País Vasco)
Classification: stat.ME (Statistical Methodology), math.ST (Mathematical Statistics), stat.AP (Applied Statistics), stat.TH (Statistical Theory)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2410.20918

Abstract

This paper introduces the "almost goodness-of-fit" (AGoF) test to assess whether a parametric model adequately represents the probability distribution of an observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{G(\theta) : \theta \in \Theta\}$, the paper considers the hypothesis testing problem: $H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon$ where $\epsilon > 0$ is the error tolerance and $G(\theta_F)$ is a representative of $F$ within the parametric class. The approximate model is determined through M-estimation, and two consistent and easily implementable bootstrap schemes are provided for conducting the test.

Research Background and Motivation

Problem Background

Traditional goodness-of-fit (GoF) tests suffer from a fundamental issue: they place the statement "the model is a reasonable approximation of the data" in the null hypothesis $H_0$, thereby providing statistical evidence only for model "misfit" rather than for actual "goodness-of-fit."

Research Motivation

Limitations of Traditional GoF Tests: Classical methods can only reject models but cannot verify model applicability
Practical Needs: In practice, we are more concerned with whether a model is "good enough" rather than whether it is perfectly exact
Importance of Approximate Modeling: In reality, few models can perfectly describe data; some degree of deviation must be tolerated

Inadequacies of Existing Methods

The limiting distribution of Kolmogorov-Smirnov type statistics under parameter estimation is complex and non-Gaussian
The standard bootstrap can be inconsistent for statistics based on the supremum norm, such as Kolmogorov-Smirnov-type distances
Lack of a unified framework for handling approximate verification of parametric families

Core Contributions

Proposes the AGoF Testing Framework: Places "approximate fit" in the alternative hypothesis, enabling statistical evidence for model applicability
Uses $L^p$ Distance: Compared with the supremum norm of Kolmogorov-Smirnov-type statistics, $L^p$ norms lead to tractable (typically Gaussian) limiting distributions and consistent bootstrap approximations, and they are easy to compute
Develops Two Bootstrap Schemes: Proves their consistency and provides practical implementation algorithms
Introduces the AGoF Statistic: Quantifies the percentage improvement of the model relative to a non-informative baseline
Provides Complete Theoretical Analysis: Including asymptotic distributions, bootstrap consistency, and other theoretical guarantees

Methodology Details

Problem Formulation

Given a sample $X_1, \ldots, X_n$ from an unknown distribution $F$ and a parametric model family $\mathcal{G} = \{G(\theta) : \theta \in \Theta \subset \mathbb{R}^k\}$, test: $H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon$

where $\theta_F$ is determined through M-estimation: $E_F[\psi_{\theta_F}(X)] = 0$.

Core Methodological Framework

1. Parameter Estimation

Use M-estimators to solve: $\Psi_n(\theta) = \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i) = 0$
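The following Python sketch shows the estimating equation in concrete form by solving $\Psi_n(\theta) = 0$ with bisection. It assumes a one-parameter exponential model with rate $\theta$ and the maximum-likelihood score $\psi_\theta(x) = 1/\theta - x$. Both the model and the helper names are hypothetical illustrations, since the paper allows general M-estimators.

```python
import random
from typing import Callable, Sequence


def solve_m_estimator(
    psi: Callable[[float, float], float],
    sample: Sequence[float],
    lo: float,
    hi: float,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Return theta in [lo, hi] with Psi_n(theta) = mean_i psi(theta, X_i) = 0.

    Plain bisection: assumes Psi_n is continuous and changes sign on [lo, hi].
    """
    def psi_n(theta: float) -> float:
        return sum(psi(theta, x) for x in sample) / len(sample)

    f_lo = psi_n(lo)
    if f_lo * psi_n(hi) > 0:
        raise ValueError("Psi_n must change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = psi_n(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:  # root lies in [lo, mid]
            hi = mid
        else:                 # root lies in [mid, hi]
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def exp_score(theta: float, x: float) -> float:
    """Hypothetical psi: maximum-likelihood score of an exponential rate theta."""
    return 1.0 / theta - x


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.expovariate(2.0) for _ in range(500)]
    theta_hat = solve_m_estimator(exp_score, data, 1e-6, 100.0)
    print(theta_hat, len(data) / sum(data))  # the two values agree closely
```

For this particular score the root is $1/\bar{X}_n$, so the bisection result can be checked against the closed form. Bisection is used here only because it requires nothing more than a sign change of $\Psi_n$.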

2. Test Statistic

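The test statistic is the observed distance $\|F_n - G(\hat{\theta}_n)\|_p$, where $F_n$ is the empirical distribution function. Its centered and scaled version is: $T_n(F,G(\theta_F),p) = \sqrt{n}(\|F_n - G(\hat{\theta}_n)\|_p - \|F - G(\theta_F)\|_p)$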
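Below is a minimal numerical sketch of the observed distance $\|F_n - G(\hat{\theta}_n)\|_p$. It assumes a univariate sample and a model CDF that can be evaluated pointwise. The integral is approximated with a midpoint rule on a user-chosen interval, so any mass outside that interval is ignored. The helper names are hypothetical.

```python
import bisect
import math
from typing import Callable, Sequence


def ecdf(sorted_sample: Sequence[float], x: float) -> float:
    """Empirical distribution function F_n(x) = #{i : X_i <= x} / n."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def lp_distance(
    sample: Sequence[float],
    cdf: Callable[[float], float],
    p: float,
    lower: float,
    upper: float,
    n_grid: int = 20_000,
) -> float:
    """Approximate ||F_n - G||_p = (integral of |F_n - G|^p dx)^(1/p) on [lower, upper].

    Midpoint rule on a uniform grid; the caller chooses [lower, upper] so that
    the omitted tails contribute negligibly.
    """
    xs = sorted(sample)
    h = (upper - lower) / n_grid
    total = 0.0
    for k in range(n_grid):
        x = lower + (k + 0.5) * h
        total += abs(ecdf(xs, x) - cdf(x)) ** p
    return (total * h) ** (1.0 / p)


if __name__ == "__main__":
    # Hypothetical example: a small sample against a fitted exponential CDF.
    data = [0.2, 0.5, 0.9, 1.4, 2.3]
    rate = len(data) / sum(data)  # exponential maximum-likelihood estimate

    def exp_cdf(x: float) -> float:
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

    print(lp_distance(data, exp_cdf, p=1, lower=0.0, upper=30.0))
```

A grid is the simplest choice that works for any $p$ and any model CDF. For specific models, exact piecewise integration between order statistics is faster.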

3. Rejection Region Construction

The rejection region is: $R_n = \{\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\}$ where $c_n(\alpha) = -Q_T(\alpha)/\sqrt{n}$ and $Q_T(\alpha)$ is the $\alpha$-quantile of the limiting distribution.
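The decision rule can be sketched as follows, assuming the Gaussian case of the limit, $T \sim N(0, \sigma^2)$, so that $Q_T(\alpha) = \sigma z_\alpha$ with $z_\alpha$ the standard normal $\alpha$-quantile. Treating $\sigma$ as known is a simplification made only for this illustration. In practice it must be estimated, for example by bootstrap. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def agof_reject(distance: float, eps: float, n: int, sigma: float, alpha: float = 0.05) -> bool:
    """Reject H0: ||F - G(theta_F)||_p >= eps iff distance < eps - c_n(alpha).

    Assumes the limit of T_n is N(0, sigma^2), so Q_T(alpha) = sigma * z_alpha.
    """
    q_alpha = sigma * NormalDist().inv_cdf(alpha)  # negative when alpha < 0.5
    c_n = -q_alpha / math.sqrt(n)                  # c_n(alpha) = -Q_T(alpha) / sqrt(n)
    return distance < eps - c_n


if __name__ == "__main__":
    # c_n = 0.5 * 1.645 / 10, about 0.082, so the threshold is about 0.118.
    print(agof_reject(0.10, eps=0.2, n=100, sigma=0.5))  # True
    print(agof_reject(0.15, eps=0.2, n=100, sigma=0.5))  # False
```

Because $Q_T(\alpha) < 0$, the margin $c_n(\alpha)$ is positive. The observed distance must therefore fall clearly below $\epsilon$ before $H_0$ is rejected.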

Technical Innovations

1. Advantages of $L^p$ Distance Selection

Hadamard Differentiability: For $1 < p < \infty$, the $L^p$ norm is Hadamard differentiable at every nonzero function, facilitating the application of the functional delta method
Gaussian Limiting Distribution: Under general assumptions (made precise in Corollary 1), the limiting distribution of $T_n$ is Gaussian
Bootstrap Consistency: Under appropriate conditions, the standard bootstrap estimator is consistent
Flexibility: The sensitivity to distribution tails can be controlled by adjusting the $p$ value

2. Theoretical Framework

Establishes a complete asymptotic theory including:

Weak convergence of empirical processes in $L^p$ spaces
Limiting distributions of processes with estimated parameters
Consistency of bootstrap processes

Theoretical Results

Main Theorems

Theorem 1: Process Weak Convergence

Under Assumptions 1-2, $G_n(\theta_F) \rightsquigarrow G_{\theta_F}$ in $L^p$ if and only if $X \in L^{2/p,1}$ (a Lorentz space), where $G_{\theta_F}$ is a centered Gaussian process.

Theorem 2: Asymptotic Distribution of Test Statistic

When $p = 1$: $T(F,G(\theta_F),1) = \int_{C_{\theta_F}} |G_{\theta_F}| + \int_{\mathbb{R}\setminus C_{\theta_F}} G_{\theta_F}\text{sgn}(F-G(\theta_F))$
When $1 < p < \infty$: $T(F,G(\theta_F),p) = \frac{1}{\|F-G(\theta_F)\|_p^{p-1}} \int G_{\theta_F} |F-G(\theta_F)|^{p-1}\text{sgn}(F-G(\theta_F))$
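The closed form for $1 < p < \infty$ can be checked numerically. The sketch below samples functions on a grid in $[0,1]$. It uses an arbitrary nonzero $h$ as a stand-in for $F - G(\theta_F)$ and an arbitrary $g$ as a stand-in for one path of $G_{\theta_F}$, then compares the formula with a central finite difference of $t \mapsto \|h + t g\|_p$. The test functions are illustrative choices and are not taken from the paper.

```python
import math
from typing import List


def midpoint_grid(n: int) -> List[float]:
    """Midpoints of n equal cells in [0, 1]; the cell width is 1 / n."""
    return [(k + 0.5) / n for k in range(n)]


def lp_norm(values: List[float], p: float, dx: float) -> float:
    """Riemann-sum approximation of the L^p norm of a function sampled on the grid."""
    return (sum(abs(v) ** p for v in values) * dx) ** (1.0 / p)


def lp_norm_derivative(h: List[float], g: List[float], p: float, dx: float) -> float:
    """Closed form for 1 < p < inf, valid when h != 0:
    ||h||_p^(1-p) * integral of g * |h|^(p-1) * sgn(h)."""
    integral = sum(
        gi * abs(hi) ** (p - 1) * math.copysign(1.0, hi)
        for hi, gi in zip(h, g)
        if hi != 0.0  # sgn(0) = 0
    ) * dx
    return integral / lp_norm(h, p, dx) ** (p - 1)


def central_difference(h: List[float], g: List[float], p: float, dx: float, t: float = 1e-6) -> float:
    """Numerical derivative of t -> ||h + t g||_p at t = 0."""
    plus = [hi + t * gi for hi, gi in zip(h, g)]
    minus = [hi - t * gi for hi, gi in zip(h, g)]
    return (lp_norm(plus, p, dx) - lp_norm(minus, p, dx)) / (2.0 * t)


if __name__ == "__main__":
    n = 2000
    xs = midpoint_grid(n)
    h = [0.1 * math.sin(2.0 * math.pi * x) for x in xs]  # stand-in for F - G(theta_F)
    g = [x * x - 0.3 for x in xs]                        # stand-in for a path of G_theta_F
    for p in (1.5, 2.0, 3.0):
        print(p, lp_norm_derivative(h, g, p, 1.0 / n), central_difference(h, g, p, 1.0 / n))
```

The derivative is linear in $g$. When it is applied to a centered Gaussian process, the result is therefore a Gaussian variable, which is why $F \neq G(\theta_F)$ yields a normal limit.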

Corollary 1: Normality Conditions

The limiting distribution is normal if and only if:

$p = 1$: The Lebesgue measure of the contact set $C_{\theta_F} = \{F = G(\theta_F)\}$ is zero
$1 < p < \infty$: $F \neq G(\theta_F)$
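The $p = 1$ condition follows directly from the derivative formula. Writing $D(g)$ for the $p = 1$ expression with $G_{\theta_F}$ replaced by a direction $g$, one gets $D(g) + D(-g) = 2\int_{C_{\theta_F}} |g|$. The map $g \mapsto D(g)$ is therefore linear exactly when $C_{\theta_F}$ is Lebesgue-null, and in that case the limit, a linear functional of a Gaussian process, is Gaussian. This agrees with Corollary 1. The grid-based sketch below uses arbitrary illustrative functions to evaluate this gap for a contact set of measure zero and for one of positive measure.

```python
import math
from typing import List


def l1_directional_derivative(h: List[float], g: List[float], dx: float) -> float:
    """p = 1 formula: integral of |g| over the contact set {h = 0}
    plus integral of g * sgn(h) over {h != 0}."""
    on_contact = sum(abs(gi) for hi, gi in zip(h, g) if hi == 0.0)
    off_contact = sum(gi * math.copysign(1.0, hi) for hi, gi in zip(h, g) if hi != 0.0)
    return (on_contact + off_contact) * dx


def linearity_gap(h: List[float], g: List[float], dx: float) -> float:
    """D(g) + D(-g) = 2 * integral of |g| over {h = 0}; it vanishes for every g
    exactly when the contact set is Lebesgue-null, i.e. when D is linear."""
    return l1_directional_derivative(h, g, dx) + l1_directional_derivative(h, [-gi for gi in g], dx)


if __name__ == "__main__":
    n = 1000
    dx = 1.0 / n
    xs = [(k + 0.5) / n for k in range(n)]
    g = [math.cos(3.0 * x) for x in xs]                         # stand-in for a limit-process path
    h_null = [x - 0.5 for x in xs]                              # contact set of measure zero
    h_flat = [0.0 if 0.4 <= x <= 0.6 else x - 0.5 for x in xs]  # contact set [0.4, 0.6]
    print(linearity_gap(h_null, g, dx))  # 0.0
    print(linearity_gap(h_flat, g, dx))  # positive
```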

Bootstrap Consistency

Theorem 3 and Corollary 2 show that, under appropriate assumptions, the bootstrap version of the statistic converges weakly to the same limiting distribution as $T_n$.

Experimental Design

Simulation Study Setup

Sample Sizes: $n = 30, 50, 100, 500$
Bootstrap Replicates: $B = 2000$
Significance Level: $\alpha = 0.05$
Monte Carlo Replications: 1000
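A generic Monte Carlo harness for this setup might look as follows. Here `sampler` and `test` are hypothetical callables: the first draws a sample of size $n$ from the chosen configuration, and the second returns the AGoF decision. With 1000 replications, the Monte Carlo standard error of an empirical size near $0.05$ is about $\sqrt{0.05 \cdot 0.95/1000} \approx 0.0069$, which matters when comparing methods whose sizes differ only slightly.

```python
import math
import random
from typing import Callable, List, Tuple


def rejection_rate(
    sampler: Callable[[random.Random, int], List[float]],
    test: Callable[[List[float]], bool],
    n: int,
    n_rep: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo rejection proportion of `test` and its standard error.

    With `sampler` on the boundary ||F - G(theta_F)||_p = eps this estimates
    the empirical size; with `sampler` under H1 it estimates power.
    """
    rng = random.Random(seed)
    rejections = sum(bool(test(sampler(rng, n))) for _ in range(n_rep))
    rate = rejections / n_rep
    return rate, math.sqrt(rate * (1.0 - rate) / n_rep)


if __name__ == "__main__":
    # Toy check with a hypothetical test that rejects with probability 0.05.
    def uniform_sampler(rng: random.Random, n: int) -> List[float]:
        return [rng.random() for _ in range(n)]

    def toy_test(xs: List[float]) -> bool:
        return xs[0] < 0.05

    print(rejection_rate(uniform_sampler, toy_test, n=30, n_rep=1000))
```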

Test Scenarios

Weibull vs Exponential Model: $p = 1$, true distribution is Weibull(2,1)
Gaussian Mixture vs Normal Model: $p = 2$, true distribution is two-component Gaussian mixture
Negative Binomial vs Poisson Model: $p = 1$, discrete distribution case
Kumaraswamy vs Beta Model: $p = 1$, bounded support case
Student t vs Normal Model: $p = 4$, heavy-tailed distribution case
Lognormal vs Gamma Model: $p = 1$, skewed distribution case

Two Bootstrap Methods

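Bootstrap 1: Quantile-based method, rejection condition: $2\|F_n - G(\hat{\theta}_n)\|_p - \hat{\epsilon}^*(\alpha) < \epsilon$, where $\hat{\epsilon}^*(\alpha)$ is the $\alpha$-quantile of the bootstrap distances $\|F_n^* - G(\hat{\theta}_n^*)\|_p$
Bootstrap 2: Normal approximation-based method, rejection condition: $\|F_n - G(\hat{\theta}_n)\|_p - \hat{\sigma}_{\text{boot}}z_\alpha < \epsilon$, where $\hat{\sigma}_{\text{boot}}$ is the bootstrap standard deviation of these distances and $z_\alpha < 0$ is the $\alpha$-quantile of the standard normal distribution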
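The sketch below implements both rejection rules for a $p = 1$ setting in the spirit of the Weibull-versus-exponential scenario. Several parts are assumptions rather than the paper's specification:

- The bootstrap is nonparametric: the data are resampled with replacement and $\theta$ is re-estimated on each resample.
- The exponential model is fitted by maximum likelihood.
- $\hat{\epsilon}^*(\alpha)$ is read as the empirical $\alpha$-quantile of the bootstrap distances, and $\hat{\sigma}_{\text{boot}}$ as their standard deviation.
- $z_\alpha$ is the (negative) standard normal $\alpha$-quantile.

All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev
from typing import List, Sequence, Tuple


def exp_mle(sample: Sequence[float]) -> float:
    """Exponential rate solving the M-equation mean(1/theta - X_i) = 0."""
    return len(sample) / sum(sample)


def l1_distance_exponential(sample: Sequence[float], theta: float) -> float:
    """Exact ||F_n - G(theta)||_1 for G(x) = 1 - exp(-theta x), x >= 0 (data assumed >= 0)."""
    xs = sorted(sample)
    n = len(xs)

    def g_antiderivative(x: float) -> float:
        # Antiderivative of G: x + exp(-theta x) / theta.
        return x + math.exp(-theta * x) / theta

    total, left = 0.0, 0.0
    for i, right in enumerate(xs):
        c = i / n                                # F_n on [left, right)
        cross = -math.log(1.0 - c) / theta       # point where G equals c
        m = min(max(cross, left), right)
        total += c * (m - left) - (g_antiderivative(m) - g_antiderivative(left))   # G < c
        total += (g_antiderivative(right) - g_antiderivative(m)) - c * (right - m)  # G >= c
        left = right
    return total + math.exp(-theta * xs[-1]) / theta  # tail, where F_n = 1


def bootstrap_distances(sample: Sequence[float], n_boot: int, rng: random.Random) -> List[float]:
    """Nonparametric bootstrap: resample, re-estimate theta, recompute the distance."""
    out = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=len(sample))
        out.append(l1_distance_exponential(resample, exp_mle(resample)))
    return out


def agof_tests(sample: Sequence[float], eps: float, alpha: float = 0.05,
               n_boot: int = 500, seed: int = 0) -> Tuple[bool, bool]:
    """Return (reject by Bootstrap 1, reject by Bootstrap 2) for H0: distance >= eps."""
    rng = random.Random(seed)
    d_n = l1_distance_exponential(sample, exp_mle(sample))
    d_star = sorted(bootstrap_distances(sample, n_boot, rng))
    # Bootstrap 1: eps_hat*(alpha) read as the empirical alpha-quantile of d*.
    eps_star = d_star[max(0, math.ceil(alpha * n_boot) - 1)]
    reject_1 = 2.0 * d_n - eps_star < eps
    # Bootstrap 2: normal approximation; z_alpha < 0 for alpha < 0.5.
    z_alpha = NormalDist().inv_cdf(alpha)
    reject_2 = d_n - stdev(d_star) * z_alpha < eps
    return reject_1, reject_2


if __name__ == "__main__":
    rng = random.Random(2025)
    data = [rng.weibullvariate(1.0, 2.0) for _ in range(100)]  # scale 1, shape 2
    print(agof_tests(data, eps=0.35))
```

The $L^1$ distance to an exponential CDF is computed exactly, interval by interval between order statistics, so no integration grid is needed inside the bootstrap loop. For other models or $p \neq 1$, a numerical integral could be used instead.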

Experimental Results

Main Findings

1. Method Performance Comparison

Moderate Sample Sizes ($n = 500$): Both methods perform similarly and control the test level well
Small Sample Sizes ($n \leq 100$): Bootstrap 2 typically controls the nominal significance level better
High AGoF Statistic Values (> 0.9): Bootstrap 1 performs better

2. Specific Results Example

For Weibull vs Exponential model:

$\|F - G(\theta_F)\|_1 = 0.3002$
AGoF Statistic: $G(F,G) = 0.194$ (only a 19.4% improvement over the constant model)
Power functions show the two methods are nearly indistinguishable at $n = 500$

3. Practical Recommendations

AGoF Statistic between 0-0.9: Recommend Bootstrap 2
AGoF Statistic exceeding 0.9: Recommend Bootstrap 1
Exercise caution when interpreting results with small sample sizes

Practical Applications

Application 1: Haiti Serology Survey

Data: 4308 IgG antibody samples (Bm33 antigen) from Haiti's national serology survey

Analysis: Test the almost goodness-of-fit of normal mixture models with 1 to 5 components

Two-component model performs best: $\epsilon^*_2(0.05) \approx 0.022$ ($L^1$), $G^*(F,G_2) > 0.97$
Single-component normal model insufficient: improvement rate < 78%
Models with three or more components add little further improvement (< 1%)

Application 2: Carbon Fiber Fracture Stress

Data: Fracture stress of approximately 1200 carbon fibers tested at different gauge lengths

Model Comparison: Weibull, three-parameter Weibull, skew-normal, bimodal Weibull

Main Findings:

Bimodal Weibull performs best at most gauge lengths
Model performance significantly decreases with gauge length (except bimodal Weibull)
Linear regression analysis confirms the statistical significance of this trend

Related Work

Traditional Goodness-of-Fit Tests

Kolmogorov-Smirnov test and its limitations
Cramér-von Mises test: with estimated parameters, its null distribution depends on the hypothesized model

Equivalence Testing

Wellek (2021)'s Lehmann alternative hypothesis approach
Liu and Lindsay (2009)'s tolerance regions for multinomial models
Romano (2005)'s optimal equivalence testing

Berger and Delampady (1987)'s work on testing precise hypotheses
Dette and Sen (2013)'s consistent testing procedures for relevant hypotheses
Baringhaus and Henze (2024)'s neighborhood verification testing

Conclusions and Discussion

Main Conclusions

Method Effectiveness: The AGoF test successfully addresses the problem that traditional GoF tests can only provide evidence of "misfit"
Theoretical Completeness: Provides complete asymptotic theory and bootstrap consistency proofs
Practicality: Two bootstrap schemes are easy to implement and applicable to a wide range of parametric models

Limitations

Integrability Conditions: Requires $X \in L^{2/p,1}$, which restricts the admissible tail behavior of the data and hence the range of applicability
Parameter Selection: The choice of error tolerance $\epsilon$ still requires domain expertise
Computational Complexity: Higher computational cost compared to simple GoF tests

Future Directions

Multivariate Extension: Extend the method to multivariate distribution cases
Nonparametric Alternatives: Consider approximate verification for nonparametric or semiparametric models
Adaptive Methods: Develop data-driven methods for automatic selection of $\epsilon$

In-Depth Evaluation

Strengths

Theoretical Innovation: A systematic treatment of "approximate fit" as the alternative hypothesis for parametric families with estimated parameters, an important conceptual step beyond classical GoF testing
Methodological Completeness: Very comprehensive from theoretical analysis to implementation algorithms
Practical Value: AGoF statistic provides intuitive model quality measurement
Technical Advantages: The choice of the $L^p$ distance brings clear advantages in both theory (Gaussian limits, bootstrap consistency) and computation

Weaknesses

Assumption Conditions: M-estimation framework and integrability conditions may limit applicability
Parameter Tuning: Lack of systematic guidance for selecting $p$ value and $\epsilon$
Computational Efficiency: High computational cost of bootstrap process

Impact

Academic Contribution: Opens a new research direction in goodness-of-fit testing
Practical Value: Important application prospects in model selection and verification
Reproducibility: Complete theoretical results and clear algorithm descriptions facilitate reproduction

Applicable Scenarios

Situations requiring verification of parametric model applicability
Model selection and comparison
Model verification in regulatory and quality control contexts
Distribution model assessment in risk management

References

The paper cites abundant relevant literature covering multiple fields including empirical process theory, M-estimation, and bootstrap methods, providing a solid theoretical foundation for the research.