Bootstrap tests for almost goodness-of-fit

Baíllo, Cárcamo
We introduce the \textit{almost goodness-of-fit} test, a procedure to assess whether a (parametric) model provides a good representation of the probability distribution generating the observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{ G(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$, we consider the testing problem \[ H_0: \| F - G(\boldsymbol{\theta}_F) \|_p \geq \epsilon \quad \text{vs} \quad H_1: \| F - G(\boldsymbol{\theta}_F) \|_p < \epsilon, \] where $\epsilon>0$ is a margin of error and $G(\boldsymbol{\theta}_F)$ denotes a representative of $F$ within the parametric class. The approximate model is determined via an M-estimator of the parameters. The objective is the approximate validation of a distribution or an entire parametric family up to a pre-specified threshold value. The methodology also quantifies the percentage improvement of the proposed model relative to a non-informative (constant) benchmark. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated model. We present two consistent, easy-to-implement, and flexible bootstrap schemes to carry out the test. The performance of the proposal is illustrated through simulation studies and real-data applications.

Basic Information

  • Paper ID: 2410.20918
  • Title: Bootstrap tests for almost goodness-of-fit
  • Authors: Amparo Baíllo (Universidad Autónoma de Madrid), Javier Cárcamo (Universidad del País Vasco)
  • Classification: stat.ME (Statistical Methodology), math.ST (Mathematical Statistics), stat.AP (Applied Statistics), stat.TH (Statistical Theory)
  • Publication Date: October 15, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2410.20918

Abstract

This paper introduces the "almost goodness-of-fit" (AGoF) test to assess whether a parametric model adequately represents the probability distribution of an observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{G(\theta) : \theta \in \Theta\}$, the paper considers the hypothesis testing problem \[ H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon, \] where $\epsilon > 0$ is the error tolerance and $G(\theta_F)$ represents the best approximation of $F$ within the parametric class. The approximate model is determined through M-estimation, and two consistent and easily implementable bootstrap schemes are provided for conducting the test.

Research Background and Motivation

Problem Background

Traditional goodness-of-fit (GoF) tests suffer from a fundamental issue: they place the statement "the model is a reasonable approximation of the data" in the null hypothesis $H_0$, thereby providing statistical evidence only for model "misfit" rather than for actual "goodness-of-fit."

Research Motivation

  1. Limitations of Traditional GoF Tests: Classical methods can only reject models but cannot verify model applicability
  2. Practical Needs: In practice, we are more concerned with whether a model is "good enough" rather than whether it is perfectly exact
  3. Importance of Approximate Modeling: In reality, few models can perfectly describe data; some degree of deviation must be tolerated

Inadequacies of Existing Methods

  • The limiting distribution of Kolmogorov-Smirnov type statistics under parameter estimation is complex and non-Gaussian
  • Bootstrap methods are typically inconsistent when estimating sup-norms
  • Lack of a unified framework for handling approximate verification of parametric families

Core Contributions

  1. Proposes the AGoF Testing Framework: Places "approximate fit" in the alternative hypothesis, enabling statistical evidence for model applicability
  2. Uses $L^p$ Distance: Compared to traditional supremum norms, $L^p$ norms possess superior theoretical properties and computational advantages
  3. Develops Two Bootstrap Schemes: Proves their consistency and provides practical implementation algorithms
  4. Introduces AGoF Statistic: Quantifies the percentage improvement of the model relative to a non-informative baseline
  5. Provides Complete Theoretical Analysis: Including asymptotic distributions, bootstrap consistency, and other theoretical guarantees

Methodology Details

Problem Formulation

Given a sample $X_1, \ldots, X_n$ from an unknown distribution $F$ and a parametric model family $\mathcal{G} = \{G(\theta) : \theta \in \Theta \subset \mathbb{R}^k\}$, test: \[ H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon \]

where $\theta_F$ is determined through M-estimation: $E_F[\psi_{\theta_F}(X)] = 0$.

Core Methodological Framework

1. Parameter Estimation

Use M-estimators to solve: $\Psi_n(\theta) = \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i) = 0$
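
As a minimal sketch of this estimating equation, assume an exponential model $G(\theta)(x) = 1 - e^{-\theta x}$ with the likelihood score $\psi_\theta(x) = 1/\theta - x$. The model choice and the names `psi`, `psi_n`, and `solve_m_estimator` are illustrative assumptions, not taken from the paper. The root of $\Psi_n$ is located by bisection.

```python
import random


def psi(theta, x):
    """Score of the exponential model with rate theta (assumed example)."""
    return 1.0 / theta - x


def psi_n(theta, sample):
    """Empirical estimating function: the sample average of psi."""
    return sum(psi(theta, x) for x in sample) / len(sample)


def solve_m_estimator(sample, lo=1e-8, hi=1e8, tol=1e-12):
    """Find theta with psi_n(theta) = 0 by bisection.

    For this score psi_n is strictly decreasing in theta, so a sign
    change on [lo, hi] brackets the unique root.
    """
    if psi_n(lo, sample) < 0 or psi_n(hi, sample) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if psi_n(mid, sample) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)


# Illustrative data: 200 draws from an exponential law with rate 2.
rng = random.Random(0)
sample = [rng.expovariate(2.0) for _ in range(200)]
theta_hat = solve_m_estimator(sample)
```

For this particular score the root has the closed form $\hat{\theta}_n = 1/\bar{X}_n$, which makes the bisection easy to check. A numerical solver only becomes necessary for scores without an explicit solution.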

2. Test Statistic

The standardized statistic is: $T_n(F,G(\theta_F),p) = \sqrt{n}\,(\|F_n - G(\hat{\theta}_n)\|_p - \|F - G(\theta_F)\|_p)$
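
The only part of this statistic that can be computed from data is the plug-in distance between the empirical distribution function $F_n$ and the fitted model. The sketch below approximates it for an exponential model, which is an assumed example. The function name `lp_distance_exponential` and the midpoint-rule discretization are likewise illustrative choices rather than the paper's implementation.

```python
import math


def lp_distance_exponential(sample, theta, p=1.0, steps=200):
    """Approximate ||F_n - G(theta)||_p for G(theta)(x) = 1 - exp(-theta x).

    F_n is constant between consecutive order statistics, so the integral
    of |F_n - G|^p is accumulated piece by piece with a midpoint rule.
    The right tail beyond the largest observation has a closed form.
    Assumes a nonnegative sample and theta > 0.
    """
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs  # F_n and G both vanish on the negative half-line
    total = 0.0
    for i in range(n):
        a, b = knots[i], knots[i + 1]
        if b <= a:
            continue  # ties or an observation at zero: empty interval
        level = i / n  # value of F_n on [a, b)
        h = (b - a) / steps
        for j in range(steps):
            x = a + (j + 0.5) * h
            total += abs(level - (1.0 - math.exp(-theta * x))) ** p * h
    # Right tail: F_n = 1 there, so the integrand is exp(-p * theta * x).
    total += math.exp(-p * theta * xs[-1]) / (p * theta)
    return total ** (1.0 / p)
```

The centering term $\|F - G(\theta_F)\|_p$ depends on the unknown $F$, so $T_n$ itself serves the asymptotic theory while the plug-in distance is what enters the decision rule.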

3. Rejection Region Construction

The rejection region is: $R_n = \{\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\}$, where $c_n(\alpha) = -Q_T(\alpha)/\sqrt{n}$ and $Q_T(\alpha)$ is the $\alpha$-quantile of the limiting distribution.
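
A short derivation may help to see why this region has asymptotic level $\alpha$. It is a sketch under two assumptions: that $T_n$ converges in distribution to a limit $T$, and that the distribution function of $T$ is continuous at $Q_T(\alpha)$. The symbol $\delta$ is introduced here only for illustration.

```latex
% Boundary of the null hypothesis: \|F - G(\theta_F)\|_p = \epsilon
\begin{align*}
P\bigl(\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\bigr)
  &= P\bigl(\sqrt{n}\,(\|F_n - G(\hat{\theta}_n)\|_p - \epsilon)
       < -\sqrt{n}\,c_n(\alpha)\bigr) \\
  &= P\bigl(T_n < Q_T(\alpha)\bigr) \;\longrightarrow\; \alpha .
\end{align*}
% Interior of the null hypothesis:
% \|F - G(\theta_F)\|_p = \epsilon + \delta with \delta > 0
\begin{align*}
P\bigl(\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\bigr)
  = P\bigl(T_n < Q_T(\alpha) - \sqrt{n}\,\delta\bigr)
  \;\longrightarrow\; 0 .
\end{align*}
```

The rejection probability is therefore largest on the boundary $\|F - G(\theta_F)\|_p = \epsilon$, which is why the critical value is calibrated there.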

Technical Innovations

1. Advantages of LpL^p Distance Selection

  • Hadamard Differentiability: For $1 < p < \infty$, the $L^p$ norm is Hadamard differentiable, facilitating the application of the functional delta method
  • Gaussian Limiting Distribution: Under general assumptions, the asymptotic distribution is Gaussian
  • Bootstrap Consistency: Under appropriate conditions, the standard bootstrap estimator is consistent
  • Flexibility: The sensitivity to distribution tails can be controlled by adjusting the value of $p$

2. Theoretical Framework

Establishes a complete asymptotic theory including:

  • Weak convergence of empirical processes in $L^p$ spaces
  • Limiting distributions of processes with estimated parameters
  • Consistency of bootstrap processes

Theoretical Results

Main Theorems

Theorem 1: Process Weak Convergence

Under Assumptions 1-2, $X \in L^{2/p,1}$ if and only if $G_n(\theta_F) \rightsquigarrow G_{\theta_F}$ in $L^p$, where $G_{\theta_F}$ is a centered Gaussian process.

Theorem 2: Asymptotic Distribution of Test Statistic

  • When $p = 1$: $T(F,G(\theta_F),1) = \int_{C_{\theta_F}} |G_{\theta_F}| + \int_{\mathbb{R}\setminus C_{\theta_F}} G_{\theta_F}\,\text{sgn}(F-G(\theta_F))$
  • When $1 < p < \infty$: $T(F,G(\theta_F),p) = \frac{1}{\|F-G(\theta_F)\|_p^{p-1}} \int G_{\theta_F}\, |F-G(\theta_F)|^{p-1}\,\text{sgn}(F-G(\theta_F))$
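
The expression for $1 < p < \infty$ has the form that the functional delta method suggests. As a formal sketch, write $h = F - G(\theta_F) \neq 0$ and differentiate the norm in a direction $g$. The symbols $h$, $g$, and $t$ are introduced here for illustration, and the interchange of derivative and integral is assumed rather than justified.

```latex
\begin{align*}
\frac{d}{dt}\,\|h + t g\|_p \Big|_{t=0}
  &= \frac{d}{dt}\Bigl(\int |h + t g|^p\Bigr)^{1/p} \Big|_{t=0} \\
  &= \frac{1}{p}\,\Bigl(\int |h|^p\Bigr)^{\frac{1}{p}-1}
     \int p\,|h|^{p-1}\,\mathrm{sgn}(h)\,g \\
  &= \frac{1}{\|h\|_p^{\,p-1}} \int g\,|h|^{p-1}\,\mathrm{sgn}(h).
\end{align*}
```

Taking $g$ to be the Gaussian limit $G_{\theta_F}$ gives the stated expression. The result is a linear functional of a centered Gaussian process, which is why a normal limit is plausible whenever $F \neq G(\theta_F)$.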

Corollary 1: Normality Conditions

The limiting distribution is normal if and only if:

  • $p = 1$: The Lebesgue measure of the contact set $C_{\theta_F} = \{F = G(\theta_F)\}$ is zero
  • $1 < p < \infty$: $F \neq G(\theta_F)$

Bootstrap Consistency

Theorem 3 and Corollary 2 show that, under appropriate assumptions, the bootstrap statistic converges weakly to the same limiting distribution.

Experimental Design

Simulation Study Setup

  • Sample Sizes: $n = 30, 50, 100, 500$
  • Bootstrap Replicates: $B = 2000$
  • Significance Level: $\alpha = 0.05$
  • Monte Carlo Replications: 1000

Test Scenarios

  1. Weibull vs Exponential Model: $p = 1$, true distribution is Weibull(2,1)
  2. Gaussian Mixture vs Normal Model: $p = 2$, true distribution is two-component Gaussian mixture
  3. Negative Binomial vs Poisson Model: $p = 1$, discrete distribution case
  4. Kumaraswamy vs Beta Model: $p = 1$, bounded support case
  5. Student t vs Normal Model: $p = 4$, heavy-tailed distribution case
  6. Lognormal vs Gamma Model: $p = 1$, skewed distribution case

Two Bootstrap Methods

  • Bootstrap 1: Quantile-based method, rejection condition: $2\|F_n - G(\hat{\theta}_n)\|_p - \hat{\epsilon}^*(\alpha) < \epsilon$
  • Bootstrap 2: Normal approximation-based method, rejection condition: $\|F_n - G(\hat{\theta}_n)\|_p - \hat{\sigma}_{\text{boot}}\, z_\alpha < \epsilon$
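
The two rejection conditions can be sketched in code under several assumptions that the summary does not spell out: a plain nonparametric bootstrap (resampling with replacement), $\hat{\epsilon}^*(\alpha)$ read as the $\alpha$-quantile of the bootstrap distances, $\hat{\sigma}_{\text{boot}}$ read as their standard deviation, and $z_\alpha$ read as the lower $\alpha$-quantile of the standard normal distribution. The exponential model and all function names are illustrative, so this is not the paper's algorithm verbatim.

```python
import math
import random
from statistics import NormalDist, stdev


def fit_rate(sample):
    """Exponential-rate estimate solving the score equation (assumed model)."""
    return len(sample) / sum(sample)


def lp_distance(sample, theta, p=1.0, steps=20):
    """Midpoint-rule approximation of ||F_n - G(theta)||_p, exponential G."""
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs
    total = 0.0
    for i in range(n):
        a, b = knots[i], knots[i + 1]
        if b <= a:
            continue  # ties are frequent in bootstrap samples
        h = (b - a) / steps
        for j in range(steps):
            x = a + (j + 0.5) * h
            total += abs(i / n - (1.0 - math.exp(-theta * x))) ** p * h
    total += math.exp(-p * theta * xs[-1]) / (p * theta)  # right tail
    return total ** (1.0 / p)


def bootstrap_distances(sample, p=1.0, n_boot=200, seed=0):
    """Resample with replacement, refit, and recompute the distance."""
    rng = random.Random(seed)
    n = len(sample)
    out = []
    for _ in range(n_boot):
        star = [sample[rng.randrange(n)] for _ in range(n)]
        out.append(lp_distance(star, fit_rate(star), p))
    return out


def agof_decisions(sample, eps, alpha=0.05, p=1.0, n_boot=200, seed=0):
    """Return (reject_boot1, reject_boot2) for H0: distance >= eps."""
    d_n = lp_distance(sample, fit_rate(sample), p)
    d_star = sorted(bootstrap_distances(sample, p, n_boot, seed))
    # Bootstrap 1: empirical alpha-quantile of the bootstrap distances.
    eps_star = d_star[max(0, math.ceil(alpha * n_boot) - 1)]
    reject1 = 2.0 * d_n - eps_star < eps
    # Bootstrap 2: bootstrap standard deviation and a normal quantile.
    z_alpha = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    reject2 = d_n - stdev(d_star) * z_alpha < eps
    return reject1, reject2
```

Under these readings both rules estimate the same critical value. Bootstrap 1 replaces $Q_T(\alpha)/\sqrt{n}$ by $\hat{\epsilon}^*(\alpha) - \|F_n - G(\hat{\theta}_n)\|_p$, and Bootstrap 2 replaces it by $\hat{\sigma}_{\text{boot}}\, z_\alpha$. Calling `agof_decisions(sample, eps=0.1)` returns one boolean per scheme.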

Experimental Results

Main Findings

1. Method Performance Comparison

  • Moderate Sample Sizes ($n = 500$): Both methods perform similarly and control test level well
  • Small Sample Sizes ($n \leq 100$): Bootstrap 2 typically better controls nominal significance level
  • High AGoF Statistics (> 0.9): Bootstrap 1 performs better

2. Specific Results Example

For Weibull vs Exponential model:

  • $\|F - G(\theta_F)\|_1 = 0.3002$
  • AGoF Statistic: $G(F,G) = 0.194$ (only 19.4% improvement over constant model)
  • Power functions show the two methods are nearly indistinguishable at $n = 500$
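
The reported $L^1$ distance can be checked numerically under two assumptions not stated above: that Weibull(2,1) means shape 2 and scale 1, so $F(x) = 1 - e^{-x^2}$, and that $\theta_F$ solves the exponential likelihood score equation, which matches the means ($1/\theta_F = E_F[X] = \Gamma(3/2)$). The function names and the integration grid are illustrative. Under these assumptions the midpoint rule gives a value of about 0.300, in line with the reported figure.

```python
import math


def weibull_cdf(x, shape=2.0, scale=1.0):
    """Weibull distribution function (shape 2, scale 1 assumed here)."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def exponential_cdf(x, rate):
    """Exponential distribution function with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def l1_distance(cdf_f, cdf_g, upper=40.0, steps=400_000):
    """Midpoint-rule approximation of the L^1 distance on [0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += abs(cdf_f(x) - cdf_g(x))
    return total * h


# Assumption: theta_F solves E_F[1/theta - X] = 0, i.e. 1/theta_F = E_F[X].
mean_weibull = math.gamma(1.0 + 1.0 / 2.0)  # E[X] for shape 2, scale 1
theta_f = 1.0 / mean_weibull
distance = l1_distance(weibull_cdf, lambda x: exponential_cdf(x, theta_f))
```

Both distributions have negligible mass beyond the upper limit of 40, so truncating the integral there does not affect the result at the displayed precision.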

3. Practical Recommendations

  • AGoF Statistic between 0-0.9: Recommend Bootstrap 2
  • AGoF Statistic exceeding 0.9: Recommend Bootstrap 1
  • Exercise caution when interpreting results with small sample sizes

Practical Applications

Application 1: Haiti Serology Survey

Data: 4308 IgG antibody samples (Bm33 antigen) from Haiti's national serology survey

Analysis: Test AGoF of 1-5 component normal mixture models

  • Two-component model performs best: $\epsilon^*_2(0.05) \approx 0.022$ ($L^1$), $G^*(F,G_2) > 0.97$
  • Single-component normal model insufficient: improvement rate < 78%
  • Three or more component models show limited improvement (< 1%)

Application 2: Carbon Fiber Fracture Stress

Data: Approximately 1200 carbon fibers' tensile properties at different gauge lengths

Model Comparison: Weibull, three-parameter Weibull, skew-normal, bimodal Weibull

Main Findings:

  • Bimodal Weibull performs best at most gauge lengths
  • Model performance significantly decreases with gauge length (except bimodal Weibull)
  • Linear regression analysis confirms the statistical significance of this trend

Related Work

Traditional Goodness-of-Fit Tests

  • Kolmogorov-Smirnov test and its limitations
  • Cramér-von Mises test's distribution dependence issues

Equivalence Testing

  • Wellek (2021): Lehmann-alternative approach
  • Liu and Lindsay (2009): tolerance regions for multinomial models
  • Romano (2005): optimal equivalence testing
  • Berger and Delampady (1987): testing of precise hypotheses
  • Dette and Sen (2013): consistent testing procedures for related hypotheses
  • Baringhaus and Henze (2024): neighborhood verification testing

Conclusions and Discussion

Main Conclusions

  1. Method Effectiveness: The AGoF test successfully addresses the problem that traditional GoF tests can only provide evidence of "misfit"
  2. Theoretical Completeness: Provides complete asymptotic theory and bootstrap consistency proofs
  3. Practicality: Two bootstrap schemes are easy to implement and applicable to a wide range of parametric models

Limitations

  1. Integrability Conditions: Requires the condition $X \in L^{2/p,1}$ to hold, limiting applicability
  2. Parameter Selection: The choice of error tolerance $\epsilon$ still requires domain expertise
  3. Computational Complexity: Higher computational cost compared to simple GoF tests

Future Directions

  1. Multivariate Extension: Extend the method to multivariate distribution cases
  2. Nonparametric Alternatives: Consider approximate verification for nonparametric or semiparametric models
  3. Adaptive Methods: Develop data-driven methods for automatic selection of $\epsilon$

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: First systematic placement of "approximate fit" in the alternative hypothesis, representing an important conceptual breakthrough
  2. Methodological Completeness: Very comprehensive from theoretical analysis to implementation algorithms
  3. Practical Value: AGoF statistic provides intuitive model quality measurement
  4. Technical Advantages: $L^p$ distance selection has clear advantages in both theory and computation

Weaknesses

  1. Assumption Conditions: M-estimation framework and integrability conditions may limit applicability
  2. Parameter Tuning: Lack of systematic guidance for selecting $p$ and $\epsilon$
  3. Computational Efficiency: High computational cost of bootstrap process

Impact

  1. Academic Contribution: Provides new research direction for goodness-of-fit testing field
  2. Practical Value: Important application prospects in model selection and verification
  3. Reproducibility: Complete theoretical results and clear algorithm descriptions facilitate reproduction

Applicable Scenarios

  • Situations requiring verification of parametric model applicability
  • Model selection and comparison
  • Model verification in regulatory and quality control contexts
  • Distribution model assessment in risk management

References

The paper draws on literature from empirical process theory, M-estimation, and bootstrap methods, which together provide the theoretical foundation for the research.