We introduce the \textit{almost goodness-of-fit} test, a procedure to assess whether a (parametric) model provides a good representation of the probability distribution generating the observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{ G(\boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta\}$, we consider the testing problem \[ H_0: \| F - G(\boldsymbol{\theta}_F) \|_p \geq \epsilon \quad \text{vs} \quad H_1: \| F - G(\boldsymbol{\theta}_F) \|_p < \epsilon, \] where $\epsilon>0$ is a margin of error and $G(\boldsymbol{\theta}_F)$ denotes a representative of $F$ within the parametric class. The approximate model is determined via an M-estimator of the parameters.
%The objective is the approximate validation of a distribution or an entire parametric family up to a pre-specified threshold value.
The methodology also quantifies the percentage improvement of the proposed model relative to a non-informative (constant) benchmark. The test statistic is the $\mathrm{L}^p$-distance between the empirical distribution function and that of the estimated model. We present two consistent, easy-to-implement, and flexible bootstrap schemes to carry out the test. The performance of the proposal is illustrated through simulation studies and real-data applications.
Paper ID: 2410.20918
Title: Bootstrap tests for almost goodness-of-fit
Authors: Amparo Baíllo (Universidad Autónoma de Madrid), Javier Cárcamo (Universidad del País Vasco)
Classification: stat.ME (Statistical Methodology), math.ST (Mathematical Statistics), stat.AP (Applied Statistics), stat.TH (Statistical Theory)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2410.20918

This paper introduces the "almost goodness-of-fit" (AGoF) test to assess whether a parametric model adequately represents the probability distribution of an observed sample. Specifically, given a distribution function $F$ and a parametric family $\mathcal{G}=\{G(\theta) : \theta \in \Theta\}$, the paper considers the hypothesis testing problem:
\[ H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon \]
where $\epsilon > 0$ is the error tolerance and $G(\theta_F)$ represents the best approximation of $F$ within the parametric class. The approximate model is determined through M-estimation, and two consistent and easily implementable bootstrap schemes are provided for conducting the test.
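To make the quantity in the hypotheses concrete, the following minimal sketch approximates $\|F - G(\theta_F)\|_1$ at the population level. It assumes $F$ is Weibull with shape 2 and scale 1, that the parametric class is exponential, and that $\theta_F$ matches the mean of $F$ (which is what the exponential likelihood score would give); the tolerance `epsilon = 0.35`, the integration range, and all function names are hypothetical choices for illustration. If these assumptions match the paper's setup, the result should land close to the value 0.3002 quoted in the simulation summary.

```python
import math

def weibull_cdf(x, shape=2.0, scale=1.0):
    """CDF of a Weibull distribution (shape/scale parametrisation is an assumption)."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def lp_distance(F, G, lower, upper, p=1.0, steps=200_000):
    """Midpoint-rule approximation of (integral of |F - G|^p)^(1/p) on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += abs(F(x) - G(x)) ** p
    return (total * h) ** (1.0 / p)

# Mean of Weibull(shape=2, scale=1) is Gamma(1 + 1/2); used here as theta_F.
mean_F = math.gamma(1.5)
distance = lp_distance(weibull_cdf, lambda x: exponential_cdf(x, mean_F), 0.0, 40.0)

epsilon = 0.35  # hypothetical tolerance, not taken from the paper
print(f"L1 distance ~ {distance:.4f}; inside tolerance: {distance < epsilon}")
```

With a known $F$ the comparison against $\epsilon$ is deterministic; the test is needed because in practice only a sample from $F$ is available.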
Traditional goodness-of-fit (GoF) tests suffer from a fundamental issue: they place the statement "the model is a reasonable approximation of the data" in the null hypothesis $H_0$, thereby providing statistical evidence only for model "misfit" rather than for actual "goodness-of-fit."
- Limitations of Traditional GoF Tests: Classical methods can only reject models but cannot verify model applicability
- Practical Needs: In practice, we are more concerned with whether a model is "good enough" rather than whether it is perfectly exact
- Importance of Approximate Modeling: In reality, few models can perfectly describe data; some degree of deviation must be tolerated

- The limiting distribution of Kolmogorov-Smirnov type statistics under parameter estimation is complex and non-Gaussian
- Bootstrap methods are typically inconsistent when estimating sup-norms
- Lack of a unified framework for handling approximate verification of parametric families

- Proposes the AGoF Testing Framework: Places "approximate fit" in the alternative hypothesis, enabling statistical evidence for model applicability
- Uses $L^p$ Distance: Compared to traditional supremum norms, $L^p$ norms possess superior theoretical properties and computational advantages
- Develops Two Bootstrap Schemes: Proves their consistency and provides practical implementation algorithms
- Introduces AGoF Statistic: Quantifies the percentage improvement of the model relative to a non-informative baseline
- Provides Complete Theoretical Analysis: Including asymptotic distributions, bootstrap consistency, and other theoretical guarantees

Given a sample $X_1, \ldots, X_n$ from an unknown distribution $F$ and a parametric model family $\mathcal{G} = \{G(\theta) : \theta \in \Theta \subset \mathbb{R}^k\}$, test:
\[ H_0: \|F - G(\theta_F)\|_p \geq \epsilon \quad \text{vs} \quad H_1: \|F - G(\theta_F)\|_p < \epsilon \]
where $\theta_F$ is determined through M-estimation: $E_F[\psi_{\theta_F}(X)] = 0$.
Use M-estimators to solve:
\[ \Psi_n(\theta) = \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i) = 0 \]
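As an illustration of the estimating equation $\Psi_n(\theta) = 0$, here is a minimal sketch for a one-parameter case. It assumes an exponential model with rate $\theta$ and takes $\psi_\theta$ to be the likelihood score, so the root has the closed form $1/\bar{X}$ and the numerical solution can be checked against it. The bisection solver, its bracket, and the simulated Weibull sample are illustrative assumptions rather than anything prescribed by the paper.

```python
import random

def psi(theta, x):
    """Score of an exponential model with rate theta: d/dtheta log(theta * exp(-theta * x))."""
    return 1.0 / theta - x

def Psi_n(theta, sample):
    """Empirical estimating function: the average of psi over the sample."""
    return sum(psi(theta, x) for x in sample) / len(sample)

def solve_m_estimator(sample, lo=1e-6, hi=1e6, tol=1e-10):
    """Bisection for Psi_n(theta) = 0; relies on Psi_n being decreasing in theta."""
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if Psi_n(mid, sample) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of (or at) mid
    return 0.5 * (lo + hi)

rng = random.Random(0)
# random.weibullvariate takes (scale, shape); the data are deliberately not exponential.
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(200)]

theta_hat = solve_m_estimator(sample)
print(theta_hat, len(sample) / sum(sample))  # numerical root vs closed form 1 / mean
```

For multi-parameter families the same equation becomes a system, and a generic root finder or optimizer would replace the bisection step.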
The standardized statistic is:
\[ T_n(F,G(\theta_F),p) = \sqrt{n}\,\bigl(\|F_n - G(\hat{\theta}_n)\|_p - \|F - G(\theta_F)\|_p\bigr) \]
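The observable part of this statistic is the plug-in distance $\|F_n - G(\hat{\theta}_n)\|_p$. A rough sketch of how it might be computed is shown here, assuming an exponential model fitted by its sample mean and a plain midpoint rule on a truncated range. The truncation point `upper`, the grid size, and the function names are hypothetical; an exact piecewise integration between order statistics would be more accurate.

```python
import bisect
import math
import random

def empirical_cdf(sorted_sample):
    """Return F_n as a function; the input must already be sorted."""
    n = len(sorted_sample)
    return lambda x: bisect.bisect_right(sorted_sample, x) / n

def lp_statistic(sample, model_cdf, p=1.0, lower=0.0, upper=None, steps=20_000):
    """Midpoint-rule approximation of ||F_n - G||_p on [lower, upper]."""
    xs = sorted(sample)
    if upper is None:
        upper = xs[-1] + 20.0  # hypothetical tail cut-off; adjust to the model's tail
    F_n = empirical_cdf(xs)
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += abs(F_n(x) - model_cdf(x)) ** p
    return (total * h) ** (1.0 / p)

rng = random.Random(1)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(500)]  # (scale, shape)

mean_hat = sum(sample) / len(sample)               # fitted exponential mean
exp_cdf = lambda x: 1.0 - math.exp(-x / mean_hat)  # G(theta_hat)
stat = lp_statistic(sample, exp_cdf, p=1.0)
print(f"||F_n - G(theta_hat)||_1 ~ {stat:.3f}")
```

Larger values of `p` put more weight on the regions where the two distribution functions differ most, which is the flexibility the $L^p$ family offers.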
The rejection region is:
\[ R_n = \{\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\} \]
where $c_n(\alpha) = -Q_T(\alpha)/\sqrt{n}$ and $Q_T(\alpha)$ is the $\alpha$-quantile of the limiting distribution.
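A short derivation may help to see why this choice of $c_n(\alpha)$ controls the level. It is a sketch that assumes $T_n$ converges in distribution to a limit whose distribution function is continuous at $Q_T(\alpha)$; the shorthand $d$ is introduced here for convenience and is not the paper's notation.

```latex
\begin{align*}
\Pr(R_n)
  &= \Pr\bigl(\|F_n - G(\hat{\theta}_n)\|_p < \epsilon - c_n(\alpha)\bigr) \\
  &= \Pr\bigl(T_n(F,G(\theta_F),p) < \sqrt{n}\,(\epsilon - d) + Q_T(\alpha)\bigr),
  \qquad d = \|F - G(\theta_F)\|_p .
\end{align*}
% Under H_0 we have d >= epsilon, so sqrt(n)(epsilon - d) <= 0 and
\begin{align*}
\Pr(R_n) \le \Pr\bigl(T_n(F,G(\theta_F),p) < Q_T(\alpha)\bigr) \longrightarrow \alpha ,
\end{align*}
% with equality in the first step on the boundary d = epsilon.
% Under H_1 we have d < epsilon, so sqrt(n)(epsilon - d) -> infinity and Pr(R_n) -> 1.
```

In practice $Q_T(\alpha)$ is unknown, which is where a bootstrap approximation of the quantile comes in.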
- Hadamard Differentiability: For $1 < p < \infty$, the $L^p$ norm is Hadamard differentiable, facilitating the application of the functional delta method
- Gaussian Limiting Distribution: Under general assumptions, the asymptotic distribution is Gaussian
- Bootstrap Consistency: Under appropriate conditions, the standard bootstrap estimator is consistent
- Flexibility: The sensitivity to distribution tails can be controlled by adjusting the value of $p$

Establishes a complete asymptotic theory including:
- Weak convergence of empirical processes in $L^p$ spaces
- Limiting distributions of processes with estimated parameters
- Consistency of bootstrap processes

Under Assumptions 1-2, $X \in L^{2/p,1}$ if and only if:
\[ G_n(\theta_F) \rightsquigarrow G_{\theta_F} \text{ in } L^p \]
where $G_{\theta_F}$ is a centered Gaussian process.
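The delta-method step that turns this process limit into a limit for the test statistic relies on the differentiability of the $L^p$ norm for $1 < p < \infty$. As a numerical sanity check, the sketch below compares a finite-difference quotient of $t \mapsto \|f + t g\|_p$ at $t = 0$ with the directional derivative $\|f\|_p^{1-p} \int g\,|f|^{p-1}\,\mathrm{sgn}(f)$. The functions `f` and `g`, the step `t`, and the integration range are arbitrary illustrative choices: `f` is $F - G$ for a Weibull(2,1) distribution against a mean-matched exponential, and `g` merely stands in for a fixed direction.

```python
import math

def lp_norm(f, p, lower, upper, steps=20_000):
    """Midpoint-rule approximation of the L^p norm of f on [lower, upper]."""
    h = (upper - lower) / steps
    total = sum(abs(f(lower + (i + 0.5) * h)) ** p for i in range(steps))
    return (total * h) ** (1.0 / p)

def lp_norm_derivative(f, g, p, lower, upper, steps=20_000):
    """Directional derivative of the L^p norm at f in direction g (1 < p < infinity, f != 0)."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        fx = f(x)
        total += g(x) * abs(fx) ** (p - 1) * math.copysign(1.0, fx)
    return total * h / lp_norm(f, p, lower, upper, steps) ** (p - 1)

m = math.gamma(1.5)                                   # mean of Weibull(shape=2, scale=1)
f = lambda x: math.exp(-x / m) - math.exp(-x * x)     # F - G on [0, infinity)
g = lambda x: x * math.exp(-x)                        # arbitrary smooth direction

p, t = 2.0, 1e-5
base = lp_norm(f, p, 0.0, 30.0)
finite_diff = (lp_norm(lambda x: f(x) + t * g(x), p, 0.0, 30.0) - base) / t
analytic = lp_norm_derivative(f, g, p, 0.0, 30.0)
print(finite_diff, analytic)
```

The two numbers should agree up to an error of order `t`. For $p = 1$ the norm is only directionally differentiable where $f$ vanishes on a set of positive measure, which is why the contact set enters the $p = 1$ limit.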
When $p = 1$:
\[ T(F,G(\theta_F),1) = \int_{C_{\theta_F}} |G_{\theta_F}| + \int_{\mathbb{R}\setminus C_{\theta_F}} G_{\theta_F}\,\text{sgn}(F-G(\theta_F)) \]
When $1 < p < \infty$:
\[ T(F,G(\theta_F),p) = \frac{1}{\|F-G(\theta_F)\|_p^{p-1}} \int G_{\theta_F}\, |F-G(\theta_F)|^{p-1}\,\text{sgn}(F-G(\theta_F)) \]
The limiting distribution is normal if and only if:
- $p = 1$: The Lebesgue measure of the contact set $C_{\theta_F} = \{F = G(\theta_F)\}$ is zero
- $1 < p < \infty$: $F \neq G(\theta_F)$

Theorem 3 and Corollary 2 prove that, under appropriate assumptions, the bootstrap statistic converges weakly to the same limiting distribution.
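A rough sketch of how such a bootstrap could drive the test decision is given here. It is not the authors' algorithm: it assumes a nonparametric resampling of the data, an exponential model refitted by the mean on every resample, $p = 1$, and two decision rules obtained by plugging bootstrap estimates into the rejection region. The quantile rule replaces $Q_T(\alpha)$ by the $\alpha$-quantile of $\sqrt{n}$ times the difference between the bootstrap and observed distances; the normal rule uses the bootstrap standard deviation together with the lower standard normal quantile $z_\alpha$. The names `eps_star`, `sigma_boot`, `upper`, `B = 200`, and the tolerances are illustrative assumptions.

```python
import bisect
import math
import random
from statistics import NormalDist, stdev

def l1_statistic(sample, upper=12.0, steps=2000):
    """||F_n - G(theta_hat)||_1 for an exponential model fitted by its mean (midpoint rule)."""
    xs = sorted(sample)
    n = len(xs)
    mean_hat = sum(xs) / n
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += abs(bisect.bisect_right(xs, x) / n - (1.0 - math.exp(-x / mean_hat)))
    return total * h

def agof_bootstrap(sample, epsilon, alpha=0.05, B=200, seed=0):
    """Return the observed distance and the two bootstrap-based decisions."""
    rng = random.Random(seed)
    n = len(sample)
    stat = l1_statistic(sample)
    # Refit the model on each resample, then recompute the distance.
    boot = sorted(l1_statistic(rng.choices(sample, k=n)) for _ in range(B))
    # Assumption: eps_star is the alpha-quantile of the bootstrap distances.
    eps_star = boot[max(0, math.ceil(alpha * B) - 1)]
    sigma_boot = stdev(boot)
    z_alpha = NormalDist().inv_cdf(alpha)  # lower quantile, negative for alpha < 0.5
    return {
        "statistic": stat,
        "reject_bootstrap1": 2.0 * stat - eps_star < epsilon,      # quantile-based rule
        "reject_bootstrap2": stat - sigma_boot * z_alpha < epsilon,  # normal-approximation rule
    }

rng = random.Random(42)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(200)]  # (scale, shape)

loose = agof_bootstrap(sample, epsilon=0.6)  # generous tolerance
tight = agof_bootstrap(sample, epsilon=0.1)  # demanding tolerance
print(loose)
print(tight)
```

Rejecting $H_0$ here means concluding that the exponential model is within $\epsilon$ of the data-generating distribution, so a generous tolerance should lead to rejection and a demanding one should not. A real application would use far more replicates than this sketch does.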
- Sample Sizes: $n = 30, 50, 100, 500$
- Bootstrap Replicates: $B = 2000$
- Significance Level: $\alpha = 0.05$
- Monte Carlo Replications: 1000

- Weibull vs Exponential Model: $p = 1$, true distribution is Weibull(2,1)
- Gaussian Mixture vs Normal Model: $p = 2$, true distribution is a two-component Gaussian mixture
- Negative Binomial vs Poisson Model: $p = 1$, discrete distribution case
- Kumaraswamy vs Beta Model: $p = 1$, bounded support case
- Student t vs Normal Model: $p = 4$, heavy-tailed distribution case
- Lognormal vs Gamma Model: $p = 1$, skewed distribution case

- Bootstrap 1: Quantile-based method, rejection condition: $2\|F_n - G(\hat{\theta}_n)\|_p - \hat{\epsilon}^*(\alpha) < \epsilon$
- Bootstrap 2: Normal approximation-based method, rejection condition: $\|F_n - G(\hat{\theta}_n)\|_p - \hat{\sigma}_{\text{boot}}\, z_\alpha < \epsilon$

- Moderate Sample Sizes ($n = 500$): Both methods perform similarly and control the test level well
- Small Sample Sizes ($n \leq 100$): Bootstrap 2 typically controls the nominal significance level better
- High AGoF Statistics (> 0.9): Bootstrap 1 performs better

For the Weibull vs Exponential model:
- $\|F - G(\theta_F)\|_1 = 0.3002$
- AGoF Statistic: $G(F,G) = 0.194$ (only 19.4% improvement over the constant model)
- Power functions show the two methods are nearly indistinguishable at $n = 500$

- AGoF Statistic between 0 and 0.9: Recommend Bootstrap 2
- AGoF Statistic exceeding 0.9: Recommend Bootstrap 1
- Exercise caution when interpreting results with small sample sizes

Data: 4308 IgG antibody samples (Bm33 antigen) from Haiti's national serology survey
Analysis: Test AGoF of 1-5 component normal mixture models
- Two-component model performs best: $\epsilon^*_2(0.05) \approx 0.022$ ($L^1$), $G^*(F,G_2) > 0.97$
- Single-component normal model insufficient: improvement rate < 78%
- Three or more component models show limited improvement (< 1%)

Data: Approximately 1200 carbon fibers' tensile properties at different gauge lengths
Model Comparison: Weibull, three-parameter Weibull, skew-normal, bimodal Weibull
Main Findings:
- Bimodal Weibull performs best at most gauge lengths
- Model performance significantly decreases with gauge length (except bimodal Weibull)
- Linear regression analysis confirms the statistical significance of this trend

- Kolmogorov-Smirnov test and its limitations
- Cramér-von Mises test's distribution dependence issues

- Wellek (2021)'s Lehmann alternative hypothesis approach
- Liu and Lindsay (2009)'s tolerance regions for multinomial models
- Romano (2005)'s optimal equivalence testing
- Berger and Delampady (1987)'s testing of precise hypotheses
- Dette and Sen (2013)'s related hypothesis consistent testing procedures
- Baringhaus and Henze (2024)'s neighborhood verification testing

- Method Effectiveness: The AGoF test successfully addresses the problem that traditional GoF tests can only provide evidence of "misfit"
- Theoretical Completeness: Provides complete asymptotic theory and bootstrap consistency proofs
- Practicality: Two bootstrap schemes are easy to implement and applicable to a wide range of parametric models

- Integrability Conditions: Requires the condition $X \in L^{2/p,1}$, limiting applicability
- Parameter Selection: The choice of error tolerance $\epsilon$ still requires domain expertise
- Computational Complexity: Higher computational cost compared to simple GoF tests

- Multivariate Extension: Extend the method to multivariate distribution cases
- Nonparametric Alternatives: Consider approximate verification for nonparametric or semiparametric models
- Adaptive Methods: Develop data-driven methods for automatic selection of $\epsilon$

- Theoretical Innovation: First systematic placement of "approximate fit" in the alternative hypothesis, representing an important conceptual breakthrough
- Methodological Completeness: Very comprehensive, from theoretical analysis to implementation algorithms
- Practical Value: The AGoF statistic provides an intuitive measure of model quality
- Technical Advantages: The choice of $L^p$ distance has clear advantages in both theory and computation

- Assumption Conditions: The M-estimation framework and integrability conditions may limit applicability
- Parameter Tuning: Lack of systematic guidance for selecting $p$ and $\epsilon$
- Computational Efficiency: High computational cost of the bootstrap procedure

- Academic Contribution: Provides a new research direction for the goodness-of-fit testing field
- Practical Value: Important application prospects in model selection and verification
- Reproducibility: Complete theoretical results and clear algorithm descriptions facilitate reproduction

- Situations requiring verification of parametric model applicability
- Model selection and comparison
- Model verification in regulatory and quality control contexts
- Distribution model assessment in risk management

The paper cites abundant relevant literature covering multiple fields including empirical process theory, M-estimation, and bootstrap methods, providing a solid theoretical foundation for the research.