2025-11-24T11:16:24.556584

StatTestCalculator: A New General Tool for Statistical Analysis in High Energy Physics

Abasov, Dudko, Gorin et al.
We present StatTestCalculator (STC), a new open-source statistical analysis tool designed for the analysis of high energy physics experiments. STC provides both asymptotic calculations and Monte Carlo simulations for computing the exact statistical significance of a discovery or for setting upper limits on signal model parameters. We review the underlying statistical formalism, including profile likelihood ratio test statistics for discovery and exclusion hypotheses, and the asymptotic distributions that allow quick significance estimates. We explain the relevant formulas for the likelihood functions, test statistic distributions, and significance metrics (both with and without incorporating systematic uncertainties). The implementation and capabilities of STC are described, and we validate its performance against the widely used CMS Combine tool. We find excellent agreement in both the expected discovery significances and upper limit calculations. STC is a flexible framework that can accommodate systematic uncertainties and user-defined statistical models, making it suitable for a broad range of analyses.

Basic Information

  • Paper ID: 2510.11637
  • Title: StatTestCalculator: A New General Tool for Statistical Analysis in High Energy Physics
  • Authors: E. Abasov, L.V. Dudko, D.E. Gorin, O.S. Vasilevskii (Faculty of Physics, Moscow State University; Skobeltsyn Institute of Nuclear Physics)
  • Classification: hep-ph (High Energy Physics - Phenomenology), stat.CO (Statistics - Computation)
  • Publication Date/Conference: Moscow University Physics Bulletin 80(8), 2025; The XXV International Workshop-School High Energy Physics and Quantum Field Theory
  • Paper Link: https://arxiv.org/abs/2510.11637v1

Abstract

This paper introduces StatTestCalculator (STC), a novel open-source statistical analysis tool designed specifically for high energy physics experimental analysis. STC provides both asymptotic calculations and Monte Carlo simulation methods for computing the precise statistical significance of discoveries or setting upper limits on signal model parameters. The paper reviews the underlying statistical formalism, including profile likelihood ratio test statistics for discovery and exclusion hypotheses, as well as asymptotic distributions enabling rapid significance estimation. The authors provide detailed explanations of relevant formulas for likelihood functions, test statistic distributions, and significance measures, both with and without systematic uncertainties. The paper describes STC's implementation and functionality, and validates its performance through comprehensive comparison with the widely-used CMS Combine tool, demonstrating excellent consistency in both expected discovery significance and upper limit calculations.

Research Background and Motivation

Problem Definition

High energy physics (HEP) experiments rely on statistical analysis of observed data to draw conclusions about new phenomena. Since collider experiment results are inherently probabilistic in nature, rigorous statistical methods are required to estimate parameters and assess the significance of potential discoveries.

Limitations of Existing Tools

Numerous sophisticated statistical tools already exist for HEP analysis, including:

  • RooFit and RooStats frameworks
  • CMS Combine tool
  • Theta
  • HistFactory

These tools are typically designed for complex large-scale analyses. What is missing is a lightweight tool that provides fast and accurate general-purpose statistical computations for common scenarios.

Research Motivation

  1. Usability Requirements: Need for an easy-to-use and multifunctional Python tool
  2. Integration Convenience: Ability to integrate seamlessly into neural network pipelines
  3. Rapid Verification: Facilitate preliminary sensitivity studies, cross-checking of official results, or educational purposes
  4. Extensibility: Support for user-defined statistical models and test statistics

Core Contributions

  1. Development of a New Statistical Analysis Tool STC: A lightweight, Python-based open-source tool specifically designed for HEP statistical analysis
  2. Dual Computational Methods: Support for both asymptotic formulas (closed-form approximations) and exact Monte Carlo simulations
  3. Comprehensive Systematic Uncertainty Handling: Support for normal, lognormal, or user-defined systematic effect distributions
  4. Validation of Tool Accuracy: Extensive comparison with the CMS Combine tool demonstrating excellent consistency
  5. Extended Mathematical Framework: Generalized formulas extending single-bin analysis to multi-bin shape analysis

Methodology Details

Statistical Assumptions and Likelihood Formalization

Task Definition

In collider experiments, consider two hypotheses:

  • Null Hypothesis H₀ (Background Only): Assumes data contains no contribution from new signals
  • Alternative Hypothesis H₁ (Signal + Background): Assumes signal events exist in addition to background

Define the signal strength parameter μ, where μ=0 corresponds to H₀ and μ=1 corresponds to the nominal signal prediction under H₁.

Likelihood Function Construction

For counting experiments with N signal regions, observed counts nᵢ are assumed to follow Poisson distributions: nᵢ ~ Poisson(μsᵢ + κᵢbᵢ)

The complete likelihood function is:

L(μ,θ) = ∏ᵢ₌₁ᴺ [(μsᵢ + κᵢbᵢ)^nᵢ e^-(μsᵢ+κᵢbᵢ)]/nᵢ! × ∏ⱼ₌₁ᴹ Systematic(θⱼ)

Where:

  • sᵢ: Expected number of signal events in bin i
  • bᵢ: Expected background yield in bin i
  • κᵢ: Systematic uncertainty factor scaling the background in bin i
  • θ: Vector of M nuisance parameters
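
The likelihood above can be sketched as a negative log-likelihood in plain Python. This is an illustrative sketch, not STC's actual code: the function name is hypothetical, the Poisson mean μsᵢ + κᵢbᵢ is used in both the power and the exponent, and the generic Systematic(θ) term is assumed here to be a Gaussian constraint on each κᵢ with mean 1 and width δ.

```python
import math

def neg_log_likelihood(mu, kappa, n, s, b, delta):
    """Return -ln L for N bins with Poisson means mu*s_i + kappa_i*b_i.

    Assumption: each kappa_i carries a Gaussian constraint N(1, delta^2);
    the text only names a generic Systematic(theta) term.
    """
    total = 0.0
    for n_i, s_i, b_i, k_i in zip(n, s, b, kappa):
        lam = mu * s_i + k_i * b_i  # expected count in bin i
        # -ln Poisson(n_i | lam) = lam - n_i*ln(lam) + ln(n_i!)
        total += lam - n_i * math.log(lam) + math.lgamma(n_i + 1)
        if delta > 0:
            # Gaussian constraint on kappa_i, constant terms dropped
            total += 0.5 * ((k_i - 1.0) / delta) ** 2
    return total

# One bin with n = b = 100: the background-only point (mu = 0) fits better
# than the nominal signal point (mu = 1).
print(neg_log_likelihood(0.0, [1.0], [100], [10.0], [100.0], 0.0))
print(neg_log_likelihood(1.0, [1.0], [100], [10.0], [100.0], 0.0))
```

Working with -ln L rather than L keeps the products numerically stable and turns the likelihood ratio of the next section into a simple difference.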

Profile Likelihood Ratio and Test Statistics

Profile Likelihood Ratio Definition

λ(μ) = L(μ, θ̂(μ)) / L(μ̂, θ̂)

Test Statistics

Define the test statistic:

qμ = -2 ln λ(μ) = -2 ln [L(μ, θ̂(μ)) / L(μ̂, θ̂)]

Discovery Test Statistic q₀:

q₀ = {
  -2 ln λ(0),  if μ̂ ≥ 0
  0,           if μ̂ < 0
}

Exclusion Test Statistic qμ:

qμ = {
  -2 ln λ(μ),  if μ̂ ≤ μ
  0,           if μ̂ > μ
}
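
For a single bin without nuisance parameters, both test statistics have a closed form, because the best-fit Poisson mean equals the observed count n. The sketch below illustrates the two piecewise definitions under that simplifying assumption. The function names are hypothetical, and no profiling over θ is performed.

```python
import math
from statistics import NormalDist

def _nll(lam, n):
    # -ln Poisson(n | lam) without the constant ln(n!) term
    return lam - (n * math.log(lam) if n > 0 else 0.0)

def q0(n, s, b):
    """Discovery statistic: zero when the fitted signal strength is negative."""
    mu_hat = (n - b) / s  # unconstrained best-fit signal strength
    if mu_hat < 0:
        return 0.0
    return 2.0 * (_nll(b, n) - _nll(n, n))

def q_mu(mu, n, s, b):
    """Exclusion statistic: zero when the fit prefers more signal than mu."""
    mu_hat = (n - b) / s
    if mu_hat > mu:
        return 0.0
    return 2.0 * (_nll(mu * s + b, n) - _nll(n, n))

# Asymptotic conversion: Z = sqrt(q0) and p = 1 - Phi(Z).
z = math.sqrt(q0(120, 10.0, 100.0))
p = 1.0 - NormalDist().cdf(z)
print(z, p)
```

The conversion Z = √q₀ holds only in the asymptotic (large-sample) regime; toy experiments are the cross-check when counts are small.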

Analytical Formulas for Discovery Significance

For cases including systematic uncertainties, the discovery significance formula is:

Zdisc = √{2[(s+b) ln((s+b)(1+δ²b)/(b+δ²b(s+b))) - (1/δ²) ln(1+δ²s/(1+δ²b))]}

Where δ = σb/b is the relative background uncertainty.

In the limit of no systematic uncertainties (δ→0):

Zdisc = √{2[(s+b)ln(1+s/b) - s]}
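
The two discovery formulas can be evaluated directly. The sketch below is a plain transcription, with the first logarithm taken over the whole ratio (s+b)(1+δ²b)/(b+δ²b(s+b)). The function name z_disc is hypothetical and is not claimed to be STC's API.

```python
import math

def z_disc(s, b, delta=0.0):
    """Expected discovery significance for signal s, background b,
    and relative background uncertainty delta = sigma_b / b."""
    if delta == 0.0:
        # Limit without systematic uncertainties
        return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))
    d2 = delta * delta
    term1 = (s + b) * math.log((s + b) * (1.0 + d2 * b) / (b + d2 * b * (s + b)))
    term2 = math.log1p(d2 * s / (1.0 + d2 * b)) / d2
    return math.sqrt(2.0 * (term1 - term2))

# Scan similar to the validation scenarios: b = 100, 0% and 20% uncertainty.
for s in (10, 20, 30, 40, 50):
    print(s, round(z_disc(s, 100.0), 3), round(z_disc(s, 100.0, 0.2), 3))
```

Using log1p keeps the second term accurate when δ is small, where the general formula should approach the δ → 0 expression.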

Analytical Formulas for Exclusion Significance (Upper Limits)

The exclusion significance formula including background uncertainty:

Zexcl = √{2[s - b ln((b+s+x)/(2b)) - (1/δ²)ln((b-s+x)/(2b))] - (b+s-x)(1+1/(δ²b))}

Where:

x = √[(b+s)² - 4δ²b²s/(1+δ²b)]
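
The exclusion formula can be transcribed the same way. In this sketch the function name z_excl is hypothetical, and the δ = 0 branch is the limit of the expression above, √{2[s - b ln(1+s/b)]}, which is derived here rather than quoted from the text.

```python
import math

def z_excl(s, b, delta=0.0):
    """Expected exclusion significance for signal s, background b,
    and relative background uncertainty delta = sigma_b / b."""
    if delta == 0.0:
        # Limit of the general expression as delta -> 0
        return math.sqrt(2.0 * (s - b * math.log1p(s / b)))
    d2 = delta * delta
    c = 4.0 * d2 * b * b * s / (1.0 + d2 * b)
    x = math.sqrt((b + s) ** 2 - c)
    # (b + s - x) rewritten as c / (b + s + x) to avoid cancellation
    gap = c / (b + s + x)
    inner = (s
             - b * math.log((b + s + x) / (2.0 * b))
             - math.log((b - s + x) / (2.0 * b)) / d2)
    return math.sqrt(2.0 * inner - gap * (1.0 + 1.0 / (d2 * b)))

print(z_excl(10.0, 100.0), z_excl(10.0, 100.0, 0.2))
```

A signal hypothesis is conventionally considered excluded at 95% confidence level when the one-sided significance reaches about 1.645.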

Experimental Setup

Monte Carlo Simulation Framework

Toy Experiment Generation

  1. Signal Events: Drawn from Poisson distribution Poisson(μs)
  2. Background Events: Drawn from Poisson distribution Poisson(b)
  3. Systematic Uncertainties: Applied to signal and background distributions
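
The toy procedure can be illustrated for the simplest case: a single bin, background-only toys, and no systematic uncertainties. This is a minimal sketch and not STC's implementation. The sampler, the function names, and the choice of test statistic are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for means up to a few hundred."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def q0(n, b):
    """Single-bin discovery statistic without nuisance parameters."""
    if n <= b:
        return 0.0
    return 2.0 * (n * math.log(n / b) - (n - b))

def toy_p_value(n_obs, b, n_toys=100_000, seed=1):
    """Fraction of background-only toys at least as signal-like as n_obs."""
    rng = random.Random(seed)
    q_obs = q0(n_obs, b)
    hits = sum(q0(poisson(b, rng), b) >= q_obs for _ in range(n_toys))
    return hits / n_toys

p = toy_p_value(120, 100.0, n_toys=20_000)
if 0.0 < p < 1.0:
    # Convert the toy p-value into a one-sided Gaussian significance
    print(p, NormalDist().inv_cdf(1.0 - p))
```

The statistical precision of the p-value scales as 1/√(number of toys), which is why very small p-values require large toy samples.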

Systematic Uncertainty Handling

  • Normal Distribution: κ ~ N(1, δ²)
  • Lognormal Distribution: κ ~ LogNormal(1, δ²)
  • Shape Uncertainty: All bins multiplied by the same scalar κ value
  • Single-bin Uncertainty: Each bin has its own independent κ factor
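
The options listed above can be sketched as a small sampling helper. Several choices here are assumptions rather than statements about STC: the function names are hypothetical, the normal κ is truncated at zero, the lognormal is parametrized with median 1 and width ln(1+δ), and the shape uncertainty is read as a single κ shared by all bins.

```python
import math
import random

def sample_kappa(delta, rng, dist="normal"):
    """Draw one multiplicative systematic factor kappa."""
    if dist == "normal":
        # Truncated at zero (an assumption) so that yields stay non-negative
        return max(0.0, rng.gauss(1.0, delta))
    if dist == "lognormal":
        # Assumed convention: median 1, log-width ln(1 + delta)
        return rng.lognormvariate(0.0, math.log1p(delta))
    raise ValueError(f"unknown distribution: {dist}")

def fluctuate_background(b, delta, rng, dist="normal", per_bin=False):
    """Apply kappa to the background yields b (a list, one entry per bin)."""
    if per_bin:
        # Independent kappa for every bin
        return [b_i * sample_kappa(delta, rng, dist) for b_i in b]
    kappa = sample_kappa(delta, rng, dist)  # one kappa shared by all bins
    return [b_i * kappa for b_i in b]

rng = random.Random(0)
print(fluctuate_background([100.0, 50.0], 0.2, rng))
print(fluctuate_background([100.0, 50.0], 0.2, rng, "lognormal", per_bin=True))
```

In a toy experiment, the fluctuated yields would then serve as the Poisson means from which the observed counts are drawn.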

Validation Experimental Setup

Comparison Tool

Validation is performed primarily against the CMS Combine tool.

Test Scenarios

  1. Discovery Significance Calculation:
    • Background b = 100 events
    • Signal s = 10, 20, 30, ..., 50 events
    • Systematic uncertainties: 0% and 20%
  2. Upper Limit Calculation:
    • 95% confidence level limits
    • Same signal and background configurations
    • Monte Carlo simulations using 10⁵ toy experiments

Experimental Results

Main Results

Discovery Significance Comparison

Experimental results demonstrate excellent consistency between STC and the Combine tool in the following aspects:

  1. Asymptotic Calculations:
    • Without systematic uncertainties: Perfect agreement
    • With 20% systematic uncertainties: High consistency
  2. Monte Carlo Calculations:
    • MC results from both tools show good agreement with asymptotic formulas
    • Statistical uncertainties within expected ranges

Upper Limit Calculation Comparison

95% confidence level upper limit calculations show:

  1. Asymptotic Formula Validation: STC's asymptotic formulas perfectly match Combine
  2. Monte Carlo Validation: Toy experiment results confirm the accuracy of asymptotic approximations
  3. Systematic Uncertainty Impact: Correctly reflects the degradation of exclusion power due to systematic uncertainties

Performance Evaluation

Computational Efficiency

  • Asymptotic Calculations: Nearly instantaneous (a fraction of a second)
  • Monte Carlo Simulations: 10⁵ toy experiments completed in seconds to minutes

Accuracy Verification

All test scenarios demonstrate that STC accurately reproduces standard calculations, confirming:

  1. Correct implementation of mathematical formulas
  2. Reliability of Monte Carlo algorithms
  3. Accuracy of systematic uncertainty handling

Extended Functionality Verification

Multi-bin Shape Analysis

STC was successfully applied to more complex multi-bin shape analysis scenarios, using formulas extended from Ref. [7].

User-Defined Capabilities

Verified the following extension capabilities:

  1. Custom test statistic definitions
  2. Alternative likelihood function forms
  3. User-defined systematic uncertainty distributions

Comparison of Existing Statistical Tools

Tool | Features | Limitations
RooFit/RooStats | Powerful, widely used | Complex, steep learning curve
CMS Combine | Standard tool, complete functionality | Primarily for large-scale analyses
Theta | Bayesian methods | Specialized purpose
HistFactory | Model construction | Requires auxiliary tools

STC's Position

STC fills the gap for lightweight, user-friendly, and rapid statistical analysis tools, particularly suitable for:

  • Preliminary sensitivity studies
  • Cross-verification of results
  • Educational and learning purposes
  • Neural network pipeline integration

Conclusions and Discussion

Main Conclusions

  1. Tool Effectiveness: STC implements accurate statistical analysis functionality and agrees closely with the standard Combine tool
  2. Methodological Completeness: Provides a complete statistical framework from simple counting experiments to complex shape analysis
  3. Practical Value: Lightweight design makes it suitable for rapid analysis and educational purposes
  4. Extensibility: Modular design supports user customization and method extensions

Limitations

  1. Complexity Constraints: While supporting multi-bin analysis, may not match specialized tools for extremely complex statistical models
  2. Performance Optimization: Room for improvement in performance when handling large-scale data
  3. Documentation Completeness: As a new tool, requires more usage examples and documentation

Future Directions

  1. Feature Extensions:
    • Support for additional statistical distributions
    • Integration of Bayesian methods
    • Extension to more complex experimental designs
  2. Performance Optimization:
    • Parallelization of Monte Carlo calculations
    • Memory usage optimization
    • Large-scale data processing capabilities
  3. Community Building:
    • Increase usage examples
    • Improve documentation
    • Encourage community contributions

In-Depth Evaluation

Strengths

  1. Technical Innovation:
    • Successfully transforms complex statistical theory into a user-friendly tool
    • Provides complete mathematical derivations and implementations
    • Dual verification methods (asymptotic + MC) enhance result reliability
  2. Experimental Sufficiency:
    • Comprehensive comparison with standard tools
    • Test coverage across multiple scenarios
    • Correct handling of systematic uncertainties
  3. Practical Value:
    • Fills the gap for lightweight statistical tools
    • Python implementation facilitates integration and modification
    • Open-source nature promotes community development
  4. Writing Clarity:
    • Detailed and correct mathematical derivations
    • Clear description of implementation details
    • Transparent verification process

Shortcomings

  1. Methodological Limitations:
    • Primarily based on frequentist methods
    • Limited support for certain specialized statistical models
    • Large-scale parallel computing capabilities need enhancement
  2. Experimental Setup:
    • Validation primarily based on simple models
    • Lacks test cases with real complex experiments
    • Performance benchmarking relatively simple
  3. Comparative Analysis:
    • Primarily compared with Combine, lacking comparison with other tools
    • Insufficient quantitative analysis of computational efficiency

Impact Assessment

  1. Academic Contribution:
    • Provides new tool options for HEP statistical analysis
    • Complete mathematical framework has educational value
    • Open-source implementation promotes method transparency
  2. Practical Impact:
    • Lowers technical barriers for statistical analysis
    • Facilitates rapid prototyping and verification
    • Supports teaching and learning activities
  3. Reproducibility:
    • Open-source code ensures complete reproducibility
    • Detailed mathematical derivations support independent verification
    • Comparison with standard tools enhances credibility

Applicable Scenarios

  1. Ideal Applications:
    • Preliminary sensitivity studies
    • Learning and teaching statistical methods
    • Rapid prototype development
    • Cross-verification of results
  2. Limited Scenarios:
    • Extremely large-scale complex analyses
    • Cases requiring specialized statistical methods
    • Production environments with extreme performance requirements

References

[1] W. Verkerke and D. Kirkby, The RooFit toolkit for data modeling, Statistical Problems in Particle Physics, Astrophysics and Cosmology (2006)

[2] L. Moneta et al., The RooStats Project, arXiv:1009.1003 (2010)

[3] CMS Collaboration, The CMS Statistical Analysis and Combination Tool: Combine, arXiv:2404.06614 (2024)

[6] G. Cowan, K. Cranmer, E. Gross, and O. Vitells, Asymptotic formulae for likelihood-based tests of new physics, Eur. Phys. J. C 71, 1554 (2011)

[7] D. E. Gorin et al., Asymptotic formulas for estimating statistical significance in collider experiments, Uchenye Zapiski Fiz. Fak. MGU No. 1 (2024)


Tool Access: StatTestCalculator software and documentation are available on GitHub: https://github.com/skottver/stattestcalculator