StatTestCalculator: A New General Tool for Statistical Analysis in High Energy Physics

Abasov, Dudko, Gorin et al.

We present StatTestCalculator (STC), a new open-source statistical analysis tool designed for the analysis of high energy physics experiments. STC provides both asymptotic calculations and Monte Carlo simulations for computing the exact statistical significance of a discovery or for setting upper limits on signal model parameters. We review the underlying statistical formalism, including profile likelihood ratio test statistics for discovery and exclusion hypotheses, and the asymptotic distributions that allow quick significance estimates. We explain the relevant formulas for the likelihood functions, test statistic distributions, and significance metrics (both with and without incorporating systematic uncertainties). The implementation and capabilities of STC are described, and we validate its performance against the widely used CMS Combine tool. We find excellent agreement in both the expected discovery significances and upper limit calculations. STC is a flexible framework that can accommodate systematic uncertainties and user-defined statistical models, making it suitable for a broad range of analyses.

Basic Information

Paper ID: 2510.11637
Title: StatTestCalculator: A New General Tool for Statistical Analysis in High Energy Physics
Authors: E. Abasov, L.V. Dudko, D.E. Gorin, O.S. Vasilevskii (Faculty of Physics, Moscow State University; Skobeltsyn Institute of Nuclear Physics)
Classification: hep-ph (High Energy Physics - Phenomenology), stat.CO (Statistics - Computation)
Publication Date/Conference: Moscow University Physics Bulletin 80(8), 2025; The XXV International Workshop-School High Energy Physics and Quantum Field Theory
Paper Link: https://arxiv.org/abs/2510.11637v1

Abstract

This paper introduces StatTestCalculator (STC), a novel open-source statistical analysis tool designed specifically for high energy physics experimental analysis. STC provides both asymptotic calculations and Monte Carlo simulation methods for computing the precise statistical significance of discoveries or setting upper limits on signal model parameters. The paper reviews the underlying statistical formalism, including profile likelihood ratio test statistics for discovery and exclusion hypotheses, as well as asymptotic distributions enabling rapid significance estimation. The authors provide detailed explanations of relevant formulas for likelihood functions, test statistic distributions, and significance measures, both with and without systematic uncertainties. The paper describes STC's implementation and functionality, and validates its performance through comprehensive comparison with the widely-used CMS Combine tool, demonstrating excellent consistency in both expected discovery significance and upper limit calculations.

Research Background and Motivation

Problem Definition

High energy physics (HEP) experiments rely on statistical analysis of observed data to draw conclusions about new phenomena. Since collider experiment results are inherently probabilistic in nature, rigorous statistical methods are required to estimate parameters and assess the significance of potential discoveries.

Limitations of Existing Tools

Numerous sophisticated statistical tools already exist for HEP analysis, such as:

RooFit and RooStats frameworks
CMS Combine tool
Theta
HistFactory

However, these tools are typically designed for complex large-scale analyses. What is missing is a lightweight tool capable of providing fast and accurate general statistical computations for various common scenarios.

Research Motivation

Usability Requirements: Need for an easy-to-use and multifunctional Python tool
Integration Convenience: Ability to integrate seamlessly into neural network pipelines
Rapid Verification: Facilitate preliminary sensitivity studies, cross-checking of official results, or educational purposes
Extensibility: Support for user-defined statistical models and test statistics

Core Contributions

Development of a New Statistical Analysis Tool STC: A lightweight, Python-based open-source tool specifically designed for HEP statistical analysis
Dual Computational Methods: Support for both asymptotic formulas (closed-form approximations) and exact Monte Carlo simulations
Comprehensive Systematic Uncertainty Handling: Support for normal, lognormal, or user-defined systematic effect distributions
Validation of Tool Accuracy: Extensive comparison with the CMS Combine tool demonstrating excellent consistency
Extended Mathematical Framework: Generalized formulas extending single-bin analysis to multi-bin shape analysis

Methodology Details

Statistical Assumptions and Likelihood Formalization

Task Definition

In collider experiments, consider two hypotheses:

Null Hypothesis H₀ (Background Only): Assumes data contains no contribution from new signals
Alternative Hypothesis H₁ (Signal + Background): Assumes signal events exist in addition to background

Define the signal strength parameter μ, where μ=0 corresponds to H₀ and μ=1 corresponds to the nominal signal prediction under H₁.

Likelihood Function Construction

For counting experiments with N signal regions, observed counts nᵢ are assumed to follow Poisson distributions: nᵢ ~ Poisson(μsᵢ + κᵢbᵢ)

The complete likelihood function is:

L(μ,θ) = ∏ᵢ₌₁ᴺ [(μsᵢ + κᵢbᵢ)^nᵢ e^-(μsᵢ+κᵢbᵢ)]/nᵢ! × ∏ⱼ₌₁ᴹ Systematic(θⱼ)

Where:

sᵢ: Expected number of signal events in bin i
bᵢ: Expected background yield in bin i
κᵢ: Multiplicative factor on the background yield in bin i, encoding its systematic uncertainty
θ: Vector of nuisance parameters
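
As an illustration of the likelihood above, the following is a minimal sketch of its negative logarithm, using μsᵢ + κᵢbᵢ as the expected count in each bin. It assumes that each Systematic term is a normal density N(κᵢ; 1, δᵢ²), which is only one of the options the paper lists. The function name nll and its arguments are hypothetical and are not taken from the STC code.

```python
import math

def nll(mu, kappa, n, s, b, delta):
    """Negative log-likelihood for N Poisson bins with one kappa_i per bin.

    Assumption (not from the paper): each Systematic term is a normal
    density N(kappa_i; 1, delta_i^2) with delta_i > 0.
    """
    total = 0.0
    for n_i, s_i, b_i, k_i, d_i in zip(n, s, b, kappa, delta):
        lam = mu * s_i + k_i * b_i  # expected count in bin i
        if lam <= 0:
            return math.inf  # unphysical expectation
        # -ln Poisson(n_i | lam); lgamma(n + 1) equals ln(n!)
        total += lam - n_i * math.log(lam) + math.lgamma(n_i + 1)
        # -ln N(k_i | 1, d_i^2)
        total += 0.5 * ((k_i - 1.0) / d_i) ** 2 + math.log(d_i * math.sqrt(2 * math.pi))
    return total
```

Minimizing this function over κ for fixed μ, and over both μ and κ, gives the two likelihood values that enter a profile likelihood ratio.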

Profile Likelihood Ratio and Test Statistics

Profile Likelihood Ratio Definition

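λ(μ) = L(μ, θ̂(μ)) / L(μ̂, θ̂)

Here θ̂(μ) denotes the nuisance parameter values that maximize L for a fixed μ, while μ̂ and θ̂ are the unconditional maximum-likelihood estimates.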

Test Statistics

Define the test statistic:

qμ = -2 ln λ(μ) = -2 ln [L(μ, θ̂(μ)) / L(μ̂, θ̂)]

Discovery Test Statistic q₀:

q₀ = {
  -2 ln λ(0),  if μ̂ ≥ 0
  0,           if μ̂ < 0
}

Exclusion Test Statistic qμ:

qμ = {
  -2 ln λ(μ),  if μ̂ ≤ μ
  0,           if μ̂ > μ
}
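
For a single bin without nuisance parameters, these statistics have a closed form, because the unconstrained fit gives μ̂ = (n − b)/s. The sketch below covers only that simplified case; the function names are hypothetical and do not reflect the STC interface.

```python
import math

def minus_two_ln_lambda(mu, n, s, b):
    """-2 ln lambda(mu) for one Poisson bin without nuisance parameters.

    The unconstrained maximum is at mu_hat = (n - b) / s, where the
    expected count mu_hat * s + b equals the observed count n.
    """
    lam = mu * s + b
    log_term = n * math.log(n / lam) if n > 0 else 0.0
    return 2.0 * (lam - n + log_term)

def q0(n, s, b):
    """Discovery statistic: zero unless the data exceed the background."""
    mu_hat = (n - b) / s
    return minus_two_ln_lambda(0.0, n, s, b) if mu_hat >= 0 else 0.0

def q_mu(mu, n, s, b):
    """Exclusion statistic: zero when the fitted signal exceeds the tested mu."""
    mu_hat = (n - b) / s
    return minus_two_ln_lambda(mu, n, s, b) if mu_hat <= mu else 0.0
```

The one-sided conditions mean that a downward fluctuation never counts as evidence for a discovery, and an upward fluctuation never counts as evidence against the tested signal strength.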

Analytical Formulas for Discovery Significance

For cases including systematic uncertainties, the discovery significance formula is:

Zdisc = √{2[(s+b) ln((s+b)(1+δ²b)/(b+δ²b(s+b))) - (1/δ²) ln(1+δ²s/(1+δ²b))]}

Where δ = σb/b is the relative background uncertainty.

In the limit of no systematic uncertainties (δ→0):

Zdisc = √{2[(s+b)ln(1+s/b) - s]}
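
The two discovery formulas can be evaluated directly. The following sketch is one possible transcription, with the hypothetical function name z_disc, and it switches to the δ→0 expression when no systematic uncertainty is given.

```python
import math

def z_disc(s, b, delta=0.0):
    """Asymptotic discovery significance; delta = sigma_b / b."""
    if delta == 0.0:
        # Limit without systematic uncertainties
        return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))
    d2 = delta ** 2
    term1 = (s + b) * math.log((s + b) * (1.0 + d2 * b) / (b + d2 * b * (s + b)))
    term2 = math.log(1.0 + d2 * s / (1.0 + d2 * b)) / d2
    return math.sqrt(2.0 * (term1 - term2))
```

For s = 10 and b = 100, the value without systematics is close to the simple estimate s/√b = 1, and a 20% background uncertainty reduces it noticeably. Evaluating the general expression at a very small δ reproduces the δ→0 formula, which is a quick consistency check.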

Analytical Formulas for Exclusion Significance (Upper Limits)

The exclusion significance formula including background uncertainty:

Zexcl = √{2[s - b ln((b+s+x)/(2b)) - (1/δ²)ln((b-s+x)/(2b))] - (b+s-x)(1+1/(δ²b))}

Where:

x = √[(b+s)² - 4δ²b²s/(1+δ²b)]
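
The exclusion formula can be turned into an expected upper limit by solving Zexcl(s) = Z_target for s. The sketch below makes several assumptions that are not stated in the text above: it uses the δ→0 limit Zexcl = √{2[s − b ln(1+s/b)]}, which follows from expanding x for small δ; it takes Z_target = 1.645 as the one-sided 95% confidence level threshold; and it assumes Zexcl grows monotonically with s so that bisection applies. The names z_excl and upper_limit are hypothetical.

```python
import math

def z_excl(s, b, delta=0.0):
    """Asymptotic exclusion significance; delta = sigma_b / b."""
    if delta == 0.0:
        # delta -> 0 limit of the general expression
        z2 = 2.0 * (s - b * math.log1p(s / b))
    else:
        d2 = delta ** 2
        x = math.sqrt((b + s) ** 2 - 4.0 * d2 * b * b * s / (1.0 + d2 * b))
        bracket = (s - b * math.log((b + s + x) / (2.0 * b))
                   - math.log((b - s + x) / (2.0 * b)) / d2)
        z2 = 2.0 * bracket - (b + s - x) * (1.0 + 1.0 / (d2 * b))
    return math.sqrt(max(z2, 0.0))  # clamp tiny negative round-off

def upper_limit(b, delta=0.0, z_target=1.645, s_max=1e4, tol=1e-6):
    """Signal yield s at which z_excl reaches z_target, found by bisection."""
    lo, hi = 0.0, s_max
    if z_excl(hi, b, delta) < z_target:
        raise ValueError("s_max is too small to reach z_target")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if z_excl(mid, b, delta) < z_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling upper_limit(100) and upper_limit(100, 0.2) shows the expected behaviour: the limit on s becomes weaker (larger) once a background uncertainty is included.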

Experimental Setup

Monte Carlo Simulation Framework

Toy Experiment Generation

Signal Events: Drawn from Poisson distribution Poisson(μs)
Background Events: Drawn from Poisson distribution Poisson(b)
Systematic Uncertainties: Applied to signal and background distributions

Systematic Uncertainty Handling

Normal Distribution: κ ~ N(1, δ²)
Lognormal Distribution: κ ~ LogNormal(1, δ²)
Shape Uncertainty: Each bin multiplied by scalar κ value
Single-bin Uncertainty: Each bin has independent κ factor
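
The toy generation steps above can be sketched for a single bin as follows. This is a simplified illustration, not the STC implementation: κ is drawn from the normal option and applied to the background only, negative κ values are truncated at zero, and a basic Poisson sampler built on the standard library is used (numpy.random.Generator.poisson would be the more usual choice in practice). All function names are hypothetical.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Knuth's multiplication sampler; adequate for lam up to a few hundred."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def generate_toys(n_toys, mu, s, b, delta, seed=0):
    """Return a list of toy event counts n = n_sig + n_bkg."""
    rng = random.Random(seed)
    toys = []
    for _ in range(n_toys):
        kappa = rng.gauss(1.0, delta) if delta > 0 else 1.0
        kappa = max(kappa, 0.0)  # assumption: truncate unphysical negative yields
        n_sig = poisson(rng, mu * s)       # signal drawn from Poisson(mu * s)
        n_bkg = poisson(rng, kappa * b)    # background drawn from Poisson(kappa * b)
        toys.append(n_sig + n_bkg)
    return toys

def summarize(toys):
    """Mean and population variance of the toy counts."""
    return statistics.fmean(toys), statistics.pvariance(toys)
```

Comparing summarize(generate_toys(2000, 1.0, 10.0, 100.0, 0.0)) with the same call at delta=0.2 shows that the mean stays near μs + b while the variance grows, since the spread of κ adds to the Poisson fluctuations.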

Validation Experimental Setup

Comparison Tool

Primary comparison with CMS Combine tool for validation

Test Scenarios

Discovery Significance Calculation:
- Background b = 100 events
- Signal s = 10, 20, 30, ..., 50 events
- Systematic uncertainties: 0% and 20%
Upper Limit Calculation:
- 95% confidence level limits
- Same signal and background configurations
- Monte Carlo simulations using 10⁵ toy experiments
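
To illustrate how a toy-based significance can be obtained for one point of such a scan, the sketch below estimates the expected discovery significance for a single bin without systematic uncertainties. It is a hedged example under stated assumptions, not the procedure used by STC or Combine: the "observed" count is set to s + b, toys are ordered by the event count (which gives the same ordering as q₀ in this single-bin case), and background-only toys are drawn from a truncated table of Poisson probabilities. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def poisson_pmf(k, lam):
    """Poisson probability, computed in log space for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def toy_significance(s, b, n_toys=100_000, seed=0):
    """Expected discovery significance from background-only toys.

    Returns (Z, p_value). Assumes a single bin and no systematics.
    """
    rng = random.Random(seed)
    n_obs = round(s + b)  # representative count under signal + background
    k_max = int(b + 20 * math.sqrt(b)) + 20  # truncation far in the tail
    counts = range(k_max + 1)
    weights = [poisson_pmf(k, b) for k in counts]
    toys = rng.choices(counts, weights=weights, k=n_toys)
    n_tail = sum(1 for n in toys if n >= n_obs)
    p_value = n_tail / n_toys
    if p_value == 0.0:
        raise ValueError("no toys in the tail; increase n_toys")
    # Convert the one-sided p-value to a Gaussian significance
    return NormalDist().inv_cdf(1.0 - p_value), p_value
```

For b = 100 and moderate s, the result lies close to the asymptotic Zdisc, with small differences expected from the discreteness of the counts and the finite number of toys. Larger significances need more toys, because the tail fraction becomes too small to estimate reliably.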

Experimental Results

Main Results

Discovery Significance Comparison

Experimental results demonstrate excellent consistency between STC and the Combine tool in the following aspects:

Asymptotic Calculations:
- Without systematic uncertainties: Perfect agreement
- With 20% systematic uncertainties: High consistency
Monte Carlo Calculations:
- MC results from both tools show good agreement with asymptotic formulas
- Statistical uncertainties within expected ranges

Upper Limit Calculation Comparison

95% confidence level upper limit calculations show:

Asymptotic Formula Validation: STC's asymptotic formulas perfectly match Combine
Monte Carlo Validation: Toy experiment results confirm the accuracy of asymptotic approximations
Systematic Uncertainty Impact: Correctly reflects the degradation of exclusion power due to systematic uncertainties

Performance Evaluation

Computational Efficiency

Asymptotic Calculations: Nearly instantaneous (fractions of a second)
Monte Carlo Simulations: 10⁵ toy experiments completed in seconds to minutes

Accuracy Verification

All test scenarios demonstrate that STC accurately reproduces standard calculations, confirming:

Correct implementation of mathematical formulas
Reliability of Monte Carlo algorithms
Accuracy of systematic uncertainty handling

Extended Functionality Verification

Multi-bin Shape Analysis

STC was successfully applied to more complex multi-bin shape analysis scenarios, using formulas extended from those of reference 7.

User-Defined Capabilities

The following extension capabilities were verified:

Custom test statistic definitions
Alternative likelihood function forms
User-defined systematic uncertainty distributions

Comparison of Existing Statistical Tools

Tool	Features	Limitations
RooFit/RooStats	Powerful, widely used	Complex, steep learning curve
CMS Combine	Standard tool, complete functionality	Primarily for large-scale analyses
Theta	Bayesian methods	Specialized purpose
HistFactory	Model construction	Requires auxiliary tools

STC's Position

STC fills the gap for a lightweight, user-friendly, and fast statistical analysis tool. It is particularly suitable for:

Preliminary sensitivity studies
Cross-verification of results
Educational and learning purposes
Neural network pipeline integration

Conclusions and Discussion

Main Conclusions

Tool Effectiveness: STC implements accurate statistical analysis functionality and shows excellent agreement with the standard Combine tool
Methodological Completeness: Provides a complete statistical framework from simple counting experiments to complex shape analysis
Practical Value: Lightweight design makes it suitable for rapid analysis and educational purposes
Extensibility: Modular design supports user customization and method extensions

Limitations

Complexity Constraints: Although STC supports multi-bin analysis, it may not match specialized tools for extremely complex statistical models
Performance Optimization: Performance when handling large-scale data leaves room for improvement
Documentation Completeness: As a new tool, requires more usage examples and documentation

Future Directions

Feature Extensions:
- Support for additional statistical distributions
- Integration of Bayesian methods
- Extension to more complex experimental designs
Performance Optimization:
- Parallelization of Monte Carlo calculations
- Memory usage optimization
- Large-scale data processing capabilities
Community Building:
- Add more usage examples
- Improve documentation
- Encourage community contributions

In-Depth Evaluation

Strengths

Technical Innovation:
- Successfully transforms complex statistical theory into a user-friendly tool
- Provides complete mathematical derivations and implementations
- Dual verification methods (asymptotic + MC) enhance result reliability
Experimental Sufficiency:
- Comprehensive comparison with standard tools
- Test coverage across multiple scenarios
- Correct handling of systematic uncertainties
Practical Value:
- Fills the gap for lightweight statistical tools
- Python implementation facilitates integration and modification
- Open-source nature promotes community development
Writing Clarity:
- Detailed and correct mathematical derivations
- Clear description of implementation details
- Transparent verification process

Shortcomings

Methodological Limitations:
- Primarily based on frequentist methods
- Limited support for certain specialized statistical models
- Large-scale parallel computing capabilities need enhancement
Experimental Setup:
- Validation primarily based on simple models
- Lacks test cases with real complex experiments
- Performance benchmarking relatively simple
Comparative Analysis:
- Primarily compared with Combine, lacking comparison with other tools
- Insufficient quantitative analysis of computational efficiency

Impact Assessment

Academic Contribution:
- Provides new tool options for HEP statistical analysis
- Complete mathematical framework has educational value
- Open-source implementation promotes method transparency
Practical Impact:
- Lowers technical barriers for statistical analysis
- Facilitates rapid prototyping and verification
- Supports teaching and learning activities
Reproducibility:
- Open-source code ensures complete reproducibility
- Detailed mathematical derivations support independent verification
- Comparison with standard tools enhances credibility

Applicable Scenarios

Ideal Applications:
- Preliminary sensitivity studies
- Learning and teaching statistical methods
- Rapid prototype development
- Cross-verification of results
Limited Scenarios:
- Extremely large-scale complex analyses
- Cases requiring specialized statistical methods
- Production environments with extreme performance requirements

References

1 W. Verkerke and D. Kirkby, The RooFit toolkit for data modeling, Statistical Problems in Particle Physics, Astrophysics and Cosmology (2006)

2 L. Moneta et al., The RooStats Project, arXiv:1009.1003 (2010)

3 CMS Collaboration, The CMS Statistical Analysis and Combination Tool: Combine, arXiv:2404.06614 (2024)

6 G. Cowan, K. Cranmer, E. Gross, and O. Vitells, Asymptotic formulae for likelihood-based tests of new physics, Eur. Phys. J. C 71, 1554 (2011)

7 D. E. Gorin et al., Asymptotic formulas for estimating statistical significance in collider experiments, Uchenye Zapiski Fiz. Fak. MGU No. 1 (2024)

Tool Access: StatTestCalculator software and documentation are available on GitHub: https://github.com/skottver/stattestcalculator