Correctness in scientific computing (SC) is gaining increasing attention in the formal methods (FM) and programming languages (PL) community. Existing PL/FM verification techniques struggle with the complexities of realistic SC applications. Part of the problem is a lack of a common understanding between the SC and PL/FM communities of machine-verifiable correctness challenges and dimensions of correctness in SC applications.
To address this gap, we call for specialized challenge problems to inform the development and evaluation of FM/PL verification techniques for correctness in SC. These specialized challenges are intended to augment existing problems studied by FM/PL researchers for general programs to ensure the needs of SC applications can be met. We propose several dimensions of correctness relevant to scientific computing, and discuss some guidelines and criteria for designing challenge problems to evaluate correctness in scientific computing.
- Paper ID: 2510.13423
- Title: Towards Richer Challenge Problems for Scientific Computing Correctness
- Authors: Matthew Sottile, Mohit Tekriwal, John Sarracino (Lawrence Livermore National Laboratory)
- Classification: cs.SE cs.MS
- Published Conference: International Workshop on Verification of Scientific Software (VSS 2025), EPTCS 432
- Paper Link: https://arxiv.org/abs/2510.13423
The correctness of scientific computing (SC) has received increasing attention from the formal methods (FM) and programming languages (PL) communities. Existing PL/FM verification techniques face difficulties in handling the complexity of real-world scientific computing applications. Part of the problem lies in the lack of shared understanding between the SC and PL/FM communities regarding machine-verifiable correctness challenges and the dimensions of correctness in SC applications. To address this gap, the authors call for the establishment of specialized challenge problems to guide the development and evaluation of FM/PL verification techniques in SC. These specialized challenges aim to augment existing generic program problems studied by FM/PL researchers to ensure they meet the needs of SC applications.
- Gap in Inter-community Understanding: Lack of shared understanding between the scientific computing community and the formal methods/programming languages community regarding correctness challenges
- Limitations of Existing Verification Techniques: Current PL/FM verification techniques struggle to handle the complexity of real-world scientific computing applications
- Insufficient Challenge Problems: Lack of standardized challenge problem sets specifically designed for scientific computing correctness
Scientific computing applications involve complex numerical computations, parallel processing, physical modeling, and multiple other dimensions. Their correctness directly impacts the reliability of scientific research results. Traditional software verification methods often fail to adequately cover the correctness requirements unique to scientific computing.
- Existing formal verification challenge problems primarily target generic programs and lack the complexity specific to scientific computing
- While the numerical verification community has relevant work, there is no unified challenge problem set
- Existing benchmark suites primarily focus on performance rather than correctness
Drawing on the successful experience of performance benchmark suites in high-performance computing (such as NAS Parallel Benchmarks and Mantevo), establish a similar challenge problem framework for scientific computing correctness.
- Proposes Six Dimensions of Scientific Computing Correctness: Numerics, data structures, domain-modeling structures, differential equations, concurrency and parallelism, and approximation schemes
- Identifies Key Pitfalls in Challenge Problem Design: Over-specialization, "toy" problems, neglecting scientific computing uniqueness, etc.
- Establishes Distinction Between Challenge Problems and Benchmarks: Challenge problems define objectives and evaluation criteria, while benchmarks provide objective measurements
- Provides Design Guidance Principles: Consider uncertainty, separate mathematics from implementation, allow unchecked assumptions, etc.
This paper is a position paper aimed at establishing a comprehensive challenge problem framework for scientific computing correctness, rather than proposing specific technical methods.
The authors categorize scientific computing correctness into three abstraction levels:
- Low-level: Numerics and traditional data structures
- Mid-level: Model-specific data structures and computations
- High-level: Mathematical abstractions and physical system invariants
- Numerics
- Correct correspondence between mathematical operations and hardware/software implementations
- Precision issues in floating-point arithmetic
- Challenges in mixed-precision algorithms
- Data Structures
- Correctness of standard data structures
- Structure transformations due to performance optimization (e.g., SOA to AOS conversion)
- Semantic equivalence guarantees
- Domain-modeling Structures
- Complex data structures such as grids and graphs
- Satisfaction of physical system constraints
- High-level invariants such as conservation laws
- Differential Equations
- Consistency between PDEs and physical modeling
- Numerical stability and boundary condition compatibility
- Well-posedness
- Concurrency and Parallelism
- Combinations of multiple parallel programming models
- Shared memory, vectorization, and distributed memory parallelism
- Balance between performance and correctness
- Approximation Schemes
- Algorithmic heuristic methods
- Interpolation methods
- Distinction from numerical methods
- Multi-level Abstraction Integration: First systematic framework unifying scientific computing correctness issues from low-level implementation details to high-level physical constraints
- Community Bridge Role: Establishes common language between formal methods and scientific computing communities
- Practical Orientation: Avoids over-theorization, focusing on real-world correctness requirements
As a position paper, this work does not include traditional experimental setup but supports its arguments through analysis of existing benchmark suites and challenge problems.
- Performance Benchmarks: NAS Parallel Benchmarks, Mantevo, Salishan problems, Shonan challenge
- Correctness Challenges: VerifyThis, NSV-3 benchmarks, Gallery of Verified Programs
- Specialized Benchmarks: FPbench, DataRaceBench, SPEC
The authors propose that challenge problems should possess the following characteristics:
- Cover multiple correctness dimensions
- Avoid over-specialization
- Possess real-world relevance
- Address unique scientific computing needs
The authors' analysis reveals the following deficiencies in existing challenge problems:
- Insufficient Coverage: Lack of complex algorithm categories such as graph algorithms and sparse matrix representations
- Simple Data Structures: Inadequate coverage of complex sparse matrix representations beyond basic CSR
- Incomplete Mathematical Domains: Insufficient coverage of fundamental mathematical domains
Evolution of performance benchmarks:
- Linpack → Livermore Loops → NAS Parallel Benchmarks → Mantevo
- Gradually increasing complexity, from simple linear algebra to complete application code
- Evaluation metrics expanding from pure performance to correctness and scalability
- Early Benchmarks: Linpack focused on floating-point arithmetic performance
- Loop Kernels: Livermore Loops tested specific computational patterns
- Parallel Benchmarks: NAS Parallel Benchmarks introduced parallelism considerations
- Application-oriented: Mantevo provides mini-apps close to real applications
- Generic Verification: VerifyThis competitions provide foundational challenges
- Numerical Verification: coq-num-analysis, Mathematical Components Library
- Specialized Domains: FPbench (floating-point), DataRaceBench (data races)
- ASME V&V guidance principles provide verification and validation standards for engineering disciplines
- Distinguishes between verification and validation problems
- Scientific computing correctness requires specialized challenge problems to advance formal methods development
- Challenge problems must span computational abstraction levels, integrating low-level implementation and high-level domain constraints
- Need to avoid over-specialization and "toy" problems, focusing on real-world application relevance
- Theoretical Nature: As a position paper, lacks concrete challenge problem examples
- Implementation Complexity: Establishing a comprehensive challenge problem set requires interdisciplinary collaboration
- Evaluation Standards: How to objectively assess the quality of challenge problems requires further research
- Collaborate with scientific computing and formal methods experts to design concrete challenge problems
- Establish standardized evaluation frameworks and metrics
- Consider integration of uncertainty quantification and statistical modeling
- Extend to verification and validation (V&V) problems
- Accurate Problem Identification: Accurately identifies key challenges in scientific computing correctness verification
- Systematic Framework: The proposed correctness dimension framework demonstrates excellent systematicity and completeness
- Practical Orientation: Avoids pure theoretical discussion, focusing on real-world application requirements
- Interdisciplinary Perspective: Effectively bridges formal methods and scientific computing communities
- Historical Insights: Draws valuable lessons from the evolution of performance benchmarks
- Lack of Concrete Examples: Provides no concrete challenge problem examples to validate framework feasibility
- Unclear Implementation Path: Lacks detailed planning for transitioning from theoretical framework to practical challenge problem sets
- Missing Evaluation Mechanisms: Lacks specific mechanisms for assessing challenge problem quality and effectiveness
- Unknown Community Acceptance: Uncertain about the two communities' acceptance and willingness to participate in this framework
- Academic Value: Provides important theoretical framework for scientific computing correctness research
- Practical Potential: May promote development of more practical formal verification techniques
- Community Building: May foster deeper collaboration between scientific computing and formal methods communities
- Long-term Significance: Provides new research directions for scientific computing software quality assurance
- Research Guidance: Provides formal methods researchers with scientific computing application research directions
- Tool Development: Guides design and evaluation of scientific computing verification tools
- Education and Training: Provides systematic framework for scientific computing correctness education
- Standard Development: Offers reference for scientific computing software quality standard formulation
The paper cites 26 important references, covering:
- Performance Benchmarks: NAS Parallel Benchmarks 7,8, Mantevo 11, Linpack 14
- Formal Verification: Gallery of Verified Programs 1,17, VerifyThis 20
- Numerical Verification: coq-num-analysis 6, Mathematical Components 2
- Specialized Benchmarks: FPbench 12, DataRaceBench 21, SPEC 13
- V&V Standards: ASME Guidance Principles 18
Although this paper is a position paper, it provides an important theoretical framework and development direction for scientific computing correctness verification. The proposed six-dimensional correctness framework and design guidance principles are significant for advancing the field, but require subsequent work to concretely implement and validate these concepts.