2025-11-23T21:58:17.757337

Towards Richer Challenge Problems for Scientific Computing Correctness

Sottile, Tekriwal, Sarracino

Correctness in scientific computing (SC) is gaining increasing attention in the formal methods (FM) and programming languages (PL) community. Existing PL/FM verification techniques struggle with the complexities of realistic SC applications. Part of the problem is a lack of a common understanding between the SC and PL/FM communities of machine-verifiable correctness challenges and dimensions of correctness in SC applications. To address this gap, we call for specialized challenge problems to inform the development and evaluation of FM/PL verification techniques for correctness in SC. These specialized challenges are intended to augment existing problems studied by FM/PL researchers for general programs to ensure the needs of SC applications can be met. We propose several dimensions of correctness relevant to scientific computing, and discuss some guidelines and criteria for designing challenge problems to evaluate correctness in scientific computing.

academic

Towards Richer Challenge Problems for Scientific Computing Correctness

Basic Information

Paper ID: 2510.13423
Title: Towards Richer Challenge Problems for Scientific Computing Correctness
Authors: Matthew Sottile, Mohit Tekriwal, John Sarracino (Lawrence Livermore National Laboratory)
Classification: cs.SE cs.MS
Published Conference: International Workshop on Verification of Scientific Software (VSS 2025), EPTCS 432
Paper Link: https://arxiv.org/abs/2510.13423

Abstract

The correctness of scientific computing (SC) has received increasing attention from the formal methods (FM) and programming languages (PL) communities. Existing PL/FM verification techniques face difficulties in handling the complexity of real-world scientific computing applications. Part of the problem lies in the lack of shared understanding between the SC and PL/FM communities regarding machine-verifiable correctness challenges and the dimensions of correctness in SC applications. To address this gap, the authors call for the establishment of specialized challenge problems to guide the development and evaluation of FM/PL verification techniques in SC. These specialized challenges aim to augment existing generic program problems studied by FM/PL researchers to ensure they meet the needs of SC applications.

Research Background and Motivation

Problems to Address

Gap in Inter-community Understanding: Lack of shared understanding between the scientific computing community and the formal methods/programming languages community regarding correctness challenges
Limitations of Existing Verification Techniques: Current PL/FM verification techniques struggle to handle the complexity of real-world scientific computing applications
Insufficient Challenge Problems: Lack of standardized challenge problem sets specifically designed for scientific computing correctness

Problem Significance

Scientific computing applications involve complex numerical computations, parallel processing, physical modeling, and multiple other dimensions. Their correctness directly impacts the reliability of scientific research results. Traditional software verification methods often fail to adequately cover the correctness requirements unique to scientific computing.

Limitations of Existing Approaches

Existing formal verification challenge problems primarily target generic programs and lack the complexity specific to scientific computing
While the numerical verification community has relevant work, there is no unified challenge problem set
Existing benchmark suites primarily focus on performance rather than correctness

Research Motivation

Drawing on the successful experience of performance benchmark suites in high-performance computing (such as NAS Parallel Benchmarks and Mantevo), establish a similar challenge problem framework for scientific computing correctness.

Core Contributions

Proposes Six Dimensions of Scientific Computing Correctness: Numerics, data structures, domain-modeling structures, differential equations, concurrency and parallelism, and approximation schemes
Identifies Key Pitfalls in Challenge Problem Design: Over-specialization, "toy" problems, neglecting scientific computing uniqueness, etc.
Establishes Distinction Between Challenge Problems and Benchmarks: Challenge problems define objectives and evaluation criteria, while benchmarks provide objective measurements
Provides Design Guidance Principles: Consider uncertainty, separate mathematics from implementation, allow unchecked assumptions, etc.

Methodology Details

Task Definition

This paper is a position paper aimed at establishing a comprehensive challenge problem framework for scientific computing correctness, rather than proposing specific technical methods.

Framework Design

Correctness Dimension Classification

The authors categorize scientific computing correctness into three abstraction levels:

Low-level: Numerics and traditional data structures
Mid-level: Model-specific data structures and computations
High-level: Mathematical abstractions and physical system invariants

Six Core Dimensions

Numerics
- Correct correspondence between mathematical operations and hardware/software implementations
- Precision issues in floating-point arithmetic
- Challenges in mixed-precision algorithms
Data Structures
- Correctness of standard data structures
- Structure transformations due to performance optimization (e.g., SOA to AOS conversion)
- Semantic equivalence guarantees
Domain-modeling Structures
- Complex data structures such as grids and graphs
- Satisfaction of physical system constraints
- High-level invariants such as conservation laws
Differential Equations
- Consistency between PDEs and physical modeling
- Numerical stability and boundary condition compatibility
- Well-posedness
Concurrency and Parallelism
- Combinations of multiple parallel programming models
- Shared memory, vectorization, and distributed memory parallelism
- Balance between performance and correctness
Approximation Schemes
- Algorithmic heuristic methods
- Interpolation methods
- Distinction from numerical methods

Technical Innovations

Multi-level Abstraction Integration: First systematic framework unifying scientific computing correctness issues from low-level implementation details to high-level physical constraints
Community Bridge Role: Establishes common language between formal methods and scientific computing communities
Practical Orientation: Avoids over-theorization, focusing on real-world correctness requirements

Experimental Setup

As a position paper, this work does not include traditional experimental setup but supports its arguments through analysis of existing benchmark suites and challenge problems.

Analysis Objects

Performance Benchmarks: NAS Parallel Benchmarks, Mantevo, Salishan problems, Shonan challenge
Correctness Challenges: VerifyThis, NSV-3 benchmarks, Gallery of Verified Programs
Specialized Benchmarks: FPbench, DataRaceBench, SPEC

Evaluation Criteria

The authors propose that challenge problems should possess the following characteristics:

Cover multiple correctness dimensions
Avoid over-specialization
Possess real-world relevance
Address unique scientific computing needs

Experimental Results

Current State Analysis

The authors' analysis reveals the following deficiencies in existing challenge problems:

Insufficient Coverage: Lack of complex algorithm categories such as graph algorithms and sparse matrix representations
Simple Data Structures: Inadequate coverage of complex sparse matrix representations beyond basic CSR
Incomplete Mathematical Domains: Insufficient coverage of fundamental mathematical domains

Lessons from Success Cases

Evolution of performance benchmarks:

Linpack → Livermore Loops → NAS Parallel Benchmarks → Mantevo
Gradually increasing complexity, from simple linear algebra to complete application code
Evaluation metrics expanding from pure performance to correctness and scalability

Development of Performance Benchmarks

Early Benchmarks: Linpack focused on floating-point arithmetic performance
Loop Kernels: Livermore Loops tested specific computational patterns
Parallel Benchmarks: NAS Parallel Benchmarks introduced parallelism considerations
Application-oriented: Mantevo provides mini-apps close to real applications

Correctness Verification Challenges

Generic Verification: VerifyThis competitions provide foundational challenges
Numerical Verification: coq-num-analysis, Mathematical Components Library
Specialized Domains: FPbench (floating-point), DataRaceBench (data races)

Verification and Validation (V&V)

ASME V&V guidance principles provide verification and validation standards for engineering disciplines
Distinguishes between verification and validation problems

Conclusions and Discussion

Main Conclusions

Scientific computing correctness requires specialized challenge problems to advance formal methods development
Challenge problems must span computational abstraction levels, integrating low-level implementation and high-level domain constraints
Need to avoid over-specialization and "toy" problems, focusing on real-world application relevance

Limitations

Theoretical Nature: As a position paper, lacks concrete challenge problem examples
Implementation Complexity: Establishing a comprehensive challenge problem set requires interdisciplinary collaboration
Evaluation Standards: How to objectively assess the quality of challenge problems requires further research

Future Directions

Collaborate with scientific computing and formal methods experts to design concrete challenge problems
Establish standardized evaluation frameworks and metrics
Consider integration of uncertainty quantification and statistical modeling
Extend to verification and validation (V&V) problems

In-depth Evaluation

Strengths

Accurate Problem Identification: Accurately identifies key challenges in scientific computing correctness verification
Systematic Framework: The proposed correctness dimension framework demonstrates excellent systematicity and completeness
Practical Orientation: Avoids pure theoretical discussion, focusing on real-world application requirements
Interdisciplinary Perspective: Effectively bridges formal methods and scientific computing communities
Historical Insights: Draws valuable lessons from the evolution of performance benchmarks

Weaknesses

Lack of Concrete Examples: Provides no concrete challenge problem examples to validate framework feasibility
Unclear Implementation Path: Lacks detailed planning for transitioning from theoretical framework to practical challenge problem sets
Missing Evaluation Mechanisms: Lacks specific mechanisms for assessing challenge problem quality and effectiveness
Unknown Community Acceptance: Uncertain about the two communities' acceptance and willingness to participate in this framework

Impact

Academic Value: Provides important theoretical framework for scientific computing correctness research
Practical Potential: May promote development of more practical formal verification techniques
Community Building: May foster deeper collaboration between scientific computing and formal methods communities
Long-term Significance: Provides new research directions for scientific computing software quality assurance

Applicable Scenarios

Research Guidance: Provides formal methods researchers with scientific computing application research directions
Tool Development: Guides design and evaluation of scientific computing verification tools
Education and Training: Provides systematic framework for scientific computing correctness education
Standard Development: Offers reference for scientific computing software quality standard formulation

References

The paper cites 26 important references, covering:

Performance Benchmarks: NAS Parallel Benchmarks 7,8, Mantevo 11, Linpack 14
Formal Verification: Gallery of Verified Programs 1,17, VerifyThis 20
Numerical Verification: coq-num-analysis 6, Mathematical Components 2
Specialized Benchmarks: FPbench 12, DataRaceBench 21, SPEC 13
V&V Standards: ASME Guidance Principles 18

Although this paper is a position paper, it provides an important theoretical framework and development direction for scientific computing correctness verification. The proposed six-dimensional correctness framework and design guidance principles are significant for advancing the field, but require subsequent work to concretely implement and validate these concepts.