
Towards Richer Challenge Problems for Scientific Computing Correctness

Sottile, Tekriwal, Sarracino

Correctness in scientific computing (SC) is gaining increasing attention in the formal methods (FM) and programming languages (PL) community. Existing PL/FM verification techniques struggle with the complexities of realistic SC applications. Part of the problem is a lack of a common understanding between the SC and PL/FM communities of machine-verifiable correctness challenges and dimensions of correctness in SC applications. To address this gap, we call for specialized challenge problems to inform the development and evaluation of FM/PL verification techniques for correctness in SC. These specialized challenges are intended to augment existing problems studied by FM/PL researchers for general programs to ensure the needs of SC applications can be met. We propose several dimensions of correctness relevant to scientific computing, and discuss some guidelines and criteria for designing challenge problems to evaluate correctness in scientific computing.


Basic Information

Paper ID: 2510.13423
Title: Towards Richer Challenge Problems for Scientific Computing Correctness
Authors: Matthew Sottile, Mohit Tekriwal, John Sarracino (Lawrence Livermore National Laboratory)
Categories: cs.SE cs.MS
Venue: International Workshop on Verification of Scientific Software (VSS 2025), EPTCS 432
Paper link: https://arxiv.org/abs/2510.13423

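Gap in understanding between communities: the scientific computing community and the formal methods/programming languages community lack a shared understanding of correctness challenges
Limitations of existing verification techniques: existing PL/FM verification techniques struggle with the complexity of realistic scientific computing applications
Insufficient challenge problems: there is no standardized set of challenge problems aimed specifically at scientific computing correctness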

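Importance of the Problem

Scientific computing applications span several layers, including complex numerical computation, parallel processing, and physical modeling, and their correctness directly affects the reliability of scientific results. Traditional software verification methods often fail to cover the correctness requirements that are specific to scientific computing.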

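Limitations of Existing Approaches

Existing formal verification challenge problems mainly target general-purpose programs and lack the complexity specific to scientific computing
The numerical verification community has related work but no unified set of challenge problems
Existing benchmark suites focus mainly on performance rather than correctness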

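Research Motivation

Drawing on the success of performance benchmark suites in high-performance computing (such as the NAS Parallel Benchmarks and Mantevo), the paper aims to establish a comparable framework of challenge problems for scientific computing correctness.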

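Core Contributions

Proposes six dimensions of scientific computing correctness: numerics, data structures, domain-modeling structures, differential equations, concurrency and parallelism, and approximation schemes
Identifies key pitfalls in challenge problem design: over-specialization, "toy" problems, neglecting what is unique to scientific computing, among others
Distinguishes challenge problems from benchmarks: challenge problems define goals and evaluation criteria, whereas benchmarks provide objective measurements
Provides design guidelines: account for uncertainty, separate the mathematics from the implementation, allow unchecked assumptions, among others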

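Low level: numerics and traditional data structures
Mid level: model-specific data structures and computations
High level: mathematical abstractions and physical-system invariants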

Six Core Dimensions

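Numerics
- Correct correspondence between mathematical operations and their hardware/software implementations
- Precision issues in floating-point arithmetic
- Challenges posed by mixed-precision algorithms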
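
To make the numerics dimension concrete, the following is a minimal Python sketch (not taken from the paper) of three ways floating-point arithmetic departs from real arithmetic: non-associativity, absorption in a naive sum, and the perturbation introduced by storing a value in lower precision. The helper names `to_f32` and `naive_sum` are hypothetical and exist only for this illustration.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum(values):
    """Left-to-right accumulation with one rounding per addition (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Real addition is associative; binary64 addition is not.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Absorption: the 1.0 is lost next to 1e16 in a naive sum,
# while math.fsum returns the correctly rounded result.
values = [1e16, 1.0, -1e16]
print(naive_sum(values), math.fsum(values))  # 0.0 1.0

# Mixed precision: storing 0.1 in binary32 perturbs it by roughly 1.5e-9.
print(abs(to_f32(0.1) - 0.1))
```

A verification tool for this dimension would need to reason about exactly these gaps between the mathematical specification and the arithmetic that actually runs.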
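Data Structures
- Correctness of standard data structures
- Structural transformations driven by performance optimization (e.g., conversion from structure-of-arrays (SoA) to array-of-structures (AoS) layout)
- Guarantees of semantic equivalence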
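
As an illustration of the layout transformations mentioned above, here is a small Python sketch, under the assumption of a three-field particle record, that converts between AoS and SoA layouts and checks two equivalence properties on one input: the round trip is the identity, and a computation gives the same answer on both layouts. The `Particle` type and all function names are hypothetical; a formal proof would establish these properties for all inputs rather than testing one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    x: float  # position
    v: float  # velocity
    m: float  # mass


def aos_to_soa(particles):
    """Array of structures -> structure of arrays (one list per field)."""
    return {
        "x": [p.x for p in particles],
        "v": [p.v for p in particles],
        "m": [p.m for p in particles],
    }


def soa_to_aos(soa):
    """Structure of arrays -> array of structures."""
    if not (len(soa["x"]) == len(soa["v"]) == len(soa["m"])):
        raise ValueError("field arrays must have equal length")
    return [Particle(x, v, m) for x, v, m in zip(soa["x"], soa["v"], soa["m"])]


def kinetic_energy_aos(particles):
    total = 0.0
    for p in particles:
        total += 0.5 * p.m * p.v * p.v
    return total


def kinetic_energy_soa(soa):
    # Same operations in the same order as the AoS version,
    # so the floating-point result is bit-identical.
    total = 0.0
    for m, v in zip(soa["m"], soa["v"]):
        total += 0.5 * m * v * v
    return total


particles = [Particle(0.0, 1.5, 2.0), Particle(1.0, -0.3, 0.7), Particle(2.5, 4.0, 1.1)]
soa = aos_to_soa(particles)
print(soa_to_aos(soa) == particles)
print(kinetic_energy_aos(particles) == kinetic_energy_soa(soa))
```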
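Domain-modeling Structures
- Complex data structures such as meshes and graphs
- Satisfaction of physical-system constraints
- High-level invariants such as conservation laws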
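
One way to picture a conservation law as a checkable invariant is the sketch below. It assumes a 1D periodic grid and an explicit diffusion update written in flux form, neither of which comes from the paper; `diffuse_step` and `total_mass` are hypothetical names. Because every face flux is added to one cell and subtracted from its neighbor, total mass is conserved up to rounding.

```python
import math


def diffuse_step(u, alpha):
    """One explicit diffusion step on a periodic 1D grid, in flux form."""
    n = len(u)
    # flux[i] is the exchange across the face between cell i and cell i+1.
    flux = [alpha * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # flux[i - 1] wraps to flux[n - 1] for i == 0, which is the periodic face.
    return [u[i] + flux[i] - flux[i - 1] for i in range(n)]


def total_mass(u):
    """Conserved quantity: the sum of all cell values."""
    return math.fsum(u)


u = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
m0 = total_mass(u)
for _ in range(100):
    u = diffuse_step(u, 0.25)

# The invariant holds only up to floating-point rounding, so the check uses a tolerance.
print(math.isclose(total_mass(u), m0, rel_tol=1e-12))
```

The tolerance in the final check is itself a design decision: a specification that demanded exact equality would reject a correct implementation.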
Differential Equations
- Consistency between PDEs and the physical model
- Numerical stability and compatibility of boundary conditions
- Well-posedness
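
Numerical stability can be illustrated with a standard textbook case that is not drawn from the paper: explicit Euler applied to the decay equation y' = -λy. Each step multiplies y by (1 - hλ), so the discrete solution decays only when |1 - hλ| ≤ 1, even though the exact solution always decays. The function name `euler_decay` and the parameter values are assumptions chosen for this sketch.

```python
def euler_decay(lam, h, steps, y0=1.0):
    """Explicit Euler for y' = -lam * y; each step scales y by (1 - h * lam)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


# h * lam = 0.5: amplification factor 0.5, the numerical solution decays.
stable = euler_decay(10.0, 0.05, 100)

# h * lam = 2.5: amplification factor -1.5, the numerical solution blows up
# even though the exact solution exp(-lam * t) tends to zero.
unstable = euler_decay(10.0, 0.25, 100)

print(abs(stable), abs(unstable))
```

A correctness claim about such a solver therefore has to include the step-size condition as a stated assumption, not only the update formula.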
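Concurrency and Parallelism
- Composition of multiple parallel programming models
- Shared-memory, vectorized, and distributed-memory parallelism
- Balancing performance and correctness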
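
The tension between performance and correctness shows up even in a parallel sum. The sketch below simulates, sequentially and deterministically, how partitioning a reduction among workers changes the order of floating-point additions and therefore the result. It is an illustrative assumption rather than an example from the paper, and `naive_sum` and `chunked_sum` are hypothetical helpers.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation with one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, workers):
    """Sum `values` as contiguous per-worker partial sums, then combine them.

    This mimics the addition order of a parallel reduction without using threads.
    """
    size = -(-len(values) // workers)  # ceiling division
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


data = [1.0, 1e16, -1e16, 1.0]
print(chunked_sum(data, 1))  # 1.0 (sequential order)
print(chunked_sum(data, 2))  # 0.0 (two-worker order)
print(math.fsum(data))       # 2.0 (correctly rounded reference)
```

Neither partitioning reproduces the correctly rounded sum here, which is why a specification for a parallel reduction usually needs an error bound rather than a single expected value.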
Approximation Schemes
- Algorithmic heuristics
- Interpolation methods
- How they differ from numerical methods
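
For interpolation, a natural machine-checkable property is an a priori error bound. The sketch below, which is an assumption-laden illustration rather than the paper's example, tabulates sin on [0, π], interpolates linearly, and compares the observed error with the classical bound h²/8 · max|f''| (here max|f''| = 1). The names `lerp_table` and `interpolate` are hypothetical.

```python
import math


def lerp_table(f, a, b, n):
    """Sample f at n + 1 equally spaced points on [a, b]."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    return xs, [f(x) for x in xs]


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation on a uniform grid, for xs[0] <= x <= xs[-1]."""
    h = xs[1] - xs[0]
    i = min(int((x - xs[0]) / h), len(xs) - 2)  # index of the containing interval
    t = (x - xs[i]) / h
    return (1.0 - t) * ys[i] + t * ys[i + 1]


n = 16
xs, ys = lerp_table(math.sin, 0.0, math.pi, n)
h = math.pi / n
bound = h * h / 8.0  # classical bound for linear interpolation with max|f''| = 1

# Interval midpoints are where linear interpolation error is largest.
midpoints = [0.5 * (xs[i] + xs[i + 1]) for i in range(n)]
worst = max(abs(interpolate(xs, ys, x) - math.sin(x)) for x in midpoints)
print(worst <= bound)
```

The bound covers only the approximation error of the scheme; rounding error in the arithmetic is a separate concern that belongs to the numerics dimension.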