2025-11-10T02:34:56.080990

Verification Challenges in Sparse Matrix Vector Multiplication in High Performance Computing: Part I

Zhang

Sparse matrix vector multiplication (SpMV) is a fundamental kernel in scientific codes that rely on iterative solvers. In this first part of our work, we present both a sequential and a basic MPI parallel implementations of SpMV, aiming to provide a challenge problem for the scientific software verification community. The implementations are described in the context of the PETSc library.

academic

Verification Challenges in Sparse Matrix Vector Multiplication in High Performance Computing: Part I

Basic Information

Paper ID: 2510.13427
Title: Verification Challenges in Sparse Matrix Vector Multiplication in High Performance Computing: Part I
Author: Junchao Zhang (Argonne National Laboratory)
Classification: cs.LO cs.DC cs.MS
Conference: International Workshop on Verification of Scientific Software (VSS 2025)
Publication Information: EPTCS 432, 2025, pp. 98–105
Paper Link: https://arxiv.org/abs/2510.13427

Abstract

Research Background and Motivation

Problem Definition

This research addresses software verification challenges for sparse matrix vector multiplication (SpMV) in high performance computing. SpMV is the core operation for solving sparse linear systems Ax=b and is widely employed in scientific computing codes based on iterative solvers, particularly in large-scale Krylov subspace methods.

Significance

Fundamental Nature: SpMV is a foundational algorithm in scientific computing, and its correctness directly impacts the reliability of higher-level applications
Complexity: Although mathematically simple (yi = Σ(aij·xj)), storage formats, data distribution, parallelization, and optimization make implementation complex
Verification Challenges: Existing highly optimized implementations present significant challenges for software verification

Limitations of Existing Approaches

The PETSc library contains highly optimized MPI parallel SpMV implementations, but their complexity makes verification difficult
Lack of standardized challenge problems specifically designed for the software verification community

Research Motivation

To provide a structured challenge problem for the scientific software verification community by offering SpMV implementations ranging from simple to complex, thereby facilitating the development and evaluation of verification tools and methods.

Core Contributions

Provision of Standardized Verification Challenge Problem: Designed standard test cases for SpMV specifically for the scientific software verification community
Implementation of Two SpMV Algorithms with Different Complexity Levels:
- Sequential implementation (seq.c)
- Basic MPI parallel implementation (mpibasic.c)
Establishment of Complete Verification Framework: Including input data generation, correctness checking, and error detection mechanisms
Clear Definition of Verification Objectives: Provided specific verification requirements and challenges for each implementation

Methodology Details

Task Definition

Input:

Sparse matrix A (M×N with NNZ nonzero elements) stored in CSR format
Vector x (N-dimensional)
Expected result z = Ax (M-dimensional)

Output:

Computed result y = Ax
Correctness verification: ||y-z||² should be 0 (ignoring floating-point rounding errors)

Constraints:

Use Compressed Sparse Row (CSR) format
Support MPI parallel distributed computation

Data Structure Design

CSR Format Representation

Sparse matrix A is represented using three arrays:

a[nnz]: Stores nonzero element values
j[nnz]: Stores column indices of nonzero elements
i[m+1]: Row pointer; ik points to the starting position of row k in a and j

Data Type Definition

// Sequential version
typedef struct {
    int m, n;        // Matrix dimensions
    int *i, *j;      // CSR format indices
    double *a;       // Nonzero element values
} Mat;

// MPI parallel version
typedef struct {
    int m, n, M, N;  // Local and global dimensions
    int rstart, cstart; // Starting row and column indices
    int *i, *j;
    double *a;
} Mat;

Algorithm Implementation

Sequential Implementation

for (i = 0; i < A.m; i++) {
    y.a[i] = 0.0;
    for (j = A.i[i]; j < A.i[i + 1]; j++)
        y.a[i] += A.a[j] * x.a[A.j[j]];
}

MPI Parallel Implementation

Data Distribution Strategy:
- Matrix distributed by row blocks: m = M/size + (M%size > rank ? 1 : 0)
- Vector x distributed by column layout: n = N/size + (N%size > rank ? 1 : 0)
Communication Patterns:
- Use MPI_Allgather or MPI_Allgatherv to collect complete vector x
- Use MPI_Allreduce to compute global norm
Computational Flow:
- Compute local matrix layout (rstart, cstart)
- Extract local data from global arrays
- Gather distributed vector x into local complete copy X
- Execute local SpMV computation
- Compute local error norm and global reduction

Technical Innovations

Progressive Complexity Design: From simple sequential implementation to basic parallel implementation, facilitating progressive testing of verification tools
Standardized Verification Interface: Provides unified input/output format and correctness checking mechanisms
Real-World Application Context: Based on actual implementation patterns from the PETSc library with practical significance
Extensible Framework: Establishes foundation for future more complex optimized versions (Part II)

Experimental Setup

Datasets

Matrix Scale: 32×36 matrix with 50 nonzero elements
Data Generation: Test data generated using accompanying Python script csr.py
Hardcoded Input: Test data embedded directly in source code for simplified usage
Customizability: Users can modify Python script parameters to generate different test cases

Evaluation Metrics

Correctness Verification: Compute squared norm ||y-z||²
Success Criterion: Norm ≤ 1e-6 (accounting for floating-point rounding errors)
Return Code: Returns 0 on success, non-zero on error

Implementation Details

Programming Language: C
Parallel Framework: MPI
Compilation Requirements: Only C compiler or MPI compiler needed
Code Availability: Publicly released on GitHub repository

Experimental Results

Verification Objectives

Sequential Implementation Verification Requirements

Verify that computed result y satisfies: yi = Σ(aij·xj), where aij not stored in CSR representation is treated as 0

MPI Parallel Implementation Verification Requirements

Layout Correctness: Verify Σm = M, Σn = N
Local Computation Correctness: Verify that y on each process is the correct product of the corresponding local submatrix and complete vector x

Test Cases

The paper provides specific test data:

Input matrix: 32×36 sparse matrix (50 nonzero elements)
Input vector: 36-dimensional vector
Expected output: 32-dimensional result vector
All data provided as integer and floating-point arrays

Krylov Subspace Methods: SpMV is the core component of iterative solvers
PETSc Library: Provides rich suite of Krylov methods and SpMV implementations
Scientific Software Verification: Correctness verification for high-performance scientific computing software

Paper Positioning

Fills the gap of lacking standardized SpMV verification challenges in the scientific software verification community
Provides foundational reference for verification of complex optimized implementations
Bridges theoretical verification methods and practical HPC application needs

Conclusions and Discussion

Main Conclusions

Successfully established standardized challenge problem for SpMV software verification
Provided two implementations with different complexity levels suitable for progressive verification tool testing
Based on real PETSc library patterns with practical application value

Limitations

Scale Limitation: Current test cases are relatively small (32×36 matrix)
Missing Optimizations: Basic MPI implementation does not include complex optimizations present in production environments
Verification Scope: Covers only basic correctness, not addressing performance and numerical stability verification

Future Directions

Part II Development: Plan to provide more complex optimized MPI implementations in subsequent work
Extended Test Cases: Add test matrices with larger scales and different sparsity patterns
Verification Tool Integration: Integration testing with existing verification tools

In-Depth Evaluation

Strengths

High Practical Value: Addresses actual needs of the scientific software verification community
Reasonable Design: Progressive complexity design facilitates verification tool development and testing
High Standardization: Provides complete input/output specifications and correctness checking mechanisms
Strong Reproducibility: Code publicly available, test data embedded, easy to reproduce
Real-World Context: Based on widely-used PETSc library with genuine application significance

Weaknesses

Lack of Theoretical Analysis: No algorithm complexity analysis or theoretical property discussion
Limited Test Coverage: Only single test case provided, insufficient diversity
Shallow Verification Depth: Primarily focuses on functional correctness, lacking numerical precision and boundary condition analysis
Performance Considerations: Does not address performance verification-related challenges

Impact

Field Contribution: Provides important standardized testing platform for scientific software verification
Practical Value: Can be directly used for verification tool development and evaluation
Extensibility: Establishes foundational framework for subsequent more complex implementations
Community Value: Promotes communication and collaboration between HPC and software verification communities

Applicable Scenarios

Verification Tool Development: As standard test case for verifying tool effectiveness
Educational Applications: Suitable as teaching case for parallel computing and software verification
Benchmark Testing: Can serve as reference benchmark for SpMV implementation correctness
Research Platform: Provides standardized platform for related algorithm and optimization research

References

S Balay et al. (2025): PETSc/TAO Users Manual. Technical Report ANL-21/39 - Revision 3.23, Argonne National Laboratory
Yousef Saad (2003): Iterative Methods for Sparse Linear Systems, second edition. Society for Industrial and Applied Mathematics

Overall Assessment: This is a highly practical work paper that, while contributing limited theoretical innovation, provides the scientific software verification community with much-needed standardized challenge problems. The paper is well-structured, complete in implementation, and demonstrates good reproducibility and extensibility, making it valuable for advancing the development of HPC software verification.