2025-11-13T19:19:11.174126

Formal Verification of COO to CSR Sparse Matrix Conversion (Invited Paper)

Appel
We describe a machine-checked correctness proof of a C program that converts a coordinate-form (COO) sparse matrix to a compressed-sparse-row (CSR) matrix. The classic algorithm (sort the COO entries in lexicographic order by row,column; fill in the CSR arrays left to right) is concise but has rather intricate invariants. We illustrate a bottom-up methodology for deriving the invariants from the program.
academic

Formal Verification of COO to CSR Sparse Matrix Conversion (Invited Paper)

Basic Information

  • Paper ID: 2510.13412
  • Title: Formal Verification of COO to CSR Sparse Matrix Conversion (Invited Paper)
  • Author: Andrew W. Appel (Princeton University)
  • Classification: math.NA cs.LO cs.NA
  • Publication Venue/Conference: VSS 2025 (International Workshop on Verification of Scientific Software), EPTCS 432, 2025
  • Paper Link: https://arxiv.org/abs/2510.13412

Abstract

This paper describes a machine-checked correctness proof for verifying a C program that converts sparse matrices from Coordinate (COO) format to Compressed Sparse Row (CSR) format. Although the classical algorithm—sorting COO entries in lexicographic order by row and column, then filling CSR arrays from left to right—is concise, it exhibits considerably complex invariants. The paper demonstrates a bottom-up methodology for deriving invariants from the program.

Research Background and Motivation

Problem Definition

  1. Core Problem: Sparse matrices are widely used in numerical computation, but existing sparse matrix conversion algorithms lack formal verification, potentially harboring difficult-to-detect errors
  2. Significance: Sparse matrix-vector multiplication is a fundamental operation in numerical methods; errors in conversion can cause failure throughout the entire computational pipeline
  3. Existing Limitations: Traditional verification methods rely on regression testing and similar techniques, which cannot provide mathematical correctness guarantees
  4. Research Motivation: Through machine-checked formal proofs, ensure absolute correctness of COO to CSR conversion, establishing a foundation for verifiable numerical software

Application Context

  • Sparse Matrix Representations: COO format facilitates construction; CSR format facilitates multiplication operations
  • Finite Element Analysis: Each internal vertex in a mesh generates multiple coordinate tuples, naturally producing duplicate entries
  • Numerical Precision: The non-associativity of floating-point arithmetic makes the summation order of duplicate entries affect results

Core Contributions

  1. First Machine-Checked Correctness Proof for COO to CSR Conversion: Using the Verifiable Software Toolchain (VST) and Coq proof assistant
  2. Innovative Loop Invariant Derivation Methodology: Proposes a bottom-up approach to derive complex loop invariants from properties the program must satisfy
  3. Separation of Data Structure and Approximation Reasoning: Decouples data structure reasoning in C programs from floating-point approximation reasoning, simplifying verification complexity
  4. Composable Verification Components: Provides verified sparse matrix conversion modules reusable in larger verification systems
  5. Practical Error Discovery: Identified and fixed 5 program errors during proof (4 off-by-one errors and 1 complex unsigned integer handling error)

Methodology Details

Task Definition

Input: COO sparse matrix - containing dimensions rows×cols and n coordinate triples (row_indk, col_indk, valk) Output: CSR sparse matrix - containing three one-dimensional arrays (val, col_ind, row_ptr) Constraints: Maintain mathematical matrix semantic equivalence; correctly handle floating-point summation of duplicate entries

Core Algorithm

struct csr_matrix *coo_to_csr_matrix(struct coo_matrix *p) {
    // 1. Sort COO entries in lexicographic order by (row, column)
    coo_quicksort(p, 0, n);
    
    // 2. Count the number of unique coordinate pairs
    k = coo_count(p);
    
    // 3. Allocate CSR array space
    // 4. Traverse sorted COO entries and construct CSR structure
    for (i=0; i<n; i++) {
        // Handle duplicate entries, new columns, new rows, etc.
    }
}

Formal Specification Architecture

1. Mathematical Object Definitions

Record coo_matrix (t: type) := {
    coo_rows: Z;
    coo_cols: Z;
    coo_entries: list (Z * Z * ftype t)
}.

Record csr_matrix {t: type} := {
    csr_cols: Z;
    csr_vals: list (ftype t);
    csr_col_ind: list Z;
    csr_row_ptr: list Z;
    csr_rows: Z := Zlength (csr_row_ptr) - 1
}.

2. Memory Representation Relations

  • coo_rep (sh: share) (coo: coo_matrix Tdouble) (p: val) : mpred
  • csr_rep (sh: share) (csr: csr_matrix Tdouble) (p: val) : mpred

3. Matrix Semantic Relations

Definition coo_to_matrix {t: type} (coo: coo_matrix t) (m: matrix t) : Prop :=
    coo_rows coo = matrix_rows m ∧
    matrix_cols m (coo_cols coo) ∧
    ∀ i j, sum_any (list of duplicate entry values) (matrix_index m i j).

Technical Innovations

1. Non-Deterministic Modeling of Floating-Point Summation

Inductive sum_any {t}: list (ftype t) → ftype t → Prop :=
| Sum_Any_0: sum_any nil (Zconst t 0)
| Sum_Any_1: ∀ x, sum_any [x] x  
| Sum_Any_split: ∀ al bl a b, sum_any al a → sum_any bl b →
    sum_any (al++bl) (BPLUS a b)
| Sum_Any_perm: ∀ al bl s, Permutation al bl → sum_any al s → sum_any bl s.

2. Partial CSR Relation Definition

A key innovation is the partial_CSR i r coo ROWPTR COLIND VAL relation, representing the partial CSR state when processing the i-th COO entry at row r.

3. Systematic Loop Invariant Derivation

By analyzing critical state transition points in the program, systematically derive required invariant properties:

  • partial_CSR_0: Initial state
  • partial_CSR_duplicate: Processing duplicate entries
  • partial_CSR_newcol: New column entry
  • partial_CSR_newrow: New row entry
  • partial_CSR_skiprow: Skipping empty rows

Experimental Setup

Verification Toolchain

  • Coq Proof Assistant: For formal specification and proof
  • Verifiable Software Toolchain (VST): For Hoare logic verification of C programs
  • Verifiable C: Program logic in VST, embedded in Coq

Verification Scale

  • Definitions and Lemmas: 1571 lines of Coq code for coo_csr and partial_CSR definitions and properties
  • VST Proof: 412 lines for main Hoare logic proof
  • Trusted Base: Core specification approximately 39 lines, requiring manual inspection of critical components

Verification Methodology

  1. Layered Verification: First prove C program implements low-level specification, then prove mathematical correctness of low-level specification
  2. Modular Design: Separate data structure reasoning from floating-point approximation reasoning
  3. Bottom-Up Derivation: Reverse-engineer loop invariants from Hoare logic proof requirements

Experimental Results

Main Achievements

  1. Complete Correctness Proof: Successfully proved complete correctness of COO to CSR conversion
  2. Error Discovery: Identified 5 actual errors during verification:
    • 4 off-by-one errors
    • 1 complex unsigned integer initialization error (initializing r to -1)
  3. Composability: Verified modules can be combined with other verified sparse matrix operations

Verification Coverage

  • Function Specifications: Complete preconditions and postconditions
  • Loop Invariants: Complex invariants for three nested loops
  • Memory Safety: Array bounds and memory allocation safety
  • Mathematical Correctness: Preservation of matrix semantics

Performance Characteristics

  • Compile-Time Verification: No runtime overhead
  • Trustworthiness: Machine-checked proof based on Coq kernel
  • Maintainability: Modular specification design facilitates subsequent modifications and extensions

Formal Verification Domain

  1. Numerical Software Verification: This paper is an important case study in end-to-end verification of numerical algorithms
  2. VST Toolchain: Extends existing C program verification framework to sparse matrix applications
  3. Floating-Point Arithmetic Verification: Incorporates tools like VCFloat2 for floating-point precision analysis

Sparse Matrix Algorithms

  1. Classical Algorithms: COO to CSR conversion is a standard algorithm from decades of practice
  2. Numerical Linear Algebra: Consistent with algorithms in classical literature like Templates for Linear Systems
  3. High-Performance Computing: Provides foundational components for verifiable scientific computing software

Program Verification Methodology

  1. Loop Invariant Derivation: Proposed bottom-up method has general applicability
  2. Separation Logic: Effectively handles complex memory data structures
  3. Specification Engineering: Demonstrates specification design principles for large-scale verification projects

Conclusions and Discussion

Main Conclusions

  1. Feasibility Demonstration: Complete formal verification of complex numerical algorithms is feasible
  2. Methodological Contribution: Bottom-up invariant derivation method has broad applicability
  3. Practical Value: Actual errors discovered during verification demonstrate the value of formal methods
  4. Infrastructure: Establishes foundation for larger-scale scientific computing software verification

Limitations

  1. Bit-for-Bit Reproducibility: Current specification allows different floating-point summation orders, not guaranteeing bit-level reproducibility
  2. Performance Considerations: Verified algorithm not optimized for performance
  3. Reassembly Functionality: Optimized version reusing CSR structures not implemented
  4. Verification Cost: Relatively short program requires substantial verification effort

Future Directions

  1. Stronger Specifications: Specification versions supporting bit-for-bit reproducibility
  2. Performance Optimization: Verify high-performance sparse matrix algorithm variants
  3. Larger Systems: Integrate this module into complete PDE solver verification
  4. Automation: Develop better tools supporting automatic loop invariant derivation

In-Depth Evaluation

Strengths

  1. Technical Depth: Addresses multiple technical challenges in sparse matrix algorithms, including floating-point arithmetic, memory management, and complex control flow
  2. Methodological Innovation: Bottom-up invariant derivation method provides a replicable paradigm for similar verifications
  3. Practical Value: Discovery of actual errors demonstrates real benefits of formal verification
  4. Engineering Quality: Modular design and clear specification structure reflect high-quality verification engineering
  5. Completeness: End-to-end verification from C code to mathematical specification

Weaknesses

  1. Verification Cost: 1983 lines of verification code for relatively short C program represents high cost
  2. Limited Generality: Specifically targets COO to CSR conversion with limited generalization capability
  3. Insufficient Performance Consideration: Does not address performance optimization needs in practical applications
  4. Tool Dependency: Highly dependent on VST and Coq ecosystem

Impact

  1. Academic Contribution: Provides important case study and methodology for numerical software verification
  2. Practical Impact: Can serve as foundational component for highly trustworthy scientific computing software
  3. Methodology Promotion: Loop invariant derivation method applicable to verification of other complex algorithms
  4. Tool Development: Promotes application of VST and similar verification tools in numerical computing domain

Applicable Scenarios

  1. Critical Scientific Computing: Numerical simulations and analyses requiring high trustworthiness
  2. Safety-Critical Systems: Software involving numerical computation in safety-critical applications
  3. Verification Research: Reference case study for formal verification of complex algorithms
  4. Educational Use: Demonstrates capabilities and methods of modern program verification techniques

References

Key references cited in this paper include:

  1. Barrett et al. (1994): Templates for the Solution of Linear Systems - Classical reference for sparse matrix algorithms
  2. Appel & Kellison (2024): VCFloat2 - Floating-point error analysis tool
  3. Cao et al. (2018): VST-Floyd - Separation logic verification tool
  4. Kellison et al. (2023): LAProof - Formal proof library for linear algebra programs

Summary: This paper demonstrates the feasibility of complete formal verification of complex numerical algorithms, proposes practical verification methodologies, and makes important contributions to the development of trustworthy scientific computing software. Despite high verification costs, its value in discovering actual errors and providing methodological guidance makes it an important work in the field.