Formal Verification of COO to CSR Sparse Matrix Conversion (Invited Paper)
Appel
We describe a machine-checked correctness proof of a C program that converts a coordinate-form (COO) sparse matrix to a compressed-sparse-row (CSR) matrix. The classic algorithm (sort the COO entries in lexicographic order by row,column; fill in the CSR arrays left to right) is concise but has rather intricate invariants. We illustrate a bottom-up methodology for deriving the invariants from the program.
academic
Formal Verification of COO to CSR Sparse Matrix Conversion (Invited Paper)
This paper describes a machine-checked correctness proof for verifying a C program that converts sparse matrices from Coordinate (COO) format to Compressed Sparse Row (CSR) format. Although the classical algorithm—sorting COO entries in lexicographic order by row and column, then filling CSR arrays from left to right—is concise, it exhibits considerably complex invariants. The paper demonstrates a bottom-up methodology for deriving invariants from the program.
Core Problem: Sparse matrices are widely used in numerical computation, but existing sparse matrix conversion algorithms lack formal verification, potentially harboring difficult-to-detect errors
Significance: Sparse matrix-vector multiplication is a fundamental operation in numerical methods; errors in conversion can cause failure throughout the entire computational pipeline
Existing Limitations: Traditional verification methods rely on regression testing and similar techniques, which cannot provide mathematical correctness guarantees
Research Motivation: Through machine-checked formal proofs, ensure absolute correctness of COO to CSR conversion, establishing a foundation for verifiable numerical software
First Machine-Checked Correctness Proof for COO to CSR Conversion: Using the Verifiable Software Toolchain (VST) and Coq proof assistant
Innovative Loop Invariant Derivation Methodology: Proposes a bottom-up approach to derive complex loop invariants from properties the program must satisfy
Separation of Data Structure and Approximation Reasoning: Decouples data structure reasoning in C programs from floating-point approximation reasoning, simplifying verification complexity
Composable Verification Components: Provides verified sparse matrix conversion modules reusable in larger verification systems
Practical Error Discovery: Identified and fixed 5 program errors during proof (4 off-by-one errors and 1 complex unsigned integer handling error)
struct csr_matrix *coo_to_csr_matrix(struct coo_matrix *p) {
// 1. Sort COO entries in lexicographic order by (row, column)
coo_quicksort(p, 0, n);
// 2. Count the number of unique coordinate pairs
k = coo_count(p);
// 3. Allocate CSR array space
// 4. Traverse sorted COO entries and construct CSR structure
for (i=0; i<n; i++) {
// Handle duplicate entries, new columns, new rows, etc.
}
}
Record coo_matrix (t: type) := {
coo_rows: Z;
coo_cols: Z;
coo_entries: list (Z * Z * ftype t)
}.
Record csr_matrix {t: type} := {
csr_cols: Z;
csr_vals: list (ftype t);
csr_col_ind: list Z;
csr_row_ptr: list Z;
csr_rows: Z := Zlength (csr_row_ptr) - 1
}.
Inductive sum_any {t}: list (ftype t) → ftype t → Prop :=
| Sum_Any_0: sum_any nil (Zconst t 0)
| Sum_Any_1: ∀ x, sum_any [x] x
| Sum_Any_split: ∀ al bl a b, sum_any al a → sum_any bl b →
sum_any (al++bl) (BPLUS a b)
| Sum_Any_perm: ∀ al bl s, Permutation al bl → sum_any al s → sum_any bl s.
A key innovation is the partial_CSR i r coo ROWPTR COLIND VAL relation, representing the partial CSR state when processing the i-th COO entry at row r.
Technical Depth: Addresses multiple technical challenges in sparse matrix algorithms, including floating-point arithmetic, memory management, and complex control flow
Methodological Innovation: Bottom-up invariant derivation method provides a replicable paradigm for similar verifications
Practical Value: Discovery of actual errors demonstrates real benefits of formal verification
Cao et al. (2018): VST-Floyd - Separation logic verification tool
Kellison et al. (2023): LAProof - Formal proof library for linear algebra programs
Summary: This paper demonstrates the feasibility of complete formal verification of complex numerical algorithms, proposes practical verification methodologies, and makes important contributions to the development of trustworthy scientific computing software. Despite high verification costs, its value in discovering actual errors and providing methodological guidance makes it an important work in the field.