Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms

Mukunoki, Ozaki

To obtain accurate results in numerical computation, high-precision arithmetic is a straightforward approach. However, most processors lack hardware support for floating-point formats beyond double precision (FP64). Double-word arithmetic (Dekker 1971) extends precision by using standard floating-point operations to represent numbers with twice the mantissa length. Building on this concept, various multi-word arithmetic methods have been proposed to further increase precision by combining additional words. Simplified variants, known as quasi algorithms, have also been introduced, which trade a certain loss of accuracy for reduced computational cost. In this study, we investigate the performance of quasi algorithms for double- and triple-word arithmetic in sparse iterative solvers based on the Conjugate Gradient method, and compare them with both non-quasi algorithms and standard FP64. We evaluate execution time on an x86 processor, the number of iterations to convergence, and solution accuracy. Although quasi algorithms require appropriate normalization to preserve accuracy - without it, convergence cannot be achieved - they can still reduce runtime when normalization is applied correctly, while maintaining accuracy comparable to full multi-word algorithms. In particular, quasi triple-word arithmetic can yield more accurate solutions without significantly increasing execution time relative to double-word arithmetic and its quasi variant. Furthermore, for certain problems, a reduction in iteration count contributes to additional speedup. Thus, quasi triple-word arithmetic can serve as a compelling alternative to conventional double-word arithmetic in sparse iterative solvers.

Basic Information

Paper ID: 2510.13536
Title: Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms
Authors: Daichi Mukunoki (Nagoya University), Katsuhisa Ozaki (Shibaura Institute of Technology)
Classification: cs.MS (Mathematical Software)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13536

Abstract

To obtain accurate results in numerical computations, high-precision arithmetic is a direct approach. However, most processors lack hardware support for floating-point formats beyond double precision (FP64). Double-word arithmetic (Dekker 1971) extends precision by representing numbers with twice the mantissa length using standard floating-point operations. Building on this concept, various multi-word arithmetic methods have been proposed to further increase precision by combining additional words. Simplified variants, namely quasi-algorithms, have also been introduced, trading certain precision loss for reduced computational cost. This study investigates the performance of quasi-algorithms for double-word and triple-word arithmetic in sparse iterative solvers based on the conjugate gradient method, comparing them with non-quasi variants and standard FP64.

Research Background and Motivation

Core Problems

Hardware Limitations: Most processors lack hardware support for floating-point formats beyond double precision (FP64), limiting the implementation of high-precision numerical computations
Precision Requirements of Sparse Iterative Solvers: When solving large-scale sparse linear systems, rounding errors can increase the number of iterations required for convergence and degrade the accuracy of the computed solution
Trade-off Between Performance and Precision: Traditional multi-word arithmetic methods can improve precision but incur significant computational overhead

Research Significance

Sparse iterative solvers are widely applied in scientific computing and engineering applications
High-precision arithmetic can improve convergence and reduce iteration counts
In memory-bound applications, the additional arithmetic overhead of multi-word operations may be hidden behind memory access latency

Limitations of Existing Methods

Traditional multi-word arithmetic (e.g., DW, TW) has high computational cost
Quasi-algorithms reduce computational cost but may result in precision loss
Lack of systematic evaluation of quasi-algorithm performance in iterative solvers

Core Contributions

Systematic Performance Evaluation of Quasi-Algorithms: First systematic evaluation of QDW and QTW algorithm performance in sparse iterative solvers
Discovery of Normalization's Critical Role: Demonstrates the importance of appropriate normalization for quasi-algorithm convergence
Proposal of QTW as Effective Alternative: Shows experimentally that quasi-triple-word arithmetic (QTW) can serve as an effective alternative to traditional double-word arithmetic
Comprehensive Performance Analysis: Evaluates the solvers along three dimensions: execution time, iteration count, and solution precision

Methodology Details

Problem Definition

Solve the symmetric positive definite linear system Ax = b, where:

A is an n×n symmetric positive definite sparse matrix
b is the right-hand side vector
x is the solution vector to be computed

The system is solved iteratively with the conjugate gradient (CG) method, and performance is compared across arithmetic of different precisions (FP64, DW, QDW, TW, QTW).
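For orientation, the CG iteration described above can be sketched in plain FP64 as follows. This is a minimal illustration, not the paper's implementation: the CSR layout, the function names (spmv, dot, cg) and the relative-residual stopping test are assumptions made here. The residual update line is the point after which the paper applies normalization in its multi-word variants.

```python
import math

def spmv(vals, cols, rowptr, x):
    """Sparse matrix-vector product y = A x for A stored in CSR form."""
    return [
        sum(vals[k] * x[cols[k]] for k in range(rowptr[i], rowptr[i + 1]))
        for i in range(len(rowptr) - 1)
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(vals, cols, rowptr, b, tol=1e-12, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.

    Stops when ||r||_2 <= tol * ||b||_2 (an assumed criterion).
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)              # r = b - A*0
    p = list(r)
    rr = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    for k in range(max_iter):
        if math.sqrt(rr) <= tol * bnorm:
            return x, k
        Ap = spmv(vals, cols, rowptr, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        # Residual update; multi-word variants would normalize r after this step.
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

# Usage: A = [[4, 1], [1, 3]], b = [1, 2]; exact solution is [1/11, 7/11].
x, iters = cg([4.0, 1.0, 1.0, 3.0], [0, 1, 0, 1], [0, 2, 4], [1.0, 2.0])
```

In a multi-word version, every scalar and vector entry above would become a pair or triple of FP64 words, while A and b stay in FP64.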

Multi-Word Arithmetic Architecture

Fundamental Algorithms

Error-Free Transformation Algorithms:

TwoSum(a,b): Decomposes a+b into the floating-point result x and the rounding error y, so that x+y = a+b exactly
QuickTwoSum(a,b): Efficient variant of TwoSum, requiring |a|≥|b|
TwoProdFMA(a,b): Decomposes a×b into result and error using FMA operations
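The three error-free transformations listed above can be sketched as follows. These are the standard textbook formulations; the Python function names are chosen here for illustration. Because math.fma only exists in Python 3.13 and later, the fallback emulates a fused multiply-add with exact rational arithmetic and a single rounding, which is an assumption of this sketch and not something the paper uses.

```python
from fractions import Fraction

try:
    from math import fma  # Python 3.13+
except ImportError:
    def fma(a, b, c):
        # Emulation: exact a*b + c in rationals, rounded once to a double.
        return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_sum(a, b):
    """Return (x, y) with x = fl(a + b) and x + y = a + b exactly (6 operations)."""
    x = a + b
    t = x - a
    y = (a - (x - t)) + (b - t)
    return x, y

def quick_two_sum(a, b):
    """Same result as two_sum, valid only when |a| >= |b| (3 operations)."""
    x = a + b
    y = b - (x - a)
    return x, y

def two_prod_fma(a, b):
    """Return (x, y) with x = fl(a * b) and x + y = a * b exactly (2 operations)."""
    x = a * b
    y = fma(a, b, -x)
    return x, y

# Usage: the low-order part recovers what plain FP64 addition discards.
hi, lo = two_sum(1.0, 2.0 ** -60)   # hi == 1.0, lo == 2.0 ** -60
```

The exactness claims hold in the absence of overflow and underflow.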

Double-Word Arithmetic (DW)

DWadd: [c1,c2] = DWadd(a1,a2,b1,b2)
- Cost: 11 FP64 operations
- Includes normalization step (QuickTwoSum)

DWmul: [c1,c2] = DWmul(a1,a2,b1,b2)
- Cost: 7 FP64 operations
- Includes normalization step
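One arrangement of double-word addition and multiplication that matches the operation counts above (11 and 7) is sketched below. It follows the commonly used "sloppy" double-word formulation; whether the paper uses exactly this ordering is an assumption, and the names dw_add and dw_mul are illustrative. The fma fallback is an emulation for Python versions before 3.13.

```python
from fractions import Fraction

try:
    from math import fma  # Python 3.13+
except ImportError:
    def fma(a, b, c):
        # Emulation: exact a*b + c in rationals, rounded once to a double.
        return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_sum(a, b):              # 6 operations
    x = a + b
    t = x - a
    return x, (a - (x - t)) + (b - t)

def quick_two_sum(a, b):        # 3 operations, requires |a| >= |b|
    x = a + b
    return x, b - (x - a)

def dw_add(a1, a2, b1, b2):
    """(a1 + a2) + (b1 + b2) as a normalized double-word: 6 + 2 + 3 = 11 operations."""
    s, e = two_sum(a1, b1)
    e = e + (a2 + b2)
    return quick_two_sum(s, e)  # normalization step

def dw_mul(a1, a2, b1, b2):
    """(a1 + a2) * (b1 + b2) as a normalized double-word: 4 + 3 = 7 operations."""
    p = a1 * b1
    e = fma(a1, b1, -p)         # exact error of the leading product
    e = fma(a1, b2, e)          # cross terms; a2*b2 is dropped as negligible
    e = fma(a2, b1, e)
    return quick_two_sum(p, e)  # normalization step

# Usage: squaring 1 + 2^-30 keeps the 2^-60 term that FP64 would lose.
a = 1.0 + 2.0 ** -30
c1, c2 = dw_mul(a, 0.0, a, 0.0)
```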

Quasi Double-Word Arithmetic (QDW)

Omits normalization step, allowing high and low word overlap
QDWadd: 8 operations, QDWmul: 4 operations
Reduces the operation count from 11 to 8 for addition and from 7 to 4 for multiplication
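The effect of omitting normalization can be illustrated with the sketch below, which drops the final QuickTwoSum from a sloppy double-word add and multiply to reach 8 and 4 operations. The function names and the use of TwoSum for the explicit normalize helper are assumptions of this sketch (TwoSum is chosen because, after cancellation, the high word is not guaranteed to be the larger one).

```python
from fractions import Fraction

try:
    from math import fma  # Python 3.13+
except ImportError:
    def fma(a, b, c):
        # Emulation: exact a*b + c in rationals, rounded once to a double.
        return float(Fraction(a) * Fraction(b) + Fraction(c))

def two_sum(a, b):              # 6 operations
    x = a + b
    t = x - a
    return x, (a - (x - t)) + (b - t)

def qdw_add(a1, a2, b1, b2):
    """Quasi double-word add: 6 + 2 = 8 operations, result not normalized."""
    s, e = two_sum(a1, b1)
    return s, e + (a2 + b2)

def qdw_mul(a1, a2, b1, b2):
    """Quasi double-word multiply: 4 operations, result not normalized."""
    p = a1 * b1
    e = fma(a1, b1, -p)
    e = fma(a1, b2, e)
    e = fma(a2, b1, e)
    return p, e

def normalize(c1, c2):
    """Restore fl(c1 + c2) == c1 for a single quasi double-word value."""
    return two_sum(c1, c2)

# Usage: after cancellation in the high words, the low word carries the value.
c1, c2 = qdw_add(1.0, 2.0 ** -60, -1.0, 2.0 ** -60)   # (0.0, 2^-59)
n1, n2 = normalize(c1, c2)                             # (2^-59, 0.0)
```

In this example the unnormalized result has a zero high word, so any later operation that relies on the high word alone (for example a comparison or a division estimate) would see the wrong magnitude until normalization is applied.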

Quasi Triple-Word Arithmetic (QTW)

QTWadd: [c1,c2,c3] = QTWadd(a1,a2,a3,b1,b2,b3)
- Cost: 21 FP64 operations
- Does not enforce fl(c1+c2)=c1 and fl(c2+c3)=c2

QTWmul: [c1,c2,c3] = QTWmul(a1,a2,a3,b1,b2,b3)
- Cost: 24 FP64 operations
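A branch-free triple-word addition that happens to cost 21 FP64 operations can be built from three TwoSum calls and three plain additions, as sketched below. This is one plausible arrangement consistent with the count above, not a transcription of the paper's QTWadd, and the name qtw_add is illustrative. QTWmul is not sketched because its structure cannot be inferred from the operation count alone.

```python
from fractions import Fraction

def two_sum(a, b):              # 6 operations
    x = a + b
    t = x - a
    return x, (a - (x - t)) + (b - t)

def qtw_add(a1, a2, a3, b1, b2, b3):
    """Quasi triple-word add without normalization: 6 + 6 + 6 + 3 = 21 operations.

    The result does not guarantee fl(c1 + c2) == c1 or fl(c2 + c3) == c2.
    """
    c1, e1 = two_sum(a1, b1)        # high words, exact error e1
    s2, e2 = two_sum(a2, b2)        # middle words, exact error e2
    c2, e3 = two_sum(s2, e1)        # fold e1 into the middle word
    c3 = ((a3 + b3) + e2) + e3      # low word absorbs the remaining errors
    return c1, c2, c3

# Usage: the only rounding happens in the three additions that form c3.
c = qtw_add(1.0, 2.0 ** -55, 2.0 ** -110, 1.0, 2.0 ** -56, 2.0 ** -111)
```

There are no data-dependent branches in this formulation, which is the property the source associates with efficient SIMD vectorization.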

Technical Innovations

SIMD Vectorization Optimization:
- Vectorization using AVX2 and AVX-512 instruction sets
- QTW algorithm eliminates conditional branches, better suited for vectorization
Normalization Strategy:
- Normalization performed after residual vector update in CG method
- VecSum3 algorithm used to mitigate bit overlap in triple-word arithmetic
Mixed-Precision Implementation:
- Coefficient matrix A and right-hand side vector b stored in FP64
- Internal computations use corresponding multi-word arithmetic
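The normalization step mentioned above can be pictured with a generic VecSum-style pass over three words: a bottom-up chain of TwoSum calls that pushes the bulk of the value into the first word without changing the exact sum. The sketch below is an assumption about the general shape of such a pass; the paper's VecSum3 may differ in detail, and vec_sum3 and normalize_residual are hypothetical names.

```python
from fractions import Fraction

def two_sum(a, b):
    x = a + b
    t = x - a
    return x, (a - (x - t)) + (b - t)

def vec_sum3(x1, x2, x3):
    """Redistribute three words bottom-up; the exact sum x1 + x2 + x3 is preserved."""
    s, c3 = two_sum(x2, x3)
    c1, c2 = two_sum(x1, s)
    return c1, c2, c3

def normalize_residual(r):
    """Apply vec_sum3 to every triple-word entry of a residual vector.

    In a CG loop this would run right after the update r <- r - alpha * A p.
    """
    return [vec_sum3(*entry) for entry in r]

# Usage: a quasi triple-word value whose high word has cancelled to zero.
r = normalize_residual([(0.0, 2.0 ** -59, 2.0 ** -120)])
```

After the pass the first two words satisfy fl(c1 + c2) == c1. The pair (c2, c3) may still overlap, which matches the source's wording that the step mitigates bit overlap.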

Experimental Setup

Datasets

Eight symmetric positive definite matrices from the SuiteSparse collection (four are listed here):

Matrix	Dimension n	Non-zeros nnz	Application Domain
Hook_1498	1,498,023	60,917,445	Structural problems
bone010	986,703	47,851,783	Model reduction
nd24k	72,000	28,715,634	2D/3D problems
crankseg_2	63,838	14,148,858	Structural problems

Evaluation Metrics

Execution Time: Per-iteration time and total convergence time
Iteration Count: Number of iterations required to achieve convergence
Solution Precision: Relative error norm ||x_k - x*||_2 / ||x*||_2

Comparison Methods

CG-FP64: Standard double-precision implementation
CG-DW: Double-word arithmetic implementation
CG-QDW: Quasi double-word arithmetic implementation
CG-TW: Triple-word arithmetic implementation
CG-QTW: Quasi triple-word arithmetic implementation

Implementation Details

Hardware Platform: Intel Xeon Gold 6230 CPU (20 cores, 2.10-3.90 GHz)
Compiler: g++ 11.3.0, optimization flags -O3 -march=native
Parallelization: OpenMP + SIMD vectorization
Convergence Tolerance: ε = 10^-16, 10^-24, 10^-32

Experimental Results

Main Results

Performance Overhead Analysis

Execution time overhead relative to FP64 (100 iterations):

CG-QDW: approximately 1.3×
CG-DW: approximately 2.1×
CG-QTW: approximately 2.4×
CG-TW: up to 67×

Convergence Performance Comparison

Typical results under ε=10^-16 convergence tolerance:

Matrix	FP64 Time(s)/Iterations	QDW Time(s)/Iterations	QTW Time(s)/Iterations
bone010	170/21780	120/12547	150/11352
pdb1HYS	5.4/12807	3.4/6285	4.9/5346
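The table above mixes two effects: higher per-iteration cost and fewer iterations. The short script below separates them using only the numbers in the table; the dictionary layout and the function name summarize are illustrative.

```python
# (total time in seconds, iterations) taken from the table above
results = {
    "bone010": {"FP64": (170.0, 21780), "QDW": (120.0, 12547), "QTW": (150.0, 11352)},
    "pdb1HYS": {"FP64": (5.4, 12807), "QDW": (3.4, 6285), "QTW": (4.9, 5346)},
}

def summarize(results):
    """Per-iteration cost, iteration ratio and total speedup relative to FP64."""
    out = {}
    for matrix, rows in results.items():
        t0, k0 = rows["FP64"]
        for name, (t, k) in rows.items():
            out[(matrix, name)] = {
                "ms_per_iter": 1000.0 * t / k,
                "per_iter_cost_vs_fp64": (t / k) / (t0 / k0),
                "iteration_ratio": k / k0,
                "total_speedup": t0 / t,
            }
    return out

summary = summarize(results)
for key in sorted(summary):
    print(key, {k: round(v, 3) for k, v in summary[key].items()})
```

For bone010, for example, each QDW iteration costs roughly 1.2 times an FP64 iteration, but the iteration count drops to about 58 percent of the FP64 count, so the total time is about 1.4 times shorter.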

Key Findings

Necessity of Normalization:
- Quasi-algorithms fail to converge without normalization
- Normalization after residual vector update ensures convergence
Advantages of QTW:
- Significantly reduced computational overhead compared to TW
- Achieves comparable precision to TW
- Supports SIMD vectorization for better performance
Benefits of Reduced Iteration Count:
- High-precision arithmetic reduces iteration count
- Total execution time may be lower than FP64 implementation

Throughput Analysis

SpMV operation throughput (GB/s):

FP64 and QDW: Close to memory bandwidth limit (approximately 90 GB/s)
DW and QTW: Reach memory-bound performance after SIMD optimization
TW: Significantly degraded performance due to branch effects

Related Work

Multi-Word Arithmetic Development

Foundational Theory: Dekker (1971) double-word arithmetic
Extended Methods: Triple-word (TW), quad-word (QW) arithmetic
Simplified Variants: Quasi-algorithms (QDW, QTW, QQW)

High-Precision Linear Algebra Libraries

QD Library: Fortran/C++ implementation of double-word and quad-word arithmetic
XBLAS: BLAS routines based on DW arithmetic
MPLAPACK: High-precision BLAS and LAPACK

Sparse Iterative Solver Applications

Research on quad-precision CG solvers
Mixed-precision methods
Ozaki scheme for accurate sparse matrix-vector multiplication

Conclusions and Discussion

Main Conclusions

Feasibility of Quasi-Algorithms: Through appropriate normalization, quasi-algorithms can be effectively applied in sparse iterative solvers
Advantages of QTW: Quasi-triple-word arithmetic provides a good balance between precision and performance
Performance Improvement Potential: On certain problems, reduced iteration counts can provide additional acceleration

Limitations

Normalization Overhead: Normalization adds cost, so where and how often it is applied is a trade-off between precision and execution time
Problem Dependency: Performance improvement depends on specific problem characteristics
Evaluation Scope: Only evaluates basic CG method, does not include preconditioning techniques

Future Directions

Normalization Strategy Optimization: Study effects of more frequent normalization on precision
Extension to Other Iterative Methods: Evaluate application in other solvers
Distributed Environment Applications: Potential in communication-latency-dominated environments
Low-Precision Format Implementation: Multi-word arithmetic implementation using FP16/FP32 on AI processors

In-Depth Evaluation

Strengths

Systematic Study: First systematic evaluation of quasi-algorithm performance in iterative solvers
High Practical Value: QTW algorithm provides a practical high-precision computation scheme
Comprehensive Experiments: Thorough evaluation from multiple dimensions (time, precision, convergence)
Sound Technical Innovation: Well-designed SIMD optimization and normalization strategies

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical analysis of error accumulation in quasi-algorithms
Limited Evaluation Scope: Only evaluates CG method, lacks verification on other iterative methods
Single Normalization Strategy: Only attempts one normalization location and frequency

Impact

Academic Contribution: Provides new algorithm options for high-precision numerical computation
Practical Value: QTW algorithm can be directly applied to practical scientific computing problems
Reproducibility: Sufficient implementation details for reproduction

Applicable Scenarios

Scientific Computing: Large-scale sparse linear systems requiring high-precision solutions
Engineering Simulation: Structural analysis, electromagnetic field computation, etc.
Resource-Constrained Environments: Systems lacking hardware quad-precision support

References

This paper cites 29 relevant references covering key works in multi-word arithmetic theory, high-precision linear algebra libraries, and sparse iterative solvers, providing a solid theoretical foundation for the research.