
Fast Trigonometric Functions using the RLIBM Approach

Park, Nagarakatte

This paper describes our experience developing polynomial approximations for trigonometric functions that produce correctly rounded results for multiple representations and rounding modes using the RLIBM approach. A key challenge with trigonometric functions concerns range reduction with "pi", which reduces a given input in the domain of a 32-bit float to a small domain. Any rounding error in the value of "pi" is amplified during range reduction, which can result in wrong results. We describe our experience implementing fast range reduction techniques that maintain a large number of bits of "pi" both with floating-point and integer computations. The resulting implementations for trigonometric functions are fast and produce correctly rounded results for all inputs for multiple representations up to 32-bits with a single implementation.

Basic Information

Paper ID: 2510.13426
Title: Fast Trigonometric Functions using the RLIBM Approach
Authors: Sehyeok Park, Santosh Nagarakatte (Rutgers University)
Classification: cs.PL (Programming Languages)
Conference: International Workshop on Verification of Scientific Software (VSS 2025)
Paper Link: https://arxiv.org/abs/2510.13426

Abstract

This paper describes the authors' experience developing polynomial approximations for trigonometric functions with the RLIBM approach; the approximations produce correctly rounded results for multiple representations and rounding modes. The key challenge for trigonometric functions is range reduction involving π, which maps an input from the 32-bit floating-point domain to a small domain. Any rounding error in the value of π is amplified during range reduction and can lead to incorrect results. The authors describe fast range reduction techniques that keep a large number of bits of π, using both floating-point and integer computations. The resulting implementations are fast and produce correctly rounded results for all inputs, covering multiple representations up to 32 bits with a single implementation.

Research Background and Motivation

Core Problems

Correct Rounding Challenge: Scientific computing relies heavily on the elementary functions provided by mathematical libraries, but producing correctly rounded results for all inputs is extremely difficult (the "table maker's dilemma"), and mainstream mathematical libraries do not produce correctly rounded results for all inputs.
Portability and Reproducibility Issues: Without correctly rounded mathematical libraries, the same application can produce different results on different machines, which hurts portability and reproducibility.
Demand for Multiple Representation Formats: With the growing number of custom formats (such as bfloat16, tensorfloat32, FP8), there is a need for a reference library that provides correct results for multiple representations and rounding modes.

Limitations of Existing Methods

Minimax Polynomial Approximation: Traditional methods generate polynomial approximations that minimize the maximum error across all inputs with respect to the real value. When the real-valued output lies very close to a rounding boundary, the approximation error must be smaller than the distance to that boundary, so the available freedom shrinks significantly.
Performance vs. Correctness Trade-off: Existing libraries make trade-offs in either performance (e.g., Payne-Hanek implementation) or correctness (e.g., GCC's libm).

Core Contributions

Efficient Range Reduction Technique: Developed an efficient range reduction algorithm combining floating-point and integer arithmetic that maintains sufficient bits of π to produce correct results.
Single Implementation for Multiple Representations: Implemented a single polynomial approximation that produces correctly rounded results for multiple representations from 10 to 32 bits and all standard rounding modes.
Performance Optimization: Integer-based range reduction achieves 19% performance improvement compared to floating-point strategies, with overall performance faster than or comparable to mainstream libraries.
Complete Trigonometric Function Library: Provides fast and correct implementations for sin, cos, and tan functions.

Detailed Methodology

Core Concept of RLIBM Method

The key insight of the RLIBM method is to approximate the correctly rounded result directly rather than the real value of the function. For the correctly rounded result of a given input, there is an interval of real values, all of which round to that result; a polynomial only needs to land inside this interval. Because such an interval (a margin of about 1 ULP) is available for every input, the polynomial has more freedom than under the minimax method, which must track the real value itself.
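
The rounding interval can be illustrated with a small sketch. It is not taken from the paper: it assumes the float32 format with round-to-nearest, and the helper names `to_f32` and `rounding_interval` are hypothetical. The interval endpoints are the midpoints between the target value and its two float32 neighbors.

```python
import struct
from fractions import Fraction

def to_f32(v):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def rounding_interval(y):
    """Reals between these two midpoints round to y under round-to-nearest.

    Assumption: y is a positive float32 value below the largest finite one,
    so that bits - 1 and bits + 1 are its immediate neighbors.
    """
    bits = struct.unpack("<I", struct.pack("<f", y))[0]
    below = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    yq = Fraction(y)
    return (Fraction(below) + yq) / 2, (yq + Fraction(above)) / 2

y = to_f32(0.8414709848078965)   # float32 value nearest to sin(1)
lo, hi = rounding_interval(y)
width = hi - lo                  # one float32 ULP in the binade [0.5, 1)
```

Any polynomial output strictly inside `(lo, hi)` rounds to `y`, which is the freedom the RLIBM approach exploits when it solves for polynomial coefficients.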

Multi-Representation Support Mechanism

To support multiple representations, the RLIBM project generates polynomial approximations that produce the correctly rounded result in an (n+2)-bit representation under the round-to-odd rounding mode, where n is the bit width of the largest target representation. The advantages of this approach are:

Round-to-odd results preserve all information needed for direct rounding to the target representation
Subsequent rounding to lower-precision representations produces correct results
Avoids double rounding errors
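
Why two extra bits with round-to-odd avoid double rounding can be checked on a toy model. The sketch below is an illustration under stated assumptions, not the RLIBM code: it uses fixed-point grids (multiples of 2^-bits) instead of real floating-point formats, and the function names are hypothetical.

```python
import math
from fractions import Fraction

def rne(x, bits):
    """Round x to a multiple of 2**-bits, ties to even."""
    scaled = x * 2**bits
    q = math.floor(scaled)
    rem = scaled - q
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
        q += 1
    return Fraction(q, 2**bits)

def round_to_odd(x, bits):
    """Keep x if it is exactly representable; otherwise pick the neighbor
    whose last kept bit is odd."""
    scaled = x * 2**bits
    q = math.floor(scaled)
    if scaled != q and q % 2 == 0:
        q += 1
    return Fraction(q, 2**bits)

x = Fraction(33, 64)                         # slightly above 1/2
direct = rne(x, 0)                           # 1: the correct result
double_rounded = rne(rne(x, 2), 0)           # 0: 33/64 -> 1/2 -> tie -> 0
via_odd = rne(round_to_odd(x, 2), 0)         # 1: 33/64 -> 3/4 -> 1
```

The odd last bit acts as a sticky marker that the value was inexact, so the second rounding never mistakes it for an exact tie.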

Range Reduction Algorithm

Basic Principle

Trigonometric range reduction maps an input x ∈ (-∞, ∞) to a reduced input x' ∈ [-π/2^(t+1), π/2^(t+1)], where:

x = x' + k * π/2^t
k = round(2^t * x/π), the integer nearest to 2^t * x/π
x' = π/2^t * r, where r = 2^t * x/π - k, so |r| ≤ 1/2
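
The definitions above can be checked with exact rational arithmetic. This is a reference sketch, not the fast algorithm of the paper: it assumes t = 8, uses a 50-digit decimal value of π as a stand-in for the real constant, and the name `reduce_exact` is hypothetical.

```python
from fractions import Fraction

# 50 decimal digits of pi, used as an exact rational stand-in (assumption).
PI = Fraction("3.14159265358979323846264338327950288419716939937510")
T = 8

def reduce_exact(x, t=T):
    """Return (k, r, x_reduced) following the range reduction definitions."""
    y = Fraction(x) * 2**t / PI      # 2^t * x / pi
    k = round(y)                     # nearest integer (ties to even)
    r = y - k                        # fractional part, |r| <= 1/2
    x_reduced = PI / 2**t * r        # x' = pi/2^t * r
    return k, r, x_reduced

k, r, x_reduced = reduce_exact(1000.0)
```

For x = 1000 this gives k = 81487 and r close to 0.33, so the polynomial only ever sees arguments of magnitude at most π/512.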

Floating-Point Implementation Strategy

Small Input Handling (|x| < 2^30):

Uses an 80-bit approximation of 256/π (that is, t = 8), stored as two double values
Avoids intermediate rounding errors
Uses partial products to compute k and the fractional part r without losing the bits that matter
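
A minimal sketch of the two-double idea follows. The split used here (29 leading bits in `HI`, the remainder rounded into `LO`, roughly 80 bits in total) is an assumption chosen so that the product of a 24-bit float32 significand with `HI` is exact in a double; the paper's actual split and code may differ, and all names are hypothetical.

```python
import math
from fractions import Fraction

PI = Fraction("3.14159265358979323846264338327950288419716939937510")
C = 256 / PI                                   # exact rational stand-in for 256/pi

HI_Q = Fraction(math.floor(C * 2**22), 2**22)  # leading 29 bits of 256/pi
HI = float(HI_Q)                               # exactly representable
LO = float(C - HI_Q)                           # remaining bits, rounded to double

def reduce_two_doubles(x):
    """x: float32-representable value with |x| < 2**30. Returns (k, r)."""
    p_hi = x * HI              # exact: 24-bit x times 29-bit HI fits in 53 bits
    k_hi = math.floor(p_hi)
    f_hi = p_hi - k_hi         # exact fractional part of the high product
    s = f_hi + x * LO          # low partial product only touches small values
    k_lo = round(s)
    return k_hi + k_lo, s - k_lo

def reduce_one_double(x):
    """Naive variant with 256/pi rounded to a single double, for comparison."""
    y = x * float(C)
    k = round(y)
    return k, y - k

x = 1.0e9                                      # representable in float32
exact = Fraction(x) * C
k2, r2 = reduce_two_doubles(x)
k1, r1 = reduce_one_double(x)
err_two = abs(Fraction(k2) + Fraction(r2) - exact)
err_one = abs(Fraction(k1) + Fraction(r1) - exact)
```

For this input the single-double variant can be off by up to about 1e-5 in r, because both the constant and the product are rounded at magnitude 8e10, while the split version stays below 1e-12.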

Large Input Handling (2^30 ≤ |x|):

Version 1: Divides 256/π into 28-bit segments stored in a double array, with each segment generated using truncation mode
Version 2: Uses 53-bit precision segments, leveraging fused-multiply-add instructions to reduce rounding errors
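
The point of 28-bit truncated segments (Version 1) is that a 24-bit float32 significand times a 28-bit segment needs at most 52 bits, so every partial product is exact in a double. The sketch below only generates and checks such segments; it assumes a 50-digit value of π (enough for five segments, not for the full float32 range), and `truncated_segments` is a hypothetical helper, not the paper's table generator.

```python
import math
from fractions import Fraction

PI = Fraction("3.14159265358979323846264338327950288419716939937510")
C = 256 / PI                       # 64 <= 256/pi < 128 = 2**7

def truncated_segments(c, count, seg_bits=28, top_exp=7):
    """Split c (< 2**top_exp) into `count` doubles of seg_bits bits each,
    generated by truncation so that the segments sum to at most c."""
    segs = []
    rest = c
    weight_exp = top_exp
    for _ in range(count):
        weight_exp -= seg_bits
        q = math.floor(rest * Fraction(2) ** (-weight_exp))  # integer < 2**seg_bits
        seg = Fraction(q) * Fraction(2) ** weight_exp
        segs.append(float(seg))    # exact: q has at most 28 bits
        rest -= seg
    return segs

SEGS = truncated_segments(C, 5)    # 140 bits of 256/pi in five doubles
total = sum(Fraction(s) for s in SEGS)
```

Truncation (rather than rounding) keeps every segment non-negative and non-overlapping, so the partial products can be accumulated from high to low without cancellation.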

Integer Implementation Strategy

Small Input Optimization:

Uses an 80-bit approximation of 256/π split into two 40-bit integers P1 and P0
Identifies integer k and fractional bits through bit-shift operations
Avoids precision loss from floating-point arithmetic
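
The integer strategy for small inputs can be sketched with Python integers standing in for `uint64_t` arithmetic. The constants `P1` and `P0` follow the description above; the scaling by 2^73, the rounding step, and the function name are assumptions made for this illustration.

```python
import math
from fractions import Fraction

PI = Fraction("3.14159265358979323846264338327950288419716939937510")
C = 256 / PI
FRAC_BITS = 73                          # 256/pi < 2**7, so 7 + 73 = 80 bits
C80 = math.floor(C * 2**FRAC_BITS)      # 80-bit fixed-point 256/pi
P1 = C80 >> 40                          # high 40 bits
P0 = C80 & ((1 << 40) - 1)              # low 40 bits

def reduce_int_small(x):
    """x: positive float32-representable value below 2**30. Returns (k, r)."""
    m, e = math.frexp(x)                # x = m * 2**e with 0.5 <= m < 1
    mant = int(m * 2**24)               # 24-bit integer significand
    exp = e - 24                        # x = mant * 2**exp
    # Two partial products, each below 2**64.
    prod = ((mant * P1) << 40) + mant * P0
    shift = FRAC_BITS - exp             # prod / 2**shift approximates 256*x/pi
    k = (prod + (1 << (shift - 1))) >> shift   # nearest integer via shift
    frac = prod - (k << shift)          # signed fractional bits
    return k, Fraction(frac, 1 << shift)

k, r = reduce_int_small(1.0e9)
k_half, r_half = reduce_int_small(0.5)
```

Because the integer k and the fractional bits are simply different bit fields of the same product, no rounding happens until the fraction is converted for the polynomial evaluation.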

Large Input Handling:

Uses a 192-bit approximation of 256/π split into three 64-bit integers
Computes 128-bit partial products
Extracts relevant bits through bit-shift operations
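
For large inputs, one way to read the description above is a Payne-Hanek-style window: bits of 256/π whose products with x are multiples of 512 cannot affect sin or cos and are skipped, and the next 192 bits are multiplied word by word. The sketch below follows that reading under explicit assumptions (a 100-digit π, a window selected from the input exponent, Python integers in place of 128-bit products); it is not the paper's implementation, and all names are hypothetical.

```python
import math
from fractions import Fraction

PI = Fraction("3.14159265358979323846264338327950288419716939937510"
              "58209749445923078164062862089986280348253421170679")
C = 256 / PI
F = 300
CBITS = math.floor(C * 2**F)    # the bit of weight 2**j sits at position j + F
MASK64 = (1 << 64) - 1

def reduce_int_large(x):
    """x: float32-representable value >= 2**30. Returns (k mod 512, r)."""
    m, e = math.frexp(x)
    mant = int(m * 2**24)
    exp = e - 24                         # x = mant * 2**exp, exp >= 7 here
    top = 8 - exp                        # highest weight that can affect k mod 512
    low = top - 191                      # lowest weight in the 192-bit window
    window = (CBITS >> (low + F)) & ((1 << 192) - 1)
    w2, w1, w0 = window >> 128, (window >> 64) & MASK64, window & MASK64
    # Three partial products, each below 2**88, so each fits in 128 bits.
    prod = ((mant * w2) << 128) + ((mant * w1) << 64) + mant * w0
    shift = -(exp + low)                 # always 183: prod / 2**shift = 256*x/pi mod 512
    prod &= (1 << (shift + 9)) - 1       # discard multiples of 512
    k = (prod + (1 << (shift - 1))) >> shift
    frac = prod - (k << shift)
    return k % 512, Fraction(frac, 1 << shift)

x = 1.5 * 2.0**100                       # representable in float32
k, r = reduce_int_large(x)
diff = (k + r - Fraction(x) * C) % 512   # distance to the exact value, modulo 512
```

Working modulo 512 corresponds to the period 2π when t = 8, which is why the discarded high bits never need to be computed.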

Output Compensation

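Output compensation recovers the result for x from the polynomial results for the reduced input x' using the angle-addition identities, where k' is the table index derived from k: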

sin(x) = sin(k'π/2^t)cos(x') + cos(k'π/2^t)sin(x')
cos(x) = cos(k'π/2^t)cos(x') - sin(k'π/2^t)sin(x')

The values sin(k'π/2^t) and cos(k'π/2^t) come from precomputed tables; exploiting periodicity and symmetry reduces the number of precomputed values to 512.
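
A small sketch of output compensation with t = 8 follows. It makes several assumptions that are not from the paper: k' is taken as k mod 512 (one full period), a single 512-entry sine table serves both sin and cos through cos(a) = sin(a + π/2), `math.sin` and `math.cos` stand in for the polynomial approximations on x', and the reduction is done in plain double arithmetic, which is adequate only for small |x|.

```python
import math

T = 8
N = 2 ** (T + 1)                         # 512 entries cover one period of 2*pi
SIN_TAB = [math.sin(j * math.pi / 2**T) for j in range(N)]

def table_sin_cos(kp):
    """sin and cos of kp*pi/2^T from the single sine table."""
    s = SIN_TAB[kp]
    c = SIN_TAB[(kp + N // 4) % N]       # cos(a) = sin(a + pi/2)
    return s, c

def reduce_small(x):
    k = round(x * 2**T / math.pi)
    return k, x - k * (math.pi / 2**T)   # x' for small |x| only

def sin_compensated(x):
    k, xr = reduce_small(x)
    s_k, c_k = table_sin_cos(k % N)      # k' = k mod 512 (assumption)
    return s_k * math.cos(xr) + c_k * math.sin(xr)

def cos_compensated(x):
    k, xr = reduce_small(x)
    s_k, c_k = table_sin_cos(k % N)
    return c_k * math.cos(xr) - s_k * math.sin(xr)
```

Sharing one table between sine and cosine is only one possible way to arrive at 512 stored values; the paper may organize its tables differently.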

Experimental Setup

Test Environment

Hardware: 2.10GHz Intel Xeon(R) Silver 4310 server, 256GB RAM
Operating System: Ubuntu 24.04.1 LTS
Measurement Tool: Performance counters

Comparison Libraries

GLIBC: float and double libm
Core-Math: Correctly rounded library
RLIBM Implementation: Variants with different range reduction strategies

Evaluation Metrics

Correctness: Verified through complete enumeration of all inputs
Performance: Speedup relative to other libraries

Experimental Results

Correctness Verification

RLIBM Functions: Produce correctly rounded results for all inputs across all representations from 10 to 32 bits
GLIBC float libm: Thousands of incorrect results for sin, cos, tan on 32-bit float inputs
GLIBC double libm: More accurate than float version but still contains errors
Core-Math: Produces correctly rounded results for the 32-bit float format only; rounding its results to the narrower formats in the 10 to 32 bit range can be wrong because of double rounding errors

Performance Results

Range Reduction Optimization Effects

Hybrid method (floating-point for small inputs, integer for large inputs) compared to other strategies:

19% faster than initial floating-point method (FP V1)
Significant improvement over alternative floating-point method (FP V2)
4% faster than pure integer method

Comparison with Other Libraries

On average 10% faster than Core-Math
On average 137% faster than GLIBC double functions
Performance improvements primarily attributed to efficient range reduction and precision advantages of integer arithmetic

Technical Innovations

1. Balance Between Precision and Performance

Integer arithmetic on 64-bit and 128-bit integers (uint64_t and uint128_t) carries more significant bits than the 53-bit significand of a 64-bit double
Reduces the number of partial products needed to obtain sufficient precision for reducing inputs

2. Hybrid Range Reduction Strategy

Small inputs use floating-point arithmetic (when the integer part of 256*x/π is sufficiently small)
Large inputs use integer arithmetic (providing higher precision and simpler bit operations)

3. Bit Operation Optimization

Uses bit-shift operations to identify portions of 256*x/π relevant to reduced input and low bits of k
Avoids accumulation of rounding errors in floating-point arithmetic

Related Work

Traditional Methods

Minimax Approximation: Remez algorithm and others, but with limited degrees of freedom near rounding boundaries
Payne-Hanek Algorithm: Classical range reduction method, but implementation efficiency is challenging

Correctly Rounded Research

CR-LIBM: Early correctly rounded library, but with slower performance
Core-Math: Modern correctly rounded implementation, but it supports only a single representation per function

RLIBM Project Development

Extended from other elementary functions (e^x, log, etc.) to trigonometric functions
Innovative approach for multi-representation support

Conclusions and Discussion

Main Conclusions

Feasibility Proof: Demonstrates that generating fast and correct implementations for trigonometric functions is possible
Criticality of Range Reduction: Efficient range reduction is as important as a low-degree polynomial approximation
Advantages of Integer Arithmetic: Integer-based implementation significantly outperforms floating-point methods for large inputs

Limitations

Complexity: High implementation complexity requiring precise bit operations and multiple strategies
Memory Overhead: Requires precomputed tables and multi-precision constant storage
Scalability: Extension to higher-precision representations requires redesign

Future Directions

GPU Platforms: Explore correctly rounded libraries for GPU platforms
Standardization: Participate in the IEEE-754 standards committee to promote mandatory correct rounding
Mainstream Integration: Collaborate with mainstream mathematical library developers to integrate these methods

In-Depth Evaluation

Strengths

Theory and Practice Integration: Successfully applies RLIBM theory to challenging trigonometric functions
Comprehensive Engineering Optimization: Full-spectrum optimization from algorithm to implementation
Rigorous Verification: Correctness verified through complete enumeration
Practical Value: Addresses important problems in real-world applications

Weaknesses

Implementation Complexity: Combination of multiple strategies increases implementation and maintenance complexity
Readability: Readability and maintainability of code with extensive bit operations need improvement
Theoretical Analysis: Lacks in-depth theoretical analysis of why integer methods are superior

Impact

Academic Contribution: Provides new methods for building correctly rounded implementations in numerical computing
Practical Value: Directly applicable to scientific computing requiring high-precision numerical calculations
Standards Promotion: May influence future development of floating-point standards

Applicable Scenarios

Scientific Computing: Numerical simulations requiring high precision and reproducibility
Financial Computing: Financial modeling requiring exact results
Embedded Systems: Systems requiring support for multiple floating-point formats
Reference Implementation: Serves as correctness benchmark for other libraries

References

This paper cites important literature in numerical analysis, floating-point arithmetic, and correctly rounded computation, including:

Muller's reference on elementary functions
MPFR high-precision library
Payne-Hanek range reduction algorithm
IEEE-754 floating-point standard related research

This paper makes significant contributions to the numerical computing field: it turns theoretical methods into practical high-performance implementations and offers an effective solution to the correct rounding problem in scientific computing.