Fast Trigonometric Functions using the RLIBM Approach

Park, Nagarakatte
This paper describes our experience developing polynomial approximations for trigonometric functions that produce correctly rounded results for multiple representations and rounding modes using the RLIBM approach. A key challenge with trigonometric functions concerns range reduction with "pi", which reduces a given input in the domain of a 32-bit float to a small domain. Any rounding error in the value of "pi" is amplified during range reduction, which can result in wrong results. We describe our experience implementing fast range reduction techniques that maintain a large number of bits of "pi" both with floating-point and integer computations. The resulting implementations for trigonometric functions are fast and produce correctly rounded results for all inputs for multiple representations up to 32-bits with a single implementation.

Basic Information

  • Paper ID: 2510.13426
  • Title: Fast Trigonometric Functions using the RLIBM Approach
  • Authors: Sehyeok Park, Santosh Nagarakatte (Rutgers University)
  • Classification: cs.PL (Programming Languages)
  • Conference: International Workshop on Verification of Scientific Software (VSS 2025)
  • Paper Link: https://arxiv.org/abs/2510.13426

Abstract

This paper describes the experience of developing polynomial approximations for trigonometric functions using the RLIBM method, which produces correctly rounded results for multiple representations and rounding modes. The key challenge in trigonometric functions lies in range reduction involving π, which reduces inputs from the 32-bit floating-point domain to a small domain. Any rounding error in the π value is amplified during the range reduction process, potentially leading to incorrect results. The authors describe their experience implementing fast range reduction techniques that maintain numerous bits of π in both floating-point and integer computations. The resulting trigonometric function implementations are both fast and produce correctly rounded results for all inputs, supporting multiple representations up to 32 bits with only a single implementation.

Research Background and Motivation

Core Problems

  1. Correct Rounding Challenge: Scientific computing relies heavily on elementary functions provided by mathematical libraries, but producing correctly rounded results for all inputs is extremely difficult (the "table maker's dilemma"), and mainstream mathematical libraries do not produce correctly rounded results for all inputs.
  2. Portability and Reproducibility Issues: Without correctly rounded mathematical libraries, applications can produce different results on different machines, which hurts portability and reproducibility.
  3. Demand for Multiple Representation Formats: With the rise of custom formats (such as bfloat16, tensorfloat32, FP8), there is a need for a reference library that provides correct results for multiple representations and rounding modes.

Limitations of Existing Methods

  • Minimax Polynomial Approximation: Traditional methods generate polynomial approximations that minimize the maximum error across all inputs, but when the real-valued output lies very close to a rounding boundary, the approximation has almost no margin for error if the result is still to round correctly.
  • Performance vs. Correctness Trade-off: Existing libraries make trade-offs in either performance (e.g., Payne-Hanek implementation) or correctness (e.g., GCC's libm).

Core Contributions

  1. Efficient Range Reduction Technique: Developed an efficient range reduction algorithm combining floating-point and integer arithmetic that maintains sufficient bits of π to produce correct results.
  2. Single Implementation for Multiple Representations: Implemented a single polynomial approximation that produces correctly rounded results for multiple representations from 10 to 32 bits and all standard rounding modes.
  3. Performance Optimization: Integer-based range reduction achieves a 19% performance improvement over floating-point strategies, and overall performance is faster than or comparable to mainstream libraries.
  4. Complete Trigonometric Function Library: Provides fast and correct implementations for sin, cos, and tan functions.

Detailed Methodology

Core Concept of RLIBM Method

The key insight of the RLIBM method is to approximate the correctly rounded result directly rather than the real value of the function. For the correctly rounded result of a given input, there is an interval of real values, all of which round to that result. A polynomial only has to land inside this interval, which gives it more freedom (up to 1 ULP for all inputs) than the minimax method.
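
As a rough illustration of the rounding interval, the Python sketch below computes, for a positive float32 value y, the two midpoints between y and its neighbours; any real value strictly between them rounds to y under round-to-nearest. The helper names (`to_f32`, `rounding_interval`) are hypothetical, and the choice of float32 with round-to-nearest is an assumption made for simplicity, not the paper's exact setting.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rounding_interval(y):
    """Midpoints between a positive, finite float32 y and its two neighbours.

    Assumption: y is a positive normal float32 that is not the largest
    finite value, so adjacent bit patterns are adjacent values.
    """
    bits = struct.unpack("<I", struct.pack("<f", y))[0]
    below = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    yq = Fraction(y)
    return (Fraction(below) + yq) / 2, (yq + Fraction(above)) / 2


y = to_f32(0.7)
lo, hi = rounding_interval(y)
# Any polynomial output strictly inside (lo, hi) rounds to y.
print(float(hi - lo))
```

The width `hi - lo` is one unit in the last place of y, which is the freedom a polynomial generator can exploit for that input.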

Multi-Representation Support Mechanism

To support multiple representations, the RLIBM project generates polynomial approximations that produce round-to-odd results in an (n+2)-bit representation, where n is the bit width of the largest target representation (n = 32 here). The advantages of this approach are:

  • Round-to-odd results preserve all information needed for direct rounding to the target representation
  • Subsequent rounding to lower-precision representations produces correct results
  • Avoids double rounding errors
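
The effect of round-to-odd can be seen in a small Python sketch. It is a simplification: it rounds exact rationals to fixed-point grids (a given number of fractional bits) rather than to real floating-point formats, and the function names are hypothetical. The intermediate grid has two more bits than the target grid, mirroring the (n+2)-bit idea.

```python
import math
from fractions import Fraction


def round_nearest_even(x, frac_bits):
    """Round x to a multiple of 2**-frac_bits, ties to even."""
    scaled = x * 2**frac_bits
    return Fraction(round(scaled), 2**frac_bits)  # round() on Fraction ties to even


def round_to_odd(x, frac_bits):
    """Keep x if exactly representable, else pick the neighbour with an odd last bit."""
    scaled = x * 2**frac_bits
    floor = math.floor(scaled)
    if scaled == floor:
        return Fraction(floor, 2**frac_bits)
    odd = floor if floor % 2 == 1 else floor + 1
    return Fraction(odd, 2**frac_bits)


x = Fraction(9, 64)                      # slightly above the midpoint 1/8
direct = round_nearest_even(x, 2)        # round once to the target grid
double = round_nearest_even(round_nearest_even(x, 4), 2)   # nearest, then nearest
via_odd = round_nearest_even(round_to_odd(x, 4), 2)        # odd, then nearest
print(direct, double, via_odd)
```

Rounding to nearest twice lands on the midpoint 1/8 first and then breaks the tie the wrong way, while the round-to-odd intermediate keeps the "slightly above the midpoint" information and agrees with the direct rounding.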

Range Reduction Algorithm

Basic Principle

Trigonometric range reduction maps an input x ∈ (-∞, ∞) to a reduced input x' ∈ [-π/2^(t+1), π/2^(t+1)], where:

x = x' + kπ/2^t
k = round(2^t * x/π)   (the nearest integer)
x' = π/2^t * r, where r = 2^t*x/π - k, so |r| ≤ 1/2
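
The relations above can be checked with exact rational arithmetic. The following Python sketch is only an illustration: it assumes the bracket in the definition of k denotes rounding to the nearest integer, uses t = 8 (matching the 256/π constant used later), and replaces π by a 50-digit rational approximation. The name `reduce_exact` is hypothetical.

```python
from fractions import Fraction

# Assumption: 50 decimal digits of pi are plenty for this small demonstration.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")
T = 8


def reduce_exact(x, t=T):
    """Return (k, r, x_reduced) with x = x_reduced + k*PI/2**t."""
    q = Fraction(x) * 2**t / PI      # 2^t * x / pi
    k = round(q)                     # nearest integer
    r = q - k                        # fractional part, |r| <= 1/2
    x_reduced = PI / 2**t * r
    return k, r, x_reduced


k, r, x_reduced = reduce_exact(1000.0)
print(k, float(r), float(x_reduced))
```

Because everything is a `Fraction`, the identity x = x' + kπ/2^t holds exactly with respect to the approximated π; a real implementation has to obtain the same k and r with finite-precision arithmetic, which is the hard part discussed next.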

Floating-Point Implementation Strategy

Small Input Handling (|x| < 2^30):

  • Uses 80-bit 256/π stored as two double values
  • Avoids intermediate rounding errors
  • Utilizes partial products to precisely compute k and fractional part r

Large Input Handling (2^30 ≤ |x|):

  • Version 1: Divides 256/π into 28-bit segments stored in a double array, with each segment generated using truncation mode
  • Version 2: Uses 53-bit precision segments, leveraging fused-multiply-add instructions to reduce rounding errors
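
One way to see why short segments help: a float32 input has a 24-bit significand, so its product with a 28-bit segment needs at most 52 bits and is exact in a double. The Python sketch below illustrates this under stated assumptions; it is not the paper's code. The segment layout, the helper names, and the 50-digit value of π are all hypothetical choices made for the demonstration.

```python
import math
from fractions import Fraction

PI = Fraction("3.14159265358979323846264338327950288419716939937510")
C = 256 / PI  # the constant 256/pi as an exact rational (of the approximated pi)


def split_truncated(c, bits, count):
    """Split c into `count` doubles, each holding `bits` truncated bits of c."""
    segments = []
    rest = c
    # Weight of the lowest bit of the first segment.
    weight = Fraction(2) ** (math.floor(math.log2(c)) + 1 - bits)
    for _ in range(count):
        chunk = math.floor(rest / weight)       # integer below 2**bits
        segments.append(float(chunk * weight))  # exact conversion to double
        rest -= chunk * weight
        weight /= 2**bits
    return segments


def product_is_exact(x, segment):
    """True if the double product x*segment has no rounding error."""
    return Fraction(x) * Fraction(segment) == Fraction(x * segment)


segments = split_truncated(C, bits=28, count=3)
x = float(2**30 + 2**10)  # exactly representable as a float32
print([product_is_exact(x, s) for s in segments])
```

Each partial product is exact, so the only rounding happens when the partial products are combined, which is easier to control than an error in the constant itself.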

Integer Implementation Strategy

Small Input Optimization:

  • Uses 80-bit 256/π divided into two 40-bit integers P1 and P0
  • Identifies integer k and fractional bits through bit-shift operations
  • Avoids precision loss from floating-point arithmetic

Large Input Handling:

  • Uses 192-bit 256/π divided into three 64-bit integers
  • Computes 128-bit partial products
  • Extracts relevant bits through bit-shift operations
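
The large-input integer strategy can be sketched in Python, whose integers stand in for uint64_t and 128-bit values. This is a minimal sketch under assumptions, not the authors' implementation: the constant is taken as 7 integer bits plus 185 fraction bits of 256/π, only positive finite float32 inputs are handled, k is obtained by truncation, and all names are hypothetical.

```python
import math
from fractions import Fraction

# Assumption: 100 decimal digits of pi, enough to fill a 192-bit constant.
PI = Fraction(
    "3.14159265358979323846264338327950288419716939937510"
    "58209749445923078164062862089986280348253421170679"
)
FRAC_BITS = 185
P = math.floor(256 / PI * 2**FRAC_BITS)          # 192-bit integer for 256/pi
MASK64 = (1 << 64) - 1
LIMBS = [(P >> (64 * i)) & MASK64 for i in range(3)]  # low, middle, high limb


def reduce_int(x):
    """Return (low 9 bits of floor(256*x/pi), fractional part) for float32 x > 0."""
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= m < 1
    mant, exp = int(m * 2**24), e - 24    # x = mant * 2**exp, mant below 2**24
    # Each partial product is below 2**88, so it fits in a 128-bit integer.
    partial = [mant * limb for limb in LIMBS]
    prod = partial[0] + (partial[1] << 64) + (partial[2] << 128)
    shift = FRAC_BITS - exp               # number of fractional bits in prod
    k = prod >> shift                     # integer part via a bit shift
    frac = prod & ((1 << shift) - 1)      # fractional bits via a mask
    return k & 511, Fraction(frac, 1 << shift)


print(reduce_int(float(2**40)))
```

Only the low 9 bits of k are kept because sin(kπ/256) and cos(kπ/256) repeat with period 512 in k; the high bits of the product never need to be computed accurately.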

Output Compensation

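Output compensation uses the angle-addition identities, where k' is obtained from the low-order bits of k (sin(kπ/2^t) and cos(kπ/2^t) are periodic in k with period 2^(t+1)):

sin(x) = sin(k'π/2^t)cos(x') + cos(k'π/2^t)sin(x')
cos(x) = cos(k'π/2^t)cos(x') - sin(k'π/2^t)sin(x')

The values sin(k'π/2^t) and cos(k'π/2^t) come from precomputed tables; exploiting periodicity and symmetry reduces the number of precomputed values to 512.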
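
The identities above can be sketched as follows. This Python example is an illustration only: it assumes t = 8 and takes k' as k modulo 512, it stores a full period of both tables rather than applying the symmetry optimizations, it reduces the input in plain double arithmetic, and `math.sin` and `math.cos` stand in for the polynomial approximations on the reduced input. The function name is hypothetical.

```python
import math

T = 8
PERIOD = 1 << (T + 1)  # sin(k*pi/256) repeats every 512 values of k

# Precomputed table values for every k' in one period.
SIN_TAB = [math.sin(k * math.pi / 2**T) for k in range(PERIOD)]
COS_TAB = [math.cos(k * math.pi / 2**T) for k in range(PERIOD)]


def sin_cos_via_table(x):
    """Evaluate sin(x) and cos(x) from table entries and the reduced input."""
    q = x * 2**T / math.pi
    k = round(q)
    x_reduced = (q - k) * math.pi / 2**T   # |x_reduced| <= pi/512
    kp = k % PERIOD                        # non-negative even for negative k
    s, c = math.sin(x_reduced), math.cos(x_reduced)  # stand-ins for polynomials
    sin_x = SIN_TAB[kp] * c + COS_TAB[kp] * s
    cos_x = COS_TAB[kp] * c - SIN_TAB[kp] * s
    return sin_x, cos_x


print(sin_cos_via_table(1.0), (math.sin(1.0), math.cos(1.0)))
```

Because the reduced input is tiny, the polynomials for sin(x') and cos(x') can have low degree, and the table lookup supplies the rest of the result.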

Experimental Setup

Test Environment

  • Hardware: 2.10GHz Intel Xeon(R) Silver 4310 server, 256GB RAM
  • Operating System: Ubuntu 24.04.1 LTS
  • Measurement Tool: Performance counters

Comparison Libraries

  • GLIBC: float and double libm
  • Core-Math: Correctly rounded library
  • RLIBM Implementation: Variants with different range reduction strategies

Evaluation Metrics

  • Correctness: Verified through complete enumeration of all inputs
  • Performance: Speedup relative to other libraries
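
Complete enumeration is feasible because small formats have few bit patterns. The Python sketch below shows the shape of such a check for a 16-bit format (IEEE binary16, via the `struct` module's `e` format). It is a hypothetical harness, not the paper's test setup: `candidate` and `oracle` are placeholder callables, and a real oracle would come from a high-precision library such as MPFR, whereas rounding a double result to binary16 is not guaranteed to be correctly rounded.

```python
import math
import struct


def all_finite_half_inputs():
    """Yield every finite binary16 value, one per 16-bit pattern."""
    for bits in range(1 << 16):
        (x,) = struct.unpack("<e", bits.to_bytes(2, "little"))
        if math.isfinite(x):
            yield x


def count_mismatches(candidate, oracle):
    """Count inputs whose binary16 results differ between the two functions."""
    mismatches = 0
    for x in all_finite_half_inputs():
        got = struct.pack("<e", candidate(x))   # compare bit patterns,
        want = struct.pack("<e", oracle(x))     # so -0.0 and +0.0 differ
        if got != want:
            mismatches += 1
    return mismatches


# A deliberately inaccurate candidate is caught by the enumeration.
print(count_mismatches(lambda x: math.sin(x) * 1.01, math.sin))
```

The same loop over all 2^32 patterns is how a float32 implementation can be checked exhaustively, given enough time and a trustworthy oracle.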

Experimental Results

Correctness Verification

  • RLIBM Functions: Produce correctly rounded results for all inputs across all representations from 10 to 32 bits
  • GLIBC float libm: Thousands of incorrect results for sin, cos, tan on 32-bit float inputs
  • GLIBC double libm: More accurate than float version but still contains errors
  • Core-Math: Produces correctly rounded results for the 32-bit representation only; rounding its results to the smaller representations in the 10 to 32 bit range introduces double rounding errors

Performance Results

Range Reduction Optimization Effects

Hybrid method (floating-point for small inputs, integer for large inputs) compared to other strategies:

  • 19% faster than initial floating-point method (FP V1)
  • Significant improvement over alternative floating-point method (FP V2)
  • 4% faster than pure integer method

Comparison with Other Libraries

  • On average, 10% faster than Core-Math
  • On average, 137% faster than GLIBC double functions
  • The performance improvements are attributed primarily to efficient range reduction and the precision advantages of integer arithmetic

Technical Innovations

1. Balance Between Precision and Performance

  • Integer types (uint64_t and uint128_t) carry more bits of precision than a 64-bit double, whose significand holds 53 bits
  • Reduces the number of partial products needed to obtain sufficient precision for reducing inputs

2. Hybrid Range Reduction Strategy

  • Small inputs use floating-point arithmetic (when the integer part of 256*x/π is sufficiently small)
  • Large inputs use integer arithmetic (providing higher precision and simpler bit operations)

3. Bit Operation Optimization

  • Uses bit-shift operations to identify portions of 256*x/π relevant to reduced input and low bits of k
  • Avoids accumulation of rounding errors in floating-point arithmetic

Related Work

Traditional Methods

  • Minimax Approximation: Remez algorithm and others, but with limited degrees of freedom near rounding boundaries
  • Payne-Hanek Algorithm: Classical range reduction method, but implementation efficiency is challenging

Correctly Rounded Research

  • CR-LIBM: Early correctly rounded library, but slower
  • Core-Math: Modern correctly rounded implementation, but supports only a single representation

RLIBM Project Development

  • Extended from basic functions (e^x, log, etc.) to trigonometric functions
  • Innovative approach for multi-representation support

Conclusions and Discussion

Main Conclusions

  1. Feasibility Proof: Demonstrates that generating fast and correct implementations for trigonometric functions is possible
  2. Criticality of Range Reduction: Efficient range reduction is as important as low-degree polynomial approximation
  3. Advantages of Integer Arithmetic: Integer-based implementation significantly outperforms floating-point methods for large inputs

Limitations

  1. Complexity: High implementation complexity requiring precise bit operations and multiple strategies
  2. Memory Overhead: Requires precomputed tables and multi-precision constant storage
  3. Scalability: Extension to higher-precision representations requires redesign

Future Directions

  1. GPU Platforms: Explore correctly rounded libraries for GPU platforms
  2. Standardization: Participate in IEEE-754 standards committee to promote mandatory correct rounding
  3. Mainstream Integration: Collaborate with mainstream mathematical library developers to integrate these methods

In-Depth Evaluation

Strengths

  1. Theory and Practice Integration: Successfully applies RLIBM theory to challenging trigonometric functions
  2. Comprehensive Engineering Optimization: Full-spectrum optimization from algorithm to implementation
  3. Rigorous Verification: Correctness verified through complete enumeration
  4. Practical Value: Addresses important problems in real-world applications

Weaknesses

  1. Implementation Complexity: Combination of multiple strategies increases implementation and maintenance complexity
  2. Readability: Readability and maintainability of code with extensive bit operations need improvement
  3. Theoretical Analysis: Lacks in-depth theoretical analysis of why integer methods are superior

Impact

  1. Academic Contribution: Provides new correctly rounded implementation methods for numerical computing
  2. Practical Value: Directly applicable to scientific computing requiring high-precision numerical calculations
  3. Standards Promotion: May influence future development of floating-point standards

Applicable Scenarios

  1. Scientific Computing: Numerical simulations requiring high precision and reproducibility
  2. Financial Computing: Financial modeling requiring exact results
  3. Embedded Systems: Systems requiring support for multiple floating-point formats
  4. Reference Implementation: Serves as correctness benchmark for other libraries

References

This paper cites important literature in numerical analysis, floating-point arithmetic, and correctly rounded computation, including:

  • Muller's reference on elementary functions
  • MPFR high-precision library
  • Payne-Hanek range reduction algorithm
  • IEEE-754 floating-point standard related research

Overall, the paper turns the RLIBM approach into practical, high-performance trigonometric implementations and offers an effective answer to the correct rounding problem in scientific computing.