Fast Trigonometric Functions using the RLIBM Approach
Park, Nagarakatte
This paper describes our experience developing polynomial approximations for trigonometric functions that produce correctly rounded results for multiple representations and rounding modes using the RLIBM approach. A key challenge with trigonometric functions concerns range reduction with "pi", which reduces a given input in the domain of a 32-bit float to a small domain. Any rounding error in the value of "pi" is amplified during range reduction, which can result in wrong results. We describe our experience implementing fast range reduction techniques that maintain a large number of bits of "pi" both with floating-point and integer computations. The resulting implementations for trigonometric functions are fast and produce correctly rounded results for all inputs for multiple representations up to 32-bits with a single implementation.
academic
Fast Trigonometric Functions using the RLIBM Approach
This paper describes the experience of developing polynomial approximations for trigonometric functions using the RLIBM method, which produces correctly rounded results for multiple representations and rounding modes. The key challenge in trigonometric functions lies in range reduction involving π, which reduces inputs from the 32-bit floating-point domain to a small domain. Any rounding error in the π value is amplified during the range reduction process, potentially leading to incorrect results. The authors describe their experience implementing fast range reduction techniques that maintain numerous bits of π in both floating-point and integer computations. The resulting trigonometric function implementations are both fast and produce correctly rounded results for all inputs, supporting multiple representations up to 32 bits with only a single implementation.
Correctly Rounded Challenge: Scientific computing widely uses basic functions provided by mathematical libraries, but producing correctly rounded results for all inputs is extremely difficult (the "table maker's dilemma"), and mainstream mathematical libraries cannot produce correct results for all inputs.
Portability and Reproducibility Issues: The lack of correctly rounded mathematical libraries causes applications to produce completely different results on different machines, affecting portability and reproducibility.
Demand for Multiple Representation Formats: With the increase of custom formats (such as bfloat16, tensorfloat32, FP8), there is a need for a reference library that provides correct results for multiple representations and rounding modes.
Minimax Polynomial Approximation: Traditional methods generate polynomial approximations that minimize the maximum error across all inputs, but when the real-valued output is very close to the rounding boundary, degrees of freedom are significantly reduced.
Performance vs. Correctness Trade-off: Existing libraries make trade-offs in either performance (e.g., Payne-Hanek implementation) or correctness (e.g., GCC's libm).
Efficient Range Reduction Technique: Developed an efficient range reduction algorithm combining floating-point and integer arithmetic that maintains sufficient bits of π to produce correct results.
Single Implementation for Multiple Representations: Implemented a single polynomial approximation that produces correctly rounded results for multiple representations from 10 to 32 bits and all standard rounding modes.
Performance Optimization: Integer-based range reduction achieves 19% performance improvement compared to floating-point strategies, with overall performance faster than or comparable to mainstream libraries.
Complete Trigonometric Function Library: Provides fast and correct implementations for sin, cos, and tan functions.
The key insight of the RLIBM method is to directly approximate the correctly rounded result rather than the real value of the function. For the correctly rounded result of a given input, there exists a real-valued interval within which any value will round to the correct result. This provides greater degrees of freedom than the minimax method (1 ULP for all inputs).
To support multiple representations, the RLIBM project proposes generating polynomial approximations for (n+2)-bit representations using round-to-odd rounding mode. The advantages of this approach are:
Round-to-odd results preserve all information needed for direct rounding to the target representation
Subsequent rounding to lower-precision representations produces correct results
This paper cites important literature in numerical analysis, floating-point arithmetic, and correctly rounded computation, including:
Muller's reference on elementary functions
MPFR high-precision library
Payne-Hanek range reduction algorithm
IEEE-754 floating-point standard related research
This paper makes significant contributions to the numerical computing field, successfully transforming theoretical methods into practical high-performance implementations, providing effective solutions to the correctly rounded problem in scientific computing.