RepDL: Bit-level Reproducible Deep Learning Training and Inference

Xie, Zhang, Chen

Non-determinism and non-reproducibility present significant challenges in deep learning, leading to inconsistent results across runs and platforms. These issues stem from two origins: random number generation and floating-point computation. While randomness can be controlled through deterministic configurations, floating-point inconsistencies remain largely unresolved. To address this, we introduce RepDL, an open-source library that ensures deterministic and bitwise-reproducible deep learning training and inference across diverse computing environments. RepDL achieves this by enforcing correct rounding and order invariance in floating-point computation. The source code is available at https://github.com/microsoft/RepDL .

Basic Information

Paper ID: 2510.09180
Title: RepDL: Bit-level Reproducible Deep Learning Training and Inference
Authors: Peichen Xie, Xian Zhang, Shuo Chen (Microsoft Research)
Classification: cs.LG cs.SE
Publication Date: October 10, 2025
Paper Link: https://arxiv.org/abs/2510.09180
Code Link: https://github.com/microsoft/RepDL

Abstract

Non-determinism and irreproducibility in deep learning lead to inconsistent results across runs and platforms. These issues stem from two root causes: random number generation and floating-point arithmetic. While randomness can be controlled through deterministic configuration, floating-point inconsistency remains inadequately addressed. To this end, the authors introduce RepDL, an open-source library that ensures deterministic and bit-level reproducible deep learning training and inference across different computational environments by enforcing correct rounding and order invariance.

Research Background and Motivation

Problem Definition

Deep learning faces two critical challenges:

Non-determinism: Identical tasks executed repeatedly with identical inputs and systems produce different results (cross-run inconsistency)
Non-reproducibility: Identical tasks executed on different systems produce different results (cross-platform inconsistency)

Problem Significance

These issues result in:

Increased complexity in model deployment and debugging in production environments
Compromised correctness of cross-platform applications
Reduced credibility of published results
Diminished trustworthiness of AI systems in sensitive domains

Limitations of Existing Approaches

Although industry and academia have proposed numerous solutions, numerical inconsistency remains an open problem in deep learning. Existing approaches focus primarily on controlling random number generators and leave floating-point computation issues largely unaddressed.

Research Motivation

The authors identify two root causes of the problem: random number generators and floating-point arithmetic. Unlike random number issues, which have relatively well-established solutions, floating-point computation problems are more complex and require specialized solutions.

Core Contributions

Problem Analysis: Systematically analyzes the sources of non-determinism and non-reproducibility in deep learning, categorizing them into two classes: random number generation and floating-point arithmetic
Design Principles: Proposes two core design principles: correct rounding and order invariance
RepDL Library: Develops an open-source library RepDL that achieves bit-level reproducible deep learning training and inference
PyTorch Compatibility: Provides PyTorch-compatible APIs supporting deep learning operations, differentiable functions, neural network modules, and optimizers

Methodology Details

Root Cause Analysis

1. Random Number Generators

Applications: Weight initialization, data shuffling, dropout regularization, data augmentation
Issues: Unfixed or differing seeds, inconsistent RNG algorithms, and non-deterministic call order in multi-threaded environments
Solutions: Employ reproducible RNG algorithms (e.g., MT19937), thread-safe implementation, fixed base seed
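To make the RNG recipe concrete, here is a minimal sketch, assuming each unit of random work (for example one data-loader worker or one dropout call) has a stable integer ID. It derives a separate MT19937 generator (Python's `random.Random`) from a fixed base seed and that ID, so the stream each task sees does not depend on which thread runs it or when. `BASE_SEED`, `task_rng`, and the toy `dropout_mask` are hypothetical names, not RepDL APIs.

```python
import random
import threading

BASE_SEED = 1234  # hypothetical fixed base seed


def task_rng(base_seed: int, task_id: int) -> random.Random:
    """Independent MT19937 generator whose seed depends only on (base_seed, task_id)."""
    # str seeds are converted to an int deterministically (not via hash()),
    # so PYTHONHASHSEED does not affect the resulting stream.
    return random.Random(f"{base_seed}:{task_id}")


def dropout_mask(rng: random.Random, n: int, p: float) -> list:
    """Toy dropout mask: 0 drops an element with probability p, 1 keeps it."""
    return [0 if rng.random() < p else 1 for _ in range(n)]


def masks_sequential(base_seed: int, num_tasks: int, n: int, p: float) -> list:
    return [dropout_mask(task_rng(base_seed, t), n, p) for t in range(num_tasks)]


def masks_threaded(base_seed: int, num_tasks: int, n: int, p: float) -> list:
    """Same tasks run on threads; completion order may vary, results do not."""
    results = {}
    lock = threading.Lock()

    def worker(task_id: int) -> None:
        mask = dropout_mask(task_rng(base_seed, task_id), n, p)
        with lock:
            results[task_id] = mask

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_tasks)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # Collect by task ID, not by completion order.
    return [results[t] for t in range(num_tasks)]


print(masks_threaded(BASE_SEED, 8, 16, 0.5) == masks_sequential(BASE_SEED, 8, 16, 0.5))  # True
```

Sharing one global generator across threads would instead make each task's numbers depend on the order in which the threads happen to draw from it.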

2. Floating-Point Arithmetic

A more complex issue, divided into two subcategories:

2.1 Basic Operation Precision

Varying precision of basic mathematical function implementations across systems
Hardware instruction precision differences (e.g., RCP instruction precision variations among x86 CPUs)

2.2 Computation Order

Order sensitivity caused by non-associativity of floating-point operations
Non-deterministic factors: atomic operations, dynamic code paths, dynamic batching, and caching
Non-reproducibility factors: software variability, compiler optimizations
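A short demonstration of order sensitivity: regrouping the same additions changes the rounded result, and a reduction whose visiting order depends on scheduling (as with atomic adds) can return different values from run to run. The example values are illustrative and chosen so the effect is large.

```python
from itertools import permutations

# Floating-point addition is not associative: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6


def sum_in_order(values, order):
    """Sequential sum that visits the elements in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Simulate a reduction whose order depends on scheduling, e.g. atomic adds
# that land in whatever order threads finish. The exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
outcomes = {sum_in_order(values, order) for order in permutations(range(len(values)))}
print(sorted(outcomes))
```

Near 1e16 the spacing between doubles is 2, so adding 1.0 to 1e16 first rounds the 1.0 away, while cancelling the two large terms first preserves it.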

RepDL Design Principles

Principle 1: Correct Rounding for Basic Operations

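Follows the IEEE-754 definition of correct rounding: each basic operation returns its infinitely precise real result, rounded once according to the IEEE-754 rounding rules
Extends this requirement beyond the operations IEEE-754 already mandates (such as addition, multiplication, division, and square root) to elementary functions such as exponential and logarithm
Leaves exactly one valid result for every input, eliminating implementation-dependent precision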
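Correct rounding can be stated operationally: take the exact value and round it once to the nearest representable number. The sketch below does that for float32 (round-to-nearest, ties-to-even, the IEEE-754 default) using Python's exact `fractions.Fraction`. It illustrates the definition rather than RepDL's implementation, `round_to_float32` is a hypothetical name, and it assumes the result is a normal float32 (subnormals, overflow, and infinities are not handled).

```python
import math
from fractions import Fraction


def round_to_float32(x: Fraction) -> float:
    """Correctly round an exact rational to float32 (round-to-nearest, ties-to-even).

    Assumption: the result is a normal float32; subnormals and overflow are
    not handled. The returned Python float holds the float32 value exactly.
    """
    if x == 0:
        return 0.0
    a = abs(x)
    # Find e with 2**e <= a < 2**(e + 1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if a < Fraction(2) ** e:
        e -= 1
    # Scale so the 24-bit significand lands in [2**23, 2**24).
    scaled = a / Fraction(2) ** (e - 23)
    q, r = divmod(scaled.numerator, scaled.denominator)
    # Compare the discarded fraction r/d with one half via 2r vs d.
    twice_r = 2 * r
    if twice_r > scaled.denominator or (twice_r == scaled.denominator and q % 2 == 1):
        q += 1  # may carry to 2**24, which is still exactly representable
    result = math.ldexp(q, e - 23)
    return -result if x < 0 else result


print(round_to_float32(Fraction(1, 3)) == 11184811 / 2**25)  # True
# Exact tie between 1 and 1 + 2**-23: ties-to-even picks 1.0.
print(round_to_float32(Fraction(2**24 + 1, 2**24)))  # 1.0
```

Because the exact value has only one correctly rounded result, any two implementations that meet this specification must agree bit for bit, whatever algorithm they use internally.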

Principle 2: Order Invariance for Composite Operations

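Requires composite operations, which combine basic operations, to be order invariant
Implements each composite operation with the same types of basic operations executed in the same order on every run and platform
Assigns a distinct API to each computation order, so the order is determined by the API called rather than by the hardware or scheduler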
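One way order invariance can carry over to parallel execution is a chunked reduction whose chunk boundaries and combination order are fixed in advance, so neither the thread count nor scheduling can change the bits. This is a hypothetical sketch: `CHUNK` and `order_invariant_sum` are illustrative names, and Python threads stand in for GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # fixed chunk size; deliberately independent of the thread count


def chunk_sum(values, start):
    """Sum one fixed chunk strictly left to right."""
    total = 0.0
    for v in values[start:start + CHUNK]:
        total += v
    return total


def order_invariant_sum(values, max_workers):
    """Parallel sum whose bits do not depend on scheduling.

    Chunk boundaries are fixed by CHUNK, and partial sums are combined in
    chunk-index order rather than in the order threads finish.
    """
    starts = range(0, len(values), CHUNK)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        partials = list(pool.map(lambda s: chunk_sum(values, s), starts))
    total = 0.0
    for p in partials:
        total += p
    return total


data = [((i * 7919) % 1000) / 7.0 - 50.0 for i in range(1001)]
print(order_invariant_sum(data, 1) == order_invariant_sum(data, 8))  # True
```

The result generally differs from a plain left-to-right sum over the whole list, which is exactly why a different order would warrant a different API name.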

Implementation Details

1. Ensuring Correct Rounding

Utilizes correctly rounded mathematical libraries or high-precision algorithms
Implements correctly rounded versions of arithmetic operations, square root, exponential, logarithmic functions, etc.
Avoids hardware-dependent implementation variations
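A standard way to obtain a correctly rounded single-precision result with a higher-precision algorithm is to evaluate in a wider format and round once. For square root this is provably safe: float64 carries 53 significand bits, at least 2 × 24 + 2, so rounding a correctly rounded float64 square root to float32 cannot cause a double-rounding error. The helper names are hypothetical and this is not RepDL's code; for transcendental functions such as exp and log, a wider format alone does not guarantee correct rounding, which is where dedicated correctly rounded libraries come in.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a float64 to the nearest float32 (ties-to-even); return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt32(x: float) -> float:
    """Correctly rounded float32 square root of a non-negative float32 input.

    On IEEE-754 platforms math.sqrt is correctly rounded in float64; since
    53 >= 2 * 24 + 2, rounding that result once more to float32 cannot
    introduce a double-rounding error for square root.
    """
    return to_float32(math.sqrt(x))


print(sqrt32(2.0) == 11863283 / 2**23)  # True: float32 sqrt(2) has bit pattern 0x3FB504F3
```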

2. Fixed Summation Order

Provides two summation orders:

Sequential Summation: Default version, cache-friendly, suitable for most cases
Pairwise Summation: Alternative version, increases parallelism
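Both summation orders are deterministic, yet they generally produce different bits, which is why each needs its own API. A minimal sketch of the two orders follows; the function names and the midpoint split rule for the pairwise tree are assumptions, and RepDL's actual pairwise tree (for example one shaped for GPU thread blocks) may differ.

```python
def sum_sequential(values):
    """Left-to-right summation: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Recursive pairwise summation: split at the midpoint, sum each half, add.

    The tree shape depends only on len(values), so the order is fixed.
    """
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


print(sum_sequential([0.1] * 10))  # 0.9999999999999999
print(sum_pairwise([0.1] * 10))    # 1.0
```

The sequential loop keeps a single accumulator per task, which suits caches and one-thread-per-output execution; the pairwise tree exposes independent sub-sums that can run in parallel.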

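For fully connected and 2D convolutional layers, the work splits into many independent summation tasks (B denotes the batch size):

Fully connected layer with N input and M output features: t_fc = B × M independent summation tasks, each summing n_fc = N elements
Convolutional layer with I input channels, O output channels, a K_w × K_h kernel, and a W × H output map: t_conv = B × O × W × H independent summation tasks, each summing n_conv = I × K_w × K_h elements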
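To illustrate what one summation task looks like, here is a pure-Python sketch of a bias-free fully connected layer in which each of the B × M outputs accumulates its N products in a fixed index order, plus helpers that evaluate the task-count formulas. Where RepDL adds the bias and how its loops are actually structured are not specified here, so `linear_sequential`, `fc_tasks`, and `conv_tasks` are hypothetical names.

```python
def linear_sequential(x, weight):
    """Bias-free fully connected layer: y[b][m] = sum over n of x[b][n] * weight[m][n].

    Each of the B * M outputs is an independent summation task that adds its
    N products strictly in index order n = 0, 1, ..., N - 1.
    """
    y = []
    for row in x:                           # B batch rows
        out_row = []
        for w_row in weight:                # M output features
            acc = 0.0
            for xn, wn in zip(row, w_row):  # N terms, fixed order
                acc += xn * wn              # product and sum rounded separately (no FMA)
            out_row.append(acc)
        y.append(out_row)
    return y


def fc_tasks(B, M, N):
    """(t_fc, n_fc) for a fully connected layer."""
    return B * M, N


def conv_tasks(B, O, W, H, I, K_w, K_h):
    """(t_conv, n_conv) for a 2D convolution with a W x H output map."""
    return B * O * W * H, I * K_w * K_h


print(linear_sequential([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]))
# [[1.0, 1.5, 3.0], [3.0, 3.5, 7.0]]
print(conv_tasks(1, 256, 56, 56, 64, 3, 3))  # (802816, 576), a hypothetical layer shape
```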

3. Computation Graph Definition

Explicitly defines computation order using computation graphs
Assigns different API names to different computation graph implementations of identical functions
Avoids mathematically equivalent but floating-point-different transformations
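As an illustration of why equivalent graphs need distinct names, consider two ways to compute a mean. They are equal over the reals but round differently; in the extreme case below, summing first overflows while dividing first does not. The function names are hypothetical examples of such a naming scheme, not RepDL APIs.

```python
def mean_sum_then_divide(values):
    """Graph A: accumulate left to right, then divide once by n."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def mean_divide_then_sum(values):
    """Graph B: divide each element by n, then accumulate left to right.

    Equal to graph A in exact arithmetic, but with different roundings, so it
    is exposed under its own name instead of being a silent rewrite of A.
    """
    n = len(values)
    total = 0.0
    for v in values:
        total += v / n
    return total


print(mean_sum_then_divide([1e308, 1e308]))  # inf
print(mean_divide_then_sum([1e308, 1e308]))  # 1e+308
```

A framework that silently rewrote one graph into the other, for example to fuse the division into an earlier kernel, would change the output bits.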

4. Compilation Options

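Disables options that permit unsafe mathematical optimizations (e.g., fast-math modes)
Disables floating-point expression contraction (implicit fusion of a multiply and an add into one FMA instruction), so the compiled code performs exactly the basic operations defined by the computation graph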
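To see why contraction settings matter for bit-level reproducibility, compare a separately rounded multiply and add with a fused multiply-add, which rounds only once. The sketch emulates FMA exactly with `fractions.Fraction` as an illustrative stand-in for the hardware instruction; the function names are hypothetical.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """One rounding of the exact value a * b + c, as a hardware FMA computes it.

    Evaluated exactly with Fraction and rounded once by float(); Python 3.13+
    also provides math.fma for the same purpose.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-27  # a * a = 1 + 2**-26 + 2**-54 exactly
print(mul_then_add(a, a, -1.0) == 2.0**-26)                   # True: the 2**-54 term is lost
print(fused_multiply_add(a, a, -1.0) == 2.0**-26 + 2.0**-54)  # True: the 2**-54 term survives
```

If a compiler is free to contract `a * b + c` on one platform but not another, the same source code yields these two different answers.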

Experimental Setup

Supported Features

Data Types: Support for single-precision floating-point (float32)
Compatibility: Provides PyTorch-compatible APIs
Operation Support: Deep learning operations, differentiable functions, neural network modules, optimizers

Performance Analysis

Using ResNet-50 as an example:

Convolutional layers dominate the computational cost
Several convolutional layers have t_conv = B × 256 × 56 × 56 = B × 802816
An NVIDIA A100 GPU contains 6912 CUDA cores
Even with B = 1, t_conv exceeds the core count by more than 100×, so running one sequential summation per thread already provides ample parallelism and the extra parallelism of pairwise summation is not needed
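The ResNet-50 arithmetic can be checked directly. This small sketch recomputes t_conv for the 256-channel, 56 × 56 layers and divides by the A100's 6912 CUDA cores; the batch size of 32 is an illustrative assumption, not a figure from the paper.

```python
A100_CUDA_CORES = 6912  # FP32 CUDA cores on an NVIDIA A100, as stated above


def conv_summation_tasks(batch: int, out_channels: int, width: int, height: int) -> int:
    """t_conv = B x O x W x H independent summation tasks."""
    return batch * out_channels * width * height


for batch in (1, 32):  # B = 32 is a hypothetical example batch size
    tasks = conv_summation_tasks(batch, 256, 56, 56)
    print(batch, tasks, round(tasks / A100_CUDA_CORES, 1))
# 1 802816 116.1
# 32 25690112 3716.7
```

Even at B = 1, each core would handle roughly 116 independent sequential sums.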

Experimental Results

Reproducibility Verification

RepDL achieves bit-level consistent results, ensuring:

Consistency across multiple executions on the same system
Consistency across different CPU or GPU systems
Complete reproducibility of training and inference processes
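Bit-level claims are easiest to check by comparing exact bit patterns rather than values within a tolerance. A minimal sketch, assuming outputs have been flattened to a list of float32 values (for a PyTorch tensor, `tensor.flatten().tolist()` produces such a list): hash the packed bits and compare digests across runs or machines. `float32_fingerprint` is a hypothetical helper, not part of RepDL.

```python
import hashlib
import struct


def float32_fingerprint(values) -> str:
    """SHA-256 digest of the float32 bit patterns of a flat sequence of numbers.

    Each value is packed as little-endian float32 (and rounded if it is not
    already float32-representable). Comparing digests checks bit equality,
    which is stricter than ==: 0.0 == -0.0 holds although the bits differ.
    """
    packed = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(packed).hexdigest()


run_a = [0.5, -1.25, 3.0]
run_b = [0.5, -1.25, 3.0]
print(float32_fingerprint(run_a) == float32_fingerprint(run_b))  # True
print(0.0 == -0.0, float32_fingerprint([0.0]) == float32_fingerprint([-0.0]))  # True False
```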

Performance Impact

Switching from non-deterministic libraries to RepDL incurs a modest performance cost
The authors consider this cost acceptable and expect future optimizations to reduce it

Related Work

The paper draws on several related research areas:

Reproducible Floating-Point Summation Algorithms: Ahrens et al.'s order-independent summation algorithms
Deep Learning Reproducibility: Chen et al.'s work on training reproducible deep learning models
Correctly Rounded Mathematical Libraries: MPFR library and high-performance correctly rounded mathematical libraries
Numerical Precision Analysis: Research on mathematical function accuracy across different precisions

Conclusions and Discussion

Main Conclusions

RepDL addresses floating-point computation issues, providing a foundation for reliable model development and consistent model deployment. The library successfully achieves deterministic and reproducible deep learning across different computational environments.

Limitations

Insufficient Performance Optimization: The current version is not fully optimized and runs slower than non-deterministic libraries
Limited Precision Support: Only single precision (float32) is supported; extending support to low-precision types is challenging
Hardware Specificity: Low-precision computation on specialized hardware (e.g., Tensor Cores) has non-standard, hardware-specific numerical behavior

Future Directions

Further performance optimization to mitigate performance degradation
Support for low-precision floating-point data types
Standardization of numerical behavior for low-precision computation
Expansion of community contributions and features

In-Depth Evaluation

Strengths

Accurate Problem Identification: Systematically analyzes root causes of deep learning reproducibility issues
Practical Solutions: Provides feasible engineering solutions rather than purely theoretical analysis
Clear Design Principles: The two principles, correct rounding and order invariance, are concise and effective
Good Compatibility: PyTorch API compatibility lowers adoption barriers
Open-Source Contribution: Provides open-source implementation promoting community development

Weaknesses

Limited Experimental Validation: Lacks large-scale experimental verification and performance benchmarking
Insufficient Theoretical Analysis: Theoretical analysis of performance losses is inadequate
Restricted Applicability: Float32-only support limits modern deep learning applications
Missing Comparative Experiments: Lacks comparison with other reproducibility solutions

Impact

Academic Value: Provides important reference for deep learning reproducibility research
Practical Value: Offers solutions for application scenarios requiring strict reproducibility
Industry Impact: May promote deep learning frameworks' attention to reproducibility

Applicable Scenarios

Scientific Research: Research projects requiring strictly reproducible results
Financial AI: Financial applications with extremely high numerical consistency requirements
Medical AI: Medical diagnostic systems requiring deterministic results
Model Verification: Cross-platform model deployment consistency verification

References

The paper cites 15 related references covering:

Reproducible floating-point summation algorithms
Deep learning reproducibility research
Correctly rounded mathematical libraries
IEEE floating-point standards
Analysis of randomness and non-determinism in deep learning

Overall Assessment: This is a practical research paper addressing reproducibility issues in deep learning. While it has limitations in experimental validation and theoretical analysis, its proposed solution has significant practical value, particularly for application scenarios requiring strict numerical consistency. The open-source release of the RepDL library provides valuable tools to the community and is expected to advance research in deep learning reproducibility.