2025-11-12T12:19:10.304562

Fair Ordering

Haseeb, Geng, Mittal et al.

A growing class of applications demands \emph{fair ordering/sequencing} of events which ensures that events generated earlier by one client are processed before later events from other clients. However, achieving such sequencing is fundamentally challenging due to the inherent limitations of clock synchronization. We advocate for an approach that embraces, rather than eliminates, clock variability. Instead of attempting to remove error from a timestamp, Tommy, our proposed system, leverages a statistical model to compare two noisy timestamps probabilistically by learning per-clock offset distributions. Our preliminary statistical model computes the probability that one event precedes another w.r.t. the wall-clock time without access to the wall-clock. This serves as a foundation for a new relation: \emph{likely-happened-before} denoted by $\xrightarrow{p}$ where $p$ represents the probability of an event to have happened before another. The $\xrightarrow{p}$ relation provides a basis for ordering multiple events which are otherwise considered \emph{concurrent} by the typical \emph{happened-before} ($\rightarrow$) relation. We highlight various related challenges including intransitivity of $\xrightarrow{p}$ relation as opposed to the transitive $\rightarrow$ relation. We also outline several research directions: online fair sequencing, stochastically fair total ordering, host-level support for fairness and more.

academic

Beyond Lamport, Towards Probabilistic Fair Ordering

Basic Information

Paper ID: 2510.13664
Title: Beyond Lamport, Towards Probabilistic Fair Ordering
Authors: Muhammad Haseeb, Jinkun Geng, Radhika Mittal, Aurojit Panda, Srinivas Narayana, Anirudh Sivaraman
Classification: cs.NI (Computer Networks)
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13664v1

Abstract

This paper addresses the challenge of fair ordering in distributed systems by proposing a novel approach that embraces rather than eliminates clock variability. The authors design the Tommy system, which learns the offset distribution of each clock and employs statistical models for probabilistic comparison of noisy timestamps. The core innovation is the introduction of the "likely-happened-before" relation ( $\xrightarrow{p}$ ), where $p$ represents the probability that one event occurred before another. This relation enables ordering of events that the traditional happened-before relation ( $\rightarrow$ ) considers concurrent, but also introduces new challenges such as non-transitivity.

Research Background and Motivation

1. Problem Definition

Fair ordering in distributed systems requires ensuring that events generated earlier by one client are processed before events generated later by other clients. This is crucial in competitive applications such as financial exchanges and advertising exchanges, collectively termed "auction-apps."

2. Problem Significance

Financial Fairness: In financial transactions, the order of message processing directly affects transaction outcomes; unfair ordering can result in substantial economic losses
Cloud Migration Requirements: Traditional financial exchanges use dedicated infrastructure to ensure fairness, but migration to public clouds necessitates new network primitives
Expanding Application Scope: Beyond financial trading, competitive markets, NFT trading, and limited-quantity product purchasing scenarios all require fair ordering

3. Limitations of Existing Approaches

Perfect Clock Synchronization Assumption: Existing solutions assume near-perfect clock synchronization, which is infeasible in practical asynchronous networks
Special Infrastructure Dependency: Traditional approaches require specialized hardware such as equal-length cables and low-jitter switches
Application Coupling: Some solutions are tightly coupled with specific application logic, making generalization difficult

4. Fundamental Challenges

Basic limitations of clock synchronization: In asynchronous or bounded-synchronous networks, the clock synchronization precision among $n$ processes cannot exceed $u(1-1/n)$ , where $u$ represents the uncertainty in link delays.

Core Contributions

New Relation Definition: Introduces the likely-happened-before relation ( $\xrightarrow{p}$ ), extending Lamport's happened-before relation
Statistical Model: Designs a probabilistic timestamp comparison method based on clock offset distributions
Tommy System: Implements a prototype fair ordering system without requiring perfect clock synchronization
Theoretical Analysis: Proves transitivity of probabilistic relations under Gaussian distributions
Research Directions: Proposes multiple research directions including online fair ordering and randomized fair total ordering

Method Details

Task Definition

Fair Ordering Definition: The order in which the server observes messages should match the order observed by an omniscient observer.

Input: Client message streams with local timestamps Output: Message batches fairly ordered by generation time Constraints: No access to global clocks; only best-effort clock synchronization available

Model Architecture

1. System Model

Each client $i$ sends a message with timestamp $T_i$
True timestamp: $T_i^* = T_i + \theta_i$ , where $\theta_i$ is the clock offset
Offset $\theta_i$ follows probability distribution $f_{\theta_i}$

2. Probability Calculation

Probability of temporal ordering between two events: $P(T_i^* < T_j^* | T_i, T_j) = P(\theta_j - \theta_i > T_i - T_j)$

Define $\Delta\theta = \theta_j - \theta_i \sim f_{\Delta\theta}$ , then: $P(T_i^* < T_j^* | T_i, T_j) = \int_{T_i-T_j}^{\infty} f_{\Delta\theta}d\Delta$

3. Closed-Form Solution for Gaussian Case

For independently distributed Gaussian errors: $P(T_i^* < T_j^* | T_i, T_j) = \Phi\left(\frac{T_j-T_i+(\mu_i-\mu_j)}{\sqrt{\sigma_i^2+\sigma_j^2}}\right)$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.

4. Arbitrary Distribution Handling

Client Learning: Each client learns its own offset distribution $f_{\theta_i}$
Convolution Computation: The ordering system computes $f_{\Delta\theta}$ via convolution
FFT Optimization: Uses fast Fourier transform to optimize convolution computation

Technical Innovations

1. Probabilistic Relation Construction

Models messages as nodes in a graph, with $\xrightarrow{p}$ relations as weighted directed edges. For each pair of nodes, retains the edge with higher weight.

2. Fair Ordering Algorithm

Transitive Case: Graph forms a transitive tournament with a unique topological sort
Non-Transitive Case: May contain cycles, requiring edge deletion or probability adjustment

3. Batch Formation

Creates batch boundaries according to threshold $threshold$ :

If $i \xrightarrow{p} j$ and $p > threshold$ , creates a batch boundary between $i$ and $j$
Messages whose ordering cannot be confidently determined are grouped into the same batch

4. Online Ordering

Completeness Guarantee: Waits for all clients to send timestamps greater than $t$
Safe Emission: Computes future time $T_i^B$ such that $P(T_i^* < T_i^B) > p_{safe}$

Experimental Setup

Dataset

Simulation Environment: 500 clients, each assigned a Gaussian clock offset distribution $\mathcal{N}(\mu, \sigma^2)$
Timestamp Generation: Clients read wall-clock time $t$ , sample noise $\epsilon$ , and mark messages as $T = t + \epsilon$
Ground Truth Collection: Collects ground truth timestamps for evaluation

Evaluation Metrics

Ranking Agreement Score (RAS):

Correctly ordered message pairs: +1 point
Incorrectly ordered message pairs: -1 point
Indifferent (same batch) message pairs: 0 points

Baseline Methods

TrueTime Baseline: Simulates Spanner TrueTime, assigning each message an uncertainty interval $[T-3\sigma, T+3\sigma]$ ; overlapping intervals are assigned the same rank.

Implementation Details

Threshold setting: $threshold = 0.75$
Evaluation mode: Offline ordering (ordering after all messages arrive)
Controlled variables: Clock error standard deviation, message interval

Experimental Results

Main Results

Figure 5 shows Tommy's performance relative to TrueTime:

Low Clock Error: Both systems perform comparably
High Clock Error or Small Message Interval: Tommy significantly outperforms TrueTime
Extreme Uncertainty: Tommy's probabilistic nature may result in negative RAS, while TrueTime maintains conservative 0 scores

Key Findings

Adaptive Advantage: Tommy performs better when clock synchronization quality is poor
Probabilistic Cost: High uncertainty may lead to incorrect ordering
Threshold Trade-off: Threshold selection affects batch size and confidence in fairness

Cloud Exchanges

Onyx: WFO ordering system assuming negligible clock synchronization error
CloudEx, DBO: Financial exchanges for cloud environments, but rely on strong assumptions

Traditional Exchanges

Engineering solutions using equal-length cables and low-jitter switches, where processing by arrival order is equivalent to ordering by generation time.

Byzantine Fault Tolerance

Pompe: Consensus mechanism limiting Byzantine nodes' influence on event ordering, but cannot enforce fair ordering.

Theoretical Analysis

Transitivity Proof

Proposition 1: For independent normal random variables $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ , $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ , $Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2)$ , define preference relation $X \succ Y \Leftrightarrow \Pr[X > Y] > \frac{1}{2}$ , then $\succ$ is transitive.

Proof Outline: $\Pr[A > B] > \frac{1}{2} \Leftrightarrow \mu_A > \mu_B$

Since mean values are transitive, the probabilistic relation is also transitive.

Non-Transitivity Challenges

For arbitrary distributions, transitivity does not necessarily hold, and cycles like $A \succ B$ , $B \succ C$ , $C \succ A$ may occur, similar to the "rock-paper-scissors" phenomenon.

Future Research Directions

1. Relation Characterization

Study distributional conditions that make $\xrightarrow{p}$ relations transitive
Develop transformation methods for handling non-transitivity

2. Host Network Variability

Investigate the impact of host datapath jitter on clock access and message transmission delays.

3. Fair Total Ordering Extension

Extend fair partial ordering to fair total ordering; study randomized fairness mechanisms.

4. Clock Offset Learning

Develop robust mechanisms for learning clock offset distributions, handling anomalous conditions such as temperature fluctuations.

5. Byzantine Clients

Study fairness guarantees under Byzantine failures, preventing timestamp manipulation attacks.

Conclusions and Discussion

Main Conclusions

Conceptual Innovation: The likely-happened-before relation provides a new perspective for ordering concurrent events
Practical Value: Tommy outperforms traditional methods under realistic clock synchronization conditions
Theoretical Foundation: Transitivity under Gaussian distributions provides theoretical support for the method

Limitations

Transitivity Assumption: Current design assumes transitivity; non-transitive cases require further research
Offline Evaluation: Experiments only evaluate offline ordering; online performance requires verification
Distribution Assumption: Gaussian distribution assumption may not apply to all practical scenarios
Byzantine Fault Tolerance: Lacks protection mechanisms against malicious clients

Impact Assessment

Strengths

Theoretical Contribution: Extends the classical happened-before relation
Practical Orientation: Addresses real-world problems in cloud migration
General Framework: Supports arbitrary clock offset distributions
Performance Advantage: Outperforms existing methods under realistic conditions

Weaknesses

Incomplete Treatment: Non-transitivity handling is not sufficiently comprehensive
Security Analysis: Lacks in-depth security threat analysis
Experimental Limitations: Only simulation experiments; lacks real system validation

Applicable Scenarios

Multi-datacenter financial trading systems
Bidding systems with strict fairness requirements
Applications requiring fair ordering on generic network infrastructure

Research Value

This paper pioneering introduces probabilistic fair ordering, providing a new solution direction for fairness problems in distributed systems. Although facing some theoretical and implementation challenges, its core ideas possess significant academic value and practical potential, laying the foundation for subsequent research.

References

Key references include:

Lamport, L. "Time, clocks, and the ordering of events in a distributed system" (Classical happened-before relation)
Corbett, J.C. et al. "Spanner: Google's globally distributed database" (TrueTime mechanism)
Ghalayini, A. et al. "Cloudex: A fair-access financial exchange in the cloud" (Cloud exchange related work)