2025-11-12T12:19:10.304562

Fair Ordering

Haseeb, Geng, Mittal et al.
A growing class of applications demands \emph{fair ordering/sequencing} of events which ensures that events generated earlier by one client are processed before later events from other clients. However, achieving such sequencing is fundamentally challenging due to the inherent limitations of clock synchronization. We advocate for an approach that embraces, rather than eliminates, clock variability. Instead of attempting to remove error from a timestamp, Tommy, our proposed system, leverages a statistical model to compare two noisy timestamps probabilistically by learning per-clock offset distributions. Our preliminary statistical model computes the probability that one event precedes another w.r.t. the wall-clock time without access to the wall-clock. This serves as a foundation for a new relation: \emph{likely-happened-before} denoted by $\xrightarrow{p}$ where $p$ represents the probability of an event to have happened before another. The $\xrightarrow{p}$ relation provides a basis for ordering multiple events which are otherwise considered \emph{concurrent} by the typical \emph{happened-before} ($\rightarrow$) relation. We highlight various related challenges including intransitivity of $\xrightarrow{p}$ relation as opposed to the transitive $\rightarrow$ relation. We also outline several research directions: online fair sequencing, stochastically fair total ordering, host-level support for fairness and more.
academic

Beyond Lamport, Towards Probabilistic Fair Ordering

Basic Information

  • Paper ID: 2510.13664
  • Title: Beyond Lamport, Towards Probabilistic Fair Ordering
  • Authors: Muhammad Haseeb, Jinkun Geng, Radhika Mittal, Aurojit Panda, Srinivas Narayana, Anirudh Sivaraman
  • Classification: cs.NI (Computer Networks)
  • Publication Date: October 15, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.13664v1

Abstract

This paper addresses the challenge of fair ordering in distributed systems by proposing a novel approach that embraces rather than eliminates clock variability. The authors design the Tommy system, which learns the offset distribution of each clock and employs statistical models for probabilistic comparison of noisy timestamps. The core innovation is the introduction of the "likely-happened-before" relation (p\xrightarrow{p}), where pp represents the probability that one event occurred before another. This relation enables ordering of events that the traditional happened-before relation (\rightarrow) considers concurrent, but also introduces new challenges such as non-transitivity.

Research Background and Motivation

1. Problem Definition

Fair ordering in distributed systems requires ensuring that events generated earlier by one client are processed before events generated later by other clients. This is crucial in competitive applications such as financial exchanges and advertising exchanges, collectively termed "auction-apps."

2. Problem Significance

  • Financial Fairness: In financial transactions, the order of message processing directly affects transaction outcomes; unfair ordering can result in substantial economic losses
  • Cloud Migration Requirements: Traditional financial exchanges use dedicated infrastructure to ensure fairness, but migration to public clouds necessitates new network primitives
  • Expanding Application Scope: Beyond financial trading, competitive markets, NFT trading, and limited-quantity product purchasing scenarios all require fair ordering

3. Limitations of Existing Approaches

  • Perfect Clock Synchronization Assumption: Existing solutions assume near-perfect clock synchronization, which is infeasible in practical asynchronous networks
  • Special Infrastructure Dependency: Traditional approaches require specialized hardware such as equal-length cables and low-jitter switches
  • Application Coupling: Some solutions are tightly coupled with specific application logic, making generalization difficult

4. Fundamental Challenges

Basic limitations of clock synchronization: In asynchronous or bounded-synchronous networks, the clock synchronization precision among nn processes cannot exceed u(11/n)u(1-1/n), where uu represents the uncertainty in link delays.

Core Contributions

  1. New Relation Definition: Introduces the likely-happened-before relation (p\xrightarrow{p}), extending Lamport's happened-before relation
  2. Statistical Model: Designs a probabilistic timestamp comparison method based on clock offset distributions
  3. Tommy System: Implements a prototype fair ordering system without requiring perfect clock synchronization
  4. Theoretical Analysis: Proves transitivity of probabilistic relations under Gaussian distributions
  5. Research Directions: Proposes multiple research directions including online fair ordering and randomized fair total ordering

Method Details

Task Definition

Fair Ordering Definition: The order in which the server observes messages should match the order observed by an omniscient observer.

Input: Client message streams with local timestamps Output: Message batches fairly ordered by generation time Constraints: No access to global clocks; only best-effort clock synchronization available

Model Architecture

1. System Model

  • Each client ii sends a message with timestamp TiT_i
  • True timestamp: Ti=Ti+θiT_i^* = T_i + \theta_i, where θi\theta_i is the clock offset
  • Offset θi\theta_i follows probability distribution fθif_{\theta_i}

2. Probability Calculation

Probability of temporal ordering between two events: P(Ti<TjTi,Tj)=P(θjθi>TiTj)P(T_i^* < T_j^* | T_i, T_j) = P(\theta_j - \theta_i > T_i - T_j)

Define Δθ=θjθifΔθ\Delta\theta = \theta_j - \theta_i \sim f_{\Delta\theta}, then: P(Ti<TjTi,Tj)=TiTjfΔθdΔP(T_i^* < T_j^* | T_i, T_j) = \int_{T_i-T_j}^{\infty} f_{\Delta\theta}d\Delta

3. Closed-Form Solution for Gaussian Case

For independently distributed Gaussian errors: P(Ti<TjTi,Tj)=Φ(TjTi+(μiμj)σi2+σj2)P(T_i^* < T_j^* | T_i, T_j) = \Phi\left(\frac{T_j-T_i+(\mu_i-\mu_j)}{\sqrt{\sigma_i^2+\sigma_j^2}}\right)

where Φ(x)\Phi(x) is the cumulative distribution function of the standard normal distribution.

4. Arbitrary Distribution Handling

  • Client Learning: Each client learns its own offset distribution fθif_{\theta_i}
  • Convolution Computation: The ordering system computes fΔθf_{\Delta\theta} via convolution
  • FFT Optimization: Uses fast Fourier transform to optimize convolution computation

Technical Innovations

1. Probabilistic Relation Construction

Models messages as nodes in a graph, with p\xrightarrow{p} relations as weighted directed edges. For each pair of nodes, retains the edge with higher weight.

2. Fair Ordering Algorithm

  • Transitive Case: Graph forms a transitive tournament with a unique topological sort
  • Non-Transitive Case: May contain cycles, requiring edge deletion or probability adjustment

3. Batch Formation

Creates batch boundaries according to threshold thresholdthreshold:

  • If ipji \xrightarrow{p} j and p>thresholdp > threshold, creates a batch boundary between ii and jj
  • Messages whose ordering cannot be confidently determined are grouped into the same batch

4. Online Ordering

  • Completeness Guarantee: Waits for all clients to send timestamps greater than tt
  • Safe Emission: Computes future time TiBT_i^B such that P(Ti<TiB)>psafeP(T_i^* < T_i^B) > p_{safe}

Experimental Setup

Dataset

  • Simulation Environment: 500 clients, each assigned a Gaussian clock offset distribution N(μ,σ2)\mathcal{N}(\mu, \sigma^2)
  • Timestamp Generation: Clients read wall-clock time tt, sample noise ϵ\epsilon, and mark messages as T=t+ϵT = t + \epsilon
  • Ground Truth Collection: Collects ground truth timestamps for evaluation

Evaluation Metrics

Ranking Agreement Score (RAS):

  • Correctly ordered message pairs: +1 point
  • Incorrectly ordered message pairs: -1 point
  • Indifferent (same batch) message pairs: 0 points

Baseline Methods

TrueTime Baseline: Simulates Spanner TrueTime, assigning each message an uncertainty interval [T3σ,T+3σ][T-3\sigma, T+3\sigma]; overlapping intervals are assigned the same rank.

Implementation Details

  • Threshold setting: threshold=0.75threshold = 0.75
  • Evaluation mode: Offline ordering (ordering after all messages arrive)
  • Controlled variables: Clock error standard deviation, message interval

Experimental Results

Main Results

Figure 5 shows Tommy's performance relative to TrueTime:

  • Low Clock Error: Both systems perform comparably
  • High Clock Error or Small Message Interval: Tommy significantly outperforms TrueTime
  • Extreme Uncertainty: Tommy's probabilistic nature may result in negative RAS, while TrueTime maintains conservative 0 scores

Key Findings

  1. Adaptive Advantage: Tommy performs better when clock synchronization quality is poor
  2. Probabilistic Cost: High uncertainty may lead to incorrect ordering
  3. Threshold Trade-off: Threshold selection affects batch size and confidence in fairness

Cloud Exchanges

  • Onyx: WFO ordering system assuming negligible clock synchronization error
  • CloudEx, DBO: Financial exchanges for cloud environments, but rely on strong assumptions

Traditional Exchanges

Engineering solutions using equal-length cables and low-jitter switches, where processing by arrival order is equivalent to ordering by generation time.

Byzantine Fault Tolerance

Pompe: Consensus mechanism limiting Byzantine nodes' influence on event ordering, but cannot enforce fair ordering.

Theoretical Analysis

Transitivity Proof

Proposition 1: For independent normal random variables XN(μX,σX2)X \sim \mathcal{N}(\mu_X, \sigma_X^2), YN(μY,σY2)Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2), ZN(μZ,σZ2)Z \sim \mathcal{N}(\mu_Z, \sigma_Z^2), define preference relation XYPr[X>Y]>12X \succ Y \Leftrightarrow \Pr[X > Y] > \frac{1}{2}, then \succ is transitive.

Proof Outline: Pr[A>B]>12μA>μB\Pr[A > B] > \frac{1}{2} \Leftrightarrow \mu_A > \mu_B

Since mean values are transitive, the probabilistic relation is also transitive.

Non-Transitivity Challenges

For arbitrary distributions, transitivity does not necessarily hold, and cycles like ABA \succ B, BCB \succ C, CAC \succ A may occur, similar to the "rock-paper-scissors" phenomenon.

Future Research Directions

1. Relation Characterization

  • Study distributional conditions that make p\xrightarrow{p} relations transitive
  • Develop transformation methods for handling non-transitivity

2. Host Network Variability

Investigate the impact of host datapath jitter on clock access and message transmission delays.

3. Fair Total Ordering Extension

Extend fair partial ordering to fair total ordering; study randomized fairness mechanisms.

4. Clock Offset Learning

Develop robust mechanisms for learning clock offset distributions, handling anomalous conditions such as temperature fluctuations.

5. Byzantine Clients

Study fairness guarantees under Byzantine failures, preventing timestamp manipulation attacks.

Conclusions and Discussion

Main Conclusions

  1. Conceptual Innovation: The likely-happened-before relation provides a new perspective for ordering concurrent events
  2. Practical Value: Tommy outperforms traditional methods under realistic clock synchronization conditions
  3. Theoretical Foundation: Transitivity under Gaussian distributions provides theoretical support for the method

Limitations

  1. Transitivity Assumption: Current design assumes transitivity; non-transitive cases require further research
  2. Offline Evaluation: Experiments only evaluate offline ordering; online performance requires verification
  3. Distribution Assumption: Gaussian distribution assumption may not apply to all practical scenarios
  4. Byzantine Fault Tolerance: Lacks protection mechanisms against malicious clients

Impact Assessment

Strengths

  1. Theoretical Contribution: Extends the classical happened-before relation
  2. Practical Orientation: Addresses real-world problems in cloud migration
  3. General Framework: Supports arbitrary clock offset distributions
  4. Performance Advantage: Outperforms existing methods under realistic conditions

Weaknesses

  1. Incomplete Treatment: Non-transitivity handling is not sufficiently comprehensive
  2. Security Analysis: Lacks in-depth security threat analysis
  3. Experimental Limitations: Only simulation experiments; lacks real system validation

Applicable Scenarios

  • Multi-datacenter financial trading systems
  • Bidding systems with strict fairness requirements
  • Applications requiring fair ordering on generic network infrastructure

Research Value

This paper pioneering introduces probabilistic fair ordering, providing a new solution direction for fairness problems in distributed systems. Although facing some theoretical and implementation challenges, its core ideas possess significant academic value and practical potential, laying the foundation for subsequent research.

References

Key references include:

  • Lamport, L. "Time, clocks, and the ordering of events in a distributed system" (Classical happened-before relation)
  • Corbett, J.C. et al. "Spanner: Google's globally distributed database" (TrueTime mechanism)
  • Ghalayini, A. et al. "Cloudex: A fair-access financial exchange in the cloud" (Cloud exchange related work)