2025-11-11T16:25:09.674123

Multi-Way Co-Ranking: Index-Space Partitioning of Sorted Sequences Without Merge

Joshi

We present a merge-free algorithm for multi-way co-ranking, the problem of computing cut indices $i_1,\dots,i_m$ that partition each of the $m$ sorted sequences such that all prefix segments together contain exactly $K$ elements. Our method extends two-list co-ranking to arbitrary $m$, maintaining per-sequence bounds that converge to a consistent global frontier without performing any multi-way merge or value-space search. Rather, we apply binary search to \emph{index-space}. The algorithm runs in $O(\log(\sum_t n_t)\,\log m)$ time and $O(m)$ space, independent of $K$. We prove correctness via an exchange argument and discuss applications to distributed fractional knapsack, parallel merge partitioning, and multi-stream joins. Keywords: Co-ranking \sep partitioning \sep Merge-free algorithms \sep Index-space optimization \sep Selection and merging \sep Data structures

academic

Multi-Way Co-Ranking: Index-Space Partitioning of Sorted Sequences Without Merge

Basic Information

Paper ID: 2510.22882
Title: Multi-Way Co-Ranking: Index-Space Partitioning of Sorted Sequences Without Merge
Author: Amit Joshi (Independent Researcher)
Classification: cs.DS (Data Structures and Algorithms)
Publication Date: October 27, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.22882

Abstract

This paper proposes a merge-free multi-way co-ranking algorithm for computing cutting indices $i_1,\dots,i_m$ that partition $m$ sorted sequences such that all prefix segments collectively contain exactly $K$ elements. The method extends Siebert and Träff's two-list co-ranking to arbitrary $m$ -way partitioning, maintaining boundaries for each sequence and converging to a consistent global frontier without performing multi-way merge or value-space search. The algorithm applies binary search in index space with time complexity $O(\log(\sum_t n_t)\log m)$ and space complexity $O(m)$ , independent of $K$ . Correctness is established through exchange arguments, and applications in distributed fractional knapsack, parallel merge partitioning, and multi-stream joins are discussed.

Research Background and Motivation

Problem Definition

The multi-way co-ranking problem is defined as follows: Given $m$ sequences $L_1, \ldots, L_m$ sorted in non-decreasing order (duplicates allowed), each of length $n_t$ , and a global target rank $K \in \{0, \ldots, N\}$ (where $N = \sum_t n_t$ ), find cutting indices $i_1, \ldots, i_m$ such that:

$\sum_{t=1}^m i_t = K \quad \text{and} \quad \max_t \ell_t \leq \min_t r_t$

where $\ell_t$ and $r_t$ denote the left and right boundary values, respectively.

Research Motivation

Extending Classical Algorithms: Existing co-ranking algorithms primarily target two sequences, lacking efficient multi-way extensions
Avoiding Merge Overhead: Traditional methods require merging multiple sequences before selection, incurring substantial overhead
Index-Space Advantages: Operating in index space rather than value space avoids the complexity of value-domain search
Practical Application Demands: Distributed computing, parallel processing, and database queries all require efficient multi-way partitioning algorithms

Limitations of Existing Methods

Siebert-Träff Method: Supports only two-sequence co-ranking
Frederickson-Johnson Method: Operates in value space, requiring global counting operations
Splitter-Based Methods: Require pre-merging or value-domain search, with higher complexity

Core Contributions

Algorithm Design: Proposes the first merge-free multi-way co-ranking algorithm, extending the classical two-way method to arbitrary $m$ -way partitioning
Theoretical Analysis: Proves algorithm correctness and $O(\log(\sum_t n_t)\log m)$ time complexity
Data Structure Innovation: Designs indexed heaps (addressable heaps) to efficiently maintain boundary values
Application Extension: Demonstrates algorithm applicability in distributed optimization, parallel processing, and database systems

Method Details

Task Definition

Input:

$m$ sorted sequences $L_1, \ldots, L_m$ with lengths $n_1, \ldots, n_m$ respectively
Target rank $K \in [0, N]$ , where $N = \sum_{t=1}^m n_t$

Output:

Cutting index vector $(i_1, \ldots, i_m)$ satisfying co-ranking conditions

Constraints:

$\sum_{t=1}^m i_t = K$
$\max_t \ell_t \leq \min_t r_t$ (co-ranking condition)

Algorithm Architecture

Core Data Structure: Index Heaps

The algorithm maintains two index heaps:

$H_L$ : Max-heap storing left boundary values $(\ell_t, t)$ , returning the sequence with maximum left boundary (donor)
$H_R$ : Min-heap storing right boundary values $(r_t, t)$ , returning the sequence with minimum right boundary (receiver)

Each heap supports $O(\log m)$ update_key operations and $O(1)$ peek operations.

Boundary Management

For each sequence $t$ , maintain:

Lower Bound: $Lb[t] \leq i[t]$
Upper Bound: $i[t] \leq Ub[t]$
Current Index: $i[t]$

Iterative Strategy

The algorithm employs a donor-receiver greedy strategy:

Identify Extrema:
- Donor $p = \arg\max_t \ell_t$ (maximum left boundary)
- Receiver $q = \arg\min_t r_t$ (minimum right boundary)

Compute Transfer Amount:

donor_slack = ⌈(i[p] - Lb[p])/2⌉
receiver_slack = ⌈(Ub[q] - i[q])/2⌉
Δ = min{donor_slack, receiver_slack}

Execute Transfer:
- $i[p] \leftarrow i[p] - \Delta$
- $i[q] \leftarrow i[q] + \Delta$
- Update bounds: $Ub[p] \leftarrow i[p]$ , $Lb[q] \leftarrow i[q]$
Update Heaps: Refresh heap keys for affected sequences

Technical Innovations

Index-Space Operations: Operates entirely in index space, avoiding value-domain search and merge operations
Geometric Convergence: Guarantees logarithmic convergence speed through halving the feasible region
Imbalance Potential Function: Defines $\Phi(i) = \max_t \ell_t - \min_t r_t$ as convergence criterion
Deterministic Complexity: Algorithm complexity is independent of target rank $K$

Theoretical Analysis

Correctness Proof

Lemma 1 (Local Extrema Optimality)

If $\Phi(i) > 0$ , let $p = \arg\max_t \ell_t$ and $q = \arg\min_t r_t$ . Among all feasible infinitesimal transfers maintaining $\sum_t i_t = K$ , the $(p,q)$ pair achieves the maximum non-increasing change in $\Phi$ .

Proof Sketch: Decreasing $i_p$ reduces $\ell_p$ (the local maximum of left boundaries), while increasing $i_q$ increases $r_q$ (the local minimum of right boundaries). Since $\ell_p \geq \ell_u$ and $r_q \leq r_v$ for all $u,v$ , the extremal pair $(p,q)$ produces the steepest reduction in the gap $\max\ell - \min r$ .

Lemma 2 (Transfer Order Exchangeability)

Any sequence of feasible transfers reducing $\Phi$ can be reordered such that all extremal $(p,q)$ transfers occur before any non-extremal transfers, without worsening $\Phi$ at any intermediate step.

Theorem 1 (Convergence and Validity)

Algorithm 2 terminates with a valid co-ranking vector $(i_1, \ldots, i_m)$ satisfying $\sum_t i_t = K$ and $\max_t \ell_t \leq \min_t r_t$ .

Complexity Analysis

Round Analysis

In each round, the feasible distance of either the donor or receiver is halved. The distance $Ub[t] - Lb[t]$ for each sequence is reduced at most $O(\log n_t)$ times. Aggregating across all $m$ sequences, the total number of rounds is:

$T = O\left(\log\left(\sum_{t=1}^m n_t\right)\right)$

Time Complexity

Each round executes a constant number of index heap operations ( $O(\log m)$ time), yielding total time complexity:

$O\left(\log\left(\sum_t n_t\right) \cdot \log m\right)$

Space Complexity

The algorithm only needs to store indices and boundary information for $m$ sequences, with space complexity $O(m)$ .

Algorithm Implementation

Core Algorithm Flow

def multi_way_corank(sequences, K):
    m = len(sequences)
    # Initialize bounds and indices
    Lb = [0] * m
    Ub = [len(seq) for seq in sequences]
    i = water_fill_initialization(K, Ub)
    
    # Construct index heaps
    HL = MaxHeap()  # Max-heap for left boundaries
    HR = MinHeap()  # Min-heap for right boundaries
    
    for t in range(m):
        HL.insert(t, left_boundary(sequences[t], i[t]))
        HR.insert(t, right_boundary(sequences[t], i[t]))
    
    while True:
        # Obtain donor and receiver
        max_left, p = HL.peek()
        min_right, q = HR.peek()
        
        # Check termination condition
        if max_left <= min_right:
            break
            
        # Compute transfer amount
        donor_slack = ceil((i[p] - Lb[p]) / 2)
        receiver_slack = ceil((Ub[q] - i[q]) / 2)
        delta = min(donor_slack, receiver_slack)
        
        # Execute transfer
        i[p] -= delta
        i[q] += delta
        
        # Update bounds
        Ub[p] = i[p]
        Lb[q] = i[q]
        
        # Update heaps
        update_heaps(HL, HR, sequences, i, p, q)
    
    return i

Initialization Strategy

Employs a "water-filling" strategy for feasible solution initialization:

def water_fill_initialization(K, capacities):
    i = [0] * len(capacities)
    need = K
    for t in range(len(capacities)):
        take = min(capacities[t], need)
        i[t] = take
        need -= take
        if need == 0:
            break
    return i

Application Scenarios

1. Distributed Fractional Knapsack Problem

In multi-source fractional knapsack problems, when items are sorted by density and distributed across $m$ shards, co-ranking computes global $K$ -prefix partitioning without merging source data.

2. Parallel $m$ -Way Merge Partitioning

Assigns disjoint prefixes to processors without executing preliminary merge. The co-ranking vector determines exact handoff points, allowing processors to perform only local merging over their ranges.

3. Multi-Stream Join Partitioning

In database or stream processing pipelines, partitioning join frontiers at global rank is a natural requirement. This method produces per-stream cursors consistent with the global prefix.

Experimental Verification

While the paper primarily focuses on theoretical analysis, the author provides implementation code for verification. Algorithm practical performance can be evaluated through:

Theoretical Performance Guarantees

Time Complexity: $O(\log(\sum_t n_t) \log m)$
Space Complexity: $O(m)$
Independence: Complexity is independent of target rank $K$

Comparison with Existing Methods

vs. Merge Methods: Avoids $O(N)$ merge overhead
vs. Value-Space Methods: Avoids global counting operations
vs. Frederickson-Johnson: Index-space operations provide greater efficiency

Two-List Co-Ranking

Siebert and Träff's work established the foundation for co-ranking, primarily used for work partitioning in parallel merge. This paper extends their approach from 2-way to arbitrary $m$ -way.

Splitter-Based Parallel Sorting

Siebert and Wolf's exact splitter method operates in value space, searching for key thresholds that balance partitions. In contrast, this paper operates in index space, directly outputting co-ranking vectors.

Selection in Sorted Partitions

Frederickson-Johnson's classical work studies selection and ranking on structured inputs (such as unions of $m$ sorted lists). Their algorithm is essentially a value-space process, while this paper provides an index-space primitive.

Conclusions and Discussion

Main Conclusions

Successfully extends two-way co-ranking to multi-way partitioning while maintaining favorable theoretical properties
Index-space operations avoid value-domain search, providing deterministic complexity guarantees
Algorithm is simple to implement with good practical applicability

Limitations

Assumptions: Requires input sequences to be pre-sorted
Application Scope: Primarily applicable to scenarios requiring exact partitioning
Experimental Validation: Lacks large-scale experimental verification of performance

Future Directions

Dynamic Sequences: Extend to support dynamically updated sequences
Approximation Algorithms: Develop faster approximate versions
Parallelization: Investigate parallelization possibilities
Practical Applications: Validate effectiveness in more real-world systems

In-Depth Evaluation

Strengths

Theoretical Contribution: First efficient algorithm for multi-way co-ranking, filling a theoretical gap
Method Innovation: Novel index-space operation approach avoids limitations of traditional methods
Rigorous Analysis: Provides complete correctness proofs and complexity analysis
Practical Value: Simple, easy-to-implement algorithm with clear application scenarios

Weaknesses

Missing Experiments: Paper lacks experimental verification, preventing practical performance assessment
Limited Comparisons: No detailed performance comparisons with existing methods
Shallow Application Discussion: Application scenarios discussed relatively simply, lacking deep analysis

Impact

Academic Value: Provides theoretical foundation for multi-way co-ranking problem
Practical Potential: Shows application prospects in distributed computing and parallel processing
Reproducibility: Author provides implementation code, facilitating verification and extension

Applicable Scenarios

Data partitioning in distributed systems
Load balancing in parallel algorithms
Query optimization in database systems
Multi-stream merging in stream processing systems

References

1 Greg N. Frederickson and Donald B. Johnson. Generalized selection and ranking. STOC 1980.

2 Christian Siebert. Perfectly load-balanced, stable, synchronization-free parallel merge. Parallel Processing Letters, 2014.

3 Christian Siebert. Simple in-place yet comparison-optimal mergesort, arXiv:2509.24540, 2025.

4 Christian Siebert and Felix Wolf. A scalable parallel sorting algorithm using exact splitting. RWTH Aachen University technical report, 2011.

Overall Assessment: This is a theoretically strong algorithmic paper that successfully addresses the important multi-way co-ranking problem. While lacking experimental verification, it features rigorous theoretical analysis and strong methodological innovation, providing valuable theoretical tools for related fields.