2025-11-13T22:01:11.053323

Lower Bounds on Conversion Bandwidth for MDS Convertible Codes in Split Regime

Wang, Hu

We propose several new lower bounds on the bandwidth costs of MDS convertible codes using a linear-algebraic framework. The derived bounds improve previous results in certain parameter regimes and match the bandwidth cost of the construction proposed by Maturana and Rashmi (2022 IEEE International Symposium on Information Theory) for $r^F\le r^I\le k^F$, implying that our bounds are tight in this case.

academic

Lower Bounds on Conversion Bandwidth for MDS Convertible Codes in Split Regime

Basic Information

Paper ID: 2511.00953
Title: Lower Bounds on Conversion Bandwidth for MDS Convertible Codes in Split Regime
Authors: Lewen Wang, Sihuang Hu (Shandong University)
Classification: cs.IT, math.IT (Information Theory)
Publication Date: November 2, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2511.00953

Abstract

This paper proposes a method based on linear algebra framework to derive lower bounds on conversion bandwidth for MDS convertible codes. The derived bounds improve upon previous results in certain parameter ranges and match the bandwidth overhead of the construction proposed by Maturana and Rashmi (2022 IEEE ISIT) in the case rF ≤ rI ≤ kF, establishing the tightness of the bound.

Research Background and Motivation

Problem Statement

This paper investigates the minimization of conversion bandwidth for MDS convertible codes in the split regime within distributed storage systems. Specifically, when an initial codeword needs to be split into multiple final codewords, how to minimize the amount of data transmission during the conversion process.

Problem Significance

Practical Demand: In large-scale cloud storage systems (e.g., Google), the failure probability of storage nodes varies over time. Dynamically adjusting erasure code parameters can save 11-44% of storage overhead.
Efficiency Requirements: Traditional complete re-encoding methods are computationally and I/O intensive, necessitating efficient code conversion mechanisms.
Theoretical Value: Conversion bandwidth is a key metric for measuring conversion efficiency, but the optimal bandwidth lower bound in split regime has remained an open problem.

Limitations of Existing Methods

Maturana and Rashmi's Work: Established tight bounds in merge regime, but in split regime only proposed bounds based on information flow models and a conjecture, failing to completely resolve the problem.
Assumption Constraints: Previous work assumed uniform data downloads for unchanged and retired symbols, limiting the tightness of bounds.
Parameter Ranges: In certain parameter ranges, existing lower bounds are not sufficiently tight, with gaps compared to known constructions.

Research Motivation

By revisiting the code conversion problem from a linear algebra perspective, establishing containment relationships between column spaces of generator matrices of initial and final codes, we derive tighter bandwidth lower bounds and establish their tightness in certain parameter ranges.

Core Contributions

Linear Algebra Reformulation: Introducing a vector space perspective, by identifying containment relationships between column spaces of generator matrices of initial and final codes, transforming the bandwidth minimization problem into a linear algebra optimization problem.
Closed-Form Lower Bounds: Based on containment relationships, by solving a series of linear programming problems, deriving explicit closed-form lower bounds (Theorems 1-3).
Tightness Proof: Proving that in the parameter range rF ≤ rI ≤ kF, the lower bound of Theorem 2 exactly matches the bandwidth overhead of the Maturana-Rashmi construction, establishing the tightness of the bound.
Improvement of Existing Results: In most parameter ranges, the new bounds strictly improve upon the bounds proposed by Maturana and Rashmi in Theorem 4, and eliminate the uniform download assumption.

Methodology Details

Task Definition

Given an nI, kI, ℓ MDS initial code CI and nF, kF, ℓ MDS final code CF, where kI = λkF (λ ≥ 2), the objective is to find a linear conversion process T such that:

Input: Initial codeword CI(m), where m = (m1,...,mλ)
Output: λ final codewords {CF(mi) : i ∈ λ}
Optimization Objective: Minimize read bandwidth overhead R = Σ βi, where βi is the number of sub-symbols read from symbol i

Symbols are classified into three categories:

Unchanged symbols: appearing in both initial and final codewords
Retired symbols: appearing only in initial codewords
New symbols: appearing only in final codewords

Core Theoretical Framework

Containment Relationship (Lemma 2)

For stable convertible codes, define:

C̃: containing all block rows of final code generator matrices corresponding to read sub-symbols
B̃: block rows of initial code generator matrix corresponding to read sub-symbols of retired symbols

Key containment relationship:

⟨C̃⟩ ⊆ ⟨B̃⟩

The intuitive meaning of this containment relationship: all new symbols must be computable from read sub-symbols of initial codewords, therefore the column space corresponding to new symbols must be contained in the column space of read retired symbols.

Proof Strategy:

By definition of conversion process, there exists matrix T such that new symbols can be linearly computed from read sub-symbols
By selecting standard basis vectors as messages, establishing relationships between generator matrices
Eliminating rows corresponding to identity sub-blocks, obtaining the containment relationship

Rank Constraint Derivation

Starting from the containment relationship:

rank(C̃) ≤ rank(B̃)

Further decomposition:

For kF ≤ rF: utilizing the full row rank property of C
For rF ≤ kF: utilizing MDS property to select rF-sized subsets

Main Theorems

Theorem 1 (Case kF ≤ rF)

Lower Bound: R ≥ kIℓ = λkFℓ

Proof Key Points:

From containment relationship: Σ rank(C(i)) ≤ Σ βj (retired symbols)
From C full row rank: rank(C(i)) ≥ kFℓ - Σ βj (unchanged symbols)
Combining both inequalities yields the result

Tightness: This bound can be achieved through complete re-encoding.

Theorem 2 (Case rF ≤ rI ≤ kF)

Lower Bound:

R ≥ λrFℓ · [(λ-1)kF + rI] / [(λ-1)rF + rI]

Proof Strategy:

Rank Lower Bound for C̃: Selecting all rF-sized subsets Ui, utilizing MDS property
- For each subset, sub-matrix rank at least rFℓ - Σ βj
- Summing and averaging yields: rank(C̃) ≥ λrFℓ - (rF/kF)Σβj
Rank Lower Bound for B(i): For each block, utilizing rI ≤ kF
- rank(B(i)) ≥ Σβj(retired) - (rI/kF)Σβj(unchanged within block)
Linear Programming: Establishing two constraint conditions
- Constraint 1: rFΣβj(unchanged) + kFΣβj(retired) ≥ λkFrFℓ
- Constraint 2: Derived from rank(B̃) - rank(C̃) relationship
Solving LP yields optimal lower bound

Tightness: Matching the Maturana-Rashmi construction.

Theorem 3 (Case rF ≤ kF ≤ rI)

Lower Bound:

R ≥ {
  λrFℓ,                           if kI ≤ rI
  λ²(kF)²rFℓ / [kFrI - rFrI + λkFrF],  if kI > rI
}

Proof Key Points:

Since kF ≤ rI, the bound for rank(B(i)) changes
Establishing new linear programming constraints
Solving in two cases: kI ≤ rI and kI > rI
Through graphical analysis of feasible region finding optimal solution

Technical Innovations

Algebraic Simplification: Transforming combinatorial optimization problems into column space containment relationships, making problems more tractable.
Block-wise Rank Analysis: Through rank properties of block matrices, establishing relationships between quantities of read sub-symbols and dimensions of column spaces.
Linear Programming Framework: Integrating multiple rank constraints into linear programming problems, systematically solving for optimal lower bounds.
Parameter Case Analysis: According to relative magnitude relationships of kF, rF, rI, kI, employing different rank lower bound derivation strategies.

Experimental Setup

Verification Method

This paper is primarily theoretical work, with results verified through mathematical proofs. A concrete example is provided in Appendix A:

Parameter Settings:

Initial code: nI=8, kI=4, ℓ=4 MDS array code
Final code: nF=3, kF=2, ℓ=4 MDS array code
Finite field: F₄₃
λ = 2 (one initial codeword split into 2 final codewords)

Reading Strategy:

First 4 symbols: not read (Di = ∅)
Last 4 symbols: read first 2 sub-symbols (Di = {1,2})
Total read: 8 sub-symbols

Verification Results

Through explicit construction of generator matrices GI and GF, and conversion matrix E, verifying:

C̃E = B̃

where E is an 8×8 invertible matrix, proving the containment relationship holds exactly (⟨C̃⟩ = ⟨B̃⟩).

Read bandwidth is exactly λrFℓ = 8, completely matching the lower bound of Theorem 3.

Experimental Results

Theoretical Comparison

Comparison with Maturana-Rashmi Bound

Previous work's lower bound (Formula 17):

R ≥ {
  λkFℓ - rIℓ·max{kF/rF - 1, 0},  if rI ≤ λrF
  λmin{rF, kF}ℓ,                   if rI > λrF
}

Comparison Results:

Case rF ≥ kF:
- This paper's bound: kIℓ
- Previous bound: kIℓ
- Conclusion: Identical and tight

Case rF ≤ rI ≤ kF:

When rI ≤ λrF:

[λkFℓ - (kF-rF)rIℓ/rF] / [this paper's bound] 
= 1 - rI(kF-rF)(rI-rF) / [λ(rF)²((λ-1)kF+rI)] ≤ 1

When rI > λrF:

λrFℓ / [this paper's bound] = [(λ-1)rF+rI] / [(λ-1)kF+rI] ≤ 1

Conclusion: This paper's bound is strictly tighter and matches construction

Case rF ≤ kF ≤ kI ≤ rI:
- This paper's bound: λrFℓ
- Previous bound: λrFℓ
- Conclusion: Identical
Case rF ≤ kF ≤ rI < kI:
- When rI > λrF:
```
λrFℓ / [this paper's bound] < 1
```
- When rI ≤ λrF:
```
[λkFℓ - rIℓ(kF/rF-1)] / [this paper's bound] < 1
```
- Conclusion: This paper's bound is strictly tighter

Main Findings

Tightness Region: Within the range rF ≤ rI ≤ kF, the lower bound is tight (achievable).
Improvement Magnitude: In the case rF ≤ kF ≤ rI < kI, improvements are most significant, particularly when parameter differences are large.
Advantages of Linear Algebra Method: Compared to information flow methods, the linear algebra framework provides more precise constraints.
Constructibility: The example in the appendix demonstrates that at least for certain parameters, the lower bound is constructively achievable.

Convertible Code Foundations

Maturana & Rashmi (2020, ITCS; 2022, IEEE TIT): Proposed convertible code framework, establishing tight bounds on access overhead.
Maturana, Mukka & Rashmi (2020, ISIT): Access-optimal linear MDS convertible codes for all parameters.

Domain Size Optimization

Chopra, Maturana & Rashmi (2024, ISIT): Access-optimal convertible code constructions with small domain size.

Extension Directions

Kong (2024, IEEE TIT): Locally repairable convertible codes.
Ge, Cai & Tang (2024, ArXiv): Generalized convertible codes.
Shi, Fang & Gao (2025, ArXiv): Bounds and constructions for conversion to LRC.

Bandwidth Overhead Research

Maturana & Rashmi (2023, IEEE TIT): Tight lower bounds and optimal constructions for bandwidth overhead in merge regime.
Maturana & Rashmi (2022, ISIT): Bandwidth lower bounds and constructions in split regime (the object of improvement in this paper).

Positioning of This Paper

This paper focuses on bandwidth lower bounds in split regime, improving upon previous bounds based on information flow through linear algebra methods, and establishing tightness in certain parameter ranges.

Conclusions and Discussion

Main Conclusions

Theoretical Contribution: Establishing tighter bandwidth lower bounds for MDS convertible codes in split regime, covering different parameter ranges in three theorems.
Tightness Proof: In the range rF ≤ rI ≤ kF, proving the achievability of lower bounds, resolving the optimal bandwidth problem for this parameter range.
Methodological Innovation: The linear algebra framework provides a new perspective for analyzing code conversion problems, potentially applicable to other conversion scenarios.
Practical Value: Providing theoretical foundation for designing efficient code conversion protocols in distributed storage systems.

Limitations

Linear Transformation Assumption: All results are based on linear conversion processes; non-linear transformations may achieve lower bandwidth overhead.
Partial Parameter Ranges: In the case rF ≤ kF ≤ rI < kI, although bounds are tighter, tightness has not been proven, lacking matching constructions.
Stability Assumption: Focusing on stable convertible codes (maximizing unchanged symbols), analysis of non-stable codes not addressed.
Construction Methods: Main contribution is lower bounds; explicit constructions only provided in appendix with one example, lacking systematic construction methods.
Domain Size Requirements: Example uses F₄₃; feasibility for small fields not discussed.

Future Directions

Explicitly proposed directions in the paper:

Explicit Constructions: Developing explicit code constructions achieving Theorem 3 bounds, particularly for kI > rI case.
Non-linear Transformations: Exploring whether non-linear conversion processes can further reduce bandwidth overhead.

Potential research directions: 3. Other Parameter Ranges: Investigating parameter combinations not covered in this paper.

Domain Size Optimization: Reducing domain size while maintaining bandwidth optimality.
Computational Complexity: Analyzing computational overhead of conversion processes.
Practical System Implementation: Applying theoretical results to real distributed storage systems.

In-Depth Evaluation

Strengths

1. Methodological Innovation

Novel Perspective: Transforming combinatorial problems into column space containment relationships represents methodological innovation in the field.
Systematization: Linear programming framework provides unified analytical tools, extensible to other scenarios.
Mathematical Rigor: Complete proofs with clear logic and sufficient justification for each step.

2. Theoretical Contributions

Improved Bounds: Strictly superior to previous work in most parameter ranges.
Tightness Proof: Establishing achievability in key parameter ranges, resolving open problems.
Assumption Removal: No longer requiring uniform download assumption, making results more general.

3. Technical Depth

Block-wise Rank Analysis: Skillfully utilizing MDS code properties to establish rank constraints.
Parameter Classification: Employing different strategies for different parameter relationships, demonstrating deep understanding.
Linear Programming Solution: Transforming complex optimization into solvable LP problems.

4. Writing Quality

Clear Structure: From problem definition, theoretical framework to main results, well-organized hierarchy.
Notation Standards: Consistent mathematical notation with clear definitions.
Detailed Comparison: Section 4's comparative analysis is thorough, clearly demonstrating improvements.

Weaknesses

1. Missing Construction Methods

Appendix provides only one 8×4 example, lacking systematic construction algorithms.
No achievability proof or construction for kI > rI case in Theorem 3.
Practical applications require explicit encoding and conversion algorithms.

2. Insufficient Experimental Verification

As theoretical work, lacks numerical experiments or simulations.
No comparison with actual system parameters, difficult to assess practical value.
No statistics on improvement magnitudes across different parameters.

3. Applicability Analysis

Necessity of linear transformation assumption not sufficiently justified.
Impact of stability assumption not quantified.
Extensibility to non-MDS codes or other code classes not discussed.

4. Technical Details

Some proof steps (e.g., summation techniques in Theorem 2) lack intuitive explanation.
Linear programming feasible region analysis (Figure 1) could be more detailed.
Domain size and computational complexity not addressed.

Insufficient comparison with other code conversion methods (e.g., partial re-encoding).
Limited discussion on essential differences between information flow and algebraic methods.

Impact Assessment

Contribution to the Field

Theoretical Completion: Filling theoretical gaps in bandwidth lower bounds for split regime.
Methodology: Linear algebra framework may inspire research on other code conversion problems.
Benchmark Establishment: Providing theoretical optimality reference for subsequent constructions.

Practical Value

Design Guidance: Providing theoretical optimal reference for distributed storage system design.
Parameter Selection: Helping system designers make trade-offs across different parameter combinations.
Performance Evaluation: Assessing efficiency of existing conversion protocols.

Reproducibility

Complete Proofs: All theorems have detailed proofs, verifiable.
Concrete Examples: Appendix A provides complete matrices and verification.
Open Problems: Clearly identifying unresolved issues for future research.

Applicable Scenarios

Ideal Application Scenarios

Large-Scale Cloud Storage: Node failure rates change dynamically, requiring frequent code parameter adjustments.
Hierarchical Storage: Data migration between storage tiers, requiring redundancy changes.
Load Balancing: Redistributing data through code conversion to balance storage load.

Limiting Conditions

MDS Code Requirement: Only applicable when both initial and final codes are MDS.
Linear Transformation: Requiring linear conversion processes.
Stability: Scenarios maximizing unchanged symbols.
Parameter Constraints: kI = λkF integer multiple relationship.

Extension Possibilities

Locally Repairable Codes: Incorporating LRC properties.
Non-MDS Codes: Extending to other code classes.
Multi-level Conversion: Optimizing consecutive multiple conversions.

Key References

Maturana & Rashmi (2022, IEEE TIT): "Convertible codes: Enabling efficient conversion of coded data in distributed storage" - Foundational framework for convertible codes
Maturana & Rashmi (2022, ISIT): "Bandwidth cost of code conversions in the split regime" - Directly improved work in this paper
Maturana & Rashmi (2023, IEEE TIT): "Bandwidth cost of code conversions in distributed storage: Fundamental limits and optimal constructions" - Tight bounds for merge regime
Kadekodi, Rashmi & Ganger (2019, FAST): "Cluster storage systems gotta have HeART" - Practical motivation for dynamic code parameter adjustment
Kong (2024, IEEE TIT): "Locally repairable convertible codes with optimal access costs" - Extension to LRC

Summary

Through introducing a linear algebra framework, this paper successfully derives tighter bandwidth lower bounds for MDS convertible codes in split regime, establishing tightness in the range rF ≤ rI ≤ kF. Main strengths lie in methodological innovation and theoretical completion, while explicit constructions and experimental verification require further development. The work holds significant theoretical value for distributed storage system research, providing theoretical foundation and optimization objectives for subsequent code design. Future work should focus on developing systematic construction methods achieving the bounds and verifying performance gains in practical systems.