
Hierarchical Qubit-Merging Transformer for Quantum Error Correction

Park, Kwak, Kim

For reliable large-scale quantum computation, a quantum error correction (QEC) scheme must effectively resolve physical errors to protect logical information. Leveraging recent advances in deep learning, neural network-based decoders have emerged as a promising approach to enhance the reliability of QEC. We propose the Hierarchical Qubit-Merging Transformer (HQMT), a novel and general decoding framework that explicitly leverages the structural graph of stabilizer codes to learn error correlations across multiple scales. Our architecture first computes attention locally on structurally related groups of stabilizers and then systematically merges these qubit-centric representations to build a global view of the error syndrome. The proposed HQMT achieves substantially lower logical error rates for surface codes by integrating a dedicated qubit-merging layer within the transformer architecture. Across various code distances, HQMT significantly outperforms previous neural network-based QEC decoders as well as a powerful belief propagation with ordered statistics decoding (BP+OSD) baseline. This hierarchical approach provides a scalable and effective framework for surface code decoding, advancing the realization of reliable quantum computing.



Basic Information

Paper ID: 2510.11593
Title: Hierarchical Qubit-Merging Transformer for Quantum Error Correction
Authors: Seong-Joon Park (POSTECH), Hee-Youl Kwak (University of Ulsan), Yongjune Kim (POSTECH)
Classification: quant-ph cs.AI cs.LG
Publication Date: October 14, 2025
Paper Link: https://arxiv.org/abs/2510.11593

Abstract

To enable reliable large-scale quantum computing, quantum error correction (QEC) schemes must effectively address physical errors to protect logical information. This paper leverages recent advances in deep learning and proposes the Hierarchical Qubit-Merging Transformer (HQMT), a novel universal decoding framework that explicitly exploits the structural graph of stabilizer codes to learn multi-scale error correlations. The architecture first computes attention locally on structurally-related stabilizer groups, then systematically merges these qubit-centric representations to construct a global view of error syndromes. By integrating dedicated qubit-merging layers into the transformer architecture, HQMT achieves significantly lower logical error rates on surface codes, substantially outperforming previous neural network QEC decoders and the strong BP+OSD baseline across various code distances.

Research Background and Motivation

Core Problem

The fundamental challenge facing quantum computing is the fragility of quantum states. Qubits are far more susceptible than classical bits to environmental noise and operational imperfections, which cause errors such as bit flips and phase flips (the latter have no classical counterpart). Quantum error correction is a key technology for achieving fault-tolerant quantum computation.

Problem Significance

Practical demands of quantum computing: Large-scale quantum algorithms require maintaining quantum state coherence over extended periods
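Physical constraints: The quantum no-cloning theorem rules out classical redundancy by copying, and measuring qubits directly would destroy the encoded state; QEC instead spreads logical information across entangled physical qubits and infers errors from stabilizer measurements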
Criticality of decoding latency: The decoder's response time directly impacts the clock speed of the entire quantum system

Limitations of Existing Methods

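Classical algorithms: MWPM is efficient and well understood, but standard MWPM decodes X and Z errors separately and therefore cannot exploit correlations such as those introduced by Y errors under depolarizing noise
Early neural network approaches: Feed-forward (FFNN) and convolutional (CNN) decoders do not fully exploit the structural properties of quantum codes
Iterative decoders: Methods like BP+OSD have input-dependent decoding times (BP may run to its iteration limit, and the OSD post-processing relies on costly matrix elimination), which can make them a system bottleneck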

Research Motivation

This work aims to design a neural network decoder that both exploits the topological structure of quantum codes and provides fixed decoding latency, specifically optimized for the hierarchical error correlations in surface codes.

Core Contributions

Proposes the HQMT architecture: Presented by the authors as the first hierarchical transformer decoder that explicitly models the topological structure of surface codes
Innovative qubit-merging layer: Fuses fine-grained Z- and X-stabilizer representations into coarse-grained qubit-level representations
Significant performance improvements: Surpasses existing neural network decoders and the BP+OSD baseline across multiple code distances
Scalability verification: Demonstrates performance advantages with increasing code distance and favorable pseudo-threshold characteristics

Methodology Details

Task Definition

Input: Error syndrome vector $s = [s_Z, s_X] \in \{0,1\}^{n-k}$, where $s_Z$ and $s_X$ are the syndromes of the Z-type and X-type stabilizers, respectively
Output: Logical operator prediction $\hat{L} \in \{\bar{I}, \bar{X}, \bar{Y}, \bar{Z}\}$
Objective: Minimize the logical error rate (LER)
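To make the task input concrete, the following NumPy sketch builds parity-check matrices for a rotated surface code under one common qubit/stabilizer layout (an assumption; the paper's indexing may differ), samples a depolarizing error in which X, Y, and Z each occur with probability p/3, and computes $s = [s_Z, s_X]$. The helper names `rotated_surface_code`, `sample_depolarizing`, and `syndrome` are hypothetical and not part of HQMT.

```python
import numpy as np


def rotated_surface_code(d):
    """(H_Z, H_X) of an odd-distance rotated surface code under an assumed layout:
    qubit (r, c) -> index r * d + c; face (r, c) covers the in-grid qubits among
    (r, c), (r, c+1), (r+1, c), (r+1, c+1); even r + c -> X-type, odd -> Z-type;
    weight-2 X faces sit on the top/bottom edges, weight-2 Z faces on left/right."""
    h_z, h_x = [], []
    for r in range(-1, d):
        for c in range(-1, d):
            is_x = (r + c) % 2 == 0
            bulk = 0 <= r < d - 1 and 0 <= c < d - 1
            edge_tb = r in (-1, d - 1) and 0 <= c < d - 1
            edge_lr = c in (-1, d - 1) and 0 <= r < d - 1
            if bulk or (edge_tb and is_x) or (edge_lr and not is_x):
                row = np.zeros(d * d, dtype=int)
                for q_r, q_c in ((r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)):
                    if 0 <= q_r < d and 0 <= q_c < d:
                        row[q_r * d + q_c] = 1
                (h_x if is_x else h_z).append(row)
    return np.array(h_z), np.array(h_x)


def sample_depolarizing(n, p, rng):
    """Binary (e_x, e_z): each qubit suffers X, Y or Z with probability p/3 each."""
    pauli = np.where(rng.random(n) < p, rng.integers(1, 4, size=n), 0)  # 0=I 1=X 2=Y 3=Z
    e_x = np.isin(pauli, (1, 2)).astype(int)  # X and Y have an X component
    e_z = np.isin(pauli, (2, 3)).astype(int)  # Y and Z have a Z component
    return e_x, e_z


def syndrome(h_z, h_x, e_x, e_z):
    """s = [s_Z, s_X]: Z-type checks flag X components, X-type checks flag Z components."""
    s_z = (h_z @ e_x) % 2
    s_x = (h_x @ e_z) % 2
    return np.concatenate([s_z, s_x])


d, p = 5, 0.1
h_z, h_x = rotated_surface_code(d)
e_x, e_z = sample_depolarizing(d * d, p, np.random.default_rng(0))
s = syndrome(h_z, h_x, e_x, e_z)
print(s.shape)  # (24,), i.e. n - k with n = 25 and k = 1
```

For odd $d$ this layout gives $(d^2-1)/2$ checks of each type, so $s$ has length $n-k = d^2-1$.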

Model Architecture

Overall Design

HQMT employs a two-stage hierarchical architecture:

Stage 1: Fine-grained processing, separately handling Z-type and X-type stabilizers
Stage 2: Coarse-grained processing, handling merged qubit-level representations

Key Components

1. Qubit-Centric Embedding Strategy

For each physical qubit $q^{(i)}$, construct two patches:

Z-type patch: $p_Z^{(i)} = (v_{Z,1}^{(i)}, ..., v_{Z,m}^{(i)})$
X-type patch: $p_X^{(i)} = (v_{X,1}^{(i)}, ..., v_{X,m}^{(i)})$

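Where $v_{Z,j}^{(i)} = \begin{cases} 1-2s_{Z,j} & \text{if } s_{Z,j} \in N_Z^{(i)} \\ 0 & \text{otherwise} \end{cases}$, with $N_Z^{(i)}$ denoting the Z-type stabilizers that act on $q^{(i)}$ (the condition means that the stabilizer measured by $s_{Z,j}$ acts on the qubit) and $m$ the number of Z-type stabilizers, which equals the number of X-type stabilizers, $(n-k)/2$, in the rotated surface code. The entries $v_{X,j}^{(i)}$ are defined analogously from $s_{X,j}$ and $N_X^{(i)}$. The map $1-2s$ sends syndrome bit 0 to $+1$ and bit 1 to $-1$, while stabilizers that do not touch the qubit contribute 0.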
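The embedding rule can be sketched in NumPy as follows. The rotated-surface-code construction is a hypothetical helper with an assumed layout, not the paper's indexing; `qubit_patches` returns, for every qubit, the length-$m$ vector holding $1-2s_j$ for the checks that touch the qubit and 0 elsewhere. How the patches are then projected to $d_{model}$-dimensional tokens is not specified in this summary and is left out. For $d = 3$, $m = 4$.

```python
import numpy as np


def rotated_surface_code(d):
    """(H_Z, H_X) of an odd-distance rotated surface code under an assumed layout:
    qubit (r, c) -> index r * d + c; checkerboard faces (even r + c -> X-type);
    weight-2 X faces on top/bottom edges, weight-2 Z faces on left/right edges."""
    h_z, h_x = [], []
    for r in range(-1, d):
        for c in range(-1, d):
            is_x = (r + c) % 2 == 0
            bulk = 0 <= r < d - 1 and 0 <= c < d - 1
            edge_tb = r in (-1, d - 1) and 0 <= c < d - 1
            edge_lr = c in (-1, d - 1) and 0 <= r < d - 1
            if bulk or (edge_tb and is_x) or (edge_lr and not is_x):
                row = np.zeros(d * d, dtype=int)
                for q_r, q_c in ((r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)):
                    if 0 <= q_r < d and 0 <= q_c < d:
                        row[q_r * d + q_c] = 1
                (h_x if is_x else h_z).append(row)
    return np.array(h_z), np.array(h_x)


def qubit_patches(h, s):
    """Patch for every qubit i: entry j is 1 - 2 s_j if check j acts on qubit i, else 0.

    h: (m, n) parity checks of one type; s: (m,) syndrome bits of those checks.
    Returns an (n, m) array whose row i is the patch of qubit i.
    """
    bipolar = 1 - 2 * np.asarray(s, dtype=int)  # bit 0 -> +1, bit 1 -> -1
    return h.T * bipolar[np.newaxis, :]


d = 3
h_z, h_x = rotated_surface_code(d)
s_z = np.array([1, 0, 0, 0])          # hypothetical syndrome: first Z-type check fired
s_x = np.zeros(len(h_x), dtype=int)
p_z, p_x = qubit_patches(h_z, s_z), qubit_patches(h_x, s_x)
stage1_input = np.concatenate([p_z, p_x])  # 2n rows: n Z-patches, then n X-patches
print(stage1_input.shape)  # (18, 4)
```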

2. Qubit-Merging Layer

Concatenates each qubit's Z-token and X-token into a $2d_{model}$-dimensional vector
Projects this vector back to $d_{model}$ dimensions through a fully connected layer
Reduces the token matrix from $2n \times d_{model}$ (one Z-token and one X-token per qubit) to $n \times d_{model}$ (one token per qubit)
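A minimal sketch of the qubit-merging step, assuming Stage 1 emits its $2n$ tokens with all Z-tokens first and all X-tokens second (a hypothetical ordering). `qubit_merge`, `w`, and `b` are illustrative names, and modeling the fully connected layer as a single affine map without activation is an assumption.

```python
import numpy as np


def qubit_merge(x1, w, b):
    """Merge Stage-1 tokens (2n, d_model) into qubit tokens (n, d_model).

    Assumes rows 0..n-1 are the Z-tokens and rows n..2n-1 the X-tokens of
    qubits 0..n-1 (hypothetical ordering).
    """
    n = x1.shape[0] // 2
    z_tok, x_tok = x1[:n], x1[n:]
    merged = np.concatenate([z_tok, x_tok], axis=1)  # (n, 2 * d_model)
    return merged @ w + b                            # (n, d_model)


d_model, n = 128, 9
rng = np.random.default_rng(0)
x1 = rng.standard_normal((2 * n, d_model))
w = rng.standard_normal((2 * d_model, d_model)) / np.sqrt(2 * d_model)
b = np.zeros(d_model)
x2 = qubit_merge(x1, w, b)
print(x2.shape)  # (9, 128)
```

Because the projection acts on the concatenation, each merged token is a learned mix of that qubit's Z-view and X-view, which is what lets Stage 2 reason about Y-type (joint X and Z) errors per qubit.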

3. Hierarchical Transformer Processing

Stage 1: $N$ transformer blocks process $X_1 \in \mathbb{R}^{2n \times d_{model}}$
Qubit-merging layer transformation
Stage 2: $N$ transformer blocks process $X_2 \in \mathbb{R}^{n \times d_{model}}$
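The two-stage flow can be sketched in NumPy to show how the token count changes from $2n$ to $n$. The block internals (single attention head, pre-norm LayerNorm without learnable parameters, ReLU feed-forward of width $4d_{model}$, random weights) are assumptions for illustration; only $d_{model} = 128$ and $N = 3$ come from this summary. Passing the same block list to both stages mirrors the parameter sharing listed under Technical Innovations.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """LayerNorm over the feature axis (learnable gain/bias omitted for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_block(d_model, d_ff, rng):
    """Random weights for one hypothetical single-head, pre-norm transformer block."""
    shapes = {"wq": (d_model, d_model), "wk": (d_model, d_model),
              "wv": (d_model, d_model), "wo": (d_model, d_model),
              "w1": (d_model, d_ff), "w2": (d_ff, d_model)}
    return {k: rng.standard_normal(s) / np.sqrt(s[0]) for k, s in shapes.items()}


def transformer_block(x, p):
    """Self-attention over all tokens, then a ReLU feed-forward layer, both residual."""
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, tokens)
    x = x + (attn @ v) @ p["wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]


def hqmt_trunk(x1, stage1_blocks, stage2_blocks, w_merge):
    """Stage 1 on X_1 (2n x d_model) -> qubit merge -> Stage 2 on X_2 (n x d_model)."""
    for p in stage1_blocks:
        x1 = transformer_block(x1, p)
    n = x1.shape[0] // 2  # rows: n Z-tokens, then n X-tokens (assumed order)
    x2 = np.concatenate([x1[:n], x1[n:]], axis=1) @ w_merge
    for p in stage2_blocks:
        x2 = transformer_block(x2, p)
    return x2


rng = np.random.default_rng(0)
d_model, n, N = 128, 9, 3
blocks = [init_block(d_model, 4 * d_model, rng) for _ in range(N)]
w_merge = rng.standard_normal((2 * d_model, d_model)) / np.sqrt(2 * d_model)
x1 = rng.standard_normal((2 * n, d_model))
x2 = hqmt_trunk(x1, blocks, blocks, w_merge)  # same list twice = shared parameters
print(x2.shape)  # (9, 128)
```

Since attention and the feed-forward layer act per token, the same block weights apply unchanged to sequences of $2n$ and $n$ tokens.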

4. Output Layer

Generates 4-dimensional logits by mean pooling over the qubit tokens followed by a fully connected layer; a softmax then yields a probability distribution over the four logical operators.
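The output layer amounts to a few lines. This sketch assumes the class order (I, X, Y, Z) and uses hand-picked weights (hypothetical) purely so that the result is easy to check.

```python
import numpy as np

LOGICAL_CLASSES = ("I", "X", "Y", "Z")  # assumed class order


def output_head(x2, w_out, b_out):
    """Mean-pool qubit tokens (n, d_model), project to 4 logits, apply softmax."""
    pooled = x2.mean(axis=0)             # (d_model,)
    logits = pooled @ w_out + b_out      # (4,)
    z = logits - logits.max()            # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return logits, probs


rng = np.random.default_rng(0)
x2 = rng.standard_normal((9, 128))       # stand-in for the Stage-2 output, n = 9
w_out = np.zeros((128, 4))               # hypothetical weights: the bias decides
b_out = np.array([2.0, 0.0, 0.0, 0.0])
logits, probs = output_head(x2, w_out, b_out)
print(LOGICAL_CLASSES[int(np.argmax(probs))])  # I
```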

Technical Innovations

1. Topology-Aware Design

Explicitly models the topological property that each physical qubit in the rotated surface code participates in at most 4 stabilizers (at most two Z-type and two X-type).
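The degree bound can be checked numerically. Under an assumed standard layout of the rotated surface code (hypothetical helper `rotated_surface_code`, not the paper's code), this sketch counts how many Z-type and X-type stabilizers act on each data qubit for the code distances used in the experiments.

```python
import numpy as np


def rotated_surface_code(d):
    """(H_Z, H_X) of an odd-distance rotated surface code under an assumed layout:
    qubit (r, c) -> index r * d + c; checkerboard faces (even r + c -> X-type);
    weight-2 X faces on top/bottom edges, weight-2 Z faces on left/right edges."""
    h_z, h_x = [], []
    for r in range(-1, d):
        for c in range(-1, d):
            is_x = (r + c) % 2 == 0
            bulk = 0 <= r < d - 1 and 0 <= c < d - 1
            edge_tb = r in (-1, d - 1) and 0 <= c < d - 1
            edge_lr = c in (-1, d - 1) and 0 <= r < d - 1
            if bulk or (edge_tb and is_x) or (edge_lr and not is_x):
                row = np.zeros(d * d, dtype=int)
                for q_r, q_c in ((r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)):
                    if 0 <= q_r < d and 0 <= q_c < d:
                        row[q_r * d + q_c] = 1
                (h_x if is_x else h_z).append(row)
    return np.array(h_z), np.array(h_x)


for d in (3, 5, 7, 9, 11):
    h_z, h_x = rotated_surface_code(d)
    deg_z, deg_x = h_z.sum(axis=0), h_x.sum(axis=0)  # checks touching each qubit
    total = deg_z + deg_x
    # each line reports: max total 4, max Z 2, max X 2, min total 2
    print(f"d={d}: max total {total.max()}, max Z {deg_z.max()}, "
          f"max X {deg_x.max()}, min total {total.min()}")
```

In this layout corner qubits touch two stabilizers, other boundary qubits three, and bulk qubits four, so each qubit's neighborhood is small and bounded regardless of code distance.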

2. Hierarchical Attention Mechanism

Local attention: Learns fine-grained correlations between adjacent stabilizers
Global attention: Captures non-local error patterns between qubits

3. Parameter Sharing Strategy

Transformer blocks in both stages share parameters, improving parameter efficiency.
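To make the efficiency claim concrete, the arithmetic below counts transformer-block parameters for $N = 3$ and $d_{model} = 128$ (the values under Implementation Details), assuming a standard block layout with biased Q/K/V/output projections, a two-layer feed-forward network of width $4d_{model}$, and two LayerNorms. The block internals are assumptions, and the qubit-merging and output layers are not counted.

```python
def block_params(d_model, d_ff):
    """Parameters of one transformer block under an assumed standard layout."""
    attention = 4 * (d_model * d_model + d_model)             # Q, K, V, O with biases
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model    # two linear layers
    norms = 2 * 2 * d_model                                   # two LayerNorms (gain, bias)
    return attention + ffn + norms


d_model, d_ff, N = 128, 4 * 128, 3
per_block = block_params(d_model, d_ff)
shared = N * per_block        # one set of N blocks reused by both stages
unshared = 2 * N * per_block  # separate blocks for Stage 1 and Stage 2
print(per_block, shared, unshared)  # 198272 594816 1189632
```

Under these assumptions, sharing halves the block parameters while keeping the same depth of computation in each stage.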

Experimental Setup

Dataset

Surface codes: Rotated surface codes $[[n=d^2, k=1, d]]$
Noise model: Depolarizing noise model
Code distances: $d = 3, 5, 7, 9, 11$
Physical error rate range: $p \in [0.07, 0.13]$

Evaluation Metrics

Logical Error Rate (LER): Primary performance metric
Pseudo-threshold: The physical error rate at which the LER equals the error rate of an unencoded qubit (i.e., LER = p)
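A pseudo-threshold can be estimated from sampled LER values by locating where LER(p) − p changes sign and interpolating linearly between the neighboring error rates. This sketch assumes the unencoded error rate equals p; `pseudo_threshold` is a hypothetical helper, and the LER values are synthetic, not results from the paper.

```python
import numpy as np


def pseudo_threshold(p, ler):
    """Estimate p* with LER(p*) = p by linear interpolation of LER(p) - p.

    Assumes the unencoded error rate equals p and uses the first sign change.
    """
    p, ler = np.asarray(p, dtype=float), np.asarray(ler, dtype=float)
    gap = ler - p
    idx = np.flatnonzero(np.sign(gap[:-1]) != np.sign(gap[1:]))
    if idx.size == 0:
        raise ValueError("LER curve does not cross LER = p in the sampled range")
    i = idx[0]
    t = -gap[i] / (gap[i + 1] - gap[i])  # fraction of the way from p[i] to p[i+1]
    return p[i] + t * (p[i + 1] - p[i])


# Synthetic, made-up LER values for illustration only (not from the paper).
p = [0.07, 0.09, 0.11, 0.13]
ler = [0.05, 0.08, 0.12, 0.17]
print(round(pseudo_threshold(p, ler), 4))  # 0.1
```

Linear interpolation is only a rough estimate; a finer grid of p values around the crossing gives a more reliable value.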

Comparison Methods

Classical algorithms: MWPM, BP+OSD (quaternary, 20 iterations)
Neural networks: FFNN, CNN
Ablation variants: Stage 1 only, Stage 2 only

Implementation Details

Model dimension: $d_{model} = 128$
Transformer layers: $N = 3$
Loss function: Cross-entropy loss
Training strategy: End-to-end training
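The training objective is the standard four-class cross-entropy between the predicted logits and the target logical operator. A NumPy sketch, assuming labels are encoded as 0: I, 1: X, 2: Y, 3: Z (a hypothetical encoding):

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy; logits (B, 4), labels (B,) in {0: I, 1: X, 2: Y, 3: Z}."""
    z = logits - logits.max(axis=1, keepdims=True)             # stabilize exp
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.zeros((2, 4))   # uninformative predictions
labels = np.array([0, 3])
print(round(float(cross_entropy(logits, labels)), 4))  # 1.3863
```

With all-zero logits every class receives probability 1/4, so the loss equals ln 4.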

Experimental Results

Main Results

Performance Comparison:

HQMT significantly outperforms MWPM, FFNN, and CNN across all tested code distances
Maintains clear advantages over BP+OSD baseline at $d=5,7,9,11$
Performance gap widens with increasing code distance, demonstrating good scalability

Pseudo-Threshold Comparison:

Code Distance	MWPM	FFNN	CNN	HQMT
d=3	0.0828	0.0977	0.0980	0.0980
d=5	0.1036	0.1135	0.1215	0.1300
d=7	0.1194	0.1249	0.1326	0.1417

Ablation Studies

Architecture Component Analysis:

"Stage 1 only": Significant performance degradation, proving necessity of qubit-merging
"Stage 2 only": Fails to effectively exploit local structural information
Complete HQMT: Both stages cooperate synergistically for optimal performance

Depth Impact Analysis:

$N=1$ to $N=3$ : Significant performance improvement
$N=3$ to $N=5$ : Marginal gains; $N=3$ selected for performance-efficiency balance

Experimental Findings

Effectiveness of hierarchical design: Two-stage processing is crucial for capturing multi-scale error correlations
Importance of topological structure: Qubit-centric embedding strategy significantly enhances performance
Scalability advantages: Relative advantages of HQMT become more pronounced with increasing code distance

Development of Quantum Error Correction Decoders

Classical algorithms: Graph-theoretic methods like MWPM
Early neural networks: FFNN-based decoders were among the first deep learning decoders for QEC
Convolutional approaches: CNN exploits the planar nature of surface codes
Transformer applications: Transformer-QEC and others explore attention mechanisms

Relative Advantages of This Work

First hierarchical transformer explicitly modeling quantum code topology
Innovative qubit-merging mechanism
Consistent advantages across multiple baselines

Conclusions and Discussion

Main Conclusions

HQMT effectively captures multi-scale error correlations in surface codes through hierarchical processing
The qubit-merging layer is a key innovation connecting local and global features
The method achieves state-of-the-art performance while maintaining fixed decoding latency

Limitations

Code type restrictions: Primarily designed for surface codes; applicability to other quantum codes requires verification
Noise model: Tested only under depolarizing noise; actual quantum device noise is more complex
Computational overhead: Transformer architecture complexity may limit real-time applications

Future Directions

Extension to other quantum code families (e.g., quantum LDPC codes)
Adaptation to more complex noise models
Hardware-friendly model compression and acceleration

In-Depth Evaluation

Strengths

Strong novelty: The qubit-merging layer design is innovative, effectively combining quantum code structure with transformer advantages
Comprehensive experiments: Full comparisons across multiple code distances and baselines with well-designed ablation studies
Solid theoretical foundation: Method design tightly integrates with topological properties of surface codes
Significant performance gains: Achieves notable improvements across all tested scenarios

Weaknesses

Limited generality: Design is overly tailored to surface codes; migration to other quantum codes requires redesign
Insufficient practical deployment considerations: Lacks discussion of hardware implementation and real-time performance
Missing theoretical analysis: No convergence guarantees or theoretical analysis of generalization ability

Impact

Academic contribution: Provides a new architectural paradigm for quantum error correction decoder design
Practical value: Fixed decoding latency characteristic is important for actual quantum systems
Reproducibility: Method description is detailed with clear experimental setup

Applicable Scenarios

Surface code decoding: Directly applicable to fault-tolerant quantum computing systems based on surface codes
Real-time quantum error correction: Fixed latency characteristic suits applications with strict timing requirements
Large-scale quantum systems: Good scalability suits future large-scale quantum processors

References

This paper cites important literature from quantum error correction, deep learning, and neural network decoders, particularly:

Gottesman (1997): Theoretical foundation of stabilizer codes
Varsamopoulos et al. (2018): Early neural network QEC decoder
Jung et al. (2024): CNN application in surface code decoding
Google Quantum AI (2023, 2025): Experimental verification of surface codes

Overall Assessment: This is a high-quality paper with significant contributions to quantum error correction decoding. The HQMT architecture is ingeniously designed with sufficient experimental validation, opening new directions for neural network applications in quantum error correction. Despite certain limitations in generality, its outstanding performance on surface code decoding and fixed-latency characteristics provide important practical value.