Soft Graph Transformer for MIMO Detection

Hong, Liu, Bian et al.
We propose the Soft Graph Transformer (SGT), a soft-input-soft-output neural architecture designed for MIMO detection. While Maximum Likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible in large systems, and conventional message-passing algorithms rely on asymptotic assumptions that often fail in finite dimensions. Recent Transformer-based detectors show strong performance but typically overlook the MIMO factor graph structure and cannot exploit prior soft information. SGT addresses these limitations by combining self-attention, which encodes contextual dependencies within symbol and constraint subgraphs, with graph-aware cross-attention, which performs structured message passing across subgraphs. Its soft-input interface allows the integration of auxiliary priors, producing effective soft outputs while maintaining computational efficiency. Experiments demonstrate that SGT achieves near-ML performance and offers a flexible and interpretable framework for receiver systems that leverage soft priors.

Basic Information

  • Paper ID: 2509.12694
  • Title: Soft Graph Transformer for MIMO Detection
  • Authors: Jiadong Hong¹, Lei Liu¹, Xinyu Bian², Wenjie Wang², Zhaoyang Zhang¹
  • Affiliations: ¹College of Information and Electronic Engineering, Zhejiang University, ²Theoretical Laboratory, Huawei Technologies Co., Ltd.
  • Categories: cs.LG cs.IT eess.SP math.IT
  • Publication Date: September 17, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2509.12694

Abstract

This paper proposes the Soft Graph Transformer (SGT), a soft-input soft-output neural architecture specifically designed for MIMO detection. While maximum likelihood (ML) detection achieves optimal accuracy, its exponential complexity is infeasible for large-scale systems, and traditional message-passing algorithms rely on asymptotic assumptions that often fail in finite-dimensional settings. Recent Transformer-based detectors show promise but typically overlook the MIMO factor graph structure and cannot leverage prior soft information. SGT addresses these limitations by combining self-attention (encoding contextual dependencies within the symbol and constraint subgraphs) with graph-aware cross-attention (performing structured message passing across subgraphs). Its soft-input interface enables the integration of auxiliary priors and produces effective soft outputs while maintaining computational efficiency.

Research Background and Motivation

Problem Definition

MIMO systems are fundamental to modern wireless communications, providing high spectral efficiency and robust links, but efficient symbol detection remains a challenge.

Limitations of Existing Methods

  1. Maximum Likelihood Detection: Achieves optimal accuracy but has computational complexity O(M^Nt), where M is the constellation size and Nt the number of transmit antennas, making it infeasible for large-scale systems
  2. Message-Passing Algorithms: Methods such as AMP, OAMP, and MAMP have lower complexity but rely on asymptotic assumptions, which makes them fragile in finite-dimensional settings
  3. Deep Unfolding Methods: Approaches such as OAMP-Net and DetNet learn algorithm parameters from data but remain constrained by assumptions of the underlying algorithms
  4. Existing Transformer Methods:
    • RE-MIMO lacks explicit graph awareness
    • Transformer-based MIMO relies on a costly QR decomposition and ignores the factor graph structure
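
To make the exponential growth of ML detection concrete, the short sketch below counts the candidate symbol vectors an exhaustive search would have to score. It assumes QPSK (M = 4) and two illustrative antenna counts; the function name is hypothetical and not taken from the paper.

```python
def ml_candidate_count(constellation_size: int, num_tx: int) -> int:
    """Number of symbol vectors an exhaustive ML search must evaluate: M^Nt."""
    return constellation_size ** num_tx


# Assumption: QPSK (M = 4) with Nt = 8 and Nt = 16 transmit antennas.
counts = {nt: ml_candidate_count(4, nt) for nt in (8, 16)}
for nt, count in counts.items():
    print(f"Nt = {nt:2d}: {count:,} candidates")
```

Doubling Nt from 8 to 16 takes the search space from 65,536 to 4,294,967,296 candidates, which is why exhaustive ML is usually treated as a reference rather than a practical detector.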

Research Motivation

Inspired by classical message-passing MIMO detection, this work aims to design a Transformer architecture that:

  1. Exploits MIMO factor graph structure
  2. Supports soft-input soft-output interfaces
  3. Provides a principled approach unifying context encoding and message passing

Core Contributions

  1. Proposes SGT Architecture: First MIMO detector unifying factor-graph-guided self-attention and cross-attention within an AMP-style framework
  2. Graph-Aware Tokenization Method: Transforms the weighted dense factor graph of MIMO systems into a dual-subgraph representation suitable for Transformer processing
  3. Soft-Input Soft-Output Interface: Naturally integrates external prior information from other receiver modules
  4. Performance Improvements: Achieves near-ML detection accuracy in small-scale MIMO systems, with complexity that grows quadratically rather than cubically with system dimensions, which favors large-scale systems

Methodology Details

Task Definition

Inputs:

  • Received signal vector y ∈ R^(2Nr)
  • Channel matrix H ∈ R^(2Nr×2Nt)
  • Noise variance information
  • Optional prior soft information (LLR)

Outputs:

  • Bit-level posterior log-likelihood ratios (LLR) suitable for channel decoders

Constraints: Linear system model y = Hx + n, where n ~ N(0,Σ)
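
The real-valued dimensions 2Nr and 2Nt suggest that a complex system is stacked into real and imaginary parts. The sketch below shows the standard real-valued decomposition under that assumption; the helper name, the small system size, and the noise level are illustrative choices, not details from the paper.

```python
import numpy as np


def to_real_model(Hc, yc):
    """Stack a complex model y_c = H_c x_c + n_c into real form y = H x + n."""
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag, Hc.real]])          # shape (2Nr, 2Nt)
    y = np.concatenate([yc.real, yc.imag])      # shape (2Nr,)
    return H, y


rng = np.random.default_rng(0)
Nr, Nt = 4, 2            # assumed toy sizes
noise_var = 0.01         # assumed complex noise variance

# Rayleigh fading channel: i.i.d. complex Gaussian entries with unit variance.
Hc = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
# QPSK symbols with unit average energy.
xc = (rng.choice([-1, 1], Nt) + 1j * rng.choice([-1, 1], Nt)) / np.sqrt(2)
nc = np.sqrt(noise_var / 2) * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))
yc = Hc @ xc + nc

H, y = to_real_model(Hc, yc)
x = np.concatenate([xc.real, xc.imag])
n = np.concatenate([nc.real, nc.imag])
print(H.shape, y.shape, np.allclose(y, H @ x + n))
```

With this stacking, each real entry of x carries half of the bits of one complex symbol, which matches the Nbits/2 dimension used for symbol tokens.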

Model Architecture

1. Graph-Aware Tokenization

Decomposes the MIMO factor graph into two subgraphs:

Linear Constraint Tokens/Subgraph:

T_lin = {τ_j = (y_j, h_j, σ²_j) | j ∈ {1,...,2Nr}}

where h_j is the j-th row of H, encoding local likelihood constraints between received signals and transmitted symbols.

Symbol Tokens/Subgraph:

T_sym = {x_i^(l) | i ∈ {1,...,2Nt}}

Corresponds to variable nodes of transmitted symbols, serving as query embeddings interacting with constraint tokens via cross-attention.
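
A minimal sketch of how the two token sets could be assembled as arrays is given below. The constraint tokens follow the definition τ_j = (y_j, h_j, σ²_j) directly; initializing symbol tokens from prior LLRs, with zeros when no prior is available, is an assumption made for illustration, and both function names are hypothetical.

```python
import numpy as np


def build_constraint_tokens(y, H, noise_var):
    """Stack (y_j, h_j, sigma2_j) per received dimension: shape (2Nr, 2Nt + 2)."""
    noise_var = np.broadcast_to(np.asarray(noise_var, dtype=float), y.shape)
    return np.concatenate([y[:, None], H, noise_var[:, None]], axis=1)


def build_symbol_tokens(num_real_tx, bits_per_real_dim, prior_llr=None):
    """Symbol tokens of shape (2Nt, Nbits/2).

    Assumption: tokens carry prior LLRs, and zero means "no prior information".
    """
    if prior_llr is None:
        return np.zeros((num_real_tx, bits_per_real_dim))
    return np.asarray(prior_llr, dtype=float).reshape(num_real_tx, bits_per_real_dim)


rng = np.random.default_rng(0)
Nr, Nt = 4, 2                                   # assumed toy sizes
H = rng.standard_normal((2 * Nr, 2 * Nt))       # stand-in real-valued channel
y = rng.standard_normal(2 * Nr)                 # stand-in received vector
t_lin = build_constraint_tokens(y, H, noise_var=0.1)
t_sym = build_symbol_tokens(2 * Nt, bits_per_real_dim=1)   # QPSK: 1 bit per real dim
print(t_lin.shape, t_sym.shape)
```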

2. Attention Mechanism Design

Self-Attention - Context Encoding: Provides robust context encoding within homogeneous token sets, ensuring consistency among similar entities:

t̃_j = ∑_{k=1}^N α_{jk} W^V t_k
α_{jk} = softmax((W^Q t_j)^T (W^K t_k) / √d_k)
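
The self-attention update above can be sketched in NumPy as follows. This is a single-head illustration with random, untrained weights; the dimensions are arbitrary assumptions, and the paper's actual layers may include multiple heads, residual connections, and normalization.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention over one homogeneous token set (rows = tokens)."""
    Q = tokens @ Wq.T                           # row j is (W^Q t_j)^T
    K = tokens @ Wk.T
    V = tokens @ Wv.T
    d_k = K.shape[1]
    alpha = softmax(Q @ K.T / np.sqrt(d_k), axis=1)   # alpha[j, k], normalized over k
    return alpha @ V, alpha


rng = np.random.default_rng(0)
N, d_model, d_k = 6, 8, 4                       # assumed toy dimensions
tokens = rng.standard_normal((N, d_model))
Wq = rng.standard_normal((d_k, d_model))
Wk = rng.standard_normal((d_k, d_model))
Wv = rng.standard_normal((d_model, d_model))
updated, alpha = self_attention(tokens, Wq, Wk, Wv)
print(updated.shape, alpha.sum(axis=1))
```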

Cross-Attention - Message Passing: Implements directed message passing between heterogeneous token types:

t̃_j = ∑_i α_{ij} W^V t_i
α_{ij} = softmax((W^Q t_j)^T (W^K t_i) / √d_k)
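
As a rough illustration of the cross-attention step, the sketch below lets symbol tokens act as queries over constraint tokens, with the softmax normalized over the source index i. Weights are random and untrained, all dimensions are assumptions, and both token sets are assumed to be already embedded to a common width.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, sources, Wq, Wk, Wv):
    """Messages from source tokens t_i to query tokens t_j (single head)."""
    Q = queries @ Wq.T
    K = sources @ Wk.T
    V = sources @ Wv.T
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)   # rows: j, columns: i
    return alpha @ V, alpha


rng = np.random.default_rng(0)
n_sym, n_lin, d_model, d_k = 4, 8, 8, 4         # assumed toy dimensions
sym_tokens = rng.standard_normal((n_sym, d_model))
lin_tokens = rng.standard_normal((n_lin, d_model))
Wq = rng.standard_normal((d_k, d_model))
Wk = rng.standard_normal((d_k, d_model))
Wv = rng.standard_normal((d_model, d_model))

messages, alpha = cross_attention(sym_tokens, lin_tokens, Wq, Wk, Wv)

# The attention operation alone does not depend on the order of the source tokens.
perm = rng.permutation(n_lin)
messages_perm, _ = cross_attention(sym_tokens, lin_tokens[perm], Wq, Wk, Wv)
print(alpha.shape, np.allclose(messages, messages_perm))
```

Swapping the roles of the two token sets gives the message flow in the opposite direction, from symbol tokens to constraint tokens.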

3. Soft-Input Soft-Output Interface

Soft-Input Embedding Module:

  • Symbol tokens: T_sym, dimension 2Nt × (Nbits/2)
  • Linear constraint tokens: T_lin, dimension 2Nr × (2Nt+2)
  • Processed independently via dedicated FFN with positional encoding

Soft-Output Module:

  • Receives embedding representations: dimension 2Nt × d_model
  • Processed via FFN + Sigmoid activation
  • Produces final soft output: dimension 2Nt × (Nbits/2)
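
The soft-output stage can be pictured as a small feed-forward head followed by a sigmoid, as in the sketch below. The hidden width, the ReLU nonlinearity, the random weights, and the LLR sign convention log(P(b=1)/P(b=0)) are all assumptions for illustration; only the input and output shapes come from the description above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_output_head(embeddings, W1, b1, W2, b2):
    """Map (2Nt, d_model) embeddings to (2Nt, Nbits/2) bit probabilities."""
    hidden = np.maximum(embeddings @ W1 + b1, 0.0)   # ReLU is an assumption
    logits = hidden @ W2 + b2
    return sigmoid(logits), logits


rng = np.random.default_rng(0)
Nt, d_model, d_hidden, bits_per_real_dim = 8, 128, 64, 1   # d_hidden is hypothetical
embeddings = rng.standard_normal((2 * Nt, d_model))
W1 = rng.standard_normal((d_model, d_hidden)) / np.sqrt(d_model)
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_hidden, bits_per_real_dim)) / np.sqrt(d_hidden)
b2 = np.zeros(bits_per_real_dim)

probs, logits = soft_output_head(embeddings, W1, b1, W2, b2)

# Assumed convention: LLR = log(P(b=1) / P(b=0)); this equals the pre-sigmoid logit.
llr = np.log(probs) - np.log1p(-probs)
print(probs.shape, np.allclose(llr, logits))
```

Because the sigmoid is the inverse of the logit function, a decoder that expects LLRs could be fed the pre-activation values directly under this convention.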

Technical Innovations

  1. Structured Attention Design: Unlike CrossMPT, SGT combines self-attention and cross-attention tailored to MIMO's homogeneous subgraph characteristics
  2. Information Preservation Advantage: Compared to QR-decomposition-based methods, graph-aware tokenization retains more symbol-level information
  3. Unified Framework: Integrates AMP-inspired updates with Transformer architecture, achieving interpretable message passing

Experimental Setup

Datasets

  • Channel Model: Rayleigh fading channel with perfect CSI
  • Modulation: QPSK (Quadrature Phase Shift Keying)
  • System Configuration: 8×8, 8×16, 16×16 MIMO systems
  • Noise: Additive White Gaussian Noise

Evaluation Metrics

  • BER (Bit Error Rate): Fraction of detected bits that differ from the transmitted bits
  • Training Loss: Convergence analysis
  • Runtime: Computational efficiency assessment
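
For reference, BER from soft outputs can be computed as in the sketch below. The hard-decision rule assumes the convention LLR = log(P(b=1)/P(b=0)), which the summary does not specify, and the bit and LLR values are made-up toy data.

```python
def hard_decisions(llrs):
    """Assumption: LLR = log(P(b=1)/P(b=0)), so a positive LLR maps to bit 1."""
    return [1 if llr > 0 else 0 for llr in llrs]


def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the detected bit differs from the sent bit."""
    if len(tx_bits) != len(rx_bits):
        raise ValueError("bit sequences must have the same length")
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)


tx_bits = [0, 1, 1, 0]              # toy transmitted bits
llrs = [-2.0, 3.1, -0.5, -1.2]      # toy detector outputs
ber = bit_error_rate(tx_bits, hard_decisions(llrs))
print(ber)  # one wrong bit out of four -> 0.25
```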

Comparison Methods

  • Classical Methods: LMMSE, OAMP, Maximum Likelihood
  • Deep Learning Methods: OAMPNet2, DetNet
  • Transformer Methods: Transformer-based MIMO, RE-MIMO
  • Ablation Studies: Cross-attention-free version, tokenization-only version
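
As a point of reference for the simplest baseline in this list, a textbook LMMSE detector for the real-valued model y = Hx + n can be sketched as follows. It assumes zero-mean symbols with variance symbol_var per real dimension and white noise; a generic real Gaussian matrix stands in for the channel, and the sizes and noise level are illustrative rather than the paper's settings.

```python
import numpy as np


def lmmse_detect(y, H, noise_var, symbol_var):
    """x_hat = (H^T H + (noise_var / symbol_var) I)^{-1} H^T y."""
    n_tx = H.shape[1]
    gram = H.T @ H + (noise_var / symbol_var) * np.eye(n_tx)
    return np.linalg.solve(gram, H.T @ y)


rng = np.random.default_rng(0)
n_rx, n_tx = 16, 8                 # assumed real-valued dimensions (2Nr, 2Nt)
H = rng.standard_normal((n_rx, n_tx)) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=n_tx) / np.sqrt(2)    # QPSK per real dimension
noise_var = 1e-8                   # near-noiseless, to check the estimator
y = H @ x + np.sqrt(noise_var) * rng.standard_normal(n_rx)

x_hat = lmmse_detect(y, H, noise_var, symbol_var=0.5)
x_hard = np.sign(x_hat) / np.sqrt(2)                   # slice back to +-1/sqrt(2)
print(np.allclose(x_hard, x))
```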

Implementation Details

  • Model Dimension: d_model = 128
  • Network Layers: L = 8 layers
  • Training Parameters: Learning rate, batch size, and number of training steps are kept the same across the compared models
  • Hardware Platform: RTX 4090 GPU

Experimental Results

Main Results

BER Performance Comparison:

  • In 8×8 MIMO systems, SGT significantly outperforms OAMPNet2 and Transformer-based MIMO
  • Maintains performance advantages in 8×16 and 16×16 systems
  • Approaches the performance of ML detection, which serves as the upper bound on accuracy

Runtime Analysis (RTX 4090 GPU, 1000 samples):

Method                    8×8        8×16       16×16
LMMSE                     0.00679s   0.00718s   0.00742s
OAMP                      0.02208s   0.02234s   0.02408s
OAMPNet2                  0.03333s   0.03415s   0.03507s
Transformer-based MIMO    0.03844s   0.03924s   0.04028s
SGT (Proposed)            0.09351s   0.09464s   0.09498s
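
The reported timings are easier to compare as ratios. The sketch below simply re-enters the numbers from the runtime table and computes how much slower SGT is than each baseline, along with how each method's runtime changes from 8×8 to 16×16; the variable names are illustrative.

```python
# Runtimes in seconds for 1000 samples, copied from the runtime table.
runtimes = {
    "LMMSE": (0.00679, 0.00718, 0.00742),
    "OAMP": (0.02208, 0.02234, 0.02408),
    "OAMPNet2": (0.03333, 0.03415, 0.03507),
    "Transformer-based MIMO": (0.03844, 0.03924, 0.04028),
    "SGT (Proposed)": (0.09351, 0.09464, 0.09498),
}
configs = ("8x8", "8x16", "16x16")

sgt = runtimes["SGT (Proposed)"]
# How many times slower SGT is than each baseline, per configuration.
slowdown = {
    name: tuple(s / t for s, t in zip(sgt, times))
    for name, times in runtimes.items()
    if name != "SGT (Proposed)"
}
# Runtime growth of each method when moving from 8x8 to 16x16.
growth = {name: times[-1] / times[0] for name, times in runtimes.items()}

for name, ratios in slowdown.items():
    formatted = ", ".join(f"{c}: {r:.1f}x" for c, r in zip(configs, ratios))
    print(f"SGT vs {name}: {formatted}")
```

On these numbers SGT is roughly 2.4 to 2.8 times slower than OAMPNet2 and Transformer-based MIMO, about 4 times slower than OAMP, and about 13 to 14 times slower than LMMSE, while its own runtime rises by under 2% between 8×8 and 16×16.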

Ablation Studies

Role of Graph-Aware Tokenization:

  • Complete tokenization achieves lower final loss in small-scale systems (8×8)
  • Validates capability to preserve detailed symbol-level information
  • Requires cross-attention cooperation in large-scale systems

Contribution of Cross-Attention:

  • Enables faster convergence and superior final accuracy
  • Provides guidance similar to QR preprocessing but fully learnable
  • Alleviates training stagnation in large-scale systems

Complexity Analysis

Asymptotic Complexity Comparison:

Method                    Complexity                           Growth Trend
ML Detection              O(M^Nt)                              Exponential
OAMP/OAMPNet              O(K·Nr·Nt²)                          Cubic
Transformer-based MIMO    O(Nr·Nt² + L·Nt²·d_model)            Cubic
SGT                       L·O(Nr² + Nt² + Nr·Nt)·d_model       Quadratic
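
To see what the growth trends mean in practice, the sketch below evaluates the leading terms from the complexity table for square systems and reports how each count changes when the antenna numbers double. Constants hidden by the O-notation are dropped, so only the scaling is meaningful. K is taken here to be an iteration count with a hypothetical value of 10, which the summary does not state; L = 8 and d_model = 128 follow the implementation details.

```python
def oamp_ops(n_rx, n_tx, iterations):
    """Leading term K * Nr * Nt^2."""
    return iterations * n_rx * n_tx ** 2


def transformer_mimo_ops(n_rx, n_tx, layers, d_model):
    """Leading terms Nr * Nt^2 + L * Nt^2 * d_model."""
    return n_rx * n_tx ** 2 + layers * n_tx ** 2 * d_model


def sgt_ops(n_rx, n_tx, layers, d_model):
    """Leading term L * (Nr^2 + Nt^2 + Nr * Nt) * d_model."""
    return layers * (n_rx ** 2 + n_tx ** 2 + n_rx * n_tx) * d_model


K, L, D_MODEL = 10, 8, 128      # K = 10 is an assumed iteration count


def count_all(n):
    return {
        "OAMP/OAMPNet": oamp_ops(n, n, K),
        "Transformer-based MIMO": transformer_mimo_ops(n, n, L, D_MODEL),
        "SGT": sgt_ops(n, n, L, D_MODEL),
    }


small, large = count_all(8), count_all(16)
growth = {name: large[name] / small[name] for name in small}
for name, factor in growth.items():
    print(f"{name}: x{factor:.2f} when going from 8x8 to 16x16")
```

Doubling both dimensions multiplies the SGT term by 4 and the OAMP term by 8, which is the quadratic versus cubic distinction in the table.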

Related Work

MIMO Detection Methods Development

  1. Classical Methods: From linear detection (MMSE) to nonlinear detection (ML)
  2. Message-Passing Algorithms: Evolution and limitations of AMP series algorithms
  3. Deep Learning Methods: Evolution from DetNet to deep unfolding approaches

Transformer Applications in Communications

  1. Channel Decoding: ECCT leverages LDPC Tanner graphs; CrossMPT simulates message passing via cross-attention
  2. MIMO Detection: Contributions and limitations of RE-MIMO and Transformer-based MIMO

Positioning of This Work

SGT is the first MIMO detector explicitly integrating factor graph structure into Transformer architecture, unifying context encoding and message passing.

Conclusions and Discussion

Main Conclusions

  1. SGT successfully combines Transformer's context modeling capability with factor graph's structured message passing
  2. Achieves near-ML performance in small-scale MIMO systems while maintaining computational efficiency
  3. Soft-input soft-output interface provides flexibility for integration with other receiver modules
  4. Quadratic complexity growth offers better scalability for large-scale systems

Limitations

  1. Computational Overhead: Although complexity grows more slowly with system size, absolute runtime remains higher than that of traditional methods
  2. Large-Scale Validation: Detection performance in ultra-large-scale MIMO settings requires further investigation
  3. Theoretical Analysis: Lacks rigorous theoretical convergence analysis
  4. Channel Adaptability: Primarily validated on Rayleigh fading channels; adaptability to other channel models needs exploration

Future Directions

  1. Further optimize computational efficiency to reduce absolute runtime
  2. Extend validation to larger-scale MIMO systems
  3. Investigate robustness under different channel conditions
  4. Joint optimization with other receiver components

In-Depth Evaluation

Strengths

  1. Strong Innovation: First explicit integration of factor graph structure into Transformer with novel design
  2. Solid Theoretical Foundation: Message passing inspired by AMP framework has solid theoretical support
  3. Comprehensive Experiments: Includes detailed ablation studies and complexity analysis
  4. High Practical Value: Soft-input soft-output interface enhances system integration flexibility
  5. Clear Presentation: Technical details accurately described with intuitive figures

Weaknesses

  1. Limited Performance Gains: Improvements over baselines are consistent but modest
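  2. Computational Efficiency: In the reported timings, runtime is roughly 2.4 to 2.8 times that of OAMPNet2 and Transformer-based MIMO, about 4 times that of OAMP, and about 13 to 14 times that of LMMSE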
  3. Limited Validation Scope: Primarily validated on small-scale systems and specific channel conditions
  4. Insufficient Theoretical Analysis: Lacks convergence and optimality guarantees
  5. Incomplete Comparisons: Missing comparisons with latest deep learning MIMO detection methods

Impact

  1. Academic Contribution: Provides new insights for Transformer applications in structured signal processing problems
  2. Practical Value: Offers interpretable framework for next-generation deep learning MIMO detectors
  3. Reproducibility: Sufficient technical detail facilitates reproduction and extension

Applicable Scenarios

  1. Small to Medium-Scale MIMO Systems: Clear performance advantages
  2. Receiver Systems Requiring Soft Information Exchange: SISO interface provides flexibility
  3. Applications Requiring Interpretability: Structured design facilitates understanding and debugging
  4. Research Prototype Systems: Provides foundational framework for further algorithm development

References

The paper cites important literature in MIMO detection, message-passing algorithms, deep learning, and Transformers, particularly:

  • Foundational literature on AMP series algorithms [1-3]
  • Representative works on deep unfolding methods [4-6]
  • Original Transformer architecture paper [7]
  • Related Transformer-based communication system works [8-11]

Overall Assessment: This is a technically innovative paper that successfully combines the Transformer architecture with the factor graph structure of MIMO detection, proposing the SGT method with a solid theoretical foundation and practical value. While there remains room for improvement in computational efficiency and in the size of the performance gains, it provides a valuable exploration of deep learning applications in structured signal processing problems.