
Soft Graph Transformer for MIMO Detection

Hong, Liu, Bian et al.

We propose the Soft Graph Transformer (SGT), a soft-input-soft-output neural architecture designed for MIMO detection. While Maximum Likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible in large systems, and conventional message-passing algorithms rely on asymptotic assumptions that often fail in finite dimensions. Recent Transformer-based detectors show strong performance but typically overlook the MIMO factor graph structure and cannot exploit prior soft information. SGT addresses these limitations by combining self-attention, which encodes contextual dependencies within symbol and constraint subgraphs, with graph-aware cross-attention, which performs structured message passing across subgraphs. Its soft-input interface allows the integration of auxiliary priors, producing effective soft outputs while maintaining computational efficiency. Experiments demonstrate that SGT achieves near-ML performance and offers a flexible and interpretable framework for receiver systems that leverage soft priors.

Basic Information

Paper ID: 2509.12694
Title: Soft Graph Transformer for MIMO Detection
Authors: Jiadong Hong¹, Lei Liu¹, Xinyu Bian², Wenjie Wang², Zhaoyang Zhang¹
Affiliations: ¹College of Information and Electronic Engineering, Zhejiang University, ²Theoretical Laboratory, Huawei Technologies Co., Ltd.
Categories: cs.LG cs.IT eess.SP math.IT
Publication Date: September 17, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2509.12694

Abstract

This paper proposes the Soft Graph Transformer (SGT), a soft-input soft-output neural architecture specifically designed for MIMO detection. While maximum likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible for large-scale systems, and traditional message-passing algorithms rely on asymptotic assumptions that often fail in finite-dimensional settings. Recent Transformer-based detectors show promise but typically overlook the MIMO factor graph structure and cannot leverage prior soft information. SGT addresses these limitations by combining self-attention (encoding contextual dependencies within the symbol and constraint subgraphs) with graph-aware cross-attention (performing structured message passing across subgraphs). Its soft-input interface enables integration of auxiliary priors and produces effective soft outputs while maintaining computational efficiency.

Research Background and Motivation

Problem Definition

MIMO systems are fundamental to modern wireless communications, providing high spectral efficiency and robust links, but efficient symbol detection remains challenging.

Limitations of Existing Methods

Maximum Likelihood Detection: Achieves optimal accuracy but has computational complexity O(M^Nt) (M is constellation size), making it infeasible for large-scale systems
Message-Passing Algorithms: Methods like AMP, OAMP, MAMP have lower complexity but rely on asymptotic assumptions, proving fragile in finite-dimensional settings
Deep Unfolding Methods: Approaches such as OAMP-Net and DetNet learn algorithm parameters from data but remain constrained by assumptions of the underlying algorithms
Existing Transformer Methods:
- RE-MIMO lacks explicit graph awareness
- Transformer-based MIMO uses QR decomposition with high cost and ignores factor graph structure
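
To make the exponential cost of ML detection concrete, the following is a minimal brute-force sketch. It assumes a real-valued model with white Gaussian noise (so ML reduces to a minimum Euclidean distance search) and a per-dimension alphabet {-1, +1}, which corresponds to QPSK after real-valued decomposition. The function name `ml_detect` and the toy sizes are illustrative, not taken from the paper.

```python
import itertools
import numpy as np

def ml_detect(y, H, alphabet=(-1.0, 1.0)):
    """Exhaustive ML search: argmin over x in alphabet^n of ||y - Hx||^2."""
    n = H.shape[1]
    best_x, best_cost, visited = None, np.inf, 0
    for cand in itertools.product(alphabet, repeat=n):
        x = np.array(cand)
        cost = float(np.sum((y - H @ x) ** 2))
        visited += 1
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, visited

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 4))            # toy real-valued channel (2Nr = 8, 2Nt = 4)
x_true = rng.choice([-1.0, 1.0], size=4)
y = H @ x_true                             # noiseless, so the search must recover x_true
x_hat, visited = ml_detect(y, H)
```

Here `visited` equals 2^4 = 16 = 4^2, matching M^Nt for QPSK (M = 4) with Nt = 2; each additional transmit antenna multiplies the search space by M.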

Research Motivation

Inspired by classical message-passing MIMO detection, this work aims to design a Transformer architecture that:

Exploits MIMO factor graph structure
Supports soft-input soft-output interfaces
Provides a principled approach unifying context encoding and message passing

Core Contributions

Proposes SGT Architecture: First MIMO detector unifying factor-graph-guided self-attention and cross-attention within an AMP-style framework
Graph-Aware Tokenization Method: Transforms the weighted dense factor graph of MIMO systems into a dual-subgraph representation suitable for Transformer processing
Soft-Input Soft-Output Interface: Naturally integrates external prior information from other receiver modules
Performance Improvements: Achieves near-ML detection accuracy in small-scale MIMO systems, and its complexity grows only quadratically with system size, which favors large-scale systems

Methodology Details

Task Definition

Inputs:

Received signal vector y ∈ R^(2Nr)
Channel matrix H ∈ R^(2Nr×2Nt)
Noise variance information
Optional prior soft information (LLR)

Outputs:

Bit-level posterior log-likelihood ratios (LLR) suitable for channel decoders

Constraints: Linear system model y = Hx + n, where n ~ N(0,Σ)
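
The dimensions 2Nr and 2Nt suggest the standard real-valued decomposition of a complex MIMO model. The summary does not spell this step out, so the sketch below is an assumption about how the real-valued y and H are obtained; `complex_to_real` is a hypothetical helper name.

```python
import numpy as np

def complex_to_real(y_c, H_c):
    """Map a complex model y_c = H_c x_c to the equivalent real model y = H x."""
    y = np.concatenate([y_c.real, y_c.imag])
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag,  H_c.real]])
    return y, H

rng = np.random.default_rng(1)
Nr, Nt = 4, 2
H_c = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
x_c = (rng.choice([-1, 1], Nt) + 1j * rng.choice([-1, 1], Nt)) / np.sqrt(2)  # QPSK
y_c = H_c @ x_c                               # noise omitted to check the identity
y, H = complex_to_real(y_c, H_c)
x = np.concatenate([x_c.real, x_c.imag])      # stacked real and imaginary parts
```

With this stacking, `H @ x` reproduces `y` exactly, and each complex QPSK symbol contributes two real dimensions carrying one bit each.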

Model Architecture

1. Graph-Aware Tokenization

Decomposes the MIMO factor graph into two subgraphs:

Linear Constraint Tokens/Subgraph:

T_lin = {τ_j = (y_j, h_j, σ²_j) | j ∈ {1,...,2Nr}}

where h_j is the j-th row of H, encoding local likelihood constraints between received signals and transmitted symbols.

Symbol Tokens/Subgraph:

T_sym = {x_i^(l) | i ∈ {1,...,2Nt}}

Corresponds to variable nodes of transmitted symbols, serving as query embeddings interacting with constraint tokens via cross-attention.
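
The tokenization above can be sketched as follows. Each linear constraint token stacks (y_j, h_j, σ²_j) into a vector of length 2Nt + 2. Initializing symbol tokens to zero LLRs when no prior is available is an assumption made here for illustration; both function names are hypothetical.

```python
import numpy as np

def linear_constraint_tokens(y, H, sigma2):
    """Stack tau_j = (y_j, h_j, sigma2_j) row-wise, giving shape (2Nr, 2Nt + 2)."""
    sigma2 = np.broadcast_to(np.asarray(sigma2, dtype=float), y.shape)
    return np.concatenate([y[:, None], H, sigma2[:, None]], axis=1)

def symbol_tokens(n_real_dims, bits_per_dim, prior_llr=None):
    """One token per real symbol dimension; zero LLR means no prior (assumed)."""
    if prior_llr is None:
        return np.zeros((n_real_dims, bits_per_dim))
    return np.asarray(prior_llr, dtype=float).reshape(n_real_dims, bits_per_dim)

rng = np.random.default_rng(0)
Nr, Nt = 4, 2
y = rng.standard_normal(2 * Nr)
H = rng.standard_normal((2 * Nr, 2 * Nt))
T_lin = linear_constraint_tokens(y, H, sigma2=0.5)   # shape (8, 6)
T_sym = symbol_tokens(2 * Nt, bits_per_dim=1)        # shape (4, 1) for QPSK
```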

2. Attention Mechanism Design

Self-Attention - Context Encoding: Provides robust context encoding within homogeneous token sets, ensuring consistency among similar entities:

t̃_j = ∑_{k=1}^N α_{jk} W^V t_k
α_{jk} = softmax((W^Q t_j)^T (W^K t_k) / √d_k)
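
A minimal single-head NumPy sketch of the self-attention update above, with the softmax taken over k for each query j. Tokens are stored one per row, and the sizes are illustrative rather than the paper's.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(T, W_q, W_k, W_v):
    """T holds one token per row; returns updated tokens and weights alpha[j, k]."""
    Q, K, V = T @ W_q.T, T @ W_k.T, T @ W_v.T                # rows: W^Q t_j, W^K t_k, W^V t_k
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)   # normalise over k for each j
    return alpha @ V, alpha

rng = np.random.default_rng(2)
N, d_in, d_k = 6, 10, 4                       # illustrative sizes
T = rng.standard_normal((N, d_in))
W_q, W_k, W_v = (rng.standard_normal((d_k, d_in)) for _ in range(3))
T_new, alpha = self_attention(T, W_q, W_k, W_v)
```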

Cross-Attention - Message Passing: Implements directed message passing between heterogeneous token types:

t̃_j = ∑_i α_{ij} W^V t_i
α_{ij} = softmax((W^Q t_j)^T (W^K t_i) / √d_k)
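
The cross-attention update differs from self-attention only in that queries and keys/values come from different token sets. The sketch below uses embedded symbol tokens as queries and embedded constraint tokens as sources, following the description of symbol tokens as query embeddings; which other directions SGT uses is not stated in this summary, and the function name is hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(T_query, T_source, W_q, W_k, W_v):
    """Each query token t_j aggregates messages W^V t_i from all source tokens t_i."""
    Q = T_query @ W_q.T
    K = T_source @ W_k.T
    V = T_source @ W_v.T
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=1)   # normalise over sources i
    return alpha @ V, alpha

rng = np.random.default_rng(3)
d_model, d_k = 16, 8                          # illustrative sizes
sym = rng.standard_normal((4, d_model))       # 2Nt = 4 embedded symbol tokens
lin = rng.standard_normal((8, d_model))       # 2Nr = 8 embedded constraint tokens
W_q, W_k, W_v = (rng.standard_normal((d_k, d_model)) for _ in range(3))
sym_new, alpha = cross_attention(sym, lin, W_q, W_k, W_v)
```

Because every symbol token attends to every constraint token, the attention pattern mirrors the dense bipartite factor graph of the MIMO model.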

3. Soft-Input Soft-Output Interface

Soft-Input Embedding Module:

Symbol tokens: T_sym, dimension 2Nt × Nbits/2
Linear constraint tokens: T_lin, dimension 2Nr × (2Nt+2)
Processed independently via dedicated FFN with positional encoding

Soft-Output Module:

Receives embedding representations: dimension 2Nt × d_model
Processed via FFN + Sigmoid activation
Produces final soft output: dimension 2Nt × Nbits/2
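
A sketch of how an FFN + Sigmoid head relates to bit-level LLRs. The weights are random and untrained, the ReLU hidden layer is an assumption, and the sign convention LLR = log(P(b=1)/P(b=0)) is chosen here for illustration; under that convention the LLR is simply the pre-sigmoid logit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_output(E, W1, b1, W2, b2):
    """Two-layer FFN followed by a sigmoid; returns bit probabilities and logits."""
    hidden = np.maximum(E @ W1 + b1, 0.0)     # ReLU hidden layer (assumed)
    logits = hidden @ W2 + b2
    return sigmoid(logits), logits

def prob_to_llr(p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)            # avoid log(0)
    return np.log(p) - np.log1p(-p)

def llr_to_prob(llr):
    """Inverse map, usable for turning prior LLRs into soft inputs."""
    return sigmoid(llr)

rng = np.random.default_rng(5)
two_Nt, d_model, bits_per_dim = 4, 16, 1      # illustrative sizes; QPSK gives 1 bit per real dim
E = rng.standard_normal((two_Nt, d_model))
W1, b1 = 0.1 * rng.standard_normal((d_model, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.standard_normal((32, bits_per_dim)), np.zeros(bits_per_dim)
probs, logits = soft_output(E, W1, b1, W2, b2)
llr = prob_to_llr(probs)
```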

Technical Innovations

Structured Attention Design: Unlike CrossMPT, SGT combines self-attention and cross-attention tailored to MIMO's homogeneous subgraph characteristics
Information Preservation Advantage: Compared to QR-decomposition-based methods, graph-aware tokenization retains more symbol-level information
Unified Framework: Integrates AMP-inspired updates with Transformer architecture, achieving interpretable message passing

Experimental Setup

Datasets

Channel Model: Rayleigh fading channel with perfect CSI
Modulation: QPSK (Quadrature Phase Shift Keying)
System Configuration: 8×8, 8×16, 16×16 MIMO systems
Noise: Additive White Gaussian Noise
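
A minimal sketch of generating data for this setup (i.i.d. Rayleigh fading, QPSK, AWGN, perfect CSI). The SNR definition (total received signal power over noise power per receive antenna) and the bit-to-symbol mapping are assumptions, since the summary does not specify them; `simulate_mimo` is a hypothetical helper.

```python
import numpy as np

def simulate_mimo(n_samples, Nt, Nr, snr_db, rng):
    bits = rng.integers(0, 2, size=(n_samples, Nt, 2))
    # QPSK with unit average symbol energy: bit 0 -> +1, bit 1 -> -1 on each axis
    x = ((1 - 2 * bits[..., 0]) + 1j * (1 - 2 * bits[..., 1])) / np.sqrt(2)
    # i.i.d. Rayleigh fading: CN(0, 1) entries
    H = (rng.standard_normal((n_samples, Nr, Nt))
         + 1j * rng.standard_normal((n_samples, Nr, Nt))) / np.sqrt(2)
    sigma2 = Nt / 10 ** (snr_db / 10)         # assumed SNR = Nt / sigma2
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_samples, Nr))
                               + 1j * rng.standard_normal((n_samples, Nr)))
    y = np.einsum("bij,bj->bi", H, x) + n
    return bits, x, H, y, sigma2

rng = np.random.default_rng(6)
bits, x, H, y, sigma2 = simulate_mimo(100, Nt=8, Nr=16, snr_db=10.0, rng=rng)
```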

Evaluation Metrics

BER (Bit Error Rate): Fraction of transmitted bits detected incorrectly
Training Loss: Convergence analysis
Runtime: Computational efficiency assessment

Comparison Methods

Classical Methods: LMMSE, OAMP, Maximum Likelihood
Deep Learning Methods: OAMPNet2, DetNet
Transformer Methods: Transformer-based MIMO, RE-MIMO
Ablation Studies: Cross-attention-free version, tokenization-only version
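
For reference, the simplest baseline in this list, LMMSE, together with a BER computation, can be sketched as below. It assumes a real-valued model with unit-variance symbols in {-1, +1} and white noise of variance sigma2; it is a generic textbook form, not the exact baseline implementation used in the paper.

```python
import numpy as np

def lmmse_detect(y, H, sigma2):
    """Real-valued LMMSE estimate for unit-variance symbols, then a sign decision."""
    n = H.shape[1]
    x_soft = np.linalg.solve(H.T @ H + sigma2 * np.eye(n), H.T @ y)
    return np.sign(x_soft)

def bit_error_rate(bits_true, bits_hat):
    bits_true, bits_hat = np.asarray(bits_true), np.asarray(bits_hat)
    return float(np.mean(bits_true != bits_hat))

rng = np.random.default_rng(4)
H = rng.standard_normal((16, 8))              # toy real-valued channel
x = rng.choice([-1.0, 1.0], size=8)
sigma2 = 1e-6                                 # very high SNR, for a sanity check only
y = H @ x + np.sqrt(sigma2) * rng.standard_normal(16)
x_hat = lmmse_detect(y, H, sigma2)
ber = bit_error_rate(x < 0, x_hat < 0)        # one bit per real dimension
```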

Implementation Details

Model Dimension: d_model = 128
Network Layers: L = 8 layers
Training Parameters: Learning rate, batch size, and number of training steps kept the same across the compared models
Hardware Platform: RTX 4090 GPU

Experimental Results

Main Results

BER Performance Comparison:

In 8×8 MIMO systems, SGT significantly outperforms OAMPNet2 and Transformer-based MIMO
Maintains performance advantages in 8×16 and 16×16 systems
Approaches the performance of ML detection, the optimal reference

Runtime Analysis (RTX 4090 GPU, 1000 samples):

Method	8×8	8×16	16×16
LMMSE	0.00679s	0.00718s	0.00742s
OAMP	0.02208s	0.02234s	0.02408s
OAMPNet2	0.03333s	0.03415s	0.03507s
Transformer-based MIMO	0.03844s	0.03924s	0.04028s
SGT (Proposed)	0.09351s	0.09464s	0.09498s
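
The relative cost implied by the table can be checked with a few lines of arithmetic. The numbers are copied from the table above; the variable names are illustrative.

```python
runtime_s = {
    "LMMSE": (0.00679, 0.00718, 0.00742),
    "OAMP": (0.02208, 0.02234, 0.02408),
    "OAMPNet2": (0.03333, 0.03415, 0.03507),
    "Transformer-based MIMO": (0.03844, 0.03924, 0.04028),
    "SGT": (0.09351, 0.09464, 0.09498),
}
configs = ("8x8", "8x16", "16x16")

# How many times slower SGT is than each other method, per configuration
slowdown = {
    name: tuple(s / t for s, t in zip(runtime_s["SGT"], times))
    for name, times in runtime_s.items() if name != "SGT"
}
# Runtime growth of each method from the smallest to the largest configuration
growth = {name: times[-1] / times[0] for name, times in runtime_s.items()}

for name, ratios in slowdown.items():
    print(name, [f"{r:.2f}x" for r in ratios])
```

SGT takes roughly 2.4 to 2.8 times as long as the learned baselines, about 4 times as long as OAMP, and about 13 to 14 times as long as LMMSE, while its own runtime grows by under 2% from 8×8 to 16×16.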

Ablation Studies

Role of Graph-Aware Tokenization:

Complete tokenization achieves lower final loss in small-scale systems (8×8)
Validates capability to preserve detailed symbol-level information
Must be combined with cross-attention in large-scale systems

Contribution of Cross-Attention:

Enables faster convergence and superior final accuracy
Provides guidance similar to QR preprocessing but fully learnable
Alleviates training stagnation in large-scale systems

Complexity Analysis

Asymptotic Complexity Comparison:

Method	Complexity	Growth Trend
ML Detection	O(M^Nt)	Exponential
OAMP/OAMPNet	O(KNrNt²)	Cubic
Transformer-based MIMO	O(NrNt² + LNt²dmodel)	Cubic
SGT	O(L(Nr² + Nt² + NrNt)dmodel)	Quadratic
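
The growth trends in the table can be illustrated by evaluating the leading-order expressions with constants dropped. L = 8 and d_model = 128 follow the implementation details; the iteration count K = 10 is a hypothetical value, and these counts are not runtime predictions.

```python
def ml_ops(M, Nt):
    return M ** Nt

def oamp_ops(Nr, Nt, K):
    return K * Nr * Nt ** 2

def transformer_mimo_ops(Nr, Nt, L, d_model):
    return Nr * Nt ** 2 + L * Nt ** 2 * d_model

def sgt_ops(Nr, Nt, L, d_model):
    return L * (Nr ** 2 + Nt ** 2 + Nr * Nt) * d_model

L, d_model, K = 8, 128, 10                    # K is an assumed iteration count
for N in (16, 32, 64):
    print(N, oamp_ops(N, N, K), transformer_mimo_ops(N, N, L, d_model),
          sgt_ops(N, N, L, d_model))

# Doubling Nr = Nt multiplies the cubic term by 8 and the quadratic SGT term by 4
oamp_ratio = oamp_ops(32, 32, K) / oamp_ops(16, 16, K)
sgt_ratio = sgt_ops(32, 32, L, d_model) / sgt_ops(16, 16, L, d_model)
tf_ratio = (transformer_mimo_ops(32, 32, L, d_model)
            / transformer_mimo_ops(16, 16, L, d_model))
```

At small sizes the d_model factor dominates, which is consistent with SGT having the highest absolute runtime despite the slowest asymptotic growth.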

Related Work

MIMO Detection Methods Development

Classical Methods: From linear detection (MMSE) to nonlinear detection (ML)
Message-Passing Algorithms: Evolution and limitations of AMP series algorithms
Deep Learning Methods: Evolution from DetNet to deep unfolding approaches

Transformer Applications in Communications

Channel Decoding: ECCT leverages LDPC Tanner graphs; CrossMPT simulates message passing via cross-attention
MIMO Detection: Contributions and limitations of RE-MIMO and Transformer-based MIMO

Positioning of This Work

SGT is the first MIMO detector explicitly integrating factor graph structure into Transformer architecture, unifying context encoding and message passing.

Conclusions and Discussion

Main Conclusions

SGT successfully combines Transformer's context modeling capability with factor graph's structured message passing
Achieves near-ML performance in small-scale MIMO systems while maintaining computational efficiency
Soft-input soft-output interface provides flexibility for integration with other receiver modules
Quadratic complexity growth offers better scalability for large-scale systems

Limitations

Computational Overhead: Although complexity grows more slowly with system size, absolute runtime remains higher than that of every compared method in the reported runtime table
Large-Scale Validation: Detection performance in ultra-large-scale MIMO settings requires further investigation
Theoretical Analysis: Lacks rigorous theoretical convergence analysis
Channel Adaptability: Primarily validated on Rayleigh fading channels; adaptability to other channel models needs exploration

Future Directions

Further optimize computational efficiency to reduce absolute runtime
Extend validation to larger-scale MIMO systems
Investigate robustness under different channel conditions
Joint optimization with other receiver components

In-Depth Evaluation

Strengths

Strong Innovation: First explicit integration of factor graph structure into Transformer with novel design
Solid Theoretical Foundation: Message passing inspired by AMP framework has solid theoretical support
Comprehensive Experiments: Includes detailed ablation studies and complexity analysis
High Practical Value: Soft-input soft-output interface enhances system integration flexibility
Clear Presentation: Technical details accurately described with intuitive figures

Weaknesses

Limited Performance Gains: Improvements over baselines are consistent but modest
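Computational Efficiency: In the reported runtimes, SGT takes roughly 2.4 to 2.8 times as long as the learned baselines (OAMPNet2, Transformer-based MIMO), about 4 times as long as OAMP, and about 13 to 14 times as long as LMMSE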
Limited Validation Scope: Primarily validated on small-scale systems and specific channel conditions
Insufficient Theoretical Analysis: Lacks convergence and optimality guarantees
Incomplete Comparisons: Missing comparisons with latest deep learning MIMO detection methods

Impact

Academic Contribution: Provides new insights for Transformer applications in structured signal processing problems
Practical Value: Offers interpretable framework for next-generation deep learning MIMO detectors
Reproducibility: Sufficient technical detail facilitates reproduction and extension

Applicable Scenarios

Small to Medium-Scale MIMO Systems: Clear performance advantages
Receiver Systems Requiring Soft Information Exchange: Soft-input soft-output interface provides flexibility
Applications Requiring Interpretability: Structured design facilitates understanding and debugging
Research Prototype Systems: Provides foundational framework for further algorithm development

References

The paper cites important literature in MIMO detection, message-passing algorithms, deep learning, and Transformers, particularly:

Foundational literature on AMP series algorithms [1-3]
Representative works on deep unfolding methods [4-6]
Original Transformer architecture paper [7]
Related Transformer-based communication system works [8-11]

Overall Assessment: This is a technically innovative paper that successfully combines Transformer architecture with MIMO detection's factor graph structure, proposing the SGT method with solid theoretical foundation and practical value. While there remains room for improvement in computational efficiency and performance gain magnitude, it provides valuable exploration of deep learning applications in structured signal processing problems.