We propose the Soft Graph Transformer (SGT), a soft-input-soft-output neural architecture designed for MIMO detection. While Maximum Likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible in large systems, and conventional message-passing algorithms rely on asymptotic assumptions that often fail in finite dimensions. Recent Transformer-based detectors show strong performance but typically overlook the MIMO factor graph structure and cannot exploit prior soft information. SGT addresses these limitations by combining self-attention, which encodes contextual dependencies within symbol and constraint subgraphs, with graph-aware cross-attention, which performs structured message passing across subgraphs. Its soft-input interface allows the integration of auxiliary priors, producing effective soft outputs while maintaining computational efficiency. Experiments demonstrate that SGT achieves near-ML performance and offers a flexible and interpretable framework for receiver systems that leverage soft priors.
This paper proposes the Soft Graph Transformer (SGT), a soft-input soft-output neural architecture designed specifically for MIMO detection. While maximum likelihood (ML) detection achieves optimal accuracy, its exponential complexity makes it infeasible for large-scale systems, and traditional message-passing algorithms rely on asymptotic assumptions that often fail in finite-dimensional settings. Recent Transformer-based detectors show promise but typically overlook the MIMO factor graph structure and cannot leverage prior soft information. SGT addresses these limitations by combining self-attention, which encodes contextual dependencies within the symbol and constraint subgraphs, with graph-aware cross-attention, which performs structured message passing across subgraphs. Its soft-input interface enables the integration of auxiliary priors and produces effective soft outputs while maintaining computational efficiency.
MIMO systems are fundamental to modern wireless communications, providing high spectral efficiency and robust links, but efficient symbol detection remains a challenge.
Maximum Likelihood Detection: Achieves optimal accuracy but has computational complexity O(M^Nt), where M is the constellation size and Nt is the number of transmit antennas, making it infeasible for large-scale systems
Message-Passing Algorithms: Methods such as AMP, OAMP, and MAMP have lower complexity but rely on asymptotic assumptions, which makes them fragile in finite-dimensional settings
Deep Unfolding Methods: Approaches such as OAMP-Net and DetNet learn algorithm parameters from data but remain constrained by assumptions of the underlying algorithms
Existing Transformer Methods:
RE-MIMO lacks explicit graph awareness
Transformer-based MIMO relies on a costly QR decomposition and ignores the factor graph structure
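The O(M^Nt) cost of ML detection noted above can be made concrete with a small brute-force search. The following is a minimal sketch, not the paper's code: it assumes the standard linear model y = Hx + n (which this summary does not state explicitly), and the function name `ml_detect` and all variable names are hypothetical. The noiseless QPSK example is chosen only so the result is easy to check.

```python
import itertools
import numpy as np

def ml_detect(y, H, constellation):
    """Exhaustive ML detection: minimise ||y - Hx||^2 over all M^Nt candidates.

    Assumes the linear model y = Hx + n (an assumption of this sketch).
    Returns the best candidate and the number of candidates visited.
    """
    n_tx = H.shape[1]
    best_x, best_metric = None, np.inf
    n_candidates = 0
    # itertools.product enumerates every length-Nt symbol vector: M^Nt in total
    for cand in itertools.product(constellation, repeat=n_tx):
        x = np.array(cand)
        metric = np.linalg.norm(y - H @ x) ** 2
        n_candidates += 1
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, n_candidates

rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # M = 4
n_tx, n_rx = 3, 4
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x_true = rng.choice(qpsk, size=n_tx)
y = H @ x_true  # noiseless observation, for illustration only

x_hat, n_candidates = ml_detect(y, H, qpsk)
print(n_candidates)  # number of hypotheses tested, M^Nt
```

With M = 4 and Nt = 3 the loop visits 4^3 = 64 hypotheses; at Nt = 16 the same loop would need 4^16 (about 4.3 billion) evaluations, which is why exhaustive search does not scale.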
SGT Architecture: Proposes the first MIMO detector to unify factor-graph-guided self-attention and cross-attention within an AMP-style framework
Graph-Aware Tokenization Method: Transforms the weighted dense factor graph of MIMO systems into a dual-subgraph representation suitable for Transformer processing
Soft-Input Soft-Output Interface: Naturally integrates external prior information from other receiver modules
Performance Improvements: Achieves near-ML detection accuracy in small-scale MIMO systems, with complexity that grows only quadratically in large-scale systems
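To illustrate what a soft-input soft-output interface means in general, the sketch below uses a classical per-symbol demapper rather than SGT's learned network, whose internals this summary does not describe. It takes a prior over constellation points (the soft input), combines it with a Gaussian likelihood, and returns symbol posteriors and bit log-likelihood ratios (the soft outputs). The function `siso_demap`, the QPSK bit labelling, and the scalar observation model z = x + noise are all assumptions made for illustration.

```python
import numpy as np

def siso_demap(z, noise_var, constellation, bit_labels, prior=None):
    """Classical soft demapper (illustrative stand-in, not SGT itself).

    z:          (N,) complex per-symbol estimates, assumed z = x + CN(0, noise_var)
    bit_labels: (M, B) bits attached to each constellation point
    prior:      (N, M) prior probabilities (soft input); uniform if None
    Returns posteriors (N, M) and bit LLRs (N, B), LLR = log P(b=0) / P(b=1).
    """
    n, m = z.shape[0], constellation.shape[0]
    if prior is None:
        prior = np.full((n, m), 1.0 / m)
    log_lik = -np.abs(z[:, None] - constellation[None, :]) ** 2 / noise_var
    log_post = log_lik + np.log(prior + 1e-300)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    p0 = post @ (bit_labels == 0).astype(float)  # mass on points with bit = 0
    p1 = post @ (bit_labels == 1).astype(float)  # mass on points with bit = 1
    llr = np.log(p0 + 1e-12) - np.log(p1 + 1e-12)
    return post, llr

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # hypothetical bit mapping
z = np.array([0.1 + 0.1j])  # weak observation, leaning slightly towards 1+1j

post_flat, llr_flat = siso_demap(z, 1.0, qpsk, labels)
# A strong external prior for -1-1j, e.g. as a channel decoder might supply
strong_prior = np.array([[0.01, 0.01, 0.01, 0.97]])
post_prior, llr_prior = siso_demap(z, 1.0, qpsk, labels, prior=strong_prior)
```

With a uniform prior the weak observation decides the symbol; with the strong prior the decision flips, which is the behaviour a soft-input interface is meant to allow when other receiver modules supply side information.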
SGT is the first MIMO detector to explicitly integrate the factor graph structure into a Transformer architecture, unifying context encoding and message passing.
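One way to picture cross-attention acting as message passing between the two subgraphs is sketched below. This is a hedged illustration only: the summary does not give SGT's tokenization, layer layout, or how edge weights enter the attention scores. Here the symbol and constraint tokens are random placeholders, the additive bias log|H| is an assumption of this sketch, and the same projection matrices are reused in both directions purely for brevity.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, Wq, Wk, Wv, edge_bias=None):
    """Single-head cross-attention from one token set to another.

    edge_bias (n_query, n_key) is added to the scores; using channel
    magnitudes for it is an assumption of this sketch, not the paper's rule.
    """
    q = queries @ Wq
    k = keys_values @ Wk
    v = keys_values @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if edge_bias is not None:
        scores = scores + edge_bias
    attn = softmax(scores, axis=-1)  # each query's weights over keys sum to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
n_tx, n_rx, d = 4, 6, 8
H = rng.standard_normal((n_rx, n_tx))         # real-valued channel, for simplicity
sym_tokens = rng.standard_normal((n_tx, d))   # placeholder symbol-node tokens
con_tokens = rng.standard_normal((n_rx, d))   # placeholder constraint-node tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Hypothetical graph bias: stronger channel edges receive larger scores
edge_bias = np.log(np.abs(H.T) + 1e-9)        # shape (n_tx, n_rx)

# Constraint -> symbol messages, then symbol -> constraint messages
sym_update, attn = cross_attention(sym_tokens, con_tokens, Wq, Wk, Wv, edge_bias)
con_update, attn_rev = cross_attention(con_tokens, sym_tokens, Wq, Wk, Wv, edge_bias.T)
```

Each row of `attn` is a distribution over constraint nodes, so every symbol token is updated by a weighted sum of constraint-side values, which mirrors the two-directional exchange on a bipartite factor graph.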
The paper cites important literature in MIMO detection, message-passing algorithms, deep learning, and Transformers, particularly:
Foundational literature on the AMP family of algorithms [1-3]
Representative works on deep unfolding methods [4-6]
The original Transformer architecture paper [7]
Related Transformer-based communication system works [8-11]
Overall Assessment: This is a technically innovative paper that successfully combines the Transformer architecture with the factor graph structure of MIMO detection, proposing the SGT method with a solid theoretical foundation and practical value. While there remains room for improvement in computational efficiency and in the size of the performance gains, it provides a valuable exploration of deep learning applications in structured signal processing problems.