In-Context Learning for Non-Stationary MIMO Equalization

Jiang, Qin, Zhu

Channel equalization is fundamental for mitigating distortions such as frequency-selective fading and inter-symbol interference. Unlike standard supervised learning approaches that require costly retraining or fine-tuning for each new task, in-context learning (ICL) adapts to new channels at inference time with only a few examples. However, existing ICL-based equalizers are primarily developed for and evaluated on static channels within the context window. Indeed, to our knowledge, prior principled analyses and theoretical studies of ICL focus exclusively on the stationary setting, where the function remains fixed within the context. In this paper, we investigate the ability of ICL to address non-stationary problems through the lens of time-varying channel equalization. We employ a principled framework for designing efficient attention mechanisms with improved adaptivity in non-stationary tasks, leveraging algorithms from adaptive signal processing to guide better designs. For example, new attention variants can be derived from the Least Mean Square (LMS) adaptive algorithm, a Least Root Mean Square (LRMS) formulation for enhanced robustness, or multi-step gradient updates for improved long-term tracking. Experimental results demonstrate that ICL holds strong promise for non-stationary MIMO equalization, and that attention mechanisms inspired by classical adaptive algorithms can substantially enhance adaptability and performance in dynamic environments. Our findings may provide critical insights for developing next-generation wireless foundation models with stronger adaptability and robustness.

Basic Information

Paper ID: 2510.08711
Title: In-Context Learning for Non-Stationary MIMO Equalization
Authors: Jiachen Jiang¹, Zhen Qin²³⁴, Zhihui Zhu¹
- ¹Department of Computer Science and Engineering, The Ohio State University
- ²³⁴Institute for Computational Discovery and Engineering, Department of Electrical Engineering and Computer Science, Department of Statistics, University of Michigan
Classification: cs.LG cs.AI
Submission Date: October 9, 2025 to arXiv
Paper Link: https://arxiv.org/abs/2510.08711

Abstract

Channel equalization is a fundamental technique for mitigating distortions such as frequency-selective fading and inter-symbol interference. Unlike standard supervised learning methods that require expensive retraining or fine-tuning for each new task, in-context learning (ICL) enables adaptation to new channels at inference time using only a few examples. However, existing ICL-based equalizers have been primarily developed and evaluated for static channels within the context window. To the authors' knowledge, prior principled analyses and theoretical studies of ICL have focused exclusively on stationary settings where the function remains fixed within the context. This paper investigates ICL's capability to address non-stationary problems through the lens of time-varying channel equalization. The authors employ a principled framework to design efficient attention mechanisms with improved adaptability, leveraging adaptive signal processing algorithms to guide better design choices.

Research Background and Motivation

Problem Definition

Channel equalization is a core technology in wireless communication systems for compensating channel-induced distortions, such as frequency-selective fading and inter-symbol interference. In time-varying channel environments, the channel matrix evolves dynamically and is typically only partially observable, requiring the equalizer to continuously adapt based on limited or noisy observations.

Limitations of Existing Methods

Traditional Methods: Zero-forcing (ZF) equalization, linear minimum mean square error (LMMSE) equalizers, and adaptive equalizers require precise channel knowledge
Learning Methods: Deep learning, meta-learning, and reinforcement learning approaches typically require training independent models for each task or involve additional parameter updates
Existing ICL Methods: Primarily assume static channels within the context window and use standard softmax attention, which may hinder capturing rapid channel variations and temporal correlations

Research Motivation

The paper addresses two core questions:

Can ICL not only identify tasks from context but also track time-varying changes in tasks?
In non-stationary settings, is softmax attention optimal, or can new attention mechanism variants be developed to enhance adaptability?

Core Contributions

Extended ICL Framework: Extends ICL from function classes to time-varying function classes, instantiated for channel equalization problems
Novel Attention Mechanisms: Proposes a framework for designing attention mechanisms based on classical adaptive signal processing algorithms
Three Attention Variants:
- LMS Attention: Based on the least mean square (LMS) adaptive algorithm
- Multi-LMS Attention: Multi-step update strategy to capture long-term dynamics
- LRMS Attention: Based on the least root mean square (LRMS) formulation for enhanced robustness
Theoretical Connections: Establishes principled connections between LMS-inspired updates and the DeltaNet attention mechanism

Methodology Details

Task Definition

Given a context of K prior input-output pairs, C = {(x_i, y_i)}_{i=1}^{K}, the objective is to infer the transmitted signal x_{K+1} from the new received observation y_{K+1} without explicit knowledge of the underlying channel.

Channel Model

The channel of the time-varying m₁×m₂ MIMO system follows a first-order autoregressive model:

Hᵢ = ρHᵢ₋₁ + √(1-ρ²)Wᵢ, i = 2,...,K

where:

ρ ∈ [0,1): Memory factor controlling the channel time-variation rate
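Hᵢ ∈ ℂ^{m₂×m₁}: Complex-valued channel matrix at time step i, mapping m₁ transmitted symbols to m₂ received observations
Wᵢ ~ CN(0, σ_ρ² I): Complex Gaussian innovation matrix that drives the channel variation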

Discrete-time MIMO system model, where Q_b(·) is the b-bit uniform quantizer and eᵢ is the additive noise:

yᵢ = Q_b(Hᵢxᵢ + eᵢ), i = 1,...,K
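
The channel recursion and the quantized observation model can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' data pipeline: the function name simulate_context, the initial draw H₁ ~ CN(0, I), the mid-rise quantizer applied separately to real and imaginary parts, and the per-symbol SNR definition are all choices made here for the example.

```python
import numpy as np

def simulate_context(K=20, m1=2, m2=2, rho=0.95, sigma_w=0.1,
                     snr_db=20.0, b=4, seed=0):
    """Draw K (x_i, y_i) pairs from an AR(1) channel with quantized outputs."""
    rng = np.random.default_rng(seed)

    def cgauss(shape, std):
        # Circularly symmetric complex Gaussian, variance std**2 per entry.
        re = rng.standard_normal(shape)
        im = rng.standard_normal(shape)
        return std * (re + 1j * im) / np.sqrt(2)

    def quantize(z, bits, lo=-4.0, hi=4.0):
        # Assumption: mid-rise uniform quantizer on [lo, hi], applied
        # separately to the real and imaginary parts.
        levels = 2 ** bits
        step = (hi - lo) / levels

        def q(r):
            idx = np.clip(np.floor((r - lo) / step), 0, levels - 1)
            return lo + (idx + 0.5) * step

        return q(z.real) + 1j * q(z.imag)

    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    # Assumption: unit-power symbols, so the noise variance is 10^(-SNR/10).
    noise_std = 10 ** (-snr_db / 20)

    H = cgauss((m2, m1), 1.0)  # assumed distribution of the initial channel
    xs, ys = [], []
    for i in range(K):
        if i > 0:
            # H_i = rho * H_{i-1} + sqrt(1 - rho^2) * W_i
            H = rho * H + np.sqrt(1 - rho ** 2) * cgauss((m2, m1), sigma_w)
        x = rng.choice(qpsk, size=m1)
        y = quantize(H @ x + cgauss(m2, noise_std), b)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```

Calling simulate_context(K=20, b=1) would give a heavily quantized context, while rho close to 1 gives a slowly varying channel.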

Adaptive Attention Mechanism Design

1. LMS Attention

After removing the softmax function, the output becomes oᵢ = Sᵢqᵢ, where the state matrix Sᵢ is updated by solving a test-time regression problem:

Sᵢ ≈ argmin_{S∈ℝᵈˣᵈ} L(S) = 1/2 Σⱼ₌₁ᶦ ||vⱼ - Skⱼ||₂²

Updated via one-step gradient descent on the most recent pair (kᵢ, vᵢ), with step size βᵢ:

Sᵢ = Sᵢ₋₁ - βᵢ(Sᵢ₋₁kᵢ - vᵢ)kᵢᵀ
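
A minimal NumPy sketch of this recursion for a single head is given below. The function name lms_attention, the zero initial state, and the real-valued (n, d) array layout are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def lms_attention(q, k, v, beta):
    """Softmax-free attention driven by the LMS state update.

    q, k, v: float arrays of shape (n, d); beta: array of shape (n,).
    Returns the outputs o_i = S_i q_i as an array of shape (n, d).
    """
    n, d = k.shape
    S = np.zeros((d, d))  # assumption: the state starts at zero
    out = np.zeros((n, d))
    for i in range(n):
        err = S @ k[i] - v[i]                   # prediction error on the newest pair
        S = S - beta[i] * np.outer(err, k[i])   # S_i = S_{i-1} - beta_i (S_{i-1} k_i - v_i) k_i^T
        out[i] = S @ q[i]                       # o_i = S_i q_i
    return out
```

With a unit-norm key, beta = 1, and q = k, a single step stores the pair exactly and the output equals the value vector.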

2. Multi-LMS Attention

To improve adaptation speed and stability, a closed-form M-step extension is proposed:

Sᵢ = Sᵢ₋₁ - [1-(1-βᵢ||kᵢ||₂²)ᴹ]/||kᵢ||₂² (Sᵢ₋₁kᵢ - vᵢ)kᵢᵀ
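
The closed form follows because each gradient step on the same pair shrinks the error by the factor (1 - βᵢ||kᵢ||₂²), so M steps sum to a geometric series. The sketch below checks this numerically by comparing M explicit LMS steps with the one-shot expression; the function names are hypothetical.

```python
import numpy as np

def lms_step(S, k, v, beta):
    # One gradient step on 1/2 * ||v - S k||^2.
    return S - beta * np.outer(S @ k - v, k)

def multi_lms_step(S, k, v, beta, M):
    # Closed form of M repeated gradient steps on the same pair (k, v).
    kk = k @ k
    gain = (1.0 - (1.0 - beta * kk) ** M) / kk
    return S - gain * np.outer(S @ k - v, k)

rng = np.random.default_rng(0)
d, M, beta = 4, 5, 0.1
S0 = rng.standard_normal((d, d))
k = rng.standard_normal(d)
v = rng.standard_normal(d)

S_iter = S0
for _ in range(M):
    S_iter = lms_step(S_iter, k, v, beta)

S_closed = multi_lms_step(S0, k, v, beta, M)
```

Here S_iter and S_closed agree up to floating-point error, and setting M = 1 recovers the single-step LMS update.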

3. LRMS Attention

Employs root mean square loss for enhanced robustness:

L(S) = 1/2 Σⱼ₌₁ᶦ ||vⱼ - Skⱼ||₂

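Corresponding recursive form, a normalized gradient step on the most recent pair with the constant 1/2 absorbed into the step size βᵢ: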

Sᵢ = Sᵢ₋₁ - βᵢ [(Sᵢ₋₁kᵢ - vᵢ)/||Sᵢ₋₁kᵢ - vᵢ||₂] kᵢᵀ
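
The effect of the normalization can be seen in a small sketch: the size of the update depends on βᵢ and ||kᵢ||₂ but not on the magnitude of the error, so an outlier in vᵢ cannot produce an arbitrarily large step. The function name lrms_step and the small eps guard against division by zero are assumptions added here, not part of the source.

```python
import numpy as np

def lrms_step(S, k, v, beta, eps=1e-12):
    # Normalized-error update: only the direction of the error is used.
    err = S @ k - v
    unit_err = err / (np.linalg.norm(err) + eps)  # eps is an added safeguard
    return S - beta * np.outer(unit_err, k)

rng = np.random.default_rng(0)
d, beta = 4, 0.1
S = rng.standard_normal((d, d))
k = rng.standard_normal(d)
v = rng.standard_normal(d)

step_clean = np.linalg.norm(lrms_step(S, k, v, beta) - S)
step_outlier = np.linalg.norm(lrms_step(S, k, 100.0 * v, beta) - S)
```

Both step_clean and step_outlier equal beta * ||k||₂ up to rounding, whereas the plain LMS step would grow in proportion to the error.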

Technical Innovations

Theoretical Foundation: Establishes theoretical connections between classical adaptive filtering and modern attention mechanisms
Computational Efficiency: LMS attention avoids the computational overhead of softmax
Robustness Design: LRMS adaptively down-weights unreliable updates through normalization
Long-term Tracking: Multi-LMS improves long-term channel dynamics tracking through multi-step updates

Experimental Setup

Dataset and Model

Model Architecture: Two-layer GPT-2 transformer (embedding dimension 64, 4 attention heads per layer)
Channel Configuration: 2×2 time-varying MIMO system
Input Signal: Normalized QPSK constellation
Quantization: b-bit uniform quantizer with range [-4, 4]
Training Set Size: 8192 channels used for pre-training
Context Length: K = 20

Evaluation Metrics

Mean squared error (MSE):

MSE(θ) = E[ ||f_θ(C, y_{K+1}) - x_{K+1}||² ]

Experimental Parameters

Memory factor ρ: Uniformly sampled from [0.9,1)
Signal-to-noise ratio (SNR): Sampled from [0, 30] dB
Quantization bits b: Sampled from the integer range [1, 6]
Channel variation noise level: σᵨ = 0.1
Training: Adam optimizer, 50,000 steps, batch size 128

Comparison Methods

LMMSE equalizer (theoretical baseline)
ICL equalizer with softmax attention
ICL equalizer with LMS attention

Experimental Results

Main Results

From the experimental results in Figure 1:

Overall Performance: ICL equalizers outperform LMMSE across all settings
Attention Mechanism Comparison: LMS attention performs comparably to or better than softmax attention
Parameter Sensitivity:
- Increasing memory factor ρ, SNR, or quantization bits consistently reduces estimation error
- LMS attention not only reduces computational burden but also maintains or improves accuracy

Ablation Studies

Multi-LMS vs Single-step LMS (Figure 2a)

Increasing the number of steps M generally improves performance
When M is too large, the model may overfit to current noisy observations, leading to performance degradation

LRMS vs LMS (Figure 2b)

Under low quantization bits (b=1), LRMS attention outperforms LMS attention
The LRMS mechanism effectively mitigates the effects of outliers and severe quantization noise

Experimental Findings

Computational Advantages: LMS attention avoids the computational overhead of nonlinear softmax functions
Robustness: LRMS exhibits more stable performance in noisy environments
Adaptability: Multi-step update strategies better capture long-term channel dynamics
Practicality: The proposed methods significantly enhance adaptability and performance in dynamic environments

Related Work

ICL Theoretical Research

Existing ICL theoretical analyses primarily focus on stationary settings where the function remains fixed within the context. To the authors' knowledge, this paper is the first to extend the study of ICL to non-stationary scenarios.

Channel Equalization Methods

Classical Methods: ZF, LMMSE, adaptive equalizers, decision feedback equalizers, etc.
Machine Learning Methods: Deep learning, meta-learning, reinforcement learning, graph neural networks, etc.
ICL Methods: Recently emerging transformer-based sequence model equalizers

Attention Mechanism Design

The paper draws on research connecting transformers with Kalman filters, test-time regression, and state-space models.

Conclusions and Discussion

Main Conclusions

ICL can effectively handle non-stationary MIMO equalization tasks
Attention mechanisms inspired by classical adaptive algorithms significantly enhance adaptability and performance in dynamic environments
Establishes a theoretical bridge between adaptive signal processing and modern attention mechanisms

Limitations

Experimental Scale: Validation only on 2×2 MIMO systems; performance on larger-scale systems remains to be verified
Channel Model: Employs a specific autoregressive channel model; applicability to other channel models requires further investigation
Theoretical Analysis: Lacks theoretical guarantees on convergence and generalization capability for non-stationary ICL

Future Directions

Develop next-generation wireless foundation models with stronger adaptability and robustness
Extend to more complex channel environments and larger-scale MIMO systems
Provide theoretical analysis frameworks for non-stationary ICL

In-Depth Evaluation

Strengths

Strong Innovation: First to extend ICL to non-stationary settings, filling a theoretical gap
Principled Methodology: Designs attention mechanisms based on classical adaptive algorithms with solid theoretical foundations
High Practical Value: Addresses important problems in practical wireless communications
Comprehensive Experiments: Covers multiple parameter settings and comparison methods
Clear Presentation: Accurate technical descriptions and rigorous mathematical derivations

Weaknesses

Limited Experimental Scale: Validation only on small-scale MIMO systems
Insufficient Theoretical Analysis: Lacks theoretical guarantees on convergence and generalization capability
Limited Comparison Methods: Lacks comparison with other advanced adaptive equalization methods
Practical Deployment Considerations: Does not address complexities and constraints in real systems

Impact

Academic Contribution: Opens new directions for ICL theoretical research
Practical Value: Provides new insights for wireless communication system design
Cross-domain Impact: Bridges machine learning and signal processing fields
Reproducibility: Provides detailed experimental settings and implementation details

Applicable Scenarios

Time-varying Channel Environments: Mobile communications, satellite communications, and other dynamic environments
Resource-constrained Systems: Scenarios requiring rapid adaptation with limited computational resources
Multi-task Learning: Applications requiring rapid switching across different channel conditions
Edge Computing: Scenarios requiring real-time adaptation on edge devices

References

The paper cites 31 relevant references covering multiple domains including channel equalization, adaptive filtering, machine learning, and attention mechanisms, providing solid theoretical foundations and comprehensive background research.

Overall Assessment: This is a high-quality research paper with significant contributions in both theoretical innovation and practical value. The paper is the first to extend ICL to non-stationary settings, and the proposed methods have solid theoretical foundations and good experimental validation. While there is room for improvement in experimental scale and theoretical analysis, the work provides important insights and directions for related fields.