In-Context Learning for Non-Stationary MIMO Equalization

Jiang, Qin, Zhu
Channel equalization is fundamental for mitigating distortions such as frequency-selective fading and inter-symbol interference. Unlike standard supervised learning approaches that require costly retraining or fine-tuning for each new task, in-context learning (ICL) adapts to new channels at inference time with only a few examples. However, existing ICL-based equalizers are primarily developed for and evaluated on static channels within the context window. Indeed, to our knowledge, prior principled analyses and theoretical studies of ICL focus exclusively on the stationary setting, where the function remains fixed within the context. In this paper, we investigate the ability of ICL to address non-stationary problems through the lens of time-varying channel equalization. We employ a principled framework for designing efficient attention mechanisms with improved adaptivity in non-stationary tasks, leveraging algorithms from adaptive signal processing to guide better designs. For example, new attention variants can be derived from the Least Mean Square (LMS) adaptive algorithm, a Least Root Mean Square (LRMS) formulation for enhanced robustness, or multi-step gradient updates for improved long-term tracking. Experimental results demonstrate that ICL holds strong promise for non-stationary MIMO equalization, and that attention mechanisms inspired by classical adaptive algorithms can substantially enhance adaptability and performance in dynamic environments. Our findings may provide critical insights for developing next-generation wireless foundation models with stronger adaptability and robustness.

Basic Information

  • Paper ID: 2510.08711
  • Title: In-Context Learning for Non-Stationary MIMO Equalization
  • Authors: Jiachen Jiang¹, Zhen Qin²³⁴, Zhihui Zhu¹
    • ¹Department of Computer Science and Engineering, The Ohio State University
    • ²³⁴Institute for Computational Discovery and Engineering, Department of Electrical Engineering and Computer Science, Department of Statistics, University of Michigan
  • Classification: cs.LG cs.AI
  • Submission Date: October 9, 2025 (arXiv)
  • Paper Link: https://arxiv.org/abs/2510.08711

Abstract

Channel equalization is a fundamental technique for mitigating distortions such as frequency-selective fading and inter-symbol interference. Unlike standard supervised learning methods that require expensive retraining or fine-tuning for each new task, in-context learning (ICL) enables adaptation to new channels at inference time using only a few examples. However, existing ICL-based equalizers have been primarily developed and evaluated for static channels within the context window. To the authors' knowledge, prior principled analyses and theoretical studies of ICL have focused exclusively on stationary settings where the function remains fixed within the context. This paper investigates ICL's capability to address non-stationary problems through the lens of time-varying channel equalization. The authors employ a principled framework to design efficient attention mechanisms with improved adaptability, leveraging adaptive signal processing algorithms to guide better design choices.

Research Background and Motivation

Problem Definition

Channel equalization is a core technology in wireless communication systems for compensating channel-induced distortions, such as frequency-selective fading and inter-symbol interference. In time-varying channel environments, the channel matrix evolves dynamically and is typically only partially observable, requiring the equalizer to continuously adapt based on limited or noisy observations.

Limitations of Existing Methods

  1. Traditional Methods: Zero-forcing (ZF) equalization, linear minimum mean square error (LMMSE) equalizers, and adaptive equalizers require precise channel knowledge
  2. Learning Methods: Deep learning, meta-learning, and reinforcement learning approaches typically require training independent models for each task or involve additional parameter updates
  3. Existing ICL Methods: Primarily assume the channel is static within the context window and use standard softmax attention, which may hinder capturing rapid channel variations and temporal correlations

Research Motivation

The paper addresses two core questions:

  1. Can ICL not only identify tasks from context but also track time-varying changes in tasks?
  2. In non-stationary settings, is softmax attention optimal, or can new attention mechanism variants be developed to enhance adaptability?

Core Contributions

  1. Extended ICL Framework: Extends ICL from function classes to time-varying function classes, instantiated for channel equalization problems
  2. Novel Attention Mechanisms: Proposes a framework for designing attention mechanisms based on classical adaptive signal processing algorithms
  3. Three Attention Variants:
    • LMS Attention: Based on the least mean square (LMS) adaptive algorithm
    • Multi-LMS Attention: Multi-step update strategy to capture long-term dynamics
    • LRMS Attention: Based on the least root mean square (LRMS) formulation for enhanced robustness
  4. Theoretical Connections: Establishes principled connections between LMS-inspired updates and the DeltaNet attention mechanism

Methodology Details

Task Definition

Given a context of K prior input-output pairs, C = {(xᵢ, yᵢ)}, i = 1,...,K, the objective is to infer the transmitted signal x_{K+1} from the new received observation y_{K+1} without explicit knowledge of the underlying channel.

Channel Model

Employs a time-varying m₁×m₂ MIMO autoregressive model:

Hᵢ = ρHᵢ₋₁ + √(1-ρ²)Wᵢ, i = 2,...,K

where:

  • ρ ∈ [0,1): Memory factor controlling the channel time-variation rate (values closer to 1 mean slower variation)
  • Hᵢ ∈ ℂ^(m₂×m₁): Complex-valued channel matrix
  • Wᵢ ~ CN(0, σᵨ²I): Innovation (process noise) matrix driving the channel variation

Discrete-time MIMO system model:

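yᵢ = Q_b(Hᵢxᵢ + eᵢ), i = 1,...,K

where xᵢ is the transmitted symbol vector, eᵢ is additive noise, and Q_b(·) is the b-bit quantizer applied to the received signal.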
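The channel recursion and the quantized system model can be simulated in a few lines. The following is a minimal NumPy sketch, not the authors' code. Several details are assumptions made only for illustration: the initial channel H₁ is drawn from CN(0, I), symbols have unit power so the SNR alone sets the noise variance, and the quantizer is a mid-rise uniform quantizer applied separately to real and imaginary parts. The helper names `crandn`, `quantize`, and `simulate_context` are hypothetical.

```python
import numpy as np

def crandn(rng, shape, scale=1.0):
    """Circularly symmetric complex Gaussian samples with variance scale**2 per entry."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def quantize(y, b, lim=4.0):
    """b-bit uniform mid-rise quantizer on [-lim, lim], per real/imag part (assumed form)."""
    step = 2 * lim / 2**b
    def q(r):
        idx = np.clip(np.floor((r + lim) / step), 0, 2**b - 1)
        return -lim + (idx + 0.5) * step
    return q(y.real) + 1j * q(y.imag)

def simulate_context(K=20, m1=2, m2=2, rho=0.95, sigma_rho=0.1, snr_db=20.0, b=4, seed=0):
    """Generate K (x_i, y_i) pairs from the AR(1) channel and the quantized MIMO model."""
    rng = np.random.default_rng(seed)
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # normalized QPSK
    noise_std = 10 ** (-snr_db / 20)   # assumption: unit symbol power
    H = crandn(rng, (m2, m1))          # assumption: H_1 ~ CN(0, I)
    xs, ys = [], []
    for i in range(K):
        if i > 0:
            # H_i = rho * H_{i-1} + sqrt(1 - rho^2) * W_i
            H = rho * H + np.sqrt(1 - rho**2) * crandn(rng, (m2, m1), sigma_rho)
        x = rng.choice(qpsk, size=m1)
        e = crandn(rng, (m2,), noise_std)
        xs.append(x)
        ys.append(quantize(H @ x + e, b))   # y_i = Q_b(H_i x_i + e_i)
    return np.array(xs), np.array(ys)

xs, ys = simulate_context()
```

Here `xs` and `ys` each have shape (K, 2) for the 2×2 configuration, and every row pair forms one in-context example.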

Adaptive Attention Mechanism Design

1. LMS Attention

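After removing the softmax function, attention reduces to a linear recurrence with output oᵢ = Sᵢqᵢ, where qᵢ, kᵢ, and vᵢ are the query, key, and value vectors of token i, and the state matrix Sᵢ is updated by solving a test-time regression problem: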

Sᵢ ≈ argmin_{S∈ℝᵈˣᵈ} L(S) = 1/2 Σⱼ₌₁ᶦ ||vⱼ - Skⱼ||₂²

Taking one gradient descent step on the most recent term, with step size βᵢ, gives the LMS update:

Sᵢ = Sᵢ₋₁ - βᵢ(Sᵢ₋₁kᵢ - vᵢ)kᵢᵀ
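The recurrence above can be written as a short causal loop. This is a minimal single-head sketch in NumPy with real-valued vectors. The zero initial state S₀ and the function name `lms_attention` are assumptions for illustration, and the sketch omits the projections, normalization, and batching a full transformer layer would include.

```python
import numpy as np

def lms_attention(Q, K, V, beta):
    """Causal LMS-style linear attention.

    Q, K: (T, d) queries and keys; V: (T, d_v) values;
    beta: scalar or (T,) array of step sizes.
    Returns the outputs O of shape (T, d_v) and the final state S of shape (d_v, d).
    """
    T, d = K.shape
    beta = np.broadcast_to(beta, (T,))
    S = np.zeros((V.shape[1], d))              # assumption: S_0 = 0
    O = np.zeros_like(V, dtype=float)
    for i in range(T):
        k, v = K[i], V[i]
        err = S @ k - v                        # prediction error S_{i-1} k_i - v_i
        S = S - beta[i] * np.outer(err, k)     # one LMS (gradient) step
        O[i] = S @ Q[i]                        # o_i = S_i q_i
    return O, S
```

Expanding the update gives Sᵢ = Sᵢ₋₁(I - βᵢkᵢkᵢᵀ) + βᵢvᵢkᵢᵀ, which is the delta-rule form used by DeltaNet. When βᵢ||kᵢ||₂² = 1, the updated state reproduces the current pair exactly, that is Sᵢkᵢ = vᵢ.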

2. Multi-LMS Attention

To improve adaptation speed and stability, the single gradient step is extended to M steps on the current pair (kᵢ, vᵢ), which admits a closed form:

Sᵢ = Sᵢ₋₁ - [1-(1-βᵢ||kᵢ||₂²)ᴹ]/||kᵢ||₂² (Sᵢ₋₁kᵢ - vᵢ)kᵢᵀ
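The closed form follows because each gradient step on the same pair shrinks the residual Skᵢ - vᵢ by the factor (1 - βᵢ||kᵢ||₂²), so M steps sum to a geometric series. The sketch below checks this numerically by comparing M explicit LMS steps with the closed-form update. The function names and the test values (d = 4, M = 5, β = 0.1) are arbitrary choices for illustration.

```python
import numpy as np

def lms_step(S, k, v, beta):
    """One LMS step on the pair (k, v)."""
    return S - beta * np.outer(S @ k - v, k)

def multi_lms_step(S, k, v, beta, M):
    """Closed-form result of M LMS steps on the same pair (k, v)."""
    kk = k @ k                                    # ||k||_2^2
    gain = (1.0 - (1.0 - beta * kk) ** M) / kk
    return S - gain * np.outer(S @ k - v, k)

rng = np.random.default_rng(0)
d, M, beta = 4, 5, 0.1
S0 = rng.standard_normal((d, d))
k = rng.standard_normal(d)
v = rng.standard_normal(d)

S_iter = S0.copy()
for _ in range(M):
    S_iter = lms_step(S_iter, k, v, beta)         # explicit M-step loop

S_closed = multi_lms_step(S0, k, v, beta, M)
max_gap = np.max(np.abs(S_iter - S_closed))       # should be at floating-point level
```

Setting M = 1 recovers the single-step LMS update, since the gain then reduces to βᵢ.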

3. LRMS Attention

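Replaces the squared error with an unsquared ℓ₂-norm (root mean square) loss for enhanced robustness:

L(S) = 1/2 Σⱼ₌₁ᶦ ||vⱼ - Skⱼ||₂

Corresponding recursive form (one gradient step on the latest term, with the constant factor absorbed into βᵢ):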

Sᵢ = Sᵢ₋₁ - βᵢ [(Sᵢ₋₁kᵢ - vᵢ)/||Sᵢ₋₁kᵢ - vᵢ||₂] kᵢᵀ
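A short sketch makes the robustness argument concrete: because the error is normalized, the size of the LRMS update depends only on βᵢ and ||kᵢ||₂, not on how large the error is, whereas the LMS update grows linearly with the error. The small constant `eps` that guards against division by zero and the function names are assumptions added for illustration, not taken from the paper.

```python
import numpy as np

def lms_step(S, k, v, beta):
    """LMS update: step size scales with the error magnitude."""
    return S - beta * np.outer(S @ k - v, k)

def lrms_step(S, k, v, beta, eps=1e-8):
    """LRMS update: the error direction is kept, its magnitude is normalized away."""
    err = S @ k - v
    return S - beta * np.outer(err / (np.linalg.norm(err) + eps), k)

rng = np.random.default_rng(0)
d, beta = 4, 0.1
S = np.zeros((d, d))
k = rng.standard_normal(d)
v = rng.standard_normal(d)
v_outlier = 1000.0 * v                                      # a grossly corrupted target

# Frobenius norm of each update
lms_clean = np.linalg.norm(lms_step(S, k, v, beta) - S)
lms_out = np.linalg.norm(lms_step(S, k, v_outlier, beta) - S)
lrms_clean = np.linalg.norm(lrms_step(S, k, v, beta) - S)
lrms_out = np.linalg.norm(lrms_step(S, k, v_outlier, beta) - S)
```

In this setup the LMS update for the outlier is 1000 times larger than for the clean target, while both LRMS updates have norm close to β||k||₂.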

Technical Innovations

  1. Theoretical Foundation: Establishes theoretical connections between classical adaptive filtering and modern attention mechanisms
  2. Computational Efficiency: LMS attention avoids the computational overhead of softmax
  3. Robustness Design: LRMS adaptively down-weights unreliable updates through normalization
  4. Long-term Tracking: Multi-LMS improves long-term channel dynamics tracking through multi-step updates

Experimental Setup

Model and Data

  • Model Architecture: Two-layer GPT-2 transformer (embedding dimension 64, 4 attention heads per layer)
  • Channel Configuration: 2×2 time-varying MIMO system
  • Input Signal: Normalized QPSK constellation
  • Quantization: b-bit uniform quantizer with range [-4, 4]
  • Training Set Size: 8192 channels used for pre-training
  • Context Length: K = 20

Evaluation Metrics

Mean squared error (MSE):

MSE(θ) = E[||f_θ(C, y_{K+1}) - x_{K+1}||²]
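In practice the expectation is replaced by an average over test prompts. A minimal sketch, assuming the predictions and ground-truth symbols are stacked into complex arrays of shape (N, m₁); the function name `empirical_mse` is hypothetical:

```python
import numpy as np

def empirical_mse(x_hat, x_true):
    """Average squared Euclidean error over N test prompts.

    x_hat, x_true: complex arrays of shape (N, m1).
    """
    return float(np.mean(np.sum(np.abs(x_hat - x_true) ** 2, axis=-1)))

# Sanity check: predicting all zeros for unit-modulus QPSK symbols with m1 = 2
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(0)
x_true = rng.choice(qpsk, size=(100, 2))
baseline = empirical_mse(np.zeros_like(x_true), x_true)
```

The all-zero predictor gives an MSE equal to the total symbol power (2 for two unit-power streams), which is a useful reference level when reading MSE curves.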

Experimental Parameters

  • Memory factor ρ: Uniformly sampled from [0.9,1)
  • Signal-to-noise ratio (SNR): Sampled from [0, 30] dB
  • Quantization bits b: Sampled from the integer range {1, ..., 6}
  • Channel variation noise level: σᵨ = 0.1
  • Training: Adam optimizer, 50,000 steps, batch size 128
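The per-prompt task parameters listed above could be drawn as follows. This is a sketch only: the summary states the ranges but not the distributions for SNR and b, so uniform sampling is an assumption here, and `sample_task` is a hypothetical helper name.

```python
import numpy as np

def sample_task(rng):
    """Draw one set of channel/task parameters for a training prompt."""
    return {
        "rho": rng.uniform(0.9, 1.0),       # memory factor, uniform on [0.9, 1)
        "snr_db": rng.uniform(0.0, 30.0),   # assumption: uniform in dB
        "b": int(rng.integers(1, 7)),       # assumption: uniform over 1..6 (upper bound exclusive)
        "sigma_rho": 0.1,                   # channel variation noise level
    }

rng = np.random.default_rng(0)
tasks = [sample_task(rng) for _ in range(128)]   # one batch of 128 prompts
```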

Comparison Methods

  1. LMMSE equalizer (theoretical baseline)
  2. ICL equalizer with softmax attention
  3. ICL equalizer with LMS attention
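For reference, a textbook LMMSE equalizer for a linear model y = Hx + e can be sketched as below. This is the standard form under the assumptions of a known channel H, uncorrelated unit-power symbols, and white noise with variance `noise_var`. It ignores quantization, and the exact baseline used in the paper may differ.

```python
import numpy as np

def lmmse_equalize(H, y, noise_var):
    """LMMSE estimate x_hat = (H^H H + noise_var * I)^{-1} H^H y.

    H: (m2, m1) complex channel matrix; y: (m2,) received vector.
    """
    m1 = H.shape[1]
    A = H.conj().T @ H + noise_var * np.eye(m1)
    return np.linalg.solve(A, H.conj().T @ y)

# Noise-free sanity check on a random 2x2 channel
rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
x = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)
x_hat = lmmse_equalize(H, H @ x, noise_var=1e-12)
```

With negligible noise the estimate reduces to zero-forcing and recovers x. Unlike the ICL equalizers, this baseline needs the channel matrix as an input.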

Experimental Results

Main Results

From the experimental results in Figure 1:

  1. Overall Performance: ICL equalizers outperform LMMSE across all settings
  2. Attention Mechanism Comparison: LMS attention performs comparably to or better than softmax attention
  3. Parameter Sensitivity:
    • Increasing memory factor ρ, SNR, or quantization bits consistently reduces estimation error
    • LMS attention not only reduces computational burden but also maintains or improves accuracy

Ablation Studies

Multi-LMS vs Single-step LMS (Figure 2a)

  • Increasing the number of steps M generally improves performance
  • When M is too large, the model may overfit to current noisy observations, leading to performance degradation
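One way to read this trade-off is through the Multi-LMS closed form: after M steps, the fraction of the current residual that has been removed is 1 - (1 - β||k||₂²)ᴹ. The sketch below tabulates this fraction for a few values of M, using β||k||₂² = 0.1 as an arbitrary illustrative value. It is an interpretation of the formula, not a reproduction of the paper's experiment.

```python
import numpy as np

def effective_gain(beta, k_norm_sq, M):
    """Fraction of the current residual removed after M steps: 1 - (1 - beta*||k||^2)^M."""
    return 1.0 - (1.0 - beta * k_norm_sq) ** M

steps = (1, 2, 4, 8, 16, 64)
gains = [effective_gain(0.1, 1.0, M) for M in steps]
```

The fraction grows monotonically with M and approaches 1, at which point the state fits the latest (kᵢ, vᵢ) pair exactly. Fitting a noisy or coarsely quantized observation exactly is consistent with the degradation reported for large M.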

LRMS vs LMS (Figure 2b)

  • Under low quantization bits (b=1), LRMS attention outperforms LMS attention
  • The LRMS mechanism effectively mitigates the effects of outliers and severe quantization noise

Experimental Findings

  1. Computational Advantages: LMS attention avoids the computational overhead of nonlinear softmax functions
  2. Robustness: LRMS exhibits more stable performance in noisy environments
  3. Adaptability: Multi-step update strategies better capture long-term channel dynamics
  4. Practicality: The proposed methods significantly enhance adaptability and performance in dynamic environments

Related Work

ICL Theoretical Research

To the authors' knowledge, existing principled and theoretical analyses of ICL focus exclusively on stationary settings, where the function remains fixed within the context. This paper extends the study to non-stationary scenarios.

Channel Equalization Methods

  1. Classical Methods: ZF, LMMSE, adaptive equalizers, decision feedback equalizers, etc.
  2. Machine Learning Methods: Deep learning, meta-learning, reinforcement learning, graph neural networks, etc.
  3. ICL Methods: Recently emerging transformer-based sequence model equalizers

Attention Mechanism Design

The paper draws on research connecting transformers with Kalman filters, test-time regression, and state-space models.

Conclusions and Discussion

Main Conclusions

  1. ICL can effectively handle non-stationary MIMO equalization tasks
  2. Attention mechanisms inspired by classical adaptive algorithms significantly enhance adaptability and performance in dynamic environments
  3. Establishes a theoretical bridge between adaptive signal processing and modern attention mechanisms

Limitations

  1. Experimental Scale: Validation only on 2×2 MIMO systems; performance on larger-scale systems remains to be verified
  2. Channel Model: Employs a specific autoregressive channel model; applicability to other channel models requires further investigation
  3. Theoretical Analysis: Lacks theoretical guarantees on convergence and generalization capability for non-stationary ICL

Future Directions

  1. Develop next-generation wireless foundation models with stronger adaptability and robustness
  2. Extend to more complex channel environments and larger-scale MIMO systems
  3. Provide theoretical analysis frameworks for non-stationary ICL

In-Depth Evaluation

Strengths

  1. Strong Innovation: Extends ICL to non-stationary settings, which, to the authors' knowledge, prior principled ICL analyses have not covered
  2. Principled Methodology: Designs attention mechanisms based on classical adaptive algorithms with solid theoretical foundations
  3. High Practical Value: Addresses important problems in practical wireless communications
  4. Comprehensive Experiments: Covers multiple parameter settings and comparison methods
  5. Clear Presentation: Accurate technical descriptions and rigorous mathematical derivations

Weaknesses

  1. Limited Experimental Scale: Validation only on small-scale MIMO systems
  2. Insufficient Theoretical Analysis: Lacks theoretical guarantees on convergence and generalization capability
  3. Limited Comparison Methods: Lacks comparison with other advanced adaptive equalization methods
  4. Practical Deployment Considerations: Does not address complexities and constraints in real systems

Impact

  1. Academic Contribution: Opens new directions for ICL theoretical research
  2. Practical Value: Provides new insights for wireless communication system design
  3. Cross-domain Impact: Bridges machine learning and signal processing fields
  4. Reproducibility: Provides detailed experimental settings and implementation details

Applicable Scenarios

  1. Time-varying Channel Environments: Mobile communications, satellite communications, and other dynamic environments
  2. Resource-constrained Systems: Scenarios requiring rapid adaptation with limited computational resources
  3. Multi-task Learning: Applications requiring rapid switching across different channel conditions
  4. Edge Computing: Scenarios requiring real-time adaptation on edge devices

References

The paper cites 31 relevant references covering multiple domains including channel equalization, adaptive filtering, machine learning, and attention mechanisms, providing solid theoretical foundations and comprehensive background research.


Overall Assessment: This is a high-quality research paper with contributions in both methodology and practical value. It extends ICL to non-stationary settings, which the authors state prior principled analyses have not addressed, and the proposed attention variants are grounded in classical adaptive filtering and supported by experiments. While there is room for improvement in experimental scale and theoretical analysis, the work provides useful insights and directions for related fields.