
Forward-Forward Autoencoder Architectures for Energy-Efficient Wireless Communications

Seifert, Günlü, Schaefer

The application of deep learning to the area of communications systems has been a growing field of interest in recent years. Forward-forward (FF) learning is an efficient alternative to the backpropagation (BP) algorithm, which is the typically used training procedure for neural networks. Among its several advantages, FF learning does not require the communication channel to be differentiable and does not rely on the global availability of partial derivatives, allowing for an energy-efficient implementation. In this work, we design end-to-end learned autoencoders using the FF algorithm and numerically evaluate their performance for the additive white Gaussian noise and Rayleigh block fading channels. We demonstrate their competitiveness with BP-trained systems in the case of joint coding and modulation, and in a scenario where a fixed, non-differentiable modulation stage is applied. Moreover, we provide further insights into the design principles of the FF network, its training convergence behavior, and significant memory and processing time savings compared to BP-based approaches.


## Basic Information

- **Paper ID**: 2510.11418
- **Title**: Forward-Forward Autoencoder Architectures for Energy-Efficient Wireless Communications
- **Authors**: Daniel Seifert, Onur Günlü, Rafael F. Schaefer
- **Classification**: cs.IT, cs.LG, math.IT
- **Publication Date**: October 13, 2025 (arXiv preprint)
- **Paper Link**: https://arxiv.org/abs/2510.11418

## Abstract

Deep learning for communication systems has attracted considerable attention in recent years. Forward-forward (FF) learning is an efficient alternative to backpropagation (BP), the standard training procedure for neural networks. FF learning does not require a differentiable communication channel and does not rely on the global availability of partial derivatives, which allows for energy-efficient implementations. This study designs end-to-end learned autoencoders using the FF algorithm and numerically evaluates their performance over additive white Gaussian noise (AWGN) and Rayleigh block fading channels. The FF-trained systems are competitive with BP-trained systems both for joint coding and modulation and in a scenario with a fixed, non-differentiable modulation stage. The paper also gives insights into FF network design principles, training convergence behavior, and the significant memory and processing time savings compared to BP-based approaches.

## Research Background and Motivation

### 1. Problems to be Addressed

The traditional backpropagation algorithm presents three major challenges in communication systems:

1. **Differentiable Path Requirement**: BP requires a fully differentiable path through the entire neural network, while actual channels are often non-differentiable
2. **Low Memory and Energy Efficiency**: BP requires storing partial derivatives at each node, leading to high memory consumption and energy expenditure
3. **Backward Locking**: Each layer must wait for the gradient computations of all subsequent layers before it can be updated

### 2. Problem Significance

Deploying deep learning methods in communication systems faces practical challenges, particularly on resource-constrained edge devices. The limitations of BP hinder efficient neural network implementations in practical communication systems.

### 3. Limitations of Existing Methods

- **Reinforcement Learning Approaches**: Require an additional noise-free feedback link to estimate transmitter gradients
- **Generative Adversarial Networks/Diffusion Models**: Though differentiable, they exhibit high computational complexity
- **Straight-Through Estimators (STE)**: Performance degrades significantly in quantization scenarios

### 4. Research Motivation

The FF algorithm has the following advantages, which make it particularly suitable for communication systems:

- No requirement for differentiable channels
- Enables fully analog, low-power circuit implementations
- Allows pipelined training processes
- Significantly reduces memory usage

## Core Contributions

1. Proposed end-to-end autoencoder architectures based on the FF algorithm, specifically designed for wireless communication systems
2. Designed contrastive input data generation strategies, including construction methods for positive, negative, and neutral samples
3. Validated competitive performance over AWGN and Rayleigh block fading channels, particularly demonstrating advantages in non-differentiable scenarios
4. Provided in-depth analysis of network design principles, including the impact of network depth and width on performance
5. Quantified significant memory and processing time savings, demonstrating practical advantages of the FF algorithm

## Methodology Details

### Task Definition

Given a message $m \in \mathcal{M} = \{0, \ldots, 2^k-1\}$, the autoencoder must:

1. Encode the $k$-bit message into an $n$-dimensional codeword
2. Transmit the codeword through a noisy channel
3. Correctly decode the original message at the receiver
4. Be optimized to minimize the block error rate (BLER)
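To make the task concrete, the following minimal sketch enumerates the message set and estimates the BLER from a list of decoded messages. The function names (`message_set`, `code_rate`, `block_error_rate`) and the toy decoder output are illustrative assumptions and do not come from the paper; the values $k = 4$ and $n = 7$ match the code rate reported in the experimental setup.

```python
from fractions import Fraction


def message_set(k):
    """Return the message indices M = {0, ..., 2**k - 1} for k-bit messages."""
    return range(2 ** k)


def code_rate(k, n):
    """Code rate R = k / n: information bits per channel use."""
    return Fraction(k, n)


def block_error_rate(sent, decoded):
    """Empirical BLER: fraction of blocks whose decoded message differs from the sent one."""
    if len(sent) != len(decoded) or len(sent) == 0:
        raise ValueError("sent and decoded must be non-empty and of equal length")
    errors = sum(1 for m, m_hat in zip(sent, decoded) if m != m_hat)
    return errors / len(sent)


# Illustrative values only: k = 4 bits mapped to n = 7 channel uses.
messages = message_set(4)
rate = code_rate(4, 7)

# Toy decoder output with one wrong block out of four (made-up data).
example_bler = block_error_rate([0, 5, 9, 15], [0, 5, 8, 15])
```

In practice the BLER would be estimated over many more blocks per SNR point than in this toy call.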

### Model Architecture

#### Overall Architecture Design

As shown in Figure 1, the FF autoencoder comprises:

- **Encoder**: $L$ fully connected layers with normalized/quantized outputs
- **Channel**: AWGN or Rayleigh block fading channel
- **Decoder**: $K$ fully connected layers
- **Classifier**: Single-layer classifier with softmax probability outputs
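The encoder output stage and the channel can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the energy constraint $\|x\|^2 = n$, the sign-based BPSK mapping, the unit-power scaling of the fading coefficient, and all function names are assumptions made here. The noise variance follows the formula $\sigma^2 = (2RE_b/N_0)^{-1}$ given in the Channel Models section.

```python
import math
import random


def normalize(x):
    """Scale x so its energy equals its length n (assumed power constraint)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize the all-zero vector")
    scale = math.sqrt(len(x)) / norm
    return [v * scale for v in x]


def bpsk_quantize(x):
    """Map each entry to +1 or -1 by its sign (a non-differentiable step)."""
    return [1.0 if v >= 0.0 else -1.0 for v in x]


def noise_variance(rate, ebn0_db):
    """sigma^2 = 1 / (2 * R * Eb/N0), with Eb/N0 converted from dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * rate * ebn0)


def awgn(x, sigma2, rng):
    """Add independent zero-mean Gaussian noise of variance sigma2 to each symbol."""
    sigma = math.sqrt(sigma2)
    return [v + rng.gauss(0.0, sigma) for v in x]


def rayleigh_block_fading(x, sigma2, rng):
    """Y_i = H * X_i + N_i with a single Rayleigh coefficient H per block.

    H is drawn as the magnitude of a complex Gaussian with E[H^2] = 1 (assumption).
    """
    s = math.sqrt(0.5)
    h = math.hypot(rng.gauss(0.0, s), rng.gauss(0.0, s))
    return awgn([h * v for v in x], sigma2, rng)


rng = random.Random(0)
x = normalize([0.3, -1.2, 0.8, 2.0, -0.5, 0.1, -0.9])  # stand-in for an n = 7 encoder output
sigma2 = noise_variance(4 / 7, 5.0)  # about 0.277 at Eb/N0 = 5 dB and R = 4/7
y_awgn = awgn(x, sigma2, rng)
y_rbf = rayleigh_block_fading(bpsk_quantize(x), sigma2, rng)
```

The sign function in `bpsk_quantize` has zero gradient almost everywhere, which is why BP-based training needs an approximation such as STE at this stage, whereas layer-local FF training does not differentiate through it.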

#### Contrastive Input Data Construction

The key innovation of the FF algorithm is the design of contrastive input data:

- **Positive Samples**: $v = (1_m \| 1_m)$ (true label duplicated)
- **Negative Samples**: $v = (1_m \| 1_{\bar{m}})$ (true label + random incorrect label)
- **Neutral Samples**: $v = (1_m \| 0)$ (for inference)

Where $1_m$ denotes the one-hot encoding of message $m$, $\bar{m} \neq m$ is a randomly chosen incorrect message, and $\|$ represents concatenation.
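The three sample types can be built directly from one-hot vectors. The sketch below is a hedged illustration in plain Python; the helper names and the uniform choice of the incorrect label are assumptions, since the summary only states that the incorrect label is random.

```python
import random


def one_hot(m, size):
    """One-hot vector 1_m of the given size."""
    if not 0 <= m < size:
        raise ValueError("message index out of range")
    return [1.0 if i == m else 0.0 for i in range(size)]


def positive_sample(m, size):
    """v = (1_m || 1_m): the true label is duplicated."""
    return one_hot(m, size) + one_hot(m, size)


def negative_sample(m, size, rng):
    """v = (1_m || 1_m_bar): true label followed by a random incorrect label."""
    wrong = rng.choice([j for j in range(size) if j != m])  # uniform choice is an assumption
    return one_hot(m, size) + one_hot(wrong, size)


def neutral_sample(m, size):
    """v = (1_m || 0): second half zeroed, used at inference time."""
    return one_hot(m, size) + [0.0] * size


rng = random.Random(0)
size = 2 ** 4  # 16 messages for k = 4
pos = positive_sample(3, size)
neg = negative_sample(3, size, rng)
neu = neutral_sample(3, size)
```

Each sample has length $2 \cdot 2^k$, so the first encoder layer in this sketch would take a 32-dimensional input for $k = 4$.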

#### Training Algorithm

Layer-wise optimization is based on the "goodness" metric $g_i = \|a_i\|_2^2$, where $a_i$ is the activation vector of layer $i$, with the loss function defined as:

$$\mathcal{L}_i = \begin{cases} \zeta(-(g_i - \tau_i)) & \text{positive samples} \\ \zeta(g_i - \tau_i) & \text{negative samples} \end{cases}$$

Where $\zeta(x) = \log(1 + e^x)$ is the softplus function, and $\tau_i$ is the threshold.

#### Classifier Training

An independent classifier $c_\kappa(\cdot)$ learns to map decoder activities to original messages, trained using cross-entropy loss.

### Technical Innovations

1. **No Global Gradient Requirement**: Each layer optimizes independently, breaking backward locking
2. **Handling Non-Differentiable Operations**: Naturally supports non-differentiable operations such as quantization
3. **Contrastive Learning Mechanism**: Effectively learns representations through positive-negative sample contrasts
4. **Decoupled Classifier**: Separates representation learning from classification tasks

## Experimental Setup

### Channel Models

A real-valued Rayleigh block fading (RBF) channel is considered:

$$Y_i = HX_i + N_i$$

Where:

- $N_i \sim \mathcal{N}(0, \sigma^2)$, $\sigma^2 = (2RE_b/N_0)^{-1}$
- $H$ follows a Rayleigh distribution (fading coefficient magnitude)
- $E_b/N_0$ is the energy per bit to noise power spectral density ratio (SNR)

### Experimental Parameters

- **Code Rate**: $R = k/n = 4/7$
- **Training SNR**: $E_b/N_0 = 5$ dB
- **Network Structure**: Optimal configuration is $L = K = 4$ with layer width $W = 80$

### Comparison Methods

1. **BP Autoencoder**: Classical backpropagation training
2. **BP-RL Autoencoder**: Model-free training based on reinforcement learning
3. **FF Autoencoder**: Proposed forward-forward training

### Evaluation Metrics

- **Block Error Rate (BLER)**: $P_e = \Pr(\hat{m} \neq m)$
- **Convergence Speed**: Training iterations required to achieve target performance
- **Memory Usage**: Gradient storage requirements
- **Processing Time**: Training time complexity

## Experimental Results

### Main Results

#### Joint Coding and Modulation Scenario

In autoencoders with continuous outputs (Figure 2):

- **AWGN Channel**: FF performance approaches BP and BP-RL, with approximately 1 dB performance gap in high SNR regions
- **RBF Channel**: FF competes with other methods, demonstrating robustness to channel perturbations

#### Quantized Encoder Output Scenario

Under BPSK quantization (Figure 3):

- **FF Algorithm Advantages Evident**: FF maintains its original performance while BP and BP-RL show significant degradation
- **RBF Channel**: FF surpasses BP methods, with BP-RL nearly closing the gap
- The results demonstrate the insufficiency of the STE approximation

### Network Capacity Analysis

Table I shows BLER performance for different network scales:

- FF networks require wider layers ($W=80$ vs $W=16$) to achieve good performance
- Encoder complexity is more critical than decoder complexity
- Optimal configuration: $L=K=4$, $W=80$

### Convergence Behavior Analysis

Figure 4 presents training convergence curves:

- **Continuous Encoder**: FF convergence speed comparable to BP, significantly faster than BP-RL
- **Quantized Encoder**: FF reaches target loss faster, demonstrating advantages for non-differentiable operations

### Hardware Complexity Analysis

#### Processing Time Savings

For $N$-layer networks:

- **BP Algorithm**: Requires $2N$ time units (forward + backward)
- **FF Algorithm**: Requires only $N+1$ time units

#### Memory Savings

- **BP Network**: Requires storing gradients for 791 parameters
- **FF Network**: No gradient storage needed, direct computation and consumption

## Related Work

### Forward Learning Algorithms

1. **Hebbian Learning**: Based on neuroplasticity rules, requires no feedback signals
2. **Sigprop Algorithm**: Parallel signal propagation learning, requires separated representations of data and labels
3. **FF Algorithm**: Layer-wise training through two forward passes and goodness metrics

### Deep Learning in Communication Systems

1. **End-to-End Learning**: Direct optimization of communication system performance
2. **Reinforcement Learning Methods**: Handling non-differentiable channels
3. **Generative Models**: Modeling complex channel characteristics

## Conclusions and Discussion

### Main Conclusions

1. **FF Autoencoders are Competitive**: Performance approaches or exceeds BP methods under various channel conditions
2. **Clear Advantages in Non-Differentiable Scenarios**: Superior performance in quantization and similar scenarios
3. **Hardware Implementation Friendly**: Significant memory and time savings
4. **Good Convergence Performance**: Training speed comparable to or faster than BP

### Limitations

1. **Network Capacity Requirements**: Requires larger networks to achieve comparable performance
2. **Hyperparameter Sensitivity**: Training process sensitive to hyperparameter settings
3. **High SNR Performance Gap**: Slightly reduced performance in low-noise environments
4. **Short Code Length Limitation**: Current experiments only consider short code length scenarios

### Future Directions

1. **Complex Channel Models**: Extension to more complex non-differentiable channels
2. **Algorithm Improvements**: More sophisticated loss function design and layer cooperation techniques
3. **Long Code Length Extension**: Extension to longer code lengths through concatenated code constructions
4. **Hardware Implementation**: Validation through actual analog hardware implementations

## In-Depth Evaluation

### Strengths

1. **Strong Method Innovation**: First application of FF algorithm to communication systems, addressing key practical deployment challenges
2. **Comprehensive Experimental Design**: Covers multiple channel models and application scenarios with thorough comparative methods
3. **In-Depth Theoretical Analysis**: Provides quantitative analysis of network design principles and hardware complexity
4. **High Practical Value**: Offers feasible deep learning solutions for low-power communication devices

### Weaknesses

1. **Performance Gaps**: Performance gaps with BP methods persist in certain scenarios
2. **Code Length Limitations**: Validation only at short code lengths ($k=4$, $n=7$); longer code lengths needed for practical applications
3. **Insufficient Hyperparameter Search**: The authors acknowledge the lack of an extensive hyperparameter search, which may affect the performance evaluation
4. **Lack of Theoretical Analysis**: Missing theoretical guarantees for FF algorithm convergence and optimality

### Impact

1. **Academic Contribution**: Provides new training paradigm for deep learning in communication systems
2. **Practical Value**: Offers feasible neural coding solutions for resource-constrained devices
3. **Inspirational Significance**: May promote application of forward learning algorithms in communication domain
4. **Reproducibility**: Provides detailed hyperparameter settings facilitating reproduction

### Applicable Scenarios

1. **Edge Computing Devices**: Communication devices with limited memory and computational resources
2. **Non-Differentiable Systems**: Communication systems with quantization, modulation, and other non-differentiable operations
3. **Low-Power Applications**: Energy-sensitive Internet of Things and sensor networks
4. **Real-Time Communications**: Dynamic systems requiring rapid channel adaptation

## References

1. Hinton, G. "The forward-forward algorithm: Some preliminary investigations." arXiv:2212.13345 (2022)
2. O'Shea, T. & Hoydis, J. "An introduction to deep learning for the physical layer." IEEE Trans. Cogn. Commun. Netw. 3.4 (2017): 563-575
3. Aoudia, F. A. & Hoydis, J. "Model-free training of end-to-end communication systems." IEEE J. Sel. Areas Commun. 37.11 (2019): 2503-2516

---

**Summary**: This paper makes significant contributions to deep learning in communication systems by introducing the FF algorithm to address key challenges in practical deployment of traditional BP methods. While there remains room for improvement in certain performance metrics, its advantages in non-differentiable scenarios and hardware-friendly characteristics provide important practical value and academic significance.
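As a closing illustration, the layer-wise objective described under Training Algorithm (goodness $g_i$ compared against a threshold $\tau_i$ through the softplus function) can be sketched in plain Python. This is a minimal sketch rather than the authors' code: the function names, the example activations, and the threshold value are assumptions chosen for illustration.

```python
import math


def softplus(x):
    """Numerically stable zeta(x) = log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def goodness(activations):
    """g = ||a||_2^2, the sum of squared layer activations."""
    return sum(a * a for a in activations)


def ff_layer_loss(activations, threshold, positive):
    """Loss of a single layer for one sample.

    Positive samples are pushed towards goodness above the threshold,
    negative samples towards goodness below it.
    """
    margin = goodness(activations) - threshold
    return softplus(-margin) if positive else softplus(margin)


a = [3.0, 4.0]  # example activations with goodness 25 (made-up values)
loss_pos = ff_layer_loss(a, threshold=10.0, positive=True)
loss_neg = ff_layer_loss(a, threshold=10.0, positive=False)
```

With goodness well above the threshold, the loss is close to zero when the sample is positive and large when it is negative. Because the loss depends only on the layer's own activations, each layer can be updated without gradients from later layers.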