The application of deep learning in communication systems has attracted considerable attention in recent years. Forward-forward (FF) learning is an efficient alternative to backpropagation (BP), the standard training procedure for neural networks. FF learning offers several advantages: it does not require a differentiable communication channel and does not rely on the global availability of partial derivatives, which enables energy-efficient implementations. This study designs end-to-end learning autoencoders using the FF algorithm and numerically evaluates their performance over additive white Gaussian noise (AWGN) and Rayleigh block fading channels. It demonstrates performance competitive with BP-trained systems both in joint source-channel coding scenarios and in applications with fixed, non-differentiable modulation stages. It also provides insights into FF network design principles and training convergence behavior, and reports significant memory and processing time savings compared to BP.
The traditional backpropagation algorithm presents three major challenges in communication systems:
Deploying deep learning methods in communication systems faces practical challenges, particularly on resource-constrained edge devices. The limitations of traditional BP algorithms hinder efficient neural network implementation in practical communication systems.
The FF algorithm possesses the following advantages, making it particularly suitable for communication systems:
Given a message $m$, the autoencoder must:
As shown in Figure 1, the FF autoencoder comprises:
The key innovation of the FF algorithm is the design of contrastive input data:
Where denotes the one-hot encoding of message m, and represents concatenation.
Layer-wise optimization is based on the "goodness" metric $g_i$ of layer $i$, with the loss function defined as:
$$\mathcal{L}_i = \begin{cases} \zeta(-(g_i - \tau_i)) & \text{positive samples} \\ \zeta(g_i - \tau_i) & \text{negative samples} \end{cases}$$

Where $\zeta(x) = \log(1 + e^x)$ is the softplus function, and $\tau_i$ is the threshold.

#### Classifier Training

An independent classifier $c_\kappa(\cdot)$ learns to map decoder activities to original messages, trained using cross-entropy loss.

### Technical Innovations

1. **No Global Gradient Requirement**: Each layer optimizes independently, breaking backward locking
2. **Handling Non-Differentiable Operations**: Naturally supports non-differentiable operations such as quantization
3. **Contrastive Learning Mechanism**: Effectively learns representations through positive-negative sample contrasts
4. **Decoupled Classifier**: Separates representation learning from classification tasks

## Experimental Setup

### Channel Models

Real-valued Rayleigh block fading (RBF) channel is considered:

$$Y_i = HX_i + N_i$$

Where:

- $N_i \sim \mathcal{N}(0, \sigma^2)$, $\sigma^2 = (2RE_b/N_0)^{-1}$
- $H$ follows Rayleigh distribution (fading coefficient magnitude)
- $E_b/N_0$ is the energy per bit to noise power spectral density ratio (SNR)

### Experimental Parameters

- **Code Rate**: $R = k/n = 4/7$
- **Training SNR**: $E_b/N_0 = 5$ dB
- **Network Structure**: Optimal configuration is $L = K = 4$, $W = 80$

### Comparison Methods

1. **BP Autoencoder**: Classical backpropagation training
2. **BP-RL Autoencoder**: Model-free training based on reinforcement learning
3. **FF Autoencoder**: Proposed forward-forward training

### Evaluation Metrics

- **Block Error Rate (BLER)**: $P_e = \Pr(\hat{m} \neq m)$
- **Convergence Speed**: Training iterations required to achieve target performance
- **Memory Usage**: Gradient storage requirements
- **Processing Time**: Training time complexity

## Experimental Results

### Main Results

#### Joint Source-Channel Coding Scenario

In autoencoders with continuous outputs (Figure 2):

- **AWGN Channel**: FF performance approaches BP and BP-RL, with approximately 1 dB performance gap in high SNR regions
- **RBF Channel**: FF competes with other methods, demonstrating robustness to channel perturbations

#### Quantized Encoder Output Scenario

Under BPSK quantization (Figure 3):

- **FF Algorithm Advantages Evident**: Maintains original performance while BP and BP-RL show significant degradation
- **RBF Channel**: FF surpasses BP methods, with BP-RL nearly closing the gap
- Demonstrates insufficiency of STE approximation

### Network Capacity Analysis

Table I shows BLER performance for different network scales:

- FF networks require wider layers ($W=80$ vs $W=16$) to achieve good performance
- Encoder complexity is more critical than decoder complexity
- Optimal configuration: $L=K=4$, $W=80$

### Convergence Behavior Analysis

Figure 4 presents training convergence curves:

- **Continuous Encoder**: FF convergence speed comparable to BP, significantly faster than BP-RL
- **Quantized Encoder**: FF reaches target loss faster, demonstrating advantages for non-differentiable operations

### Hardware Complexity Analysis

#### Processing Time Savings

For N-layer networks:

- **BP Algorithm**: Requires 2N time units (forward + backward)
- **FF Algorithm**: Requires only N+1 time units

#### Memory Savings

- **BP Network**: Requires storing gradients for 791 parameters
- **FF Network**: No gradient storage needed, direct computation and consumption

## Related Work

### Forward Learning Algorithms

1. **Hebbian Learning**: Based on neuroplasticity rules, requires no feedback signals
2. **Sigprop Algorithm**: Parallel signal propagation learning, requires separated representations of data and labels
3. **FF Algorithm**: Layer-wise training through two forward passes and goodness metrics

### Deep Learning in Communication Systems

1. **End-to-End Learning**: Direct optimization of communication system performance
2. **Reinforcement Learning Methods**: Handling non-differentiable channels
3. **Generative Models**: Modeling complex channel characteristics

## Conclusions and Discussion

### Main Conclusions

1. **FF Autoencoders are Competitive**: Performance approaches or exceeds BP methods under various channel conditions
2. **Clear Advantages in Non-Differentiable Scenarios**: Superior performance in quantization and similar scenarios
3. **Hardware Implementation Friendly**: Significant memory and time savings
4. **Good Convergence Performance**: Training speed comparable to or faster than BP

### Limitations

1. **Network Capacity Requirements**: Requires larger networks to achieve comparable performance
2. **Hyperparameter Sensitivity**: Training process sensitive to hyperparameter settings
3. **High SNR Performance Gap**: Slightly reduced performance in low-noise environments
4. **Short Code Length Limitation**: Current experiments only consider short code length scenarios

### Future Directions

1. **Complex Channel Models**: Extension to more complex non-differentiable channels
2. **Algorithm Improvements**: More sophisticated loss function design and layer cooperation techniques
3. **Long Code Length Extension**: Extension to longer code lengths through concatenated code constructions
4. **Hardware Implementation**: Validation through actual analog hardware implementations

## In-Depth Evaluation

### Strengths

1. **Strong Method Innovation**: First application of FF algorithm to communication systems, addressing key practical deployment challenges
2. **Comprehensive Experimental Design**: Covers multiple channel models and application scenarios with thorough comparative methods
3. **In-Depth Theoretical Analysis**: Provides quantitative analysis of network design principles and hardware complexity
4. **High Practical Value**: Offers feasible deep learning solutions for low-power communication devices

### Weaknesses

1. **Performance Gaps**: Performance gaps with BP methods persist in certain scenarios
2. **Code Length Limitations**: Validation only at short code lengths (k=4, n=7); longer code lengths needed for practical applications
3. **Insufficient Hyperparameter Search**: Acknowledges lack of extensive hyperparameter search, potentially affecting performance evaluation
4. **Lack of Theoretical Analysis**: Missing theoretical guarantees for FF algorithm convergence and optimality

### Impact

1. **Academic Contribution**: Provides new training paradigm for deep learning in communication systems
2. **Practical Value**: Offers feasible neural coding solutions for resource-constrained devices
3. **Inspirational Significance**: May promote application of forward learning algorithms in communication domain
4. **Reproducibility**: Provides detailed hyperparameter settings facilitating reproduction

### Applicable Scenarios

1. **Edge Computing Devices**: Communication devices with limited memory and computational resources
2. **Non-Differentiable Systems**: Communication systems with quantization, modulation, and other non-differentiable operations
3. **Low-Power Applications**: Energy-sensitive Internet of Things and sensor networks
4. **Real-Time Communications**: Dynamic systems requiring rapid channel adaptation

## References

1. Hinton, G. "The forward-forward algorithm: Some preliminary investigations." arXiv:2212.13345 (2022)
2. O'Shea, T. & Hoydis, J. "An introduction to deep learning for the physical layer." IEEE Trans. Cogn. Commun. Netw. 3.4 (2017): 563-575
3. Aoudia, F. A. & Hoydis, J. "Model-free training of end-to-end communication systems." IEEE J. Sel. Areas Commun. 37.11 (2019): 2503-2516

---

**Summary**: This paper makes significant contributions to deep learning in communication systems by introducing the FF algorithm to address key challenges in practical deployment of traditional BP methods. While there remains room for improvement in certain performance metrics, its advantages in non-differentiable scenarios and hardware-friendly characteristics provide important practical value and academic significance.