
Scaling Equilibrium Propagation to Deeper Neural Network Architectures

Elayedam, Srinivasan

Equilibrium propagation has been proposed as a biologically plausible alternative to the backpropagation algorithm. The local nature of gradient computations, combined with the use of convergent RNNs to reach equilibrium states, makes this approach well-suited for implementation on neuromorphic hardware. However, previous studies on equilibrium propagation have been restricted to networks containing only dense layers or relatively small architectures with a few convolutional layers followed by a final dense layer. These networks have a significant gap in accuracy compared to similarly sized feedforward networks trained with backpropagation. In this work, we introduce the Hopfield-Resnet architecture, which incorporates residual (or skip) connections in Hopfield networks with clipped ReLU as the activation function. The proposed architectural enhancements enable the training of networks with nearly twice the number of layers reported in prior works. For example, Hopfield-Resnet13 achieves 93.92% accuracy on CIFAR-10, which is ≈3.5% higher than the previous best result and comparable to that provided by Resnet13 trained using backpropagation.

Basic Information

Paper ID: 2509.26003
Title: Scaling Equilibrium Propagation to Deeper Neural Network Architectures
Authors: Sankar Vinayak E P (IIT Madras), Gopalakrishnan Srinivasan (IIT Madras)
Categories: cs.NE (Neural and Evolutionary Computing), cs.LG (Machine Learning)
Publication Date: October 13, 2025 (arXiv v2)
Paper Link: https://arxiv.org/abs/2509.26003

Abstract

Equilibrium Propagation (EP) has been proposed as a biologically plausible alternative to backpropagation. The local nature of its gradient computation, combined with the use of convergent RNNs to reach equilibrium states, makes this approach particularly suitable for implementation on neuromorphic hardware. However, previous research on equilibrium propagation has been limited to networks containing only dense layers or to relatively small convolutional architectures, which exhibit significant accuracy gaps compared to similarly sized feedforward networks trained with backpropagation. This work introduces the Hopfield-Resnet architecture, which integrates residual connections within Hopfield networks and employs clipped ReLU as the activation function. The proposed architectural enhancements enable the training of networks with nearly twice the depth reported in previous work. For example, Hopfield-Resnet13 achieves 93.92% accuracy on CIFAR-10, approximately 3.5 percentage points higher than the previous best result and comparable to Resnet13 trained with backpropagation.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is the scalability of the Equilibrium Propagation (EP) method for deep neural networks, manifested specifically as:

Depth Limitation: Existing EP methods can only effectively train shallow networks (≤6 layers)
Performance Gap: Networks trained with EP exhibit significant performance degradation compared to same-scale networks trained with backpropagation
Biological Plausibility Requirement: The need to maintain the biological plausibility advantages of the EP method

Importance Analysis

The significance of this problem is evident in:

Biological Plausibility: Backpropagation is considered biologically implausible due to its non-local gradient computation
Hardware Compatibility: EP methods are better suited for neuromorphic hardware implementation with higher energy efficiency
Online Learning Potential: EP supports on-device learning, suitable for edge computing scenarios

Limitations of Existing Methods

Architectural Constraints: Previous research limited to small networks such as VGG5
Gradient Bias: The EP gradient estimate is exact only for an infinitesimal nudging parameter β; the finite β used in practice introduces bias
Convergence Difficulties: Deep networks struggle to reach stable equilibrium states
Activation Function Limitations: Existing activation functions perform poorly in deep networks

Core Contributions

Proposed Clipped ReLU Activation Function: Simplifies energy function and gradient computation, improving training stability for deep networks
Introduced Hopfield-Resnet Architecture: Enables EP methods to successfully train networks exceeding 12 layers through residual connections
Significant Performance Improvement: Achieves 93.92% accuracy on CIFAR-10, approaching backpropagation performance
Multi-Dataset Validation: Validates method effectiveness on CIFAR-10, CIFAR-100, and Fashion-MNIST

Methodology Details

Task Definition

This work investigates how to train deep convolutional neural networks for image classification tasks using the equilibrium propagation method. The input is an image x, the output is a class label y, with the constraint of maintaining the biological plausibility and local gradient computation characteristics of the EP method.

Equilibrium Propagation Theoretical Foundation

The EP method is based on convergent RNNs driven by a static input x, with network state evolution following:

s^(t+1) = ∂Φ(x, s^t, θ)/∂s

where Φ is the energy function, s represents neuron states, and θ denotes network parameters.

EP training comprises two phases:

Free Phase: Evolution based solely on the energy function
Weakly Clamped Phase: Addition of perturbation terms proportional to the gradient of the loss function

The gradient computation formula is:

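-∂L/∂θ = lim_{β→0} (1/β)[∂Φ(x, s^β*, θ)/∂θ - ∂Φ(x, s*, θ)/∂θ]

where s* and s^β* are the equilibrium states reached in the free phase and the weakly clamped phase, respectively. With a finite β, the right-hand side is only an estimate of the gradient.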
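The two phases and the gradient formula can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical scalar model with Φ(x, s, θ) = θ·x·s + ½·A·s² and a squared-error loss L = ½(s − y)², for which the exact gradient is available in closed form. The names relax, ep_gradient, and the coupling constant A are illustrative assumptions.

```python
# Toy scalar model (an assumption for illustration, not the paper's network):
#   Phi(x, s, theta) = theta*x*s + 0.5*A*s**2, so dPhi/ds = theta*x + A*s
#   L(s, y) = 0.5*(s - y)**2,                  so dL/ds  = s - y
A = 0.5  # hypothetical self-coupling; |A| < 1 keeps the dynamics contractive


def relax(x, theta, y=0.0, beta=0.0, s=0.0, steps=200):
    """Iterate s <- dPhi/ds - beta * dL/ds for a fixed number of steps."""
    for _ in range(steps):
        s = theta * x + A * s - beta * (s - y)
    return s


def dphi_dtheta(x, s):
    """Partial derivative of Phi with respect to theta at state s."""
    return x * s


def ep_gradient(x, y, theta, beta):
    """One-sided EP estimate of -dL/dtheta from a free and a nudged phase."""
    s_free = relax(x, theta)                        # free phase (beta = 0)
    s_nudged = relax(x, theta, y, beta, s=s_free)   # weakly clamped phase
    return (dphi_dtheta(x, s_nudged) - dphi_dtheta(x, s_free)) / beta


def true_neg_gradient(x, y, theta):
    """Closed-form -dL/dtheta for the toy model, using s* = theta*x / (1 - A)."""
    s_star = theta * x / (1.0 - A)
    return (y - s_star) * x / (1.0 - A)


x, y, theta = 1.0, 1.0, 0.8
exact = true_neg_gradient(x, y, theta)
estimates = {beta: ep_gradient(x, y, theta, beta) for beta in (0.1, 0.01, 0.001)}
errors = {beta: abs(est - exact) for beta, est in estimates.items()}
```

In this toy setting the estimate approaches the exact value as β shrinks, which is the behaviour the limit in the formula describes; the error at β = 0.1 is the kind of finite-β bias that motivates the centered variant.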

Hopfield-Resnet Architecture Design

Residual Connection Integration

The Hopfield-Resnet block contains three convolutional operations:

Main pathway: Two 3×3 convolutions
Skip connection: One 1×1 convolution

The neuron state update equation is modified to:

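s^(t+1)_n = σ(∑_{i∈pre(n)} P(w_i ⋆ s^t_i) + ∑_{j∈post(n)} w̃_j ⋆ P^(-1)(s^t_j))

where pre(n) and post(n) denote the sets of predecessor and successor states that interact directly with state n, ⋆ denotes convolution, P is the pooling operation and P^(-1) its inverse (unpooling), and w̃_j is the flipped kernel used on the feedback path.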
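The structure of this update, with each state summing feedforward input from its predecessors and feedback from its successors, can be sketched on a small graph. This is a simplified illustration under stated assumptions: states are scalars, convolutions are replaced by scalar weights, pooling is omitted, and the graph (EDGES) and its weight values are hypothetical. The skip connection appears as an extra edge from the block input x to the block output b.

```python
def clipped_relu(v, alpha=6.0):
    """Activation sigma: clamp v to the interval [0, alpha]."""
    return min(max(v, 0.0), alpha)


# Hypothetical scalar graph: key (i, j) is an edge from state i to state j.
EDGES = {
    ("x", "a"): 0.5,  # main pathway, first weight
    ("a", "b"): 0.4,  # main pathway, second weight
    ("x", "b"): 0.3,  # skip connection from block input to block output
}


def step(states, edges, alpha=6.0):
    """One synchronous update of every non-input state."""
    new_states = dict(states)
    for n in states:
        if n == "x":
            continue  # the input stays clamped to the data
        total = 0.0
        for (i, j), w in edges.items():
            if j == n:  # i is in pre(n): feedforward contribution
                total += w * states[i]
            if i == n:  # j is in post(n): feedback through the same weight
                total += w * states[j]
        new_states[n] = clipped_relu(total, alpha)
    return new_states


def relax(x, edges, steps=100):
    """Run the dynamics from zero-initialised states for a fixed number of steps."""
    states = {"x": x, "a": 0.0, "b": 0.0}
    for _ in range(steps):
        states = step(states, edges)
    return states


final = relax(1.0, EDGES)
```

Because state b receives input from both a and x, the input reaches the block output through a direct path as well as through the main pathway, and state a in turn receives feedback from b.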

Network Architecture Details

4 Hopfield-Resnet blocks + 1 fully connected layer
Total of 13 trainable parameter groups (12 convolutional layers + 1 fully connected layer)
9 updatable neuron states

Clipped ReLU Activation Function

Proposes the ReLU_α activation function, restricting output to the range [0, α]:

Prevents explosive growth of the energy function
Experiments employ ReLU_6 (α=6) for optimal performance
Computationally simpler compared to traditional sigmoid/tanh functions
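The clipping itself is a one-line operation. The sketch below is a plain-Python illustration (the function name clipped_relu and the sample values are chosen here for demonstration) showing ReLU_6 and ReLU_1 on the same inputs.

```python
def clipped_relu(v, alpha=6.0):
    """ReLU_alpha: identity on [0, alpha], 0 below the range, alpha above it."""
    return min(max(v, 0.0), alpha)


samples = [-2.0, 0.5, 3.0, 7.5]
relu6 = [clipped_relu(v) for v in samples]             # [0.0, 0.5, 3.0, 6.0]
relu1 = [clipped_relu(v, alpha=1.0) for v in samples]  # [0.0, 0.5, 1.0, 1.0]
```

The upper bound is what keeps neuron states, and therefore the energy function, from growing without limit during the iterative relaxation.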

Centered Equilibrium Propagation (CEP)

Adopts the CEP algorithm to reduce gradient estimation bias:

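-∂L/∂θ ≈ (1/2β)[∂Φ(x, s^(+β)*, θ)/∂θ - ∂Φ(x, s^(-β)*, θ)/∂θ]

where s^(+β)* and s^(-β)* are the equilibrium states reached under nudging with +β and -β, respectively.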
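Why the centered form reduces bias can be seen from a Taylor expansion. The derivation below is a sketch that assumes the function g(β), defined here as ∂Φ/∂θ evaluated at the nudged equilibrium, is smooth in β around β = 0; the symbol g is introduced only for this illustration. The EP result identifies g'(0) with -∂L/∂θ.

```latex
% g(\beta) := \partial\Phi(x, s^{\beta}_{*}, \theta) / \partial\theta  (notation introduced for this sketch)
\begin{align*}
g(\pm\beta) &= g(0) \pm \beta\, g'(0) + \tfrac{\beta^{2}}{2}\, g''(0)
               \pm \tfrac{\beta^{3}}{6}\, g'''(0) + O(\beta^{4}), \\[4pt]
\frac{g(\beta) - g(0)}{\beta} &= g'(0) + \tfrac{\beta}{2}\, g''(0) + O(\beta^{2}), \\[4pt]
\frac{g(\beta) - g(-\beta)}{2\beta} &= g'(0) + \tfrac{\beta^{2}}{6}\, g'''(0) + O(\beta^{4}).
\end{align*}
```

The even-order terms cancel in the symmetric difference, so the leading bias drops from first order in β to second order.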

Experimental Setup

Datasets

CIFAR-10: 32×32 color images, 10 classes, 50,000 training samples
CIFAR-100: 32×32 color images, 100 classes, 50,000 training samples
Fashion-MNIST: 28×28 grayscale images, 10 classes, 60,000 training samples

Evaluation Metrics

Test set accuracy is used as the primary evaluation metric

Comparison Methods

Baseline Method: Deep Convolutional Hopfield Network (DCHN) with VGG5 architecture
Backpropagation Baseline: Corresponding feedforward network architecture

Implementation Details

Optimizer: Nesterov Accelerated Gradient optimizer
Nudging Parameter β: Empirically tuned within the range [0.1, 0.4]
Time Steps: 120 steps for free phase, 50 steps each for clamped phases (±β)
Hardware: NVIDIA RTX 4090 and 6000 Ada GPUs
Framework: PyTorch

Experimental Results

Main Results

Dataset	Model Architecture	Previous Best (%)	This Work (%)	Backpropagation (%)
CIFAR-10	VGG5	90.3	92.84	92.11
CIFAR-10	Hopfield-Resnet13	-	93.92	93.78
CIFAR-100	VGG5	68.4	70.78	72.54
CIFAR-100	Hopfield-Resnet13	-	71.05	75.12
F-MNIST	VGG5	93.53	94.34	-
F-MNIST	Hopfield-Resnet13	-	94.15	-

Key Findings

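Significant Performance Improvement: Roughly 3.5 percentage points above the previous best EP result on CIFAR-10 (93.92% vs. 90.3%)
Approaching Backpropagation Performance: Hopfield-Resnet13 on CIFAR-10 is within 0.14 percentage points of the backpropagation baseline (93.92% vs. 93.78% in the results table)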
Successful Deep Network Training: First successful training of EP networks exceeding 12 layers

Ablation Studies

Importance of Residual Connections

Experiments show that the training loss of deep networks without residual connections stagnates, while networks with residual connections converge.

Activation Function Comparison

ReLU_6 performs best
ReLU_1 (hard-sigmoid) performs second
Randomly initialized ReLU_α with α ∈ [0, 10] shows intermediate performance

Training Time Analysis

Hopfield-Resnet13 training for 300 epochs requires over 30 hours
Significant time consumed by GPU kernel launches and CPU-GPU synchronization
Room for optimization exists
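A rough back-of-the-envelope calculation puts the training time in context. The sketch below makes two assumptions that the summary does not state explicitly: that the 30-hour figure refers to CIFAR-10 (50,000 training samples) and that the batch size is 128, as in the memory measurement. Since the reported time is "over 30 hours", the per-batch figure is a lower bound.

```python
import math

train_samples = 50_000  # assumption: the timing refers to CIFAR-10
batch_size = 128        # assumption: same batch size as the memory measurement
epochs = 300
free_steps = 120        # free-phase time steps
nudged_steps = 50       # time steps per nudged phase (+beta and -beta)

batches_per_epoch = math.ceil(train_samples / batch_size)  # 391
steps_per_batch = free_steps + 2 * nudged_steps            # 220
total_batches = batches_per_epoch * epochs                 # 117300
total_state_updates = total_batches * steps_per_batch      # 25806000
seconds_per_batch = 30 * 3600 / total_batches              # about 0.92 s
```

Under these assumptions each batch involves 220 sequential state-update sweeps, each of which launches its own GPU kernels, which is consistent with kernel launch and synchronization overhead taking a significant share of the time.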

Memory Usage

CEP training memory usage is comparable to backpropagation (about 22% higher in the reported configuration)
Hopfield-Resnet13 (batch size 128): 1612 MiB
Corresponding Resnet13: 1324 MiB

Weight Distribution Analysis

Weight distribution characteristics of CEP-trained networks:

Smaller Weight Values: Both absolute values and variance are smaller than backpropagation-trained networks
Weights Approaching Zero with Depth: Weights gradually approach zero as depth increases
Residual Connection Mitigation: Skip connection layers show significantly lower proportion of near-zero weights
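One simple way to quantify the "proportion of near-zero weights" is to count the weights whose magnitude falls below a threshold. The sketch below is an illustration only: the threshold tol is a hypothetical choice, the summary does not say how the paper defines near-zero, and the weight values are hand-picked placeholders, not measurements from the paper.

```python
def near_zero_fraction(weights, tol=1e-3):
    """Fraction of weights with |w| < tol (tol is a hypothetical threshold)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    return sum(1 for w in weights if abs(w) < tol) / len(weights)


# Placeholder values chosen by hand for demonstration; not data from the paper.
layers = {
    "conv_main": [0.0002, -0.0004, 0.03, 0.0001],
    "conv_skip": [0.05, -0.02, 0.0003, 0.04],
}
report = {name: near_zero_fraction(ws) for name, ws in layers.items()}
```

Applied per layer to a trained model's flattened weight tensors, a statistic of this kind would allow main-pathway and skip-connection layers to be compared at different depths.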

Related Work

Biologically Plausible Learning Algorithms

Forward Propagation: Avoids the non-locality of backpropagation
Predictive Coding: Learning based on free energy principle
Contrastive Hebbian Learning: Theoretical foundation of EP

Development History of Equilibrium Propagation

Original EP: Scellier & Bengio (2017) proposed foundational theory
CEP: Reduces gradient estimator bias by nudging symmetrically with +β and -β
HEP: Holomorphic EP; further reduces bias using multiple equilibrium points on the complex plane
Convolutional Extension: Extends EP to convolutional networks

Hardware Implementation

Existing research has implemented EP on neuromorphic hardware such as memristor crossbars, demonstrating potential for on-device learning.

Conclusions and Discussion

Main Conclusions

Technical Breakthrough: First successful extension of EP to 13-layer deep networks
Performance Improvement: Significantly surpasses previous EP methods on multiple datasets
Architectural Innovation: The combination of residual connections and clipped ReLU effectively solves the deep scaling problem

Limitations

Computational Efficiency: Training time remains significantly longer than backpropagation
Hardware Dependency: Requires specially optimized hardware to fully leverage advantages
Performance Gap: Performance gap remains on complex datasets (e.g., CIFAR-100)
Depth Limitation: While improved, still falls short of modern deep networks

Future Directions

Modern Hopfield Networks: Integration with modern Hopfield networks for sequence learning
Hardware Optimization: Development of neuromorphic hardware specifically adapted for EP
Algorithm Optimization: Further reduction of training time and efficiency improvements
Theoretical Analysis: Deeper understanding of the unique training mechanism properties of EP

In-Depth Evaluation

Strengths

Important Breakthrough: First successful extension of EP to deep networks, addressing long-standing scalability issues
Practical Innovation: Simple and effective combination of residual connections and clipped ReLU
Comprehensive Validation: Sufficient experimental verification on multiple datasets
In-depth Analysis: Provides deep analytical insights such as weight distribution analysis
Open Source Code: Provides complete implementation code, enhancing reproducibility

Weaknesses

Computational Efficiency: Excessive training time limits practical applications
Insufficient Theoretical Analysis: Lacks theoretical explanation for why residual connections are effective
Dataset Limitations: Primarily validated on relatively simple datasets
Missing Hardware Optimization: Insufficient exploitation of existing GPU parallel computing capabilities

Impact

Academic Contribution: Provides important architectural innovation for the EP field
Practical Value: Offers more practical deep learning methods for neuromorphic computing
Research Inspiration: Establishes foundation for subsequent EP deep network research

Applicable Scenarios

Neuromorphic Hardware: Particularly suitable for implementation on specialized neuromorphic chips
Edge Computing: Suitable for edge devices requiring online learning
Bio-inspired Computing: Provides direction for constructing more biologically plausible AI systems
Low-Power Applications: Advantageous in scenarios with extreme energy efficiency requirements

References

Scellier, B. & Bengio, Y. (2017). Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience.
Laborieux, A. et al. (2021). Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in Neuroscience.
Laborieux, A. & Zenke, F. (2022). Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. NeurIPS.
He, K. et al. (2016). Deep residual learning for image recognition. CVPR.

This paper achieves important breakthroughs in extending equilibrium propagation to deep networks. Through ingenious architectural design, it significantly enhances the practical utility of the EP method, making valuable contributions to the development of neuromorphic computing and bio-inspired learning algorithms.