Scaling Equilibrium Propagation to Deeper Neural Network Architectures

Elayedam, Srinivasan
Equilibrium propagation has been proposed as a biologically plausible alternative to the backpropagation algorithm. The local nature of gradient computations, combined with the use of convergent RNNs to reach equilibrium states, makes this approach well-suited for implementation on neuromorphic hardware. However, previous studies on equilibrium propagation have been restricted to networks containing only dense layers or relatively small architectures with a few convolutional layers followed by a final dense layer. These networks have a significant gap in accuracy compared to similarly sized feedforward networks trained with backpropagation. In this work, we introduce the Hopfield-Resnet architecture, which incorporates residual (or skip) connections in Hopfield networks with clipped $\mathrm{ReLU}$ as the activation function. The proposed architectural enhancements enable the training of networks with nearly twice the number of layers reported in prior works. For example, Hopfield-Resnet13 achieves 93.92\% accuracy on CIFAR-10, which is $\approx$3.5\% higher than the previous best result and comparable to that provided by Resnet13 trained using backpropagation.

Basic Information

  • Paper ID: 2509.26003
  • Title: Scaling Equilibrium Propagation to Deeper Neural Network Architectures
  • Authors: Sankar Vinayak E P (IIT Madras), Gopalakrishnan Srinivasan (IIT Madras)
  • Categories: cs.NE (Neural and Evolutionary Computing), cs.LG (Machine Learning)
  • Publication Date: October 13, 2025 (arXiv v2)
  • Paper Link: https://arxiv.org/abs/2509.26003

Abstract

Equilibrium Propagation (EP) has been proposed as a biologically plausible alternative to backpropagation. The local nature of its gradient computation, combined with the use of convergent RNNs to reach equilibrium states, makes this approach particularly suitable for implementation on neuromorphic hardware. However, previous research on EP has been limited to networks containing only dense layers or to relatively small convolutional architectures, which exhibit significant accuracy gaps compared to similarly sized feedforward networks trained with backpropagation. This work introduces the Hopfield-Resnet architecture, which integrates residual connections into Hopfield networks and uses clipped ReLU as the activation function. These architectural changes make it possible to train networks nearly twice as deep as those reported in previous work. For example, Hopfield-Resnet13 achieves 93.92% accuracy on CIFAR-10, approximately 3.5 percentage points higher than the previous best result and comparable to Resnet13 trained with backpropagation.

Research Background and Motivation

Problem Definition

The core problem addressed in this research is the scalability of the Equilibrium Propagation (EP) method for deep neural networks, manifested specifically as:

  1. Depth Limitation: Existing EP methods can only effectively train shallow networks (≤6 layers)
  2. Performance Gap: Networks trained with EP exhibit significant performance degradation compared to same-scale networks trained with backpropagation
  3. Biological Plausibility Requirement: The need to maintain the biological plausibility advantages of the EP method

Importance Analysis

The significance of this problem is evident in:

  1. Biological Plausibility: Backpropagation is considered biologically implausible due to its non-local gradient computation
  2. Hardware Compatibility: EP methods are better suited for neuromorphic hardware implementation with higher energy efficiency
  3. Online Learning Potential: EP supports on-device learning, suitable for edge computing scenarios

Limitations of Existing Methods

  1. Architectural Constraints: Previous research was limited to small networks such as VGG5
  2. Gradient Bias: EP's gradient estimate is exact only in the limit of an infinitesimal nudging parameter β, so the finite β used in practice biases it
  3. Convergence Difficulties: Deep networks struggle to reach stable equilibrium states
  4. Activation Function Limitations: Existing activation functions perform poorly in deep networks

Core Contributions

  1. Proposed Clipped ReLU Activation Function: Simplifies energy function and gradient computation, improving training stability for deep networks
  2. Introduced Hopfield-Resnet Architecture: Enables EP methods to successfully train networks exceeding 12 layers through residual connections
  3. Significant Performance Improvement: Achieves 93.92% accuracy on CIFAR-10, approaching backpropagation performance
  4. Multi-Dataset Validation: Validates method effectiveness on CIFAR-10, CIFAR-100, and Fashion-MNIST

Methodology Details

Task Definition

This work investigates how to train deep convolutional neural networks for image classification tasks using the equilibrium propagation method. The input is an image x, the output is a class label y, with the constraint of maintaining the biological plausibility and local gradient computation characteristics of the EP method.

Equilibrium Propagation Theoretical Foundation

The EP method is based on convergent RNNs with static input, whose state evolves as:

$$s^{t+1} = \frac{\partial \Phi}{\partial s}\left(x, s^{t}, \theta\right)$$

where Φ is the energy function, s represents neuron states, and θ denotes network parameters.

EP training comprises two phases:

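  1. Free Phase: The network evolves under the energy function alone and settles to a fixed point s*
  2. Weakly Clamped Phase: A nudging term −β ∂L/∂s, proportional to the gradient of the loss, is added to the dynamics to push the output toward the target; the network settles to a new fixed point s^β*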

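In the limit of small β, the gradient of the loss L is obtained from the two fixed points:

$$-\frac{\partial L}{\partial \theta} = \lim_{\beta \to 0} \frac{1}{\beta}\left[\frac{\partial \Phi}{\partial \theta}\left(x, s^{\beta}_{*}, \theta\right) - \frac{\partial \Phi}{\partial \theta}\left(x, s_{*}, \theta\right)\right]$$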
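To make the two phases concrete, the following is a minimal sketch on a toy scalar model, not the paper's network or code. The primitive function Φ(x, s, w) = w·x·s + ½·a·s², the squared-error loss, and all numbers are assumptions chosen so that the exact gradient is available in closed form for comparison; the 120/50 relaxation steps mirror the settings listed under Implementation Details.

```python
# Toy EP on a single scalar state s and a single weight w (hypothetical model).
# Primitive function: Phi(x, s, w) = w*x*s + 0.5*a*s**2, so dPhi/ds = w*x + a*s
# and dPhi/dw = x*s.  Loss: L(s) = 0.5*(s - y)**2, so dL/ds = s - y.


def relax(x, w, y, a, beta, s0, steps):
    """Iterate s <- dPhi/ds - beta * dL/ds for a fixed number of steps."""
    s = s0
    for _ in range(steps):
        s = (w * x + a * s) - beta * (s - y)
    return s


def ep_gradient(x, w, y, a, beta, t_free=120, t_nudge=50):
    """One-sided EP estimate of -dL/dw from a free and a nudged fixed point."""
    s_free = relax(x, w, y, a, 0.0, 0.0, t_free)         # free phase -> s*
    s_nudged = relax(x, w, y, a, beta, s_free, t_nudge)  # weakly clamped phase -> s^beta*
    # dPhi/dw = x * s, evaluated at both fixed points
    return (x * s_nudged - x * s_free) / beta


def exact_neg_gradient(x, w, y, a):
    """Exact -dL/dw using the closed-form free fixed point s* = w*x / (1 - a)."""
    s_star = w * x / (1 - a)
    return -(s_star - y) * x / (1 - a)


x, w, y, a = 1.0, 0.8, 1.0, 0.5
print(exact_neg_gradient(x, w, y, a))       # about -1.2
print(ep_gradient(x, w, y, a, beta=0.1))    # about -1.0: finite beta biases the estimate
print(ep_gradient(x, w, y, a, beta=0.01))   # about -1.18: closer to the exact value
```

Shrinking β moves the estimate toward the exact value, which illustrates the finite-β bias that motivates the centered variant discussed under Centered Equilibrium Propagation.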

Hopfield-Resnet Architecture Design

Residual Connection Integration

The Hopfield-Resnet block contains three convolutional operations:

  • Main pathway: Two 3×3 convolutions
  • Skip connection: One 1×1 convolution

The neuron state update equation is modified to:

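$$s^{t+1}_{n} = \sigma\left(\sum_{i \in \mathrm{pre}(n)} P\left(w_i \star s^{t}_{i}\right) + \sum_{j \in \mathrm{post}(n)} \tilde{w}_j \star P^{-1}\left(s^{t}_{j}\right)\right)$$

where σ is the activation function, ⋆ denotes convolution, P is a pooling operation and P^(-1) its inverse (unpooling), w̃_j is the flipped (transposed) kernel of w_j that carries feedback from successor states, and pre(n) and post(n) denote all predecessor and successor states directly interacting with state n.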
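A minimal sketch of this update rule under simplifying assumptions: dense matrices stand in for the convolutions (so the transposed kernel w̃ becomes a matrix transpose), pooling P and unpooling P^(-1) are omitted, and the three-node graph (input x → s1 → s2, plus a skip edge x → s2) and all weights are hypothetical. Each free state sums bottom-up messages from its predecessors and top-down messages, through transposed weights, from its successors, then applies a clipped ReLU.

```python
# Dense, pooling-free sketch of synchronous state updates in one residual block.
# Hypothetical graph: x -> s1 (W01), s1 -> s2 (W12), x -> s2 (W02, the skip edge).


def relu_clip(v, alpha=6.0):
    return [min(max(z, 0.0), alpha) for z in v]


def matvec(W, v):   # W @ v
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


def matTvec(W, v):  # W.T @ v, standing in for the transposed convolution
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]


def add(u, v):
    return [p + q for p, q in zip(u, v)]


def step(states, edges, free_nodes, alpha=6.0):
    """s_n <- clip(sum over pre(n) of W s_i + sum over post(n) of W^T s_j), synchronously."""
    new = dict(states)
    for n in free_nodes:
        total = [0.0] * len(states[n])
        for (src, dst), W in edges.items():
            if dst == n:   # bottom-up message from a predecessor
                total = add(total, matvec(W, states[src]))
            if src == n:   # top-down message from a successor
                total = add(total, matTvec(W, states[dst]))
        new[n] = relu_clip(total, alpha)
    return new


edges = {
    ("x", "s1"): [[0.5, 0.0], [0.0, 0.5]],
    ("s1", "s2"): [[1.0, 0.0], [0.0, 1.0]],
    ("x", "s2"): [[0.1, 0.0], [0.0, 0.1]],  # skip connection
}
states = {"x": [1.0, 2.0], "s1": [0.0, 0.0], "s2": [0.0, 0.0]}  # x stays clamped
for _ in range(100):
    states = step(states, edges, free_nodes=["s1", "s2"])
print(states["s1"], states["s2"])  # both saturate at alpha = 6.0
```

In this particular toy, the feedback loop s1 → s2 → s1 has unit gain, so without the clip at α the states would grow without bound.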

Network Architecture Details

  • 4 Hopfield-Resnet blocks + 1 fully connected layer
  • Total of 13 trainable parameter groups (12 convolutional layers + 1 fully connected layer)
  • 9 updatable neuron states
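The counts above follow from simple bookkeeping. The sketch assumes, as an interpretation not spelled out in the source, that each block contributes two updatable states (the feature map after its first 3×3 convolution and the block output) and that the classifier output adds one more.

```python
# Hypothetical bookkeeping for Hopfield-Resnet13 (an interpretation, not the authors' code).
NUM_BLOCKS = 4
CONVS_PER_BLOCK = 3    # two 3x3 convolutions on the main path + one 1x1 skip convolution
STATES_PER_BLOCK = 2   # assumed: intermediate feature map + block output
FC_LAYERS = 1
OUTPUT_STATES = 1      # assumed: the classifier output is also an updatable state

trainable_layers = NUM_BLOCKS * CONVS_PER_BLOCK + FC_LAYERS       # 4*3 + 1
updatable_states = NUM_BLOCKS * STATES_PER_BLOCK + OUTPUT_STATES  # 4*2 + 1
print(trainable_layers, updatable_states)  # 13 9
```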

Clipped ReLU Activation Function

The work proposes the clipped ReLU activation ReLU_α(z) = min(max(z, 0), α), which restricts outputs to the range [0, α]:

  • Prevents explosive growth of the energy function
  • ReLU_6 (α = 6) gave the best performance in the experiments
  • Computationally simpler than traditional sigmoid/tanh activations
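A minimal scalar sketch of ReLU_α as defined above. In a PyTorch model the same operation could be written with `torch.clamp(x, min=0.0, max=alpha)`; that is an assumption about how one might implement it, not a statement about the authors' code.

```python
def relu_alpha(z: float, alpha: float = 6.0) -> float:
    """Clipped ReLU: 0 for z <= 0, z on [0, alpha], alpha for z >= alpha."""
    return min(max(z, 0.0), alpha)


inputs = [-2.0, 0.5, 3.0, 9.0]
print([relu_alpha(z) for z in inputs])             # ReLU_6: [0.0, 0.5, 3.0, 6.0]
print([relu_alpha(z, alpha=1.0) for z in inputs])  # ReLU_1: [0.0, 0.5, 1.0, 1.0]
```

Because every state stays in [0, α], energy terms built from products of states and fixed weights stay bounded too, which is the intuition behind the claim that clipping keeps the energy function from blowing up.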

Centered Equilibrium Propagation (CEP)

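The work adopts the CEP algorithm, which nudges the network with both +β and −β; the first-order bias terms cancel, leaving a gradient estimate with O(β²) rather than O(β) bias:

$$-\frac{\partial L}{\partial \theta} \approx \frac{1}{2\beta}\left[\frac{\partial \Phi}{\partial \theta}\left(x, s^{+\beta}_{*}, \theta\right) - \frac{\partial \Phi}{\partial \theta}\left(x, s^{-\beta}_{*}, \theta\right)\right]$$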
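The bias reduction can be seen on a toy scalar model (an illustrative assumption, not the paper's network): with Φ(x, s, w) = w·x·s + ½·a·s² and a squared-error loss, the nudged fixed point has a closed form, so the one-sided and centered estimates can be compared against the exact gradient for several hypothetical values of β.

```python
# Toy scalar model (hypothetical): Phi(x, s, w) = w*x*s + 0.5*a*s**2, L(s) = 0.5*(s - y)**2.
# The nudged fixed point solves s = w*x + a*s - beta*(s - y), i.e.
# s_beta = (w*x + beta*y) / (1 - a + beta).  dPhi/dw = x*s.


def fixed_point(x, w, y, a, beta):
    return (w * x + beta * y) / (1 - a + beta)


def one_sided(x, w, y, a, beta):
    """Standard EP: +beta equilibrium versus the free equilibrium, divided by beta."""
    return x * (fixed_point(x, w, y, a, beta) - fixed_point(x, w, y, a, 0.0)) / beta


def centered(x, w, y, a, beta):
    """CEP: +beta equilibrium versus -beta equilibrium, divided by 2*beta."""
    return x * (fixed_point(x, w, y, a, beta) - fixed_point(x, w, y, a, -beta)) / (2 * beta)


x, w, y, a = 1.0, 0.8, 1.0, 0.5
s_star = fixed_point(x, w, y, a, 0.0)
exact = -(s_star - y) * x / (1 - a)  # exact -dL/dw for this toy model

for beta in (0.2, 0.1, 0.05):  # beta must stay below 1 - a so the -beta phase is stable
    err1 = abs(one_sided(x, w, y, a, beta) - exact)
    err2 = abs(centered(x, w, y, a, beta) - exact)
    print(f"beta={beta:<5} one-sided error={err1:.4f}  centered error={err2:.4f}")
```

As β shrinks, the one-sided error falls roughly in proportion to β, while the centered error falls roughly with β².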

Experimental Setup

Datasets

  • CIFAR-10: 32×32 color images, 10 classes, 50,000 training samples
  • CIFAR-100: 32×32 color images, 100 classes, 50,000 training samples
  • Fashion-MNIST: 28×28 grayscale images, 10 classes, 60,000 training samples

Evaluation Metrics

Test set accuracy is used as the primary evaluation metric

Comparison Methods

  • Baseline Method: Deep Convolutional Hopfield Network (DCHN) with VGG5 architecture
  • Backpropagation Baseline: Corresponding feedforward network architecture

Implementation Details

  • Optimizer: Nesterov Accelerated Gradient optimizer
  • Nudging Parameter β: Tuned empirically within the range [0.1, 0.4]
  • Time Steps: 120 steps for free phase, 50 steps each for clamped phases (±β)
  • Hardware: NVIDIA RTX 4090 and RTX 6000 Ada GPUs
  • Framework: PyTorch

Experimental Results

Main Results

Dataset | Model Architecture | Previous Best (%) | This Work (%) | Backpropagation (%)
--- | --- | --- | --- | ---
CIFAR-10 | VGG5 | 90.3 | 92.84 | 92.11
CIFAR-10 | Hopfield-Resnet13 | - | 93.92 | 93.78
CIFAR-100 | VGG5 | 68.4 | 70.78 | 72.54
CIFAR-100 | Hopfield-Resnet13 | - | 71.05 | 75.12
F-MNIST | VGG5 | 93.53 | 94.34 | -
F-MNIST | Hopfield-Resnet13 | - | 94.15 | -
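The gaps discussed under Key Findings can be recomputed from the table with plain arithmetic. The sketch only copies the reported accuracies; positive differences mean the EP-trained model scored higher than its backpropagation counterpart.

```python
# Accuracies (%) copied from the results table: (EP result, backpropagation result).
results = {
    ("CIFAR-10", "VGG5"): (92.84, 92.11),
    ("CIFAR-10", "Hopfield-Resnet13"): (93.92, 93.78),
    ("CIFAR-100", "VGG5"): (70.78, 72.54),
    ("CIFAR-100", "Hopfield-Resnet13"): (71.05, 75.12),
}
for (dataset, model), (ep, bp) in results.items():
    print(f"{dataset:9} {model:18} EP - BP = {ep - bp:+.2f} points")

previous_best_cifar10 = 90.3
print(f"Gain over previous best on CIFAR-10: {93.92 - previous_best_cifar10:.2f} points")
```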

Key Findings

  1. Significant Performance Improvement: Roughly 3.5 percentage points above the previous best EP result on CIFAR-10 (93.92% vs. 90.3%)
  2. Matching Backpropagation Performance: On CIFAR-10, Hopfield-Resnet13 (93.92%) is on par with the backpropagation-trained Resnet13 (93.78%), 0.14 percentage points higher
  3. Successful Deep Network Training: First successful training of EP networks exceeding 12 layers

Ablation Studies

Importance of Residual Connections

Experiments show that without residual connections the training loss of deep networks stays flat and does not decrease, whereas networks with residual connections converge successfully.

Activation Function Comparison

  • ReLU_6 performs best
  • ReLU_1 (hard-sigmoid) ranks second
  • ReLU_α with α randomly initialized in [0, 10] shows intermediate performance

Training Time Analysis

  • Training Hopfield-Resnet13 for 300 epochs takes over 30 hours
  • A significant share of that time is spent on GPU kernel launches and CPU-GPU synchronization
  • This overhead leaves room for optimization

Memory Usage

  • CEP training memory usage is comparable to backpropagation (about 22% higher in the configuration below)
  • Hopfield-Resnet13 (batch size 128): 1612 MiB
  • Corresponding Resnet13: 1324 MiB

Weight Distribution Analysis

Weight distribution characteristics of CEP-trained networks:

  1. Smaller Weight Values: Both absolute values and variance are smaller than backpropagation-trained networks
  2. Weights Approaching Zero with Depth: Weights gradually approach zero as depth increases
  3. Residual Connection Mitigation: Skip-connection layers show a significantly lower proportion of near-zero weights

Biologically Plausible Learning Algorithms

  • Forward Propagation: Avoids the non-locality of backpropagation
  • Predictive Coding: Learning based on the free-energy principle
  • Contrastive Hebbian Learning: A closely related two-phase precursor of EP

Development History of Equilibrium Propagation

  • Original EP: Scellier & Bengio (2017) proposed the foundational theory
  • CEP: Reduces gradient bias by nudging symmetrically with ±β
  • HEP (holomorphic EP): Uses complex-valued nudging at multiple points in the complex plane to compute exact gradients with finite nudging
  • Convolutional Extension: Extends EP to convolutional networks

Hardware Implementation

Existing research has implemented EP on neuromorphic hardware such as memristor crossbars, demonstrating potential for on-device learning.

Conclusions and Discussion

Main Conclusions

  1. Technical Breakthrough: First successful extension of EP to 13-layer deep networks
  2. Performance Improvement: Significantly surpasses previous EP methods on multiple datasets
  3. Architectural Innovation: Combining residual connections with clipped ReLU effectively addresses the depth-scaling problem for EP

Limitations

  1. Computational Efficiency: Training time remains significantly longer than backpropagation
  2. Hardware Dependency: Requires specially optimized hardware to fully leverage advantages
  3. Performance Gap: A gap to backpropagation remains on harder datasets (e.g., 71.05% vs. 75.12% on CIFAR-100)
  4. Depth Limitation: Although trainable depth has roughly doubled, 13 layers still falls well short of modern deep networks

Future Directions

  1. Modern Hopfield Networks: Integration with modern Hopfield networks for sequence learning
  2. Hardware Optimization: Development of neuromorphic hardware specifically adapted for EP
  3. Algorithm Optimization: Further reduction of training time and efficiency improvements
  4. Theoretical Analysis: A deeper understanding of the properties specific to EP's training mechanism

In-Depth Evaluation

Strengths

  1. Important Breakthrough: First successful extension of EP to deep networks, addressing long-standing scalability issues
  2. Practical Innovation: Simple and effective combination of residual connections and clipped ReLU
  3. Comprehensive Validation: Experiments on three datasets (CIFAR-10, CIFAR-100, Fashion-MNIST) with both VGG5 and Hopfield-Resnet13
  4. In-depth Analysis: Goes beyond accuracy with analyses of weight distributions, memory usage, and training time
  5. Open Source Code: Provides complete implementation code, enhancing reproducibility

Weaknesses

  1. Computational Efficiency: Excessive training time limits practical applications
  2. Insufficient Theoretical Analysis: Lacks theoretical explanation for why residual connections are effective
  3. Dataset Limitations: Validation is limited to small-image benchmarks (CIFAR-10, CIFAR-100, Fashion-MNIST)
  4. Missing Hardware Optimization: Insufficient exploitation of existing GPU parallel computing capabilities

Impact

  1. Academic Contribution: Provides important architectural innovation for the EP field
  2. Practical Value: Offers more practical deep learning methods for neuromorphic computing
  3. Research Inspiration: Establishes foundation for subsequent EP deep network research

Applicable Scenarios

  1. Neuromorphic Hardware: Particularly suitable for implementation on specialized neuromorphic chips
  2. Edge Computing: Suitable for edge devices requiring online learning
  3. Bio-inspired Computing: Provides direction for constructing more biologically plausible AI systems
  4. Low-Power Applications: Advantageous in scenarios with extreme energy efficiency requirements

References

  1. Scellier, B. & Bengio, Y. (2017). Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in Computational Neuroscience.
  2. Laborieux, A. et al. (2021). Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in Neuroscience.
  3. Laborieux, A. & Zenke, F. (2022). Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. NeurIPS.
  4. He, K. et al. (2016). Deep residual learning for image recognition. CVPR.

This paper makes an important step in extending equilibrium propagation to deep networks. A simple architectural change, residual connections combined with clipped ReLU, substantially improves the practical utility of EP and makes a valuable contribution to neuromorphic computing and bio-inspired learning algorithms.