Digital computers are power-hungry and largely intolerant of damaged components, making them potentially difficult tools for energy-limited autonomous agents in uncertain environments. Recently developed Contrastive Local Learning Networks (CLLNs) - analog networks of self-adjusting nonlinear resistors - are inherently low-power and robust to physical damage, but were constructed to perform supervised learning. In this work we demonstrate success on two simple RL problems using Q-learning adapted for simulated CLLNs. Doing so makes explicit the components (beyond the network being trained) required to enact various tools in the RL toolbox, some of which (policy function and value function) are more natural in this system than others (replay buffer). We discuss assumptions such as the physical safety that digital hardware requires, CLLNs can forgo, and biological systems cannot rely on, and highlight secondary goals that are important in biology and trainable in CLLNs, but make little sense in digital computers.
Digital computers are powerful, but their high energy consumption and intolerance to component damage make them a poor fit for autonomous intelligent agents in energy-limited, uncertain environments. This paper investigates Contrastive Local Learning Networks (CLLNs), analog networks of self-adjusting nonlinear resistors, for reinforcement learning (RL) tasks. CLLNs are inherently low-power and robust to physical damage, but were previously constructed only for supervised learning. The authors adapt Q-learning to simulated CLLNs, solve two simple RL problems with it, and make explicit which components, beyond the network being trained, are required to implement various tools in the RL toolkit. Policy functions and value functions are implemented naturally in this system, while experience replay buffers are less natural.
Digital computers face two fundamental weaknesses in reinforcement learning applications:
Poor fault tolerance: Damage to a single transistor can cause system-wide failure, as the function of each component is inherently tied to its position in the system
High energy consumption: Laptop CPUs consume approximately 50 W, owing to the high energy cost of maintaining "perfect" operation and of transferring data between processing and storage
For autonomous agents in energy-limited environments, low power consumption and fault tolerance are critical. Biological systems excel in these aspects:
The human brain consumes only 20 W in total while performing perception, cognition, motor control, and other tasks
The brain can withstand significant damage and continue operating, including single neuron destruction, traumatic brain injury, and even brain region removal
This robustness stems from distributed processing and emergent computation, rather than linear computation
Few examples exist of artificial non-digital hardware applied to RL tasks
Many digitally-enhanced or simulated analog systems have been used for RL, but few hardware demonstrations combine distributed storage, computation, and analog signals
Recently developed CLLNs, while possessing low power and fault-tolerant characteristics, have not yet been validated in RL scenarios
First application of CLLNs to reinforcement learning: Successfully adapted Q-learning to simulated CLLNs, enabling RL capabilities for physical learning networks
Validation on two RL tasks:
Four-state, four-action Markov Decision Process (MDP)
Nine-state (3×3 grid) four-action navigation task
Achieved near-optimal policies in 8-10 out of 10 trials
Clarification of design considerations for physical learning systems:
Identified RL components naturally implementable in CLLNs (policy functions, value functions)
Identified components requiring additional hardware support (experience replay buffers)
Revealed constraints unique to physical systems (bounded parameters, non-feedforward structure)
Proposed unique advantages of physical learning systems:
Low-power operation can be further optimized through modified learning algorithms
Online recovery capability after damage
Ability to train secondary objectives (e.g., power consumption, robustness) that make little sense in digital systems
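For orientation, the nine-state (3×3 grid), four-action navigation task listed above can be sketched with ordinary tabular Q-learning on a digital computer. This is not the paper's CLLN adaptation: a lookup table stands in for the value function that the paper implements in the network. The goal cell, reward scheme, start state, and hyperparameters (ALPHA, GAMMA, EPSILON) are assumptions chosen for illustration, since the summary does not specify them.

```python
import random

N = 3                                          # grid side length (3x3 = 9 states)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (2, 2)                                  # assumed goal cell (hypothetical)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2          # assumed hyperparameters


def step(state, action):
    """Move inside the grid; walking into a wall leaves the agent in place."""
    row = min(max(state[0] + ACTIONS[action][0], 0), N - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), N - 1)
    nxt = (row, col)
    reward = 1.0 if nxt == GOAL else 0.0       # assumed reward: +1 on reaching the goal
    return nxt, reward, nxt == GOAL


def train(episodes=500, max_steps=50, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(N) for c in range(N)}
    for _episode in range(episodes):
        state = (0, 0)                         # assumed start cell
        for _t in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.randrange(4)      # explore
            else:
                best = max(q[state])           # exploit, breaking ties at random
                action = rng.choice([a for a in range(4) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: r + gamma * max_a' Q(s', a'), with no bootstrap at the goal
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=20):
    """Follow the greedy policy derived from q until the goal or a step limit."""
    state, path = start, [start]
    for _t in range(max_steps):
        if state == GOAL:
            break
        action = max(range(4), key=lambda a: q[state][a])
        state, _reward, _done = step(state, action)
        path.append(state)
    return path


q_table = train()
path = greedy_path(q_table)
```

Here `q_table` plays the role of the value function and the argmax in `greedy_path` plays the role of the policy function. Everything else in `train` (the exploration rule, the target computation, the episode loop) is the kind of surrounding machinery that, per the summary, a physical learning system has to supply outside the network being trained.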
Dillavou et al. (2024): Machine learning without a processor: Emergent learning in a nonlinear analog network. PNAS. (Original CLLN paper)
Stern et al. (2021): Supervised Learning in Physical Networks: From Machine Learning to Learning Machines. Physical Review X. (Coupled Learning theoretical framework)
Scellier & Bengio (2017): Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation. Frontiers in Computational Neuroscience. (Theoretical foundation)
Mak et al. (2007, 2010): Early work on analog circuit RL
Stern et al. (2024): Training self-learning circuits for power-efficient solutions. APL Machine Learning. (Power efficiency optimization)
Overall Assessment: This is pioneering work, the first to apply physical learning networks to reinforcement learning, with significant theoretical and potential practical value. Although it is currently validated only on simple tasks and remains far from a fully autonomous physical learning system, it opens new research directions for energy-efficient and fault-tolerant autonomous agents. The paper's primary value lies in clarifying the design space, constraints, and unique advantages of physical learning systems, laying a foundation for subsequent research. Future work should advance hardware implementation, task complexity, and methodological refinement.