
Heterogeneous RBCs via deep multi-agent reinforcement learning

Gabriele, Glielmo, Taboga

Current macroeconomic models with agent heterogeneity can be broadly divided into two main groups. Heterogeneous-agent general equilibrium (GE) models, such as those based on Heterogeneous Agents New Keynesian (HANK) or Krusell-Smith (KS) approaches, rely on GE and 'rational expectations', somewhat unrealistic assumptions that make the models very computationally cumbersome, which in turn limits the amount of heterogeneity that can be modelled. In contrast, agent-based models (ABMs) can flexibly encompass a large number of arbitrarily heterogeneous agents, but typically require the specification of explicit behavioural rules, which can lead to a lengthy trial-and-error model-development process. To address these limitations, we introduce MARL-BC, a framework that integrates deep multi-agent reinforcement learning (MARL) with Real Business Cycle (RBC) models. We demonstrate that MARL-BC can: (1) recover textbook RBC results when using a single agent; (2) recover the results of the mean-field KS model using a large number of identical agents; and (3) effectively simulate rich heterogeneity among agents, a hard task for traditional GE approaches. Our framework can be thought of as an ABM if used with a variety of heterogeneous interacting agents, and can reproduce GE results in limit cases. As such, it is a step towards a synthesis of these often opposed modelling paradigms.


Basic Information

Paper ID: 2510.12272
Title: Heterogeneous RBCs via deep multi-agent reinforcement learning
Authors: Federico Gabriele (Sapienza Università di Roma), Aldo Glielmo (Banca d'Italia), Marco Taboga (Banca d'Italia)
Classification: cs.MA cs.LG econ.TH
Publication Date: October 14, 2025
Paper Link: https://arxiv.org/abs/2510.12272

Abstract

Current macroeconomic models with agent heterogeneity can be divided into two major categories. Heterogeneous agent general equilibrium (GE) models, such as those based on HANK or Krusell-Smith (KS) approaches, rely on general equilibrium and "rational expectations" assumptions that are unrealistic and computationally complex, limiting the degree of heterogeneity that can be modeled. In contrast, agent-based models (ABMs) can flexibly incorporate numerous arbitrarily heterogeneous agents but typically require explicit specification of behavioral rules, resulting in lengthy trial-and-error model development processes. To address these limitations, this paper introduces the MARL-BC framework, which combines deep multi-agent reinforcement learning (MARL) with real business cycle (RBC) models.

Research Background and Motivation

Problem Definition

Macroeconomic modeling traditionally relies on general equilibrium models using representative agents, such as RBC and New Keynesian models. However, a well-known limitation of representative agent models is their inability to account for agent heterogeneity.

Limitations of Existing Approaches

Heterogeneous Agent GE Models:
- Require "rational expectations" assumptions, where agents must track the entire wealth or income distribution as state variables
- High computational costs, significantly limiting the degree of achievable heterogeneity
- Typically only achieve "ex-post" heterogeneity, where all agents start identical and diverge only due to individual random shocks

Agent-Based Models (ABMs):
- Completely abandon representative agents and rational expectations assumptions
- Require modelers to directly determine agent behavioral rules
- Make it hard to limit the arbitrariness of rule specifications and to identify realistic rules

Research Motivation

Reinforcement learning (RL), particularly multi-agent reinforcement learning (MARL), offers new approaches for modeling heterogeneous agents in macroeconomics. The RL learning paradigm appears to provide a natural synthesis between the extremes of GE and ABM: agents can be boundedly rational and diverse, yet their behavior emerges endogenously from a principled optimization process (learning to maximize rewards).

Core Contributions

Developed the MARL-BC Framework: A MARL-based framework extending the classical RBC model to support multiple households with rich and flexible heterogeneity
Demonstrated Training Feasibility: Training using state-of-the-art RL algorithms (PPO, SAC, DDPG) is computationally feasible
Reproduced Classical Results: When using a single agent, textbook RBC results can be recovered
Reproduced Mean-Field Models: When using numerous ex-ante identical agents, mean-field Krusell-Smith model results can be recovered
Supported Rich Heterogeneity: Effectively simulates rich heterogeneity among agents, a task difficult for traditional GE methods

Methodology Details

Task Definition

The MARL-BC framework aims to extend the classical RBC model through multi-agent reinforcement learning to support heterogeneous household agents capable of:

Recovering traditional RBC models in the single-agent case
Recovering Krusell-Smith mean-field models with multiple identical agents
Modeling agents with arbitrary heterogeneity

Model Architecture

Heterogeneous RBC Environment

The model contains n types of households i = 1,...,n and a single firm:

Effective Total Capital and Labor:
```
K_t = (1/n) * Σ(κ_i * k_i_t)
L_t = (1/n) * Σ(λ_i * ℓ_i_t)
```
where κ_i and λ_i are the capital and labor productivities of household i, respectively.

Production Function: Cobb-Douglas
```
Y_t = A_t * K_t^α * L_t^(1-α)
```

Capital and Labor Costs: Assuming perfect competition

r_i_t = α * (Y_t/K_t) * κ_i
w_i_t = (1-α) * (Y_t/L_t) * λ_i

Household Wealth:

a_i_t = w_i_t * ℓ_i_t + r_i_t * k_i_t + (1-δ) * k_i_t
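
The environment equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `rbc_prices_and_wealth` and the sample household values are assumptions made for the example, and only α = 0.36 and δ = 0.025 come from the parameter table.

```python
import numpy as np

def rbc_prices_and_wealth(k, ell, kappa, lam, A=1.0, alpha=0.36, delta=0.025):
    """Evaluate aggregates, factor prices and wealth for n households (illustrative)."""
    k, ell, kappa, lam = (np.asarray(x, dtype=float) for x in (k, ell, kappa, lam))
    K = np.mean(kappa * k)                 # K_t = (1/n) * sum(kappa_i * k_i_t)
    L = np.mean(lam * ell)                 # L_t = (1/n) * sum(lambda_i * l_i_t)
    Y = A * K**alpha * L**(1.0 - alpha)    # Cobb-Douglas output
    r = alpha * (Y / K) * kappa            # household-specific return on capital
    w = (1.0 - alpha) * (Y / L) * lam      # household-specific wage
    a = w * ell + r * k + (1.0 - delta) * k  # wealth available to each household
    return K, L, Y, r, w, a

# Hypothetical three-household example (values chosen only for illustration)
k = np.array([1.0, 2.0, 3.0])
ell = np.array([0.3, 0.3, 0.3])
kappa = np.array([0.8, 1.0, 1.2])
lam = np.array([0.98, 1.0, 1.02])
K, L, Y, r, w, a = rbc_prices_and_wealth(k, ell, kappa, lam)
```

A useful sanity check on this sketch is that average factor payments exhaust output, that is, the mean of `w * ell + r * k` equals `Y`, as expected under perfect competition with constant returns to scale.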

RL Household Agents

Action Space: Actions at each time step are tuples (c_i_t, ℓ_i_t)
- c_i_t: consumption ratio, range (0.01, 0.99)
- ℓ_i_t: labor supply, range (0.01, 0.99)

Observation Space:

x_i_t = (k_i_t, K_t, ℓ_i_(t-1), L_(t-1), A_t, κ_i, λ_i)

Reward Function:
```
R_i_t = log(c_i_t) + b * log(1 - ℓ_i_t)
```
where b > 0 controls the trade-off between consumption and leisure.

Policy Learning: Each RL household learns a deterministic policy
```
π_i: x_i_t → (c_i_t, ℓ_i_t)
```
by maximizing the expected discounted reward sum:
```
R_i = E_π_i[Σ_t β^t * R_i_t]
```
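
The reward and the discounted objective can be written out as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the default `b = 1.0` is a placeholder because the summary does not report the value used, and `c` stands for whatever consumption quantity the reward is defined on.

```python
import numpy as np

def household_reward(c, ell, b=1.0):
    """Per-period reward R_i_t = log(c) + b * log(1 - ell); b = 1.0 is a placeholder."""
    return np.log(c) + b * np.log(1.0 - ell)

def discounted_return(rewards, beta=0.95):
    """Finite-horizon discounted sum: sum over t of beta^t * R_t."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = beta ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

# Example: a constant action repeated for T = 500 steps (the episode length in the table)
episode_rewards = household_reward(np.full(500, 0.6), np.full(500, 0.3))
episode_return = discounted_return(episode_rewards, beta=0.95)
```

Because the action bounds keep both arguments strictly inside (0, 1), the logarithms are always finite for actions in the stated ranges.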

Technical Innovations

Parameter Sharing: Adopts standard MARL parameter sharing paradigm where a single neural network represents all agents, achieving different behaviors through individual features in observations
Independent Learners: Trains independent learners, each accessing only partial information set x_i_t, optimizing approximate best-response policies
Flexible Heterogeneity: Supports arbitrary heterogeneity settings in capital and labor productivity
Unified Framework: Can recover GE results in limiting cases and serve as ABM in general cases
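
Parameter sharing can be illustrated with a toy example in which one function, with one set of weights, maps every household's observation to an action. Everything below is an assumption made for illustration: the network is a small untrained NumPy MLP with random weights, not the trained Stable-Baselines3 policy, and the sigmoid rescaling is just one simple way to respect the (0.01, 0.99) action bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_observations(k, ell_prev, A, kappa, lam):
    """Stack x_i_t = (k_i, K, l_i_prev, L_prev, A, kappa_i, lambda_i), one row per household."""
    k, ell_prev, kappa, lam = (np.asarray(x, dtype=float) for x in (k, ell_prev, kappa, lam))
    n = len(k)
    K = np.mean(kappa * k)
    L_prev = np.mean(lam * ell_prev)
    return np.column_stack([
        k, np.full(n, K), ell_prev, np.full(n, L_prev),
        np.full(n, A), kappa, lam,
    ])

# Hypothetical shared weights: a single network used by all households
W1 = rng.normal(scale=0.1, size=(7, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 2))
b2 = np.zeros(2)

def shared_policy(obs):
    """Map each observation row to (c_i_t, l_i_t) inside (0.01, 0.99)."""
    hidden = np.tanh(obs @ W1 + b1)
    logits = hidden @ W2 + b2
    return 0.01 + 0.98 / (1.0 + np.exp(-logits))

# Households 0 and 2 have identical features; household 1 differs
obs = build_observations(
    k=[1.0, 2.0, 1.0], ell_prev=[0.3, 0.3, 0.3], A=1.0,
    kappa=[1.0, 1.2, 1.0], lam=[1.0, 1.02, 1.0],
)
actions = shared_policy(obs)
```

In this sketch behavioral differences can only come from the observation itself, which is why κ_i and λ_i appear in the observation vector: households with identical features receive identical actions, while different features generally produce different ones.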

Experimental Setup

Experimental Parameters

Parameter	RBC	KS	General
n (number of households)	1	20	20
T (episode length)	500	500	500
κ_i (capital productivity)	1	1	{0, 0.8, 1, 1.2, 0.98, 1.02}
λ_i (labor productivity)	1	1	{0.98, 1, 1.02}
α (output elasticity)	0.36	0.36	0.36
δ (capital depreciation)	{1, 0.025}	0.025	0.025
β (discount factor)	0.95	0.95	0.95

Comparison Methods

Four RL algorithms are compared:

DDPG (Deep Deterministic Policy Gradient)
TD3 (Twin Delayed Deep Deterministic Policy Gradient)
SAC (Soft Actor Critic)
PPO (Proximal Policy Optimization)

Implementation Details

MARL environment developed using PettingZoo interface
RL algorithms from Stable-Baselines3
Single-agent environments are trained for 10^6 steps; multi-agent environments are trained for 10^5 updates per agent
Parameter sharing employed to improve sample efficiency and scalability

Experimental Results

Main Results

1. Representative Agent RBC Limit

Algorithm Performance: SAC, TD3, and DDPG significantly outperform PPO in convergence speed, with SAC being the most stable learner
Textbook RBC Recovery: Under complete depreciation (δ=1), RL households learn to recover optimal policies, converging to optimal values after approximately 10^4 training steps
Typical RBC Recovery: Under partial depreciation (δ=0.025), learned optimal consumption and labor choices match results computed by Dynare software
Impulse Response Functions: Successfully reproduces standard impulse response functions, statistically consistent with traditional method results
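
For the full-depreciation case, the benchmark that a learned policy should approach has a well-known closed form when utility is logarithmic and production is Cobb-Douglas: a constant share 1 - αβ of output is consumed, and labor supply is constant. The sketch below computes these values. It assumes a single household with κ = λ = 1, so that wealth equals output when δ = 1. The default `b = 1.0` is a placeholder, since the summary does not report the value of b used in the paper.

```python
def full_depreciation_benchmark(alpha=0.36, beta=0.95, b=1.0):
    """Closed-form optimum for log utility, Cobb-Douglas output and delta = 1.

    Returns the share of output consumed and the constant labor supply.
    b = 1.0 is a placeholder assumption, not a value taken from the paper.
    """
    consumption_share = 1.0 - alpha * beta
    # From the labor first-order condition:
    # b * l / (1 - l) = (1 - alpha) / (1 - alpha * beta)
    labor = (1.0 - alpha) / ((1.0 - alpha) + b * consumption_share)
    return consumption_share, labor

c_share, labor = full_depreciation_benchmark()
```

With α = 0.36 and β = 0.95 from the parameter table, the consumption share is 1 - 0.342 = 0.658 regardless of b; the labor value depends on the assumed b.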

2. Mean-Field Krusell-Smith Limit

KS Law of Motion: A near-perfect linear relationship (R² > 0.99) emerges endogenously, without being assumed a priori
Distribution Characteristics: Gini coefficient increases to 0.18 after convergence, approaching the original KS calculation of 0.25
Marginal Propensity to Consume: Learned curves are flat at high wealth and sharply increase at low wealth, consistent with key results from the original KS paper
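
One way to check for a Krusell-Smith-style law of motion in simulated data is to regress log K_{t+1} on log K_t and inspect the R². The sketch below does this on a synthetic series; the series, its coefficients and the function name are assumptions made for illustration and are not results from the paper, which may also condition on the aggregate productivity state.

```python
import numpy as np

def fit_log_linear_law(K):
    """OLS fit of log K_{t+1} = intercept + slope * log K_t; returns (intercept, slope, R^2)."""
    K = np.asarray(K, dtype=float)
    x, y = np.log(K[:-1]), np.log(K[1:])
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    r2 = 1.0 - residuals.var() / y.var()
    return intercept, slope, r2

# Synthetic aggregate capital path (illustrative only, not simulation output)
rng = np.random.default_rng(0)
log_K = np.empty(500)
log_K[0] = 1.0
for t in range(499):
    log_K[t + 1] = 0.1 + 0.95 * log_K[t] + 0.001 * rng.normal()

intercept, slope, r2 = fit_log_linear_law(np.exp(log_K))
```

Applied to the aggregate capital series from a trained MARL-BC simulation, the same regression would give the kind of R² statistic reported above.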

3. Modeling Greater Heterogeneity

Heterogeneous Capital Returns KS: By introducing different capital productivity rates, Gini coefficients reach 0.33 (mild heterogeneity) and 0.61 (significant heterogeneity)
Heterogeneous RBC: In 3×3 grid settings with 9 agents, different productivity rates lead to overlapping but distinct wealth levels
Scalability: Successfully scales to hundreds of agents (maximum 529), with SAC maintaining stable high performance across all scales
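
The Gini coefficients quoted in these experiments can be computed from a cross-section of household wealth with the standard rank-based formula. The helper below is a minimal sketch assuming non-negative wealth; the sample inputs are illustrative and unrelated to the paper's simulations.

```python
import numpy as np

def gini(wealth):
    """Gini coefficient of a non-negative wealth vector (0 = equal, near 1 = concentrated)."""
    w = np.sort(np.asarray(wealth, dtype=float))
    n = w.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * w) / (n * np.sum(w)) - (n + 1.0) / n)

equal_gini = gini([1.0, 1.0, 1.0, 1.0])    # perfectly equal wealth
unequal_gini = gini([0.0, 0.0, 0.0, 1.0])  # one household holds everything
```

For four households, complete concentration gives (n - 1) / n = 0.75 rather than 1, which is worth keeping in mind when comparing Gini values across small populations such as n = 20.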

Ablation Studies

Comparing different RL algorithms' performance across varying agent numbers:

SAC consistently achieves high evaluation rewards across all population sizes
PPO performs poorly in small populations but improves as n increases
TD3 and DDPG show unstable performance in large n cases

Experimental Findings

Convergence: All considered RL algorithms learn policies that improve cumulative rewards, although convergence speed and stability differ across algorithms
Stability: SAC is the most reliable learner, particularly in multi-agent settings
Scalability: Framework scales to hundreds of heterogeneous households, achievable even on ordinary hardware
Emergent Behavior: Behaviors like "hand-to-mouth" consumption strategies emerge endogenously without heuristic encoding

RL Applications in Economics

Early Contributions: Using deep multi-agent RL to simulate emergent economic behavior in simplified toy economies
Finance: Successfully applied to modeling various trading strategies
Macroeconomics: Recently begun exploring RL techniques to extend classical GE frameworks

Distinction from Existing Work

Economics Side: Primarily focuses on single-agent RL, showing it can recover policy functions of representative agent GE models
Computer Science Side: Experiments with multi-agent RL, showing methods can produce rich emergent economic behavior, but mostly ignores foundational macroeconomic models
This Work: Bridges two research lines, providing foundations connecting research across both disciplines

Conclusions and Discussion

Main Conclusions

The MARL-BC framework successfully integrates deep MARL with RBC environments
The framework can recover classical textbook RBC results and Krusell-Smith mean-field models
The framework can model rich agent heterogeneity that is difficult to achieve with traditional GE methods
It is a step toward a synthesis of ABMs and heterogeneous-agent GE models

Limitations

Computational Cost: Accurate training of RL agents requires substantial computational resources, with multi-agent training runs requiring hours
Hardware Dependency: GPU acceleration needed to significantly alleviate computational burden
Model Complexity: Requires more complex training and tuning processes compared to traditional methods

Future Directions

GPU Vectorization Implementation: Implement vectorized-style MARL environments to fully leverage GPU acceleration
Specific Economic Problem Studies: Apply framework to study specific economic issues like economic inequality and asymmetric labor productivity changes
AI Tool Impact: Study economic and financial consequences of AI tool proliferation in workplaces

In-Depth Evaluation

Strengths

Methodological Innovation:
- Among the first works to combine MARL with foundational macroeconomic models such as RBC and Krusell-Smith
- Provides bridge between ABM and GE models
- Closely reproduces traditional model results in limiting cases
Experimental Sufficiency:
- Three-level validation: single-agent RBC, mean-field KS, general heterogeneity
- Systematic comparison of multiple RL algorithms
- Scalability testing covering from single digits to hundreds of agents
Result Convincingness:
- Quantitative reproduction of classical model key metrics
- Statistical significance verification (e.g., impulse response functions)
- Demonstrates heterogeneity modeling capabilities difficult for traditional methods
Writing Clarity:
- Clear framework description and mathematical notation
- Intuitive figure presentations
- Detailed hyperparameters and implementation details

Shortcomings

Methodological Limitations:
- Parameter sharing dependence may limit true independence of agent behavior
- Independent learner approach may not achieve true equilibrium solutions
Experimental Setup Defects:
- Relatively limited agent numbers (maximum 529)
- Lack of direct comparison with other economic modeling methods
- Computational time analysis primarily CPU-based, GPU performance insufficiently explored
Insufficient Analysis:
- Lack of theoretical convergence analysis
- Limited theoretical understanding of learning dynamics
- Insufficient parameter sensitivity analysis

Impact

Contribution to Field:
- Provides new methodological framework for macroeconomic modeling
- Promotes cross-disciplinary research between computer science and economics
- Opens new directions for complex economic system modeling
Practical Value:
- Open-source code enhances reproducibility and extensibility
- Provides new tools for policy analysis
- Supports more realistic heterogeneity assumptions
Reproducibility:
- Detailed hyperparameter settings
- Open-source code and implementation details
- Standardized experimental protocols

Applicable Scenarios

Macroeconomic Policy Analysis: Particularly scenarios requiring consideration of agent heterogeneity
Economic Inequality Research: Utilizing heterogeneous productivity rates to model wealth distribution
Complex Economic System Modeling: High-dimensional heterogeneity problems difficult for traditional GE methods
Teaching and Research Tools: Provides intuitive modeling framework for economics education

References

This paper cites 60 related references covering important works in macroeconomics, reinforcement learning, multi-agent systems, and other fields, providing solid theoretical foundations for interdisciplinary research.