
Heterogeneous RBCs via deep multi-agent reinforcement learning

Gabriele, Glielmo, Taboga
Current macroeconomic models with agent heterogeneity can be broadly divided into two main groups. Heterogeneous-agent general equilibrium (GE) models, such as those based on Heterogeneous Agents New Keynesian (HANK) or Krusell-Smith (KS) approaches, rely on GE and 'rational expectations', somewhat unrealistic assumptions that make the models very computationally cumbersome, which in turn limits the amount of heterogeneity that can be modelled. In contrast, agent-based models (ABMs) can flexibly encompass a large number of arbitrarily heterogeneous agents, but typically require the specification of explicit behavioural rules, which can lead to a lengthy trial-and-error model-development process. To address these limitations, we introduce MARL-BC, a framework that integrates deep multi-agent reinforcement learning (MARL) with Real Business Cycle (RBC) models. We demonstrate that MARL-BC can: (1) recover textbook RBC results when using a single agent; (2) recover the results of the mean-field KS model using a large number of identical agents; and (3) effectively simulate rich heterogeneity among agents, a hard task for traditional GE approaches. Our framework can be thought of as an ABM if used with a variety of heterogeneous interacting agents, and can reproduce GE results in limit cases. As such, it is a step towards a synthesis of these often opposed modelling paradigms.

Heterogeneous RBCs via Deep Multi-Agent Reinforcement Learning

Basic Information

  • Paper ID: 2510.12272
  • Title: Heterogeneous RBCs via deep multi-agent reinforcement learning
  • Authors: Federico Gabriele (Sapienza Università di Roma), Aldo Glielmo (Banca d'Italia), Marco Taboga (Banca d'Italia)
  • Classification: cs.MA cs.LG econ.TH
  • Publication Date: October 14, 2025
  • Paper Link: https://arxiv.org/abs/2510.12272

Abstract

Current macroeconomic models with agent heterogeneity fall into two broad groups. Heterogeneous-agent general equilibrium (GE) models, such as those based on HANK or Krusell-Smith (KS) approaches, rely on general equilibrium and "rational expectations", somewhat unrealistic assumptions that make the models computationally cumbersome and thereby limit the degree of heterogeneity that can be modeled. In contrast, agent-based models (ABMs) can flexibly incorporate many arbitrarily heterogeneous agents but typically require behavioral rules to be specified explicitly, which can lead to a lengthy trial-and-error model development process. To address these limitations, the paper introduces the MARL-BC framework, which combines deep multi-agent reinforcement learning (MARL) with real business cycle (RBC) models. The authors show that MARL-BC can (1) recover textbook RBC results with a single agent, (2) recover the results of the mean-field KS model with a large number of identical agents, and (3) simulate rich heterogeneity among agents, a hard task for traditional GE approaches.

Research Background and Motivation

Problem Definition

Macroeconomic modeling traditionally relies on general equilibrium models using representative agents, such as RBC and New Keynesian models. However, a well-known limitation of representative agent models is their inability to account for agent heterogeneity.

Limitations of Existing Approaches

  1. Heterogeneous Agent GE Models:
    • Require "rational expectations" assumptions, where agents must track the entire wealth or income distribution as state variables
    • High computational costs, significantly limiting the degree of achievable heterogeneity
    • Typically only achieve "ex-post" heterogeneity, where all agents start identical and diverge only due to individual random shocks
  2. Agent-Based Models (ABMs):
    • Completely abandon representative agents and rational expectations assumptions
    • Require modelers to directly determine agent behavioral rules
    • Make it hard to limit the arbitrariness of rule specifications and to identify realistic rules

Research Motivation

Reinforcement learning (RL), particularly multi-agent reinforcement learning (MARL), offers new approaches for modeling heterogeneous agents in macroeconomics. The RL learning paradigm appears to provide a natural synthesis between the extremes of GE and ABM: agents can be boundedly rational and diverse, yet their behavior emerges endogenously from a principled optimization process (learning to maximize rewards).

Core Contributions

  1. Developed the MARL-BC Framework: A MARL-based framework extending the classical RBC model to support multiple households with rich and flexible heterogeneity
  2. Demonstrated Training Feasibility: Training using state-of-the-art RL algorithms (PPO, SAC, DDPG) is computationally feasible
  3. Reproduced Classical Results: When using a single agent, textbook RBC results can be recovered
  4. Reproduced Mean-Field Models: When using numerous ex-ante identical agents, mean-field Krusell-Smith model results can be recovered
  5. Supported Rich Heterogeneity: Effectively simulates rich heterogeneity among agents, a task difficult for traditional GE methods

Methodology Details

Task Definition

The MARL-BC framework aims to extend the classical RBC model through multi-agent reinforcement learning to support heterogeneous household agents capable of:

  • Recovering traditional RBC models in the single-agent case
  • Recovering Krusell-Smith mean-field models with multiple identical agents
  • Modeling agents with arbitrary heterogeneity

Model Architecture

Heterogeneous RBC Environment

The model contains n types of households i = 1,...,n and a single firm:

  1. Effective Total Capital and Labor:
    K_t = (1/n) * Σ_i (κ_i * k_i_t)
    L_t = (1/n) * Σ_i (λ_i * ℓ_i_t)
    

    where κ_i and λ_i are capital and labor productivity, respectively
  2. Production Function: Using Cobb-Douglas function
    Y_t = A_t * K_t^α * L_t^(1-α)
    
  3. Capital and Labor Costs: Assuming perfect competition
    r_i_t = α * (Y_t/K_t) * κ_i
    w_i_t = (1-α) * (Y_t/L_t) * λ_i
    
  4. Household Wealth:
    a_i_t = w_i_t * ℓ_i_t + r_i_t * k_i_t + (1-δ) * k_i_t
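
The four environment equations above can be traced in a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `firm_step` and its argument names are hypothetical, and the default values for α and δ are taken from the parameter table in the Experimental Setup section.

```python
from typing import List, Tuple


def firm_step(
    k: List[float],      # individual capital holdings k_i_t
    ell: List[float],    # individual labor supplies l_i_t
    kappa: List[float],  # capital productivities kappa_i
    lam: List[float],    # labor productivities lambda_i
    A: float,            # total factor productivity A_t
    alpha: float = 0.36,
    delta: float = 0.025,
) -> Tuple[float, float, float, List[float], List[float], List[float]]:
    """Hypothetical helper: aggregates, output, factor prices and wealth."""
    n = len(k)
    # Effective aggregate capital and labor (productivity-weighted means).
    K = sum(kp * ki for kp, ki in zip(kappa, k)) / n
    L = sum(lm * li for lm, li in zip(lam, ell)) / n
    # Cobb-Douglas output; requires K > 0 and L > 0.
    Y = A * K ** alpha * L ** (1.0 - alpha)
    # Individual factor prices under perfect competition.
    r = [alpha * (Y / K) * kp for kp in kappa]
    w = [(1.0 - alpha) * (Y / L) * lm for lm in lam]
    # Household wealth: labor income + capital income + undepreciated capital.
    a = [
        wi * li + ri * ki + (1.0 - delta) * ki
        for wi, li, ri, ki in zip(w, ell, r, k)
    ]
    return K, L, Y, r, w, a


# Two households with different productivities.
K, L, Y, r, w, a = firm_step(
    k=[1.0, 2.0], ell=[0.3, 0.4], kappa=[1.0, 1.2], lam=[1.0, 0.98], A=1.0
)
```

One property follows directly from these definitions: average factor income across households equals output Y_t, so with a single household, κ = λ = 1 and δ = 1, wealth a_t coincides with Y_t.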
    

RL Household Agents

  1. Action Space: Actions at each time step are tuples (c_i_t, ℓ_i_t)
    • c_i_t: consumption ratio, range (0.01, 0.99)
    • ℓ_i_t: labor supply, range (0.01, 0.99)
  2. Observation Space:
    x_i_t = (k_i_t, K_t, ℓ_i_(t-1), L_(t-1), A_t, κ_i, λ_i)
    
  3. Reward Function:
    R_i_t = log(c_i_t) + b * log(1 - ℓ_i_t)
    

    where b > 0 controls the trade-off between consumption and leisure
  4. Policy Learning: Each RL household learns a deterministic policy
    π_i: x_i_t → (c_i_t, ℓ_i_t)
    

    by maximizing the expected discounted reward sum:
    R_i = E_π_i[Σ_t β^t * R_i_t]
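
To make the objective concrete, the per-step reward and the discounted sum for one realized trajectory can be written as follows. This is an illustrative sketch only: the function names are hypothetical, the value `b = 1.0` is a placeholder (this summary does not report b), and `c` is simply whatever consumption quantity enters the logarithm in the reward above.

```python
import math
from typing import Sequence


def reward(c: float, ell: float, b: float = 1.0) -> float:
    """Per-step reward log(c) + b * log(1 - ell); b = 1.0 is an assumed placeholder."""
    return math.log(c) + b * math.log(1.0 - ell)


def discounted_return(rewards: Sequence[float], beta: float = 0.95) -> float:
    """Sum_t beta^t * R_t for a single realized trajectory (no expectation)."""
    total = 0.0
    # Accumulate backwards so each step is one multiply and one add.
    for r_t in reversed(rewards):
        total = r_t + beta * total
    return total


# A short trajectory of (c, ell) actions inside the allowed range (0.01, 0.99).
trajectory = [(0.6, 0.3), (0.65, 0.35), (0.7, 0.4)]
G = discounted_return([reward(c, ell) for c, ell in trajectory])
```

The expectation E_π_i in the objective would be approximated in practice by averaging such returns over many simulated episodes.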
    

Technical Innovations

  1. Parameter Sharing: Adopts standard MARL parameter sharing paradigm where a single neural network represents all agents, achieving different behaviors through individual features in observations
  2. Independent Learners: Trains independent learners, each accessing only partial information set x_i_t, optimizing approximate best-response policies
  3. Flexible Heterogeneity: Supports arbitrary heterogeneity settings in capital and labor productivity
  4. Unified Framework: Can recover GE results in limiting cases and serve as ABM in general cases
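
The parameter-sharing idea in item 1 can be illustrated with a toy example: one set of weights serves every household, and behavior differs only because each household's observation carries its own κ_i and λ_i. The sketch below uses a linear map with a sigmoid squashing as a stand-in for the neural network; the class `SharedPolicy`, its weight initialization and the squashing to (0.01, 0.99) are all assumptions made for illustration, not the paper's architecture.

```python
import math
import random
from typing import List, Sequence, Tuple


def squash(z: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Map a real number into the action range (lo, hi) with a sigmoid."""
    return lo + (hi - lo) / (1.0 + math.exp(-z))


class SharedPolicy:
    """Hypothetical linear stand-in for a single network shared by all agents."""

    def __init__(self, obs_dim: int = 7, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Two output rows: one for consumption ratio, one for labor supply.
        self.W: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(obs_dim)] for _ in range(2)
        ]
        self.bias: List[float] = [0.0, 0.0]

    def act(self, obs: Sequence[float]) -> Tuple[float, float]:
        """Deterministic action (c, ell) for one agent's observation."""
        z = [
            sum(w * x for w, x in zip(row, obs)) + b
            for row, b in zip(self.W, self.bias)
        ]
        return squash(z[0]), squash(z[1])


policy = SharedPolicy()
# Observation layout: (k_i, K, ell_i_prev, L_prev, A, kappa_i, lambda_i).
obs_low = (1.0, 1.5, 0.3, 0.3, 1.0, 0.8, 1.0)
obs_high = (1.0, 1.5, 0.3, 0.3, 1.0, 1.2, 1.0)
action_low = policy.act(obs_low)    # same weights ...
action_high = policy.act(obs_high)  # ... different action, because kappa_i differs
```

Because the weights are shared, every household's experience updates the same parameters, which is the source of the sample-efficiency gain mentioned in the implementation details.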

Experimental Setup

Experimental Parameters

Parameter                    | RBC         | KS     | General
n (number of households)     | 1           | 20     | 20
T (episode length)           | 500         | 500    | 500
κ_i (capital productivity)   | 1           | 1      | {0, 0.8, 1, 1.2, 0.98, 1.02}
λ_i (labor productivity)     | 1           | 1      | {0.98, 1, 1.02}
α (output elasticity)        | 0.36        | 0.36   | 0.36
δ (capital depreciation)     | {1, 0.025}  | 0.025  | 0.025
β (discount factor)          | 0.95        | 0.95   | 0.95

Comparison Methods

Four RL algorithms are compared:

  • DDPG (Deep Deterministic Policy Gradient)
  • TD3 (Twin Delayed Deep Deterministic Policy Gradient)
  • SAC (Soft Actor Critic)
  • PPO (Proximal Policy Optimization)

Implementation Details

  • MARL environment developed using PettingZoo interface
  • RL algorithms from Stable-Baselines3
  • Single-agent environment trained for 10^6 steps; multi-agent environment with 10^5 updates per agent
  • Parameter sharing employed to improve sample efficiency and scalability

Experimental Results

Main Results

1. Representative Agent RBC Limit

  • Algorithm Performance: SAC, TD3, and DDPG significantly outperform PPO in convergence speed, with SAC being the most stable learner
  • Textbook RBC Recovery: Under complete depreciation (δ=1), RL households learn to recover optimal policies, converging to optimal values after approximately 10^4 training steps
  • Typical RBC Recovery: Under partial depreciation (δ=0.025), learned optimal consumption and labor choices match results computed by Dynare software
  • Impulse Response Functions: Successfully reproduces standard impulse response functions, statistically consistent with traditional method results
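
For the complete-depreciation case, the representative-agent problem with log utility and Cobb-Douglas production has a well-known closed-form solution: the household saves a constant fraction αβ of output and supplies a constant amount of labor. The sketch below computes these values, assuming this closed form is the benchmark the δ = 1 experiment is compared against; the function name is hypothetical and `b = 1.0` is a placeholder, since this summary does not report b.

```python
from typing import Tuple


def full_depreciation_policy(
    alpha: float = 0.36, beta: float = 0.95, b: float = 1.0
) -> Tuple[float, float]:
    """Closed-form optimal policy for delta = 1 (b = 1.0 is an assumed value)."""
    saving_rate = alpha * beta              # fraction of output saved as capital
    consumption_share = 1.0 - saving_rate   # fraction of output consumed
    # Labor from the intratemporal condition b / (1 - l) = (1 - alpha) * Y / (c * l).
    labor = (1.0 - alpha) / ((1.0 - alpha) + b * consumption_share)
    return consumption_share, labor


c_share, labor = full_depreciation_policy()
# With alpha = 0.36 and beta = 0.95 the consumption share is 1 - 0.342 = 0.658.
```

With a single household, κ = λ = 1 and δ = 1, wealth equals output, so the consumption share of output is also the consumption ratio out of wealth that a converged agent would be expected to choose.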

2. Mean-Field Krusell-Smith Limit

  • KS Law of Motion: Emerges endogenously for aggregate capital as a near-perfectly linear relationship (R² > 0.99), without being assumed a priori
  • Distribution Characteristics: Gini coefficient increases to 0.18 after convergence, approaching the original KS calculation of 0.25
  • Marginal Propensity to Consume: Learned curves are flat at high wealth and sharply increase at low wealth, consistent with key results from the original KS paper
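
A law-of-motion check of this kind amounts to regressing next-period aggregate capital on current aggregate capital and reading off R². The sketch below assumes the log-linear form used in the original Krusell-Smith paper, log K_(t+1) = a0 + a1 * log K_t (this summary only says "linear"), and uses a synthetic series rather than the paper's data; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(K: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of log K_(t+1) = a0 + a1 * log K_t; returns (a0, a1, R^2)."""
    x = [math.log(v) for v in K[:-1]]
    y = [math.log(v) for v in K[1:]]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a0, a1, 1.0 - ss_res / ss_tot


# Synthetic capital path that follows an exact log-linear law (illustration only).
log_k = [0.0]
for _ in range(30):
    log_k.append(0.1 + 0.9 * log_k[-1])
a0, a1, r2 = fit_log_linear([math.exp(v) for v in log_k])
```

On simulated MARL-BC trajectories the same fit would be applied to the recorded K_t series; a high R² then indicates that agents could forecast aggregate capital well from its current value alone.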

3. Modeling Greater Heterogeneity

  • Heterogeneous Capital Returns KS: By introducing different capital productivity rates, Gini coefficients reach 0.33 (mild heterogeneity) and 0.61 (significant heterogeneity)
  • Heterogeneous RBC: In 3×3 grid settings with 9 agents, different productivity rates lead to overlapping but distinct wealth levels
  • Scalability: Successfully scales to hundreds of agents (maximum 529), with SAC maintaining stable high performance across all scales
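
The Gini coefficients quoted above summarize how unequally wealth is distributed across households. A minimal sketch of the standard computation for a finite sample of non-negative wealth values follows; the function name is hypothetical and the example inputs are made up for illustration.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted sample (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal = gini([1.0, 1.0, 1.0, 1.0])         # perfectly equal wealth
concentrated = gini([0.0, 0.0, 0.0, 1.0])  # one household holds everything
```

For a sample of size n the maximum value is (n - 1) / n, so with 20 households the coefficient cannot exceed 0.95.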

Ablation Studies

Comparing different RL algorithms' performance across varying agent numbers:

  • SAC consistently achieves high evaluation rewards across all population sizes
  • PPO performs poorly in small populations but improves as n increases
  • TD3 and DDPG show unstable performance in large n cases

Experimental Findings

  1. Convergence: All considered RL algorithms successfully learn policies that optimize cumulative rewards
  2. Stability: SAC is the most reliable learner, particularly in multi-agent settings
  3. Scalability: Framework scales to hundreds of heterogeneous households, achievable even on ordinary hardware
  4. Emergent Behavior: Behaviors like "hand-to-mouth" consumption strategies emerge endogenously without heuristic encoding

Related Work

RL Applications in Economics

  • Early Contributions: Using deep multi-agent RL to simulate emergent economic behavior in simplified toy economies
  • Finance: Successfully applied to modeling various trading strategies
  • Macroeconomics: Recently begun exploring RL techniques to extend classical GE frameworks

Distinction from Existing Work

  1. Economics Side: Primarily focuses on single-agent RL, showing it can recover policy functions of representative agent GE models
  2. Computer Science Side: Experiments with multi-agent RL, showing methods can produce rich emergent economic behavior, but mostly ignores foundational macroeconomic models
  3. This Work: Bridges two research lines, providing foundations connecting research across both disciplines

Conclusions and Discussion

Main Conclusions

  1. The MARL-BC framework successfully integrates deep MARL with RBC environments
  2. The framework can recover classical textbook RBC results and Krusell-Smith mean-field models
  3. Can model rich agent heterogeneity difficult to achieve with traditional GE methods
  4. Provides steps toward synthesis of ABM and heterogeneous agent GE models

Limitations

  1. Computational Cost: Accurate training of RL agents requires substantial computational resources, with multi-agent training runs requiring hours
  2. Hardware Dependency: GPU acceleration needed to significantly alleviate computational burden
  3. Model Complexity: Requires more complex training and tuning processes compared to traditional methods

Future Directions

  1. GPU Vectorization Implementation: Implement vectorized-style MARL environments to fully leverage GPU acceleration
  2. Specific Economic Problem Studies: Apply framework to study specific economic issues like economic inequality and asymmetric labor productivity changes
  3. AI Tool Impact: Study economic and financial consequences of AI tool proliferation in workplaces

In-Depth Evaluation

Strengths

  1. Methodological Innovation:
    • Among the first to combine deep MARL with a canonical macroeconomic (RBC) model
    • Provides bridge between ABM and GE models
    • Precisely reproduces traditional model results in limiting cases
  2. Experimental Sufficiency:
    • Three-level validation: single-agent RBC, mean-field KS, general heterogeneity
    • Systematic comparison of multiple RL algorithms
    • Scalability testing covering from single digits to hundreds of agents
  3. Convincing Results:
    • Quantitative reproduction of key metrics from classical models
    • Statistical consistency checks against traditional solution methods (e.g., impulse response functions)
    • Demonstrates heterogeneity modeling capabilities difficult for traditional methods
  4. Writing Clarity:
    • Clear framework description and mathematical notation
    • Intuitive figure presentations
    • Detailed hyperparameters and implementation details

Shortcomings

  1. Methodological Limitations:
    • Parameter sharing dependence may limit true independence of agent behavior
    • Independent learner approach may not achieve true equilibrium solutions
  2. Experimental Setup Weaknesses:
    • Relatively limited agent numbers (maximum 529)
    • Lack of direct comparison with other economic modeling methods
    • Computational time analysis primarily CPU-based, GPU performance insufficiently explored
  3. Insufficient Analysis:
    • Lack of theoretical convergence analysis
    • Limited theoretical understanding of learning dynamics
    • Insufficient parameter sensitivity analysis

Impact

  1. Contribution to Field:
    • Provides new methodological framework for macroeconomic modeling
    • Promotes cross-disciplinary research between computer science and economics
    • Opens new directions for complex economic system modeling
  2. Practical Value:
    • Open-source code enhances reproducibility and extensibility
    • Provides new tools for policy analysis
    • Supports more realistic heterogeneity assumptions
  3. Reproducibility:
    • Detailed hyperparameter settings
    • Open-source code and implementation details
    • Standardized experimental protocols

Applicable Scenarios

  1. Macroeconomic Policy Analysis: Particularly scenarios requiring consideration of agent heterogeneity
  2. Economic Inequality Research: Utilizing heterogeneous productivity rates to model wealth distribution
  3. Complex Economic System Modeling: High-dimensional heterogeneity problems difficult for traditional GE methods
  4. Teaching and Research Tools: Provides intuitive modeling framework for economics education

References

This paper cites 60 related references covering important works in macroeconomics, reinforcement learning, multi-agent systems, and other fields, providing solid theoretical foundations for interdisciplinary research.