MADiff: Offline Multi-agent Learning with Diffusion Models
Zhu, Liu, Mao et al.
Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task. Q-learning algorithms struggle with extrapolation errors in offline settings, while supervised learning methods are constrained by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Generating trajectories for each agent with independent DMs may impede coordination, while concatenating all agents' information can lead to low sample efficiency. Accordingly, we propose MADiff, which is realized with an attention-based diffusion model to model the complex coordination among behaviors of multiple agents. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework, functioning as both a decentralized policy and a centralized controller. During decentralized executions, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied in multi-agent trajectory predictions. Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions. Our code is available at https://github.com/zbzhu99/madiff.
Offline reinforcement learning (Offline RL) aims to learn policies from pre-existing datasets without further interaction, which is a challenging task. Q-learning algorithms suffer from extrapolation error in offline settings, while supervised learning methods are limited by model expressiveness. Recently, diffusion models (DMs) have shown promise in overcoming these limitations in single-agent learning, but their application in multi-agent scenarios remains unclear. Using independent DMs for each agent to generate trajectories may hinder coordination, while concatenating all agent information leads to low sample efficiency. Therefore, this paper proposes MADiff, which models complex coordination between multiple agent behaviors through attention-based diffusion models. To our knowledge, MADiff is the first diffusion-based multi-agent learning framework that functions both as a decentralized policy and as a centralized controller. During decentralized execution, MADiff simultaneously performs teammate modeling, and the centralized controller can also be applied to multi-agent trajectory prediction. Experiments demonstrate that MADiff outperforms baseline algorithms on various multi-agent learning tasks, highlighting its effectiveness in modeling complex multi-agent interactions.
Challenges in Offline Multi-agent Reinforcement Learning: Compared to single-agent learning, offline multi-agent learning (MAL) has received less research attention and is more challenging. Since the behaviors of all agents are interdependent, each agent must model inter-agent interactions and coordination while making decisions in a decentralized manner to achieve objectives.
Limitations of Existing Methods:
Q-learning Methods: Suffer from extrapolation error in offline settings, and an inaccurate centralized value function makes these errors significantly worse
Sequence Modeling Methods: Limited by model expressiveness; difficult to handle diverse datasets; suffer from compounding errors in autoregressive generation
Independent Diffusion Models: Using independent DMs for each agent may result in severe inconsistency due to lack of proper credit assignment
Simple Concatenation Methods: Concatenating all agents' information as DM input/output ignores important characteristics of multi-agent systems and leads to low sample efficiency
Research Motivation:
Diffusion models demonstrate superior modeling capabilities in single-agent offline RL
Multi-agent systems require effective coordination mechanisms
A unified framework is needed that supports the centralized training with decentralized execution (CTDE) paradigm
First Diffusion-based Multi-agent Learning Framework: Proposes MADiff, which unifies decentralized policy, centralized controller, teammate modeling, and trajectory prediction in a single framework
Novel Attention-based Diffusion Model Architecture: Specifically designed for multi-agent learning, enabling inter-agent coordination at each denoising step
Superior Experimental Performance: Outperforms baseline algorithms on various offline multi-agent problems, including offline MARL and trajectory prediction tasks
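To illustrate how one joint-trajectory model could act as both a centralized controller and a decentralized policy, here is a minimal sketch. It assumes inpainting-style conditioning, in which known observations are held fixed during denoising; this is an assumption made for illustration and is not confirmed by the summary above. The names `condition_mask` and `apply_condition` are hypothetical. In the decentralized case only the executing agent's own observation is fixed, so the other agents' trajectories are generated by the model, which is one way to read "teammate modeling".

```python
import numpy as np

def condition_mask(n_agents, horizon, mode, agent_id=None):
    """Return a boolean mask of shape (n_agents, horizon).

    True marks observations that are known and held fixed during denoising.
    Assumption: only the current (first) timestep is ever known.
    """
    mask = np.zeros((n_agents, horizon), dtype=bool)
    if mode == "centralized":
        # A centralized controller sees every agent's current observation.
        mask[:, 0] = True
    elif mode == "decentralized":
        if agent_id is None:
            raise ValueError("decentralized mode needs agent_id")
        # A decentralized agent sees only its own current observation;
        # teammates' rows stay free and are filled in by the model.
        mask[agent_id, 0] = True
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mask

def apply_condition(noisy_traj, known_obs, mask):
    """Overwrite the known entries of a noisy joint trajectory.

    noisy_traj, known_obs: arrays of shape (n_agents, horizon, obs_dim).
    """
    return np.where(mask[..., None], known_obs, noisy_traj)

# Usage: agent 1 of 3 conditions on its own first observation only.
mask = condition_mask(n_agents=3, horizon=5, mode="decentralized", agent_id=1)
noisy = np.random.default_rng(0).normal(size=(3, 5, 2))
known = np.zeros((3, 5, 2))
conditioned = apply_condition(noisy, known, mask)
```

In a full sampler, `apply_condition` would be called after every denoising step so that the known entries never drift.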
U-Net Foundation: Adopts U-Net as the base structure for modeling trajectories of all agents, containing repeated one-dimensional convolutional residual blocks
Attention Mechanism:
Applies attention layers before decoder blocks in all agents' U-Nets
Attention operations are performed on the skip-connection features $c_l^i$ from the encoder layers
Uses multi-head attention mechanism to fuse encoded features
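The attention mechanism described above can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: it assumes that each agent's skip-connection feature has shape (horizon, channels), that attention is applied across the agent axis independently at each timestep, and that the result is added back residually. The function name `cross_agent_attention` and the weight matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_agent_attention(skip_feats, w_q, w_k, w_v, w_o, num_heads):
    """Fuse per-agent skip-connection features with multi-head attention.

    skip_feats: (n_agents, horizon, channels), one encoder feature per agent.
    w_q, w_k, w_v, w_o: (channels, channels) projection matrices.
    Returns the fused features (same shape as skip_feats) and the
    attention weights of shape (horizon, num_heads, n_agents, n_agents).
    """
    n_agents, horizon, channels = skip_feats.shape
    assert channels % num_heads == 0, "channels must divide evenly into heads"
    head_dim = channels // num_heads

    def split_heads(x):
        # (N, T, C) -> (T, H, N, D): agents become the attended axis.
        return x.reshape(n_agents, horizon, num_heads, head_dim).transpose(1, 2, 0, 3)

    q = split_heads(skip_feats @ w_q)
    k = split_heads(skip_feats @ w_k)
    v = split_heads(skip_feats @ w_v)

    # Scaled dot-product scores between every pair of agents.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    mixed = weights @ v  # (T, H, N, D)

    # Merge heads back: (T, H, N, D) -> (N, T, C).
    mixed = mixed.transpose(2, 0, 1, 3).reshape(n_agents, horizon, channels)
    # Residual connection (an assumption of this sketch).
    return skip_feats + mixed @ w_o, weights

# Usage with random features for 3 agents.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 8))
w_q, w_k, w_v, w_o = (0.1 * rng.normal(size=(8, 8)) for _ in range(4))
fused, attn = cross_agent_attention(feats, w_q, w_k, w_v, w_o, num_heads=2)
```

Because no positional encoding is applied over the agent axis, reordering the agents simply reorders the output, so the fused features do not depend on an arbitrary agent indexing.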
The paper cites multiple important works, including:
Foundational diffusion model work: Ho et al. (2020), Song and Ermon (2019)
Single-agent diffusion RL: Janner et al. (2022), Ajay et al. (2023)
Multi-agent RL baselines: Rashid et al. (2020), Meng et al. (2021)
Overall Assessment: This is a high-quality research paper that successfully introduces diffusion models to the multi-agent learning domain with significant technical innovation and comprehensive experimental validation. Despite some limitations, it opens new research directions in the field with important academic value and practical prospects.