PIMAEX: Multi-Agent Exploration through Peer Incentivization
Kölle, Tochtermann, Schönberger et al.
While exploration in single-agent reinforcement learning has been studied extensively in recent years, considerably less work has focused on its counterpart in multi-agent reinforcement learning. To address this issue, this work proposes a peer-incentivized reward function inspired by previous research on intrinsic curiosity and influence-based rewards. The PIMAEX reward, short for Peer-Incentivized Multi-Agent Exploration, aims to improve exploration in the multi-agent setting by encouraging agents to exert influence over each other to increase the likelihood of encountering novel states. We evaluate the PIMAEX reward in conjunction with PIMAEX-Communication, a multi-agent training algorithm that employs a communication channel for agents to influence one another. The evaluation is conducted in the Consume/Explore environment, a partially observable environment with deceptive rewards, specifically designed to challenge the exploration vs. exploitation dilemma and the credit-assignment problem. The results empirically demonstrate that agents using the PIMAEX reward with PIMAEX-Communication outperform those that do not.
While exploration in single-agent reinforcement learning has been studied extensively, exploration in multi-agent reinforcement learning remains comparatively understudied. To address this gap, the paper proposes a peer-incentivized reward function inspired by prior research on intrinsic curiosity and influence-based rewards. The PIMAEX reward (short for Peer-Incentivized Multi-Agent Exploration) encourages agents to influence one another so as to raise the likelihood of encountering novel states, thereby improving exploration in multi-agent environments. The study evaluates the PIMAEX reward together with the PIMAEX-Communication algorithm, which gives agents a communication channel through which to influence one another, in the Consume/Explore environment: a partially observable environment with deceptive rewards, designed specifically to challenge the exploration-exploitation dilemma and the credit-assignment problem. Experimental results show that agents using the PIMAEX reward with PIMAEX-Communication outperform agents that do not use it.
Multi-Agent Exploration Challenges: Exploration in multi-agent reinforcement learning is more difficult than in single-agent settings because the joint state space grows exponentially with the number of agents
Coordination Requirements: Since state transition probabilities depend on the joint actions of all agents, individual agents cannot independently explore important portions of the state space
Sparse and Deceptive Rewards: In environments with sparse or deceptive rewards, agents easily become trapped in local optima
Credit Assignment Problem: When rewards arrive only after long sequences of actions, it is hard to determine which earlier actions were responsible for them
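Two of the challenges above, the growth of the joint space and the difficulty of crediting delayed rewards, can be made concrete with simple arithmetic. The sketch below assumes a fully factored joint space in which each of n agents has the same number of local states, so the joint space has `local_states ** n_agents` elements, and it uses a constant per-step discount factor to show how little weight a reward arriving many steps later contributes to the discounted return of an early action. Both are simplifying assumptions chosen for illustration; neither describes the Consume/Explore environment, and the function names are hypothetical.

```python
def joint_space_size(local_states: int, n_agents: int) -> int:
    """Size of a joint space formed as the Cartesian product of identical per-agent spaces."""
    return local_states ** n_agents


def discounted_credit(discount: float, delay: int) -> float:
    """Weight that a reward arriving `delay` steps later carries in the current discounted return."""
    return discount ** delay


# Joint state space for agents that each have 10 local states.
for n in (1, 2, 4, 8):
    print(f"{n} agents -> {joint_space_size(10, n):,} joint states")

# How strongly a delayed reward is credited back to an early action (discount 0.99).
for delay in (10, 100, 500):
    print(f"delay {delay:>3} -> weight {discounted_credit(0.99, delay):.4f}")
```

With ten local states per agent, going from one agent to eight raises the joint space from 10 to 100,000,000 states, and with a discount of 0.99 a reward 500 steps away carries less than one percent of its undiscounted weight.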
Proposes PIMAEX Reward Function: A novel peer incentivization mechanism combining intrinsic curiosity and social influence to promote multi-agent exploration
Constructs Generalized Social Influence Reward Framework: Unifies influence-reward concepts from prior work into a single reward expressed as a weighted combination of α, β, and γ terms
Designs PIMAEX-Communication Algorithm: A multi-agent training algorithm that gives agents a communication channel through which to influence one another and that can be combined with any actor-critic algorithm
Develops Consume/Explore Environment: A partially observable test environment with deceptive rewards, designed specifically to probe the exploration-exploitation dilemma and the credit-assignment problem
Empirical Validation: Shows that, in the Consume/Explore environment, agents using the PIMAEX reward with PIMAEX-Communication outperform agents that do not use it
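The generalized framework builds on influence rewards from prior work. As an illustration of that family, and not of the α, β, and γ terms themselves (which are not spelled out here, so no mapping onto them is assumed), the following is a minimal sketch of the counterfactual social influence reward of Jaques et al. (2018): agent k is rewarded by the KL divergence between each other agent j's action distribution conditioned on k's actual action and j's marginal distribution obtained by averaging over k's counterfactual actions. The sketch assumes small discrete action spaces and that the conditional policies p(a^j | a^k, s) are supplied as arrays; in practice they would come from the agents' policies or a learned model of the other agents. The function names are hypothetical.

```python
from typing import Sequence

import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions; entries where p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def social_influence_reward(taken_action: int,
                            influencer_policy: np.ndarray,
                            conditional_policies: Sequence[np.ndarray]) -> float:
    """Counterfactual influence of agent k on every other agent j, summed over j.

    taken_action:         index of the action agent k actually took; it is assumed to have
                          nonzero probability, so the marginal is > 0 wherever p is > 0.
    influencer_policy:    shape (A_k,), p(a^k | s), used to weight counterfactual actions.
    conditional_policies: one array per other agent j, shape (A_k, A_j);
                          row a holds p(a^j | a^k = a, s).
    """
    reward = 0.0
    for cond in conditional_policies:
        # Marginal policy of agent j: average over agent k's counterfactual actions.
        marginal = influencer_policy @ cond
        # Divergence between j's policy given k's actual action and that marginal.
        reward += kl_divergence(cond[taken_action], marginal)
    return reward


# Agent k picks one of two actions uniformly at random.
pk = np.array([0.5, 0.5])
follower = np.eye(2)            # agent j copies agent k's action
ignorer = np.full((2, 2), 0.5)  # agent j ignores agent k
print(social_influence_reward(0, pk, [follower]))  # log(2), about 0.693
print(social_influence_reward(0, pk, [ignorer]))   # 0.0
```

An agent whose action changes what the others do earns a positive reward, while one that has no effect on them earns zero; a training loop would typically scale this value by a weight and add it to the environment reward.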
The paper builds primarily on the following prior work:
Jaques et al. (2018) - Social influence as intrinsic motivation for multi-agent deep reinforcement learning
Wang et al. (2019) - Influence-based multi-agent exploration
Burda et al. (2018) - Exploration by random network distillation
Pathak et al. (2017) - Curiosity-driven exploration by self-supervised prediction
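The curiosity side of this line of work can be illustrated with Random Network Distillation (Burda et al., 2018): a fixed, randomly initialized target network embeds each state, a predictor network is trained to reproduce that embedding on visited states, and the predictor's error serves as an intrinsic novelty reward, so it stays high for rarely seen states. The sketch below is a deliberate simplification under stated assumptions: states are one-hot indices and both networks are single linear maps updated with plain gradient steps, whereas the original method uses deep networks on observations. The class name `RNDNovelty` and its parameters are hypothetical.

```python
import numpy as np


class RNDNovelty:
    """Random Network Distillation novelty bonus, simplified to linear maps on one-hot states."""

    def __init__(self, n_states: int, feat_dim: int = 8, lr: float = 0.5, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n_states = n_states
        self.lr = lr
        # Fixed, randomly initialized target network; it is never trained.
        self.target = rng.normal(size=(feat_dim, n_states))
        # Predictor network, trained to imitate the target on visited states.
        self.predictor = np.zeros((feat_dim, n_states))

    def _encode(self, state: int) -> np.ndarray:
        x = np.zeros(self.n_states)
        x[state] = 1.0
        return x

    def bonus(self, state: int) -> float:
        """Intrinsic reward: squared prediction error of the predictor against the target."""
        x = self._encode(state)
        err = self.predictor @ x - self.target @ x
        return float(err @ err)

    def update(self, state: int) -> None:
        """One gradient step on 0.5 * ||predictor(x) - target(x)||^2 for a visited state."""
        x = self._encode(state)
        err = self.predictor @ x - self.target @ x
        self.predictor -= self.lr * np.outer(err, x)


rnd = RNDNovelty(n_states=10)
before = rnd.bonus(3)
for _ in range(20):       # visit state 3 repeatedly
    rnd.update(3)
after = rnd.bonus(3)
unvisited = rnd.bonus(7)  # state 7 is never visited
print(f"state 3: {before:.3f} -> {after:.2e}; state 7: {unvisited:.3f}")
```

With `lr=0.5` each visit halves the prediction error for that state, so the bonus for state 3 shrinks geometrically while the never-visited state 7 keeps its original, larger bonus. Pathak et al.'s curiosity module follows the same pattern but uses the prediction error of a learned forward-dynamics model in a learned feature space instead of a random target.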
Overall Assessment: This is an innovative contribution to exploration in multi-agent reinforcement learning. While it has limitations, most notably that the evaluation covers only the Consume/Explore environment, the proposed β term and its empirical validation are valuable contributions to the field. Future work should test whether the method generalizes to more complex environments.