Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola
Liu, Cann, Colbert et al.
While the rapid advancements in the reinforcement learning (RL) research community have been remarkable, their adoption in commercial video games remains slow. In this paper, we outline common challenges the Game AI community faces when using RL-driven NPCs in practice, and highlight the intersection of RL with traditional behavior trees (BTs) as a crucial juncture to be explored further. Although the BT+RL intersection has been suggested in several research papers, its adoption is rare. We demonstrate the viability of this approach using AMD Schola, a plugin for training RL agents in Unreal Engine, by creating multi-task NPCs in a complex 3D environment inspired by the commercial video game "The Last of Us". We provide detailed methodologies for jointly training RL models with BTs while showcasing various skills.
Despite significant advances in reinforcement learning (RL) research, its application in commercial video games remains limited. This paper outlines common challenges the game AI community faces when deploying RL-driven NPCs in practice, and identifies the combination of RL with traditional behavior trees (BTs) as a crucial direction that deserves further exploration. Although BT+RL has been proposed in several research papers, it is rarely applied in practice. The authors use AMD Schola, a plugin for training RL agents in Unreal Engine, to demonstrate the feasibility of this approach by creating multi-task NPCs in a complex 3D environment inspired by the commercial game The Last of Us.
Despite rapid advances in reinforcement learning, adopting RL-driven NPCs in commercial game development remains difficult. Traditional behavior trees are highly structured, but they grow complex and adapt poorly when they must handle multiple tasks. RL methods adapt dynamically, but they suffer from difficult reward shaping, negative transfer between tasks, and high computational cost.
Pure BT Approach: Complex multi-task BT development is tedious, lacks adaptability, and easily produces repetitive gameplay experiences
Pure RL Approach: Difficult to train general capability models, with issues including reward shaping, negative task transfer, and high computational costs
Large Model Approaches: Increasing model parameters or using large foundation models significantly increases training time and game latency
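To make the contrast with the pure approaches concrete, the following is a minimal sketch of the BT+RL idea: a behavior-tree selector picks which skill should run, and each skill is driven by its own small policy. It is an illustration under assumptions, not the authors' implementation or the AMD Schola API. The skill names, the observation keys, and the stub policies standing in for trained RL models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Observation = Dict[str, float]
# A Policy stands in for a trained RL model: observation in, action out.
Policy = Callable[[Observation], str]


@dataclass
class SkillBranch:
    """One BT branch: a guard condition plus the policy that runs the skill."""
    name: str
    condition: Callable[[Observation], bool]
    policy: Policy


class SkillSelector:
    """Priority selector: runs the first branch whose condition holds."""

    def __init__(self, branches: Sequence[SkillBranch]) -> None:
        self.branches = list(branches)

    def tick(self, obs: Observation) -> Tuple[str, str]:
        for branch in self.branches:
            if branch.condition(obs):
                return branch.name, branch.policy(obs)
        raise RuntimeError("no branch matched; add an unconditional fallback")


def make_stub_policy(action: str) -> Policy:
    """Hypothetical placeholder for a trained per-skill RL policy."""
    return lambda obs: action


# Hypothetical skills and thresholds, ordered by priority.
tree = SkillSelector([
    SkillBranch("reload", lambda o: o["ammo"] <= 0, make_stub_policy("move_to_ammo")),
    SkillBranch("combat", lambda o: o["target_visible"] > 0.5, make_stub_policy("aim_and_fire")),
    SkillBranch("search", lambda o: True, make_stub_policy("explore")),
])
```

In this arrangement the tree keeps the designer-controlled structure, while each leaf only has to learn one task, which is one way to sidestep reward shaping across tasks and negative transfer.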
Core Observations: 36 ray casts for detecting targets, obstacles, and ammunition reload locations; floating-point observations including current health, ammunition count, and normalized target direction
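As a rough illustration of how such an observation could be assembled into a flat vector, here is a sketch in plain Python. Only the ray count (36) and the list of observed quantities come from the summary. The per-ray encoding (one-hot hit tag plus normalized distance), the tag names, the scaling of health and ammunition to [0, 1], the 3D direction vector, and the default ray range are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

NUM_RAYS = 36                              # from the summary
RAY_TAGS = ("target", "obstacle", "ammo")  # hypothetical tag names


@dataclass
class RayHit:
    tag: Optional[str]  # None when the ray hits nothing
    distance: float     # hit distance in world units


def encode_ray(hit: RayHit, max_range: float) -> List[float]:
    """Assumed encoding: one-hot hit tag followed by distance in [0, 1]."""
    one_hot = [1.0 if hit.tag == tag else 0.0 for tag in RAY_TAGS]
    if hit.tag is None:
        fraction = 1.0  # treat a miss as "nothing within range"
    else:
        fraction = min(max(hit.distance / max_range, 0.0), 1.0)
    return one_hot + [fraction]


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector stays zero."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0.0:
        return [0.0] * len(vector)
    return [c / norm for c in vector]


def build_observation(hits: Sequence[RayHit], health: float, max_health: float,
                      ammo: float, max_ammo: float, to_target: Sequence[float],
                      max_range: float = 1000.0) -> List[float]:
    """Concatenate ray features, scaled health and ammo, and target direction."""
    if len(hits) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} ray hits, got {len(hits)}")
    obs: List[float] = []
    for hit in hits:
        obs.extend(encode_ray(hit, max_range))
    obs.append(health / max_health)
    obs.append(ammo / max_ammo)
    obs.extend(normalize(to_target))
    return obs
```

With this assumed encoding the vector has 36 × 4 + 2 + 3 = 149 entries; a different ray encoding would change that size.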
Network Architecture:
Base Skills: MLP with depth 2 and width 64
Curriculum Learning: MLP with depth 2, width 128 + attention layer (attention dimension 60, maximum sequence length 20)
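To give a sense of how small these networks are, the sketch below builds an MLP of the stated depth and width in plain Python and counts its parameters. It reads "depth 2" as two hidden layers, which is an assumption, as are the ReLU activation, the initialization scheme, and the input and output sizes used in the usage note. The attention layer of the curriculum-learning model is not modeled here.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_mlp(in_dim: int, out_dim: int, width: int = 64, depth: int = 2,
             seed: int = 0) -> List[Layer]:
    """Build `depth` hidden layers of size `width` plus a linear output layer."""
    rng = random.Random(seed)
    sizes = [in_dim] + [width] * depth + [out_dim]
    layers: List[Layer] = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # assumed uniform initialization
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """Forward pass with ReLU on hidden layers (activation is an assumption)."""
    h = list(x)
    for index, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def count_params(layers: Sequence[Layer]) -> int:
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

For a hypothetical 10-dimensional input and 4 outputs, `count_params(init_mlp(10, 4))` gives 5124 for the width-64 base-skill network and `count_params(init_mlp(10, 4, width=128))` gives 18436 for the width-128 variant, orders of magnitude below the large-model approaches criticized earlier.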
This paper cites 21 references covering game AI, reinforcement learning, behavior trees, and related research areas, which give the work a solid theoretical and technical grounding.
Overall Assessment: This is a highly practical, application-oriented research paper that successfully transforms theoretical methods into usable tools, making important contributions to the game AI field. While there is room for improvement in theoretical depth and evaluation breadth, its open-source nature and complete implementation provide a solid foundation for subsequent research.