Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola

Liu, Cann, Colbert et al.

While the rapid advancements in the reinforcement learning (RL) research community have been remarkable, the adoption in commercial video games remains slow. In this paper, we outline common challenges the Game AI community faces when using RL-driven NPCs in practice, and highlight the intersection of RL with traditional behavior trees (BTs) as a crucial juncture to be explored further. Although the BT+RL intersection has been suggested in several research papers, its adoption is rare. We demonstrate the viability of this approach using AMD Schola -- a plugin for training RL agents in Unreal Engine -- by creating multi-task NPCs in a complex 3D environment inspired by the commercial video game "The Last of Us". We provide detailed methodologies for jointly training RL models with BTs while showcasing various skills.

Basic Information

Paper ID: 2510.14154
Title: Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola
Authors: Tian Liu, Alex Cann, Ian Colbert, Mehdi Saeedi (Advanced Micro Devices)
Classification: cs.AI cs.LG
Publication Date: October 17, 2025 (preprint)
Paper Link: https://arxiv.org/abs/2510.14154

Abstract

Despite significant advances in reinforcement learning (RL) research, its adoption in commercial video games remains limited. This paper outlines common challenges that the game AI community faces when deploying RL-driven NPCs in practice, and highlights the intersection of RL with traditional behavior trees (BTs) as a crucial juncture that deserves further exploration. Although the BT+RL combination has been suggested in several research papers, it is rarely adopted in practice. The authors use AMD Schola, a plugin for training RL agents in Unreal Engine, to demonstrate the viability of this approach by creating multi-task NPCs in a complex 3D environment inspired by the commercial game The Last of Us.

Research Background and Motivation

1. Core Problem

Despite rapid advances in reinforcement learning, the adoption of RL-driven NPCs in commercial game development faces significant challenges. Traditional behavior trees are highly structured, but they become complex and lack adaptability when they must handle multiple tasks. RL methods adapt dynamically, but they suffer from difficult reward shaping, negative transfer between tasks, and high computational resource requirements.

2. Problem Significance

Game Experience: The consistency and human-likeness of NPC behavior are crucial for maintaining game quality and enhancing user experience
Development Efficiency: Game developers prefer to reuse existing assets, which calls for reusable and adjustable models
Technical Barriers: Insufficient tool support, particularly regarding interpretability and controllability

3. Limitations of Existing Approaches

Pure BT Approach: Complex multi-task BT development is tedious, lacks adaptability, and easily produces repetitive gameplay experiences
Pure RL Approach: Difficult to train general capability models, with issues including reward shaping, negative task transfer, and high computational costs
Large Model Approaches: Increasing model parameters or using large foundation models significantly increases training time and game latency

Core Contributions

Proposed a BT+RL Hybrid Architecture: Integrating RL models into behavior trees, combining the advantages of both approaches
Developed a Multi-Skill NPC System: Implementing five core skills including Flee, Search, Combat, Hide, and Move
Constructed a Complete Training Framework: Based on the AMD Schola plugin, providing a complete solution for training and deployment in Unreal Engine
Provided Empirical Validation: Verifying the method's effectiveness in a 3D environment inspired by The Last of Us
Released Open-Source Implementation: Including environment, models, and implementation code to facilitate community research

Methodology Details

Task Definition

The task is to build NPCs that can execute multiple skills in complex 3D environments, specified as follows:

Input: Environmental observations (depth information, health status, ammunition count, target direction, etc.)
Output: Action sequences (movement, shooting, rotation, etc.)
Constraints: Maintaining behavioral consistency and ensuring game balance

Model Architecture

1. Behavior Tree Structure

Root → Healthy? → [Ammo>0 → Collect → InSight → Combat]
                               ↓
                           Search → [Distance<2000 → Flee]
                                           ↓
                                        Hide

2. RL Model Configuration

Core Observations: 36 ray casts for detecting targets, obstacles, and ammunition reload locations; floating-point observations including current health, ammunition count, and normalized target direction
Network Architecture:
- Base Skills: MLP with depth 2 and width 64
- Curriculum Learning: MLP with depth 2, width 128 + attention layer (attention dimension 60, maximum sequence length 20)
Action Space: Lateral movement, forward movement, shooting
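
As a rough illustration of the observation layout described above, the sketch below flattens 36 ray-cast readings and the scalar features into one vector. The function name, the encoding of each ray as a single distance value, and the scaling of health and ammunition by their maxima (100 HP and 10 rounds, taken from the NPC configuration in the experimental setup) are assumptions made for illustration; the source only states that the target direction is normalized.

```python
import math

NUM_RAYS = 36        # number of ray casts listed in the configuration
MAX_HEALTH = 100.0   # assumed scaling constant (NPC starts with 100 HP)
MAX_AMMO = 10.0      # assumed scaling constant (NPC starts with 10 rounds)

def build_observation(ray_distances, health, ammo, target_offset):
    """Flatten ray readings and scalar features into one observation vector."""
    if len(ray_distances) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} ray readings, got {len(ray_distances)}")
    dx, dy = target_offset
    norm = math.hypot(dx, dy)
    # Unit-length direction to the target; zero vector if the offset is zero.
    direction = (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)
    return list(ray_distances) + [health / MAX_HEALTH, ammo / MAX_AMMO, *direction]

obs = build_observation([1.0] * NUM_RAYS, health=50, ammo=5, target_offset=(3.0, 4.0))
```

Under these assumptions the vector has 40 entries: 36 ray readings followed by health, ammunition, and a two-component direction.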

3. Skill-Specific Configuration

Skill	Special Observations	Special Actions	Termination Condition	Training Steps
Flee	Player visibility, distance	Movement	Player distance < 1000	2M
Combat	-	Shooting	Player health ≤ 0	2M
Hide	Player visibility, obstacle distance	Movement	Player discovered	10M
Collect	Nearest ammunition location	Movement	Successful reload	12M
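
To make the training budget in the table concrete, the following sketch records the four listed skills in a small data structure and sums their training steps. The `SkillSpec` class is a hypothetical container introduced here, not part of AMD Schola or the authors' code; the step counts and termination conditions are copied from the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillSpec:
    name: str
    termination: str
    training_steps: int

# Values copied from the skill table above.
SKILLS = [
    SkillSpec("Flee", "Player distance < 1000", 2_000_000),
    SkillSpec("Combat", "Player health <= 0", 2_000_000),
    SkillSpec("Hide", "Player discovered", 10_000_000),
    SkillSpec("Collect", "Successful reload", 12_000_000),
]

total_steps = sum(s.training_steps for s in SKILLS)
# Fraction of the total budget spent on each skill.
share = {s.name: s.training_steps / total_steps for s in SKILLS}
```

The four skills add up to 26M training steps, of which Hide and Collect together account for 22M.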

Technical Innovations

Modular Design: Each skill trained independently, enabling reusability and composition
Hierarchical Control: BT handles high-level decision-making, RL handles specific execution
Interpretability: Developers can understand and adjust NPC behavior logic
Consistency Guarantee: BT structure ensures behavioral predictability
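
The hierarchical control idea can be sketched in a few lines: a behavior tree selects which skill is active, and each leaf delegates the actual action choice to a policy. This is a minimal sketch under stated assumptions, not the authors' implementation. The class names, the state keys, and the three-leaf tree are hypothetical and deliberately simpler than the tree shown earlier, and the policies are stubs that return fixed strings where a trained RL model would run a forward pass.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, float]
Policy = Callable[[State], str]  # stand-in for a trained RL model

class SkillLeaf:
    """Leaf that runs one independently trained skill policy when its guard holds."""
    def __init__(self, name: str, guard: Callable[[State], bool], policy: Policy):
        self.name, self.guard, self.policy = name, guard, policy

    def tick(self, state: State) -> Optional[str]:
        if not self.guard(state):
            return None  # failure: the parent tries its next child
        return self.policy(state)

class Selector:
    """Ticks children in order and returns the first non-failing result."""
    def __init__(self, children: List[SkillLeaf]):
        self.children = children

    def tick(self, state: State) -> Optional[str]:
        for child in self.children:
            action = child.tick(state)
            if action is not None:
                return action
        return None

# Hypothetical tree: collect when out of ammo, fight when the target is visible, else search.
tree = Selector([
    SkillLeaf("Collect", lambda s: s["ammo"] == 0, lambda s: "move_to_ammo"),
    SkillLeaf("Combat", lambda s: s["in_sight"] > 0, lambda s: "shoot"),
    SkillLeaf("Search", lambda s: True, lambda s: "explore"),
])

action = tree.tick({"ammo": 3, "in_sight": 1})
```

Because the guards live in the tree rather than in the policies, a developer can reorder or retune the conditions without retraining any skill, which is the interpretability and reusability argument made above.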

Experimental Setup

Dataset

Environment: Closed square map of 4000×4000 units containing static obstacles and 8 ammunition reload points
NPC Configuration: 100 HP, 10 rounds of ammunition, 10 HP of damage per hit, 0.15-second shooting interval, 600 units/second movement speed
Training Environment: Specialized training scenarios designed for each skill
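
A few quantities follow directly from these settings, and the short calculation below works them out. It assumes that the opponent also has 100 HP, that every shot hits, that the first shot fires at time zero, and that the NPC moves in a straight line along one side of the map; none of these assumptions is stated in the source.

```python
MAP_SIDE = 4000.0      # units
MOVE_SPEED = 600.0     # units per second
MAGAZINE = 10          # rounds
DAMAGE_PER_HIT = 10    # HP
SHOT_INTERVAL = 0.15   # seconds between shots
TARGET_HEALTH = 100    # HP (assumed equal to the NPC's own health)

# Seconds needed to traverse one side of the map.
crossing_time = MAP_SIDE / MOVE_SPEED
# Maximum damage from one magazine if every shot hits.
magazine_damage = MAGAZINE * DAMAGE_PER_HIT
# Ceiling division: shots needed to bring the target to zero health.
shots_to_kill = -(-TARGET_HEALTH // DAMAGE_PER_HIT)
# First shot at t = 0, so n shots take (n - 1) intervals.
min_kill_time = (shots_to_kill - 1) * SHOT_INTERVAL
```

Under these assumptions, crossing the map takes about 6.7 seconds, one magazine deals at most 100 HP of damage, and a full-health target needs all 10 rounds, fired over at least 1.35 seconds. A single miss would therefore require a reload before the target can be eliminated.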

Evaluation Metrics

Win Rate: Percentage of victories against different opponents
Average Steps: Average number of steps per game session, used as a measure of game duration
Damage Output: Damage inflicted when facing aggressive NPCs
FPS Performance: Frame rate during real-time execution
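
The first three metrics are simple aggregates over evaluation episodes. The sketch below shows one way to compute them; the record fields (`won`, `steps`, `damage`) and the two sample episodes are made up for illustration and are not data from the paper.

```python
from statistics import mean

def summarize(episodes):
    """Aggregate win rate, average steps, and average damage over episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "win_rate": mean(1.0 if e["won"] else 0.0 for e in episodes),
        "avg_steps": mean(e["steps"] for e in episodes),
        "avg_damage": mean(e["damage"] for e in episodes),
    }

# Illustrative records only.
episodes = [
    {"won": True, "steps": 1200, "damage": 100},
    {"won": False, "steps": 2000, "damage": 60},
]
summary = summarize(episodes)
```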

Comparison Methods

Pure BT Baseline: Using the same tree structure with predefined BT tasks at leaf nodes
Curriculum Learning RL: End-to-end RL model trained with 5-stage curriculum learning
Static NPC: Non-moving, non-attacking test subject
Aggressive NPC: Simplified BT control with combat advantage (unlimited ammunition)

Implementation Details

Optimization Algorithm: Proximal Policy Optimization (PPO)
Learning Rate: 3e-4
Maximum Steps: 2000 steps per episode
Training Framework: RLlib with AMD Schola plugin
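
For readers unfamiliar with PPO, its core is a clipped surrogate objective that limits how far one update can move the policy. The function below computes that term for a single sample. It is a plain-Python illustration, not RLlib's implementation, and the clipping range of 0.2 is a commonly used default assumed here; the source does not report the value used.

```python
def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is the new-policy probability of the action divided by the
    old-policy probability; advantage is the estimated advantage.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with a positive advantage is capped at (1 + eps) * A.
capped = ppo_clipped_term(1.5, 1.0)
# With a negative advantage, the more pessimistic unclipped value is kept.
uncapped = ppo_clipped_term(1.5, -1.0)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to push the probability ratio outside the clipping range when doing so would increase the objective.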

Experimental Results

Main Results

Combat Performance Comparison

Method	Win Rate vs Static NPC	Win Rate vs Aggressive NPC	Average Steps	Damage Output
BT	1.00	0.59	1839.63	170.48
Hybrid Method	1.00	0.53	3969.22	149.86
Curriculum Learning	1.00	0.41	3836.95	137.80

Performance Analysis

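Win Rate: Against the aggressive NPC, the hybrid method (0.53) clearly outperforms curriculum learning RL (0.41) and is only slightly below pure BT (0.59); all three methods reach 1.00 against the static NPC
Game Duration: The BT method uses the fewest steps on average (1839.63, versus 3969.22 for the hybrid method and 3836.95 for curriculum learning) with a concentrated distribution; the RL methods exhibit greater variability, indicating behavioral diversity
Computational Performance: Ranked by frame rate, Pure BT > Curriculum Learning > Hybrid Method (261.90, 215.80, and 211.90 FPS with one agent)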

FPS Performance Testing

Configuration	1 Agent (FPS)	10 Agents (FPS)
No Model	267.73±3.37	188.83±4.14
BT	261.90±10.88	155.82±4.31
Hybrid Method	211.90±4.11	109.71±1.88
Curriculum Learning	215.80±9.77	116.14±2.54
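
Frame rates are easier to compare as frame times. The calculation below converts the mean FPS values from the table into milliseconds per frame and subtracts the "No Model" baseline to estimate the cost of NPC control. It uses the mean values only and ignores the reported spread, and dividing the 10-agent overhead by 10 assumes that cost grows linearly with the number of agents; both are simplifications introduced here.

```python
# Mean FPS from the table: (1 agent, 10 agents).
FPS = {
    "No Model": (267.73, 188.83),
    "BT": (261.90, 155.82),
    "Hybrid Method": (211.90, 109.71),
    "Curriculum Learning": (215.80, 116.14),
}

def frame_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

baseline_1, baseline_10 = (frame_ms(f) for f in FPS["No Model"])

# Per method: (extra ms per frame with 1 agent, extra ms per frame per agent with 10 agents).
overhead_ms = {
    name: (frame_ms(one) - baseline_1, (frame_ms(ten) - baseline_10) / 10)
    for name, (one, ten) in FPS.items()
    if name != "No Model"
}
```

Under these assumptions the hybrid method adds roughly 0.98 ms per frame with one agent and roughly 0.38 ms per agent with ten agents, compared with about 0.08 ms and 0.11 ms for pure BT.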

Experimental Findings

Behavioral Diversity: RL methods produce more diverse gameplay trajectories, increasing game unpredictability
Performance Trade-offs: The hybrid method trades some frame rate (211.90 versus 261.90 FPS for pure BT with one agent) for better adaptability
Optimization Potential: Further performance improvements possible through techniques such as batch processing
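
One way batch processing could reduce inference cost is to group agents by their currently active skill and call each skill's model once per frame on the whole group. The sketch below illustrates that grouping. It is an assumption about how batching might be applied, not something the source describes in detail; the function name, the agent dictionary layout, and the stub policies are hypothetical.

```python
from collections import defaultdict

def batched_actions(agents, policies):
    """Run one policy call per active skill instead of one call per agent."""
    groups = defaultdict(list)
    for agent_id, (skill, obs) in agents.items():
        groups[skill].append((agent_id, obs))
    actions = {}
    for skill, members in groups.items():
        ids, batch = zip(*members)
        outputs = policies[skill](list(batch))  # one batched call per skill
        actions.update(zip(ids, outputs))
    return actions

calls = []  # records the batch size of every policy call

def make_stub(action):
    """Stub policy that returns a fixed action for each observation in the batch."""
    def policy(batch):
        calls.append(len(batch))
        return [action] * len(batch)
    return policy

policies = {"Flee": make_stub("run"), "Combat": make_stub("shoot")}
agents = {
    "npc0": ("Flee", [0.1]),
    "npc1": ("Combat", [0.2]),
    "npc2": ("Flee", [0.3]),
}
actions = batched_actions(agents, policies)
```

In this example three agents are served by two policy calls instead of three; the benefit would grow as more agents share the same active skill.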

Main Research Directions

RL Applications in Game AI: Behavioral cloning and reinforcement learning in games like Counter-Strike
Multi-Task Reinforcement Learning: Knowledge sharing and contextual representation learning
BT and RL Integration: Applications in safety-critical systems and robotics
Large-Scale Models: Enhancing NPC capabilities through parameter scaling and foundation models

Differentiation of This Work

Practical Orientation: Focusing on actual game developer needs rather than pure research scenarios
Complete Toolchain: Providing end-to-end solutions from training to deployment
Open-Source Implementation: Promoting community adoption and further development

Conclusions and Discussion

Main Conclusions

Feasibility Verification: BT+RL hybrid method demonstrates practical feasibility in game environments
Balanced Advantages: Successfully combining RL's adaptability with BT's interpretability
Modularity Benefits: Independently trained skill modules enhance reusability and development efficiency

Limitations

Performance Overhead: Hybrid method's computational cost exceeds pure BT approach
Complexity: Requires simultaneous maintenance of BT structure and multiple RL models
Optimization Space: Insufficient exploration of performance optimization techniques such as batch processing
Evaluation Scope: Primarily validated in specific game scenarios; generalization requires further verification

Future Directions

Performance Optimization: Implementing model batch processing and other optimization techniques
Architectural Improvements: Exploring more efficient BT+RL integration approaches
Application Extension: Validating method effectiveness across more game types and scenarios
Tool Enhancement: Improving AMD Schola plugin functionality and usability

In-Depth Evaluation

Strengths

High Practical Value: Directly addressing industry needs with usable tools and methods
Methodological Innovation: Effectively combining BT and RL advantages while avoiding individual limitations
Comprehensive Experiments: Multi-faceted evaluation including performance, win rate, and computational efficiency
Open-Source Contribution: Complete open-source release promotes community development and method adoption
Complete Technical Details: Providing detailed implementation details and configuration parameters

Weaknesses

Insufficient Theoretical Analysis: Lacking theoretical analysis and convergence guarantees for BT+RL combination
Limited Evaluation Scenarios: Primarily validated in shooting game contexts; applicability to other game types unknown
Limited Baseline Comparisons: Insufficient comparison with more advanced game AI methods
Long-Term Stability: Lacking evaluation of stability and consistency during extended runtime
User Experience: Absent subjective player evaluation of NPC behavior quality

Impact

Academic Value: Providing practical hybrid method framework for game AI research
Industrial Significance: Offering directly applicable tools and methods for game developers
Technology Promotion: Open-source implementation facilitates widespread adoption and improvement
Cross-Domain Applications: Method potentially applicable to other scenarios requiring intelligent decision-making

Applicable Scenarios

Action Games: Shooting and fighting games requiring complex NPC behavior
Strategy Games: Real-time strategy games requiring intelligent opponents
RPG Games: Role-playing games requiring diverse NPC behavior
Simulation Training: Simulation training systems in military and security domains

References

This paper cites 21 relevant references covering important works in game AI, reinforcement learning, behavior trees, and other research domains, providing solid theoretical foundation and technical support for the research.

Overall Assessment: This is a highly practical, application-oriented research paper that successfully transforms theoretical methods into usable tools, making important contributions to the game AI field. While there is room for improvement in theoretical depth and evaluation breadth, its open-source nature and complete implementation provide a solid foundation for subsequent research.