
Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola

Liu, Cann, Colbert et al.
While the rapid advancements in the reinforcement learning (RL) research community have been remarkable, the adoption in commercial video games remains slow. In this paper, we outline common challenges the Game AI community faces when using RL-driven NPCs in practice, and highlight the intersection of RL with traditional behavior trees (BTs) as a crucial juncture to be explored further. Although the BT+RL intersection has been suggested in several research papers, its adoption is rare. We demonstrate the viability of this approach using AMD Schola -- a plugin for training RL agents in Unreal Engine -- by creating multi-task NPCs in a complex 3D environment inspired by the commercial video game "The Last of Us". We provide detailed methodologies for jointly training RL models with BTs while showcasing various skills.

Basic Information

  • Paper ID: 2510.14154
  • Title: Combining Reinforcement Learning and Behavior Trees for NPCs in Video Games with AMD Schola
  • Authors: Tian Liu, Alex Cann, Ian Colbert, Mehdi Saeedi (Advanced Micro Devices)
  • Classification: cs.AI cs.LG
  • Publication Date: October 17, 2025 (preprint)
  • Paper Link: https://arxiv.org/abs/2510.14154

Abstract

Despite significant advances in reinforcement learning (RL) research, its application in commercial video games remains limited. This paper outlines common challenges faced by the game AI community when implementing RL-driven NPCs in practice, and highlights the intersection of RL with traditional behavior trees (BTs) as a crucial juncture that deserves further exploration. While the BT+RL combination has been suggested in several research papers, it is rarely adopted in practice. The authors use AMD Schola, a plugin for training RL agents in Unreal Engine, to demonstrate the feasibility of this approach by creating multi-task NPCs in a complex 3D environment inspired by the commercial game The Last of Us.

Research Background and Motivation

1. Core Problem

Despite rapid advances in reinforcement learning, RL-driven NPCs remain hard to adopt in commercial game development. Traditional behavior trees are highly structured, but they grow complex and adapt poorly when an NPC must handle multiple tasks. RL methods adapt dynamically, but they suffer from difficult reward shaping, negative transfer between tasks, and high computational cost.

2. Problem Significance

  • Game Experience: The consistency and human-likeness of NPC behavior are crucial for maintaining game quality and enhancing user experience
  • Development Efficiency: Game developers prefer to reuse existing assets, so trained models need to be reusable and adjustable
  • Technical Barriers: Insufficient tool support, particularly regarding interpretability and controllability

3. Limitations of Existing Approaches

  • Pure BT Approach: Complex multi-task BT development is tedious, lacks adaptability, and easily produces repetitive gameplay experiences
  • Pure RL Approach: Difficult to train general capability models, with issues including reward shaping, negative task transfer, and high computational costs
  • Large Model Approaches: Increasing model parameters or using large foundation models significantly increases training time and game latency

Core Contributions

  1. Proposed a BT+RL Hybrid Architecture: Integrating RL models into behavior trees, combining the advantages of both approaches
  2. Developed a Multi-Skill NPC System: Implementing five core skills including Flee, Search, Combat, Hide, and Move
  3. Constructed a Complete Training Framework: Based on the AMD Schola plugin, providing a complete solution for training and deployment in Unreal Engine
  4. Provided Empirical Validation: Verifying the method's effectiveness in a 3D environment inspired by The Last of Us
  5. Released Open-Source Implementation: Including environment, models, and implementation code to facilitate community research

Methodology Details

Task Definition

Building NPCs capable of executing multiple skills in complex 3D environments, specifically including:

  • Input: Environmental observations (depth information, health status, ammunition count, target direction, etc.)
  • Output: Action sequences (movement, shooting, rotation, etc.)
  • Constraints: Maintaining behavioral consistency and ensuring game balance

Model Architecture

1. Behavior Tree Structure

Root → Healthy? → [Ammo>0 → Collect → InSight → Combat]
                               ↓
                           Search → [Distance<2000 → Flee]
                                           ↓
                                        Hide
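
The tree above can be read in more than one way. The sketch below is one possible reading, not the authors' implementation: a healthy NPC collects ammunition when it has none, fights when the player is in sight, and otherwise searches; an unhealthy NPC flees when the player is within 2000 units and otherwise hides. The `NPCState` fields, the node classes, and `HEALTHY_THRESHOLD` are hypothetical names introduced for illustration. Each leaf only names the skill whose RL policy would run on that tick.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

HEALTHY_THRESHOLD = 50.0   # assumption: the summary does not define "Healthy?"
FLEE_DISTANCE = 2000.0     # from the diagram: Distance<2000 -> Flee


@dataclass
class NPCState:
    """Hypothetical snapshot of the values the tree conditions depend on."""
    health: float
    ammo: int
    player_in_sight: bool
    player_distance: float


class Leaf:
    """Leaf node: returns the name of the skill to execute this tick."""
    def __init__(self, skill: str):
        self.skill = skill

    def tick(self, state: NPCState) -> Optional[str]:
        return self.skill


class Guard:
    """Ticks its child only if the condition holds; otherwise fails (None)."""
    def __init__(self, condition: Callable[[NPCState], bool], child):
        self.condition = condition
        self.child = child

    def tick(self, state: NPCState) -> Optional[str]:
        return self.child.tick(state) if self.condition(state) else None


class Selector:
    """Tries children in order and returns the first result that is not None."""
    def __init__(self, children: List):
        self.children = children

    def tick(self, state: NPCState) -> Optional[str]:
        for child in self.children:
            result = child.tick(state)
            if result is not None:
                return result
        return None


healthy_branch = Selector([
    Guard(lambda s: s.ammo <= 0, Leaf("Collect")),
    Guard(lambda s: s.player_in_sight, Leaf("Combat")),
    Leaf("Search"),
])
unhealthy_branch = Selector([
    Guard(lambda s: s.player_distance < FLEE_DISTANCE, Leaf("Flee")),
    Leaf("Hide"),
])
root = Selector([
    Guard(lambda s: s.health >= HEALTHY_THRESHOLD, healthy_branch),
    unhealthy_branch,
])


def select_skill(state: NPCState) -> Optional[str]:
    """Tick the tree once and return the chosen skill name."""
    return root.tick(state)
```

In a hybrid setup, the returned skill name would be used to look up the corresponding trained policy, so the tree stays readable while the low-level control is learned.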

2. RL Model Configuration

  • Core Observations: 36 ray casts for detecting targets, obstacles, and ammunition reload locations; floating-point observations including current health, ammunition count, and normalized target direction
  • Network Architecture:
    • Base Skills: MLP with depth 2 and width 64
    • Curriculum Learning: MLP with depth 2, width 128 + attention layer (attention dimension 60, maximum sequence length 20)
  • Action Space: Lateral movement, forward movement, shooting
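
To make the observation layout concrete, here is a minimal sketch of how such a vector might be assembled and how large a depth-2, width-64 MLP over it would be. Several details are assumptions not stated in the summary: one scalar per ray (a real ray observation may also encode what was hit), a 2D target direction, a 4000-unit ray range, and three output units. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NUM_RAYS = 36  # from the summary: 36 ray casts


def build_observation(
    ray_distances: Sequence[float],
    health: float,
    ammo: int,
    target_dir: Tuple[float, float],
    max_range: float = 4000.0,   # assumption: rays capped at the map side length
    max_health: float = 100.0,
    max_ammo: int = 10,
) -> List[float]:
    """Concatenate normalized ray distances with the scalar observations."""
    if len(ray_distances) != NUM_RAYS:
        raise ValueError(f"expected {NUM_RAYS} rays, got {len(ray_distances)}")
    rays = [min(d, max_range) / max_range for d in ray_distances]
    dx, dy = target_dir
    norm = math.hypot(dx, dy)
    direction = [dx / norm, dy / norm] if norm > 0 else [0.0, 0.0]
    return rays + [health / max_health, ammo / max_ammo] + direction


def mlp_parameter_count(input_dim: int, width: int, depth: int, output_dim: int) -> int:
    """Count weights and biases of an MLP with `depth` hidden layers of size `width`."""
    total = 0
    previous = input_dim
    for _ in range(depth):
        total += previous * width + width
        previous = width
    return total + previous * output_dim + output_dim
```

Under these assumptions the observation has 40 entries, and `mlp_parameter_count(40, 64, 2, 3)` gives 6979 parameters, which illustrates how small the per-skill networks are compared with foundation-model approaches.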

3. Skill-Specific Configuration

Skill   | Special Observations                 | Special Actions | Termination Condition  | Training Steps
Flee    | Player visibility, distance          | Movement        | Player distance < 1000 | 2M
Combat  | -                                    | Shooting        | Player health ≤ 0      | 2M
Hide    | Player visibility, obstacle distance | Movement        | Player discovered      | 10M
Collect | Nearest ammunition location          | Movement        | Successful reload      | 12M
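
The per-skill settings in the table above could be captured as a small configuration structure. This is an illustrative sketch only: the `SkillSpec` class, the state dictionary keys, and the reading of "Player discovered" as a `discovered` flag are assumptions, not part of the paper or of AMD Schola.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SkillSpec:
    """Hypothetical container for one row of the skill table."""
    extra_observations: Tuple[str, ...]
    actions: Tuple[str, ...]
    training_steps: int
    is_done: Callable[[dict], bool]   # termination condition on a state dict


SKILLS: Dict[str, SkillSpec] = {
    "Flee": SkillSpec(
        ("player_visible", "player_distance"), ("move",), 2_000_000,
        lambda s: s["player_distance"] < 1000,
    ),
    "Combat": SkillSpec(
        (), ("shoot",), 2_000_000,
        lambda s: s["player_health"] <= 0,
    ),
    "Hide": SkillSpec(
        ("player_visible", "obstacle_distance"), ("move",), 10_000_000,
        lambda s: s["discovered"],
    ),
    "Collect": SkillSpec(
        ("nearest_ammo_location",), ("move",), 12_000_000,
        lambda s: s["reloaded"],
    ),
}

# Combined training budget across the four listed skills.
TOTAL_TRAINING_STEPS = sum(spec.training_steps for spec in SKILLS.values())
```

Summing the listed budgets gives 26M steps in total, with Hide and Collect accounting for 22M of them.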

Technical Innovations

  1. Modular Design: Each skill trained independently, enabling reusability and composition
  2. Hierarchical Control: BT handles high-level decision-making, RL handles specific execution
  3. Interpretability: Developers can understand and adjust NPC behavior logic
  4. Consistency Guarantee: BT structure ensures behavioral predictability

Experimental Setup

Dataset

  • Environment: Closed square map of 4000×4000 units containing static obstacles and 8 ammunition reload points
  • NPC Configuration: 100 HP, 10 ammunition, 10 HP damage per attack, 0.15-second shooting interval, 600 units/second movement speed
  • Training Environment: Specialized training scenarios designed for each skill
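
A few quantities follow directly from these numbers. The sketch below works them out under two assumptions that the summary does not state: the opponent has the same 100 HP as the NPC, and the first shot is fired at time zero. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NPCConfig:
    """Values taken from the setup description above."""
    max_health: int = 100
    ammo: int = 10
    damage_per_hit: int = 10
    shot_interval_s: float = 0.15
    move_speed: float = 600.0   # units per second
    map_side: float = 4000.0    # units


def hits_to_kill(cfg: NPCConfig) -> int:
    """Hits needed to defeat an opponent with the same health pool (assumption)."""
    return math.ceil(cfg.max_health / cfg.damage_per_hit)


def min_time_to_kill_s(cfg: NPCConfig) -> float:
    """Shortest possible kill time if every shot hits and the first is at t=0."""
    return (hits_to_kill(cfg) - 1) * cfg.shot_interval_s


def time_to_cross_map_s(cfg: NPCConfig) -> float:
    """Time to traverse one side of the map at full speed, ignoring obstacles."""
    return cfg.map_side / cfg.move_speed
```

With the default values, ten hits are needed, which equals the full ammunition supply. A single missed shot therefore forces a trip to a reload point, which is one reason the Collect skill matters for combat outcomes.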

Evaluation Metrics

  • Win Rate: Percentage of victories against different opponents
  • Average Steps: Mean number of environment steps per game session, used as a measure of session duration
  • Damage Output: Damage inflicted when facing aggressive NPCs
  • FPS Performance: Frame rate during real-time execution
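
The first three metrics can be aggregated from per-episode records. The following is a minimal sketch; the `Episode` record and `summarize` function are hypothetical and not taken from the paper's code. FPS is left out because it is measured inside the engine rather than per episode.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical record of one evaluation game."""
    won: bool
    steps: int
    damage_dealt: float


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Aggregate win rate, average steps, and average damage output."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "win_rate": sum(e.won for e in episodes) / len(episodes),
        "avg_steps": mean(e.steps for e in episodes),
        "avg_damage": mean(e.damage_dealt for e in episodes),
    }
```

For example, `summarize([Episode(True, 100, 50.0), Episode(False, 200, 30.0)])` yields a win rate of 0.5, 150 average steps, and 40.0 average damage.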

Comparison Methods

  1. Pure BT Baseline: Using the same tree structure with predefined BT tasks at leaf nodes
  2. Curriculum Learning RL: End-to-end RL model trained with 5-stage curriculum learning
  3. Static NPC: Non-moving, non-attacking test subject
  4. Aggressive NPC: Simplified BT control with combat advantage (unlimited ammunition)

Implementation Details

  • Optimization Algorithm: Proximal Policy Optimization (PPO)
  • Learning Rate: 3e-4
  • Maximum Steps: 2000 steps per episode
  • Training Framework: RLlib with AMD Schola plugin
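
For readers unfamiliar with PPO, the sketch below shows its standard clipped surrogate objective in plain Python. It is a generic illustration of the algorithm, not the RLlib implementation or the authors' code, and the clip range of 0.2 is a common default assumed here because the summary does not report it.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,   # assumption: not reported in the summary
) -> float:
    """Mean clipped surrogate objective (to be maximized) over a batch."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("batch must not be empty")
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)   # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes the incentive to move the ratio
        # outside the clip range in the direction the advantage favors.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; the clipping only takes effect once the policy has moved away from the one that collected the data.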

Experimental Results

Main Results

Combat Performance Comparison

Method              | Win Rate vs Static NPC | Win Rate vs Aggressive NPC | Average Steps | Damage Output
BT                  | 1.00                   | 0.59                       | 1839.63       | 170.48
Hybrid Method       | 1.00                   | 0.53                       | 3969.22       | 149.86
Curriculum Learning | 1.00                   | 0.41                       | 3836.95       | 137.80

Performance Analysis

  • Win Rate: All three methods reach 1.00 against the static NPC. Against the aggressive NPC, the hybrid method (0.53) outperforms curriculum learning RL (0.41) and trails pure BT (0.59)
  • Game Duration: BT method shows fewest steps with concentrated distribution; RL methods exhibit greater variability, indicating behavioral diversity
  • Computational Performance: Pure BT > Curriculum Learning > Hybrid Method

FPS Performance Testing

Configuration       | 1 Agent (FPS) | 10 Agents (FPS)
No Model            | 267.73±3.37   | 188.83±4.14
BT                  | 261.90±10.88  | 155.82±4.31
Hybrid Method       | 211.90±4.11   | 109.71±1.88
Curriculum Learning | 215.80±9.77   | 116.14±2.54
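
Frame rates are easier to compare as per-frame cost. The sketch below converts the mean FPS values from the table above into milliseconds per frame and reports each configuration's overhead relative to the "No Model" baseline. It uses only the reported means and ignores the standard deviations; the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Mean FPS from the table: (1 agent, 10 agents).
MEAN_FPS: Dict[str, Tuple[float, float]] = {
    "No Model": (267.73, 188.83),
    "BT": (261.90, 155.82),
    "Hybrid Method": (211.90, 109.71),
    "Curriculum Learning": (215.80, 116.14),
}


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def overhead_ms(config: str, column: int) -> float:
    """Extra frame time versus 'No Model'; column 0 = 1 agent, 1 = 10 agents."""
    baseline = frame_time_ms(MEAN_FPS["No Model"][column])
    return frame_time_ms(MEAN_FPS[config][column]) - baseline


if __name__ == "__main__":
    for name in MEAN_FPS:
        print(f"{name}: {overhead_ms(name, 0):.2f} ms (1 agent), "
              f"{overhead_ms(name, 1):.2f} ms (10 agents)")
```

Read this way, the hybrid method adds roughly 1 ms per frame with one agent and roughly 3.8 ms with ten agents, so the added cost per agent is lower at ten agents than at one.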

Experimental Findings

  1. Behavioral Diversity: RL methods produce more diverse gameplay trajectories, increasing game unpredictability
  2. Performance Trade-offs: Hybrid method provides better adaptability while maintaining reasonable performance
  3. Optimization Potential: Further performance improvements possible through techniques such as batch processing

Main Research Directions

  1. RL Applications in Game AI: Behavioral cloning and reinforcement learning in games like Counter-Strike
  2. Multi-Task Reinforcement Learning: Knowledge sharing and contextual representation learning
  3. BT and RL Integration: Applications in safety-critical systems and robotics
  4. Large-Scale Models: Enhancing NPC capabilities through parameter scaling and foundation models

Differentiation of This Work

  • Practical Orientation: Focusing on actual game developer needs rather than pure research scenarios
  • Complete Toolchain: Providing end-to-end solutions from training to deployment
  • Open-Source Implementation: Promoting community adoption and further development

Conclusions and Discussion

Main Conclusions

  1. Feasibility Verification: BT+RL hybrid method demonstrates practical feasibility in game environments
  2. Balanced Advantages: Successfully combining RL's adaptability with BT's interpretability
  3. Modularity Benefits: Independently trained skill modules enhance reusability and development efficiency

Limitations

  1. Performance Overhead: Hybrid method's computational cost exceeds pure BT approach
  2. Complexity: Requires simultaneous maintenance of BT structure and multiple RL models
  3. Optimization Space: Insufficient exploration of performance optimization techniques such as batch processing
  4. Evaluation Scope: Primarily validated in specific game scenarios; generalization requires further verification

Future Directions

  1. Performance Optimization: Implementing model batch processing and other optimization techniques
  2. Architectural Improvements: Exploring more efficient BT+RL integration approaches
  3. Application Extension: Validating method effectiveness across more game types and scenarios
  4. Tool Enhancement: Improving AMD Schola plugin functionality and usability

In-Depth Evaluation

Strengths

  1. High Practical Value: Directly addressing industry needs with usable tools and methods
  2. Methodological Innovation: Effectively combining BT and RL advantages while avoiding individual limitations
  3. Comprehensive Experiments: Multi-faceted evaluation including performance, win rate, and computational efficiency
  4. Open-Source Contribution: Complete open-source release promotes community development and method adoption
  5. Complete Technical Details: Providing detailed implementation details and configuration parameters

Weaknesses

  1. Insufficient Theoretical Analysis: Lacking theoretical analysis and convergence guarantees for BT+RL combination
  2. Limited Evaluation Scenarios: Primarily validated in shooting game contexts; applicability to other game types unknown
  3. Limited Baseline Comparisons: Insufficient comparison with more advanced game AI methods
  4. Long-Term Stability: Lacking evaluation of stability and consistency during extended runtime
  5. User Experience: Absent subjective player evaluation of NPC behavior quality

Impact

  1. Academic Value: Providing practical hybrid method framework for game AI research
  2. Industrial Significance: Offering directly applicable tools and methods for game developers
  3. Technology Promotion: Open-source implementation facilitates widespread adoption and improvement
  4. Cross-Domain Applications: Method potentially applicable to other scenarios requiring intelligent decision-making

Applicable Scenarios

  1. Action Games: Shooting and fighting games requiring complex NPC behavior
  2. Strategy Games: Real-time strategy games requiring intelligent opponents
  3. RPG Games: Role-playing games requiring diverse NPC behavior
  4. Simulation Training: Simulation training systems in military and security domains

References

This paper cites 21 relevant references covering important works in game AI, reinforcement learning, behavior trees, and other research domains, providing solid theoretical foundation and technical support for the research.


Overall Assessment: This is a highly practical, application-oriented research paper that successfully transforms theoretical methods into usable tools, making important contributions to the game AI field. While there is room for improvement in theoretical depth and evaluation breadth, its open-source nature and complete implementation provide a solid foundation for subsequent research.