2025-11-12T02:22:29.481811

PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network

Qiu, Ouano, Palafox et al.
While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game: a learning-based, game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete information games where agents' intentions are unknown. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players in the game, and therefore reducing the number of variables in the corresponding optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it 1) relies solely on observations of players' past trajectories, without requiring full state, action, or other game-specific information; and 2) requires no online parameter tuning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that PSNs outperform baseline selection methods in 1) prediction accuracy; and 2) planning safety. PSNs also generalize effectively to real-world scenarios in which agents' objectives are unknown without fine-tuning. By selecting only the most relevant players for decision-making, PSN Game offers a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.
academic

PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network

Basic Information

  • Paper ID: 2505.00213
  • Title: PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network
  • Authors: Tianyu Qiu, Eric Ouano, Fernando Palafox, Christian Ellis, David Fridovich-Keil (University of Texas at Austin)
  • Categories: cs.RO (Robotics), math.OC (Optimization and Control)
  • Publication Date: 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2505.00213

Abstract

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large-scale optimization problems whose variable count increases with the number of agents, resulting in excessive computational time and limiting applicability in large-scale real-time systems. To address this challenge, this paper proposes: 1) PSN Game—a learning-based game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); 2) Goal Inference Network (GIN), enabling PSN to operate in incomplete information games where agent intentions are unknown. PSN outputs player selection masks that distinguish influential players from less relevant ones, allowing the ego agent to solve smaller-scale masked games involving only selected players. By reducing the number of players in the game and consequently the variables in the corresponding optimization problem, PSN directly decreases computational time.

Research Background and Motivation

Problem Definition

The core challenge faced by game-theoretic planning frameworks in multi-agent systems is that computational complexity grows cubically with the number of agents. As shown in Figure 2, using existing solvers, computation time grows as O(N³), where N is the number of players. This renders game-theoretic methods impractical for large-scale real-time systems.

Research Significance

  1. Real-time Requirements: Applications such as autonomous driving and robot navigation require frequent replanning, making computational efficiency critical
  2. Scalability Challenges: Real-world scenarios often involve large numbers of agents (e.g., dense traffic environments)
  3. Human Behavior Inspiration: Research shows that human drivers in dense traffic instinctively prioritize nearby threatening vehicles rather than monitoring all vehicles

Limitations of Existing Methods

Existing player selection methods suffer from the following issues:

  1. Strong Information Dependency: Require game-specific information such as control inputs and cost functions
  2. Complex Parameter Tuning: Necessitate environment-specific parameter adjustments
  3. Rigid Selection Strategies: Ranking methods based on simple heuristics (e.g., distance, gradient) lack adaptability

Core Contributions

  1. Proposed Unsupervised Player Selection Network (PSN): Trained using differentiable dynamic game solvers, supporting backpropagation through selection masks
  2. Developed Supervised Goal Inference Network (GIN): Infers agent objectives from historical trajectories, enabling PSN applicability in scenarios with unknown intentions
  3. Developed Receding Horizon Framework: Leverages PSN to efficiently identify equilibrium strategies by solving reduced-scale games
  4. Experimental Validation: Verified on multi-agent simulations and real human trajectory datasets; PSN Game effectively reduces game scale by 50%-75%, achieving significant acceleration

Methodology Details

Task Definition

Consider a finite-horizon discrete-time open-loop Nash game with N agents, where each agent i has state xkiRnx_k^i \in \mathbb{R}^n and control input ukiRmu_k^i \in \mathbb{R}^m. Agent state transitions follow the dynamics equation: xk+1i=fi(xki,uki)x_{k+1}^i = f^i(x_k^i, u_k^i)

Each agent's objective is to minimize cumulative cost: Ji(x,u;θi)=k=0Tcki(xk,uk;θi)J^i(x,u;\theta^i) = \sum_{k=0}^T c_k^i(x_k, u_k; \theta^i)

Model Architecture

1. Player Selection Network (PSN)

PSN is a neural network tasked with inferring mask MiM^i to balance performance and sparsity. Two variants are provided:

  • PSN-Full: Input is complete historical state of all agents x0:Kx_{0:K}
  • PSN-Partial: Input is partial observations {h(xk)}k=0K\{h(x_k)\}_{k=0}^K (e.g., position information only)

Network Structure:

  • GRU encoder (hidden dimension 64) processes K-step sequences for each agent
  • MLP layers: 256→128→32 (ReLU activation, dropout=0.3)
  • Sigmoid output layer produces continuous masks mji[0,1]m_j^i \in [0,1]

2. Masked Nash Game

Define player selection mask Mi=(mji){0,1}N1M^i = (m_j^i) \in \{0,1\}^{N-1}, where: mji={1,agent j is included in the game0,agent j is excludedm_j^i = \begin{cases} 1, & \text{agent j is included in the game} \\ 0, & \text{agent j is excluded} \end{cases}

The masked game Γi(x~0,f~;θ,Mi)\Gamma^i(\tilde{x}_0, \tilde{f}; \theta, M^i) retains only agent parameters and states most relevant to agent i.

3. Goal Inference Network (GIN)

GIN is a data-driven network that infers agent objectives pgip_g^i from partial trajectory observations:

  • Input: Historical trajectory {h(xk)}k=0K\{h(x_k)\}_{k=0}^K
  • Output: 2D goal position pgip_g^i
  • Loss function: Mean squared error LGoal=1DNdDi[N]pg,refiGϕ(x0:K)L_{Goal} = \frac{1}{|D| \cdot N}\sum_{d \in D}\sum_{i \in [N]} \|p_{g,ref}^i - G_\phi(x_{0:K})\|

Loss Function Design

PSN training employs a composite loss function:

Prediction Task: LPred=LBinary+σ1LSparsity+σ2LSimilarityL_{Pred} = L_{Binary} + \sigma_1 L_{Sparsity} + \sigma_2 L_{Similarity}

Planning Task: LPlan=LBinary+σ3LSparsity+σ4LCostL_{Plan} = L_{Binary} + \sigma_3 L_{Sparsity} + \sigma_4 L_{Cost}

Where:

  • LBinary=1Nj=1Nmji(1mji)L_{Binary} = \frac{1}{N}\sum_{j=1}^N m_j^i(1-m_j^i): Encourages masks to converge to binary values
  • LSparsity=Mi1NL_{Sparsity} = \frac{\|M^i\|_1}{N}: Encourages selection of fewer agents
  • LSimilarityL_{Similarity}: Encourages selection of agent subsets that recover observed trajectories
  • LCostL_{Cost}: Encourages selection of appropriate agents to minimize game cost

Receding Horizon Implementation

The algorithm at each time step kk:

  1. Infers agent objectives through GIN
  2. Obtains adaptive mask MkiM_k^i using PSN
  3. Solves masked game to obtain OLNE policy
  4. Executes first-step control input and updates state

Experimental Setup

Datasets

Simulation Scenarios:

  • 4-agent scenario: 5m×5m environment
  • 10-agent scenario: 7m×7m environment
  • 20-agent scenario: 7m×7m environment (scalability testing)

Real Data:

  • CITR pedestrian dataset: 7.5m×25.5m environment, 10 pedestrians

Game Settings:

  • State space: 4-dimensional (position + velocity)
  • Dynamics: Double integrator model
  • Cost function: Includes goal tracking, velocity penalty, control cost, and collision avoidance terms

Evaluation Metrics

Prediction Task:

  • ADE (Average Displacement Error)
  • FDE (Final Displacement Error)
  • Consistency: Selection consistency

Planning Task:

  • Navigation Cost
  • Collision Cost
  • Control Cost
  • Minimum Distance
  • Trajectory Smoothness
  • Trajectory Length

Comparison Methods

  1. PSN Variants: PSN-Threshold, PSN-Rank
  2. Distance Methods: Distance, kNNs
  3. Gradient Methods: Gradient, Hessian, Cost Evolution
  4. Constraint Methods: BF (Barrier Function), CBF (Control Barrier Function)

Experimental Results

Main Results

Prediction Performance (Table 2)

In 4-agent scenario:

  • PSN-Full ADE: 0.1834m (optimal)
  • PSN-Partial ADE: 0.1876m
  • Best baseline (Cost Evolution) ADE: 0.1861m

In 10-agent scenario:

  • PSN-Partial ADE: 0.2213m (optimal)
  • PSN-Full ADE: 0.2314m
  • Best baseline (kNNs) ADE: 0.2343m

Planning Performance (Table 3)

PSN demonstrates outstanding performance on safety metrics:

  • Collision Cost: Achieves optimality in both 4 and 10-agent scenarios
  • Minimum Distance: Ranks in top three in both scenarios
  • Safety performance degradation compared to complete game is minimal

Incomplete Information Game Adaptability

In scenarios with unknown objectives (Tables 4-5), PSN combined with GIN:

  • Prediction metrics maintain top-two ranking
  • Planning metrics rank in top three
  • Safety metrics remain optimal

Scalability Verification

PSN trained on 10-agent scenario applied to 20-agent scenario:

  • PSN-Full ADE: 0.3108m (optimal)
  • PSN-Partial ADE: 0.3152m (second)
  • Adapts to larger-scale scenarios without retraining

Real Data Generalization

On CITR pedestrian dataset:

  • PSN-Partial ADE: 0.4931m (optimal)
  • PSN-Full ADE: 0.4966m
  • Even outperforms solving the complete game

Game-Theoretic Planning

Existing methods primarily focus on small-scale environments (≤5 agents), using Newton-style solvers for iterative solution. Complexity grows cubically with player count, becoming a fundamental barrier for large-scale real-time systems.

Player Selection Methods

Threshold Methods: Select agents based on predefined distance thresholds, requiring extensive parameter tuning.

Ranking Methods: Select top-k most influential agents, but fixed-count selection may be overly aggressive or conservative.

Advantages of This Work:

  1. Requires only historical trajectory observations, no privileged information
  2. No online parameter tuning needed
  3. Adaptively selects number of agents

Conclusions and Discussion

Main Conclusions

  1. PSN Game successfully reduces game scale by 50%-75%, achieving significant computational acceleration
  2. Outperforms existing baseline methods in prediction accuracy and planning safety
  3. Demonstrates good generalization capability, adapting to larger-scale scenarios and real data

Limitations

  1. Continuous Mask Approximation: Uses continuous masks to support backpropagation; requires thresholding conversion at runtime
  2. Training Data Dependency: Requires training data generated using complete games
  3. Dynamics Model Assumptions: Experiments primarily based on double integrator model

Future Directions

  1. Extend to more complex dynamics models
  2. Investigate other types of game parameter inference
  3. Explore end-to-end training strategies

In-Depth Evaluation

Strengths

  1. Problem Importance: Addresses core computational bottleneck in game-theoretic planning
  2. Methodological Innovation: First to combine deep learning with game theory for player selection
  3. Experimental Comprehensiveness: Covers simulations and real data, validating multiple hypotheses
  4. Practical Value: Provides general mechanism directly integrable into existing frameworks

Weaknesses

  1. Lack of Theoretical Analysis: No theoretical guarantees on convergence or approximation error
  2. Computational Overhead: While reducing game-solving time, adds network inference overhead
  3. Scenario Limitations: Primarily validated on navigation tasks; applicability to other game types unknown

Impact

  1. Academic Contribution: Provides new solution paradigm for large-scale multi-agent systems
  2. Practical Applications: Direct applicability in autonomous driving, robot swarms, and related domains
  3. Reproducibility: Provides detailed implementation details and parameter settings

Applicable Scenarios

  1. Dense Multi-Agent Environments: Such as urban traffic and crowd navigation
  2. High Real-Time Requirement Systems: Dynamic environments requiring frequent replanning
  3. Partially Observable Scenarios: Incomplete information games with unknown agent intentions

References

The paper cites 35 relevant references, primarily covering:

  • Game-theoretic planning methods 8, 28, 9, 15
  • Multi-agent interaction modeling 17, 20, 21, 24
  • Player selection heuristics 2, 27, 30
  • Attention mechanisms and neighbor selection 3, 6, 32

This paper makes significant contributions to addressing computational complexity in game-theoretic planning. Through learning-driven player selection, it substantially improves computational efficiency in multi-agent systems, paving the way for large-scale real-time applications.