2025-11-12T02:22:29.481811

PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network

Qiu, Ouano, Palafox et al.

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game: a learning-based, game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete information games where agents' intentions are unknown. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players in the game, and therefore reducing the number of variables in the corresponding optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it 1) relies solely on observations of players' past trajectories, without requiring full state, action, or other game-specific information; and 2) requires no online parameter tuning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that PSNs outperform baseline selection methods in 1) prediction accuracy; and 2) planning safety. PSNs also generalize effectively to real-world scenarios in which agents' objectives are unknown without fine-tuning. By selecting only the most relevant players for decision-making, PSN Game offers a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.

academic

PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network

Basic Information

Paper ID: 2505.00213
Title: PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network
Authors: Tianyu Qiu, Eric Ouano, Fernando Palafox, Christian Ellis, David Fridovich-Keil (University of Texas at Austin)
Categories: cs.RO (Robotics), math.OC (Optimization and Control)
Publication Date: 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2505.00213

Abstract

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large-scale optimization problems whose variable count increases with the number of agents, resulting in excessive computational time and limiting applicability in large-scale real-time systems. To address this challenge, this paper proposes: 1) PSN Game—a learning-based game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); 2) Goal Inference Network (GIN), enabling PSN to operate in incomplete information games where agent intentions are unknown. PSN outputs player selection masks that distinguish influential players from less relevant ones, allowing the ego agent to solve smaller-scale masked games involving only selected players. By reducing the number of players in the game and consequently the variables in the corresponding optimization problem, PSN directly decreases computational time.

Research Background and Motivation

Problem Definition

The core challenge faced by game-theoretic planning frameworks in multi-agent systems is that computational complexity grows cubically with the number of agents. As shown in Figure 2, using existing solvers, computation time grows as O(N³), where N is the number of players. This renders game-theoretic methods impractical for large-scale real-time systems.

Research Significance

Real-time Requirements: Applications such as autonomous driving and robot navigation require frequent replanning, making computational efficiency critical
Scalability Challenges: Real-world scenarios often involve large numbers of agents (e.g., dense traffic environments)
Human Behavior Inspiration: Research shows that human drivers in dense traffic instinctively prioritize nearby threatening vehicles rather than monitoring all vehicles

Limitations of Existing Methods

Existing player selection methods suffer from the following issues:

Strong Information Dependency: Require game-specific information such as control inputs and cost functions
Complex Parameter Tuning: Necessitate environment-specific parameter adjustments
Rigid Selection Strategies: Ranking methods based on simple heuristics (e.g., distance, gradient) lack adaptability

Core Contributions

Proposed Unsupervised Player Selection Network (PSN): Trained using differentiable dynamic game solvers, supporting backpropagation through selection masks
Developed Supervised Goal Inference Network (GIN): Infers agent objectives from historical trajectories, enabling PSN applicability in scenarios with unknown intentions
Developed Receding Horizon Framework: Leverages PSN to efficiently identify equilibrium strategies by solving reduced-scale games
Experimental Validation: Verified on multi-agent simulations and real human trajectory datasets; PSN Game effectively reduces game scale by 50%-75%, achieving significant acceleration

Methodology Details

Task Definition

Consider a finite-horizon discrete-time open-loop Nash game with N agents, where each agent i has state $x_k^i \in \mathbb{R}^n$ and control input $u_k^i \in \mathbb{R}^m$ . Agent state transitions follow the dynamics equation: $x_{k+1}^i = f^i(x_k^i, u_k^i)$

Each agent's objective is to minimize cumulative cost: $J^i(x,u;\theta^i) = \sum_{k=0}^T c_k^i(x_k, u_k; \theta^i)$

Model Architecture

1. Player Selection Network (PSN)

PSN is a neural network tasked with inferring mask $M^i$ to balance performance and sparsity. Two variants are provided:

PSN-Full: Input is complete historical state of all agents $x_{0:K}$
PSN-Partial: Input is partial observations $\{h(x_k)\}_{k=0}^K$ (e.g., position information only)

Network Structure:

GRU encoder (hidden dimension 64) processes K-step sequences for each agent
MLP layers: 256→128→32 (ReLU activation, dropout=0.3)
Sigmoid output layer produces continuous masks $m_j^i \in [0,1]$

2. Masked Nash Game

Define player selection mask $M^i = (m_j^i) \in \{0,1\}^{N-1}$ , where: $m_j^i = \begin{cases} 1, & \text{agent j is included in the game} \\ 0, & \text{agent j is excluded} \end{cases}$

The masked game $\Gamma^i(\tilde{x}_0, \tilde{f}; \theta, M^i)$ retains only agent parameters and states most relevant to agent i.

3. Goal Inference Network (GIN)

GIN is a data-driven network that infers agent objectives $p_g^i$ from partial trajectory observations:

Input: Historical trajectory $\{h(x_k)\}_{k=0}^K$
Output: 2D goal position $p_g^i$
Loss function: Mean squared error $L_{Goal} = \frac{1}{|D| \cdot N}\sum_{d \in D}\sum_{i \in [N]} \|p_{g,ref}^i - G_\phi(x_{0:K})\|$

Loss Function Design

PSN training employs a composite loss function:

Prediction Task: $L_{Pred} = L_{Binary} + \sigma_1 L_{Sparsity} + \sigma_2 L_{Similarity}$

Planning Task: $L_{Plan} = L_{Binary} + \sigma_3 L_{Sparsity} + \sigma_4 L_{Cost}$

Where:

$L_{Binary} = \frac{1}{N}\sum_{j=1}^N m_j^i(1-m_j^i)$ : Encourages masks to converge to binary values
$L_{Sparsity} = \frac{\|M^i\|_1}{N}$ : Encourages selection of fewer agents
$L_{Similarity}$ : Encourages selection of agent subsets that recover observed trajectories
$L_{Cost}$ : Encourages selection of appropriate agents to minimize game cost

Receding Horizon Implementation

The algorithm at each time step $k$ :

Infers agent objectives through GIN
Obtains adaptive mask $M_k^i$ using PSN
Solves masked game to obtain OLNE policy
Executes first-step control input and updates state

Experimental Setup

Datasets

Simulation Scenarios:

4-agent scenario: 5m×5m environment
10-agent scenario: 7m×7m environment
20-agent scenario: 7m×7m environment (scalability testing)

Real Data:

CITR pedestrian dataset: 7.5m×25.5m environment, 10 pedestrians

Game Settings:

State space: 4-dimensional (position + velocity)
Dynamics: Double integrator model
Cost function: Includes goal tracking, velocity penalty, control cost, and collision avoidance terms

Evaluation Metrics

Prediction Task:

ADE (Average Displacement Error)
FDE (Final Displacement Error)
Consistency: Selection consistency

Planning Task:

Navigation Cost
Collision Cost
Control Cost
Minimum Distance
Trajectory Smoothness
Trajectory Length

Comparison Methods

PSN Variants: PSN-Threshold, PSN-Rank
Distance Methods: Distance, kNNs
Gradient Methods: Gradient, Hessian, Cost Evolution
Constraint Methods: BF (Barrier Function), CBF (Control Barrier Function)

Experimental Results

Main Results

Prediction Performance (Table 2)

In 4-agent scenario:

PSN-Full ADE: 0.1834m (optimal)
PSN-Partial ADE: 0.1876m
Best baseline (Cost Evolution) ADE: 0.1861m

In 10-agent scenario:

PSN-Partial ADE: 0.2213m (optimal)
PSN-Full ADE: 0.2314m
Best baseline (kNNs) ADE: 0.2343m

Planning Performance (Table 3)

PSN demonstrates outstanding performance on safety metrics:

Collision Cost: Achieves optimality in both 4 and 10-agent scenarios
Minimum Distance: Ranks in top three in both scenarios
Safety performance degradation compared to complete game is minimal

Incomplete Information Game Adaptability

In scenarios with unknown objectives (Tables 4-5), PSN combined with GIN:

Prediction metrics maintain top-two ranking
Planning metrics rank in top three
Safety metrics remain optimal

Scalability Verification

PSN trained on 10-agent scenario applied to 20-agent scenario:

PSN-Full ADE: 0.3108m (optimal)
PSN-Partial ADE: 0.3152m (second)
Adapts to larger-scale scenarios without retraining

Real Data Generalization

On CITR pedestrian dataset:

PSN-Partial ADE: 0.4931m (optimal)
PSN-Full ADE: 0.4966m
Even outperforms solving the complete game

Game-Theoretic Planning

Existing methods primarily focus on small-scale environments (≤5 agents), using Newton-style solvers for iterative solution. Complexity grows cubically with player count, becoming a fundamental barrier for large-scale real-time systems.

Player Selection Methods

Threshold Methods: Select agents based on predefined distance thresholds, requiring extensive parameter tuning.

Ranking Methods: Select top-k most influential agents, but fixed-count selection may be overly aggressive or conservative.

Advantages of This Work:

Requires only historical trajectory observations, no privileged information
No online parameter tuning needed
Adaptively selects number of agents

Conclusions and Discussion

Main Conclusions

PSN Game successfully reduces game scale by 50%-75%, achieving significant computational acceleration
Outperforms existing baseline methods in prediction accuracy and planning safety
Demonstrates good generalization capability, adapting to larger-scale scenarios and real data

Limitations

Continuous Mask Approximation: Uses continuous masks to support backpropagation; requires thresholding conversion at runtime
Training Data Dependency: Requires training data generated using complete games
Dynamics Model Assumptions: Experiments primarily based on double integrator model

Future Directions

Extend to more complex dynamics models
Investigate other types of game parameter inference
Explore end-to-end training strategies

In-Depth Evaluation

Strengths

Problem Importance: Addresses core computational bottleneck in game-theoretic planning
Methodological Innovation: First to combine deep learning with game theory for player selection
Experimental Comprehensiveness: Covers simulations and real data, validating multiple hypotheses
Practical Value: Provides general mechanism directly integrable into existing frameworks

Weaknesses

Lack of Theoretical Analysis: No theoretical guarantees on convergence or approximation error
Computational Overhead: While reducing game-solving time, adds network inference overhead
Scenario Limitations: Primarily validated on navigation tasks; applicability to other game types unknown

Impact

Academic Contribution: Provides new solution paradigm for large-scale multi-agent systems
Practical Applications: Direct applicability in autonomous driving, robot swarms, and related domains
Reproducibility: Provides detailed implementation details and parameter settings

Applicable Scenarios

Dense Multi-Agent Environments: Such as urban traffic and crowd navigation
High Real-Time Requirement Systems: Dynamic environments requiring frequent replanning
Partially Observable Scenarios: Incomplete information games with unknown agent intentions

References

The paper cites 35 relevant references, primarily covering:

Game-theoretic planning methods 8, 28, 9, 15
Multi-agent interaction modeling 17, 20, 21, 24
Player selection heuristics 2, 27, 30
Attention mechanisms and neighbor selection 3, 6, 32

This paper makes significant contributions to addressing computational complexity in game-theoretic planning. Through learning-driven player selection, it substantially improves computational efficiency in multi-agent systems, paving the way for large-scale real-time applications.