Leading the Follower: Learning Persuasive Agents in Social Deduction Games

Zheng, Ye, Zhao et al.
Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However, existing approaches primarily focus on information processing and strategy selection, overlooking the significance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on making correct deductions but on convincing others to respond in alignment with one's intent. To address this limitation, we formalize turn-based dialogue in SDGs as a Stackelberg competition, where the current player acts as the leader who strategically influences the follower's response. Building on this theoretical foundation, we propose a reinforcement learning framework that trains agents to optimize utterances for persuasive impact. Through comprehensive experiments across three diverse SDGs, we demonstrate that our agents significantly outperform baselines. This work represents a significant step toward developing AI agents capable of strategic social influence, with implications extending to scenarios requiring persuasive communication.

Basic Information

  • Paper ID: 2510.09087
  • Title: Leading the Follower: Learning Persuasive Agents in Social Deduction Games
  • Authors: Zheng Zhang, Deheng Ye, Peilin Zhao, Hao Wang
  • Classification: cs.AI
  • Conference: ICLR 2026
  • Paper Link: https://arxiv.org/abs/2510.09087

Abstract

Large Language Model (LLM) agents have demonstrated significant progress in social deduction games (SDGs). However, existing methods primarily focus on information processing and strategy selection, overlooking the importance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on correct reasoning but also on persuading others to act according to one's intentions. To address this limitation, the authors formalize turn-based dialogue in SDGs as Stackelberg competition, where the current player acts as a leader strategically influencing the follower's response. Based on this theoretical foundation, the authors propose a reinforcement learning framework that trains agents to optimize the persuasive impact of utterances. Comprehensive experiments on three different SDGs demonstrate that the proposed method significantly outperforms baseline approaches.

Research Background and Motivation

Problem Definition

Existing LLM agents in social deduction games face the following issues:

  1. Neglecting Persuasive Communication: Existing methods primarily focus on information processing and strategy selection, lacking consideration of persuasiveness
  2. Lack of Influence Modeling: No systematic modeling of how to influence other players' behavior through language
  3. Insufficient Local Optimization: Lack of strategic optimization for each utterance in turn-based dialogue

Research Significance

Social deduction games serve as ideal testing platforms for studying AI social intelligence because they:

  • Involve uncertainty, deception, and strategic communication
  • Require players to persuade others in order to achieve their victory conditions
  • Reflect the complexity of real-world interpersonal interactions

Limitations of Existing Methods

  1. Strategy Selection Orientation: Existing methods such as ReAct and ReCon primarily focus on selecting strategies from predefined action spaces
  2. Lack of Persuasiveness Optimization: No specialized optimization for the persuasive effect of utterances
  3. Neglect of Dialogue Dynamics: Failure to fully exploit strategic opportunities in turn-based dialogue

Core Contributions

  1. Theoretical Innovation: Formalizes turn-based dialogue in SDGs as a Stackelberg competition model, providing a systematic theoretical foundation for persuasive communication
  2. Methodological Framework: Proposes a reinforcement learning framework that directly optimizes the influence of utterances on subsequent players' responses
  3. Experimental Validation: Validates the effectiveness and generalizability of the method on three different SDGs (Werewolf, Avalon, ONUW)
  4. Technical Contribution: Develops a complete training pipeline combining the advantages of API-based LLMs and open-source LLMs

Methodology Details

Task Definition

In social deduction games, players must influence other players' behavior through turn-based dialogue to achieve their respective victory conditions. This paper models each round of dialogue as a Stackelberg competition:

  • Input: Game rules R, current game state G_t, dialogue history D_t, player role r_t
  • Output: Optimized persuasive utterance u_t
  • Objective: Maximize favorable influence on the next player's response
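
For reference, the leader-follower structure behind this framing can be written in the generic textbook Stackelberg form below. This is a sketch rather than the paper's exact notation: the utility functions U_L and U_F and the best-response map BR_F are symbols assumed here for illustration.

```latex
\begin{aligned}
\mathrm{BR}_F(u_t) &= \arg\max_{u_{t+1}} \; U_F\bigl(u_t, u_{t+1} \mid R, G_t, D_t\bigr), \\
u_t^{*} &= \arg\max_{u_t} \; U_L\bigl(u_t, \mathrm{BR}_F(u_t) \mid R, G_t, D_t, r_t\bigr).
\end{aligned}
```

The leader commits to u_t first while anticipating the follower's reaction; the follower then responds to the utterance it has observed.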

Model Architecture

1. Intent Identification

(û⁺_{t+1}, û⁻_{t+1}) = f_identify(R, G_t, D_t, r_t)

The system analyzes the current situation and identifies the most desired and least desired responses from the next player.

2. Impact Measurement

Uses a dual-stage architecture:

  • Backend LLM (API-based): Generates base utterances
  • Refiner (open-source LLM): Optimizes utterance persuasiveness
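
The two-stage generation step might be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TextModel` alias, the prompt wording, the stub models, and the choice of drawing n rewrites from a single base utterance are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical abstraction: a "model" is any function from a prompt to text.
TextModel = Callable[[str], str]


def generate_candidates(
    backend: TextModel, refiner: TextModel, context: str, n: int = 8
) -> List[str]:
    """Draft one base utterance with the backend, then collect n refined rewrites."""
    base = backend(context)
    # Assumed prompt format; the real prompt is not given in this summary.
    prompt = f"{context}\nBase utterance: {base}\nRewrite to be more persuasive:"
    return [refiner(prompt) for _ in range(n)]


# Stub models standing in for an API-based LLM and an open-source LLM.
def stub_backend(prompt: str) -> str:
    return "I think Player 3 is suspicious."


def stub_refiner(prompt: str) -> str:
    return "Player 3 contradicted themselves twice; let's vote them out."


candidates = generate_candidates(stub_backend, stub_refiner, "Day 1 discussion", n=8)
```

Only the refiner would be trained in such a setup; the backend stays a black box behind its API.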

Reward function design:

R(u_t^{(i)}) = log P_F(û⁺_{t+1}|context) - log P_F(û⁻_{t+1}|context)
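
A minimal sketch of how this reward could be computed from per-token log-probabilities. It assumes P_F is the probability the follower model (the Measurer) assigns to a response, and that a sequence log-probability is the plain sum of its token log-probabilities; whether the paper applies length normalization is not stated here. The function name is hypothetical.

```python
from typing import Sequence


def persuasion_reward(
    desired_token_logprobs: Sequence[float],
    undesired_token_logprobs: Sequence[float],
) -> float:
    """Return log P_F(desired | context) - log P_F(undesired | context).

    Each argument holds the follower model's log-probability for every token
    of the corresponding response, conditioned on the context that includes
    the candidate utterance. Summing them gives the sequence log-probability.
    """
    logp_desired = sum(desired_token_logprobs)
    logp_undesired = sum(undesired_token_logprobs)
    return logp_desired - logp_undesired


# Toy numbers: the desired response is far more likely than the undesired one.
example_reward = persuasion_reward([-0.1, -0.2], [-1.0, -2.0])
```

A positive value means the utterance makes the desired response more likely than the undesired one; a negative value means the opposite.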

3. Strategy Optimization

Uses GRPO (Group Relative Policy Optimization) to optimize the Refiner:

A^{(i)} = (R(u_t^{(i)}) - μ_n) / σ_n

where μ_n and σ_n are the mean and standard deviation of the rewards across the group of n candidate utterances sampled for the same context.
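
The normalization above can be sketched in a few lines. The small `eps` term and the use of the population standard deviation are assumptions added for numerical safety and definiteness; the summary does not specify either.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of sampled candidates."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; assumed, not confirmed by the source
    # eps avoids division by zero when every candidate gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Candidates that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged, with no learned value function involved.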

Technical Innovations

  1. Stackelberg Modeling: First to model turn-based dialogue as a leader-follower game, capturing the essence of persuasion
  2. Dual-Stage Optimization: Combines the generative capability of API LLMs with the trainability of open-source LLMs
  3. Direct Utterance Optimization: Optimizes directly in natural language space rather than discrete action selection
  4. Relative Advantage Calculation: Uses GRPO to avoid the need for explicit value functions

Experimental Setup

Datasets

  • Game Types: Werewolf (7-player), Avalon (5-player), ONUW (5-player)
  • Training Data: 500 self-play games for each game type, from which 4,000 dialogue-round instances are randomly sampled
  • Data Diversity: Uses three backend LLMs: GPT-4o, Gemini-2.5-Flash, Claude-3.5-Haiku

Evaluation Metrics

  • Win Rate: Victory percentage for different roles and factions
  • Overall Performance: Average win rate across all roles

Comparison Methods

  • Werewolf: ReAct, ReCon, SLA, LSPO
  • Avalon: ReAct, ReCon, LASI, Strategist
  • ONUW: ReAct, Belief, LLM-ins., RL-ins.

Implementation Details

  • Model: Llama-3-8B-Instruct as Refiner and Measurer
  • Training: LoRA adapter (rank=16), learning rate 1×10⁻⁶, 3 epochs
  • Hardware: 4 A800 GPUs, approximately 50 hours training time
  • Hyperparameters: n=8, ε=0.2, β=0.04
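
The settings above could be collected as in the sketch below. The symbols ε and β are not defined in this summary; the example assumes they follow the usual GRPO notation (clipping range and KL-penalty coefficient), and that n is the group size. The class and function names are hypothetical, and the KL penalty itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRPOConfig:
    group_size: int = 8          # n (assumed: candidates sampled per context)
    clip_range: float = 0.2      # ε (assumed: PPO-style clipping range)
    kl_coef: float = 0.04        # β (assumed: KL-penalty coefficient)
    lora_rank: int = 16
    learning_rate: float = 1e-6
    epochs: int = 3


def clipped_term(ratio: float, advantage: float, cfg: GRPOConfig) -> float:
    """PPO-style clipped surrogate for one sample (KL penalty not included)."""
    low, high = 1.0 - cfg.clip_range, 1.0 + cfg.clip_range
    clipped_ratio = min(max(ratio, low), high)
    # Taking the minimum keeps the update pessimistic about large policy shifts.
    return min(ratio * advantage, clipped_ratio * advantage)


cfg = GRPOConfig()
```

With ε = 0.2, a probability ratio of 1.5 on a positive-advantage sample is credited only as 1.2, which limits how far one update can move the Refiner.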

Experimental Results

Main Results

Game      Method             Villager Team Win Rate   Werewolf Team Win Rate   Overall Win Rate
Werewolf  LSPO               25.3%                    73.2%                    39.0%
Werewolf  Ours + LSPO        28.3%                    83.6%                    44.1%
Avalon    Strategist         77.9%                    27.3%                    57.7%
Avalon    Ours + Strategist  77.9%                    34.6%                    60.6%
ONUW      RL-ins.            54.5%                    47.6%                    48.9%
ONUW      Ours + RL-ins.     54.5%                    50.0%                    50.8%
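
To make the size of the improvements explicit, the reported win rates can be turned into percentage-point gains. The short script below only restates the numbers from the results above; the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float]  # (villager team, werewolf team, overall), in %

results: Dict[str, Dict[str, Row]] = {
    "Werewolf": {"baseline": (25.3, 73.2, 39.0), "ours": (28.3, 83.6, 44.1)},
    "Avalon": {"baseline": (77.9, 27.3, 57.7), "ours": (77.9, 34.6, 60.6)},
    "ONUW": {"baseline": (54.5, 47.6, 48.9), "ours": (54.5, 50.0, 50.8)},
}


def point_gains(baseline: Row, ours: Row) -> Row:
    """Percentage-point difference per column, rounded to one decimal."""
    return tuple(round(o - b, 1) for b, o in zip(baseline, ours))


gains = {game: point_gains(r["baseline"], r["ours"]) for game, r in results.items()}
# gains["Werewolf"] -> (3.0, 10.4, 5.1)
# gains["Avalon"]   -> (0.0, 7.3, 2.9)
# gains["ONUW"]     -> (0.0, 2.4, 1.9)
```

In these rows the largest gains fall in the second team column, while the first column is unchanged for Avalon and ONUW.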

Ablation Studies

Ablation studies were conducted on different variants of the reward function:

  1. Positive-Only: Only maximizes expected response probability
  2. Negative-Only: Only minimizes undesired response probability
  3. Complete: Considers both positive and negative feedback

Results show that the complete method significantly outperforms single-objective variants, demonstrating the necessity of bidirectional optimization.
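
The three variants can be expressed as one function over the follower's sequence log-probabilities. This is an illustrative sketch; the variant names and the function signature are assumptions, not the paper's code.

```python
def ablation_reward(
    logp_desired: float, logp_undesired: float, variant: str = "complete"
) -> float:
    """Reward under each ablation variant, given follower log-probabilities."""
    if variant == "positive_only":
        # Only push the desired response up.
        return logp_desired
    if variant == "negative_only":
        # Only push the undesired response down.
        return -logp_undesired
    if variant == "complete":
        # Both directions at once: the log-probability gap.
        return logp_desired - logp_undesired
    raise ValueError(f"unknown variant: {variant}")


complete = ablation_reward(-0.5, -2.0)
positive = ablation_reward(-0.5, -2.0, "positive_only")
negative = ablation_reward(-0.5, -2.0, "negative_only")
```

The complete reward is the sum of the two single-objective rewards, so an utterance is credited both for raising the desired response and for suppressing the undesired one.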

Generalization Verification

Testing on GPT-5 and Qwen3-14B without additional training yields consistent performance improvements, demonstrating the cross-model generalization capability of the method.

Case Studies

The paper provides three detailed case studies:

  • Werewolf Case: Seer role successfully identifies Werewolf through clever reasoning and ally mobilization
  • Avalon Case: Minion gains team support through logical reconstruction and social pressure
  • ONUW Case: Werewolf successfully misleads villagers through false reasoning and attention redirection

Related Work

SDG Agent Research

Early work was primarily based on rule systems, with recent shifts toward LLM-driven approaches:

  • Prompt Engineering Methods: Xu et al. (2023) on information retrieval and experience reflection
  • Reinforcement Learning Methods: SLA, LSPO, etc., selecting predefined actions through RL
  • Code Generation Methods: Strategist through code generation and tree search

LLM Reinforcement Learning

  • PPO/DPO: Optimizing LLMs through human feedback
  • GRPO: Relative optimization without explicit preference data

Game-Theoretic Modeling

  • Traditional Methods: Perfect Bayesian Equilibrium solving
  • Modern Applications: Success of DeepRole, Cicero in specific games

Conclusions and Discussion

Main Conclusions

  1. Persuasive communication is a key factor in SDG success
  2. Stackelberg modeling provides an effective framework for optimizing persuasiveness
  3. Direct utterance optimization is more effective than action selection
  4. The method demonstrates good cross-game and cross-model generalization

Limitations

  1. Computational Overhead: Requires multiple forward passes to compute probabilities
  2. Dependency: Still requires powerful backend LLM support
  3. Evaluation Limitations: Using a frozen Measurer may differ from actual opponents
  4. Game Scope: Currently validated only on three types of SDGs

Future Directions

  1. Extension to more types of social games
  2. Investigation of long-term persuasion strategies rather than single-round optimization
  3. Exploration of multimodal persuasion (voice, visual, etc.)
  4. Development of more efficient training methods

In-Depth Evaluation

Strengths

  1. Theoretical Innovation: Stackelberg modeling provides a new theoretical perspective for persuasive AI
  2. Advanced Technology: Cleverly combines the generative capability of API LLMs with the trainability of open-source LLMs
  3. Comprehensive Experiments: Full verification across multiple games, metrics, and ablations
  4. Practical Value: Can serve as a general plugin to enhance existing methods

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks theoretical guarantees on convergence of Stackelberg modeling
  2. Evaluation Bias: Using the same model (Llama-3-8B-Instruct) as both Refiner and Measurer may introduce bias
  3. Computational Efficiency: High computational cost for training and inference
  4. Long-term Effects: Does not consider cumulative persuasion effects across multiple dialogue rounds

Impact

  1. Academic Contribution: Opens new directions for AI social intelligence research
  2. Practical Applications: Applicable to negotiation, education, customer service, and other persuasion-requiring scenarios
  3. Methodological Inspiration: Provides new modeling approaches for other multi-agent interaction tasks

Applicable Scenarios

  • Social games and online entertainment
  • Intelligent customer service and sales assistants
  • Educational tutoring and behavioral intervention
  • Negotiation and mediation systems
  • Social media content generation

References

This paper cites important works from multiple domains including social deduction games, reinforcement learning, and game theory, particularly:

  • Xu et al. (2024): SLA method
  • Light et al. (2025): Strategist method
  • Shao et al. (2024): GRPO algorithm
  • Bakhtin et al. (2022): Cicero system

Overall Assessment: This is a high-quality paper with significant contributions to the field of AI social intelligence. Through innovative theoretical modeling and effective technical implementation, it provides new research directions and practical methods for developing persuasive AI agents.