
Leading the Follower: Learning Persuasive Agents in Social Deduction Games

Zhang, Ye, Zhao et al.

Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However, existing approaches primarily focus on information processing and strategy selection, overlooking the significance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on making correct deductions but also on convincing others to respond in alignment with one's intent. To address this limitation, we formalize turn-based dialogue in SDGs as a Stackelberg competition, where the current player acts as the leader who strategically influences the follower's response. Building on this theoretical foundation, we propose a reinforcement learning framework that trains agents to optimize utterances for persuasive impact. Through comprehensive experiments across three diverse SDGs, we demonstrate that our agents significantly outperform baselines. This work represents a significant step toward developing AI agents capable of strategic social influence, with implications extending to scenarios requiring persuasive communication.


Basic Information

Paper ID: 2510.09087
Title: Leading the Follower: Learning Persuasive Agents in Social Deduction Games
Authors: Zheng Zhang, Deheng Ye, Peilin Zhao, Hao Wang
Classification: cs.AI
Conference: ICLR 2026
Paper Link: https://arxiv.org/abs/2510.09087

Abstract

Large Language Model (LLM) agents have demonstrated significant progress in social deduction games (SDGs). However, existing methods primarily focus on information processing and strategy selection, overlooking the importance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on correct reasoning but also on persuading others to act according to one's intentions. To address this limitation, the authors formalize turn-based dialogue in SDGs as a Stackelberg competition, where the current player acts as a leader who strategically influences the follower's response. Based on this theoretical foundation, the authors propose a reinforcement learning framework that trains agents to optimize the persuasive impact of utterances. Comprehensive experiments on three different SDGs demonstrate that the proposed method significantly outperforms baseline approaches.

Research Background and Motivation

Problem Definition

Existing LLM agents in social deduction games face the following issues:

Neglecting Persuasive Communication: Existing methods primarily focus on information processing and strategy selection, lacking consideration of persuasiveness
Lack of Influence Modeling: No systematic modeling of how to influence other players' behavior through language
Insufficient Local Optimization: Lack of strategic optimization for each utterance in turn-based dialogue

Research Significance

Social deduction games serve as ideal testing platforms for studying AI social intelligence because they:

Involve uncertainty, deception, and strategic communication
Require players to persuade others in order to meet their victory conditions
Reflect the complexity of real-world interpersonal interactions

Limitations of Existing Methods

Strategy Selection Orientation: Existing methods such as ReAct and ReCon primarily focus on selecting strategies from predefined action spaces
Lack of Persuasiveness Optimization: No specialized optimization for the persuasive effect of utterances
Neglect of Dialogue Dynamics: Failure to fully exploit strategic opportunities in turn-based dialogue

Core Contributions

Theoretical Innovation: Formalizes turn-based dialogue in SDGs as a Stackelberg competition model, providing a systematic theoretical foundation for persuasive communication
Methodological Framework: Proposes a reinforcement learning framework that directly optimizes the influence of utterances on subsequent players' responses
Experimental Validation: Validates the effectiveness and generalizability of the method on three different SDGs (Werewolf, Avalon, ONUW)
Technical Contribution: Develops a complete training pipeline combining the advantages of API-based LLMs and open-source LLMs

Methodology Details

Task Definition

In social deduction games, players must influence other players' behavior through turn-based dialogue to achieve their respective victory conditions. This paper models each round of dialogue as a Stackelberg competition:

Input: Game rules R, current game state G_t, dialogue history D_t, player role r_t
Output: Optimized persuasive utterance u_t
Objective: Maximize favorable influence on the next player's response
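The leader-follower structure above can be illustrated with a toy sketch: the leader scores each candidate utterance by the response it expects from the follower, then commits to the best one. Everything here is hypothetical and chosen only for illustration (the `toy_follower` model, the response labels, the utility values); the paper's agents work with LLM-generated utterances rather than a fixed candidate list.

```python
from typing import Callable, Dict, List

# Hypothetical follower model: maps a leader utterance to a probability
# distribution over the follower's possible responses.
FollowerModel = Callable[[str], Dict[str, float]]


def leader_best_utterance(
    candidates: List[str],
    follower: FollowerModel,
    leader_utility: Dict[str, float],
) -> str:
    """Return the candidate whose anticipated follower response is best for the leader."""

    def expected_utility(utterance: str) -> float:
        response_probs = follower(utterance)
        return sum(
            prob * leader_utility.get(response, 0.0)
            for response, prob in response_probs.items()
        )

    return max(candidates, key=expected_utility)


def toy_follower(utterance: str) -> Dict[str, float]:
    """Invented follower: more likely to side with the leader when given evidence."""
    if "evidence" in utterance:
        return {"vote_player_3": 0.7, "vote_leader": 0.3}
    return {"vote_player_3": 0.4, "vote_leader": 0.6}


candidates = [
    "I think Player 3 is suspicious.",
    "Here is my evidence: Player 3 contradicted their own claim.",
]
# Assumed utilities: the leader wants Player 3 voted out, not itself.
utility = {"vote_player_3": 1.0, "vote_leader": -1.0}

best = leader_best_utterance(candidates, toy_follower, utility)
print(best)
```

The point of the sketch is the ordering of decisions: the leader commits first, and its choice is evaluated entirely through the follower's anticipated reaction.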

Model Architecture

1. Intent Identification

(û⁺_{t+1}, û⁻_{t+1}) = f_identify(R, G_t, D_t, r_t)

The system analyzes the current situation and identifies the most desired and least desired responses from the next player.

2. Impact Measurement

Uses a dual-stage architecture:

Backend LLM (API-based): Generates base utterances
Refiner (open-source LLM): Optimizes utterance persuasiveness

Reward function design:

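R(u_t^{(i)}) = log P_F(û⁺_{t+1}|context) - log P_F(û⁻_{t+1}|context)

Here P_F is the follower's response probability as estimated by the Measurer, and the context includes the candidate utterance u_t^{(i)}. The reward is therefore high when the candidate makes the desired response more likely than the undesired one.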
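As a rough sketch of how this reward could be computed from a language model's token log-probabilities: the code below assumes that the log-probability of a response is the sum of its per-token log-probabilities (the standard autoregressive factorization). Whether the paper applies length normalization or any other adjustment is not stated in this summary, and the function names are illustrative.

```python
import math
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of a full response, assuming it is the sum of token log-probs."""
    return sum(token_logprobs)


def persuasion_reward(
    desired_token_logprobs: Sequence[float],
    undesired_token_logprobs: Sequence[float],
) -> float:
    """R = log P_F(desired | context) - log P_F(undesired | context)."""
    return sequence_logprob(desired_token_logprobs) - sequence_logprob(
        undesired_token_logprobs
    )


# Made-up token log-probs for the two target responses under one candidate utterance.
reward = persuasion_reward([-0.2, -0.3], [-1.0, -1.5])
odds_ratio = math.exp(reward)  # how many times more likely the desired response is
print(reward, odds_ratio)
```

Because the reward is a difference of log-probabilities, exponentiating it gives the ratio P_F(û⁺)/P_F(û⁻), which makes the value easy to interpret.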

3. Strategy Optimization

Uses GRPO (Group Relative Policy Optimization) to optimize the Refiner:

A^{(i)} = (R(u_t^{(i)}) - μ_n) / σ_n

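where μ_n and σ_n are the mean and standard deviation of the rewards of the n candidate utterances sampled for the same context (the group), so each candidate is scored relative to its siblings.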
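A minimal sketch of this group-relative normalization is shown below. Two details are assumptions rather than facts from the paper: the use of the population standard deviation, and the small `eps` term added to avoid division by zero when all rewards in a group are equal.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards of four hypothetical candidate utterances for the same dialogue context.
advantages = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
print(advantages)
```

Candidates that beat the group average get a positive advantage and the rest a negative one, which is why no separately learned value function is needed as a baseline.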

Technical Innovations

Stackelberg Modeling: First to model turn-based dialogue as a leader-follower game, capturing the essence of persuasion
Dual-Stage Optimization: Combines the generative capability of API LLMs with the trainability of open-source LLMs
Direct Utterance Optimization: Optimizes directly in natural language space rather than discrete action selection
Relative Advantage Calculation: Uses GRPO to avoid the need for explicit value functions

Experimental Setup

Datasets

Game Types: Werewolf (7-player), Avalon (5-player), ONUW (5-player)
Training Data: 500 self-play games for each SDG, from which 4000 round instances are randomly selected
Data Diversity: Uses three backend LLMs: GPT-4o, Gemini-2.5-Flash, Claude-3.5-Haiku

Evaluation Metrics

Win Rate: Victory percentage for different roles and factions
Overall Performance: Average win rate across all roles

Comparison Methods

Werewolf: ReAct, ReCon, SLA, LSPO
Avalon: ReAct, ReCon, LASI, Strategist
ONUW: ReAct, Belief, LLM-ins., RL-ins.

Implementation Details

Model: Llama-3-8B-Instruct as Refiner and Measurer
Training: LoRA adapter (rank=16), learning rate 1×10⁻⁶, 3 epochs
Hardware: 4 A800 GPUs, approximately 50 hours training time
Hyperparameters: n=8, ε=0.2, β=0.04
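To make the roles of these hyperparameters concrete, here is a sketch of the per-sample objective in the standard GRPO formulation, where ε is read as the clipping range of the policy ratio and β as the weight of the KL penalty toward the reference model. That reading follows the usual GRPO convention and is an assumption here, as are the function and parameter names.

```python
import math


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped term: limits how far one update can move the policy."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def grpo_objective_term(
    ratio: float,
    advantage: float,
    kl_to_reference: float,
    clip_eps: float = 0.2,
    kl_beta: float = 0.04,
) -> float:
    """Per-sample objective to maximize: clipped surrogate minus the KL penalty."""
    return clipped_surrogate(ratio, advantage, clip_eps) - kl_beta * kl_to_reference


# A ratio of 1.5 with a positive advantage is clipped to 1 + eps = 1.2.
print(clipped_surrogate(1.5, 1.0))
print(grpo_objective_term(1.0, 2.0, 0.5))
```

With n=8, eight candidate utterances would be sampled per context, and each would contribute one such term using its group-relative advantage.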

Experimental Results

Main Results

Game	Method	Villager Team Win Rate	Werewolf Team Win Rate	Overall Win Rate
Werewolf	LSPO	25.3%	73.2%	39.0%
	Ours + LSPO	28.3%	83.6%	44.1%
Avalon	Strategist	77.9%	27.3%	57.7%
	Ours + Strategist	77.9%	34.6%	60.6%
ONUW	RL-ins.	54.5%	47.6%	48.9%
	Ours + RL-ins.	54.5%	50.0%	50.8%
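The overall win rates in the Werewolf and Avalon rows are consistent with a player-count-weighted average of the two team win rates. The check below assumes a 5-versus-2 team split for 7-player Werewolf and a 3-versus-2 split for 5-player Avalon; these splits are not stated in this summary and are only the common configurations for those player counts. The ONUW rows are left out because the summary gives no basis for guessing a split.

```python
def overall_win_rate(rate_a: float, rate_b: float, count_a: int, count_b: int) -> float:
    """Average of two team win rates, weighted by the number of players on each team."""
    return (count_a * rate_a + count_b * rate_b) / (count_a + count_b)


# (first-column rate, second-column rate, assumed team sizes, reported overall)
rows = {
    "Werewolf / LSPO": (25.3, 73.2, 5, 2, 39.0),
    "Werewolf / Ours + LSPO": (28.3, 83.6, 5, 2, 44.1),
    "Avalon / Strategist": (77.9, 27.3, 3, 2, 57.7),
    "Avalon / Ours + Strategist": (77.9, 34.6, 3, 2, 60.6),
}

recomputed = {
    name: round(overall_win_rate(a, b, n_a, n_b), 1)
    for name, (a, b, n_a, n_b, _reported) in rows.items()
}
for name, value in recomputed.items():
    print(f"{name}: recomputed {value}, reported {rows[name][4]}")
```

Under these assumed splits, the recomputed values match the reported overall win rates to one decimal place.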

Ablation Studies

Ablation studies were conducted on different variants of the reward function:

Positive-Only: Only maximizes expected response probability
Negative-Only: Only minimizes undesired response probability
Complete: Considers both positive and negative feedback

Results show that the complete method significantly outperforms single-objective variants, demonstrating the necessity of bidirectional optimization.
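The three reward variants can be sketched as follows. The exact form of the single-objective variants is an assumption based on their one-line descriptions above (log-probability of the desired response only, or negated log-probability of the undesired response only); the variant names are illustrative.

```python
def ablation_reward(logp_desired: float, logp_undesired: float, variant: str = "complete") -> float:
    """Reward under each ablation variant, given follower log-probabilities."""
    if variant == "positive_only":
        return logp_desired
    if variant == "negative_only":
        return -logp_undesired
    if variant == "complete":
        return logp_desired - logp_undesired
    raise ValueError(f"unknown variant: {variant}")


# Two hypothetical utterances: equally good at promoting the desired response,
# but the second also suppresses the undesired one.
utterance_a = {"logp_desired": -0.5, "logp_undesired": -1.0}
utterance_b = {"logp_desired": -0.5, "logp_undesired": -3.0}

for variant in ("positive_only", "negative_only", "complete"):
    r_a = ablation_reward(variant=variant, **utterance_a)
    r_b = ablation_reward(variant=variant, **utterance_b)
    print(variant, r_a, r_b)
```

In this toy case the positive-only reward cannot tell the two utterances apart, while the complete reward prefers the second, which hints at why combining both terms might give a more informative training signal.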

Generalization Verification

Testing on GPT-5 and Qwen3-14B without additional training yields consistent performance improvements, demonstrating the cross-model generalization capability of the method.

Case Studies

The paper provides three detailed case studies:

Werewolf Case: Seer role successfully identifies Werewolf through clever reasoning and ally mobilization
Avalon Case: Minion gains team support through logical reconstruction and social pressure
ONUW Case: Werewolf successfully misleads villagers through false reasoning and attention redirection

Related Work

SDG Agent Research

Early work was primarily based on rule systems, with recent shifts toward LLM-driven approaches:

Prompt Engineering Methods: Xu et al. (2023) on information retrieval and experience reflection
Reinforcement Learning Methods: SLA, LSPO, etc., selecting predefined actions through RL
Code Generation Methods: Strategist through code generation and tree search

LLM Reinforcement Learning

PPO/DPO: Optimizing LLMs through human feedback
GRPO: Relative optimization without explicit preference data

Game-Theoretic Modeling

Traditional Methods: Perfect Bayesian Equilibrium solving
Modern Applications: Success of DeepRole, Cicero in specific games

Conclusions and Discussion

Main Conclusions

Persuasive communication is a key factor in SDG success
Stackelberg modeling provides an effective framework for optimizing persuasiveness
Direct utterance optimization is more effective than action selection
The method demonstrates good cross-game and cross-model generalization

Limitations

Computational Overhead: Requires multiple forward passes to compute probabilities
Dependency: Still requires powerful backend LLM support
Evaluation Limitations: Using a frozen Measurer may differ from actual opponents
Game Scope: Currently validated only on three types of SDGs

Future Directions

Extension to more types of social games
Investigation of long-term persuasion strategies rather than single-round optimization
Exploration of multimodal persuasion (voice, visual, etc.)
Development of more efficient training methods

In-Depth Evaluation

Strengths

Theoretical Innovation: Stackelberg modeling provides a new theoretical perspective for persuasive AI
Advanced Technology: Cleverly combines the generative capability of API LLMs with the trainability of open-source LLMs
Comprehensive Experiments: Full verification across multiple games, metrics, and ablations
Practical Value: Can serve as a general plugin to enhance existing methods

Weaknesses

Insufficient Theoretical Analysis: Lacks theoretical guarantees on convergence of Stackelberg modeling
Evaluation Bias: Using the same model (Llama-3-8B-Instruct) as both Refiner and Measurer may introduce bias
Computational Efficiency: High computational cost for training and inference
Long-term Effects: Does not consider cumulative persuasion effects across multiple dialogue rounds

Impact

Academic Contribution: Opens new directions for AI social intelligence research
Practical Applications: Applicable to negotiation, education, customer service, and other persuasion-requiring scenarios
Methodological Inspiration: Provides new modeling approaches for other multi-agent interaction tasks

Applicable Scenarios

Social games and online entertainment
Intelligent customer service and sales assistants
Educational tutoring and behavioral intervention
Negotiation and mediation systems
Social media content generation

References

This paper cites important works from multiple domains including social deduction games, reinforcement learning, and game theory, particularly:

Xu et al. (2024): SLA method
Light et al. (2025): Strategist method
Shao et al. (2024): GRPO algorithm
Bakhtin et al. (2022): Cicero system

Overall Assessment: This is a high-quality paper with significant contributions to the field of AI social intelligence. Through innovative theoretical modeling and effective technical implementation, it provides new research directions and practical methods for developing persuasive AI agents.