SIGN: Schema-Induced Games for Naming

Zhang, Woisetschläger

Real-world AI systems are tackling increasingly complex problems, often through interactions among large language model (LLM) agents. When these agents develop inconsistent conventions, coordination can break down. Applications such as collaborative coding and distributed planning therefore require reliable, consistent communication, and scalability is a central concern as systems grow. We introduce Schema-Induced Games for Naming (SIGN), a naming game that examines how lightweight structure can steer convention formation. We compare schema-induced communication to unconstrained natural language and find faster convergence with up to 5.8x higher agreement. These results suggest that minimal structure can act as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond the naming game.

Basic Information

Paper ID: 2510.21855
Title: SIGN: Schema-Induced Games for Naming
Authors: Ryan Zhang (Horace Greeley High School), Herbert Woisetschläger (Technical University of Munich)
Classification: cs.AI, cs.CL, cs.LG, cs.MA
Publication Date: October 22, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.21855

Abstract

Real-world AI systems are tackling increasingly complex problems, often through interactions between large language model (LLM) agents. When these agents form inconsistent conventions, coordination can break down. Applications such as collaborative coding and distributed planning require reliable, consistent communication, and scalability is a core concern as systems grow. This paper introduces Schema-Induced Games for Naming (SIGN), a naming game that investigates how lightweight structure guides convention formation. The study compares schema-induced communication with unconstrained natural language and finds that the former converges faster and reaches up to 5.8× higher agreement. These results suggest that minimal structure can serve as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond naming games.

Research Background and Motivation

1. Core Problem to Address

As LLM-based multi-agent systems mature, agents need shared naming conventions to coordinate effectively. Inconsistent conventions formed during interaction cause coordination failures in practical applications such as collaborative coding and distributed planning. This paper investigates whether lightweight structural constraints can guide convention formation, improving both agreement among agents and convergence speed.

2. Problem Importance

Practical Application Needs: Multi-agent systems in real-world applications (e.g., collaborative coding, distributed planning) require reliable communication protocols
Scalability Challenges: As system scale grows, maintaining consistency becomes increasingly difficult
Efficiency Requirements: Reducing the interaction cost (token consumption) needed to reach consensus is critical for practical deployment

3. Limitations of Existing Approaches

Natural Language Communication: While flexible, it lacks structure, leading to slow and unstable convention formation
Purely Free Convention Emergence: Convention formation relying on pure interaction is inefficient, requiring extensive interaction to reach consensus
Lack of Control Mechanisms: Existing research lacks simple and effective control means to guide convention formation

4. Research Motivation

Inspired by two lines of work:

Naming game research shows that conventions can emerge from interaction (Ashery et al. 2025)
Structured formats (e.g., JSON schema) improve LLM reasoning and collaboration in supervised tasks (Chen et al. 2024)

This paper poses a key question: Can lightweight schema priors guide convention formation itself?

Core Contributions

Proposes SIGN Framework: First to introduce schema-induced mechanisms into naming games, investigating how structured constraints affect convention formation in LLM agents
Empirically Validates Structured Communication Advantages:
- Roughly an order of magnitude fewer tokens needed to reach 50% agreement
- Population agreement improved up to 5.8× (from 0.111 to 0.639)
Provides Controllable Coordination Mechanism: Demonstrates that schema constraints can serve as a model-agnostic "control knob" to simply and effectively improve multi-agent coordination
Cross-Model Validation: Verifies method effectiveness and robustness on Phi-3 and LLaMA models and their hybrid populations
Empirical Insights: Shows how minimal structural priors shape convention emergence, offering guidance for multi-agent system design

Method Details

Task Definition

The naming game is defined on the following setting:

Population: N agents
Vocabulary: Fixed vocabulary L = {C₁, ..., Cₘ}
Time Steps: t = 1, ..., T
Interaction Mechanism: Each round randomly pairs two agents
Objective: Through interaction, converge the population to a common naming convention

Input: Agent i generates message m^t_i at step t

Output: Decoder maps message to a name y^t_i ∈ L in the vocabulary

Constraint: Each agent maintains a memory window of size K, storing the most recent K interactions with partners

Three Experimental Conditions

1. Natural Language (NL)

Agents generate unconstrained natural language output
Decoder extracts a valid vocabulary token from the free text on a best-effort basis
No memory mechanism (K=0)

2. Natural Language Sliding Window (NL-SW)

Extends NL condition with a memory window of size K
Recent interactions influence future proposals
Still uses natural language communication

3. Schema (Core Innovation)

Enforced Format: Requires replies matching @say {name: Ck} format
Parsing Mechanism: Uses regular expressions to extract Ck token
Error Handling:
- Non-compliant outputs get one retry opportunity (with reminder)
- If still invalid, fall back to decoding the free text
- If completely undecodable, set y ← None
Design Philosophy: Provides explicit, easily parseable vocabulary entry handles, maintaining transparency to listeners with minimal overhead
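
The parse, retry, and fallback chain described above can be sketched as follows. This is a minimal illustration, not the authors' code: the regular expressions, the C1..C12 token naming, the function names, and the choice to run the free-text fallback on the retry reply are all assumptions.

```python
import re
from typing import Callable, Optional

# Assumed vocabulary naming: C1..C12 (M = 12)
VOCAB = [f"C{k}" for k in range(1, 13)]

# Strict pattern for a schema reply such as "@say {name: C7}" (assumed regex)
SCHEMA_RE = re.compile(r"@say\s*\{\s*name\s*:\s*(C\d+)\s*\}")
# Looser pattern for the free-text fallback (assumed regex)
FREE_RE = re.compile(r"\bC\d+\b")


def parse_schema(text: str) -> Optional[str]:
    """Return the name if the reply matches the schema and is in the vocabulary."""
    m = SCHEMA_RE.search(text)
    if m and m.group(1) in VOCAB:
        return m.group(1)
    return None


def decode_free_text(text: str) -> Optional[str]:
    """Return the first vocabulary token found anywhere in the text, else None."""
    for tok in FREE_RE.findall(text):
        if tok in VOCAB:
            return tok
    return None


def decode_with_retry(generate: Callable[[bool], str]) -> Optional[str]:
    """generate(reminder) returns a raw reply; reminder=True on the single retry."""
    name = parse_schema(generate(False))
    if name is not None:
        return name
    reply = generate(True)  # one retry with a format reminder
    name = parse_schema(reply)
    if name is not None:
        return name
    return decode_free_text(reply)  # None means y <- None
```

For example, `parse_schema("@say {name: C7}")` yields `"C7"`, while a reply such as `"I pick C7"` only decodes through the free-text fallback.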

Algorithm Flow (Algorithm 1)

Input: N (number of agents), L (vocabulary), K (memory size), T (steps), α (adoption probability)

for t = 1 to T:
    1. Uniformly randomly pair agents i, j
    2. Each agent forms proposal m^t based on partner-specific K memories
    3. Parse @say {name: Ck} → y
    4. if non-compliant:
           retry once with reminder
           if still invalid:
               decode free text
               if undecodable:
                   y ← None
    5. if y_i ≠ y_j:
           adopt partner's Ck with probability α (lose-shift mechanism)
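
A toy version of this loop, with the LLM call replaced by a simple stand-in, might look like the sketch below. Several details are assumptions made for illustration: `propose` echoes the most frequent name among the agent's own name and its partner-specific memory, only agent i applies the lose-shift rule, and parsing is omitted because the stand-in always returns a valid name.

```python
import random
from collections import Counter, deque


def run_naming_game(n=12, m=12, k=5, t_steps=300, alpha=0.5, seed=0):
    """Toy naming game; returns each agent's final name."""
    rng = random.Random(seed)
    vocab = [f"C{c}" for c in range(1, m + 1)]
    names = [rng.choice(vocab) for _ in range(n)]  # current name per agent
    # memory[i][j]: last k names agent i heard from partner j
    memory = [[deque(maxlen=k) for _ in range(n)] for _ in range(n)]

    def propose(i, j):
        # Stand-in for the LLM: most frequent of own name plus memory of j
        # (ties resolve to the agent's own name, which is counted first)
        counts = Counter([names[i]] + list(memory[i][j]))
        return counts.most_common(1)[0][0]

    for _ in range(t_steps):
        i, j = rng.sample(range(n), 2)           # step 1: uniform random pair
        y_i, y_j = propose(i, j), propose(j, i)  # steps 2-4 (parsing omitted)
        memory[i][j].append(y_j)
        memory[j][i].append(y_i)
        names[i], names[j] = y_i, y_j
        if y_i != y_j and rng.random() < alpha:  # step 5: lose-shift
            names[i] = y_j                       # assumed: only i adopts
    return names
```

In a real run, `propose` is where the model call and the schema parsing would go; setting `k=0` gives empty memories and corresponds to the memory-free setting.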

Technical Innovations

1. Lightweight Schema Design

Minimal Constraints: Only requires specific format labels, does not restrict content choice
Transparency: Clear format, easy to parse and debug
Flexibility: Retains sufficient freedom for convention emergence

2. Error Handling Mechanism

Single retry avoids over-penalization
Graceful degradation ensures experiment continuity
Balances structural constraints with practicality

3. Partner-Specific Memory

Only records history with interaction partners
Simulates local information in real social networks
Reduces memory complexity

4. Probabilistic Adoption Mechanism

Lose-shift strategy: adopt partner's choice with probability α when mismatch occurs
Parameter α controls learning speed
Models sociological learning dynamics

Experimental Setup

Dataset

Vocabulary: Fixed 12 entries (M=12)
No External Dataset: Pure simulation experiments, data generated through agent interactions

Experimental Parameters

Parameter	Value
Population Size (N)	12, 24
Vocabulary Size (M)	12
Total Steps (T)	300 (100 for mixed experiments)
Memory Window (K)	0, 5, 10
Adoption Probability (α)	0.5, 0.75, 0.9/0.99
Random Seeds	3

Model Configuration

Main Experimental Models:

Phi-3 Mini 4K Instruct
LLaMA 3.2 3B Instruct

Decoding Parameters (identical for both models):

max_new_tokens = 32
temperature = 0.7
top_p = 0.9
repeat_penalty = 1.1

Evaluation Metrics

Population Agreement
- Definition: Proportion of agents in population reaching same naming for specific concept
- Range: [0, 1]; higher values indicate stronger convention formation
Tokens-to-Convergence
- Definition: Total tokens needed to reach specific agreement threshold (50%, 60%, 70%)
- Key metric for measuring efficiency
Standard Deviation
- Measures stability across different runs
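
The two main metrics can be sketched as below. The exact definitions are assumptions: agreement is taken here as the share of all agents holding the most common decoded name (with None counted as disagreement), and tokens-to-convergence as the cumulative token count at the first step where agreement meets the threshold. The function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def population_agreement(names: Sequence[Optional[str]]) -> float:
    """Share of agents holding the most common decoded name."""
    decoded = [y for y in names if y is not None]
    if not decoded:
        return 0.0
    top_count = Counter(decoded).most_common(1)[0][1]
    return top_count / len(names)


def tokens_to_convergence(
    agreement: Sequence[float], tokens: Sequence[int], threshold: float
) -> Optional[int]:
    """Cumulative tokens at the first step with agreement >= threshold."""
    total = 0
    for a, tok in zip(agreement, tokens):
        total += tok
        if a >= threshold:
            return total
    return None  # threshold never reached
```

Returning None when the threshold is never reached mirrors conditions that fail to converge at a given level, such as NL at the 60% threshold.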

Comparison Methods

NL (Baseline 1): Unstructured, memory-free natural language communication
NL-SW (Baseline 2): Natural language communication with memory window
Schema (Proposed Method): Schema-induced structured communication

Experimental Results

Main Results

1. Significant Population Agreement Improvement (Table 1)

N	K	NL	NL-SW	Schema
12	0	0.111±0.048	—	—
24	0	0.125±0.042	—	—
12	5	—	0.278±0.127	0.611±0.293
24	5	—	0.292±0.042	0.556±0.064
12	10	—	0.333±0.144	0.639±0.096
24	10	—	0.295±0.039	0.588±0.085

Key Findings:

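Schema achieves agreement of 0.556-0.639, compared to NL's 0.111-0.125: roughly 4.4-5.8× higher at matched population size, with the largest ratio at N=12, K=10 (note that NL runs without memory, K=0)
Compared to NL-SW's 0.278-0.333, an improvement of approximately 2×
Highest agreement at N=12, K=10 (0.639), although the gain over K=5 (0.611±0.293) is well within one standard deviation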
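
The improvement ratios can be recomputed directly from the Table 1 means. The sketch below matches each Schema cell to the NL baseline at the same N (NL has K=0 only) and to NL-SW at the same N and K; the dictionary layout is just an illustration.

```python
# Mean agreement from Table 1, keyed by (N, K)
nl = {(12, 0): 0.111, (24, 0): 0.125}
nl_sw = {(12, 5): 0.278, (24, 5): 0.292, (12, 10): 0.333, (24, 10): 0.295}
schema = {(12, 5): 0.611, (24, 5): 0.556, (12, 10): 0.639, (24, 10): 0.588}

# Schema vs. NL at the same population size (NL is only reported at K=0)
vs_nl = {nk: schema[nk] / nl[(nk[0], 0)] for nk in schema}
# Schema vs. NL-SW at the same population size and memory window
vs_nl_sw = {nk: schema[nk] / nl_sw[nk] for nk in schema}

for nk in schema:
    print(nk, round(vs_nl[nk], 2), round(vs_nl_sw[nk], 2))
```

At matched N, the Schema-to-NL ratio spans about 4.4× (N=24, K=5) to about 5.8× (N=12, K=10), and the Schema-to-NL-SW ratio stays between about 1.9× and 2.2×.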

2. Impact of Different Adoption Probabilities (Figure 1)

α=0.5: Schema reaches 0.6-0.65, NL-SW approximately 0.3, NL below 0.2
α=0.75, 0.9: Similar trends, but slightly lower
Counter-intuitive Finding: Higher α (more aggressive adoption) slightly reduces agreement
Stability: Schema shows smallest standard deviation at α=0.5, most consistent results

3. Token Efficiency (Figure 2)

Tokens Required to Reach 50% Agreement:

Schema: approximately 10⁴ magnitude
NL-SW: approximately 10⁵ magnitude
NL: approximately 10⁵-10⁶ magnitude

Efficiency Improvement: To reach 50% agreement, Schema needs roughly one order of magnitude fewer tokens than NL-SW and one to two orders of magnitude fewer than NL

4. High-Threshold Convergence (Appendix Figures 5a, 5b)

60% Agreement:

Schema converges, requiring nearly two orders of magnitude fewer tokens than NL-SW
NL never reaches this threshold

70% Agreement:

Only Schema achieves convergence
Requires slightly more tokens than 60% threshold

Cross-Model Validation

1. LLaMA-Only Experiment (Figure 3)

Schema agreement: 0.75-0.8
NL and NL-SW: 0.65-0.7
Finding: LLaMA reaches higher agreement than Phi-3 in every condition, which narrows the gap, but Schema still leads by roughly 0.1

2. Mixed Model Experiment (Figure 4)

6 Phi-3 + 6 LLaMA 3.2
Limited to 100 rounds
Result: Schema maintains clear advantage in heterogeneous populations
Significance: Method is robust to model differences

Ablation Studies

While not explicitly labeled as ablation studies, the three-condition comparison allows analysis of factor contributions:

Role of Memory (NL vs NL-SW)
- Adding memory (K=5,10) improves agreement from 0.111 to 0.278-0.333
- Improvement of approximately 2.5-3×
Role of Schema (NL-SW vs Schema)
- Under the same memory conditions, schema improves agreement from 0.278-0.333 to 0.556-0.639
- Improvement of approximately 1.9-2.2×
Combined Effect (NL vs Schema)
- Memory plus schema together yield roughly a 4.4-5.8× improvement over NL
- Because Schema was not run at K=0, the schema effect without memory, and hence any synergy between the two factors, cannot be isolated from these results
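
A small worked decomposition for the N=12, K=10 column of Table 1 helps put these factors in context. It is only arithmetic on the reported means, with variable names chosen for illustration.

```python
# Table 1 means at N=12: NL at K=0, NL-SW and Schema at K=10
nl, nl_sw, schema = 0.111, 0.333, 0.639

# Multiplicative view: the two factors multiply to the combined ratio by construction
memory_factor = nl_sw / nl      # NL -> NL-SW
schema_factor = schema / nl_sw  # NL-SW -> Schema
combined = schema / nl          # NL -> Schema

# Additive view: absolute gain in agreement contributed by each step
memory_gain = nl_sw - nl
schema_gain = schema - nl_sw

print(round(memory_factor, 2), round(schema_factor, 2), round(combined, 2))
print(round(memory_gain, 3), round(schema_gain, 3))
```

The combined ratio equals the product of the two step ratios by definition, so it does not by itself show synergy. In ratio terms memory is the larger step (about 3× versus about 1.9×), while in absolute terms the schema adds more (about +0.31 versus +0.22).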

Experimental Findings

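Structured Constraints are Key Driver: In absolute terms, adding the schema raises agreement more (+0.26 to +0.33) than adding the memory window (+0.17 to +0.22), although memory gives the larger relative gain over the low NL baseline
Impact of Population Size:
- As N increases from 12 to 24, Schema agreement decreases slightly (0.611→0.556 at K=5, 0.639→0.588 at K=10), an expected scaling challenge
- Schema nonetheless keeps the highest agreement at both population sizes
Marginal Effect of Memory Window:
- Increasing K from 5 to 10 yields limited improvement (0.611→0.639 at N=12, 0.556→0.588 at N=24)
- Suggests K=5 already captures key information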
Non-Monotonicity of Adoption Probability:
- α=0.5 performs best, challenging intuition that "more aggressive learning is better"
- Possible reason: too-rapid adoption causes local locking, hindering global optimization
Model Family Differences:
- LLaMA outperforms Phi in naming games
- Both benefit from Schema

Related Work

1. Multi-Agent LLM Systems

Guo et al. 2024: Survey of multi-agent systems, identifying coordination and communication as core challenges
This paper's contribution: Provides specific coordination mechanism design

2. Convention Emergence Research

Baronchelli et al. 2008: Classical theoretical analysis of naming games
Ashery et al. 2025: Social conventions and collective bias in LLM populations
This paper's contribution: Introduces structured constraints as control variable, studying their impact on emergence processes

3. Structured Formats and LLM Reasoning

Chen et al. 2024: Alternative formats (e.g., JSON) enhance LLM reasoning and communication
This paper's contribution: Extends structured formats from single-agent tasks to multi-agent coordination scenarios

Positioning relative to prior work:

Theory→Practice: Applies naming games from theoretical models to actual LLM systems
Passive→Active: Not just observing convention emergence, but actively guiding its formation
Single-Task→General: Proposed mechanism has potential cross-task applicability

Conclusions and Discussion

Main Conclusions

Lightweight Schemas Effectively Guide Convention Formation: A fixed @say {name: Ck} format improves LLM agent agreement in naming games by up to 5.8×
Significant Efficiency Gains: To reach the same agreement level, Schema requires roughly one order of magnitude fewer tokens
Robustness Verification: Effects remain stable across different models (Phi-3, LLaMA), population sizes (12, 24), and heterogeneous settings
Power of Minimal Structural Priors: Even very simple structural constraints significantly shape emergence processes
Practical Control Mechanism: Schema constraints provide model-agnostic, easy-to-implement coordination control

Limitations

Limited Task Scope
- Validated only on naming games
- Not tested on more complex coordination tasks (e.g., dialogue, planning)
Small-Scale Experiments
- Maximum population size of 24 agents
- Fixed vocabulary of 12 entries
- Real applications may require larger scales
Limited Model Selection
- Only two model families tested (Phi-3, LLaMA)
- Does not include larger or more advanced models (e.g., GPT-4)
Round Limitations
- Main experiments 300 rounds, mixed experiments only 100 rounds
- May not fully observe long-term dynamics
Lack of Theoretical Analysis
- Primarily empirical research
- No deep theoretical explanation for why Schema works
Potential Flexibility Trade-offs
- Paper mentions need to study "whether consistency might limit broader tasks"
- Structured constraints may sacrifice expressiveness in some scenarios

Future Directions

Directions explicitly proposed in paper:

Test Schema Impact on LLM Response Variability
- Study trade-off between consistency and task diversity
Larger-Scale Experiments
- More agents, larger vocabularies
Alternative Schema Designs
- Explore effects of different structured formats
- Adaptive or learnable schemas
Longer Experimental Periods
- Observe long-term evolution dynamics
Extension to Other Tasks
- Collaborative coding, distributed planning, and other practical applications

Potential extension directions:

Theoretical Modeling: Establish mathematical models explaining how schemas accelerate convergence
Dynamic Schemas: Automatically adjust structuring degree based on task complexity
Human-Machine Hybrid: Test in systems with human participants
Adversarial Settings: Study structured constraint performance in competitive environments

In-Depth Evaluation

Strengths

1. Method Innovation

Simple Yet Effective: Proposed schema mechanism is extremely lightweight (only one format label), yet produces significant effects
Controllability: Provides clear control knob (schema present/absent), easy to apply in practice
Theory-Practice Integration: Connects classical naming game theory with modern LLM systems

2. Experimental Sufficiency

Multi-Dimensional Comparison: Three conditions (NL, NL-SW, Schema) clearly show each factor's role
Parameter Sweeping: Systematically tests different values of N, K, α
Cross-Model Validation: Includes single-model and mixed-model experiments
Multi-Threshold Analysis: 50%, 60%, 70% convergence analysis provides comprehensive perspective

3. Result Convincingness

Quantitatively Significant: 5.8× improvement and one order of magnitude efficiency gain are strong evidence
Statistical Stability: Three random seeds, reports standard deviations
Consistent Trends: All experimental configurations show Schema advantage

4. Writing Clarity

Clear Structure: Problem→Method→Experiments→Conclusion flows logically
Algorithm Description: Pseudocode is concise and clear
Visualization: Figures effectively communicate core findings
Open Source Commitment: Provides code links, promoting reproducibility

5. Practical Value

Low-Cost Deployment: Schema mechanism is easy to implement, requires no model retraining
Model-Agnostic: Applicable to any LLM supporting structured output
Broad Applicability: Principles extensible beyond naming games to other coordination tasks

Weaknesses

1. Insufficient Theoretical Depth

Lack of Mechanism Explanation: Why is simple format labeling so effective? Does it reduce search space? Improve parsing accuracy? Or something else?
No Convergence Analysis: No theoretical guarantees (e.g., convergence rate bounds)
Unexplained α Non-Monotonicity: Why does α=0.5 outperform α=0.9? Needs deeper analysis

2. Limited Experimental Scope

Single Task: Only naming games, generalization unknown
Small Scale: N≤24, M=12 may be insufficient for real applications
Short Duration: 300 rounds may not capture certain long-term phenomena (e.g., convention drift)

3. Incomplete Comparisons

Missing Alternative Formats: No comparison with XML, YAML, or other structured formats
No Optimal Baselines: Not compared with purpose-designed coordination protocols (e.g., voting mechanisms)
Unexplored Prompt Engineering: Could carefully designed prompts achieve similar effects in NL condition?

4. Shallow Analysis

No Error Analysis: Lacks detailed analysis of non-compliance types and causes
Missing Qualitative Analysis: No examples of actual agent-generated messages
Unexplored Memory Contents: What is stored in memory window? How does it influence decisions?

5. Insufficiently Discussed Negative Impacts

Flexibility Loss: Structured constraints may limit certain creative tasks
Error Propagation: If incorrect conventions form early, schema may accelerate their spread
Fairness: Different models may have different adaptation capabilities to schemas

6. Incomplete Implementation Details

Unquantified Error Handling Impact: Specific impact of retry and degradation handling on results not quantified
Decoding Parameter Justification: Rationale for choices like temperature=0.7 not explained
Pairing Strategy: Is uniform random pairing optimal?

Impact Assessment

1. Contribution to Field

Methodological Contribution: Provides new experimental paradigm for multi-agent LLM research
Empirical Contribution: First systematic quantification of structured constraints' impact on convention formation
Inspirational Value: Stimulates further research on "minimal effective structure"

2. Practical Value

Immediately Applicable: Simple method, directly applicable to existing systems
Cost-Benefit: Significant token consumption reduction, lower API call costs
Scalability: Provides foundation for building large-scale multi-agent systems

3. Reproducibility

High: Code repository provided, detailed parameter settings
Open Models: Uses open-source models (Phi-3, LLaMA)
Reasonable Compute: Small-scale experiments runnable on standard GPUs

4. Potential Application Scenarios

Collaborative Coding: Multiple AI assistants coordinating naming conventions during development
Distributed Planning: Multi-robot systems for task allocation and naming
Knowledge Graph Construction: Multi-agent collaborative entity and relation annotation
Multilingual Systems: Cross-language agent concept alignment

Applicability Analysis

Most Suitable Scenarios

Limited Discrete Choice Spaces: Classification, annotation tasks
Fast Convergence Needed: Real-time or resource-constrained applications
Heterogeneous Agent Systems: Different models need unified interface
Predefinable Formats: Tasks allow explicit output structure

Less Suitable Scenarios

Open-Ended Creative Tasks: Creative writing, brainstorming
Requiring Nuance: Structured formats may lose subtle information
Dynamically Evolving Tasks: Fixed schemas may limit adaptability
Human-Involved Dialogue: Over-structuring may harm user experience

Scenarios Requiring Caution

High-Risk Decisions: Need additional verification to prevent error convention propagation
Long-Running Systems: Need monitoring for convention drift and schema failure
Cross-Cultural/Cross-Domain Applications: Schema design needs domain-specific consideration

References

Key references cited in paper:

Ashery, A. F.; Aiello, L. M.; Baronchelli, A. (2025). Emergent social conventions and collective bias in LLM populations. Science Advances, 11(20): eadu9368.
- Social convention emergence in LLM populations
Baronchelli, A.; Loreto, V.; Steels, L. (2008). In-depth analysis of the Naming Game dynamics: the homogeneous mixing case. arXiv:0803.0398.
- Classical theoretical analysis of naming games
Chen, W. et al. (2024). Beyond natural language: LLMs leveraging alternative formats for enhanced reasoning and communication. arXiv:2402.18439.
- Structured formats enhance LLM reasoning
Guo, T. et al. (2024). Large language model based multi-agents: A survey of progress and challenges. arXiv:2402.01680.
- Survey of multi-agent LLM systems

Summary

The SIGN paper proposes a simple yet powerful idea: guide convention formation in multi-agent systems through minimal structural constraints. The experimental results are encouraging, with up to 5.8× higher agreement and order-of-magnitude token savings supporting the case for practical use.

The core value lies in providing a low-cost, efficient, model-agnostic coordination mechanism, which matters given the growing importance of multi-agent LLM systems. The method's simplicity is itself an advantage: it requires no training or architectural changes, and an output format constraint alone substantially improves coordination.

The main limitations are theoretical depth and application scope. The paper is more an empirical demonstration than a deep analysis, and future work is needed to answer the "why" and "when" questions. Extension to more complex tasks and larger-scale systems is a necessary next step.

Overall, this is a well-executed study with a clear contribution, offering practical tools and research insights for multi-agent coordination that merit attention and further exploration.