SIGN: Schema-Induced Games for Naming

Zhang, Woisetschläger
Real-world AI systems are tackling increasingly complex problems, often through interactions among large language model (LLM) agents. When these agents develop inconsistent conventions, coordination can break down. Applications such as collaborative coding and distributed planning therefore require reliable, consistent communication, and scalability is a central concern as systems grow. We introduce Schema-Induced Games for Naming (SIGN), a naming game that examines how lightweight structure can steer convention formation. We compare schema-induced communication to unconstrained natural language and find faster convergence with up to 5.8x higher agreement. These results suggest that minimal structure can act as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond the naming game.

Basic Information

  • Paper ID: 2510.21855
  • Title: SIGN: Schema-Induced Games for Naming
  • Authors: Ryan Zhang (Horace Greeley High School), Herbert Woisetschläger (Technical University of Munich)
  • Classification: cs.AI, cs.CL, cs.LG, cs.MA
  • Publication Date: October 22, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.21855

Abstract

Real-world AI systems are tackling increasingly complex problems, often through interactions between large language model (LLM) agents. When these agents form inconsistent conventions, coordination may collapse. Applications such as collaborative coding and distributed planning require reliable, consistent communication, with scalability being a core concern for system growth. This paper introduces Schema-Induced Games for Naming (SIGN), a naming game that investigates how lightweight structures guide convention formation. The study compares schema-induced communication with unconstrained natural language, finding that the former converges faster with consistency improvements of up to 5.8×. These results suggest that minimal structure can serve as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond naming games.

Research Background and Motivation

1. Core Problem to Address

As LLM-based multi-agent systems mature, agents need shared naming conventions to coordinate effectively. Inconsistent conventions formed during interaction cause coordination failures in practical applications such as collaborative coding and distributed planning. This paper investigates how lightweight structural constraints can guide convention formation, improving both agreement and convergence speed among agents.

2. Problem Importance

  • Practical Application Needs: Multi-agent systems in real-world applications (e.g., collaborative coding, distributed planning) require reliable communication protocols
  • Scalability Challenges: As system scale grows, maintaining consistency becomes increasingly difficult
  • Efficiency Requirements: Reducing the interaction cost (token consumption) needed to reach consensus is critical for practical deployment

3. Limitations of Existing Approaches

  • Natural Language Communication: While flexible, it lacks structure, leading to slow and unstable convention formation
  • Purely Free Convention Emergence: Convention formation relying on pure interaction is inefficient, requiring extensive interaction to reach consensus
  • Lack of Control Mechanisms: Existing research lacks simple and effective control means to guide convention formation

4. Research Motivation

Inspired by two lines of work:

  1. Naming game research shows that conventions can emerge from interaction (Ashery et al. 2025)
  2. Structured formats (e.g., JSON schema) improve LLM reasoning and collaboration in supervised tasks (Chen et al. 2024)

This paper poses a key question: Can lightweight schema priors guide convention formation itself?

Core Contributions

  1. Proposes SIGN Framework: First to introduce schema-induced mechanisms into naming games, investigating how structured constraints affect convention formation in LLM agents
  2. Empirically Validates Structured Communication Advantages:
    • Convergence speed improved by an order of magnitude (significant token consumption reduction)
    • Population agreement improved up to 5.8× (from 0.111 to 0.639)
  3. Provides Controllable Coordination Mechanism: Demonstrates that schema constraints can serve as a model-agnostic "control knob" to simply and effectively improve multi-agent coordination
  4. Cross-Model Validation: Verifies method effectiveness and robustness on Phi-3 and LLaMA models and their hybrid populations
  5. Theoretical Insights: Reveals how minimal structural priors shape convention emergence processes, providing guidance for multi-agent system design

Method Details

Task Definition

The naming game is defined on the following setting:

  • Population: N agents
  • Vocabulary: Fixed vocabulary L = {C₁, ..., Cₘ}
  • Time Steps: t = 1, ..., T
  • Interaction Mechanism: Each round randomly pairs two agents
  • Objective: Through interaction, converge the population to a common naming convention

Input: Agent i generates message m^t_i at step t

Output: Decoder maps message to a name y^t_i ∈ L in the vocabulary

Constraint: Each agent maintains a memory window of size K, storing the most recent K interactions with partners

Three Experimental Conditions

1. Natural Language (NL)

  • Agents generate unconstrained natural language output
  • Decoder extracts a valid vocabulary token from the free text where possible
  • No memory mechanism (K=0)

2. Natural Language Sliding Window (NL-SW)

  • Extends NL condition with a memory window of size K
  • Recent interactions influence future proposals
  • Still uses natural language communication

3. Schema (Core Innovation)

  • Enforced Format: Requires replies matching @say {name: Ck} format
  • Parsing Mechanism: Uses regular expressions to extract Ck token
  • Error Handling:
    • Non-compliant outputs get one retry opportunity (with reminder)
    • If still invalid, decode free text
    • If completely undecodable, set y ← None
  • Design Philosophy: Provides explicit, easily parseable vocabulary entry handles, maintaining transparency to listeners with minimal overhead
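
The parsing and fallback chain described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the vocabulary names C1..C12, the exact regular expressions, the function names, and the choice to apply the free-text fallback to the retry output are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

# Hypothetical 12-entry vocabulary C1..C12; the summary only states M = 12.
VOCAB = [f"C{k}" for k in range(1, 13)]

# Strict pattern for a schema-compliant reply such as "@say {name: C7}".
SCHEMA_RE = re.compile(r"@say\s*\{\s*name\s*:\s*(C\d+)\s*\}")
# Loose pattern used only by the free-text fallback.
TOKEN_RE = re.compile(r"\bC\d+\b")


def parse_schema(message: str) -> Optional[str]:
    """Return the name from a compliant reply, or None if non-compliant."""
    match = SCHEMA_RE.search(message)
    if match and match.group(1) in VOCAB:
        return match.group(1)
    return None


def decode_free_text(message: str) -> Optional[str]:
    """Fallback: first in-vocabulary token found anywhere in the text."""
    for token in TOKEN_RE.findall(message):
        if token in VOCAB:
            return token
    return None


def decode_with_retry(first: str, retry: Callable[[], str]) -> Optional[str]:
    """Schema parse, one retry via `retry`, then the free-text fallback."""
    name = parse_schema(first)
    if name is not None:
        return name
    second = retry()  # stands in for re-prompting the agent with a reminder
    name = parse_schema(second)
    if name is not None:
        return name
    # Assumption: the fallback decodes the retry output, not the first reply.
    return decode_free_text(second)  # None when completely undecodable
```

Here `retry` is a placeholder for whatever call re-prompts the model; in a test it can simply be a lambda returning a canned string.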

Algorithm Flow (Algorithm 1)

Input: N (number of agents), L (vocabulary), K (memory size), T (steps), α (adoption probability)

for t = 1 to T:
    1. Uniformly randomly pair agents i, j
    2. Each agent forms proposal m^t based on partner-specific K memories
    3. Parse @say {name: Ck} → y
    4. if non-compliant:
           retry once with reminder
           if still invalid:
               decode free text
               if undecodable:
                   y ← None
    5. if y_i ≠ y_j:
           adopt partner's Ck with probability α (lose-shift mechanism)
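
The loop above can be exercised without any LLM by replacing agents with stubs that simply hold a current name. The sketch below is a toy under stated assumptions, not a reproduction of the paper's experiments: the function name `run_naming_game`, the random initial names, the always-compliant stub proposals, and the rule that one randomly chosen agent of a mismatched pair adopts are all illustrative choices.

```python
import random
from collections import Counter


def run_naming_game(n_agents=12, vocab_size=12, steps=300, alpha=0.5, seed=0):
    """Toy naming game with stub agents; returns final population agreement."""
    rng = random.Random(seed)
    vocab = [f"C{k}" for k in range(1, vocab_size + 1)]
    # Each stub agent starts with a random name instead of querying an LLM.
    names = [rng.choice(vocab) for _ in range(n_agents)]

    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)  # uniform random pair
        # Stub proposals are always schema-compliant, so parsing is trivial.
        y_i, y_j = names[i], names[j]
        if y_i != y_j and rng.random() < alpha:
            # Lose-shift (assumed form): one agent adopts its partner's name.
            if rng.random() < 0.5:
                names[i] = y_j
            else:
                names[j] = y_i

    # Agreement: share of agents holding the most common name.
    return Counter(names).most_common(1)[0][1] / n_agents
```

Because the stubs have no language model and no memory, the numbers this produces say nothing about the NL, NL-SW, or Schema conditions; the sketch only makes the pairing and adoption steps concrete.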

Technical Innovations

1. Lightweight Schema Design

  • Minimal Constraints: Only requires specific format labels, does not restrict content choice
  • Transparency: Clear format, easy to parse and debug
  • Flexibility: Retains sufficient freedom for convention emergence

2. Error Handling Mechanism

  • Single retry avoids over-penalization
  • Graceful degradation ensures experiment continuity
  • Balances structural constraints with practicality

3. Partner-Specific Memory

  • Only records history with interaction partners
  • Simulates local information in real social networks
  • Reduces memory complexity
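
One plausible way to hold partner-specific history is a bounded queue per partner. The class below is a hypothetical sketch; the name `PartnerMemory`, its methods, and the choice to store (own name, partner name) pairs are assumptions, since the summary does not describe the stored record format.

```python
from collections import defaultdict, deque


class PartnerMemory:
    """Keeps the K most recent exchanges with each partner separately."""

    def __init__(self, k: int):
        self.k = k
        # One bounded deque per partner id; maxlen=0 stores nothing (K = 0).
        self._history = defaultdict(lambda: deque(maxlen=k))

    def record(self, partner_id: int, own_name: str, partner_name: str) -> None:
        """Append one interaction; the oldest entry is dropped beyond K."""
        self._history[partner_id].append((own_name, partner_name))

    def recall(self, partner_id: int) -> list:
        """Return stored interactions with this partner, oldest first."""
        return list(self._history.get(partner_id, ()))
```

With this layout, setting K = 0 reproduces the memory-free NL condition, while K = 5 or K = 10 bounds storage at K entries per partner.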

4. Probabilistic Adoption Mechanism

  • Lose-shift strategy: adopt partner's choice with probability α when mismatch occurs
  • Parameter α controls learning speed
  • Models sociological learning dynamics

Experimental Setup

Dataset

  • Vocabulary: Fixed 12 entries (M=12)
  • No External Dataset: Pure simulation experiments, data generated through agent interactions

Experimental Parameters

Parameter                   Value
Population Size (N)         12, 24
Vocabulary Size (M)         12
Total Steps (T)             300 (100 for mixed experiments)
Memory Window (K)           0, 5, 10
Adoption Probability (α)    0.5, 0.75, 0.9/0.99
Random Seeds                3

Model Configuration

Main Experimental Models:

  • Phi-3 Mini 4K Instruct
  • LLaMA 3.2 3B Instruct

Decoding Parameters (identical for both models):

  • max_new_tokens = 32
  • temperature = 0.7
  • top_p = 0.9
  • repeat_penalty = 1.1

Evaluation Metrics

  1. Population Agreement
    • Definition: Proportion of agents in population reaching same naming for specific concept
    • Range: [0, 1]; higher values indicate better convention formation
  2. Tokens-to-Convergence
    • Definition: Total tokens needed to reach specific agreement threshold (50%, 60%, 70%)
    • Key metric for measuring efficiency
  3. Standard Deviation
    • Measures stability across different runs
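
The three metrics can be computed as follows. This is a sketch under assumptions: agreement is taken as the share of agents holding the most common name, undecodable (None) outputs count against agreement, and the standard deviation is the sample standard deviation across seeds. The summary does not confirm any of these details, and the function names are illustrative.

```python
from collections import Counter
from statistics import mean, stdev


def population_agreement(names):
    """Share of agents holding the most common name; None never counts."""
    valid = [n for n in names if n is not None]
    if not valid:
        return 0.0
    return Counter(valid).most_common(1)[0][1] / len(names)


def tokens_to_convergence(agreement_by_step, tokens_by_step, threshold=0.5):
    """Cumulative tokens spent when agreement first reaches the threshold."""
    total = 0
    for agreement, tokens in zip(agreement_by_step, tokens_by_step):
        total += tokens
        if agreement >= threshold:
            return total
    return None  # threshold never reached within the run


def summarize(values):
    """Mean and sample standard deviation across seeds."""
    return mean(values), stdev(values)
```

Returning None from `tokens_to_convergence` keeps "never converged" distinct from "converged at zero cost", which matters for conditions that never reach the higher thresholds.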

Comparison Methods

  • NL (Baseline 1): Unstructured, memory-free natural language communication
  • NL-SW (Baseline 2): Natural language communication with memory window
  • Schema (Proposed Method): Schema-induced structured communication

Experimental Results

Main Results

1. Significant Population Agreement Improvement (Table 1)

N     K     NL             NL-SW          Schema
12    0     0.111±0.048    -              -
24    0     0.125±0.042    -              -
12    5     -              0.278±0.127    0.611±0.293
24    5     -              0.292±0.042    0.556±0.064
12    10    -              0.333±0.144    0.639±0.096
24    10    -              0.295±0.039    0.588±0.085

Key Findings:

  • Schema achieves agreement of 0.556-0.639, compared to NL's 0.111-0.125, an improvement of 5-5.8×
  • Compared to NL-SW's 0.278-0.333, an improvement of approximately 2×
  • Best performance at K=10 (0.639), validating importance of memory

2. Impact of Different Adoption Probabilities (Figure 1)

  • α=0.5: Schema reaches 0.6-0.65, NL-SW approximately 0.3, NL below 0.2
  • α=0.75, 0.9: Similar trends, but slightly lower
  • Counter-intuitive Finding: Higher α (more aggressive adoption) slightly reduces agreement
  • Stability: Schema shows smallest standard deviation at α=0.5, most consistent results

3. Token Efficiency (Figure 2)

Tokens Required to Reach 50% Agreement:

  • Schema: approximately 10⁴ magnitude
  • NL-SW: approximately 10⁵ magnitude
  • NL: approximately 10⁵-10⁶ magnitude

Efficiency Improvement: Schema needs roughly one order of magnitude fewer tokens than NL-SW, and one to two orders fewer than NL

4. High-Threshold Convergence (Appendix Figures 5a, 5b)

60% Agreement:

  • Schema converges, requiring nearly two orders of magnitude fewer tokens than NL-SW
  • NL never reaches this threshold

70% Agreement:

  • Only Schema achieves convergence
  • Requires slightly more tokens than 60% threshold

Cross-Model Validation

1. LLaMA-Only Experiment (Figure 3)

  • Schema agreement: 0.75-0.8
  • NL and NL-SW: 0.65-0.7
  • Finding: LLaMA reaches higher agreement than Phi-3 under every condition; Schema still leads, though by a smaller margin (roughly 0.1)

2. Mixed Model Experiment (Figure 4)

  • 6 Phi-3 + 6 LLaMA 3.2
  • Limited to 100 rounds
  • Result: Schema maintains clear advantage in heterogeneous populations
  • Significance: Method is robust to model differences

Ablation Studies

While not explicitly labeled as ablation studies, the three-condition comparison allows analysis of factor contributions:

  1. Role of Memory (NL vs NL-SW)
    • Adding memory (K=5,10) improves agreement from 0.111 to 0.278-0.333
    • Improvement of approximately 2.5-3×
  2. Role of Schema (NL-SW vs Schema)
    • Under same memory conditions, schema improves agreement from 0.278-0.333 to 0.556-0.639
    • Improvement of approximately 1.7-2×
  3. Combined Effect (NL vs Schema)
    • Combined memory + schema achieves 5-5.8× improvement
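    • The gains compound multiplicatively rather than additively; because Schema was run only with K=5 and K=10, the schema effect without memory is not isolated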
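
The improvement factors quoted in this decomposition follow directly from the Table 1 means. The short calculation below reproduces them for the N = 12 rows; the variable and function names are illustrative, and only the numbers come from the table.

```python
# Mean agreement values copied from Table 1, N = 12 rows only.
NL_K0 = 0.111
NL_SW = {5: 0.278, 10: 0.333}
SCHEMA = {5: 0.611, 10: 0.639}


def gains(k):
    """Return (memory gain, schema gain, combined gain) for memory size k."""
    memory_gain = NL_SW[k] / NL_K0      # NL -> NL-SW
    schema_gain = SCHEMA[k] / NL_SW[k]  # NL-SW -> Schema
    combined = SCHEMA[k] / NL_K0        # NL -> Schema
    return memory_gain, schema_gain, combined


for k in (5, 10):
    m, s, c = gains(k)
    # The combined ratio equals m * s by construction.
    print(f"K={k}: memory {m:.2f}x, schema {s:.2f}x, combined {c:.2f}x")
```

For K = 10 the combined ratio is 0.639 / 0.111 ≈ 5.76, which is the source of the reported "up to 5.8×" figure.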

Experimental Findings

  1. Structured Constraints are Key Driver: Schema improvement exceeds memory window contribution
  2. Impact of Population Size:
    • For Schema, agreement decreases slightly as N grows from 12 to 24 (0.611→0.556 at K=5, 0.639→0.588 at K=10), an expected scaling challenge
    • But Schema maintains absolute advantage
  3. Marginal Effect of Memory Window:
    • K increases from 5 to 10, limited improvement (0.611→0.639)
    • Suggests K=5 already captures key information
  4. Non-Monotonicity of Adoption Probability:
    • α=0.5 performs best, challenging intuition that "more aggressive learning is better"
    • Possible reason: too-rapid adoption causes local locking, hindering global optimization
  5. Model Family Differences:
    • LLaMA outperforms Phi in naming games
    • Both benefit from Schema

Related Work

1. Multi-Agent LLM Systems

  • Guo et al. 2024: Survey of multi-agent systems, identifying coordination and communication as core challenges
  • This paper's contribution: Provides specific coordination mechanism design

2. Convention Emergence Research

  • Baronchelli et al. 2008: Classical theoretical analysis of naming games
  • Ashery et al. 2025: Social conventions and collective bias in LLM populations
  • This paper's contribution: Introduces structured constraints as control variable, studying their impact on emergence processes

3. Structured Formats and LLM Reasoning

  • Chen et al. 2024: Alternative formats (e.g., JSON) enhance LLM reasoning and communication
  • This paper's contribution: Extends structured formats from single-agent tasks to multi-agent coordination scenarios

Positioning relative to prior work:

  • Theory→Practice: Applies naming games from theoretical models to actual LLM systems
  • Passive→Active: Not just observing convention emergence, but actively guiding its formation
  • Single-Task→General: Proposed mechanism has potential cross-task applicability

Conclusions and Discussion

Main Conclusions

  1. Lightweight Schemas Effectively Guide Convention Formation: Fixed @say {name: Ck} format improves LLM agent agreement in naming games up to 5.8×
  2. Significant Efficiency Gains: Achieving same agreement level, Schema requires one order of magnitude fewer tokens
  3. Robustness Verification: Effects remain stable across different models (Phi-3, LLaMA), population sizes (12, 24), and heterogeneous settings
  4. Power of Minimal Structural Priors: Even very simple structural constraints significantly shape emergence processes
  5. Practical Control Mechanism: Schema constraints provide model-agnostic, easy-to-implement coordination control

Limitations

  1. Limited Task Scope
    • Validated only on naming games
    • Not tested on more complex coordination tasks (e.g., dialogue, planning)
  2. Small-Scale Experiments
    • Maximum population size of 24 agents
    • Fixed vocabulary of 12 entries
    • Real applications may require larger scales
  3. Limited Model Selection
    • Only two model families tested (Phi-3, LLaMA)
    • Does not include larger or more advanced models (e.g., GPT-4)
  4. Round Limitations
    • Main experiments 300 rounds, mixed experiments only 100 rounds
    • May not fully observe long-term dynamics
  5. Lack of Theoretical Analysis
    • Primarily empirical research
    • No deep theoretical explanation for why Schema works
  6. Potential Flexibility Trade-offs
    • Paper mentions need to study "whether consistency might limit broader tasks"
    • Structured constraints may sacrifice expressiveness in some scenarios

Future Directions

Directions explicitly proposed in paper:

  1. Test Schema Impact on LLM Response Variability
    • Study trade-off between consistency and task diversity
  2. Larger-Scale Experiments
    • More agents, larger vocabularies
  3. Alternative Schema Designs
    • Explore effects of different structured formats
    • Adaptive or learnable schemas
  4. Longer Experimental Periods
    • Observe long-term evolution dynamics
  5. Extension to Other Tasks
    • Collaborative coding, distributed planning, and other practical applications

Potential extension directions:

  1. Theoretical Modeling: Establish mathematical models explaining how schemas accelerate convergence
  2. Dynamic Schemas: Automatically adjust structuring degree based on task complexity
  3. Human-Machine Hybrid: Test in systems with human participants
  4. Adversarial Settings: Study structured constraint performance in competitive environments

In-Depth Evaluation

Strengths

1. Method Innovation

  • Simple Yet Effective: Proposed schema mechanism is extremely lightweight (only one format label), yet produces significant effects
  • Controllability: Provides clear control knob (schema present/absent), easy to apply in practice
  • Theory-Practice Integration: Connects classical naming game theory with modern LLM systems

2. Experimental Sufficiency

  • Multi-Dimensional Comparison: Three conditions (NL, NL-SW, Schema) clearly show each factor's role
  • Parameter Sweeping: Systematically tests different values of N, K, α
  • Cross-Model Validation: Includes single-model and mixed-model experiments
  • Multi-Threshold Analysis: 50%, 60%, 70% convergence analysis provides comprehensive perspective

3. Result Convincingness

  • Quantitatively Significant: 5.8× improvement and one order of magnitude efficiency gain are strong evidence
  • Statistical Stability: Three random seeds, reports standard deviations
  • Consistent Trends: All experimental configurations show Schema advantage

4. Writing Clarity

  • Clear Structure: Problem→Method→Experiments→Conclusion flows logically
  • Algorithm Description: Pseudocode is concise and clear
  • Visualization: Figures effectively communicate core findings
  • Open Source Commitment: Provides code links, promoting reproducibility

5. Practical Value

  • Low-Cost Deployment: Schema mechanism is easy to implement, requires no model retraining
  • Model-Agnostic: Applicable to any LLM supporting structured output
  • Broad Applicability: Principles extensible beyond naming games to other coordination tasks

Weaknesses

1. Insufficient Theoretical Depth

  • Lack of Mechanism Explanation: Why is simple format labeling so effective? Does it reduce search space? Improve parsing accuracy? Or something else?
  • No Convergence Analysis: No theoretical guarantees (e.g., convergence rate bounds)
  • Unexplained α Non-Monotonicity: Why does α=0.5 outperform α=0.9? Needs deeper analysis

2. Limited Experimental Scope

  • Single Task: Only naming games, generalization unknown
  • Small Scale: N≤24, M=12 may be insufficient for real applications
  • Short Duration: 300 rounds may not capture certain long-term phenomena (e.g., convention drift)

3. Incomplete Comparisons

  • Missing Alternative Formats: No comparison with XML, YAML, or other structured formats
  • No Optimal Baselines: Not compared with purpose-designed coordination protocols (e.g., voting mechanisms)
  • Unexplored Prompt Engineering: Could carefully designed prompts achieve similar effects in NL condition?

4. Shallow Analysis

  • No Error Analysis: Lacks detailed analysis of non-compliance types and causes
  • Missing Qualitative Analysis: No examples of actual agent-generated messages
  • Unexplored Memory Contents: What is stored in memory window? How does it influence decisions?

5. Insufficiently Discussed Negative Impacts

  • Flexibility Loss: Structured constraints may limit certain creative tasks
  • Error Propagation: If incorrect conventions form early, schema may accelerate their spread
  • Fairness: Different models may have different adaptation capabilities to schemas

6. Incomplete Implementation Details

  • Unquantified Error Handling Impact: Specific impact of retry and degradation handling on results not quantified
  • Decoding Parameter Justification: Rationale for choices like temperature=0.7 not explained
  • Pairing Strategy: Is uniform random pairing optimal?

Impact Assessment

1. Contribution to Field

  • Methodological Contribution: Provides new experimental paradigm for multi-agent LLM research
  • Empirical Contribution: First systematic quantification of structured constraints' impact on convention formation
  • Inspirational Value: Stimulates further research on "minimal effective structure"

2. Practical Value

  • Immediately Applicable: Simple method, directly applicable to existing systems
  • Cost-Benefit: Significant token consumption reduction, lower API call costs
  • Scalability: Provides foundation for building large-scale multi-agent systems

3. Reproducibility

  • High: Code repository provided, detailed parameter settings
  • Open Models: Uses open-source models (Phi-3, LLaMA)
  • Reasonable Compute: Small-scale experiments runnable on standard GPUs

4. Potential Application Scenarios

  • Collaborative Coding: Multiple AI assistants coordinating naming conventions during development
  • Distributed Planning: Multi-robot systems for task allocation and naming
  • Knowledge Graph Construction: Multi-agent collaborative entity and relation annotation
  • Multilingual Systems: Cross-language agent concept alignment

Applicability Analysis

Most Suitable Scenarios

  1. Limited Discrete Choice Spaces: Classification, annotation tasks
  2. Fast Convergence Needed: Real-time or resource-constrained applications
  3. Heterogeneous Agent Systems: Different models need unified interface
  4. Predefinable Formats: Tasks allow explicit output structure

Less Suitable Scenarios

  1. Open-Ended Creative Tasks: Creative writing, brainstorming
  2. Requiring Nuance: Structured formats may lose subtle information
  3. Dynamically Evolving Tasks: Fixed schemas may limit adaptability
  4. Human-Involved Dialogue: Over-structuring may harm user experience

Scenarios Requiring Caution

  1. High-Risk Decisions: Need additional verification to prevent error convention propagation
  2. Long-Running Systems: Need monitoring for convention drift and schema failure
  3. Cross-Cultural/Cross-Domain Applications: Schema design needs domain-specific consideration

References

Key references cited in paper:

  1. Ashery, A. F.; Aiello, L. M.; Baronchelli, A. (2025). Emergent social conventions and collective bias in LLM populations. Science Advances, 11(20): eadu9368.
    • Social convention emergence in LLM populations
  2. Baronchelli, A.; Loreto, V.; Steels, L. (2008). In-depth analysis of the Naming Game dynamics: the homogeneous mixing case. arXiv:0803.0398.
    • Classical theoretical analysis of naming games
  3. Chen, W. et al. (2024). Beyond natural language: LLMs leveraging alternative formats for enhanced reasoning and communication. arXiv:2402.18439.
    • Structured formats enhance LLM reasoning
  4. Guo, T. et al. (2024). Large language model based multi-agents: A survey of progress and challenges. arXiv:2402.01680.
    • Survey of multi-agent LLM systems

Summary

The SIGN paper proposes a simple yet powerful idea: guiding convention formation in multi-agent systems through minimal structural constraints. The reported results, an agreement improvement of up to 5.8× and order-of-magnitude token savings, support its practical relevance.

The core value lies in providing a low-cost, efficient, model-agnostic coordination mechanism, which matters given the growing importance of multi-agent LLM systems. The method's simplicity is itself an advantage: it requires no training or architectural changes, only an output format constraint.

The main limitations are theoretical depth and application scope. The paper is more an empirical demonstration than a deep analysis, and future work is needed to answer the "why" and "when" questions. Extending the approach to more complex tasks and larger systems is a necessary next step.

Overall, this is a well-executed study with a clear contribution, offering practical tools and research insights for multi-agent coordination that merit attention and further exploration.