2025-11-28T04:01:18.891206

Natural Strategic Ability in Stochastic Multi-Agent Systems

Berthon, Katoen, Mittelmann et al.

Strategies synthesized using formal methods can be complex and often require infinite memory, which does not correspond to the expected behavior when trying to model Multi-Agent Systems (MAS). To capture such behaviors, natural strategies are a recently proposed framework striking a balance between the ability of agents to strategize with memory and the model-checking complexity, but until now has been restricted to fully deterministic settings. For the first time, we consider the probabilistic temporal logics PATL and PATL* under natural strategies (NatPATL and NatPATL*, resp.). As main result we show that, in stochastic MAS, NatPATL model-checking is NP-complete when the active coalition is restricted to deterministic strategies. We also give a 2NEXPTIME complexity result for NatPATL* with the same restriction. In the unrestricted case, we give an EXPSPACE complexity for NatPATL and 3EXPSPACE complexity for NatPATL*.

academic

Natural Strategic Ability in Stochastic Multi-Agent Systems

Basic Information

Paper ID: 2401.12170
Title: Natural Strategic Ability in Stochastic Multi-Agent Systems
Authors: Raphaël Berthon, Joost-Pieter Katoen (RWTH Aachen University), Munyque Mittelmann, Aniello Murano (University of Naples Federico II)
Classification: cs.LO (Logic in Computer Science), cs.AI (Artificial Intelligence)
Publication Time/Venue: AAAI 2024 (Extended version, revised November 2025)
Paper Link: https://arxiv.org/abs/2401.12170

Abstract

This paper extends the natural strategies framework to stochastic multi-agent systems (MAS) for the first time, proposing variants of probabilistic alternating temporal logic PATL and PATL* under natural strategies (NatPATL and NatPATL*). The main results show that when coalitions are restricted to deterministic strategies, NatPATL model checking is ∆P₂-complete; NatPATL* has 2NEXPTIME complexity. In the unrestricted case (probabilistic strategies), NatPATL has EXPSPACE complexity, and NatPATL* has 3EXPSPACE. This work achieves a good balance between strategic ability verification for finite-memory agents and computational complexity.

Research Background and Motivation

1. Core Problem

Strategies synthesized by formal methods typically have high complexity and require infinite memory, which is unrealistic in practical multi-agent system modeling. Human agents and computationally resource-constrained artificial agents cannot execute such complex strategies.

2. Problem Significance

Practical Requirements: Real-world agents (human or constrained AI) need executable, understandable strategies
Uncertainty Modeling: MAS commonly face stochasticity (natural events, agent behavior, sensor errors, etc.)
Safety-Critical Applications: Electronic voting systems, access control, etc. require verification of strategic ability under uncertain environments

3. Limitations of Existing Approaches

PATL/PATL*: Require infinite-memory strategies; model checking complexity is in NP∩co-NP but impractical
PSL: Undecidable; even restricted to memoryless strategies requires 3EXPSPACE
Stochastic Game Logic: Memoryless deterministic strategies are PSPACE, memoryless probabilistic strategies are EXPSPACE, but the memoryless assumption is too restrictive
Existing Natural Strategy Research: Limited to fully deterministic environments, cannot handle stochasticity

4. Research Motivation

Extend natural strategies to stochastic environments, filling a theoretical gap
Achieve balance between bounded memory and reasonable complexity
Provide theoretical foundation for multi-agent POMDP extensions

Core Contributions

Theoretical Extension: First extension of the natural strategies framework from deterministic environments to stochastic multi-agent systems, defining behavioral natural strategies
New Logic Systems: Propose NatPATL and NatPATL* two logic systems, supporting both memoryless (r) and bounded recall (R) semantics
Complexity Results (see Table 1 summary):
- Deterministic Strategies: NatPATLr/R is ∆P₂-complete (NP-hard lower bound), NatPATL*r/R is 2NEXPTIME
- Probabilistic Strategies: NatPATLr/R is EXPSPACE, NatPATL*r/R is 3EXPSPACE
Expressiveness Analysis: Prove that NatPATL() and PATL() have incomparable distinguishing power and expressiveness
Application Examples: Demonstrate practical value through electronic voting systems and access control systems

Methodology Details

Task Definition

Model Checking Problem: Given a stochastic concurrent game structure (CGS) G, state s, and NatPATL(*) formula φ, determine whether G, s ⊨ρ φ holds (ρ∈{r,R} denotes memoryless or bounded recall).

Input:

CGS G = (St, L, δ, ℓ): state set, legality function, stochastic transition function, labeling function
Initial state s ∈ St
NatPATL(*) formula φ, including complexity bound k

Output: Boolean value indicating whether the formula is satisfied

Core Concept: Behavioral Natural Strategies

1. Definition Structure

A behavioral natural strategy σₐ is an ordered list of condition-action pairs:

σₐ = [(r₁, d₁), (r₂, d₂), ..., (rₙ, dₙ)]

where:

rᵢ ∈ Reg(Bool(AP)): Regular expression conditions (based on propositional formula sequences)
dᵢ ∈ Dist(Ac): Probability distribution over actions
The last pair must be (⊤*, d) as the default action

2. Matching Mechanism

Given history h, the strategy selects the first matching rule:

match(h, σₐ) = min{i : h matches condᵢ(σₐ) and actᵢ(σₐ) is legal at last(h)}

History h matches regular expression r if and only if there exists a word b∈L(r) such that each state in h satisfies the corresponding boolean formula in b.

3. Complexity Measure

Strategy complexity c(σ) = Σ|r| (total number of symbols in all regular expressions)

4. Example (Electronic Voting System)

Voter's deterministic memoryless strategy:

1. (hasBallot_v ∧ ¬scanned_v, scanBallot)
2. (¬vot_v ∧ scanned_v, enterVote)
3. (¬vot_v ∧ entVote_{v,s} ∧ ¬(sigOk_s ∨ sigFail_s), checkSig_s)
4. (¬vot_v ∧ entVote_{v,s} ∧ sigFail_s, cnlVote)
5. (¬vot_v ∧ entVote_{v,s} ∧ sigOk_s, conf)
6. (vot_v ∧ rec_{v,r} ∧ ¬shreded_r, shred_r)
7. (⊤, noop)

Coercer's probabilistic recall strategy:

1. (⊤* · ⋀_v ¬coerced_v, d_V)  // Randomly select coercion target
2. (⊤* · coerced_v ∧ ¬requested_v, request_v)
3. (⊤* · ¬requested_v* · (requested_v ∧ ¬vot_v)* ∧ ¬punished_v, punish_v)
4. (⊤* · ¬coerced_v ∧ ¬coerced_{v'}, d_{v,v'})
5. (⊤*, noop)

Logic Syntax and Semantics

NatPATL* Syntax

φ ::= p | φ∨φ | ¬φ | Xφ | φUφ | ⟨⟨C⟩⟩_{▷◁d}^k φ

where ⟨⟨C⟩⟩_{▷◁d}^k φ denotes: coalition C has a strategy with complexity ≤k such that the probability of satisfying φ has relation ▷◁ with d.

NatPATL Syntax (Restricted)

φ ::= p | φ∨φ | ¬φ | ⟨⟨C⟩⟩_{▷◁d}^k (Xφ) | ⟨⟨C⟩⟩_{▷◁d}^k (φUφ)

Temporal operators must immediately follow coalition operators.

Probability Space Construction

Strategy profile σ and state s induce Markov chain M_{σ,s}
Transition probability: p(h, hs') = Σ_c σ(h)(c) × δ(last(h), c)(s')
Generate canonical probability measure out(σ, s)
Possible outcomes of coalition strategy σ_C: out_C(σ_C, s) = {out((σ_C, σ_{-C}), s) : σ_{-C}∈∏_{a∈Ag-C} Str^ρ_a}

Semantic Definition

Key coalition operator semantics:

G, π ⊨ρ ⟨⟨C⟩⟩_{▷◁d}^k φ ⟺ 
  ∃σ_C ∈ ∏_{a∈C} {α∈Str^ρ_a : c(α)≤k} such that
  ∀μ^{σ_C}_{π₀} ∈ out_C(σ_C, π₀), μ^{σ_C}_{π₀}({π' : G,π'⊨ρ φ}) ▷◁ d

Technical Innovations

Probabilistic Strategy Extension: Extends originally deterministic natural strategies to probability distributions, more aligned with practical decision-making
Regular Expression Conditions: Uses regular expressions rather than state histories, enabling imperfect information modeling
Complexity Parameterization: Incorporates strategy complexity k as a formula parameter, enabling expression of properties like "no simple strategy exists"
Behavioral Strategy Semantics: Adopts behavioral strategies (state-action probabilities) rather than mixed strategies (strategy selection probabilities), related to Kuhn equivalence in game theory
Dual Adversariality: Nested structure of coalition strategy existential quantification and opponent strategy universal quantification

Experimental Setup

Note: This is a theoretical computer science paper providing complexity-theoretic results rather than experimental evaluation.

Theoretical Proof Methods

Upper Bound Proof Techniques

∆P₂ Algorithm (Theorem 1):
- Guess polynomial-length witnesses (strategies for each subformula, state, and agent)
- Utilize NP oracle polynomially many times
- Verify formula bottom-up, converting to MDP reachability problems
2NEXPTIME Algorithm (Theorem 5):
- Non-deterministically guess strategies
- Each subformula verification requires 2EXPTIME (LTL model checking)
- Polynomial number of oracle calls
EXPSPACE/3EXPSPACE Algorithms (Theorems 7-8):
- Convert to real arithmetic
- Introduce variables r_{x,s,a,q} representing probability of strategy x selecting action a at state s and automaton state q
- Automaton state count is exponential in k, leading to exponential variable count

Lower Bound Proof Techniques

NP-hardness (Theorem 4): Reduction from POMDP almost-sure reachability
2EXPTIME-hardness (Theorem 6): Reduction from LTL model checking over MDPs

Case Systems

1. Access Control System (Example 3)

Structure: Grid-like tiles, randomly moving robot, coalition-controlled doors
Objective: Reach all targets with probability ≥0.7 infinitely often
Formula: ⟨⟨C⟩⟩^k_{≥0.7} G⋀_{t_j∈T,t_j≠t_i} Ft_j
Analysis Result: Demonstrates sufficiency of deterministic strategies in stochastic environments

2. Electronic Voting System (Examples 1, 2, 4)

Agents: Voters V, coercer C
Actions: Scan, vote, cancel, confirm, check signature, shred receipt, etc.
Stochasticity: Actions may fail (e.g., coercion may be unsuccessful)
Verification Properties:
- Voter Verifiability: ⟨⟨v⟩⟩^k_{≥0.9} F(sigOk_s ∨ sigFail_s)
- Receipt Freeness: ¬⟨⟨v⟩⟩^k_{≥0.5} F⋁r (receipt{v,r} ∧ ¬shreded_r)

Experimental Results

Main Complexity Results

Logic	Deterministic Strategies	Probabilistic Strategies
NatPATLr	∆P₂-complete	EXPSPACE
NatPATL*r	2NEXPTIME	3EXPSPACE
NatPATLR	∆P₂-complete	EXPSPACE
NatPATL*R	2NEXPTIME	3EXPSPACE

Key Theorem Summary

Theorems 1-4 (NatPATL Deterministic Strategies)

Upper Bound: ∆P₂ = P^NP (polynomial time with NP oracle)
Lower Bound: NP-hard (reduction from POMDP)
Positive Fragment: NP-complete (Theorem 3)
Significance: Matches POMDP bounded-memory deterministic strategy complexity

Theorems 5-6 (NatPATL* Deterministic Strategies)

Upper Bound: 2NEXPTIME
Lower Bound: 2EXPTIME-hard
Gap: Exponential gap exists; whether it can be improved is an open question

Theorems 7-8 (Probabilistic Strategies)

NatPATL*R: 3EXPSPACE (matches PSL memoryless strategies)
NatPATLR: EXPSPACE (avoids LTL's double exponential blowup)
Technical Key: Leverages polynomial complexity of reachability/invariance

Expressiveness Results (Theorem 9)

Conclusion: NatPATL() and PATL() have incomparable distinguishing power and expressiveness

Proof Strategy:

NatPATL ⊀_d PATL: Natural strategies can express "no strategy with complexity ≤k exists," which combined strategies cannot
PATL ⊀_d NatPATL: Combined strategies can express certain properties requiring infinite memory that natural strategies cannot
Derive expressiveness incomparability from distinguishing power

1. Stochastic MAS Verification

Huang & Luo (2013): Deterministic strategies + probabilistic knowledge ATL
Song et al. (2019): Probabilistic alternating temporal μ-calculus
Belardinelli et al. (2023): PATL with imperfect information and memoryless strategies
Chen et al. (2013): PATL with cumulative cost/reward

2. Finite-Memory Strategy Representation

Vester (2013): Input/output automata representation
Brázdil et al. (2015): Decision tree representation
Ågotnes & Walther (2009): Bounded-memory ATL
Deuser & Naumov (2020): Mealy machine representation, bounded recall impact on planning ability

3. Prior Natural Strategy Work

Jamroga et al. (2019a): Original definition, concurrent games in deterministic environments, Nash equilibrium, ATL model checking
Jamroga et al. (2019b): Extension to imperfect information ATL
Belardinelli et al. (2022): Extension to strategy logic SL

Infinite Memory: Büchi/reachability targets EXPTIME, parity targets undecidable
Bounded Memory (Junges et al. 2018):
- Deterministic strategies: NP-complete
- Probabilistic strategies: ETR-complete
This Paper's Contribution: Extends POMDP to multi-agent + bounded memory

Paper's Advantages

First combination of probabilistic environments with natural strategies
Complexity results match POMDP in deterministic case, comparable to PSL in probabilistic case
Provides good balance between practicality and complexity

Conclusions and Discussion

Main Conclusions

Theoretical Contribution: Successfully extends natural strategies to stochastic MAS, establishing a complete complexity landscape
Practical Value:
- ∆P₂ complexity of deterministic strategies has practical potential
- Can capture bounded-memory POMDP without significant complexity loss
Theoretical Insights:
- Double exponential blowup from PATL to PATL* stems from LTL model checking
- Exponential space blowup for probabilistic strategies originates from real arithmetic encoding
- Complexity gap between deterministic and probabilistic strategies remains unresolved in related work

Limitations

Complexity Gaps:
- NatPATL* deterministic strategies: 2EXPTIME-hard vs 2NEXPTIME upper bound
- Whether upper or lower bounds can be improved is an open question
Practical Implementation:
- 3EXPSPACE complexity may be infeasible in practice
- Lacks experimental evaluation on actual systems
Expressiveness Limitations:
- Cannot express certain properties requiring infinite memory
- Choice of complexity bound k requires domain knowledge
Probabilistic Strategy Lower Bounds:
- No evidence of complexity separation between probabilistic and deterministic strategies
- May require new constructions from POMDP or Dec-POMDP

Future Directions

Extension to PSL: Feasible but difficult to improve 3EXPSPACE complexity
Qualitative Fragments: Consider qualitative PATL* or PSL with only >0 and =1 thresholds, potentially achieving better complexity
Quantitative Techniques: Apply probabilistic model checking techniques (graph analysis, bisimulation minimization, symbolic techniques, partial order reduction)
Epistemic Operators: Extend to epistemic logic framework, handling knowledge and belief
Approximation Algorithms: Develop heuristic or approximate algorithms for practical applications
Tool Implementation: Develop prototype tools and evaluate on practical cases

In-Depth Evaluation

Strengths

Theoretical Rigor:
- Complete complexity upper and lower bound proofs
- Detailed semantic definitions (probability space construction)
- Rigorous expressiveness analysis
Problem Significance:
- Fills theoretical gap in natural strategies under stochasticity
- Addresses key requirements for practical MAS modeling
- Connects multiple research areas (MAS, POMDP, game theory)
Technical Contributions:
- Elegant probabilistic extension of behavioral strategies
- Innovative introduction of complexity parameterization k
- Regular expression conditions enable imperfect information modeling
Application-Oriented:
- Electronic voting system case closely aligned with practice
- Access control example clearly demonstrates stochasticity modeling
- Formula examples have practical significance
Writing Quality:
- Clear structure, progressively developing from motivation to techniques
- Rich examples aid understanding of abstract concepts
- Comprehensive related work survey

Weaknesses

Experimental Absence:
- Purely theoretical research without practical system evaluation
- No tool implementation or case studies provided
- Cannot assess practical feasibility
Complexity Gaps:
- NatPATL* deterministic strategies have exponential gap
- Probabilistic strategy lower bounds are not tight
- Lacks deep analysis of gap sources
Expressiveness Analysis:
- Only proves incomparability, doesn't quantify differences
- Lacks discussion of which practical properties are/aren't expressible
- Relationship with other logics (e.g., SL) not deeply explored
Strategy Complexity:
- Complexity measure c(σ) is coarse (only symbol count)
- Doesn't consider other factors like automaton state count
- Lacks guidance on practical k selection
Probabilistic Model:
- Only considers finite-state CGS
- Doesn't discuss continuous state/action spaces
- Probability distributions limited to rationals

Impact

Theoretical Impact:
- Provides new tools for stochastic MAS verification
- Advances natural strategy theory development
- Bridges MAS and POMDP communities
Practical Value:
- Safety-critical applications: electronic voting, access control
- Human-machine collaboration system design
- Resource-constrained agent planning
Reproducibility:
- Detailed definitions and proofs
- Complete proofs in technical appendix
- But lacks code and datasets
Follow-up Research:
- Epistemic logic extensions
- Approximation algorithm development
- Tool implementation
- Practical case studies

Applicable Scenarios

Theoretical Research:
- Multi-agent system formal verification
- Game theory and logic intersection research
- Complexity theory
Safety-Critical Systems:
- Electronic voting protocol verification
- Access control policy analysis
- Autonomous system safety verification
Human-Machine Interaction:
- Explainable AI strategy design
- Human-understandable planning
- Collaborative robotics
Resource-Constrained Environments:
- Embedded systems
- IoT devices
- Mobile robots

Inapplicable Scenarios:

Systems requiring precise numerical optimization
Continuous state/action spaces
Requiring fast online decision-making (complexity too high)

Selected References

Jamroga, W., Malvone, V., & Murano, A. (2019). Natural strategic ability. Artificial Intelligence, 277, 103170.
- Original definition of natural strategies
Aminof, B., et al. (2019). Probabilistic Strategy Logic. IJCAI.
- PSL's 3EXPSPACE complexity
Chatterjee, K., Chmelik, M., & Davies, J. (2016). A Symbolic SAT-Based Algorithm for Almost-Sure Reachability with Small Strategies in POMDPs. AAAI.
- NP-completeness of POMDP bounded-memory strategies
Baier, C., et al. (2012). Stochastic game logic. Acta Informatica, 49(4), 203-224.
- EXPSPACE complexity of stochastic game logic
Alur, R., Henzinger, T., & Kupferman, O. (2002). Alternating-time temporal logic. J. ACM, 49(5), 672-713.
- Seminal ATL paper

Overall Assessment: This is a high-quality theoretical computer science paper making significant contributions to stochastic multi-agent system verification. The theoretical results are rigorous and complete, problem motivation is well-established, and technical innovations are notable. Main shortcomings are the absence of experimental validation and tool implementation. For theoretical researchers and formal methods researchers, this is essential reading; for practitioners, implementation and case studies are needed. The paper's complexity results establish important theoretical benchmarks for the field.