2025-11-24T11:16:18.396523

Uniform Value and Decidability in Ergodic Blind Stochastic Games

Chatterjee, Lurie, Saona et al.
We study a class of two-player zero-sum stochastic games known as \textit{blind stochastic games}, where players neither observe the state nor receive any information about it during the game. A central concept for analyzing long-duration stochastic games is the \textit{uniform value}. A game has a uniform value $v$ if for every $\varepsilon>0$, Player 1 (resp., Player 2) has a strategy such that, for all sufficiently large $n$, his average payoff over $n$ stages is at least $v-\varepsilon$ (resp., at most $v+\varepsilon$). Prior work has shown that the uniform value may not exist in general blind stochastic games. To address this, we introduce a subclass called \textit{ergodic blind stochastic games}, defined by imposing an ergodicity condition on the state transitions. For this subclass, we prove the existence of the uniform value and provide an algorithm to approximate it, establishing the \textit{decidability} of the approximation problem. Notably, this decidability result is novel even in the single-player setting of Partially Observable Markov Decision Processes (POMDPs). Furthermore, we show that no algorithm can compute the uniform value exactly, emphasizing the tightness of our result. Finally, we establish that the uniform value is independent of the initial belief.
academic

Uniform Value and Decidability in Ergodic Blind Stochastic Games

Basic Information

  • Paper ID: 2405.12583
  • Title: Uniform Value and Decidability in Ergodic Blind Stochastic Games
  • Authors: Krishnendu Chatterjee (IST Austria), David Lurie (NyxAir & Paris Dauphine University), Raimundo Saona (IST Austria), Bruno Ziliotto (Toulouse School of Economics)
  • Classification: math.OC (Optimization and Control), cs.CC (Computational Complexity)
  • Publication Date: arXiv v2, November 21, 2025
  • Paper Link: https://arxiv.org/abs/2405.12583v2

Abstract

This paper investigates blind stochastic games—a class of two-player zero-sum stochastic games where participants neither observe the state nor receive any information about it. The paper introduces ergodic blind stochastic games as a subclass, defined by imposing ergodicity conditions on state transitions. For this subclass, the paper proves the existence of uniform value and provides approximation algorithms, establishing decidability of the approximation problem. This decidability result is novel even in the single-player setting of partially observable Markov decision processes (POMDPs). Furthermore, the paper proves that no algorithm can compute the uniform value exactly, emphasizing the tightness of the results. Finally, the paper establishes that the uniform value is independent of the initial belief.

Research Background and Motivation

Research Questions

The core problems addressed in this paper are: How can we characterize the value of long-term play in blind stochastic games? Specifically:

  1. Existence of Uniform Value: Ziliotto Zil16 has shown that general blind stochastic games may not have a uniform value. Under what conditions does the uniform value exist?
  2. Computability: Even if the uniform value exists, can it be computed or approximated algorithmically? Madani et al. MHC03 have shown that computing and approximating the uniform value are both undecidable in general blind MDPs.

Problem Significance

  1. Theoretical Importance: Blind stochastic games are the simplest model of incomplete information games and serve as a theoretical benchmark for studying more complex models. They are equivalent to probabilistic finite automata, which have broad applications in computer science.
  2. Practical Applications: Including concise specification languages for infinite strings BGB12, CT12, sequence analysis in computational biology DEKM98, speech processing Moh97, and others.
  3. Methodological Contribution: Identifying conditions on data that ensure ergodic behavior in belief dynamics represents a key methodological contribution.

Limitations of Existing Approaches

  1. General Blind Stochastic Games: Do not guarantee existence of uniform value Zil16
  2. Exchangeable Blind Stochastic Games: Venel Ven15 proved uniform value exists but provided no computational algorithm
  3. Blind MDPs: Rosenberg et al. RSV02 proved uniform value exists, but Madani et al. MHC03 proved both computation and approximation are undecidable
  4. Existing Ergodicity Research: Primarily focused on standard stochastic games or MDPs; blind stochastic games not systematically studied

Research Motivation

The paper's starting point is to identify a decidable subclass of blind stochastic games by introducing ergodicity conditions. These conditions:

  • Are natural and commonly used in game theory literature
  • Guarantee existence of uniform value
  • Make the approximation problem decidable
  • Keep exact computation undecidable, showing result tightness

Core Contributions

The paper's main contributions include:

  1. Definition of Ergodic Blind Stochastic Games: Formally defines this new subclass based on ergodicity properties of transition matrix products (Definition 3.3)
  2. Uniform Value Existence and Approximation Decidability (Theorem 3.5):
    • Proves all ergodic blind stochastic games have uniform value
    • Provides approximation algorithms establishing decidability of the approximation problem
    • Complexity upper bound: 2-EXPSPACE
  3. Undecidability of Exact Computation (Theorem 3.6):
    • Proves that even for Markov blind MDPs (a subclass of ergodic blind stochastic games), exact computation of uniform value is undecidable
    • Demonstrates tightness of the approximation decidability result
  4. Initial Belief Independence (Theorem 3.7):
    • Proves that in ergodic blind stochastic games, uniform value does not depend on initial belief
  5. Sufficient Conditions: Identifies several classes of transition matrices guaranteeing ergodicity (C1-C5), including Markov matrices, scrambling matrices, Sarymsakov matrices, etc.
  6. Verification Algorithm (Proposition 3.9): Provides an exponential-space algorithm for verifying whether a blind stochastic game satisfies the ergodicity property

Methodology Details

Task Definition

Blind Stochastic Game is defined as a five-tuple Γ = (K, I, J, p, g):

  • K: finite state set
  • I, J: finite action sets for players 1 and 2
  • p: K × I × J → Δ(K): probabilistic transition function
  • g: K × I × J → 0,1: stage reward function

Key Feature: Players do not observe the state, knowing only the initial belief b₁ ∈ Δ(K) and history of actions.

Uniform Value is defined as: Game Γ has uniform value v: Δ(K) → 0,1 if for all b₁ ∈ Δ(K) and ε > 0, there exist strategies (σ*, π*) and n ∈ ℕ* such that for all N ≥ n:

  • Player 1 can guarantee: γ^(b₁)_N(σ*, π) ≥ v(b₁) - ε
  • Player 2 can guarantee: γ^(b₁)_N(σ, π*) ≤ v(b₁) + ε

where γ^(b₁)N(σ, π) = 𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G_m is the N-stage average reward.

Ergodicity Characterization

Core Concept: Ergodicity coefficient τ₁. For a stochastic matrix P, define:

τ₁(P) := 1/2 max_(k,k̄) ∑^|K|(k'=1) |p(k,k') - p_(k̄,k')|

Ergodic Blind Stochastic Game (Definition 3.3): Game Γ is ergodic if for all ε > 0, there exists an integer n₀ such that for all action pair sequences aⁿ of length n ≥ n₀:

τ₁(Tⁿ(aⁿ)) ≤ ε

where Tⁿ(aⁿ) = P(a₁)P(a₂)···P(aₙ) is the product of forward transition matrices.

Key Property (Proposition 3.8): Game Γ is ergodic if and only if there exists n₀ ≤ (3^|K| - 2^(|K|+1) + 1)/2 such that for all action pair sequences of length n₀:

τ₁(Tⁿ⁰(aⁿ⁰)) < 1

Model Architecture: Abstract Stochastic Game

The paper's core method constructs an abstract stochastic game G*(b₁, ε), which is finite-state and approximates the original blind stochastic game with error ε.

Construction Steps:

Step 1: Determine Time Scale (Proposition 4.1)

  • Given ε > 0, compute nε = n₀⌈ln(ε)/ln(sup_(aⁿ⁰) τ₁(Tⁿ⁰(aⁿ⁰)))⌉
  • For all sequences aⁿ of length n ≥ nε, we have τ₁(Tⁿ(aⁿ)) ≤ ε

Step 2: Construct Stable Matrix Set

  • T(ε): set of all forward product matrices Tⁿ of length n satisfying the τ₁ condition
  • T̃(ε): abstract stable matrix set; for each Tⁿ ∈ T(ε), define T̃ⁿ such that:
    t̃ⁿ_(k,k')(aⁿ) := 1/|K| ∑^|K|(k=1) tⁿ(k,k')(aⁿ)
    i.e., each row equals the average of all rows, making the matrix stable (all rows identical)

Step 3: Define Abstract State Space

Abstract belief set: B* := {b* ∈ Δ(K) | ∃T̃ⁿ ∈ T̃(ε) s.t. b* = b₁^⊤T̃ⁿ} ∪ {b₁}

State space: X := ⋃^(n-1)_(m=0) (B* × (I × J)^m)

This is a finite set with size O(|I × J|^(2n)), where n is doubly exponential.

Step 4: Define Transitions and Rewards

  • Projection function proj: X → Δ(K) maps abstract states to beliefs
  • Abstract update function ψ*:
    • If m ∈ 0, n-2: ψ*(x, a) = (b*, a₁, ···, a_m, a)
    • If m = n-1: ψ*(x, a) = (proj(x)^⊤T̃ⁿ(aⁿ)) (reset to new abstract belief)
  • Transition function: p*(x'|x, a) = 1 if x' = ψ*(x, a), otherwise 0 (deterministic)
  • Reward function: g*(x, i, j) = ∑_(k∈K) proj(x)(k)·g(k, i, j)

Technical Innovations

1. Aggregation Scheme Design

  • Key Insight: Ergodicity ensures that after length n, all transition matrix rows become similar (difference ≤ ε)
  • Stabilization Technique: Construct stable matrices through averaging, making belief updates independent of initial belief
  • Block Structure: Reset every n steps, exploiting the "memory loss" property of ergodicity

2. Fine-grained Error Control Analysis

  • Lemma 4.2: Proves bounded belief distance between original and abstract games:
    ||b^(b₁,h_m)(m,σ,π) - proj(x^(b₁,h_m)(m,σ,π))||₁ ≤ 4ε
  • Lemma 4.3: Proves bounded average reward difference:
    |𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G_m - 𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G*_m| ≤ 4ε

3. Distinction from Baselines

  • vs. Venel Ven15's Exchangeable Games: Not only proves existence but provides decidable algorithms
  • vs. Standard Stochastic Game Ergodicity: Conditions imposed on original data rather than belief games
  • vs. Blind MDP Literature: First to establish approximation decidability in two-player games

4. Cleverness of Undecidability Proof

  • Reduction from the universality problem of probabilistic finite automata (PFA)
  • Constructs Restart action enabling blind MDP to simulate PFA
  • Adds state k̂ ensuring all transition matrices are Markov (guaranteeing ergodicity)
  • Proves acceptance probability > 1/2 equivalent to long-term average > 1/2

Experimental Setup

Example: Machine Maintenance Problem (Example 3.1)

Problem Description:

  • States: K = {Good, Fair, Poor} (machine condition)
  • Actions: I = {Wait, Basic Maintenance, Critical Repair}
  • Players monitor an inaccessible machine and make decisions based on action history

Transition Matrices (partial display):

Under Wait action:
     G    F    P
G  0.9  0.1  0.0
F  0.0  0.7  0.3
P  0.0  0.1  0.9

Verify Ergodicity:

  • Each action's transition matrix P(i) is a Markov matrix (at least one column entirely positive)
  • Therefore τ₁(P(i)) < 1 for all i ∈ I
  • Satisfies ergodicity property

Algorithm Implementation (Algorithm 1)

Input: Initial belief b₁, blind stochastic game Γ, precision ε
1. Verify ergodicity property of Γ
2. Construct matrix set T(ε)
3. Derive abstract matrix set T̃(ε) from T(ε)
4. Construct abstract stochastic game G*(b₁, ε)
5. Compute uniform value v*(b₁, ε) of G*(b₁, ε)
Output: v*(b₁, ε) (if Γ is ergodic)

Complexity Analysis:

  • Number of states: O(|I × J|^(2n)), where n ≤ (3^|K| - 2^(|K|+1) + 1)/2
  • Via real closed field theory, standard stochastic games solvable in PSPACE CMH08
  • Overall complexity: 2-EXPSPACE

Experimental Results

Main Theoretical Results

The paper is primarily theoretical work; core results summarized in Table 2:

CategoryUniform Value ExistsExact DecidableApprox DecidableValue ConstantSufficient Conditions
Ergodic Blind MDPYesUndecidableDecidableYesYes
Ergodic Blind Stochastic GameYesUndecidableDecidableYesYes
Scrambling Hidden Stochastic Game?Undecidable??Yes

(Bold indicates new results from this paper)

Theorem Verification

Theorem 3.5 (Existence and Decidability):

  • Proof Strategy: Via 4-step construction
    1. Construct abstract game G*(b₁, ε)
    2. Prove beliefs remain close (Lemma 4.2)
    3. Prove rewards remain close (Lemma 4.3)
    4. Leverage uniform value existence for standard stochastic games
  • Error Bound: |v*(b₁, ε) - v(b₁)| ≤ 4ε
  • Decidability: Via reduction to finite-state stochastic games

Theorem 3.6 (Undecidability):

  • Proof Strategy: Reduction from PFA universality problem
  • Key Construction:
    • Add absorbing state k̂
    • Restart action resets to initial state
    • Reward design ensures acceptance probability > 1/2 ⟺ long-term average > 1/2
  • Separation Result: Contrasts with standard stochastic games (exact computation decidable OB21)

Theorem 3.7 (Initial Belief Independence):

  • Proof Points: In abstract game, different initial beliefs differ only in first nε steps
  • Error Bound: |v(b₁) - v(b'₁)| ≤ 8ε for any b₁, b'₁

Hierarchy of Sufficient Conditions

The paper identifies strict inclusion relationships among matrix classes:

C₅ ⊊ C₄ ⊊ C₃ ⊊ C₂ ⊊ C₁

where:

  • C₅ (Markov): At least one column entirely positive
  • C₄ (Scrambling): Any two rows have common successor
  • C₃ (Sarymsakov): Any two disjoint subsets either have common reachable state or reachable sets expand
  • C₂ (C₂-matrices): SIA and product with all SIA matrices remains SIA
  • C₁ (SIA): Pⁿ converges to stable matrix

All transition matrices in these classes guarantee ergodic blind stochastic games.

Blind Stochastic Game Foundations

  1. Standard Stochastic Games Sha53: Complete state observation
    • Mertens-Neyman MN81: Uniform value always exists
    • Algorithms: OB21 and others provide computation methods
  2. Blind Stochastic Games:
    • N-stage value exists vN28
    • Ziliotto Zil16: Counterexample showing uniform value may not exist
    • Venel Ven15: Uniform value exists under exchangeability (no algorithm)

Single-Player Case (Blind MDP/PFA)

  1. Existence: Rosenberg et al. RSV02 prove uniform value exists for blind MDPs
  2. Undecidability: Madani et al. MHC03 prove both computation and approximation undecidable
  3. Finite Memory: Chatterjee et al. CSZ22 prove finite-memory strategies suffice, but memory size has no computable bound

Ergodicity Research

  1. Standard Stochastic Games:
    • Finite state HK66, Sob71, Vri03
    • Countable state Now99
    • Applications: cryptocurrency attack analysis CKGIV18
  2. Ergodicity in MDPs:
    • Finite state AS92, HBS87, WSS11
    • Countable state BSL90, PBS93
    • Surveys ABFG+93, Put14
  3. Automata Theory:
    • Single-player case CSV13 studies probabilistic automata with isolated cut vertices
    • This paper extends to two-player games

Decidability Research

  1. Positive Results:
    • Under strong assumptions CH10
    • Other objectives CCT16, CDH13, GO14
  2. Negative Results:
    • ω-regular objectives Cha07
    • Partially observable games CCGK16

Advantages Over Prior Work

  1. vs. Venel Ven15: Not only existence but algorithms
  2. vs. Blind MDP Literature: First approximation decidability for two-player games
  3. vs. Standard Ergodicity: Conditions on original data, identifying ergodic behavior in belief dynamics
  4. Novelty: Even in single-player POMDPs, approximation decidability is new

Conclusions and Discussion

Main Conclusions

  1. Existence: Ergodic blind stochastic games always have uniform value
  2. Decidability Dichotomy:
    • Approximation: Decidable (2-EXPSPACE)
    • Exact: Undecidable (even for Markov blind MDPs)
  3. Initial Belief Independence: Uniform value is a constant function
  4. Sufficient Conditions: Identified 5 matrix classes guaranteeing ergodicity
  5. Verification Algorithm: Exponential-space decidability of ergodicity property

Limitations

1. Complexity

  • 2-EXPSPACE upper bound is high
  • Practical computation may be infeasible
  • No lower bound provided

2. Applicability Range

  • Limited to ergodic case
  • Ergodicity condition (Eq. 1) must hold uniformly over all action sequences
  • Many practical games may not satisfy it

3. Exact Computation

  • Theorem 3.6 shows exact computation undecidable
  • Can only approximate to arbitrary precision
  • Cannot judge approximation quality

4. Extensibility Issues

  • Extension to hidden stochastic games (with signals) faces major challenges
  • Random transitions cause error propagation
  • Existence and decidability remain open

Future Directions

1. Hidden Stochastic Games (Section 5 discussion)

  • Scrambling Hidden Games: Definition 5.2 provides generalization
  • Open Questions:
    • Does uniform value exist?
    • Is approximation decidable?
  • Main Obstacles:
    • Belief transitions become random (vs. deterministic in blind games)
    • Cannot perfectly couple original and abstract games
    • Errors may propagate RSV02

2. Complexity Improvement

  • Reduce 2-EXPSPACE upper bound
  • Establish matching lower bound
  • Identify polynomial-time solvable subclasses

3. Algorithm Optimization

  • Practically implementable algorithms
  • Exploit problem structure
  • Online approximation quality estimation

4. Application Exploration

  • Robot navigation (partial observability)
  • Network security (attack-defense games)
  • Resource management (incomplete information)

5. Theoretical Deepening

  • Alternative ergodicity characterizations
  • Relationships with other game classes
  • Generalization to multiplayer games

In-Depth Evaluation

Strengths

1. Significant Theoretical Contribution

  • Pioneering: First approximation decidability in blind stochastic games
  • Tightness: Undecidability result shows approximation is the best possible
  • Comprehensiveness: Simultaneously addresses existence, decidability, and initial belief independence

2. Methodological Innovation

  • Abstract Game Construction: Elegantly exploits ergodicity to construct finite approximation
  • Stabilization Technique: Averaging makes belief updates independent
  • Error Analysis: Fine-grained ε-control throughout proofs

3. Technical Depth

  • Matrix Theory: Deep exploitation of ergodicity coefficients and matrix product theory
  • Complexity Theory: Clever reduction proving undecidability
  • Game Theory: Connects multiple research areas (automata, MDPs, stochastic games)

4. Result Completeness

  • Provides sufficient conditions (5 matrix classes)
  • Gives verification algorithm (Proposition 3.9)
  • Includes concrete example (Example 3.1)
  • Discusses extension directions (Section 5)

5. Writing Clarity

  • Well-structured, logically clear
  • Rigorous definitions, standard notation
  • Detailed proofs, easy to verify
  • Comprehensive related work

Weaknesses

1. Computational Complexity

  • 2-EXPSPACE Too High: Practically infeasible
  • No Lower Bound: Unknown if improvable
  • No Experiments: Actual performance not evaluated

2. Applicability Limitations

  • Strong Ergodicity Condition: Equation (1) requires uniformity over all sequences
  • Finite State Only: Infinite state spaces not considered
  • Zero-Sum Assumption: General-sum games not discussed

3. Algorithm Details

  • Algorithm 1 Too Abstract: Lacks implementation details
  • Numerical Stability: Floating-point computation issues not discussed
  • Parameter Selection: No guidance on choosing ε

4. Experimental Validation

  • Only One Example: Example 3.1 too simple
  • No Performance Evaluation: Runtime, memory usage unknown
  • No Comparison: Not compared with other methods (e.g., sampling)

5. Extensibility Issues

  • Hidden Games: Main obstacles unresolved
  • Multiplayer Games: Not discussed
  • Other Objectives: Discounted rewards, reachability not thoroughly explored

Impact

1. Theoretical Impact

  • Fills Gap: First approximation decidability result in blind stochastic games and POMDPs
  • Methodology: Abstract game construction may inspire other incomplete information game research
  • Complexity Theory: Exact vs. approximate dichotomy is important insight

2. Practical Value

  • Limited: 2-EXPSPACE complexity restricts practical applications
  • Theoretical Guidance: Identifying tractable subclasses valuable
  • Benchmark: Serves as theoretical benchmark for heuristic methods

3. Reproducibility

  • Theoretical Results: Detailed proofs, verifiable
  • Algorithms: Clear description, implementable
  • Examples: Simple, reproducible
  • Code: No implementation provided

4. Follow-up Research

  • Direct Extensions: Hidden games, multiplayer games
  • Algorithm Improvement: Reduce complexity, practical implementation
  • Applications: Domain-specific modeling and solving

Applicable Scenarios

Theoretically Applicable:

  1. Small-Scale Problems: Relatively small state and action spaces
  2. Strong Ergodicity: Transition matrices mix rapidly
  3. Approximation Sufficient: Exact values not required
  4. Offline Computation: High computational cost tolerable

Specific Applications:

  1. Machine Maintenance: As in Example 3.1, maintenance decisions with unobservable state
  2. Network Monitoring: Intrusion detection based on historical data
  3. Robot Navigation: Path planning in partially observable environments
  4. Resource Allocation: Competitive resource allocation under incomplete information

Inapplicable Scenarios:

  1. Large-Scale Systems: Exponential state space explosion
  2. Non-Ergodic Systems: Long-term history dependence
  3. Real-Time Decision: Limited computation time
  4. Exact Requirements: Need for exact optimal strategies

Selected References

  1. MN81 Mertens & Neyman (1981): Foundational work on uniform value existence in standard stochastic games
  2. Zil16 Ziliotto (2016): Counterexample showing uniform value may not exist in blind stochastic games
  3. MHC03 Madani, Hanks & Condon (2003): Classical undecidability results for blind MDPs
  4. Sen06 Seneta (2006): Authoritative survey on non-negative matrices and ergodicity theory
  5. Ven15 Venel (2015): Uniform value existence for exchangeable stochastic games
  6. RSV02 Rosenberg, Solan & Vieille (2002): Uniform value existence for blind MDPs
  7. CSZ22 Chatterjee, Saona & Ziliotto (2022): Finite-memory strategies in POMDPs
  8. OB21 Oliu-Barton (2021): New algorithms for standard stochastic games

Overall Assessment: This is an excellent paper with profound theoretical contributions and solid technical execution. It achieves important breakthroughs on the fundamental but challenging problem of blind stochastic games, cleverly balancing uniform value existence and computability through ergodicity conditions. While high algorithmic complexity limits practical applications, its theoretical contributions and methodological innovations are significant for game theory, automata theory, and computational complexity theory. The tightness result (approximation decidable but exact computation undecidable) is particularly impressive, clearly delineating the computational boundaries of the problem.