2025-11-24T11:16:18.396523

Uniform Value and Decidability in Ergodic Blind Stochastic Games

Chatterjee, Lurie, Saona et al.

We study a class of two-player zero-sum stochastic games known as \textit{blind stochastic games}, where players neither observe the state nor receive any information about it during the game. A central concept for analyzing long-duration stochastic games is the \textit{uniform value}. A game has a uniform value $v$ if for every $\varepsilon>0$, Player 1 (resp., Player 2) has a strategy such that, for all sufficiently large $n$, his average payoff over $n$ stages is at least $v-\varepsilon$ (resp., at most $v+\varepsilon$). Prior work has shown that the uniform value may not exist in general blind stochastic games. To address this, we introduce a subclass called \textit{ergodic blind stochastic games}, defined by imposing an ergodicity condition on the state transitions. For this subclass, we prove the existence of the uniform value and provide an algorithm to approximate it, establishing the \textit{decidability} of the approximation problem. Notably, this decidability result is novel even in the single-player setting of Partially Observable Markov Decision Processes (POMDPs). Furthermore, we show that no algorithm can compute the uniform value exactly, emphasizing the tightness of our result. Finally, we establish that the uniform value is independent of the initial belief.

academic

Uniform Value and Decidability in Ergodic Blind Stochastic Games

Basic Information

Paper ID: 2405.12583
Title: Uniform Value and Decidability in Ergodic Blind Stochastic Games
Authors: Krishnendu Chatterjee (IST Austria), David Lurie (NyxAir & Paris Dauphine University), Raimundo Saona (IST Austria), Bruno Ziliotto (Toulouse School of Economics)
Classification: math.OC (Optimization and Control), cs.CC (Computational Complexity)
Publication Date: arXiv v2, November 21, 2025
Paper Link: https://arxiv.org/abs/2405.12583v2

Abstract

This paper investigates blind stochastic games—a class of two-player zero-sum stochastic games where participants neither observe the state nor receive any information about it. The paper introduces ergodic blind stochastic games as a subclass, defined by imposing ergodicity conditions on state transitions. For this subclass, the paper proves the existence of uniform value and provides approximation algorithms, establishing decidability of the approximation problem. This decidability result is novel even in the single-player setting of partially observable Markov decision processes (POMDPs). Furthermore, the paper proves that no algorithm can compute the uniform value exactly, emphasizing the tightness of the results. Finally, the paper establishes that the uniform value is independent of the initial belief.

Research Background and Motivation

Research Questions

The core problems addressed in this paper are: How can we characterize the value of long-term play in blind stochastic games? Specifically:

Existence of Uniform Value: Ziliotto Zil16 has shown that general blind stochastic games may not have a uniform value. Under what conditions does the uniform value exist?
Computability: Even if the uniform value exists, can it be computed or approximated algorithmically? Madani et al. MHC03 have shown that computing and approximating the uniform value are both undecidable in general blind MDPs.

Problem Significance

Theoretical Importance: Blind stochastic games are the simplest model of incomplete information games and serve as a theoretical benchmark for studying more complex models. They are equivalent to probabilistic finite automata, which have broad applications in computer science.
Practical Applications: Including concise specification languages for infinite strings BGB12, CT12, sequence analysis in computational biology DEKM98, speech processing Moh97, and others.
Methodological Contribution: Identifying conditions on data that ensure ergodic behavior in belief dynamics represents a key methodological contribution.

Limitations of Existing Approaches

General Blind Stochastic Games: Do not guarantee existence of uniform value Zil16
Exchangeable Blind Stochastic Games: Venel Ven15 proved uniform value exists but provided no computational algorithm
Blind MDPs: Rosenberg et al. RSV02 proved uniform value exists, but Madani et al. MHC03 proved both computation and approximation are undecidable
Existing Ergodicity Research: Primarily focused on standard stochastic games or MDPs; blind stochastic games not systematically studied

Research Motivation

The paper's starting point is to identify a decidable subclass of blind stochastic games by introducing ergodicity conditions. These conditions:

Are natural and commonly used in game theory literature
Guarantee existence of uniform value
Make the approximation problem decidable
Keep exact computation undecidable, showing result tightness

Core Contributions

The paper's main contributions include:

Definition of Ergodic Blind Stochastic Games: Formally defines this new subclass based on ergodicity properties of transition matrix products (Definition 3.3)
Uniform Value Existence and Approximation Decidability (Theorem 3.5):
- Proves all ergodic blind stochastic games have uniform value
- Provides approximation algorithms establishing decidability of the approximation problem
- Complexity upper bound: 2-EXPSPACE
Undecidability of Exact Computation (Theorem 3.6):
- Proves that even for Markov blind MDPs (a subclass of ergodic blind stochastic games), exact computation of uniform value is undecidable
- Demonstrates tightness of the approximation decidability result
Initial Belief Independence (Theorem 3.7):
- Proves that in ergodic blind stochastic games, uniform value does not depend on initial belief
Sufficient Conditions: Identifies several classes of transition matrices guaranteeing ergodicity (C1-C5), including Markov matrices, scrambling matrices, Sarymsakov matrices, etc.
Verification Algorithm (Proposition 3.9): Provides an exponential-space algorithm for verifying whether a blind stochastic game satisfies the ergodicity property

Methodology Details

Task Definition

Blind Stochastic Game is defined as a five-tuple Γ = (K, I, J, p, g):

K: finite state set
I, J: finite action sets for players 1 and 2
p: K × I × J → Δ(K): probabilistic transition function
g: K × I × J → 0,1: stage reward function

Key Feature: Players do not observe the state, knowing only the initial belief b₁ ∈ Δ(K) and history of actions.

Uniform Value is defined as: Game Γ has uniform value v: Δ(K) → 0,1 if for all b₁ ∈ Δ(K) and ε > 0, there exist strategies (σ*, π*) and n ∈ ℕ* such that for all N ≥ n:

Player 1 can guarantee: γ^(b₁)_N(σ*, π) ≥ v(b₁) - ε
Player 2 can guarantee: γ^(b₁)_N(σ, π*) ≤ v(b₁) + ε

where γ^(b₁)N(σ, π) = 𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G_m is the N-stage average reward.

Ergodicity Characterization

Core Concept: Ergodicity coefficient τ₁. For a stochastic matrix P, define:

τ₁(P) := 1/2 max_(k,k̄) ∑^|K|(k'=1) |p(k,k') - p_(k̄,k')|

Ergodic Blind Stochastic Game (Definition 3.3): Game Γ is ergodic if for all ε > 0, there exists an integer n₀ such that for all action pair sequences aⁿ of length n ≥ n₀:

τ₁(Tⁿ(aⁿ)) ≤ ε

where Tⁿ(aⁿ) = P(a₁)P(a₂)···P(aₙ) is the product of forward transition matrices.

Key Property (Proposition 3.8): Game Γ is ergodic if and only if there exists n₀ ≤ (3^|K| - 2^(|K|+1) + 1)/2 such that for all action pair sequences of length n₀:

τ₁(Tⁿ⁰(aⁿ⁰)) < 1

Model Architecture: Abstract Stochastic Game

The paper's core method constructs an abstract stochastic game G*(b₁, ε), which is finite-state and approximates the original blind stochastic game with error ε.

Construction Steps:

Step 1: Determine Time Scale (Proposition 4.1)

Given ε > 0, compute nε = n₀⌈ln(ε)/ln(sup_(aⁿ⁰) τ₁(Tⁿ⁰(aⁿ⁰)))⌉
For all sequences aⁿ of length n ≥ nε, we have τ₁(Tⁿ(aⁿ)) ≤ ε

Step 2: Construct Stable Matrix Set

T(ε): set of all forward product matrices Tⁿ of length n satisfying the τ₁ condition
T̃(ε): abstract stable matrix set; for each Tⁿ ∈ T(ε), define T̃ⁿ such that:
t̃ⁿ_(k,k')(aⁿ) := 1/|K| ∑^|K|(k=1) tⁿ(k,k')(aⁿ)
i.e., each row equals the average of all rows, making the matrix stable (all rows identical)

Step 3: Define Abstract State Space

Abstract belief set: B* := {b* ∈ Δ(K) | ∃T̃ⁿ ∈ T̃(ε) s.t. b* = b₁^⊤T̃ⁿ} ∪ {b₁}

State space: X := ⋃^(n-1)_(m=0) (B* × (I × J)^m)

This is a finite set with size O(|I × J|^(2n)), where n is doubly exponential.

Step 4: Define Transitions and Rewards

Projection function proj: X → Δ(K) maps abstract states to beliefs
Abstract update function ψ*:
- If m ∈ 0, n-2: ψ*(x, a) = (b*, a₁, ···, a_m, a)
- If m = n-1: ψ*(x, a) = (proj(x)^⊤T̃ⁿ(aⁿ)) (reset to new abstract belief)
Transition function: p*(x'|x, a) = 1 if x' = ψ*(x, a), otherwise 0 (deterministic)
Reward function: g*(x, i, j) = ∑_(k∈K) proj(x)(k)·g(k, i, j)

Technical Innovations

1. Aggregation Scheme Design

Key Insight: Ergodicity ensures that after length n, all transition matrix rows become similar (difference ≤ ε)
Stabilization Technique: Construct stable matrices through averaging, making belief updates independent of initial belief
Block Structure: Reset every n steps, exploiting the "memory loss" property of ergodicity

2. Fine-grained Error Control Analysis

Lemma 4.2: Proves bounded belief distance between original and abstract games:
||b^(b₁,h_m)(m,σ,π) - proj(x^(b₁,h_m)(m,σ,π))||₁ ≤ 4ε
Lemma 4.3: Proves bounded average reward difference:
|𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G_m - 𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G*_m| ≤ 4ε

3. Distinction from Baselines

vs. Venel Ven15's Exchangeable Games: Not only proves existence but provides decidable algorithms
vs. Standard Stochastic Game Ergodicity: Conditions imposed on original data rather than belief games
vs. Blind MDP Literature: First to establish approximation decidability in two-player games

4. Cleverness of Undecidability Proof

Reduction from the universality problem of probabilistic finite automata (PFA)
Constructs Restart action enabling blind MDP to simulate PFA
Adds state k̂ ensuring all transition matrices are Markov (guaranteeing ergodicity)
Proves acceptance probability > 1/2 equivalent to long-term average > 1/2

Experimental Setup

Example: Machine Maintenance Problem (Example 3.1)

Problem Description:

States: K = {Good, Fair, Poor} (machine condition)
Actions: I = {Wait, Basic Maintenance, Critical Repair}
Players monitor an inaccessible machine and make decisions based on action history

Transition Matrices (partial display):

Under Wait action:
     G    F    P
G  0.9  0.1  0.0
F  0.0  0.7  0.3
P  0.0  0.1  0.9

Verify Ergodicity:

Each action's transition matrix P(i) is a Markov matrix (at least one column entirely positive)
Therefore τ₁(P(i)) < 1 for all i ∈ I
Satisfies ergodicity property

Algorithm Implementation (Algorithm 1)

Input: Initial belief b₁, blind stochastic game Γ, precision ε
1. Verify ergodicity property of Γ
2. Construct matrix set T(ε)
3. Derive abstract matrix set T̃(ε) from T(ε)
4. Construct abstract stochastic game G*(b₁, ε)
5. Compute uniform value v*(b₁, ε) of G*(b₁, ε)
Output: v*(b₁, ε) (if Γ is ergodic)

Complexity Analysis:

Number of states: O(|I × J|^(2n)), where n ≤ (3^|K| - 2^(|K|+1) + 1)/2
Via real closed field theory, standard stochastic games solvable in PSPACE CMH08
Overall complexity: 2-EXPSPACE

Experimental Results

Main Theoretical Results

The paper is primarily theoretical work; core results summarized in Table 2:

Category	Uniform Value Exists	Exact Decidable	Approx Decidable	Value Constant	Sufficient Conditions
Ergodic Blind MDP	Yes	Undecidable	Decidable	Yes	Yes
Ergodic Blind Stochastic Game	Yes	Undecidable	Decidable	Yes	Yes
Scrambling Hidden Stochastic Game	?	Undecidable	?	?	Yes

(Bold indicates new results from this paper)

Theorem Verification

Theorem 3.5 (Existence and Decidability):

Proof Strategy: Via 4-step construction
1. Construct abstract game G*(b₁, ε)
2. Prove beliefs remain close (Lemma 4.2)
3. Prove rewards remain close (Lemma 4.3)
4. Leverage uniform value existence for standard stochastic games
Error Bound: |v*(b₁, ε) - v(b₁)| ≤ 4ε
Decidability: Via reduction to finite-state stochastic games

Theorem 3.6 (Undecidability):

Proof Strategy: Reduction from PFA universality problem
Key Construction:
- Add absorbing state k̂
- Restart action resets to initial state
- Reward design ensures acceptance probability > 1/2 ⟺ long-term average > 1/2
Separation Result: Contrasts with standard stochastic games (exact computation decidable OB21)

Theorem 3.7 (Initial Belief Independence):

Proof Points: In abstract game, different initial beliefs differ only in first nε steps
Error Bound: |v(b₁) - v(b'₁)| ≤ 8ε for any b₁, b'₁

Hierarchy of Sufficient Conditions

The paper identifies strict inclusion relationships among matrix classes:

C₅ ⊊ C₄ ⊊ C₃ ⊊ C₂ ⊊ C₁

where:

C₅ (Markov): At least one column entirely positive
C₄ (Scrambling): Any two rows have common successor
C₃ (Sarymsakov): Any two disjoint subsets either have common reachable state or reachable sets expand
C₂ (C₂-matrices): SIA and product with all SIA matrices remains SIA
C₁ (SIA): Pⁿ converges to stable matrix

All transition matrices in these classes guarantee ergodic blind stochastic games.

Blind Stochastic Game Foundations

Standard Stochastic Games Sha53: Complete state observation
- Mertens-Neyman MN81: Uniform value always exists
- Algorithms: OB21 and others provide computation methods
Blind Stochastic Games:
- N-stage value exists vN28
- Ziliotto Zil16: Counterexample showing uniform value may not exist
- Venel Ven15: Uniform value exists under exchangeability (no algorithm)

Single-Player Case (Blind MDP/PFA)

Existence: Rosenberg et al. RSV02 prove uniform value exists for blind MDPs
Undecidability: Madani et al. MHC03 prove both computation and approximation undecidable
Finite Memory: Chatterjee et al. CSZ22 prove finite-memory strategies suffice, but memory size has no computable bound

Ergodicity Research

Standard Stochastic Games:
- Finite state HK66, Sob71, Vri03
- Countable state Now99
- Applications: cryptocurrency attack analysis CKGIV18
Ergodicity in MDPs:
- Finite state AS92, HBS87, WSS11
- Countable state BSL90, PBS93
- Surveys ABFG+93, Put14
Automata Theory:
- Single-player case CSV13 studies probabilistic automata with isolated cut vertices
- This paper extends to two-player games

Decidability Research

Positive Results:
- Under strong assumptions CH10
- Other objectives CCT16, CDH13, GO14
Negative Results:
- ω-regular objectives Cha07
- Partially observable games CCGK16

Advantages Over Prior Work

vs. Venel Ven15: Not only existence but algorithms
vs. Blind MDP Literature: First approximation decidability for two-player games
vs. Standard Ergodicity: Conditions on original data, identifying ergodic behavior in belief dynamics
Novelty: Even in single-player POMDPs, approximation decidability is new

Conclusions and Discussion

Main Conclusions

Existence: Ergodic blind stochastic games always have uniform value
Decidability Dichotomy:
- Approximation: Decidable (2-EXPSPACE)
- Exact: Undecidable (even for Markov blind MDPs)
Initial Belief Independence: Uniform value is a constant function
Sufficient Conditions: Identified 5 matrix classes guaranteeing ergodicity
Verification Algorithm: Exponential-space decidability of ergodicity property

Limitations

1. Complexity

2-EXPSPACE upper bound is high
Practical computation may be infeasible
No lower bound provided

2. Applicability Range

Limited to ergodic case
Ergodicity condition (Eq. 1) must hold uniformly over all action sequences
Many practical games may not satisfy it

3. Exact Computation

Theorem 3.6 shows exact computation undecidable
Can only approximate to arbitrary precision
Cannot judge approximation quality

4. Extensibility Issues

Extension to hidden stochastic games (with signals) faces major challenges
Random transitions cause error propagation
Existence and decidability remain open

Future Directions

1. Hidden Stochastic Games (Section 5 discussion)

Scrambling Hidden Games: Definition 5.2 provides generalization
Open Questions:
- Does uniform value exist?
- Is approximation decidable?
Main Obstacles:
- Belief transitions become random (vs. deterministic in blind games)
- Cannot perfectly couple original and abstract games
- Errors may propagate RSV02

2. Complexity Improvement

Reduce 2-EXPSPACE upper bound
Establish matching lower bound
Identify polynomial-time solvable subclasses

3. Algorithm Optimization

Practically implementable algorithms
Exploit problem structure
Online approximation quality estimation

4. Application Exploration

Robot navigation (partial observability)
Network security (attack-defense games)
Resource management (incomplete information)

5. Theoretical Deepening

Alternative ergodicity characterizations
Relationships with other game classes
Generalization to multiplayer games

In-Depth Evaluation

Strengths

1. Significant Theoretical Contribution

Pioneering: First approximation decidability in blind stochastic games
Tightness: Undecidability result shows approximation is the best possible
Comprehensiveness: Simultaneously addresses existence, decidability, and initial belief independence

2. Methodological Innovation

Abstract Game Construction: Elegantly exploits ergodicity to construct finite approximation
Stabilization Technique: Averaging makes belief updates independent
Error Analysis: Fine-grained ε-control throughout proofs

3. Technical Depth

Matrix Theory: Deep exploitation of ergodicity coefficients and matrix product theory
Complexity Theory: Clever reduction proving undecidability
Game Theory: Connects multiple research areas (automata, MDPs, stochastic games)

4. Result Completeness

Provides sufficient conditions (5 matrix classes)
Gives verification algorithm (Proposition 3.9)
Includes concrete example (Example 3.1)
Discusses extension directions (Section 5)

5. Writing Clarity

Well-structured, logically clear
Rigorous definitions, standard notation
Detailed proofs, easy to verify
Comprehensive related work

Weaknesses

1. Computational Complexity

2-EXPSPACE Too High: Practically infeasible
No Lower Bound: Unknown if improvable
No Experiments: Actual performance not evaluated

2. Applicability Limitations

Strong Ergodicity Condition: Equation (1) requires uniformity over all sequences
Finite State Only: Infinite state spaces not considered
Zero-Sum Assumption: General-sum games not discussed

3. Algorithm Details

Algorithm 1 Too Abstract: Lacks implementation details
Numerical Stability: Floating-point computation issues not discussed
Parameter Selection: No guidance on choosing ε

4. Experimental Validation

Only One Example: Example 3.1 too simple
No Performance Evaluation: Runtime, memory usage unknown
No Comparison: Not compared with other methods (e.g., sampling)

5. Extensibility Issues

Hidden Games: Main obstacles unresolved
Multiplayer Games: Not discussed
Other Objectives: Discounted rewards, reachability not thoroughly explored

Impact

1. Theoretical Impact

Fills Gap: First approximation decidability result in blind stochastic games and POMDPs
Methodology: Abstract game construction may inspire other incomplete information game research
Complexity Theory: Exact vs. approximate dichotomy is important insight

2. Practical Value

Limited: 2-EXPSPACE complexity restricts practical applications
Theoretical Guidance: Identifying tractable subclasses valuable
Benchmark: Serves as theoretical benchmark for heuristic methods

3. Reproducibility

Theoretical Results: Detailed proofs, verifiable
Algorithms: Clear description, implementable
Examples: Simple, reproducible
Code: No implementation provided

4. Follow-up Research

Direct Extensions: Hidden games, multiplayer games
Algorithm Improvement: Reduce complexity, practical implementation
Applications: Domain-specific modeling and solving

Applicable Scenarios

Theoretically Applicable:

Small-Scale Problems: Relatively small state and action spaces
Strong Ergodicity: Transition matrices mix rapidly
Approximation Sufficient: Exact values not required
Offline Computation: High computational cost tolerable

Specific Applications:

Machine Maintenance: As in Example 3.1, maintenance decisions with unobservable state
Network Monitoring: Intrusion detection based on historical data
Robot Navigation: Path planning in partially observable environments
Resource Allocation: Competitive resource allocation under incomplete information

Inapplicable Scenarios:

Large-Scale Systems: Exponential state space explosion
Non-Ergodic Systems: Long-term history dependence
Real-Time Decision: Limited computation time
Exact Requirements: Need for exact optimal strategies

Selected References

MN81 Mertens & Neyman (1981): Foundational work on uniform value existence in standard stochastic games
Zil16 Ziliotto (2016): Counterexample showing uniform value may not exist in blind stochastic games
MHC03 Madani, Hanks & Condon (2003): Classical undecidability results for blind MDPs
Sen06 Seneta (2006): Authoritative survey on non-negative matrices and ergodicity theory
Ven15 Venel (2015): Uniform value existence for exchangeable stochastic games
RSV02 Rosenberg, Solan & Vieille (2002): Uniform value existence for blind MDPs
CSZ22 Chatterjee, Saona & Ziliotto (2022): Finite-memory strategies in POMDPs
OB21 Oliu-Barton (2021): New algorithms for standard stochastic games

Overall Assessment: This is an excellent paper with profound theoretical contributions and solid technical execution. It achieves important breakthroughs on the fundamental but challenging problem of blind stochastic games, cleverly balancing uniform value existence and computability through ergodicity conditions. While high algorithmic complexity limits practical applications, its theoretical contributions and methodological innovations are significant for game theory, automata theory, and computational complexity theory. The tightness result (approximation decidable but exact computation undecidable) is particularly impressive, clearly delineating the computational boundaries of the problem.