Uniform Value and Decidability in Ergodic Blind Stochastic Games
Chatterjee, Lurie, Saona et al.
We study a class of two-player zero-sum stochastic games known as \textit{blind stochastic games}, where players neither observe the state nor receive any information about it during the game. A central concept for analyzing long-duration stochastic games is the \textit{uniform value}. A game has a uniform value $v$ if for every $\varepsilon>0$, Player 1 (resp., Player 2) has a strategy such that, for all sufficiently large $n$, his average payoff over $n$ stages is at least $v-\varepsilon$ (resp., at most $v+\varepsilon$). Prior work has shown that the uniform value may not exist in general blind stochastic games. To address this, we introduce a subclass called \textit{ergodic blind stochastic games}, defined by imposing an ergodicity condition on the state transitions. For this subclass, we prove the existence of the uniform value and provide an algorithm to approximate it, establishing the \textit{decidability} of the approximation problem. Notably, this decidability result is novel even in the single-player setting of Partially Observable Markov Decision Processes (POMDPs). Furthermore, we show that no algorithm can compute the uniform value exactly, emphasizing the tightness of our result. Finally, we establish that the uniform value is independent of the initial belief.
academic
Uniform Value and Decidability in Ergodic Blind Stochastic Games
Title: Uniform Value and Decidability in Ergodic Blind Stochastic Games
Authors: Krishnendu Chatterjee (IST Austria), David Lurie (NyxAir & Paris Dauphine University), Raimundo Saona (IST Austria), Bruno Ziliotto (Toulouse School of Economics)
Classification: math.OC (Optimization and Control), cs.CC (Computational Complexity)
This paper investigates blind stochastic games—a class of two-player zero-sum stochastic games where participants neither observe the state nor receive any information about it. The paper introduces ergodic blind stochastic games as a subclass, defined by imposing ergodicity conditions on state transitions. For this subclass, the paper proves the existence of uniform value and provides approximation algorithms, establishing decidability of the approximation problem. This decidability result is novel even in the single-player setting of partially observable Markov decision processes (POMDPs). Furthermore, the paper proves that no algorithm can compute the uniform value exactly, emphasizing the tightness of the results. Finally, the paper establishes that the uniform value is independent of the initial belief.
The core problems addressed in this paper are: How can we characterize the value of long-term play in blind stochastic games? Specifically:
Existence of Uniform Value: Ziliotto Zil16 has shown that general blind stochastic games may not have a uniform value. Under what conditions does the uniform value exist?
Computability: Even if the uniform value exists, can it be computed or approximated algorithmically? Madani et al. MHC03 have shown that computing and approximating the uniform value are both undecidable in general blind MDPs.
Theoretical Importance: Blind stochastic games are the simplest model of incomplete information games and serve as a theoretical benchmark for studying more complex models. They are equivalent to probabilistic finite automata, which have broad applications in computer science.
Practical Applications: Including concise specification languages for infinite strings BGB12, CT12, sequence analysis in computational biology DEKM98, speech processing Moh97, and others.
Methodological Contribution: Identifying conditions on data that ensure ergodic behavior in belief dynamics represents a key methodological contribution.
Definition of Ergodic Blind Stochastic Games: Formally defines this new subclass based on ergodicity properties of transition matrix products (Definition 3.3)
Uniform Value Existence and Approximation Decidability (Theorem 3.5):
Proves all ergodic blind stochastic games have uniform value
Provides approximation algorithms establishing decidability of the approximation problem
Complexity upper bound: 2-EXPSPACE
Undecidability of Exact Computation (Theorem 3.6):
Proves that even for Markov blind MDPs (a subclass of ergodic blind stochastic games), exact computation of uniform value is undecidable
Demonstrates tightness of the approximation decidability result
Initial Belief Independence (Theorem 3.7):
Proves that in ergodic blind stochastic games, uniform value does not depend on initial belief
Sufficient Conditions: Identifies several classes of transition matrices guaranteeing ergodicity (C1-C5), including Markov matrices, scrambling matrices, Sarymsakov matrices, etc.
Verification Algorithm (Proposition 3.9): Provides an exponential-space algorithm for verifying whether a blind stochastic game satisfies the ergodicity property
Blind Stochastic Game is defined as a five-tuple Γ = (K, I, J, p, g):
K: finite state set
I, J: finite action sets for players 1 and 2
p: K × I × J → Δ(K): probabilistic transition function
g: K × I × J → 0,1: stage reward function
Key Feature: Players do not observe the state, knowing only the initial belief b₁ ∈ Δ(K) and history of actions.
Uniform Value is defined as: Game Γ has uniform value v: Δ(K) → 0,1 if for all b₁ ∈ Δ(K) and ε > 0, there exist strategies (σ*, π*) and n ∈ ℕ* such that for all N ≥ n:
Player 1 can guarantee: γ^(b₁)_N(σ*, π) ≥ v(b₁) - ε
Player 2 can guarantee: γ^(b₁)_N(σ, π*) ≤ v(b₁) + ε
where γ^(b₁)N(σ, π) = 𝔼^(b₁)(σ,π)1/N ∑^N_(m=1) G_m is the N-stage average reward.
Ergodic Blind Stochastic Game (Definition 3.3): Game Γ is ergodic if for all ε > 0, there exists an integer n₀ such that for all action pair sequences aⁿ of length n ≥ n₀:
τ₁(Tⁿ(aⁿ)) ≤ ε
where Tⁿ(aⁿ) = P(a₁)P(a₂)···P(aₙ) is the product of forward transition matrices.
Key Property (Proposition 3.8): Game Γ is ergodic if and only if there exists n₀ ≤ (3^|K| - 2^(|K|+1) + 1)/2 such that for all action pair sequences of length n₀:
The paper's core method constructs an abstract stochastic game G*(b₁, ε), which is finite-state and approximates the original blind stochastic game with error ε.
Construction Steps:
Step 1: Determine Time Scale (Proposition 4.1)
Given ε > 0, compute nε = n₀⌈ln(ε)/ln(sup_(aⁿ⁰) τ₁(Tⁿ⁰(aⁿ⁰)))⌉
For all sequences aⁿ of length n ≥ nε, we have τ₁(Tⁿ(aⁿ)) ≤ ε
Step 2: Construct Stable Matrix Set
T(ε): set of all forward product matrices Tⁿ of length n satisfying the τ₁ condition
T̃(ε): abstract stable matrix set; for each Tⁿ ∈ T(ε), define T̃ⁿ such that: t̃ⁿ_(k,k')(aⁿ) := 1/|K| ∑^|K|(k=1) tⁿ(k,k')(aⁿ) i.e., each row equals the average of all rows, making the matrix stable (all rows identical)
Sen06 Seneta (2006): Authoritative survey on non-negative matrices and ergodicity theory
Ven15 Venel (2015): Uniform value existence for exchangeable stochastic games
RSV02 Rosenberg, Solan & Vieille (2002): Uniform value existence for blind MDPs
CSZ22 Chatterjee, Saona & Ziliotto (2022): Finite-memory strategies in POMDPs
OB21 Oliu-Barton (2021): New algorithms for standard stochastic games
Overall Assessment: This is an excellent paper with profound theoretical contributions and solid technical execution. It achieves important breakthroughs on the fundamental but challenging problem of blind stochastic games, cleverly balancing uniform value existence and computability through ergodicity conditions. While high algorithmic complexity limits practical applications, its theoretical contributions and methodological innovations are significant for game theory, automata theory, and computational complexity theory. The tightness result (approximation decidable but exact computation undecidable) is particularly impressive, clearly delineating the computational boundaries of the problem.