2025-11-25T11:58:17.992104

Multi Timescale Stochastic Approximation: Stability and Convergence

Deb, Ganesh, Bhatnagar
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates including the momentum state follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms covering momentum acceleration, off-policy learning, and primal-dual methods-fit naturally into the proposed multi-timescale framework.
academic

Multi Timescale Stochastic Approximation: Stability and Convergence

Basic Information

  • Paper ID: 2112.03515
  • Title: Multi Timescale Stochastic Approximation: Stability and Convergence
  • Authors: Rohan Deb (University of Illinois, Urbana-Champaign), Swetha Ganesh (Purdue University, West Lafayette), Shalabh Bhatnagar (Indian Institute of Science, Bengaluru)
  • Classification: eess.SY cs.SY
  • Publication Date: October 16, 2025 (arXiv version)
  • Paper Link: https://arxiv.org/abs/2112.03515v3

Abstract

This paper presents the first sufficient conditions guaranteeing stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterations. The work extends existing single-timescale and two-timescale SA results to general N-timescale stochastic recursions for arbitrary N≥1, employing the ordinary differential equation (ODE) method. As an application, the authors investigate SA algorithms with heavy-ball momentum enhancement in gradient temporal-difference (GTD) learning. The added momentum introduces auxiliary states evolving on an intermediate timescale, yielding three-timescale recursions. Under appropriate momentum parameters, the scheme is shown to conform to the framework and converge almost surely to the same fixed point as baseline GTD.

Research Background and Motivation

Problem Definition

Stochastic approximation algorithms constitute a class of iterative processes for finding function zeros when the true function is unknown but noisy observations are available. In numerous optimization and control problems under uncertainty, algorithms involving three or more timescale recursions naturally arise.

Research Significance

  1. Practical Demand: In reinforcement learning, multi-timescale algorithms naturally emerge in actor-critic algorithms for constrained Markov decision processes, hierarchical reinforcement learning, and related scenarios
  2. Theoretical Gap: Existing literature provides stability and convergence conditions only for single-timescale and two-timescale SA, lacking general theory for N>2 cases

Limitations of Existing Methods

  1. Stability Assumptions: Existing two-timescale analyses assume iterations remain stable (bounded), a non-trivial requirement
  2. Verification Difficulty: Only recently have conditions emerged to verify such stability requirements
  3. Restricted Applicability: Cannot handle complex algorithms with three or more timescales

Research Motivation

Provide the first set of sufficient conditions ensuring stability and convergence of general N-timescale stochastic recursions, filling theoretical gaps and supporting analysis of complex reinforcement learning algorithms.

Core Contributions

  1. Theoretical Breakthrough: Presents the first sufficient conditions guaranteeing stability and almost sure convergence of N-timescale SA iterations
  2. Method Extension: Generalizes Borkar-Meyn single-timescale and Lakshminarayanan-Bhatnagar two-timescale results to arbitrary N≥1
  3. Application Verification: Validates framework effectiveness in three important reinforcement learning scenarios:
    • Gradient temporal-difference (GTD) learning with momentum
    • Off-policy actor-critic algorithms
    • Constrained policy optimization
  4. Technical Innovation: Eliminates projection steps in actor updates, relying solely on convergence framework to ensure stability

Methodology Details

Task Definition

Consider N coupled stochastic recursions:

x^(j)_{n+1} = x^(j)_n + α^(j)_n (h^(j)(x^(1)_n, x^(2)_n, ..., x^(N)_n) + M^(j)_{n+1})

where j = 1, 2, ..., N, requiring:

  • Stability: sup_n ||x^(j)_n|| < ∞ a.s. ∀j
  • Convergence: x^(j)n → x^(j)* a.s. ∀j

Core Theoretical Framework

Basic Assumptions

(A:1) h^(j) is Lipschitz continuous
(A:2) {M^(j)_{n+1}} is a martingale difference sequence with bounded conditional expectations
(A:3) Step-size sequences satisfy:

  • α^(j)_n > 0, Σα^(j)_n = ∞
  • Σ(α^(j)_n)² < ∞
  • α^(j)_n/α^(j-1)_n → 0 (timescale separation)

Stability Condition (B.N.i)

For scaled functions h^(i)_c(x^(i), x^(i+1), ..., x^(N)) = h^(i)(cy^(1), cy^(2), ..., cy^(N))/c, require:

  1. h^(i)c → h^(i)∞ uniformly
  2. Limiting ODE ẋ^(i)(t) = h^(i)_∞(x^(i)(t), x^(i+1), ..., x^(N)) possesses unique globally asymptotically stable equilibrium
  3. Equilibrium point map λ^(i)_∞ is Lipschitz continuous

Convergence Condition (C.N.i)

Global asymptotic stability of the original ODE system under hierarchical fixed point structure.

Main Theorem

Theorem 1: Under assumptions (A:1)-(A:3), (B.N.i), and (C.N.i), the N-timescale iteration converges to the hierarchical fixed point:

x^(j)_n → x^(j)_* = λ^(j:N-1)(x^(N)_*)

Proof Strategy

  1. Stability-Convergence Decomposition: First prove convergence assuming boundedness, then establish stability
  2. Timescale Cascade: Analyze behavior starting from the fastest timescale, progressively through each scale
  3. Recursive Scaling Argument: Use scaled iteration comparison with original iteration to prove boundedness

Experimental Setup

Application Scenarios

1. GTD Learning with Momentum

  • Datasets: 5-State Random Walk, 7-state Boyan Chain
  • Algorithms: GTD2-M-3TS, TDC-M-3TS (three-timescale), GTD2-M-4TS, TDC-M-4TS (four-timescale)
  • Parameter Settings:
    • 5-State RW: α=0.4, β=0.4, ϱ^(1)=0.5, ϱ^(2)=0.25
    • Boyan Chain: α=0.35, β=0.35, ϱ^(1)=0.45, ϱ^(2)=0.35

2. Off-Policy Actor-Critic

  • Setup: Gibbs policy parameterization, importance sampling ratios
  • Update Rules: Critic (fastest), actor (intermediate), baseline (slowest) timescales

3. Constrained Actor-Critic

  • Problem: Average reward constrained optimization
  • Timescales: Critic (fastest), actor (intermediate), dual variable (slowest)

Evaluation Metrics

  • GTD: Root Mean Square Projected Bellman Error (RMSPBE)
  • Actor-Critic: Policy performance and convergence
  • Theoretical Verification: Almost sure convergence proof

Experimental Results

Main Findings

GTD Momentum Experiments

  1. Performance Improvement: Momentum versions outperform vanilla counterparts on both MDPs
  2. Convergence Verification: All algorithms converge to theoretically predicted fixed point θ* = -Ā^(-1)b̄
  3. Timescale Comparison:
    • GTD2: 4-TS scheme performs better
    • TDC: 3-TS version performs better

Theoretical Verification

  1. Stability: All three applications satisfy assumptions (B.N.i) and (C.N.i)
  2. Convergence: Proved almost sure convergence to expected hierarchical fixed points
  3. Projection-Free: Successfully eliminated projection operations in actor updates

Technical Findings

  1. Momentum Effect: Heavy-ball momentum significantly improves empirical convergence speed of GTD algorithms
  2. Framework Generality: Single theoretical framework successfully handles momentum acceleration, off-policy learning, and primal-dual methods
  3. Practical Value: Provides practical tools for verifying convergence of complex multi-timescale algorithms

Theoretical Foundations

  1. Single-Timescale SA: Borkar-Meyn (2000) ODE method and stability conditions
  2. Two-Timescale SA: Borkar (1997) convergence, Lakshminarayanan-Bhatnagar (2017) stability
  3. Reinforcement Learning Applications: Actor-critic algorithms, GTD methods, constrained MDPs

Advantages of This Work

  1. Theoretical Completeness: First complete stability and convergence theory for N-timescale case
  2. Practicality: Verifiable conditions applicable to actual algorithm design
  3. Breadth of Applications: Covers momentum methods, off-policy learning, constrained optimization and other important domains

Conclusions and Discussion

Main Conclusions

  1. Successfully establishes the first complete theoretical framework for N-timescale SA
  2. Proves stability and convergence for three important reinforcement learning algorithm classes
  3. Demonstrates theoretical feasibility of momentum techniques in temporal-difference learning

Limitations

  1. Markovian Noise: Current framework limited to martingale difference noise, not addressing more general Markovian noise
  2. Step-Size Requirements: Theoretical analysis requires square-summable step-sizes, though experiments show non-square-summable step-sizes also work
  3. Finite-Time Analysis: Lacks quantitative convergence rate analysis

Future Directions

  1. Extension to Markovian Noise: Generalize Ramaswamy-Bhatnagar single-timescale results
  2. Set-Valued Mappings: Handle RL algorithms under partial observation/information
  3. Convergence Rates: Develop weak convergence rate theory for N-timescale case
  4. Finite-Time Behavior: Quantify finite-time performance gains of momentum algorithms

In-Depth Evaluation

Strengths

  1. Theoretical Breakthrough: Fills important gap in multi-timescale SA theory with landmark significance
  2. Rigorous Methodology: Sophisticated proof techniques using innovative recursive scaling arguments
  3. Application Value: Three application scenarios possess important practical significance, demonstrating framework generality
  4. Clear Presentation: Well-structured exposition progressing from 3-timescale to N-timescale for accessibility

Weaknesses

  1. Assumption Limitations: i.i.d. sampling assumption is restrictive in practical RL contexts
  2. Experimental Scale: Experiments relatively simple, lacking verification in large-scale complex environments
  3. Computational Complexity: Does not discuss computational complexity of verifying theoretical conditions
  4. Practical Guidance: Lacks concrete guidance on selecting timescale separation parameters

Impact

  1. Theoretical Contribution: Provides solid theoretical foundation for multi-timescale algorithm design
  2. Practical Value: Enables convergence analysis of complex RL algorithms
  3. Inspirational: Likely to inspire further research in multi-timescale algorithms
  4. Reproducibility: Theoretical results are verifiable with clear experimental setup

Applicable Scenarios

  1. Constrained Reinforcement Learning: Scenarios requiring primal-dual updates
  2. Hierarchical Reinforcement Learning: Multi-level decision-making requiring different timescales
  3. Momentum Acceleration Methods: Provides theoretical support for momentum techniques in RL
  4. Algorithm Design: Offers convergence verification tool for new multi-timescale algorithms

References

This paper primarily builds upon the following important works:

  1. Borkar, V.S. (2008). Stochastic Approximation: A Dynamical Systems Viewpoint
  2. Lakshminarayanan, C. & Bhatnagar, S. (2017). A stability criterion for two-timescale stochastic approximation schemes
  3. Sutton, R. et al. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation

Overall Assessment: This is a theoretically significant paper that successfully resolves stability and convergence issues in multi-timescale stochastic approximation, providing powerful theoretical tools for analyzing complex algorithms in reinforcement learning and related fields. While there remains room for improvement in practical applicability assumptions, its theoretical contributions and methodological innovations have lasting impact.