Multi Timescale Stochastic Approximation: Stability and Convergence
Deb, Ganesh, Bhatnagar
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates including the momentum state follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms covering momentum acceleration, off-policy learning, and primal-dual methods-fit naturally into the proposed multi-timescale framework.
academic
Multi Timescale Stochastic Approximation: Stability and Convergence
Title: Multi Timescale Stochastic Approximation: Stability and Convergence
Authors: Rohan Deb (University of Illinois, Urbana-Champaign), Swetha Ganesh (Purdue University, West Lafayette), Shalabh Bhatnagar (Indian Institute of Science, Bengaluru)
Classification: eess.SY cs.SY
Publication Date: October 16, 2025 (arXiv version)
This paper presents the first sufficient conditions guaranteeing stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterations. The work extends existing single-timescale and two-timescale SA results to general N-timescale stochastic recursions for arbitrary N≥1, employing the ordinary differential equation (ODE) method. As an application, the authors investigate SA algorithms with heavy-ball momentum enhancement in gradient temporal-difference (GTD) learning. The added momentum introduces auxiliary states evolving on an intermediate timescale, yielding three-timescale recursions. Under appropriate momentum parameters, the scheme is shown to conform to the framework and converge almost surely to the same fixed point as baseline GTD.
Stochastic approximation algorithms constitute a class of iterative processes for finding function zeros when the true function is unknown but noisy observations are available. In numerous optimization and control problems under uncertainty, algorithms involving three or more timescale recursions naturally arise.
Practical Demand: In reinforcement learning, multi-timescale algorithms naturally emerge in actor-critic algorithms for constrained Markov decision processes, hierarchical reinforcement learning, and related scenarios
Theoretical Gap: Existing literature provides stability and convergence conditions only for single-timescale and two-timescale SA, lacking general theory for N>2 cases
Provide the first set of sufficient conditions ensuring stability and convergence of general N-timescale stochastic recursions, filling theoretical gaps and supporting analysis of complex reinforcement learning algorithms.
(A:1) h^(j) is Lipschitz continuous (A:2) {M^(j)_{n+1}} is a martingale difference sequence with bounded conditional expectations (A:3) Step-size sequences satisfy:
This paper primarily builds upon the following important works:
Borkar, V.S. (2008). Stochastic Approximation: A Dynamical Systems Viewpoint
Lakshminarayanan, C. & Bhatnagar, S. (2017). A stability criterion for two-timescale stochastic approximation schemes
Sutton, R. et al. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation
Overall Assessment: This is a theoretically significant paper that successfully resolves stability and convergence issues in multi-timescale stochastic approximation, providing powerful theoretical tools for analyzing complex algorithms in reinforcement learning and related fields. While there remains room for improvement in practical applicability assumptions, its theoretical contributions and methodological innovations have lasting impact.