Time-Varying Optimization for Streaming Data Via Temporal Weighting
Abrar, Michelusi, Larsson
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
academic
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Traditional optimization theory addresses fixed, time-invariant objective functions. However, time-varying optimization has become an important topic for decision-making in dynamic environments. This paper investigates the streaming data learning problem from the perspective of time-varying optimization. Unlike prior work focusing on generic formulations, we introduce a structured weight-based formulation that explicitly captures the streaming data sources of time-varying objectives, where an agent aims to minimize the weighted average loss of all past data samples at each time step. We focus on two specific weighting strategies: (1) uniform weights, treating all samples equally; (2) discounted weights, with geometric decay of older data's influence. For both schemes, we derive tight bounds on "tracking error" (TE) under gradient descent (GD) updates, where TE is defined as the deviation between model parameters and the time-varying optimal solution at a given time step. We prove that under uniform weighting, TE asymptotically vanishes at a decay rate of O(1/t), while discounted weighting produces a non-zero error lower bound controlled by the discount factor and the number of gradient updates performed at each time step.
The core problem addressed in this paper is time-varying optimization learning in streaming data environments. Specifically:
Limitations of Traditional Optimization: Classical machine learning optimizes static objective functions, assuming static data distributions, but real-world solutions operate in dynamically evolving environments
Streaming Data Challenges: Data arrives sequentially, objective functions evolve over time, resulting in non-stationary optimization problems
Computational Constraints: In real-time or resource-limited settings, only a limited number of updates can be performed at each time step
Loose Bounds in Generic Formulations: Most existing work focuses on generic time-varying formulations, ignoring the inherent structure of streaming data, potentially leading to loose tracking error bounds
Lack of Structured Analysis: Existing methods do not explicitly exploit the weight structure of streaming data to obtain tighter performance bounds
Theory-Practice Gap: Methods in continual learning are mostly empirical, lacking theoretical foundations
Structured Weight Formulation: Introduces time-varying objective functions that explicitly capture streaming data structure, defined as weighted averages of losses from all past samples
Theoretical Analysis of Two Weighting Strategies:
Uniform weights: Proves tracking error vanishes asymptotically at O(1/t) rate
Impact of Gradient Updates: Increasing E from 10 to 20 improves empirical TE by approximately 0.09 factor, quantitatively matching theoretical predictions
Long-term Behavior: For large t, TE is dominated by residual error from minimizer drift
Proposition 3: To ensure ATEγ≤ϵ, the following number of gradient updates must be performed:
E≥ln(1−ημ)ln(C′(1−γ)+ϵϵ)
resulting in O(ln(1/ε)) gradient iteration complexity.
This paper is the first to theoretically bridge time-varying optimization and continual learning, providing explicit analysis of streaming data structure.
The paper cites 40 related references covering important works in time-varying optimization, continual learning, and convex optimization, providing a solid theoretical foundation for the research.