2025-11-18T22:10:13.514792

Time-Varying Optimization for Streaming Data Via Temporal Weighting

Abrar, Michelusi, Larsson

Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.

academic

Time-Varying Optimization for Streaming Data Via Temporal Weighting

Basic Information

Paper ID: 2510.13052
Title: Time-Varying Optimization for Streaming Data Via Temporal Weighting
Authors: Muhammad Faraz Ul Abrar (Arizona State University), Nicolò Michelusi (Arizona State University), Erik G. Larsson (Linköping University)
Classification: cs.LG cs.AI cs.SY eess.SP eess.SY math.OC
Publication Date: October 15, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2510.13052

Abstract

Traditional optimization theory addresses fixed, time-invariant objective functions. However, time-varying optimization has become an important topic for decision-making in dynamic environments. This paper investigates the streaming data learning problem from the perspective of time-varying optimization. Unlike prior work focusing on generic formulations, we introduce a structured weight-based formulation that explicitly captures the streaming data sources of time-varying objectives, where an agent aims to minimize the weighted average loss of all past data samples at each time step. We focus on two specific weighting strategies: (1) uniform weights, treating all samples equally; (2) discounted weights, with geometric decay of older data's influence. For both schemes, we derive tight bounds on "tracking error" (TE) under gradient descent (GD) updates, where TE is defined as the deviation between model parameters and the time-varying optimal solution at a given time step. We prove that under uniform weighting, TE asymptotically vanishes at a decay rate of O(1/t), while discounted weighting produces a non-zero error lower bound controlled by the discount factor and the number of gradient updates performed at each time step.

Research Background and Motivation

Problem Definition

The core problem addressed in this paper is time-varying optimization learning in streaming data environments. Specifically:

Limitations of Traditional Optimization: Classical machine learning optimizes static objective functions, assuming static data distributions, but real-world solutions operate in dynamically evolving environments
Streaming Data Challenges: Data arrives sequentially, objective functions evolve over time, resulting in non-stationary optimization problems
Computational Constraints: In real-time or resource-limited settings, only a limited number of updates can be performed at each time step

Significance

This problem is important in multiple critical application domains:

Mobile robot tracking in autonomous vehicles
Moving target localization
Portfolio optimization
Risk management in volatile financial markets
Controller adaptation to time-varying system dynamics

Limitations of Existing Methods

Loose Bounds in Generic Formulations: Most existing work focuses on generic time-varying formulations, ignoring the inherent structure of streaming data, potentially leading to loose tracking error bounds
Lack of Structured Analysis: Existing methods do not explicitly exploit the weight structure of streaming data to obtain tighter performance bounds
Theory-Practice Gap: Methods in continual learning are mostly empirical, lacking theoretical foundations

Core Contributions

Structured Weight Formulation: Introduces time-varying objective functions that explicitly capture streaming data structure, defined as weighted averages of losses from all past samples
Theoretical Analysis of Two Weighting Strategies:
- Uniform weights: Proves tracking error vanishes asymptotically at O(1/t) rate
- Discounted weights: Derives explicit non-zero asymptotic tracking error bounds
Tight Bound Derivation: Exploits streaming data structure to obtain tighter TE bounds than existing generic time-varying analyses
Theory and Experimental Validation: Numerical simulations verify the validity of theoretical findings

Methodology Details

Task Definition

Consider a single agent (e.g., edge or cloud server) aiming to track time-varying machine learning model parameters in a learning setting:

Input: At each iteration t≥1, the agent receives a new data sample (x_t, y_t)
Output: Model parameters w_t that minimize the weighted average loss of accumulated data
Constraint: Only E gradient updates can be performed at each time step

Core Mathematical Formulation

Time-Varying Objective Function: $w_t^* = \arg\min_{w \in \mathbb{R}^d} F_t(w), \quad \text{where} \quad F_t(w) = \sum_{i=1}^t a_i(t)f_i(w)$

Where:

$a_i(t)$ is the weight of the i-th sample at time t
$f_i(w)$ is the loss function of the i-th data sample
Weights satisfy: $0 \leq a_i(t) \leq 1$ and $\sum_{i=1}^t a_i(t) = 1$

Gradient Descent Update: $w_{t,k+1} = w_{t,k} - \eta\nabla F_{t+1}(w_{t,k}) = w_{t,k} - \eta\sum_{i=1}^{t+1} a_i(t+1)\nabla f_i(w_{t,k})$

Tracking Error Definition: $\text{TE}(t) = \|w_t - w_t^*\|$

Two Weighting Strategies

1. Uniform Weights

Set $a_i(t) = 1/t$ for all $i = 1, \ldots, t$ , the objective function becomes: $F_{t+1}(w) = \frac{t}{t+1}F_t(w) + \frac{1}{t+1}f_{t+1}(w)$

2. Discounted Weights

Use geometric discounting: $a_i(t) = \frac{1-\gamma}{1-\gamma^t}\gamma^{t-i}$ , where $0 < \gamma < 1$ is the discount factor.

Technical Innovations

Structured Analysis: Unlike generic time-varying optimization, explicitly exploits the weight structure of streaming data
Minimizer Drift Analysis: Understands objective function changes by analyzing $\|w_{i+1}^* - w_i^*\|$
Recursive Error Analysis: Establishes recursive relationships to track error evolution

Theoretical Analysis

Fundamental Assumptions

Assumption 1 (L-Smoothness and μ-Strong Convexity): Each data sample's loss function satisfies:

$\|\nabla f_t(x) - \nabla f_t(y)\| \leq L\|x-y\|$
$f_t(y) \geq f_t(x) + \nabla f_t(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2$

Assumption 2 (Bounded Minimizers): There exists $C > 0$ such that $\|w_t^*\| \leq C$ for all t.

Main Theoretical Results

Tracking Error for Uniform Weights

Proposition 1: For uniform weights, the tracking error satisfies: $\text{TE}(t) \leq \alpha^t\|w_0 - w_1^*\| + \frac{C'A}{t}$

Where $\alpha = (1-\eta\mu)^E < 1$ , $C' = (1+\sqrt{L/\mu})\frac{LC}{\mu}$ .

Key Conclusion: TE decays at rate O(1/t), with asymptotic tracking error approaching zero.

Tracking Error for Discounted Weights

Proposition 2: For discounted weights, the asymptotic tracking error is: $\text{ATE}_\gamma = \limsup_{t\to\infty} \|w_t - w_t^*\| \leq \left(1+\sqrt{\frac{L}{\mu}}\right)\frac{LC}{\mu} \cdot \frac{(1-\gamma)\alpha}{1-\alpha}$

Key Conclusion: A non-zero error lower bound exists, controlled by the discount factor γ and the number of gradient updates E.

Experimental Setup

Data Generation

Uses scalar quadratic loss functions: $f_t(w) = \frac{\mu}{2}(w-c_t)^2$

Parameter settings:

$c_t$ generated by bounded random walk: $c_{t+1} = \max(-C_{\max}, \min(c_t + z_{t+1}, C_{\max}))$
$z_t \sim \mathcal{N}(0, \sigma^2)$ , $C_{\max} = 100$ , $\sigma^2 = 100$ , $\mu = 0.1$

Evaluation Metrics

Root mean square tracking error
Maximum (worst-case) tracking error
Statistical results from 1000 independent runs

Experimental Results

Uniform Weights Results

O(1/t) Decay Verification: Experiments clearly demonstrate monotonic decay consistent with theoretical predictions
Impact of Gradient Updates: Increasing E from 10 to 20 improves empirical TE by approximately 0.09 factor, quantitatively matching theoretical predictions
Long-term Behavior: For large t, TE is dominated by residual error from minimizer drift

Discounted Weights Results

Non-Zero Error Lower Bound: All error metrics converge to non-vanishing asymptotic error bounds
Discount Factor Impact: Larger γ produces lower asymptotic tracking error
Theory Verification: When γ=0.99, TE approaches the uniform weights case, validating theoretical analysis

Gradient Complexity

Proposition 3: To ensure $\text{ATE}_\gamma \leq \epsilon$ , the following number of gradient updates must be performed: $E \geq \frac{\ln\left(\frac{\epsilon}{C'(1-\gamma)+\epsilon}\right)}{\ln(1-\eta\mu)}$ resulting in O(ln(1/ε)) gradient iteration complexity.

Time-Varying Optimization

Early Work: Newton-type algorithms exploiting second-order information achieving exponential convergence
Bounded Drift Conditions: Methods assuming $\|w_{t+1}^* - w_t^*\| \leq C$
Prediction-Correction Schemes: Methods combining prediction and gradient correction

Continual Learning

Task Sequence Learning: Learning ML models on task sequences
Catastrophic Forgetting: Challenge of performance degradation on old tasks when learning new ones
Empirical Methods: Existing methods are primarily empirical, lacking theoretical foundations

Uniform Weights: Achieve O(1/t) decay in tracking error with asymptotically perfect tracking
Discounted Weights: Produce quantifiable non-zero asymptotic error, reflecting forgetting of older data
Structured Analysis: Exploiting streaming data structure yields tighter bounds than generic methods

Theoretical Insights

Uniform vs. Discounted: Uniform weights dilute each new sample's influence to O(1/t), while discounted weights maintain O(1) drift
Weight Convergence: As γ→1, discounted weights converge to uniform weights, correspondingly ATE_γ→0
Computational Budget Trade-off: More gradient updates E reduce tracking error but increase computational cost

Limitations

Memory Assumption: Assumes access to all historical sample gradients, not considering memory constraints
Specific Loss Functions: Theoretical analysis based on L-smoothness and μ-strong convexity assumptions
Bounded Minimizers: Requires assumption that minimizers are uniformly bounded

Future Directions

Memory-Constrained Analysis: Study time-varying learning under memory constraints
More General Loss Functions: Extend to non-convex or other types of losses
Distributed Settings: Applications in distributed environments such as federated learning
Adaptive Weights: Study data-driven dynamic weighting strategies

In-Depth Evaluation

Strengths

Theoretical Rigor: Provides complete mathematical analysis and tight bound derivations
Structured Approach: Explicitly exploits streaming data structure, obtaining more precise results than generic methods
Practical Value: Two weighting strategies correspond to different real-world application scenarios
Experimental Validation: Numerical results highly consistent with theoretical predictions
Clear Presentation: Well-organized paper with clear mathematical derivations

Weaknesses

Restrictive Assumptions: L-smoothness and μ-strong convexity assumptions may be overly strict for practical applications
Memory Requirements: Requires storing all historical gradients, impractical for large-scale applications
Single Agent: Only considers single-agent settings, not addressing multi-agent or distributed scenarios
Simple Experiments: Experiments use simple quadratic loss functions, lacking verification on complex scenarios

Impact

Theoretical Contribution: Provides important theoretical foundations for time-varying optimization and continual learning
Methodological Value: Structured analysis approach can be generalized to other time-varying learning problems
Practical Application: Provides theoretical guidance for online learning and adaptive system design
Reproducibility: Detailed description of theoretical results and experimental setup facilitates reproduction

Applicable Scenarios

Online Learning Systems: Machine learning systems requiring continuous adaptation to new data
Adaptive Control: Controller design for time-varying systems
Financial Modeling: Investment strategies requiring adaptation to market changes
IoT Applications: Real-time data processing in sensor networks
Recommendation Systems: Recommendation algorithms adapting to changing user preferences

References

The paper cites 40 related references covering important works in time-varying optimization, continual learning, and convex optimization, providing a solid theoretical foundation for the research.