2025-11-18T22:10:13.514792

Time-Varying Optimization for Streaming Data Via Temporal Weighting

Abrar, Michelusi, Larsson
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
academic

Time-Varying Optimization for Streaming Data Via Temporal Weighting

Basic Information

  • Paper ID: 2510.13052
  • Title: Time-Varying Optimization for Streaming Data Via Temporal Weighting
  • Authors: Muhammad Faraz Ul Abrar (Arizona State University), Nicolò Michelusi (Arizona State University), Erik G. Larsson (Linköping University)
  • Classification: cs.LG cs.AI cs.SY eess.SP eess.SY math.OC
  • Publication Date: October 15, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.13052

Abstract

Traditional optimization theory addresses fixed, time-invariant objective functions. However, time-varying optimization has become an important topic for decision-making in dynamic environments. This paper investigates the streaming data learning problem from the perspective of time-varying optimization. Unlike prior work focusing on generic formulations, we introduce a structured weight-based formulation that explicitly captures the streaming data sources of time-varying objectives, where an agent aims to minimize the weighted average loss of all past data samples at each time step. We focus on two specific weighting strategies: (1) uniform weights, treating all samples equally; (2) discounted weights, with geometric decay of older data's influence. For both schemes, we derive tight bounds on "tracking error" (TE) under gradient descent (GD) updates, where TE is defined as the deviation between model parameters and the time-varying optimal solution at a given time step. We prove that under uniform weighting, TE asymptotically vanishes at a decay rate of O(1/t), while discounted weighting produces a non-zero error lower bound controlled by the discount factor and the number of gradient updates performed at each time step.

Research Background and Motivation

Problem Definition

The core problem addressed in this paper is time-varying optimization learning in streaming data environments. Specifically:

  1. Limitations of Traditional Optimization: Classical machine learning optimizes static objective functions, assuming static data distributions, but real-world solutions operate in dynamically evolving environments
  2. Streaming Data Challenges: Data arrives sequentially, objective functions evolve over time, resulting in non-stationary optimization problems
  3. Computational Constraints: In real-time or resource-limited settings, only a limited number of updates can be performed at each time step

Significance

This problem is important in multiple critical application domains:

  • Mobile robot tracking in autonomous vehicles
  • Moving target localization
  • Portfolio optimization
  • Risk management in volatile financial markets
  • Controller adaptation to time-varying system dynamics

Limitations of Existing Methods

  1. Loose Bounds in Generic Formulations: Most existing work focuses on generic time-varying formulations, ignoring the inherent structure of streaming data, potentially leading to loose tracking error bounds
  2. Lack of Structured Analysis: Existing methods do not explicitly exploit the weight structure of streaming data to obtain tighter performance bounds
  3. Theory-Practice Gap: Methods in continual learning are mostly empirical, lacking theoretical foundations

Core Contributions

  1. Structured Weight Formulation: Introduces time-varying objective functions that explicitly capture streaming data structure, defined as weighted averages of losses from all past samples
  2. Theoretical Analysis of Two Weighting Strategies:
    • Uniform weights: Proves tracking error vanishes asymptotically at O(1/t) rate
    • Discounted weights: Derives explicit non-zero asymptotic tracking error bounds
  3. Tight Bound Derivation: Exploits streaming data structure to obtain tighter TE bounds than existing generic time-varying analyses
  4. Theory and Experimental Validation: Numerical simulations verify the validity of theoretical findings

Methodology Details

Task Definition

Consider a single agent (e.g., edge or cloud server) aiming to track time-varying machine learning model parameters in a learning setting:

  • Input: At each iteration t≥1, the agent receives a new data sample (x_t, y_t)
  • Output: Model parameters w_t that minimize the weighted average loss of accumulated data
  • Constraint: Only E gradient updates can be performed at each time step

Core Mathematical Formulation

Time-Varying Objective Function: wt=argminwRdFt(w),whereFt(w)=i=1tai(t)fi(w)w_t^* = \arg\min_{w \in \mathbb{R}^d} F_t(w), \quad \text{where} \quad F_t(w) = \sum_{i=1}^t a_i(t)f_i(w)

Where:

  • ai(t)a_i(t) is the weight of the i-th sample at time t
  • fi(w)f_i(w) is the loss function of the i-th data sample
  • Weights satisfy: 0ai(t)10 \leq a_i(t) \leq 1 and i=1tai(t)=1\sum_{i=1}^t a_i(t) = 1

Gradient Descent Update: wt,k+1=wt,kηFt+1(wt,k)=wt,kηi=1t+1ai(t+1)fi(wt,k)w_{t,k+1} = w_{t,k} - \eta\nabla F_{t+1}(w_{t,k}) = w_{t,k} - \eta\sum_{i=1}^{t+1} a_i(t+1)\nabla f_i(w_{t,k})

Tracking Error Definition: TE(t)=wtwt\text{TE}(t) = \|w_t - w_t^*\|

Two Weighting Strategies

1. Uniform Weights

Set ai(t)=1/ta_i(t) = 1/t for all i=1,,ti = 1, \ldots, t, the objective function becomes: Ft+1(w)=tt+1Ft(w)+1t+1ft+1(w)F_{t+1}(w) = \frac{t}{t+1}F_t(w) + \frac{1}{t+1}f_{t+1}(w)

2. Discounted Weights

Use geometric discounting: ai(t)=1γ1γtγtia_i(t) = \frac{1-\gamma}{1-\gamma^t}\gamma^{t-i}, where 0<γ<10 < \gamma < 1 is the discount factor.

Technical Innovations

  1. Structured Analysis: Unlike generic time-varying optimization, explicitly exploits the weight structure of streaming data
  2. Minimizer Drift Analysis: Understands objective function changes by analyzing wi+1wi\|w_{i+1}^* - w_i^*\|
  3. Recursive Error Analysis: Establishes recursive relationships to track error evolution

Theoretical Analysis

Fundamental Assumptions

Assumption 1 (L-Smoothness and μ-Strong Convexity): Each data sample's loss function satisfies:

  • ft(x)ft(y)Lxy\|\nabla f_t(x) - \nabla f_t(y)\| \leq L\|x-y\|
  • ft(y)ft(x)+ft(x)T(yx)+μ2yx2f_t(y) \geq f_t(x) + \nabla f_t(x)^T(y-x) + \frac{\mu}{2}\|y-x\|^2

Assumption 2 (Bounded Minimizers): There exists C>0C > 0 such that wtC\|w_t^*\| \leq C for all t.

Main Theoretical Results

Tracking Error for Uniform Weights

Proposition 1: For uniform weights, the tracking error satisfies: TE(t)αtw0w1+CAt\text{TE}(t) \leq \alpha^t\|w_0 - w_1^*\| + \frac{C'A}{t}

Where α=(1ημ)E<1\alpha = (1-\eta\mu)^E < 1, C=(1+L/μ)LCμC' = (1+\sqrt{L/\mu})\frac{LC}{\mu}.

Key Conclusion: TE decays at rate O(1/t), with asymptotic tracking error approaching zero.

Tracking Error for Discounted Weights

Proposition 2: For discounted weights, the asymptotic tracking error is: ATEγ=lim suptwtwt(1+Lμ)LCμ(1γ)α1α\text{ATE}_\gamma = \limsup_{t\to\infty} \|w_t - w_t^*\| \leq \left(1+\sqrt{\frac{L}{\mu}}\right)\frac{LC}{\mu} \cdot \frac{(1-\gamma)\alpha}{1-\alpha}

Key Conclusion: A non-zero error lower bound exists, controlled by the discount factor γ and the number of gradient updates E.

Experimental Setup

Data Generation

Uses scalar quadratic loss functions: ft(w)=μ2(wct)2f_t(w) = \frac{\mu}{2}(w-c_t)^2

Parameter settings:

  • ctc_t generated by bounded random walk: ct+1=max(Cmax,min(ct+zt+1,Cmax))c_{t+1} = \max(-C_{\max}, \min(c_t + z_{t+1}, C_{\max}))
  • ztN(0,σ2)z_t \sim \mathcal{N}(0, \sigma^2), Cmax=100C_{\max} = 100, σ2=100\sigma^2 = 100, μ=0.1\mu = 0.1

Evaluation Metrics

  • Root mean square tracking error
  • Maximum (worst-case) tracking error
  • Statistical results from 1000 independent runs

Experimental Results

Uniform Weights Results

  • O(1/t) Decay Verification: Experiments clearly demonstrate monotonic decay consistent with theoretical predictions
  • Impact of Gradient Updates: Increasing E from 10 to 20 improves empirical TE by approximately 0.09 factor, quantitatively matching theoretical predictions
  • Long-term Behavior: For large t, TE is dominated by residual error from minimizer drift

Discounted Weights Results

  • Non-Zero Error Lower Bound: All error metrics converge to non-vanishing asymptotic error bounds
  • Discount Factor Impact: Larger γ produces lower asymptotic tracking error
  • Theory Verification: When γ=0.99, TE approaches the uniform weights case, validating theoretical analysis

Gradient Complexity

Proposition 3: To ensure ATEγϵ\text{ATE}_\gamma \leq \epsilon, the following number of gradient updates must be performed: Eln(ϵC(1γ)+ϵ)ln(1ημ)E \geq \frac{\ln\left(\frac{\epsilon}{C'(1-\gamma)+\epsilon}\right)}{\ln(1-\eta\mu)} resulting in O(ln(1/ε)) gradient iteration complexity.

Time-Varying Optimization

  • Early Work: Newton-type algorithms exploiting second-order information achieving exponential convergence
  • Bounded Drift Conditions: Methods assuming wt+1wtC\|w_{t+1}^* - w_t^*\| \leq C
  • Prediction-Correction Schemes: Methods combining prediction and gradient correction

Continual Learning

  • Task Sequence Learning: Learning ML models on task sequences
  • Catastrophic Forgetting: Challenge of performance degradation on old tasks when learning new ones
  • Empirical Methods: Existing methods are primarily empirical, lacking theoretical foundations

Uniqueness of This Work's Contribution

This paper is the first to theoretically bridge time-varying optimization and continual learning, providing explicit analysis of streaming data structure.

Conclusions and Discussion

Main Conclusions

  1. Uniform Weights: Achieve O(1/t) decay in tracking error with asymptotically perfect tracking
  2. Discounted Weights: Produce quantifiable non-zero asymptotic error, reflecting forgetting of older data
  3. Structured Analysis: Exploiting streaming data structure yields tighter bounds than generic methods

Theoretical Insights

  • Uniform vs. Discounted: Uniform weights dilute each new sample's influence to O(1/t), while discounted weights maintain O(1) drift
  • Weight Convergence: As γ→1, discounted weights converge to uniform weights, correspondingly ATE_γ→0
  • Computational Budget Trade-off: More gradient updates E reduce tracking error but increase computational cost

Limitations

  1. Memory Assumption: Assumes access to all historical sample gradients, not considering memory constraints
  2. Specific Loss Functions: Theoretical analysis based on L-smoothness and μ-strong convexity assumptions
  3. Bounded Minimizers: Requires assumption that minimizers are uniformly bounded

Future Directions

  1. Memory-Constrained Analysis: Study time-varying learning under memory constraints
  2. More General Loss Functions: Extend to non-convex or other types of losses
  3. Distributed Settings: Applications in distributed environments such as federated learning
  4. Adaptive Weights: Study data-driven dynamic weighting strategies

In-Depth Evaluation

Strengths

  1. Theoretical Rigor: Provides complete mathematical analysis and tight bound derivations
  2. Structured Approach: Explicitly exploits streaming data structure, obtaining more precise results than generic methods
  3. Practical Value: Two weighting strategies correspond to different real-world application scenarios
  4. Experimental Validation: Numerical results highly consistent with theoretical predictions
  5. Clear Presentation: Well-organized paper with clear mathematical derivations

Weaknesses

  1. Restrictive Assumptions: L-smoothness and μ-strong convexity assumptions may be overly strict for practical applications
  2. Memory Requirements: Requires storing all historical gradients, impractical for large-scale applications
  3. Single Agent: Only considers single-agent settings, not addressing multi-agent or distributed scenarios
  4. Simple Experiments: Experiments use simple quadratic loss functions, lacking verification on complex scenarios

Impact

  1. Theoretical Contribution: Provides important theoretical foundations for time-varying optimization and continual learning
  2. Methodological Value: Structured analysis approach can be generalized to other time-varying learning problems
  3. Practical Application: Provides theoretical guidance for online learning and adaptive system design
  4. Reproducibility: Detailed description of theoretical results and experimental setup facilitates reproduction

Applicable Scenarios

  1. Online Learning Systems: Machine learning systems requiring continuous adaptation to new data
  2. Adaptive Control: Controller design for time-varying systems
  3. Financial Modeling: Investment strategies requiring adaptation to market changes
  4. IoT Applications: Real-time data processing in sensor networks
  5. Recommendation Systems: Recommendation algorithms adapting to changing user preferences

References

The paper cites 40 related references covering important works in time-varying optimization, continual learning, and convex optimization, providing a solid theoretical foundation for the research.