2025-11-16T15:01:13.538180

Age of Job Completion Minimization with Stable Queues

Mitrolaris, Banerjee, Ulukus
We consider a time-slotted job-assignment system with a central server, N users and a machine which changes its state according to a Markov chain (hence called a Markov machine). The users submit their jobs to the central server according to a stochastic job arrival process. For each user, the server has a dedicated job queue. Upon receiving a job from a user, the server stores that job in the corresponding queue. When the machine is not working on a job assigned by the server, the machine can be either in internally busy or in free state, and the dynamics of these states follow a binary symmetric Markov chain. Upon sampling the state information of the machine, if the server identifies that the machine is in the free state, it schedules a user and submits a job to the machine from the job queue of the scheduled user. To maximize the number of jobs completed per unit time, we introduce a new metric, referred to as the age of job completion. To minimize the age of job completion and the sampling cost, we propose two policies and numerically evaluate their performance. For both of these policies, we find sufficient conditions under which the job queues will remain stable.
academic

Age of Job Completion Minimization with Stable Queues

Basic Information

  • Paper ID: 2511.04630
  • Title: Age of Job Completion Minimization with Stable Queues
  • Authors: Stavros Mitrolaris, Subhankar Banerjee, Sennur Ulukus (University of Maryland, College Park)
  • Classification: cs.IT, cs.NI, cs.SY, eess.SP, eess.SY, math.IT, math.PR
  • Publication Date: November 6, 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2511.04630

Abstract

This paper investigates a slotted job assignment system comprising a central server, N users, and a machine whose state evolves according to a Markov chain (referred to as a Markov machine). Users submit jobs to the central server according to a stochastic job arrival process, and the server maintains dedicated job queues for each user. When the machine is not processing a job assigned by the server, it may be in an internal busy or idle state, with state dynamics following a binary symmetric Markov chain. The server samples machine state information and schedules users to submit jobs upon identifying machine idleness. To maximize the number of jobs completed per unit time, the authors introduce a novel metric called "Age of Job Completion." To minimize the age of job completion and sampling costs, two strategies are proposed and their performance is numerically evaluated. Sufficient conditions guaranteeing queue stability for both strategies are derived.

Research Background and Motivation

1. Problem Statement

This paper addresses the job offloading problem in edge computing scenarios where multiple users compete for a shared edge computing device (Markov machine). Core challenges include:

  • Uncertainty in machine state (idle/internally busy)
  • Cost of state sampling
  • Job scheduling under multi-user competition
  • Queue stability guarantees

2. Problem Significance

In critical mission control applications such as intelligent surveillance, edge devices are typically shared by multiple users or servers, each independently generating computational tasks. This is important because:

  • Job offloading processes are inherently stochastic
  • Edge device availability is highly uncertain
  • Efficient tracking of device operational status is needed
  • Timely task offloading is required to ensure system performance

3. Limitations of Existing Approaches

Existing literature has the following shortcomings:

  • AoII metric in 5: Does not directly target maximizing completed jobs
  • Binary Freshness (BF), False Rejection Rate (FRR), False Acceptance Rate (FAR) in 6: Similarly fail to directly capture the job completion maximization objective
  • Methods in 7,8,10: Do not consider infinite queues and lack queue stability analysis
  • Most research: Either lacks job queues or assumes finite queue capacity, unsuitable for practical long-term operation scenarios

4. Research Motivation

  • Introduce a metric that more directly reflects job completion efficiency
  • Consider practical scenarios with infinite queue capacity
  • Provide theoretical guarantees for queue stability
  • Balance job completion efficiency with sampling costs

Core Contributions

  1. Proposes novel metric "Age of Job Completion": Directly reflects the number of jobs completed per unit time, more suitable for job offloading systems than existing AoII, BF, and other metrics
  2. Designs two strategy pairs:
    • Adaptive Randomized Policy (ϕ₁)
    • Max-Age Policy with Adaptive Randomized Sampling (ϕ̄₁)
  3. Theoretical stability analysis: Derives sufficient conditions for queue stability for both strategies (Propositions 1 and 2), providing the first stability results for Markov machine job offloading research
  4. Closed-form expressions: Provides closed-form expressions for average age of job completion and sampling costs under permanent backlog conditions for fixed subsystems (Theorems 1-4)
  5. Numerical evaluation: Validates strategy performance through numerical experiments, finding that the max-age policy significantly outperforms the adaptive randomized policy under high arrival rates

Methodology Details

Task Definition

System Model:

  • Input: N users, each submitting a job with probability pᵢ at the end of each slot (i.i.d. Bernoulli process)
  • State Space: Machine state x(t) ∈ {-1, 0, 1, ..., N, fr}
    • -1: Internally busy
    • 0: State unknown after job completion
    • i ∈ {1,...,N}: Processing job from user i
    • fr: Idle
  • Markov Dynamics: Transition probability between idle and internally busy states is q (binary symmetric)
  • Service Time: Job from user i follows geometric distribution with parameter qᵢ
  • After Job Completion: Transitions to internally busy with probability s, to idle with probability (1-s)

Decision Variables:

  • Sampling Decision μ(t) ∈ {0,1}: Whether to sample machine state at slot t (cost L)
  • Scheduling Decision π(t) = (π₁(t),...,πₙ(t)): Which user's job to select

Optimization Objective: Minimize total average cost: Δϕ+Sϕ=1Ni=1NΔiϕ+Sϕ\Delta^{\phi} + S^{\phi} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i^{\phi} + S^{\phi}

where:

  • Δiϕ\Delta_i^{\phi}: Average age of job completion for user i
  • SϕS^{\phi}: Average sampling cost

Constraints: Queue stability (positive recurrent Markov chain)

Model Architecture

1. Adaptive Randomized Policy (ϕ₁)

Definition (Definition 1): The strategy is characterized by set Π = {(μ(S), π(S)) : S ⊆ N, S ≠ ∅}, where:

  • μ(S) ∈ (0,1]: Sampling probability when the set of non-empty queues is S
  • π(S) = (π₁(S),...,πₙ(S)): Scheduling probability distribution
    • πᵢ(S) > 0 if and only if i ∈ S
    • ∑ᵢ∈S πᵢ(S) = 1

Construction Method:

  1. For each non-empty subset S ⊆ N, consider permanent backlog scenario
  2. Use Theorems 1 and 2 to obtain closed-form expressions for total cost upper bounds
  3. Solve optimization problem (12): minϕkSΔkϕ(S)+Subϕ(S)\min_{\phi} \sum_{k\in S}\Delta_k^{\phi}(S) + S_{ub}^{\phi}(S) subject to: μ ∈ (0,1), πₖ ∈ (0,1), ∑ₖ∈S πₖ = 1
  4. Obtain locally optimal solution (μ*(S), π*(S))
  5. Aggregate solutions from all subsets to form strategy set Πc

Key Formula (Theorem 1): Average age for user k: Δkϕ(S)=1(sq+2(1μ1)+ηˉ)(ψk2πk+(1qk+1sq2)ψk+)+1\Delta_k^{\phi}(S) = \frac{1}{\left(\frac{s}{q}+2\left(\frac{1}{\mu}-1\right)+\bar{\eta}\right)\left(\frac{\psi_k^2}{\pi_k}+\left(\frac{1}{q_k}+\frac{1-s}{q}-2\right)\psi_k+\ldots\right)} + 1

where ηˉ=iSπiqi\bar{\eta} = \sum_{i\in S}\frac{\pi_i}{q_i}, ψk=ηk+πk+2(1μ1)+sq\psi_k = \eta_k + \pi_k + 2(\frac{1}{\mu}-1) + \frac{s}{q}

Sampling Cost Upper Bound (Theorem 2): Subϕ(S)=(L+1)μp(11μp+ηˉ)S_{ub}^{\phi}(S) = \frac{(L+1)\mu}{p^*}\left(\frac{1}{\frac{1}{\mu}p^* + \bar{\eta}}\right)

where p=q1(12q)(1μ)p^* = \frac{q}{1-(1-2q)(1-\mu)}

2. Max-Age Policy with Adaptive Randomized Sampling (ϕ̄₁)

Scheduling Strategy: πᴹᴬ(S) selects the user with maximum job completion age in set S at each slot (equivalent to round-robin policy)

Sampling Strategy: Adaptive randomized sampling characterized by set Π̄c = ∪_{S⊆N,S≠∅}{μ̄*(S)}

Key Formula (Theorem 3): Average age for user k: Δkϕˉ(S)=N(β2β12)+iS1qiqi22(Nβ1+iS1qiqi)+12(Nβ1+iS1qiqi+1)\Delta_k^{\bar{\phi}}(S) = \frac{N(\beta_2 - \beta_1^2) + \sum_{i\in S}\frac{1-q_i}{q_i^2}}{2\left(N\beta_1 + \sum_{i\in S}\frac{1-q_i}{q_i}\right)} + \frac{1}{2}\left(N\beta_1 + \sum_{i\in S}\frac{1-q_i}{q_i} + 1\right)

where:

  • β1=1μˉ((1μˉ)+sμˉq+1)\beta_1 = \frac{1}{\bar{\mu}}\left((1-\bar{\mu}) + \frac{s}{\bar{\mu}q} + 1\right)
  • β2=21μˉμˉ(s1μˉα2+αsμˉ(1s)+3)+β1\beta_2 = 2\frac{1-\bar{\mu}}{\bar{\mu}}\left(\frac{s}{1-\bar{\mu}}\alpha^2 + \alpha - s - \bar{\mu}(1-s) + 3\right) + \beta_1
  • α=1μˉ+μˉq\alpha = 1-\bar{\mu}+\frac{\bar{\mu}}{q}

Sampling Cost (Theorem 4): Sϕˉ(S)=μˉNLN((1μˉ)+sμˉq+1)+μˉiS1qiqi1(p1+(1p1)(1+p)p)S^{\bar{\phi}}(S) = \frac{\bar{\mu}NL}{N\left((1-\bar{\mu}) + \frac{s\bar{\mu}}{q} + 1\right) + \bar{\mu}\sum_{i\in S}\frac{1-q_i}{q_i}}\cdot\frac{1}{\left(p_1^* + \frac{(1-p_1^*)(1+p^*)}{p^*}\right)}

where p1=s(1μˉ)p+(1s)(1(1μˉ)p)p_1^* = s(1-\bar{\mu})p^* + (1-s)(1-(1-\bar{\mu})p^*)

Technical Innovations

  1. Introduction of Age of Job Completion Metric:
    • Definition: vᵢ(t) = t - sup{t' : t' < t, bᵢ(t') = 1} (time since last job completion)
    • Advantage: Directly reflects job completion frequency, equivalent to maximizing jobs completed per unit time
    • Distinction from AoII: AoII focuses on information correctness, while job completion age focuses on throughput
  2. Adaptive Randomized Policy Design:
    • Differs from traditional fixed-probability randomized policies
    • Dynamically adjusts sampling and scheduling probabilities based on non-empty queue set
    • Efficiently utilizes system resources (does not schedule empty queue users)
    • Mathematically tractable (each subset corresponds to fixed randomized policy)
  3. Permanent Backlog Analysis Method:
    • Assumes queues in subset S remain permanently non-empty
    • Derives closed-form cost expressions
    • Obtains optimal strategy parameters for that subset through optimization
    • Aggregates solutions from all subsets to form complete adaptive policy
  4. Queue Stability Sufficient Conditions:
    • Proposition 1: Exponential-level condition for adaptive randomized policy
    • Corollary 1: Single but more conservative sufficient condition
    • Proposition 2: Sufficient condition for max-age policy
    • Introduces χ(q,s) function to characterize lower bound of machine idle probability

Experimental Setup

Numerical Experiment Parameters

System Configuration:

  • Number of users: N = 4
  • Machine parameters:
    • State transition probability: q ∈ 0.1, 0.9 (varying parameter)
    • Probability of internally busy after completion: s = 0.5 (some experiments) or s = 0.3
  • Sampling cost: L = 5
  • Service rate vector: q̄ = 0.1, 0.4, 0.6, 0.9 (some experiments) or other configurations

Arrival Rate Configuration:

  • Low arrival rate: p = 0.01, 0.02, 0.05, 0.06
  • High arrival rate: p̃ = 0.05, 0.2, 0.5, 0.6
  • Stability testing: Multiple configurations

Evaluation Metrics

  1. Total Average Cost: Δ^φ + S^φ
    • Includes average age of job completion and average sampling cost
    • Primary performance metric
  2. Queue Stability:
    • Observed through numerical simulation of queue length process
    • Validates theoretical stability conditions

Comparison Methods

  • ϕ₁: Adaptive randomized policy
  • ϕ̄₁: Max-age scheduling + adaptive randomized sampling

Implementation Details

  • Optimization problems (12) and (16) solved through numerical methods for local optima
  • Optimization performed over all 2^N - 1 non-empty subsets
  • Markov chain simulation used to evaluate strategy performance
  • Long-duration simulation to obtain steady-state behavior

Experimental Results

Main Results

Figure 4: Total Cost Variation with q

Under configuration N=4, s=0.5, L=5, q̄=0.1, 0.4, 0.6, 0.9:

  1. Low Arrival Rate p = 0.01, 0.02, 0.05, 0.06:
    • Both strategies perform similarly
    • Total cost decreases as q increases
    • Reason: Under low arrival rates, both work-conserving strategies effectively handle jobs
  2. High Arrival Rate p̃ = 0.05, 0.2, 0.5, 0.6:
    • ϕ̄₁ (max-age policy) significantly outperforms ϕ₁ (adaptive randomized policy)
    • Clear performance gap (approximately 10-20 units)
    • Both strategies' costs decrease as q increases
    • Reason: Under high load, deterministic round-robin is more efficient than random scheduling
  3. Trend Analysis:
    • Larger q (faster state transitions) yields better system performance
    • Strategy selection is more critical under high arrival rates

Queue Stability Experiments

Case 1: Unstable

  • Parameters: N=4, q=0.35, s=0.3, L=5
  • Service rates: q̄ = 0.55, 0.73, 0.84, 0.91
  • Arrival rates: p = 0.09, 0.09, 0.12, 0.14
  • Result: Conditions in Propositions 1 and 2 not satisfied; numerical verification confirms queue instability
  • Implication: Arrival rates are too high relative to service rates

Case 2: Stable but Conditions Not Satisfied

  • Parameters: N=4, q=0.5, s=0.5, L=5
  • Service rates: q̄ = 0.4, 0.6, 0.8, 0.94
  • Arrival rates: p = 0.04, 0.05, 0.06, 0.06
  • Result: Sufficient conditions not satisfied, but numerical results show queues stable under both strategies
  • Implication: Proposed sufficient conditions are conservative (sufficient but not necessary)

Experimental Findings

  1. Strategy Performance:
    • Max-age policy shows clear advantages under high load
    • Strategy differences are minimal under low load
    • Both strategies are work-conserving
  2. Stability Conditions:
    • Queues are stable only when user arrival rates are significantly lower than service rates
    • Theoretical sufficient conditions are conservative
    • Cases exist where conditions are not satisfied but systems remain stable
  3. System Parameter Impact:
    • State transition probability q significantly affects performance
    • Arrival rate configuration determines the importance of strategy selection

Markov Machine Tracking

  1. 5 AoII Metric:
    • Introduces AoII metric for Markov machines
    • Focuses on tracking performance rather than job completion
    • This paper's metric more directly targets throughput
  2. 6 Multi-Machine Networks:
    • Uses Binary Freshness (BF), False Rejection Rate (FRR), False Acceptance Rate (FAR)
    • Does not consider job queues
    • This paper considers queue stability
  3. 9 Fatigued Workers:
    • Studies scenarios where worker efficiency depends on state
    • Optimizes sampling rate allocation
    • Does not consider queue dynamics

Job Assignment and Queuing

  1. 8 Revenue Maximization:
    • Single buffer (stores at most one job)
    • This paper considers infinite queues
  2. 10 MDP Approach:
    • Discounted MDP framework
    • Finite queues; oldest job replaced when queue is full
    • Does not directly maximize average completed jobs
  3. 7 No-Queue Scenario:
    • Jobs discarded when machine is busy
    • Maximizes acceptance probability
    • This paper ensures acceptance probability of 1 (sample before submit)

Advantages of This Work

  • First to provide queue stability theoretical results for Markov machine job offloading
  • Considers practical infinite queue scenarios
  • Introduces more suitable age of job completion metric
  • Provides closed-form performance expressions

Conclusions and Discussion

Main Conclusions

  1. Metric Innovation: Age of job completion effectively captures throughput objectives in job offloading systems
  2. Strategy Design:
    • Adaptive randomized policy constructed through subsystem optimization
    • Max-age policy achieves better performance under high load
    • Both strategies can guarantee queue stability
  3. Stability Theory:
    • Provides sufficient conditions (Propositions 1-2)
    • Conditions relate arrival rates, service rates, and machine parameters
    • Sufficient but not necessary (exhibits conservatism)
  4. Performance Insights:
    • Strategy differences are minimal under low load
    • Deterministic scheduling outperforms random scheduling under high load
    • Machine state transition speed significantly affects performance

Limitations

  1. Symmetry Assumptions:
    • Currently only considers binary symmetric Markov chains (equal transition probabilities)
    • Real systems may be asymmetric
  2. Conservative Stability Conditions:
    • Sufficient conditions are strict
    • Requires checking exponential-level (2^N - 1) conditions
    • Single condition (Corollary 1) is more conservative
  3. Local Optimality:
    • Optimization problems (12) and (16) only find local optima
    • Better solutions may exist
  4. Lack of Necessity Analysis:
    • No necessary conditions for stability provided
    • Gap between sufficient and necessary conditions not quantified
  5. Proof Omission:
    • All proofs deferred to journal version due to space constraints
    • Affects result verifiability

Future Directions

  1. Asymmetric Markov Chains: Extend to general state transition probabilities
  2. Necessary Conditions: Derive necessary conditions for queue stability to narrow the gap between sufficient and necessary conditions
  3. Global Optimality: Study global optimal solutions or approximation algorithms for optimization problems
  4. Heterogeneous Users: Consider user priorities and different QoS requirements
  5. Multi-Machine Scenarios: Extend to machine networks
  6. Real System Validation: Test on actual edge computing platforms

In-Depth Evaluation

Strengths

  1. Significant Theoretical Contributions:
    • First to provide queue stability theory for Markov machine job offloading
    • Closed-form performance expressions (Theorems 1-4) have theoretical value
    • Mathematical derivations are rigorous (though proofs are omitted)
  2. Reasonable Metric Design:
    • Age of job completion is intuitive and effective
    • Directly corresponds to throughput maximization objective
    • More suitable for job offloading than existing AoII, BF metrics
  3. Innovative Strategy Design:
    • Adaptive randomized policy balances flexibility and analyzability
    • Max-age policy is simple and efficient
    • Two strategies cover both randomized and deterministic approaches
  4. Practical Problem Modeling:
    • Accounts for sampling costs
    • Infinite queues better reflect long-term system operation
    • Markov machine model suitable for edge computing
  5. Reasonable Experimental Design:
    • Compares different load scenarios
    • Validates stability theory
    • Reveals conservatism of sufficient conditions

Weaknesses

  1. Missing Proofs:
    • All theorem and proposition proofs are omitted
    • Severely impacts result verifiability and credibility
    • Readers cannot understand derivation approaches
  2. Insufficient Experiments:
    • Only considers small-scale systems with N=4 users
    • Lacks scalability analysis for large systems
    • No quantitative comparison with other literature methods
    • Missing statistical significance tests
  3. Unclear Optimization Methods:
    • Does not explain how to numerically solve (12) and (16)
    • Local optima may affect strategy performance
    • Computational complexity not discussed
  4. Incomplete Stability Analysis:
    • Only provides sufficient conditions, no necessary conditions
    • Tightness of conditions not analyzed
    • Gap between sufficient and necessary conditions not quantified
  5. Restrictive Assumptions:
    • Binary symmetric Markov chain assumption is strong
    • Geometric service time distribution may not match reality
    • Does not consider communication delays, transmission errors, and other practical factors
  6. Writing Issues:
    • Some notation definitions lack clarity (e.g., xa(t) used infrequently)
    • Figures 2 and 3 could have more detailed explanations
    • Missing algorithm pseudocode

Impact

  1. Theoretical Impact:
    • Establishes stability theory framework for Markov machine job offloading
    • Age of job completion metric may be adopted by subsequent work
    • Adaptive randomized strategy design approach is inspirational
  2. Practical Value:
    • Applicable to edge computing job offloading scenarios
    • Strategy design can guide practical systems
    • Stability conditions provide design guidelines
  3. Limitations:
    • Requires complete proofs for full assessment
    • Small-scale experiments limit persuasiveness
    • Practical deployment requires addressing more engineering issues

Applicable Scenarios

  1. Edge Computing:
    • Multiple users sharing edge servers
    • Scenarios requiring state sampling
    • Jobs can queue for service
  2. Cloud Computing Resource Scheduling:
    • Virtual machine state uncertainty
    • Multi-tenant resource competition
  3. IoT Task Offloading:
    • Randomly changing device states
    • Non-zero sampling costs
  4. Inapplicable Scenarios:
    • Extremely stringent real-time requirements (queue waiting unacceptable)
    • Completely observable states (no sampling cost)
    • Non-queueable jobs (must be processed immediately or discarded)

References

This paper primarily references the following key works:

  1. 5 Banerjee & Ulukus (2025): Markov machine tracking and job assignment, introduces AoII metric
  2. 6 Liyanaarachchi & Ulukus (2025): Optimal monitoring and job assignment for multiple Markov machines
  3. 10 Chamoun et al. (2025): Edge server monitoring using MAPPO, MDP approach
  4. 11 Kadota et al. (2018): Scheduling policies minimizing information age in broadcast wireless networks
  5. 12 Tassiulas & Ephremides (1990): Stability properties of constrained queuing systems
  6. 13 Neely (2010): Stochastic network optimization, strong stability definitions

Overall Assessment: This paper makes important theoretical contributions to the Markov machine job offloading domain, particularly by providing the first queue stability analysis. The introduction of the age of job completion metric is innovative, and both strategy designs are reasonable. However, the omission of proofs, limited experimental scale, and conservatism of stability conditions are notable shortcomings. The authors are encouraged to promptly provide a complete journal version including detailed proofs, large-scale experiments, and necessary condition analysis to fully demonstrate the work's value.