2025-11-16T15:01:13.538180

Age of Job Completion Minimization with Stable Queues

Mitrolaris, Banerjee, Ulukus

We consider a time-slotted job-assignment system with a central server, N users and a machine which changes its state according to a Markov chain (hence called a Markov machine). The users submit their jobs to the central server according to a stochastic job arrival process. For each user, the server has a dedicated job queue. Upon receiving a job from a user, the server stores that job in the corresponding queue. When the machine is not working on a job assigned by the server, the machine can be either in internally busy or in free state, and the dynamics of these states follow a binary symmetric Markov chain. Upon sampling the state information of the machine, if the server identifies that the machine is in the free state, it schedules a user and submits a job to the machine from the job queue of the scheduled user. To maximize the number of jobs completed per unit time, we introduce a new metric, referred to as the age of job completion. To minimize the age of job completion and the sampling cost, we propose two policies and numerically evaluate their performance. For both of these policies, we find sufficient conditions under which the job queues will remain stable.

academic

Age of Job Completion Minimization with Stable Queues

Basic Information

Paper ID: 2511.04630
Title: Age of Job Completion Minimization with Stable Queues
Authors: Stavros Mitrolaris, Subhankar Banerjee, Sennur Ulukus (University of Maryland, College Park)
Classification: cs.IT, cs.NI, cs.SY, eess.SP, eess.SY, math.IT, math.PR
Publication Date: November 6, 2025 (arXiv preprint)
Paper Link: https://arxiv.org/abs/2511.04630

Abstract

This paper investigates a slotted job assignment system comprising a central server, N users, and a machine whose state evolves according to a Markov chain (referred to as a Markov machine). Users submit jobs to the central server according to a stochastic job arrival process, and the server maintains dedicated job queues for each user. When the machine is not processing a job assigned by the server, it may be in an internal busy or idle state, with state dynamics following a binary symmetric Markov chain. The server samples machine state information and schedules users to submit jobs upon identifying machine idleness. To maximize the number of jobs completed per unit time, the authors introduce a novel metric called "Age of Job Completion." To minimize the age of job completion and sampling costs, two strategies are proposed and their performance is numerically evaluated. Sufficient conditions guaranteeing queue stability for both strategies are derived.

Research Background and Motivation

1. Problem Statement

This paper addresses the job offloading problem in edge computing scenarios where multiple users compete for a shared edge computing device (Markov machine). Core challenges include:

Uncertainty in machine state (idle/internally busy)
Cost of state sampling
Job scheduling under multi-user competition
Queue stability guarantees

2. Problem Significance

In critical mission control applications such as intelligent surveillance, edge devices are typically shared by multiple users or servers, each independently generating computational tasks. This is important because:

Job offloading processes are inherently stochastic
Edge device availability is highly uncertain
Efficient tracking of device operational status is needed
Timely task offloading is required to ensure system performance

3. Limitations of Existing Approaches

Existing literature has the following shortcomings:

AoII metric in 5: Does not directly target maximizing completed jobs
Binary Freshness (BF), False Rejection Rate (FRR), False Acceptance Rate (FAR) in 6: Similarly fail to directly capture the job completion maximization objective
Methods in 7,8,10: Do not consider infinite queues and lack queue stability analysis
Most research: Either lacks job queues or assumes finite queue capacity, unsuitable for practical long-term operation scenarios

4. Research Motivation

Introduce a metric that more directly reflects job completion efficiency
Consider practical scenarios with infinite queue capacity
Provide theoretical guarantees for queue stability
Balance job completion efficiency with sampling costs

Core Contributions

Proposes novel metric "Age of Job Completion": Directly reflects the number of jobs completed per unit time, more suitable for job offloading systems than existing AoII, BF, and other metrics
Designs two strategy pairs:
- Adaptive Randomized Policy (ϕ₁)
- Max-Age Policy with Adaptive Randomized Sampling (ϕ̄₁)
Theoretical stability analysis: Derives sufficient conditions for queue stability for both strategies (Propositions 1 and 2), providing the first stability results for Markov machine job offloading research
Closed-form expressions: Provides closed-form expressions for average age of job completion and sampling costs under permanent backlog conditions for fixed subsystems (Theorems 1-4)
Numerical evaluation: Validates strategy performance through numerical experiments, finding that the max-age policy significantly outperforms the adaptive randomized policy under high arrival rates

Methodology Details

Task Definition

System Model:

Input: N users, each submitting a job with probability pᵢ at the end of each slot (i.i.d. Bernoulli process)
State Space: Machine state x(t) ∈ {-1, 0, 1, ..., N, fr}
- -1: Internally busy
- 0: State unknown after job completion
- i ∈ {1,...,N}: Processing job from user i
- fr: Idle
Markov Dynamics: Transition probability between idle and internally busy states is q (binary symmetric)
Service Time: Job from user i follows geometric distribution with parameter qᵢ
After Job Completion: Transitions to internally busy with probability s, to idle with probability (1-s)

Decision Variables:

Sampling Decision μ(t) ∈ {0,1}: Whether to sample machine state at slot t (cost L)
Scheduling Decision π(t) = (π₁(t),...,πₙ(t)): Which user's job to select

Optimization Objective: Minimize total average cost: $\Delta^{\phi} + S^{\phi} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i^{\phi} + S^{\phi}$

where:

$\Delta_i^{\phi}$ : Average age of job completion for user i
$S^{\phi}$ : Average sampling cost

Constraints: Queue stability (positive recurrent Markov chain)

Model Architecture

1. Adaptive Randomized Policy (ϕ₁)

Definition (Definition 1): The strategy is characterized by set Π = {(μ(S), π(S)) : S ⊆ N, S ≠ ∅}, where:

μ(S) ∈ (0,1]: Sampling probability when the set of non-empty queues is S
π(S) = (π₁(S),...,πₙ(S)): Scheduling probability distribution
- πᵢ(S) > 0 if and only if i ∈ S
- ∑ᵢ∈S πᵢ(S) = 1

Construction Method:

For each non-empty subset S ⊆ N, consider permanent backlog scenario
Use Theorems 1 and 2 to obtain closed-form expressions for total cost upper bounds
Solve optimization problem (12): $\min_{\phi} \sum_{k\in S}\Delta_k^{\phi}(S) + S_{ub}^{\phi}(S)$ subject to: μ ∈ (0,1), πₖ ∈ (0,1), ∑ₖ∈S πₖ = 1
Obtain locally optimal solution (μ*(S), π*(S))
Aggregate solutions from all subsets to form strategy set Πc

Key Formula (Theorem 1): Average age for user k: $\Delta_k^{\phi}(S) = \frac{1}{\left(\frac{s}{q}+2\left(\frac{1}{\mu}-1\right)+\bar{\eta}\right)\left(\frac{\psi_k^2}{\pi_k}+\left(\frac{1}{q_k}+\frac{1-s}{q}-2\right)\psi_k+\ldots\right)} + 1$

where $\bar{\eta} = \sum_{i\in S}\frac{\pi_i}{q_i}$ , $\psi_k = \eta_k + \pi_k + 2(\frac{1}{\mu}-1) + \frac{s}{q}$

Sampling Cost Upper Bound (Theorem 2): $S_{ub}^{\phi}(S) = \frac{(L+1)\mu}{p^*}\left(\frac{1}{\frac{1}{\mu}p^* + \bar{\eta}}\right)$

where $p^* = \frac{q}{1-(1-2q)(1-\mu)}$

2. Max-Age Policy with Adaptive Randomized Sampling (ϕ̄₁)

Scheduling Strategy: πᴹᴬ(S) selects the user with maximum job completion age in set S at each slot (equivalent to round-robin policy)

Sampling Strategy: Adaptive randomized sampling characterized by set Π̄c = ∪_{S⊆N,S≠∅}{μ̄*(S)}

Key Formula (Theorem 3): Average age for user k: $\Delta_k^{\bar{\phi}}(S) = \frac{N(\beta_2 - \beta_1^2) + \sum_{i\in S}\frac{1-q_i}{q_i^2}}{2\left(N\beta_1 + \sum_{i\in S}\frac{1-q_i}{q_i}\right)} + \frac{1}{2}\left(N\beta_1 + \sum_{i\in S}\frac{1-q_i}{q_i} + 1\right)$

where:

$\beta_1 = \frac{1}{\bar{\mu}}\left((1-\bar{\mu}) + \frac{s}{\bar{\mu}q} + 1\right)$
$\beta_2 = 2\frac{1-\bar{\mu}}{\bar{\mu}}\left(\frac{s}{1-\bar{\mu}}\alpha^2 + \alpha - s - \bar{\mu}(1-s) + 3\right) + \beta_1$
$\alpha = 1-\bar{\mu}+\frac{\bar{\mu}}{q}$

Sampling Cost (Theorem 4): $S^{\bar{\phi}}(S) = \frac{\bar{\mu}NL}{N\left((1-\bar{\mu}) + \frac{s\bar{\mu}}{q} + 1\right) + \bar{\mu}\sum_{i\in S}\frac{1-q_i}{q_i}}\cdot\frac{1}{\left(p_1^* + \frac{(1-p_1^*)(1+p^*)}{p^*}\right)}$

where $p_1^* = s(1-\bar{\mu})p^* + (1-s)(1-(1-\bar{\mu})p^*)$

Technical Innovations

Introduction of Age of Job Completion Metric:
- Definition: vᵢ(t) = t - sup{t' : t' < t, bᵢ(t') = 1} (time since last job completion)
- Advantage: Directly reflects job completion frequency, equivalent to maximizing jobs completed per unit time
- Distinction from AoII: AoII focuses on information correctness, while job completion age focuses on throughput
Adaptive Randomized Policy Design:
- Differs from traditional fixed-probability randomized policies
- Dynamically adjusts sampling and scheduling probabilities based on non-empty queue set
- Efficiently utilizes system resources (does not schedule empty queue users)
- Mathematically tractable (each subset corresponds to fixed randomized policy)
Permanent Backlog Analysis Method:
- Assumes queues in subset S remain permanently non-empty
- Derives closed-form cost expressions
- Obtains optimal strategy parameters for that subset through optimization
- Aggregates solutions from all subsets to form complete adaptive policy
Queue Stability Sufficient Conditions:
- Proposition 1: Exponential-level condition for adaptive randomized policy
- Corollary 1: Single but more conservative sufficient condition
- Proposition 2: Sufficient condition for max-age policy
- Introduces χ(q,s) function to characterize lower bound of machine idle probability

Experimental Setup

Numerical Experiment Parameters

System Configuration:

Number of users: N = 4
Machine parameters:
- State transition probability: q ∈ 0.1, 0.9 (varying parameter)
- Probability of internally busy after completion: s = 0.5 (some experiments) or s = 0.3
Sampling cost: L = 5
Service rate vector: q̄ = 0.1, 0.4, 0.6, 0.9 (some experiments) or other configurations

Arrival Rate Configuration:

Low arrival rate: p = 0.01, 0.02, 0.05, 0.06
High arrival rate: p̃ = 0.05, 0.2, 0.5, 0.6
Stability testing: Multiple configurations

Evaluation Metrics

Total Average Cost: Δ^φ + S^φ
- Includes average age of job completion and average sampling cost
- Primary performance metric
Queue Stability:
- Observed through numerical simulation of queue length process
- Validates theoretical stability conditions

Comparison Methods

ϕ₁: Adaptive randomized policy
ϕ̄₁: Max-age scheduling + adaptive randomized sampling

Implementation Details

Optimization problems (12) and (16) solved through numerical methods for local optima
Optimization performed over all 2^N - 1 non-empty subsets
Markov chain simulation used to evaluate strategy performance
Long-duration simulation to obtain steady-state behavior

Experimental Results

Main Results

Figure 4: Total Cost Variation with q

Under configuration N=4, s=0.5, L=5, q̄=0.1, 0.4, 0.6, 0.9:

Low Arrival Rate p = 0.01, 0.02, 0.05, 0.06:
- Both strategies perform similarly
- Total cost decreases as q increases
- Reason: Under low arrival rates, both work-conserving strategies effectively handle jobs
High Arrival Rate p̃ = 0.05, 0.2, 0.5, 0.6:
- ϕ̄₁ (max-age policy) significantly outperforms ϕ₁ (adaptive randomized policy)
- Clear performance gap (approximately 10-20 units)
- Both strategies' costs decrease as q increases
- Reason: Under high load, deterministic round-robin is more efficient than random scheduling
Trend Analysis:
- Larger q (faster state transitions) yields better system performance
- Strategy selection is more critical under high arrival rates

Queue Stability Experiments

Case 1: Unstable

Parameters: N=4, q=0.35, s=0.3, L=5
Service rates: q̄ = 0.55, 0.73, 0.84, 0.91
Arrival rates: p = 0.09, 0.09, 0.12, 0.14
Result: Conditions in Propositions 1 and 2 not satisfied; numerical verification confirms queue instability
Implication: Arrival rates are too high relative to service rates

Case 2: Stable but Conditions Not Satisfied

Parameters: N=4, q=0.5, s=0.5, L=5
Service rates: q̄ = 0.4, 0.6, 0.8, 0.94
Arrival rates: p = 0.04, 0.05, 0.06, 0.06
Result: Sufficient conditions not satisfied, but numerical results show queues stable under both strategies
Implication: Proposed sufficient conditions are conservative (sufficient but not necessary)

Experimental Findings

Strategy Performance:
- Max-age policy shows clear advantages under high load
- Strategy differences are minimal under low load
- Both strategies are work-conserving
Stability Conditions:
- Queues are stable only when user arrival rates are significantly lower than service rates
- Theoretical sufficient conditions are conservative
- Cases exist where conditions are not satisfied but systems remain stable
System Parameter Impact:
- State transition probability q significantly affects performance
- Arrival rate configuration determines the importance of strategy selection

Markov Machine Tracking

5 AoII Metric:
- Introduces AoII metric for Markov machines
- Focuses on tracking performance rather than job completion
- This paper's metric more directly targets throughput
6 Multi-Machine Networks:
- Uses Binary Freshness (BF), False Rejection Rate (FRR), False Acceptance Rate (FAR)
- Does not consider job queues
- This paper considers queue stability
9 Fatigued Workers:
- Studies scenarios where worker efficiency depends on state
- Optimizes sampling rate allocation
- Does not consider queue dynamics

Job Assignment and Queuing

8 Revenue Maximization:
- Single buffer (stores at most one job)
- This paper considers infinite queues
10 MDP Approach:
- Discounted MDP framework
- Finite queues; oldest job replaced when queue is full
- Does not directly maximize average completed jobs
7 No-Queue Scenario:
- Jobs discarded when machine is busy
- Maximizes acceptance probability
- This paper ensures acceptance probability of 1 (sample before submit)

Advantages of This Work

First to provide queue stability theoretical results for Markov machine job offloading
Considers practical infinite queue scenarios
Introduces more suitable age of job completion metric
Provides closed-form performance expressions

Conclusions and Discussion

Main Conclusions

Metric Innovation: Age of job completion effectively captures throughput objectives in job offloading systems
Strategy Design:
- Adaptive randomized policy constructed through subsystem optimization
- Max-age policy achieves better performance under high load
- Both strategies can guarantee queue stability
Stability Theory:
- Provides sufficient conditions (Propositions 1-2)
- Conditions relate arrival rates, service rates, and machine parameters
- Sufficient but not necessary (exhibits conservatism)
Performance Insights:
- Strategy differences are minimal under low load
- Deterministic scheduling outperforms random scheduling under high load
- Machine state transition speed significantly affects performance

Limitations

Symmetry Assumptions:
- Currently only considers binary symmetric Markov chains (equal transition probabilities)
- Real systems may be asymmetric
Conservative Stability Conditions:
- Sufficient conditions are strict
- Requires checking exponential-level (2^N - 1) conditions
- Single condition (Corollary 1) is more conservative
Local Optimality:
- Optimization problems (12) and (16) only find local optima
- Better solutions may exist
Lack of Necessity Analysis:
- No necessary conditions for stability provided
- Gap between sufficient and necessary conditions not quantified
Proof Omission:
- All proofs deferred to journal version due to space constraints
- Affects result verifiability

Future Directions

Asymmetric Markov Chains: Extend to general state transition probabilities
Necessary Conditions: Derive necessary conditions for queue stability to narrow the gap between sufficient and necessary conditions
Global Optimality: Study global optimal solutions or approximation algorithms for optimization problems
Heterogeneous Users: Consider user priorities and different QoS requirements
Multi-Machine Scenarios: Extend to machine networks
Real System Validation: Test on actual edge computing platforms

In-Depth Evaluation

Strengths

Significant Theoretical Contributions:
- First to provide queue stability theory for Markov machine job offloading
- Closed-form performance expressions (Theorems 1-4) have theoretical value
- Mathematical derivations are rigorous (though proofs are omitted)
Reasonable Metric Design:
- Age of job completion is intuitive and effective
- Directly corresponds to throughput maximization objective
- More suitable for job offloading than existing AoII, BF metrics
Innovative Strategy Design:
- Adaptive randomized policy balances flexibility and analyzability
- Max-age policy is simple and efficient
- Two strategies cover both randomized and deterministic approaches
Practical Problem Modeling:
- Accounts for sampling costs
- Infinite queues better reflect long-term system operation
- Markov machine model suitable for edge computing
Reasonable Experimental Design:
- Compares different load scenarios
- Validates stability theory
- Reveals conservatism of sufficient conditions

Weaknesses

Missing Proofs:
- All theorem and proposition proofs are omitted
- Severely impacts result verifiability and credibility
- Readers cannot understand derivation approaches
Insufficient Experiments:
- Only considers small-scale systems with N=4 users
- Lacks scalability analysis for large systems
- No quantitative comparison with other literature methods
- Missing statistical significance tests
Unclear Optimization Methods:
- Does not explain how to numerically solve (12) and (16)
- Local optima may affect strategy performance
- Computational complexity not discussed
Incomplete Stability Analysis:
- Only provides sufficient conditions, no necessary conditions
- Tightness of conditions not analyzed
- Gap between sufficient and necessary conditions not quantified
Restrictive Assumptions:
- Binary symmetric Markov chain assumption is strong
- Geometric service time distribution may not match reality
- Does not consider communication delays, transmission errors, and other practical factors
Writing Issues:
- Some notation definitions lack clarity (e.g., xa(t) used infrequently)
- Figures 2 and 3 could have more detailed explanations
- Missing algorithm pseudocode

Impact

Theoretical Impact:
- Establishes stability theory framework for Markov machine job offloading
- Age of job completion metric may be adopted by subsequent work
- Adaptive randomized strategy design approach is inspirational
Practical Value:
- Applicable to edge computing job offloading scenarios
- Strategy design can guide practical systems
- Stability conditions provide design guidelines
Limitations:
- Requires complete proofs for full assessment
- Small-scale experiments limit persuasiveness
- Practical deployment requires addressing more engineering issues

Applicable Scenarios

Edge Computing:
- Multiple users sharing edge servers
- Scenarios requiring state sampling
- Jobs can queue for service
Cloud Computing Resource Scheduling:
- Virtual machine state uncertainty
- Multi-tenant resource competition
IoT Task Offloading:
- Randomly changing device states
- Non-zero sampling costs
Inapplicable Scenarios:
- Extremely stringent real-time requirements (queue waiting unacceptable)
- Completely observable states (no sampling cost)
- Non-queueable jobs (must be processed immediately or discarded)

References

This paper primarily references the following key works:

5 Banerjee & Ulukus (2025): Markov machine tracking and job assignment, introduces AoII metric
6 Liyanaarachchi & Ulukus (2025): Optimal monitoring and job assignment for multiple Markov machines
10 Chamoun et al. (2025): Edge server monitoring using MAPPO, MDP approach
11 Kadota et al. (2018): Scheduling policies minimizing information age in broadcast wireless networks
12 Tassiulas & Ephremides (1990): Stability properties of constrained queuing systems
13 Neely (2010): Stochastic network optimization, strong stability definitions

Overall Assessment: This paper makes important theoretical contributions to the Markov machine job offloading domain, particularly by providing the first queue stability analysis. The introduction of the age of job completion metric is innovative, and both strategy designs are reasonable. However, the omission of proofs, limited experimental scale, and conservatism of stability conditions are notable shortcomings. The authors are encouraged to promptly provide a complete journal version including detailed proofs, large-scale experiments, and necessary condition analysis to fully demonstrate the work's value.