Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices

Chakraborty, Asrar, Sengupta et al.
5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while maximizing Network Service Provider (NSP) profits. However, existing Deep Reinforcement Learning (DRL) frameworks focus primarily on profit optimization without explicitly accounting for service delay, potentially leading to QoS violations for latency-sensitive slices. Moreover, commonly used epsilon-greedy exploration of DRL often results in unstable convergence and suboptimal policy learning. To address these gaps, we propose DePSAC -- a Delay and Profit-aware Slice Admission Control scheme. Our DRL-based approach incorporates a delay-aware reward function, where penalties due to service delay incentivize the prioritization of latency-critical slices such as URLLC. Additionally, we employ Boltzmann exploration to achieve smoother and faster convergence. We implement and evaluate DePSAC on a simulated 5G core network substrate with realistic Network Slice Request (NSLR) arrival patterns. Experimental results demonstrate that our method outperforms the DSARA baseline in terms of overall profit, reduced URLLC slice delays, improved acceptance rates, and improved resource consumption. These findings validate the effectiveness of the proposed DePSAC in achieving better QoS-profit trade-offs for practical 5G network slicing scenarios.

Basic Information

  • Paper ID: 2510.08769
  • Title: Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices
  • Authors: Proggya Chakraborty, Aaquib Asrar, Jayasree Sengupta, Sipra Das Bit
  • Categories: cs.NI (Networking and Internet Architecture), cs.LG (Machine Learning), cs.PF (Performance)
  • Submission Date: October 9, 2025 (arXiv)
  • Paper Link: https://arxiv.org/abs/2510.08769v1

Abstract

This paper proposes DePSAC (Delay and Profit-aware Slice Admission Control), a deep reinforcement learning-based solution for admission control in 5G network slicing. The scheme simultaneously maximizes network service provider (NSP) profit while explicitly considering service latency, with particular emphasis on prioritizing ultra-reliable low-latency communication (URLLC) slices. The approach employs a delay-aware reward function and Boltzmann exploration strategy, validated on a simulated 5G core network demonstrating improvements over the baseline DSARA method in profit, latency, acceptance rate, and resource consumption.

Research Background and Motivation

Problem Definition

5G networks support diverse services through network slicing technology, including enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC). These services have heterogeneous QoS requirements, necessitating intelligent admission control and resource allocation strategies to balance strict QoS demands with NSP profitability.

Problem Significance

  1. Service Diversity Challenge: Different slice types have varying requirements for latency, reliability, and bandwidth
  2. Resource Optimization Need: Limited physical resources must be efficiently allocated across multiple virtual networks
  3. Business Viability: NSPs must ensure profitability while meeting QoS requirements

Limitations of Existing Methods

  1. Neglect of Latency Factors: Existing DRL frameworks primarily focus on profit optimization without explicitly considering service latency
  2. Unstable Exploration Strategies: Epsilon-greedy exploration leads to unstable convergence and suboptimal policy learning
  3. QoS Violation Risk: Latency-sensitive services (e.g., URLLC) may experience QoS violations

Research Motivation

While the baseline DSARA method effectively maximizes profit, it fails to account for latency differences across slice types, potentially causing QoS violations. This work aims to develop a slice admission control scheme that simultaneously considers both latency and profit.

Core Contributions

  1. Delay-Aware Reward Function: Proposes a profit-delay-aware reward formula balancing QoS requirements and NSP profitability
  2. Boltzmann Exploration Strategy: Integrates Boltzmann exploration into the DRL agent, improving learning stability and avoiding local optima inherent to epsilon-greedy methods
  3. Comprehensive Experimental Evaluation: Implements DePSAC on a simulated 5G core network using realistic network slice request arrival patterns
  4. Performance Improvement Verification: Experimental results validate DePSAC's improvements in profit-QoS trade-offs, achieving shorter service latency, higher acceptance rates, and lower bandwidth utilization

Methodology Details

Task Definition

  • Input: Network slice request (NSLR) stream containing slice type, resource requirements, and runtime
  • Output: Admission decisions and resource allocation policies
  • Objective: Maximize NSP profit while minimizing service latency, particularly for URLLC slices

Model Architecture

System Architecture

DePSAC adopts the architecture of the DSARA (DeepSARA) framework, which has four main modules:

  1. Admission Control Module (ACM): Uses a DRL agent to assign priority weights to slice types
  2. Resource Allocation Module (RAM): Maps VNFs to nodes based on availability and QoS constraints
  3. Monitoring Module: Continuously collects resource state data
  4. Lifecycle Module: Instantiates accepted slices and releases resources upon expiration
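
The Lifecycle Module's admit-and-release behaviour can be illustrated with a minimal sketch. It assumes a single pooled CPU budget instead of the per-node VNF mapping done by the RAM, and the class name `Lifecycle` and its methods are hypothetical, not taken from the paper or from DSARA.

```python
import heapq


class Lifecycle:
    """Tracks running slices and frees their CPU when they expire.

    Assumption: one pooled CPU budget stands in for the whole substrate.
    """

    def __init__(self, cpu_capacity):
        self.free_cpu = cpu_capacity
        self._expiry = []  # min-heap of (expiry_time, slice_id, cpu)

    def release_expired(self, now):
        """Release the resources of every slice whose expiry time has passed."""
        released = []
        while self._expiry and self._expiry[0][0] <= now:
            _, slice_id, cpu = heapq.heappop(self._expiry)
            self.free_cpu += cpu
            released.append(slice_id)
        return released

    def try_admit(self, now, slice_id, cpu, runtime):
        """Instantiate the slice if enough CPU is free; otherwise reject it."""
        self.release_expired(now)
        if cpu > self.free_cpu:
            return False
        self.free_cpu -= cpu
        heapq.heappush(self._expiry, (now + runtime, slice_id, cpu))
        return True
```

Keeping expiry times in a min-heap means each admission only inspects slices that have actually expired, which mirrors the "release resources upon expiration" role described above.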

5G Core Network Substrate

  • Modeled as NFV infrastructure (NFVI) containing core nodes (high capacity) and edge nodes (low latency)
  • Represented as a weighted undirected graph SN = {N, L}, where each node in N has a CPU capacity and each link in L has a bandwidth

Delay-Aware Reward Function

The core innovation of DePSAC is the delay-aware reward function:

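$$\mathit{penalty}_i = \mathit{priority}_i \times \mathit{delay}_i \tag{1a}$$
$$\mathit{profit}_i = (\mathit{revenue}_i - \mathit{cost}_i) \times T_o \tag{1b}$$
$$\mathit{reward}(nsl_i) = \mathit{profit}_i - \mathit{penalty}_i \tag{1c}$$
$$R = \frac{\sum_{i=0}^{k} \mathit{reward}(nsl_i)}{\mathit{maxProfit}(SN, T)} \tag{1d}$$

Where:

  • $\mathit{priority}_i$: Priority level determined by slice type (URLLC > eMBB > mMTC)
  • $\mathit{delay}_i$: Time interval from NSL request i arrival to service
  • $T_o$: Slice runtime
  • $\mathit{revenue}_i$ and $\mathit{cost}_i$: Revenue and operational costs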
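
The reward computation in (1a) to (1d) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The numeric priority weights are hypothetical (the source only gives the ordering URLLC > eMBB > mMTC), revenue and cost are assumed to be per-time-unit rates because (1b) multiplies them by the runtime, and `max_profit` is supplied by the caller because the summary does not say how maxProfit(SN, T) is computed.

```python
from dataclasses import dataclass

# Hypothetical priority weights; only the ordering URLLC > eMBB > mMTC is given.
PRIORITY = {"URLLC": 3.0, "eMBB": 2.0, "mMTC": 1.0}


@dataclass
class ServedSlice:
    slice_type: str  # "URLLC", "eMBB", or "mMTC"
    revenue: float   # assumed: revenue per time unit
    cost: float      # assumed: operational cost per time unit
    runtime: float   # T_o, slice runtime
    delay: float     # time from request arrival to service


def slice_reward(s: ServedSlice) -> float:
    """Eq. (1a)-(1c): profit minus the priority-weighted delay penalty."""
    penalty = PRIORITY[s.slice_type] * s.delay
    profit = (s.revenue - s.cost) * s.runtime
    return profit - penalty


def normalized_reward(slices, max_profit: float) -> float:
    """Eq. (1d): summed per-slice rewards divided by the maximum profit."""
    if max_profit <= 0:
        raise ValueError("max_profit must be positive")
    return sum(slice_reward(s) for s in slices) / max_profit
```

Because the penalty scales with priority, the same delay costs a URLLC slice more reward than an mMTC slice, which is what pushes the agent to serve latency-critical requests first.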

Boltzmann Exploration Strategy

Replaces epsilon-greedy with Boltzmann exploration:

$$P(a \mid s) = \frac{e^{Q[s,a]/\tau}}{\sum_{a'} e^{Q[s,a']/\tau}} \tag{2}$$

Where τ is the temperature parameter controlling exploration diversity. High τ encourages exploration, low τ promotes exploitation.
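
A minimal sketch of Boltzmann action selection, assuming the standard softmax form in which each action's probability is proportional to exp(Q[s,a]/τ). The function names are illustrative, and the summary does not say which temperature schedule the paper uses, so `tau` is simply a caller-supplied constant here.

```python
import math
import random


def boltzmann_probs(q_values, tau):
    """Softmax over Q-values with temperature tau (standard Boltzmann form)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting the max before exponentiating avoids overflow and
    # leaves the resulting probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, tau, rng=random):
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With a large `tau` the probabilities approach uniform (exploration). As `tau` shrinks, nearly all probability mass moves to the highest-valued action (exploitation).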

Technical Innovations

  1. Latency Penalty Mechanism: Introduces latency penalty terms in the reward function, incentivizing the agent to prioritize latency-sensitive slices
  2. Smooth Exploration Strategy: Boltzmann exploration selects actions based on Q-value probability distributions, avoiding purely random or greedy behavior
  3. Multi-Objective Optimization: Simultaneously considers profit maximization and latency minimization, achieving better QoS-profit trade-offs

Experimental Setup

Dataset

  • Substrate Network: 64-node Barabási-Albert topology capturing scale-free properties of real 5G infrastructure
  • Slice Requests: Dynamically generated NSLRs containing three service types (eMBB, URLLC, mMTC)
  • Arrival Pattern: Realistic network slice request arrival patterns
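
To make the substrate concrete, here is a standard-library sketch of a Barabási-Albert (preferential-attachment) generator for a 64-node topology. The paper uses NetworkX, whose `barabasi_albert_graph(n, m)` provides this directly; the version below only avoids the dependency. The attachment parameter `m = 2` and the capacity values are hypothetical, since the summary does not report them.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Return the edges of an n-node graph where each new node adds m links."""
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # 'repeated' holds one entry per edge endpoint, so a uniform draw from it
    # picks a node with probability proportional to its degree.
    repeated = []
    # Seed graph: a star on nodes 0..m.
    for v in range(1, m + 1):
        edges.append((0, v))
        repeated.extend((0, v))
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated))
        for t in sorted(targets):
            edges.append((t, new))
            repeated.extend((t, new))
    return edges


def build_substrate(n=64, m=2, seed=0):
    """Attach placeholder capacities to a generated topology."""
    edges = barabasi_albert_edges(n, m, seed)
    cpu = {node: 100 for node in range(n)}     # hypothetical CPU units per node
    bandwidth = {edge: 100 for edge in edges}  # hypothetical bandwidth per link
    return cpu, bandwidth
```

Preferential attachment yields a few highly connected hubs and many low-degree nodes, which is the scale-free property the dataset description refers to.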

Evaluation Metrics

  1. Profit: Total revenue NSP obtains from serving network slice requests minus operational costs
  2. Acceptance Rate (AR): Proportion of successfully admitted NSLRs, AR = req_a / req_t
  3. Latency: Service time after request arrival, Delay = T_finished - T_arrival
  4. Resource Consumption (C): Proportion of processing and bandwidth resources allocated to accepted slices
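
The metric definitions above translate directly into code. The sketch below is illustrative only. The function names and zero-request handling are assumptions, and resource consumption is computed as a single allocated-to-capacity ratio because the summary does not give the paper's exact aggregation over CPU and bandwidth.

```python
def acceptance_rate(accepted, total):
    """AR = req_a / req_t; returns 0.0 when no requests have arrived."""
    return accepted / total if total else 0.0


def mean_delay(arrivals, finishes):
    """Average of Delay = T_finished - T_arrival over served requests."""
    delays = [f - a for a, f in zip(arrivals, finishes)]
    return sum(delays) / len(delays) if delays else 0.0


def resource_consumption(allocated, capacity):
    """Fraction of a resource (CPU or bandwidth) allocated to accepted slices."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return allocated / capacity
```

For example, 30 accepted requests out of 40 gives an acceptance rate of 0.75, and arrivals at times 0, 2, 5 with completions at 3, 4, 9 give a mean delay of 3.0.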

Comparison Methods

  • Baseline: DSARA method, a DRL-based joint admission control and resource allocation framework

Implementation Details

  • Development Environment: Python 3, modular object-oriented design
  • Hardware Platform: AMD Ryzen 5 processor, 16GB RAM, Windows 11
  • Graph Processing: NetworkX library for managing substrate network and NSLR graph representations
  • Simulator: Discrete-event simulator integrated with delay-aware DRL agent

Experimental Results

Main Results

Profit Performance

  • Overall Profit: DePSAC shows slightly lower profit than DSARA during early training because of exploration, but consistently outperforms the baseline as training progresses
  • Categorical Profit: Profit improvements across all service types (eMBB, URLLC, mMTC), with URLLC showing most significant gains

Latency Performance

  • Overall Latency: DePSAC achieves lower average latency compared to DSARA
  • URLLC Latency: Significant latency reduction relative to DSARA, validating effective prioritization of time-critical slices
  • Other Service Types: mMTC latency shows moderate but continuous reduction; eMBB latency converges to below-baseline values after exploration phase

Acceptance Rate Performance

  • Overall Acceptance Rate: DePSAC eventually surpasses DSARA because requests are serviced and their resources released sooner, which frees capacity for more requests to be accepted
  • URLLC Acceptance Rate: Significantly improved, reflecting the agent's learned prioritization of latency-sensitive requests
  • eMBB Acceptance Rate: Moderately increased
  • mMTC Acceptance Rate: Slight decrease but within acceptable range

Resource Consumption Performance

  • Overall Consumption: DePSAC demonstrates slight resource consumption reduction in later training stages
  • Bandwidth Efficiency: Total bandwidth usage reduced due to prioritizing URLLC slices with lower resource requirements
  • CPU Utilization: Remains consistent or shows slight improvement

Ablation Studies

The paper validates the effectiveness of delay-aware reward function and Boltzmann exploration through comparison with DSARA, though detailed component-level ablation analysis is not provided.

Experimental Findings

  1. Latency-Profit Balance: Latency penalties do not harm profitability; the agent learns to balance the two objectives and even improves NSP profit
  2. Service Differentiation: Successfully achieves prioritization of latency-sensitive services while maintaining performance for other service types
  3. Resource Efficiency: Achieves more compact and latency-efficient embeddings through intelligent admission decisions
  4. Convergence Stability: Boltzmann exploration promotes smoother and more stable convergence

Main Research Directions

  1. Queuing Theory-Based Slicing: Han et al. propose utility-driven multi-service slicing methods
  2. Big Data Analytics Prediction: Raza et al. leverage traffic prediction to improve provider profit
  3. VNF Placement Optimization: Zhang et al. introduce heuristic VNF placement methods
  4. Reinforcement Learning Approaches: Villota-Jácome et al. propose the SARA and DSARA models

Advantages of This Work

Compared with the existing work listed above, this paper explicitly considers both latency and profit within a DRL framework and employs a more stable exploration strategy.

Conclusions and Discussion

Main Conclusions

  1. DePSAC enables DRL agents to effectively balance profitability and QoS objectives through delay-aware reward design
  2. Boltzmann exploration achieves smoother and more stable convergence compared to epsilon-greedy strategy
  3. Consistently outperforms DSARA baseline across multiple performance metrics

Limitations

  1. Simulation Environment Constraints: Validation only in simulated environments; lacks real network deployment verification
  2. Parameter Sensitivity: Insufficient analysis of sensitivity to temperature parameter τ and priority weights
  3. Scalability Analysis: Performance evaluation on larger-scale networks not conducted
  4. Dynamic Adaptability: Limited adaptive capability to dynamically changing network conditions and traffic patterns

Future Directions

  1. Federated 5G Architecture: Extend DePSAC to support federated 5G architectures
  2. Dynamic Load Assessment: Evaluate robustness under dynamic traffic loads
  3. Mobility Support: Assess mobile scenarios using real deployment trajectories
  4. Real Deployment Validation: Verify method effectiveness in actual 5G networks

In-Depth Evaluation

Strengths

  1. Strong Problem Targeting: Clearly identifies the critical issue of existing methods neglecting latency factors
  2. Reasonable Method Innovation: Delay-aware reward function design is intuitive and effective
  3. Well-Founded Technical Improvements: Boltzmann exploration adoption has sufficient theoretical justification
  4. Complete Experimental Design: Multi-dimensional evaluation metrics comprehensively validate method effectiveness
  5. Convincing Results: Improvements demonstrated across all key metrics

Weaknesses

  1. Insufficient Theoretical Analysis: Lacks convergence and optimality guarantees
  2. Missing Parameter Tuning Guidance: No guidance provided for selecting temperature parameter and priority weights
  3. Absent Computational Complexity Analysis: No analysis of computational overhead compared to baseline
  4. Insufficient Robustness Verification: Performance under abnormal traffic or network failures not tested
  5. Limited Practical Deployment Considerations: Insufficient discussion of challenges likely encountered in actual deployment

Impact

  1. Academic Contribution: Provides new perspectives for multi-objective optimization in 5G network slicing
  2. Practical Value: Method has strong potential for real-world application
  3. Reproducibility: Provides sufficient implementation details for reproduction
  4. Generalizability: Delay-aware concepts can be extended to other network optimization problems

Applicable Scenarios

  1. 5G Network Operators: Network slice management requiring QoS-profit balance
  2. Edge Computing Environments: Deployment and resource allocation for latency-sensitive services
  3. Multi-Tenant Networks: Virtual network environments requiring service differentiation
  4. Real-Time Application Support: Latency-critical applications such as industrial IoT and autonomous driving

References

The paper cites 12 relevant references covering key areas including 5G network slicing, deep reinforcement learning, and resource allocation, providing sufficient theoretical foundation and comparison benchmarks.


Overall Assessment: This paper addresses the latency-profit trade-off problem in 5G network slice admission control with an innovative and practical solution. The method design is sound, experimental validation is comprehensive, and the work demonstrates good academic value and application prospects in this field. Main areas for improvement include theoretical analysis and practical deployment considerations.