
Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices

Chakraborty, Asrar, Sengupta et al.

5G networks enable diverse services such as eMBB, URLLC, and mMTC through network slicing, necessitating intelligent admission control and resource allocation to meet stringent QoS requirements while maximizing Network Service Provider (NSP) profits. However, existing Deep Reinforcement Learning (DRL) frameworks focus primarily on profit optimization without explicitly accounting for service delay, potentially leading to QoS violations for latency-sensitive slices. Moreover, commonly used epsilon-greedy exploration of DRL often results in unstable convergence and suboptimal policy learning. To address these gaps, we propose DePSAC -- a Delay and Profit-aware Slice Admission Control scheme. Our DRL-based approach incorporates a delay-aware reward function, where penalties due to service delay incentivize the prioritization of latency-critical slices such as URLLC. Additionally, we employ Boltzmann exploration to achieve smoother and faster convergence. We implement and evaluate DePSAC on a simulated 5G core network substrate with realistic Network Slice Request (NSLR) arrival patterns. Experimental results demonstrate that our method outperforms the DSARA baseline in terms of overall profit, reduced URLLC slice delays, improved acceptance rates, and improved resource consumption. These findings validate the effectiveness of the proposed DePSAC in achieving better QoS-profit trade-offs for practical 5G network slicing scenarios.

Basic Information

Paper ID: 2510.08769
Title: Prioritizing Latency with Profit: A DRL-Based Admission Control for 5G Network Slices
Authors: Proggya Chakraborty, Aaquib Asrar, Jayasree Sengupta, Sipra Das Bit
Categories: cs.NI (Networking and Internet Architecture), cs.LG (Machine Learning), cs.PF (Performance)
Submission Date: October 9, 2025 to arXiv
Paper Link: https://arxiv.org/abs/2510.08769v1

Abstract

This paper proposes DePSAC (Delay and Profit-aware Slice Admission Control), a deep reinforcement learning-based solution for admission control in 5G network slicing. The scheme maximizes network service provider (NSP) profit while explicitly accounting for service latency, with particular emphasis on prioritizing ultra-reliable low-latency communication (URLLC) slices. The approach combines a delay-aware reward function with a Boltzmann exploration strategy. Evaluated on a simulated 5G core network, it improves on the baseline DSARA method in profit, latency, acceptance rate, and resource consumption.

Research Background and Motivation

Problem Definition

5G networks support diverse services through network slicing technology, including enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC). These services have heterogeneous QoS requirements, necessitating intelligent admission control and resource allocation strategies to balance strict QoS demands with NSP profitability.

Problem Significance

Service Diversity Challenge: Different slice types have varying requirements for latency, reliability, and bandwidth
Resource Optimization Need: Limited physical resources must be efficiently allocated across multiple virtual networks
Business Viability: NSPs must ensure profitability while meeting QoS requirements

Limitations of Existing Methods

Neglect of Latency Factors: Existing DRL frameworks primarily focus on profit optimization without explicitly considering service latency
Unstable Exploration Strategies: Epsilon-greedy exploration leads to unstable convergence and suboptimal policy learning
QoS Violation Risk: Latency-sensitive services (e.g., URLLC) may experience QoS violations

Research Motivation

While the baseline DSARA method effectively maximizes profit, it fails to account for latency differences across slice types, potentially causing QoS violations. This work aims to develop a slice admission control scheme that simultaneously considers both latency and profit.

Core Contributions

Delay-Aware Reward Function: Proposes a profit-delay-aware reward formula balancing QoS requirements and NSP profitability
Boltzmann Exploration Strategy: Integrates Boltzmann exploration into the DRL agent, improving learning stability and avoiding local optima inherent to epsilon-greedy methods
Comprehensive Experimental Evaluation: Implements DePSAC on a simulated 5G core network using realistic network slice request arrival patterns
Performance Improvement Verification: Experimental results validate DePSAC's improvements in profit-QoS trade-offs, achieving shorter service latency, higher acceptance rates, and lower bandwidth utilization

Methodology Details

Task Definition

Input: Network slice request (NSLR) stream containing slice type, resource requirements, and runtime
Output: Admission decisions and resource allocation policies
Objective: Maximize NSP profit while minimizing service latency, particularly for URLLC slices

Model Architecture

System Architecture

Adopts the architecture of the DeepSARA (DSARA) framework, which has four main modules:

Admission Control Module (ACM): Uses DRL agent to assign priority weights for slice types
Resource Allocation Module (RAM): Maps VNFs to nodes based on availability and QoS constraints
Monitoring Module: Continuously collects resource state data
Lifecycle Module: Instantiates accepted slices and releases resources upon expiration

5G Core Network Substrate

Modeled as NFV infrastructure (NFVI) containing core nodes (high capacity) and edge nodes (low latency)
Represented as weighted undirected graph SN = {N,L}, where nodes N have CPU capacity and links L have bandwidth
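
The substrate model above can be pictured with a small data structure. The following is only an illustrative sketch: the paper's implementation uses NetworkX, whereas this version is a dependency-free stand-in, and the class name `SubstrateNetwork`, its methods, and the example capacities are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubstrateNetwork:
    """Weighted undirected graph SN = {N, L} (hypothetical stand-in class)."""
    cpu: dict = field(default_factory=dict)        # node -> CPU capacity
    kind: dict = field(default_factory=dict)       # node -> "core" or "edge"
    bandwidth: dict = field(default_factory=dict)  # frozenset({u, v}) -> bandwidth

    def add_node(self, node, cpu, kind):
        if kind not in ("core", "edge"):
            raise ValueError("kind must be 'core' or 'edge'")
        self.cpu[node] = cpu
        self.kind[node] = kind

    def add_link(self, u, v, bw):
        if u == v:
            raise ValueError("self-loops are not allowed")
        if u not in self.cpu or v not in self.cpu:
            raise KeyError("both endpoints must be added first")
        # A frozenset key makes the link undirected: (u, v) and (v, u) coincide.
        self.bandwidth[frozenset((u, v))] = bw

    def link_bandwidth(self, u, v):
        return self.bandwidth[frozenset((u, v))]

    def neighbors(self, node):
        return sorted(
            next(iter(link - {node})) for link in self.bandwidth if node in link
        )


# Example with made-up capacities: one core node and two edge nodes.
sn = SubstrateNetwork()
sn.add_node("c1", cpu=100, kind="core")
sn.add_node("e1", cpu=20, kind="edge")
sn.add_node("e2", cpu=20, kind="edge")
sn.add_link("c1", "e1", bw=1000)
sn.add_link("c1", "e2", bw=500)
```

Keying links by an unordered pair is one simple way to keep the graph undirected without storing each link twice.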

Delay-Aware Reward Function

The core innovation of DePSAC is the delay-aware reward function:

penalty_i = priority_i × delay_i                          (1a)
profit_i = (revenue_i - cost_i) × T_o                     (1b)
reward(nsl_i) = profit_i - penalty_i                      (1c)
R = Σ_{i=0}^{k} reward(nsl_i) / maxProfit(SN, T)          (1d)

Where:

priority_i: Priority level determined by slice type (URLLC > eMBB > mMTC)
delay_i: Time interval from the arrival of NSL request i to its service
T_o: Slice runtime
revenue_i and cost_i: Revenue and operational costs
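
As a rough illustration of equations (1a) to (1d), the sketch below computes the normalized reward for a batch of served slices. The numeric priority weights, the `ServedSlice` record, and the function names are assumptions made for this example; the source only states the ordering URLLC > eMBB > mMTC and does not give concrete values.

```python
from dataclasses import dataclass

# Hypothetical weights: the source only fixes the ordering URLLC > eMBB > mMTC.
PRIORITY = {"URLLC": 3.0, "eMBB": 2.0, "mMTC": 1.0}


@dataclass
class ServedSlice:
    slice_type: str  # "URLLC", "eMBB" or "mMTC"
    revenue: float   # revenue per time unit
    cost: float      # operational cost per time unit
    runtime: float   # T_o, slice runtime
    delay: float     # time from request arrival to service


def slice_reward(s: ServedSlice) -> float:
    penalty = PRIORITY[s.slice_type] * s.delay   # (1a)
    profit = (s.revenue - s.cost) * s.runtime    # (1b)
    return profit - penalty                      # (1c)


def normalized_reward(slices, max_profit: float) -> float:
    # (1d): total reward divided by the maximum achievable profit.
    if max_profit <= 0:
        raise ValueError("max_profit must be positive")
    return sum(slice_reward(s) for s in slices) / max_profit


batch = [
    ServedSlice("URLLC", revenue=10.0, cost=4.0, runtime=5.0, delay=2.0),
    ServedSlice("mMTC", revenue=3.0, cost=1.0, runtime=10.0, delay=4.0),
]
r = normalized_reward(batch, max_profit=100.0)  # (24 + 16) / 100
```

Because the penalty scales with priority, the same delay costs a URLLC slice more reward than an mMTC slice, which is what pushes the agent to serve latency-critical requests first.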

Boltzmann Exploration Strategy

Replaces epsilon-greedy with Boltzmann exploration:

P(a) = e^(Q[s,a]/τ) / Σ_{a'} e^(Q[s,a']/τ)   (2)

Where τ is the temperature parameter controlling exploration diversity. High τ encourages exploration, low τ promotes exploitation.
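
A minimal sketch of Boltzmann (softmax) action selection follows, assuming a plain list of Q-values for the current state. The function names are hypothetical, and the max-subtraction step is a standard numerical-stability device that is not mentioned in the source.

```python
import math
import random


def boltzmann_probs(q_values, tau):
    """Softmax over Q-values with temperature tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Subtracting the maximum leaves the probabilities unchanged
    # but avoids overflow in exp() for large Q-values or small tau.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [1.0, 2.0, 3.0]
hot = boltzmann_probs(q, tau=1000.0)   # close to uniform: exploration
cold = boltzmann_probs(q, tau=0.01)    # close to greedy: exploitation
action = boltzmann_action(q, tau=1.0, rng=random.Random(0))
```

Unlike epsilon-greedy, which explores uniformly at random, this scheme explores in proportion to estimated value, so clearly poor actions are sampled rarely even early in training.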

Technical Innovations

Latency Penalty Mechanism: Introduces latency penalty terms in the reward function, incentivizing the agent to prioritize latency-sensitive slices
Smooth Exploration Strategy: Boltzmann exploration selects actions based on Q-value probability distributions, avoiding purely random or greedy behavior
Multi-Objective Optimization: Simultaneously considers profit maximization and latency minimization, achieving better QoS-profit trade-offs

Experimental Setup

Dataset

Substrate Network: 64-node Barabási-Albert topology capturing scale-free properties of real 5G infrastructure
Slice Requests: Dynamically generated NSLRs containing three service types (eMBB, URLLC, mMTC)
Arrival Pattern: Realistic network slice request arrival patterns

Evaluation Metrics

Profit: Total revenue the NSP obtains from serving network slice requests, minus operational costs
Acceptance Rate (AR): Proportion of successfully admitted NSLRs, AR = req_a / req_t
Latency: Service time after request arrival, Delay = T_finished - T_arrival
Resource Consumption (C): Proportion of processing and bandwidth resources allocated to accepted slices
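
The acceptance-rate and delay definitions above can be sketched as small helper functions. This is only an illustration: the function names and the `(slice_type, t_arrival, t_finished)` record layout are assumptions, not taken from the paper's code.

```python
def acceptance_rate(accepted: int, total: int) -> float:
    """AR = req_a / req_t; defined as 0.0 when no requests have arrived."""
    if accepted < 0 or total < 0 or accepted > total:
        raise ValueError("need 0 <= accepted <= total")
    return accepted / total if total else 0.0


def service_delay(t_arrival: float, t_finished: float) -> float:
    """Delay = T_finished - T_arrival."""
    if t_finished < t_arrival:
        raise ValueError("t_finished must not precede t_arrival")
    return t_finished - t_arrival


def mean_delay_by_type(records):
    """Average delay per slice type from (slice_type, t_arrival, t_finished) tuples."""
    totals, counts = {}, {}
    for slice_type, t_arrival, t_finished in records:
        d = service_delay(t_arrival, t_finished)
        totals[slice_type] = totals.get(slice_type, 0.0) + d
        counts[slice_type] = counts.get(slice_type, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


# Made-up records, for illustration only.
log = [("URLLC", 0.0, 2.0), ("URLLC", 1.0, 5.0), ("eMBB", 0.0, 10.0)]
per_type = mean_delay_by_type(log)
```

Reporting delay per slice type, rather than only in aggregate, is what makes it possible to check whether URLLC requests are actually being prioritized.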

Comparison Methods

Baseline: DSARA method, a DRL-based joint admission control and resource allocation framework

Implementation Details

Development Environment: Python 3, modular object-oriented design
Hardware Platform: AMD Ryzen 5 processor, 16GB RAM, Windows 11
Graph Processing: NetworkX library for managing substrate network and NSLR graph representations
Simulator: Discrete-event simulator integrated with delay-aware DRL agent

Experimental Results

Main Results

Profit Performance

Overall Profit: DePSAC shows slightly lower profit than DSARA during early training due to exploration, but consistently outperforms baseline as training progresses
Per-Category Profit: Profit improves across all service types (eMBB, URLLC, mMTC), with URLLC showing the most significant gains

Latency Performance

Overall Latency: DePSAC achieves lower average latency compared to DSARA
URLLC Latency: Significant latency reduction relative to DSARA, validating effective prioritization of time-critical slices
Other Service Types: mMTC latency shows moderate but continuous reduction; eMBB latency converges to below-baseline values after exploration phase

Acceptance Rate Performance

Overall Acceptance Rate: DePSAC eventually surpasses DSARA because faster request servicing and resource release free capacity for more requests to be accepted
URLLC Acceptance Rate: Significantly improved, reflecting the agent's learned prioritization of latency-sensitive requests
eMBB Acceptance Rate: Moderately increased
mMTC Acceptance Rate: Slight decrease but within acceptable range

Resource Consumption Performance

Overall Consumption: DePSAC demonstrates slight resource consumption reduction in later training stages
Bandwidth Efficiency: Total bandwidth usage reduced due to prioritizing URLLC slices with lower resource requirements
CPU Utilization: Remains consistent or shows slight improvement

Ablation Studies

The paper validates the delay-aware reward function and Boltzmann exploration only jointly, through comparison with DSARA. No component-level ablation is provided, so the individual contribution of each change cannot be isolated from the reported results.

Experimental Findings

Latency-Profit Balance: Latency penalties do not harm profitability; the agent learns to effectively balance and even improve NSP revenue maximization
Service Differentiation: Successfully achieves prioritization of latency-sensitive services while maintaining performance for other service types
Resource Efficiency: Achieves more compact and latency-efficient embeddings through intelligent admission decisions
Convergence Stability: Boltzmann exploration promotes smoother and more stable convergence

Main Research Directions

Queuing Theory-Based Slicing: Han et al. propose utility-driven multi-service slicing methods
Big Data Analytics Prediction: Raza et al. leverage traffic prediction to improve provider profit
VNF Placement Optimization: Zhang et al. introduce heuristic VNF placement methods
Reinforcement Learning Approaches: William et al. propose SARA and DSARA models

Advantages of This Work

Compared to the existing work above, which focuses primarily on profit, this paper explicitly considers both latency and profit in a DRL framework while employing a more stable exploration strategy.

Conclusions and Discussion

Main Conclusions

DePSAC enables DRL agents to effectively balance profitability and QoS objectives through delay-aware reward design
Boltzmann exploration achieves smoother and more stable convergence compared to epsilon-greedy strategy
Consistently outperforms DSARA baseline across multiple performance metrics

Limitations

Simulation Environment Constraints: Validation only in simulated environments; lacks real network deployment verification
Parameter Sensitivity: Insufficient analysis of sensitivity to temperature parameter τ and priority weights
Scalability Analysis: Performance evaluation on larger-scale networks not conducted
Dynamic Adaptability: Limited adaptive capability to dynamically changing network conditions and traffic patterns

Future Directions

Federated 5G Architecture: Extend DePSAC to support federated 5G architectures
Dynamic Load Assessment: Evaluate robustness under dynamic traffic loads
Mobility Support: Assess mobile scenarios using real deployment trajectories
Real Deployment Validation: Verify method effectiveness in actual 5G networks

In-Depth Evaluation

Strengths

Strong Problem Targeting: Clearly identifies the critical issue of existing methods neglecting latency factors
Reasonable Method Innovation: Delay-aware reward function design is intuitive and effective
Well-Founded Technical Improvements: Boltzmann exploration adoption has sufficient theoretical justification
Complete Experimental Design: Multi-dimensional evaluation metrics comprehensively validate method effectiveness
Convincing Results: Improvements demonstrated across all key metrics

Weaknesses

Insufficient Theoretical Analysis: Lacks convergence and optimality guarantees
Missing Parameter Tuning Guidance: No guidance provided for selecting temperature parameter and priority weights
Absent Computational Complexity Analysis: No analysis of computational overhead compared to baseline
Insufficient Robustness Verification: Performance under abnormal traffic or network failures not tested
Limited Practical Deployment Considerations: Insufficient discussion of challenges likely encountered in actual deployment

Impact

Academic Contribution: Provides new perspectives for multi-objective optimization in 5G network slicing
Practical Value: Method has strong potential for real-world application
Reproducibility: Provides sufficient implementation details for reproduction
Generalizability: Delay-aware concepts can be extended to other network optimization problems

Applicable Scenarios

5G Network Operators: Network slice management requiring QoS-profit balance
Edge Computing Environments: Deployment and resource allocation for latency-sensitive services
Multi-Tenant Networks: Virtual network environments requiring service differentiation
Real-Time Application Support: Latency-critical applications such as industrial IoT and autonomous driving

References

The paper cites 12 relevant references covering key areas including 5G network slicing, deep reinforcement learning, and resource allocation, providing sufficient theoretical foundation and comparison benchmarks.

Overall Assessment: This paper addresses the latency-profit trade-off problem in 5G network slice admission control with an innovative and practical solution. The method design is sound, experimental validation is comprehensive, and the work demonstrates good academic value and application prospects in this field. Main areas for improvement include theoretical analysis and practical deployment considerations.