Research in compute resource management for cloud-native applications is dominated by the problem of setting optimal CPU limits -- a fundamental OS mechanism that strictly restricts a container's CPU usage to its specified CPU-limits. Rightsizing and autoscaling works have innovated on allocation/scaling policies assuming the ubiquity and necessity of CPU-limits. We question this. Practical experiences of cloud users indicate that CPU-limits harms application performance and costs more than it helps. These observations are in contradiction to the conventional wisdom presented in both academic research and industry best practices. We argue that this indiscriminate adoption of CPU-limits is driven by erroneous beliefs that CPU-limits is essential for operational and safety purposes. We provide empirical evidence making a case for eschewing CPU-limits completely from latency-sensitive applications. This prompts a fundamental rethinking of auto-scaling and billing paradigms and opens new research avenues. Finally, we highlight specific scenarios where CPU-limits can be beneficial if used in a well-reasoned way (e.g. background jobs).
- Paper ID: 2510.10747
- Title: CPU-Limits kill Performance: Time to rethink Resource Control
- Authors: Chirag Shetty (UIUC), Sarthak Chakraborty (UIUC), Hubertus Franke (IBM Research), Larisa Shwartz (IBM Research), Chandra Narayanaswami (IBM Research), Indranil Gupta (UIUC), Saurabh Jha (IBM Research)
- Categories: cs.DC (Distributed Computing), cs.OS (Operating Systems), cs.PF (Performance)
- Publication Date: October 2025 (arXiv preprint)
- Paper Link: https://arxiv.org/abs/2510.10747
This paper fundamentally challenges CPU-Limits, a core mechanism in computational resource management for cloud-native applications. Despite widespread acceptance in both academic research and industrial practice, the authors provide empirical evidence demonstrating that CPU-Limits actually degrade application performance and increase costs. The paper argues that latency-sensitive applications should completely abandon CPU-Limits, necessitating fundamental rethinking of auto-scaling and billing models, while identifying legitimate use cases for CPU-Limits in specific scenarios such as background tasks.
CPU resource management for containerized microservices is a fundamental problem in cloud computing. The current mainstream approach strictly caps container CPU usage through the CPU-Limits (c.limit) mechanism, which Linux enforces through CFS bandwidth control (cpu.cfs_quota_us). However, the authors observe a significant gap between theory and practice in actual deployments.
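As a rough illustration of why a quota turns into latency: Kubernetes converts a CPU limit into a CFS quota per enforcement period (100 ms by default), and a container that exhausts its quota is paused until the next period begins. The sketch below assumes a single-threaded burst that starts at a period boundary with a fresh quota; the function names are illustrative and do not come from the paper.

```python
CFS_PERIOD_US = 100_000  # default CFS enforcement period (100 ms)


def cfs_quota_us(cpu_limit_cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may consume per period."""
    return int(round(cpu_limit_cores * period_us))


def wall_time_us(cpu_demand_us: int, cpu_limit_cores: float,
                 period_us: int = CFS_PERIOD_US) -> int:
    """Wall-clock time for a single-threaded burst needing cpu_demand_us of CPU.

    Assumptions (for illustration only): the burst starts at a period
    boundary, the quota is untouched, and nothing else runs in the container.
    """
    if cpu_demand_us <= 0:
        return 0
    quota = cfs_quota_us(cpu_limit_cores, period_us)
    if quota <= 0:
        raise ValueError("cpu_limit_cores must be positive")
    if quota >= period_us:
        # One thread cannot burn more than one period of CPU per period.
        return cpu_demand_us
    full_periods, remainder = divmod(cpu_demand_us, quota)
    if remainder == 0:
        # The burst finishes exactly as the last quota slice runs out.
        return (full_periods - 1) * period_us + quota
    # Each exhausted quota costs a full period; the rest runs in the next one.
    return full_periods * period_us + remainder
```

Under these assumptions, a request needing 30 ms of CPU in a container limited to 0.2 cores runs for 20 ms, is throttled for 80 ms, and then runs its last 10 ms, so `wall_time_us(30_000, 0.2)` is 110 ms rather than 30 ms.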
- Performance Impact: CPU-Limits-induced throttling causes dramatic latency degradation and can trigger cascading failures
- Cost Issues: Safety margins set to avoid throttling result in 25-45% resource over-provisioning
- Operational Complexity: DevOps personnel must navigate complex trade-offs among multiple fine-grained CPU limits
Existing auto-scaling systems (such as FIRM, Cilantro, and Autothrottle) all build on the assumption that CPU-Limits are necessary, focusing on optimizing limit values rather than questioning the mechanism itself. Through analysis, the authors find that these approaches fail when CPU-Limits are removed.
Through interviews with SREs (Site Reliability Engineers) and surveys of online discussions, the authors discover disagreement within the operations community regarding CPU-Limits. Many practitioners have already begun removing CPU-Limits to improve performance, contrasting with mainstream academic opinion.
- Challenging Conventional Wisdom: First systematic questioning of the necessity of CPU-Limits in latency-sensitive applications, supported by substantial empirical evidence
- Performance Analysis: In-depth analysis of the mechanisms through which CPU-Limits hurt latency, reliability, and cost
- Alternative Design: Demonstrates feasibility and advantages of resource management using only CPU-Requests (c.req)
- New Paradigm: Proposes performance-based billing models and unrestricted auto-scaling design
- Prototype Implementation: Develops YAAS (Yet Another AutoScaler) prototype, achieving 51% resource savings
- Application Scenario Classification: Clearly delineates legitimate use cases for CPU-Limits (e.g., background tasks, CPU-bound workloads)
The research objective is to redesign container CPU resource management so that it achieves better performance-cost trade-offs by tuning CPU-Requests and node utilization, without using CPU-Limits.
The authors construct a decision tree (Figure 1) to systematically analyze various CPU-Limits configuration scenarios:
- limit = req: Increases costs, requiring 25-45% safety margins
- limit > req:
  - If the limit is never reached, it is unnecessary
  - If the limit may be reached, it causes auto-scalers to "hang" or causes dramatic latency degradation
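The branches above can be written down as a small classifier. This is only a sketch of the summary's three cases, not a reproduction of the paper's Figure 1; the function name, the `peak_usage` input, and the returned labels are hypothetical.

```python
def classify_cpu_limit(request: float, limit: float, peak_usage: float) -> str:
    """Classify a (request, limit) pair according to the three cases above.

    All values are in CPU cores. peak_usage is the highest usage the
    container is expected to reach (a hypothetical input for illustration).
    """
    if limit < request:
        # Kubernetes rejects specs whose limit is below the request.
        raise ValueError("limit must be >= request")
    if limit == request:
        # The request itself must carry a safety margin to avoid throttling.
        return "costly"
    if peak_usage < limit:
        # The limit is never reached, so it has no effect.
        return "unnecessary"
    # The limit can be hit: throttling, latency spikes, or a stalled auto-scaler.
    return "harmful"
```

In this framing, no branch for a latency-sensitive container returns a favourable outcome, which is the core of the authors' argument.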
The authors justify the sufficiency of using only CPU-Requests at two levels:
CFS Scheduler Guarantees: The Linux CFS scheduler provides proportional fairness guarantees, ensuring that under contention Pod P_i with CPU-Requests r_i receives at least (r_i/Σ_j r_j) × C CPU time on a node with total CPU capacity C, where the sum runs over all Pods on that node.
Orchestrator Gating: Orchestrators like Kubernetes ensure that the sum of CPU-Requests for all containers on a node does not exceed node capacity, making CPU-Requests an absolute minimum guarantee.
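The two levels combine as follows: if the orchestrator only admits Pods while Σ_j r_j ≤ C, then the proportional share (r_i/Σ_j r_j) × C is never smaller than r_i. A minimal sketch of that arithmetic, with an illustrative function name and made-up Pod names:

```python
def guaranteed_cpu(requests: dict[str, float], node_cpus: float) -> dict[str, float]:
    """Worst-case CPU share per Pod under full contention.

    requests maps a Pod name to its CPU-Requests (cores); node_cpus is the
    node capacity C. The admission check mirrors the orchestrator gating
    described above.
    """
    total = sum(requests.values())
    if total <= 0:
        raise ValueError("at least one Pod must request CPU")
    if total > node_cpus:
        raise ValueError("requests exceed node capacity; Pod would not be admitted")
    # Proportional fairness: each Pod gets r_i / sum(r_j) of the node.
    return {pod: r / total * node_cpus for pod, r in requests.items()}


if __name__ == "__main__":
    shares = guaranteed_cpu({"frontend": 1.0, "search": 2.0}, node_cpus=4.0)
    for pod, share in shares.items():
        print(f"{pod}: guaranteed at least {share:.2f} cores")
```

Because the node in this example is not fully booked, each Pod's worst-case share exceeds its request; when the node is fully booked, the share equals the request exactly.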
YAAS is based on two key control variables:
- Overage (U-R): The difference between a Pod's actual CPU usage (U) and its allocated CPU, i.e., its CPU-Requests (R)
- Node Utilization (N): Total CPU utilization of the Pod's host node
Core strategies:
- Maintain overage ≥ 0, increasing resources only when SLO is about to be violated
- Optimize node utilization through Pod migration
- Combine vertical and horizontal scaling
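To make the two control variables concrete, the sketch below shows one way a controller could combine overage and node utilization into a scaling decision. It is not the authors' YAAS algorithm: the `PodState` fields, the 0.8 utilization threshold, and the action names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    usage: float       # U: cores the Pod is currently using
    request: float     # R: the Pod's CPU-Requests in cores
    node_util: float   # N: utilization of the host node, in [0, 1]
    slo_at_risk: bool  # True when latency is close to violating the SLO


def next_action(state: PodState, node_util_high: float = 0.8) -> str:
    """Pick a scaling action from overage (U - R) and node utilization (N).

    The threshold and the action names are hypothetical.
    """
    overage = state.usage - state.request
    if state.slo_at_risk:
        if state.node_util >= node_util_high:
            # The node has no headroom left: move the Pod or add replicas.
            return "migrate_or_scale_out"
        # The node has headroom: a larger request secures a bigger share.
        return "raise_request"
    if overage < 0:
        # Usage is below the request, so the request can shrink until
        # overage is back at or above zero.
        return "lower_request"
    # Pod is bursting above its request without hurting the SLO.
    return "hold"
```

The design point this tries to capture is that a Pod running above its request is not a problem in itself; resources are added only when the SLO is threatened.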
Two microservice applications from DeathStarBench:
- HotelReservation (HR): Hotel reservation system
- SocialNetwork (SN): Social network application
- Platform: Amazon EC2 cluster
- Load Patterns: Varying request loads simulating production environments
- Evaluation Metrics:
  - End-to-end tail latency (P99)
  - CPU resource usage
  - Scaling frequency and convergence time
  - Cost efficiency
- Traditional CPU-Limits-based HPA (Horizontal Pod Autoscaler)
- Manually optimized CPU-Limits configuration
- Different safety margin settings (20%-30%)
Latency Impact:
- Setting CPU-Limits on just one Pod (out of 19) causes end-to-end tail latency to degrade 5-fold
- CPU-Limits damage performance through two mechanisms: per-request throttling and cross-request queue formation
Cost Analysis:
- Avoiding throttling requires 25-45% resource over-provisioning
- Simply removing CPU-Limits saves 38% of resources
- YAAS further achieves 51% resource savings
Auto-Scaling Performance:
- When load increases 25%, raising the scaling threshold from 60% to 70% increases SLO satisfaction time 4-fold
- Demonstrates CPU-Limits' impact on auto-scaling sensitivity
Safety Margin Analysis: Different applications require different safety margins:
- nginx-thrift: 30%
- user-timeline-service: 45%
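The cost of these margins is simple arithmetic: a service that must never be throttled has to be provisioned at its peak usage plus the margin. The sketch below uses the two margins listed above; the peak usage of 2 cores is a made-up figure for illustration, not a measurement from the paper.

```python
# Safety margins reported above, as fractions of peak usage.
SAFETY_MARGIN = {
    "nginx-thrift": 0.30,
    "user-timeline-service": 0.45,
}


def provisioned_cores(peak_cores: float, margin: float) -> float:
    """Cores that must be reserved to keep peak usage below the limit."""
    return peak_cores * (1.0 + margin)


if __name__ == "__main__":
    peak = 2.0  # hypothetical peak usage in cores
    for service, margin in SAFETY_MARGIN.items():
        reserved = provisioned_cores(peak, margin)
        print(f"{service}: reserve {reserved:.2f} cores, "
              f"{reserved - peak:.2f} of them as margin")
```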
Queue Formation Mechanism: Theoretical analysis and experimental validation demonstrate how CPU-Limits form queues at lower loads, while CPU-Requests do not exhibit this problem.
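A toy simulation can show the queueing effect. It is a deliberately simplified model (one thread, FIFO service, 1 ms time steps, quota refilled every 100 ms), not the paper's theoretical analysis, and the workload numbers are invented.

```python
from collections import deque
from typing import Optional


def simulate(arrivals_ms: list[int], service_ms: int,
             quota_ms: Optional[int] = None, period_ms: int = 100) -> list[int]:
    """Per-request latencies (ms) for a single-threaded FIFO server.

    quota_ms is the CPU time allowed per period (None means no CPU limit).
    """
    if service_ms < 1:
        raise ValueError("service_ms must be at least 1")
    if quota_ms is not None and quota_ms < 1:
        raise ValueError("quota_ms must be at least 1 when set")
    pending = deque(sorted(arrivals_ms))
    queue: deque = deque()  # entries are [arrival_time, remaining_cpu_ms]
    latencies = []
    used = 0  # CPU consumed in the current period
    t = 0
    while pending or queue:
        if t % period_ms == 0:
            used = 0  # quota refills at every period boundary
        while pending and pending[0] <= t:
            queue.append([pending.popleft(), service_ms])
        throttled = quota_ms is not None and used >= quota_ms
        if queue and not throttled:
            queue[0][1] -= 1  # run the head request for 1 ms
            used += 1
            if queue[0][1] == 0:
                latencies.append(t + 1 - queue.popleft()[0])
        t += 1
    return latencies


if __name__ == "__main__":
    burst = [0, 0, 0, 0]  # four 10 ms requests arrive together
    print("no limit :", simulate(burst, service_ms=10))
    print("0.3 cores:", simulate(burst, service_ms=10, quota_ms=30))
```

In this run the burst needs 40 ms of CPU. Without a limit the last request completes after 40 ms. With a 30 ms quota the fourth request waits for the next period and completes after 110 ms, and any request arriving in the meantime would queue behind it.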
Multi-tenant Scenarios: Experiments show that when two applications coexist, CPU-Requests effectively protect conformant applications from bursting applications, while CPU-Limits actually worsen performance.
Cascading Failures: Long queues caused by CPU-Limits may cause Pods to exceed memory limits, triggering Pod restarts, which in turn cause other Pods to hit limits or request timeouts.
The paper systematically analyzes recent auto-scaling work from top-tier conferences, finding they all depend on CPU-Limits:
- FIRM: Uses reinforcement learning to optimize CPU-Limits
- Cilantro: Adjusts resource allocation based on online feedback
- Autothrottle: Dual-layer approach for SLO targets
- Ursa: Analysis-driven resource management
- Kubernetes QoS classification requires critical containers to set CPU-Limits
- Cloud providers (e.g., GCP Autopilot) automatically apply CPU-Limits
- Multi-tenant best practices recommend using CPU-Limits
- CPU-Limits are Harmful: For latency-sensitive applications, CPU-Limits are either harmful (causing throttling) or useless (never reached)
- CPU-Requests are Sufficient: Guarantees from modern orchestrators and schedulers make CPU-Requests sufficient for resource isolation
- New Design Space: Removing CPU-Limits opens new optimization dimensions based on overage and node utilization
- Paradigm Shift Required: Requires redesigning auto-scaling and billing models
- Scope of Applicability: Primarily targets latency-sensitive applications; background tasks and similar scenarios can still benefit from well-reasoned CPU-Limits
- Experimental Scale: Experiments are based on specific microservice benchmarks, requiring larger-scale validation
- Production Deployment: Prototype YAAS requires further engineering for production use
- Ecosystem Changes: Requires coordinated changes in orchestrators, monitoring, and billing systems
- Intelligent Scheduling: Interference-aware scheduling incorporating microarchitectural resources (cache, memory bandwidth)
- Performance-Based Billing: Billing models based on SLO satisfaction rather than resource usage
- Vertical Scaling: Vertical scaling optimization in CPU-Limits-free environments
- Multi-dimensional Optimization: Joint optimization of Pod scaling and node scaling
- Disruptive Perspective: Courageously challenges fundamental assumptions in the field, with significant academic value
- Sufficient Empirical Evidence: Supports arguments through theoretical analysis, experimental validation, and industrial surveys
- Practical Value: Provides concrete alternative solutions and prototype implementations with direct applicability
- Systematic Analysis: Comprehensively analyzes the problem from multiple angles including performance, cost, and reliability
- Balanced Viewpoint: Criticizes CPU-Limits misuse while identifying legitimate use cases
- Experimental Limitations: Experiments primarily based on two microservice applications, lacking validation across broader application types
- Production Validation: Lacks long-term validation data from large-scale production environments
- Compatibility Analysis: Insufficient analysis of migration costs for existing systems and toolchains
- Security Considerations: Insufficient discussion of potential security risks from removing CPU-Limits
Academic Impact:
- May trigger paradigm shifts in resource management research
- Provides new design perspectives for auto-scaling research
- Challenges more than a decade of industry best practices
Industrial Impact:
- Provides cloud service providers new cost optimization pathways
- May influence future design of orchestrators like Kubernetes
- Drives innovation in performance-based billing models
Direct Applicability:
- Latency-sensitive online services
- Cost-sensitive cloud-native applications
- Microservice architectures requiring high performance guarantees
Requires Caution:
- Multi-tenant environments (requiring additional isolation mechanisms)
- Mixed workloads containing background tasks
- Scenarios with strict resource usage compliance requirements
The paper cites 83 relevant references covering multiple domains including container orchestration, resource management, and auto-scaling. Key references include:
- Kubernetes official documentation and best practices
- Recent auto-scaling research from top-tier conferences (OSDI, NSDI, EuroSys, etc.)
- Linux kernel CPU scheduling and control group documentation
- Industrial practice experiences and case studies
Through its disruptive perspective and substantial empirical analysis, this paper presents an important challenge to the cloud-native resource management field. While completely removing CPU-Limits may require broad ecosystem transformation, the insights and alternative solutions it provides point toward new directions for future development in this field. The paper's value lies not only in its technical contributions but also in its profound reflection on industry-established practices.