2025-11-20T20:19:15.373671

CPU-Limits kill Performance: Time to rethink Resource Control

Shetty, Chakraborty, Franke et al.
Research in compute resource management for cloud-native applications is dominated by the problem of setting optimal CPU limits -- a fundamental OS mechanism that strictly restricts a container's CPU usage to its specified CPU-limits. Rightsizing and autoscaling works have innovated on allocation/scaling policies assuming the ubiquity and necessity of CPU-limits. We question this. Practical experiences of cloud users indicate that CPU-limits harms application performance and costs more than it helps. These observations are in contradiction to the conventional wisdom presented in both academic research and industry best practices. We argue that this indiscriminate adoption of CPU-limits is driven by erroneous beliefs that CPU-limits is essential for operational and safety purposes. We provide empirical evidence making a case for eschewing CPU-limits completely from latency-sensitive applications. This prompts a fundamental rethinking of auto-scaling and billing paradigms and opens new research avenues. Finally, we highlight specific scenarios where CPU-limits can be beneficial if used in a well-reasoned way (e.g. background jobs).

Basic Information

  • Paper ID: 2510.10747
  • Title: CPU-Limits kill Performance: Time to rethink Resource Control
  • Authors: Chirag Shetty (UIUC), Sarthak Chakraborty (UIUC), Hubertus Franke (IBM Research), Larisa Shwartz (IBM Research), Chandra Narayanaswami (IBM Research), Indranil Gupta (UIUC), Saurabh Jha (IBM Research)
  • Categories: cs.DC (Distributed Computing), cs.OS (Operating Systems), cs.PF (Performance)
  • Publication Date: October 2025 (arXiv preprint)
  • Paper Link: https://arxiv.org/abs/2510.10747

Abstract

This paper fundamentally challenges CPU-Limits, a core mechanism in computational resource management for cloud-native applications. Despite widespread acceptance in both academic research and industrial practice, the authors provide empirical evidence demonstrating that CPU-Limits actually degrade application performance and increase costs. The paper argues that latency-sensitive applications should completely abandon CPU-Limits, necessitating fundamental rethinking of auto-scaling and billing models, while identifying legitimate use cases for CPU-Limits in specific scenarios such as background tasks.

Research Background and Motivation

Problem Definition

CPU resource management for containerized microservices is a fundamental problem in cloud computing. The current mainstream approach strictly caps container CPU usage through the CPU-Limits (c.limit) mechanism, which Linux enforces through CFS bandwidth control (cpu.cfs_quota_us in cgroup v1). However, the authors observe a significant gap between theory and practice in actual deployments.
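
As a rough illustration of the mechanism above, the following sketch shows how a CPU-Limit and a CPU-Request expressed in millicores are commonly translated into cgroup v1 CFS settings. The function names are hypothetical, and the 100 ms period and the 1024-shares-per-core weighting are the usual defaults, assumed here rather than taken from the paper.

```python
# Sketch only: helper names are hypothetical; constants are common defaults.

CFS_PERIOD_US = 100_000  # assumed default CFS enforcement period (100 ms)


def limit_to_cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may consume in each period."""
    return limit_millicores * period_us // 1000


def request_to_cpu_shares(request_millicores: int) -> int:
    """Relative CFS weight used only under contention (1 core = 1024 shares)."""
    return max(2, request_millicores * 1024 // 1000)


# A 500m limit allows 50 ms of CPU time per 100 ms period.
print(limit_to_cfs_quota_us(500))   # 50000
# A 250m request maps to a quarter of one core's weight.
print(request_to_cpu_shares(250))   # 256
```

The quota is a hard cap that applies even when the node is idle, whereas shares only matter when containers compete for CPU, which is the distinction the rest of the summary relies on.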

Problem Significance

  1. Performance Impact: CPU-Limits-induced throttling causes dramatic latency degradation and can trigger cascading failures
  2. Cost Issues: Safety margins set to avoid throttling result in 25-45% resource over-provisioning
  3. Operational Complexity: DevOps personnel must navigate complex trade-offs among multiple fine-grained CPU limits

Limitations of Existing Approaches

Existing auto-scaling systems (such as FIRM, Cilantro, and Autothrottle) all build on the assumption that CPU-Limits are necessary, and focus on optimizing limit values rather than questioning the mechanism itself. Through analysis, the authors find that these approaches fail when CPU-Limits are removed.

Research Motivation

Through interviews with SREs (Site Reliability Engineers) and surveys of online discussions, the authors discover disagreement within the operations community regarding CPU-Limits. Many practitioners have already begun removing CPU-Limits to improve performance, contrasting with mainstream academic opinion.

Core Contributions

  1. Challenging Conventional Wisdom: First systematic questioning of the necessity of CPU-Limits in latency-sensitive applications, supported by substantial empirical evidence
  2. Performance Analysis: In-depth analysis of the mechanisms through which CPU-Limits harm latency, reliability, and cost
  3. Alternative Design: Demonstrates feasibility and advantages of resource management using only CPU-Requests (c.req)
  4. New Paradigm: Proposes performance-based billing models and unrestricted auto-scaling design
  5. Prototype Implementation: Develops YAAS (Yet Another AutoScaler) prototype, achieving 51% resource savings
  6. Application Scenario Classification: Clearly delineates legitimate use cases for CPU-Limits (e.g., background tasks, CPU-bound workloads)

Methodology Details

Task Definition

The research objective is to redesign container CPU resource management mechanisms, achieving better performance-cost trade-offs through optimizing CPU-Requests and node utilization without using CPU-Limits.

Core Argumentation Framework

The authors construct a decision tree (Figure 1) to systematically analyze various CPU-Limits configuration scenarios:

  1. limit = req: Increases costs, requiring 25-45% safety margins
  2. limit > req:
    • If the limit is never reached, it is unnecessary
    • If the limit may be reached, it causes auto-scalers to "hang" or causes dramatic latency degradation
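
The decision tree above can be paraphrased as a small classifier. This is a minimal sketch, not the authors' tooling: the function name, the string labels, and the use of an observed peak usage to decide whether a limit "may be reached" are all assumptions made for illustration.

```python
from typing import Optional


def classify_cpu_limit(req: float, limit: Optional[float], peak_usage: float) -> str:
    """Label a (request, limit) configuration following the decision tree.

    req, limit, and peak_usage are in CPU cores; limit=None means no limit is set.
    """
    if limit is None:
        return "no-limit"      # container may burst into idle node CPU
    if limit < req:
        # Treated as invalid here: a limit below the request makes no sense.
        raise ValueError("CPU limit must not be lower than the CPU request")
    if limit == req:
        return "costly"        # needs a safety margin to avoid throttling
    if peak_usage < limit:
        return "unnecessary"   # the limit is never reached
    return "harmful"           # the limit is reached, so throttling occurs


print(classify_cpu_limit(req=1.0, limit=2.0, peak_usage=1.5))
```

Under these assumptions, every branch with a limit set ends in a cost, a no-op, or a throttling risk, which is the shape of the argument summarized above.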

Sufficiency Proof for CPU-Requests

The authors establish that CPU-Requests alone are sufficient by combining guarantees at two levels:

CFS Scheduler Guarantees: The Linux CFS scheduler provides proportional fairness guarantees, ensuring that Pod P_i with CPU-Requests r_i receives at least (r_i/Σr_j) × C CPU time on a node with total CPU C.

Orchestrator Gating: Orchestrators like Kubernetes ensure that the sum of CPU-Requests for all containers on a node does not exceed node capacity (Σr_j ≤ C). The proportional share (r_i/Σr_j) × C is therefore at least r_i, making each CPU-Request an absolute minimum guarantee.
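
A short worked example may make the two guarantees concrete. The sketch below computes the proportional share (r_i/Σr_j) × C for each Pod and rejects placements that violate the orchestrator's admission rule; the function name and the example numbers are illustrative assumptions.

```python
from typing import List


def guaranteed_cpu(requests: List[float], node_capacity: float) -> List[float]:
    """Worst-case CPU (in cores) each Pod receives under full contention.

    Implements (r_i / sum_j r_j) * C, with the orchestrator's gating rule
    sum_j r_j <= C checked up front.
    """
    total = sum(requests)
    if total > node_capacity:
        raise ValueError("orchestrator would not admit requests above node capacity")
    return [r / total * node_capacity for r in requests]


# Three Pods requesting 1, 2, and 1 cores on an 8-core node.
shares = guaranteed_cpu([1.0, 2.0, 1.0], node_capacity=8.0)
print(shares)  # [2.0, 4.0, 2.0]
```

Because the requests sum to 4 cores on an 8-core node, each Pod's worst-case share is twice its request; when the requests fill the node exactly, the share equals the request.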

YAAS Prototype Design

YAAS is based on two key control variables:

  1. Overage (U-R): The difference between a Pod's actual CPU usage (U) and its CPU-Request (R)
  2. Node Utilization (N): Total CPU utilization of the Pod's host node

Core strategies:

  • Maintain overage ≥ 0, increasing resources only when SLO is about to be violated
  • Optimize node utilization through Pod migration
  • Combine vertical and horizontal scaling
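
To show how the two control variables might interact, here is a hypothetical decision step built only from the strategies listed above. It is not the YAAS algorithm: the thresholds, action names, and the `PodState` type are invented for illustration, and the real prototype may order or combine these decisions differently.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    usage: float             # U: observed CPU usage in cores
    request: float           # R: current CPU-Request in cores
    node_utilization: float  # N: host node CPU utilization, 0.0 to 1.0
    slo_at_risk: bool        # True when the latency SLO is close to violation


def decide(pod: PodState, node_util_high: float = 0.8, max_request: float = 4.0) -> str:
    """Pick one action per control step (all thresholds are assumptions)."""
    overage = pod.usage - pod.request
    if pod.slo_at_risk:
        if pod.node_utilization >= node_util_high:
            return "migrate"      # busy node leaves no room to burst
        if pod.usage <= max_request:
            return "scale_up"     # vertical: raise the CPU-Request
        return "scale_out"        # horizontal: add a replica
    if overage < 0:
        return "lower_request"    # request exceeds usage, reclaim the slack
    return "hold"                 # overage >= 0 and SLO is healthy


print(decide(PodState(usage=1.5, request=1.0, node_utilization=0.5, slo_at_risk=False)))
```

The point of the sketch is that, without limits, a Pod running above its request is not a problem in itself; action is taken only when the SLO is at risk or the request is larger than needed.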

Experimental Setup

Dataset

Two microservice applications from DeathStarBench:

  • HotelReservation (HR): Hotel reservation system
  • SocialNetwork (SN): Social network application

Experimental Environment

  • Platform: Amazon EC2 cluster
  • Load Patterns: Varying request loads simulating production environments
  • Evaluation Metrics:
    • End-to-end tail latency (P99)
    • CPU resource usage
    • Scaling frequency and convergence time
    • Cost efficiency

Comparison Methods

  • Traditional CPU-Limits-based HPA (Horizontal Pod Autoscaler)
  • Manually optimized CPU-Limits configuration
  • Different safety margin settings (20%-30%)

Experimental Results

Main Results

Latency Impact:

  • Setting CPU-Limits on just one Pod (out of 19) causes end-to-end tail latency to degrade 5-fold
  • CPU-Limits damage performance through two mechanisms: per-request throttling and cross-request queue formation

Cost Analysis:

  • Avoiding throttling requires 25-45% resource over-provisioning
  • Simply removing CPU-Limits saves 38% of resources
  • YAAS further achieves 51% resource savings

Auto-Scaling Performance:

  • When load increases 25%, raising the scaling threshold from 60% to 70% increases SLO satisfaction time 4-fold
  • Demonstrates CPU-Limits' impact on auto-scaling sensitivity

Ablation Studies

Safety Margin Analysis: Different applications require different safety margins:

  • nginx-thrift: 30%
  • user-timeline-service: 45%

Queue Formation Mechanism: Theoretical analysis and experimental validation demonstrate how CPU-Limits form queues at lower loads, while CPU-Requests do not exhibit this problem.
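
A toy backlog model shows why queues can form under a limit even when average load is below it. The demand trace and capacities below are made-up numbers, and the model (a fixed CPU budget per period, with unfinished work carried over) is a simplifying assumption rather than the paper's analysis.

```python
from typing import List


def backlog_per_period(demand_ms: List[float], capacity_ms: float) -> List[float]:
    """Unfinished CPU work (ms) left at the end of each period."""
    backlog = 0.0
    history = []
    for demand in demand_ms:
        # Work that does not fit in this period's budget carries over.
        backlog = max(0.0, backlog + demand - capacity_ms)
        history.append(backlog)
    return history


demand = [10.0, 10.0, 50.0, 10.0, 10.0]  # average is 18 ms per period

# With a 20 ms quota (CPU-Limit), one burst leaves a queue for several periods.
print(backlog_per_period(demand, capacity_ms=20.0))   # [0.0, 0.0, 30.0, 20.0, 10.0]

# With no limit and spare node CPU (assumed 100 ms available), no queue forms.
print(backlog_per_period(demand, capacity_ms=100.0))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The average demand of 18 ms stays under the 20 ms quota, yet a single burst delays every request behind it for three further periods; without the limit the burst is absorbed by idle CPU.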

Case Studies

Multi-tenant Scenarios: Experiments show that when two applications coexist, CPU-Requests effectively protect conformant applications from bursting applications, while CPU-Limits actually worsen performance.

Cascading Failures: Long queues caused by CPU-Limits may cause Pods to exceed memory limits, triggering Pod restarts, which in turn cause other Pods to hit limits or request timeouts.

Related Work

Auto-Scaling Research

The paper systematically analyzes recent auto-scaling work from top-tier conferences, finding they all depend on CPU-Limits:

  • FIRM: Uses reinforcement learning to optimize CPU-Limits
  • Cilantro: Adjusts resource allocation based on online feedback
  • Autothrottle: Dual-layer approach for SLO targets
  • Ursa: Analysis-driven resource management

Industrial Practice

  • Kubernetes QoS classification: the Guaranteed class requires containers to set CPU-Limits equal to their CPU-Requests
  • Cloud providers (e.g., GCP Autopilot) automatically apply CPU-Limits
  • Multi-tenant best practices recommend using CPU-Limits

Conclusions and Discussion

Main Conclusions

  1. CPU-Limits are Harmful: For latency-sensitive applications, CPU-Limits are either harmful (causing throttling) or useless (never reached)
  2. CPU-Requests are Sufficient: Guarantees from modern orchestrators and schedulers make CPU-Requests sufficient for resource isolation
  3. New Design Space: Removing CPU-Limits opens new optimization dimensions based on overage and node utilization
  4. Paradigm Shift Required: Requires redesigning auto-scaling and billing models

Limitations

  1. Scope of Applicability: Primarily targets latency-sensitive applications; background tasks and similar scenarios can still benefit from well-reasoned CPU-Limits
  2. Experimental Scale: Experiments are based on specific microservice benchmarks, requiring larger-scale validation
  3. Production Deployment: Prototype YAAS requires further engineering for production use
  4. Ecosystem Changes: Requires coordinated changes in orchestrators, monitoring, and billing systems

Future Directions

  1. Intelligent Scheduling: Interference-aware scheduling incorporating microarchitectural resources (cache, memory bandwidth)
  2. Performance-Based Billing: Billing models based on SLO satisfaction rather than resource usage
  3. Vertical Scaling: Vertical scaling optimization in CPU-Limits-free environments
  4. Multi-dimensional Optimization: Joint optimization of Pod scaling and node scaling

In-Depth Evaluation

Strengths

  1. Disruptive Perspective: Courageously challenges fundamental assumptions in the field, with significant academic value
  2. Sufficient Empirical Evidence: Supports arguments through theoretical analysis, experimental validation, and industrial surveys
  3. Practical Value: Provides concrete alternative solutions and prototype implementations with direct applicability
  4. Systematic Analysis: Comprehensively analyzes the problem from multiple angles including performance, cost, and reliability
  5. Balanced Viewpoint: Criticizes CPU-Limits misuse while identifying legitimate use cases

Shortcomings

  1. Experimental Limitations: Experiments primarily based on two microservice applications, lacking validation across broader application types
  2. Production Validation: Lacks long-term validation data from large-scale production environments
  3. Compatibility Analysis: Insufficient analysis of migration costs for existing systems and toolchains
  4. Security Considerations: Insufficient discussion of potential security risks from removing CPU-Limits

Impact

Academic Impact:

  • May trigger paradigm shifts in resource management research
  • Provides new design perspectives for auto-scaling research
  • Challenges industry best practices of over a decade

Industrial Impact:

  • Provides cloud service providers new cost optimization pathways
  • May influence future design of orchestrators like Kubernetes
  • Drives innovation in performance-based billing models

Applicable Scenarios

Direct Applicability:

  • Latency-sensitive online services
  • Cost-sensitive cloud-native applications
  • Microservice architectures requiring high performance guarantees

Requires Caution:

  • Multi-tenant environments (requiring additional isolation mechanisms)
  • Mixed workloads containing background tasks
  • Scenarios with strict resource usage compliance requirements

References

The paper cites 83 relevant references covering multiple domains including container orchestration, resource management, and auto-scaling. Key references include:

  • Kubernetes official documentation and best practices
  • Recent auto-scaling research from top-tier conferences (OSDI, NSDI, EuroSys, etc.)
  • Linux kernel CPU scheduling and control group documentation
  • Industrial practice experiences and case studies

Through its disruptive perspective and substantial empirical analysis, this paper presents an important challenge to the cloud-native resource management field. While completely removing CPU-Limits may require broad ecosystem transformation, the insights and alternative solutions it provides point toward new directions for future development in this field. The paper's value lies not only in its technical contributions but also in its profound reflection on industry-established practices.