Kubernetes makes it easy to deploy workloads. It does not automatically make those workloads efficient. Right-sizing Kubernetes workloads requires measurement, tuning, and ongoing process.
Resource requests and limits
Kubernetes scheduling is based on resource requests (the guaranteed resources a pod needs), not actual utilisation. A pod with a 4 CPU request will only be scheduled on a node with 4 CPU free, regardless of whether it actually uses 0.2 CPU. The consequence: if resource requests are set too high (a common pattern for 'safety'), nodes fill up faster than necessary and cluster costs are inflated.
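A minimal sketch of the problem (the workload name and image are illustrative): this pod reserves 4 CPU on whichever node it lands on, whether or not it ever uses them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server            # hypothetical workload
spec:
  containers:
  - name: app
    image: example.com/api:1.0   # placeholder image
    resources:
      requests:
        cpu: "4"              # the scheduler reserves 4 CPU on the node,
        memory: 8Gi           # even if observed usage is ~0.2 CPU
      limits:
        cpu: "4"
        memory: 8Gi
```

The scheduler only looks at the `requests` block; the gap between requested and observed usage is exactly what right-sizing closes.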
Vertical Pod Autoscaler in recommendation mode
The Vertical Pod Autoscaler (VPA) in recommendation mode analyses pod resource utilisation and recommends request and limit values based on observed usage. Run the VPA in recommendation mode rather than auto mode for production workloads; observing recommendations before applying them avoids unexpected pod evictions. Regularly reviewing VPA recommendations and updating request values is the systematic approach to right-sizing.
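A sketch of a VPA object in recommendation mode, assuming a Deployment named `api-server` (the names are illustrative); setting `updateMode: "Off"` makes the VPA compute recommendations without ever evicting pods.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa          # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server            # hypothetical target workload
  updatePolicy:
    updateMode: "Off"           # recommendation mode: observe only, no evictions
```

Recommendations then appear in the object's status and can be inspected with `kubectl describe vpa api-server-vpa` before being copied into the workload's request values.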
Namespace resource quotas and LimitRanges
Kubernetes ResourceQuotas set aggregate CPU and memory limits for a namespace; LimitRanges set defaults and maximums for individual pods. Requiring teams to declare resource requests (by setting LimitRange defaults that teams must explicitly override) surfaces the resource usage of every workload. Without quotas, unconstrained workloads can expand to fill node capacity at the expense of other workloads.
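A sketch of the two objects for a hypothetical `team-a` namespace (all quota values are illustrative): the ResourceQuota caps the namespace's aggregate requests and limits, while the LimitRange supplies per-container defaults for any pod that does not declare its own.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota              # hypothetical name
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"          # aggregate ceilings for the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: team-a
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:                    # applied when a container omits limits
      cpu: 500m
      memory: 512Mi
    max:                        # hard per-container ceiling
      cpu: "4"
      memory: 8Gi
```

Note that once a ResourceQuota covers CPU and memory, pods without requests are rejected unless a LimitRange default fills them in, which is what forces every workload to have an explicit or defaulted declaration.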
Node pool right-sizing
The node pool VM SKU should be matched to the dominant workload type. Memory-intensive workloads (Java applications with large heaps, databases) need memory-optimised instances. CPU-intensive workloads (batch processing, compute-heavy services) need compute-optimised instances. A single general-purpose node pool running both types wastes either CPU or memory capacity, depending on the mix. The operational complexity of multiple node pools is justified by the cost efficiency.
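Pinning workloads to the right pool is typically done with a node label plus a matching taint on the pool. A sketch, assuming a memory-optimised pool labelled and tainted with `workload-class=memory-optimised` (both names are illustrative and would be set when the pool is created):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heap-heavy-service      # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: heap-heavy-service
  template:
    metadata:
      labels:
        app: heap-heavy-service
    spec:
      nodeSelector:
        workload-class: memory-optimised   # only schedule onto the memory pool
      tolerations:
      - key: workload-class                # tolerate the pool's taint so the
        operator: Equal                    # pod is admitted there
        value: memory-optimised
        effect: NoSchedule
      containers:
      - name: app
        image: example.com/service:1.0     # placeholder image
        resources:
          requests:
            cpu: "1"
            memory: 12Gi                   # high memory:CPU ratio suits the pool
          limits:
            memory: 12Gi
```

The taint keeps general-purpose workloads off the expensive pool; the nodeSelector and toleration together steer the memory-hungry workload onto it.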