Compute is often the dominant operational expense for organisations running large Kubernetes clusters. The good news is that many clusters are significantly over-provisioned, and the optimisation levers are well understood.

Resource requests and limits

Kubernetes schedules pods based on resource requests, not on actual usage. If a pod requests 2 CPU cores, the scheduler must find a node with 2 unreserved CPU cores, whether or not the pod ever uses them. When most pods request more CPU than they use, nodes look full to the scheduler while sitting largely idle. Right-sizing means querying actual CPU and memory consumption from Prometheus or Container Insights, comparing it to the requested resources, and tightening requests toward observed usage while leaving some headroom for peaks. VPA (Vertical Pod Autoscaler) can automate this by adjusting requests based on observed usage.
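The right-sizing comparison can be sketched in a few lines of Python. This is a minimal illustration, not how VPA computes its recommendations: the function names, the choice of the 95th percentile, the 15% headroom, and the sample values are all assumptions made for the example. In practice the samples would come from a metrics query.

```python
import math

def percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank method) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def recommend_request(usage_samples, pct=95, headroom=0.15):
    """Suggest a request: the chosen usage percentile plus a safety margin.

    pct and headroom are illustrative defaults, not Kubernetes settings.
    """
    return percentile(usage_samples, pct) * (1 + headroom)

# Hypothetical CPU usage samples (in cores) for a pod that requests 2.0 cores.
cpu_usage = [0.21, 0.25, 0.30, 0.28, 0.35, 0.40, 0.33, 0.27, 0.52, 0.31]
current_request = 2.0
suggested = recommend_request(cpu_usage)
print(f"current={current_request:.2f} cores, suggested={suggested:.2f} cores")
```

Using a high percentile rather than the mean keeps the suggested request above routine peaks. The headroom factor is a judgment call that depends on how bursty the workload is.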

Cluster autoscaler

The cluster autoscaler adds nodes when pods cannot be scheduled due to insufficient resources, and removes nodes when they are underutilised and their pods can be rescheduled elsewhere. Configuring it correctly means setting the minimum node count high enough to handle baseline load, the maximum node count to cap unexpected scaling, and the scale-down delay long enough that transient load spikes do not cause churn. You still pay for every provisioned node, but with the autoscaler the node count tracks demand instead of being fixed at peak capacity.
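The interaction between the minimum node count, the utilisation threshold, and the scale-down delay can be illustrated with a simplified model. This is a sketch under stated assumptions, not the autoscaler's actual algorithm: the `NodeState` type and `scale_down_candidates` function are hypothetical, only CPU is considered, and the real autoscaler also checks that the evicted pods fit on other nodes. The default values of 0.5 and 600 seconds are chosen to resemble commonly cited autoscaler defaults and should be treated as assumptions.

```python
from dataclasses import dataclass

@dataclass
class NodeState:
    name: str
    requested_cpu: float          # sum of pod CPU requests on the node, in cores
    allocatable_cpu: float        # CPU the node offers to pods, in cores
    seconds_below_threshold: int  # how long utilisation has stayed low

def scale_down_candidates(nodes, min_nodes,
                          utilisation_threshold=0.5, unneeded_seconds=600):
    """Return names of nodes that could be removed without dropping below min_nodes."""
    removable = max(0, len(nodes) - min_nodes)
    candidates = [
        n.name for n in nodes
        if n.requested_cpu / n.allocatable_cpu < utilisation_threshold
        and n.seconds_below_threshold >= unneeded_seconds
    ]
    return candidates[:removable]

# Hypothetical four-node cluster.
cluster = [
    NodeState("node-a", 0.8, 4.0, 900),   # low utilisation, long enough
    NodeState("node-b", 3.0, 4.0, 900),   # busy, kept
    NodeState("node-c", 1.0, 4.0, 120),   # low, but only briefly (transient dip)
    NodeState("node-d", 0.4, 4.0, 1200),  # low utilisation, long enough
]
print(scale_down_candidates(cluster, min_nodes=3))
print(scale_down_candidates(cluster, min_nodes=2))
```

In this model `node-c` is protected by the delay even though its utilisation is low, which is the churn-avoidance behaviour the delay is meant to provide. Raising `min_nodes` caps how many of the remaining candidates can actually be removed.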

Spot instances for batch workloads

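AWS Spot Instances, Azure Spot VMs, and GCP Spot VMs (the successor to Preemptible VMs) typically run at a 60-90% discount compared to on-demand pricing, though the exact discount varies by instance type, region, and time. The trade-off is that the cloud provider can reclaim the instance at short notice: two minutes on AWS, and roughly 30 seconds on Azure and GCP. For workloads that can tolerate interruption (batch processing, stateless microservices with fast restart), spot instances reduce compute cost substantially. Kubernetes does not handle reclamation on its own. A node termination handler, or the provider's managed equivalent, has to cordon and drain the node when the notice arrives. Pod disruption budgets and graceful termination then govern how pods are evicted, provided they can shut down within the notice window.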
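The headline discount overstates the saving when interrupted work has to be repeated. A rough estimate, assuming a hypothetical on-demand price of $0.40 per hour, a 70% discount, and 10% of the job's hours re-run after interruptions (all illustrative figures, not provider data), might look like this:

```python
def effective_spot_cost(on_demand_hourly, discount, job_hours, rework_fraction):
    """Estimate job cost on spot capacity, including work re-run after interruptions.

    rework_fraction is the share of job_hours that must be repeated because
    instances were reclaimed; it is an assumption the caller must estimate.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    if rework_fraction < 0:
        raise ValueError("rework_fraction must be non-negative")
    spot_hourly = on_demand_hourly * (1 - discount)
    billed_hours = job_hours * (1 + rework_fraction)
    return spot_hourly * billed_hours

# Hypothetical 100-hour batch job.
on_demand_cost = 0.40 * 100
spot_cost = effective_spot_cost(0.40, 0.70, 100, 0.10)
saving = 1 - spot_cost / on_demand_cost
print(f"on-demand=${on_demand_cost:.2f} spot=${spot_cost:.2f} saving={saving:.0%}")
```

Under these assumptions the effective saving is a few points lower than the nominal discount. Checkpointing a batch job more often reduces the rework fraction and closes that gap.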

Namespace resource quotas

Kubernetes resource quotas limit the total CPU, memory, and object count that a namespace can consume. Setting quotas per team namespace prevents a single team's misconfigured deployment from consuming cluster-wide resources. The quota design should match the team's expected workload with some headroom: quotas set too tight block legitimate scale-out; quotas set too loose do not constrain runaway consumption.
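When sizing a quota, it helps to check how much headroom a namespace has left and whether a planned scale-out would still fit. The sketch below models that arithmetic only. The function names, the resource keys, and the figures are hypothetical, and it simplifies real quota admission, for instance by treating a missing request as zero.

```python
def quota_headroom(quota, pod_requests):
    """Return remaining capacity per resource after summing existing pod requests."""
    used = {resource: 0.0 for resource in quota}
    for pod in pod_requests:
        for resource, amount in pod.items():
            if resource in used:
                used[resource] += amount
    return {resource: quota[resource] - used[resource] for resource in quota}

def admits(quota, pod_requests, new_pod):
    """True if adding new_pod keeps every quota-tracked resource within its limit."""
    remaining = quota_headroom(quota, pod_requests)
    return all(new_pod.get(resource, 0.0) <= remaining[resource]
               for resource in quota)

# Hypothetical team namespace: 8 cores, 16 GiB, at most 10 pods.
team_quota = {"cpu": 8.0, "memory_gib": 16.0, "pods": 10}
running = [{"cpu": 2.0, "memory_gib": 4.0, "pods": 1} for _ in range(3)]

print(quota_headroom(team_quota, running))
print(admits(team_quota, running, {"cpu": 2.0, "memory_gib": 4.0, "pods": 1}))
print(admits(team_quota, running, {"cpu": 3.0, "memory_gib": 4.0, "pods": 1}))
```

Here a fourth replica of the same size still fits, but a larger pod is rejected on CPU. That is the kind of blocked scale-out an overly tight quota causes.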