I've seen many organisations struggle with Kubernetes costs, and it's usually because their clusters are significantly over-provisioned. The good news is that optimisation is straightforward, and most of the work involves understanding how resources are being used.
Kubernetes schedules pods based on resource requests, not actual usage: if a pod requests 2 CPU cores, the scheduler needs a node with 2 unreserved CPU cores, however little the pod ends up consuming. When most pods request more CPU than they use, nodes fill up on paper while sitting mostly idle, and you pay for the idle capacity. To fix this, query actual CPU and memory consumption from Prometheus or Container Insights, compare it to the requested resources, and tighten requests towards observed peak usage, leaving some headroom for spikes.
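The comparison between requested and observed usage can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function names, the 95th-percentile target and the 15% headroom are my own assumptions, and the usage samples (in millicores) are hypothetical values presumed to have been exported already from your monitoring system.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile (nearest-rank method) of a non-empty list."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def recommend_request(usage_millicores, current_request, pct=95, headroom=0.15):
    """Suggest a CPU request: a high percentile of observed usage plus headroom.

    This sketch only tightens: it never suggests more than the current request.
    """
    target = percentile(usage_millicores, pct) * (1 + headroom)
    suggested = math.ceil(target / 10) * 10  # round up to the next 10m
    return min(suggested, current_request)


# Hypothetical samples for one container that currently requests 2000m.
samples = [180, 220, 250, 310, 290, 400, 350, 270, 230, 520]
print(recommend_request(samples, current_request=2000))
```

Using a high percentile instead of the average keeps the request above normal peaks, which matters because a request set at average usage leaves the pod competing for CPU whenever load is above average.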
For example, I worked on a project where the average CPU utilisation was around 20%, but the requested CPU was 2-3 times higher. By adjusting the requests to match actual usage, we were able to reduce the number of nodes required by 30%, resulting in significant cost savings.
The Vertical Pod Autoscaler, or VPA, can automate this process by adjusting requests based on observed usage. This is a big help, because it saves you from having to monitor and adjust resource requests by hand for every workload.
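When configuring VPA, choose the update mode deliberately, because the automatic modes generally apply a new recommendation by evicting the pod and recreating it with the new requests. VPA does not offer a rate-based policy such as a fixed percentage increase per hour. A safer rollout is to start with the update mode set to Off, which only publishes recommendations, review them, and then switch to an automatic mode with minimum and maximum allowed values set per container. Pod disruption budgets on the workload limit how many pods can be evicted at once.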
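The update policy lives in the VerticalPodAutoscaler object itself. The sketch below builds one as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The object and Deployment names and the resource bounds are hypothetical, and the list of update modes reflects the VPA v1 API as I understand it, so check it against the version installed in your cluster.

```python
import json

# Update modes in the VPA v1 API; newer releases may add others.
VALID_MODES = {"Off", "Initial", "Recreate", "Auto"}


def build_vpa(name, deployment, mode="Off", min_allowed=None, max_allowed=None):
    """Build a VerticalPodAutoscaler manifest targeting a Deployment."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown update mode: {mode!r}")
    policy = {"containerName": "*"}  # "*" applies to every container in the pod
    if min_allowed:
        policy["minAllowed"] = min_allowed
    if max_allowed:
        policy["maxAllowed"] = max_allowed
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "updatePolicy": {"updateMode": mode},
            "resourcePolicy": {"containerPolicies": [policy]},
        },
    }


# Hypothetical example: recommendations only, bounded between 100m and 1 CPU.
vpa = build_vpa(
    "web-vpa",
    "web",
    mode="Off",
    min_allowed={"cpu": "100m", "memory": "128Mi"},
    max_allowed={"cpu": "1", "memory": "1Gi"},
)
print(json.dumps(vpa, indent=2))
```

Starting in Off mode means nothing is evicted; the recommendations appear in the object's status, where you can compare them with current requests before letting VPA act on them.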
The cluster autoscaler is another key tool for managing Kubernetes costs. It adds nodes when pods cannot be scheduled due to insufficient resources, and removes nodes when they are underutilised and their pods can be rescheduled elsewhere. To use the autoscaler correctly, you need to set the minimum node count high enough to handle baseline load, the maximum node count to cap unexpected scaling, and the scale-down delay long enough that transient load spikes do not cause churn.
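As a rough illustration, the three settings map onto command-line flags of the open-source cluster-autoscaler. The helper below is a sketch that assembles those flags; the node group name and the numbers are hypothetical, and managed offerings often expose the same ideas under different setting names, so treat the flag list as an assumption to verify for your platform.

```python
def autoscaler_args(group, min_nodes, max_nodes, scale_down_minutes=10):
    """Return cluster-autoscaler flags for one node group.

    min_nodes covers baseline load, max_nodes caps unexpected scaling, and
    scale_down_minutes is how long a node must look unneeded (and how long
    to wait after a scale-up) before it may be removed.
    """
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    if scale_down_minutes <= 0:
        raise ValueError("scale_down_minutes must be positive")
    return [
        f"--nodes={min_nodes}:{max_nodes}:{group}",
        f"--scale-down-unneeded-time={scale_down_minutes}m",
        f"--scale-down-delay-after-add={scale_down_minutes}m",
    ]


# Hypothetical node group sized for a baseline of 3 nodes and a cap of 12.
for flag in autoscaler_args("general-pool", 3, 12, scale_down_minutes=15):
    print(flag)
```

A longer scale-down window trades a little extra spend after each spike for fewer node removals that have to be undone minutes later.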
When you get the autoscaler configured correctly, you pay for roughly the capacity your pods request rather than for peak capacity around the clock, which can be a large saving. This is especially important for organisations with variable workloads, because it means you don't have to provision for peak demand all the time.
For batch workloads, spot instances can be a great way to reduce compute costs. AWS Spot Instances, Azure Spot VMs, and GCP Spot VMs (the successor to Preemptible VMs) are often discounted by 60-80% compared to on-demand pricing, although the exact discount varies by instance type and region. The trade-off is that the cloud provider can reclaim the instance at short notice, two minutes on AWS and around 30 seconds on Azure and GCP, but for workloads that can tolerate interruption, this is a small price to pay.
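To see how the discount and the interruption risk combine, here is a small back-of-the-envelope calculation. It is a sketch under stated assumptions: the function name is mine, prices are normalised so that on-demand costs 1.0 per node-hour, and the `rework_overhead` parameter is a hypothetical allowance for work that has to be repeated after an interruption.

```python
def blended_hourly_cost(on_demand_price, spot_discount, spot_fraction,
                        rework_overhead=0.0):
    """Average hourly price per node when part of the fleet runs on spot.

    spot_discount and spot_fraction are fractions between 0 and 1.
    rework_overhead inflates spot hours to cover repeated work.
    """
    for name, value in (("spot_discount", spot_discount),
                        ("spot_fraction", spot_fraction)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    spot_price = on_demand_price * (1 - spot_discount)
    spot_part = spot_fraction * spot_price * (1 + rework_overhead)
    on_demand_part = (1 - spot_fraction) * on_demand_price
    return spot_part + on_demand_part


# Hypothetical fleet: half on spot at a 70% discount, 10% of spot work redone.
cost = blended_hourly_cost(1.0, spot_discount=0.70, spot_fraction=0.5,
                           rework_overhead=0.10)
print(f"blended cost relative to on-demand: {cost:.3f}")
```

With those hypothetical inputs the blended cost comes to about two thirds of the on-demand figure, which shows that the overall saving depends as much on how much of the fleet can move to spot as on the headline discount.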
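Kubernetes can handle spot instance removal gracefully, but only if the node is drained before it disappears. A node termination handler, or the equivalent built into a managed node pool, typically watches for the provider's reclaim notice and cordons and drains the node, after which pods receive SIGTERM and have their termination grace period to shut down. Pod disruption budgets help during the drain, but they cannot stop the provider from reclaiming the node once the notice expires. This means you can use spot instances for batch processing, stateless microservices with fast restart, and other workloads that can be interrupted and rescheduled.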
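Graceful termination only helps if the application reacts to it. When a pod is evicted or its node is drained, Kubernetes sends SIGTERM to the container and waits for the termination grace period before sending SIGKILL. The sketch below shows one way a batch worker might respond: it finishes the current item, stops picking up new ones, and returns what is left so it can be checkpointed or requeued. The class and method names are hypothetical, and the job list stands in for a real queue.

```python
import signal
import time


class GracefulWorker:
    """Processes items one at a time until asked to stop."""

    def __init__(self, items):
        self.pending = list(items)
        self.done = []
        self.stopping = False

    def request_stop(self, signum=None, frame=None):
        """Signal-handler compatible: finish the current item, then stop."""
        self.stopping = True

    def install_signal_handler(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, process):
        """Process pending items; return the ones left unfinished."""
        while self.pending and not self.stopping:
            item = self.pending.pop(0)
            process(item)
            self.done.append(item)
        return self.pending


if __name__ == "__main__":
    worker = GracefulWorker(["job-1", "job-2", "job-3"])
    worker.install_signal_handler()
    leftover = worker.run(lambda item: time.sleep(0.01))
    print("unfinished items to requeue:", leftover)
```

The important design point is that each item is short relative to the reclaim notice, so the worker can always finish what it is doing before the node goes away.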
Namespace resource quotas are another important tool for managing Kubernetes costs. They limit the total CPU, memory, and object count that a namespace can consume, which prevents a single team's misconfigured deployment from consuming cluster-wide resources. The key is to set quotas per team namespace, and design them to match the team's expected workload with some headroom.
If you set quotas too tight, you block legitimate scale-out, but if you set them too loose, they don't constrain runaway consumption. It's a delicate balance, so revisit quotas as each team's usage changes rather than setting them once; when you get it right, you prevent cost overruns without getting in the way of normal growth.
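One way to make the headroom explicit is to derive each quota from the team's expected workload. The sketch below builds a ResourceQuota manifest as a Python dictionary and prints it as JSON for `kubectl apply -f`. The namespace name, the workload figures and the 25% headroom are hypothetical; only the manifest fields (`requests.cpu`, `requests.memory`, `pods`) are standard quota keys.

```python
import json
import math


def build_quota(namespace, expected_cpu_cores, expected_memory_gib,
                max_pods, headroom=0.25):
    """Build a ResourceQuota sized to expected usage plus a headroom fraction."""
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    cpu = math.ceil(expected_cpu_cores * (1 + headroom))
    memory = math.ceil(expected_memory_gib * (1 + headroom))
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "requests.memory": f"{memory}Gi",
                "pods": str(max_pods),  # object-count limit
            }
        },
    }


# Hypothetical team expecting about 16 cores and 64 GiB of requests.
quota = build_quota("team-payments", 16, 64, max_pods=100)
print(json.dumps(quota, indent=2))
```

Keeping the headroom as a named parameter makes the tight-versus-loose decision visible and easy to adjust per team, instead of being buried in hand-edited numbers.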