Kubernetes cluster autoscaling (adding and removing nodes based on workload demand) provides cost efficiency and capacity resilience. Operating it reliably requires understanding the autoscaler's decision model.

How the Cluster Autoscaler works

The Cluster Autoscaler (CA) watches for unschedulable pods (pods stuck in the Pending state because no node has sufficient resources) and scales out a node pool in response. When several node groups could fit the pending pods, the configured expander (for example least-waste or most-pods) decides which one to expand. Scale-in occurs when a node stays underutilised, meaning the resources requested by its pods are below the scale-in threshold (typically 50% of the node's allocatable resources), for a sustained period (the scale-in delay, typically 10 minutes). The CA removes only nodes whose pods can be safely drained and rescheduled elsewhere.
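The scale-in rule can be sketched in a few lines of Python. This is a simplified model for illustration, not the CA's actual code: the `Node` class, its field names, and the example figures are hypothetical, and it assumes utilisation is the larger of the CPU and memory request ratios.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one node, for illustration only."""
    name: str
    allocatable_cpu_m: int        # allocatable CPU in millicores
    requested_cpu_m: int          # sum of pod CPU requests in millicores
    allocatable_mem_mi: int       # allocatable memory in MiB
    requested_mem_mi: int         # sum of pod memory requests in MiB
    underutilised_minutes: float  # time spent continuously below the threshold


def utilisation(node: Node) -> float:
    """Request-based utilisation (assumption: the larger of CPU and memory)."""
    cpu = node.requested_cpu_m / node.allocatable_cpu_m
    mem = node.requested_mem_mi / node.allocatable_mem_mi
    return max(cpu, mem)


def scale_in_candidates(nodes, threshold=0.5, delay_minutes=10):
    """Names of nodes below the threshold for at least the scale-in delay."""
    return [
        n.name
        for n in nodes
        if utilisation(n) < threshold and n.underutilised_minutes >= delay_minutes
    ]


# Made-up example data: only node "a" is both underutilised and past the delay.
example_nodes = [
    Node("a", 4000, 1000, 16384, 4096, underutilised_minutes=12),
    Node("b", 4000, 3000, 16384, 4096, underutilised_minutes=30),
    Node("c", 4000, 1000, 16384, 4096, underutilised_minutes=3),
]
candidates = scale_in_candidates(example_nodes)
```

A node that passes this check is only a candidate; it is removed only if its pods can be drained safely, which the sketch does not model.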

PodDisruptionBudgets for safe scale-in

Without PodDisruptionBudgets (PDBs), the CA can drain a node and displace pods in a way that violates availability requirements. A PDB specifying minAvailable: 2 for a deployment with 3 replicas ensures the CA cannot drain a node if doing so would leave fewer than 2 replicas available. PDBs are a prerequisite for safe cluster autoscaling: they are the mechanism by which workloads communicate their availability constraints to the cluster control plane.
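The arithmetic behind the minAvailable example can be written out as a small Python sketch. The function names are hypothetical, and the model is deliberately simplified: it treats a drain as evicting all of a workload's pods on the node at once, whereas real evictions happen one pod at a time and replacements may become healthy in between.

```python
def disruptions_allowed(healthy_replicas: int, min_available: int) -> int:
    """How many replicas may be evicted right now without breaching the budget."""
    return max(0, healthy_replicas - min_available)


def can_drain(pods_on_node: int, healthy_replicas: int, min_available: int) -> bool:
    """True if evicting this workload's pods on the node keeps minAvailable intact.

    Simplifying assumption: all pods on the node are evicted simultaneously.
    """
    return pods_on_node <= disruptions_allowed(healthy_replicas, min_available)


# The case from the text: 3 healthy replicas, minAvailable: 2.
one_replica_on_node = can_drain(pods_on_node=1, healthy_replicas=3, min_available=2)
two_replicas_on_node = can_drain(pods_on_node=2, healthy_replicas=3, min_available=2)
```

With one replica on the node the drain leaves 2 available and is allowed; with two replicas on the same node it would leave 1 and is blocked.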

Node pool topology for autoscaling

Autoscaling node pools that span multiple availability zones and use a consistent VM SKU provide resilient scale-out: a pool that spans zones 1, 2, and 3 can add a node in whichever zone has capacity available, so a shortage in one zone does not block the scale event. The CA does not guarantee balanced distribution across zones; for zone-balanced workloads, use pod topology spread constraints (beta in Kubernetes 1.18, stable since 1.19) to distribute pods evenly across zones.
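To make the effect of a topology spread constraint concrete, the sketch below models a maxSkew check over zones in Python. It is an illustrative approximation of the scheduler's rule, not its implementation: the function name is hypothetical, and it assumes skew is the pod count in the candidate zone after placement minus the current minimum count across all zones.

```python
from collections import Counter


def allowed_zones(pod_zones, zones, max_skew=1):
    """Zones where one more pod can be placed without exceeding max_skew.

    pod_zones: zone label of each existing matching pod
    zones:     all zones the node pool spans
    """
    counts = Counter(pod_zones)
    # Minimum number of matching pods in any zone before placement.
    global_min = min(counts.get(z, 0) for z in zones)
    return [z for z in zones if counts.get(z, 0) + 1 - global_min <= max_skew]


# Two pods in zone 1, one in zone 2, none in zone 3: only zone 3 is acceptable.
next_zone = allowed_zones(["1", "1", "2"], ["1", "2", "3"])
```

If no node in an acceptable zone has room, the pod stays Pending, which is the signal the CA reacts to when scaling out.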

The scale-out latency problem

A node scale-out event typically takes 2-5 minutes, most of it VM provisioning time. An application that sees a traffic spike needs new capacity before the spike peaks, not 5 minutes after. The solution: keep some spare capacity provisioned at all times (a minimum node pool size set high enough to leave headroom) or use fast-scaling solutions (Azure Virtual Nodes for serverless burst, node pool pre-warming). The CA is a cost optimiser; it is not designed for rapid response to traffic spikes.
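A rough way to size that spare capacity is to estimate how many pods arrive while a new node is still provisioning. The Python sketch below is back-of-the-envelope arithmetic under stated assumptions: the function name, the linear growth model, and the example figures are all hypothetical and should be replaced with measured values.

```python
import math


def headroom_nodes(spike_pods_per_min: float,
                   provisioning_minutes: float,
                   pods_per_node: int) -> int:
    """Spare nodes needed to absorb a spike until new nodes become ready.

    Assumes demand grows linearly during the provisioning window.
    """
    pods_arriving = spike_pods_per_min * provisioning_minutes
    return math.ceil(pods_arriving / pods_per_node)


# Made-up example: 3 new pods per minute, 5-minute provisioning, 10 pods per node.
spare = headroom_nodes(spike_pods_per_min=3, provisioning_minutes=5, pods_per_node=10)
```

In this example 15 pods arrive before the first new node is ready, so two spare nodes are needed; halving the provisioning time or the spike rate would halve the headroom to pay for.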