AKS has matured considerably since its preview in 2017. In 2021 it is a production-ready Kubernetes service with a well-understood set of operational patterns. The rough edges of early AKS have been smoothed out; the remaining challenges are Kubernetes challenges, not AKS-specific ones.
Node pool architecture
AKS node pools allow different VM SKUs for different workload types within a single cluster. A common pattern is a system node pool with smaller VMs for Kubernetes system components (CoreDNS, metrics-server, AKS-specific daemonsets) and one or more user node pools for application workloads. Compute-intensive ML workloads go on GPU node pools. Spot node pools, which run on Azure Spot VMs that can be evicted at any time, reduce cost significantly for batch or fault-tolerant workloads.
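As a rough sketch of how workloads are steered onto these pools, the snippet below builds the scheduling fragment of a pod spec as plain Python dictionaries. The pool names (`apppool`, `gpupool`, `spotpool`), the workload categories, and the `scheduling_for` helper are hypothetical. The `agentpool` node label and the `kubernetes.azure.com/scalesetpriority=spot:NoSchedule` taint are what AKS is expected to apply to nodes and spot pools respectively, while the `sku=gpu:NoSchedule` taint is assumed to have been set by the operator when creating the GPU pool. Verify all of them against your own cluster.

```python
from typing import Dict, List

# Hypothetical mapping from workload category to node pool name.
POOL_FOR_WORKLOAD: Dict[str, str] = {
    "web": "apppool",
    "ml-training": "gpupool",
    "batch": "spotpool",
}

# Taints a pod must tolerate to be scheduled on each pool.
# The spot taint is the one AKS adds to spot node pools; the GPU taint is an
# assumed operator-defined taint, not something AKS adds on its own.
TOLERATIONS_FOR_POOL: Dict[str, List[dict]] = {
    "spotpool": [{
        "key": "kubernetes.azure.com/scalesetpriority",
        "operator": "Equal",
        "value": "spot",
        "effect": "NoSchedule",
    }],
    "gpupool": [{
        "key": "sku",
        "operator": "Equal",
        "value": "gpu",
        "effect": "NoSchedule",
    }],
}


def scheduling_for(workload_type: str) -> dict:
    """Return the nodeSelector and tolerations fragment of a pod spec."""
    pool = POOL_FOR_WORKLOAD[workload_type]
    return {
        "nodeSelector": {"agentpool": pool},
        "tolerations": TOLERATIONS_FOR_POOL.get(pool, []),
    }
```

Merging the returned fragment into a pod template keeps the placement policy in one place. A toleration only permits scheduling onto a tainted pool; it is the `nodeSelector` that actually pins the workload there.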
Managed identities as the auth model
Pod identity was the 2021 pattern for giving pods access to Azure resources without embedding credentials; it has since been superseded by Workload Identity, introduced in 2022. Both approaches let pods authenticate as Azure managed identities and obtain scoped access to Azure Key Vault, Storage, SQL, and other services. This eliminates client secrets in environment variables or Kubernetes secrets, replacing them with short-lived tokens that are rotated automatically.
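To make the "short-lived tokens" point concrete, here is a minimal sketch of the token exchange behind Workload Identity, using only the standard library. It assumes the workload identity webhook has injected `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_FEDERATED_TOKEN_FILE`, and optionally `AZURE_AUTHORITY_HOST` into the pod. The `build_token_request` helper is hypothetical and only builds the request without sending it. In application code you would normally let the `azure-identity` package perform this exchange rather than hand-rolling it.

```python
import os
import urllib.parse
from pathlib import Path
from typing import Mapping, Tuple


def build_token_request(scope: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Build the URL and form body that exchange the pod's projected
    service account token for an Azure AD access token."""
    authority = env.get("AZURE_AUTHORITY_HOST", "https://login.microsoftonline.com/")
    tenant_id = env["AZURE_TENANT_ID"]
    client_id = env["AZURE_CLIENT_ID"]

    # Re-read the file on every call: the kubelet rotates the projected token,
    # so a value cached at startup would eventually expire.
    assertion = Path(env["AZURE_FEDERATED_TOKEN_FILE"]).read_text().strip()

    url = f"{authority.rstrip('/')}/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": scope,
        # The service account token replaces a client secret here.
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
    })
    return url, body
```

A caller would POST `body` to `url` with content type `application/x-www-form-urlencoded`, for example with scope `https://vault.azure.net/.default` for Key Vault. No secret appears anywhere in the pod spec, since the only credential is the rotating token file.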
Private cluster networking
Production AKS clusters should use the private cluster configuration: the Kubernetes API server has no public IP endpoint, so API access requires network adjacency (VPN, ExpressRoute, or a jump host reached through Azure Bastion). Combined with an Azure Container Registry behind a private endpoint (no public image pulls) and an internal load balancer for ingress, a properly configured production AKS cluster exposes no publicly accessible surfaces except those explicitly required.
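The three conditions above lend themselves to an automated check. The sketch below audits already-fetched JSON documents for public exposure. It assumes the cluster document has the shape of `az aks show` output (`apiServerAccessProfile.enablePrivateCluster`), the registry document the shape of `az acr show` output (`publicNetworkAccess`), and that services are Kubernetes Service objects carrying the `service.beta.kubernetes.io/azure-load-balancer-internal` annotation. The `audit_exposure` function itself is hypothetical, and the field names should be confirmed against your CLI version.

```python
from typing import List

INTERNAL_LB_ANNOTATION = "service.beta.kubernetes.io/azure-load-balancer-internal"


def audit_exposure(cluster: dict, registry: dict, services: List[dict]) -> List[str]:
    """Return a list of findings; an empty list means nothing public was found."""
    findings = []

    # 1. The API server should be private.
    profile = cluster.get("apiServerAccessProfile") or {}
    if not profile.get("enablePrivateCluster"):
        findings.append("API server has a public endpoint")

    # 2. The container registry should refuse public network access.
    if registry.get("publicNetworkAccess") != "Disabled":
        findings.append("container registry allows public network access")

    # 3. Every LoadBalancer service should request an internal load balancer.
    for svc in services:
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue
        meta = svc.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if annotations.get(INTERNAL_LB_ANNOTATION) != "true":
            findings.append(f"service {meta.get('name')} uses a public load balancer")

    return findings
```

Running a check like this in CI against exported resource JSON turns "no public surfaces except those explicitly required" into an allow-list: any finding has to be either fixed or deliberately exempted.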
Upgrade management
AKS supports in-place upgrades of both the cluster control plane and individual node pools. The operational pattern that works at scale: run node pools one minor version behind the latest available Kubernetes version (N-1); once the control plane is on the target version, upgrade the system node pool first, validate, then upgrade the user node pools. Use PodDisruptionBudgets to ensure rolling upgrades do not violate minimum availability. Automate upgrade validation with a canary environment that tracks the current version and fails the rollout if post-upgrade smoke tests do not pass.
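The ordering and gating logic described above can be sketched in a few functions. Everything here is illustrative: `pick_target_version`, `upgrade_order`, and `run_upgrade` are hypothetical helpers, the version list is assumed to come from whatever your tooling reports as available, and the `upgrade` and `smoke_test` callables stand in for your real upgrade command and validation suite.

```python
from typing import Callable, Dict, List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '1.20.9' into (1, 20, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_target_version(available: List[str]) -> str:
    """Return the newest patch of the second-newest minor release (N-1)."""
    minors = sorted({parse(v)[:2] for v in available})
    if len(minors) < 2:
        raise ValueError("need at least two minor versions to pick N-1")
    target_minor = minors[-2]
    return max((v for v in available if parse(v)[:2] == target_minor), key=parse)


def upgrade_order(pools: Dict[str, str]) -> List[str]:
    """Order pool names so 'System' pools come before 'User' pools."""
    return sorted(pools, key=lambda name: (pools[name] != "System", name))


def run_upgrade(
    pools: Dict[str, str],
    upgrade: Callable[[str], None],
    smoke_test: Callable[[], bool],
) -> List[str]:
    """Upgrade pools one at a time, stopping at the first failed smoke test."""
    done: List[str] = []
    for name in upgrade_order(pools):
        upgrade(name)
        if not smoke_test():
            raise RuntimeError(f"smoke tests failed after upgrading {name}; halting")
        done.append(name)
    return done
```

Halting on the first failure leaves the remaining user pools on the old version, which limits the blast radius while the problem is investigated. PodDisruptionBudgets still need to be defined on the workloads themselves; this sketch only sequences the pools.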