AKS's general availability in June 2018 was just the beginning. The real story is how Microsoft matured the service during 2019 into a credible, production-grade Kubernetes platform that deserves serious enterprise consideration.
One of AKS's most appealing features is the free control plane. You pay only for the worker node VMs, which makes small clusters economical. The trade-off is a shared responsibility model, where Microsoft manages the control plane and you can't directly access or customise it.
In practice, this shared responsibility model has worked out well for many of my clients. Microsoft handles control plane updates and security patches, which frees internal teams to focus on application development and deployment. For example, I've seen clients manage their AKS clusters with Terraform: they declare the desired cluster configuration and let Terraform create and update the underlying resources.
Enterprise-grade auth is another key area where AKS shines. With Azure Active Directory integration, Kubernetes RBAC can be backed by Azure AD users and groups, which removes the need for Kubernetes-specific user management and ties cluster access into corporate identity governance. This has been particularly useful for larger enterprises, where identity management is a major concern: they can reuse existing Azure AD groups to control access to their AKS clusters.
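To make the group-backed RBAC idea concrete, here is a minimal sketch that builds a ClusterRoleBinding whose subject is an Azure AD group. It assumes the group is referenced by its Azure AD object ID; the binding name, the placeholder object ID, and the helper function itself are hypothetical, and `view` is one of the built-in Kubernetes ClusterRoles.

```python
import json


def azure_ad_group_binding(binding_name, group_object_id, cluster_role="view"):
    """Build a ClusterRoleBinding manifest granting a ClusterRole to an Azure AD group.

    group_object_id is assumed to be the Azure AD group's object ID (a GUID).
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": binding_name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                # Placeholder: substitute the real group object ID.
                "name": group_object_id,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical binding name and an all-zero placeholder GUID.
    manifest = azure_ad_group_binding(
        "dev-team-view", "00000000-0000-0000-0000-000000000000"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Because the binding names a group rather than individual users, membership changes stay in Azure AD and never touch the cluster.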
For workloads with burst requirements or batch jobs, AKS's Virtual Nodes are a significant advantage. Powered by Azure Container Instances, Virtual Nodes allow near-instant scale-out without pre-provisioned capacity: pods start in seconds rather than waiting for the cluster autoscaler to provision new VMs. In one case, I saw a client use Virtual Nodes to absorb a sudden traffic spike, scaling from 10 to 100 pods in a matter of minutes and scaling back down just as quickly when the traffic subsided.
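A pod only lands on a virtual node if its spec opts in. The sketch below adds a node selector and tolerations to an existing pod spec. The label and taint keys follow the pattern I recall from the AKS virtual nodes documentation of this period, so treat them as assumptions and check them against your own cluster with `kubectl describe node`. The helper name and the sample container are hypothetical.

```python
import copy

# Assumed labels and taints for the virtual node; verify on your cluster.
VIRTUAL_NODE_SELECTOR = {
    "kubernetes.io/role": "agent",
    "beta.kubernetes.io/os": "linux",
    "type": "virtual-kubelet",
}
VIRTUAL_NODE_TOLERATIONS = [
    {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    {"key": "azure.com/aci", "effect": "NoSchedule"},
]


def target_virtual_node(pod_spec):
    """Return a copy of pod_spec that selects and tolerates the virtual node."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    spec.setdefault("nodeSelector", {}).update(VIRTUAL_NODE_SELECTOR)
    spec.setdefault("tolerations", []).extend(
        copy.deepcopy(VIRTUAL_NODE_TOLERATIONS)
    )
    return spec


if __name__ == "__main__":
    # Hypothetical burst workload.
    base = {"containers": [{"name": "worker", "image": "example/worker:1.0"}]}
    burst = target_virtual_node(base)
    print(burst["nodeSelector"])
    print(burst["tolerations"])
```

Keeping the opt-in in a helper like this makes it easy to send only the burst or batch workloads to Azure Container Instances while steady-state pods stay on the regular VM node pools.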
Upgrades in AKS, as of 2019, require some planning. The control plane is upgraded first; the node pools follow, with each node cordoned and drained in turn. The upgrade is in-place, but you need correctly configured PodDisruptionBudgets to keep the drain from evicting too many replicas of a workload at once. I've found it crucial to monitor the upgrade and validate application health as it proceeds, using kubectl or the Kubernetes Dashboard.
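The interaction between a drain and a PodDisruptionBudget is easier to reason about with a small model. The sketch below is a simplified simulation, not the actual eviction API: it assumes a single workload with a `minAvailable` budget and takes the worst case, in which evicted pods have not yet been rescheduled and become Ready elsewhere. The function and pod names are hypothetical.

```python
def simulate_drain(pods_on_node, healthy_total, min_available):
    """Simulate draining one node under a minAvailable budget.

    pods_on_node:  names of the workload's pods on the node being drained
    healthy_total: healthy pods of the workload across the whole cluster
    min_available: the PodDisruptionBudget's minAvailable value

    Returns (evicted, blocked). Worst case: replacements are not Ready yet,
    so every eviction lowers the healthy count.
    """
    evicted, blocked = [], []
    healthy = healthy_total
    for pod in pods_on_node:
        # An eviction is allowed only if the budget still holds afterwards.
        if healthy - 1 >= min_available:
            evicted.append(pod)
            healthy -= 1
        else:
            blocked.append(pod)
    return evicted, blocked


if __name__ == "__main__":
    # Three replicas, two of them on the node being drained, minAvailable=2.
    print(simulate_drain(["web-1", "web-2"], healthy_total=3, min_available=2))
    # minAvailable equal to the replica count blocks every eviction.
    print(simulate_drain(["web-1"], healthy_total=3, min_available=3))
```

The second call shows the misconfiguration to watch for: a budget that equals the replica count leaves no room for any eviction, so a drain waits on that pod instead of progressing. In a real cluster the blocked pod in the first call would be evicted once the replacement for `web-1` becomes Ready.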
Test the upgrade in a non-production cluster first, and upgrade production node pools during low-traffic windows. Validate application health after each node is upgraded rather than waiting until the end. In my experience, it's also important to have a rollback plan in case something goes wrong, and to use a tool like Azure Monitor to track cluster performance during and after the upgrade.