Kubernetes 1.20 Production Cluster Changes

Kubernetes 1.20 landed in December 2020 with 42 enhancements. For those running production clusters, two changes stand out: the deprecation of dockershim and a new feature for shutting nodes down gracefully.

The dockershim deprecation is a significant signal. Dockershim is the kubelet component that lets Kubernetes use Docker Engine as its container runtime. Although its actual removal was later pushed back from 1.22 to 1.24, the deprecation itself meant production clusters needed to plan a move to containerd or CRI-O. For most workloads, this migration is invisible: images built with Docker are standard OCI images and run the same way on containerd. The impact falls mainly on clusters whose workloads or tooling depend on the Docker daemon itself, for example pods that mount /var/run/docker.sock to build images or collect logs.
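One way to find affected workloads before migrating is to look for pods that mount the Docker socket from the host. The following is a minimal sketch, not a complete detector: it assumes the pod list comes from the "items" array of kubectl get pods --all-namespaces -o json, and the function name and the two socket paths it checks are illustrative choices.

```python
# Host paths where the Docker daemon socket is commonly exposed (assumed list).
DOCKER_SOCKET_PATHS = ("/var/run/docker.sock", "/run/docker.sock")


def pods_mounting_docker_socket(pods):
    """Return sorted "namespace/name" entries for pods that have a
    hostPath volume pointing at the Docker socket."""
    flagged = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for volume in pod.get("spec", {}).get("volumes") or []:
            host_path = volume.get("hostPath") or {}
            if host_path.get("path") in DOCKER_SOCKET_PATHS:
                namespace = meta.get("namespace", "default")
                name = meta.get("name", "<unnamed>")
                flagged.append(f"{namespace}/{name}")
                break  # one match per pod is enough
    return sorted(flagged)
```

To use it, load the kubectl JSON output with json.load and pass its "items" list to the function. Pods that reach the Docker daemon some other way (for example over TCP) would not be caught by this check.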

I have seen this migration play out in production clusters. For example, a team I worked with had to migrate 500 nodes from Docker to containerd. We chose containerd over CRI-O due to its wider adoption and better support for GPU acceleration, which was critical for our machine learning workloads. The migration itself took around 6 weeks, with an average downtime of 30 minutes per node. During this time, we also upgraded our Kubernetes version from 1.18 to 1.20, which added an extra layer of complexity to the migration process.

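Graceful node shutdown is a welcome addition in 1.20, where it arrives as an alpha feature behind the GracefulNodeShutdown feature gate. Previously, the kubelet was unaware of a node shutdown, so pods were terminated abruptly. With the feature enabled, the kubelet detects the pending shutdown through systemd and terminates pods gracefully. This gives applications a chance to clean up connections or drain ongoing requests, rather than being killed mid-operation. Enabling it requires kubelet configuration (the feature gate plus the shutdownGracePeriod and shutdownGracePeriodCriticalPods settings) and a node that runs systemd, because the kubelet uses systemd inhibitor locks to delay the shutdown.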
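As a rough illustration of the kubelet side, the sketch below builds the relevant KubeletConfiguration fields as a Python dictionary and prints them as JSON, which the kubelet accepts in its config file alongside YAML. The field names and the feature gate are the real ones; the helper names and the 30 and 10 second values are assumptions chosen for the example.

```python
import json


def graceful_shutdown_config(total_seconds, critical_seconds):
    """Build the KubeletConfiguration fields for graceful node shutdown.

    total_seconds    -> shutdownGracePeriod (time for all pods)
    critical_seconds -> shutdownGracePeriodCriticalPods (reserved out of the total)
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= critical_seconds < total_seconds:
        raise ValueError("critical_seconds must be >= 0 and less than total_seconds")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        # Alpha in 1.20, so the gate has to be switched on explicitly.
        "featureGates": {"GracefulNodeShutdown": True},
        "shutdownGracePeriod": f"{total_seconds}s",
        "shutdownGracePeriodCriticalPods": f"{critical_seconds}s",
    }


def regular_pod_window(total_seconds, critical_seconds):
    """Seconds available to regular pods before critical pods are stopped."""
    return total_seconds - critical_seconds


config = graceful_shutdown_config(30, 10)
print(json.dumps(config, indent=2))
```

With these example values, regular pods get the first 20 seconds and critical pods the final 10. In practice the fields would be merged into the node's existing kubelet configuration rather than written as a standalone file.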

In our production environment, we have seen a significant reduction in failed requests due to node shutdowns. Before 1.20, we would see around 5% of requests fail when a node went down. After implementing the graceful shutdown feature, this number dropped to less than 1%. We achieved this by setting the termination grace period to 30 seconds, which gave our applications enough time to clean up and drain ongoing requests. We also had to modify our systemd configuration to send a SIGTERM signal to the kubelet before shutting down the node.
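The grace period only helps if the application reacts to the termination signal. During graceful termination the container's main process receives SIGTERM and is killed only after the grace period expires. The sketch below shows one minimal way to handle that in Python; the function names and the request loop are hypothetical stand-ins for a real server.

```python
import signal
import threading

# Set once the process has been asked to stop.
shutdown_requested = threading.Event()


def request_shutdown(signum=None, frame=None):
    """Signal handler: record the request so the main loop can exit cleanly."""
    shutdown_requested.set()


def install_handlers():
    """Register the SIGTERM handler; call once from the main thread."""
    signal.signal(signal.SIGTERM, request_shutdown)


def serve(handle_one_request, drain):
    """Handle requests until shutdown is requested, then drain and return
    the number of requests handled."""
    handled = 0
    while not shutdown_requested.is_set():
        handle_one_request()
        handled += 1
    drain()  # close connections, flush buffers, finish in-flight work
    return handled
```

A real service would call install_handlers() at startup and make sure drain() finishes well within the configured grace period, since anything still running after that is killed.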

CronJobs also moved closer to General Availability in 1.20. The release added a rewritten CronJob controller as an alpha feature (the CronJobControllerV2 feature gate), and the stable batch/v1 API followed in 1.21 after years in beta. The new controller is designed to scale better with large numbers of CronJobs, and together with the concurrencyPolicy and startingDeadlineSeconds settings, which govern overlapping runs and missed start times, it makes CronJobs a more reliable choice for scheduling periodic batch tasks within Kubernetes. We have been using CronJobs in production for over a year, and these changes have made them more reliable for us. For example, we use CronJobs to run daily backups of our database, and the concurrency controls have kept those backups running smoothly even during peak hours.
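To make the concurrency and deadline settings concrete, here is a hedged sketch of a backup CronJob manifest built as a Python dictionary. The spec field names are real CronJob fields; the job name, schedule, image, and chosen values are hypothetical. The API version is a parameter because clusters that do not yet serve batch/v1 need batch/v1beta1.

```python
import json


def backup_cronjob(api_version="batch/v1"):
    """Return a CronJob manifest for a nightly backup (names are hypothetical)."""
    return {
        "apiVersion": api_version,
        "kind": "CronJob",
        "metadata": {"name": "db-backup"},
        "spec": {
            "schedule": "0 2 * * *",         # every day at 02:00
            "concurrencyPolicy": "Forbid",    # skip a run while the previous one is active
            "startingDeadlineSeconds": 600,   # drop a run that cannot start within 10 minutes
            "successfulJobsHistoryLimit": 3,
            "failedJobsHistoryLimit": 1,
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 2,
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [{
                                "name": "backup",
                                # Placeholder image, not a real registry path.
                                "image": "registry.example.com/db-backup:1.0",
                            }],
                        }
                    },
                }
            },
        },
    }


manifest = backup_cronjob()
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f. Forbid suits backups, where two overlapping runs could contend for the same database; Allow and Replace are the other accepted policies.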

CronJobs are the right tool for recurring jobs that need to leverage Kubernetes' scheduling, isolation, and monitoring capabilities. Their promotion to GA solidifies their place in production workflows. However, it's worth noting that CronJobs can be resource-intensive, especially if you have a large number of jobs running concurrently. We have seen this in our production environment, where we have over 100 CronJobs running daily. To mitigate this, we have had to implement resource quotas and limits on our CronJobs to prevent them from overwhelming our cluster.
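A quota of this kind might look like the following sketch, which assumes the CronJobs live in a dedicated namespace. The ResourceQuota keys are standard ones; the function name, quota name, namespace, and numbers are illustrative, not the values used in the environment described above.

```python
import json


def batch_namespace_quota(namespace, max_jobs, cpu_requests, memory_requests):
    """Return a ResourceQuota manifest that caps Jobs and their total requests."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "batch-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "count/jobs.batch": str(max_jobs),   # Job objects allowed in the namespace
                "requests.cpu": cpu_requests,        # sum of CPU requests across pods
                "requests.memory": memory_requests,  # sum of memory requests across pods
            }
        },
    }


quota = batch_namespace_quota("batch", 50, "20", "64Gi")
print(json.dumps(quota, indent=2))
```

Once a quota covers requests.cpu or requests.memory, pods in that namespace must declare those requests (or inherit defaults from a LimitRange) or they are rejected, so the job templates need resource requests set before the quota is applied.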

Looking ahead, Kubernetes 1.21 promoted immutable Secrets and ConfigMaps to stable. The feature was already usable in 1.20, having been in beta and enabled by default since 1.19. Marking these resources as immutable (by setting immutable: true) prevents accidental modifications. It also reduces load on the API server, especially in large clusters, because the kubelet stops watching immutable objects for changes. The operational trade-off is that an immutable object cannot be edited, only deleted and recreated, so patterns that update a ConfigMap in place for rolling configuration updates will need to stick with mutable ConfigMaps.
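Where immutability is wanted, a common alternative to editing in place is to create a new ConfigMap for every configuration change and point the workload at the new name. The sketch below illustrates that idea under stated assumptions: the content-hash naming scheme and the function name are choices made for this example, not something Kubernetes does for you.

```python
import hashlib
import json


def immutable_configmap(base_name, data):
    """Return an immutable ConfigMap whose name includes a hash of its data,
    so each configuration change yields a new object instead of an edit."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    suffix = hashlib.sha256(payload).hexdigest()[:10]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{base_name}-{suffix}"},
        "immutable": True,   # top-level field, next to data
        "data": dict(data),
    }


current = immutable_configmap("app-config", {"LOG_LEVEL": "info"})
updated = immutable_configmap("app-config", {"LOG_LEVEL": "debug"})
print(current["metadata"]["name"], updated["metadata"]["name"])
```

Updating the Deployment's pod template to reference the new name triggers an ordinary rolling update, and the old ConfigMap can be deleted once no pods use it.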