Kubernetes 1.28 was released in August 2023 with another round of incremental improvements. More significant than the release itself is the state of the Kubernetes ecosystem eight years after version 1.0: it is mature, broad, and increasingly specialised.

The 1.28 release moved a number of features through the alpha, beta, and stable stages, covering node lifecycle management, networking, and sidecar containers. The standout is native support for sidecar containers, which arrived in 1.28 as an alpha feature.

Sidecar containers run alongside the main container in a pod. Previously, Kubernetes offered no way to control their startup order relative to the main container. With native sidecar support, you declare a sidecar as an init container whose restartPolicy is set to Always. Kubernetes then starts it before the main container and keeps it running until after the main container exits.

For example, a sidecar that handles logging or monitoring needs to be running before the main application container starts, or the earliest log lines can be lost. In practice, a tool such as Fluentd or Vector runs as the sidecar, collects logs from the main application, and forwards them to a logging backend such as Elasticsearch or Splunk. Because the sidecar also outlives the main container, it can forward the final log lines written during shutdown.
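The sidecar declaration described above can be sketched as a pod manifest. The Python below is a minimal illustration, not a production configuration: it builds the manifest as a dictionary and prints it as JSON, which kubectl accepts alongside YAML. The pod name, image names, volume name, and mount path are hypothetical placeholders, and the sketch assumes a 1.28 cluster with the alpha SidecarContainers feature gate enabled.

```python
import json


def build_pod_with_log_sidecar(app_image: str, sidecar_image: str) -> dict:
    """Return a Pod manifest whose log forwarder is declared as a native sidecar."""
    # Both containers mount the same emptyDir so the sidecar can read
    # the files the application writes. Name and path are placeholders.
    shared_logs = {"name": "app-logs", "mountPath": "/var/log/app"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "app-with-log-sidecar"},
        "spec": {
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
            # A native sidecar is an init container with restartPolicy "Always":
            # it starts before the main container and keeps running beside it.
            "initContainers": [
                {
                    "name": "log-forwarder",
                    "image": sidecar_image,
                    "restartPolicy": "Always",
                    "volumeMounts": [shared_logs],
                }
            ],
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "volumeMounts": [shared_logs],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    pod = build_pod_with_log_sidecar(
        "registry.example.com/app:1.0",
        "registry.example.com/log-forwarder:1.0",
    )
    print(json.dumps(pod, indent=2))
```

A real Fluentd or Vector sidecar would also need its own configuration for sources and outputs, which is left out here.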

The Cloud Native Computing Foundation's landscape lists over 1,000 projects and products. For engineering teams, the question is no longer whether to use Kubernetes, but which Kubernetes-related tools to standardise on. Service mesh is one example, with options including Istio, Linkerd, and Cilium.

Other areas with established patterns are GitOps, with tools like Flux and Argo CD, and secrets management, with Vault and the External Secrets Operator (ESO). Observability tools include OpenTelemetry, Prometheus, and Grafana. The decision now is which combination best fits an organisation's capabilities and compliance needs. When evaluating these tools, consider scalability, performance, and integration with existing systems. For instance, a large e-commerce platform handling thousands of requests per second may need the richer traffic management of Istio, while a smaller application may be well served by a simpler mesh such as Linkerd.

For service mesh, the choice between Istio and Linkerd is largely a trade-off between features and complexity. Istio offers a broader feature set, including fine-grained traffic management and security policy, but it is harder to set up and operate. Linkerd is generally easier to use but may lack features that a large-scale application requires. In our experience, the right choice depends on the specific requirements of the application and the expertise of the team. A team already familiar with Istio may do better staying with it, while a team new to service mesh may find Linkerd a better starting point.

Most teams running Kubernetes in production use a managed service such as EKS, AKS, or GKE. These services usually lag behind the upstream release, by anything from a few weeks to several months depending on the provider. What matters most is therefore not the upstream 1.28 release date, but when each service supports it and how the upgrade affects workloads. In our experience, the upgrade process can be complex, especially for large-scale applications. Plan and test the upgrade carefully to keep downtime minimal and application behaviour unchanged.
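One concrete part of upgrade planning is working out how many steps an upgrade involves, since Kubernetes control planes are upgraded one minor version at a time rather than skipping versions. The Python below is a minimal sketch of that calculation. The function names are hypothetical, and it only handles version strings that begin with a major and minor number; each managed service has its own upgrade tooling and rules that should be checked separately.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Parse a string such as 'v1.27.4' or '1.28' into (major, minor)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def upgrade_path(current: str, target: str) -> list[str]:
    """List the minor versions to step through, in order, to reach target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # One entry per minor version between current (exclusive) and target (inclusive).
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    for step in upgrade_path("1.25", "1.28"):
        print(f"upgrade control plane to {step}, then test workloads")
```

A cluster on 1.25 therefore needs three upgrade and test cycles to reach 1.28, which is worth knowing before committing to a schedule.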

The operational burden of running Kubernetes has decreased significantly as managed services have matured, letting teams focus more on their applications and less on cluster management. The trade-off is that teams need to watch the cost of the managed service and the compute beneath it. For example, running a large-scale application on EKS can cost roughly $10,000 to $50,000 per month, depending on the number of nodes and the instance types used.
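A rough estimate shows how node count drives a monthly bill into that range. The Python below is a back-of-the-envelope sketch: the hourly rates are illustrative assumptions, not quoted prices, and it ignores storage, data transfer, load balancers, and any discounts.

```python
HOURS_PER_MONTH = 730  # common billing approximation: 8,760 hours / 12 months


def monthly_cluster_cost(
    node_count: int,
    node_hourly_usd: float,
    control_plane_hourly_usd: float,
) -> float:
    """Estimate monthly cost in USD for worker nodes plus one managed control plane."""
    node_cost = node_count * node_hourly_usd * HOURS_PER_MONTH
    control_plane_cost = control_plane_hourly_usd * HOURS_PER_MONTH
    return round(node_cost + control_plane_cost, 2)


if __name__ == "__main__":
    # Assumed rates for illustration: $0.40 per node-hour and
    # $0.10 per hour for the managed control plane.
    for nodes in (50, 150):
        cost = monthly_cluster_cost(nodes, 0.40, 0.10)
        print(f"{nodes} nodes: ${cost:,.2f} per month")
```

Under these assumed rates, 50 nodes come to about $14,673 per month and 150 nodes to about $43,873. The control plane fee is a small fraction of the total, so instance type and node count dominate the estimate.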

The growth of Kubernetes has created a new challenge: too many ways to accomplish the same task. In response, organisations are building internal developer platforms on top of Kubernetes. These platforms abstract away cluster management so that developers can focus on their own work. For instance, a developer portal like Backstage can give developers a self-service experience, letting them create and manage applications without extensive Kubernetes knowledge.
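The kind of abstraction such a platform provides can be sketched as a small function that turns a short developer-facing request into a full Kubernetes manifest. The Python below is an illustration only: AppRequest and render_deployment are hypothetical names, not part of Backstage or any other tool, and the defaults for replicas and port are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class AppRequest:
    """Hypothetical self-service form: the only fields a developer fills in."""
    name: str
    image: str
    replicas: int = 2
    port: int = 8080


def render_deployment(req: AppRequest, namespace: str = "default") -> dict:
    """Expand a short request into a complete apps/v1 Deployment manifest."""
    labels = {"app.kubernetes.io/name": req.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": req.name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": req.replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": req.name,
                            "image": req.image,
                            "ports": [{"containerPort": req.port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application and image, for illustration only.
    request = AppRequest(name="orders", image="registry.example.com/orders:1.0")
    print(json.dumps(render_deployment(request), indent=2))
```

The developer supplies two values and the platform owns everything else, so conventions such as labels and namespaces stay consistent across teams.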

A mature internal platform in 2023 might combine a developer portal like Backstage, infrastructure provisioning with Crossplane, application deployment with Helm or Kustomize, and a GitOps pipeline. Together, these tools streamline development and deployment, letting developers of large-scale applications focus on writing code rather than managing infrastructure.

By standardising on a set of tools and processes, organisations can reduce the complexity of working with Kubernetes and make their development teams more efficient. Standardisation also improves scalability and reliability, because the same platform can be replicated and managed consistently across multiple environments. In our experience, the key to a successful internal developer platform is to start small and iterate, adding features and tools as the platform matures.