AWS re:Invent 2020 was an online event in December, and now we can see the engineering impact of the announcements in 2021 as services become widely available and adopted in production
The AWS Graviton2 processor is gaining traction, with M6g, C6g, and R6g instance families reaching broad availability in 2021, offering 40% better price/performance than equivalent x86 instances for workloads without x86 dependencies
For most compiled languages and containerised workloads, Graviton2 provides an immediate cost reduction with no application changes, thanks to the mature Docker multi-arch build ecosystem supporting arm64 images without code changes
In our experience with migrating 500 containers to Graviton2, we saw a 32% reduction in total compute costs, with the only significant effort being the validation of arm64 image builds for a handful of third-party libraries, which was completed in under two weeks using tools like GitHub Actions
Amazon EMR on EKS is another significant development, allowing Spark and Hive workloads to run on existing EKS clusters without separate EMR clusters, which means unified infrastructure and cost sharing of Kubernetes node capacity for organisations already running Kubernetes
This pattern is increasingly viable for organisations with centralised Kubernetes operations, as Spark job submission can be done via the EMR API without managing a separate Hadoop cluster
We have seen this approach work well with 20-node EKS clusters running 100 concurrent Spark jobs, using tools like Prometheus and Grafana for monitoring and Kubernetes Dashboard for node management, resulting in a 25% reduction in overall cluster management overhead
Additionally, using Amazon EMR on EKS also simplifies data processing pipeline management, as it allows for the integration with other AWS services like S3 and Glue, which can be managed using AWS Lake Formation, making it easier to manage data governance and access policies
Using Lake Formation, we were able to reduce the time it takes to provision new data sets from 2 weeks to 2 days, by automating the process of creating and managing access policies, and integrating it with our existing identity and access management system using AWS IAM
AWS also announced ECS Anywhere and EKS Anywhere, enabling customers to run AWS container services on on-premises hardware and other clouds, similar to Azure Arc, providing cloud-managed container orchestration for organisations with on-premises compute
Lake Formation, which became generally available in 2019 and was significantly improved in 2020-2021, provides column-level security and governed access to data lake resources in S3 and Glue, making it a powerful tool for data governance
With Lake Formation, a central data platform team can manage access policies, while data consumers get access to exactly the columns they are authorised for, without relying on all-or-nothing S3 bucket policies at the bucket or prefix level