I was surprised to see platform engineering make its debut on the Gartner Hype Cycle in 2023. It addresses a pressing issue many medium-to-large organisations encounter: the sprawl of DevOps tools.
DevOps promised developers self-service infrastructure, but what many organisations got was 20 different tools, each configured differently by every team. Developers now spend a significant amount of time on infrastructure rather than product. The promise of self-service turned into the burden of self-management.
Platform engineering is the discipline of building an internal developer platform: one that offers self-service and a simpler developer experience without requiring teams to understand the full infrastructure stack.
For instance, I have seen teams cut their average service deployment time from 3 days to just 30 minutes by adopting standardised golden path templates and automated deployment pipelines. Beyond improving the developer experience, this reduces the likelihood of human error.
An internal developer platform typically consists of several key components. There's a developer portal, often Backstage, for discovering services and documentation. A service catalog maps what runs where. Golden path templates help bootstrap new services with standardised tooling. An environment provisioning system, such as Crossplane or Terraform Cloud, allows for infrastructure creation through code. A deployment pipeline with guardrails prevents common misconfigurations.
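To make the guardrail idea concrete, here is a minimal Python sketch of a pre-deployment check. The names `check_guardrails` and `Violation`, and the three rules (at least two replicas, a pinned image tag, resource limits set), are assumptions chosen for illustration, not rules taken from any particular tool. Real platforms often enforce this kind of policy with an admission controller or policy engine rather than a script.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    rule: str
    message: str


def check_guardrails(manifest: dict) -> list[Violation]:
    """Return guardrail violations for a Kubernetes-style Deployment dict.

    The rules below are hypothetical examples of "common misconfigurations".
    """
    violations = []
    spec = manifest.get("spec", {})

    # Rule 1: a single replica means any restart causes downtime.
    if spec.get("replicas", 1) < 2:
        violations.append(Violation("min-replicas", "run at least 2 replicas"))

    containers = spec.get("template", {}).get("spec", {}).get("containers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")

        # Rule 2: an untagged or ':latest' image is not reproducible.
        # Simplified parsing: only the last path segment is inspected for a tag.
        last_segment = container.get("image", "").rsplit("/", 1)[-1]
        tag = last_segment.partition(":")[2]
        if tag in ("", "latest"):
            violations.append(
                Violation("pinned-image", f"{name}: pin the image to a version tag")
            )

        # Rule 3: containers without limits can starve their neighbours.
        if "limits" not in container.get("resources", {}):
            violations.append(
                Violation("resource-limits", f"{name}: set resource limits")
            )

    return violations


if __name__ == "__main__":
    risky = {
        "spec": {
            "replicas": 1,
            "template": {
                "spec": {"containers": [{"name": "api", "image": "payments:latest"}]}
            },
        }
    }
    for violation in check_guardrails(risky):
        print(f"[{violation.rule}] {violation.message}")
```

A pipeline step could run a check like this before applying the manifest and fail the build when the returned list is non-empty, so the misconfiguration never reaches a cluster.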
In my experience, the choice of environment provisioning system significantly affects the overall cost and complexity of the platform. Terraform Cloud can provide a more streamlined experience but may introduce additional costs. Crossplane may require more upfront investment in custom development but can offer more flexibility and cost savings in the long run.
Backstage, Spotify's open-source developer portal, has become the de facto standard for the portal layer of an internal platform. The Cloud Native Computing Foundation promoted Backstage to incubating status in 2022. Its plugin ecosystem now covers most common integrations, including Kubernetes, CI/CD systems, cloud cost dashboards, incident management, and API documentation.
The maturity of Backstage has largely settled the build vs buy question for developer portals: few teams now need to build a portal from scratch or purchase a proprietary one. The debate has shifted to how far to customise it, since tailoring Backstage to a team's specific needs rather than adopting it largely as it comes can be a time-consuming and costly process.
Adopting platform engineering also requires significant upfront investment in tooling and process development: some teams report spending upwards of 6 months and $200,000 to get their internal platform up and running. The long-term benefits can be substantial, though, with some teams reporting infrastructure cost reductions of up to 30% and deployment frequency improvements of up to 50%.
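Whether that investment pays off depends heavily on how much an organisation already spends on infrastructure. As a rough sketch, the payback period can be estimated from the figures above plus one number the reports do not give: the annual infrastructure spend. The $1,000,000 used below is purely a hypothetical assumption, and the function name `payback_months` is illustrative.

```python
def payback_months(upfront_cost: float, annual_infra_spend: float, reduction: float) -> float:
    """Months until cumulative infrastructure savings cover the upfront cost.

    Deliberately simple: ignores the ongoing cost of the platform team
    and assumes the full reduction is realised from day one.
    """
    monthly_saving = annual_infra_spend * reduction / 12
    if monthly_saving <= 0:
        raise ValueError("no savings, so the investment never pays back")
    return upfront_cost / monthly_saving


if __name__ == "__main__":
    # $200,000 upfront and a 30% reduction come from the figures above;
    # the $1,000,000 annual spend is a hypothetical assumption.
    months = payback_months(200_000, 1_000_000, 0.30)
    print(f"Estimated payback: {months:.1f} months")
```

Under those assumptions the saving is $25,000 a month, so the $200,000 is recovered in about 8 months. At a quarter of that spend the same platform takes about 32 months to pay back, and since 30% is the upper end of what teams report, real payback periods will usually be longer.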
Platform engineering does not replace DevOps; it changes the division of labour. The platform team builds and maintains tools and pipelines, while product teams use these tools without needing to understand the underlying Kubernetes operators, Terraform modules, and CI/CD templates.
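One way to picture this division of labour is a thin abstraction owned by the platform team. In the Python sketch below, a product team supplies only a service name and an image, and the platform expands that into a full Kubernetes Deployment manifest. The function name `render_deployment` and all of the defaults (replica count, port, resource requests and limits) are hypothetical values chosen for illustration; a real platform might implement the same idea with Helm charts, Crossplane compositions, or Backstage templates.

```python
def render_deployment(name: str, image: str, replicas: int = 2, port: int = 8080) -> dict:
    """Expand a minimal service spec into a Kubernetes Deployment manifest.

    Product teams pass the arguments; the platform team owns everything
    else in this function, including the (hypothetical) defaults.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # Platform-chosen defaults, not recommendations.
                            "resources": {
                                "requests": {"cpu": "100m", "memory": "128Mi"},
                                "limits": {"cpu": "500m", "memory": "256Mi"},
                            },
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    import json

    # All a product team needs to know: a name and an image.
    manifest = render_deployment("payments", "registry.example.com/payments:1.4.2")
    print(json.dumps(manifest, indent=2))
```

The design choice is that a product team's view of the platform is the function signature, not the manifest. The platform team can change what the function emits, for example by raising the default limits or adding a sidecar, without product teams changing how they call it.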
Responsibility for infrastructure reliability follows the same split. The platform team owns the platform abstractions, and product teams own their application code, so each side knows which failures are theirs to fix. For example, a platform team can run tools like Prometheus and Grafana as shared services, giving product teams real-time monitoring and alerting so they can identify and resolve issues before they become critical.
The key to successful platform engineering is balancing standardisation against flexibility. Standardised tools and processes simplify the developer experience and improve efficiency, but product teams still need room to meet their own specific requirements.