AWS, Azure, and Google Cloud all reported strong Q1 2024 earnings driven by AI workloads. Beneath the revenue lines is a genuine infrastructure build-out that is reshaping what data centres look like.
The major hyperscalers spent 2023 and early 2024 in a GPU capacity crunch. Nvidia's H100 backlog stretched to 6 to 12 months for some customers, and the hyperscalers absorbed most of the production capacity to stock their own data centres. This is why they built customer-facing GPU cluster products: to give developers access to hardware they could not buy themselves.
A single Nvidia H100 GPU can cost upwards of $30,000, a significant investment for many companies. Hyperscalers can negotiate better prices because of their scale, with some estimates suggesting they pay around $15,000 per unit. That economy of scale lets them price their GPU cluster products competitively, which makes renting from them more attractive to developers than buying hardware outright.
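To make the scale argument concrete, here is a minimal Python sketch using the two per-unit figures quoted above. The cluster size of 1,024 GPUs and the function name are assumptions chosen for illustration, and the calculation covers GPUs only, not networking, power, or staff.

```python
# Illustrative arithmetic only. The per-unit prices are the rough estimates
# quoted in the text; the cluster size is a hypothetical assumption.
LIST_PRICE_USD = 30_000         # approximate price for an ordinary buyer
HYPERSCALER_PRICE_USD = 15_000  # estimated price at hyperscaler volume


def cluster_gpu_cost(num_gpus: int, unit_price_usd: int) -> int:
    """Return the GPU-only hardware cost of a cluster, in US dollars."""
    if num_gpus < 0 or unit_price_usd < 0:
        raise ValueError("num_gpus and unit_price_usd must be non-negative")
    return num_gpus * unit_price_usd


if __name__ == "__main__":
    num_gpus = 1_024  # hypothetical cluster size
    enterprise = cluster_gpu_cost(num_gpus, LIST_PRICE_USD)
    hyperscaler = cluster_gpu_cost(num_gpus, HYPERSCALER_PRICE_USD)
    print(f"Enterprise buyer: ${enterprise:,}")
    print(f"Hyperscaler:      ${hyperscaler:,}")
    print(f"Difference:       ${enterprise - hyperscaler:,}")
```

Under these assumptions the same 1,024 GPUs cost $30,720,000 for the enterprise buyer and $15,360,000 for the hyperscaler, a gap that widens linearly with cluster size.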
All three clouds have custom silicon in production. Google's TPUs have run in its data centres since 2015, were announced publicly in 2016, and power Gemini inference at scale. AWS Trainium and Inferentia are designed for training and inference respectively, with pricing structured to undercut Nvidia-based instances for sustained workloads. Azure's Maia 100 accelerator and its Arm-based Cobalt CPU, both announced in late 2023, are now in production.
Custom silicon gives hyperscalers control over their cost per token, which determines AI product economics. This is the long game for the hyperscalers, and they have invested heavily in research and development to play it: Google's TPU team, for instance, comprises hundreds of engineers and researchers. The payoff is hardware tuned to specific AI workloads, such as natural language processing or computer vision, rather than general-purpose compute.
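The link between hardware cost and cost per token comes down to simple arithmetic, sketched below in Python. The hourly prices and throughput values are hypothetical placeholders, not published figures for any chip or instance type, and the function name is likewise an assumption.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to process one million tokens on one accelerator.

    hourly_cost_usd: all-in hourly cost of running the accelerator.
    tokens_per_second: sustained throughput on the workload in question.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical comparison: a rented GPU versus an in-house accelerator
    # that is slightly slower but cheaper to run per hour.
    gpu = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=1000)
    custom = cost_per_million_tokens(hourly_cost_usd=2.50, tokens_per_second=900)
    print(f"GPU:            ${gpu:.2f} per million tokens")
    print(f"Custom silicon: ${custom:.2f} per million tokens")
```

The point of the sketch is that a chip does not need to be faster to win: a lower hourly cost can outweigh lower throughput once the two are combined into a per-token figure.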
Training large models requires moving hundreds of terabytes between GPUs thousands of times per run, so the networking fabric connecting the GPUs matters as much as the GPUs themselves. Google's TPU pods use custom high-bandwidth interconnects. AWS uses Elastic Fabric Adapter and custom network switching. Azure's InfiniBand clusters, such as the one behind its Eagle supercomputer, are purpose-built for distributed training. AWS's Elastic Fabric Adapter, for instance, can deliver 400 Gbps or more of bandwidth per instance, well above what general-purpose instance networking provides.
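A back-of-the-envelope calculation shows why link bandwidth dominates at this scale. The Python sketch below uses the 400 Gbps figure quoted above; the 1 TB payload and the 25 Gbps comparison link are assumptions for illustration, and the model ignores latency, protocol overhead, and network topology.

```python
def transfer_seconds(data_bytes: float, link_gbps: float) -> float:
    """Idealised time to move data_bytes over a link of link_gbps gigabits/s.

    Ignores latency, protocol overhead, and congestion, so real transfers
    will take longer than this lower bound.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9)


if __name__ == "__main__":
    one_terabyte = 1e12  # hypothetical payload, in bytes
    for gbps in (25, 400):  # 25 Gbps is an assumed general-purpose link
        seconds = transfer_seconds(one_terabyte, gbps)
        print(f"{gbps:>4} Gbps: {seconds:.0f} s to move 1 TB")
```

In this idealised model, moving 1 TB takes 320 seconds at 25 Gbps and 20 seconds at 400 Gbps. Repeated thousands of times per training run, that difference decides whether the GPUs spend their time computing or waiting.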
This is not commodity data centre networking. It is a custom infrastructure layer that took years to build and is very hard to replicate. Building something comparable would require significant investment in both hardware and software, with some estimates putting the cost in the tens of millions of dollars. Operating it is just as demanding, requiring specialized teams with expertise in distributed systems and high-performance computing.
The infrastructure gap between hyperscalers and on-premises deployments is growing. The cost and complexity of building the GPU clusters, custom networking, and custom silicon that power competitive AI workloads are beyond most enterprises. According to a recent survey, over 70 percent of companies have abandoned their on-premises AI initiatives because of the cost and complexity involved, opting instead for cloud-based services from hyperscalers.
The hyperscaler model of renting this infrastructure is becoming more compelling, not less, as AI workloads grow. The strategic question is no longer whether to use cloud for AI but which cloud's specific AI infrastructure investments align with your stack. Companies such as Hugging Face and Stability AI have already made that choice, using hyperscaler infrastructure to build and deploy their AI models and achieving significant cost savings and improved performance as a result.