Twelve months ago, most people had never knowingly used an AI system beyond a customer-service chatbot or a Spotify recommendation. By December 2023, your organisation probably had an AI policy, your IDE had an AI plugin, and you had used at least one LLM for real work.

ChatGPT reached an estimated 100 million users roughly two months after launch, at the time the fastest growth recorded for any consumer application. This single fact restructured the technology industry's priorities more than any product launch since the iPhone. Nearly every major software company reallocated resources to AI in 2023. Products that had no AI roadmap in January had one by June.

Enterprise AI adoption in 2023 followed a recognisable four-stage pattern: leadership mandates exploration, engineering evaluates tools, procurement negotiates contracts, and security and legal teams review policies. Most large organisations are in one of the middle stages. The companies that have completed all four are seeing measurable productivity gains in developer tooling, content operations, and customer service automation.

For example, I have seen AI-assisted code review tools, built on models such as Codex, suggest improvements and flag likely bugs. Teams using them have reported cutting the time spent on code review by up to 30%, along with improvements in code quality. Similarly, I have seen content teams adopt generation tools built on models such as Llama 2 and report reductions of up to 25% in the workload of human content creators.

The model capability curve in 2023 was steeper than in any previous year. GPT-4 was released in March, Claude 2 and Llama 2 in July, Claude 2.1 in November, and Gemini in December. The practical implication is that applications built on GPT-3.5 at the start of the year are already running on a generation-old model. Model selection is now a recurring engineering decision, not a one-time choice.

I have seen companies struggle with the trade-offs between models. GPT-4, for instance, generally produces stronger output than GPT-3.5 but at a higher price per token and higher latency, while Claude 2 offers a much longer context window (100K tokens). The right choice depends on the use case and the requirements of the application. Companies must also budget for migration each time a new version is released. For hosted models, that means re-testing prompts and re-running evaluations. For teams that fine-tune open-weight models such as Llama 2, it means repeating the fine-tuning and redeployment work, which at larger model sizes and dataset scales can cost upwards of $100,000 in compute.
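One way to treat model selection as a recurring decision is to keep the candidates in data and re-run the choice whenever prices or evaluation scores change. The sketch below is a minimal illustration under stated assumptions: the `ModelOption` fields, the model names, the prices, and the scores are all hypothetical placeholders, not real price lists or benchmark results. It picks the cheapest candidate that clears a quality bar measured on your own evaluation set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical label, not a real API model string
    cost_per_1k_tokens: float  # illustrative price in USD
    eval_score: float          # score on your own evaluation set, 0.0 to 1.0


def select_model(options: list[ModelOption], min_score: float) -> Optional[ModelOption]:
    """Return the cheapest option whose eval score meets the bar, or None."""
    eligible = [o for o in options if o.eval_score >= min_score]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.cost_per_1k_tokens)


# Placeholder numbers for illustration only; not real prices or benchmarks.
CANDIDATES = [
    ModelOption("large-model", cost_per_1k_tokens=0.030, eval_score=0.92),
    ModelOption("mid-model", cost_per_1k_tokens=0.008, eval_score=0.85),
    ModelOption("small-model", cost_per_1k_tokens=0.002, eval_score=0.71),
]

if __name__ == "__main__":
    choice = select_model(CANDIDATES, min_score=0.80)
    print(choice.name if choice else "no model meets the bar")
```

When a new model is released, adding one entry to the candidate list and re-running the evaluation is enough to revisit the decision, which keeps the choice out of application code.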

The 2023 narrative was about capability. The 2024 narrative will be about reliability, cost, and integration depth. Applications that handle hallucinations gracefully, that know when to call a model and when to use a rule, and that have observability into model behaviour will outlast the applications that were built quickly on top of raw API calls.
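To make "know when to call a model and when to use a rule" concrete, here is a minimal sketch of a router that tries a deterministic rule first, calls the model only when no rule applies, and falls back to a safe message when the model fails or its answer cannot be validated. Everything named here is an assumption for illustration: the `ORD-` order-id format, the `call_model` callable (a stand-in for whatever client you use), and the `is_grounded` check, which is deliberately crude and would be replaced by real validation in practice.

```python
import re
from typing import Callable, Optional

ORDER_ID = re.compile(r"\bORD-\d{6}\b")  # hypothetical order-id format
FALLBACK = "I am not sure. Routing you to a human agent."


def rule_based_answer(query: str) -> Optional[str]:
    """Deterministic path: answer without a model when a rule matches."""
    match = ORDER_ID.search(query)
    if match:
        return f"Looking up order {match.group(0)}."
    return None


def is_grounded(answer: str, allowed_facts: set[str]) -> bool:
    """Crude stand-in for validation: accept only answers found in known facts."""
    return answer.strip() in allowed_facts


def answer(
    query: str,
    call_model: Callable[[str], str],
    allowed_facts: set[str],
) -> tuple[str, str]:
    """Return (source, text), where source is 'rule', 'model', or 'fallback'."""
    ruled = rule_based_answer(query)
    if ruled is not None:
        return "rule", ruled
    try:
        candidate = call_model(query)
    except Exception:
        # A failed model call degrades to the fallback instead of crashing.
        return "fallback", FALLBACK
    if is_grounded(candidate, allowed_facts):
        return "model", candidate
    # An unverifiable answer is treated as a possible hallucination.
    return "fallback", FALLBACK
```

Returning the source alongside the text is a small design choice that pays off later: it lets you count how often the rule, the model, and the fallback paths are taken.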

Adoption remains uneven. Some organisations are still exploring, while those that have completed every stage are the ones reporting measurable results.

To achieve reliability and integration depth, companies must invest in orchestration tools such as Apache Airflow, AWS Step Functions, or Zapier to manage workflows and connect AI models to other systems. They must also adopt monitoring and logging tools such as Prometheus, Grafana, or New Relic to observe model behaviour and detect issues. Finally, they need a strategy for edge cases and outliers. For teams that train or fine-tune their own models, techniques such as data augmentation and transfer learning can improve robustness.
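Observability into model behaviour can start very small. The sketch below wraps any model call so that each invocation records latency, errors, and empty responses. It is an illustration under assumptions, not a reference implementation: `observed` and the counter names are hypothetical, and the in-process `Counter` stands in for a real metrics backend such as the ones named above, to which the same counts would be exported in production.

```python
import logging
import time
from collections import Counter
from typing import Callable

logger = logging.getLogger("model_calls")
metrics: Counter = Counter()  # in-process stand-in for a real metrics backend


def observed(call_model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model call so each invocation records outcome and latency."""

    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        try:
            result = call_model(prompt)
        except Exception:
            metrics["errors"] += 1
            logger.exception("model call failed")
            raise  # re-raise so the caller can apply its own fallback
        finally:
            # Runs on success and on failure, so every attempt is counted.
            metrics["calls"] += 1
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("model call finished in %.1f ms", elapsed_ms)
        if not result.strip():
            metrics["empty_responses"] += 1
        return result

    return wrapper
```

Wrapping the client once, for example `ask = observed(my_client_call)` where `my_client_call` is your own function, means every call site gets the same measurements without further changes.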

Rapid capability gains carry a maintenance cost. An application built on an older model falls behind by default, so engineers have to revisit model selection on a schedule rather than settle it once.

As AI becomes mainstream, reliability, cost, and integration depth will separate the applications that last from those that do not.