I saw Meta drop Llama 3 on April 18, 2024, and the open source AI community responded right away. Within 24 hours it was the top trending repository on GitHub. Within a week it was running on everything from MacBook Pros to enterprise GPU clusters. This release is different from previous open source model releases.

Llama 2 was a solid open source model but it did not compete with GPT-3.5 Turbo, let alone GPT-4. It had limitations that made it practical for specific use cases but not a genuine alternative for demanding tasks. Llama 3 70B changed that. Multiple independent evaluations put it above GPT-3.5 Turbo on reasoning and coding benchmarks.

For example, on coding benchmarks Llama 3 reportedly outperformed GPT-3.5 Turbo by 10 to 15 percent, a significant margin given the complexity of the tasks. Its ability to follow multi-step instructions reportedly improved by 20 percent over Llama 2, making it more suitable for real-world applications. These gains are largely due to higher-quality training data and better instruction tuning for the instruct variant.

What matters is that GPT-3.5 Turbo powered ChatGPT for most of its first year. The capability that amazed people in late 2022 is now open source and downloadable. I think this is a significant development because it brings open source AI capability closer to the state of the art. The cost of running Llama 3 is also relatively low. Estimates suggest it can run on a single NVIDIA A100 GPU for around $10 per hour, although the 70B model at 16-bit precision is too large for one 80 GB card and has to be quantized to fit. Either way, that puts it within reach of individual developers and researchers.
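As a rough check on single-GPU claims like this one, the arithmetic for weight memory is simple enough to sketch. This is a back-of-the-envelope estimate, not a sizing tool. It assumes the nominal 8B and 70B parameter counts and an 80 GB A100, and it ignores the KV cache, activations, and runtime overhead, all of which push real memory use higher. The function names and precision labels are illustrative.

```python
# Approximate memory needed just to hold model weights.
# Assumption: nominal parameter counts; KV cache and runtime overhead are ignored.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Billions of parameters times bytes per parameter gives GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_gpu(params_billions: float, precision: str, gpu_memory_gb: float = 80.0) -> bool:
    """True if the weights alone fit in the given GPU memory (default: 80 GB A100)."""
    return weight_memory_gb(params_billions, precision) < gpu_memory_gb


if __name__ == "__main__":
    for size in (8, 70):
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, precision)
            verdict = "fits" if fits_on_gpu(size, precision) else "does not fit"
            print(f"Llama 3 {size}B @ {precision}: ~{gb:.0f} GB of weights, {verdict} in 80 GB")
```

By this estimate the 70B weights need about 140 GB at 16-bit precision, so a single-card deployment depends on quantization, while the 8B model fits comfortably even on smaller GPUs.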

Meta made significant investments in data quality for Llama 3. They filtered the training data to a higher standard of factual accuracy and removed low-quality web content more aggressively than in previous versions. The instruction tuning for the instruct variant also improved substantially. The result is models that are less likely to produce incorrect or misleading information, which is a major concern in many applications. For instance, in one test of 1,000 common knowledge questions, Llama 3 reportedly answered accurately 95 percent of the time, compared to 85 percent for GPT-3.5 Turbo.

The models are better at following multi-step instructions and less likely to confabulate on common knowledge questions. This is a big deal because it makes the models more useful for real-world applications. I am excited to see how the community will use Llama 3. Since the release, established tools such as Ollama and llama.cpp have added support quickly, providing efficient, optimized implementations of the model for a range of hardware platforms.

Meta also announced that 400B+ parameter variants are in training and will be released later in 2024. That is the scale that could genuinely challenge GPT-4 class models. Llama 3 as released is a step change, but the larger variant will be the real test of whether open source can match the frontier. It will require more computational resources to train and deploy, but it is expected to bring significant improvements in performance and accuracy.

I was impressed by how quickly the ecosystem responded to the Llama 3 release. Ollama, llama.cpp, Hugging Face, Groq, Together AI, Replicate: every major inference platform had Llama 3 running within hours of the weights release. This is a testament to the maturity of the ecosystem and to the investment in tooling and infrastructure made over the years. For example, Hugging Face's Transformers library was updated to support Llama 3, making it easy for developers to integrate the model into their applications.
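To give a sense of how little glue code local deployment takes, here is a minimal sketch of calling a locally running Ollama server from Python using only the standard library. It assumes Ollama is installed and serving on its default port (11434) and that the model has already been pulled under the tag `llama3`. The helper names `build_request` and `generate` are my own, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(generate("Explain open weights in one sentence."))
```

Hosted providers expose their own HTTP APIs with different endpoints and authentication, so this sketch covers only the local case.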

The ecosystem has matured enough that a new model release immediately translates into deployment options across local, cloud, and API form factors. Meta's decision to maintain open weights releases creates a flywheel. The ecosystem investment in tooling, hosting, and fine-tuning for Llama models compounds with each release. This means that the community can focus on developing new applications and use cases, rather than spending time and resources on developing the underlying infrastructure.

This is a meaningful advantage over closed models and it is accumulating. I think we will see more innovation and adoption of open source AI models as a result. The Llama 3 release is an important milestone in this journey.