Meta released Llama 3 on April 18th with 8 billion and 70 billion parameter variants. Two weeks in, the enterprise implications are clearer. The 70B model outperforms earlier open models on the commonly reported benchmarks and matches or beats GPT-3.5 Turbo on several of them. The 8B model is small enough to run locally. Together they change the enterprise open source AI conversation.
The benchmark reality
On MMLU, a multiple-choice benchmark covering 57 academic subjects, Llama 3 70B scores 82%, against around 70% for GPT-3.5 Turbo, and it holds that lead across reasoning tasks. On coding benchmarks, it approaches GPT-4 class performance on simpler tasks but falls short on complex multi-step problems. For a freely downloadable model, that is a significant capability threshold.
The 8B variant scores around 68% on MMLU, comparable to earlier versions of GPT-3.5. For tasks where GPT-3.5 was "good enough", the 8B model is now a viable open source alternative that you can run on a single A100 GPU or, once quantised, on a Mac Studio.
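A back-of-the-envelope way to check those hardware claims is to estimate the memory the weights alone need: parameter count times bytes per parameter. The sketch below is illustrative, not a sizing tool. It uses the nominal 8B and 70B parameter counts, the function name is made up for this example, and it ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    Excludes KV cache, activations, and framework overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal parameter counts; the exact counts differ slightly.
    for name, params in [("Llama 3 8B", 8), ("Llama 3 70B", 70)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.0f} GB of weights")
```

At 16-bit precision the 8B weights come to roughly 16 GB, which fits on a single A100 (sold in 40 GB and 80 GB versions), and 4-bit quantisation brings that down to about 4 GB. The 70B weights need roughly 140 GB at 16-bit, so that model calls for multiple GPUs or aggressive quantisation.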
The enterprise use case for open source models
Three drivers push enterprises toward open models. First, data privacy: with a model running on your own infrastructure, sensitive data does not leave your environment. For financial services, healthcare, and government, this simplifies compliance conversations significantly. Second, cost at scale: once you host the weights yourself, the marginal cost of inference is compute only, with no per-token API charges. At high volume, that maths shifts decisively. Third, customisation: fine-tuning on proprietary data is more practical when you have the weights in hand.
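The cost argument comes down to a break-even calculation: a per-token API bill grows with volume, while a self-hosted deployment is closer to a fixed monthly compute bill. The sketch below shows the arithmetic only. Every number in it (the API price, the GPU hourly rate, the GPU count) is a placeholder chosen for illustration, not a quote from any provider, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730.0  # average hours in a month


def api_cost_per_month(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Monthly API bill under simple per-token pricing."""
    return tokens_per_month / 1e6 * price_per_million_tokens


def self_hosted_cost_per_month(gpu_count: int, gpu_hourly_rate: float) -> float:
    """Monthly compute bill, assuming the GPUs are provisioned around the clock."""
    return gpu_count * gpu_hourly_rate * HOURS_PER_MONTH


def break_even_tokens_per_month(fixed_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which the API bill equals the fixed hosting bill."""
    return fixed_monthly_cost / price_per_million_tokens * 1e6


if __name__ == "__main__":
    # Placeholder inputs: substitute your own quotes.
    price = 1.00       # currency units per million tokens (hypothetical)
    gpus = 2           # GPUs needed to serve the model (hypothetical)
    gpu_rate = 2.00    # currency units per GPU-hour (hypothetical)

    hosting = self_hosted_cost_per_month(gpus, gpu_rate)
    threshold = break_even_tokens_per_month(hosting, price)
    print(f"Self-hosting: {hosting:,.0f} per month")
    print(f"Break-even volume: {threshold:,.0f} tokens per month")
```

With these placeholder figures, self-hosting costs 2,920 a month regardless of volume, so it only undercuts the API above roughly 2.9 billion tokens a month. The model assumes the GPUs can actually serve that volume and leaves out engineering and operations costs, so treat the threshold as a lower bound.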
The counterargument has always been the capability gap: closed models were meaningfully better. Llama 3 narrows that gap enough that the capability argument weakens for a large category of enterprise tasks.
Where the gap remains
Llama 3 70B is not GPT-4. For complex multi-step reasoning, nuanced code generation, and tasks requiring broad world knowledge, the closed frontier models still win. The practical question for each use case is whether the task requires GPT-4 class capability or whether GPT-3.5 class capability, now available as open source, is sufficient.
Most enterprise AI applications are not pushing the frontier. They are classifying documents, extracting structured data, generating first drafts, and answering queries from a knowledge base. For that work, Llama 3 is now a serious option.