I was surprised when Meta released Code Llama on August 24th - a suite of code-specialised models fine-tuned from Llama 2. At 34B parameters, Code Llama 34B outperforms GPT-3.5 on coding benchmarks and approaches GPT-4 on some tasks.
Code Llama comes in three sizes: 7B, 13B, and 34B. Each size has a base version, an instruction-tuned version, and a Python-specialised version. The instruction-tuned versions accept natural language descriptions and generate code. The base versions are designed for integration into coding tools. All versions are fine-tuned for code completion, code explanation, and code debugging.
I checked the benchmarks and Code Llama 34B scores 53.7% on HumanEval, a benchmark of Python programming problems. GPT-3.5 scores 48.1%, while GPT-4 scores 67%. Code Llama 34B isn't GPT-4 quality but it's definitely better than GPT-3.5 on this benchmark, which was the most widely deployed model in coding tools before GPT-4 Turbo.
One thing to note is that training and deploying large models like Code Llama comes with significant costs. For instance, training a 34B model can require over 1000 GPU hours, which on an A100 GPU would cost around $160,000 based on typical cloud pricing. This makes it challenging for smaller organisations to develop and deploy similar models.
One of the things that caught my attention was that Code Llama 34B runs in 4-bit quantisation on 24GB of VRAM, available in an A10G instance on most cloud providers. For organisations that can't send their proprietary code to an external API due to data classification policies, Code Llama provides a coding assistant that runs entirely on-premises. This unblocks a significant category of enterprise use cases that were previously inaccessible.
The developer tooling ecosystem response was swift - Ollama, LM Studio, and oobabooga TextGen WebUI all supported Code Llama within days of release. IDE integrations via Continue and Tabby, the open source GitHub Copilot alternative, followed. The open source coding assistant ecosystem now looks credible for teams that need local or on-premises deployment.
You might wonder how Code Llama achieves this performance. It's fine-tuned for code completion, code explanation, and code debugging. This makes it a versatile tool for developers. For example, I can see it being used for tasks like automated code review, where it can help identify potential bugs or areas for improvement.
Organisations have been waiting for a model like Code Llama. It provides a level of control and customisation that's hard to find with other models. I've seen cases where organisations had to spend months developing and fine-tuning their own models, only to end up with performance similar to Code Llama's. Now, they can skip that step and get started with a pre-trained model.
Code Llama's release is a significant move by Meta. It shows they're committed to making AI more accessible and useful for developers. The fact that they're open-sourcing the model and allowing the community to build on top of it is a big deal. I've seen firsthand how open-source communities can drive innovation and adoption.
The open source community is already building on top of Code Llama. This kind of collaboration can lead to some amazing innovations.
I'm interested to see how Code Llama evolves. For now, it's an important step forward for AI in coding.