Mistral AI released Mistral 7B in September 2023 with a claim that was initially met with scepticism: a 7-billion-parameter model that outperforms Llama 2 13B on all benchmarks. Independent testing confirmed the claim. The open source AI landscape changed.
Why smaller can be better
The conventional wisdom was that larger models were better models. Mistral 7B demonstrated that training efficiency and data quality can produce a model that outperforms larger models trained less carefully. Mistral's team published a technical report describing the architectural choices: sliding window attention, which restricts each token to a fixed-size window of recent tokens so that long sequences can be processed without the full quadratic attention cost; grouped query attention, which shares key-value heads across query heads for faster inference; and careful filtering of training data. The result is a model that is faster, cheaper to run, and more capable than models nearly twice its size.
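The effect of sliding window attention on the attention pattern can be sketched in a few lines. This is an illustrative toy, not Mistral's actual implementation: it builds the boolean mask that the scheme implies, where each query position may attend only to itself and the previous `window` key positions (Mistral's real window is 4,096 tokens). The function name `sliding_window_mask` is our own.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if query position i may attend to key position j.

    Combines the causal constraint (j <= i) with the sliding window
    constraint (j > i - window), so each row has at most `window` True
    entries -- attention cost grows linearly with sequence length
    instead of quadratically.
    """
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=6, window=3)
# Position 5 attends only to positions 3, 4 and 5 -- not to 0, 1 or 2.
assert [j for j, ok in enumerate(mask[5]) if ok] == [3, 4, 5]
```

Information from outside the window still propagates indirectly: because each layer widens the effective receptive field by one window, a deep stack of such layers can still relate distant tokens.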
The Apache 2.0 licence
Mistral released the weights under Apache 2.0 with no usage restrictions. Commercial use, fine-tuning, redistribution: all permitted without conditions. This is meaningfully more permissive than Llama 2's licence, which restricts use for services with over 700 million monthly active users. For enterprises building on open source models, Apache 2.0 removes the legal complexity.
The fine-tuning ecosystem
Mistral 7B became the preferred base model for fine-tuning experiments almost immediately after release. The combination of strong base capability, small size (the weights fit on a single consumer GPU, particularly when quantised), and permissive licence made it the go-to model for instruction tuning, function calling training, and domain specialisation. The instruction-tuned Mistral 7B Instruct was released alongside the base model and is competitive with GPT-3.5 Turbo for instruction-following tasks.
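Mistral 7B Instruct expects prompts in the [INST] format it was trained on, documented in the model card. The helper below is a minimal sketch of that template; `build_prompt` is our own convenience wrapper, not part of any Mistral library, and in practice the tokeniser's chat template handles this (including the BOS token, which is therefore omitted here).

```python
def build_prompt(turns: list[tuple[str, str]], user_msg: str) -> str:
    """Wrap prior (user, assistant) exchanges and a new user message
    in the [INST] tags Mistral 7B Instruct was trained on."""
    prompt = ""
    for user, assistant in turns:
        # Each completed assistant turn ends with the EOS token </s>.
        prompt += f"[INST] {user} [/INST] {assistant}</s>"
    prompt += f"[INST] {user_msg} [/INST]"
    return prompt

print(build_prompt([], "Summarise this ticket in one line."))
# [INST] Summarise this ticket in one line. [/INST]
```

Getting this template right matters for fine-tuning too: instruction-tuned checkpoints degrade noticeably when prompted in a format other than the one they were trained on.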
What it means for enterprise AI
A 7B model that runs on a single consumer GPU while beating models that require multiple GPUs represents a fundamental shift in the infrastructure cost of on-premises AI. For classification, extraction, and routing tasks, Mistral 7B provides the justification for running local inference at scale. The cloud-API-versus-on-premises calculation now includes a credible on-premises option that is not just cost-competitive but operationally practical.