Google released Gemini 1.5 Pro with a one million token context window in February 2024 and has been expanding access since. At that scale, context is not just a bigger box. It changes the architecture of what AI applications can do.
What one million tokens means practically
One million tokens is roughly 750,000 words, about seven novels, or a codebase with hundreds of files all in context at once. Previous frontier models typically topped out around 128,000 tokens in their largest context variants. Going from 128K to 1M is not incremental. It removes entire categories of architectural constraint.
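The scale claims above follow from common heuristics, roughly 0.75 English words per token and about 100,000 words per novel; neither figure is exact, but the arithmetic is simple:

```python
# Back-of-the-envelope scale of a 1M-token context window.
# 0.75 words/token and 100k words/novel are rough heuristics, not exact.
TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 100_000

words = TOKENS * WORDS_PER_TOKEN      # 750,000 words
novels = words / WORDS_PER_NOVEL      # about 7.5 novels

print(f"{words:,.0f} words, roughly {novels:.1f} novels")
```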
With smaller context windows, the standard solution for "I need the model to know about my whole codebase" was RAG: retrieve the relevant pieces and inject them into the prompt. RAG works, but it introduces retrieval latency, depends on embedding quality, and silently drops anything the retriever fails to surface. With a million token context, you can potentially just put the whole codebase in the prompt and let the model find what it needs.
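The RAG pattern reduces to: score stored chunks against the query, keep the top few, and build the prompt from those. A deliberately simplified sketch, using word overlap in place of real embedding similarity:

```python
# Toy RAG sketch: rank chunks by relevance to the query and inject only
# the best matches. Production systems score with embeddings; word
# overlap stands in for that here to keep the example self-contained.

def score(query: str, chunk: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that best match the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "def parse_config(path): load YAML settings from path",
    "def send_email(to, body): deliver a message via SMTP",
    "def load_settings(): call parse_config with the default path",
]
context = retrieve("where is the config path parsed", chunks)
prompt = "Answer using this context:\n" + "\n".join(context)
```

The failure mode named above is visible here: if the scorer ranks the wrong chunks, the model never sees the right code, no matter how capable it is.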
The document processing case
The immediate practical use case is document processing. Financial reports, legal contracts, technical specifications, research papers: many of these run to tens or hundreds of thousands of words. With previous models, you had to chunk them and process pieces, losing the ability to reason across the whole document at once.
With Gemini 1.5 Pro, you can give it a year of meeting transcripts and ask it to identify recurring themes, or give it a 300-page contract and ask it to find clauses that conflict with each other. The model can reason across the entire document simultaneously rather than in segments.
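With the whole document in context, the application code collapses to prompt assembly. A minimal sketch, assuming the google-generativeai Python SDK (model name and SDK surface as of early 2024; the API call itself is commented out because it needs a key and a network connection, and the transcript contents are invented placeholders):

```python
# Whole-document prompting: concatenate every transcript into one prompt
# instead of retrieving fragments. Transcript text is a placeholder.

transcripts = {
    "2024-01-08.txt": "Discussed migration timeline and staffing.",
    "2024-01-15.txt": "Revisited migration timeline; raised budget concerns.",
}

prompt = "Identify recurring themes across these meeting transcripts.\n\n"
for name, text in transcripts.items():
    prompt += f"--- {name} ---\n{text}\n\n"

# The actual call, with the google-generativeai package installed:
# import google.generativeai as genai
# genai.configure(api_key="...")
# model = genai.GenerativeModel("gemini-1.5-pro")
# response = model.generate_content(prompt)
```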
The cost caveat
Large context windows are expensive. Filling a million token context and generating a response costs significantly more than a typical short-context query. For use cases where you genuinely need the full document in context, it is worth paying. For use cases where good retrieval would get you 95% of the answer at 1% of the cost, RAG is still the right architecture.
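The "95% of the answer at 1% of the cost" tradeoff is just input-token arithmetic. A sketch with a made-up per-token price (not actual Gemini pricing), comparing a full 1M-token prompt to a RAG prompt carrying only the ~10,000 most relevant tokens:

```python
# Illustrative cost comparison. The price is a placeholder, not real
# Gemini pricing; only the ratio between the two queries matters.

PRICE_PER_1K_INPUT_TOKENS = 0.005   # hypothetical $/1k input tokens

full_context_tokens = 1_000_000     # whole corpus in the prompt
rag_tokens = 10_000                 # retrieved excerpts only

full_cost = full_context_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
rag_cost = rag_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

print(f"full context: ${full_cost:.2f}, RAG: ${rag_cost:.2f}")
```

Whatever the actual per-token price, the RAG query here costs 1% of the full-context query, which is why retrieval remains the default when it is good enough.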
The million token context window is a capability to deploy strategically, not a replacement for thoughtful retrieval design. Use it for the problems that genuinely require whole-document reasoning. Use RAG for the problems where targeted retrieval is sufficient.