OpenAI released Codex in August 2021 via its API, and it is the model powering GitHub Copilot. Codex is a GPT-3 model fine-tuned on code from GitHub. Its capabilities set the baseline for what code generation models can do.
What Codex can do
Codex can write code from natural language descriptions, translate between programming languages, explain code in plain English, complete partial code, and generate tests for existing functions. The capabilities are strongest in Python (the dominant language in the training data) and progressively weaker in less-represented languages. For well-specified tasks with clear inputs and outputs, Codex generates working code at a rate that surprised most developers who tried it.
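To make the "natural language in, working code out" pattern concrete, here is a sketch of the interaction: the prompt is a signature plus a docstring, and the body is representative of the kind of completion Codex produces for a well-specified task (it is not actual model output).

```python
# Prompt given to the model: the signature and docstring below.
# Everything after the docstring is the sort of completion Codex returns.

def count_word_frequencies(text: str) -> dict:
    """Return a dict mapping each lowercase word in `text` to its count."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Tasks like this one, with clear inputs and outputs, are where completion quality is highest.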
The context window limitation
Codex operates within a context window of 2048 tokens in the initial release (later extended), meaning it can consider only a limited amount of surrounding code when generating completions. For large codebases, this caps how much context the model can use, so the integration layer (how the IDE decides which code context to send to the model) matters as much as raw model capability for real-world usefulness.
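The integration layer's job can be sketched as a budgeting problem: rank candidate snippets by relevance and pack them into the token window. This is a minimal sketch; the function names and the characters-per-token heuristic are assumptions (real integrations use the model's actual BPE tokenizer), not how Copilot is implemented.

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English and code.
    # A real integration would use the model's tokenizer instead.
    return max(1, len(text) // 4)

def build_prompt(snippets: list[str], budget: int = 2048) -> str:
    """Greedily pack snippets (assumed pre-sorted, most relevant first)
    into the token budget; anything that would overflow is dropped."""
    chosen, used = [], 0
    for snippet in snippets:
        cost = rough_token_count(snippet)
        if used + cost > budget:
            break
        chosen.append(snippet)
        used += cost
    return "\n".join(chosen)
```

With a 2048-token budget, a single large file can exhaust the window before any cross-file context fits, which is why snippet selection matters.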
Why code is learnable
Codex demonstrates that code is learnable by language models because code has a more constrained vocabulary and stricter syntactic rules than natural language. A syntactically correct Python function is easier to evaluate than a natural language paragraph. The model learns patterns from the enormous corpus of existing code that captures common algorithms, API usage patterns, and idiomatic style.
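The claim that correct Python is easier to evaluate than a natural language paragraph has a mechanical basis: syntax can be checked by a parser. A minimal sketch using Python's standard `ast` module:

```python
import ast

def is_valid_python(source: str) -> bool:
    """Mechanically decide syntactic correctness, a property code has
    and natural language prose does not."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False
```

No comparable automatic check exists for whether an English paragraph is "well-formed," which is part of why code was a tractable target for fine-tuning.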
What it does not replace
Codex does not understand the business domain, the performance constraints, the security requirements, or the architectural context of the code it is writing. It generates plausible code for stated requirements. The gap between plausible and correct widens as requirements become more domain-specific. Software engineering is more requirements clarification, architectural decision-making, and testing than it is code authoring.
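The gap between plausible and correct can be illustrated with a hypothetical example (the task and function names here are illustrative, not from the source): summing invoice amounts. A model given only "sum the line amounts" will plausibly use floats, while the unstated finance-domain requirement is exact decimal arithmetic.

```python
from decimal import Decimal

def total_plausible(amounts):
    # Plausible completion for "sum the line amounts": binary floats.
    return sum(amounts)

def total_correct(amounts):
    # Domain-correct version: exact decimal arithmetic on string amounts.
    return sum(Decimal(a) for a in amounts)

# 0.1 + 0.2 != 0.3 in binary floating point, so the plausible version
# silently violates a requirement the prompt never stated.
```

Nothing in the stated requirement distinguishes the two; only domain knowledge does, which is exactly what the model lacks.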