DALL-E 3 launched inside ChatGPT Plus in early October 2023. Putting a text-to-image model inside a conversational interface changed how most people experience image generation, and the improvement over DALL-E 2 is substantial.
What improved
DALL-E 3 handles text in images, complex compositions, and nuanced prompts far better than its predecessor. DALL-E 2 required prompt engineering expertise to produce consistent results. DALL-E 3, accessed through ChatGPT, lets you describe what you want in natural language and iterate conversationally. If the first image is not right, you describe the change you want, and ChatGPT rewrites the prompt and generates a new version. This conversational loop significantly reduces the prompt engineering burden.
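The iteration loop described above can be sketched in a few lines of Python. This is a toy illustration, not how ChatGPT works internally: ChatGPT uses a language model to rewrite the prompt, whereas this sketch simply appends each requested change to the original description. The names ImageSession, iterate, and fake_generate are hypothetical and exist only for this example; fake_generate stands in for a real image backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImageSession:
    """Tracks one image request and the follow-up changes the user asks for."""
    description: str
    changes: List[str] = field(default_factory=list)

    def request_change(self, change: str) -> None:
        # Record a follow-up such as "make the bicycle blue".
        self.changes.append(change.strip())

    def current_prompt(self) -> str:
        # Assumption: naive string concatenation stands in for the
        # language-model prompt rewriting that ChatGPT performs.
        if not self.changes:
            return self.description
        return self.description + " Changes: " + "; ".join(self.changes) + "."


def iterate(session: ImageSession, generate: Callable[[str], str]) -> str:
    """Regenerate the image from the session's current prompt."""
    return generate(session.current_prompt())


def fake_generate(prompt: str) -> str:
    # Hypothetical stub: a real backend would return image data or a URL.
    return f"<image for: {prompt}>"


if __name__ == "__main__":
    session = ImageSession("A red bicycle leaning against a brick wall.")
    print(iterate(session, fake_generate))

    # The first image is not right, so describe the changes and regenerate.
    session.request_change("make the bicycle blue")
    session.request_change("add morning fog")
    print(iterate(session, fake_generate))
```

The point of the sketch is that the user only ever supplies plain-language changes. The full prompt is rebuilt on their behalf on every round, which is the burden DALL-E 2 left to the user.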
The safety approach
DALL-E 3 refuses more readily than DALL-E 2 when asked for realistic depictions of real people or for content that OpenAI considers harmful. Artists can use a form to opt their work out of the training data for future OpenAI image models. These choices reflect a different philosophy from the open-release approach that Stability AI took with Stable Diffusion. Whether this trade-off between safety restrictions and unrestricted capability is the right one remains a live debate in the AI community.
Midjourney comparison
Midjourney v5, the leading alternative when DALL-E 3 launched, produces more aesthetically striking images by default, while DALL-E 3 renders detailed prompts more accurately. These are different strengths for different use cases: Midjourney for artistic and visual exploration, DALL-E 3 for accurate representation of specified content such as diagrams, illustrations, and product visualisations.
Creative industry implications
The rapid improvement in AI image generation is shortening the period in which stock photography libraries can hold their prices. Custom illustration work for content marketing, which previously required a freelance illustrator and several days of turnaround, can now be done in minutes. The creative professionals who are thriving use these tools to increase their output volume and speed, rather than competing with them on commodity image production.