DALL-E 2 Arrives

I was surprised by the announcement of DALL-E 2 in April 2022. The improvement over DALL-E 1 is not incremental but a significant jump in capabilities: photorealistic images from text prompts, inpainting, outpainting, and variations, all capabilities that existing IP and creative-industry frameworks were not designed to handle.

The technical jump in DALL-E 2 comes from pairing a diffusion model with CLIP. CLIP learns a shared embedding space that captures the relationship between images and text. DALL-E 2 uses a prior to turn the CLIP embedding of the text prompt into a CLIP image embedding, and a diffusion decoder conditioned on that embedding then generates the image. The result is more coherent, accurate, and photorealistic than the output of DALL-E 1's autoregressive approach.
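In CLIP's shared embedding space, how well an image matches a piece of text is typically scored by the cosine similarity of their embeddings. The sketch below shows only that scoring idea. The three-dimensional vectors are made up for illustration (real CLIP embeddings have hundreds of dimensions), and the code does not reproduce DALL-E 2's prior or diffusion decoder.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(text_emb: list[float], image_embs: list[list[float]]) -> tuple[int, list[float]]:
    """Return the index of the image embedding closest to the text embedding, plus all scores."""
    scores = [cosine_similarity(text_emb, img) for img in image_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy embeddings standing in for real CLIP outputs.
text_embedding = [0.9, 0.1, 0.0]      # e.g. a caption such as "a corgi on a skateboard"
image_embeddings = [
    [0.1, 0.9, 0.2],                  # unrelated image
    [0.8, 0.2, 0.1],                  # image that matches the caption
    [0.0, 0.0, 1.0],                  # unrelated image
]

idx, scores = best_match(text_embedding, image_embeddings)
print(idx)  # 1
```

A higher score means the image and text sit closer together in the shared space; that same notion of closeness is what lets a CLIP embedding steer generation toward images that fit the prompt.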

In my experience with similar models, a key challenge is balancing the trade-off between image quality and generation speed. For instance, I worked on a project where we used a variant of the StyleGAN architecture to generate high-quality images, but it required significant computational resources and took around 10 seconds to generate a single image. In contrast, DALL-E 2 can generate images in under a second, which is a significant improvement for real-world applications.

OpenAI later opened DALL-E 2 to developers as an API (a public beta in November 2022), allowing applications to generate images programmatically at $0.016 to $0.020 per image depending on resolution. This opened up product-integration use cases such as generating cover images for articles, product mockups, e-commerce photography variations, and UI illustration assets.
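Per-image pricing makes it easy to budget an integration up front. The sketch below assumes the launch price points mapped to the three output sizes ($0.016 for 256x256, $0.018 for 512x512, $0.020 for 1024x1024); the `PRICE_PER_IMAGE` table and `estimate_cost` helper are hypothetical names, and current prices should be checked before relying on the numbers.

```python
# Assumed DALL-E 2 launch prices in USD per image, keyed by output size.
PRICE_PER_IMAGE = {
    "256x256": 0.016,
    "512x512": 0.018,
    "1024x1024": 0.020,
}


def estimate_cost(n_images: int, size: str) -> float:
    """Estimate the cost of generating n_images at the given size, rounded to cents."""
    if size not in PRICE_PER_IMAGE:
        raise ValueError(f"unsupported size: {size}")
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return round(n_images * PRICE_PER_IMAGE[size], 2)


# Example: one 1024x1024 cover image for each of 500 articles.
print(estimate_cost(500, "1024x1024"))  # 10.0
```

Rounding to cents hides floating-point noise from the multiplication; a billing system would more likely use `decimal.Decimal`, but a float is enough for a rough budget.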

One example of a company that successfully integrated DALL-E 2 into their product is Medium, which used the API to generate high-quality images for their articles. They reported a significant reduction in the time and cost associated with sourcing and licensing images, and saw an increase in engagement and reader satisfaction.

Because DALL-E 2 is available through an API, applications can use its capabilities without requiring users to interact with OpenAI's own interface. This is a significant advantage for developers who want to integrate AI-generated images into their products.
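In practice, integration comes down to an authenticated HTTP request. The sketch below builds, but does not send, a request to OpenAI's image-generation endpoint using only the standard library. The URL and the `model`, `prompt`, `n`, and `size` fields reflect OpenAI's public Images API as I understand it; treat them as assumptions to verify against the current API reference. The helper name `build_image_request` and the key placeholder are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's image-generation API.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, api_key: str, n: int = 1,
                        size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST request asking DALL-E 2 for n images."""
    body = {
        "model": "dall-e-2",  # assumed model identifier
        "prompt": prompt,
        "n": n,
        "size": size,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage; "sk-..." is a placeholder, not a real key.
req = build_image_request("flat illustration of a lighthouse at dusk", api_key="sk-...")
# Sending it would be urllib.request.urlopen(req); the JSON response lists the
# generated images under its "data" field.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access or spending credits.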

DALL-E 2's content policy is designed to prevent the generation of certain types of content, such as sexual content, realistic depictions of real people, and gore. It is enforced by classifiers that reject prompts attempting to generate these categories. This represents a different philosophy from Stable Diffusion, whose model weights are openly released.

Because DALL-E 2 is reachable only through OpenAI's API, OpenAI's classifiers are the enforcement mechanism; with open weights, anyone can run a model without such filters. This trade-off provides safer defaults at the cost of the creative freedom that open models permit, and developers must weigh it when deciding whether to build on DALL-E 2.

The impact of DALL-E 2 was immediate. It disrupted stock-photo pricing and demand, concept-visualisation workflows in design agencies, and thumbnail generation for content platforms, and the speed at which AI-generated images became commercial products surprised the stock photography industry.

Getty Images and Shutterstock had to respond quickly to the competitive threat posed by DALL-E 2 and develop their own AI products. The quality jump compressed the disruption timeline, and the industry is still adjusting to the new reality.