OpenAI made GPT-4o available to free ChatGPT users on May 13, 2024. The free tier previously ran GPT-3.5; now it runs GPT-4o. That is a significant capability jump for the largest user base of any AI platform, and the downstream implications for how AI is perceived and used are real.

What GPT-4o actually is

The "o" stands for omni. GPT-4o is a single model that handles text, images, and audio natively, not three separate models stitched together. It can look at an image and answer questions about it, transcribe and respond to voice in real time with appropriate pacing and tone, and process text with GPT-4 class intelligence. The demo where the model tutored someone through a math problem using live camera input was the "Her" moment people talked about.

On benchmarks, GPT-4o performs similarly to GPT-4 Turbo on text tasks while being significantly faster and cheaper. For API users, the cost reduction is meaningful: at launch, GPT-4o was priced at half of GPT-4 Turbo on both input and output tokens ($5 versus $10 per million input tokens, $15 versus $30 per million output).
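To make the savings concrete, here is a back-of-the-envelope comparison. The per-million-token prices are the launch list prices and should be treated as assumptions; OpenAI's pricing changes over time, and the monthly token volumes are purely illustrative.

```python
# Launch list prices in USD per million tokens (assumed; may have changed).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o":      {"input": 5.00,  "output": 15.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a month's token volume under the price table."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: 200M input tokens and 50M output tokens per month.
turbo_cost = monthly_cost("gpt-4-turbo", 200_000_000, 50_000_000)  # $3,500
omni_cost = monthly_cost("gpt-4o", 200_000_000, 50_000_000)        # $1,750
```

At that volume the bill halves, which is the difference between "GPT-4 class is too expensive" and "GPT-4 class is the default" for many products.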

For API developers

The cost improvement is the headline for developers building on the API. Applications that were GPT-3.5 class because GPT-4 was too expensive to run at volume now have a GPT-4 class option at closer to GPT-3.5 pricing. That reopens the conversation about which tier of model to use for various tasks.
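One way that conversation plays out in practice is a per-request router that escalates to the expensive tier only when the task warrants it. The function below is a hypothetical sketch: the model names match OpenAI's lineup, but the thresholds and complexity signals are illustrative, not a recommended policy.

```python
# Hypothetical model router: choose a tier per request based on crude
# complexity signals. Thresholds and heuristics here are illustrative.
def pick_model(prompt: str, needs_vision: bool = False) -> str:
    if needs_vision:
        return "gpt-4o"  # only the omni model handles image input here
    if len(prompt) > 2000 or "step by step" in prompt.lower():
        return "gpt-4o"  # long or reasoning-heavy prompt: pay for quality
    return "gpt-3.5-turbo"  # short, simple prompt: cheapest tier suffices

print(pick_model("Translate 'hello' to French."))      # gpt-3.5-turbo
print(pick_model("Explain this step by step: ..."))    # gpt-4o
print(pick_model("Describe this.", needs_vision=True)) # gpt-4o
```

The point is not the heuristic itself but that GPT-4o's pricing moves the break-even point: more requests can justify the top tier.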

The multimodal capability is available through the API with the same model endpoint. You do not switch to a different model for image understanding. You send the same model a message with an image attachment. That simplicity reduces integration complexity for applications that need both text and vision.
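Concretely, an image rides along as a content part inside an ordinary chat message. The payload below follows OpenAI's documented Chat Completions message format; the image URL and question are placeholders, and no request is actually sent.

```python
# A Chat Completions request body mixing text and an image in one user
# message. The URL and prompt text are placeholders.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
}

# A text-only request hits the same model and endpoint; content can
# simply be a string instead of a list of parts.
text_only = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
```

Because both requests target the same model, an application adds vision by changing the shape of one message, not by wiring up a second integration.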

The broader access effect

Putting GPT-4 class intelligence in the hands of every free ChatGPT user accelerates the familiarity curve. The gap between what power users knew AI could do and what mainstream users experienced just narrowed sharply. That changes user expectations for every product that uses AI, including enterprise software. The baseline is now higher.