OpenAI dropped GPT-4o on May 13th for free ChatGPT users, replacing the GPT-3.5 model that was there before. This is a huge jump for the largest user base on any AI platform, and the implications are real.
I've seen firsthand how a model upgrade can change the game. At one company, we upgraded from a smaller model to a larger one, and the accuracy improved by 15% on a key task. But we also saw a 30% increase in latency, which was a major issue for our users. With GPT-4o, OpenAI has managed to improve performance while reducing costs, which is a significant achievement.
GPT-4o is the 'omni' model, capable of handling text, images, and audio natively. It's not stitched together from three separate models, like some earlier versions. The demo where it tutored someone through a maths problem using live camera input was the stuff of legend.
In my experience, multimodal models can be challenging to integrate. At one point, I worked on a project that involved combining text and image models, and it took a significant amount of engineering effort to get it working smoothly. With GPT-4o, the simplicity of the API endpoint is a major advantage. For example, I can send a message with an image attachment and get a response in JSON format, which makes it easy to integrate with other systems.
On benchmarks, GPT-4o performs similarly to GPT-4 Turbo on text tasks, but it's faster and cheaper. For API users, the cost reduction is a big deal: GPT-4o is priced at roughly half of GPT-4 Turbo on input tokens and a third on output.
For API developers, the cost improvement is a headliner. Applications that were stuck at GPT-3.5 class because GPT-4 was too expensive to run at volume now have a GPT-4 class option at closer to GPT-3.5 pricing. This opens up the conversation about which tier of model to use for various tasks. For instance, I worked on a project that involved building a chatbot for customer support. With GPT-4o, we can now use a more advanced model without breaking the bank.
The multimodal capability is available through the API with the same model endpoint. You don't need to switch to a different model for image understanding; just send the same model a message with an image attachment. This simplicity reduces integration complexity for applications that need both text and vision.
With GPT-4 class intelligence in the hands of every free ChatGPT user, the familiarity curve accelerates. The gap between what power users knew AI could do and what mainstream users experienced just narrowed sharply. This changes user expectations for every product that uses AI, including enterprise software.
The baseline is now higher, and user expectations are shifting accordingly. This will have a ripple effect on the way product teams approach AI-powered features and integrations.