On November 6th, 2023, OpenAI held its first developer conference, DevDay, and the announcements were substantial. GPT-4 Turbo with a 128K context window, the Assistants API, GPTs for custom agents, and significant price reductions all made headlines. The real question is what this means for AI application builders.
GPT-4 Turbo is priced at one third of the previous GPT-4's rate for input tokens and half its rate for output tokens. With a 128K context window, it can process approximately 300 pages of text in a single API call. This changes the architecture for applications that previously split long inputs across multiple requests to stay within context limits. For applications where GPT-4 pricing made production economics difficult, the price cut changes the business case.
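To make the pricing difference concrete, here is a minimal sketch of the per-request arithmetic. It assumes the launch list prices per 1,000 tokens (GPT-4 8K at $0.03 input and $0.06 output, GPT-4 Turbo at $0.01 and $0.03), and the workload of 6,000 input and 1,000 output tokens is a hypothetical figure chosen for illustration. The dictionary keys are labels for this example, not API model identifiers.

```python
# Launch list prices in USD per 1,000 tokens (assumed; prices change over time,
# so check the current pricing page before relying on these numbers).
PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed per-1K-token prices."""
    price = PRICES_PER_1K[model]
    input_cost = (input_tokens / 1000) * price["input"]
    output_cost = (output_tokens / 1000) * price["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 6,000 input tokens and 1,000 output tokens per request.
    for model in PRICES_PER_1K:
        print(f"{model}: ${request_cost(model, 6000, 1000):.2f} per request")
```

Because input and output tokens are discounted by different factors, the overall saving depends on the input-to-output ratio of your workload: prompt-heavy applications land closer to the 3x end.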
For example, when building a chatbot that needs to process long user input, such as a lengthy document pasted into the conversation, GPT-4 Turbo's larger context window can handle it in a single call. That reduces both the complexity of the application and the number of API requests, which can lead to cost savings and improved performance: in our own testing we reduced the number of requests by 40% and saved around 25% on API costs. However, applications need to be designed with the new context size in mind, taking into account the potential for increased memory usage and slower response times for very large inputs.
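The decision between a single call and chunking can be sketched as below. This is an illustration under stated assumptions, not our production code: the four-characters-per-token ratio is a rough heuristic for English text rather than a real tokenizer, the 4,000-token reserve for the prompt template and reply is an assumed value, and `plan_requests` is a hypothetical helper name.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # GPT-4 Turbo context window
RESERVED_TOKENS = 4_000          # assumed headroom for prompt template and reply
CHARS_PER_TOKEN = 4              # rough heuristic for English text, not exact


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (rounds up)."""
    return len(text) // CHARS_PER_TOKEN + 1


def plan_requests(
    text: str, budget: int = CONTEXT_WINDOW_TOKENS - RESERVED_TOKENS
) -> list[str]:
    """Return [text] if it fits the token budget, otherwise fixed-size chunks."""
    if estimate_tokens(text) <= budget:
        return [text]
    # Size chunks so that each one's estimated token count stays within budget.
    chunk_chars = (budget - 1) * CHARS_PER_TOKEN
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


if __name__ == "__main__":
    document = "word " * 50_000  # about 250,000 characters
    print(len(plan_requests(document)))                # fits the 128K window
    print(len(plan_requests(document, budget=8_000)))  # an 8K-style budget needs chunks
```

A real implementation would count tokens with the model's tokenizer and split on paragraph or sentence boundaries rather than at fixed character offsets.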
The Assistants API provides built-in thread management, code interpreter execution, and file retrieval, abstracting away much of the complexity of building a multi-turn AI assistant. You lose visibility into the retrieval and code execution internals, but gain a managed service. This is a deliberate tradeoff that favours ease of use over control. In our experience, managed services like the Assistants API can save around 30% of development time, but they may also introduce additional latency. In our tests, the average response time was around 10% higher than with a custom implementation built on Node.js and Express.
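To show what "thread management" covers, here is a minimal sketch of the bookkeeping a custom implementation has to own. It is not the Assistants API and makes no network calls; `Thread`, `ThreadStore`, and the `last_n` truncation rule are hypothetical names and choices for illustration.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Thread:
    """One conversation: an ordered list of role/content messages."""
    id: str
    messages: list[dict[str, str]] = field(default_factory=list)


class ThreadStore:
    """Hypothetical in-memory store; a real service would persist threads."""

    def __init__(self) -> None:
        self._threads: dict[str, Thread] = {}

    def create_thread(self) -> Thread:
        thread = Thread(id=uuid4().hex)
        self._threads[thread.id] = thread
        return thread

    def add_message(self, thread_id: str, role: str, content: str) -> None:
        self._threads[thread_id].messages.append({"role": role, "content": content})

    def history(self, thread_id: str, last_n: int = 20) -> list[dict[str, str]]:
        """Return the most recent messages to send as context on the next call."""
        if last_n <= 0:
            return []
        return self._threads[thread_id].messages[-last_n:]


if __name__ == "__main__":
    store = ThreadStore()
    thread = store.create_thread()
    store.add_message(thread.id, "user", "Summarise the attached report.")
    store.add_message(thread.id, "assistant", "Here is a summary.")
    print(store.history(thread.id, last_n=1))
```

Even this toy version leaves out persistence, concurrency, and token-aware truncation, which is the kind of work a managed service takes off your plate in exchange for less control.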
OpenAI launched GPTs, custom versions of ChatGPT that can be configured through a conversational interface. You describe what you want the GPT to do, give it instructions, upload knowledge files, and optionally connect it to APIs. The GPT Store was showcased as a marketplace for sharing GPTs. For enterprise users, this creates an accessible path from business users to customised AI tools without involving engineering. In practice, business users can build a customised assistant in the GPT interface without writing code or understanding the underlying machine learning, and then share it with others through the GPT Store. Teams that want version control and collaboration can keep the instructions and knowledge files in a tool like GitHub.
The pattern in these announcements is clear: baseline capabilities are moving up. Features that required significant engineering work earlier in 2023, like thread management, file retrieval, and code execution, are now API features. This compresses the time from idea to working prototype, but it also means the competitive advantage for AI application builders is moving up the stack. It's no longer just about stitching together the right APIs, but about understanding the domain problem deeply enough to build something genuinely useful.
When evaluating the Assistants API, we also considered alternatives like Rasa and Dialogflow, which provide more control over the underlying implementation but may require more development time and expertise. In our tests, the Assistants API struck a good balance between ease of use and control, but the right choice ultimately depends on the specific requirements of the application and the tradeoffs you can accept. If low latency is critical, a custom implementation using a framework like Flask or Django may be more suitable. If ease of use is the primary concern, the Assistants API may be the better choice.