AutoGPT went viral in April 2023. An open-source project by Toran Bruce Richards, it chains GPT-4 calls into a loop in which the model defines sub-goals, executes them using tools (web search, code execution, file management), and iterates until the goal is complete. Both the enthusiasm and the reality are instructive.

What AutoGPT demonstrated

AutoGPT showed that a simple loop (plan a step, execute it with a tool, observe the result, plan the next step), powered by GPT-4, could autonomously complete non-trivial tasks given enough time. It could research topics, write code, execute it, debug failures, and produce outputs without human intervention at each step. For the AI community, this was a qualitative shift from 'model that answers questions' to 'agent that does things'.
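The loop described above can be sketched in a few lines. This is a hedged illustration, not AutoGPT's actual implementation: `plan_next_step` and the entries in `TOOLS` are stand-ins for what would be a GPT-4 call and real tool integrations.

```python
# Minimal sketch of the plan-act-observe agent loop. All names here
# (plan_next_step, TOOLS, AgentState) are illustrative stubs, not
# AutoGPT's real code; a real agent would call GPT-4 in plan_next_step.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (step, observation) pairs

def plan_next_step(state: AgentState) -> str:
    """Stand-in for an LLM call that picks the next sub-goal."""
    done = len(state.history)
    steps = ["search: background on the goal", "write: draft output", "finish"]
    return steps[min(done, len(steps) - 1)]

TOOLS = {
    "search": lambda arg: f"search results for {arg!r}",
    "write": lambda arg: f"wrote {arg!r} to disk",
}

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal)
    for _ in range(max_steps):
        step = plan_next_step(state)                 # plan
        if step == "finish":
            break
        tool, _, arg = step.partition(": ")
        observation = TOOLS[tool](arg)               # act
        state.history.append((step, observation))    # observe
    return state

state = run_agent("summarise a topic")
print(len(state.history))  # 2 steps executed before "finish"
```

The key property is that each iteration's plan depends on the accumulated history, which is what makes the loop autonomous, and, as the next section notes, is also what makes its errors compound.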

Why it mostly failed in practice

AutoGPT and its contemporaries (BabyAGI, AgentGPT) had a fundamental problem: they accumulated errors. Each step could go slightly wrong, and since subsequent steps were planned from previous results, small errors compounded into large deviations. Extended autonomous runs were also expensive, requiring many GPT-4 API calls. And the characteristic failure mode, a stuck loop that burned API credits without producing useful output, was common enough to frustrate most practical uses.
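The compounding effect is easy to quantify with some back-of-the-envelope arithmetic (the numbers below are assumed for illustration, not measured):

```python
# Illustrative arithmetic with assumed numbers: even a high per-step
# success rate compounds badly over a long autonomous run.
p_step = 0.95            # assumed probability one step executes correctly
steps = 20               # assumed length of an autonomous run
p_run = p_step ** steps  # probability the whole run stays on track
print(round(p_run, 2))   # ~0.36
```

Under these assumptions, a 20-step run stays on track only about a third of the time, which matches the observed tendency of long autonomous runs to drift off course.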

The agent frameworks that emerged

LangChain, LlamaIndex, and Microsoft Semantic Kernel developed agent frameworks that addressed some of AutoGPT's limitations: better tool-use abstractions, more structured planning, and human-in-the-loop checkpoints. The pattern that works in production is a bounded agent: an agent that operates on a well-defined task, has access to specific tools for that task, and has guardrails that prevent it from diverging into indefinite loops.
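A bounded agent can be sketched as the earlier loop plus three guardrails: a tool whitelist, an iteration cap, and an approval checkpoint before side-effecting actions. This is a minimal sketch under assumed names (`ALLOWED_TOOLS`, `approve`, `run_bounded_agent` are all illustrative, not the API of any of the frameworks named above):

```python
# Hedged sketch of a "bounded agent": fixed tool whitelist, iteration cap,
# and a human-approval checkpoint. All names are illustrative.
from typing import Callable

ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "extract": lambda text: text.upper(),          # stand-in extraction tool
    "summarise": lambda text: text[:20] + "...",   # stand-in summariser
}
SIDE_EFFECTING = {"extract"}  # actions that require human sign-off

def approve(tool: str, arg: str) -> bool:
    """Human-in-the-loop checkpoint; auto-approves in this sketch."""
    return True

def run_bounded_agent(plan: list[tuple[str, str]],
                      max_iterations: int = 5) -> list[str]:
    results = []
    for i, (tool, arg) in enumerate(plan):
        if i >= max_iterations:
            break  # guardrail: never loop indefinitely
        if tool not in ALLOWED_TOOLS:
            raise ValueError(f"tool {tool!r} not permitted for this task")
        if tool in SIDE_EFFECTING and not approve(tool, arg):
            continue  # skipped rather than executed without approval
        results.append(ALLOWED_TOOLS[tool](arg))
    return results

print(run_bounded_agent([("summarise", "a long document about agents")]))
```

The design choice worth noting is that the guardrails are enforced outside the model: the whitelist and iteration cap are ordinary code, so a confused plan cannot talk its way past them.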

The 2023 agent landscape

By the end of 2023, the agent frameworks were mature enough for production use in narrow domains: research summarisation, data extraction pipelines, code generation workflows. The general-purpose autonomous agent that can pursue open-ended goals remained a research problem. The practical value was in specific, bounded use cases where the agent's autonomy was an efficiency gain, not an open-ended exploration.