Under the Hood

The 5-stage cognitive pipeline: how an AI agent actually thinks

Tim Jordan · March 16, 2026 · 5 min read

Most AI agent frameworks treat thinking as a single step: a message comes in, the model generates a response, and the response goes out, maybe with a tool call in the middle. The “thinking” is basically one pass through a language model. That never felt right to me. When I think about how people actually make decisions in an organization, it’s never one step. You assess the situation first, reason through options, check your work, act, and communicate the result. That’s five distinct activities that happen in a specific order, and skipping any of them usually means something goes wrong. So we built that into the architecture.

The five stages

When one of our agents receives a message, it doesn’t just think. It runs through a pipeline with five stages, each doing something different.

Prepare is where the agent gets its bearings. The raw message arrives, the system normalizes it, and 11 context assembly modules fire in sequence: memories load in, relevant knowledge loads in, the graph gets traversed. By the time Prepare finishes, the agent has a full picture of everything it needs to reason with.
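To make the shape of Prepare concrete, here is a minimal sketch. The module names and the two placeholder modules are my own illustrative assumptions; the post only says that 11 context assembly modules fire in sequence, so treat this as the pattern, not the real implementation.

```python
# Sketch of the Prepare stage. Module names are hypothetical; the source
# describes 11 context assembly modules running in sequence, so the two
# below stand in for them.

def normalize(raw_message: str) -> dict:
    """Normalize the raw inbound message into a uniform shape."""
    return {"text": raw_message.strip(), "channel": "chat"}

def load_memories(ctx: dict) -> dict:
    # A real module would retrieve memories relevant to ctx["text"].
    ctx["memories"] = ["(retrieved memories would go here)"]
    return ctx

def load_knowledge(ctx: dict) -> dict:
    # A real module would pull in relevant knowledge, traverse the graph, etc.
    ctx["knowledge"] = ["(retrieved knowledge would go here)"]
    return ctx

# Modules fire in a fixed sequence, each enriching the shared context.
CONTEXT_MODULES = [load_memories, load_knowledge]  # the real pipeline has 11

def prepare(raw_message: str) -> dict:
    ctx = normalize(raw_message)
    for module in CONTEXT_MODULES:
        ctx = module(ctx)
    return ctx
```

The design point is that each module takes the context dict and returns it enriched, so the sequence is just a fold over the module list.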

Reason is where the actual thinking happens. The agent orients itself, assesses what kind of problem it’s facing, and starts working through the response. This stage can loop: if the first reasoning pass isn’t sufficient, the agent goes back for another iteration, and each pass adds to the cognitive state rather than overwriting it.

Verify is the one most frameworks skip entirely. The agent reflects on its own reasoning and asks whether the response is actually good enough. This is where revision happens, not as an afterthought but as a built-in stage of cognition. If the reflection identifies problems, the agent goes back to Reason.
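The Reason/Verify loop can be sketched as follows. The function bodies, the iteration cap, and the placeholder verification check are all assumptions for illustration; the post only says the stage can loop and that each pass appends to the cognitive state.

```python
# Sketch of the Reason -> Verify loop. Names and the retry cap are
# illustrative assumptions, not the actual framework API.

MAX_ITERATIONS = 3  # assumed cap; the source only says the stage "can loop"

def reason(state: dict) -> dict:
    # Each pass appends to the cognitive state rather than overwriting it.
    iterations = state.setdefault("thinking_iterations", [])
    iterations.append(f"reasoning pass {len(iterations) + 1}")
    state["draft"] = "candidate response"
    return state

def verify(state: dict) -> bool:
    # Reflection step: is the draft actually good enough?
    # A real check would critique the draft; this placeholder accepts any draft.
    return state.get("draft") is not None

def reason_and_verify(state: dict) -> dict:
    for _ in range(MAX_ITERATIONS):
        state = reason(state)
        if verify(state):
            break  # verification passed; move on to Execute
        # otherwise loop back to Reason for another pass
    return state
```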

Execute is where tools actually run. The agent proposes tools during Reason, but the system controls execution during Execute. The agent never directly calls a tool; it says “I think we should run this with these parameters,” and the system decides when and how to actually do it. That separation prevents the runaway tool-calling you see when agents have unrestricted access.

Deliver formats the final response and sends it back through the appropriate channel. That sounds simple, but it’s where tone, formatting, and channel-specific conventions get applied.
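The proposal/execution split above can be sketched like this. The `ToolProposal` type, the tool registry, and the example tool are hypothetical names I’ve introduced to show the separation; they are not the framework’s real API.

```python
# Sketch of the proposal/execution split: during Reason the agent only emits
# a ToolProposal; the system decides during Execute whether to run it.
# All names here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ToolProposal:
    tool_name: str
    parameters: dict

# What the agent produces during Reason: "I think we should run this."
proposal = ToolProposal(tool_name="search", parameters={"query": "Q3 revenue"})

# The system's registry of tools it is willing to run.
ALLOWED_TOOLS = {"search": lambda p: f"results for {p['query']}"}

def execute(p: ToolProposal):
    """The system, not the agent, controls whether and how a tool runs."""
    tool = ALLOWED_TOOLS.get(p.tool_name)
    if tool is None:
        return None  # proposal rejected: unknown or disallowed tool
    return tool(p.parameters)
```

Because the agent can only emit proposals, an unknown or disallowed tool simply never runs, which is the guardrail against runaway tool-calling.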

Why the stages matter

The key insight isn’t that there are five stages; it’s that separating these concerns changes the quality of what comes out. When reasoning and execution are combined, as they are in most frameworks, the agent is trying to think and act simultaneously. That’s like asking someone to write a strategy memo while also executing the strategy: the quality of both suffers. And when verification is a separate stage, the agent can catch its own mistakes before they become outputs. Not every message needs heavy verification; simple factual questions can breeze through. But for complex reasoning, having a built-in “wait, is this actually right?” step catches things that a single reasoning pass misses.

The state object that holds it all together

There’s a single cognitive state object that flows through all five stages, with 46 fields. Some are simple values that get updated as the agent progresses; others are append-only lists that grow with each iteration. The thinking iterations field accumulates every reasoning pass the agent takes. If something goes wrong, the errors field captures it. Nothing gets overwritten, and everything gets logged. By the time the agent reaches Deliver, there’s a complete record of how it got to its answer: not just the answer itself but the reasoning that produced it, the tools it considered, and the verification it performed. That record becomes part of the agent’s memory for future reference.
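A minimal sketch of that state object, showing the two kinds of fields. The field names are illustrative, and this shows only a handful of the 46 fields the post mentions, not the real schema.

```python
# Sketch of the cognitive state object. Field names are assumptions;
# the real object has 46 fields.

from dataclasses import dataclass, field

@dataclass
class CognitiveState:
    # Simple values, updated in place as the agent progresses:
    current_stage: str = "prepare"
    draft_response: str = ""
    # Append-only lists that grow with each iteration:
    thinking_iterations: list = field(default_factory=list)
    proposed_tools: list = field(default_factory=list)
    errors: list = field(default_factory=list)

state = CognitiveState()
state.thinking_iterations.append("pass 1: assessed the question")
state.current_stage = "reason"
# Nothing is removed from the lists: by Deliver, they hold the full record
# of reasoning passes, tool proposals, and any errors along the way.
```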

The 80% ceiling

One decision that shaped the whole pipeline: we enforce a hard token budget ceiling at 80% of the model’s context window. The context assembly modules that run during Prepare can’t exceed this limit. If the retrieved memories, knowledge, conversation history, and tool descriptions are too large to fit, the token budget trimmer compresses them, cutting the least relevant context first using a gradient approach, not random truncation.

This sounds like a constraint, and it is. But it forces the system to be deliberate about what information actually matters for this specific reasoning pass. An agent with unlimited context tends to include everything and reason poorly; an agent with a managed budget has to prioritize, and that prioritization makes the reasoning sharper. I learned something similar running businesses with tight margins: when you have unlimited budget, you spend poorly, but when you have constraints, you spend intentionally. The ceiling forces the same discipline on the agent’s cognition.
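Here is a sketch of what a relevance-ordered trimmer might look like. The relevance scoring, the whitespace token counter, and the function names are placeholder assumptions; the real trimmer presumably uses an actual tokenizer and the scores produced during context assembly.

```python
# Sketch of the 80% budget trimmer: keep the most relevant context chunks
# and let the least relevant fall off first (a relevance gradient), rather
# than truncating at random. Scoring and token counting are placeholders.

BUDGET_FRACTION = 0.80

def token_count(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def trim_to_budget(chunks: list[tuple[float, str]], context_window: int) -> list[str]:
    """chunks: (relevance_score, text) pairs from Prepare's assembly modules."""
    budget = int(context_window * BUDGET_FRACTION)
    kept, used = [], 0
    # Walk from most to least relevant; stop adding once the budget is full.
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = token_count(text)
        if used + cost > budget:
            continue  # least-relevant chunks are the ones that get cut
        kept.append(text)
        used += cost
    return kept
```

Sorting by relevance before filling the budget is what makes this a gradient rather than a truncation: low-scoring context is what gets sacrificed when space runs out.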

What this doesn’t solve

I want to be clear about what this architecture doesn’t do. It doesn’t make the underlying language model smarter. If the model hallucinates during Reason, the Verify stage might catch it, but it might not. The pipeline creates structure around the reasoning process, but the reasoning itself is still only as good as the model doing it.

What the pipeline does is reduce the number of unforced errors: the kind of mistakes that happen not because the model can’t reason well but because the process around the reasoning is sloppy. Maybe the context was incomplete, maybe nobody checked the output, maybe a tool ran when it shouldn’t have. Those are process failures, not intelligence failures, and process failures are fixable with architecture. That’s what we built.
