Under the Hood

Why we left CrewAI for LangGraph (and what we learned)

Tim Jordan · March 16, 2026 · 4 min read

We were three weeks into building with CrewAI when I realized we’d picked the wrong framework. Not because CrewAI is bad; it’s actually a solid starting point for getting multi-agent systems up and running fast. But “fast to start” and “right for what we’re building” turned out to be very different things.

The thing that worked

CrewAI gives you a mental model that makes immediate sense: you define agents, you define tasks, you wire them together. It feels like describing a team: this agent does research, this agent does writing, this agent reviews. If you’re building something where agents have clear, discrete jobs and hand work back and forth in a predictable sequence, it works well.
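That mental model is easy to sketch. The snippet below is not CrewAI’s actual API, just a pure-Python sketch of the pattern it hands you: discrete agents, one job each, passing results forward in a fixed sequence.

```python
# A pure-Python sketch of the sequential "crew" pattern (not CrewAI's real API):
# each agent has one discrete job, and work flows forward in a fixed order.

def research(topic: str) -> str:
    # Stand-in for an LLM-backed research agent.
    return f"notes on {topic}"

def write(notes: str) -> str:
    # Stand-in for a writing agent that consumes the researcher's output.
    return f"draft based on {notes}"

def review(draft: str) -> str:
    # Stand-in for a reviewing agent that approves the draft.
    return f"approved: {draft}"

def run_crew(topic: str) -> str:
    # The framework manages this hand-off for you; note that it is strictly linear.
    result = topic
    for agent in (research, write, review):
        result = agent(result)
    return result
```

The appeal is obvious: the control flow is the list. That is also the limitation the rest of this post is about.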

We got a basic prototype running in days. Agents were talking to each other. Tasks were completing. It felt like progress.

Where it started to break

The problems showed up when we tried to build the cognitive architecture we actually needed. Our agents don’t just pass tasks around. They have a multi-stage reasoning process: preparation, reasoning, verification, execution, and delivery, with multiple nodes inside each stage. The reasoning stage loops. The verification stage can send work back to reasoning. And tool execution happens in a controlled, system-managed way: the agent proposes tools, but the system decides when and how to run them.

CrewAI’s abstraction layer sits right on top of that kind of complexity. It wants to manage the agent-to-agent flow for you, which is great if your flow is simple. But we needed to control the flow ourselves. We needed conditional branching, state that persisted across nodes, a graph rather than a pipeline. That’s when LangGraph started making more sense.

What LangGraph actually gave us

LangGraph models the cognitive loop as a state graph: every reasoning step is a node, the edges between nodes are conditional, and a state object flows through the entire graph, accumulating context as it goes. That maps naturally to how we think about agent cognition. An agent receiving a message doesn’t just “think and respond.” It prepares by normalizing the input, loading context, and retrieving memories. It reasons, possibly over multiple iterations, with tool planning mixed in. It verifies by checking its own work and deciding whether revision is needed. It executes by running tools through a controlled dispatch system. And finally it delivers by formatting and sending the response.
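To make the shape concrete, here’s a minimal pure-Python sketch of that loop. The node names match our stages, but everything else is illustrative; this is not LangGraph’s API. Nodes are functions over a shared state, and a routing function inspects the state to pick the next node, which is what lets verification send work back to reasoning.

```python
# A minimal sketch of a conditional state graph (illustrative, not LangGraph's API).
# Each node takes and returns the shared state; a router reads the state to
# choose the next node, so verify can loop back to reason.

def prepare(state):
    state["context"] = f"normalized({state['input']})"
    return state

def reason(state):
    state["iterations"] += 1
    state["plan"] = f"plan-v{state['iterations']}"
    return state

def verify(state):
    # Toy check: demand at least two reasoning passes before approving.
    state["approved"] = state["iterations"] >= 2
    return state

def execute(state):
    state["tool_results"] = f"ran tools for {state['plan']}"
    return state

def deliver(state):
    state["output"] = f"response built from {state['tool_results']}"
    return state

NODES = {"prepare": prepare, "reason": reason, "verify": verify,
         "execute": execute, "deliver": deliver}

def route(current, state):
    # Conditional edges: verify either loops back to reason or moves forward.
    if current == "prepare":
        return "reason"
    if current == "reason":
        return "verify"
    if current == "verify":
        return "execute" if state["approved"] else "reason"
    if current == "execute":
        return "deliver"
    return None  # deliver is terminal

def run_graph(user_input):
    state = {"input": user_input, "iterations": 0}
    node = "prepare"
    while node is not None:
        state = NODES[node](state)
        node = route(node, state)
    return state
```

The point of the sketch is the `route` function: the edges are decisions, not a fixed list, and that is the difference between a graph and a pipeline.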

Each of those stages contains multiple nodes, and our cognitive state object tracks 46 fields across all of them. Some fields are append-only lists that accumulate across iterations: think of them as a running log of everything the agent considered, tried, and decided during a single conversation turn.
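Those append-only fields behave like reducers: when a node returns an update, list-typed fields are concatenated onto the existing value instead of replacing it. Here’s a small sketch of that merge rule (the field names are hypothetical, and real graph frameworks handle this declaratively):

```python
# Sketch of reducer-style state merging (field names are hypothetical).
# Scalar fields are overwritten by node updates; fields registered as
# append-only accumulate, giving a running log across iterations.

APPEND_ONLY = {"thoughts", "tool_calls"}

def merge_state(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in APPEND_ONLY:
            merged[key] = merged.get(key, []) + value  # accumulate
        else:
            merged[key] = value  # overwrite
    return merged
```

With this rule, two reasoning iterations that each contribute a thought leave both thoughts in the state, while a field like the current status just reflects the latest node.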

We couldn’t have built this on CrewAI without fighting the framework at every step.

The real lesson

The technical migration itself took a few weeks. The harder part was accepting that we’d invested time in the wrong direction and needed to change course. There’s a gravitational pull to keep using whatever you started with, especially once you’ve built things on top of it. What made the decision easier was asking a specific question: is this framework going to constrain our architecture, or support it? If the answer is “constrain,” you switch. The cost of switching early is always less than the cost of building workarounds for years.

I’ve seen this pattern before in non-tech contexts. When I was consulting for Amazon and Alibaba sellers, the biggest operational mistakes weren’t choosing the wrong tool; they were staying with the wrong tool after everyone knew it was wrong, because sunk cost pulls you forward into decisions that no longer make sense.

What I’d tell someone choosing today

If you’re building agents that follow a straightforward task-delegation pattern, CrewAI or something similar will get you moving fast, and that’s genuinely valuable. But if you’re building agents that need a custom cognitive architecture, where the reasoning process itself is the product, start with a graph-based framework. You’ll spend more time up front understanding the abstraction, but dramatically less time fighting it later. The 46 registered tools our agents use, the 22 cognitive nodes in our reasoning graph, the 11-module context assembly pipeline that runs before every reasoning step: none of that would have been possible if we’d stayed where we started.

Sometimes the best technical decision is admitting that what got you here won’t get you there.
