The compression gradient: why Kahneman was wrong about thinking
Daniel Kahneman’s “Thinking, Fast and Slow” gave us a powerful model. System 1 is fast, intuitive, automatic. System 2 is slow, deliberate, effortful. Two systems. Two modes. A clean binary.
I read it years ago and it shaped how I thought about decision-making in the businesses I ran. Fast decisions for routine operations. Slow decisions for strategy. Two gears, shift between them.
Then I started designing how AI agents should think, and the binary fell apart.
Cognition isn’t two speeds
The problem with System 1 and System 2 isn’t that they’re wrong; it’s that they’re incomplete. Real cognition doesn’t switch between two modes. It operates on a continuous gradient, from fully compressed (instant pattern matching) to fully expanded (step-by-step deliberate reasoning), with infinite shades in between.
When you drive a familiar route to work, that’s compressed cognition. You’re not in System 1 autopilot. You’re still making hundreds of micro-decisions per minute. But those decisions are so compressed by experience that they feel automatic. If a kid runs into the road, you expand immediately. Not to full System 2 deliberation. To whatever level of expansion the situation demands. Then you compress back down.
The gradient matters because it’s continuous, not binary. And designing AI reasoning as if there are only two modes misses everything in between.
What this means for AI agent design
Most AI agent architectures treat reasoning as one step: the message comes in, the model reasons about it, the response goes out. Some add a “plan first, then execute” layer. That’s two steps, and better, but still binary.
When we designed our cognitive pipeline, we built it around the gradient. Not every message needs the same depth of reasoning. A simple factual question should compress down to fast retrieval and direct response. A complex strategic question should expand through multiple reasoning iterations with verification.
The system determines the appropriate level of expansion based on the situation. The orient phase of our reasoning stage assesses what kind of problem the agent is facing: is this routine? Is this novel? Is there ambiguity? That assessment then determines how many reasoning iterations the agent takes, whether verification is needed, and how much context to load.
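To make the idea concrete, here is a minimal sketch of what an orient phase like this could look like. Everything in it is an assumption for illustration: the `novelty` and `ambiguity` scores, the `ReasoningPlan` fields, and the specific thresholds are all hypothetical, not our actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ReasoningPlan:
    iterations: int     # how many reasoning passes to run
    verify: bool        # whether to add a verification pass
    context_items: int  # how many memory/context items to load

def orient(novelty: float, ambiguity: float) -> ReasoningPlan:
    """Map a compressed assessment of the situation (both scores in [0, 1])
    to a point on the expansion gradient, rather than a fast/slow switch."""
    expansion = max(novelty, ambiguity)           # 0.0 = fully compressed, 1.0 = fully expanded
    return ReasoningPlan(
        iterations=1 + round(expansion * 4),      # 1 pass for routine, up to 5 for novel
        verify=expansion > 0.6,                   # verify only when the situation warrants it
        context_items=int(5 + expansion * 45),    # load more context as expansion grows
    )
```

The point of the sketch is the shape, not the numbers: the output varies smoothly with the assessment, so a “somewhat familiar with a twist” query lands between the extremes instead of being forced into one of two modes.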
This produces a system where the agent thinks as deeply as the situation warrants. Not more, not less. Simple queries get simple processing. Complex queries get complex processing. The gradient between them is smooth, not a hard switch.
The compression comes from experience
Here’s the part that connects back to Kahneman’s actual insight, even if the binary framing is wrong. Compression happens through experience. The reason you can drive to work without deliberating about every turn is that you’ve made those turns hundreds of times. The decisions have been compressed through repetition.
For AI agents, the parallel is memory. An agent that has handled 500 customer queries has compressed patterns about how to respond. It doesn’t need to reason from first principles about every question. The accumulated experience creates compression, meaning faster responses with less cognitive load for familiar situations.
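One way to sketch this compression-through-repetition is a memory that tracks how often a situation pattern has been handled, and reports a compression score that the rest of the pipeline can use. The class name, threshold, and scoring rule here are hypothetical simplifications, not a description of any particular agent framework.

```python
# Hypothetical sketch: repeated experience compresses a situation into a
# pattern that can be matched cheaply instead of reasoned from first principles.
class ExperienceMemory:
    def __init__(self, familiarity_threshold: int = 3):
        self.counts: dict[str, int] = {}
        self.threshold = familiarity_threshold

    def record(self, pattern: str) -> None:
        """Note one more handled instance of this situation pattern."""
        self.counts[pattern] = self.counts.get(pattern, 0) + 1

    def compression(self, pattern: str) -> float:
        """0.0 = never seen (full expansion needed); rises toward 1.0
        with repetition, so familiarity is a gradient, not a flag."""
        seen = self.counts.get(pattern, 0)
        return min(seen / self.threshold, 1.0)
```

An agent could feed `1.0 - compression(pattern)` into its orient assessment as a novelty signal: the 500th refund request scores near zero and stays compressed, while a never-seen pattern scores 1.0 and triggers expansion.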
But here’s what matters: that compression needs to be available at every point on the gradient. An agent should be able to handle a familiar task with compressed cognition AND expand to full deliberation when something unfamiliar shows up, AND operate at any point in between for the vast middle ground of “somewhat familiar but with a twist.”
Binary architectures can’t do this: they’re either in fast mode or slow mode. Gradient architectures can, because the system continuously adjusts the depth of processing to match the situation.
Why most “reasoning” frameworks miss this
The recent wave of reasoning-focused AI models is exciting. Models that think step-by-step produce better outputs on complex tasks. But they apply the same depth of reasoning to every query, regardless of whether that depth is needed.
That’s like running System 2 all the time. Kahneman himself pointed out why that doesn’t work: it’s exhausting and slow. Deliberate reasoning on every decision would paralyze a person. It also paralyzes an AI system, through latency and cost if not through fatigue.
The gradient approach says: match the reasoning depth to the problem. Let the system determine whether this query needs 200 milliseconds of compressed pattern-matching or 15 seconds of multi-iteration deliberation. The determination itself should be fast, a compressed judgment about how much expansion is needed.
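The latency numbers above can be read as the two ends of a continuous budget rather than two modes. A minimal sketch, assuming the expansion score from the orient step and the 200 ms / 15 s figures mentioned here as illustrative floor and ceiling:

```python
def reasoning_budget(expansion: float,
                     floor_ms: int = 200,
                     ceiling_ms: int = 15_000) -> int:
    """Map an expansion level in [0, 1] to a latency budget in milliseconds.
    The mapping is continuous: a compressed judgment stays near the floor,
    full deliberation approaches the ceiling, and everything in between
    gets a proportionate budget -- no hard fast/slow switch."""
    return int(floor_ms + expansion * (ceiling_ms - floor_ms))
```

A linear ramp is the simplest choice; a real system might use a steeper curve so that most queries stay cheap and only genuinely novel ones spend the full deliberation budget.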
I think this is one of the ideas from outside AI that the field most needs. Cognitive science has moved well beyond the System 1/System 2 binary. AI architecture should too. Kahneman gave us a useful starting point. The gradient is where the real design space opens up.