Agents in Organizations

What Toyota taught me about AI agents

Tim Jordan · March 16, 2026 · 5 min read

I didn’t come to AI through computer science. I came through operations: twenty-five years running businesses, consulting on supply chains, and watching what separates organizations that actually work from organizations that just look like they work. The Toyota Production System kept showing up, not because I was studying manufacturing, but because the principles TPS is built on are universal organizational truths. Most people dismiss them as “factory stuff.” They’re actually the best framework I’ve found for thinking about how AI agents should operate.

Built-in quality, not inspected-in quality

TPS distinguishes between quality that’s built into the process and quality that’s inspected after the fact. Building quality in means each step is designed to produce correct output. Inspecting quality in means you let mistakes happen and catch them later.

Most AI agent frameworks use the inspection model: the agent produces an output, and then something checks whether it’s good enough, whether that’s a human reviewing it, a second model evaluating it, or a rule-based filter scanning for problems. It works, but it’s expensive, and it doesn’t improve over time. The agent keeps making the same kinds of mistakes; the inspection layer keeps catching them.

Built-in quality looks different. When we designed our cognitive pipeline, the verification stage isn’t a post-hoc check. It’s part of the reasoning process. The agent doesn’t produce an output and then ask “was this good?” The agent produces reasoning, verifies that reasoning as part of the same cognitive loop, and adjusts before the output ever reaches the user.
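To make the contrast concrete, here is a minimal sketch of a reason-verify-revise loop, where verification sits inside the cognitive cycle rather than after it. This is an illustrative sketch, not the author’s actual pipeline; every name here is an assumption.

```python
# Hypothetical sketch: verification as part of the reasoning loop.
# `generate` and `verify` stand in for model calls; their shapes are
# illustrative assumptions, not a real framework's API.

def reason_with_builtin_verification(task, generate, verify, max_revisions=3):
    """Generate reasoning, verify it inside the same loop, and revise
    before any output ever reaches the user."""
    draft = generate(task)
    for _ in range(max_revisions):
        issues = verify(task, draft)             # list of problems; empty if sound
        if not issues:
            return draft                         # quality built in, not inspected in
        draft = generate(task, feedback=issues)  # revise using verifier feedback
    return draft  # best effort once the revision budget is spent
```

The key design point is that the verifier’s findings feed back into generation before anything is emitted, instead of a separate layer rejecting outputs after the fact.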

The distinction might sound academic, but operationally it’s enormous. Inspection catches mistakes after they’re made. Built-in quality prevents them from being made in the first place, or at least cuts the rate by an order of magnitude.

Continuous improvement at the process level

Kaizen, the Japanese word for continuous improvement, is the most misunderstood concept in TPS. People think it means “keep getting better.” It actually means something more specific: systematically identifying waste in your process and removing it, repeatedly, at every level of the organization. For AI agents, waste takes specific forms: retrieving memories that aren’t relevant to the current task, running expensive models on tasks that cheap models handle equally well, loading tool descriptions the agent will never use. Each adds latency, cost, and noise to the reasoning process.

Our token budget system is a kaizen tool. It forces the system to identify and cut the least valuable context before reasoning begins. The gradient compression doesn’t truncate randomly; it evaluates what’s least relevant and removes that first, so each compression cycle is a small improvement in what information the agent actually reasons about.
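A budget pass like this can be sketched as a simple relevance-ranked selection: score each context item, keep the most relevant items that fit, and drop the rest. This is a sketch under assumed shapes for items, scoring, and token counting; it is not the actual compression algorithm described above.

```python
# Hypothetical sketch of a token-budget pass: rank context items by
# relevance and drop the least relevant first until the budget fits.
# `relevance` and `tokens` are assumed callables, not a real API.

def fit_to_budget(items, budget, relevance, tokens):
    """Keep the most relevant context items whose token counts fit the budget."""
    kept, used = [], 0
    for item in sorted(items, key=relevance, reverse=True):
        cost = tokens(item)
        if used + cost <= budget:   # greedily admit items in relevance order
            kept.append(item)
            used += cost
    return kept
```

A greedy pass like this is the crudest version of “remove the least relevant first”; a real system would likely also compress items rather than only dropping them.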

But the deeper application is in how the system evolves. When we observe that agents consistently underuse a tool, or over-retrieve from a particular memory category, we adjust the retrieval weighting, update the tool discovery logic, or restructure the knowledge base. The improvement isn’t in the agent. It’s in the process that supports the agent.

Respect for people (and what it means for agents)

The third pillar of TPS, the one nobody in AI talks about, is respect for people. In the Toyota context, it means trusting the people closest to the work to make decisions about it. Managers don’t dictate; workers on the line have the authority to stop production when they see a problem. Translated to AI agents, this principle says something uncomfortable: if you’ve built an agent with good reasoning capabilities, the right context, and sound governance, you should trust it to make decisions within its scope.

The default in the AI industry is micromanagement, where every output gets reviewed and every action gets approved and the agent is treated as a tool that needs constant supervision rather than a team member that can be trusted with defined responsibilities.

There’s a version of this that’s appropriate early on. New agents, like new employees, should have high oversight. But the goal should be graduated trust, not permanent supervision: as the agent demonstrates competence in its role, the oversight level decreases and its operational latitude increases. We built this into our trust model. Trust isn’t a configuration toggle; it’s an earned property that changes based on demonstrated performance. An agent that consistently makes good decisions within its scope earns more autonomy over time. One that makes mistakes gets more oversight.
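Graduated trust can be sketched as a running score that maps demonstrated performance onto an oversight level. The thresholds, step sizes, and level names below are illustrative assumptions, not the actual trust model described above.

```python
# Hypothetical sketch of graduated trust: autonomy is a function of
# demonstrated performance, not a static configuration toggle.

class TrustModel:
    def __init__(self, score=0.5, step=0.05):
        self.score = score  # 0.0 = full oversight, 1.0 = full autonomy
        self.step = step

    def record(self, decision_ok):
        # Good decisions earn autonomy slowly; mistakes cost it faster,
        # mirroring how oversight tightens quickly after a failure.
        delta = self.step if decision_ok else -2 * self.step
        self.score = min(1.0, max(0.0, self.score + delta))

    def oversight(self):
        if self.score < 0.4:
            return "review-every-action"
        if self.score < 0.8:
            return "spot-check"
        return "autonomous-within-scope"
```

The asymmetric update (mistakes weighted double) is one possible encoding of “graduated trust”: autonomy is slow to earn and quick to lose.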

That’s not a novel management idea. It’s how every functional organization manages people. TPS just made it explicit and measurable.

Why operational frameworks matter more than AI frameworks

The AI industry reads papers from NeurIPS and ICML, and I’m not saying those aren’t valuable. But the operational problems of running AI agents in real organizations aren’t ML problems. They’re management problems, quality problems, process problems. Toyota solved those problems decades ago, for human workers in physical factories, and the principles transfer directly: built-in quality over inspection, continuous process improvement, trust the people closest to the work.

The best framework for thinking about AI agents isn’t LangChain or CrewAI or AutoGen. It’s the Toyota Production System. And the reason nobody in AI is talking about it is the same reason nobody in software talked about it for decades: they thought it was just about cars.

It was never about cars.
