Why we built a governance system before we built features
I can tell you the exact moment this decision happened. We were a few weeks into building, and I was looking at a growing list of components: tools, services, engines, modules, models, scripts. Each one had a file somewhere and some documentation, but none of them had a consistent way of answering basic questions like “what type of thing is this?” or “who owns it?” or “is it actually working?” I’d seen this movie before, not in software but in businesses: a company grows fast, everyone’s shipping, and then one day someone asks “what do we actually have?” and nobody can answer, because the inventory doesn’t match reality, the documentation is stale, and the org chart doesn’t reflect how work actually flows. So I made a call that felt counterintuitive at the time: before we build more features, we build the governance system that tells us what we have and whether it’s working.
Two foundations, not one
The governance system rests on two peer foundations, which we call CSS and OSS (not the web kind and not the open-source kind). CSS is the Component Standard System; it answers “is this thing defined correctly?” OSS is the Operation Standard System; it answers “is this thing working correctly?”

CSS gives every component in the system three coordinates. Kind tells you what the thing IS (we have over 35 types, ranging from engines to tools to cognitive nodes). Domain tells you which of the 20 subsystems owns it. Touches tracks what other domains it depends on. Every file in the codebase carries these three coordinates: Python files get an inline comment, documentation files get YAML frontmatter, and database-registered components get them through seed data. A priority cascade governs how the system infers coordinates when they’re missing, and a scanner validates everything automatically.

This sounds bureaucratic, but it’s the opposite. When every component has a type, an owner, and declared dependencies, you can answer questions that are impossible without them: “Show me all the tools in the cognition subsystem.” “What depends on the memory domain?” “Which components were added this week that don’t have documentation?” Those queries take seconds when you have classification; they take days of archeology when you don’t.
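To make the coordinate idea concrete, here is a minimal sketch of what parsing an inline marker might look like. The marker syntax, field names, and scanner behavior below are illustrative assumptions, not the actual implementation:

```python
import re

# Hypothetical inline-marker format for a Python file's coordinates.
# The real marker syntax may differ; this is for illustration only.
MARKER = re.compile(
    r"#\s*kind:\s*(?P<kind>[\w-]+)"
    r"\s*\|\s*domain:\s*(?P<domain>[\w-]+)"
    r"\s*\|\s*touches:\s*(?P<touches>[^\n]*)"
)

def parse_coordinates(source):
    """Extract the three coordinates from a file's header comment, if present.

    Returns a dict with kind, domain, and a list of touched domains,
    or None when the file has no marker (a scanner would flag that).
    """
    match = MARKER.search(source)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "domain": match.group("domain"),
        "touches": [t.strip() for t in match.group("touches").split(",") if t.strip()],
    }

example = "# kind: tool | domain: cognition | touches: memory, llm\ndef run(): ...\n"
print(parse_coordinates(example))
# → {'kind': 'tool', 'domain': 'cognition', 'touches': ['memory', 'llm']}
```

A real scanner would walk the repository, apply a parser like this per file type (regex for code comments, a YAML parser for frontmatter, the database for seed data), and report every file that comes back `None`.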
54 controls across 5 dimensions
OSS tracks whether components are actually ready. We defined 54 controls organized across 5 dimensions: Build quality, Documentation, Testing, Operations, and Repo hygiene. Each control lives at one of three maturity levels. Bronze is foundational: does the thing exist and have basic documentation? Silver means it’s actually solid: tests pass, docs are current, monitoring is running. Gold is the bar you’d want to hit if someone were auditing you. The levels are sequential, so you can’t claim Silver without passing Bronze first, and you can’t claim Gold without passing Silver. That prevents the thing that happens in every growing system, where some components are polished, others are held together with tape, and nobody has a consistent way to tell which is which.
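The sequential-gating rule can be sketched in a few lines. The control IDs and per-level groupings below are hypothetical placeholders; only the gating logic reflects what the text describes:

```python
from enum import IntEnum

class Maturity(IntEnum):
    NONE = 0
    BRONZE = 1
    SILVER = 2
    GOLD = 3

# Hypothetical control IDs per level, purely for illustration.
LEVEL_CONTROLS = {
    Maturity.BRONZE: ["B1-exists", "B2-basic-docs"],
    Maturity.SILVER: ["S1-tests-pass", "S2-docs-current", "S3-monitoring"],
    Maturity.GOLD: ["G1-audit-ready"],
}

def effective_level(passed):
    """A component's effective maturity is the highest level for which it
    passes ALL controls at that level and at every level below it.

    `passed` maps control ID -> bool; missing controls count as failed.
    """
    level = Maturity.NONE
    for candidate in (Maturity.BRONZE, Maturity.SILVER, Maturity.GOLD):
        if all(passed.get(c, False) for c in LEVEL_CONTROLS[candidate]):
            level = candidate
        else:
            break  # sequential: failing Bronze blocks Silver and Gold
    return level

results = {"B1-exists": True, "B2-basic-docs": True, "S1-tests-pass": True,
           "S2-docs-current": False, "S3-monitoring": True, "G1-audit-ready": True}
print(effective_level(results).name)  # → BRONZE (Silver blocked by stale docs)
```

Note that the component above passes its Gold control but still sits at Bronze, because the levels gate each other: one stale doc at Silver caps the whole component.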
Documentation before code
One of the earliest decisions we locked was “documentation before code changes”: every time someone builds something new or changes something significant, the documentation updates happen first, not after. This felt slow, and I remember being frustrated by it early on; I wanted to build the thing, not write about the thing. But the discipline paid off in two specific ways. First, writing the documentation forces you to think through the design before you code it, and you catch problems on paper that would have taken hours to discover in implementation. Second, the documentation is always current, with no “we’ll update the docs later” debt accumulating, because “later” is actually “before.” The result is a system where the documentation, the code, and the seed data all agree with each other. We run automated drift detection that checks for inconsistencies across all three, so when something drifts we know immediately, not months later when someone trips over a stale doc.
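A minimal sketch of what a three-way drift check could look like, assuming each source (code markers, doc frontmatter, seed data) reduces to a mapping from component name to its declared coordinates. The component names and field shapes here are illustrative assumptions:

```python
def detect_drift(code, docs, seed):
    """Compare the coordinates each source declares for every component.

    Each argument maps component name -> {"kind": ..., "domain": ...}.
    Returns a sorted list of (component, problem) pairs: a component is
    drifted if any source is missing it or the sources disagree.
    """
    drift = []
    for name in sorted(set(code) | set(docs) | set(seed)):
        records = {"code": code.get(name), "docs": docs.get(name), "seed": seed.get(name)}
        missing = [src for src, rec in records.items() if rec is None]
        present = [rec for rec in records.values() if rec is not None]
        if missing:
            drift.append((name, f"missing in {', '.join(missing)}"))
        elif any(rec != present[0] for rec in present[1:]):
            drift.append((name, "sources disagree"))
    return drift

code = {"summarizer": {"kind": "tool", "domain": "cognition"}}
docs = {"summarizer": {"kind": "tool", "domain": "memory"}}  # stale domain in docs
seed = {"summarizer": {"kind": "tool", "domain": "cognition"},
        "router": {"kind": "engine", "domain": "llm"}}
print(detect_drift(code, docs, seed))
# → [('router', 'missing in code, docs'), ('summarizer', 'sources disagree')]
```

Run on every change, a check like this turns “the docs quietly went stale” into an immediate, named failure instead of a surprise months later.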
The real cost of not doing this
I’ve talked to other founders building AI platforms, and almost all of them started with features and planned to add governance later. The ones who are further along than us are dealing with a predictable set of problems: they can’t tell you how many models they’re using or what each one costs; they can’t tell you which components depend on which other components; when something breaks, they can’t trace the impact through the system without manually reading code; their documentation is a mixture of current, stale, and contradictory; and adding new team members takes weeks because there’s no systematic way to understand what exists. These aren’t failure stories, they’re normal outcomes of building fast without governance. The companies are functioning, but they’re spending an increasing percentage of their time managing the complexity they created by not managing it from the start.
What it actually costs to do governance first
It’s slower at the beginning, and there’s no way around that. When you’re classifying every component, writing controls, building scanners, and enforcing documentation-first workflows, you’re doing work that doesn’t produce visible features. For us the investment was roughly this: 54 controls to define, a classification system to build and validate, automated scanners to write, and a propagation checklist with 14 touchpoints that gets followed every time something changes.

Was it worth it? At 37 database tables, 46 agent tools, 22 cognitive nodes, 53 models in the directory, and 20 subsystems, I can answer “what do we have and is it working?” in seconds. I can onboard a new contributor by pointing them at the classification system and the lifecycle standard. I can automatically detect drift between what the code says and what the docs say. More than that, the governance system scales: adding the 54th model to the directory takes minutes, not hours, because the structure for how models are defined, validated, and integrated already exists, and adding the 47th tool follows the same pattern as adding the first one.
The founders who built features first are now trying to retroactively add governance to a running system, and that’s like trying to install plumbing in a building that’s already occupied. It can be done, but it’s dramatically harder and more disruptive than putting the plumbing in first. I’d make the same call again.