Under the Hood

Two types of RAG: why you probably need both

Tim Jordan · March 16, 2026 · 5 min read

I used to think RAG was one thing. You take some documents, chunk them, embed them into vectors, and search by similarity when someone asks a question. That’s retrieval-augmented generation, and it’s simple enough. Then we tried to answer a question like “What decisions has this agent made that relate to this customer’s pricing concerns?” and the vector search returned a bunch of vaguely related chunks: right keywords, but they completely missed the actual relationships between things. The information was there, but the connections weren’t. That’s when we started building the second type.

What vector RAG actually does well

Vector search is brilliant at semantic similarity. You turn text into 1024-dimensional points in space, then find other points that are nearby, and “what have we written that sounds like this question?” is something vectors answer extremely well. For memory retrieval this is exactly what you want: an agent needs to recall past conversations, facts it learned, and preferences a user expressed, and embedding those memories and searching by similarity works. We use it for knowledge retrieval too, pulling relevant documents and data from the knowledge base.

The underlying model generates these embeddings locally on our GPU server. Every memory chunk, every knowledge fragment, and every tool description gets embedded and stored. When the agent starts reasoning about a new message, the first thing it does is search for relevant memories and knowledge; that semantic search runs against tens of thousands of vectors and returns the most relevant context in milliseconds. This is the foundation, and most people stop here.
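The post doesn’t show the actual retrieval code, but the vector side reduces to nearest-neighbor search over precomputed embeddings. Here’s a minimal sketch assuming cosine similarity with plain NumPy; the function name, the random data, and the 50,000-vector corpus are illustrative, not our real pipeline (which would use a proper vector store):

```python
import numpy as np

def top_k_similar(query_vec, memory_vecs, k=5):
    """Return indices of the k memory vectors closest to the query (cosine similarity)."""
    # Normalize rows so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    scores = m @ q
    # argsort is ascending; reverse and slice for the top k.
    return np.argsort(scores)[::-1][:k]

# Stand-in for tens of thousands of 1024-dimensional memory embeddings.
rng = np.random.default_rng(0)
memories = rng.normal(size=(50_000, 1024))
query = rng.normal(size=1024)
hits = top_k_similar(query, memories, k=5)
```

A brute-force scan like this is already fast at this scale; an approximate index only becomes necessary well beyond tens of thousands of vectors.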

Where vector RAG falls apart

Vectors have no concept of relationships. They can tell you “these two things are semantically similar,” but they can’t tell you “this thing caused that thing” or “this entity is connected to that entity through three intermediate steps.” Ask a vector database “find me everything related to customer onboarding” and you’ll get chunks that mention onboarding. Ask it “how does our onboarding process connect to our retention metrics, and what decisions did we make along the way that affected both?” and you’ll get noise. Relevant-sounding noise, but noise.

Real organizational knowledge isn’t a bag of similar documents. It’s a web of entities and relationships: a person connects to a project, which connects to a decision, which connects to an outcome. If your retrieval system can’t traverse those connections, it’s working with a flat map of a three-dimensional world.

What graph RAG adds

Graph RAG uses a knowledge graph instead of, or alongside, vectors: entities become nodes and relationships become edges. When you ask a question, the system doesn’t just search for similar text; it traverses the graph to find connected entities and the paths between them. We run a dedicated extraction pipeline that pulls entities and relationships out of everything the system processes. When a new piece of information arrives, it doesn’t just get embedded as a vector: an extraction model identifies what entities are mentioned, what relationships exist between them, and how this new information connects to what the graph already knows. Then, when the agent reasons about a question that involves relationships, the graph retrieval module fires alongside the vector retrieval module. The graph answers “how are these things connected?” while the vectors answer “what else sounds related?”
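To make the traversal idea concrete, here’s a toy sketch: a handful of hypothetical entities stored as an adjacency map, and a breadth-first search that recovers the relationship path between two of them. The entity names, relations, and graph shape are all invented for illustration; a real system would sit on a graph database:

```python
from collections import deque

# Hypothetical extracted graph: entity -> list of (relation, target) edges.
graph = {
    "decision_042":       [("changed", "onboarding_process")],
    "onboarding_process": [("measured_by", "activation_rate")],
    "activation_rate":    [("feeds_into", "retention_metric")],
    "retention_metric":   [],
}

def find_path(start, goal):
    """Breadth-first search for a relationship path between two entities."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"-{relation}->", nxt]))
    return None  # no connection found

path = find_path("decision_042", "retention_metric")
```

This is exactly the kind of query that defeats similarity search: “decision_042” and “retention_metric” may share no vocabulary at all, yet the three-hop path between them is the answer.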

How they work together

Here’s what actually happens when our agent processes a message. The context assembly pipeline runs 11 modules in a specific sequence: memories come in from the vector store first, then knowledge documents, then graph traversal results. By the time the agent starts reasoning, it’s looking at a context window that blends semantic similarity with structural connections. The merging isn’t random; each result gets decorated with source metadata, so the agent knows where every piece of context came from and how it was retrieved. The difference in response quality is significant. Before we added graph retrieval, the agent was good at surfacing relevant information but struggled with questions about connections, causation, and history. After, it could trace the path from a decision to its consequences, or from a customer interaction to the policy that governs it.
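The merging step can be sketched in a few lines. This is not our actual pipeline code, just an illustration of the shape: each retrieved item carries metadata saying which system produced it, so the agent can weigh a high-similarity memory differently from a two-hop graph connection. The function name, field names, and sample data are hypothetical:

```python
def assemble_context(vector_hits, graph_hits):
    """Merge retrieval results, tagging each item with how it was retrieved."""
    merged = []
    for text, score in vector_hits:
        # Vector results carry a similarity score.
        merged.append({"text": text, "source": "vector", "score": score})
    for text, hops in graph_hits:
        # Graph results carry the traversal distance instead.
        merged.append({"text": text, "source": "graph", "hops": hops})
    return merged

context = assemble_context(
    [("Customer raised pricing concerns in Q3 call", 0.91)],
    [("decision_042 changed onboarding_process", 2)],
)
```

The point of the metadata is that the two systems return incommensurable relevance signals, so the merge keeps them labeled rather than forcing them onto one score.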

The cost of building both

I won’t pretend this is simple. Running two retrieval systems means two different storage backends, two different ingestion pipelines, and a merging strategy that doesn’t just concatenate results. The extraction pipeline needs a dedicated model to pull entities and relationships; the graph database needs its own infrastructure, monitoring, and backup strategy. We run health checks every 5 minutes and back up every 6 hours, with a Slack alert if anything goes wrong. It’s real operational overhead. But the alternative is building an AI system that can only do similarity search in a world where the valuable questions are about connections, and that felt like building half the brain and hoping nobody noticed.

If your use case is simple question-answering over a document collection, vector RAG alone will serve you well. But if you’re building agents that need to understand relationships, trace decisions, and reason about connected information over time, you probably need both. We tried the simpler path first, and it wasn’t enough.
