Knowledge Graph RAG: Hybrid Architecture for Complex Enterprise Reasoning
Pure vector search cannot answer multi-hop questions that require traversing relationships. Combining knowledge graphs with RAG pipelines unlocks a class of enterprise queries that neither approach solves alone.
Pure vector RAG retrieves semantically similar chunks but cannot traverse entity relationships — it cannot answer 'which of our suppliers also supply our top competitor' or 'which regulatory changes affect all the products that use component X.' Microsoft's GraphRAG (open-sourced in July 2024) demonstrated that combining community-level knowledge graph summaries with vector retrieval improves answer quality on complex multi-hop enterprise questions by 20-40% versus pure vector RAG. The hybrid architecture — vector search for semantic similarity, graph traversal for relationship queries, cross-encoder re-ranking to merge results — is the production standard for enterprise knowledge systems that must reason over interconnected entities.
Where Pure Vector RAG Breaks Down
Vector RAG is excellent at finding text that is semantically similar to a query. It excels at 'what does the handbook say about parental leave policy' or 'find all documents mentioning Q4 revenue guidance.' These are essentially dense retrieval problems — the answer lives in a region of the embedding space near the query vector, and top-k retrieval finds it reliably.
Vector RAG fails at questions that require multi-hop traversal of entity relationships. Consider: 'Which of our enterprise customers are also customers of the vendor we are evaluating for acquisition?' This requires knowing which entities are customers, which entities are vendors, which vendor is under acquisition review, and which customer entities appear in both the customer relationship and the vendor relationship. This is a graph problem, not a retrieval problem — the answer is not in any single chunk of text, it emerges from traversing relationships across the knowledge graph.
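The relationship framing can be made concrete with a toy in-memory example. This is an illustrative sketch rather than production code (the entity names and the `supplies` mapping are invented), but it shows why the answer is a set intersection over edges, not anything a top-k chunk retrieval could return:

```python
# Toy relationship store: who supplies whom. In a real system these edges
# live in a graph database; a dict is enough to show the shape of the query.
supplies = {
    "Acme Corp": {"Us", "Globex Ltd"},
    "Initech": {"Us"},
    "Umbrella Inc": {"Globex Ltd"},
}

def shared_suppliers(buyer_a: str, buyer_b: str, edges: dict[str, set[str]]) -> set[str]:
    """Suppliers that have a SUPPLIES edge to both buyers."""
    return {s for s, buyers in edges.items() if buyer_a in buyers and buyer_b in buyers}

print(shared_suppliers("Us", "Globex Ltd", supplies))  # {'Acme Corp'}
```

No single document chunk contains this answer; it only emerges from intersecting two edge sets, which is exactly the operation a graph query engine performs.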
Enterprise knowledge bases are full of this class of query. Regulatory impact analysis ('which of our products are affected by this new EU regulation'), supply chain risk assessment ('which product lines would be disrupted if supplier X failed'), competitive intelligence ('which of our technology patents are cited by our competitor's recent filings'), and compliance auditing ('which contracts require a data processing addendum and have not yet had one executed') all require relationship traversal that vector search cannot provide. The hybrid RAG architecture exists to serve this class of query without abandoning the semantic retrieval capabilities that vector RAG does well.
Building the Knowledge Graph Layer
Entity and Relationship Extraction
The first step in building a knowledge graph from enterprise documents is extracting entities and their relationships. An entity extraction pipeline uses an LLM (GPT-4o-mini or Claude 3 Haiku at reasonable cost) to identify named entities in each document chunk and classify them by type: Organisation, Person, Product, Contract, Regulation, Location, Date, etc. Relationships between entities are extracted in the same pass: 'Acme Corp SUPPLIES Component X to Globex Ltd,' 'Regulation EU-2025/1234 APPLIES_TO Product Category Software.'
The quality of this extraction directly determines the quality of downstream graph queries. Entity resolution — recognising that 'Acme,' 'Acme Corp,' and 'Acme Corporation Inc.' refer to the same entity — is the most labour-intensive step. In practice, we use a combination of fuzzy string matching, embedding-based similarity, and an LLM resolver for ambiguous cases. Getting entity resolution right for your domain's specific entities (your product names, your vendor names, your regulatory citation formats) requires domain-specific prompt tuning.
Neo4j Graph Schema Design
Neo4j AuraDB is the most operationally accessible managed graph database for enterprise deployments as of 2025. The schema design follows a property graph model: nodes represent entities with a type label and properties (name, description, source document, confidence score), and edges represent relationships with a type and optional properties (weight, date, source document, confidence score).
Schema design for enterprise knowledge graphs requires deliberate decisions about relationship granularity. A schema that has 50 relationship types is powerful but requires complex Cypher queries. A schema with 10 generalised relationship types (RELATED_TO, IS_A, HAS_PART, APPLIES_TO, PRODUCED_BY, etc.) is easier to query but loses semantic precision. The production approach is to start with a small set of high-value relationship types that directly map to the queries your users actually ask, and expand the schema incrementally as new query patterns emerge.
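One way to keep that discipline enforceable is a small schema registry that all graph writes must pass through, so extraction output cannot silently introduce new relationship types. A minimal sketch, assuming a hypothetical three-relationship starter schema:

```python
# Starter schema: each relationship type lists the (source, target) node labels
# it may connect. Triples that fail validation are quarantined for review
# instead of being written to the graph.
SCHEMA = {
    "SUPPLIES": ("Organisation", "Product"),
    "APPLIES_TO": ("Regulation", "Product"),
    "PARTY_TO": ("Organisation", "Contract"),
}

def validate_triple(src_label: str, rel_type: str, dst_label: str) -> bool:
    """Accept a triple only if its type and endpoint labels match the schema."""
    expected = SCHEMA.get(rel_type)
    return expected is not None and expected == (src_label, dst_label)
```

Expanding the schema then becomes an explicit, reviewed change to this registry rather than an accident of a drifting extraction prompt.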
Microsoft GraphRAG: Community Summaries
Microsoft open-sourced GraphRAG in July 2024, introducing a novel approach: rather than using the graph for direct traversal, GraphRAG builds community summaries — LLM-generated summaries of clusters of related entities detected via community detection algorithms (Leiden clustering on the entity graph). These community summaries capture thematic relationships across large document collections that neither individual chunks nor direct traversal surface efficiently.
The practical value is for 'global' queries — questions about themes, trends, or patterns that span many documents and entities. 'What are the main regulatory themes in our compliance document library' is answered poorly by both vector RAG (too diffuse) and graph traversal (no single traversal path). GraphRAG's community summaries create an intermediate representation that handles this query class well. The tradeoff is high indexing cost — community summary generation requires many LLM calls during the indexing phase — making it appropriate for knowledge bases with infrequent updates but high query volume.
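The indexing shape can be sketched without the real clustering dependency. GraphRAG itself uses Leiden clustering; the connected-components grouping below is a deliberately crude stand-in that only shows where community detection sits in the pipeline. Each returned group would then be summarised by an LLM call (omitted here), and it is those per-group summary calls that drive the high indexing cost:

```python
from collections import defaultdict, deque

def communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into connected components: a crude stand-in for Leiden."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for node in adj:
        if node in seen:
            continue
        group, queue = set(), deque([node])
        while queue:  # breadth-first sweep collects one component
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            group.add(current)
            queue.extend(adj[current] - seen)
        groups.append(group)
    return groups
```

Real community detection produces hierarchical, density-based clusters rather than components, but the pipeline position is the same: cluster the entity graph at index time, summarise each cluster once, and retrieve summaries at query time.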
Retrieval Strategy Comparison: When to Use Each Approach
| Query Type | Example | Best Retrieval Strategy | Why |
|---|---|---|---|
| Semantic similarity | What does our privacy policy say about data retention? | Vector RAG | Answer is in a specific chunk, semantic similarity finds it |
| Direct entity lookup | What are all contracts signed with vendor Acme Corp? | Graph traversal (Cypher) | Property lookup on a specific named entity node |
| Single-hop relationship | Which regulations apply to our Healthcare product line? | Graph traversal | One relationship hop: Product → GOVERNED_BY → Regulation |
| Multi-hop relationship | Which suppliers also supply our top competitor? | Graph traversal | Two hops meeting at the supplier node: Supplier → SUPPLIES → our products and Supplier → SUPPLIES → competitor's products, intersected |
| Thematic / global query | What are the main risk themes across our contract portfolio? | GraphRAG community summaries | Distributed across many entities — community summaries capture themes |
| Hybrid: entity + context | What does the contract with Acme say about liability and what are our recent disputes with them? | Graph traversal + vector RAG merged | Entity lookup for contract, vector search for dispute context |
Hybrid Retrieval: LangChain + Neo4j Graph Queries
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_cohere import CohereRerank
from langchain.retrievers import ContextualCompressionRetriever
from langchain.schema import Document
import logging

logger = logging.getLogger(__name__)

# ---- Configuration ----
NEO4J_URI = "neo4j+s://your-aura-db.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "your-password"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")


class HybridRAGPipeline:
    """
    Hybrid retrieval pipeline combining:
    1. Neo4j vector store for semantic similarity retrieval
    2. Neo4j graph traversal for relationship queries via Cypher
    3. Cohere reranker to merge and rank results from both sources
    """

    def __init__(self):
        # Vector store: documents embedded into Neo4j with vector index
        self.vector_store = Neo4jVector.from_existing_index(
            embedding=embeddings,
            url=NEO4J_URI,
            username=NEO4J_USERNAME,
            password=NEO4J_PASSWORD,
            index_name="document_embeddings",
            node_label="Document",
            text_node_property="content",
            embedding_node_property="embedding",
        )
        # Graph connection for Cypher queries
        self.graph = Neo4jGraph(
            url=NEO4J_URI,
            username=NEO4J_USERNAME,
            password=NEO4J_PASSWORD,
        )
        # Graph QA chain: LLM generates Cypher from natural language and
        # executes it against Neo4j. allow_dangerous_requests is a required
        # opt-in in recent langchain versions; pair it with a read-only
        # Neo4j user so generated Cypher can never mutate the graph.
        self.cypher_chain = GraphCypherQAChain.from_llm(
            llm=llm,
            graph=self.graph,
            verbose=False,
            return_intermediate_steps=True,
            cypher_prompt=self._get_cypher_generation_prompt(),
            allow_dangerous_requests=True,
        )
        # Reranker for merging results from both retrieval paths
        self.reranker = CohereRerank(model="rerank-english-v3.0", top_n=6)
        self.compression_retriever = ContextualCompressionRetriever(
            base_compressor=self.reranker,
            base_retriever=self.vector_store.as_retriever(search_kwargs={"k": 15}),
        )

    def _get_cypher_generation_prompt(self) -> ChatPromptTemplate:
        """Prompt that instructs the LLM to generate safe, read-only Cypher queries."""
        return ChatPromptTemplate.from_messages([
            ("system",
             """You are an expert Neo4j Cypher query generator for an enterprise knowledge graph.

Graph schema:
- Nodes: (Document {{id, title, content, source}}), (Entity {{id, name, type, description}}),
  (Regulation {{id, name, jurisdiction, effective_date}}), (Product {{id, name, category}}),
  (Organisation {{id, name, type}}), (Contract {{id, title, value, signed_date}})
- Relationships: (Entity)-[:MENTIONED_IN]->(Document), (Regulation)-[:APPLIES_TO]->(Product),
  (Organisation)-[:PARTY_TO]->(Contract), (Organisation)-[:SUPPLIES]->(Product),
  (Entity)-[:RELATED_TO]->(Entity)

Rules:
- Generate READ-ONLY Cypher (MATCH/RETURN only, never MERGE/CREATE/DELETE)
- Always LIMIT results to a maximum of 20 unless explicitly asked for more
- Return only relevant properties, not entire nodes
- If the question cannot be answered via graph traversal, return an empty Cypher query"""),
            ("human", "Schema: {schema}\n\nQuestion: {question}\n\nCypher query:"),
        ])

    def _run_graph_query(self, query: str) -> list[Document]:
        """Run the graph QA chain and convert results to Document objects."""
        try:
            result = self.cypher_chain.invoke({"query": query})
            answer = result.get("result", "")
            if answer and answer.strip() and "don't know" not in answer.lower():
                return [Document(
                    page_content=f"Graph Query Result: {answer}",
                    metadata={"source": "neo4j_graph", "query_type": "cypher"},
                )]
        except Exception as e:
            logger.warning(f"graph_query_failed | error={e}")
        return []

    def _run_vector_search(self, query: str) -> list[Document]:
        """Run vector similarity search with reranking."""
        try:
            return self.compression_retriever.invoke(query)
        except Exception as e:
            logger.warning(f"vector_search_failed | error={e}")
            return []

    def retrieve(self, query: str) -> list[Document]:
        """
        Hybrid retrieval: run both graph and vector search,
        deduplicate, and return merged results.
        """
        graph_docs = self._run_graph_query(query)
        vector_docs = self._run_vector_search(query)
        # Merge: graph results first (they are relationship-specific),
        # then vector results, deduplicating by content prefix
        seen_content = set()
        merged = []
        for doc in graph_docs + vector_docs:
            content_key = doc.page_content[:200]
            if content_key not in seen_content:
                seen_content.add(content_key)
                merged.append(doc)
        logger.info(
            f"hybrid_retrieval | query_len={len(query)} | graph_docs={len(graph_docs)} "
            f"| vector_docs={len(vector_docs)} | merged={len(merged)}"
        )
        return merged[:8]  # Cap final context at 8 documents

    def query(self, question: str) -> str:
        """Full hybrid RAG query: retrieve from both sources, generate answer."""
        docs = self.retrieve(question)
        context = "\n\n".join(
            f"[Source: {d.metadata.get('source', 'unknown')}]\n{d.page_content}"
            for d in docs
        )
        if not context.strip():
            return "I could not find relevant information in the knowledge base to answer this question."
        prompt = ChatPromptTemplate.from_template("""
You are an enterprise knowledge assistant with access to both a document corpus
and a structured knowledge graph. Use the provided context to answer the question.
Cite the source type (document or graph) for key facts.

Context:
{context}

Question: {question}

Answer:
""")
        chain = prompt | llm | StrOutputParser()
        return chain.invoke({"context": context, "question": question})


if __name__ == "__main__":
    pipeline = HybridRAGPipeline()
    # Pure vector query — answered by document chunks
    answer1 = pipeline.query("What are the payment terms in our standard MSA template?")
    print("Vector query answer:", answer1[:200])
    # Multi-hop graph query — answered by relationship traversal
    answer2 = pipeline.query("Which of our products are subject to GDPR and also have contracts expiring in 2025?")
    print("Graph query answer:", answer2[:200])
    # Hybrid query — requires both sources
    answer3 = pipeline.query("What does our contract with Meridian Analytics say about liability, and are they also our competitor's supplier?")
    print("Hybrid query answer:", answer3[:200])
```

Hybrid retrieval runs vector similarity search and LLM-generated Cypher graph queries, merging results before generation. The Cypher generation prompt is constrained to read-only operations — a critical security requirement for production graph queries.
LLM-generated Cypher queries must be constrained to read-only operations at the database connection level, not just in the prompt. A Neo4j user with WRITE privileges and a well-intentioned but flawed Cypher generation prompt can produce queries that delete or modify graph data. Create a dedicated Neo4j user with READ-ONLY access (GRANT MATCH, READ on the relevant nodes and relationships) for the GraphCypherQAChain connection. Never allow the LLM to execute queries against a connection with CREATE, MERGE, or DELETE privileges, regardless of how well the system prompt is worded.
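A sketch of what that looks like with Neo4j 5 role-based access control, run as an administrator against the `system` database. The user and role names here are placeholders, and Neo4j's built-in `reader` role can serve the same purpose:

```cypher
// Dedicated read-only principal for the Cypher QA chain. Placeholder names;
// store the real password in a secret manager, never in source control.
CREATE USER rag_reader SET PASSWORD 'replace-me' CHANGE NOT REQUIRED;
CREATE ROLE rag_readonly;
GRANT ACCESS ON DATABASE neo4j TO rag_readonly;
// MATCH = TRAVERSE + READ: the role can see the graph but never mutate it.
GRANT MATCH {*} ON GRAPH neo4j TO rag_readonly;
GRANT ROLE rag_readonly TO rag_reader;
```

The pipeline then connects as `rag_reader`, so even a jailbroken or malformed generated query fails at the database layer rather than relying on prompt wording.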
Implementation Roadmap for Hybrid RAG
- Start with the query inventory: before building any infrastructure, collect 50-100 real questions from your target users and classify them by type (semantic lookup, direct entity query, relationship traversal, global/thematic). This classification determines how much of your retrieval investment should go toward the graph layer versus the vector layer.
- Build the vector layer first: get a working vector RAG pipeline handling the semantic lookup queries before adding graph complexity. This validates the document ingestion pipeline, embedding model choice, and generation quality before introducing the additional complexity of graph construction and query routing.
- Invest in entity extraction quality before graph ingestion: the graph is only as useful as the quality of its entity and relationship extraction. Benchmark your extraction pipeline on a 100-document sample, manually review the extracted entities and relationships, and iterate on extraction prompts before processing the full corpus. Bad extractions compound — a 10% extraction error rate means 10% of your graph traversals return wrong answers.
- Use Neo4j AuraDB for managed hosting: the operational simplicity of a managed graph database is worth the cost for most enterprise deployments. Self-hosting Neo4j Enterprise is the right choice only if data sovereignty requirements prohibit external hosting or if graph query volume exceeds AuraDB's cost-effective tier.
- Implement query classification to route efficiently: not every query needs both retrieval paths. A lightweight classifier (keyword rules or a small LLM call) that determines whether a query requires graph traversal, pure vector search, or both reduces latency and cost for the majority of queries that are straightforward semantic lookups.
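The keyword-rule variant of that router fits in a few lines. The patterns below are invented placeholders that a real deployment would tune against its own query inventory or replace with a small LLM call:

```python
import re

# Illustrative signals only; derive these from your actual query inventory.
GRAPH_PATTERNS = [
    r"\bwhich .* (also|both)\b",            # set-intersection questions
    r"\b(supplie[rs]|vendor|counterpart)",  # relationship-heavy entity words
    r"\b(affected by|applies? to|governed by)\b",
]
SEMANTIC_SIGNALS = ("what does", "say about", "explain", "summarise", "summarize")

def route_query(query: str) -> str:
    """Return 'graph', 'vector', or 'hybrid' for downstream retrieval routing."""
    q = query.lower()
    graph_signal = any(re.search(p, q) for p in GRAPH_PATTERNS)
    semantic_signal = any(w in q for w in SEMANTIC_SIGNALS)
    if graph_signal and semantic_signal:
        return "hybrid"
    return "graph" if graph_signal else "vector"
```

Even a rough router like this lets the common semantic lookups skip the Cypher generation call entirely, which is where most of the latency and cost savings come from.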
How Inductivee Builds Knowledge Graphs for Enterprise Clients
The knowledge graph is part of what Inductivee calls the Liquify phase — transforming frozen enterprise knowledge (documents, databases, legacy systems) into a queryable, AI-accessible layer. For clients with complex entity relationship requirements — financial services firms with interconnected counterparty, product, and regulatory entity graphs; manufacturing companies with multi-tier supply chain graphs; legal teams with entity relationship graphs across thousands of contracts — the knowledge graph layer is built before the agentic system that queries it.
The practical lesson from building these graphs across multiple industries is that entity resolution is always underestimated. Every enterprise has years of data accumulated with inconsistent naming conventions, merged entities, deprecated entity names, and informal references that technically refer to formal entities. The investment in a robust entity resolver — built on fuzzy matching, embedding similarity, and domain-specific rules — determines the quality of every downstream query. We now allocate a minimum of 30% of the Liquify phase to entity resolution, regardless of how clean the client believes their data to be. It has never been as clean as they believe.
Frequently Asked Questions
What is GraphRAG and how does it differ from standard RAG?
When should you use a knowledge graph instead of vector search for RAG?
What database is best for knowledge graphs in enterprise RAG systems?
How do you build a knowledge graph from enterprise documents?
What is the performance overhead of hybrid RAG versus pure vector RAG?
Written By
Inductivee Team
Agentic AI Engineering Team
The Inductivee engineering team — a remote-first group of multi-agent orchestration specialists, RAG pipeline architects, and data liquidity engineers who have shipped 40+ agentic deployments across 25+ enterprises since 2012. Our writing is grounded in what we actually build, break, and operate in production.
Inductivee is a remote-first agentic AI engineering firm with 40+ production deployments across 25+ enterprises since 2012. Our engineering content is written by active practitioners and technically reviewed before publication. Compliance: SOC2 Type II, HIPAA, GDPR, ISO 27001.
Engineer This With Inductivee
The engineering patterns in this article are what our team builds into production every day. Explore the related service to see how we deliver this capability at enterprise scale.
Cognitive Web Portals
Enterprise RAG portals and natural-language gateways — we turn your enterprise data into an interactive, self-service AI assistant grounded in your own knowledge.
Cognitive Data Platforms
Cognitive data platforms and generative BI engineering — we transform raw enterprise data into a reasoning knowledge base for LLMs and autonomous agents. Built on vector databases, semantic ETL, and conversational analytics.
Related Articles
LangChain vs LlamaIndex: A Production Engineering Comparison for RAG
RAG Pipeline Architecture for the Enterprise: Five Layers Beyond the Basic Chatbot
Semantic Search for Enterprise Knowledge Bases: Engineering Beyond Full-Text
Ready to Build This Into Your Enterprise?
Inductivee engineers agentic systems, RAG pipelines, and enterprise data liquidity solutions. Let's scope your project.
Start a Project