RAG & Retrieval

Knowledge Graph RAG: Hybrid Architecture for Complex Enterprise Reasoning

Pure vector search cannot answer multi-hop questions that require traversing relationships. Combining knowledge graphs with RAG pipelines unlocks a class of enterprise queries that neither approach solves alone.

Inductivee Team · AI Engineering · August 14, 2025 (updated April 15, 2026) · 14 min read
TL;DR

Pure vector RAG retrieves semantically similar chunks but cannot traverse entity relationships — it cannot answer 'which of our suppliers also supply our top competitor' or 'which regulatory changes affect all the products that use component X.' Microsoft's GraphRAG (open-sourced in July 2024) demonstrated that combining community-level knowledge graph summaries with vector retrieval improves answer quality on complex multi-hop enterprise questions by 20-40% versus pure vector RAG. The hybrid architecture — vector search for semantic similarity, graph traversal for relationship queries, cross-encoder re-ranking to merge results — is the production standard for enterprise knowledge systems that must reason over interconnected entities.

Where Pure Vector RAG Breaks Down

Vector RAG is excellent at finding text that is semantically similar to a query. It excels at 'what does the handbook say about parental leave policy' or 'find all documents mentioning Q4 revenue guidance.' These are essentially dense retrieval problems — the answer lives in a region of the embedding space near the query vector, and top-k retrieval finds it reliably.

Vector RAG fails at questions that require multi-hop traversal of entity relationships. Consider: 'Which of our enterprise customers are also customers of the vendor we are evaluating for acquisition?' This requires knowing which entities are customers, which entities are vendors, which vendor is under acquisition review, and which customer entities appear in both the customer relationship and the vendor relationship. This is a graph problem, not a retrieval problem — the answer is not in any single chunk of text, it emerges from traversing relationships across the knowledge graph.
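The distinction can be made concrete with a toy example. Assuming the graph's supplier relationships have already been extracted into plain Python sets (the company names are hypothetical), the competitor-overlap question reduces to a set intersection over relationship edges, which is something no similarity search over text chunks can compute:

```python
# Toy relationship data (hypothetical names) illustrating why supplier
# overlap is a graph problem, not a retrieval problem.
our_suppliers = {"Acme Corp", "Globex Ltd", "Initech"}
competitor_suppliers = {"Globex Ltd", "Umbrella Co", "Initech"}

# The answer emerges from combining two relationship sets; it does not
# live in any single chunk of text that an embedding could match.
shared = our_suppliers & competitor_suppliers
print(sorted(shared))  # ['Globex Ltd', 'Initech']
```

A graph database performs the same operation as a traversal over SUPPLIES edges; the point is that the answer is computed from relationships, not retrieved from a document.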

Enterprise knowledge bases are full of this class of query. Regulatory impact analysis ('which of our products are affected by this new EU regulation'), supply chain risk assessment ('which product lines would be disrupted if supplier X failed'), competitive intelligence ('which of our technology patents are cited by our competitor's recent filings'), and compliance auditing ('which contracts require a data processing addendum and have not yet had one executed') all require relationship traversal that vector search cannot provide. The hybrid RAG architecture exists to serve this class of query without abandoning the semantic retrieval capabilities that vector RAG does well.

Building the Knowledge Graph Layer

Entity and Relationship Extraction

The first step in building a knowledge graph from enterprise documents is extracting entities and their relationships. An entity extraction pipeline uses an LLM (GPT-4o-mini or Claude 3 Haiku at reasonable cost) to identify named entities in each document chunk and classify them by type: Organisation, Person, Product, Contract, Regulation, Location, Date, etc. Relationships between entities are extracted in the same pass: 'Acme Corp SUPPLIES Component X to Globex Ltd,' 'Regulation EU-2025/1234 APPLIES_TO Product Category Software.'

The quality of this extraction directly determines the quality of downstream graph queries. Entity resolution — recognising that 'Acme,' 'Acme Corp,' and 'Acme Corporation Inc.' refer to the same entity — is the most labour-intensive step. In practice, we use a combination of fuzzy string matching, embedding-based similarity, and an LLM resolver for ambiguous cases. Getting entity resolution right for your domain's specific entities (your product names, your vendor names, your regulatory citation formats) requires domain-specific prompt tuning.

Neo4j Graph Schema Design

Neo4j AuraDB is the most operationally accessible managed graph database for enterprise deployments as of 2025. The schema design follows a property graph model: nodes represent entities with a type label and properties (name, description, source document, confidence score), and edges represent relationships with a type and optional properties (weight, date, source document, confidence score).

Schema design for enterprise knowledge graphs requires deliberate decisions about relationship granularity. A schema that has 50 relationship types is powerful but requires complex Cypher queries. A schema with 10 generalised relationship types (RELATED_TO, IS_A, HAS_PART, APPLIES_TO, PRODUCED_BY, etc.) is easier to query but loses semantic precision. The production approach is to start with a small set of high-value relationship types that directly map to the queries your users actually ask, and expand the schema incrementally as new query patterns emerge.
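One lightweight way to enforce that incremental discipline is to validate extracted triples against an explicit allow-list before graph ingestion, so relationship types the schema does not yet define are rejected rather than silently ingested. A minimal sketch, with illustrative relationship and entity types:

```python
# Illustrative starter schema: a small set of high-value relationship
# types, each constrained to a (subject type, object type) signature.
SCHEMA = {
    "SUPPLIES":   ("Organisation", "Product"),
    "APPLIES_TO": ("Regulation", "Product"),
    "PARTY_TO":   ("Organisation", "Contract"),
}

def validate_triple(subj_type: str, rel: str, obj_type: str) -> bool:
    """Accept a triple only if its relationship type is in the schema
    and the endpoint entity types match the declared signature."""
    return SCHEMA.get(rel) == (subj_type, obj_type)

print(validate_triple("Organisation", "SUPPLIES", "Product"))       # True
print(validate_triple("Organisation", "ACQUIRED", "Organisation"))  # False: expand the schema first
```

Rejected triples become the evidence for when a new relationship type has earned its place in the schema.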

Microsoft GraphRAG: Community Summaries

Microsoft open-sourced GraphRAG in July 2024, introducing a novel approach: rather than using the graph for direct traversal, GraphRAG builds community summaries — LLM-generated summaries of clusters of related entities detected via community detection algorithms (Leiden clustering on the entity graph). These community summaries capture thematic relationships across large document collections that neither individual chunks nor direct traversal surface efficiently.

The practical value is for 'global' queries — questions about themes, trends, or patterns that span many documents and entities. 'What are the main regulatory themes in our compliance document library' is answered poorly by both vector RAG (too diffuse) and graph traversal (no single traversal path). GraphRAG's community summaries create an intermediate representation that handles this query class well. The tradeoff is high indexing cost — community summary generation requires many LLM calls during the indexing phase — making it appropriate for knowledge bases with infrequent updates but high query volume.
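The indexing flow can be sketched end to end without the graph machinery. Below, connected components over an entity co-occurrence graph stand in for Leiden clustering (Leiden itself needs a dedicated library such as graspologic or igraph), the summarisation call is left as a comment, and the entity names are hypothetical:

```python
from collections import defaultdict

def communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into connected components of the relationship
    graph -- a crude stand-in for Leiden community detection."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for node in adjacency:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        result.append(component)
    return result

edges = [("Acme", "Component X"), ("Component X", "Globex"),
         ("EU-2025/1234", "Software")]
for community in communities(edges):
    # In GraphRAG, each community would be summarised by an LLM call
    # here, and the summary indexed for retrieval at query time.
    print(sorted(community))
```

The per-community LLM call inside the loop is exactly where the high indexing cost comes from: it scales with the number of communities, not with query volume.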

Retrieval Strategy Comparison: When to Use Each Approach

Query Type | Example | Best Retrieval Strategy | Why
Semantic similarity | What does our privacy policy say about data retention? | Vector RAG | Answer lives in a specific chunk; semantic similarity finds it
Direct entity lookup | What are all contracts signed with vendor Acme Corp? | Graph traversal (Cypher) | Property lookup on a specific named entity node
Single-hop relationship | Which regulations apply to our Healthcare product line? | Graph traversal | One relationship hop: (Regulation)-[:APPLIES_TO]->(Product)
Multi-hop relationship | Which suppliers also supply our top competitor? | Graph traversal | Two hops: us <-[:SUPPLIES]- Supplier -[:SUPPLIES]-> Competitor
Thematic / global query | What are the main risk themes across our contract portfolio? | GraphRAG community summaries | Answer is distributed across many entities; community summaries capture the themes
Hybrid: entity + context | What does the contract with Acme say about liability, and what are our recent disputes with them? | Graph traversal + vector RAG merged | Entity lookup for the contract, vector search for dispute context

Hybrid Retrieval: LangChain + Neo4j Graph Queries

python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_cohere import CohereRerank
from langchain.retrievers import ContextualCompressionRetriever
from langchain_core.documents import Document
import logging

logger = logging.getLogger(__name__)

# ---- Configuration ----
NEO4J_URI = "neo4j+s://your-aura-db.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "your-password"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")


class HybridRAGPipeline:
    """
    Hybrid retrieval pipeline combining:
    1. Neo4j vector store for semantic similarity retrieval
    2. Neo4j graph traversal for relationship queries via Cypher
    3. Cohere reranker to merge and rank results from both sources
    """

    def __init__(self):
        # Vector store: documents embedded into Neo4j with vector index
        self.vector_store = Neo4jVector.from_existing_index(
            embedding=embeddings,
            url=NEO4J_URI,
            username=NEO4J_USERNAME,
            password=NEO4J_PASSWORD,
            index_name="document_embeddings",
            node_label="Document",
            text_node_property="content",
            embedding_node_property="embedding"
        )

        # Graph connection for Cypher queries
        self.graph = Neo4jGraph(
            url=NEO4J_URI,
            username=NEO4J_USERNAME,
            password=NEO4J_PASSWORD
        )

        # Graph QA chain: LLM generates Cypher from natural language, executes against Neo4j.
        # allow_dangerous_requests acknowledges that LLM-generated Cypher will run against
        # the database; pair it with a read-only Neo4j user, never prompt rules alone.
        self.cypher_chain = GraphCypherQAChain.from_llm(
            llm=llm,
            graph=self.graph,
            verbose=False,
            return_intermediate_steps=True,
            cypher_prompt=self._get_cypher_generation_prompt(),
            allow_dangerous_requests=True
        )

        # Reranker for merging results from both retrieval paths
        self.reranker = CohereRerank(model="rerank-english-v3.0", top_n=6)
        self.compression_retriever = ContextualCompressionRetriever(
            base_compressor=self.reranker,
            base_retriever=self.vector_store.as_retriever(search_kwargs={"k": 15})
        )

    def _get_cypher_generation_prompt(self) -> ChatPromptTemplate:
        """Prompt that instructs the LLM to generate safe, read-only Cypher queries."""
        return ChatPromptTemplate.from_messages([
            ("system",
             """You are an expert Neo4j Cypher query generator for an enterprise knowledge graph.

Graph schema:
- Nodes: (Document {{id, title, content, source}}), (Entity {{id, name, type, description}}),
  (Regulation {{id, name, jurisdiction, effective_date}}), (Product {{id, name, category}}),
  (Organisation {{id, name, type}}), (Contract {{id, title, value, signed_date}})
- Relationships: (Entity)-[:MENTIONED_IN]->(Document), (Regulation)-[:APPLIES_TO]->(Product),
  (Organisation)-[:PARTY_TO]->(Contract), (Organisation)-[:SUPPLIES]->(Product),
  (Entity)-[:RELATED_TO]->(Entity)

Rules:
- Generate READ-ONLY Cypher (MATCH/RETURN only, never MERGE/CREATE/DELETE)
- Always LIMIT results to maximum 20 unless explicitly asked for more
- Return only relevant properties, not entire nodes
- If the question cannot be answered via graph traversal, return an empty Cypher query"""),
            ("human", "Schema: {schema}\n\nQuestion: {question}\n\nCypher query:")
        ])

    def _run_graph_query(self, query: str) -> list[Document]:
        """Run the graph QA chain and convert results to Document objects."""
        try:
            result = self.cypher_chain.invoke({"query": query})
            answer = result.get("result", "")
            if answer and answer.strip() and "don't know" not in answer.lower():
                return [Document(
                    page_content=f"Graph Query Result: {answer}",
                    metadata={"source": "neo4j_graph", "query_type": "cypher"}
                )]
        except Exception as e:
            logger.warning(f"graph_query_failed | error={str(e)}")
        return []

    def _run_vector_search(self, query: str) -> list[Document]:
        """Run vector similarity search; top-k is configured on the compression retriever."""
        try:
            return self.compression_retriever.invoke(query)
        except Exception as e:
            logger.warning(f"vector_search_failed | error={str(e)}")
            return []

    def retrieve(self, query: str) -> list[Document]:
        """
        Hybrid retrieval: run both graph and vector search,
        deduplicate, and return merged results.
        """
        graph_docs = self._run_graph_query(query)
        vector_docs = self._run_vector_search(query)

        # Merge: graph results first (they are relationship-specific),
        # then vector results, deduplicating by content hash
        seen_content = set()
        merged = []
        for doc in graph_docs + vector_docs:
            content_hash = hash(doc.page_content[:200])
            if content_hash not in seen_content:
                seen_content.add(content_hash)
                merged.append(doc)

        logger.info(f"hybrid_retrieval | query_len={len(query)} | graph_docs={len(graph_docs)} | vector_docs={len(vector_docs)} | merged={len(merged)}")
        return merged[:8]  # Cap final context at 8 documents

    def query(self, question: str) -> str:
        """Full hybrid RAG query: retrieve from both sources, generate answer."""
        docs = self.retrieve(question)

        context = "\n\n".join(
            f"[Source: {d.metadata.get('source', 'unknown')}]\n{d.page_content}"
            for d in docs
        )

        if not context.strip():
            return "I could not find relevant information in the knowledge base to answer this question."

        prompt = ChatPromptTemplate.from_template("""
You are an enterprise knowledge assistant with access to both a document corpus
and a structured knowledge graph. Use the provided context to answer the question.
Cite the source type (document or graph) for key facts.

Context:
{context}

Question: {question}

Answer:
""")

        chain = prompt | llm | StrOutputParser()
        return chain.invoke({"context": context, "question": question})


if __name__ == "__main__":
    pipeline = HybridRAGPipeline()

    # Pure vector query — answered by document chunks
    answer1 = pipeline.query("What are the payment terms in our standard MSA template?")
    print("Vector query answer:", answer1[:200])

    # Multi-hop graph query — answered by relationship traversal
    answer2 = pipeline.query("Which of our products are subject to GDPR and also have contracts expiring in 2025?")
    print("Graph query answer:", answer2[:200])

    # Hybrid query — requires both sources
    answer3 = pipeline.query("What does our contract with Meridian Analytics say about liability, and are they also our competitor's supplier?")
    print("Hybrid query answer:", answer3[:200])

Hybrid retrieval runs both vector similarity search and LLM-generated Cypher graph queries, merging the results before generation. The Cypher generation prompt is constrained to read-only operations — a critical security requirement for production graph queries.

Warning

LLM-generated Cypher queries must be constrained to read-only operations at the database connection level, not just in the prompt. A Neo4j user with WRITE privileges and a well-intentioned but flawed Cypher generation prompt can produce queries that delete or modify graph data. Create a dedicated Neo4j user with READ-ONLY access (GRANT MATCH, READ on the relevant nodes and relationships) for the GraphCypherQAChain connection. Never allow the LLM to execute queries against a connection with CREATE, MERGE, or DELETE privileges, regardless of how well the system prompt is worded.
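Database-level privileges remain the primary control, but a deny-list check on the generated Cypher string adds a cheap second layer that fails fast before the query ever reaches the database. A minimal sketch; the keyword list is illustrative, not exhaustive, and this is defence in depth, not a substitute for a read-only user:

```python
import re

# Illustrative deny-list of Cypher write clauses. A read-only Neo4j
# user is the real control; this check only rejects obvious writes
# before a network round trip.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b",
    re.IGNORECASE,
)

def assert_read_only(cypher: str) -> str:
    """Raise if the generated Cypher contains a write clause."""
    if WRITE_CLAUSES.search(cypher):
        raise ValueError(f"write clause in generated Cypher: {cypher!r}")
    return cypher

assert_read_only("MATCH (r:Regulation)-[:APPLIES_TO]->(p:Product) RETURN p.name LIMIT 20")
# assert_read_only("MERGE (n:Entity {name: 'x'})")  # raises ValueError
```

A guard like this would sit between Cypher generation and execution, so a rejected query surfaces as a logged failure rather than a mutation attempt.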

Implementation Roadmap for Hybrid RAG

  • Start with the query inventory: before building any infrastructure, collect 50-100 real questions from your target users and classify them by type (semantic lookup, direct entity query, relationship traversal, global/thematic). This classification determines how much of your retrieval investment should go toward the graph layer versus the vector layer.
  • Build the vector layer first: get a working vector RAG pipeline handling the semantic lookup queries before adding graph complexity. This validates the document ingestion pipeline, embedding model choice, and generation quality before introducing the additional complexity of graph construction and query routing.
  • Invest in entity extraction quality before graph ingestion: the graph is only as useful as the quality of its entity and relationship extraction. Benchmark your extraction pipeline on a 100-document sample, manually review the extracted entities and relationships, and iterate on extraction prompts before processing the full corpus. Bad extractions compound — a 10% extraction error rate means 10% of your graph traversals return wrong answers.
  • Use Neo4j AuraDB for managed hosting: the operational simplicity of a managed graph database is worth the cost for most enterprise deployments. Self-hosting Neo4j Enterprise is the right choice only if data sovereignty requirements prohibit external hosting or if graph query volume exceeds AuraDB's cost-effective tier.
  • Implement query classification to route efficiently: not every query needs both retrieval paths. A lightweight classifier (keyword rules or a small LLM call) that determines whether a query requires graph traversal, pure vector search, or both reduces latency and cost for the majority of queries that are straightforward semantic lookups.
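The routing classifier in the final bullet can start as keyword rules before graduating to a small LLM call. A sketch under that assumption; the signal lists are illustrative and would be tuned against the query inventory collected in the first step:

```python
def classify_query(query: str) -> str:
    """Route a query to 'graph', 'vector', or 'hybrid' retrieval.

    Keyword rules are a cheap first pass; queries showing both kinds
    of signal fall through to 'hybrid' so correctness is never
    sacrificed for cost.
    """
    q = query.lower()
    # Illustrative signal phrases, not a production rule set.
    graph_signals = ["which of our", "also", "all contracts", "connected to",
                     "supplied by", "applies to", "affected by"]
    semantic_signals = ["what does", "say about", "explain", "summarise"]
    has_graph = any(s in q for s in graph_signals)
    has_semantic = any(s in q for s in semantic_signals)
    if has_graph and has_semantic:
        return "hybrid"
    if has_graph:
        return "graph"
    return "vector"

print(classify_query("What does our privacy policy say about data retention?"))  # vector
print(classify_query("Which of our suppliers also supply our top competitor?"))  # graph
```

Misrouted queries are cheap to detect in logs (graph path returning nothing, vector path answering a relationship question badly) and feed directly back into the rule set.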

How Inductivee Builds Knowledge Graphs for Enterprise Clients

The knowledge graph is part of what Inductivee calls the Liquify phase — transforming frozen enterprise knowledge (documents, databases, legacy systems) into a queryable, AI-accessible layer. For clients with complex entity relationship requirements — financial services firms with interconnected counterparty, product, and regulatory entity graphs; manufacturing companies with multi-tier supply chain graphs; legal teams with entity relationship graphs across thousands of contracts — the knowledge graph layer is built before the agentic system that queries it.

The practical lesson from building these graphs across multiple industries is that entity resolution is always underestimated. Every enterprise has years of data accumulated with inconsistent naming conventions, merged entities, deprecated entity names, and informal references that technically refer to formal entities. The investment in a robust entity resolver — built on fuzzy matching, embedding similarity, and domain-specific rules — determines the quality of every downstream query. We now allocate a minimum of 30% of the Liquify phase to entity resolution, regardless of how clean the client believes their data to be. It has never been as clean as they believe.

Frequently Asked Questions

What is GraphRAG and how does it differ from standard RAG?

GraphRAG is a retrieval approach that builds a knowledge graph from the document corpus and uses graph traversal — in addition to or instead of vector similarity search — to answer queries. Standard RAG retrieves semantically similar text chunks; GraphRAG can traverse entity relationships to answer multi-hop questions like 'which of our products are affected by this regulation and also have contracts expiring this year.' Microsoft's GraphRAG (open-sourced in July 2024) adds community summarisation — LLM-generated summaries of entity clusters — to handle global thematic queries that neither vector search nor direct traversal handle well.

When should you use a knowledge graph instead of vector search for RAG?

Use knowledge graph retrieval when queries require relationship traversal that cannot be answered by finding semantically similar text chunks. The clearest signals are: multi-hop relationship questions (entity A connects to entity B which connects to entity C), queries about entity properties that are not stated in any single document (but emerge from relationships across many documents), and queries that require set operations over entity collections (all contracts with vendor X that also mention regulation Y). Vector search remains the right choice for semantic similarity queries where the answer lives in a specific document chunk.

What database is best for knowledge graphs in enterprise RAG systems?

Neo4j is the most widely deployed graph database for enterprise AI applications as of 2025, with the best ecosystem support for LangChain and LlamaIndex integrations, and Neo4j AuraDB providing a managed cloud offering that eliminates operational overhead. For enterprises with data sovereignty requirements, Neo4j Enterprise self-hosted is the alternative. Apache TinkerPop-compatible databases (Amazon Neptune, JanusGraph) are viable alternatives for teams already invested in AWS infrastructure. The LangChain Neo4jGraph and GraphCypherQAChain integrations are the most mature path to LLM-generated Cypher queries as of mid-2025.

How do you build a knowledge graph from enterprise documents?

Building a knowledge graph from enterprise documents requires three steps: entity and relationship extraction (using an LLM to identify named entities and their relationships in document chunks), entity resolution (disambiguating and merging references to the same entity across documents — the most labour-intensive step), and graph ingestion (loading the resolved entity-relationship triples into Neo4j or another graph database with a well-designed schema). The quality of the resulting graph is primarily determined by entity resolution quality — a 10% extraction error rate means 10% of graph traversals return wrong answers. Invest in the resolution step before scaling ingestion.
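The final ingestion step can be sketched as parameterised Cypher generation from resolved triples. This is a sketch, not a complete loader: it assumes triples of the form (subject, subject type, relationship, object, object type), uses MERGE so re-ingestion is idempotent, and would be executed through the official neo4j Python driver in practice. Because labels and relationship types cannot be Cypher parameters, they are interpolated into the statement, which is exactly why they must first pass a schema allow-list check:

```python
def triple_to_cypher(subj: str, subj_type: str, rel: str,
                     obj: str, obj_type: str) -> tuple[str, dict]:
    """Build an idempotent, parameterised MERGE statement for one
    resolved triple. Labels and relationship types are interpolated
    (Cypher cannot parameterise them), so they must come from a
    validated schema allow-list, never raw extraction output."""
    statement = (
        f"MERGE (a:{subj_type} {{name: $subj}}) "
        f"MERGE (b:{obj_type} {{name: $obj}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return statement, {"subj": subj, "obj": obj}

stmt, params = triple_to_cypher("Acme Corp", "Organisation",
                                "SUPPLIES", "Component X", "Product")
print(stmt)
print(params)
```

Entity names travel as query parameters, which keeps ingestion safe against quoting problems in extracted text.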

What is the performance overhead of hybrid RAG versus pure vector RAG?

Hybrid retrieval adds the latency of an additional graph query (typically 50-200ms for simple Cypher traversals on Neo4j AuraDB) plus an LLM call to generate the Cypher query (200-500ms with GPT-4o-mini). For a vector RAG baseline of 500-800ms end-to-end, hybrid retrieval adds 250-700ms — approximately 1-1.5 seconds total. For the class of multi-hop relationship queries that only graph traversal can answer, this latency premium is the cost of correctness. Query classification that routes simple semantic queries to pure vector search (avoiding the graph overhead) is the standard production optimisation for systems handling mixed query types.

Written By

Inductivee Team — AI Engineering at Inductivee


The Inductivee engineering team — a remote-first group of multi-agent orchestration specialists, RAG pipeline architects, and data liquidity engineers who have shipped 40+ agentic deployments across 25+ enterprises since 2012. Our writing is grounded in what we actually build, break, and operate in production.


Inductivee is a remote-first agentic AI engineering firm with 40+ production deployments across 25+ enterprises since 2012. Our engineering content is written by active practitioners and technically reviewed before publication. Compliance: SOC2 Type II, HIPAA, GDPR, ISO 27001.
