Generative BI: Connecting LLMs to Your Enterprise Data Warehouse
Natural language queries against a data warehouse sound simple. Making them accurate, safe, and performant in production requires a semantic layer, schema understanding, query validation, and careful access control architecture.
Naive Text-to-SQL fails in production for four reasons: schema hallucination, SQL injection exposure, no access control, and context window overload from large schemas. The production architecture requires a semantic layer (business glossary and metric definitions), schema-as-RAG for dynamic context injection, AST-based SQL validation before execution, and row-level security enforcement at the query layer. Teams that skip the semantic layer spend months debugging incorrect but plausible-looking SQL.
Why Naive Text-to-SQL Fails at Enterprise Scale
The demo is always convincing: type 'what was revenue last quarter by region?' and watch the LLM generate correct SQL and return a formatted table. Then you hand it to a business analyst who asks 'what is our NRR trend for enterprise accounts over the last six quarters?' and the model confidently generates SQL that queries the wrong table, uses a column that was renamed two years ago, and calculates NRR as a simple retention rate rather than net revenue retention as your finance team defines it.
Enterprise data warehouses are not clean schemas with self-explanatory column names. They are 10-20 year accumulations of naming conventions from three acquisitions, two BI platform migrations, and a dozen data engineers with different philosophies. The same concept — 'active customer' — may be defined differently in the marketing, finance, and product tables. Without a semantic layer that tells the LLM exactly how your organisation defines its key metrics, it will make plausible-sounding but incorrect assumptions.
The second failure mode is more dangerous: SQL injection. An LLM generating SQL from user natural language input is a classic injection vector if the generated SQL is executed without validation. Even well-intentioned queries can inadvertently produce DELETE or DROP statements, expose cross-tenant data, or run Cartesian joins that bring down query performance for all users. The production architecture must treat LLM-generated SQL as untrusted input.
The Four Components of a Production Generative BI Architecture
Semantic Layer: The Business Glossary
The semantic layer is a curated mapping from business concepts to SQL constructs. It defines: metric definitions ('NRR = (starting ARR + expansion - churn - contraction) / starting ARR, using the billing.mrr_snapshots table'), table relationships, column name translations ('cust_acq_dt = customer acquisition date'), and filter defaults ('active customer = customers where status = active AND churned_at IS NULL').
For teams on dbt, the dbt semantic layer (MetricFlow) is the right foundation. For BigQuery, Looker's LookML semantic model serves the same purpose. The LLM should query the semantic layer first to retrieve relevant metric definitions before attempting to generate SQL. This is not optional — it is the difference between a demo and a production system.
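To make the idea concrete, here is a minimal sketch of what one semantic-layer record might look like before it is embedded and indexed for retrieval. The field names (`name`, `definition`, `sql`, `tables`) are illustrative, not a MetricFlow or LookML schema; adapt them to your own glossary format.

```python
# One illustrative semantic-layer record: a formal metric definition
# plus the canonical SQL, ready to embed and index for retrieval.
NRR_METRIC = {
    "name": "Net Revenue Retention (NRR)",
    "definition": (
        "(starting ARR + expansion - churn - contraction) / starting ARR, "
        "computed per quarter from billing.mrr_snapshots"
    ),
    "sql": (
        "SELECT quarter, "
        "SAFE_DIVIDE(starting_arr + expansion - churn - contraction, starting_arr) AS nrr "
        "FROM billing.mrr_snapshots"
    ),
    "tables": ["billing.mrr_snapshots"],
    "tenant_id": "acme-corp",
}

def to_embedding_text(metric: dict) -> str:
    """Flatten a metric record into the text that gets embedded."""
    return f"METRIC: {metric['name']}\nDEFINITION: {metric['definition']}"
```

The point of the structured record is that the LLM receives your organisation's exact definition verbatim rather than inventing its own.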
Schema-as-RAG: Dynamic Context Injection
Enterprise warehouse schemas are too large to fit in a single LLM context window. A 500-table BigQuery project produces tens of thousands of tokens of schema documentation. The solution is to treat schema documentation as a retrieval corpus: embed table and column descriptions, and retrieve only the relevant schema context for each query.
This requires good column-level documentation in your warehouse (BigQuery column descriptions, dbt model/column descriptions, or a separate schema registry). If your warehouse lacks documentation, the first project milestone is adding it — the generative BI system is only as good as the documentation it retrieves.
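A sketch of how column documentation becomes a retrieval corpus: one embeddable text per documented column, with a filterable payload alongside it. The `ColumnDoc` shape is illustrative; the input could come from BigQuery column descriptions, dbt manifests, or a schema registry.

```python
# Turn warehouse column documentation into a retrieval corpus:
# one embeddable text + one filterable payload per documented column.
from dataclasses import dataclass

@dataclass
class ColumnDoc:
    schema_name: str
    table: str
    column: str
    type: str
    description: str

def to_corpus_entries(docs: list[ColumnDoc]) -> list[dict]:
    """Build the text to embed and the payload to filter on at query time."""
    entries = []
    for d in docs:
        entries.append({
            "text": f"{d.schema_name}.{d.table}.{d.column} ({d.type}): {d.description}",
            "payload": {
                "schema_name": d.schema_name,
                "table": d.table,
                "column": d.column,
                "type": d.type,
                "description": d.description,
            },
        })
    return entries
```

Keeping `schema_name` in the payload is what later allows retrieval to be restricted to the schemas a given user is permitted to see.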
SQL Validation: AST Parsing Before Execution
Before executing any LLM-generated SQL, run it through an AST (Abstract Syntax Tree) parser. sqlglot is the Python library of choice — it parses SQL from any dialect and lets you inspect the parse tree to enforce rules: no DML statements (no INSERT, UPDATE, DELETE, DROP), query must include a WHERE clause with a tenant filter, no cross-table joins that violate the data access policy, query cost estimate below threshold.
This is a hard gate: LLM-generated SQL that fails validation is never executed. The LLM is given the validation error and asked to regenerate. Most validation errors self-correct in 1-2 retry iterations.
Row-Level Security: Access Control at Query Time
The access control question for generative BI is: can the analyst asking this question see the data the SQL would return? For single-tenant deployments this may be simple. For multi-tenant or role-based enterprise deployments, every generated query must be augmented with the appropriate WHERE clauses or CTEs that enforce row-level security based on the authenticated user's permissions.
Implement access control as a query rewriting middleware step, not as a post-processing filter on results. Filtering results after retrieval still runs the full query (performance hit) and does not prevent table scans over data the user should not see.
Text-to-SQL Agent with Schema RAG, Validation, and Access Control
import json
from dataclasses import dataclass
from typing import Optional

import sqlglot
import sqlglot.expressions as exp
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny, MatchValue


@dataclass
class UserContext:
    user_id: str
    tenant_id: str
    allowed_schemas: list[str]   # e.g. ["sales", "marketing"]
    row_level_filter: str        # e.g. "tenant_id = 'acme-corp'"


class GenerativeBIAgent:
    """
    Production Text-to-SQL agent with semantic layer retrieval,
    AST-based SQL validation, and access control enforcement.
    """

    FORBIDDEN_STATEMENT_TYPES = (exp.Insert, exp.Update, exp.Delete,
                                 exp.Drop, exp.Alter, exp.Create)

    def __init__(self, qdrant_host: str = "localhost"):
        self.openai = OpenAI()
        self.qdrant = QdrantClient(host=qdrant_host, port=6333)
        # Two collections: semantic_layer (metric defs), schema_docs (table/column docs)
        self.semantic_collection = "semantic_layer"
        self.schema_collection = "schema_docs"

    def _embed(self, text: str) -> list[float]:
        resp = self.openai.embeddings.create(
            input=text, model="text-embedding-3-small", dimensions=768
        )
        return resp.data[0].embedding

    def _retrieve_semantic_context(self, query: str, tenant_id: str) -> str:
        results = self.qdrant.search(
            collection_name=self.semantic_collection,
            query_vector=self._embed(query),
            query_filter=Filter(must=[
                FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))
            ]),
            limit=5,
            with_payload=True,
        )
        if not results:
            return ""
        return "\n".join(
            f"METRIC: {r.payload['name']}\nDEFINITION: {r.payload['definition']}"
            for r in results
        )

    def _retrieve_schema_context(self, query: str, allowed_schemas: list[str]) -> str:
        # MatchAny: a column doc matches if its schema is ANY allowed schema.
        # (A `must` list of one condition per schema would require ALL to match
        # simultaneously and return nothing.)
        results = self.qdrant.search(
            collection_name=self.schema_collection,
            query_vector=self._embed(query),
            query_filter=Filter(must=[
                FieldCondition(key="schema_name", match=MatchAny(any=allowed_schemas))
            ]),
            limit=10,
            with_payload=True,
        )
        return "\n".join(
            f"TABLE: {r.payload['table']} | COLUMN: {r.payload['column']} | "
            f"TYPE: {r.payload['type']} | DESC: {r.payload['description']}"
            for r in results
        )

    def _validate_sql(self, sql: str, user: UserContext) -> tuple[bool, str]:
        try:
            parsed = sqlglot.parse_one(sql, dialect="bigquery")
        except Exception as e:
            return False, f"SQL parse error: {e}"
        # Block DML/DDL statements
        if isinstance(parsed, self.FORBIDDEN_STATEMENT_TYPES):
            return False, f"Forbidden statement type: {type(parsed).__name__}"
        # Ensure only allowed schemas are referenced
        for table in parsed.find_all(exp.Table):
            schema = table.db
            if schema and schema not in user.allowed_schemas:
                return False, f"Access denied to schema: {schema}"
        return True, "valid"

    def _enforce_access_control(self, sql: str, user: UserContext) -> str:
        """Rewrite SQL to add the row-level security filter."""
        # Wrap the validated query and apply the tenant filter outside it.
        # The inner query must select the filter column (e.g. tenant_id).
        return f"SELECT * FROM ({sql}) AS q WHERE {user.row_level_filter}"

    def generate_sql(
        self, natural_language_query: str, user: UserContext, max_retries: int = 3
    ) -> Optional[str]:
        semantic_ctx = self._retrieve_semantic_context(
            natural_language_query, user.tenant_id
        )
        schema_ctx = self._retrieve_schema_context(
            natural_language_query, user.allowed_schemas
        )
        system_prompt = f"""You are a SQL generation assistant for BigQuery.
Generate a single SELECT query only. No DML or DDL.

Metric definitions (use these exact definitions for named metrics):
{semantic_ctx or 'No relevant metrics found.'}

Relevant schema:
{schema_ctx or 'No relevant schema found.'}

Respond with JSON: {{"sql": "your query here", "explanation": "one sentence"}}"""

        for attempt in range(max_retries):
            resp = self.openai.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": natural_language_query},
                ],
                temperature=0,
                response_format={"type": "json_object"},
            )
            result = json.loads(resp.choices[0].message.content)
            sql = result.get("sql", "")
            valid, reason = self._validate_sql(sql, user)
            if valid:
                return self._enforce_access_control(sql, user)
            # Feed the validation error back for self-correction
            natural_language_query += (
                f"\n\nPrevious attempt failed validation: {reason}. Correct and retry."
            )
        return None  # Failed after max_retries


# Usage
if __name__ == "__main__":
    agent = GenerativeBIAgent()
    user = UserContext(
        user_id="user-123",
        tenant_id="acme-corp",
        allowed_schemas=["sales", "marketing"],
        row_level_filter="tenant_id = 'acme-corp'",
    )
    sql = agent.generate_sql(
        "What was our NRR trend over the last 6 quarters for enterprise accounts?",
        user=user,
    )
    print(sql)

Production Text-to-SQL agent with schema RAG, AST-based SQL validation using sqlglot, and access control rewriting. The validation loop feeds AST parse errors back to the LLM for self-correction — in testing, 95%+ of validation failures self-correct within 2 retries. The access control wrapper is applied after validation to prevent the model from seeing or influencing the tenant filter.
Multi-turn conversation for iterative analysis is the feature that converts generative BI from a curiosity to a daily driver for analysts. Instead of each query being a one-shot translation, maintain conversation history so analysts can ask follow-up questions: 'now break that down by product line,' 'filter to accounts acquired in the last 12 months,' 'show me the same metric for last year.' The conversation history provides implicit context that dramatically improves SQL accuracy on refinement queries.
Prerequisites Before Building Generative BI
- Column-level documentation in your warehouse: every table and column needs a human-readable description. Start with your 20 most-queried tables.
- A business glossary that formally defines your 10-20 core metrics. 'Revenue' and 'ARR' should have precise SQL definitions documented, not just informal descriptions.
- Row-level security policy documented in code, not just in human knowledge. Every table with sensitive data needs a documented access control rule.
- A query sandbox environment for generative BI execution, separate from your production query slot allocation — LLM-generated queries can be expensive and slow.
- An audit log of every generated query and its result. This is a compliance requirement for most regulated industries and essential for debugging.
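The audit-log prerequisite can be sketched as one structured record per generated query, written before execution so that blocked and failed queries are captured too. The JSON-lines shape below is illustrative; in production the records would typically land in an append-only warehouse table.

```python
# One audit record per generated query, serialised as a JSON line.
# Written before execution so blocked/failed queries are captured too.
import json
import time

def audit_record(user_id: str, question: str, sql: str,
                 validation_status: str) -> str:
    return json.dumps({
        "ts": time.time(),
        "user_id": user_id,
        "question": question,
        "generated_sql": sql,
        "validation_status": validation_status,
    })
```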
Inductivee's Approach to Generative BI Deployments
The engagements where generative BI succeeds are the ones where the data team treated the semantic layer as a first-class deliverable, not an afterthought. The teams that rush to the LLM integration before building the semantic layer spend three months debugging hallucinated metric definitions and then end up rebuilding the semantic layer anyway — just under crisis conditions with a broken system already in production.
Our standard engagement structure is four weeks on semantic layer and schema documentation before a single LLM call is made. Week one: audit the 50 most-queried tables and identify the top 20 business metrics. Week two: document metric definitions in a structured format that can be embedded and retrieved. Week three: implement schema-as-RAG and validate retrieval quality with sample queries. Week four: connect the LLM, implement validation and access control, and run the evaluation harness against 100 representative analyst queries.
For teams on dbt, the MetricFlow integration with the dbt semantic layer is the cleanest path — your metric definitions already exist in your dbt models, and LlamaIndex's NLSQLTableQueryEngine with dbt schema extraction can be configured to pull them automatically.
Written By
Inductivee Team
Agentic AI Engineering Team
The Inductivee engineering team — a remote-first group of multi-agent orchestration specialists, RAG pipeline architects, and data liquidity engineers who have shipped 40+ agentic deployments across 25+ enterprises since 2012. Our writing is grounded in what we actually build, break, and operate in production.
Our engineering content is written by active practitioners and technically reviewed before publication. Compliance: SOC2 Type II, HIPAA, GDPR, ISO 27001.
Engineer This With Inductivee
The engineering patterns in this article are what our team builds into production every day. Explore the related service to see how we deliver this capability at enterprise scale.
Related Articles
Agentic ETL: Automating Data Pipeline Orchestration with Multi-Agent Systems
Enterprise Data Liquidity: The Engineering Framework for an AI-Ready Knowledge Base
Semantic Search for Enterprise Knowledge Bases: Engineering Beyond Full-Text
Ready to Build This Into Your Enterprise?
Inductivee engineers agentic systems, RAG pipelines, and enterprise data liquidity solutions. Let's scope your project.