Data Engineering

Generative BI: Connecting LLMs to Your Enterprise Data Warehouse

Natural language queries against a data warehouse sound simple. Making them accurate, safe, and performant in production requires a semantic layer, schema understanding, query validation, and careful access control architecture.

Inductivee Team · AI Engineering · October 14, 2025 (updated April 15, 2026) · 12 min read
TL;DR

Naive Text-to-SQL fails in production for four reasons: schema hallucination, SQL injection exposure, no access control, and context window overload from large schemas. The production architecture requires a semantic layer (business glossary and metric definitions), schema-as-RAG for dynamic context injection, AST-based SQL validation before execution, and row-level security enforcement at the query layer. Teams that skip the semantic layer spend months debugging incorrect but plausible-looking SQL.

Why Naive Text-to-SQL Fails at Enterprise Scale

The demo is always convincing: type 'what was revenue last quarter by region?' and watch the LLM generate correct SQL and return a formatted table. Then you hand it to a business analyst who asks 'what is our NRR trend for enterprise accounts over the last six quarters?' and the model confidently generates SQL that queries the wrong table, uses a column that was renamed two years ago, and calculates NRR as a simple retention rate rather than net revenue retention as your finance team defines it.

Enterprise data warehouses are not clean schemas with self-explanatory column names. They are 10-20 year accumulations of naming conventions from three acquisitions, two BI platform migrations, and a dozen data engineers with different philosophies. The same concept — 'active customer' — may be defined differently in the marketing, finance, and product tables. Without a semantic layer that tells the LLM exactly how your organisation defines its key metrics, it will make plausible-sounding but incorrect assumptions.

The second failure mode is more dangerous: SQL injection. An LLM generating SQL from user natural language input is a classic injection vector if the generated SQL is executed without validation. Even well-intentioned queries can inadvertently produce DELETE or DROP statements, expose cross-tenant data, or run Cartesian joins that bring down query performance for all users. The production architecture must treat LLM-generated SQL as untrusted input.

The Four Components of a Production Generative BI Architecture

Semantic Layer: The Business Glossary

The semantic layer is a curated mapping from business concepts to SQL constructs. It defines: metric definitions ('NRR = (starting ARR + expansion - churn - contraction) / starting ARR, using the billing.mrr_snapshots table'), table relationships, column name translations ('cust_acq_dt = customer acquisition date'), and filter defaults ('active customer = customers where status = active AND churned_at IS NULL').

For teams on dbt, the dbt semantic layer (MetricFlow) is the right foundation. For BigQuery, Looker's LookML semantic model serves the same purpose. The LLM should query the semantic layer first to retrieve relevant metric definitions before attempting to generate SQL. This is not optional — it is the difference between a demo and a production system.
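The retrievable unit of a semantic layer is a structured metric record. A minimal sketch of what such a record and its embeddable form might look like — field names like `grain` and `filters` are this sketch's choices, not a cross-tool standard:

```python
# Illustrative metric record -- the field names are this sketch's
# conventions, not part of MetricFlow, LookML, or any standard
nrr_metric = {
    "name": "NRR",
    "definition": (
        "(starting ARR + expansion - churn - contraction) / starting ARR, "
        "using the billing.mrr_snapshots table"
    ),
    "grain": "quarter",
    "source_table": "billing.mrr_snapshots",
    "filters": ["churned_at IS NULL"],
}

def to_embedding_doc(metric: dict) -> str:
    """Flatten a metric record into the text that gets embedded and retrieved."""
    return (
        f"METRIC: {metric['name']}\n"
        f"DEFINITION: {metric['definition']}\n"
        f"TABLE: {metric['source_table']}"
    )
```

The flattened string, not the raw record, is what gets embedded — retrieval quality depends on the definition text reading like the questions analysts actually ask.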

Schema-as-RAG: Dynamic Context Injection

Enterprise warehouse schemas are too large to fit in a single LLM context window. A 500-table BigQuery project produces tens of thousands of tokens of schema documentation. The solution is to treat schema documentation as a retrieval corpus: embed table and column descriptions, and retrieve only the relevant schema context for each query.

This requires good column-level documentation in your warehouse (BigQuery column descriptions, dbt model/column descriptions, or a separate schema registry). If your warehouse lacks documentation, the first project milestone is adding it — the generative BI system is only as good as the documentation it retrieves.
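Building the retrieval corpus from that documentation can be straightforward. A stdlib-only sketch that flattens a documented schema into one embeddable doc per column — the table and column names here are hypothetical examples:

```python
def schema_to_docs(schema: dict) -> list[str]:
    """Flatten {table: {column: (type, description)}} into one retrievable
    doc per column, mirroring a TABLE | COLUMN | TYPE | DESC layout."""
    return [
        f"TABLE: {table} | COLUMN: {col} | TYPE: {col_type} | DESC: {desc}"
        for table, columns in schema.items()
        for col, (col_type, desc) in columns.items()
    ]

# Hypothetical documented schema -- names are illustrative only
schema = {
    "sales.orders": {
        "cust_acq_dt": ("DATE", "customer acquisition date"),
        "order_total": ("NUMERIC", "order value in USD, pre-tax"),
    },
}
docs = schema_to_docs(schema)
```

One doc per column (rather than per table) keeps retrieval granular: a question about acquisition dates pulls in `cust_acq_dt` without dragging along every column of a wide table.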

SQL Validation: AST Parsing Before Execution

Before executing any LLM-generated SQL, run it through an AST (Abstract Syntax Tree) parser. sqlglot is the Python library of choice — it parses SQL from any dialect and lets you inspect the parse tree to enforce rules: no DML statements (no INSERT, UPDATE, DELETE, DROP), query must include a WHERE clause with a tenant filter, no cross-table joins that violate the data access policy, query cost estimate below threshold.

This is a hard gate: LLM-generated SQL that fails validation is never executed. The LLM is given the validation error and asked to regenerate. Most validation errors self-correct in 1-2 retry iterations.

Row-Level Security: Access Control at Query Time

The access control question for generative BI is: can the analyst asking this question see the data the SQL would return? For single-tenant deployments this may be simple. For multi-tenant or role-based enterprise deployments, every generated query must be augmented with the appropriate WHERE clauses or CTEs that enforce row-level security based on the authenticated user's permissions.

Implement access control as a query rewriting middleware step, not as a post-processing filter on results. Filtering results after retrieval still runs the full query (performance hit) and does not prevent table scans over data the user should not see.
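A minimal sketch of that middleware step, using the subquery-wrapper approach — note the caveat in the comments: the inner query must project the filter column, which the validation gate can enforce before the wrapper runs:

```python
def apply_row_level_security(sql: str, rls_filter: str) -> str:
    """Wrap an already-validated query in an aliased subquery so the
    tenant filter is applied outside anything the model generated."""
    # Caveat: the inner query must project the filter column
    # (e.g. tenant_id); validation should enforce that first.
    return f"SELECT * FROM ({sql}) AS rls_scope WHERE {rls_filter}"

secured = apply_row_level_security(
    "SELECT tenant_id, region, SUM(amount) AS revenue "
    "FROM sales.orders GROUP BY 1, 2",
    "tenant_id = 'acme-corp'",
)
```

Because the wrapper is string-templated by trusted middleware after validation, the model never sees — and cannot influence — the tenant predicate.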

Text-to-SQL Agent with Schema RAG, Validation, and Access Control

python
import sqlglot
import sqlglot.expressions as exp
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
import json
from typing import Optional
from dataclasses import dataclass


@dataclass
class UserContext:
    user_id: str
    tenant_id: str
    allowed_schemas: list[str]  # e.g. ["sales", "marketing"]
    row_level_filter: str       # e.g. "tenant_id = 'acme-corp'"


class GenerativeBIAgent:
    """
    Production Text-to-SQL agent with semantic layer retrieval,
    AST-based SQL validation, and access control enforcement.
    """
    FORBIDDEN_STATEMENT_TYPES = (exp.Insert, exp.Update, exp.Delete,
                                  exp.Drop, exp.Alter, exp.Create)

    def __init__(self, qdrant_host: str = "localhost"):
        self.openai = OpenAI()
        self.qdrant = QdrantClient(host=qdrant_host, port=6333)
        # Two collections: semantic_layer (metric defs), schema_docs (table/column docs)
        self.semantic_collection = "semantic_layer"
        self.schema_collection = "schema_docs"

    def _embed(self, text: str) -> list[float]:
        resp = self.openai.embeddings.create(
            input=text, model="text-embedding-3-small", dimensions=768
        )
        return resp.data[0].embedding

    def _retrieve_semantic_context(self, query: str, tenant_id: str) -> str:
        results = self.qdrant.search(
            collection_name=self.semantic_collection,
            query_vector=self._embed(query),
            query_filter=Filter(must=[
                FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))
            ]),
            limit=5,
            with_payload=True,
        )
        if not results:
            return ""
        return "\n".join(
            f"METRIC: {r.payload['name']}\nDEFINITION: {r.payload['definition']}"
            for r in results
        )

    def _retrieve_schema_context(self, query: str, allowed_schemas: list[str]) -> str:
        results = self.qdrant.search(
            collection_name=self.schema_collection,
            query_vector=self._embed(query),
            # `should` = OR across allowed schemas; `must` would require a
            # single point to match every schema at once and return nothing
            query_filter=Filter(should=[
                FieldCondition(key="schema_name",
                               match=MatchValue(value=s)) for s in allowed_schemas
            ]),
            limit=10,
            with_payload=True,
        )
        return "\n".join(
            f"TABLE: {r.payload['table']} | COLUMN: {r.payload['column']} | "
            f"TYPE: {r.payload['type']} | DESC: {r.payload['description']}"
            for r in results
        )

    def _validate_sql(self, sql: str, user: UserContext) -> tuple[bool, str]:
        try:
            parsed = sqlglot.parse_one(sql, dialect="bigquery")
        except Exception as e:
            return False, f"SQL parse error: {e}"

        # Block DML/DDL statements
        for forbidden_type in self.FORBIDDEN_STATEMENT_TYPES:
            if isinstance(parsed, forbidden_type):
                return False, f"Forbidden statement type: {type(parsed).__name__}"

        # Ensure only allowed schemas are referenced
        for table in parsed.find_all(exp.Table):
            schema = table.db
            if schema and schema not in user.allowed_schemas:
                return False, f"Access denied to schema: {schema}"

        return True, "valid"

    def _enforce_access_control(self, sql: str, user: UserContext) -> str:
        """Rewrite SQL to add the row-level security filter.
        Caveat: the inner query must project the filter column;
        validation can enforce that before this wrapper is applied."""
        # Aliased subquery wrapper -- the model never sees the tenant filter
        return f"SELECT * FROM ({sql}) AS rls WHERE {user.row_level_filter}"

    def generate_sql(
        self, natural_language_query: str, user: UserContext, max_retries: int = 3
    ) -> Optional[str]:
        semantic_ctx = self._retrieve_semantic_context(
            natural_language_query, user.tenant_id
        )
        schema_ctx = self._retrieve_schema_context(
            natural_language_query, user.allowed_schemas
        )
        system_prompt = f"""You are a SQL generation assistant for BigQuery.
Generate a single SELECT query only. No DML or DDL.

Metric definitions (use these exact definitions for named metrics):
{semantic_ctx or 'No relevant metrics found.'}

Relevant schema:
{schema_ctx or 'No relevant schema found.'}

Respond with JSON: {{"sql": "your query here", "explanation": "one sentence"}}"""

        for attempt in range(max_retries):
            resp = self.openai.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": natural_language_query},
                ],
                temperature=0,
                response_format={"type": "json_object"},
            )
            result = json.loads(resp.choices[0].message.content)
            sql = result.get("sql", "")

            valid, reason = self._validate_sql(sql, user)
            if valid:
                return self._enforce_access_control(sql, user)
            else:
                # Feed validation error back for self-correction
                natural_language_query += f"\n\nPrevious attempt failed validation: {reason}. Correct and retry."

        return None  # Failed after max_retries


# Usage
if __name__ == "__main__":
    agent = GenerativeBIAgent()
    user = UserContext(
        user_id="user-123",
        tenant_id="acme-corp",
        allowed_schemas=["sales", "marketing"],
        row_level_filter="tenant_id = 'acme-corp'"
    )
    sql = agent.generate_sql(
        "What was our NRR trend over the last 6 quarters for enterprise accounts?",
        user=user
    )
    print(sql)

Production Text-to-SQL agent with schema RAG, AST-based SQL validation using sqlglot, and access control rewriting. The validation loop feeds AST parse errors back to the LLM for self-correction — in testing, 95%+ of validation failures self-correct within 2 retries. The access control wrapper is applied after validation to prevent the model from seeing or influencing the tenant filter.

Tip

Multi-turn conversation for iterative analysis is the feature that converts generative BI from a curiosity to a daily driver for analysts. Instead of each query being a one-shot translation, maintain conversation history so analysts can ask follow-up questions: 'now break that down by product line,' 'filter to accounts acquired in the last 12 months,' 'show me the same metric for last year.' The conversation history provides implicit context that dramatically improves SQL accuracy on refinement queries.
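One way to sketch that history, assuming an OpenAI-style message format — the class and method names are illustrative, not from any library:

```python
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    """Rolling history of (question, generated SQL) turns, replayed into
    the prompt so refinements resolve against the previous query."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def record(self, question: str, sql: str) -> None:
        self.turns.append((question, sql))

    def as_messages(self) -> list[dict]:
        """Render history as chat messages to prepend to the next request."""
        messages = []
        for question, sql in self.turns:
            messages.append({"role": "user", "content": question})
            messages.append({"role": "assistant", "content": sql})
        return messages

state = ConversationState()
state.record("What was NRR last quarter?", "SELECT ... /* full query */")
state.record("Now break that down by product line", "SELECT ... /* refined */")
```

Storing the generated SQL (not just the question) in the assistant turns is the key design choice: 'the same metric for last year' is unambiguous when the model can see exactly which query it wrote last time.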

Prerequisites Before Building Generative BI

  • Column-level documentation in your warehouse: every table and column needs a human-readable description. Start with your 20 most-queried tables.
  • A business glossary that formally defines your 10-20 core metrics. 'Revenue' and 'ARR' should have precise SQL definitions documented, not just informal descriptions.
  • Row-level security policy documented in code, not just in human knowledge. Every table with sensitive data needs a documented access control rule.
  • A query sandbox environment for generative BI execution, separate from your production query slot allocation — LLM-generated queries can be expensive and slow.
  • An audit log of every generated query and its result. This is a compliance requirement for most regulated industries and essential for debugging.

Inductivee's Approach to Generative BI Deployments

The engagements where generative BI succeeds are the ones where the data team treated the semantic layer as a first-class deliverable, not an afterthought. The teams that rush to the LLM integration before building the semantic layer spend three months debugging hallucinated metric definitions and then end up rebuilding the semantic layer anyway — just under crisis conditions with a broken system already in production.

Our standard engagement structure is four weeks on semantic layer and schema documentation before a single LLM call is made. Week one: audit the 50 most-queried tables and identify the top 20 business metrics. Week two: document metric definitions in a structured format that can be embedded and retrieved. Week three: implement schema-as-RAG and validate retrieval quality with sample queries. Week four: connect the LLM, implement validation and access control, and run the evaluation harness against 100 representative analyst queries.
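The week-four evaluation harness can be as simple as comparing executed result sets against hand-written golden queries, since two textually different queries can be equally correct. A minimal sketch with stubbed-in functions — `agent_fn` and `execute_fn` stand in for the real agent and warehouse client:

```python
def evaluate(agent_fn, golden_cases, execute_fn) -> float:
    """Fraction of questions where the generated query returns the same
    result set as the golden query. Comparing results, not SQL strings,
    avoids penalising semantically equivalent rewrites."""
    passed = 0
    for question, golden_sql in golden_cases:
        generated = agent_fn(question)
        if generated is not None and execute_fn(generated) == execute_fn(golden_sql):
            passed += 1
    return passed / len(golden_cases)

# Stubbed demo: execute_fn maps SQL text -> result rows
results = {"GOLDEN_1": [1], "ALT_1": [1], "GOLDEN_2": [2], "WRONG": [9]}
golden = [("q1", "GOLDEN_1"), ("q2", "GOLDEN_2")]
agent = {"q1": "ALT_1", "q2": "WRONG"}.get
accuracy = evaluate(agent, golden, results.get)  # q1 passes, q2 fails
```

In practice `execute_fn` would run both queries in the sandbox environment and normalise row ordering before comparison.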

For teams on dbt, the MetricFlow integration with the dbt semantic layer is the cleanest path — your metric definitions already exist in your dbt models, and LlamaIndex's NLSQLTableQueryEngine with dbt schema extraction can be configured to pull them automatically.

Frequently Asked Questions

What is Text-to-SQL and why is it hard in production?

Text-to-SQL converts natural language questions into SQL queries using LLMs. It is hard in production because enterprise schemas use non-obvious naming conventions, business metrics have organisation-specific definitions that LLMs cannot guess, generated SQL must be validated against injection and access control policies, and large schemas exceed LLM context windows. The production solution requires a semantic layer, schema-as-RAG, and SQL validation middleware.

What is a semantic layer for generative BI?

A semantic layer is a curated mapping from business concepts to SQL constructs — it formally defines metrics, table relationships, column name translations, and filter defaults. For generative BI, it provides the LLM with organisation-specific metric definitions (NRR, ARR, active customer) that prevent hallucinated interpretations. dbt's MetricFlow and Looker's LookML are the standard enterprise implementations.

How do you prevent SQL injection in LLM-generated queries?

Parse every LLM-generated SQL statement through an AST parser (sqlglot is the standard Python library) before execution. Validate that: the statement type is SELECT only (no INSERT, UPDATE, DELETE, DROP), only permitted schemas and tables are referenced, and cost estimates are within threshold. Treat the LLM as an untrusted input source — never execute generated SQL without AST validation.

What is the best way to handle large schemas in Text-to-SQL?

Embed table and column descriptions as a vector retrieval corpus and dynamically inject only the relevant schema context for each query. For a 500-table warehouse, the full schema exceeds context window limits and degrades query quality. Retrieve the 5-10 most relevant tables per query using semantic similarity search, providing focused context that improves SQL accuracy and reduces token cost by 80-90%.

How accurate is generative BI in production?

With a well-implemented semantic layer, schema RAG, and SQL validation, production Text-to-SQL systems achieve 85-92% query accuracy on a representative sample of analyst queries. Without a semantic layer, accuracy drops to 60-70% on metric-heavy queries. Accuracy is significantly higher for simple aggregation and filter queries than for complex multi-join or window function queries.

Written By

Inductivee Team — AI Engineering at Inductivee


The Inductivee engineering team — a remote-first group of multi-agent orchestration specialists, RAG pipeline architects, and data liquidity engineers who have shipped 40+ agentic deployments across 25+ enterprises since 2012. Our writing is grounded in what we actually build, break, and operate in production.

Agentic AI Architecture · Multi-Agent Orchestration · LangChain · LangGraph · CrewAI · Microsoft AutoGen

Inductivee is a remote-first agentic AI engineering firm with 40+ production deployments across 25+ enterprises since 2012. Our engineering content is written by active practitioners and technically reviewed before publication. Compliance: SOC2 Type II, HIPAA, GDPR, ISO 27001.
