- Published on
Text-to-SQL With LLMs — Building a Natural Language Database Interface
- Authors

- Name
- Sanjeev Sharma
- @webcoderspeed1
Introduction
Converting natural language to SQL queries is one of the most practical applications of LLMs in enterprise environments. A user can ask "show me all customers who purchased in the last 30 days" and immediately get results without needing SQL knowledge. However, building a system that handles complex schemas, prevents SQL injection, and validates queries reliably requires careful engineering.
- Schema-Aware Prompting
- Multi-Shot Examples in Prompts
- Iterative Refinement Loop
- SQL Injection Prevention
- Complex Query Decomposition
- Explaining Query Results in Natural Language
- Supported SQL Dialects
- Read-Only Connection Enforcement
- Usage Analytics and Monitoring
- Checklist
- Conclusion
Schema-Aware Prompting
The foundation of reliable text-to-SQL is injecting the complete table schema into your prompt. Rather than hoping the LLM knows your database structure, explicitly provide it:
async function buildSchemaMeta(db: Database): Promise<string> {
const tables = await db.query(`
SELECT table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_name, ordinal_position
`);
let schema = 'DATABASE SCHEMA:\n';
let currentTable = '';
for (const col of tables) {
if (col.table_name !== currentTable) {
currentTable = col.table_name;
schema += `\n${currentTable}:\n`;
}
schema += ` - ${col.column_name} (${col.data_type})\n`;
}
return schema;
}
Include constraints, indexes, and relationships. The more context you provide, the more accurate the LLM's SQL generation becomes.
Multi-Shot Examples in Prompts
Few-shot examples dramatically improve accuracy. Include 3-5 realistic query examples with expected SQL:
const FEW_SHOT_EXAMPLES = `
EXAMPLES:
1. User: "How many active users do we have?"
SELECT COUNT(*) as active_count FROM users WHERE status = 'active';
2. User: "List orders over $100 placed this month"
SELECT order_id, amount, created_at
FROM orders
WHERE amount > 100
AND YEAR(created_at) = YEAR(NOW())
AND MONTH(created_at) = MONTH(NOW());
3. User: "Which customers spent the most last quarter?"
SELECT customer_id, SUM(amount) as total_spent
FROM orders
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 3 MONTH)
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
`;
Select examples that match your most common query patterns. Update this set as you discover failing queries.
Iterative Refinement Loop
Most text-to-SQL queries fail on first attempt. Build an iterative refinement system:
interface QueryAttempt {
naturalLanguage: string;
generatedSql: string;
executionError?: string;
attemptNumber: number;
}
async function refineQuery(
attempt: QueryAttempt,
maxAttempts: number = 3
): Promise<any[]> {
if (attempt.attemptNumber >= maxAttempts) {
throw new Error('Max refinement attempts exceeded');
}
try {
const result = await db.query(attempt.generatedSql);
return result;
} catch (error) {
const refinementPrompt = `
Original query: "${attempt.naturalLanguage}"
Generated SQL: ${attempt.generatedSql}
Error: ${error.message}
Fix the SQL query. Return ONLY the corrected SQL.
`;
const refinedSql = await llm.generate(refinementPrompt);
return refineQuery({
naturalLanguage: attempt.naturalLanguage,
generatedSql: refinedSql,
executionError: error.message,
attemptNumber: attempt.attemptNumber + 1
}, maxAttempts);
}
}
This loop catches syntax errors, type mismatches, and schema issues automatically.
SQL Injection Prevention
Never execute LLM-generated SQL directly. Always validate and sanitize:
async function sanitizeAndExecute(
sql: string,
allowedDialect: 'postgres' | 'mysql' | 'sqlite'
): Promise<any[]> {
// Parse SQL structure
const parsed = sqlParser.parse(sql);
// Reject dangerous operations
const forbiddenOps = ['DROP', 'TRUNCATE', 'DELETE', 'ALTER', 'INSERT', 'UPDATE'];
if (forbiddenOps.some(op => sql.toUpperCase().includes(op))) {
throw new Error('Write operations not allowed');
}
// Validate against schema
for (const table of parsed.tables) {
if (!allowedTables.has(table.name)) {
throw new Error(`Table "${table.name}" not accessible`);
}
}
// Enforce read-only connection
const readOnlyConn = pool.getReadOnlyConnection();
return readOnlyConn.query(sql);
}
Maintain an allowlist of accessible tables. Enforce read-only database credentials for text-to-SQL endpoints.
Complex Query Decomposition
For ambiguous multi-part questions, decompose into multiple simpler queries:
interface QueryPlan {
steps: Array<{ question: string; sql: string }>;
finalQuery: string;
}
async function decomposeQuery(
userQuestion: string
): Promise<QueryPlan> {
const decompositionPrompt = `
Break down this question into SQL sub-queries: "${userQuestion}"
Return a JSON array with step-by-step queries.
`;
const plan = JSON.parse(await llm.generate(decompositionPrompt));
let results: Record<string, any> = {};
for (const step of plan.steps) {
results[step.question] = await db.query(step.sql);
}
// Use sub-query results to construct final query
const finalSql = await llm.generate(`
Based on intermediate results: ${JSON.stringify(results)}
Write the final SQL to answer: "${userQuestion}"
`);
return { steps: plan.steps, finalQuery: finalSql };
}
Explaining Query Results in Natural Language
Convert raw SQL results back into readable narratives:
async function explainResults(
userQuestion: string,
sqlQuery: string,
results: any[]
): Promise<string> {
const explanation = await llm.generate(`
User asked: "${userQuestion}"
SQL executed: ${sqlQuery}
Results (first 5): ${JSON.stringify(results.slice(0, 5))}
Explain these results in plain English for a non-technical user.
Keep it concise (2-3 sentences).
`);
return explanation;
}
Supported SQL Dialects
Design your system to support multiple SQL dialects:
type SqlDialect = 'postgres' | 'mysql' | 'sqlite' | 'snowflake' | 'bigquery';
async function generateSqlForDialect(
question: string,
dialect: SqlDialect,
schema: string
): Promise<string> {
const dialertHints = {
postgres: 'Use PostgreSQL syntax. Prefer JSONB for semi-structured data.',
mysql: 'Use MySQL 8.0+. Window functions available.',
snowflake: 'Use UPPERCASE for keywords. Semi-structured data in VARIANT.',
bigquery: 'Use ARRAY and STRUCT types. Unnesting patterns common.'
};
return llm.generate(`
${schema}
Generate ${dialect} SQL for: "${question}"
Dialect notes: ${dialertHints[dialect]}
Return ONLY the SQL query.
`);
}
Read-Only Connection Enforcement
Prevent accidental or malicious writes with connection pooling:
class SafeDbPool {
private readOnlyPool: Pool;
private writePool: Pool; // Separate, restricted
constructor(config: DbConfig) {
this.readOnlyPool = new Pool({
...config,
user: config.readOnlyUser,
password: config.readOnlyPassword
});
this.writePool = new Pool(config);
}
getReadOnlyConnection() {
return this.readOnlyPool;
}
getWriteConnection() {
// Only used for admin operations, never exposed to LLM
return this.writePool;
}
}
Usage Analytics and Monitoring
Track which queries are generated and executed:
interface QueryLog {
id: string;
userQuestion: string;
generatedSql: string;
executionTimeMs: number;
resultCount: number;
successfulAttempt: number;
timestamp: Date;
}
async function logQuery(log: QueryLog): Promise<void> {
await analyticsDb.insert('query_logs', log);
// Alert on suspicious patterns
if (log.resultCount > 1000000) {
await alertOps('Large query detected', log);
}
}
Monitor for queries that consistently fail at attempt 1, patterns indicating schema misunderstandings, and unusually expensive queries.
Checklist
- Inject complete schema metadata with constraints and relationships
- Include 3-5 relevant few-shot examples in your system prompt
- Build an iterative refinement loop that executes → catches errors → fixes
- Parse and validate SQL before execution; reject write operations
- Maintain allowlists of accessible tables and columns
- Use read-only database credentials for text-to-SQL
- Handle multiple SQL dialects in a dialect-aware prompt
- Log all queries for monitoring and improvement
- Test with complex, ambiguous, and adversarial questions
- Set up alerts for slow, large, or repeated failing queries
Conclusion
Text-to-SQL systems unlock database access for non-technical users and reduce engineering overhead for common analytics queries. By combining schema-aware prompting, multi-shot examples, iterative refinement, and strict validation, you can build systems that are both powerful and safe. Start with read-only access, monitor failures, and gradually expand capabilities as confidence grows.