AI-Powered Data Analysis — Natural Language to Insights Without Code
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Most business users can't write SQL or Python. They have questions: "What was our revenue trend last quarter?" or "Which product had the highest churn rate?" Building systems that convert natural language into executable analysis pipelines democratizes data access. This guide covers the full stack: question interpretation, data retrieval, analysis execution, chart generation, and result narration.
- Natural Language to Pandas/SQL Pipeline
- Chart Generation from Data Analysis
- Statistical Interpretation in Plain English
- Anomaly Explanation
- Trend Narration
- Report Generation Pipeline
- Data Validation Before Analysis
- Iterative Analysis (Follow-Up Questions)
- Multi-Dataset JOIN Handling
- Sharing and Exporting AI Analysis
- Checklist
- Conclusion
Natural Language to Pandas/SQL Pipeline
Convert questions into structured analysis:
interface AnalysisRequest {
question: string;
userId: string;
datasets: string[];
timeframe?: { startDate: Date; endDate: Date };
constraints?: Record<string, any>;
}
interface AnalysisPlan {
sqlQueries: string[];
pythonCode: string;
chartTypes: Array<'line' | 'bar' | 'scatter' | 'heatmap' | 'pie'>;
analysisType: 'summary' | 'trend' | 'comparison' | 'correlation';
}
async function planAnalysis(
request: AnalysisRequest
): Promise<AnalysisPlan> {
const planningPrompt = `
User question: "${request.question}"
Available datasets: ${request.datasets.join(', ')}
Timeframe: ${request.timeframe ? JSON.stringify(request.timeframe) : 'any'}
Generate an analysis plan:
1. What SQL queries are needed? (one per line, executable PostgreSQL)
2. Any Python transformations or calculations?
3. What chart types best show these insights?
4. What is the analysis type (summary/trend/comparison)?
Return JSON: { sqlQueries, pythonCode, chartTypes, analysisType }
`;
// Assumes the model returns bare JSON; strip code fences and validate in production
return JSON.parse(await llm.generate(planningPrompt));
}
async function executeAnalysis(
plan: AnalysisPlan,
dbConnection: any
): Promise<Map<string, any>> {
const results = new Map<string, any>();
// Execute SQL queries
for (const [idx, query] of plan.sqlQueries.entries()) {
const data = await dbConnection.query(query);
results.set(`query_${idx}`, data);
}
// Execute Python code on results
if (plan.pythonCode) {
const pythonContext = Object.fromEntries(results);
const pyResult = await executePython(plan.pythonCode, pythonContext);
results.set('python_output', pyResult);
}
return results;
}
Chart Generation from Data Analysis
Create visualizations automatically:
interface Chart {
type: 'line' | 'bar' | 'scatter' | 'heatmap' | 'pie';
title: string;
data: Record<string, any>;
configuration: Record<string, any>;
svg?: string;
}
async function generateCharts(
analysisResults: Map<string, any>,
plan: AnalysisPlan
): Promise<Chart[]> {
const charts: Chart[] = [];
for (const [idx, chartType] of plan.chartTypes.entries()) {
const dataKey = `query_${idx}`;
const data = analysisResults.get(dataKey);
if (!data) continue;
const chartConfig = generateChartConfig(
chartType,
data,
extractTableMetadata(data)
);
charts.push({
type: chartType,
title: generateChartTitle(plan.analysisType, data),
data,
configuration: chartConfig,
svg: await renderChart(chartType, data, chartConfig)
});
}
return charts;
}
function generateChartConfig(
type: string,
data: any[],
metadata: any
): Record<string, any> {
const baseConfig = {
responsive: true,
maintainAspectRatio: true,
plugins: {
legend: { position: 'top' },
title: { display: true }
}
};
switch (type) {
case 'line':
return {
...baseConfig,
scales: { y: { beginAtZero: true } },
tension: 0.4
};
case 'bar':
return baseConfig;
case 'scatter':
return {
...baseConfig,
scales: { x: { type: 'linear' }, y: { beginAtZero: false } }
};
default:
return baseConfig;
}
}
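The plan asks the LLM to pick chart types, but a deterministic fallback helps when the model's suggestion does not fit the data shape. A hypothetical heuristic (the rules here are illustrative defaults, not from the pipeline above):

```typescript
type ChartType = 'line' | 'bar' | 'scatter' | 'heatmap' | 'pie';

// Pick a fallback chart type from the shape of the result set.
// Purely illustrative rules; tune the thresholds for your own datasets.
export function suggestChartType(rows: Record<string, unknown>[]): ChartType {
  if (rows.length === 0) return 'bar';
  const cols = Object.keys(rows[0]);
  const hasTime = cols.some(c => /date|time|month|week|day/i.test(c));
  const numeric = cols.filter(c => typeof rows[0][c] === 'number');
  if (hasTime && numeric.length >= 1) return 'line';          // time series
  if (numeric.length >= 2) return 'scatter';                  // two measures
  if (rows.length <= 6 && numeric.length === 1) return 'pie'; // small share breakdown
  return 'bar';
}
```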
Statistical Interpretation in Plain English
Narrate the data:
interface StatisticalInsight {
metric: string;
value: number;
interpretation: string;
significance: 'critical' | 'significant' | 'noteworthy' | 'unremarkable';
}
async function interpretStatistics(
data: any[],
analysisType: string
): Promise<StatisticalInsight[]> {
const insights: StatisticalInsight[] = [];
// Calculate basic stats
const numericColumns = identifyNumericColumns(data);
for (const col of numericColumns) {
const values = data.map((row: any) => row[col]);
const stats = {
mean: calculateMean(values),
median: calculateMedian(values),
stdDev: calculateStdDev(values),
min: Math.min(...values),
max: Math.max(...values),
trend: detectTrend(values)
};
const interpretationPrompt = `
Interpret these statistics in plain English:
Metric: ${col}
Mean: ${stats.mean}
Median: ${stats.median}
Std Dev: ${stats.stdDev}
Range: ${stats.min} to ${stats.max}
Trend: ${stats.trend}
What do these numbers mean for a business decision?
Is this value significant or expected?
`;
const interpretation = await llm.generate(interpretationPrompt);
insights.push({
metric: col,
value: stats.mean,
interpretation,
significance: assessSignificance(stats)
});
}
return insights;
}
function assessSignificance(stats: any): StatisticalInsight['significance'] {
// High relative spread (coefficient of variation > 0.5) flags instability
if (stats.mean !== 0 && stats.stdDev / Math.abs(stats.mean) > 0.5) return 'critical';
if (stats.trend !== 'stable') return 'significant';
return 'noteworthy';
}
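`calculateMean`, `calculateMedian`, and `calculateStdDev` are assumed above but not shown. Here is one way they might look, using the population standard deviation (a common choice for descriptive summaries; divide by `n - 1` instead if you want the sample estimate):

```typescript
export function calculateMean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

export function calculateMedian(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Average the two middle values for even-length inputs
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

export function calculateStdDev(values: number[]): number {
  const mean = calculateMean(values);
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}
```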
Anomaly Explanation
Detect and explain unusual patterns:
interface Anomaly {
point: any;
metric: string;
deviation: number;
explanation: string;
likelyReason?: string;
}
async function detectAnomalies(
data: any[],
context: any
): Promise<Anomaly[]> {
const anomalies: Anomaly[] = [];
for (const column of identifyNumericColumns(data)) {
const values = data.map((row: any) => row[column]);
const mean = calculateMean(values);
const stdDev = calculateStdDev(values);
if (stdDev === 0) continue; // constant column: z-scores are undefined
for (let i = 0; i < data.length; i++) {
const value = values[i];
const zscore = Math.abs((value - mean) / stdDev);
if (zscore > 2.5) {
const explanation = await llm.generate(`
This data point is unusual:
Metric: ${column}
Expected: ${mean}
Actual: ${value}
Standard deviation: ${stdDev}
Context: ${JSON.stringify(context)}
Date: ${data[i].date}
Explain why this spike or drop might have occurred.
`);
anomalies.push({
point: data[i],
metric: column,
deviation: zscore,
explanation
});
}
}
}
return anomalies;
}
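The statistical half of `detectAnomalies` can be separated from the LLM call, which makes it unit-testable and lets the (slow, paid) explanation step run only on confirmed hits. A sketch using the same 2.5 z-score threshold as above:

```typescript
interface ZScoreHit {
  index: number;
  value: number;
  zscore: number;
}

// Flag points whose absolute z-score exceeds the threshold; the LLM
// explanation step can then run only on these hits.
export function zScoreOutliers(values: number[], threshold = 2.5): ZScoreHit[] {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance =
    values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  const stdDev = Math.sqrt(variance);
  if (stdDev === 0) return []; // constant series has no outliers
  return values
    .map((value, index) => ({
      index,
      value,
      zscore: Math.abs((value - mean) / stdDev)
    }))
    .filter(hit => hit.zscore > threshold);
}
```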
Trend Narration
Convert trend data into narrative:
async function narrateTrend(
data: any[],
timeColumn: string,
metricColumn: string
): Promise<string> {
const timeSeries = data.sort((a, b) =>
new Date(a[timeColumn]).getTime() - new Date(b[timeColumn]).getTime()
);
const values = timeSeries.map((row: any) => row[metricColumn]);
const trend = calculateTrendDirection(values);
const momentum = calculateMomentum(values);
const volatility = calculateVolatility(values);
const narrationPrompt = `
Convert this time series into a business narrative:
Metric: ${metricColumn}
Time range: ${timeSeries[0][timeColumn]} to ${timeSeries[timeSeries.length - 1][timeColumn]}
Values: ${values.slice(0, 10).join(', ')}... (${values.length} total points)
Trend: ${trend}
Momentum: ${momentum}
Volatility: ${volatility}
Write a 2-3 sentence summary suitable for an executive dashboard.
Include: current level, direction, and momentum.
`;
return llm.generate(narrationPrompt);
}
function calculateTrendDirection(values: number[]): string {
// Compare the first and last 10 points; note these windows overlap
// (and the result defaults to 'stable') when the series is short
const recent = values.slice(-10);
const older = values.slice(0, 10);
const recentAvg = calculateMean(recent);
const olderAvg = calculateMean(older);
if (recentAvg > olderAvg * 1.1) return 'increasing';
if (recentAvg < olderAvg * 0.9) return 'decreasing';
return 'stable';
}
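`calculateMomentum` and `calculateVolatility` are referenced in `narrateTrend` but never defined. Plausible definitions — momentum as the average recent period-over-period change, volatility as the coefficient of variation — might look like this (both are assumptions, not the article's canonical implementations):

```typescript
// Momentum: mean of the last few period-over-period changes.
export function calculateMomentum(values: number[], window = 5): number {
  const recent = values.slice(-(window + 1));
  const deltas = recent.slice(1).map((v, i) => v - recent[i]);
  if (deltas.length === 0) return 0; // not enough points to measure change
  return deltas.reduce((s, d) => s + d, 0) / deltas.length;
}

// Volatility: standard deviation relative to the mean
// (coefficient of variation), so it is comparable across metrics.
export function calculateVolatility(values: number[]): number {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance =
    values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  return mean === 0 ? 0 : Math.sqrt(variance) / Math.abs(mean);
}
```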
Report Generation Pipeline
Assemble insights into reports:
interface DataReport {
title: string;
executiveSummary: string;
insights: StatisticalInsight[];
charts: Chart[];
anomalies: Anomaly[];
recommendations: string[];
generatedAt: Date;
dataQuality: number; // 0-1
}
async function generateReport(
question: string,
analysisResults: Map<string, any>,
plan: AnalysisPlan
): Promise<DataReport> {
const data = analysisResults.get('query_0');
if (!data || data.length === 0) {
throw new Error('Analysis returned no data to report on');
}
const statistics = await interpretStatistics(data, plan.analysisType);
const anomalies = await detectAnomalies(data, {});
const charts = await generateCharts(analysisResults, plan);
const trendNarrative = await narrateTrend(
data,
identifyTimeColumn(data),
identifyMetricColumn(data)
);
const reportPrompt = `
Create an executive summary from this analysis:
Original question: "${question}"
Analysis type: ${plan.analysisType}
Key insights: ${statistics.map(s => s.interpretation).join('\n')}
Trends: ${trendNarrative}
Anomalies found: ${anomalies.length}
Write a 3-4 sentence executive summary with key takeaways and recommendations.
`;
const summary = await llm.generate(reportPrompt);
const recommendations = await generateRecommendations(statistics);
return {
title: `Analysis: ${question}`,
executiveSummary: summary,
insights: statistics,
charts,
anomalies,
recommendations,
generatedAt: new Date(),
dataQuality: await assessDataQuality(data)
};
}
async function generateRecommendations(
insights: StatisticalInsight[]
): Promise<string[]> {
const criticalInsights = insights.filter(i => i.significance === 'critical');
const recommendations = [];
for (const insight of criticalInsights) {
const rec = await llm.generate(`
Based on this insight: "${insight.interpretation}"
What action should the business take?
Keep to 1 sentence.
`);
recommendations.push(rec);
}
return recommendations;
}
Data Validation Before Analysis
Ensure data quality:
interface DataQualityReport {
totalRows: number;
nullCount: number;
duplicateCount: number;
outlierCount: number;
isAnalyzable: boolean;
warnings: string[];
}
async function validateData(data: any[]): Promise<DataQualityReport> {
const warnings: string[] = [];
let nullCount = 0;
let duplicateCount = 0;
let outlierCount = 0;
// Check for nulls
for (const row of data) {
for (const value of Object.values(row)) {
if (value === null || value === undefined || value === '') {
nullCount++;
}
}
}
// Check for duplicates
const seen = new Set();
for (const row of data) {
const key = JSON.stringify(row);
if (seen.has(key)) {
duplicateCount++;
}
seen.add(key);
}
// Count numeric outliers (|z-score| > 3) so the report can flag them
for (const col of identifyNumericColumns(data)) {
const values = data.map((row: any) => row[col]);
const mean = calculateMean(values);
const stdDev = calculateStdDev(values);
if (stdDev === 0) continue;
outlierCount += values.filter(v => Math.abs((v - mean) / stdDev) > 3).length;
}
const cellCount = data.length * Object.keys(data[0] || {}).length;
if (cellCount > 0 && nullCount / cellCount > 0.2) {
warnings.push('More than 20% null values');
}
if (data.length > 0 && duplicateCount / data.length > 0.05) {
warnings.push('More than 5% duplicate rows');
}
return {
totalRows: data.length,
nullCount,
duplicateCount,
outlierCount,
isAnalyzable: warnings.length === 0,
warnings
};
}
Iterative Analysis (Follow-Up Questions)
Support multi-turn analysis sessions:
interface AnalysisSession {
id: string;
userId: string;
initialQuestion: string;
messages: Array<{
role: 'user' | 'assistant';
content: string;
analysisResults?: any;
}>;
context: Map<string, any>;
}
async function handleFollowUp(
session: AnalysisSession,
followUpQuestion: string
): Promise<string> {
// Preserve context from previous questions
const context = Array.from(session.context.entries()).map(
([key, value]) => `${key}: ${JSON.stringify(value).slice(0, 100)}`
).join('\n');
const followUpPrompt = `
Previous analysis context:
${context}
Original question: "${session.initialQuestion}"
New follow-up: "${followUpQuestion}"
How does this follow-up relate to what we've already analyzed?
Can we answer it from existing data or do we need a new query?
`;
// Brittle: substring matching on free text; asking the model for a
// structured JSON answer is more reliable in production
const needsNewAnalysis = (await llm.generate(followUpPrompt)).includes('new query');
if (needsNewAnalysis) {
const plan = await planAnalysis({
question: followUpQuestion,
userId: session.userId,
datasets: Array.from(session.context.keys()),
constraints: { context: session.context }
});
return `New analysis needed for this follow-up. Running: ${plan.analysisType}`;
} else {
return `I can answer this from our previous analysis...`;
}
}
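The substring check above breaks whenever the model rephrases its answer. One safer option is prompting for JSON (e.g. `{ "needsNewQuery": true }`) and parsing it defensively, keeping the keyword match only as a fallback. A sketch — the `needsNewQuery` field name and the helper are hypothetical:

```typescript
// Parse the model's follow-up decision defensively: prefer structured
// JSON, fall back to a keyword heuristic on free text. Illustrative only.
export function parseFollowUpDecision(
  response: string
): 'new_query' | 'existing_data' {
  try {
    const parsed = JSON.parse(response);
    if (parsed.needsNewQuery === true) return 'new_query';
    if (parsed.needsNewQuery === false) return 'existing_data';
  } catch {
    // Not JSON; fall through to the keyword heuristic
  }
  return /new (query|analysis)/i.test(response) ? 'new_query' : 'existing_data';
}
```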
Multi-Dataset JOIN Handling
Combine data from multiple sources:
async function joinDatasets(
queries: string[],
joinKey: string
): Promise<any[]> {
const datasets = [];
for (const query of queries) {
const data = await executeQuery(query);
datasets.push(data);
}
if (datasets.length === 1) return datasets[0];
// Intelligently join datasets
let result = datasets[0];
for (let i = 1; i < datasets.length; i++) {
const joinPrompt = `
How should these datasets be joined?
Dataset 1 columns: ${Object.keys(result[0] || {}).join(', ')}
Dataset 2 columns: ${Object.keys(datasets[i][0] || {}).join(', ')}
Join key: ${joinKey}
Join type: left/inner/full outer?
Return JSON: { joinKey, joinType, matchingColumns }
`;
const joinStrategy = JSON.parse(await llm.generate(joinPrompt));
result = performJoin(result, datasets[i], joinStrategy);
}
return result;
}
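`performJoin` is called above but never defined. One possible shape is a hash join that indexes the right-hand dataset by the join key, supporting `left` and `inner` strategies (the `JoinStrategy` shape mirrors the JSON the prompt asks for; this is a sketch, not the article's implementation):

```typescript
interface JoinStrategy {
  joinKey: string;
  joinType: 'left' | 'inner';
}

// A minimal hash join. On duplicate right-side keys, the last row wins.
export function performJoin(
  left: Record<string, any>[],
  right: Record<string, any>[],
  strategy: JoinStrategy
): Record<string, any>[] {
  const { joinKey, joinType } = strategy;
  // Index the right side by join key for O(n + m) matching
  const index = new Map<any, Record<string, any>>();
  for (const row of right) index.set(row[joinKey], row);
  const result: Record<string, any>[] = [];
  for (const row of left) {
    const match = index.get(row[joinKey]);
    if (match) result.push({ ...match, ...row }); // left columns win on clash
    else if (joinType === 'left') result.push({ ...row });
  }
  return result;
}
```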
Sharing and Exporting AI Analysis
Make results accessible:
async function exportReport(
report: DataReport,
format: 'pdf' | 'html' | 'json' | 'csv'
): Promise<Buffer> {
switch (format) {
case 'pdf':
return generatePDF(report);
case 'html':
return generateHTML(report);
case 'json':
return Buffer.from(JSON.stringify(report, null, 2));
case 'csv':
return generateCSV(report);
}
}
async function shareReport(
report: DataReport,
recipientEmail: string,
format: string = 'pdf'
): Promise<void> {
const buffer = await exportReport(report, format as any);
await emailService.send({
to: recipientEmail,
subject: report.title,
body: `Your analysis: ${report.executiveSummary}`,
attachments: [{
filename: `analysis.${format}`,
content: buffer
}]
});
}
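`generatePDF`, `generateHTML`, and `generateCSV` are assumed above. A minimal HTML renderer might look like the following — note that the title and summary contain user- and LLM-generated text, so they must be escaped before interpolation (`renderReportHtml` and `escapeHtml` are illustrative names):

```typescript
// Minimal HTML rendering of a report. Escapes user-supplied strings to
// prevent HTML injection; chart SVGs would be embedded separately since
// we render those ourselves.
function escapeHtml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

export function renderReportHtml(report: {
  title: string;
  executiveSummary: string;
  recommendations: string[];
}): string {
  const recs = report.recommendations
    .map(r => `<li>${escapeHtml(r)}</li>`)
    .join('');
  return [
    `<h1>${escapeHtml(report.title)}</h1>`,
    `<p>${escapeHtml(report.executiveSummary)}</p>`,
    recs ? `<ul>${recs}</ul>` : ''
  ].join('\n');
}
```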
Checklist
- Convert natural language questions to SQL and Python code
- Execute queries and run data transformations
- Generate charts (line, bar, scatter, heatmap, pie) automatically
- Interpret statistics with plain-English explanations
- Detect anomalies with z-score analysis and explain them
- Narrate trends as business-ready copy
- Assemble full reports with summary, insights, and recommendations
- Validate data quality before analysis; warn on nulls > 20%, duplicates > 5%
- Support multi-turn conversations with context preservation
- Handle multi-dataset joins intelligently
- Export reports in PDF, HTML, JSON, and CSV formats
- Share reports via email with one-click actions
- Track analysis usage to identify common questions
- Cache analysis results for repeated questions
- Set confidence thresholds: only generate if > 80% data quality
Conclusion
AI-powered data analysis democratizes insights access. By converting natural language to executable analysis pipelines, generating charts, interpreting statistics, and packaging results into reports, you empower non-technical users to answer their own questions. Start with simple aggregations and trends, then expand to anomaly detection and predictive insights as your system matures.