Building a Code Generation Agent — From Spec to Tested Code

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Code generation agents automate turning specifications into working, tested code. Unlike simple LLM code completion, agents verify syntax, run tests in sandboxes, fix failures, and iterate until tests pass. This post covers spec parsing, code generation with examples, test execution, and automatic error fixes.
- Spec Parsing and Clarification
- Code Generation with Examples
- Syntax Validation
- Unit Test Generation
- Test Execution in Sandbox
- Error Reflection and Fix Loop
- Code Review Agent
- PR Description Generation
- Checklist
- Conclusion
Spec Parsing and Clarification
Before generating code, fully understand the specification.
```ts
interface CodeSpec {
  title: string;
  description: string;
  inputs: Array<{ name: string; type: string; description: string }>;
  outputs: Array<{ name: string; type: string; description: string }>;
  requirements: string[];
  examples: Array<{ input: Record<string, unknown>; output: unknown }>;
  constraints: string[];
}

class SpecParser {
  async parseAndClarify(rawSpec: string): Promise<CodeSpec> {
    // First, parse what's provided
    const initial = await this.parseSpec(rawSpec);
    // Then, ask the LLM to clarify ambiguities
    return this.clarifySpec(initial);
  }

  private async parseSpec(rawSpec: string): Promise<CodeSpec> {
    const prompt = `Parse this specification into structured format.

Return JSON: {
  "title": "...",
  "description": "...",
  "inputs": [{"name": "...", "type": "...", "description": "..."}],
  "outputs": [{"name": "...", "type": "...", "description": "..."}],
  "requirements": ["..."],
  "examples": [{"input": {...}, "output": ...}],
  "constraints": ["..."]
}

Spec:
${rawSpec}`;
    const response = await this.llmCall(prompt);
    try {
      return JSON.parse(response);
    } catch {
      throw new Error('Failed to parse specification');
    }
  }

  private async clarifySpec(spec: CodeSpec): Promise<CodeSpec> {
    const ambiguities: string[] = [];
    if (!spec.inputs || spec.inputs.length === 0) {
      ambiguities.push('No input parameters specified');
    }
    if (!spec.outputs || spec.outputs.length === 0) {
      ambiguities.push('No output format specified');
    }
    if (!spec.examples || spec.examples.length === 0) {
      ambiguities.push('No examples provided');
    }
    if (ambiguities.length === 0) {
      return spec;
    }
    // Ask the LLM to fill the gaps
    const clarifyPrompt = `The specification has gaps:
${ambiguities.map((a) => `- ${a}`).join('\n')}

Please provide clarifications for each gap.`;
    const clarifications = await this.llmCall(clarifyPrompt);
    // Fold the clarifications back into the spec so later stages can use them
    return {
      ...spec,
      description: `${spec.description}\n\nClarifications:\n${clarifications}`,
    };
  }

  private async llmCall(prompt: string): Promise<string> {
    // Call your LLM provider here
    return '';
  }
}
```
Good specs include examples, clear input/output types, and stated constraints.
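For concreteness, here is what a parsed spec might look like for a hypothetical `slugify` function. The values are illustrative, not the output of a real parser run; the shape matches the `CodeSpec` interface above.

```typescript
// A hypothetical parsed spec for a `slugify` function (illustrative only).
const slugifySpec = {
  title: 'slugify',
  description: 'Convert a string into a URL-safe slug.',
  inputs: [{ name: 'text', type: 'string', description: 'Raw input text' }],
  outputs: [{ name: 'slug', type: 'string', description: 'URL-safe slug' }],
  requirements: ['Lowercase the input', 'Replace whitespace runs with a single hyphen'],
  examples: [
    { input: { text: 'Hello World' }, output: 'hello-world' },
    { input: { text: '  Trim Me  ' }, output: 'trim-me' },
  ],
  constraints: ['Output must match /^[a-z0-9-]*$/'],
};
```

A spec this small already gives the generator two concrete test cases and a machine-checkable constraint.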
Code Generation with Examples
Generate code by providing LLM with similar examples.
```ts
interface GeneratedCode {
  code: string;
  language: string;
  isAsync: boolean;
  dependencies: string[];
}

class CodeGenerator {
  async generate(spec: CodeSpec, language: string = 'typescript'): Promise<GeneratedCode> {
    // Build a prompt with the spec and similar reference examples
    const exampleContext = await this.buildExampleContext(spec, language);
    const prompt = `Generate ${language} code based on this specification:

Title: ${spec.title}
Description: ${spec.description}
Inputs: ${JSON.stringify(spec.inputs, null, 2)}
Outputs: ${JSON.stringify(spec.outputs, null, 2)}

Requirements:
${spec.requirements.map((r) => `- ${r}`).join('\n')}

Constraints:
${spec.constraints.map((c) => `- ${c}`).join('\n')}

Examples:
${spec.examples.map((e) => `Input: ${JSON.stringify(e.input)}\nOutput: ${JSON.stringify(e.output)}`).join('\n\n')}

Reference implementations of similar problems:
${exampleContext}

Generate production-quality code that satisfies all requirements.`;
    const code = await this.llmCall(prompt);
    return {
      code: code.trim(),
      language,
      isAsync: code.includes('async'),
      dependencies: this.extractDependencies(code, language),
    };
  }

  private async buildExampleContext(spec: CodeSpec, language: string): Promise<string> {
    // Retrieve similar code examples from a knowledge base
    const query = `${spec.title} ${spec.description}`;
    const examples = await this.findSimilarExamples(query, language);
    return examples.map((e) => `// Example from ${e.source}:\n${e.code}`).join('\n\n');
  }

  private async findSimilarExamples(
    query: string,
    language: string,
  ): Promise<Array<{ source: string; code: string }>> {
    // In production: vector search over a corpus of solved problems
    return [];
  }

  private extractDependencies(code: string, language: string): string[] {
    const deps: string[] = [];
    if (language === 'typescript' || language === 'javascript') {
      const importRegex = /import\s+.*\s+from\s+['"]([^'"]+)['"]/g;
      let match;
      while ((match = importRegex.exec(code)) !== null) {
        // Relative imports are local files, not external dependencies
        if (!match[1].startsWith('.')) {
          deps.push(match[1]);
        }
      }
    }
    return [...new Set(deps)];
  }

  private async llmCall(prompt: string): Promise<string> {
    // Call your LLM provider here
    return '';
  }
}
```
Quality examples dramatically improve code generation output.
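The dependency-extraction step can be exercised in isolation. A standalone version of the same regex-based approach (sufficient for simple `import ... from '...'` statements; a real implementation would parse the AST):

```typescript
// Extract external package names from import statements.
// Relative imports (starting with '.') are local files and are skipped.
function extractDeps(code: string): string[] {
  const importRegex = /import\s+.*\s+from\s+['"]([^'"]+)['"]/g;
  const deps = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = importRegex.exec(code)) !== null) {
    if (!match[1].startsWith('.')) deps.add(match[1]);
  }
  return [...deps];
}
```

Note that this misses bare side-effect imports (`import 'polyfill'`) and dynamic `import()` calls, which is usually acceptable for a first pass.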
Syntax Validation
Validate syntax before trying to run tests.
```ts
interface SyntaxError {
  line: number;
  column: number;
  message: string;
  code: string;
}

class SyntaxValidator {
  async validate(code: string, language: string): Promise<SyntaxError[]> {
    if (language === 'typescript') {
      return this.validateTypeScript(code);
    }
    if (language === 'javascript') {
      return this.validateJavaScript(code);
    }
    if (language === 'python') {
      return this.validatePython(code);
    }
    return [];
  }

  private validateTypeScript(code: string): SyntaxError[] {
    // Cheap heuristics only; a real implementation should use the TypeScript
    // compiler API to get genuine diagnostics.
    const errors: SyntaxError[] = [];
    if (!code.includes('{') || !code.includes('}')) {
      errors.push({ line: 1, column: 0, message: 'Missing braces', code });
    }
    // Crude unclosed-string check: an odd number of quote characters
    const quoteCount = (code.match(/['"`]/g) ?? []).length;
    if (quoteCount % 2 !== 0) {
      errors.push({ line: 1, column: 0, message: 'Unclosed string literal', code });
    }
    return errors;
  }

  private validateJavaScript(code: string): SyntaxError[] {
    try {
      // Compiling as a function body surfaces syntax errors without running the code
      new Function(code);
      return [];
    } catch (error) {
      const msg = (error as Error).message;
      const match = msg.match(/line (\d+)/);
      const line = match ? parseInt(match[1], 10) : 1;
      return [{ line, column: 0, message: msg, code }];
    }
  }

  private validatePython(code: string): SyntaxError[] {
    // Shell out to `python -m py_compile` (or `ast.parse`) to check syntax
    return [];
  }
}
```
Catching syntax errors before test execution saves time and cloud credits.
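Heuristics like the brace check above are cheap but coarse. For JavaScript, Node's built-in `vm` module can compile code without executing it and surface real syntax errors; a minimal sketch:

```typescript
import vm from 'node:vm';

// Compile the code without running it; a thrown SyntaxError means invalid syntax.
function checkJsSyntax(code: string): { ok: boolean; message?: string } {
  try {
    new vm.Script(code);
    return { ok: true };
  } catch (error) {
    return { ok: false, message: (error as Error).message };
  }
}
```

For TypeScript, the `typescript` package's `transpileModule` with `reportDiagnostics: true` plays the same role.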
Unit Test Generation
Auto-generate tests from spec examples.
```ts
interface GeneratedTests {
  code: string;
  testFramework: string;
  testCount: number;
}

class TestGenerator {
  async generateTests(spec: CodeSpec, functionName: string): Promise<GeneratedTests> {
    const testCases = spec.examples.map((example, i) => {
      // Argument order follows the field order of the example input
      const args = Object.values(example.input)
        .map((v) => JSON.stringify(v))
        .join(', ');
      return `
test('Example ${i + 1}', async () => {
  const expected = ${JSON.stringify(example.output)};
  const result = await ${functionName}(${args});
  assert.deepStrictEqual(result, expected);
});`;
    });
    const testCode = `
import { test } from 'node:test';
import assert from 'node:assert';
import { ${functionName} } from './solution';
${testCases.join('\n')}
`;
    return {
      code: testCode,
      testFramework: 'node:test',
      testCount: spec.examples.length,
    };
  }
}
```
Tests generated from examples ensure the code matches the specification.
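The per-example emitter can be pulled out and checked directly. A standalone sketch of the same string templating (the `add` function name in the test below is hypothetical):

```typescript
// Emit a single node:test case from one spec example.
function emitTestCase(
  i: number,
  fn: string,
  input: Record<string, unknown>,
  expected: unknown,
): string {
  // Arguments follow the field order of the example input
  const args = Object.values(input).map((v) => JSON.stringify(v)).join(', ');
  return (
    `test('Example ${i}', async () => {\n` +
    `  assert.deepStrictEqual(await ${fn}(${args}), ${JSON.stringify(expected)});\n` +
    `});`
  );
}

const snippet = emitTestCase(1, 'add', { a: 2, b: 3 }, 5);
```

One caveat with this approach: `Object.values` relies on the example's key order matching the function's parameter order, which is worth asserting against `spec.inputs`.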
Test Execution in Sandbox
Run tests in an isolated sandbox environment.
```ts
interface TestResult {
  passed: boolean;
  passCount: number;
  failCount: number;
  output: string;
  errors: string[];
  duration: number;
}

class SandboxTestRunner {
  async runTests(code: string, tests: string, language: string): Promise<TestResult> {
    const startTime = Date.now();
    try {
      // Write code and tests to temp files
      const codeFile = `/tmp/solution.${this.getExtension(language)}`;
      const testFile = `/tmp/test.${this.getExtension(language)}`;
      await this.writeFile(codeFile, code);
      await this.writeFile(testFile, tests);
      // Execute in a sandbox (Docker container or similar)
      const result = await this.executeInSandbox(testFile, language);
      return {
        passed: result.exitCode === 0,
        passCount: this.countPassed(result.output),
        failCount: this.countFailed(result.output),
        output: result.output,
        errors: this.parseErrors(result.output),
        duration: Date.now() - startTime,
      };
    } catch (error) {
      return {
        passed: false,
        passCount: 0,
        failCount: 0,
        output: '',
        errors: [(error as Error).message],
        duration: Date.now() - startTime,
      };
    }
  }

  private async executeInSandbox(
    testFile: string,
    language: string,
  ): Promise<{ exitCode: number; output: string }> {
    // In production: use Docker or E2B for sandboxing, with a hard timeout
    // (e.g. 10 seconds) so runaway code cannot hang the pipeline
    return { exitCode: 0, output: 'Tests passed' };
  }

  private getExtension(language: string): string {
    return language === 'typescript' ? 'ts' : language === 'python' ? 'py' : 'js';
  }

  private async writeFile(path: string, content: string): Promise<void> {
    // e.g. fs.promises.writeFile(path, content)
  }

  private countPassed(output: string): number {
    const match = output.match(/(\d+) passed/);
    return match ? parseInt(match[1], 10) : 0;
  }

  private countFailed(output: string): number {
    const match = output.match(/(\d+) failed/);
    return match ? parseInt(match[1], 10) : 0;
  }

  private parseErrors(output: string): string[] {
    return output.split('\n').filter((line) => line.includes('Error'));
  }
}
```
Sandboxing prevents malicious or buggy code from affecting the system.
Error Reflection and Fix Loop
When tests fail, fix the code.
```ts
class FixingAgent {
  private validator = new SyntaxValidator();

  async generateAndFix(spec: CodeSpec, maxAttempts: number = 3): Promise<GeneratedCode> {
    let code = await this.generateCode(spec);
    const tests = await this.generateTests(spec);
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      // Test the current candidate
      const result = await this.runTests(code, tests);
      if (result.passed) {
        return {
          code,
          language: 'typescript',
          isAsync: code.includes('async'),
          dependencies: [],
        };
      }
      if (attempt === maxAttempts) break;
      // Feed the failure back to the LLM and ask for a fix
      const fixPrompt = `The code failed tests:

Errors:
${result.errors.join('\n')}

Test output:
${result.output}

Original code:
${code}

Fix the code to pass the tests. Only return the fixed code, no explanations.`;
      const fixed = await this.llmCall(fixPrompt);
      // Only accept the fix if it is syntactically valid
      const syntaxErrors = await this.validator.validate(fixed, 'typescript');
      if (syntaxErrors.length === 0) {
        code = fixed;
      }
    }
    throw new Error(`Failed to generate working code after ${maxAttempts} attempts`);
  }

  private async generateCode(spec: CodeSpec): Promise<string> {
    return '';
  }

  private async generateTests(spec: CodeSpec): Promise<string> {
    return '';
  }

  private async runTests(code: string, tests: string): Promise<TestResult> {
    return {
      passed: true,
      passCount: 0,
      failCount: 0,
      output: '',
      errors: [],
      duration: 0,
    };
  }

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }
}
```
Iterative fixing with test feedback converges on correct solutions.
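The control flow generalizes beyond code: generate a candidate, check it, repair on failure, bounded by an attempt budget. A minimal generic sketch (the function names are placeholders for the stages above):

```typescript
// Generic bounded fix loop: generate, check, repair, repeat up to maxAttempts.
async function fixLoop<T>(
  generate: () => Promise<T>,
  check: (candidate: T) => Promise<boolean>,
  repair: (candidate: T) => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let candidate = await generate();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await check(candidate)) return candidate;
    if (attempt === maxAttempts) break;
    candidate = await repair(candidate);
  }
  throw new Error(`No passing candidate after ${maxAttempts} attempts`);
}
```

Keeping the budget explicit prevents a single bad spec from burning unbounded LLM and sandbox time.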
Code Review Agent
Before accepting generated code, have an agent review it.
```ts
interface CodeReview {
  approved: boolean;
  issues: Array<{ severity: 'error' | 'warning' | 'info'; message: string }>;
  suggestions: string[];
}

class CodeReviewAgent {
  async review(code: string, spec: CodeSpec): Promise<CodeReview> {
    const issues: CodeReview['issues'] = [];
    // Check 1: Does it follow the spec?
    const meetsSpec = await this.checkSpecCompliance(code, spec);
    if (!meetsSpec) {
      issues.push({
        severity: 'error',
        message: 'Code does not match specification',
      });
    }
    // Check 2: Readability and style
    const styleIssues = await this.checkCodeStyle(code);
    issues.push(...styleIssues);
    // Check 3: Security issues
    const securityIssues = await this.checkSecurity(code);
    issues.push(...securityIssues);
    // Check 4: Performance
    const performanceIssues = await this.checkPerformance(code);
    issues.push(...performanceIssues);
    const suggestions = await this.generateSuggestions(code, spec);
    return {
      approved: issues.filter((i) => i.severity === 'error').length === 0,
      issues,
      suggestions,
    };
  }

  private async checkSpecCompliance(code: string, spec: CodeSpec): Promise<boolean> {
    // Ensure all inputs are used
    // Ensure all outputs are returned
    return true;
  }

  private async checkCodeStyle(code: string): Promise<CodeReview['issues']> {
    return [];
  }

  private async checkSecurity(code: string): Promise<CodeReview['issues']> {
    const issues: CodeReview['issues'] = [];
    if (code.includes('eval(')) {
      issues.push({
        severity: 'error',
        message: 'eval() is dangerous and forbidden',
      });
    }
    if (code.includes("require('fs')")) {
      issues.push({
        severity: 'warning',
        message: 'File system access should be restricted',
      });
    }
    return issues;
  }

  private async checkPerformance(code: string): Promise<CodeReview['issues']> {
    return [];
  }

  private async generateSuggestions(code: string, spec: CodeSpec): Promise<string[]> {
    return [];
  }
}
```
Code review catches issues that tests miss.
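Pattern-based checks like the `eval` test above extend naturally to a small deny-list. The patterns here are illustrative, not a complete security scanner:

```typescript
// Illustrative static checks for a few risky patterns in generated JS/TS.
const RISKY_PATTERNS: Array<{ pattern: RegExp; message: string }> = [
  { pattern: /\beval\s*\(/, message: 'eval() is dangerous and forbidden' },
  { pattern: /\bchild_process\b/, message: 'Spawning processes should be restricted' },
  { pattern: /\bprocess\.env\b/, message: 'Environment access may leak secrets' },
];

function scanForRiskyPatterns(code: string): string[] {
  return RISKY_PATTERNS.filter(({ pattern }) => pattern.test(code)).map(({ message }) => message);
}
```

Regexes on source text are easy to evade; an AST-based linter rule is the sturdier version of the same idea.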
PR Description Generation
Generate thoughtful PR descriptions for generated code.
```ts
interface PRDescription {
  title: string;
  body: string;
  closesIssues: string[];
}

class PRDescriptionGenerator {
  async generate(code: string, spec: CodeSpec, review: CodeReview): Promise<PRDescription> {
    const title = `feat: ${spec.title}`;
    const body = `## Description
${spec.description}

## Changes
- Implemented ${spec.title}
- Added ${spec.examples.length} test cases
- Code reviewed and approved
${review.issues.length > 0 ? `\n## Review Notes\n${review.issues.map((i) => `- ${i.message}`).join('\n')}` : ''}

## Testing
All ${spec.examples.length} examples passed

## Checklist
- [x] Code generated from specification
- [x] Tests generated and passing
- [x] Code reviewed`;
    return {
      title,
      body,
      closesIssues: [],
    };
  }
}
```
Checklist
- Spec: clarify ambiguities, extract examples
- Generation: use reference examples in prompts
- Validation: check syntax before testing
- Tests: auto-generate from spec examples
- Execution: run in sandbox with timeout
- Fixing: iterate on failures with LLM
- Review: check spec compliance, security, style
- PR: auto-generate description
Conclusion
Code generation agents turn specs into tested, reviewed code. The key is closing the loop: generate, test, fix, review, repeat. With proper validation at each step, generated code is reliable enough for production.