Building a Code Generation Agent — From Spec to Tested Code

Introduction

Code generation agents automate turning specifications into working, tested code. Unlike simple LLM code completion, agents verify syntax, run tests in sandboxes, fix failures, and iterate until tests pass. This post covers spec parsing, code generation with examples, test execution, and automatic error fixes.

Spec Parsing and Clarification

Before generating code, fully understand the specification.

interface CodeSpec {
  title: string;
  description: string;
  inputs: Array<{ name: string; type: string; description: string }>;
  outputs: Array<{ name: string; type: string; description: string }>;
  requirements: string[];
  examples: Array<{ input: Record<string, unknown>; output: unknown }>;
  constraints: string[];
}

class SpecParser {
  async parseAndClarify(rawSpec: string): Promise<CodeSpec> {
    // First, parse what's provided
    const initial = await this.parseSpec(rawSpec);

    // Then, ask LLM to clarify ambiguities
    const clarified = await this.clarifySpec(initial);

    return clarified;
  }

  private async parseSpec(rawSpec: string): Promise<CodeSpec> {
    const prompt = `Parse this specification into structured format.
Return JSON: {
  "title": "...",
  "description": "...",
  "inputs": [{"name": "...", "type": "...", "description": "..."}],
  "outputs": [{"name": "...", "type": "...", "description": "..."}],
  "requirements": ["..."],
  "examples": [{"input": {...}, "output": ...}],
  "constraints": ["..."]
}

Spec:
${rawSpec}`;

    const response = await this.llmCall(prompt);

    try {
      return JSON.parse(response);
    } catch {
      throw new Error('Failed to parse specification');
    }
  }

  private async clarifySpec(spec: CodeSpec): Promise<CodeSpec> {
    const ambiguities: string[] = [];

    if (!spec.inputs || spec.inputs.length === 0) {
      ambiguities.push('No input parameters specified');
    }

    if (!spec.outputs || spec.outputs.length === 0) {
      ambiguities.push('No output format specified');
    }

    if (!spec.examples || spec.examples.length === 0) {
      ambiguities.push('No examples provided');
    }

    if (ambiguities.length === 0) {
      return spec;
    }

    // Ask LLM to clarify
    const clarifyPrompt = `The specification has the following gaps:
${ambiguities.map((a) => `- ${a}`).join('\n')}

Provide a reasonable clarification for each gap.`;

    const clarifications = await this.llmCall(clarifyPrompt);

    // TODO: parse `clarifications` and merge them back into the spec;
    // returned unmodified here to keep the example short
    return spec;
  }

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }
}

Good specs include examples, clear input/output types, and stated constraints.
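For instance, a well-formed spec for a small utility might look like this (a hypothetical example; the field names follow the CodeSpec interface above):

```typescript
// Hypothetical example of a complete CodeSpec for a small utility
const slugifySpec = {
  title: 'Slugify a string',
  description: 'Convert an arbitrary title into a URL-safe slug.',
  inputs: [{ name: 'title', type: 'string', description: 'Raw title text' }],
  outputs: [{ name: 'slug', type: 'string', description: 'Lowercase, hyphen-separated slug' }],
  requirements: ['Lowercase all characters', 'Replace whitespace runs with a single hyphen'],
  examples: [
    { input: { title: 'Hello World' }, output: 'hello-world' },
    { input: { title: '  Spaced  Out  ' }, output: 'spaced-out' },
  ],
  constraints: ['No external dependencies'],
};
```

Each example doubles as a test case later in the pipeline, so the more examples the spec carries, the stronger the generated test suite.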

Code Generation with Examples

Generate code by providing the LLM with similar examples.

interface GeneratedCode {
  code: string;
  language: string;
  isAsync: boolean;
  dependencies: string[];
}

class CodeGenerator {
  async generate(spec: CodeSpec, language: string = 'typescript'): Promise<GeneratedCode> {
    // Build prompt with spec and examples
    const exampleContext = await this.buildExampleContext(spec, language);

    const prompt = `Generate ${language} code based on this specification:

Title: ${spec.title}
Description: ${spec.description}

Inputs: ${JSON.stringify(spec.inputs, null, 2)}
Outputs: ${JSON.stringify(spec.outputs, null, 2)}

Requirements:
${spec.requirements.map((r) => `- ${r}`).join('\n')}

Constraints:
${spec.constraints.map((c) => `- ${c}`).join('\n')}

Examples:
${spec.examples.map((e) => `Input: ${JSON.stringify(e.input)}\nOutput: ${JSON.stringify(e.output)}`).join('\n\n')}

Reference implementations in similar languages:
${exampleContext}

Generate production-quality code that passes all requirements.`;

    const code = await this.llmCall(prompt);

    return {
      code: code.trim(),
      language,
      isAsync: code.includes('async'),
      dependencies: this.extractDependencies(code, language),
    };
  }

  private async buildExampleContext(spec: CodeSpec, language: string): Promise<string> {
    // Retrieve similar code examples from a knowledge base
    const query = spec.title + ' ' + spec.description;
    const examples = await this.findSimilarExamples(query, language);

    return examples.map((e) => `// Example from ${e.source}:\n${e.code}`).join('\n\n');
  }

  private async findSimilarExamples(query: string, language: string): Promise<any[]> {
    // In production: vector search similar problems
    return [];
  }

  private extractDependencies(code: string, language: string): string[] {
    const deps: string[] = [];

    if (language === 'typescript' || language === 'javascript') {
      const importRegex = /import\s+.*\s+from\s+['"]([^'"]+)['"]/g;
      let match;

      while ((match = importRegex.exec(code)) !== null) {
        if (!match[1].startsWith('.')) {
          deps.push(match[1]);
        }
      }
    }

    return [...new Set(deps)];
  }

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }
}

Quality examples dramatically improve code generation output.
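The import-scanning approach in extractDependencies can be exercised standalone; a minimal sketch of the same regex scan:

```typescript
// Mirror of extractDependencies: collect non-relative module
// specifiers from ES import statements in a code string.
function extractDeps(code: string): string[] {
  const importRegex = /import\s+.*\s+from\s+['"]([^'"]+)['"]/g;
  const deps = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = importRegex.exec(code)) !== null) {
    if (!match[1].startsWith('.')) deps.add(match[1]);
  }
  return [...deps];
}

const sample = `
import fs from 'node:fs';
import { helper } from './utils';
import axios from 'axios';
`;
// Relative './utils' is skipped; only package specifiers remain.
```

A regex scan is good enough for flagging install candidates; resolving versions or handling dynamic import() calls would need a real parser.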

Syntax Validation

Validate syntax before trying to run tests.

interface SyntaxError {
  line: number;
  column: number;
  message: string;
  code: string;
}

class SyntaxValidator {
  async validate(code: string, language: string): Promise<SyntaxError[]> {
    if (language === 'typescript') {
      return this.validateTypeScript(code);
    }

    if (language === 'javascript') {
      return this.validateJavaScript(code);
    }

    if (language === 'python') {
      return this.validatePython(code);
    }

    return [];
  }

  private validateTypeScript(code: string): SyntaxError[] {
    const errors: SyntaxError[] = [];

    // Heuristic checks only; a production validator should use the
    // TypeScript compiler API (ts.createSourceFile plus diagnostics)
    const openBraces = (code.match(/\{/g) ?? []).length;
    const closeBraces = (code.match(/\}/g) ?? []).length;

    if (openBraces !== closeBraces) {
      errors.push({
        line: 1,
        column: 0,
        message: 'Unbalanced braces',
        code,
      });
    }

    // Count each quote character separately so a backtick is never
    // paired with a double quote; still fooled by apostrophes in comments
    for (const quote of ["'", '"', '`']) {
      const count = code.split(quote).length - 1;

      if (count % 2 !== 0) {
        errors.push({
          line: 1,
          column: 0,
          message: `Unclosed ${quote} string literal`,
          code,
        });
      }
    }

    return errors;
  }

  private validateJavaScript(code: string): SyntaxError[] {
    try {
      new Function(code);
      return [];
    } catch (error) {
      const msg = (error as Error).message;
      const match = msg.match(/line (\d+)/);
      const line = match ? parseInt(match[1]) : 1;

      return [
        {
          line,
          column: 0,
          message: msg,
          code,
        },
      ];
    }
  }

  private validatePython(code: string): SyntaxError[] {
    // Use Python subprocess to check syntax
    return [];
  }
}

Catching syntax errors before test execution saves time and cloud credits.
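For the JavaScript path, Node's built-in vm module gives a more robust check than string heuristics: new vm.Script(...) compiles the source without executing it and throws on syntax errors. A minimal sketch:

```typescript
import vm from 'node:vm';

// Compile, but never run, the candidate code. vm.Script throws a
// SyntaxError at construction time if the source does not parse.
function checkJsSyntax(code: string): string | null {
  try {
    new vm.Script(code, { filename: 'solution.js' });
    return null; // parses cleanly
  } catch (error) {
    return (error as Error).message;
  }
}
```

Unlike new Function(code), this does not wrap the source in a function body, so stray top-level return statements are correctly rejected, and nothing in the candidate code ever executes during the check.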

Unit Test Generation

Auto-generate tests from spec examples.

interface GeneratedTests {
  code: string;
  testFramework: string;
  testCount: number;
}

class TestGenerator {
  async generateTests(spec: CodeSpec, functionName: string): Promise<GeneratedTests> {
    const testTemplate = this.getTestTemplate(spec);

    const testCases = spec.examples.map((example, i) => {
      return `
test('Example ${i + 1}', async () => {
  const input = ${JSON.stringify(example.input)};
  const expected = ${JSON.stringify(example.output)};
  const result = await ${functionName}(${Object.values(example.input).map((v) => JSON.stringify(v)).join(', ')});
  assert.deepEqual(result, expected);
});`;
    });

    const testCode = `
import { test } from 'node:test';
import assert from 'node:assert';
import { ${functionName} } from './solution';

${testCases.join('\n')}
`;

    return {
      code: testCode,
      testFramework: 'node:test',
      testCount: spec.examples.length,
    };
  }

  private getTestTemplate(spec: CodeSpec): string {
    return `Test template for: ${spec.title}`;
  }
}

Tests generated from examples ensure the code matches specification.
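The per-example template above can be factored into a small renderer (a sketch; renderTestCase is a hypothetical helper, not part of the class above):

```typescript
// Render one spec example into a node:test case, mirroring the
// string template used by TestGenerator.
function renderTestCase(
  example: { input: Record<string, unknown>; output: unknown },
  index: number,
  functionName: string,
): string {
  const args = Object.values(example.input)
    .map((v) => JSON.stringify(v))
    .join(', ');
  return `test('Example ${index + 1}', async () => {
  const expected = ${JSON.stringify(example.output)};
  assert.deepEqual(await ${functionName}(${args}), expected);
});`;
}

const rendered = renderTestCase({ input: { a: 2, b: 3 }, output: 5 }, 0, 'add');
```

One caveat worth noting: spreading Object.values(example.input) as positional arguments assumes the spec lists inputs in the same order as the function's parameters.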

Test Execution in Sandbox

Run tests in an isolated sandbox environment.

interface TestResult {
  passed: boolean;
  passCount: number;
  failCount: number;
  output: string;
  errors: string[];
  duration: number;
}

class SandboxTestRunner {
  async runTests(code: string, tests: string, language: string): Promise<TestResult> {
    const startTime = Date.now();

    try {
      // Write code and tests to temp files
      const codeFile = `/tmp/solution.${this.getExtension(language)}`;
      const testFile = `/tmp/test.${this.getExtension(language)}`;

      await this.writeFile(codeFile, code);
      await this.writeFile(testFile, tests);

      // Execute in sandbox (Docker container or similar)
      const result = await this.executeInSandbox(testFile, language);

      return {
        passed: result.exitCode === 0,
        passCount: this.countPassed(result.output),
        failCount: this.countFailed(result.output),
        output: result.output,
        errors: this.parseErrors(result.output),
        duration: Date.now() - startTime,
      };
    } catch (error) {
      return {
        passed: false,
        passCount: 0,
        failCount: 0,
        output: '',
        errors: [(error as Error).message],
        duration: Date.now() - startTime,
      };
    }
  }

  private async executeInSandbox(
    testFile: string,
    language: string,
  ): Promise<{ exitCode: number; output: string }> {
    // In production: use Docker or E2B for sandboxing
    // For now: use Node subprocess with timeout
    const timeout = 10000; // 10 seconds

    return {
      exitCode: 0,
      output: 'Tests passed',
    };
  }

  private getExtension(language: string): string {
    return language === 'typescript' ? 'ts' : language === 'python' ? 'py' : 'js';
  }

  private async writeFile(path: string, content: string): Promise<void> {
    // Write to file
  }

  private countPassed(output: string): number {
    const match = output.match(/(\d+) passed/);
    return match ? parseInt(match[1]) : 0;
  }

  private countFailed(output: string): number {
    const match = output.match(/(\d+) failed/);
    return match ? parseInt(match[1]) : 0;
  }

  private parseErrors(output: string): string[] {
    return output.split('\n').filter((line) => line.includes('Error'));
  }
}

Sandboxing prevents malicious or buggy code from affecting the system.
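As a placeholder for a real container sandbox during development, a subprocess with a hard timeout can stand in (a sketch only; a subprocess is an isolation convenience, not a security boundary):

```typescript
import { execFileSync } from 'node:child_process';

// Run Node with the given args, capture output, enforce a timeout.
// In production, swap this for Docker/E2B container execution.
function runNode(args: string[], timeoutMs = 10_000): { exitCode: number; output: string } {
  try {
    const output = execFileSync('node', args, {
      timeout: timeoutMs,
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'pipe'],
    });
    return { exitCode: 0, output };
  } catch (error) {
    // Nonzero exit, timeout (process killed), or spawn failure all land here
    const err = error as { status?: number | null; stdout?: string; stderr?: string };
    return { exitCode: err.status ?? 1, output: (err.stdout ?? '') + (err.stderr ?? '') };
  }
}
```

The timeout matters as much as the isolation: generated code that loops forever should fail fast and feed an "execution timed out" error back into the fix loop.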

Error Reflection and Fix Loop

When tests fail, fix the code.

class FixingAgent {
  async generateAndFix(spec: CodeSpec, maxAttempts: number = 3): Promise<GeneratedCode> {
    let code = await this.generateCode(spec);
    let attempt = 0;

    while (attempt < maxAttempts) {
      // Test the code
      const tests = await this.generateTests(spec);
      const result = await this.runTests(code, tests);

      if (result.passed) {
        return {
          code,
          language: 'typescript',
          isAsync: code.includes('async'),
          dependencies: [],
        };
      }

      attempt++;

      if (attempt >= maxAttempts) {
        throw new Error(`Failed to generate working code after ${maxAttempts} attempts`);
      }

      // Ask agent to fix the code
      const fixPrompt = `The code failed tests:

Errors:
${result.errors.join('\n')}

Test output:
${result.output}

Original code:
${code}

Fix the code to pass the tests. Only return the fixed code, no explanations.`;

      const fixed = await this.llmCall(fixPrompt);

      // Validate syntax; only adopt the fix if it parses, otherwise the
      // next attempt retries from the previous version of the code
      const syntaxErrors = await this.validator.validate(fixed, 'typescript');

      if (syntaxErrors.length === 0) {
        code = fixed;
      }
    }

    throw new Error('Max fix attempts exceeded');
  }

  private async generateCode(spec: CodeSpec): Promise<string> {
    return '';
  }

  private async generateTests(spec: CodeSpec): Promise<string> {
    return '';
  }

  private async runTests(code: string, tests: string): Promise<TestResult> {
    return {
      passed: true,
      passCount: 0,
      failCount: 0,
      output: '',
      errors: [],
      duration: 0,
    };
  }

  private validator = new SyntaxValidator();

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }
}

Iterative fixing with test feedback converges on correct solutions.
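The shape of the loop is worth seeing in isolation. Below, generate, runTests, and fix are deterministic stand-ins (all hypothetical) so the convergence behavior is visible without an LLM:

```typescript
// Deterministic stand-ins: the "generated" code starts wrong and each
// "fix" nudges its number upward, mimicking LLM repair from feedback.
const generate = () => 'export const answer = 40;';
const runTests = (code: string) => ({ passed: code.includes('42') });
const fix = (code: string) => code.replace(/\d+/, (n) => String(Number(n) + 1));

function generateAndFix(maxAttempts = 5): { code: string; attempts: number } {
  let code = generate();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (runTests(code).passed) return { code, attempts: attempt };
    code = fix(code); // feed the failure back in and try again
  }
  throw new Error(`No passing code after ${maxAttempts} attempts`);
}
```

With a real LLM the trajectory is noisier, which is why the attempt cap matters: past a few iterations, regenerating from the spec usually beats patching a bad draft.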

Code Review Agent

Before accepting generated code, have an agent review it.

interface CodeReview {
  approved: boolean;
  issues: Array<{ severity: 'error' | 'warning' | 'info'; message: string }>;
  suggestions: string[];
}

class CodeReviewAgent {
  async review(code: string, spec: CodeSpec): Promise<CodeReview> {
    const issues: CodeReview['issues'] = [];

    // Check 1: Does it follow the spec?
    const meetsSpec = await this.checkSpecCompliance(code, spec);
    if (!meetsSpec) {
      issues.push({
        severity: 'error',
        message: 'Code does not match specification',
      });
    }

    // Check 2: Readability and style
    const styleIssues = await this.checkCodeStyle(code);
    issues.push(...styleIssues);

    // Check 3: Security issues
    const securityIssues = await this.checkSecurity(code);
    issues.push(...securityIssues);

    // Check 4: Performance
    const performanceIssues = await this.checkPerformance(code);
    issues.push(...performanceIssues);

    const suggestions = await this.generateSuggestions(code, spec);

    return {
      approved: issues.filter((i) => i.severity === 'error').length === 0,
      issues,
      suggestions,
    };
  }

  private async checkSpecCompliance(code: string, spec: CodeSpec): Promise<boolean> {
    // Ensure all inputs are used
    // Ensure all outputs are returned
    return true;
  }

  private async checkCodeStyle(code: string): Promise<CodeReview['issues']> {
    return [];
  }

  private async checkSecurity(code: string): Promise<CodeReview['issues']> {
    const issues: CodeReview['issues'] = [];

    if (code.includes('eval(')) {
      issues.push({
        severity: 'error',
        message: 'eval() is dangerous and forbidden',
      });
    }

    if (/require\(['"](?:node:)?fs['"]\)|from\s+['"](?:node:)?fs['"]/.test(code)) {
      issues.push({
        severity: 'warning',
        message: 'File system access should be restricted',
      });
    }

    return issues;
  }

  private async checkPerformance(code: string): Promise<CodeReview['issues']> {
    return [];
  }

  private async generateSuggestions(code: string, spec: CodeSpec): Promise<string[]> {
    return [];
  }
}

Code review catches issues that tests miss.
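The string-based security checks above generalize to a small pattern table (a sketch; the patterns are illustrative, not a complete denylist):

```typescript
type Severity = 'error' | 'warning' | 'info';

// Illustrative denylist only; real reviews should add AST-based tools
const SECURITY_PATTERNS: Array<{ pattern: RegExp; severity: Severity; message: string }> = [
  { pattern: /\beval\s*\(/, severity: 'error', message: 'eval() is dangerous and forbidden' },
  { pattern: /\bnew\s+Function\s*\(/, severity: 'error', message: 'Function constructor is eval in disguise' },
  { pattern: /['"](?:node:)?(?:fs|child_process)['"]/, severity: 'warning', message: 'File system / subprocess access should be restricted' },
];

function scanSecurity(code: string): Array<{ severity: Severity; message: string }> {
  return SECURITY_PATTERNS.filter(({ pattern }) => pattern.test(code)).map(
    ({ severity, message }) => ({ severity, message }),
  );
}
```

A table keeps policy separate from mechanism: tightening the rules means appending a pattern, not editing the review logic.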

PR Description Generation

Generate thoughtful PR descriptions for generated code.

interface PRDescription {
  title: string;
  body: string;
  closesIssues: string[];
}

class PRDescriptionGenerator {
  async generate(code: string, spec: CodeSpec, review: CodeReview): Promise<PRDescription> {
    const title = `feat: ${spec.title}`;

    const body = `## Description
${spec.description}

## Changes
- Implemented ${spec.title}
- Added ${spec.examples.length} test cases
- Code reviewed and approved

${review.issues.length > 0 ? `## Review Notes\n${review.issues.map((i) => `- ${i.message}`).join('\n')}` : ''}

## Testing
All ${spec.examples.length} examples passed

## Checklist
- [x] Code generated from specification
- [x] Tests generated and passing
- [x] Code reviewed`;

    return {
      title,
      body,
      closesIssues: [],
    };
  }
}

Checklist

  • Spec: clarify ambiguities, extract examples
  • Generation: use reference examples in prompts
  • Validation: check syntax before testing
  • Tests: auto-generate from spec examples
  • Execution: run in sandbox with timeout
  • Fixing: iterate on failures with LLM
  • Review: check spec compliance, security, style
  • PR: auto-generate description

Conclusion

Code generation agents turn specs into tested, reviewed code. The key is closing the loop: generate, test, fix, review, repeat. With proper validation at each step, generated code is reliable enough for production.