Plan-and-Execute — Reducing LLM Costs by 90% With Heterogeneous Agent Fleets

Author: Sanjeev Sharma (@webcoderspeed1)

Introduction
Running a frontier model (GPT-4o, Claude Opus) for every step of an agent's execution is expensive. The Plan-and-Execute pattern addresses this by using a high-capability model once to create a detailed plan, then dispatching each step to specialized, cheaper models. This post covers the pattern, cost calculations, failure modes, and production implementation.
- The Cost Problem
- Step Decomposition and Executor Selection
- Executor Selection by Task Type
- Handling Replanning on Failure
- Real-World Cost Calculation
- Cost Benchmarks and Monitoring
- Plan Quality Validation
- Checklist
- Conclusion
The Cost Problem
A typical agent workflow processes a user request across multiple steps:
- Analyze request
- Search for context
- Synthesize findings
- Write response
Running Claude Opus for all four steps costs roughly 4x the price of running it once. If Opus costs $15 per million tokens and the Haiku model costs $0.80 per million tokens, using Opus everywhere is expensive at scale.
The Plan-and-Execute pattern applies a cost-benefit analysis to each step, matching it to the cheapest model that can handle it:
Frontier model cost: $15/1M tokens
Standard model cost: $3/1M tokens
Fast model cost: $0.80/1M tokens
Cost without Plan-and-Execute:
4 steps × (1000 tokens × $15/1M) = $0.06 per request
Cost with Plan-and-Execute:
Planning step (2000 tokens × $15/1M) = $0.03
Search step (800 tokens × $0.80/1M) = $0.00064
Synthesis step (1000 tokens × $3/1M) = $0.003
Writing step (1200 tokens × $3/1M) = $0.0036
Total = $0.0372 per request (38% savings)
At scale, the difference compounds. For this same workflow at 100,000 requests/day:
Without optimization: $6,000/day
With Plan-and-Execute: $3,720/day
Annual savings: ~$832,000
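The arithmetic above can be captured in a small helper. `StepEstimate` and `estimateRequestCost` are illustrative names for this post, not part of any SDK:

```typescript
// Illustrative sketch: estimate per-request cost from per-step token counts
// and per-million-token prices.
interface StepEstimate {
  tokens: number; // expected tokens consumed by the step
  pricePerMillion: number; // model price in dollars per 1M tokens
}

function estimateRequestCost(steps: StepEstimate[]): number {
  return steps.reduce(
    (total, s) => total + (s.tokens / 1_000_000) * s.pricePerMillion,
    0
  );
}

// The worked example above: one frontier planning step plus cheaper executors.
const perRequest = estimateRequestCost([
  { tokens: 2000, pricePerMillion: 15 }, // planning (frontier)
  { tokens: 800, pricePerMillion: 0.8 }, // search (fast)
  { tokens: 1000, pricePerMillion: 3 }, // synthesis (standard)
  { tokens: 1200, pricePerMillion: 3 }, // writing (standard)
]);
// ~$0.0372 per request
```

Multiplying `perRequest` by daily request volume gives the at-scale figures above.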
Step Decomposition and Executor Selection
The planner breaks down the task and specifies which executor handles each step:
```typescript
import Anthropic from '@anthropic-ai/sdk';

interface ExecutionStep {
  id: string;
  type: 'search' | 'analyze' | 'code' | 'reasoning' | 'writing';
  prompt: string;
  requiredModel: 'fast' | 'standard' | 'frontier';
  timeout: number;
}

interface ExecutionPlan {
  goal: string;
  steps: ExecutionStep[];
  expectedCost: number;
}

const plannerClient = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function planExecution(userRequest: string): Promise<ExecutionPlan> {
  const response = await plannerClient.messages.create({
    model: 'claude-opus-4-1',
    max_tokens: 2048,
    system: `You are an execution planner. Break down user requests into specific steps.
For each step, specify:
- What needs to happen
- Which model should handle it (fast/standard/frontier based on complexity)
- Exact prompt for the executor
Return valid JSON.`,
    messages: [
      {
        role: 'user',
        content: `Plan execution for: ${userRequest}
Return JSON:
{
  "goal": "...",
  "steps": [
    {
      "id": "step-1",
      "type": "search|analyze|code|reasoning|writing",
      "prompt": "...",
      "requiredModel": "fast|standard|frontier",
      "timeout": 30000
    }
  ],
  "expectedCost": 0.05
}`,
      },
    ],
  });

  const content = response.content[0];
  if (content.type !== 'text') {
    throw new Error('Planner returned non-text response');
  }
  return JSON.parse(content.text) as ExecutionPlan;
}
```
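One practical wrinkle: even when asked for raw JSON, models sometimes wrap their output in markdown fences, which would make the bare `JSON.parse` call above throw. A defensive parsing sketch (`parsePlanJson` is an assumed helper name, not an SDK API):

```typescript
// Strip an optional ```json fence before parsing the planner's output.
// Hypothetical helper, not part of @anthropic-ai/sdk.
function parsePlanJson(raw: string): unknown {
  const trimmed = raw.trim();
  const fenced = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/);
  return JSON.parse(fenced ? fenced[1] : trimmed);
}
```

`planExecution` could call this in place of the bare `JSON.parse` to tolerate fenced responses.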
Executor Selection by Task Type
Each executor is optimized for its task type:
```typescript
type ModelChoice = 'haiku' | 'sonnet' | 'opus';

function selectModelForType(stepType: string): ModelChoice {
  switch (stepType) {
    case 'search':
      return 'haiku'; // Fast retrieval, minimal reasoning
    case 'analyze':
      return 'sonnet'; // Moderate reasoning required
    case 'code':
      return 'opus'; // Complex syntax, edge cases
    case 'reasoning':
      return 'opus'; // Deep thought required
    case 'writing':
      return 'sonnet'; // Fluent but not frontier-level
    default:
      return 'sonnet';
  }
}

const modelConfig: Record<
  ModelChoice,
  { id: string; costPer1mTokens: number; speed: number }
> = {
  haiku: {
    id: 'claude-3-5-haiku-20241022',
    costPer1mTokens: 0.8,
    speed: 800, // ms estimate
  },
  sonnet: {
    id: 'claude-3-5-sonnet-20241022',
    costPer1mTokens: 3,
    speed: 1500,
  },
  opus: {
    id: 'claude-opus-4-1',
    costPer1mTokens: 15,
    speed: 2500,
  },
};

// Reuse a single client rather than constructing one per step.
const executorClient = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function executeStep(
  step: ExecutionStep
): Promise<{ output: string; cost: number; tokensUsed: number }> {
  const modelChoice = selectModelForType(step.type);
  const model = modelConfig[modelChoice];

  const response = await executorClient.messages.create({
    model: model.id,
    max_tokens: 1024,
    messages: [{ role: 'user', content: step.prompt }],
  });

  const text =
    response.content[0].type === 'text' ? response.content[0].text : '';
  const tokensUsed =
    response.usage.input_tokens + response.usage.output_tokens;
  // Prices are per million tokens, so divide the raw token count by 1M.
  const cost = (tokensUsed / 1_000_000) * model.costPer1mTokens;

  return { output: text, cost, tokensUsed };
}
```
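Note that `executeStep` routes purely by `step.type` and ignores the `requiredModel` tier the planner emitted. One way to honor both signals, sketched here with hypothetical helper names, is to map the planner's tier to a model and take the more capable of the two choices:

```typescript
// Sketch: reconcile the type-based default with the planner's requested tier
// by picking whichever model is more capable. Helper names are illustrative.
const tierToModel = {
  fast: 'haiku',
  standard: 'sonnet',
  frontier: 'opus',
} as const;

const capabilityRank = { haiku: 0, sonnet: 1, opus: 2 } as const;

function resolveModel(
  byType: 'haiku' | 'sonnet' | 'opus',
  requiredTier: 'fast' | 'standard' | 'frontier'
): 'haiku' | 'sonnet' | 'opus' {
  const byTier = tierToModel[requiredTier];
  return capabilityRank[byTier] > capabilityRank[byType] ? byTier : byType;
}
```

Inside `executeStep`, `resolveModel(selectModelForType(step.type), step.requiredModel)` could then replace the direct `selectModelForType` call.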
Handling Replanning on Failure
When an executor fails, the planner adjusts the strategy:
```typescript
interface ExecutionResult {
  stepId: string;
  status: 'success' | 'failure';
  output?: string;
  error?: string;
  cost: number;
}

async function executeWithReplanning(
  plan: ExecutionPlan,
  maxRetries: number = 2
): Promise<{ outputs: Record<string, string>; totalCost: number }> {
  const outputs: Record<string, string> = {};
  let totalCost = 0;
  let currentPlan = plan;

  for (let retryCount = 0; retryCount <= maxRetries; retryCount++) {
    let allStepsSucceeded = true;

    for (const step of currentPlan.steps) {
      // Skip steps that already succeeded on a previous attempt.
      if (outputs[step.id] !== undefined) continue;
      try {
        const result = await executeStep(step);
        outputs[step.id] = result.output;
        totalCost += result.cost;
      } catch (error) {
        console.error(`Step ${step.id} failed: ${(error as Error).message}`);
        allStepsSucceeded = false;

        if (retryCount < maxRetries) {
          console.log(
            `Replanning after step ${step.id} failure (attempt ${retryCount + 1})`
          );
          // Ask the planner to adjust strategy
          const replanResponse = await plannerClient.messages.create({
            model: 'claude-opus-4-1',
            max_tokens: 2048,
            messages: [
              {
                role: 'user',
                content: `Step ${step.id} failed with error: ${(error as Error).message}
Current partial outputs: ${JSON.stringify(Object.entries(outputs).slice(0, 2))}
Replan the remaining steps with a different approach. Use stronger models if needed. Return the same JSON plan format.`,
              },
            ],
          });
          const replanText =
            replanResponse.content[0].type === 'text'
              ? replanResponse.content[0].text
              : '';
          currentPlan = JSON.parse(replanText) as ExecutionPlan;
          break; // Retry with the new plan
        } else {
          throw error;
        }
      }
    }

    if (allStepsSucceeded) {
      break;
    }
  }

  return { outputs, totalCost };
}
```
Real-World Cost Calculation
Track cost per request for analytics:
```typescript
interface RequestMetrics {
  requestId: string;
  userInput: string;
  planningCost: number;
  executionCosts: Map<string, number>;
  totalCost: number;
  planningTime: number;
  executionTime: number;
  totalTime: number;
}

async function processRequestWithMetrics(
  userRequest: string
): Promise<{ result: string; metrics: RequestMetrics }> {
  const requestId = crypto.randomUUID(); // global in Node 19+
  const startTime = Date.now();

  // Plan
  const planStart = Date.now();
  const plan = await planExecution(userRequest);
  const planningTime = Date.now() - planStart;

  // Execute
  const execStart = Date.now();
  const executionCosts = new Map<string, number>();
  const outputs: Record<string, string> = {};

  for (const step of plan.steps) {
    try {
      const result = await executeStep(step);
      outputs[step.id] = result.output;
      executionCosts.set(step.id, result.cost);
    } catch (error) {
      console.error(`Step ${step.id} failed:`, error);
      throw error;
    }
  }

  const executionTime = Date.now() - execStart;
  const totalTime = Date.now() - startTime;

  // Estimated; in production, compute from the planner response's usage field.
  const planningCost = 0.03;
  let executionCostTotal = 0;
  executionCosts.forEach((cost) => {
    executionCostTotal += cost;
  });

  const metrics: RequestMetrics = {
    requestId,
    userInput: userRequest,
    planningCost,
    executionCosts,
    totalCost: planningCost + executionCostTotal,
    planningTime,
    executionTime,
    totalTime,
  };

  return { result: JSON.stringify(outputs), metrics };
}
```
Cost Benchmarks and Monitoring
Monitor actual vs. planned costs:
```typescript
interface CostBudget {
  maxCostPerRequest: number;
  maxCostPerDay: number;
  currentDayCost: number;
  requestCount: number;
}

const budget: CostBudget = {
  maxCostPerRequest: 0.1,
  maxCostPerDay: 500,
  currentDayCost: 0,
  requestCount: 0,
};

async function checkBudgetAndProcess(
  userRequest: string
): Promise<string | null> {
  if (budget.currentDayCost > budget.maxCostPerDay) {
    console.warn('Daily budget exceeded. Queuing request.');
    return null;
  }

  const { result, metrics } = await processRequestWithMetrics(userRequest);

  if (metrics.totalCost > budget.maxCostPerRequest) {
    console.warn(
      `Request cost ${metrics.totalCost} exceeds budget ${budget.maxCostPerRequest}`
    );
  }

  budget.currentDayCost += metrics.totalCost;
  budget.requestCount += 1;

  // Log for analysis
  console.log(
    `Request ${metrics.requestId}: $${metrics.totalCost.toFixed(4)} (${metrics.totalTime}ms)`
  );

  return result;
}
```
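One gap worth noting: the `budget` object above never resets `currentDayCost`, so once the daily cap is hit every later request would be queued forever. A minimal rollover sketch, using UTC days (`DailyBudget` and `rollBudget` are illustrative names, not from the code above):

```typescript
// Sketch: zero the daily counter when the UTC date rolls over.
interface DailyBudget {
  maxCostPerDay: number;
  currentDayCost: number;
  day: string; // UTC date, e.g. '2025-01-15'
}

function rollBudget(budget: DailyBudget, now: Date): DailyBudget {
  const today = now.toISOString().slice(0, 10);
  return today === budget.day
    ? budget
    : { ...budget, currentDayCost: 0, day: today };
}
```

`checkBudgetAndProcess` could apply this at the top of each request, before checking the daily cap.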
Plan Quality Validation
Ensure the planner creates valid, executable plans:
```typescript
function validatePlan(plan: ExecutionPlan): boolean {
  if (!plan.steps || plan.steps.length === 0) {
    console.error('Plan has no steps');
    return false;
  }

  for (const step of plan.steps) {
    if (!step.id || !step.type || !step.prompt) {
      console.error(`Step missing required fields: ${JSON.stringify(step)}`);
      return false;
    }
    if (
      !['search', 'analyze', 'code', 'reasoning', 'writing'].includes(step.type)
    ) {
      console.error(`Invalid step type: ${step.type}`);
      return false;
    }
    if (step.prompt.length < 10) {
      console.error(`Step prompt too short: "${step.prompt.substring(0, 50)}"`);
      return false;
    }
  }

  if (plan.expectedCost <= 0 || plan.expectedCost > 1.0) {
    console.warn(`Suspicious expected cost: $${plan.expectedCost}`);
  }

  return true;
}
```
Checklist
- Understand frontier vs. standard vs. fast model trade-offs
- Implement plan-and-execute pattern
- Build executor selection logic by task type
- Add cost tracking and budget enforcement
- Implement replanning on executor failure
- Validate plan quality before execution
- Monitor actual vs. projected costs
Conclusion
Plan-and-Execute can reduce costs by 60-90% compared to using frontier models for every step; the short four-step example above saves 38%, and the savings grow as workflows get longer. The key insight is that planning is expensive but happens once per request, while execution steps are cheaper and numerous. By routing steps to the cheapest capable models and replanning on failures, you can build cost-efficient AI systems that scale. Start by measuring your baseline costs, implement the pattern, and monitor the savings. As your usage grows, this approach becomes increasingly valuable.