Idempotent AI Operations — Handling Retries Without Duplicate Side Effects
Introduction
LLM calls fail. Networks drop. Timeouts happen. When they do, clients retry. Without idempotency, retries create duplicates: duplicate API calls, duplicate database records, duplicate charges. This post covers making AI operations safe to retry.
- Why AI Operations Need Idempotency
- Idempotency Key Generation for LLM Requests
- Storing LLM Responses With Idempotency Keys in Redis
- Replay Cached Response on Retry
- Idempotent Tool Calls
- Database Upserts for AI-Generated Content
- Deduplication of Webhook-Triggered AI Jobs
- Idempotency Window Expiry
- Testing Idempotent AI Endpoints
- Checklist
- Conclusion
Why AI Operations Need Idempotency
AI operations are expensive. A failed LLM call can still cost money for the tokens consumed before it failed. Worse, if the call actually succeeded but the response was lost in transit, a retry spends money twice for one result.
Worse: side effects multiply. An AI decides to transfer $1000. The operation fails. The client retries. If not idempotent, you transfer $2000.
Idempotency means: "running this operation twice has the same effect as running it once."
// Bad: not idempotent
app.post('/api/transfer', async (req, res) => {
const balance = await db.accounts.findOne({ id: req.body.accountId });
if (balance.amount < req.body.amount) {
return res.status(400).json({ error: 'Insufficient funds' });
}
// Transfer happens
await db.accounts.updateOne(
{ id: req.body.accountId },
{ $inc: { amount: -req.body.amount } }
);
// If the response fails to send, client retries, and we transfer TWICE
res.json({ success: true });
});
// Good: idempotent
app.post('/api/transfer', async (req, res) => {
const idempotencyKey = req.headers['idempotency-key'];
// Check whether we've seen this key before.
// Note: get-then-set leaves a small race window under concurrent
// retries; an atomic SET ... NX closes it.
const existing = await redis.get(`idempotency:${idempotencyKey}`);
if (existing) {
return res.json(JSON.parse(existing)); // Return cached result
}
const balance = await db.accounts.findOne({ id: req.body.accountId });
if (balance.amount < req.body.amount) {
return res.status(400).json({ error: 'Insufficient funds' });
}
await db.accounts.updateOne(
{ id: req.body.accountId },
{ $inc: { amount: -req.body.amount } }
);
const result = { success: true };
await redis.setex(`idempotency:${idempotencyKey}`, 3600, JSON.stringify(result));
res.json(result);
});
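The get-then-set pattern above still has a small race: two concurrent retries can both miss the cache and both execute. A claim-then-execute sketch closes that window. This is illustrative only: a Map stands in for Redis, and with real Redis the claim would be a single `SET key value NX EX ttl`.

```typescript
// Claim-then-execute: atomically claim the key before doing the work.
// A Map stands in for Redis here; `claim` simulates SET ... NX.
const claims = new Map<string, string>();

function claim(key: string): boolean {
  if (claims.has(key)) return false; // key already claimed by another attempt
  claims.set(key, 'processing');
  return true;
}

function transferOnce(key: string, doTransfer: () => void): boolean {
  if (!claim(key)) return false; // duplicate request: skip the side effect
  try {
    doTransfer();
    return true;
  } catch (err) {
    claims.delete(key); // release the claim so a genuine retry can run
    throw err;
  }
}
```

A duplicate that arrives while the first attempt is still running gets `false` instead of a second transfer; the caller can then poll for the original result.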
Idempotency Key Generation for LLM Requests
Clients provide idempotency keys. A key should be a UUID, generated once per logical operation and reused on every retry of that operation.
// Client generates once, reuses on retry
const idempotencyKey = uuidv4();
try {
const response = await fetch('/api/generate', {
method: 'POST',
headers: {
'Idempotency-Key': idempotencyKey
},
body: JSON.stringify({ prompt: 'Summarize this document' })
});
return response.json();
} catch (error) {
// Retry with same key
const response = await fetch('/api/generate', {
method: 'POST',
headers: {
'Idempotency-Key': idempotencyKey // Same key
},
body: JSON.stringify({ prompt: 'Summarize this document' })
});
return response.json();
}
The server sees the same key and returns the cached result without reprocessing.
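Client-generated UUIDs require the client to persist the key across retries. An alternative worth considering (a sketch, not part of the API above) is deriving the key from the request content itself, so identical retries hash to the same key with nothing to store:

```typescript
import { createHash } from 'node:crypto';

// Derive a deterministic idempotency key from the request itself.
// Identical (userId, route, body) triples always hash to the same key,
// so resending the same payload deduplicates without a stored UUID.
// Caveat: JSON.stringify preserves insertion order, so in real use the
// body should be serialized canonically (e.g. with sorted keys).
function deriveIdempotencyKey(userId: string, route: string, body: unknown): string {
  const canonical = JSON.stringify({ userId, route, body });
  return createHash('sha256').update(canonical).digest('hex');
}
```

The trade-off: content-derived keys treat any identical request as a duplicate, so they only suit operations where "same input, same effect" is actually what you want.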
Storing LLM Responses With Idempotency Keys in Redis
Cache responses keyed by idempotency key.
async function handleLLMRequest(req, res) {
const idempotencyKey = req.headers['idempotency-key'];
if (!idempotencyKey) {
return res.status(400).json({
error: 'Idempotency-Key header required'
});
}
// Check cache
const cached = await redis.get(`llm_response:${idempotencyKey}`);
if (cached) {
res.setHeader('X-Idempotency-Replayed', 'true');
return res.json(JSON.parse(cached));
}
// Call LLM
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: req.body.prompt }],
temperature: req.body.temperature ?? 0.7 // ?? so an explicit 0 isn't overridden
});
const result = {
id: uuidv4(),
content: response.choices[0].message.content,
tokens: response.usage.total_tokens,
finishReason: response.choices[0].finish_reason
};
// Cache the response
const ttl = 86400; // 24 hours
await redis.setex(
`llm_response:${idempotencyKey}`,
ttl,
JSON.stringify(result)
);
// Also record cost against idempotency key
await db.costs.insertOne({
idempotencyKey,
userId: req.user.id,
tokens: result.tokens,
costUsd: (result.tokens / 1000) * 0.01,
timestamp: new Date()
});
res.json(result);
}
Cache TTL should match your retention policy. 24 hours is typical: long enough to handle retries, short enough to free memory.
Replay Cached Response on Retry
Replayed responses should be identical to the original response.
async function handleRequest(req, res) {
// Random fallback means requests without the header get no dedup
const idempotencyKey = req.headers['idempotency-key'] || uuidv4();
// Check whether we've processed this before
const cached = await redis.get(`request:${idempotencyKey}`);
if (cached) {
const { response, timestamp } = JSON.parse(cached);
// Add header indicating this is a replayed response
res.setHeader('X-From-Cache', 'true');
res.setHeader('X-Original-Request-Time', timestamp);
// Log the replay
logger.info('Request replayed from cache', {
idempotencyKey,
age: Date.now() - new Date(timestamp).getTime(),
originalCost: response.cost
});
// Important: do NOT recharge the user
return res.json(response);
}
// Process normally
const result = await processRequest(req);
// Store for replay
await redis.setex(
`request:${idempotencyKey}`,
3600,
JSON.stringify({
response: result,
timestamp: new Date().toISOString()
})
);
res.json(result);
}
Mark replayed responses with a header. This lets clients know the request was served from cache (old data, no new charge).
Idempotent Tool Calls
When an AI agent calls tools, the same tool call can arrive twice: after a timeout, a model retry, or a replayed conversation. Handle this.
// Tool: transfer funds
const transferTool = {
name: 'transfer_funds',
handler: async (params) => {
const toolCallId = params.toolCallId; // Unique per tool invocation
// Check if we've executed this tool call before
const existing = await redis.get(`tool_call:${toolCallId}`);
if (existing) {
return JSON.parse(existing);
}
// Execute tool: debit the source and credit the destination
// (in production, wrap both updates in a transaction)
await db.accounts.updateOne(
{ id: params.fromAccountId },
{ $inc: { amount: -params.amount } }
);
const result = await db.accounts.updateOne(
{ id: params.toAccountId },
{ $inc: { amount: params.amount } }
);
// Cache the result
await redis.setex(
`tool_call:${toolCallId}`,
3600,
JSON.stringify(result)
);
return result;
}
};
// AI agent calls tool
const toolCall = {
id: 'call_abc123',
type: 'function',
function: {
name: 'transfer_funds',
arguments: '{ "fromAccountId": "acc_1", "toAccountId": "acc_2", "amount": 100 }'
}
};
// Execute with deduplication
const result = await transferTool.handler({
...JSON.parse(toolCall.function.arguments),
toolCallId: toolCall.id // Use tool call ID for dedup
});
Tool calls have IDs. Use them for deduplication. If the same tool call ID arrives twice, return the cached result.
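The dedup logic above can be factored into a wrapper that works for any tool, not just transfers. A sketch, with an in-memory Map standing in for Redis:

```typescript
// Wrap a tool handler so repeated calls with the same toolCallId share
// one execution and one result. Caching the promise (not the resolved
// value) also dedups calls that arrive while the first is in flight.
function idempotentTool<P, R>(handler: (params: P) => Promise<R>) {
  const cache = new Map<string, Promise<R>>();
  return (toolCallId: string, params: P): Promise<R> => {
    let result = cache.get(toolCallId);
    if (result === undefined) {
      result = handler(params);
      cache.set(toolCallId, result);
    }
    return result;
  };
}

// Example: a counter shows the handler body runs once per toolCallId
let executions = 0;
const dedupedEcho = idempotentTool(async (text: string) => {
  executions += 1;
  return text;
});
```

A production version would also evict rejected promises from the cache, so a tool call that failed can be retried rather than replaying the failure forever.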
Database Upserts for AI-Generated Content
Upserts prevent duplicates when ingesting AI outputs.
// AI generates a summary
async function generateAndStoreSummary(documentId: string) {
const summary = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'user', content: `Summarize this document: ${documentId}` }
]
});
// Upsert: insert if new, update if exists
await db.summaries.updateOne(
{ documentId },
{
$set: {
content: summary.choices[0].message.content,
tokens: summary.usage.total_tokens,
model: 'gpt-4o',
generatedAt: new Date()
}
},
{ upsert: true }
);
}
Upserts are idempotent. Running them twice has the same effect as running once: the record is updated to the same final state.
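The upsert's idempotence can be seen in miniature with an in-memory store (a sketch; the Map stands in for the summaries collection):

```typescript
// In-memory version of the upsert above: insert if new, overwrite if
// it exists. Running it twice leaves one record in the same final state.
interface Summary {
  documentId: string;
  content: string;
}

const summaries = new Map<string, Summary>();

function upsertSummary(documentId: string, content: string): void {
  summaries.set(documentId, { documentId, content });
}
```

Contrast with a plain insert, which would leave two records after a retry and force every reader to pick one.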
Deduplication of Webhook-Triggered AI Jobs
Webhooks can fire multiple times. Deduplicate.
// Webhook endpoint
app.post('/api/webhooks/document-uploaded', async (req, res) => {
const eventId = req.body.eventId; // Unique per webhook event
// Check if we've processed this webhook before
const processed = await redis.get(`webhook:${eventId}`);
if (processed) {
return res.json({ message: 'Already processed' });
}
// Mark as processing atomically (NX: only set if not already set),
// so two concurrent deliveries can't both pass the check above
const claimed = await redis.set(`webhook:${eventId}`, 'processing', 'NX');
if (!claimed) {
return res.json({ message: 'Already processing' });
}
try {
// Enqueue job
const jobId = await queue.enqueue({
type: 'process_document',
documentId: req.body.documentId,
webhookEventId: eventId
});
// Mark as processed
await redis.set(`webhook:${eventId}`, JSON.stringify({ jobId }));
res.json({ jobId });
} catch (error) {
// Delete processing flag on error so retry can try again
await redis.del(`webhook:${eventId}`);
throw error;
}
});
// Worker
worker.on('process_document', async (job) => {
// Double-check we haven't processed this webhook before
const alreadyProcessed = await db.processedWebhooks.findOne({
webhookEventId: job.webhookEventId
});
if (alreadyProcessed) {
logger.info('Webhook already processed', { webhookEventId: job.webhookEventId });
return;
}
// Process document
const result = await processDocument(job.documentId);
// Record as processed
await db.processedWebhooks.insertOne({
webhookEventId: job.webhookEventId,
jobId: job.id,
result,
processedAt: new Date()
});
});
Handle webhooks at two levels: immediate dedup in Redis (fast rejection of obvious duplicates), then persistent dedup in the database (guard against job reprocessing).
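The two layers can be sketched in isolation; a Set stands in for Redis and a Map for the processedWebhooks collection (illustrative names, not the endpoint's actual API):

```typescript
// Layer 1: fast, volatile dedup at the webhook endpoint.
const fastSeen = new Set<string>();

function shouldEnqueue(eventId: string): boolean {
  if (fastSeen.has(eventId)) return false; // obvious duplicate, reject cheaply
  fastSeen.add(eventId);
  return true;
}

// Layer 2: durable dedup in the worker, surviving restarts and
// queue redelivery. Stores the result so duplicates can replay it.
const durableResults = new Map<string, string>();

function processJob(eventId: string, work: () => string): string {
  const prior = durableResults.get(eventId);
  if (prior !== undefined) return prior; // already processed: replay result
  const result = work();
  durableResults.set(eventId, result);
  return result;
}
```

Layer 1 alone is not enough because Redis can lose state; layer 2 alone is not enough because every duplicate still costs a queue round trip.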
Idempotency Window Expiry
Don't cache forever. Set expiry windows.
// Short window for sensitive operations (transfers, deletes)
const SENSITIVE_TTL = 3600; // 1 hour
// Longer window for expensive operations (summaries, embeddings)
const EXPENSIVE_TTL = 86400; // 24 hours
// Very short window for high-volume operations
const HIGH_VOLUME_TTL = 60; // 1 minute
async function cacheResponse(idempotencyKey, response, type) {
let ttl;
switch (type) {
case 'transfer':
case 'delete':
ttl = SENSITIVE_TTL;
break;
case 'summarize':
case 'embed':
ttl = EXPENSIVE_TTL;
break;
case 'search':
case 'list':
ttl = HIGH_VOLUME_TTL;
break;
default:
ttl = 3600;
}
await redis.setex(
`idempotency:${idempotencyKey}`,
ttl,
JSON.stringify(response)
);
}
Longer windows for expensive operations (they cost money, avoid recompute). Shorter windows for fast operations (save cache space).
Testing Idempotent AI Endpoints
Test that endpoints handle retries correctly.
async function testIdempotency() {
const idempotencyKey = uuidv4();
// First request
const response1 = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
const data1 = await response1.json();
const timestamp1 = response1.headers.get('X-Original-Request-Time');
// Retry with same key
const response2 = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
const data2 = await response2.json();
// Verify identical responses
assert.deepEqual(data1, data2);
assert.equal(response2.headers.get('X-From-Cache'), 'true');
assert.equal(response1.status, 200);
assert.equal(response2.status, 200);
// Verify only charged once
const costs = await db.costs.find({ idempotencyKey });
assert.equal(costs.length, 1);
}
// Test different prompt with same key should still return cached response
async function testIdempotencyKeyIgnoresBody() {
const idempotencyKey = uuidv4();
// First request
await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
// Retry with different prompt, same key
const response = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Goodbye' }) // Different
});
const data = await response.json();
// Should still return response from first request (Hello)
assert.equal(response.headers.get('X-From-Cache'), 'true');
}
Test that idempotent endpoints survive retries, duplicates, and different bodies with the same key. Note that stricter implementations (Stripe's API, for example) hash the request body and reject a reused key whose body differs; either behavior is defensible, but pick one and test for it.
Checklist
- Clients send Idempotency-Key header on state-changing requests
- Server checks Redis for cached response before processing
- Responses cached with TTL based on operation type
- Replayed responses marked with X-From-Cache header
- Tool calls deduplicated by tool call ID
- Database upserts used for AI-generated content
- Webhooks deduplicated at both Redis and database layers
- Idempotency window TTL longer for expensive operations
- Cost recorded once per idempotency key, not per attempt
- Idempotent endpoints tested in CI/CD
Conclusion
Idempotency is about safety: retries shouldn't have side effects. Implement it at two levels: cache checks in Redis (fast) and database deduplication (durable). Test relentlessly. The cost of a duplicate LLM call or a duplicate transfer is higher than the cost of caching.