Idempotent AI Operations — Handling Retries Without Duplicate Side Effects
Introduction
LLM calls fail. Networks drop. Timeouts happen. When they do, clients retry. Without idempotency, retries create duplicates: duplicate API calls, duplicate database records, duplicate charges. This post covers making AI operations safe to retry.
- Why AI Operations Need Idempotency
- Idempotency Key Generation for LLM Requests
- Storing LLM Responses With Idempotency Keys in Redis
- Replay Cached Response on Retry
- Idempotent Tool Calls
- Database Upserts for AI-Generated Content
- Deduplication of Webhook-Triggered AI Jobs
- Idempotency Window Expiry
- Testing Idempotent AI Endpoints
- Checklist
- Conclusion
Why AI Operations Need Idempotency
AI operations are expensive. A failed LLM call can still cost money for the tokens consumed before it failed. Worse, if the call actually succeeded but the response was lost in transit, a retry spends money twice for one result.
Worse: side effects multiply. An AI decides to transfer $1000. The operation fails. The client retries. If not idempotent, you transfer $2000.
Idempotency means: "running this operation twice has the same effect as running it once."
// Bad: not idempotent
app.post('/api/transfer', async (req, res) => {
const balance = await db.accounts.findOne({ id: req.body.accountId });
if (balance.amount < req.body.amount) {
return res.status(400).json({ error: 'Insufficient funds' });
}
// Transfer happens
await db.accounts.updateOne(
{ id: req.body.accountId },
{ $inc: { amount: -req.body.amount } }
);
// If the response fails to send, client retries, and we transfer TWICE
res.json({ success: true });
});
// Good: idempotent
app.post('/api/transfer', async (req, res) => {
const idempotencyKey = req.headers['idempotency-key'];
// Check whether we've seen this key before.
// Note: get-then-set leaves a small race window under concurrent
// retries; an atomic SET ... NX closes it.
const existing = await redis.get(`idempotency:${idempotencyKey}`);
if (existing) {
return res.json(JSON.parse(existing)); // Return cached result
}
const balance = await db.accounts.findOne({ id: req.body.accountId });
if (balance.amount < req.body.amount) {
return res.status(400).json({ error: 'Insufficient funds' });
}
await db.accounts.updateOne(
{ id: req.body.accountId },
{ $inc: { amount: -req.body.amount } }
);
const result = { success: true };
await redis.setex(`idempotency:${idempotencyKey}`, 3600, JSON.stringify(result));
res.json(result);
});
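The get-then-set pattern above still has a small race: two concurrent retries can both miss the cache and both execute. A claim-then-execute sketch closes that window. This is illustrative only: a Map stands in for Redis, and with real Redis the claim would be a single `SET key value NX EX ttl`.

```typescript
// Claim-then-execute: atomically claim the key before doing the work.
// A Map stands in for Redis here; `claim` simulates SET ... NX.
const claims = new Map<string, string>();

function claim(key: string): boolean {
  if (claims.has(key)) return false; // key already claimed by another attempt
  claims.set(key, 'processing');
  return true;
}

function transferOnce(key: string, doTransfer: () => void): boolean {
  if (!claim(key)) return false; // duplicate request: skip the side effect
  try {
    doTransfer();
    return true;
  } catch (err) {
    claims.delete(key); // release the claim so a genuine retry can run
    throw err;
  }
}
```

A duplicate that arrives while the first attempt is still running gets `false` instead of a second transfer; the caller can then poll for the original result.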
Idempotency Key Generation for LLM Requests
Clients provide idempotency keys. A key should be a UUID, generated once per logical operation and reused on every retry of that operation.
// Client generates once, reuses on retry
const idempotencyKey = uuidv4();
try {
const response = await fetch('/api/generate', {
method: 'POST',
headers: {
'Idempotency-Key': idempotencyKey
},
body: JSON.stringify({ prompt: 'Summarize this document' })
});
return response.json();
} catch (error) {
// Retry with same key
const response = await fetch('/api/generate', {
method: 'POST',
headers: {
'Idempotency-Key': idempotencyKey // Same key
},
body: JSON.stringify({ prompt: 'Summarize this document' })
});
return response.json();
}
The server sees the same key and returns the cached result without reprocessing.
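Client-generated UUIDs require the client to persist the key across retries. An alternative worth considering (a sketch, not part of the API above) is deriving the key from the request content itself, so identical retries hash to the same key with nothing to store:

```typescript
import { createHash } from 'node:crypto';

// Derive a deterministic idempotency key from the request itself.
// Identical (userId, route, body) triples always hash to the same key,
// so resending the same payload deduplicates without a stored UUID.
// Caveat: JSON.stringify preserves insertion order, so in real use the
// body should be serialized canonically (e.g. with sorted keys).
function deriveIdempotencyKey(userId: string, route: string, body: unknown): string {
  const canonical = JSON.stringify({ userId, route, body });
  return createHash('sha256').update(canonical).digest('hex');
}
```

The trade-off: content-derived keys treat any identical request as a duplicate, so they only suit operations where "same input, same effect" is actually what you want.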
Storing LLM Responses With Idempotency Keys in Redis
Cache responses keyed by idempotency key.
async function handleLLMRequest(req, res) {
const idempotencyKey = req.headers['idempotency-key'];
if (!idempotencyKey) {
return res.status(400).json({
error: 'Idempotency-Key header required'
});
}
// Check cache
const cached = await redis.get(`llm_response:${idempotencyKey}`);
if (cached) {
res.setHeader('X-Idempotency-Replayed', 'true');
return res.json(JSON.parse(cached));
}
// Call LLM
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: req.body.prompt }],
temperature: req.body.temperature ?? 0.7 // ?? so an explicit 0 isn't overridden
});
const result = {
id: uuidv4(),
content: response.choices[0].message.content,
tokens: response.usage.total_tokens,
finishReason: response.choices[0].finish_reason
};
// Cache the response
const ttl = 86400; // 24 hours
await redis.setex(
`llm_response:${idempotencyKey}`,
ttl,
JSON.stringify(result)
);
// Also record cost against idempotency key
await db.costs.insertOne({
idempotencyKey,
userId: req.user.id,
tokens: result.tokens,
costUsd: (result.tokens / 1000) * 0.01,
timestamp: new Date()
});
res.json(result);
}
Cache TTL should match your retention policy. 24 hours is typical: long enough to handle retries, short enough to free memory.
Replay Cached Response on Retry
Replayed responses should be identical to the original response.
async function handleRequest(req, res) {
// Random fallback means requests without the header get no dedup
const idempotencyKey = req.headers['idempotency-key'] || uuidv4();
// Check whether we've processed this before
const cached = await redis.get(`request:${idempotencyKey}`);
if (cached) {
const { response, timestamp } = JSON.parse(cached);
// Add header indicating this is a replayed response
res.setHeader('X-From-Cache', 'true');
res.setHeader('X-Original-Request-Time', timestamp);
// Log the replay
logger.info('Request replayed from cache', {
idempotencyKey,
age: Date.now() - new Date(timestamp).getTime(),
originalCost: response.cost
});
// Important: do NOT recharge the user
return res.json(response);
}
// Process normally
const result = await processRequest(req);
// Store for replay
await redis.setex(
`request:${idempotencyKey}`,
3600,
JSON.stringify({
response: result,
timestamp: new Date().toISOString()
})
);
res.json(result);
}
Mark replayed responses with a header. This lets clients know the request was served from cache (old data, no new charge).
Idempotent Tool Calls
When an AI agent calls tools, the same tool call can arrive twice: after a timeout, a model retry, or a replayed conversation. Handle this.
// Tool: transfer funds
const transferTool = {
name: 'transfer_funds',
handler: async (params) => {
const toolCallId = params.toolCallId; // Unique per tool invocation
// Check if we've executed this tool call before
const existing = await redis.get(`tool_call:${toolCallId}`);
if (existing) {
return JSON.parse(existing);
}
// Execute tool: debit the source and credit the destination
// (in production, wrap both updates in a transaction)
await db.accounts.updateOne(
{ id: params.fromAccountId },
{ $inc: { amount: -params.amount } }
);
const result = await db.accounts.updateOne(
{ id: params.toAccountId },
{ $inc: { amount: params.amount } }
);
// Cache the result
await redis.setex(
`tool_call:${toolCallId}`,
3600,
JSON.stringify(result)
);
return result;
}
};
// AI agent calls tool
const toolCall = {
id: 'call_abc123',
type: 'function',
function: {
name: 'transfer_funds',
arguments: '{ "fromAccountId": "acc_1", "toAccountId": "acc_2", "amount": 100 }'
}
};
// Execute with deduplication
const result = await transferTool.handler({
...JSON.parse(toolCall.function.arguments),
toolCallId: toolCall.id // Use tool call ID for dedup
});
Tool calls have IDs. Use them for deduplication. If the same tool call ID arrives twice, return the cached result.
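The dedup logic above can be factored into a wrapper that works for any tool, not just transfers. A sketch, with an in-memory Map standing in for Redis:

```typescript
// Wrap a tool handler so repeated calls with the same toolCallId share
// one execution and one result. Caching the promise (not the resolved
// value) also dedups calls that arrive while the first is in flight.
function idempotentTool<P, R>(handler: (params: P) => Promise<R>) {
  const cache = new Map<string, Promise<R>>();
  return (toolCallId: string, params: P): Promise<R> => {
    let result = cache.get(toolCallId);
    if (result === undefined) {
      result = handler(params);
      cache.set(toolCallId, result);
    }
    return result;
  };
}

// Example: a counter shows the handler body runs once per toolCallId
let executions = 0;
const dedupedEcho = idempotentTool(async (text: string) => {
  executions += 1;
  return text;
});
```

A production version would also evict rejected promises from the cache, so a tool call that failed can be retried rather than replaying the failure forever.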
Database Upserts for AI-Generated Content
Upserts prevent duplicates when ingesting AI outputs.
// AI generates a summary
async function generateAndStoreSummary(documentId: string) {
const summary = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'user', content: `Summarize this document: ${documentId}` }
]
});
// Upsert: insert if new, update if exists
await db.summaries.updateOne(
{ documentId },
{
$set: {
content: summary.choices[0].message.content,
tokens: summary.usage.total_tokens,
model: 'gpt-4o',
generatedAt: new Date()
}
},
{ upsert: true }
);
}
Upserts are idempotent. Running them twice has the same effect as running once: the record is updated to the same final state.
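The upsert's idempotence can be seen in miniature with an in-memory store (a sketch; the Map stands in for the summaries collection):

```typescript
// In-memory version of the upsert above: insert if new, overwrite if
// it exists. Running it twice leaves one record in the same final state.
interface Summary {
  documentId: string;
  content: string;
}

const summaries = new Map<string, Summary>();

function upsertSummary(documentId: string, content: string): void {
  summaries.set(documentId, { documentId, content });
}
```

Contrast with a plain insert, which would leave two records after a retry and force every reader to pick one.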
Deduplication of Webhook-Triggered AI Jobs
Webhooks can fire multiple times. Deduplicate.
// Webhook endpoint
app.post('/api/webhooks/document-uploaded', async (req, res) => {
const eventId = req.body.eventId; // Unique per webhook event
// Check if we've processed this webhook before
const processed = await redis.get(`webhook:${eventId}`);
if (processed) {
return res.json({ message: 'Already processed' });
}
// Mark as processing atomically (NX: only set if not already set),
// so two concurrent deliveries can't both pass the check above
const claimed = await redis.set(`webhook:${eventId}`, 'processing', 'NX');
if (!claimed) {
return res.json({ message: 'Already processing' });
}
try {
// Enqueue job
const jobId = await queue.enqueue({
type: 'process_document',
documentId: req.body.documentId,
webhookEventId: eventId
});
// Mark as processed
await redis.set(`webhook:${eventId}`, JSON.stringify({ jobId }));
res.json({ jobId });
} catch (error) {
// Delete processing flag on error so retry can try again
await redis.del(`webhook:${eventId}`);
throw error;
}
});
// Worker
worker.on('process_document', async (job) => {
// Double-check we haven't processed this webhook before
const alreadyProcessed = await db.processedWebhooks.findOne({
webhookEventId: job.webhookEventId
});
if (alreadyProcessed) {
logger.info('Webhook already processed', { webhookEventId: job.webhookEventId });
return;
}
// Process document
const result = await processDocument(job.documentId);
// Record as processed
await db.processedWebhooks.insertOne({
webhookEventId: job.webhookEventId,
jobId: job.id,
result,
processedAt: new Date()
});
});
Handle webhooks at two levels: immediate dedup in Redis (fast rejection of obvious duplicates), then persistent dedup in the database (guard against job reprocessing).
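The two layers can be sketched in isolation; a Set stands in for Redis and a Map for the processedWebhooks collection (illustrative names, not the endpoint's actual API):

```typescript
// Layer 1: fast, volatile dedup at the webhook endpoint.
const fastSeen = new Set<string>();

function shouldEnqueue(eventId: string): boolean {
  if (fastSeen.has(eventId)) return false; // obvious duplicate, reject cheaply
  fastSeen.add(eventId);
  return true;
}

// Layer 2: durable dedup in the worker, surviving restarts and
// queue redelivery. Stores the result so duplicates can replay it.
const durableResults = new Map<string, string>();

function processJob(eventId: string, work: () => string): string {
  const prior = durableResults.get(eventId);
  if (prior !== undefined) return prior; // already processed: replay result
  const result = work();
  durableResults.set(eventId, result);
  return result;
}
```

Layer 1 alone is not enough because Redis can lose state; layer 2 alone is not enough because every duplicate still costs a queue round trip.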
Idempotency Window Expiry
Don't cache forever. Set expiry windows.
// Short window for sensitive operations (transfers, deletes)
const SENSITIVE_TTL = 3600; // 1 hour
// Longer window for expensive operations (summaries, embeddings)
const EXPENSIVE_TTL = 86400; // 24 hours
// Very short window for high-volume operations
const HIGH_VOLUME_TTL = 60; // 1 minute
async function cacheResponse(idempotencyKey, response, type) {
let ttl;
switch (type) {
case 'transfer':
case 'delete':
ttl = SENSITIVE_TTL;
break;
case 'summarize':
case 'embed':
ttl = EXPENSIVE_TTL;
break;
case 'search':
case 'list':
ttl = HIGH_VOLUME_TTL;
break;
default:
ttl = 3600;
}
await redis.setex(
`idempotency:${idempotencyKey}`,
ttl,
JSON.stringify(response)
);
}
Longer windows for expensive operations (they cost money, avoid recompute). Shorter windows for fast operations (save cache space).
Testing Idempotent AI Endpoints
Test that endpoints handle retries correctly.
async function testIdempotency() {
const idempotencyKey = uuidv4();
// First request
const response1 = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
const data1 = await response1.json();
const timestamp1 = response1.headers.get('X-Original-Request-Time');
// Retry with same key
const response2 = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
const data2 = await response2.json();
// Verify identical responses
assert.deepEqual(data1, data2);
assert.equal(response2.headers.get('X-From-Cache'), 'true');
assert.equal(response1.status, 200);
assert.equal(response2.status, 200);
// Verify only charged once
const costs = await db.costs.find({ idempotencyKey });
assert.equal(costs.length, 1);
}
// Test different prompt with same key should still return cached response
async function testIdempotencyKeyIgnoresBody() {
const idempotencyKey = uuidv4();
// First request
await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Hello' })
});
// Retry with different prompt, same key
const response = await fetch('/api/generate', {
method: 'POST',
headers: { 'Idempotency-Key': idempotencyKey },
body: JSON.stringify({ prompt: 'Goodbye' }) // Different
});
const data = await response.json();
// Should still return response from first request (Hello)
assert.equal(response.headers.get('X-From-Cache'), 'true');
}
Test that idempotent endpoints survive retries, duplicates, and different bodies with the same key. Note that stricter implementations (Stripe's API, for example) hash the request body and reject a reused key whose body differs; either behavior is defensible, but pick one and test for it.
Checklist
- Clients send Idempotency-Key header on state-changing requests
- Server checks Redis for cached response before processing
- Responses cached with TTL based on operation type
- Replayed responses marked with X-From-Cache header
- Tool calls deduplicated by tool call ID
- Database upserts used for AI-generated content
- Webhooks deduplicated at both Redis and database layers
- Idempotency window TTL longer for expensive operations
- Cost recorded once per idempotency key, not per attempt
- Idempotent endpoints tested in CI/CD
Conclusion
Idempotency is about safety: retries shouldn't have side effects. Implement it at two levels: cache checks in Redis (fast) and database deduplication (durable). Test relentlessly. The cost of a duplicate LLM call or a duplicate transfer is higher than the cost of caching.