Error Handling Patterns for AI Applications — Timeouts, Retries, and Graceful Degradation

Introduction

LLM calls fail in ways that traditional APIs don't. Context windows overflow, rate limits trigger, streaming responses stall mid-generation, and models occasionally refuse requests. This post covers battle-tested error handling patterns that keep AI features resilient and users happy—even when things go wrong.

LLM Timeout Configuration

Configure timeouts explicitly. Don't rely on provider defaults—they're usually too generous for user-facing requests.

interface TimeoutConfig {
  connectionTimeoutMs: number;
  readTimeoutMs: number;
  totalTimeoutMs: number;
}

// Production timeout strategy
const timeoutConfigs: Record<string, TimeoutConfig> = {
  // Fast response required (user-facing, synchronous)
  lowLatency: {
    connectionTimeoutMs: 5000, // 5 seconds to establish connection
    readTimeoutMs: 25000, // 25 seconds for response
    totalTimeoutMs: 30000, // 30 seconds total request lifetime
  },

  // Background processing (async, non-blocking)
  backgroundTask: {
    connectionTimeoutMs: 10000,
    readTimeoutMs: 120000, // 2 minutes for background work
    totalTimeoutMs: 130000,
  },

  // Real-time streaming
  streaming: {
    connectionTimeoutMs: 5000,
    readTimeoutMs: 60000, // Reset per token for streaming
    totalTimeoutMs: 300000, // 5 minutes for full stream
  },
};

import Anthropic from "@anthropic-ai/sdk";

class TimeoutError extends Error {
  constructor(message: string, readonly timeoutMs: number) {
    super(message);
    this.name = "TimeoutError";
  }
}

async function callLLMWithTimeout(
  prompt: string,
  config: TimeoutConfig
): Promise<string> {
  const client = new Anthropic();
  const controller = new AbortController();

  const timeoutId = setTimeout(() => {
    controller.abort();
  }, config.totalTimeoutMs);

  try {
    const response = await client.messages.create(
      {
        model: "claude-3-5-sonnet-20241022",
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      },
      // The abort signal must be passed as a request option, or the
      // controller's abort() has no effect on the call
      { signal: controller.signal }
    );

    clearTimeout(timeoutId);
    return response.content[0].type === "text" ? response.content[0].text : "";
  } catch (error) {
    clearTimeout(timeoutId);

    if (
      error instanceof Anthropic.APIUserAbortError ||
      (error instanceof Error && error.name === "AbortError")
    ) {
      throw new TimeoutError(
        `Request exceeded ${config.totalTimeoutMs}ms timeout`,
        config.totalTimeoutMs
      );
    }

    throw error;
  }
}
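
The streaming config above notes that the read timeout should reset per token; the total-timeout pattern alone cannot do that. One way is a generic wrapper over any async iterable that restarts an idle timer on every chunk. This is a sketch—`withIdleTimeout` and `IdleTimeoutError` are our own names, not part of any SDK:

```typescript
class IdleTimeoutError extends Error {
  readonly idleMs: number;
  constructor(idleMs: number) {
    super(`No chunk received within ${idleMs}ms`);
    this.name = "IdleTimeoutError";
    this.idleMs = idleMs;
  }
}

// Wrap any async iterable (e.g. an SDK message stream) so the read timeout
// resets every time a chunk arrives, instead of covering the whole stream.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();

  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new IdleTimeoutError(idleMs)), idleMs);
    });

    try {
      // Whichever settles first wins: the next chunk or the idle timer
      const next = await Promise.race([iterator.next(), timeout]);
      if (next.done) return;
      yield next.value;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Pair this with the streaming `totalTimeoutMs` above: the idle timer catches a stalled stream quickly, while the total timeout still caps the overall duration.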

Exponential Backoff with Jitter

Retry failed requests with exponential backoff and randomized jitter to avoid thundering herd problems.

interface RetryConfig {
  maxRetries: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  jitterFraction: number; // 0.1 = 10% jitter
}

interface RetryResult<T> {
  success: boolean;
  data?: T;
  error?: Error;
  attempt: number;
  totalDurationMs: number;
}

async function retryWithExponentialBackoff<T>(
  fn: () => Promise<T>,
  config: RetryConfig
): Promise<RetryResult<T>> {
  const startTime = Date.now();
  let lastError: Error | undefined;

  for (let attempt = 1; attempt <= config.maxRetries; attempt++) {
    try {
      const data = await fn();
      return {
        success: true,
        data,
        attempt,
        totalDurationMs: Date.now() - startTime,
      };
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));

      // Check if error is retryable
      if (!isRetryableError(error)) {
        return {
          success: false,
          error: lastError,
          attempt,
          totalDurationMs: Date.now() - startTime,
        };
      }

      // Don't sleep after final attempt
      if (attempt < config.maxRetries) {
        const backoffMs = calculateBackoff(attempt, config);
        await sleep(backoffMs);
      }
    }
  }

  return {
    success: false,
    error: lastError,
    attempt: config.maxRetries,
    totalDurationMs: Date.now() - startTime,
  };
}

function calculateBackoff(attempt: number, config: RetryConfig): number {
  // Exponential: 2^(attempt-1) * initial
  const exponentialBackoff = Math.pow(2, attempt - 1) * config.initialBackoffMs;
  const capped = Math.min(exponentialBackoff, config.maxBackoffMs);

  // Add jitter: randomize by ±jitterFraction
  const jitterAmount = capped * config.jitterFraction;
  const jitter = Math.random() * jitterAmount * 2 - jitterAmount;

  return Math.max(0, capped + jitter);
}

function isRetryableError(error: unknown): boolean {
  if (!(error instanceof Error)) return false;

  const retryableMessages = [
    "429", // Rate limit
    "503", // Service unavailable
    "504", // Gateway timeout
    "ECONNRESET",
    "ETIMEDOUT",
    "ENOTFOUND",
  ];

  return retryableMessages.some((msg) =>
    error.message.toLowerCase().includes(msg.toLowerCase())
  );
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage
const result = await retryWithExponentialBackoff(
  () => generateAiResponse(userMessage),
  {
    maxRetries: 3,
    initialBackoffMs: 100,
    maxBackoffMs: 10000,
    jitterFraction: 0.1,
  }
);

if (!result.success) {
  console.error(
    `Failed after ${result.attempt} attempts: ${result.error?.message}`
  );
}
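
To sanity-check the schedule, here is `calculateBackoff` restated with the usage config above. With 10% jitter, each delay lands in a narrow envelope around the exponential base:

```typescript
interface RetryConfig {
  maxRetries: number;
  initialBackoffMs: number;
  maxBackoffMs: number;
  jitterFraction: number;
}

// Same math as calculateBackoff above: exponential growth, capped,
// then randomized by ±jitterFraction
function calculateBackoff(attempt: number, config: RetryConfig): number {
  const exponential = Math.pow(2, attempt - 1) * config.initialBackoffMs;
  const capped = Math.min(exponential, config.maxBackoffMs);
  const jitterAmount = capped * config.jitterFraction;
  const jitter = Math.random() * jitterAmount * 2 - jitterAmount;
  return Math.max(0, capped + jitter);
}

const config: RetryConfig = {
  maxRetries: 3,
  initialBackoffMs: 100,
  maxBackoffMs: 10000,
  jitterFraction: 0.1,
};

// Envelopes for this config:
//   attempt 1: 90–110ms, attempt 2: 180–220ms, attempt 3: 360–440ms
```

Because the delays are randomized, two clients that fail at the same moment retry at slightly different times instead of hammering the service in lockstep.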

Different Error Types and Handling

Distinguish between error types and apply appropriate recovery strategies.

type ErrorType = "RateLimit" | "ServerError" | "Timeout" | "ContextWindow" | "Auth" | "Unknown";

interface ErrorContext {
  type: ErrorType;
  statusCode?: number;
  message: string;
  retryable: boolean;
  userMessage: string;
}

function classifyError(error: unknown): ErrorContext {
  if (error instanceof Error) {
    const msg = error.message.toLowerCase();

    if (msg.includes("429") || msg.includes("rate limit")) {
      return {
        type: "RateLimit",
        message: error.message,
        retryable: true,
        userMessage:
          "We're experiencing high demand. Please try again in a moment.",
        statusCode: 429,
      };
    }

    if (
      msg.includes("500") ||
      msg.includes("502") ||
      msg.includes("503") ||
      msg.includes("server error")
    ) {
      return {
        type: "ServerError",
        message: error.message,
        retryable: true,
        userMessage: "Our service is temporarily unavailable. Please try again.",
        statusCode: 503,
      };
    }

    if (msg.includes("timeout") || msg.includes("etimedout")) {
      return {
        type: "Timeout",
        message: error.message,
        retryable: true,
        userMessage: "Request took too long. Please try a shorter query.",
        statusCode: 408,
      };
    }

    if (
      msg.includes("context") ||
      msg.includes("tokens") ||
      msg.includes("length")
    ) {
      return {
        type: "ContextWindow",
        message: error.message,
        retryable: false,
        userMessage: "Your message is too long. Please shorten it and try again.",
        statusCode: 413,
      };
    }

    if (
      msg.includes("401") ||
      msg.includes("403") ||
      msg.includes("unauthorized") ||
      msg.includes("forbidden")
    ) {
      return {
        type: "Auth",
        message: error.message,
        retryable: false,
        userMessage: "Authentication error. Please contact support.",
        statusCode: 401,
      };
    }
  }

  return {
    type: "Unknown",
    message: String(error),
    retryable: false,
    userMessage: "An unexpected error occurred. Please try again.",
  };
}

class ErrorHandler {
  async handleError(error: unknown): Promise<{ fallback: string; logged: boolean }> {
    const context = classifyError(error);

    // Log all errors for monitoring
    await this.logError(context);

    // Apply type-specific handling
    switch (context.type) {
      case "RateLimit":
        await this.handleRateLimit();
        break;
      case "ServerError":
        await this.handleServerError();
        break;
      case "ContextWindow":
        // Don't retry context window errors—they won't succeed
        break;
      case "Timeout":
        // Timeout might be retryable but needs different backoff
        break;
    }

    return {
      fallback: context.userMessage,
      logged: true,
    };
  }

  private async logError(context: ErrorContext): Promise<void> {
    // Send to monitoring service
    console.error(JSON.stringify(context));
  }

  private async handleRateLimit(): Promise<void> {
    // Implement backpressure: slow down requests
  }

  private async handleServerError(): Promise<void> {
    // Activate fallback service
  }
}

Context Window Exceeded Handling

Detect and handle context window overflow before it becomes a production issue.

interface ContextWindowCheckResult {
  exceedsWindow: boolean;
  estimatedTokens: number;
  maxTokens: number;
  buffer: number;
}

function estimateTokens(text: string): number {
  // Rough estimation: 1 token ≈ 4 characters
  // For exact counts, use the provider's tokenizer
  return Math.ceil(text.length / 4);
}

function checkContextWindow(
  systemPrompt: string,
  messages: Array<{ role: string; content: string }>,
  maxTokens: number,
  modelContextWindow: number = 200000
): ContextWindowCheckResult {
  const systemTokens = estimateTokens(systemPrompt);
  const messageTokens = messages.reduce(
    (sum, msg) => sum + estimateTokens(msg.content),
    0
  );
  const responseTokens = maxTokens;
  const totalTokens = systemTokens + messageTokens + responseTokens;
  const buffer = 500; // Safety buffer

  return {
    exceedsWindow: totalTokens + buffer > modelContextWindow,
    estimatedTokens: totalTokens,
    maxTokens: modelContextWindow,
    buffer,
  };
}

async function callLLMWithContextCheck(
  systemPrompt: string,
  messages: Array<{ role: "user" | "assistant"; content: string }>,
  maxTokens: number
): Promise<string> {
  const check = checkContextWindow(systemPrompt, messages, maxTokens);

  if (check.exceedsWindow) {
    // Strategy 1: Truncate conversation history
    const truncated = truncateMessages(messages, check.maxTokens * 0.6);

    if (truncated.length > 0) {
      messages = truncated as typeof messages;
    } else {
      // Strategy 2: Summarize early messages when even recent history won't fit
      const summary = await summarizeMessages(messages.slice(0, -5));
      messages = [
        { role: "user", content: `Summary of previous context: ${summary}` },
        messages[messages.length - 1],
      ];
    }
  }

  const client = new Anthropic();

  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: maxTokens,
    system: systemPrompt,
    messages: messages.map((m) => ({
      role: m.role,
      content: m.content,
    })),
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

function truncateMessages(
  messages: Array<{ role: string; content: string }>,
  maxTokens: number
): Array<{ role: string; content: string }> {
  let totalTokens = 0;
  const result = [];

  // Keep messages from the end (most recent first)
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = estimateTokens(messages[i].content);

    if (totalTokens + tokens <= maxTokens) {
      result.unshift(messages[i]);
      totalTokens += tokens;
    } else {
      break;
    }
  }

  return result;
}
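
A quick check of the truncation behavior, restating the two helpers above so the numbers can be verified standalone: under a tight token budget, only the most recent messages survive, in their original order.

```typescript
// Same rough estimate as above: 1 token ≈ 4 characters
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Same logic as truncateMessages above: walk backwards from the newest
// message, keeping messages until the budget is exhausted
function truncateMessages(
  messages: Array<{ role: string; content: string }>,
  maxTokens: number
): Array<{ role: string; content: string }> {
  let totalTokens = 0;
  const result: Array<{ role: string; content: string }> = [];

  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = estimateTokens(messages[i].content);
    if (totalTokens + tokens <= maxTokens) {
      result.unshift(messages[i]);
      totalTokens += tokens;
    } else {
      break;
    }
  }

  return result;
}

const history = [
  { role: "user", content: "x".repeat(400) }, // ~100 tokens, dropped
  { role: "assistant", content: "y".repeat(400) }, // ~100 tokens, kept
  { role: "user", content: "z".repeat(40) }, // ~10 tokens, kept
];

const kept = truncateMessages(history, 120);
// kept holds the assistant message followed by the final user message
```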

async function summarizeMessages(
  messages: Array<{ role: string; content: string }>
): Promise<string> {
  const client = new Anthropic();

  const summaryPrompt = `Summarize this conversation in 2-3 sentences:\n${messages
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n")}`;

  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 256,
    messages: [{ role: "user", content: summaryPrompt }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Graceful Degradation to Rule-Based Fallback

When AI fails, fall back to rule-based logic without user-visible errors.

type FallbackStrategy = "rule-based" | "cached" | "default" | "error-message";

interface DegradationConfig {
  aiTimeoutMs: number;
  fallbackStrategy: FallbackStrategy;
  enableLogging: boolean;
}

class ResilientAIService {
  async generateResponse(
    userInput: string,
    context: object,
    config: DegradationConfig
  ): Promise<{ response: string; source: "ai" | "fallback" }> {
    try {
      const response = await this.callLLMWithTimeout(
        userInput,
        config.aiTimeoutMs
      );
      return { response, source: "ai" };
    } catch (error) {
      if (config.enableLogging) {
        this.logDegradation(userInput, error);
      }

      // Apply fallback strategy
      const fallbackResponse = await this.getFallbackResponse(
        userInput,
        context,
        config.fallbackStrategy
      );

      return { response: fallbackResponse, source: "fallback" };
    }
  }

  private async getFallbackResponse(
    userInput: string,
    context: object,
    strategy: FallbackStrategy
  ): Promise<string> {
    switch (strategy) {
      case "rule-based":
        return this.generateRuleBasedResponse(userInput, context);

      case "cached":
        return (
          this.getCachedResponse(userInput) || this.generateRuleBasedResponse(userInput, context)
        );

      case "default":
        return "I'm temporarily unable to process that. Please try again in a moment.";

      case "error-message":
        return "An error occurred. Please contact support.";

      default:
        return "Unable to process request.";
    }
  }

  private generateRuleBasedResponse(
    userInput: string,
    context: object
  ): string {
    // Simple rule-based logic
    if (userInput.toLowerCase().includes("help")) {
      return "Need help? Check our documentation at help.example.com";
    }

    if (userInput.toLowerCase().includes("pricing")) {
      return "Our plans start at $9/month. Visit pricing.example.com for details.";
    }

    return "I'm unable to answer that right now. Please try a different question.";
  }

  private getCachedResponse(userInput: string): string | null {
    // Check cache for similar previous queries
    return null;
  }

  private async callLLMWithTimeout(
    userInput: string,
    timeoutMs: number
  ): Promise<string> {
    return "AI response";
  }

  private logDegradation(userInput: string, error: unknown): void {
    console.error({
      event: "ai-degradation",
      input: userInput,
      error: String(error),
      timestamp: new Date(),
    });
  }
}
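
The getCachedResponse method above is left as a stub. A minimal version might look like the sketch below: an in-memory map keyed on a normalized form of the input, with a TTL. The `ResponseCache` class and its details are illustrative assumptions, not a library API.

```typescript
// Illustrative sketch of a fallback response cache; not a library API
class ResponseCache {
  private entries = new Map<string, { response: string; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Collapse case and whitespace so near-identical queries share an entry
  private normalizeKey(input: string): string {
    return input.trim().toLowerCase().replace(/\s+/g, " ");
  }

  set(input: string, response: string): void {
    this.entries.set(this.normalizeKey(input), {
      response,
      expiresAt: Date.now() + this.ttlMs,
    });
  }

  get(input: string): string | null {
    const entry = this.entries.get(this.normalizeKey(input));
    if (!entry || Date.now() > entry.expiresAt) return null;
    return entry.response;
  }
}

const cache = new ResponseCache(60_000); // 1-minute TTL
cache.set("What is your pricing?", "Our plans start at $9/month.");
```

A production implementation would also bound the map's size (e.g. LRU eviction), and could match on embedding similarity rather than string normalization to widen cache hits.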

Partial Failure Handling (Tool Call Failures)

Handle failures mid-chain when using tool-calling patterns. Retry individual steps or gracefully degrade.

interface ToolCallConfig {
  toolName: string;
  args: Record<string, unknown>;
  retryable: boolean;
}

interface ToolResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

class ToolCallChain {
  private tools: Record<string, (args: object) => Promise<unknown>> = {
    search: async (args) => {
      // Search implementation
      return { results: [] };
    },
    fetch: async (args) => {
      // Fetch implementation
      return { content: "" };
    },
    calculate: async (args) => {
      // Calculate implementation
      return { result: 0 };
    },
  };

  async executeToolCall(config: ToolCallConfig): Promise<ToolResult> {
    const tool = this.tools[config.toolName];

    if (!tool) {
      return { success: false, error: `Tool not found: ${config.toolName}` };
    }

    try {
      const data = await tool(config.args);
      return { success: true, data };
    } catch (error) {
      if (!config.retryable) {
        return {
          success: false,
          error: `Tool ${config.toolName} failed: ${String(error)}`,
        };
      }

      // Retry once
      try {
        const data = await tool(config.args);
        return { success: true, data };
      } catch (retryError) {
        return {
          success: false,
          error: `Tool ${config.toolName} failed after retry: ${String(retryError)}`,
        };
      }
    }
  }

  async executeChain(
    toolCalls: ToolCallConfig[]
  ): Promise<ToolResult[]> {
    const results: ToolResult[] = [];

    for (const call of toolCalls) {
      const result = await this.executeToolCall(call);
      results.push(result);

      // The retryable flag doubles as a criticality marker here: retryable
      // tools are essential, so the chain stops when one still fails after
      // its retry. Non-retryable tools are best-effort, and the chain
      // continues past their failure.
      if (!result.success && call.retryable) {
        return results;
      }
    }

    return results;
  }
}

User-Facing Error Messages

Translate technical errors into user-friendly messages.

interface UserMessage {
  title: string;
  description: string;
  actionable: boolean;
  suggestedAction?: string;
}

function createUserMessage(error: unknown): UserMessage {
  const context = classifyError(error);

  const messages: Record<string, UserMessage> = {
    RateLimit: {
      title: "We're experiencing high volume",
      description:
        "Our service is temporarily busy. Please try again in a few moments.",
      actionable: true,
      suggestedAction: "Try again",
    },
    ServerError: {
      title: "Service temporarily unavailable",
      description: "We're experiencing technical difficulties. Please try again shortly.",
      actionable: true,
      suggestedAction: "Retry request",
    },
    Timeout: {
      title: "Request took too long",
      description: "Your request exceeded our time limit. Try with a simpler or shorter query.",
      actionable: true,
      suggestedAction: "Simplify your request",
    },
    ContextWindow: {
      title: "Message too long",
      description: "Your request exceeds our processing limits. Please use a shorter message.",
      actionable: true,
      suggestedAction: "Shorten your message",
    },
    Auth: {
      title: "Authentication error",
      description: "We couldn't verify your credentials. Please try logging in again.",
      actionable: true,
      suggestedAction: "Log in again",
    },
    Unknown: {
      title: "Something went wrong",
      description: "An unexpected error occurred. Our team has been notified.",
      actionable: false,
    },
  };

  return messages[context.type] || messages.Unknown;
}

Error Rate SLOs for AI Features

Monitor error rates and alert on SLO violations.

interface ErrorSLO {
  featureName: string;
  maxErrorRatePercent: number;
  windowMinutes: number;
  alertThresholdPercent: number;
}

class ErrorRateMonitor {
  private errorCounts = new Map<string, number>();
  private totalCounts = new Map<string, number>();

  recordRequest(
    featureName: string,
    success: boolean
  ): void {
    const now = Date.now();
    const key = `${featureName}:${Math.floor(now / 60000)}`; // 1-minute buckets

    const errors = this.errorCounts.get(key) || 0;
    const total = this.totalCounts.get(key) || 0;

    this.errorCounts.set(
      key,
      errors + (success ? 0 : 1)
    );
    this.totalCounts.set(key, total + 1);
  }

  checkSLO(slo: ErrorSLO): {
    inSLO: boolean;
    currentErrorRate: number;
    triggered: boolean;
  } {
    const now = Date.now();
    let totalErrors = 0;
    let totalRequests = 0;

    // Check last N minutes
    for (let i = 0; i < slo.windowMinutes; i++) {
      const bucketTime = now - i * 60000;
      const key = `${slo.featureName}:${Math.floor(bucketTime / 60000)}`;

      totalErrors += this.errorCounts.get(key) || 0;
      totalRequests += this.totalCounts.get(key) || 0;
    }

    const errorRate =
      totalRequests > 0 ? (totalErrors / totalRequests) * 100 : 0;
    const inSLO = errorRate <= slo.maxErrorRatePercent;
    const triggered = errorRate >= slo.alertThresholdPercent;

    return {
      inSLO,
      currentErrorRate: errorRate,
      triggered,
    };
  }
}
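
A condensed usage sketch of the monitor above, restated so the numbers can be checked standalone (note the maps hold one plain counter per minute bucket):

```typescript
interface ErrorSLO {
  featureName: string;
  maxErrorRatePercent: number;
  windowMinutes: number;
  alertThresholdPercent: number;
}

// Condensed restatement of ErrorRateMonitor above
class ErrorRateMonitor {
  private errorCounts = new Map<string, number>();
  private totalCounts = new Map<string, number>();

  recordRequest(featureName: string, success: boolean): void {
    const key = `${featureName}:${Math.floor(Date.now() / 60000)}`;
    this.errorCounts.set(key, (this.errorCounts.get(key) || 0) + (success ? 0 : 1));
    this.totalCounts.set(key, (this.totalCounts.get(key) || 0) + 1);
  }

  checkSLO(slo: ErrorSLO): { inSLO: boolean; currentErrorRate: number; triggered: boolean } {
    const now = Date.now();
    let totalErrors = 0;
    let totalRequests = 0;
    for (let i = 0; i < slo.windowMinutes; i++) {
      const key = `${slo.featureName}:${Math.floor((now - i * 60000) / 60000)}`;
      totalErrors += this.errorCounts.get(key) || 0;
      totalRequests += this.totalCounts.get(key) || 0;
    }
    const errorRate = totalRequests > 0 ? (totalErrors / totalRequests) * 100 : 0;
    return {
      inSLO: errorRate <= slo.maxErrorRatePercent,
      currentErrorRate: errorRate,
      triggered: errorRate >= slo.alertThresholdPercent,
    };
  }
}

const monitor = new ErrorRateMonitor();
// 95 successes and 5 failures recorded in the current window → 5% error rate
for (let i = 0; i < 95; i++) monitor.recordRequest("chat", true);
for (let i = 0; i < 5; i++) monitor.recordRequest("chat", false);

const status = monitor.checkSLO({
  featureName: "chat",
  maxErrorRatePercent: 2,
  windowMinutes: 5,
  alertThresholdPercent: 4,
});
// status: { inSLO: false, currentErrorRate: 5, triggered: true }
```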

Checklist

  • Set explicit timeouts: under 30 seconds for user-facing requests, under 2 minutes for background tasks
  • Implement exponential backoff with jitter to avoid thundering herd
  • Distinguish error types and apply type-specific recovery strategies
  • Check context window before sending requests, truncate if needed
  • Design graceful degradation with rule-based fallback
  • Handle partial tool-call failures in chains
  • Translate technical errors to user-friendly messages
  • Monitor error rates and alert on SLO violations

Conclusion

Error handling in AI systems requires a layered approach. Configure timeouts explicitly, retry intelligently with exponential backoff, and classify errors to apply appropriate recovery strategies. Build in graceful degradation from the start so that when AI fails, users still get a useful response from rule-based fallbacks. Monitor error rates continuously and alert when SLOs breach. By treating error handling as a first-class concern, you'll build AI features that users can rely on.