Error Handling Patterns for AI Applications — Timeouts, Retries, and Graceful Degradation
Introduction
LLM calls fail in ways that traditional APIs don't. Context windows overflow, rate limits trigger, requests time out mid-stream, and models occasionally refuse requests. This post covers battle-tested error handling patterns that keep AI features resilient and users happy, even when things go wrong.
- LLM Timeout Configuration
- Exponential Backoff with Jitter
- Different Error Types and Handling
- Context Window Exceeded Handling
- Graceful Degradation to Rule-Based Fallback
- Partial Failure Handling (Tool Call Failures)
- User-Facing Error Messages
- Error Rate SLOs for AI Features
- Checklist
- Conclusion
LLM Timeout Configuration
Configure timeouts explicitly. Don't rely on provider defaults: they're usually too generous for user-facing requests.
interface TimeoutConfig {
connectionTimeoutMs: number;
readTimeoutMs: number;
totalTimeoutMs: number;
}
// Production timeout strategy
const timeoutConfigs: Record<string, TimeoutConfig> = {
// Fast response required (user-facing, synchronous)
lowLatency: {
connectionTimeoutMs: 5000, // 5 seconds to establish connection
readTimeoutMs: 25000, // 25 seconds for response
totalTimeoutMs: 30000, // 30 seconds total request lifetime
},
// Background processing (async, non-blocking)
backgroundTask: {
connectionTimeoutMs: 10000,
readTimeoutMs: 120000, // 2 minutes for background work
totalTimeoutMs: 130000,
},
// Real-time streaming
streaming: {
connectionTimeoutMs: 5000,
readTimeoutMs: 60000, // Reset per token for streaming
totalTimeoutMs: 300000, // 5 minutes for full stream
},
};
import Anthropic from "@anthropic-ai/sdk";
class TimeoutError extends Error {
constructor(message: string, readonly timeoutMs: number) {
super(message);
this.name = "TimeoutError";
}
}
async function callLLMWithTimeout(
prompt: string,
config: TimeoutConfig
): Promise<string> {
const client = new Anthropic();
const controller = new AbortController();
const timeoutId = setTimeout(() => {
controller.abort();
}, config.totalTimeoutMs);
try {
const response = await client.messages.create(
{
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [{ role: "user", content: prompt }],
},
// Pass the abort signal so the timeout actually cancels the in-flight request
{ signal: controller.signal }
);
clearTimeout(timeoutId);
return response.content[0].type === "text" ? response.content[0].text : "";
} catch (error) {
clearTimeout(timeoutId);
// The SDK may wrap aborts in its own error type; checking our signal is robust
if (controller.signal.aborted) {
throw new TimeoutError(
`Request exceeded ${config.totalTimeoutMs}ms timeout`,
config.totalTimeoutMs
);
}
throw error;
}
}
Exponential Backoff with Jitter
Retry failed requests with exponential backoff and randomized jitter to avoid thundering herd problems.
interface RetryConfig {
maxRetries: number;
initialBackoffMs: number;
maxBackoffMs: number;
jitterFraction: number; // 0.1 = 10% jitter
}
interface RetryResult<T> {
success: boolean;
data?: T;
error?: Error;
attempt: number;
totalDurationMs: number;
}
async function retryWithExponentialBackoff<T>(
fn: () => Promise<T>,
config: RetryConfig
): Promise<RetryResult<T>> {
const startTime = Date.now();
let lastError: Error | undefined;
for (let attempt = 1; attempt <= config.maxRetries; attempt++) {
try {
const data = await fn();
return {
success: true,
data,
attempt,
totalDurationMs: Date.now() - startTime,
};
} catch (error) {
lastError = error instanceof Error ? error : new Error(String(error));
// Check if error is retryable
if (!isRetryableError(error)) {
return {
success: false,
error: lastError,
attempt,
totalDurationMs: Date.now() - startTime,
};
}
// Don't sleep after final attempt
if (attempt < config.maxRetries) {
const backoffMs = calculateBackoff(attempt, config);
await sleep(backoffMs);
}
}
}
return {
success: false,
error: lastError,
attempt: config.maxRetries,
totalDurationMs: Date.now() - startTime,
};
}
function calculateBackoff(attempt: number, config: RetryConfig): number {
// Exponential: 2^(attempt-1) * initial
const exponentialBackoff = Math.pow(2, attempt - 1) * config.initialBackoffMs;
const capped = Math.min(exponentialBackoff, config.maxBackoffMs);
// Add jitter: randomize by ±jitterFraction
const jitterAmount = capped * config.jitterFraction;
const jitter = Math.random() * jitterAmount * 2 - jitterAmount;
return Math.max(0, capped + jitter);
}
function isRetryableError(error: unknown): boolean {
if (!(error instanceof Error)) return false;
const retryableMessages = [
"429", // Rate limit
"503", // Service unavailable
"504", // Gateway timeout
"ECONNRESET",
"ETIMEDOUT",
"ENOTFOUND",
];
return retryableMessages.some((msg) =>
error.message.toLowerCase().includes(msg.toLowerCase())
);
}
function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
// Usage
const result = await retryWithExponentialBackoff(
() => generateAiResponse(userMessage),
{
maxRetries: 3,
initialBackoffMs: 100,
maxBackoffMs: 10000,
jitterFraction: 0.1,
}
);
if (!result.success) {
console.error(
`Failed after ${result.attempt} attempts: ${result.error?.message}`
);
}
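Before shipping a retry config, it helps to eyeball the wait schedule it produces. The sketch below (standalone, jitter omitted) replays the exponential formula from `calculateBackoff`:

```typescript
// Jitter-free backoff schedule: 2^(attempt-1) * initial, capped at max.
// Only maxRetries - 1 waits occur, since there is no sleep after the last attempt.
function backoffSchedule(
  maxRetries: number,
  initialBackoffMs: number,
  maxBackoffMs: number
): number[] {
  const schedule: number[] = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    const raw = Math.pow(2, attempt - 1) * initialBackoffMs;
    schedule.push(Math.min(raw, maxBackoffMs));
  }
  return schedule;
}

// With the config from the usage example above:
console.log(backoffSchedule(3, 100, 10000)); // [100, 200]
// A longer run shows the cap kicking in:
console.log(backoffSchedule(6, 500, 4000)); // [500, 1000, 2000, 4000, 4000]
```

In production the ±10% jitter spreads these waits out so that concurrent clients don't retry in lockstep.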
Different Error Types and Handling
Distinguish between error types and apply appropriate recovery strategies.
type ErrorType = "RateLimit" | "ServerError" | "Timeout" | "ContextWindow" | "Auth" | "Unknown";
interface ErrorContext {
type: ErrorType;
statusCode?: number;
message: string;
retryable: boolean;
userMessage: string;
}
function classifyError(error: unknown): ErrorContext {
if (error instanceof Error) {
const msg = error.message.toLowerCase();
if (msg.includes("429") || msg.includes("rate limit")) {
return {
type: "RateLimit",
message: error.message,
retryable: true,
userMessage:
"We're experiencing high demand. Please try again in a moment.",
statusCode: 429,
};
}
if (
msg.includes("500") ||
msg.includes("502") ||
msg.includes("503") ||
msg.includes("server error")
) {
return {
type: "ServerError",
message: error.message,
retryable: true,
userMessage: "Our service is temporarily unavailable. Please try again.",
statusCode: 503,
};
}
if (msg.includes("timeout") || msg.includes("etimedout")) {
return {
type: "Timeout",
message: error.message,
retryable: true,
userMessage: "Request took too long. Please try a shorter query.",
statusCode: 408,
};
}
if (
msg.includes("context") ||
msg.includes("tokens") ||
msg.includes("length")
) {
return {
type: "ContextWindow",
message: error.message,
retryable: false,
userMessage: "Your message is too long. Please shorten it and try again.",
statusCode: 413,
};
}
if (msg.includes("unauthorized") || msg.includes("401") || msg.includes("403")) {
return {
type: "Auth",
message: error.message,
retryable: false,
userMessage: "Authentication error. Please contact support.",
statusCode: 401,
};
}
}
return {
type: "Unknown",
message: String(error),
retryable: false,
userMessage: "An unexpected error occurred. Please try again.",
};
}
class ErrorHandler {
async handleError(error: unknown): Promise<{ fallback: string; logged: boolean }> {
const context = classifyError(error);
// Log all errors for monitoring
await this.logError(context);
// Apply type-specific handling
switch (context.type) {
case "RateLimit":
await this.handleRateLimit();
break;
case "ServerError":
await this.handleServerError();
break;
case "ContextWindow":
// Context window errors won't succeed on retry, so don't retry them
break;
case "Timeout":
// Timeout might be retryable but needs different backoff
break;
}
return {
fallback: context.userMessage,
logged: true,
};
}
private async logError(context: ErrorContext): Promise<void> {
// Send to monitoring service
console.error(JSON.stringify(context));
}
private async handleRateLimit(): Promise<void> {
// Implement backpressure: slow down requests
}
private async handleServerError(): Promise<void> {
// Activate fallback service
}
}
Context Window Exceeded Handling
Detect and handle context window overflow before it becomes a production issue.
interface ContextWindowCheckResult {
exceedsWindow: boolean;
estimatedTokens: number;
maxTokens: number;
buffer: number;
}
function estimateTokens(text: string): number {
// Rough estimation: 1 token ≈ 4 characters
// For exact counts, use the provider's tokenizer
return Math.ceil(text.length / 4);
}
function checkContextWindow(
systemPrompt: string,
messages: Array<{ role: string; content: string }>,
maxTokens: number,
modelContextWindow: number = 200000
): ContextWindowCheckResult {
const systemTokens = estimateTokens(systemPrompt);
const messageTokens = messages.reduce(
(sum, msg) => sum + estimateTokens(msg.content),
0
);
const responseTokens = maxTokens;
const totalTokens = systemTokens + messageTokens + responseTokens;
const buffer = 500; // Safety buffer
return {
exceedsWindow: totalTokens + buffer > modelContextWindow,
estimatedTokens: totalTokens,
maxTokens: modelContextWindow,
buffer,
};
}
async function callLLMWithContextCheck(
systemPrompt: string,
messages: Array<{ role: "user" | "assistant"; content: string }>,
maxTokens: number
): Promise<string> {
const check = checkContextWindow(systemPrompt, messages, maxTokens);
if (check.exceedsWindow) {
// Strategy 1: keep only the most recent messages that fit ~60% of the window
const truncated = truncateMessages(messages, check.maxTokens * 0.6);
if (truncated.length > 0) {
messages = truncated as typeof messages;
} else {
// Strategy 2: even the newest message alone is too large, so summarize
// everything before it and merge it into a single user turn
// (merging keeps message roles alternating for the API)
const summary = await summarizeMessages(messages.slice(0, -1));
const last = messages[messages.length - 1];
messages = [
{
role: "user",
content: `Summary of previous context: ${summary}\n\n${last.content}`,
},
];
}
}
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: maxTokens,
system: systemPrompt,
messages: messages.map((m) => ({
role: m.role,
content: m.content,
})),
});
return response.content[0].type === "text" ? response.content[0].text : "";
}
function truncateMessages<T extends { content: string }>(
messages: T[],
maxTokens: number
): T[] {
let totalTokens = 0;
const result: T[] = [];
// Keep messages from the end (most recent first)
for (let i = messages.length - 1; i >= 0; i--) {
const tokens = estimateTokens(messages[i].content);
if (totalTokens + tokens <= maxTokens) {
result.unshift(messages[i]);
totalTokens += tokens;
} else {
break;
}
}
return result;
}
async function summarizeMessages(
messages: Array<{ role: string; content: string }>
): Promise<string> {
const client = new Anthropic();
const summaryPrompt = `Summarize this conversation in 2-3 sentences:\n${messages
.map((m) => `${m.role}: ${m.content}`)
.join("\n")}`;
const response = await client.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 256,
messages: [{ role: "user", content: summaryPrompt }],
});
return response.content[0].type === "text" ? response.content[0].text : "";
}
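The 4-characters-per-token heuristic makes the budget arithmetic easy to sanity-check. This standalone sketch (names local to the example) redoes the math that `checkContextWindow` performs:

```typescript
// Token-budget check: estimated input tokens + reserved response tokens
// + safety buffer must fit inside the model's context window.
function fitsWindow(
  promptChars: number,
  historyChars: number,
  maxResponseTokens: number,
  windowTokens = 200000, // assumed window size, as in the section above
  bufferTokens = 500
): boolean {
  const usedTokens =
    Math.ceil(promptChars / 4) + // system prompt estimate
    Math.ceil(historyChars / 4) + // conversation history estimate
    maxResponseTokens; // reserved for the response
  return usedTokens + bufferTokens <= windowTokens;
}

console.log(fitsWindow(4000, 40000, 1024)); // ~12k tokens used: fits
console.log(fitsWindow(4000, 900000, 1024)); // ~226k tokens used: does not fit
```

When the check fails, apply the truncate-then-summarize strategies shown above rather than letting the provider reject the request.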
Graceful Degradation to Rule-Based Fallback
When AI fails, fall back to rule-based logic without user-visible errors.
type FallbackStrategy = "rule-based" | "cached" | "default" | "error-message";
interface DegradationConfig {
aiTimeoutMs: number;
fallbackStrategy: FallbackStrategy;
enableLogging: boolean;
}
class ResilientAIService {
async generateResponse(
userInput: string,
context: object,
config: DegradationConfig
): Promise<{ response: string; source: "ai" | "fallback" }> {
try {
const response = await this.callLLMWithTimeout(
userInput,
config.aiTimeoutMs
);
return { response, source: "ai" };
} catch (error) {
if (config.enableLogging) {
this.logDegradation(userInput, error);
}
// Apply fallback strategy
const fallbackResponse = await this.getFallbackResponse(
userInput,
context,
config.fallbackStrategy
);
return { response: fallbackResponse, source: "fallback" };
}
}
private async getFallbackResponse(
userInput: string,
context: object,
strategy: FallbackStrategy
): Promise<string> {
switch (strategy) {
case "rule-based":
return this.generateRuleBasedResponse(userInput, context);
case "cached":
return (
this.getCachedResponse(userInput) || this.generateRuleBasedResponse(userInput, context)
);
case "default":
return "I'm temporarily unable to process that. Please try again in a moment.";
case "error-message":
return "An error occurred. Please contact support.";
default:
return "Unable to process request.";
}
}
private generateRuleBasedResponse(
userInput: string,
context: object
): string {
// Simple rule-based logic
if (userInput.toLowerCase().includes("help")) {
return "Need help? Check our documentation at help.example.com";
}
if (userInput.toLowerCase().includes("pricing")) {
return "Our plans start at $9/month. Visit pricing.example.com for details.";
}
return "I'm unable to answer that right now. Please try a different question.";
}
private getCachedResponse(userInput: string): string | null {
// Check cache for similar previous queries
return null;
}
private async callLLMWithTimeout(
userInput: string,
timeoutMs: number
): Promise<string> {
return "AI response";
}
private logDegradation(userInput: string, error: unknown): void {
console.error({
event: "ai-degradation",
input: userInput,
error: String(error),
timestamp: new Date(),
});
}
}
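The degrade-silently flow is easiest to see end to end. In this standalone sketch, `flakyAI` is a stand-in for a model call that is currently failing, and the user still gets a real answer:

```typescript
type Answer = { response: string; source: "ai" | "fallback" };

// Stand-in for a model call that is currently failing
async function flakyAI(_input: string): Promise<string> {
  throw new Error("503 service unavailable");
}

// Minimal rule-based fallback, mirroring the patterns above
function ruleBasedAnswer(input: string): string {
  if (input.toLowerCase().includes("pricing")) {
    return "Our plans start at $9/month.";
  }
  return "I'm unable to answer that right now.";
}

async function answerUser(input: string): Promise<Answer> {
  try {
    return { response: await flakyAI(input), source: "ai" };
  } catch {
    // Degrade silently: the user gets a useful answer, not an error page
    return { response: ruleBasedAnswer(input), source: "fallback" };
  }
}

// answerUser("What is your pricing?") resolves to
// { response: "Our plans start at $9/month.", source: "fallback" }
```

Tracking the `source` field is what makes degradation observable: you can alert when the fallback rate spikes even though users never see an error.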
Partial Failure Handling (Tool Call Failures)
Handle failures mid-chain when using tool-calling patterns. Retry individual steps or gracefully degrade.
interface ToolCallConfig {
toolName: string;
args: Record<string, unknown>;
retryable: boolean;
}
interface ToolResult {
success: boolean;
data?: unknown;
error?: string;
}
class ToolCallChain {
private tools: Record<string, (args: object) => Promise<unknown>> = {
search: async (args) => {
// Search implementation
return { results: [] };
},
fetch: async (args) => {
// Fetch implementation
return { content: "" };
},
calculate: async (args) => {
// Calculate implementation
return { result: 0 };
},
};
async executeToolCall(config: ToolCallConfig): Promise<ToolResult> {
const tool = this.tools[config.toolName];
if (!tool) {
return { success: false, error: `Tool not found: ${config.toolName}` };
}
try {
const data = await tool(config.args);
return { success: true, data };
} catch (error) {
if (!config.retryable) {
return {
success: false,
error: `Tool ${config.toolName} failed: ${String(error)}`,
};
}
// Retry once
try {
const data = await tool(config.args);
return { success: true, data };
} catch (retryError) {
return {
success: false,
error: `Tool ${config.toolName} failed after retry: ${String(retryError)}`,
};
}
}
}
async executeChain(
toolCalls: ToolCallConfig[]
): Promise<ToolResult[]> {
const results: ToolResult[] = [];
for (const call of toolCalls) {
const result = await this.executeToolCall(call);
results.push(result);
// Retryable tools are treated as critical: if one still fails after
// its retry, stop the chain and return what we have so far
if (!result.success && call.retryable) {
return results;
}
// Non-critical failures are recorded and the chain continues
}
return results;
}
}
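The retry-once behaviour inside `executeToolCall` can be isolated into a small helper. This standalone sketch simulates a tool that fails only on its first invocation:

```typescript
// Run a tool once, and one more time if the first call throws
async function withOneRetry<T>(tool: () => Promise<T>): Promise<T> {
  try {
    return await tool();
  } catch {
    // Single retry: transient tool failures (network blips, etc.) often clear
    return await tool();
  }
}

// Simulated tool that fails on its first invocation only
let calls = 0;
const flakySearch = async (): Promise<string> => {
  calls++;
  if (calls === 1) throw new Error("ECONNRESET");
  return "ok";
};

// withOneRetry(flakySearch) resolves to "ok" after two invocations
```

For tool calls, a single immediate retry is usually the right trade-off: the chain is latency-sensitive, so the full exponential-backoff loop from earlier is better reserved for the top-level LLM request.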
User-Facing Error Messages
Translate technical errors into user-friendly messages.
interface UserMessage {
title: string;
description: string;
actionable: boolean;
suggestedAction?: string;
}
function createUserMessage(error: unknown): UserMessage {
const context = classifyError(error);
const messages: Record<string, UserMessage> = {
RateLimit: {
title: "We're experiencing high volume",
description:
"Our service is temporarily busy. Please try again in a few moments.",
actionable: true,
suggestedAction: "Try again",
},
ServerError: {
title: "Service temporarily unavailable",
description: "We're experiencing technical difficulties. Please try again shortly.",
actionable: true,
suggestedAction: "Retry request",
},
Timeout: {
title: "Request took too long",
description: "Your request exceeded our time limit. Try with a simpler or shorter query.",
actionable: true,
suggestedAction: "Simplify your request",
},
ContextWindow: {
title: "Message too long",
description: "Your request exceeds our processing limits. Please use a shorter message.",
actionable: true,
suggestedAction: "Shorten your message",
},
Auth: {
title: "Authentication error",
description: "We couldn't verify your credentials. Please try logging in again.",
actionable: true,
suggestedAction: "Log in again",
},
Unknown: {
title: "Something went wrong",
description: "An unexpected error occurred. Our team has been notified.",
actionable: false,
},
};
return messages[context.type] || messages.Unknown;
}
Error Rate SLOs for AI Features
Monitor error rates and alert on SLO violations.
interface ErrorSLO {
featureName: string;
maxErrorRatePercent: number;
windowMinutes: number;
alertThresholdPercent: number;
}
class ErrorRateMonitor {
private errorCounts = new Map<string, number>();
private totalCounts = new Map<string, number>();
recordRequest(featureName: string, success: boolean): void {
const now = Date.now();
const key = `${featureName}:${Math.floor(now / 60000)}`; // 1-minute buckets
const errors = this.errorCounts.get(key) ?? 0;
const total = this.totalCounts.get(key) ?? 0;
this.errorCounts.set(key, errors + (success ? 0 : 1));
this.totalCounts.set(key, total + 1);
}
checkSLO(slo: ErrorSLO): {
inSLO: boolean;
currentErrorRate: number;
triggered: boolean;
} {
const now = Date.now();
let totalErrors = 0;
let totalRequests = 0;
// Check last N minutes
for (let i = 0; i < slo.windowMinutes; i++) {
const bucketTime = now - i * 60000;
const key = `${slo.featureName}:${Math.floor(bucketTime / 60000)}`;
totalErrors += this.errorCounts.get(key) || 0;
totalRequests += this.totalCounts.get(key) || 0;
}
const errorRate =
totalRequests > 0 ? (totalErrors / totalRequests) * 100 : 0;
const inSLO = errorRate <= slo.maxErrorRatePercent;
const triggered = errorRate >= slo.alertThresholdPercent;
return {
inSLO,
currentErrorRate: errorRate,
triggered,
};
}
}
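The windowed SLO arithmetic is simple enough to verify by hand. Here is a pure, standalone version of the check `ErrorRateMonitor.checkSLO` performs (the 1% budget and 0.8% alert threshold are illustrative):

```typescript
interface SLOCheck {
  inSLO: boolean;
  currentErrorRate: number; // percent
  triggered: boolean;
}

// Pure version of the windowed check: error rate vs. budget and alert threshold
function evaluateSLO(
  errors: number,
  total: number,
  maxErrorRatePercent: number,
  alertThresholdPercent: number
): SLOCheck {
  const rate = total > 0 ? (errors / total) * 100 : 0;
  return {
    inSLO: rate <= maxErrorRatePercent,
    currentErrorRate: rate,
    triggered: rate >= alertThresholdPercent,
  };
}

console.log(evaluateSLO(3, 1000, 1, 0.8));
// 0.3% error rate: in SLO, alert not triggered
console.log(evaluateSLO(15, 1000, 1, 0.8));
// 1.5% error rate: out of SLO and alerting
```

Setting the alert threshold below the SLO limit (0.8% vs. 1% here) gives you warning before the budget is actually breached.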
Checklist
- Set explicit timeouts: under 30 seconds for user-facing requests, under 2 minutes for background work
- Implement exponential backoff with jitter to avoid thundering herd
- Distinguish error types and apply type-specific recovery strategies
- Check context window before sending requests, truncate if needed
- Design graceful degradation with rule-based fallback
- Handle partial tool-call failures in chains
- Translate technical errors to user-friendly messages
- Monitor error rates and alert on SLO violations
Conclusion
Error handling in AI systems requires a layered approach. Configure timeouts explicitly, retry intelligently with exponential backoff, and classify errors to apply appropriate recovery strategies. Build in graceful degradation from the start so that when AI fails, users still get a useful response from rule-based fallbacks. Monitor error rates continuously and alert when SLOs breach. By treating error handling as a first-class concern, you'll build AI features that users can rely on.