Designing APIs for AI Agent Consumers — Not Humans

Introduction

AI agents are a different consumer than humans. They don't read your documentation carefully. They don't tolerate ambiguous error messages. They need bulk operations, structured responses, and ironclad idempotency guarantees. Designing APIs for agents requires rethinking API design from scratch.

Key Differences: AI Agents vs. Humans
Idempotency Keys Are Non-Negotiable
Verbose Error Messages With Actionable Context
Pagination Designed for Agents
Webhook Patterns for Long-Running Operations
OpenAPI Spec as the Agent's Documentation
Rate Limiting Agents: Token Bucket, Cost-Aware
Versioning for Agent Stability
Checklist
Conclusion

Key Differences: AI Agents vs. Humans

When a human encounters an ambiguous API response, they read the docs. When an agent encounters ambiguity, it retries randomly or gets stuck in a loop.

Humans tolerate loose output: friendly error messages, helpful hints, partial information. Agents need structure.

Agents need determinism: If an endpoint sometimes returns { status: 'processing' } and sometimes { status: 'queued' } for the same state, an agent can't branch on it reliably.

Agents need bulk operations: Humans submit one form at a time. Agents submit 10,000 records. Your API must support batch endpoints.
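
Here is a minimal sketch of what a batch handler might return. The `processBatch` helper, the `handleOne` callback, and the `MISSING_FIELD` code are hypothetical names made up for illustration. The idea is that every record gets its own outcome, so one bad record doesn't sink the whole batch and the agent knows which indexes to fix and resend.

```javascript
// Hypothetical helper: runs `handleOne` on each record and reports a
// per-item outcome instead of failing the whole batch on the first error.
function processBatch(records, handleOne) {
  const results = records.map((record, index) => {
    try {
      return { index, status: 'ok', result: handleOne(record) };
    } catch (err) {
      return {
        index,
        status: 'error',
        error: { code: err.code || 'UNKNOWN_ERROR', message: err.message }
      };
    }
  });

  const failed = results.filter(r => r.status === 'error').length;
  return { total: records.length, succeeded: records.length - failed, failed, results };
}

// Illustrative handler: rejects records that have no name.
function handleOne(record) {
  if (!record.name) {
    const err = new Error('Field "name" is required');
    err.code = 'MISSING_FIELD';
    throw err;
  }
  return { id: `rec_${record.name}` };
}

const summary = processBatch([{ name: 'a' }, {}, { name: 'c' }], handleOne);
console.log(JSON.stringify(summary, null, 2));
```

Returning the original index with each result lets the agent match failures to its input without relying on response order.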

Agents fear ambiguity: Humans infer context. Agents need every detail explicit.

// Bad API: ambiguous error
{
  "error": "failed"
}

// Good API: structured, actionable
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Token quota exceeded",
    "retryAfter": 60,
    "details": {
      "tokenLimit": 1000000,
      "tokensUsed": 1000000,
      "resetAt": "2026-03-18T14:30:00Z"
    }
  }
}

Agents can parse structured errors and decide: retry after 60 seconds, or fail gracefully.
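
On the agent side, that decision might look roughly like the sketch below. `decideNextAction` and its return shape are assumptions made for illustration; only the `code` and `retryAfter` fields come from the example response above.

```javascript
// Hypothetical client-side helper: maps an error body to a next action.
// `attempt` is the number of tries made so far.
function decideNextAction(body, attempt, maxAttempts = 3) {
  const error = body && body.error;
  if (!error || typeof error !== 'object') {
    // Unstructured error such as { "error": "failed" }: nothing to act on,
    // so stop rather than guess.
    return { action: 'fail', reason: 'unstructured error' };
  }
  if (error.code === 'RATE_LIMIT_EXCEEDED' && attempt < maxAttempts) {
    // Fall back to 60 seconds if the server did not send a usable value.
    const seconds = Number.isFinite(error.retryAfter) ? error.retryAfter : 60;
    return { action: 'retry', delayMs: seconds * 1000 };
  }
  return { action: 'fail', reason: error.code || 'unknown' };
}

console.log(decideNextAction({ error: { code: 'RATE_LIMIT_EXCEEDED', retryAfter: 60 } }, 1));
console.log(decideNextAction({ error: 'failed' }, 1));
```

With the bad response, the only safe branch is to give up, which is exactly the behavior a structured error avoids.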

Idempotency Keys Are Non-Negotiable

Agents retry. Without idempotency, retries create duplicates.

app.post('/api/process', async (req, res) => {
  const idempotencyKey = req.headers['idempotency-key'];

  if (!idempotencyKey) {
    return res.status(400).json({
      error: {
        code: 'MISSING_IDEMPOTENCY_KEY',
        message: 'Idempotency-Key header required'
      }
    });
  }

  // Check if we've seen this key before
  const cached = await redis.get(`idempotency:${idempotencyKey}`);
  if (cached) {
    // The stored value is a JSON string, so parse it before replaying it
    return res.json(JSON.parse(cached));
  }

  // Process the request
  const result = await processRequest(req.body);

  // Cache the response keyed by idempotency key
  await redis.setex(
    `idempotency:${idempotencyKey}`,
    3600, // 1 hour TTL
    JSON.stringify(result)
  );

  res.json(result);
});

Every state-changing endpoint should require an Idempotency-Key header. Within the TTL, a repeated key returns the stored result instead of running the operation a second time.
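
One case the handler above leaves open is a key that gets reused with a different request body. The sketch below shows one possible way to detect that, assuming an in-memory `Map` in place of Redis. The `handleIdempotent` helper and the `IDEMPOTENCY_KEY_REUSED` code are hypothetical names, not part of the original API.

```javascript
const crypto = require('node:crypto');

// In-memory stand-in for Redis; illustrative only.
const store = new Map();

// Hash the body so a reused key with a different payload can be detected.
// Note: JSON.stringify is sensitive to property order, so this sketch
// assumes clients serialize the same request the same way.
function fingerprint(body) {
  return crypto.createHash('sha256').update(JSON.stringify(body)).digest('hex');
}

// Replays the stored response for a repeated key, reports a conflict when
// the key comes back with a different body, and otherwise calls `run`.
function handleIdempotent(key, body, run) {
  const hash = fingerprint(body);
  const seen = store.get(key);
  if (seen) {
    if (seen.hash !== hash) {
      return {
        status: 409,
        body: {
          error: {
            code: 'IDEMPOTENCY_KEY_REUSED',
            message: 'Key was already used with a different request body'
          }
        }
      };
    }
    return { status: 200, body: seen.response, replayed: true };
  }
  const response = run(body);
  store.set(key, { hash, response });
  return { status: 200, body: response, replayed: false };
}
```

Rejecting the mismatch is a design choice. It surfaces a client bug instead of silently returning a result that belongs to a different request.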

Verbose Error Messages With Actionable Context

Don't just tell agents something failed. Tell them why, how to fix it, and what they can do next.

// Bad: cryptic
{ "error": "validation failed" }

// Good: actionable
{
  "error": {
    "code": "INVALID_RESOURCE_FORMAT",
    "message": "Request body contains invalid resource type",
    "details": {
      "field": "resources[0].type",
      "value": "InvalidType",
      "allowedValues": ["document", "image", "audio", "video"],
      "suggestion": "Did you mean 'document'?"
    },
    "requestId": "req_123abc"
  }
}

Include requestId for tracing. Include allowedValues to guide the agent. Include suggestion when possible.
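
A suggestion can be computed rather than hand-written. The sketch below uses Levenshtein edit distance to pick the closest allowed value. `buildEnumError` and the distance threshold of 2 are assumptions chosen for illustration. With that threshold, a value as far off as 'InvalidType' gets no suggestion at all, which is arguably safer than guessing.

```javascript
// Levenshtein edit distance, computed with a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1,                                  // deletion
        row[j - 1] + 1,                             // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution or match
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: builds the structured error for a bad enum value.
function buildEnumError(field, value, allowedValues, requestId) {
  const ranked = allowedValues
    .map(v => ({ v, d: editDistance(String(value).toLowerCase(), v.toLowerCase()) }))
    .sort((x, y) => x.d - y.d);
  const details = { field, value, allowedValues };
  // Only suggest when the closest value is near; the threshold is arbitrary.
  if (ranked.length > 0 && ranked[0].d <= 2) {
    details.suggestion = `Did you mean '${ranked[0].v}'?`;
  }
  return {
    error: {
      code: 'INVALID_RESOURCE_FORMAT',
      message: 'Request body contains invalid resource type',
      details,
      requestId
    }
  };
}

const allowed = ['document', 'image', 'audio', 'video'];
console.log(buildEnumError('resources[0].type', 'documnet', allowed, 'req_123abc'));
```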

Pagination Designed for Agents

Humans use page numbers. Agents use cursors.

// Bad for agents: offset pagination
GET /api/users?limit=100&offset=1000

// Good for agents: cursor-based
GET /api/users?limit=100&after=cursor_abc123

// Response
{
  "data": [...],
  "pagination": {
    "cursor": "cursor_def456",
    "hasMore": true,
    "totalCount": 50000
  }
}

Cursor pagination is stable even when data changes. If records are inserted or deleted between pages, offset-based pagination duplicates or skips rows.
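
A small in-memory illustration of the skip, using a made-up ten-row table and two hypothetical helpers, `pageByOffset` and `pageByCursor`:

```javascript
// Ten rows with ids 1..10, ordered by id.
let rows = Array.from({ length: 10 }, (_, i) => ({ id: i + 1 }));

const pageByOffset = (offset, limit) => rows.slice(offset, offset + limit);
const pageByCursor = (afterId, limit) =>
  rows.filter(r => r.id > afterId).slice(0, limit);

// Page 1 is the same either way: ids 1..5.
const first = pageByOffset(0, 5);

// Between pages, rows 2 and 3 are deleted.
rows = rows.filter(r => r.id !== 2 && r.id !== 3);

// Offset 5 now lands after ids 6 and 7, which shifted into earlier slots.
const offsetPage2 = pageByOffset(5, 5).map(r => r.id);
// The cursor continues strictly after the last id the client saw.
const cursorPage2 = pageByCursor(first[first.length - 1].id, 5).map(r => r.id);

console.log(offsetPage2); // [ 8, 9, 10 ]
console.log(cursorPage2); // [ 6, 7, 8, 9, 10 ]
```

The offset client never sees ids 6 and 7. The cursor client sees every remaining row exactly once.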

app.get('/api/users', async (req, res) => {
  const limit = Math.min(parseInt(req.query.limit) || 100, 1000);
  const after = req.query.after || null;

  let query = db.users.orderBy('id');
  if (after) {
    query = query.where('id', '>', after);
  }

  const users = await query.limit(limit + 1);
  const hasMore = users.length > limit;
  const data = users.slice(0, limit);
  const nextCursor = data.length > 0 ? data[data.length - 1].id : null;

  res.json({
    data,
    pagination: {
      cursor: nextCursor,
      hasMore,
      totalCount: await db.users.count()
    }
  });
});

Webhook Patterns for Long-Running Operations

Agents shouldn't have to hold a connection open while a long job runs. Give them webhooks for status updates.

// Agent initiates async operation
POST /api/jobs
{
  "type": "process_dataset",
  "datasetId": "ds_123",
  "webhookUrl": "https://agent.example.com/webhooks/job-complete"
}

// Response: job created
{
  "jobId": "job_abc123",
  "status": "queued",
  "createdAt": "2026-03-18T10:00:00Z"
}

// Later: system calls webhook
POST https://agent.example.com/webhooks/job-complete
{
  "jobId": "job_abc123",
  "status": "completed",
  "result": {...},
  "completedAt": "2026-03-18T10:05:00Z"
}

Agents should also be able to poll for status:

GET /api/jobs/job_abc123

{
  "jobId": "job_abc123",
  "status": "completed",
  "progress": { "current": 100, "total": 100 },
  "result": {...}
}
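
When an agent does poll, it helps if it backs off instead of hammering the status endpoint. Here is a rough client-side sketch. `pollJob`, `nextPollDelayMs`, and the 1-second base and 30-second cap are assumptions for illustration. The terminal statuses are taken from the status values used elsewhere in this article.

```javascript
// Statuses after which the job will not change again.
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

// Exponential backoff: 1s, 2s, 4s, ... capped at capMs.
function nextPollDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Calls `getStatus` until the job reaches a terminal status or the attempt
// budget runs out. `sleep` is injectable so the loop can be tested quickly.
async function pollJob(getStatus, options = {}) {
  const {
    maxAttempts = 10,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus();
    if (TERMINAL_STATUSES.has(job.status)) return job;
    await sleep(nextPollDelayMs(attempt));
  }
  throw new Error('Job did not reach a terminal status in time');
}
```

In practice `getStatus` would wrap a call to GET /api/jobs/{jobId}. It is left abstract here so the sketch does not depend on any particular HTTP client.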

OpenAPI Spec as the Agent's Documentation

Agents generate their calls from your OpenAPI spec, so the spec must be complete and correct.

openapi: 3.0.0
info:
  title: Agent API
  version: 1.0.0

paths:
  /api/process:
    post:
      operationId: processRequest
      summary: Process a request asynchronously
      description: |
        Processes a request in the background.
        Requires an Idempotency-Key header.
      parameters:
        - name: Idempotency-Key
          in: header
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                data:
                  type: string
              required: [data]
      responses:
        '200':
          description: Request accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProcessResult'
        '429':
          description: Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RateLimitError'

components:
  schemas:
    ProcessResult:
      type: object
      properties:
        id:
          type: string
        status:
          type: string
          enum: [queued, processing, completed, failed]
      required: [id, status]
    RateLimitError:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: string
              enum: [RATE_LIMIT_EXCEEDED]
            retryAfter:
              type: integer

Use operationId consistently. Agents need it to find the endpoint they want.
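
One way to enforce this is a small check that indexes a parsed spec by operationId and fails on missing or duplicate ids. `indexOperations` is a hypothetical helper, and the spec is assumed to be already parsed into a plain object, for example by a YAML parser.

```javascript
// HTTP methods that can appear on an OpenAPI 3.0 path item.
const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];

// Builds a Map of operationId -> { method, path }, throwing if an operation
// has no operationId or if two operations share one.
function indexOperations(spec) {
  const index = new Map();
  for (const [path, item] of Object.entries(spec.paths || {})) {
    for (const method of HTTP_METHODS) {
      const operation = item[method];
      if (!operation) continue;
      const label = `${method.toUpperCase()} ${path}`;
      if (!operation.operationId) {
        throw new Error(`Missing operationId: ${label}`);
      }
      if (index.has(operation.operationId)) {
        throw new Error(`Duplicate operationId "${operation.operationId}": ${label}`);
      }
      index.set(operation.operationId, { method: method.toUpperCase(), path });
    }
  }
  return index;
}

const spec = {
  paths: {
    '/api/process': { post: { operationId: 'processRequest' } }
  }
};
console.log(indexOperations(spec).get('processRequest'));
```

Running a check like this in CI is one option for catching a missing id before an agent does.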

Rate Limiting Agents: Token Bucket, Cost-Aware

A human user triggers one request at a time. An agent might trigger 10,000. Rate limiting must account for cost, not just count.

class CostAwareRateLimiter {
  async checkLimit(agentId, tokens) {
    // Redis stores strings, so the bucket is kept as serialized JSON.
    // Note: this read-modify-write is not atomic across concurrent requests.
    const stored = await redis.get(`bucket:${agentId}`);
    const now = Date.now();

    // `capacity` holds the tokens currently available in the bucket
    let { capacity, refillRate, lastRefill } = stored ? JSON.parse(stored) : {
      capacity: 1000000,
      refillRate: 100000, // tokens per minute
      lastRefill: now
    };

    // Refill tokens based on time elapsed
    const elapsed = (now - lastRefill) / 60000; // minutes
    capacity = Math.min(
      capacity + (refillRate * elapsed),
      1000000
    );

    if (capacity < tokens) {
      // Wait time in seconds, rounded up to whole minutes
      return { allowed: false, retryAfter: Math.ceil((tokens - capacity) / refillRate) * 60 };
    }

    capacity -= tokens;
    await redis.set(`bucket:${agentId}`, JSON.stringify({
      capacity,
      refillRate,
      lastRefill: now
    }));

    return { allowed: true };
  }
}

Rate limits should be transparent and generous for legitimate agents. Punish abusers, not normal traffic.
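
Transparency can be as simple as reporting the budget on every response. In the sketch below, `rateLimitHeaders` is a hypothetical helper. The `X-RateLimit-*` names are a widespread convention rather than a standard, while `Retry-After` is a standard HTTP header.

```javascript
// Hypothetical helper: turns limiter state into response headers.
function rateLimitHeaders({ limit, remaining, retryAfterSeconds }) {
  const headers = {
    'X-RateLimit-Limit': String(limit),
    // Never report a negative or fractional remaining budget.
    'X-RateLimit-Remaining': String(Math.max(0, Math.floor(remaining)))
  };
  // Only present when the request was rejected and the agent should wait.
  if (retryAfterSeconds != null) {
    headers['Retry-After'] = String(Math.ceil(retryAfterSeconds));
  }
  return headers;
}

console.log(rateLimitHeaders({ limit: 1000000, remaining: 0, retryAfterSeconds: 60 }));
console.log(rateLimitHeaders({ limit: 1000000, remaining: 742311.5 }));
```

An agent that can read its remaining budget can slow down before it hits a 429.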

Versioning for Agent Stability

Agents depend on field names and types. Renaming a field or changing its type breaks them.

// Version in header
GET /api/users HTTP/1.1
API-Version: 2026-03-01

// Or in URL
GET /api/v2/users

// Or both: support multiple versions
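app.get('/api/:version/users', async (req, res) => {
  const version = req.params.version;
  // Load the rows once; both versions are views over the same data
  const users = await db.users.orderBy('id');

  if (version === 'v1') {
    // Old format: userId, firstName, lastName
    return res.json({
      data: users.map(u => ({
        userId: u.id,
        firstName: u.first_name,
        lastName: u.last_name
      }))
    });
  }

  if (version === 'v2') {
    // New format: id, name, email
    return res.json({
      data: users.map(u => ({
        id: u.id,
        name: `${u.first_name} ${u.last_name}`,
        email: u.email
      }))
    });
  }

  // Unknown version: answer explicitly instead of leaving the request hanging
  // (the error code here is illustrative)
  return res.status(404).json({
    error: {
      code: 'UNSUPPORTED_API_VERSION',
      message: `Unsupported API version: ${version}`,
      details: { allowedValues: ['v1', 'v2'] }
    }
  });
});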

Deprecate old versions slowly. Announce deprecation months in advance. Support v1 and v2 simultaneously for 6+ months.
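
Deprecation can also be announced in-band, so agents see it on every call. The sketch below assumes the standard `Sunset` response header and a `Link` header with the `successor-version` relation. The `deprecationHeaders` helper, the schedule, and the date in it are made up for illustration.

```javascript
// Hypothetical retirement schedule; the date is illustrative, not real.
const SUNSET_SCHEDULE = {
  v1: { sunset: '2026-11-01T00:00:00Z', successor: '/api/v2/users' }
};

// Returns headers to attach to responses from a version that is being
// retired, or an empty object for versions with no retirement date.
function deprecationHeaders(version, schedule = SUNSET_SCHEDULE) {
  const entry = schedule[version];
  if (!entry) return {};
  return {
    // toUTCString() produces the HTTP date format the Sunset header expects.
    Sunset: new Date(entry.sunset).toUTCString(),
    Link: `<${entry.successor}>; rel="successor-version"`
  };
}

console.log(deprecationHeaders('v1'));
console.log(deprecationHeaders('v2'));
```

An agent or its operator can alert on the presence of `Sunset` long before v1 actually disappears.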

Checklist

Every state-changing endpoint requires Idempotency-Key header
Error responses include code, message, details, and requestId
Pagination uses cursors, not offsets
Offer both webhook and polling patterns for async operations
OpenAPI spec is complete and up-to-date
Rate limiting considers cost, not just request count
API supports multiple versions simultaneously
All enum values are documented in OpenAPI spec
Bulk endpoints support 10K+ records per request
Response shapes are deterministic (no optional fields without warning)

Conclusion

APIs for AI agents are contracts with code, not humans. They demand precision, determinism, and completeness. Idempotency keys, structured errors, cursor-based pagination, and versioning are non-negotiable. Build for agents, and humans will benefit from your precision too.