Designing APIs for AI Agent Consumers — Not Humans
Introduction
AI agents are a different consumer than humans. They don't read your documentation carefully. They don't tolerate ambiguous error messages. They need bulk operations, structured responses, and ironclad idempotency guarantees. Designing APIs for agents requires rethinking API design from scratch.
- Key Differences: AI Agents vs. Humans
- Idempotency Keys Are Non-Negotiable
- Verbose Error Messages With Actionable Context
- Pagination Designed for Agents
- Webhook Patterns for Long-Running Operations
- OpenAPI Spec as the Agent's Documentation
- Rate Limiting Agents: Token Bucket, Cost-Aware
- Versioning for Agent Stability
- Checklist
- Conclusion
Key Differences: AI Agents vs. Humans
When a human encounters an ambiguous API response, they read the docs. When an agent encounters ambiguity, it retries randomly or gets stuck in a loop.
Humans cope with loose structure: friendly error messages, helpful hints, partial information. Agents need machine-readable structure.
Agents need determinism: If an endpoint sometimes returns { status: 'processing' } and sometimes { status: 'queued' } for the same state, an agent can't reliably branch on the result.
Agents need bulk operations: Humans submit one form at a time. Agents submit 10,000 records. Your API must support batch endpoints.
Agents fear ambiguity: Humans infer context. Agents need every detail explicit.
// Bad API: ambiguous error
{
  "error": "failed"
}
// Good API: structured, actionable
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Token quota exceeded",
    "retryAfter": 60,
    "details": {
      "tokenLimit": 1000000,
      "tokensUsed": 1000000,
      "resetAt": "2026-03-18T14:30:00Z"
    }
  }
}
Agents can parse structured errors and decide: retry after 60 seconds, or fail gracefully.
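To make that decision concrete, here is a minimal sketch of how an agent might branch on the structured error shown above. The helper name `decideNextAction`, the set of retryable codes, and the one-second fallback are assumptions for illustration, not part of any particular agent framework.

```javascript
// Hypothetical helper: maps a structured error body to an agent decision.
// The error shape follows the "Good API" example; the set of retryable
// codes is an assumption for illustration.
const RETRYABLE_CODES = new Set(['RATE_LIMIT_EXCEEDED']);

function decideNextAction(body, attempt, maxAttempts = 3) {
  const error = body && body.error;
  // An unstructured error such as { "error": "failed" } gives nothing to act on.
  if (!error || typeof error !== 'object' || !error.code) {
    return { action: 'fail', reason: 'unstructured error' };
  }
  if (RETRYABLE_CODES.has(error.code) && attempt < maxAttempts) {
    // Fall back to 1 second if the server did not say how long to wait.
    const waitSeconds = Number.isFinite(error.retryAfter) ? error.retryAfter : 1;
    return { action: 'retry', waitSeconds };
  }
  return { action: 'fail', reason: error.code };
}
```

With the ambiguous `{ "error": "failed" }` body, the same function can only give up, which is the practical cost of an unstructured error.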
Idempotency Keys Are Non-Negotiable
Agents retry. Without idempotency, retries create duplicates.
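// Assumes `app` is an Express app using express.json() and `redis` is an
// ioredis-style client whose get() resolves to a string or null.
app.post('/api/process', async (req, res) => {
  const idempotencyKey = req.headers['idempotency-key'];
  if (!idempotencyKey) {
    return res.status(400).json({
      error: {
        code: 'MISSING_IDEMPOTENCY_KEY',
        message: 'Idempotency-Key header required'
      }
    });
  }

  // Replay the stored response if we've seen this key before.
  // Redis returns the stored JSON as a string, so parse it first.
  const cached = await redis.get(`idempotency:${idempotencyKey}`);
  if (cached) {
    return res.json(JSON.parse(cached));
  }

  // Process the request. Note: two concurrent requests with the same key
  // can both reach this point; closing that gap needs an atomic claim
  // (for example SET with NX) before processing.
  const result = await processRequest(req.body);

  // Cache the response keyed by idempotency key
  await redis.setex(
    `idempotency:${idempotencyKey}`,
    3600, // 1 hour TTL
    JSON.stringify(result)
  );
  res.json(result);
});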
Require an Idempotency-Key header on every state-changing endpoint. Within the cache TTL (one hour here), replaying the same key returns the stored result instead of running the operation again.
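The guarantee only holds if the caller reuses one key across all retries of the same logical request. A minimal client-side sketch, assuming a hypothetical `send` callback that performs the HTTP call with the given headers and throws on failure:

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical client helper. `send` is any function that performs the
// request with the headers it is given; it is assumed to throw on failure.
async function sendWithIdempotency(send, maxAttempts = 3) {
  // Generate the key once, outside the retry loop, so the server
  // recognises every attempt as the same logical request.
  const headers = { 'Idempotency-Key': randomUUID() };
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(headers);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Generating the key inside the loop would defeat the purpose: each retry would look like a new request and could create a duplicate.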
Verbose Error Messages With Actionable Context
Don't just tell agents something failed. Tell them why, how to fix it, and what they can do next.
// Bad: cryptic
{ "error": "validation failed" }
// Good: actionable
{
  "error": {
    "code": "INVALID_RESOURCE_FORMAT",
    "message": "Request body contains invalid resource type",
    "details": {
      "field": "resources[0].type",
      "value": "InvalidType",
      "allowedValues": ["document", "image", "audio", "video"],
      "suggestion": "Did you mean 'document'?"
    },
    "requestId": "req_123abc"
  }
}
Include requestId for tracing. Include allowedValues to guide the agent. Include suggestion when possible.
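One way a server might assemble such an error is sketched below. The helper names, the edit-distance matching, and the threshold of 2 are assumptions chosen for illustration; the field names follow the example above.

```javascript
// Levenshtein edit distance between two strings.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical builder for the structured validation error.
function buildInvalidValueError(field, value, allowedValues, requestId) {
  const details = { field, value, allowedValues };
  // Only suggest when an allowed value is close to the input; the
  // threshold of 2 edits is an arbitrary choice for this sketch.
  let best = null;
  for (const candidate of allowedValues) {
    const distance = editDistance(String(value).toLowerCase(), candidate.toLowerCase());
    if (distance <= 2 && (best === null || distance < best.distance)) {
      best = { candidate, distance };
    }
  }
  if (best) details.suggestion = `Did you mean '${best.candidate}'?`;
  return {
    error: {
      code: 'INVALID_RESOURCE_FORMAT',
      message: 'Request body contains invalid resource type',
      details,
      requestId
    }
  };
}
```

When nothing is close enough, this sketch omits `suggestion` rather than guessing, so the agent falls back to `allowedValues`.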
Pagination Designed for Agents
Humans use page numbers. Agents use cursors.
// Bad for agents: offset pagination
GET /api/users?limit=100&offset=1000
// Good for agents: cursor-based
GET /api/users?limit=100&after=cursor_abc123
// Response
{
  "data": [...],
  "pagination": {
    "cursor": "cursor_def456",
    "hasMore": true,
    "totalCount": 50000
  }
}
Cursor pagination is stable even when data changes. If records are inserted or deleted between pages, offset-based pagination duplicates or skips rows, because every later row shifts position.
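// Assumes `db.users` is a query builder whose chained calls resolve to an array of rows.
app.get('/api/users', async (req, res) => {
  // Clamp limit to 1..1000, defaulting to 100 when missing or not a number
  const requested = parseInt(req.query.limit, 10) || 100;
  const limit = Math.min(Math.max(requested, 1), 1000);
  const after = req.query.after || null;

  let query = db.users.orderBy('id');
  if (after) {
    query = query.where('id', '>', after);
  }

  // Fetch one extra row to learn whether another page exists
  const users = await query.limit(limit + 1);
  const hasMore = users.length > limit;
  const data = users.slice(0, limit);
  const nextCursor = data.length > 0 ? data[data.length - 1].id : null;

  res.json({
    data,
    pagination: {
      cursor: nextCursor,
      hasMore,
      totalCount: await db.users.count()
    }
  });
});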
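On the consuming side, an agent's paging loop can be very small. The sketch below assumes a hypothetical `fetchPage(after)` function that returns the parsed response body in the shape shown above; the page cap is an added safety assumption.

```javascript
// Hypothetical agent-side loop. `fetchPage(after)` is assumed to resolve to
// the parsed response body: { data, pagination: { cursor, hasMore } }.
async function fetchAllUsers(fetchPage, maxPages = 1000) {
  const all = [];
  let after = null;
  for (let page = 0; page < maxPages; page++) {
    const { data, pagination } = await fetchPage(after);
    all.push(...data);
    // Stop on the explicit flag rather than inferring the end from an empty page.
    if (!pagination.hasMore) return all;
    after = pagination.cursor;
  }
  throw new Error(`Gave up after ${maxPages} pages`);
}
```

The explicit `hasMore` flag is what keeps this loop free of guesswork: the agent never has to infer the end of the collection from page size.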
Webhook Patterns for Long-Running Operations
Agents shouldn't have to hold a connection open while a job runs. Let them register a webhook for status updates.
// Agent initiates async operation
POST /api/jobs
{
  "type": "process_dataset",
  "datasetId": "ds_123",
  "webhookUrl": "https://agent.example.com/webhooks/job-complete"
}
// Response: job created
{
  "jobId": "job_abc123",
  "status": "queued",
  "createdAt": "2026-03-18T10:00:00Z"
}
// Later: system calls webhook
POST https://agent.example.com/webhooks/job-complete
{
  "jobId": "job_abc123",
  "status": "completed",
  "result": {...},
  "completedAt": "2026-03-18T10:05:00Z"
}
Agents should also be able to poll for status:
GET /api/jobs/job_abc123
{
  "jobId": "job_abc123",
  "status": "completed",
  "progress": { "current": 100, "total": 100 },
  "result": {...}
}
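For the polling path, a client might back off between status checks instead of polling at a fixed rate. This is a minimal sketch: `getJob` is a hypothetical function that fetches the job body shown above, the delays are arbitrary defaults, and treating `completed` and `failed` as the only terminal statuses is an assumption.

```javascript
// Statuses after which polling stops (assumed for this sketch).
const TERMINAL_STATUSES = new Set(['completed', 'failed']);

// Hypothetical polling helper. `getJob` is assumed to resolve to the job
// body, for example { jobId, status, progress, result }.
async function waitForJob(getJob, options = {}) {
  const { initialDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 20 } = options;
  let delayMs = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob();
    if (TERMINAL_STATUSES.has(job.status)) return job;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    // Double the wait each time, up to a cap, so slow jobs are not hammered.
    delayMs = Math.min(delayMs * 2, maxDelayMs);
  }
  throw new Error(`Job still running after ${maxAttempts} polls`);
}
```

This only works if the set of statuses is fixed and documented, which ties back to the determinism point earlier.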
OpenAPI Spec as the Agent's Documentation
Agents and their tooling build requests directly from your OpenAPI spec, so the spec must be complete and correct.
openapi: 3.0.0
info:
  title: Agent API
  version: 1.0.0
paths:
  /api/process:
    post:
      operationId: processRequest
      summary: Process a request asynchronously
      description: |
        Processes a request in the background.
        Requires an Idempotency-Key header.
      parameters:
        - name: Idempotency-Key
          in: header
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                data:
                  type: string
              required: [data]
      responses:
        '200':
          description: Request accepted
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProcessResult'
        '429':
          description: Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RateLimitError'
components:
  schemas:
    ProcessResult:
      type: object
      properties:
        id:
          type: string
        status:
          type: string
          enum: [queued, processing, completed, failed]
      required: [id, status]
    RateLimitError:
      type: object
      properties:
        error:
          type: object
          properties:
            code:
              type: string
              enum: [RATE_LIMIT_EXCEEDED]
            retryAfter:
              type: integer
Give every operation a unique, stable operationId. Agents and their tooling use it to find the endpoint they want.
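A check like this can run in CI to catch missing or duplicated IDs before agents see them. It is a sketch that assumes the spec has already been parsed into a plain object (for example from JSON); the function name is hypothetical.

```javascript
// HTTP methods that can carry an operation in an OpenAPI 3.0 path item.
const HTTP_METHODS = ['get', 'put', 'post', 'delete', 'options', 'head', 'patch', 'trace'];

// Reports operations whose operationId is missing or already used elsewhere.
function checkOperationIds(spec) {
  const seen = new Map(); // operationId -> "METHOD /path" of first use
  const problems = [];
  for (const [path, item] of Object.entries(spec.paths || {})) {
    for (const method of HTTP_METHODS) {
      const operation = item[method];
      if (!operation) continue;
      const where = `${method.toUpperCase()} ${path}`;
      if (!operation.operationId) {
        problems.push(`${where}: missing operationId`);
      } else if (seen.has(operation.operationId)) {
        problems.push(`${where}: duplicates ${seen.get(operation.operationId)}`);
      } else {
        seen.set(operation.operationId, where);
      }
    }
  }
  return problems;
}
```

An empty result means every operation has an ID and no two operations share one.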
Rate Limiting Agents: Token Bucket, Cost-Aware
A human user triggers one request at a time. An agent might trigger 10,000. Rate limiting must account for cost, not just count.
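// Assumes `redis` is an ioredis-style client; the bucket is stored as JSON.
// The read-modify-write below is not atomic, so concurrent requests from one
// agent can overspend; a Lua script or transaction would close that gap.
const MAX_TOKENS = 1000000;

class CostAwareRateLimiter {
  async checkLimit(agentId, tokens) {
    const stored = await redis.get(`bucket:${agentId}`);
    const now = Date.now();
    // `capacity` is the number of tokens currently available in the bucket
    let { capacity, refillRate, lastRefill } = stored ? JSON.parse(stored) : {
      capacity: MAX_TOKENS,
      refillRate: 100000, // tokens per minute
      lastRefill: now
    };

    // Refill tokens based on time elapsed
    const elapsed = (now - lastRefill) / 60000; // minutes
    capacity = Math.min(capacity + (refillRate * elapsed), MAX_TOKENS);

    if (capacity < tokens) {
      // Seconds until enough tokens have refilled for this request
      const retryAfter = Math.ceil(((tokens - capacity) / refillRate) * 60);
      return { allowed: false, retryAfter };
    }

    capacity -= tokens;
    await redis.set(`bucket:${agentId}`, JSON.stringify({
      capacity,
      refillRate,
      lastRefill: now
    }));
    return { allowed: true };
  }
}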
Rate limits should be transparent and generous for legitimate agents. Punish abusers, not normal traffic.
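Transparency mostly means telling the agent exactly when to come back. One possible sketch turns the limiter's decision into a 429 response that reuses the structured error shape from earlier in this article. The helper name and the `bucket` argument shape are assumptions, and `resetAt` here is simply the current time plus `retryAfter`.

```javascript
// Hypothetical helper: builds the status, headers, and body of a 429
// response from a limiter decision of the form { allowed, retryAfter }.
// `bucket` is an assumed shape: { tokenLimit, tokensAvailable }.
function toRateLimitResponse(decision, bucket, now = new Date()) {
  if (decision.allowed) return null; // nothing to report
  const resetAt = new Date(now.getTime() + decision.retryAfter * 1000);
  return {
    status: 429,
    // Retry-After is the standard HTTP header for this, in seconds.
    headers: { 'Retry-After': String(decision.retryAfter) },
    body: {
      error: {
        code: 'RATE_LIMIT_EXCEEDED',
        message: 'Token quota exceeded',
        retryAfter: decision.retryAfter,
        details: {
          tokenLimit: bucket.tokenLimit,
          tokensUsed: bucket.tokenLimit - bucket.tokensAvailable,
          resetAt: resetAt.toISOString()
        }
      }
    }
  };
}
```

Sending the wait time in both the header and the body lets generic HTTP clients and schema-driven agents each find it where they expect it.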
Versioning for Agent Stability
Agents depend on field names and types. Changing your API breaks them.
// Version in header
GET /api/users HTTP/1.1
API-Version: 2026-03-01
// Or in URL
GET /api/v2/users
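// Serving multiple URL versions side by side
// (`loadUsers` is a hypothetical data-access helper)
app.get('/api/:version/users', async (req, res) => {
  const version = req.params.version;
  const users = await loadUsers();
  if (version === 'v1') {
    // Old format: userId, firstName, lastName
    return res.json({
      data: users.map(u => ({
        userId: u.id,
        firstName: u.first_name,
        lastName: u.last_name
      }))
    });
  }
  if (version === 'v2') {
    // New format: id, name, email
    return res.json({
      data: users.map(u => ({
        id: u.id,
        name: `${u.first_name} ${u.last_name}`,
        email: u.email
      }))
    });
  }
  // Unknown version: answer with a structured error instead of leaving the request hanging
  return res.status(404).json({
    error: {
      code: 'UNSUPPORTED_API_VERSION',
      message: `Unknown API version '${version}'`,
      details: { allowedValues: ['v1', 'v2'] }
    }
  });
});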
Deprecate old versions slowly. Announce deprecation months in advance. Support v1 and v2 simultaneously for 6+ months.
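Announcements in a changelog reach humans; agents are more likely to notice something in the response itself. One option is the Sunset response header (RFC 8594), which carries the date after which a resource is expected to stop working. The sketch below assumes a hypothetical retirement schedule keyed by version name; the date is made up for illustration.

```javascript
// Hypothetical retirement schedule; the date is invented for this example.
const SUNSET_DATES = { v1: new Date('2026-12-01T00:00:00Z') };

// Returns extra response headers for a version that is scheduled for removal.
function deprecationHeaders(version, sunsetDates = SUNSET_DATES) {
  const sunset = sunsetDates[version];
  if (!sunset) return {};
  // The Sunset header value is an HTTP-date, which
  // Date.prototype.toUTCString() produces.
  return { Sunset: sunset.toUTCString() };
}
```

A handler could merge these headers into every response for an old version, so an agent can log or escalate the upcoming removal long before calls start failing.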
Checklist
- Every state-changing endpoint requires an Idempotency-Key header
- Error responses include code, message, details, and requestId
- Pagination uses cursors, not offsets
- Offer both webhook and polling patterns for async operations
- OpenAPI spec is complete and up-to-date
- Rate limiting considers cost, not just request count
- API supports multiple versions simultaneously
- All enum values are documented in OpenAPI spec
- Bulk endpoints support 10K+ records per request
- Response shapes are deterministic (no optional fields without warning)
Conclusion
APIs for AI agents are contracts with code, not humans. They demand precision, determinism, and completeness. Idempotency keys, structured errors, cursor-based pagination, and versioning are non-negotiable. Build for agents, and humans will benefit from your precision too.