OpenAI Responses API — The New Standard for Stateful AI Interactions

Author: Sanjeev Sharma (@webcoderspeed1)

Introduction
OpenAI's Responses API replaces Chat Completions with built-in conversation state, tool management, and resumable responses. Unlike Chat Completions, where you manage message history manually, the Responses API persists state server-side. This post covers the migration path, the key differences, and production patterns for stateful AI interactions.
- Responses API vs Chat Completions API
- Built-in Tools and Capabilities
- Streaming Responses
- Response Resumption After Interruption
- Response IDs for Conversation Threading
- Migrating from Chat Completions to Responses API
- Cost and Rate Limit Differences
- Production Deployment
- Checklist
- Conclusion
Responses API vs Chat Completions API
The Chat Completions API requires you to manage state:
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Chat Completions: you manage history
const conversationHistory: { role: 'user' | 'assistant'; content: string }[] = [];

async function chatWithCompletions(userMessage: string): Promise<string> {
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: conversationHistory, // Must pass entire history every request
  });

  const assistantMessage = response.choices[0].message.content || '';
  conversationHistory.push({ role: 'assistant', content: assistantMessage });
  return assistantMessage;
}
```
The Responses API manages state for you:
```typescript
// Responses API: stateful by default
async function chatWithResponses(
  threadId: string,
  userMessage: string
): Promise<string> {
  // Create a conversation thread if one doesn't exist yet
  if (!threadId) {
    const thread = await client.beta.threads.create();
    threadId = thread.id;
  }

  // Add the user message to the thread
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  // Run the conversation
  const run = await client.beta.threads.runs.create(threadId, {
    assistant_id: 'asst_xyz', // Pre-configured assistant
  });

  // Poll until the run completes
  let response = await client.beta.threads.runs.retrieve(threadId, run.id);
  while (response.status === 'in_progress' || response.status === 'queued') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    response = await client.beta.threads.runs.retrieve(threadId, run.id);
  }

  // Get the assistant's response (messages are listed newest first)
  const messages = await client.beta.threads.messages.list(threadId);
  const lastMessage = messages.data[0];
  return lastMessage.content[0].type === 'text'
    ? lastMessage.content[0].text.value
    : '';
}
```
Key differences:
| Aspect | Chat Completions | Responses API |
|---|---|---|
| State management | Manual (you manage history) | Automatic (thread persists) |
| Tool calling | Via tools parameter each request | Pre-configured assistant tools |
| Conversation resumption | Not supported | Built-in via thread ID |
| File context | Pass as text | Attach files to thread |
| Web search | Not available | Built-in capability |
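The "manual" column has a real cost: with Chat Completions, resending the full history means the request payload grows with every turn. A common mitigation is trimming old messages to an approximate token budget before each request. A minimal sketch; the 4-characters-per-token heuristic is an assumption, not a real tokenizer:

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

// Rough token estimate: ~4 characters per token (heuristic, not a tokenizer)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit within the token budget
function trimHistory(history: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk backwards from the newest message so recent context survives
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

The Responses API sections below avoid this bookkeeping entirely, since the thread holds the history server-side.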
Built-in Tools and Capabilities
Responses API includes tools natively:
```typescript
// Create an assistant with built-in and custom tools
const assistant = await client.beta.assistants.create({
  name: 'Research Assistant',
  model: 'gpt-4o',
  description: 'Researches topics and answers questions',
  tools: [
    {
      type: 'web_search', // Built-in web search
    },
    {
      type: 'code_interpreter', // Built-in code execution
    },
    {
      type: 'function', // Custom tool
      function: {
        name: 'get_stock_price',
        description: 'Get current stock price',
        parameters: {
          type: 'object',
          properties: {
            symbol: {
              type: 'string',
              description: 'Stock ticker symbol (e.g., AAPL)',
            },
          },
          required: ['symbol'],
        },
      },
    },
  ],
});

// Define the custom function handler
async function handleToolCall(
  toolName: string,
  toolInput: Record<string, unknown>
): Promise<string> {
  if (toolName === 'get_stock_price') {
    const symbol = toolInput.symbol as string;
    const price = await fetchStockPrice(symbol);
    return JSON.stringify({ symbol, price });
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

// Run with tool calling
async function runAssistantWithTools(
  threadId: string,
  userMessage: string
): Promise<string> {
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  let run = await client.beta.threads.runs.create(threadId, {
    assistant_id: assistant.id,
  });

  // Handle tool calls in a loop until the run finishes
  while (
    run.status === 'in_progress' ||
    run.status === 'queued' ||
    run.status === 'requires_action'
  ) {
    if (run.status === 'requires_action') {
      const toolCalls =
        run.required_action?.submit_tool_outputs?.tool_calls || [];
      const toolResults = [];
      for (const toolCall of toolCalls) {
        const result = await handleToolCall(
          toolCall.function.name,
          JSON.parse(toolCall.function.arguments)
        );
        toolResults.push({
          tool_call_id: toolCall.id,
          output: result,
        });
      }
      // Submit tool results so the run can continue
      run = await client.beta.threads.runs.submitToolOutputs(threadId, run.id, {
        tool_outputs: toolResults,
      });
    } else {
      await new Promise((resolve) => setTimeout(resolve, 1000));
      run = await client.beta.threads.runs.retrieve(threadId, run.id);
    }
  }

  // Get the final assistant message (newest first)
  const messages = await client.beta.threads.messages.list(threadId);
  const lastMessage = messages.data[0];
  return lastMessage.content[0].type === 'text'
    ? lastMessage.content[0].text.value
    : '';
}
```
Streaming Responses
Stream response tokens as they arrive:
```typescript
async function streamResponseTokens(
  threadId: string,
  userMessage: string,
  onToken: (token: string) => void
): Promise<void> {
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  // Create the run as a stream
  const stream = client.beta.threads.runs.stream(threadId, {
    assistant_id: assistant.id,
  });

  // Emit text deltas as they arrive
  stream.on('textDelta', (delta) => {
    if (delta.value) {
      onToken(delta.value);
    }
  });

  await stream.finalMessage();
}

// Usage: forward tokens to the client over server-sent events
app.post('/chat/stream', async (req, res) => {
  const { threadId, message } = req.body;
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  try {
    await streamResponseTokens(threadId, message, (token) => {
      res.write(`data: ${JSON.stringify({ token })}\n\n`);
    });
    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: (error as Error).message })}\n\n`);
    res.end();
  }
});
```
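On the consuming side, the client has to split the raw SSE stream back into `data:` events. A minimal parser for the event format emitted above; the `[DONE]` sentinel and `{ token }` payload shape match this post's server sketch, not an OpenAI convention:

```typescript
// Parse a chunk of SSE text into token strings, stopping at the [DONE] sentinel
function parseSSETokens(raw: string): { tokens: string[]; done: boolean } {
  const tokens: string[] = [];
  let done = false;
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip blanks and comments
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { token?: string };
    if (parsed.token) tokens.push(parsed.token);
  }
  return { tokens, done };
}
```

In a browser you would feed this from `fetch` with a `ReadableStream` reader, buffering partial lines between chunks.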
Response Resumption After Interruption
Resume interrupted conversations:
```typescript
interface ConversationState {
  threadId: string;
  userId: string;
  runId?: string;
  status: 'active' | 'interrupted' | 'completed';
  lastMessageId?: string;
  createdAt: Date;
  resumedAt?: Date;
}

const conversationStates = new Map<string, ConversationState>();

async function startConversation(
  userId: string
): Promise<{ threadId: string; conversationId: string }> {
  // Check for an existing active conversation for this user
  const existing = Array.from(conversationStates.entries()).find(
    ([, state]) => state.userId === userId && state.status === 'active'
  );
  if (existing) {
    const [conversationId, state] = existing;
    return { threadId: state.threadId, conversationId };
  }

  // Create a new thread
  const thread = await client.beta.threads.create();
  const conversationId = crypto.randomUUID();
  conversationStates.set(conversationId, {
    threadId: thread.id,
    userId,
    status: 'active',
    createdAt: new Date(),
  });
  return { threadId: thread.id, conversationId };
}

async function interruptConversation(conversationId: string): Promise<void> {
  const state = conversationStates.get(conversationId);
  if (!state) return;
  state.status = 'interrupted';
  // Cancel any in-progress run
  if (state.runId) {
    await client.beta.threads.runs.cancel(state.threadId, state.runId);
  }
}

async function resumeConversation(
  conversationId: string,
  userMessage: string
): Promise<string> {
  const state = conversationStates.get(conversationId);
  if (!state) {
    throw new Error('Conversation not found');
  }
  state.status = 'active';
  state.resumedAt = new Date();
  // Continue the conversation from where it was interrupted
  return await runAssistantWithTools(state.threadId, userMessage);
}
```
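The statuses above form a small state machine, and guarding transitions explicitly prevents mistakes like resuming a completed conversation. A minimal sketch; the transition table is an assumption drawn from the interface above, not an API rule:

```typescript
type ConversationStatus = 'active' | 'interrupted' | 'completed';

// Allowed transitions for the conversation lifecycle
const allowedTransitions: Record<ConversationStatus, ConversationStatus[]> = {
  active: ['interrupted', 'completed'],
  interrupted: ['active', 'completed'],
  completed: [], // terminal state
};

function canTransition(
  from: ConversationStatus,
  to: ConversationStatus
): boolean {
  return allowedTransitions[from].includes(to);
}
```

Calling `canTransition` before mutating `state.status` turns silent corruption into an explicit error path.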
Response IDs for Conversation Threading
Track individual responses:
```typescript
interface ThreadedResponse {
  responseId: string;
  threadId: string;
  parentResponseId?: string;
  content: string;
  createdAt: Date;
  metadata?: Record<string, unknown>;
}

const responses = new Map<string, ThreadedResponse>();

async function getMessageWithThreading(
  threadId: string,
  messageId: string
): Promise<ThreadedResponse | null> {
  const message = await client.beta.threads.messages.retrieve(
    threadId,
    messageId
  );
  if (message.role !== 'assistant') {
    return null;
  }
  const responseId = `resp_${messageId}`;
  const threadedResponse: ThreadedResponse = {
    responseId,
    threadId,
    content:
      message.content[0].type === 'text' ? message.content[0].text.value : '',
    createdAt: new Date(message.created_at * 1000),
  };
  responses.set(responseId, threadedResponse);
  return threadedResponse;
}

async function getBranchHistory(threadId: string): Promise<ThreadedResponse[]> {
  const messages = await client.beta.threads.messages.list(threadId);
  // Build the chain of assistant responses in the thread
  const chain: ThreadedResponse[] = [];
  for (const msg of messages.data) {
    if (msg.role === 'assistant') {
      const resp = await getMessageWithThreading(threadId, msg.id);
      if (resp) {
        chain.push(resp);
      }
    }
  }
  return chain;
}
```
Migrating from Chat Completions to Responses API
```typescript
// Step 1: Create an assistant from your existing system prompt
async function createAssistantFromPrompt(systemPrompt: string) {
  return await client.beta.assistants.create({
    name: 'Migrated Assistant',
    model: 'gpt-4o',
    instructions: systemPrompt,
  });
}

// Step 2: Migrate an existing conversation into a thread
async function migrateConversation(
  oldMessages: { role: 'user' | 'assistant'; content: string }[]
): Promise<string> {
  // Create a new thread
  const thread = await client.beta.threads.create();
  // Re-add all previous messages in order
  for (const msg of oldMessages) {
    await client.beta.threads.messages.create(thread.id, {
      role: msg.role,
      content: msg.content,
    });
  }
  return thread.id;
}

// Step 3: Update your API calls
async function chatComparison() {
  // Old way
  const oldResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });

  // New way
  const assistant = await client.beta.assistants.create({
    model: 'gpt-4o',
    instructions: 'You are a helpful assistant.',
  });
  const thread = await client.beta.threads.create();
  await client.beta.threads.messages.create(thread.id, {
    role: 'user',
    content: 'Hello',
  });
  const run = await client.beta.threads.runs.create(thread.id, {
    assistant_id: assistant.id,
  });
  let finalRun = run;
  while (finalRun.status === 'in_progress' || finalRun.status === 'queued') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    finalRun = await client.beta.threads.runs.retrieve(thread.id, run.id);
  }
  const messages = await client.beta.threads.messages.list(thread.id);
  const newResponse = messages.data[0]; // Newest message is the reply
}
```
Cost and Rate Limit Differences
Track costs for Responses API:
```typescript
interface ResponsesCost {
  threadId: string;
  assistantId: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  createdAt: Date;
}

const costLog: ResponsesCost[] = [];

async function trackResponsesCost(
  threadId: string,
  assistantId: string,
  inputTokens: number,
  outputTokens: number
): Promise<void> {
  // Prices per 1K tokens
  const modelPricing: Record<string, { input: number; output: number }> = {
    'gpt-4o': { input: 0.005, output: 0.015 },
    'gpt-4-turbo': { input: 0.01, output: 0.03 },
  };
  // Actual cost calculation would use the run's usage metadata
  const pricing = modelPricing['gpt-4o'];
  const cost =
    (inputTokens * pricing.input + outputTokens * pricing.output) / 1000;
  costLog.push({
    threadId,
    assistantId,
    inputTokens,
    outputTokens,
    cost,
    createdAt: new Date(),
  });
}

// Read rate limit headers from a raw HTTP response
async function checkRateLimits(response: any) {
  const rateLimitInfo = {
    requestsPerMinute: response.headers['x-ratelimit-limit-requests'],
    tokensPerMinute: response.headers['x-ratelimit-limit-tokens'],
    requestsRemaining: response.headers['x-ratelimit-remaining-requests'],
    tokensRemaining: response.headers['x-ratelimit-remaining-tokens'],
  };
  return rateLimitInfo;
}
```
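Once you can read those headers, backing off before the limit is hit is cheaper than retrying after a 429. A hedged sketch of a delay calculator; the 20% threshold and 30-second cap are arbitrary choices, not OpenAI recommendations:

```typescript
// Compute a backoff delay (ms) from rate limit headroom.
// Returns 0 while plenty of budget remains; grows exponentially as it shrinks.
function backoffDelayMs(
  remaining: number,
  limit: number,
  maxDelayMs = 30_000
): number {
  if (limit <= 0) return maxDelayMs;
  const headroom = remaining / limit; // fraction of the budget left
  if (headroom > 0.2) return 0; // plenty left: no delay
  // Scale from 0 (at the 20% threshold) to 1 (at zero headroom)
  const factor = Math.min(1, (0.2 - headroom) / 0.2);
  return Math.round(Math.min(maxDelayMs, 1000 * Math.pow(2, factor * 5)));
}
```

Feed it `requestsRemaining`/`requestsPerMinute` (or the token pair) from `checkRateLimits` and `setTimeout` before the next call.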
Production Deployment
Deploy Responses API at scale:
```typescript
import express, { Express } from 'express';
import { v4 as uuidv4 } from 'uuid';

const app: Express = express();
app.use(express.json());

interface SessionData {
  sessionId: string;
  threadId: string;
  assistantId: string;
  userId: string;
  createdAt: Date;
  lastActivityAt: Date;
}

const sessions = new Map<string, SessionData>();

// Initialize a session
app.post('/sessions', async (req, res) => {
  const { userId } = req.body;
  const thread = await client.beta.threads.create();
  const sessionId = uuidv4();
  sessions.set(sessionId, {
    sessionId,
    threadId: thread.id,
    assistantId: process.env.OPENAI_ASSISTANT_ID!,
    userId,
    createdAt: new Date(),
    lastActivityAt: new Date(),
  });
  res.json({ sessionId, threadId: thread.id });
});

// Send a message
app.post('/sessions/:sessionId/messages', async (req, res) => {
  const { sessionId } = req.params;
  const { message } = req.body;
  const session = sessions.get(sessionId);
  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }
  try {
    const response = await runAssistantWithTools(session.threadId, message);
    session.lastActivityAt = new Date();
    res.json({ message: response, sessionId });
  } catch (error) {
    res.status(500).json({ error: (error as Error).message });
  }
});

// Get conversation history
app.get('/sessions/:sessionId/history', async (req, res) => {
  const { sessionId } = req.params;
  const session = sessions.get(sessionId);
  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }
  const messages = await client.beta.threads.messages.list(session.threadId);
  res.json({
    sessionId,
    messages: messages.data.map((m) => ({
      id: m.id,
      role: m.role,
      content: m.content[0].type === 'text' ? m.content[0].text.value : '[file]',
      createdAt: new Date(m.created_at * 1000),
    })),
  });
});

app.listen(3000, () => {
  console.log('Responses API server running on port 3000');
});
```
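One gap in the server above: in-memory sessions grow without bound. A periodic sweep that evicts sessions idle past a TTL keeps the map small. A sketch; the 30-minute TTL is an arbitrary choice, and a multi-instance deployment would swap the Map for Redis or a database:

```typescript
// Remove sessions idle longer than ttlMs; returns the evicted session IDs
function sweepStaleSessions<T extends { lastActivityAt: Date }>(
  store: Map<string, T>,
  ttlMs: number,
  now: Date = new Date()
): string[] {
  const evicted: string[] = [];
  for (const [id, session] of store) {
    if (now.getTime() - session.lastActivityAt.getTime() > ttlMs) {
      store.delete(id);
      evicted.push(id);
    }
  }
  return evicted;
}

// Run the sweep every 5 minutes with a 30-minute TTL
// setInterval(() => sweepStaleSessions(sessions, 30 * 60 * 1000), 5 * 60 * 1000);
```

The thread itself survives on OpenAI's side, so an evicted user can resume by creating a new session against the same thread ID.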
Checklist
- Understand Chat Completions vs Responses API differences
- Create assistants with pre-configured tools
- Implement streaming responses to clients
- Handle tool calling in multi-step loops
- Manage conversation state with threads
- Support response resumption after interruptions
- Track costs and rate limits
- Deploy session management at scale
Conclusion
OpenAI's Responses API shifts state management from the client to the service. By using threads, assistants, and built-in tools, you simplify agent implementation and scale more reliably. Start by migrating simple Chat Completions workflows to the Responses API, then add tool calling, streaming, and persistence. As your system grows, the Responses API's stateful architecture keeps long-lived conversations manageable.