OpenAI Responses API — The New Standard for Stateful AI Interactions

Introduction

OpenAI's Responses API replaces Chat Completions with built-in conversation state, tool management, and resumable responses. Unlike Chat Completions, where you manage message history manually, the Responses API handles state automatically. This post covers the migration path, key differences, and production patterns for stateful AI interactions.

Responses API vs Chat Completions API

The Chat Completions API requires you to manage state:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Chat Completions: you manage history
const conversationHistory: { role: 'user' | 'assistant'; content: string }[] = [];

async function chatWithCompletions(userMessage: string): Promise<string> {
  conversationHistory.push({
    role: 'user',
    content: userMessage,
  });

  const response = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: conversationHistory, // Must pass entire history
  });

  const assistantMessage =
    response.choices[0].message.content || '';

  conversationHistory.push({
    role: 'assistant',
    content: assistantMessage,
  });

  return assistantMessage;
}

The Responses API manages state for you:

// Responses API: stateful by default
async function chatWithResponses(
  threadId: string | null,
  userMessage: string
): Promise<string> {
  // Create a conversation thread if we don't have one yet
  if (!threadId) {
    const thread = await client.beta.threads.create();
    threadId = thread.id;
  }

  // Add user message
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  // Run conversation
  const run = await client.beta.threads.runs.create(threadId, {
    assistant_id: 'asst_xyz', // Pre-configured assistant
  });

  // Poll until complete
  let response = await client.beta.threads.runs.retrieve(threadId, run.id);

  while (response.status === 'in_progress' || response.status === 'queued') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    response = await client.beta.threads.runs.retrieve(threadId, run.id);
  }

  // Get assistant's response (newest message is first in the list)
  const messages = await client.beta.threads.messages.list(threadId);
  const lastMessage = messages.data[0];

  return lastMessage.content[0].type === 'text'
    ? lastMessage.content[0].text.value
    : '';
}
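The fixed one-second polling loop above is fine for a demo, but production polling benefits from exponential backoff and a hard timeout. A minimal sketch (`pollUntil` and its option names are our own, not part of the SDK):

```typescript
// Generic poller with exponential backoff. `fetchStatus` returns the
// latest run-like object; `isDone` decides when to stop waiting.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isDone: (value: T) => boolean,
  { initialDelayMs = 500, maxDelayMs = 8000, timeoutMs = 120_000 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;

  while (true) {
    const value = await fetchStatus();
    if (isDone(value)) return value;
    if (Date.now() + delay > deadline) {
      throw new Error('Polling timed out');
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs); // 0.5s, 1s, 2s, ... capped
  }
}
```

With this in place, the polling loop collapses to a single call: fetch the run with `client.beta.threads.runs.retrieve(threadId, run.id)` and stop once the status is no longer `in_progress` or `queued`.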

Key differences:

Aspect                    Chat Completions                    Responses API
State management          Manual (you manage history)         Automatic (thread persists)
Tool calling              tools parameter on each request     Pre-configured assistant tools
Conversation resumption   Not supported                       Built-in via thread ID
File context              Pass as text                        Attach files to thread
Web search                Not available                       Built-in capability

Built-in Tools and Capabilities

Responses API includes tools natively:

// Create assistant with web search capability
const assistant = await client.beta.assistants.create({
  name: 'Research Assistant',
  model: 'gpt-4o',
  description: 'Researches topics and answers questions',
  tools: [
    {
      type: 'web_search', // Built-in web search
    },
    {
      type: 'code_interpreter', // Built-in code execution
    },
    {
      type: 'function', // Custom tools
      function: {
        name: 'get_stock_price',
        description: 'Get current stock price',
        parameters: {
          type: 'object',
          properties: {
            symbol: {
              type: 'string',
              description: 'Stock ticker symbol (e.g., AAPL)',
            },
          },
          required: ['symbol'],
        },
      },
    },
  ],
});

// Define function handler
async function handleToolCall(
  toolName: string,
  toolInput: Record<string, unknown>
): Promise<string> {
  if (toolName === 'get_stock_price') {
    const symbol = toolInput.symbol as string;
    // fetchStockPrice is a placeholder for your own market-data lookup
    const price = await fetchStockPrice(symbol);
    return JSON.stringify({ symbol, price });
  }

  throw new Error(`Unknown tool: ${toolName}`);
}

// Run with tool calling
async function runAssistantWithTools(
  threadId: string,
  userMessage: string
): Promise<string> {
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  let run = await client.beta.threads.runs.create(threadId, {
    assistant_id: assistant.id,
  });

  // Handle tool calls in a loop
  while (
    run.status === 'in_progress' ||
    run.status === 'queued' ||
    run.status === 'requires_action'
  ) {
    if (run.status === 'requires_action') {
      const toolCalls =
        run.required_action?.submit_tool_outputs?.tool_calls || [];

      const toolResults = [];
      for (const toolCall of toolCalls) {
        const result = await handleToolCall(
          toolCall.function.name,
          JSON.parse(toolCall.function.arguments)
        );
        toolResults.push({
          tool_call_id: toolCall.id,
          output: result,
        });
      }

      // Submit tool results
      run = await client.beta.threads.runs.submitToolOutputs(
        threadId,
        run.id,
        {
          tool_outputs: toolResults,
        }
      );
    } else {
      await new Promise((resolve) => setTimeout(resolve, 1000));
      run = await client.beta.threads.runs.retrieve(threadId, run.id);
    }
  }

  // Get final message (newest first)
  const messages = await client.beta.threads.messages.list(threadId);
  const lastMessage = messages.data[0];

  return lastMessage.content[0].type === 'text'
    ? lastMessage.content[0].text.value
    : '';
}
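As the number of custom tools grows, the if/else chain in handleToolCall gets unwieldy. One option is a small registry that maps tool names to handlers (a sketch; `registerTool` and `dispatchTool` are our own names, not SDK features):

```typescript
// A tool registry maps tool names to handlers, so adding a tool is one
// registerTool call instead of another branch in a dispatch chain.
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

const toolRegistry = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  toolRegistry.set(name, handler);
}

async function dispatchTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  const handler = toolRegistry.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(input);
}
```

The tool-call loop then calls `dispatchTool(toolCall.function.name, JSON.parse(toolCall.function.arguments))` regardless of how many tools the assistant exposes.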

Streaming Responses

Stream response tokens as they arrive:

async function streamResponseTokens(
  threadId: string,
  userMessage: string,
  onToken: (token: string) => void
): Promise<void> {
  await client.beta.threads.messages.create(threadId, {
    role: 'user',
    content: userMessage,
  });

  // Create stream
  const stream = await client.beta.threads.runs.stream(threadId, {
    assistant_id: assistant.id,
  });

  // Handle stream events: the Node SDK emits camelCase helper events,
  // and textDelta fires for each incremental piece of assistant text
  stream.on('textDelta', (textDelta) => {
    if (textDelta.value) {
      onToken(textDelta.value);
    }
  });

  await stream.finalMessages();
}

// Usage with streaming to client
app.post('/chat/stream', async (req, res) => {
  const { threadId, message } = req.body;

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');

  try {
    await streamResponseTokens(threadId, message, (token) => {
      res.write(`data: ${JSON.stringify({ token })}\n\n`);
    });

    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: (error as Error).message })}\n\n`);
    res.end();
  }
});
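On the client side, the raw SSE stream from this endpoint arrives as text chunks containing `data:` lines. A minimal parser for the exact format the endpoint above emits (it drops lines split across chunk boundaries, which a production client would buffer):

```typescript
// Parse one chunk of the `data: ...` SSE stream produced by the endpoint
// above. Returns any complete tokens found and whether [DONE] was seen.
function parseSseChunk(chunk: string): { tokens: string[]; done: boolean } {
  const tokens: string[] = [];
  let done = false;

  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') {
      done = true;
      continue;
    }
    try {
      const parsed = JSON.parse(payload) as { token?: string };
      if (typeof parsed.token === 'string') tokens.push(parsed.token);
    } catch {
      // Partial JSON (line split across chunks): skip it here;
      // a real client buffers the remainder until the next chunk
    }
  }

  return { tokens, done };
}
```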

Response Resumption After Interruption

Resume interrupted conversations:

interface ConversationState {
  threadId: string;
  runId?: string;
  userId: string;
  status: 'active' | 'interrupted' | 'completed';
  lastMessageId?: string;
  createdAt: Date;
  resumedAt?: Date;
}

const conversationStates = new Map<string, ConversationState>();

async function startConversation(
  userId: string
): Promise<{ threadId: string; conversationId: string }> {
  // Reuse an active conversation belonging to this user, if any
  const existing = Array.from(conversationStates.entries()).find(
    ([, state]) => state.userId === userId && state.status === 'active'
  );

  if (existing) {
    const [conversationId, state] = existing;
    return { threadId: state.threadId, conversationId };
  }

  // Otherwise create a new thread for this user
  const thread = await client.beta.threads.create();
  const conversationId = crypto.randomUUID();

  conversationStates.set(conversationId, {
    threadId: thread.id,
    userId,
    status: 'active',
    createdAt: new Date(),
  });

  return { threadId: thread.id, conversationId };
}

async function interruptConversation(
  conversationId: string
): Promise<void> {
  const state = conversationStates.get(conversationId);
  if (!state) return;

  state.status = 'interrupted';

  // Cancel any in-progress run (cancel throws if the run already finished)
  if (state.runId) {
    try {
      await client.beta.threads.runs.cancel(state.threadId, state.runId);
    } catch {
      // Run was already terminal; nothing to cancel
    }
  }
}

async function resumeConversation(
  conversationId: string,
  userMessage: string
): Promise<string> {
  const state = conversationStates.get(conversationId);
  if (!state) {
    throw new Error('Conversation not found');
  }

  state.status = 'active';
  state.resumedAt = new Date();

  // Continue conversation from where it was interrupted
  return await runAssistantWithTools(state.threadId, userMessage);
}
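The three statuses above form a small state machine, and it is worth rejecting invalid transitions explicitly — for example, a completed conversation should never flip back to active. A sketch of the guard (the allowed-transition table is our own policy, not something the API enforces):

```typescript
// Explicit transition table for conversation statuses; anything not
// listed is rejected.
type ConversationStatus = 'active' | 'interrupted' | 'completed';

const allowedTransitions: Record<ConversationStatus, ConversationStatus[]> = {
  active: ['interrupted', 'completed'],
  interrupted: ['active', 'completed'],
  completed: [], // terminal: a completed conversation cannot be reopened
};

function canTransition(
  from: ConversationStatus,
  to: ConversationStatus
): boolean {
  return allowedTransitions[from].includes(to);
}
```

resumeConversation would then check `canTransition(state.status, 'active')` before flipping the flag.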

Response IDs for Conversation Threading

Track individual responses:

interface ThreadedResponse {
  responseId: string;
  threadId: string;
  parentResponseId?: string;
  content: string;
  createdAt: Date;
  metadata?: Record<string, unknown>;
}

const responses = new Map<string, ThreadedResponse>();

async function getMessageWithThreading(
  threadId: string,
  messageId: string
): Promise<ThreadedResponse | null> {
  const message = await client.beta.threads.messages.retrieve(
    threadId,
    messageId
  );

  if (message.role !== 'assistant') {
    return null;
  }

  const responseId = `resp_${messageId}`;

  const threadedResponse: ThreadedResponse = {
    responseId,
    threadId,
    content:
      message.content[0].type === 'text'
        ? message.content[0].text.value
        : '',
    createdAt: new Date(message.created_at * 1000),
  };

  responses.set(responseId, threadedResponse);
  return threadedResponse;
}

async function getBranchHistory(
  threadId: string
): Promise<ThreadedResponse[]> {
  const messages = await client.beta.threads.messages.list(threadId);

  // Build response chain
  const chain: ThreadedResponse[] = [];
  for (const msg of messages.data) {
    if (msg.role === 'assistant') {
      const resp = await getMessageWithThreading(threadId, msg.id);
      if (resp) {
        chain.push(resp);
      }
    }
  }

  return chain;
}
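The parentResponseId field above is never populated by the API itself; one way to fill it is to link responses chronologically after fetching them. A sketch over a minimal version of the ThreadedResponse shape:

```typescript
// Minimal response shape for linking; mirrors the relevant fields of
// ThreadedResponse above.
interface LinkedResponse {
  responseId: string;
  parentResponseId?: string;
  createdAt: Date;
}

// Sort responses chronologically and point each one at its predecessor.
function linkResponses(chain: LinkedResponse[]): LinkedResponse[] {
  const sorted = [...chain].sort(
    (a, b) => a.createdAt.getTime() - b.createdAt.getTime()
  );
  return sorted.map((resp, i) => ({
    ...resp,
    parentResponseId: i > 0 ? sorted[i - 1].responseId : undefined,
  }));
}
```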

Migrating from Chat Completions to Responses API

// Step 1: Create assistant from system prompt
async function createAssistantFromPrompt(systemPrompt: string) {
  return await client.beta.assistants.create({
    name: 'Migrated Assistant',
    model: 'gpt-4o',
    instructions: systemPrompt,
  });
}

// Step 2: Migrate existing conversation
async function migrateConversation(
  oldMessages: { role: 'user' | 'assistant'; content: string }[]
): Promise<string> {
  // Create new thread
  const thread = await client.beta.threads.create();

  // Re-add all previous messages
  for (const msg of oldMessages) {
    await client.beta.threads.messages.create(thread.id, {
      role: msg.role,
      content: msg.content,
    });
  }

  return thread.id;
}

// Step 3: Update API calls
async function chatComparison() {
  // Old way
  const oldResponse = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });

  // New way
  const assistant = await client.beta.assistants.create({
    model: 'gpt-4o',
    instructions: 'You are a helpful assistant.',
  });

  const thread = await client.beta.threads.create();
  await client.beta.threads.messages.create(thread.id, {
    role: 'user',
    content: 'Hello',
  });

  const run = await client.beta.threads.runs.create(thread.id, {
    assistant_id: assistant.id,
  });

  let finalRun = run;
  while (finalRun.status === 'in_progress' || finalRun.status === 'queued') {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    finalRun = await client.beta.threads.runs.retrieve(thread.id, run.id);
  }

  const messages = await client.beta.threads.messages.list(thread.id);
  const newResponse = messages.data[0];
}
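One wrinkle when re-adding history in step 2: some Chat Completions transcripts contain consecutive messages from the same role (for example, two assistant turns around a tool call). Merging them first keeps the migrated thread clean (a sketch; `normalizeHistory` is our own helper):

```typescript
// Collapse consecutive same-role messages into one before migration.
interface HistoryMessage {
  role: 'user' | 'assistant';
  content: string;
}

function normalizeHistory(messages: HistoryMessage[]): HistoryMessage[] {
  const out: HistoryMessage[] = [];
  for (const msg of messages) {
    const last = out[out.length - 1];
    if (last && last.role === msg.role) {
      last.content += '\n' + msg.content; // merge into the previous turn
    } else {
      out.push({ ...msg }); // copy so the input array is not mutated
    }
  }
  return out;
}
```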

Cost and Rate Limit Differences

Track costs for Responses API:

interface ResponsesCost {
  threadId: string;
  assistantId: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
  createdAt: Date;
}

const costLog: ResponsesCost[] = [];

async function trackResponsesCost(
  threadId: string,
  assistantId: string,
  inputTokens: number,
  outputTokens: number
): Promise<void> {
  const modelPricing: Record<string, { input: number; output: number }> = {
    'gpt-4o': { input: 0.005, output: 0.015 },
    'gpt-4-turbo': { input: 0.01, output: 0.03 },
  };

  // Actual cost calculation would use run metadata
  const pricing = modelPricing['gpt-4o'];
  const cost =
    (inputTokens * pricing.input + outputTokens * pricing.output) / 1000;

  costLog.push({
    threadId,
    assistantId,
    inputTokens,
    outputTokens,
    cost,
    createdAt: new Date(),
  });
}

// Rate limit headers (read these from the raw HTTP response; the SDK's
// `.withResponse()` helper exposes it alongside the parsed body)
async function checkRateLimits(response: {
  headers: Record<string, string | undefined>;
}) {
  const rateLimitInfo = {
    requestsPerMinute: response.headers['x-ratelimit-limit-requests'],
    tokensPerMinute: response.headers['x-ratelimit-limit-tokens'],
    requestsRemaining: response.headers['x-ratelimit-remaining-requests'],
    tokensRemaining: response.headers['x-ratelimit-remaining-tokens'],
  };

  return rateLimitInfo;
}
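With costs logged per call, a small rollup makes expensive threads easy to spot. A sketch over the same numeric fields as ResponsesCost (`summarizeCostsByThread` is our own helper):

```typescript
// Per-thread rollup of logged costs.
interface CostEntry {
  threadId: string;
  inputTokens: number;
  outputTokens: number;
  cost: number;
}

function summarizeCostsByThread(
  entries: CostEntry[]
): Map<string, { cost: number; tokens: number }> {
  const summary = new Map<string, { cost: number; tokens: number }>();
  for (const entry of entries) {
    const current = summary.get(entry.threadId) ?? { cost: 0, tokens: 0 };
    current.cost += entry.cost;
    current.tokens += entry.inputTokens + entry.outputTokens;
    summary.set(entry.threadId, current);
  }
  return summary;
}
```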

Production Deployment

Deploy Responses API at scale:

import express, { Express } from 'express';
import { v4 as uuidv4 } from 'uuid';

const app: Express = express();
app.use(express.json());

interface SessionData {
  sessionId: string;
  threadId: string;
  assistantId: string;
  userId: string;
  createdAt: Date;
  lastActivityAt: Date;
}

const sessions = new Map<string, SessionData>();

// Initialize session
app.post('/sessions', async (req, res) => {
  const { userId } = req.body;

  const thread = await client.beta.threads.create();
  const sessionId = uuidv4();

  sessions.set(sessionId, {
    sessionId,
    threadId: thread.id,
    assistantId: process.env.OPENAI_ASSISTANT_ID!,
    userId,
    createdAt: new Date(),
    lastActivityAt: new Date(),
  });

  res.json({ sessionId, threadId: thread.id });
});

// Send message
app.post('/sessions/:sessionId/messages', async (req, res) => {
  const { sessionId } = req.params;
  const { message } = req.body;

  const session = sessions.get(sessionId);
  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  try {
    const response = await runAssistantWithTools(
      session.threadId,
      message
    );

    session.lastActivityAt = new Date();

    res.json({ message: response, sessionId });
  } catch (error) {
    res.status(500).json({ error: (error as Error).message });
  }
});

// Get conversation history
app.get('/sessions/:sessionId/history', async (req, res) => {
  const { sessionId } = req.params;

  const session = sessions.get(sessionId);
  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  const messages = await client.beta.threads.messages.list(
    session.threadId
  );

  res.json({
    sessionId,
    messages: messages.data.map((m) => ({
      id: m.id,
      role: m.role,
      content:
        m.content[0].type === 'text' ? m.content[0].text.value : '[file]',
      createdAt: new Date(m.created_at * 1000),
    })),
  });
});

app.listen(3000, () => {
  console.log('Responses API server running on port 3000');
});
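The in-memory sessions map above grows without bound; in production you would either move it to a shared store like Redis or at least sweep idle sessions. A minimal TTL sweep (`sweepIdleSessions` is our own helper):

```typescript
// Delete sessions idle for longer than ttlMs and return the evicted IDs.
// Works on any map whose values carry a lastActivityAt timestamp.
function sweepIdleSessions<T extends { lastActivityAt: Date }>(
  sessions: Map<string, T>,
  ttlMs: number,
  now: Date = new Date()
): string[] {
  const evicted: string[] = [];
  for (const [id, session] of sessions) {
    if (now.getTime() - session.lastActivityAt.getTime() > ttlMs) {
      sessions.delete(id); // deleting during Map iteration is safe in JS
      evicted.push(id);
    }
  }
  return evicted;
}
```

Run it on an interval, e.g. `setInterval(() => sweepIdleSessions(sessions, 30 * 60 * 1000), 60_000)`.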

Checklist

  • Understand Chat Completions vs Responses API differences
  • Create assistants with pre-configured tools
  • Implement streaming responses to clients
  • Handle tool calling in multi-step loops
  • Manage conversation state with threads
  • Support response resumption after interruptions
  • Track costs and rate limits
  • Deploy session management at scale

Conclusion

OpenAI's Responses API shifts state management from the client to the service. By using threads, assistants, and built-in tools, you simplify agent implementation and scale more reliably. Start by migrating simple Chat Completions workflows to the Responses API, then add tool calling, streaming, and persistence. As your system grows, the Responses API's stateful architecture makes long-lived conversations manageable.