OpenAI Assistants API — Build AI Agents

Sanjeev Sharma
7 min read


Introduction

The OpenAI Assistants API enables building AI agents that can interact with external tools, maintain state across conversations, and perform complex workflows. Unlike basic chat completions, Assistants handle conversation state management, tool integration, and multi-step processes automatically. This guide covers building production-ready AI agents using the Assistants API.

Understanding Assistants vs Chat Completions

The Assistants API handles state and tool calling differently than basic chat completions:

Chat Completions API: You manage conversation history, tool execution, and retry logic manually. Simpler but requires more code.

Assistants API: Manages conversation threads, tool calling, and retry logic automatically. More complex setup but cleaner for multi-turn interactions.

Use Assistants API when building:

  • Stateful conversational agents
  • Tools that call external APIs or functions
  • Applications requiring persistent conversation history
  • Complex workflows with multiple steps

Use Chat Completions API when building:

  • Simpler applications where stateless operation is acceptable
  • High-volume applications where managed state adds latency
  • Simple code generation or analysis tasks

Creating Your First Assistant

An Assistant requires:

  1. A system prompt (instructions for behavior)
  2. Optional tools (functions the assistant can call)
  3. Optional knowledge files (documents to reference)

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Step 1: Create an Assistant
assistant = client.beta.assistants.create(
    name="Code Review Assistant",
    description="Analyzes code for quality and security issues",
    instructions="""You are an expert code reviewer. Your job is to:
1. Identify security vulnerabilities
2. Suggest performance improvements
3. Point out violations of best practices
4. Recommend refactoring opportunities

Be constructive and explain your reasoning.""",
    model="gpt-4o",
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_code_metrics",
                "description": "Get complexity metrics for code",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "code": {
                            "type": "string",
                            "description": "The code to analyze"
                        }
                    },
                    "required": ["code"]
                }
            }
        }
    ]
)

print(f"Assistant created: {assistant.id}")

Managing Conversations with Threads

Threads maintain conversation history automatically. Instead of managing messages yourself, you create a thread and add messages to it:

from openai import OpenAI

client = OpenAI()

# Create or retrieve an assistant
assistant_id = "asst_abc123"

# Create a new thread for each conversation
thread = client.beta.threads.create()

# Add a message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Review this code: def add(a, b): return a+b"
)

# Run the assistant on the thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant_id
)

# Wait for completion
import time

while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

if run.status != "completed":
    print(f"Run ended with status {run.status}: {run.last_error}")

# Get messages from the thread
messages = client.beta.threads.messages.list(thread_id=thread.id)

for msg in messages.data:
    if msg.role == "assistant":
        print(f"Assistant: {msg.content[0].text.value}")

Implementing Tool Calling

When an Assistant needs to call a tool, the run enters the requires_action status and lists the requested tool calls. You must execute the tools and submit the results:

import json
import time

from openai import OpenAI

client = OpenAI()

def get_weather(location: str) -> str:
    # Simulate weather API call
    weather_data = {
        "New York": "72°F, Sunny",
        "Los Angeles": "85°F, Clear",
        "London": "55°F, Rainy"
    }
    return weather_data.get(location, "Unknown location")

def handle_tool_calls(tool_calls):
    """Execute tools and return results"""
    results = []

    for tool_call in tool_calls:
        if tool_call.function.name == "get_weather":
            args = json.loads(tool_call.function.arguments)
            result = get_weather(args.get("location"))

            results.append({
                "tool_call_id": tool_call.id,
                "output": result
            })

    return results

def run_agent_with_tools(thread_id, assistant_id, user_message):
    """Run assistant and handle tool calls"""
    client = OpenAI()

    # Add user message
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=user_message
    )

    # Start run
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )

    # Handle tool calling loop
    while run.status in ["queued", "in_progress", "requires_action"]:
        if run.status == "requires_action":
            # Execute required tools
            tool_results = handle_tool_calls(
                run.required_action.submit_tool_outputs.tool_calls
            )

            # Submit tool results
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_results
            )
        else:
            # Wait for next status
            import time
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id,
                run_id=run.id
            )

    # Get final response (messages are listed newest first)
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    return messages.data[0].content[0].text.value

# Usage
thread = client.beta.threads.create()
response = run_agent_with_tools(
    thread.id,
    "asst_abc123",
    "What's the weather in New York?"
)
print(response)

File Search and Knowledge Base

Assistants can search through files you provide:

from openai import OpenAI

client = OpenAI()

# Upload a file
file = client.files.create(
    file=open("documentation.pdf", "rb"),
    purpose="assistants"
)

# Create a vector store containing the uploaded file
vector_store = client.beta.vector_stores.create(
    name="Documentation",
    file_ids=[file.id]
)

# Create assistant with file search
assistant = client.beta.assistants.create(
    name="Documentation Assistant",
    model="gpt-4o",
    instructions="Answer questions based on the provided documentation.",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

# Now the assistant can search documents when answering questions

Building a Multi-Step Workflow

Combine tools to create complex workflows:

def create_data_analysis_assistant():
    """Create assistant for multi-step data analysis"""
    client = OpenAI()

    return client.beta.assistants.create(
        name="Data Analysis Agent",
        model="gpt-4o",
        instructions="""You are a data analyst. When given a dataset:
1. Use load_dataset to retrieve the data
2. Use analyze_data to compute statistics
3. Use generate_report to create a summary
4. Provide insights and recommendations""",
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "load_dataset",
                    "description": "Load a dataset",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {"type": "string"}
                        },
                        "required": ["file_path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "analyze_data",
                    "description": "Analyze data and return statistics",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "data_id": {"type": "string"}
                        },
                        "required": ["data_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "generate_report",
                    "description": "Generate analysis report",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "analysis_results": {"type": "string"}
                        },
                        "required": ["analysis_results"]
                    }
                }
            }
        ]
    )

Error Handling and Retries

Production Assistants require robust error handling:

import time

def run_with_retry(thread_id, assistant_id, max_retries=3):
    """Run assistant with retry logic"""
    client = OpenAI()

    for attempt in range(max_retries):
        try:
            run = client.beta.threads.runs.create(
                thread_id=thread_id,
                assistant_id=assistant_id
            )

            # Wait for completion with timeout
            start_time = time.time()
            timeout = 120  # 2 minutes

            while run.status in ("queued", "in_progress"):
                if time.time() - start_time > timeout:
                    raise TimeoutError("Assistant run timed out")

                time.sleep(1)
                run = client.beta.threads.runs.retrieve(
                    thread_id=thread_id,
                    run_id=run.id
                )

            if run.status != "completed":
                raise Exception(f"Run ended with status {run.status}: {run.last_error}")

            # Success
            messages = client.beta.threads.messages.list(
                thread_id=thread_id
            )
            return messages.data[0].content[0].text.value

        except Exception as e:
            if attempt < max_retries - 1:
                print(f"Attempt {attempt + 1} failed: {e}. Retrying...")
                time.sleep(2 ** attempt)  # Exponential backoff
            else:
                raise

Cost Considerations

The Assistants API charges per token like the Chat Completions API, but adds costs for the following (rates at the time of writing; check OpenAI's pricing page for current figures):

File storage: $0.20 per GB per day

Vector store usage: For file search, managed vector stores cost $0.10 per GB per day

API calls: Standard per-token charges apply
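
As a quick sanity check on storage spend, using the vector store rate quoted above:

```python
# Rough monthly estimate for a 5 GB vector store, at the
# $0.10/GB/day rate quoted above (verify against current pricing)
vector_store_gb = 5
rate_per_gb_day = 0.10
days = 30

monthly_storage_cost = vector_store_gb * rate_per_gb_day * days
print(f"${monthly_storage_cost:.2f}")  # $15.00
```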

To optimize costs:

  • Delete unused files and vector stores
  • Reuse threads instead of creating new ones for related conversations
  • Use cheaper models when possible
  • Batch operations

Practical Examples

Customer Support Bot: Create an assistant trained on your documentation that answers customer questions and escalates to humans when needed.

Code Analysis Tool: Build an agent that analyzes repositories, identifies issues, and suggests improvements.

Research Assistant: Create an agent that searches papers, summarizes findings, and synthesizes conclusions.

Limitations

  • A single run can request several tool calls at once, but runs on a thread execute sequentially
  • Streaming support is newer and less mature than in Chat Completions, and requires the streaming run APIs
  • Per-file size limits apply to file search (check OpenAI's docs for the current cap)
  • Threads can grow large if not managed

Conclusion

The Assistants API enables building sophisticated AI agents that maintain state, call tools, and handle complex workflows. It's the right choice for production conversational applications where reliability and state management matter. Start simple with basic assistants, then add tools and complexity as your use case demands.

FAQ

Q: Should I use Assistants API or Chat Completions API? A: Use Assistants for stateful multi-turn conversations. Use Chat Completions for simpler, stateless interactions. Assistants handle complexity but add latency.

Q: How do I delete a thread to save costs? A: Call client.beta.threads.delete(thread_id=thread.id). Deleting frees resources but you lose conversation history.

Q: Can Assistants handle real-time interactions? A: Assistants aren't ideal for real-time (sub-second response) applications due to latency. Use Chat Completions API for that.


Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro