ChatGPT API Integration — Build Your First App

Sanjeev Sharma
6 min read

Introduction

Building with the ChatGPT API opens up possibilities for creating intelligent applications, chatbots, content generators, and assistants. This guide walks through the fundamentals of ChatGPT API integration, from setup to your first production-ready application. By the end, you'll have a working example you can extend for your specific needs.

Setting Up the OpenAI API

First, you need an OpenAI account and API key. Visit platform.openai.com, create an account, and generate an API key from the account settings. Add billing information to enable API calls—you're charged per token used.

Install the OpenAI Python client:

pip install openai

Or for Node.js:

npm install openai

Store your API key securely in an environment variable, never in source code:

export OPENAI_API_KEY="your-api-key-here"
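The Python client reads OPENAI_API_KEY from the environment automatically, so you never pass the key in code. A small startup check (a sketch — the require_api_key helper below is not part of the SDK) fails fast with a clearer message than a rejected API call later:

```python
import os

def require_api_key():
    # The OpenAI client picks up OPENAI_API_KEY on its own; checking
    # up front gives a clearer error than a failed request later.
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before making API calls.")
    return key
```

Call require_api_key() once at startup, before constructing the client.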

Your First ChatGPT API Call

Here's a minimal example that sends a message to ChatGPT and gets a response:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ]
)

print(response.choices[0].message.content)

The core API call uses the chat.completions.create() method. You specify:

  • model: Which model to use (gpt-4o for the most capable, gpt-3.5-turbo for lower cost)
  • messages: The conversation history with roles (system, user, assistant)
  • Optional parameters like temperature (randomness), max_tokens (length limit)

Understanding Message Roles

The messages parameter is an array where each message has a role:

  • system: Sets the context and behavior for the assistant
  • user: Messages from the end user
  • assistant: Previous responses from ChatGPT (used for multi-turn conversations)

# Example with system prompt for a customer service bot
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": """You are a helpful customer service representative.
Be friendly, professional, and concise.
If you cannot help, suggest contacting support."""
        },
        {
            "role": "user",
            "content": "I received a damaged product. What should I do?"
        }
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Building a Multi-Turn Conversation

Real applications need to maintain conversation history. Here's how to build a simple chatbot:

from openai import OpenAI

def chat_with_history():
    client = OpenAI()
    messages = []

    system_prompt = {
        "role": "system",
        "content": "You are a helpful Python programming assistant."
    }

    while True:
        user_input = input("You: ")

        if user_input.lower() in ["exit", "quit"]:
            break

        # Build message array with system prompt and all previous messages
        messages.append({"role": "user", "content": user_input})

        # Create API call with full conversation history
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[system_prompt] + messages,
            temperature=0.7
        )

        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})

        print(f"Assistant: {assistant_message}\n")

# Run the chatbot
if __name__ == "__main__":
    chat_with_history()
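One caveat with the loop above: the messages list grows without bound, and every turn resends the whole list. A simple way to cap cost is to keep only the most recent messages before each call (a sketch — trim_history and the limit of 20 are illustrative choices, not part of the API):

```python
def trim_history(messages, max_messages=20):
    # Keep only the most recent messages to bound token usage per request.
    # A production app might trim by token count instead of message count.
    return messages[-max_messages:]
```

In the chatbot above, you would call messages = trim_history(messages) just before the create() call.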

Handling API Errors

Production applications must handle errors gracefully:

from openai import OpenAI, RateLimitError, APIError
import time

def call_with_retry(messages, max_retries=3):
    client = OpenAI()

    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0.7
            )
            return response
        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
        except APIError as e:
            print(f"API error: {e}")
            raise

# Example usage
messages = [{"role": "user", "content": "Hello"}]
response = call_with_retry(messages)
print(response.choices[0].message.content)

Streaming Responses

For a better user experience, stream the response token-by-token instead of waiting for the full completion. With stream=True, the API returns an iterable of chunks whose incremental text lives in choices[0].delta.content:

from openai import OpenAI

def stream_response(user_message):
    client = OpenAI()

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        stream=True
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()  # New line after streaming completes

# Use it
stream_response("Write a short haiku about programming")

Managing Costs

API costs add up quickly. Strategies to minimize them:

Use GPT-3.5 Turbo for simpler tasks where quality is less critical—it costs about 90% less than GPT-4o.

Set max_tokens to prevent unexpectedly long responses that increase costs.

Reuse long, stable system prompts as a fixed prefix—OpenAI's prompt caching can automatically discount repeated prompt prefixes on supported models.

Monitor usage through the OpenAI dashboard to catch unexpected spikes.

# Example: Cost-aware API calls
def estimate_cost(model, input_tokens, output_tokens):
    # Illustrative per-1K-token rates; check OpenAI's pricing page
    # for current values before relying on these numbers
    pricing = {
        "gpt-4o": {"input": 0.005, "output": 0.015},
        "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015}
    }

    rates = pricing.get(model, {"input": 0, "output": 0})
    cost = (input_tokens * rates["input"] +
            output_tokens * rates["output"]) / 1000

    return cost

# Check before making expensive calls
print(f"Estimated cost: ${estimate_cost('gpt-4o', 1000, 500):.4f}")
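estimate_cost needs token counts as inputs. For exact counts, use OpenAI's tiktoken library; for ballpark budgeting, a character-based heuristic is often enough (an assumption — roughly 4 characters per token holds for typical English text, not for code or other languages):

```python
def rough_token_count(text):
    # Very rough heuristic: ~4 characters per token for English text.
    # Use the tiktoken library when an exact count matters.
    return max(1, len(text) // 4)
```

For example, estimate_cost('gpt-4o', rough_token_count(prompt), 500) gives a pre-call budget estimate.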

Building a Simple Web Application

Here's a basic Flask app that provides a ChatGPT interface:

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/api/chat', methods=['POST'])
def chat():
    data = request.get_json(silent=True) or {}
    user_message = data.get('message')

    if not user_message:
        return jsonify({"error": "No message provided"}), 400

    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": user_message}
            ],
            max_tokens=500
        )

        return jsonify({
            "response": response.choices[0].message.content
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)

Parameters That Matter

The main parameters that affect API behavior:

  • temperature (0-2): Lower values (0-0.5) produce more deterministic, focused responses. Higher values (0.8-1.5) produce more creative, varied responses.
  • max_tokens: Maximum length of the response. Set this to avoid unexpectedly long outputs.
  • top_p (0-1): Alternative to temperature for controlling diversity. Use either temperature or top_p, not both.
  • frequency_penalty (0-2): Reduces repetition of common words.
  • presence_penalty (0-2): Encourages the model to discuss new topics.
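To make these trade-offs concrete, here is a sketch of two illustrative parameter presets (build_request and the specific values are assumptions for demonstration, not recommendations from OpenAI):

```python
def build_request(messages, creative=False):
    # Assemble kwargs for chat.completions.create() from one of two
    # illustrative presets: focused Q&A vs. more varied creative output.
    params = {"model": "gpt-4o", "messages": messages}
    if creative:
        params.update(temperature=1.1, presence_penalty=0.6, max_tokens=800)
    else:
        params.update(temperature=0.2, max_tokens=300)
    return params
```

You would then call client.chat.completions.create(**build_request(messages)).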

Common Pitfalls

Sending the entire conversation history with every request inflates token costs as conversations grow. Include only the relevant recent messages instead.

Not validating user input opens security vulnerabilities. Sanitize user messages before using them.
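A minimal sanitization step might strip control characters and cap message length before anything reaches the API (a sketch — sanitize_message and the 4000-character limit are illustrative assumptions; real applications may also want moderation checks):

```python
MAX_MESSAGE_LENGTH = 4000  # illustrative cap, not an API limit

def sanitize_message(text):
    # Drop non-printable control characters (keeping newlines and tabs),
    # then enforce a length cap before sending the message to the API.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:MAX_MESSAGE_LENGTH].strip()
```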

Forgetting about token limits can cause requests to fail if context gets too long.

Hardcoding API keys in source code is a critical security error.

Conclusion

The ChatGPT API enables building intelligent applications quickly. Start simple with basic chat functionality, then extend with error handling, cost management, and specialized prompts. The API is powerful enough for production applications but requires thoughtful implementation around cost, error handling, and security.

FAQ

Q: Which model should I use for cost-sensitive applications? A: Start with GPT-3.5 Turbo for simple tasks. Use GPT-4o only when you need superior reasoning or code generation quality. Monitor usage and adjust based on results.

Q: How long should I keep conversation history? A: For cost and performance, keep only recent messages (last 10-20 depending on length). Too much history increases API costs and can cause context confusion.

Q: Can I use the ChatGPT API for real-time applications? A: Yes, especially with streaming enabled. For very high-traffic applications, consider rate limiting and caching to manage costs effectively.

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro