ChatGPT API Integration — Build Your First App
Introduction
Building with the ChatGPT API opens up possibilities for creating intelligent applications, chatbots, content generators, and assistants. This guide walks through the fundamentals of ChatGPT API integration, from initial setup to a small web application. By the end, you'll have a working example you can extend for your specific needs.
- Setting Up the OpenAI API
- Your First ChatGPT API Call
- Understanding Message Roles
- Building a Multi-Turn Conversation
- Handling API Errors
- Streaming Responses
- Managing Costs
- Building a Simple Web Application
- Parameters That Matter
- Common Pitfalls
- Conclusion
- FAQ
Setting Up the OpenAI API
First, you need an OpenAI account and API key. Visit platform.openai.com, create an account, and generate an API key from the account settings. Add billing information to enable API calls—you're charged per token used.
Install the OpenAI Python client:
pip install openai
Or for Node.js:
npm install openai
Store your API key securely in an environment variable, never in source code:
export OPENAI_API_KEY="your-api-key-here"
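With the key in your environment, the Python client picks it up automatically; reading it explicitly from os.environ also works and makes the dependency visible. A minimal sketch:

import os
from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment by default;
# passing it explicitly documents where the key comes from.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])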
Your First ChatGPT API Call
Here's a minimal example that sends a message to ChatGPT and gets a response:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms"
        }
    ]
)

print(response.choices[0].message.content)
The core API call uses the chat.completions.create() method. You specify:
- model: Which model to use (gpt-4o for the latest, gpt-3.5-turbo for a cheaper option)
- messages: The conversation history with roles (system, user, assistant)
- Optional parameters like temperature (randomness) and max_tokens (length limit)
Understanding Message Roles
The messages parameter is an array where each message has a role:
- system: Sets the context and behavior for the assistant
- user: Messages from the end user
- assistant: Previous responses from ChatGPT (used for multi-turn conversations)
# Example with system prompt for a customer service bot
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": """You are a helpful customer service representative.
Be friendly, professional, and concise.
If you cannot help, suggest contacting support."""
        },
        {
            "role": "user",
            "content": "I received a damaged product. What should I do?"
        }
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
Building a Multi-Turn Conversation
Real applications need to maintain conversation history. Here's how to build a simple chatbot:
from openai import OpenAI

def chat_with_history():
    client = OpenAI()
    messages = []
    system_prompt = {
        "role": "system",
        "content": "You are a helpful Python programming assistant."
    }

    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            break

        # Add the new user message to the running history
        messages.append({"role": "user", "content": user_input})

        # Send the system prompt plus the full conversation history
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[system_prompt] + messages,
            temperature=0.7
        )

        # Store the assistant's reply so the next turn has full context
        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})
        print(f"Assistant: {assistant_message}\n")

# Run the chatbot
if __name__ == "__main__":
    chat_with_history()
Handling API Errors
Production applications must handle errors gracefully:
from openai import OpenAI, RateLimitError, APIError
import time

def call_with_retry(messages, max_retries=3):
    client = OpenAI()
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0.7
            )
            return response
        except RateLimitError:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff: 1, 2, 4 seconds
                print(f"Rate limited. Waiting {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise
        except APIError as e:
            print(f"API error: {e}")
            raise

# Example usage
messages = [{"role": "user", "content": "Hello"}]
response = call_with_retry(messages)
print(response.choices[0].message.content)
Streaming Responses
For better user experience, stream responses word-by-word instead of waiting for the full response:
from openai import OpenAI

def stream_response(user_message):
    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
        stream=True
    )
    # With stream=True the API returns chunks; each chunk carries a small
    # piece of the reply in choices[0].delta.content (None on some chunks)
    for chunk in stream:
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
    print()  # New line after streaming completes

# Use it
stream_response("Write a short haiku about programming")
Managing Costs
API costs add up quickly. Strategies to minimize them:
Use GPT-3.5 Turbo for simpler tasks where quality is less critical—it costs about 90% less than GPT-4o.
Set max_tokens to prevent unexpectedly long responses that increase costs.
Take advantage of prompt caching: OpenAI automatically discounts repeated prompt prefixes, so keep long, stable content such as system prompts at the start of your messages.
Monitor usage through the OpenAI dashboard to catch unexpected spikes.
# Example: Cost-aware API calls (illustrative per-1K-token rates; check current pricing)
def estimate_cost(model, input_tokens, output_tokens):
    pricing = {
        "gpt-4o": {"input": 0.005, "output": 0.015},
        "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015}
    }
    rates = pricing.get(model, {"input": 0, "output": 0})
    cost = (input_tokens * rates["input"] +
            output_tokens * rates["output"]) / 1000
    return cost

# Check before making expensive calls
print(f"Estimated cost: ${estimate_cost('gpt-4o', 1000, 500):.4f}")
Building a Simple Web Application
Here's a basic Flask app that provides a ChatGPT interface:
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/api/chat', methods=['POST'])
def chat():
    # get_json(silent=True) returns None instead of raising on a malformed body
    data = request.get_json(silent=True) or {}
    user_message = data.get('message')
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": user_message}
            ],
            max_tokens=500
        )
        return jsonify({
            "response": response.choices[0].message.content
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)
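With the server running locally (Flask's development server listens on http://127.0.0.1:5000 by default), you can exercise the endpoint with a small client script. This sketch assumes the requests package is installed (pip install requests):

import requests

# Call the /api/chat endpoint defined above
resp = requests.post(
    "http://127.0.0.1:5000/api/chat",
    json={"message": "What is the capital of France?"},
    timeout=30
)
resp.raise_for_status()
print(resp.json()["response"])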
Parameters That Matter
The main parameters that affect API behavior:
- temperature (0-2): Lower values (0-0.5) produce more focused, predictable responses. Higher values (0.8-1.5) produce more creative, varied responses.
- max_tokens: Maximum length of the response. Set this to avoid unexpectedly long outputs.
- top_p (0-1): Alternative to temperature for controlling diversity. Use either temperature or top_p, not both.
- frequency_penalty (-2 to 2): Penalizes tokens that have already appeared often, reducing verbatim repetition.
- presence_penalty (-2 to 2): Penalizes tokens that have appeared at all, encouraging the model to move on to new topics.
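As a rough sketch, a single request combining several of these parameters might look like the following; the values are illustrative rather than recommendations, and temperature is used here instead of top_p:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Suggest three names for a coffee shop"}],
    temperature=0.9,        # lean toward creative output
    max_tokens=200,         # cap the response length
    frequency_penalty=0.5,  # discourage repeated wording
    presence_penalty=0.3    # nudge toward new topics
)
print(response.choices[0].message.content)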
Common Pitfalls
Including the entire conversation in each request isn't necessary—you can optimize by only including relevant recent messages.
Not validating user input opens security vulnerabilities. Sanitize user messages before using them.
Forgetting about token limits can cause requests to fail if context gets too long.
Hardcoding API keys in source code is a critical security error.
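A simple way to address the first three pitfalls is to validate each user message and trim old history before every request. The following sketch is illustrative; the 400-character and 10-message limits are arbitrary and should be tuned for your application:

MAX_MESSAGES = 10      # arbitrary window; adjust to your token budget
MAX_INPUT_CHARS = 400  # arbitrary cap on a single user message

def prepare_messages(system_prompt, history, user_input):
    """Validate the new user message and trim old history before an API call."""
    cleaned = user_input.strip()
    if not cleaned:
        raise ValueError("Empty message")
    if len(cleaned) > MAX_INPUT_CHARS:
        raise ValueError("Message too long")
    history.append({"role": "user", "content": cleaned})
    # Keep only the most recent turns so the context stays within limits
    return [system_prompt] + history[-MAX_MESSAGES:]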
Conclusion
The ChatGPT API enables building intelligent applications quickly. Start simple with basic chat functionality, then extend with error handling, cost management, and specialized prompts. The API is powerful enough for production applications but requires thoughtful implementation around cost, error handling, and security.
FAQ
Q: Which model should I use for cost-sensitive applications? A: Start with GPT-3.5 Turbo for simple tasks. Use GPT-4o only when you need superior reasoning or code generation quality. Monitor usage and adjust based on results.
Q: How long should I keep conversation history? A: For cost and performance, keep only recent messages (last 10-20 depending on length). Too much history increases API costs and can cause context confusion.
Q: Can I use the ChatGPT API for real-time applications? A: Yes, especially with streaming enabled. For very high-traffic applications, consider rate limiting and caching to manage costs effectively.