Together AI — Run Open Source LLMs in Cloud
Introduction
Together AI provides cloud infrastructure for running open-source LLMs at scale. This guide covers setup and integration for production deployments.
- Getting Started
- Text Completion
- Chat Completion
- Streaming
- Integration with LangChain
- Batch Processing
- Available Models
- Fine-tuning
- Embeddings
- Comparison
- Deployment Example
- Conclusion
- FAQ
Getting Started
```shell
pip install together
```

```python
from together import Together

client = Together(api_key="your-api-key")

# List available models
response = client.models.list()
for model in response:
    print(model)
```
Text Completion
```python
response = client.completions.create(
    model="togethercomputer/llama-2-7b-chat",
    prompt="Explain machine learning in one sentence",
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].text)
```
Chat Completion
```python
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    max_tokens=256
)
print(response.choices[0].message.content)
```
Streaming
```python
stream = client.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    prompt="Write a short story",
    max_tokens=512,
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
```
Integration with LangChain
```python
from langchain_community.llms import Together
from langchain_core.prompts import ChatPromptTemplate

llm = Together(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    together_api_key="your-key"
)

prompt = ChatPromptTemplate.from_template("Explain {topic}")
chain = prompt | llm
result = chain.invoke({"topic": "neural networks"})
print(result)
```
Batch Processing
```python
# Process multiple prompts sequentially
prompts = [
    "What is AI?",
    "Explain machine learning",
    "What is deep learning?"
]

responses = []
for prompt in prompts:
    response = client.completions.create(
        model="meta-llama/Llama-2-7b-chat-hf",
        prompt=prompt,
        max_tokens=256
    )
    responses.append(response.choices[0].text)

for prompt, response in zip(prompts, responses):
    print(f"Q: {prompt}")
    print(f"A: {response}\n")
```
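The loop above issues requests one at a time. For larger batches, running requests concurrently cuts wall-clock time considerably. A minimal sketch using Python's `ThreadPoolExecutor` (the `run_batch` helper and the `max_workers` value are illustrative, not part of the Together SDK):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(generate, prompts, max_workers=4):
    """Apply generate(prompt) to each prompt concurrently.

    Results come back in the same order as the input prompts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))
```

You would pass a function that wraps the API call, e.g. `run_batch(lambda p: client.completions.create(model="meta-llama/Llama-2-7b-chat-hf", prompt=p, max_tokens=256).choices[0].text, prompts)`. Keep `max_workers` modest to stay within your rate limits.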
Available Models
Together AI hosts 100+ open-source models including:
- Llama 2 (7B, 13B, 70B)
- Mistral variants
- Code Llama
- And many more
Fine-tuning
```python
# Upload training data (JSONL format)
training_file = client.files.upload(file="training_data.jsonl")

# Create a fine-tuning job
fine_tune = client.fine_tuning.create(
    model="meta-llama/Llama-2-7b-hf",
    training_file=training_file.id
)

# The job ID is used to monitor progress
print(fine_tune.id)
```
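Fine-tuning jobs run asynchronously, so you typically poll until the job reaches a terminal state. A generic polling sketch (the `wait_for_job` helper and the status names here are assumptions; check the actual status values your SDK version returns, e.g. from `client.fine_tuning.retrieve(fine_tune.id)`):

```python
import time

def wait_for_job(fetch_status, poll_interval=30,
                 terminal=("completed", "error", "cancelled")):
    """Poll fetch_status() until it returns a terminal state."""
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)
```

With the job above, this would be called as `wait_for_job(lambda: client.fine_tuning.retrieve(fine_tune.id).status)`.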
Embeddings
```python
# Generate embeddings
embeddings = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=["Hello world", "How are you?"]
)
for text, embedding in zip(["Hello world", "How are you?"], embeddings.data):
    print(f"{text}: {embedding.embedding[:5]}...")
```
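Embeddings are typically compared with cosine similarity, e.g. for semantic search or deduplication. A minimal, dependency-free sketch (in practice you would use NumPy or a vector database):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Applied to the response above: `cosine_similarity(embeddings.data[0].embedding, embeddings.data[1].embedding)` scores how semantically close the two input strings are.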
Comparison
| Provider | Models | Cost | Speed |
| --- | --- | --- | --- |
| Together | 100+ (open) | Low | Good |
| Groq | 3-5 (cloud) | Moderate | Best |
| OpenAI | API-only | High | Good |
| Local (GPU) | Any model | Hardware (upfront) | Variable |
Deployment Example
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from together import Together

app = FastAPI()
client = Together()  # Reads TOGETHER_API_KEY from the environment

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/generate")
async def generate(request: GenerateRequest):
    try:
        response = client.completions.create(
            model="mistralai/Mistral-7B-Instruct-v0.1",
            prompt=request.prompt,
            max_tokens=request.max_tokens
        )
        return {"response": response.choices[0].text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run: uvicorn app:app
```
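In production, transient failures such as rate limits and timeouts are common, so API calls are usually wrapped in retries with exponential backoff. A minimal sketch (the `with_retries` helper is illustrative; libraries like `tenacity` provide a more complete version):

```python
import random
import time

def with_retries(call, attempts=3, base_delay=0.5):
    """Invoke call(); on failure, retry with exponential backoff plus jitter."""
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i) + random.uniform(0, 0.1))
```

Inside the endpoint above, the completion call would become `with_retries(lambda: client.completions.create(...))`.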
Conclusion
Together AI democratizes access to open-source LLM infrastructure, making it a good fit for teams that want control and flexibility without managing their own hardware.
FAQ
Q: How does Together AI pricing compare? A: Generally cheaper than OpenAI for open-source models of comparable quality.
Q: Can I fine-tune models on Together? A: Yes, they support fine-tuning with your data.
Q: What models are available? A: 100+ open-source models including Llama, Mistral, and specialized variants.