Together AI — Run Open Source LLMs in Cloud
Introduction
Together AI provides cloud infrastructure for running open-source LLMs at scale. This guide covers setup and integration for production deployments.
- Getting Started
- Text Completion
- Chat Completion
- Streaming
- Integration with LangChain
- Batch Processing
- Available Models
- Fine-tuning
- Embeddings
- Comparison
- Deployment Example
- Conclusion
- FAQ
Getting Started
```shell
pip install together
```

```python
from together import Together

client = Together(api_key="your-api-key")

# List available models
response = client.models.list()
for model in response:
    print(model)
```
Text Completion
```python
response = client.completions.create(
    model="togethercomputer/llama-2-7b-chat",
    prompt="Explain machine learning in one sentence",
    max_tokens=256,
    temperature=0.7
)
print(response.choices[0].text)
```
Chat Completion
```python
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    max_tokens=256
)
print(response.choices[0].message.content)
```
Streaming
```python
stream = client.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    prompt="Write a short story",
    max_tokens=512,
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
```
Integration with LangChain
```python
from langchain_community.llms import Together
from langchain_core.prompts import ChatPromptTemplate

llm = Together(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    together_api_key="your-key"
)

prompt = ChatPromptTemplate.from_template("Explain {topic}")
chain = prompt | llm
result = chain.invoke({"topic": "neural networks"})
print(result)
```
Batch Processing
```python
# Process multiple prompts sequentially
prompts = [
    "What is AI?",
    "Explain machine learning",
    "What is deep learning?"
]

responses = []
for prompt in prompts:
    response = client.completions.create(
        model="meta-llama/Llama-2-7b-chat-hf",
        prompt=prompt,
        max_tokens=256
    )
    responses.append(response.choices[0].text)

for prompt, response in zip(prompts, responses):
    print(f"Q: {prompt}")
    print(f"A: {response}\n")
```
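The loop above issues requests one at a time. For larger batches, running requests concurrently cuts wall-clock time considerably. A minimal sketch using Python's `ThreadPoolExecutor` (the `run_batch` helper and the `max_workers` value are illustrative, not part of the Together SDK):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(generate, prompts, max_workers=4):
    """Apply generate(prompt) to each prompt concurrently.

    Results come back in the same order as the input prompts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))
```

You would pass a function that wraps the API call, e.g. `run_batch(lambda p: client.completions.create(model="meta-llama/Llama-2-7b-chat-hf", prompt=p, max_tokens=256).choices[0].text, prompts)`. Keep `max_workers` modest to stay within your rate limits.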
Available Models
Together AI hosts 100+ open-source models including:
- Llama 2 (7B, 13B, 70B)
- Mistral variants
- Code Llama
- And many more
Fine-tuning
```python
# Upload training data (JSONL format)
training_file = client.files.upload(file="training_data.jsonl")

# Create a fine-tuning job
fine_tune = client.fine_tuning.create(
    model="meta-llama/Llama-2-7b-hf",
    training_file=training_file.id
)

# The job ID is used to monitor progress
print(fine_tune.id)
```
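Fine-tuning jobs run asynchronously, so you typically poll until the job reaches a terminal state. A generic polling sketch (the `wait_for_job` helper and the status names here are assumptions; check the actual status values your SDK version returns, e.g. from `client.fine_tuning.retrieve(fine_tune.id)`):

```python
import time

def wait_for_job(fetch_status, poll_interval=30,
                 terminal=("completed", "error", "cancelled")):
    """Poll fetch_status() until it returns a terminal state."""
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(poll_interval)
```

With the job above, this would be called as `wait_for_job(lambda: client.fine_tuning.retrieve(fine_tune.id).status)`.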
Embeddings
```python
# Generate embeddings
embeddings = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",
    input=["Hello world", "How are you?"]
)
for text, embedding in zip(["Hello world", "How are you?"], embeddings.data):
    print(f"{text}: {embedding.embedding[:5]}...")
```
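Embeddings are typically compared with cosine similarity, e.g. for semantic search or deduplication. A minimal, dependency-free sketch (in practice you would use NumPy or a vector database):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Applied to the response above: `cosine_similarity(embeddings.data[0].embedding, embeddings.data[1].embedding)` scores how semantically close the two input strings are.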
Comparison
| Provider | Models | Cost | Speed |
| --- | --- | --- | --- |
| Together | 100+ (open) | Low | Good |
| Groq | 3-5 (cloud) | Moderate | Best |
| OpenAI | API-only | High | Good |
| Local (GPU) | Any model | Hardware (upfront) | Variable |
Deployment Example
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from together import Together

app = FastAPI()
client = Together()  # Reads TOGETHER_API_KEY from the environment

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/generate")
async def generate(request: GenerateRequest):
    try:
        response = client.completions.create(
            model="mistralai/Mistral-7B-Instruct-v0.1",
            prompt=request.prompt,
            max_tokens=request.max_tokens
        )
        return {"response": response.choices[0].text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# Run: uvicorn app:app
```
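In production, transient failures such as rate limits and timeouts are common, so API calls are usually wrapped in retries with exponential backoff. A minimal sketch (the `with_retries` helper is illustrative; libraries like `tenacity` provide a more complete version):

```python
import random
import time

def with_retries(call, attempts=3, base_delay=0.5):
    """Invoke call(); on failure, retry with exponential backoff plus jitter."""
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i) + random.uniform(0, 0.1))
```

Inside the endpoint above, the completion call would become `with_retries(lambda: client.completions.create(...))`.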
Conclusion
Together AI democratizes access to open-source LLM infrastructure, making it a good fit for teams that want control and flexibility without managing their own hardware.
FAQ
Q: How does Together AI pricing compare? A: Generally cheaper than OpenAI for open-source models of comparable quality.
Q: Can I fine-tune models on Together? A: Yes, they support fine-tuning with your data.
Q: What models are available? A: 100+ open-source models including Llama, Mistral, and specialized variants.