DeepSeek: A Complete Guide to China's AI Models
Introduction
DeepSeek is a Chinese AI startup that publishes competitive open-source models, demonstrating that frontier-quality LLMs can come from outside the traditional players. This guide covers setup, usage, and deployment.
- DeepSeek Model Variants
- Installation
- Chat Template
- Code Generation
- Quantization
- Integration with LangChain
- Performance Benchmarks
- DeepSeek vs Competitors
- Multi-turn Conversation
- Deployment
- Conclusion
- FAQ
DeepSeek Model Variants
DeepSeek-LLM-7B: Efficient general-purpose baseline (deepseek-ai/deepseek-llm-7b-chat)
DeepSeek-LLM-67B: Highest-quality general model (deepseek-ai/deepseek-llm-67b-chat)
DeepSeek-Coder: Code-specialized variants from 1.3B up to 33B (e.g. deepseek-ai/deepseek-coder-33b-instruct)
DeepSeek-MoE-16B: Mixture-of-Experts version (deepseek-ai/deepseek-moe-16b-chat)
Installation
pip install transformers torch accelerate
pip install bitsandbytes  # needed later for the quantization section
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
    trust_remote_code=True
)
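As a quick sanity check after loading, you can run a short generation (a minimal sketch; the prompt and token budget are arbitrary):

# Smoke test: encode a prompt, generate a short continuation, decode it
prompt = "DeepSeek is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))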
Chat Template
def chat_deepseek(messages: list) -> str:
    """Chat with DeepSeek using the chat template shipped with its tokenizer."""
    # apply_chat_template renders the messages in the exact format
    # the chat model was fine-tuned on, so we don't hand-roll tokens
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,   # temperature only takes effect when sampling
        temperature=0.7
    )
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
# Usage
messages = [{"role": "user", "content": "What is AI?"}]
print(chat_deepseek(messages))
Code Generation
def generate_code(prompt: str) -> str:
    """Generate code with DeepSeek-Coder.

    Assumes a DeepSeek-Coder checkpoint is loaded as `model`/`tokenizer`.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.3  # lower temperature for more consistent code
    )
    # Return only the completion, not the echoed prompt
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
# Example
code_prompt = "Write a Python function to compute Fibonacci numbers"
code = generate_code(code_prompt)
print(code)
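Instruction-tuned coder models often wrap their answer in a markdown code fence; a small helper (hypothetical, not part of any DeepSeek or Transformers API) can pull out just the code:

import re

def extract_code(text: str) -> str:
    """Return the contents of the first ```-fenced block, or the text unchanged."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

print(extract_code(code))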
Quantization
from transformers import BitsAndBytesConfig

# 8-bit weights take roughly 1 byte per parameter (~33 GB for a 33B model)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
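For tighter memory budgets, 4-bit NF4 loading works the same way; this is a sketch using standard bitsandbytes options (adjust the compute dtype to your GPU):

bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normalized-float 4-bit weights
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-33b-instruct",
    quantization_config=bnb_config_4bit,
    device_map="auto",
    trust_remote_code=True
)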
Integration with LangChain
from langchain_community.llms import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate

# HuggingFacePipeline is constructed via from_model_id, not a bare model_id kwarg
llm = HuggingFacePipeline.from_model_id(
    model_id="deepseek-ai/deepseek-llm-7b-chat",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience}"
)
chain = prompt | llm
result = chain.invoke({
    "topic": "quantum computing",
    "audience": "beginner"
})
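If you already have a model in memory (for example the quantized one above), you can wrap it instead of loading a second copy. A sketch, assuming the `model` and `tokenizer` variables from the earlier sections:

from transformers import pipeline

# Reuse the in-memory model rather than downloading and loading it again
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256
)
llm = HuggingFacePipeline(pipeline=pipe)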
Performance Benchmarks
import time

models = [
    "deepseek-ai/deepseek-llm-7b-chat",
    "deepseek-ai/deepseek-llm-67b-chat",
]
for model_id in models:
    print(f"\n{model_id}")
    # Load the matching tokenizer for each model rather than reusing a stale one
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )
    prompt = "Explain machine learning"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=100)
    elapsed = time.time() - start
    new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
    print(f"Time: {elapsed:.2f}s ({new_tokens / elapsed:.1f} tokens/s)")
DeepSeek vs Competitors
| Model | Size | Capability | Origin |
| --- | --- | --- | --- |
| DeepSeek-67B | 67B | Very High | China |
| Mistral-7B | 7B | Excellent | France |
| Llama-2-70B | 70B | Excellent | US |
| Phi-3-small | 7B | Good | US |
Multi-turn Conversation
class DeepSeekChat:
    def __init__(self):
        self.history = []

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})
        # Render the whole conversation with the model's own chat template
        inputs = tokenizer.apply_chat_template(
            self.history,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to(model.device)
        outputs = model.generate(inputs, max_new_tokens=512)
        # Decode only the new tokens so the reply isn't polluted by the prompt
        assistant_response = tokenizer.decode(
            outputs[0][inputs.shape[1]:], skip_special_tokens=True
        )
        self.history.append({
            "role": "assistant",
            "content": assistant_response
        })
        return assistant_response
# Usage
bot = DeepSeekChat()
print(bot.chat("What is AI?"))
print(bot.chat("Give me an example"))
Deployment
# Docker deployment
cat > Dockerfile << 'EOF'
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
WORKDIR /app
# Ubuntu 22.04 ships Python 3.10 as python3
RUN apt-get update && apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python3", "app.py"]
EOF
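The Dockerfile expects an app.py; below is a minimal sketch of one using FastAPI. FastAPI, uvicorn, and the port are assumptions here, and would need to go into requirements.txt alongside transformers, torch, and accelerate:

from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import uvicorn

# Load the model once at startup, not per request
model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    # Render a single-turn conversation and generate a reply
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": req.message}],
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return {"response": reply}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)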
Conclusion
DeepSeek demonstrates that capable LLM development is now a global affair. Its models are competitive, openly accessible, and a strong fit for applications that need to balance cost against capability.
FAQ
Q: Are DeepSeek models suitable for production? A: Yes, they're well-maintained and reliable for production deployment.
Q: How do DeepSeek models compare to OpenAI? A: DeepSeek-67B rivals GPT-3.5 on many tasks, while GPT-4 remains stronger for very complex reasoning.
Q: Can I use DeepSeek commercially? A: Yes, check their model cards for specific licensing terms.