DeepSeek — China AI Model Complete Guide

Sanjeev Sharma
3 min read

Introduction

DeepSeek is a Chinese AI startup that releases competitive open-weight models. Its models demonstrate that frontier-quality LLMs can come from outside the traditional players. This guide covers setup, inference, and deployment.

DeepSeek Model Variants

DeepSeek-LLM-7B: Efficient baseline (base and chat checkpoints)
DeepSeek-LLM-67B: High-quality flagship
DeepSeek-Coder: Code-specialized variants (1.3B to 33B)
DeepSeek-MoE-16B: Mixture-of-Experts version

Installation

pip install transformers torch accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
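
A quick smoke test confirms the weights loaded correctly and the model can generate (the prompt here is arbitrary):

# Smoke test: generate a few tokens to verify the model works
inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))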

Chat Template

def chat_deepseek(messages: list) -> str:
    """Chat with DeepSeek using the tokenizer's built-in chat template."""
    # apply_chat_template inserts the model's own special tokens,
    # so we don't have to hard-code the prompt format ourselves
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,   # required for temperature to take effect
        temperature=0.7
    )

    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Usage
messages = [{"role": "user", "content": "What is AI?"}]
print(chat_deepseek(messages))
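
For interactive use, you can stream tokens as they are generated instead of waiting for the full response. A minimal sketch using transformers' TextStreamer:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is AI?"}],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Tokens are printed to stdout as soon as they are generated
model.generate(inputs, max_new_tokens=256, streamer=streamer)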

Code Generation

def generate_code(prompt: str) -> str:
    """Generate code with DeepSeek-Coder.

    Assumes `model` and `tokenizer` were loaded from a coder checkpoint
    such as deepseek-ai/deepseek-coder-6.7b-instruct, not the chat model above.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.3  # lower temperature for more deterministic code
    )

    # Strip the echoed prompt, keep only the generated continuation
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

# Example
code_prompt = "Write a Python function to calculate the Fibonacci sequence."
code = generate_code(code_prompt)
print(code)
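
Instruct-tuned coder models usually wrap their answer in a markdown code fence. A small convenience helper (my own, not part of any DeepSeek API) to pull out just the code:

import re

def extract_code(response: str) -> str:
    """Return the first fenced code block, or the raw response if none."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_code(code))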

Quantization

pip install bitsandbytes

from transformers import BitsAndBytesConfig

# 8-bit quantization roughly halves memory versus fp16
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-67b-chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
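
If 8-bit is still too large, 4-bit NF4 quantization cuts memory roughly in half again at some quality cost. The same loading call works; only the config changes:

# 4-bit NF4: roughly 4x smaller than fp16, with a modest quality hit
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)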

Integration with LangChain

from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate

llm = HuggingFacePipeline.from_model_id(
    model_id="deepseek-ai/deepseek-llm-7b-chat",
    task="text-generation",
    model_kwargs={"trust_remote_code": True},
    pipeline_kwargs={"max_new_tokens": 256}
)

prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience}"
)

chain = prompt | llm

result = chain.invoke({
    "topic": "quantum computing",
    "audience": "beginner"
})
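
If you want message-based chat semantics, so LangChain applies the model's chat template for you, langchain_huggingface also provides a ChatHuggingFace wrapper. A sketch reusing the llm defined above:

from langchain_huggingface import ChatHuggingFace

# Wraps the raw pipeline so prompts are formatted with the chat template
chat_model = ChatHuggingFace(llm=llm)
result = chat_model.invoke("Explain quantum computing to a beginner")
print(result.content)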

Performance Benchmarks

import time

models = [
    "deepseek-ai/deepseek-llm-7b-chat",
    "deepseek-ai/deepseek-llm-67b-chat",
]

for model_id in models:
    print(f"\n{model_id}")
    # Each model needs its own tokenizer; don't reuse one across models
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )

    prompt = "Explain machine learning"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=100)
    elapsed = time.time() - start

    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"Time: {elapsed:.2f}s ({new_tokens / elapsed:.1f} tokens/s)")

    # Free GPU memory before loading the next model
    del model
    torch.cuda.empty_cache()

DeepSeek vs Competitors

Model            | Size | Capability | Origin
-----------------|------|------------|-------
DeepSeek-LLM-67B | 67B  | Very high  | China
Mistral-7B       | 7B   | Excellent  | France
Llama-2-70B      | 70B  | Excellent  | US
Phi-3-small      | 7B   | Good       | US

Multi-turn Conversation

class DeepSeekChat:
    def __init__(self):
        self.history = []

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})

        # Let the tokenizer apply the model's own chat template to the
        # whole history instead of hand-rolling the prompt format
        inputs = tokenizer.apply_chat_template(
            self.history,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to(model.device)

        outputs = model.generate(inputs, max_new_tokens=512)

        # Keep only the newly generated tokens
        assistant_response = tokenizer.decode(
            outputs[0][inputs.shape[-1]:],
            skip_special_tokens=True
        )

        self.history.append({"role": "assistant", "content": assistant_response})
        return assistant_response

# Usage
bot = DeepSeekChat()
print(bot.chat("What is AI?"))
print(bot.chat("Give me an example"))
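
Long conversations will eventually overflow the model's context window. A naive fix is to keep only the most recent turns; the cutoff below is an arbitrary assumption, tune it to your context budget:

# Keep only the last N messages before formatting the prompt
MAX_MESSAGES = 10

def trim_history(history: list) -> list:
    """Drop the oldest messages once the history grows too long."""
    return history[-MAX_MESSAGES:]

More robust approaches summarize older turns instead of dropping them.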

Deployment

# Docker deployment
cat > Dockerfile << 'EOF'
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

WORKDIR /app

RUN apt-get update && apt-get install -y python3.10 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

CMD ["python3", "app.py"]
EOF
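
The Dockerfile copies an app.py that isn't shown above. One way to fill it in, assuming FastAPI and uvicorn are listed in requirements.txt alongside transformers and torch:

# app.py — minimal chat endpoint (FastAPI/uvicorn are assumptions, not shown above)
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import uvicorn

model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": req.message}],
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return {"response": text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)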

Conclusion

DeepSeek shows that frontier-quality LLM development is no longer confined to a handful of US labs. Its models are competitive, openly available, and a strong choice for applications that need to balance cost against capability.

FAQ

Q: Are DeepSeek models suitable for production? A: Yes, they're well-maintained and reliable for production deployment.

Q: How do DeepSeek models compare to OpenAI? A: DeepSeek-LLM-67B rivals GPT-3.5 on many tasks; GPT-4 remains stronger at very complex reasoning.

Q: Can I use DeepSeek commercially? A: Generally yes, but check the license on each model card for the exact terms.

Written by Sanjeev Sharma

Full Stack Engineer · E-mopro