DeepSeek — China AI Model Complete Guide

Sanjeev Sharma
3 min read

Introduction

DeepSeek is a Chinese AI startup that releases competitive open-weight models. Its models demonstrate that frontier-quality LLMs can come from outside the traditional players. This guide covers setup, inference, and deployment.

DeepSeek Model Variants

DeepSeek-LLM-7B: Efficient baseline (base and chat checkpoints)
DeepSeek-LLM-67B: High-quality flagship
DeepSeek-Coder: Code-specialized variants (1.3B to 33B)
DeepSeek-MoE-16B: Mixture-of-Experts version

Installation

pip install transformers torch accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
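
A quick smoke test confirms the weights loaded correctly and the model can generate (the prompt here is arbitrary):

# Smoke test: generate a few tokens to verify the model works
inputs = tokenizer("Hello, I am", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))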

Chat Template

def chat_deepseek(messages: list) -> str:
    """Chat with DeepSeek using the tokenizer's built-in chat template."""
    # apply_chat_template inserts the model's own special tokens,
    # so we don't have to hard-code the prompt format ourselves
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(
        inputs,
        max_new_tokens=256,
        do_sample=True,   # required for temperature to take effect
        temperature=0.7
    )

    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Usage
messages = [{"role": "user", "content": "What is AI?"}]
print(chat_deepseek(messages))
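
For interactive use, you can stream tokens as they are generated instead of waiting for the full response. A minimal sketch using transformers' TextStreamer:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is AI?"}],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Tokens are printed to stdout as soon as they are generated
model.generate(inputs, max_new_tokens=256, streamer=streamer)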

Code Generation

def generate_code(prompt: str) -> str:
    """Generate code with DeepSeek-Coder.

    Assumes `model` and `tokenizer` were loaded from a coder checkpoint
    such as deepseek-ai/deepseek-coder-6.7b-instruct, not the chat model above.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.3  # lower temperature for more deterministic code
    )

    # Strip the echoed prompt, keep only the generated continuation
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

# Example
code_prompt = "Write a Python function to calculate the Fibonacci sequence."
code = generate_code(code_prompt)
print(code)
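
Instruct-tuned coder models usually wrap their answer in a markdown code fence. A small convenience helper (my own, not part of any DeepSeek API) to pull out just the code:

import re

def extract_code(response: str) -> str:
    """Return the first fenced code block, or the raw response if none."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

print(extract_code(code))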

Quantization

pip install bitsandbytes

from transformers import BitsAndBytesConfig

# 8-bit quantization roughly halves memory versus fp16
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-67b-chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
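
If 8-bit is still too large, 4-bit NF4 quantization cuts memory roughly in half again at some quality cost. The same loading call works; only the config changes:

# 4-bit NF4: roughly 4x smaller than fp16, with a modest quality hit
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)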

Integration with LangChain

from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import ChatPromptTemplate

llm = HuggingFacePipeline.from_model_id(
    model_id="deepseek-ai/deepseek-llm-7b-chat",
    task="text-generation",
    model_kwargs={"trust_remote_code": True},
    pipeline_kwargs={"max_new_tokens": 256}
)

prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a {audience}"
)

chain = prompt | llm

result = chain.invoke({
    "topic": "quantum computing",
    "audience": "beginner"
})
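
If you want message-based chat semantics, so LangChain applies the model's chat template for you, langchain_huggingface also provides a ChatHuggingFace wrapper. A sketch reusing the llm defined above:

from langchain_huggingface import ChatHuggingFace

# Wraps the raw pipeline so prompts are formatted with the chat template
chat_model = ChatHuggingFace(llm=llm)
result = chat_model.invoke("Explain quantum computing to a beginner")
print(result.content)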

Performance Benchmarks

import time

models = [
    "deepseek-ai/deepseek-llm-7b-chat",
    "deepseek-ai/deepseek-llm-67b-chat",
]

for model_id in models:
    print(f"\n{model_id}")
    # Each model needs its own tokenizer; don't reuse one across models
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )

    prompt = "Explain machine learning"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=100)
    elapsed = time.time() - start

    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"Time: {elapsed:.2f}s ({new_tokens / elapsed:.1f} tokens/s)")

    # Free GPU memory before loading the next model
    del model
    torch.cuda.empty_cache()

DeepSeek vs Competitors

Model            | Size | Capability | Origin
-----------------|------|------------|-------
DeepSeek-LLM-67B | 67B  | Very high  | China
Mistral-7B       | 7B   | Excellent  | France
Llama-2-70B      | 70B  | Excellent  | US
Phi-3-small      | 7B   | Good       | US

Multi-turn Conversation

class DeepSeekChat:
    def __init__(self):
        self.history = []

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})

        # Let the tokenizer apply the model's own chat template to the
        # whole history instead of hand-rolling the prompt format
        inputs = tokenizer.apply_chat_template(
            self.history,
            add_generation_prompt=True,
            return_tensors="pt"
        ).to(model.device)

        outputs = model.generate(inputs, max_new_tokens=512)

        # Keep only the newly generated tokens
        assistant_response = tokenizer.decode(
            outputs[0][inputs.shape[-1]:],
            skip_special_tokens=True
        )

        self.history.append({"role": "assistant", "content": assistant_response})
        return assistant_response

# Usage
bot = DeepSeekChat()
print(bot.chat("What is AI?"))
print(bot.chat("Give me an example"))
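
Long conversations will eventually overflow the model's context window. A naive fix is to keep only the most recent turns; the cutoff below is an arbitrary assumption, tune it to your context budget:

# Keep only the last N messages before formatting the prompt
MAX_MESSAGES = 10

def trim_history(history: list) -> list:
    """Drop the oldest messages once the history grows too long."""
    return history[-MAX_MESSAGES:]

More robust approaches summarize older turns instead of dropping them.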

Deployment

# Docker deployment
cat > Dockerfile << 'EOF'
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

WORKDIR /app

RUN apt-get update && apt-get install -y python3.10 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

CMD ["python3", "app.py"]
EOF
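
The Dockerfile copies an app.py that isn't shown above. One way to fill it in, assuming FastAPI and uvicorn are listed in requirements.txt alongside transformers and torch:

# app.py — minimal chat endpoint (FastAPI/uvicorn are assumptions, not shown above)
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import uvicorn

model_id = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": req.message}],
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    return {"response": text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)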

Conclusion

DeepSeek shows that frontier-quality LLM development is no longer confined to a handful of US labs. Its models are competitive, openly available, and a strong choice for applications that need to balance cost against capability.

FAQ

Q: Are DeepSeek models suitable for production? A: Yes, they're well-maintained and reliable for production deployment.

Q: How do DeepSeek models compare to OpenAI? A: DeepSeek-LLM-67B rivals GPT-3.5 on many tasks; GPT-4 remains stronger at very complex reasoning.

Q: Can I use DeepSeek commercially? A: Generally yes, but check the license on each model card for the exact terms.

Written by Sanjeev Sharma

Full Stack Engineer · E-mopro