System Design Interview Guide 2026: URL Shortener to Netflix Scale

Sanjeev Sharma

System Design Interviews 2026: The Complete Playbook

System design is where senior engineers are made or broken. It tests your ability to think at scale, make trade-offs, and communicate complex ideas clearly. This guide covers the framework and seven canonical problems every FAANG candidate must master.

The RADIO Framework

Structure every system design interview with RADIO:

R: Requirements
  Functional:     What does the system do?
  Non-functional: Scale, availability, latency, consistency

A: API Design
  Define endpoints before diving into internals
  REST, GraphQL, or gRPC?

D: Data Model
  What entities exist? How are they related?
  SQL vs NoSQL decision

I: High-Level Design
  Components: clients, servers, databases, caches, queues
  Draw the box diagram

O: Deep Dives
  Pick 2-3 interesting/risky components
  Go deep on the interviewer's cues

Problem 1: URL Shortener (like bit.ly)

Requirements:
- Shorten URLs: given longUrl, return shortUrl (e.g., sho.rt/abc123)
- Redirect: GET /abc123 → 301/302 to longUrl
- Analytics: click count, unique visitors
- Scale: 100M URLs created/month, 10:1 read/write ratio

Capacity Estimation:
- Write: 100M / 30 days / 86400s ≈ 40 writes/sec
- Read:  400 reads/sec
- Storage: 100M URLs × 500 bytes = 50GB/month, 600GB/year
- Cache: 80/20 rule (top 20% of URLs serve ~80% of reads) → hot set of roughly 10GB

Short URL generation:
- 7 characters from [a-zA-Z0-9] = 62^7 = 3.5 trillion URLs
- Approach 1: Hash (MD5/SHA256 → take first 7 chars, handle collisions)
- Approach 2: Base62 encode auto-increment ID (simpler, no collisions)
// URL shortener core logic
class UrlShortener {
  private readonly BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

  // Encode auto-increment ID to base62
  encode(id: number): string {
    let result = ''
    while (id > 0) {
      result = this.BASE62[id % 62] + result
      id = Math.floor(id / 62)
    }
    return result.padStart(7, '0')
  }

  decode(shortCode: string): number {
    return shortCode.split('').reduce((acc, char) => {
      return acc * 62 + this.BASE62.indexOf(char)
    }, 0)
  }
}

// Database schema
// urls table:
// id BIGSERIAL PRIMARY KEY
// short_code VARCHAR(10) UNIQUE
// long_url TEXT NOT NULL
// user_id BIGINT
// created_at TIMESTAMP
// expires_at TIMESTAMP
// click_count BIGINT DEFAULT 0

// Redirect handler (redis and db are assumed ioredis / pg clients)
async function redirect(shortCode: string) {
  // 1. Check cache (Redis)
  const cached = await redis.get(`url:${shortCode}`)
  if (cached) {
    await redis.incr(`clicks:${shortCode}`)  // Async click tracking
    return { statusCode: 302, location: cached }
  }

  // 2. DB lookup
  const url = await db.query(
    'SELECT long_url FROM urls WHERE short_code = $1 AND (expires_at IS NULL OR expires_at > NOW())',
    [shortCode]
  )

  if (!url.rows[0]) return { statusCode: 404 }

  // 3. Cache for a day, and count this click too
  await redis.setex(`url:${shortCode}`, 86400, url.rows[0].long_url)
  await redis.incr(`clicks:${shortCode}`)

  // Prefer 302 over 301 when you need per-click analytics:
  // browsers cache 301s, so later clicks never reach the server
  return { statusCode: 302, location: url.rows[0].long_url }
}

Architecture:
Client → CDN → Load Balancer → App Servers (stateless, horizontal)
App Servers → PostgreSQL (write) + Read Replicas (read)
App Servers → Redis Cluster (URL cache, click counters)
Analytics: batch flush click counters to the DB every minute (sketch below)
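
A minimal sketch of that batch flush, assuming an ioredis client and a pg Pool; the key pattern matches the redirect handler above, and flushClickCounts is an illustrative name:
// Flush Redis click counters to Postgres once a minute
import Redis from 'ioredis'
import { Pool } from 'pg'

const redis = new Redis()
const db = new Pool()

async function flushClickCounts() {
  // KEYS is fine at this keyspace size; use SCAN for much larger ones
  const keys = await redis.keys('clicks:*')
  for (const key of keys) {
    const count = Number(await redis.getset(key, '0'))  // read and reset atomically
    if (count > 0) {
      await db.query(
        'UPDATE urls SET click_count = click_count + $1 WHERE short_code = $2',
        [count, key.slice('clicks:'.length)]
      )
    }
  }
}

setInterval(flushClickCounts, 60_000)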

Problem 2: Design Instagram

Requirements:
- Upload photos/videos
- Follow users, news feed
- Like, comment
- Scale: 1B users, 100M daily active, 100M photos uploaded/day

Key decisions:
1. Photo storage: S3 (not DB), store URL in DB
2. News feed: precompute (push model) vs compute on read (pull model)
   - Push (fan-out on write): fast reads, expensive for celebrities (1M followers)
   - Pull (fan-out on read): slow reads, simple writes
   - Hybrid: push for users with <1M followers, pull for celebrities

Feed table (precomputed):
user_id | post_id | created_at | (INDEX on user_id, created_at)

Follow graph:
followers: user_id → follower_id (indexed both ways)
Use a graph DB (Neo4j) for complex social graph queries
Or: denormalize into Redis sorted sets for fast feed generation

Photo Upload Flow:
1. Client requests presigned S3 URL from API server
2. Client uploads directly to S3 (bypasses API server)
3. S3 triggers Lambda → add to processing queue
4. Workers: resize to 3 sizes, extract metadata, update DB
5. CDN in front of S3 for reads
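
Step 1 might look like this with the AWS SDK v3 presigner (a sketch; the bucket name and key layout are illustrative):
// Issue a presigned S3 upload URL so the client uploads directly
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({})

// Returns a URL the client can PUT the file to, expiring in 5 minutes
async function createUploadUrl(userId: string, photoId: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: 'photos-raw',                     // illustrative bucket name
    Key: `uploads/${userId}/${photoId}.jpg`,  // illustrative key layout
    ContentType: 'image/jpeg',
  })
  return getSignedUrl(s3, command, { expiresIn: 300 })
}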

Feed Generation (push model):
Post created → fan-out service reads follower list
→ writes post_id to each follower's feed in Redis (sorted set by timestamp)
Feed API reads from Redis (fast, O(log n))
Paginate with cursor (last seen post_id)
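
A sketch of that fan-out with ioredis sorted sets, including the hybrid celebrity cutoff from the key decisions above (getFollowerIds is a hypothetical follow-graph lookup):
// Fan-out on write: push a new post into each follower's feed
import Redis from 'ioredis'

const redis = new Redis()

// Hypothetical lookup against the follow graph store
async function getFollowerIds(userId: string): Promise<string[]> {
  return []  // stub
}

async function fanOutPost(authorId: string, postId: string, createdAt: number) {
  const followers = await getFollowerIds(authorId)
  if (followers.length >= 1_000_000) return  // celebrity: serve via pull model instead

  const pipeline = redis.pipeline()
  for (const followerId of followers) {
    pipeline.zadd(`feed:${followerId}`, createdAt, postId)   // score = timestamp
    pipeline.zremrangebyrank(`feed:${followerId}`, 0, -801)  // cap feed at 800 entries
  }
  await pipeline.exec()
}

// Feed read: newest 20 post ids for a user
async function getFeed(userId: string): Promise<string[]> {
  return redis.zrevrange(`feed:${userId}`, 0, 19)
}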

Problem 3: Design WhatsApp

Requirements:
- 1:1 messaging, group chats (up to 256 members)
- Online/offline status, read receipts
- Message delivery: at-least-once with deduplication
- End-to-end encryption (E2E)
- Scale: 100B messages/day = 1.1M messages/sec

Core insight: use long-polling or WebSocket for real-time delivery

Message states:
Sent → Delivered → Read
(single check → double check → blue double check in WhatsApp)

Message Flow:
1. Sender sends message via WebSocket to server
2. Server stores in DB (Cassandra — write-heavy, horizontal scale)
3. Server pushes to recipient via their WebSocket connection
4. If recipient offline: store in "inbox" → deliver on reconnect
5. Recipient ACKs delivery → server sends "delivered" status to sender
6. Recipient opens chat → sends "read" event → server notifies sender
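
Steps 1-4 in miniature with the ws library (a sketch; authenticate, storeMessage, and queueForOfflineUser are hypothetical stand-ins for the auth, Cassandra, and inbox layers):
// WebSocket message relay
import { WebSocketServer, WebSocket } from 'ws'
import type { IncomingMessage } from 'http'

// Hypothetical stand-ins
function authenticate(req: IncomingMessage): string { return 'user-id' }
async function storeMessage(msg: unknown): Promise<void> {}
async function queueForOfflineUser(userId: string, msg: unknown): Promise<void> {}

const wss = new WebSocketServer({ port: 8080 })
const connections = new Map<string, WebSocket>()  // userId -> live socket

wss.on('connection', (ws, req) => {
  const userId = authenticate(req)
  connections.set(userId, ws)

  ws.on('message', async (raw) => {
    const msg = JSON.parse(raw.toString())    // { id, to, body }
    await storeMessage(msg)                   // 2. persist first

    const recipient = connections.get(msg.to)
    if (recipient && recipient.readyState === WebSocket.OPEN) {
      recipient.send(JSON.stringify(msg))     // 3. push if online
    } else {
      await queueForOfflineUser(msg.to, msg)  // 4. inbox, delivered on reconnect
    }
  })

  ws.on('close', () => connections.delete(userId))
})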

Database Choice: Cassandra
- Partition key: chat_id (all messages for a chat live on the same node)
- Clustering key: message_timestamp DESC (latest messages first)
- Why not SQL? A single relational primary can't absorb ~1.1M writes/sec; Cassandra scales writes horizontally across the cluster
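
A possible table definition, following the schema-as-comments style above (CQL sketch; table and column names are illustrative):
// messages table (CQL):
// CREATE TABLE messages (
//   chat_id    bigint,
//   message_ts timeuuid,
//   sender_id  bigint,
//   body       text,
//   PRIMARY KEY ((chat_id), message_ts)
// ) WITH CLUSTERING ORDER BY (message_ts DESC);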

Presence Service:
- Heartbeat every 30 seconds
- Redis pub/sub: subscribe to your contacts' presence channels
- "Last seen": store timestamp, expire after 60 seconds = offline

Problem 4: Design Netflix

Scale:
- 250M subscribers
- 15% of global internet traffic
- 10,000+ titles, each encoded in 20+ formats

Key problems:
1. Video storage and encoding
2. Fast streaming start
3. Global CDN (Open Connect)
4. Recommendation engine

Video Processing Pipeline:
Upload (raw file) → Transcoding Farm:
H.264/H.265/AV1 encoding
Multiple resolutions: 240p, 360p, 480p, 720p, 1080p, 4K
Multiple bitrates: adaptive bitrate (ABR) streaming (HLS/DASH)
Split into 2-4 second segments
Store on Open Connect (Netflix's own CDN, 17,000+ servers in ISPs)

Adaptive Bitrate Streaming:
- Player monitors download speed and buffer health
- Switches quality mid-stream to avoid buffering
- DASH (Dynamic Adaptive Streaming over HTTP)
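
The player-side decision can be sketched as picking the highest rung the measured throughput affords, with a safety margin that tightens when the buffer runs low (the ladder and thresholds below are illustrative):
// ABR rendition selection sketch
interface Rendition { height: number; bitrateKbps: number }

const LADDER: Rendition[] = [
  { height: 240, bitrateKbps: 400 },
  { height: 480, bitrateKbps: 1200 },
  { height: 720, bitrateKbps: 3000 },
  { height: 1080, bitrateKbps: 6000 },
]

function pickRendition(throughputKbps: number, bufferSeconds: number): Rendition {
  // With a low buffer, be conservative to avoid a rebuffer event
  const safety = bufferSeconds < 10 ? 0.5 : 0.8
  const budget = throughputKbps * safety
  const affordable = LADDER.filter(r => r.bitrateKbps <= budget)
  return affordable[affordable.length - 1] ?? LADDER[0]
}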

Cold Start (under 1 second):
1. DNS resolves to nearest CDN node
2. Player downloads manifest file (list of segments)
3. Downloads first few segments immediately (pre-buffer)
4. Plays while fetching rest in background

Problem 5: Design a Rate Limiter

Algorithms:
1. Token Bucket — allows burst, common for APIs
2. Leaky Bucket — smooth constant rate, no burst
3. Fixed Window — simple, but boundary problem
4. Sliding Window — accurate, memory-heavy
5. Sliding Window Counter — good balance

Token Bucket (most common for APIs):
- Each user gets a bucket of N tokens
- Tokens refill at rate R per second
- Each request consumes 1 token
- If bucket empty: rate limit (429 Too Many Requests)
// Token bucket with Redis (distributed rate limiter)
import Redis from 'ioredis'

class RateLimiter {
  constructor(
    private redis: Redis,
    private maxTokens: number,      // Bucket capacity
    private refillRate: number,     // Tokens per second
  ) {}

  async isAllowed(key: string): Promise<boolean> {
    const now = Date.now() / 1000  // Unix timestamp in seconds

    // Lua script for atomic check-and-consume
    const script = `
      local key = KEYS[1]
      local max_tokens = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or max_tokens
      local last_refill = tonumber(bucket[2]) or now

      -- Add tokens based on time elapsed
      local elapsed = now - last_refill
      tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

      if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, 3600)
        return 1
      else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
      end
    `

    const result = await this.redis.eval(script, 1, key,
      this.maxTokens, this.refillRate, now)

    return result === 1
  }
}

// Usage: 100 requests per minute per user
const limiter = new RateLimiter(redis, 100, 100/60)

app.use(async (req, res, next) => {
  const key = `rate:${req.ip}:${req.user?.id ?? 'anon'}`
  const allowed = await limiter.isAllowed(key)

  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' })
  }
  next()
})

Problem 6: Design a Key-Value Store

Requirements:
- get(key), put(key, value), delete(key)
- Distributed (data > single machine)
- Fault tolerant, available

Design: Consistent Hashing
- N virtual nodes per physical node (prevents hot spots on remove/add)
- When node joins: takes keys from successor
- When node leaves: successor takes its keys
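
A minimal hash ring in TypeScript, assuming MD5 positions and 100 virtual nodes per physical node (both illustrative choices):
// Consistent hash ring sketch
import { createHash } from 'crypto'

class HashRing {
  private ring = new Map<number, string>()  // position -> physical node
  private sorted: number[] = []

  constructor(nodes: string[], private vnodes = 100) {
    for (const node of nodes) this.add(node)
  }

  private hash(key: string): number {
    // First 8 hex chars of MD5 -> 32-bit position on the ring
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16)
  }

  add(node: string) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.set(this.hash(`${node}#${i}`), node)
    }
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b)
  }

  // Walk clockwise to the first virtual node at or after the key's position
  getNode(key: string): string {
    const h = this.hash(key)
    const pos = this.sorted.find(p => p >= h) ?? this.sorted[0]
    return this.ring.get(pos)!
  }
}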

Replication: 3 replicas (N=3)
- Write quorum W=2: confirmed by 2 nodes → write succeeds
- Read quorum R=2: read from 2 nodes, compare → consistent
- W + R > N → strong consistency
- W=1, R=1 → high availability but eventual consistency
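
A quorum write in miniature: resolve success as soon as W replicas ack, failure once more than N - W replicas fail (the replica put RPC is an assumption):
// Quorum write: succeed once W of N replicas confirm
interface Replica { put(key: string, value: string): Promise<void> }

function quorumPut(replicas: Replica[], key: string, value: string, W = 2): Promise<boolean> {
  return new Promise(resolve => {
    let acks = 0
    let fails = 0
    for (const r of replicas) {
      r.put(key, value).then(
        () => { if (++acks === W) resolve(true) },                     // quorum reached
        () => { if (++fails > replicas.length - W) resolve(false) },   // quorum impossible
      )
    }
  })
}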

Conflict Resolution:
- Last-Write-Wins (LWW): use timestamps
- Vector clocks: track causality across nodes
- CRDTs: mathematically mergeable data types

Real-world: DynamoDB, Cassandra, Redis Cluster

How to Practice System Design

Top resources:
1. "Designing Data-Intensive Applications"Alex Rodnegas (best book)
2. System Design Interview by Alex Xu (vols 1 & 2)
3. ByteByteGo newsletter + YouTube (Alex Xu)
4. Exponent.dev — mock interviews

Practice schedule:
- 1 system per day × 30 days = ready for interviews
- Draw diagrams (Excalidraw, draw.io)
- Estimate numbers out loud (practice the math)
- Record yourself explaining designs

Common mistakes:
- Jumping to implementation before clarifying requirements
- Not estimating scale first
- Choosing technologies without justifying trade-offs
- Going too deep too early (miss the high-level architecture)
- Not asking about bottlenecks and failure modes

System design interviews reward engineers who have thought deeply about real systems. Read post-mortems. Study how Netflix, Uber, Airbnb built their systems. Build things that scale. Every hour of building is worth five hours of studying.

Written by Sanjeev Sharma
Full Stack Engineer · E-mopro