System Design Interview Guide 2026: URL Shortener to Netflix Scale

Sanjeev Sharma

System Design Interviews 2026: The Complete Playbook

System design is where senior engineers are made or broken. It tests your ability to think at scale, make trade-offs, and communicate complex ideas clearly. This guide covers the framework and seven canonical problems every FAANG candidate must master.

The RADIO Framework

Structure every system design interview with RADIO:

R: Requirements
  Functional:     What does the system do?
  Non-functional: Scale, availability, latency, consistency

A: API Design
  Define endpoints before diving into internals
  REST, GraphQL, or gRPC?

D: Data Model
  What entities exist? How are they related?
  SQL vs NoSQL decision

I: High-Level Design
  Components: clients, servers, databases, caches, queues
  Draw the box diagram

O: Deep Dives
  Pick 2-3 interesting/risky components
  Go deep on the interviewer's cues

Problem 1: URL Shortener (like bit.ly)

Requirements:
- Shorten URLs: given longUrl, return shortUrl (e.g., sho.rt/abc123)
- Redirect: GET /abc123 → 301/302 to longUrl
- Analytics: click count, unique visitors
- Scale: 100M URLs created/month, 10:1 read/write ratio

Capacity Estimation:
- Write: 100M / 30 days / 86400s ≈ 40 writes/sec
- Read:  400 reads/sec
- Storage: 100M URLs × 500 bytes = 50GB/month, 600GB/year
- Cache: 80/20 rule (top 20% of URLs serve ~80% of reads) → hot set of roughly 10GB

Short URL generation:
- 7 characters from [a-zA-Z0-9] = 62^7 = 3.5 trillion URLs
- Approach 1: Hash (MD5/SHA256 → take first 7 chars, handle collisions)
- Approach 2: Base62 encode auto-increment ID (simpler, no collisions)
// URL shortener core logic
class UrlShortener {
  private readonly BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

  // Encode auto-increment ID to base62
  encode(id: number): string {
    let result = ''
    while (id > 0) {
      result = this.BASE62[id % 62] + result
      id = Math.floor(id / 62)
    }
    return result.padStart(7, '0')
  }

  decode(shortCode: string): number {
    return shortCode.split('').reduce((acc, char) => {
      return acc * 62 + this.BASE62.indexOf(char)
    }, 0)
  }
}

// Database schema
// urls table:
// id BIGSERIAL PRIMARY KEY
// short_code VARCHAR(10) UNIQUE
// long_url TEXT NOT NULL
// user_id BIGINT
// created_at TIMESTAMP
// expires_at TIMESTAMP
// click_count BIGINT DEFAULT 0

// Redirect handler (redis and db are assumed ioredis / pg clients)
async function redirect(shortCode: string) {
  // 1. Check cache (Redis)
  const cached = await redis.get(`url:${shortCode}`)
  if (cached) {
    await redis.incr(`clicks:${shortCode}`)  // Async click tracking
    return { statusCode: 302, location: cached }
  }

  // 2. DB lookup
  const url = await db.query(
    'SELECT long_url FROM urls WHERE short_code = $1 AND (expires_at IS NULL OR expires_at > NOW())',
    [shortCode]
  )

  if (!url.rows[0]) return { statusCode: 404 }

  // 3. Cache for a day, and count this click too
  await redis.setex(`url:${shortCode}`, 86400, url.rows[0].long_url)
  await redis.incr(`clicks:${shortCode}`)

  // Prefer 302 over 301 when you need per-click analytics:
  // browsers cache 301s, so later clicks never reach the server
  return { statusCode: 302, location: url.rows[0].long_url }
}

Architecture:
Client → CDN → Load Balancer → App Servers (stateless, horizontal)
App Servers → PostgreSQL (write) + Read Replicas (read)
App Servers → Redis Cluster (URL cache, click counters)
Analytics: batch flush click counters to the DB every minute (sketch below)
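
A minimal sketch of that batch flush, assuming an ioredis client and a pg Pool; the key pattern matches the redirect handler above, and flushClickCounts is an illustrative name:
// Flush Redis click counters to Postgres once a minute
import Redis from 'ioredis'
import { Pool } from 'pg'

const redis = new Redis()
const db = new Pool()

async function flushClickCounts() {
  // KEYS is fine at this keyspace size; use SCAN for much larger ones
  const keys = await redis.keys('clicks:*')
  for (const key of keys) {
    const count = Number(await redis.getset(key, '0'))  // read and reset atomically
    if (count > 0) {
      await db.query(
        'UPDATE urls SET click_count = click_count + $1 WHERE short_code = $2',
        [count, key.slice('clicks:'.length)]
      )
    }
  }
}

setInterval(flushClickCounts, 60_000)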

Problem 2: Design Instagram

Requirements:
- Upload photos/videos
- Follow users, news feed
- Like, comment
- Scale: 1B users, 100M daily active, 100M photos uploaded/day

Key decisions:
1. Photo storage: S3 (not DB), store URL in DB
2. News feed: precompute (push model) vs compute on read (pull model)
   - Push (fan-out on write): fast reads, expensive for celebrities (1M followers)
   - Pull (fan-out on read): slow reads, simple writes
   - Hybrid: push for users with <1M followers, pull for celebrities

Feed table (precomputed):
user_id | post_id | created_at | (INDEX on user_id, created_at)

Follow graph:
followers: user_id → follower_id (indexed both ways)
Use a graph DB (Neo4j) for complex social graph queries
Or: denormalize into Redis sorted sets for fast feed generation

Photo Upload Flow:
1. Client requests presigned S3 URL from API server
2. Client uploads directly to S3 (bypasses API server)
3. S3 triggers Lambda → add to processing queue
4. Workers: resize to 3 sizes, extract metadata, update DB
5. CDN in front of S3 for reads
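
Step 1 might look like this with the AWS SDK v3 presigner (a sketch; the bucket name and key layout are illustrative):
// Issue a presigned S3 upload URL so the client uploads directly
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({})

// Returns a URL the client can PUT the file to, expiring in 5 minutes
async function createUploadUrl(userId: string, photoId: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: 'photos-raw',                     // illustrative bucket name
    Key: `uploads/${userId}/${photoId}.jpg`,  // illustrative key layout
    ContentType: 'image/jpeg',
  })
  return getSignedUrl(s3, command, { expiresIn: 300 })
}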

Feed Generation (push model):
Post created → fan-out service reads follower list
→ writes post_id to each follower's feed in Redis (sorted set by timestamp)
Feed API reads from Redis (fast, O(log n))
Paginate with cursor (last seen post_id)
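
A sketch of that fan-out with ioredis sorted sets, including the hybrid celebrity cutoff from the key decisions above (getFollowerIds is a hypothetical follow-graph lookup):
// Fan-out on write: push a new post into each follower's feed
import Redis from 'ioredis'

const redis = new Redis()

// Hypothetical lookup against the follow graph store
async function getFollowerIds(userId: string): Promise<string[]> {
  return []  // stub
}

async function fanOutPost(authorId: string, postId: string, createdAt: number) {
  const followers = await getFollowerIds(authorId)
  if (followers.length >= 1_000_000) return  // celebrity: serve via pull model instead

  const pipeline = redis.pipeline()
  for (const followerId of followers) {
    pipeline.zadd(`feed:${followerId}`, createdAt, postId)   // score = timestamp
    pipeline.zremrangebyrank(`feed:${followerId}`, 0, -801)  // cap feed at 800 entries
  }
  await pipeline.exec()
}

// Feed read: newest 20 post ids for a user
async function getFeed(userId: string): Promise<string[]> {
  return redis.zrevrange(`feed:${userId}`, 0, 19)
}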

Problem 3: Design WhatsApp

Requirements:
- 1:1 messaging, group chats (up to 256 members)
- Online/offline status, read receipts
- Message delivery: at-least-once with deduplication
- End-to-end encryption (E2E)
- Scale: 100B messages/day = 1.1M messages/sec

Core insight: use long-polling or WebSocket for real-time delivery

Message states:
Sent → Delivered → Read
(single check → double check → blue double check in WhatsApp)

Message Flow:
1. Sender sends message via WebSocket to server
2. Server stores in DB (Cassandra — write-heavy, horizontal scale)
3. Server pushes to recipient via their WebSocket connection
4. If recipient offline: store in "inbox" → deliver on reconnect
5. Recipient ACKs delivery → server sends "delivered" status to sender
6. Recipient opens chat → sends "read" event → server notifies sender
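
Steps 1-4 in miniature with the ws library (a sketch; authenticate, storeMessage, and queueForOfflineUser are hypothetical stand-ins for the auth, Cassandra, and inbox layers):
// WebSocket message relay
import { WebSocketServer, WebSocket } from 'ws'
import type { IncomingMessage } from 'http'

// Hypothetical stand-ins
function authenticate(req: IncomingMessage): string { return 'user-id' }
async function storeMessage(msg: unknown): Promise<void> {}
async function queueForOfflineUser(userId: string, msg: unknown): Promise<void> {}

const wss = new WebSocketServer({ port: 8080 })
const connections = new Map<string, WebSocket>()  // userId -> live socket

wss.on('connection', (ws, req) => {
  const userId = authenticate(req)
  connections.set(userId, ws)

  ws.on('message', async (raw) => {
    const msg = JSON.parse(raw.toString())    // { id, to, body }
    await storeMessage(msg)                   // 2. persist first

    const recipient = connections.get(msg.to)
    if (recipient && recipient.readyState === WebSocket.OPEN) {
      recipient.send(JSON.stringify(msg))     // 3. push if online
    } else {
      await queueForOfflineUser(msg.to, msg)  // 4. inbox, delivered on reconnect
    }
  })

  ws.on('close', () => connections.delete(userId))
})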

Database Choice: Cassandra
- Partition key: chat_id (all messages for a chat live on the same node)
- Clustering key: message_timestamp DESC (latest messages first)
- Why not SQL? A single relational primary can't absorb ~1.1M writes/sec; Cassandra scales writes horizontally across the cluster
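
A possible table definition, following the schema-as-comments style above (CQL sketch; table and column names are illustrative):
// messages table (CQL):
// CREATE TABLE messages (
//   chat_id    bigint,
//   message_ts timeuuid,
//   sender_id  bigint,
//   body       text,
//   PRIMARY KEY ((chat_id), message_ts)
// ) WITH CLUSTERING ORDER BY (message_ts DESC);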

Presence Service:
- Heartbeat every 30 seconds
- Redis pub/sub: subscribe to your contacts' presence channels
- "Last seen": store timestamp, expire after 60 seconds = offline

Problem 4: Design Netflix

Scale:
- 250M subscribers
- 15% of global internet traffic
- 10,000+ titles, each encoded in 20+ formats

Key problems:
1. Video storage and encoding
2. Fast streaming start
3. Global CDN (Open Connect)
4. Recommendation engine

Video Processing Pipeline:
Upload (raw file) → Transcoding Farm:
H.264/H.265/AV1 encoding
Multiple resolutions: 240p, 360p, 480p, 720p, 1080p, 4K
Multiple bitrates: adaptive bitrate (ABR) streaming (HLS/DASH)
Split into 2-4 second segments
Store on Open Connect (Netflix's own CDN, 17,000+ servers in ISPs)

Adaptive Bitrate Streaming:
- Player monitors download speed and buffer health
- Switches quality mid-stream to avoid buffering
- DASH (Dynamic Adaptive Streaming over HTTP)
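
The player-side decision can be sketched as picking the highest rung the measured throughput affords, with a safety margin that tightens when the buffer runs low (the ladder and thresholds below are illustrative):
// ABR rendition selection sketch
interface Rendition { height: number; bitrateKbps: number }

const LADDER: Rendition[] = [
  { height: 240, bitrateKbps: 400 },
  { height: 480, bitrateKbps: 1200 },
  { height: 720, bitrateKbps: 3000 },
  { height: 1080, bitrateKbps: 6000 },
]

function pickRendition(throughputKbps: number, bufferSeconds: number): Rendition {
  // With a low buffer, be conservative to avoid a rebuffer event
  const safety = bufferSeconds < 10 ? 0.5 : 0.8
  const budget = throughputKbps * safety
  const affordable = LADDER.filter(r => r.bitrateKbps <= budget)
  return affordable[affordable.length - 1] ?? LADDER[0]
}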

Cold Start (under 1 second):
1. DNS resolves to nearest CDN node
2. Player downloads manifest file (list of segments)
3. Downloads first few segments immediately (pre-buffer)
4. Plays while fetching rest in background

Problem 5: Design a Rate Limiter

Algorithms:
1. Token Bucket — allows burst, common for APIs
2. Leaky Bucket — smooth constant rate, no burst
3. Fixed Window — simple, but boundary problem
4. Sliding Window — accurate, memory-heavy
5. Sliding Window Counter — good balance

Token Bucket (most common for APIs):
- Each user gets a bucket of N tokens
- Tokens refill at rate R per second
- Each request consumes 1 token
- If bucket empty: rate limit (429 Too Many Requests)
// Token bucket with Redis (distributed rate limiter)
import Redis from 'ioredis'

class RateLimiter {
  constructor(
    private redis: Redis,
    private maxTokens: number,      // Bucket capacity
    private refillRate: number,     // Tokens per second
  ) {}

  async isAllowed(key: string): Promise<boolean> {
    const now = Date.now() / 1000  // Unix timestamp in seconds

    // Lua script for atomic check-and-consume
    const script = `
      local key = KEYS[1]
      local max_tokens = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or max_tokens
      local last_refill = tonumber(bucket[2]) or now

      -- Add tokens based on time elapsed
      local elapsed = now - last_refill
      tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

      if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, 3600)
        return 1
      else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
      end
    `

    const result = await this.redis.eval(script, 1, key,
      this.maxTokens, this.refillRate, now)

    return result === 1
  }
}

// Usage: 100 requests per minute per user
const limiter = new RateLimiter(redis, 100, 100/60)

app.use(async (req, res, next) => {
  const key = `rate:${req.ip}:${req.user?.id ?? 'anon'}`
  const allowed = await limiter.isAllowed(key)

  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' })
  }
  next()
})

Problem 6: Design a Key-Value Store

Requirements:
- get(key), put(key, value), delete(key)
- Distributed (data > single machine)
- Fault tolerant, available

Design: Consistent Hashing
- N virtual nodes per physical node (prevents hot spots on remove/add)
- When node joins: takes keys from successor
- When node leaves: successor takes its keys
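
A minimal hash ring in TypeScript, assuming MD5 positions and 100 virtual nodes per physical node (both illustrative choices):
// Consistent hash ring sketch
import { createHash } from 'crypto'

class HashRing {
  private ring = new Map<number, string>()  // position -> physical node
  private sorted: number[] = []

  constructor(nodes: string[], private vnodes = 100) {
    for (const node of nodes) this.add(node)
  }

  private hash(key: string): number {
    // First 8 hex chars of MD5 -> 32-bit position on the ring
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16)
  }

  add(node: string) {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.set(this.hash(`${node}#${i}`), node)
    }
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b)
  }

  // Walk clockwise to the first virtual node at or after the key's position
  getNode(key: string): string {
    const h = this.hash(key)
    const pos = this.sorted.find(p => p >= h) ?? this.sorted[0]
    return this.ring.get(pos)!
  }
}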

Replication: 3 replicas (N=3)
- Write quorum W=2: confirmed by 2 nodes → write succeeds
- Read quorum R=2: read from 2 nodes, compare → consistent
- W + R > N → strong consistency
- W=1, R=1 → high availability but eventual consistency
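
A quorum write in miniature: resolve success as soon as W replicas ack, failure once more than N - W replicas fail (the replica put RPC is an assumption):
// Quorum write: succeed once W of N replicas confirm
interface Replica { put(key: string, value: string): Promise<void> }

function quorumPut(replicas: Replica[], key: string, value: string, W = 2): Promise<boolean> {
  return new Promise(resolve => {
    let acks = 0
    let fails = 0
    for (const r of replicas) {
      r.put(key, value).then(
        () => { if (++acks === W) resolve(true) },                     // quorum reached
        () => { if (++fails > replicas.length - W) resolve(false) },   // quorum impossible
      )
    }
  })
}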

Conflict Resolution:
- Last-Write-Wins (LWW): use timestamps
- Vector clocks: track causality across nodes
- CRDTs: mathematically mergeable data types

Real-world: DynamoDB, Cassandra, Redis Cluster

How to Practice System Design

Top resources:
1. "Designing Data-Intensive Applications"Alex Rodnegas (best book)
2. System Design Interview by Alex Xu (vols 1 & 2)
3. ByteByteGo newsletter + YouTube (Alex Xu)
4. Exponent.dev — mock interviews

Practice schedule:
- 1 system per day × 30 days = ready for interviews
- Draw diagrams (Excalidraw, draw.io)
- Estimate numbers out loud (practice the math)
- Record yourself explaining designs

Common mistakes:
- Jumping to implementation before clarifying requirements
- Not estimating scale first
- Choosing technologies without justifying trade-offs
- Going too deep too early (miss the high-level architecture)
- Not asking about bottlenecks and failure modes

System design interviews reward engineers who have thought deeply about real systems. Read post-mortems. Study how Netflix, Uber, Airbnb built their systems. Build things that scale. Every hour of building is worth five hours of studying.

Written by Sanjeev Sharma
Full Stack Engineer · E-mopro