System Design Interviews 2026: The Complete Playbook
System design is where senior engineers are made or broken. It tests your ability to think at scale, make trade-offs, and communicate complex ideas clearly. This guide covers the framework and six canonical problems every FAANG candidate must master.
- The RADIO Framework
- Problem 1: URL Shortener (like bit.ly)
- Problem 2: Design Instagram
- Problem 3: Design WhatsApp
- Problem 4: Design Netflix
- Problem 5: Design a Rate Limiter
- Problem 6: Design a Key-Value Store
- How to Practice System Design
The RADIO Framework
Structure every system design interview with RADIO:
R — Requirements
Functional: What does the system do?
Non-functional: Scale, availability, latency, consistency
A — API Design
Define endpoints before diving into internals
REST, GraphQL, or gRPC?
D — Data Model
What entities exist? How are they related?
SQL vs NoSQL decision
I — High-Level Design
Components: clients, servers, databases, caches, queues
Draw the box diagram
O — Deep Dives
Pick 2-3 interesting/risky components
Go deep on the interviewer's cues
Problem 1: URL Shortener (like bit.ly)
Requirements:
- Shorten URLs: given longUrl, return shortUrl (e.g., sho.rt/abc123)
- Redirect: GET /abc123 → 301/302 to longUrl
- Analytics: click count, unique visitors
- Scale: 100M URLs created/month, 10:1 read/write ratio
Capacity Estimation:
- Write: 100M / 30 days / 86400s ≈ 40 writes/sec
- Read: 400 reads/sec
- Storage: 100M URLs × 500 bytes = 50GB/month, 600GB/year
- Cache: 400 reads/sec ≈ 35M reads/day; caching the hottest 20% of URLs (which serve ~80% of reads) ≈ 0.2 × 35M × 500 bytes ≈ 3.5GB
Short URL generation:
- 7 characters from [a-zA-Z0-9] = 62^7 = 3.5 trillion URLs
- Approach 1: Hash (MD5/SHA256 → take first 7 chars, handle collisions — sketched after the encoder below)
- Approach 2: Base62 encode auto-increment ID (simpler, no collisions)
// URL shortener core logic
class UrlShortener {
  private readonly BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

  // Encode auto-increment ID to base62
  encode(id: number): string {
    let result = ''
    while (id > 0) {
      result = this.BASE62[id % 62] + result
      id = Math.floor(id / 62)
    }
    return result.padStart(7, '0')
  }

  decode(shortCode: string): number {
    return shortCode.split('').reduce((acc, char) => {
      return acc * 62 + this.BASE62.indexOf(char)
    }, 0)
  }
}
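For comparison, here is a minimal sketch of Approach 1 under the same 7-character budget. It reuses the encoder above, salts the hash with a retry counter to handle collisions, and relies on a hypothetical shortCodeExists DB lookup:

// Approach 1 sketch: hash the long URL, truncate, base62-encode, retry on collision
import { createHash } from 'crypto'

declare function shortCodeExists(code: string): Promise<boolean> // hypothetical DB check

async function hashShorten(longUrl: string, shortener: UrlShortener): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    // Salt with the attempt number so a collision produces a different digest
    const hex = createHash('md5').update(`${longUrl}:${attempt}`).digest('hex')
    const id = parseInt(hex.slice(0, 10), 16) // 40 bits ≈ 1.1 × 10^12, fits in 7 base62 chars
    const code = shortener.encode(id)
    if (!(await shortCodeExists(code))) return code
  }
}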
// Database schema
// urls table:
// id BIGSERIAL PRIMARY KEY
// short_code VARCHAR(10) UNIQUE
// long_url TEXT NOT NULL
// user_id BIGINT
// created_at TIMESTAMP
// expires_at TIMESTAMP
// click_count BIGINT DEFAULT 0
// Redirect handler (assumes `redis` is an ioredis client and `db` a Postgres pool)
async function redirect(shortCode: string) {
  // 1. Check cache (Redis)
  const cached = await redis.get(`url:${shortCode}`)
  if (cached) {
    await redis.incr(`clicks:${shortCode}`) // click counter, batch-flushed to the DB later
    return { statusCode: 301, location: cached }
  }

  // 2. DB lookup
  const url = await db.query(
    'SELECT long_url FROM urls WHERE short_code = $1 AND (expires_at IS NULL OR expires_at > NOW())',
    [shortCode]
  )
  if (!url.rows[0]) return { statusCode: 404 }

  // 3. Cache it for a day
  await redis.setex(`url:${shortCode}`, 86400, url.rows[0].long_url)
  return { statusCode: 301, location: url.rows[0].long_url }
}
Architecture:
Client → CDN → Load Balancer → App Servers (stateless, horizontal)
App Servers → PostgreSQL (write) + Read Replicas (read)
App Servers → Redis Cluster (URL cache, click counters)
Analytics: batch-flush click counters from Redis to the DB every minute (sketched below)
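A minimal sketch of that flush job, assuming the same `redis` and `db` clients as the redirect handler and the clicks:<code> keys it increments (GETDEL requires Redis 6.2+):

// Batch flush: move Redis click counters into the urls table
async function flushClickCounts() {
  let cursor = '0'
  do {
    // SCAN (not KEYS) so the flush never blocks Redis
    const [next, keys] = await redis.scan(cursor, 'MATCH', 'clicks:*', 'COUNT', 1000)
    cursor = next
    for (const key of keys) {
      const count = await redis.getdel(key) // atomically read and reset (Redis 6.2+)
      if (!count) continue
      await db.query(
        'UPDATE urls SET click_count = click_count + $1 WHERE short_code = $2',
        [Number(count), key.slice('clicks:'.length)]
      )
    }
  } while (cursor !== '0')
}

setInterval(() => flushClickCounts().catch(console.error), 60_000) // every minute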
Problem 2: Design Instagram
Requirements:
- Upload photos/videos
- Follow users, news feed
- Like, comment
- Scale: 1B users, 100M daily active, 100M photos uploaded/day
Key decisions:
1. Photo storage: S3 (not DB), store URL in DB
2. News feed: precompute (push model) vs compute on read (pull model)
- Push (fan-out on write): fast reads, expensive for celebrities (1M followers)
- Pull (fan-out on read): slow reads, simple writes
- Hybrid: push for users with <1M followers, pull for celebrities
Feed table (precomputed):
user_id | post_id | created_at | (INDEX on user_id, created_at)
Follow graph:
followers: user_id → follower_id (indexed both ways)
Use a graph DB (Neo4j) for complex social graph queries
Or: denormalize into Redis sorted sets for fast feed generation
Photo Upload Flow:
1. Client requests a presigned S3 upload URL from the API server (see the sketch after this list)
2. Client uploads directly to S3 (bypasses API server)
3. S3 triggers Lambda → add to processing queue
4. Workers: resize to 3 sizes, extract metadata, update DB
5. CDN in front of S3 for reads
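Step 1 might look like this sketch using the AWS SDK v3; the bucket and key naming are illustrative:

// API server hands the client a short-lived direct-upload URL
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({ region: 'us-east-1' })

async function createUploadUrl(userId: string, photoId: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: 'photos-original', // hypothetical bucket name
    Key: `${userId}/${photoId}.jpg`,
    ContentType: 'image/jpeg',
  })
  // Valid for 5 minutes; the client PUTs the file straight to S3, bypassing the API tier
  return getSignedUrl(s3, command, { expiresIn: 300 })
}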
Feed Generation (push model):
Post created → fan-out service reads follower list
→ writes post_id to each follower's feed in Redis (sorted set by timestamp)
→ Feed API reads from Redis (fast, O(log n))
→ Paginate with cursor (last seen post_id)
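A sketch of that fan-out and the cursor read with ioredis; the feed cap, key names, and getFollowerIds lookup are illustrative, and the cursor here is the last-seen post's timestamp (the sorted-set score):

// Fan-out on write: push the new post_id into each follower's sorted-set feed
import Redis from 'ioredis'

const redis = new Redis()
const FEED_MAX = 800 // keep only the newest posts per feed

declare function getFollowerIds(authorId: string): Promise<string[]> // hypothetical lookup

async function fanOutPost(authorId: string, postId: string, createdAt: number) {
  const followerIds = await getFollowerIds(authorId)
  const pipeline = redis.pipeline()
  for (const followerId of followerIds) {
    pipeline.zadd(`feed:${followerId}`, createdAt, postId)             // score = timestamp
    pipeline.zremrangebyrank(`feed:${followerId}`, 0, -(FEED_MAX + 1)) // trim oldest entries
  }
  await pipeline.exec()
}

// Cursor pagination: return posts strictly older than the last-seen timestamp
async function getFeedPage(userId: string, cursor?: number, limit = 20) {
  const max = cursor !== undefined ? `(${cursor}` : '+inf'
  return redis.zrevrangebyscore(`feed:${userId}`, max, '-inf', 'LIMIT', 0, limit)
}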
Problem 3: Design WhatsApp
Requirements:
- 1:1 messaging, group chats (up to 256 members)
- Online/offline status, read receipts
- Message delivery: at-least-once with deduplication
- End-to-end encryption (E2E)
- Scale: 100B messages/day = 1.1M messages/sec
Core insight: use long-polling or WebSocket for real-time delivery
Message states:
Sent → Delivered → Read
(Single check → Double check → Blue double check in WhatsApp)
Message Flow:
1. Sender sends message via WebSocket to server
2. Server stores in DB (Cassandra — write-heavy, horizontal scale)
3. Server pushes to recipient via their WebSocket connection
4. If recipient offline: store in "inbox" → deliver on reconnect
5. Recipient ACKs delivery → server sends "delivered" status to sender
6. Recipient opens chat → sends "read" event → server notifies sender
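A server-side sketch of steps 2-5, with persistence and the offline inbox abstracted behind hypothetical helpers (WebSocket handling assumes the `ws` package):

// Message delivery sketch: durable write, then push or park in the offline inbox
import { WebSocket } from 'ws'

interface ChatMessage { id: string; chatId: string; from: string; to: string; body: string; sentAt: number }

const connections = new Map<string, WebSocket>() // userId → live socket

declare function persistMessage(m: ChatMessage): Promise<void>                      // hypothetical Cassandra write
declare function queueForOfflineUser(userId: string, m: ChatMessage): Promise<void> // hypothetical inbox

async function handleIncomingMessage(msg: ChatMessage) {
  await persistMessage(msg) // step 2: store before acknowledging anything
  const recipient = connections.get(msg.to)
  if (recipient && recipient.readyState === WebSocket.OPEN) {
    recipient.send(JSON.stringify({ type: 'message', ...msg })) // step 3: push in real time
  } else {
    await queueForOfflineUser(msg.to, msg) // step 4: deliver on reconnect
  }
}

// Step 5: the recipient's delivery ACK is relayed back to the sender's socket
function handleDeliveryAck(msgId: string, senderId: string) {
  connections.get(senderId)?.send(JSON.stringify({ type: 'delivered', msgId }))
}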
Database Choice: Cassandra
- Partition key: (chat_id) — all messages for a chat on same node
- Cluster key: message_timestamp (DESC) — latest messages first
- Why not SQL? ~1.1M writes/sec would overwhelm a single relational primary; Cassandra's append-oriented writes scale horizontally across nodes (schema sketch below)
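A sketch of that table layout, in the same schema-notes style as the urls table above (column names are illustrative):

// messages table (Cassandra):
//   chat_id    UUID      — partition key (all messages for a chat on one node)
//   sent_at    TIMEUUID  — clustering column, DESC (latest first)
//   sender_id  BIGINT
//   body       TEXT
//   PRIMARY KEY ((chat_id), sent_at) WITH CLUSTERING ORDER BY (sent_at DESC)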
Presence Service:
- Heartbeat every 30 seconds
- Redis pub/sub: subscribe to your contacts' presence channels
- "Last seen": store timestamp, expire after 60 seconds = offline
Problem 4: Design Netflix
Scale:
- 250M subscribers
- 15% of global internet traffic
- 10,000+ titles, each encoded in 20+ formats
Key problems:
1. Video storage and encoding
2. Fast streaming start
3. Global CDN (Open Connect)
4. Recommendation engine
Video Processing Pipeline:
Upload (raw file) → Transcoding Farm
→ H.264/H.265/AV1 encoding
→ Multiple resolutions: 240p, 360p, 480p, 720p, 1080p, 4K
→ Multiple bitrates: adaptive bitrate (ABR) streaming (HLS/DASH)
→ Split into 2-4 second segments
→ Store on Open Connect (Netflix's own CDN, 17,000+ servers in ISPs)
Adaptive Bitrate Streaming:
- Player monitors download speed and buffer health
- Switches quality mid-stream to avoid buffering
- DASH (Dynamic Adaptive Streaming over HTTP)
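A toy version of that client-side decision; the bitrate ladder and thresholds are illustrative, not Netflix's actual values:

// Simple ABR rendition selection: highest quality the measured throughput can sustain
interface Rendition { height: number; bitrateKbps: number }

const LADDER: Rendition[] = [
  { height: 240, bitrateKbps: 400 },
  { height: 360, bitrateKbps: 800 },
  { height: 480, bitrateKbps: 1500 },
  { height: 720, bitrateKbps: 3000 },
  { height: 1080, bitrateKbps: 5800 },
  { height: 2160, bitrateKbps: 16000 },
]

function chooseRendition(throughputKbps: number, bufferSeconds: number): Rendition {
  if (bufferSeconds < 4) return LADDER[0]  // roughly one segment left: protect playback first
  const budget = throughputKbps * 0.8      // keep 20% headroom for throughput variance
  const affordable = LADDER.filter((r) => r.bitrateKbps <= budget)
  return affordable.length ? affordable[affordable.length - 1] : LADDER[0]
}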
Cold Start (under 1 second):
1. DNS resolves to nearest CDN node
2. Player downloads manifest file (list of segments)
3. Downloads first few segments immediately (pre-buffer)
4. Plays while fetching rest in background
Problem 5: Design a Rate Limiter
Algorithms:
1. Token Bucket — allows burst, common for APIs
2. Leaky Bucket — smooth constant rate, no burst
3. Fixed Window — simple, but boundary problem
4. Sliding Window Log — accurate, memory-heavy
5. Sliding Window Counter — good balance
Token Bucket (most common for APIs):
- Each user gets a bucket of N tokens
- Tokens refill at rate R per second
- Each request consumes 1 token
- If bucket empty: rate limit (429 Too Many Requests)
// Token bucket with Redis (distributed rate limiter; assumes an ioredis client and an Express app)
class RateLimiter {
  constructor(
    private redis: Redis,
    private maxTokens: number,  // Bucket capacity
    private refillRate: number, // Tokens per second
  ) {}

  async isAllowed(key: string): Promise<boolean> {
    const now = Date.now() / 1000 // Unix timestamp in seconds

    // Lua script for atomic check-and-consume
    const script = `
      local key = KEYS[1]
      local max_tokens = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])

      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or max_tokens
      local last_refill = tonumber(bucket[2]) or now

      -- Add tokens based on time elapsed
      local elapsed = now - last_refill
      tokens = math.min(max_tokens, tokens + elapsed * refill_rate)

      if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        redis.call('EXPIRE', key, 3600)
        return 1
      else
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 0
      end
    `
    const result = await this.redis.eval(script, 1, key,
      this.maxTokens, this.refillRate, now)
    return result === 1
  }
}

// Usage: ~100 requests per minute per user (capacity 100, refill ≈ 1.67 tokens/sec)
const limiter = new RateLimiter(redis, 100, 100 / 60)

app.use(async (req, res, next) => {
  const key = `rate:${req.ip}:${req.user?.id ?? 'anon'}`
  const allowed = await limiter.isAllowed(key)
  if (!allowed) {
    return res.status(429).json({ error: 'Rate limit exceeded' })
  }
  next()
})
Problem 6: Design a Key-Value Store
Requirements:
- get(key), put(key, value), delete(key)
- Distributed (data > single machine)
- Fault tolerant, available
Design: Consistent Hashing
- Virtual nodes (many ring positions per physical node) spread keys evenly and avoid hot spots when nodes are added or removed
- When node joins: takes keys from successor
- When node leaves: successor takes its keys
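A minimal hash-ring sketch with virtual nodes; MD5 is used only for illustration, and the lookup assumes at least one node is present:

// Consistent hashing: virtual nodes on a ring, clockwise lookup
import { createHash } from 'crypto'

class HashRing {
  private ring = new Map<number, string>() // ring position → physical node
  private sorted: number[] = []

  constructor(nodes: string[], private vnodes = 100) {
    nodes.forEach((n) => this.addNode(n))
  }

  private hash(key: string): number {
    // First 8 hex chars of MD5 → 32-bit position on the ring
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16)
  }

  addNode(node: string) {
    for (let i = 0; i < this.vnodes; i++) this.ring.set(this.hash(`${node}#${i}`), node)
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b)
  }

  removeNode(node: string) {
    for (let i = 0; i < this.vnodes; i++) this.ring.delete(this.hash(`${node}#${i}`))
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b)
  }

  // Walk clockwise to the first virtual node at or after the key's position (wrap around)
  getNode(key: string): string {
    const h = this.hash(key)
    const pos = this.sorted.find((p) => p >= h) ?? this.sorted[0]
    return this.ring.get(pos)!
  }
}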
Replication: 3 replicas (N=3)
- Write quorum W=2: confirmed by 2 nodes → write succeeds
- Read quorum R=2: read from 2 nodes, compare → consistent
- W + R > N → strong consistency
- W=1, R=1 → high availability but eventual consistency
Conflict Resolution:
- Last-Write-Wins (LWW): use timestamps
- Vector clocks: track causality across nodes
- CRDTs: mathematically mergeable data types
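A minimal sketch of the vector-clock comparison that drives conflict detection (node IDs and representation are illustrative):

// Vector clocks: if neither clock dominates the other, the writes are concurrent
type VectorClock = Record<string, number> // nodeId → logical counter

function compare(a: VectorClock, b: VectorClock): 'before' | 'after' | 'concurrent' | 'equal' {
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)])
  let aLess = false, bLess = false
  for (const n of nodes) {
    const av = a[n] ?? 0, bv = b[n] ?? 0
    if (av < bv) aLess = true
    if (bv < av) bLess = true
  }
  if (aLess && bLess) return 'concurrent' // conflict: must be merged or surfaced to the client
  if (aLess) return 'before'
  if (bLess) return 'after'
  return 'equal'
}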
Real-world: DynamoDB, Cassandra, Redis Cluster
How to Practice System Design
Top resources:
1. "Designing Data-Intensive Applications" — Alex Rodnegas (best book)
2. System Design Interview by Alex Xu (vols 1 & 2)
3. ByteByteGo newsletter + YouTube (Alex Xu)
4. Exponent — mock interview practice
Practice schedule:
- 1 system per day × 30 days = ready for interviews
- Draw diagrams (Excalidraw, draw.io)
- Estimate numbers out loud (practice the math)
- Record yourself explaining designs
Common mistakes:
- Jumping to implementation before clarifying requirements
- Not estimating scale first
- Choosing technologies without justifying trade-offs
- Going too deep too early (miss the high-level architecture)
- Not asking about bottlenecks and failure modes
System design interviews reward engineers who have thought deeply about real systems. Read post-mortems. Study how Netflix, Uber, Airbnb built their systems. Build things that scale. Every hour of building is worth five hours of studying.