DDoS vs Legit Traffic Confusion — How to Tell a Viral Moment From an Attack

Introduction

A sudden traffic spike is both the best and the worst thing that can happen to your service. The best case is viral growth: press coverage, the Hacker News front page. The worst case is an attack: a DDoS, a scraper flood, a credential-stuffing surge. The two call for opposite responses: viral traffic should be served as fast as possible, while attack traffic should be dropped as early as possible. Getting it wrong is costly in both directions: block a viral crowd and you throw away the growth, serve an attack and you pay for capacity while real users time out. The signals that distinguish the two are measurable, but the tooling to read them has to exist before the incident, not be built during it.

Signals: DDoS vs Legitimate Spike

Traffic spike analysis framework:

Signals pointing to DDoS / attack:
Single IP or ASN sending > 10% of total traffic
Request rate flat or machine-regular over time (no time-of-day variation)
Requests targeting a single endpoint (not browsing behavior)
High request rate but no session continuation (no cookies, no follow-up requests)
User-agent strings are uniform or empty
No browser fingerprint diversity (all requests look identical)
Geographic concentration (99% from one country or datacenter ASN)
Payload is identical or near-identical across requests

Signals pointing to legitimate viral spike:
Traffic from many unique IPs across diverse ASNs
Organic geographic distribution matching your user base
Multiple endpoints hit (landing page → signup → product → checkout)
Varied user agents (Chrome, Safari, Firefox, mobile)
Session-like behavior (cookies, referrer chains)
Referrers point to social media, press sites, or HN
Conversion rate similar to normal traffic
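
Several of the signals above (top-source share, user-agent diversity, session continuation) reduce to simple counting over a window of requests. The following is a minimal sketch of that counting, not a production classifier. The `LoggedRequest` shape and its field names are assumptions made for illustration; adapt them to whatever your access logs actually record.

```typescript
// Hypothetical shape of one logged request; field names are assumptions.
interface LoggedRequest {
  ip: string
  asn: number
  userAgent: string
  hasSessionCookie: boolean
}

interface SignalSummary {
  topIpShare: number       // fraction of requests from the busiest IP
  topAsnShare: number      // fraction of requests from the busiest ASN
  uniqueUserAgents: number // count of distinct user-agent strings
  sessionShare: number     // fraction of requests carrying a session cookie
}

// Fraction of items belonging to the most frequent key (0 for an empty list).
function topShare<T>(items: T[], keyOf: (item: T) => string): number {
  if (items.length === 0) return 0
  const counts = new Map<string, number>()
  let max = 0
  for (const item of items) {
    const next = (counts.get(keyOf(item)) ?? 0) + 1
    counts.set(keyOf(item), next)
    if (next > max) max = next
  }
  return max / items.length
}

function summarizeSignals(requests: LoggedRequest[]): SignalSummary {
  const total = requests.length
  const withSession = requests.filter(r => r.hasSessionCookie).length
  return {
    topIpShare: topShare(requests, r => r.ip),
    topAsnShare: topShare(requests, r => String(r.asn)),
    uniqueUserAgents: new Set(requests.map(r => r.userAgent)).size,
    sessionShare: total === 0 ? 0 : withSession / total,
  }
}
```

A window where `topIpShare` is high, `uniqueUserAgents` is tiny, and `sessionShare` is near zero matches the attack column; the opposite profile matches the viral column.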

Fix 1: Traffic Analysis Dashboard (Before Incident)

// traffic-classifier.ts — real-time traffic quality analysis
// Assumes `db` is a node-postgres-style client whose query() resolves to { rows }
interface TrafficQualityReport {
  legitimacyScore: number
  totalRequests: number
  uniqueIPs: number
  topIPShare: number
  topEndpointShare: number
  uaDiversity: number
  suspectedType: 'legitimate' | 'attack'
  recommendation: string
}

async function analyzeTrafficQuality(windowMinutes = 5): Promise<TrafficQualityReport> {
  const cutoff = new Date(Date.now() - windowMinutes * 60 * 1000)

  const metrics = await Promise.all([
    // IP diversity: how many unique IPs?
    db.query(`
      SELECT COUNT(DISTINCT ip_address) as unique_ips,
             COUNT(*) as total_requests,
             -- NULLIF avoids a division-by-zero error when the window has no rows
             COUNT(DISTINCT ip_address)::float / NULLIF(COUNT(*), 0) as diversity_ratio
      FROM request_logs
      WHERE created_at > $1
    `, [cutoff]),

    // Top IPs: does any single IP dominate?
    db.query(`
      SELECT ip_address, COUNT(*) as requests,
             COUNT(*)::float / (SELECT COUNT(*) FROM request_logs WHERE created_at > $1) as share
      FROM request_logs
      WHERE created_at > $1
      GROUP BY ip_address
      ORDER BY requests DESC
      LIMIT 10
    `, [cutoff]),  // the query references only $1, so pass exactly one parameter

    // Endpoint distribution: concentrated or varied?
    db.query(`
      SELECT path, COUNT(*) as requests,
             COUNT(*)::float / (SELECT COUNT(*) FROM request_logs WHERE created_at > $1) as share
      FROM request_logs
      WHERE created_at > $1
      GROUP BY path
      ORDER BY requests DESC
      LIMIT 10
    `, [cutoff]),  // the query references only $1, so pass exactly one parameter

    // User agent diversity
    db.query(`
      SELECT COUNT(DISTINCT user_agent) as unique_uas
      FROM request_logs
      WHERE created_at > $1
    `, [cutoff]),

    // Geographic distribution
    db.query(`
      SELECT country_code, COUNT(*) as requests
      FROM request_logs
      WHERE created_at > $1
      GROUP BY country_code
      ORDER BY requests DESC
      LIMIT 10
    `, [cutoff]),
  ])

  const [ipDiversity, topIPs, endpoints, uaDiversity, geoDistribution] = metrics.map(r => r.rows)

  const topIPShare = topIPs[0]?.share ?? 0
  const topEndpointShare = endpoints[0]?.share ?? 0

  // Score: 0 = likely DDoS, 100 = likely legitimate
  let legitimacyScore = 100
  if (topIPShare > 0.1) legitimacyScore -= 40   // Single IP > 10% of traffic
  if (topEndpointShare > 0.7) legitimacyScore -= 30  // >70% to one endpoint
  if (uaDiversity[0].unique_uas < 5) legitimacyScore -= 20  // Very few user agents

  return {
    legitimacyScore,
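    // COUNT(...) is a bigint, which node-postgres returns as a string; convert explicitly
    totalRequests: Number(ipDiversity[0].total_requests),
    uniqueIPs: Number(ipDiversity[0].unique_ips),
    topIPShare,
    topEndpointShare,
    uaDiversity: Number(uaDiversity[0].unique_uas),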
    suspectedType: legitimacyScore > 60 ? 'legitimate' : 'attack',
    recommendation: legitimacyScore > 60
      ? 'Scale up — this looks like organic traffic'
      : 'Activate DDoS mitigation — traffic patterns indicate attack',
  }
}
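
To see how the three penalties combine, it helps to pull the scoring rule out into a pure function and run a few cases through it. This sketch mirrors the thresholds used above (10% top-IP share, 70% top-endpoint share, fewer than 5 user agents, cutoff at 60); the names `ScoreInputs`, `scoreLegitimacy`, and `classify` are introduced here for illustration only.

```typescript
interface ScoreInputs {
  topIPShare: number        // 0..1
  topEndpointShare: number  // 0..1
  uniqueUserAgents: number
}

// Same penalties as the classifier above: start at 100 and subtract per signal.
function scoreLegitimacy(inputs: ScoreInputs): number {
  let score = 100
  if (inputs.topIPShare > 0.1) score -= 40
  if (inputs.topEndpointShare > 0.7) score -= 30
  if (inputs.uniqueUserAgents < 5) score -= 20
  return score
}

function classify(score: number): 'legitimate' | 'attack' {
  return score > 60 ? 'legitimate' : 'attack'
}

// A front-page spike: many IPs, one hot landing page, many browsers.
const viral = scoreLegitimacy({ topIPShare: 0.01, topEndpointShare: 0.85, uniqueUserAgents: 40 })
// 100 - 30 = 70 → 'legitimate'

// A single-source flood: one dominant IP, one endpoint, two user agents.
const flood = scoreLegitimacy({ topIPShare: 0.35, topEndpointShare: 0.95, uniqueUserAgents: 2 })
// 100 - 40 - 30 - 20 = 10 → 'attack'

// Only the IP rule fires: 100 - 40 = 60, which is not above 60 → 'attack'.
const borderline = scoreLegitimacy({ topIPShare: 0.12, topEndpointShare: 0.3, uniqueUserAgents: 25 })
```

The borderline case is worth noticing: a single dominant IP is enough on its own to flip the verdict, so a large office or carrier NAT egress could plausibly trip it. Treat the weights as a starting point to tune against your own traffic.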

Fix 2: Tiered Response — Don't Block Legitimate Visitors

// Response strategy based on traffic classification
type MitigationLevel = 'none' | 'rate_limit' | 'challenge' | 'block'

function getMitigationLevel(botScore: number, ipReputation: number): MitigationLevel {
  // High confidence attack signals → block
  if (botScore > 90 || ipReputation < 10) return 'block'

  // Suspicious but uncertain → challenge (CAPTCHA, JS challenge)
  if (botScore > 60 || ipReputation < 40) return 'challenge'

  // Possibly bot → aggressive rate limit
  if (botScore > 30 || ipReputation < 70) return 'rate_limit'

  return 'none'
}

// Response tiers:
async function applyMitigation(req: Request, res: Response, next: NextFunction) {
  const botScore = calculateBotScore(req)
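  // Behind a CDN or proxy, req.ip is the proxy's address unless Express "trust proxy" is configured.
  // Cache reputation lookups: calling an external API per request is expensive under load.
  const ipRep = await getIPReputation(req.ip)  // AbuseIPDB, IPQualityScore, etc.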
  const mitigation = getMitigationLevel(botScore, ipRep)

  switch (mitigation) {
    case 'block':
      return res.status(403).json({ error: 'Access denied' })

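    case 'challenge':
      // Return a JS challenge page that real browsers pass and bots fail,
      // or redirect to a CAPTCHA.
      // NOTE: any client can set a plain request header, so in production the
      // "passed" marker should be a signed, expiring token verified server-side.
      if (!req.headers['x-challenge-passed']) {
        return res.status(429).send(generateJSChallengePage())
      }
      break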

    case 'rate_limit':
      // 5x more restrictive limits
      if (!await strictRateLimit(req.ip)) {
        return res.status(429).json({ error: 'Rate limit exceeded' })
      }
      break
  }

  next()
}
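
The middleware above calls `calculateBotScore` and `strictRateLimit` without defining them. What follows is one possible sketch of each, under stated assumptions: the header heuristics and their weights are illustrative guesses rather than a vetted bot detector, and the limiter is an in-memory fixed window (20 requests per minute, assumed to be one fifth of a normal 100 per minute limit) that only works for a single process.

```typescript
type HeaderBag = Record<string, string | string[] | undefined>

// Heuristic score: 0 = looks like a browser, 100 = looks scripted. Weights are assumptions.
function calculateBotScore(req: { headers: HeaderBag }): number {
  const rawUa = req.headers['user-agent']
  const ua = typeof rawUa === 'string' ? rawUa : ''
  let score = 0
  if (ua.trim() === '') score += 50
  else if (/curl|wget|python-requests|go-http-client/i.test(ua)) score += 40
  if (!req.headers['accept-language']) score += 20  // browsers send this by default
  if (!req.headers['cookie']) score += 15           // no session continuation
  if (!req.headers['accept']) score += 15
  return Math.min(score, 100)
}

const WINDOW_MS = 60_000
const STRICT_LIMIT = 20  // assumed: 5x stricter than a normal limit of 100 per minute
const windows = new Map<string, { start: number; count: number }>()

// Fixed-window counter. `now` is injectable so the logic can be tested without timers.
// The map is never pruned here; a real deployment needs eviction or a shared store.
function allowRequest(ip: string, now: number = Date.now()): boolean {
  const current = windows.get(ip)
  if (!current || now - current.start >= WINDOW_MS) {
    windows.set(ip, { start: now, count: 1 })
    return true
  }
  current.count += 1
  return current.count <= STRICT_LIMIT
}

// Async wrapper matching how the middleware awaits it; resolves true when allowed.
async function strictRateLimit(ip: string): Promise<boolean> {
  return allowRequest(ip)
}
```

With several application instances, the per-process map undercounts each client, so the counter would need to live in a shared store instead.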

Fix 3: Cloudflare Under Attack Mode (Emergency Response)

// When a real DDoS is underway, escalate to Cloudflare's mitigation.
// The call shapes below follow recent releases of the official `cloudflare`
// TypeScript SDK; older releases differ, so verify against the version you install.
import Cloudflare from 'cloudflare'

const cf = new Cloudflare({ apiToken: process.env.CLOUDFLARE_API_TOKEN })

async function activateUnderAttackMode(): Promise<void> {
  const zoneId = process.env.CLOUDFLARE_ZONE_ID!

  // Enable "I'm Under Attack" mode: every visitor gets a JS challenge
  await cf.zones.settings.edit('security_level', {
    zone_id: zoneId,
    value: 'under_attack',
  })

  logger.warn('Cloudflare Under Attack mode activated')
  await alerting.critical('DDoS mitigation activated: Cloudflare Under Attack mode enabled')
}

async function deactivateUnderAttackMode(): Promise<void> {
  const zoneId = process.env.CLOUDFLARE_ZONE_ID!

  await cf.zones.settings.edit('security_level', {
    zone_id: zoneId,
    value: 'medium',  // Back to normal security level
  })

  logger.info('Cloudflare Under Attack mode deactivated')
}

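// Automatic DDoS escalation
const RECHECK_MS = 30 * 60 * 1000
let underAttackActive = false  // prevents repeated activations and stacked timers

async function monitorAndRespondToDDoS(): Promise<void> {
  if (underAttackActive) return

  const report = await analyzeTrafficQuality(5)

  if (report.suspectedType === 'attack' && report.totalRequests > 10000) {
    await activateUnderAttackMode()
    underAttackActive = true

    // Re-check every 30 minutes until traffic normalizes, so one unhealthy
    // (or failed) check never leaves Under Attack mode on indefinitely
    const recheck = async (): Promise<void> => {
      try {
        const currentReport = await analyzeTrafficQuality(5)
        if (currentReport.legitimacyScore > 70) {
          await deactivateUnderAttackMode()
          underAttackActive = false
          return
        }
      } catch (err) {
        logger.warn(`Under Attack re-check failed: ${String(err)}`)
      }
      setTimeout(recheck, RECHECK_MS)
    }
    setTimeout(recheck, RECHECK_MS)
  }
}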
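
The deactivation check above acts on a single reading of the legitimacy score. Attacks often pause and resume, so one option is to require several consecutive healthy readings before standing down. This is a sketch of that idea as a pure state transition; `RecoveryState`, `evaluateRecovery`, and the default streak of 3 are assumptions made for illustration.

```typescript
interface RecoveryState {
  healthyStreak: number  // consecutive readings above the threshold so far
}

interface RecoveryDecision {
  deactivate: boolean
  state: RecoveryState
}

// Feed in one legitimacy score per monitoring interval.
// Any unhealthy reading resets the streak to zero.
function evaluateRecovery(
  state: RecoveryState,
  legitimacyScore: number,
  threshold = 70,
  requiredStreak = 3,
): RecoveryDecision {
  const healthyStreak = legitimacyScore > threshold ? state.healthyStreak + 1 : 0
  return { deactivate: healthyStreak >= requiredStreak, state: { healthyStreak } }
}

// Example: a dip at the second reading resets the count.
let state: RecoveryState = { healthyStreak: 0 }
const decisions: boolean[] = []
for (const score of [80, 50, 75, 90, 85]) {
  const result = evaluateRecovery(state, score)
  state = result.state
  decisions.push(result.deactivate)
}
// decisions is [false, false, false, false, true]
```

Keeping the decision separate from the timer and the Cloudflare call makes it easy to unit test, and the streak length can be tuned without touching the scheduling code.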

Fix 4: Origin IP Protection During DDoS

// During a DDoS, attackers often bypass CDN and hit origin directly
// Protect by only accepting traffic from Cloudflare IPs

// cloudflare-only-middleware.ts
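const CLOUDFLARE_IPS = [
  // IPv4 ranges from https://www.cloudflare.com/ips-v4/
  // The published list changes over time; refresh it from that URL periodically.
  // IPv6 ranges are published separately at https://www.cloudflare.com/ips-v6/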
  '103.21.244.0/22',
  '103.22.200.0/22',
  '103.31.4.0/22',
  '104.16.0.0/13',
  '104.24.0.0/14',
  '108.162.192.0/18',
  '131.0.72.0/22',
  '141.101.64.0/18',
  '162.158.0.0/15',
  '172.64.0.0/13',
  '173.245.48.0/20',
  '188.114.96.0/20',
  '190.93.240.0/20',
  '197.234.240.0/22',
  '198.41.128.0/17',
]

// In your server's firewall rules (not application code — this should be at network level)
// iptables -A INPUT -p tcp --dport 443 -m set ! --match-set cloudflare-ips src -j DROP

// If using nginx:
// allow 103.21.244.0/22;
// allow 103.22.200.0/22;
// ... (all Cloudflare ranges)
// deny all;  # Block everyone else from hitting origin directly
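
Network-level filtering is the right place for this rule, but an application-level check can serve as a second layer or as a way to log origin hits that bypassed the CDN. The sketch below shows how an IPv4 address can be tested against CIDR ranges like the ones listed above. The helper names are introduced here for illustration, and the code handles IPv4 only (plus the IPv4-mapped IPv6 form `::ffff:a.b.c.d` that Node reports on dual-stack sockets).

```typescript
// Parse dotted-quad IPv4 into an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.')
  if (parts.length !== 4) return null
  let value = 0
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null
    const octet = Number(part)
    if (octet > 255) return null
    value = value * 256 + octet  // plain arithmetic avoids signed 32-bit bitwise pitfalls
  }
  return value
}

// True when `ip` falls inside `cidr` (for example "104.16.0.0/13").
function isInCidr(ip: string, cidr: string): boolean {
  const [base, prefixText] = cidr.split('/')
  const ipInt = ipv4ToInt(ip)
  const baseInt = ipv4ToInt(base ?? '')
  const prefix = Number(prefixText)
  if (ipInt === null || baseInt === null) return false
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) return false
  const blockSize = 2 ** (32 - prefix)
  // Two addresses are in the same block when they share the same block index.
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize)
}

function isCloudflareIp(remoteAddress: string, ranges: string[]): boolean {
  // Strip the IPv4-mapped IPv6 prefix if present.
  const ip = remoteAddress.startsWith('::ffff:') ? remoteAddress.slice(7) : remoteAddress
  return ranges.some(range => isInCidr(ip, range))
}

// Two of the ranges from the list above, used as a small sample.
const sampleRanges = ['173.245.48.0/20', '104.16.0.0/13']
const fromCloudflare = isCloudflareIp('104.16.0.1', sampleRanges)
const fromElsewhere = isCloudflareIp('203.0.113.7', sampleRanges)
```

The address to test must be the TCP peer address of the connection (for example `req.socket.remoteAddress` in Node), not a value taken from `X-Forwarded-For`, which a direct-to-origin attacker can set freely.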

DDoS Response Playbook

# DDoS Response Playbook

## Detection (T+0 to T+5 min)
1. Check traffic classifier: legitimacy score < 60?
2. Check Cloudflare dashboard: unusual request volume?
3. Check error rate: 5xx spike?
4. Announce in incident channel: "@oncall potential DDoS, investigating"

## Assessment (T+5 to T+10 min)
Questions to answer:
- Single IP or distributed? (check top IPs)
- Concentrated endpoint or spread? (check endpoint distribution)
- Any referrer from press/HN? (could be viral)
- Bot scores elevated?

## Mitigation (T+10 min)
If attack confirmed:
- Level 1: Enable Cloudflare "High" security level
- Level 2: Enable "Under Attack" mode (JS challenge for all)
- Level 3: Geo-block if attack is concentrated (with business approval)
- Level 4: Activate DDoS protection service (if subscribed)

## Recovery
- Monitor legitimacy score every 5 minutes
- Deactivate mitigations when score returns above 70
- Post-mortem: what enabled this attack? How do we prevent it?
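
The assessment question about referrers is quick to answer if the breakdown is already scripted. This is a minimal sketch that assumes you can pull the raw `Referer` header values for the window as an array (with `null` for requests that sent none); the function name and the `(none)` / `(invalid)` bucket labels are choices made here for illustration.

```typescript
// Returns each referrer host's share of the window, as a fraction of all requests.
function referrerHostShares(referrers: Array<string | null>): Map<string, number> {
  const counts = new Map<string, number>()
  for (const ref of referrers) {
    let host = '(none)'
    if (ref) {
      try {
        host = new URL(ref).hostname
      } catch {
        host = '(invalid)'  // header present but not a parseable URL
      }
    }
    counts.set(host, (counts.get(host) ?? 0) + 1)
  }
  const shares = new Map<string, number>()
  for (const [host, count] of counts) {
    shares.set(host, count / referrers.length)
  }
  return shares
}

const shares = referrerHostShares([
  'https://news.ycombinator.com/item?id=1',
  'https://news.ycombinator.com/',
  null,
  'not a url',
])
// news.ycombinator.com → 0.5, (none) → 0.25, (invalid) → 0.25
```

A large share from one news or social host supports the viral reading. A window dominated by `(none)` is weaker evidence either way, since direct visits and many apps also send no referrer, but combined with the other attack signals it points toward scripted traffic.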

DDoS Readiness Checklist

  • ✅ Traffic quality dashboard monitors legitimacy score in real time
  • ✅ Runbook distinguishes viral spike from DDoS (different responses)
  • ✅ Cloudflare or WAF configured — can escalate security level in one click
  • ✅ Origin IP not publicly known — all traffic through CDN
  • ✅ Tiered mitigation: rate limit → challenge → block (not binary allow/block)
  • ✅ On-call engineer knows the difference and knows the escalation steps
  • ✅ Auto-detection runs every 5 minutes during elevated traffic

Conclusion

Telling a DDoS from viral success in real time requires analysis tooling and a response runbook that are built in advance. The key signals are IP diversity, spread across endpoints, user-agent diversity, and session continuity: legitimate viral traffic typically shows all four, while attacks typically lack them. With tiered responses (rate limit → challenge → block), you avoid the false positive of blocking your Hacker News crowd. With Cloudflare Under Attack mode, you have a fast mitigation that lets real browsers through while stopping most scripted clients. The decision tree has to be ready before the incident, because there is too much noise to reason it through during one.