Product Launch With No Load Testing — When the Press Release Causes the Outage

Introduction

Product launches are among the most dangerous moments in a startup's engineering life. Marketing has promised the world, press has been lined up, and the team is exhausted from shipping the feature. What nobody has done is ask: "What happens if 50,000 people try to sign up in the first hour?" Load testing before a launch is not a luxury. It is the difference between a successful launch story and a "company couldn't handle demand" headline that follows you for years.

The Launch Traffic Pattern

Typical product launch traffic curve:

T-0:    TechCrunch / Product Hunt article goes live
T+5min: First wave from social sharing — 5x normal
T+15min: Second wave from retweets — 15x normal
T+45min: Third wave from email newsletters — 30x normal
T+2hr:  HN front page pick-up (if you're lucky/unlucky) — 50x normal
T+6hr:  European audience wakes up — sustained 20x
T+24hr: Traffic settles at 3-5x the pre-launch level, which becomes the new baseline

What each spike tests:
- Signup endpoint: absolutely hammered (everyone tries to sign up at once)
- Marketing pages: highest absolute traffic
- Email verification: 50,000 emails in 1 hour → SendGrid rate limits
- Database: every new user = row write + N reads
- Session storage: 50,000 new sessions simultaneously
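
To turn the multipliers in the curve above into numbers you can size against, a minimal sketch like the following may help. The wave multipliers are copied from the curve; the baseline of 20 requests per second and the names `Wave`, `projectWaveLoad`, and `peakRps` are assumptions made for illustration, so substitute your own measured baseline.

```typescript
// Multipliers come from the traffic curve above; the baseline is an assumed input.
interface Wave {
  label: string
  multiplier: number // multiple of normal (pre-launch) traffic
}

const LAUNCH_WAVES: Wave[] = [
  { label: 'T+5min social sharing', multiplier: 5 },
  { label: 'T+15min retweets', multiplier: 15 },
  { label: 'T+45min newsletters', multiplier: 30 },
  { label: 'T+2hr HN front page', multiplier: 50 },
  { label: 'T+6hr European audience', multiplier: 20 },
]

// Convert a baseline request rate into the rate each wave would produce.
function projectWaveLoad(baselineRps: number, waves: Wave[]): { label: string; rps: number }[] {
  return waves.map(w => ({ label: w.label, rps: baselineRps * w.multiplier }))
}

// The largest wave is the number to size capacity and load tests against.
function peakRps(baselineRps: number, waves: Wave[]): number {
  return Math.max(...projectWaveLoad(baselineRps, waves).map(w => w.rps))
}

// Hypothetical baseline of 20 requests/second
const projectedPeak = peakRps(20, LAUNCH_WAVES) // 20 * 50 = 1000 requests/second
```

The point of the exercise is that the load test target should come from the largest wave, not from the average over the day.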

Fix 1: Load Test at Launch Scale, Not Normal Scale

// launch-loadtest.js — k6 test modeling launch day traffic
import http from 'k6/http'
import { sleep, check, group } from 'k6'

export const options = {
  scenarios: {
    // Simulate the TechCrunch spike
    launch_spike: {
      executor: 'ramping-vus',
      startVUs: 10,
      stages: [
        { duration: '2m', target: 500 },   // Rapid spike
        { duration: '5m', target: 2000 },  // Peak (50x normal)
        { duration: '5m', target: 2000 },  // Sustained
        { duration: '3m', target: 500 },   // Drop
      ],
    },
  },
  // k6 reads thresholds from the top level of options, not from inside a scenario
  thresholds: {
    http_req_duration: ['p(99)<3000'],  // 99% under 3s
    http_req_failed: ['rate<0.01'],     // <1% errors
  },
}

export default function () {
  group('Signup flow (most critical at launch)', () => {
    // Step 1: Load landing page
    const landing = http.get('https://staging.myapp.com/')
    check(landing, { 'landing: status 200': r => r.status === 200 })

    // Step 2: Submit signup
    const signup = http.post('https://staging.myapp.com/api/auth/signup', JSON.stringify({
      email: `loadtest+${Date.now()}+${Math.random().toString(36)}@example.com`,
      password: 'TestPass123!',
    }), { headers: { 'Content-Type': 'application/json' } })

    check(signup, {
      'signup: status 201': r => r.status === 201,
      'signup: fast': r => r.timings.duration < 2000,
    })
  })

  sleep(Math.random() * 2)
}
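
A virtual-user count such as 2000 is hard to relate to "signups per hour" without a conversion. One rough way to do it, sketched below, is Little's law for a closed-loop test: throughput is about the number of VUs divided by the time one iteration takes. The 1.5 second iteration time is an assumption (about 0.5 seconds for the two requests plus the 1 second average sleep in the script above), and the helper names are hypothetical. Response times grow under load, so measured throughput will usually be lower than this estimate.

```typescript
// Little's law for a closed-loop load test:
//   throughput (iterations/second) = concurrent VUs / seconds per iteration
// where seconds per iteration = request time + sleep time in one loop of the script.

function iterationsPerSecond(vus: number, iterationSeconds: number): number {
  if (iterationSeconds <= 0) throw new Error('iterationSeconds must be positive')
  return vus / iterationSeconds
}

// Inverse: how many VUs are needed to sustain a target arrival rate.
function vusForRate(targetPerSecond: number, iterationSeconds: number): number {
  if (iterationSeconds <= 0) throw new Error('iterationSeconds must be positive')
  return Math.ceil(targetPerSecond * iterationSeconds)
}

// Assumed timing: ~0.5s of requests + 1s average sleep per iteration
const ITERATION_SECONDS = 1.5

// What the 2000-VU peak stage would generate if latency stayed at the assumed value
const peakSignupsPerSecond = iterationsPerSecond(2000, ITERATION_SECONDS)

// VUs needed if 50,000 signups arrived evenly over one hour
const vusFor50kPerHour = vusForRate(50_000 / 3600, ITERATION_SECONDS)
```

Real launch arrivals are bursty rather than evenly spread, so treat the evenly-spread figure as a floor and size the test for the spike.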

Fix 2: The Launch Checklist — 2 Weeks Before

// launch-readiness.ts — audit before every major launch
interface LaunchCheck {
  category: string
  check: string
  status: 'pass' | 'fail' | 'warn'
  notes?: string
}

async function runLaunchReadinessChecks(): Promise<LaunchCheck[]> {
  return [
    // Capacity checks
    {
      category: 'Capacity',
      check: 'Auto-scaling configured and tested',
      status: await isAutoScalingConfigured() ? 'pass' : 'fail',
    },
    {
      category: 'Capacity',
      check: 'Pre-warm scheduled for launch time',
      status: await hasPreWarmScheduled() ? 'pass' : 'warn',
      notes: 'Scale to 5x capacity 30 minutes before launch',
    },
    {
      category: 'Database',
      check: 'Connection pooler (PgBouncer) configured',
      status: await isPgBouncerRunning() ? 'pass' : 'fail',
    },
    {
      category: 'Database',
      check: 'Slow query log reviewed, indexes in place',
      status: await hasReviewedSlowQueries() ? 'pass' : 'warn',
    },
    {
      category: 'Email',
      check: 'SendGrid rate limit will not be hit at launch scale',
      status: await checkEmailRateLimit() ? 'pass' : 'fail',
      notes: 'SendGrid free tier: 100/day. Confirm the sending limit for your paid plan, then compare it with expected signups per hour.',
    },
    {
      category: 'Monitoring',
      check: 'Alerts configured for error rate, latency, DB connections',
      status: await hasAlertsConfigured() ? 'pass' : 'fail',
    },
    {
      category: 'Operations',
      check: 'War room scheduled — engineers available during launch window',
      status: await hasWarRoomScheduled() ? 'pass' : 'warn',
    },
    {
      category: 'Rollback',
      check: 'Rollback plan documented and tested',
      status: await hasRollbackPlan() ? 'pass' : 'fail',
    },
  ]
}
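
The list of checks only becomes useful once something turns it into a decision. A minimal sketch of that step follows, assuming that any `fail` blocks the launch while `warn` items are surfaced but do not block. That policy, and the names `LaunchDecision` and `decideLaunch`, are assumptions rather than part of the audit above. The `LaunchCheck` type is repeated so the block stands alone.

```typescript
type Status = 'pass' | 'fail' | 'warn'

interface LaunchCheck {
  category: string
  check: string
  status: Status
  notes?: string
}

interface LaunchDecision {
  go: boolean
  blockers: LaunchCheck[] // checks with status 'fail'
  warnings: LaunchCheck[] // checks with status 'warn'
}

// Assumed policy: a single 'fail' is a no-go; warnings are reported for review.
function decideLaunch(checks: LaunchCheck[]): LaunchDecision {
  const blockers = checks.filter(c => c.status === 'fail')
  const warnings = checks.filter(c => c.status === 'warn')
  return { go: blockers.length === 0, blockers, warnings }
}

// Example with hand-written results instead of live probes
const sample: LaunchCheck[] = [
  { category: 'Capacity', check: 'Auto-scaling configured and tested', status: 'pass' },
  { category: 'Capacity', check: 'Pre-warm scheduled for launch time', status: 'warn' },
  { category: 'Rollback', check: 'Rollback plan documented and tested', status: 'fail' },
]
const decision = decideLaunch(sample)
```

Running this two weeks out, and again the day before, gives the team a short list of blockers instead of a long list of statuses.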

Fix 3: Email Rate Limit Planning

// Email is the most-missed launch scaling bottleneck
// 10,000 signups in 1 hour = ~167 emails/minute (~2.8 emails/second)

// Check your email provider's limits before launch.
// Example limits (confirm the real numbers for your plan with the provider):
// SendGrid free: 100/day → exhausted within the first minute at this rate
// A 100/hour limit → exceeded as soon as signups pass 100/hour
// A 1,500/hour limit → covers 1,500 signups/hour; 10,000/hour leaves 8,500 emails waiting
// Queue emails and rate-limit your own sending
import Bull from 'bull'

const emailQueue = new Bull('email', {
  redis: process.env.REDIS_URL,
  limiter: {
    max: 100,       // Max 100 jobs processed
    duration: 60000, // Per 60 seconds
  },
})

// Also: defer non-critical emails during the spike
async function sendWelcomeEmail(userId: string, email: string): Promise<void> {
  // Don't block signup on email — queue it
  await emailQueue.add(
    { type: 'welcome', userId, email },
    {
      delay: Math.random() * 30000,  // Spread over 30 seconds to flatten burst
      attempts: 3,
      backoff: { type: 'exponential', delay: 5000 },
    }
  )
}

// Critical emails (email verification): send immediately, bypass queue
// (they still count against the provider limit, so include them in the volume estimate)
// Non-critical (onboarding series): queue with 24h delay
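
The arithmetic behind the limits above can be made explicit. The sketch below estimates how many emails pile up during a spike and how long the backlog takes to clear, and derives a per-minute limiter in the `{ max, duration }` shape used by the queue. The function names, the 0.8 safety factor, and the assumption that signups become negligible once the spike ends are all illustrative simplifications.

```typescript
interface BacklogEstimate {
  backlogAtEndOfSpike: number // emails still unsent when the spike ends
  hoursToDrain: number        // time to send them at the provider limit
}

function estimateEmailBacklog(
  signupsPerHour: number,
  providerLimitPerHour: number,
  spikeHours: number,
): BacklogEstimate {
  if (providerLimitPerHour <= 0) throw new Error('providerLimitPerHour must be positive')
  // Emails that cannot be sent each hour because the limit is lower than demand
  const surplusPerHour = Math.max(0, signupsPerHour - providerLimitPerHour)
  const backlogAtEndOfSpike = surplusPerHour * spikeHours
  // Simplification: assumes new signups are negligible after the spike
  return { backlogAtEndOfSpike, hoursToDrain: backlogAtEndOfSpike / providerLimitPerHour }
}

// Derive a per-minute limiter that stays below the hourly provider limit.
// safetyFactor leaves headroom for emails sent outside the queue (assumed value).
function limiterFor(providerLimitPerHour: number, safetyFactor = 0.8): { max: number; duration: number } {
  return {
    max: Math.floor((providerLimitPerHour * safetyFactor) / 60),
    duration: 60_000, // one minute, in milliseconds
  }
}

// 10,000 signups in a one-hour spike against a 1,500/hour limit
const estimate = estimateEmailBacklog(10_000, 1_500, 1)
// estimate.backlogAtEndOfSpike is 8500; draining takes roughly 5.7 more hours
```

If the drain time is measured in hours, verification emails will arrive long after users have left, which is usually the signal to upgrade the plan before launch rather than rely on queuing.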

Fix 4: Feature Flag to Shed Load at Peak

// Launch day: protect the critical path (signup, login)
// by shedding non-critical features under high load

import { getFlag } from './feature-flags'

async function getHomepageData(userId?: string): Promise<HomepageData> {
  const isUnderHighLoad = await getFlag('high-load-mode')

  // Under high load: return minimal page, no recommendations
  if (isUnderHighLoad) {
    return {
      hero: await getHeroContent(),   // Critical: cached
      features: STATIC_FEATURES,       // Critical: static
      // No personalized recommendations — DB too busy
      // No social proof — third-party API bypassed
      // No A/B test variants — feature flag returns default
    }
  }

  return {
    hero: await getHeroContent(),
    features: await getFeatures(userId),
    recommendations: await getRecommendations(userId),
    socialProof: await getSocialProof(),
  }
}

// Flip 'high-load-mode' to true if things get rough — instant effect
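
One thing to watch: `getFlag` is awaited on every request, so the flag lookup itself can add load during the spike. A small in-memory cache in front of it is one possible mitigation, sketched below. `cachedFlags` and `FlagFetcher` are hypothetical names, the fetcher signature is an assumption about what `getFlag` looks like, and falling back to the last known value (or `false`) when the flag store fails is a design choice you may want to invert. The trade-off is that a flip takes up to `ttlMs` to reach every process instead of being instant.

```typescript
type FlagFetcher = (name: string) => Promise<boolean>

// Wraps a flag fetcher with a per-process TTL cache.
// `now` is injectable so the expiry logic can be tested without waiting.
function cachedFlags(fetchFlag: FlagFetcher, ttlMs: number, now: () => number = Date.now): FlagFetcher {
  const cache = new Map<string, { value: boolean; expiresAt: number }>()

  return async (name: string): Promise<boolean> => {
    const hit = cache.get(name)
    if (hit && hit.expiresAt > now()) return hit.value // still fresh: skip the lookup

    try {
      const value = await fetchFlag(name)
      cache.set(name, { value, expiresAt: now() + ttlMs })
      return value
    } catch {
      // Flag store unreachable: reuse the last known value, else default to false
      return hit ? hit.value : false
    }
  }
}

// Usage sketch with a stand-in fetcher; in the article's code this would wrap getFlag
let lookups = 0
const getFlagCached = cachedFlags(async () => { lookups += 1; return true }, 5_000)
```

With a TTL of a few seconds, each process makes at most one lookup per flag per TTL window once the cache is warm, regardless of request volume.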

Fix 5: War Room Protocol for Launch Day

// launch-war-room.ts — real-time dashboard for launch day
async function launchDayMonitor() {
  const LAUNCH_TIME = new Date('2026-03-15T14:00:00Z')
  const MONITOR_DURATION_HOURS = 4

  const dashboard = {
    signupsLastMinute: 0,
    errorRateLastMinute: 0,
    p95LatencyMs: 0,
    activeUsers: 0,
    dbConnections: 0,
  }

  const interval = setInterval(async () => {
    try {
      const [signups, errors, latency, sessions, dbConns] = await Promise.all([
        getMetric('signups', '1m'),
        getMetric('error_rate', '1m'),
        getMetric('p95_latency', '5m'),
        getMetric('active_sessions', 'current'),
        getMetric('db_connections', 'current'),
      ])

      // Log status every minute
      console.log(`[${new Date().toISOString()}] Launch Status:`)
      console.log(`  Signups/min:    ${signups}`)
      console.log(`  Error rate:     ${(errors * 100).toFixed(2)}%`)
      console.log(`  P95 latency:    ${latency}ms`)
      console.log(`  Active users:   ${sessions}`)
      console.log(`  DB connections: ${dbConns}`)

      // Page the war room when more than 2% of requests fail
      // (this only alerts; it does not scale anything by itself)
      if (errors > 0.02) {
        await alerting.critical('Launch error rate > 2%: consider activating high-load-mode')
      }
    } catch (err) {
      // A failed metrics query should be visible but must not stop the monitor
      console.error('Launch monitor tick failed:', err)
    }
  }, 60_000)

  setTimeout(() => clearInterval(interval), MONITOR_DURATION_HOURS * 3600 * 1000)
}
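
The monitor above only alerts on error rate, while the readiness checklist also calls for alerts on latency and DB connections. One way to cover all three is to keep the threshold logic in a pure function that the interval callback can call, as sketched below. The limit values and the names `LaunchMetrics`, `LaunchLimits`, and `evaluateLaunchHealth` are assumptions; pick limits from your own load test results.

```typescript
interface LaunchMetrics {
  errorRate: number      // fraction of failed requests, e.g. 0.02 = 2%
  p95LatencyMs: number
  dbConnections: number
}

interface LaunchLimits {
  maxErrorRate: number
  maxP95LatencyMs: number
  maxDbConnections: number
}

// Returns one message per breached limit; an empty array means healthy.
function evaluateLaunchHealth(m: LaunchMetrics, limits: LaunchLimits): string[] {
  const alerts: string[] = []
  if (m.errorRate > limits.maxErrorRate) {
    alerts.push(`error rate ${(m.errorRate * 100).toFixed(2)}% is above ${(limits.maxErrorRate * 100).toFixed(2)}%`)
  }
  if (m.p95LatencyMs > limits.maxP95LatencyMs) {
    alerts.push(`p95 latency ${m.p95LatencyMs}ms is above ${limits.maxP95LatencyMs}ms`)
  }
  if (m.dbConnections > limits.maxDbConnections) {
    alerts.push(`DB connections ${m.dbConnections} exceed ${limits.maxDbConnections}`)
  }
  return alerts
}

// Hypothetical limits: 2% errors, 3s p95, 80 pooled connections
const LIMITS: LaunchLimits = { maxErrorRate: 0.02, maxP95LatencyMs: 3000, maxDbConnections: 80 }
const breaches = evaluateLaunchHealth({ errorRate: 0.035, p95LatencyMs: 1200, dbConnections: 95 }, LIMITS)
```

Keeping this logic free of I/O makes it easy to unit test before launch day, when there is no time to debug the monitor itself.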

Launch Readiness Checklist

  • ✅ Load tested at 5x and 10x expected launch traffic — with real endpoints
  • ✅ Email provider plan supports expected signup volume per hour
  • ✅ Auto-scaling configured and pre-warm scheduled 30 minutes before launch
  • ✅ Database connection pool sized for launch-scale concurrent users
  • ✅ Feature flag ready to shed non-critical features under load
  • ✅ War room scheduled — engineers on standby during peak window
  • ✅ Rollback plan exists and is tested (can you roll back at launch peak?)
  • ✅ Status page ready — communicate proactively if issues occur

Conclusion

A product launch is a planned traffic spike. The teams that execute launches well treat each one like a deployment: load test at the expected scale, have engineers on standby, pre-warm capacity, and have a clear protocol for when things go wrong. The outage that happens at launch isn't bad luck. It's the result of never asking "what if 50,000 people arrive in the first hour?" Ask that question two weeks before launch, when there's time to fix the answers.