Founder Demands "Just Make It Fast" — Translating Business Pressure Into Engineering Work

Introduction

"The app is slow" is not a performance requirement. It's a complaint with unknown scope, unknown severity, and unknown cause. Engineers who respond to it by guessing — caching everything, rewriting the database layer, adding a CDN — usually improve something and miss the thing that actually matters. The engineers who respond well start by measuring: where is it slow, how slow, for whom, and how often? That measurement turns vague pressure into a ranked list of specific, fixable problems.

The "Make It Fast" Trap

Common responses to "the app is slow" — and why they usually miss the mark:

1. "Add caching everywhere"
Caches the wrong queries
Misses that the bottleneck is a slow third-party API call
Adds cache invalidation bugs

2. "Upgrade the database"
Expensive
Buys time but doesn't fix the query that caused the problem

3. "Add more servers"
Doesn't help if the bottleneck is a single slow database query
Costs money for no improvement

4. "Rewrite in [faster language/framework]"
6-month project
You still won't know what was actually slow

What you need instead:
Real User Monitoring (RUM) data
Database slow query log
API endpoint performance breakdown
Specific route × specific percentile × specific user segment
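Percentiles, not averages, are the numbers worth watching: an average can look healthy while a meaningful fraction of users wait seconds. A minimal, dependency-free sketch of nearest-rank percentiles over raw durations (all names here are illustrative, not from any library):

```typescript
// Nearest-rank percentile: sort the samples, take the value at ceil(p * n) - 1
function percentile(durations: number[], p: number): number {
  if (durations.length === 0) return 0
  const sorted = [...durations].sort((a, b) => a - b)
  const rank = Math.ceil(p * sorted.length) - 1
  return sorted[Math.max(0, rank)]
}

// Ten request durations in ms — eight fast, two terrible
const samples = [120, 95, 110, 2300, 130, 105, 90, 1800, 115, 100]

const p50 = percentile(samples, 0.5)                          // 110
const p95 = percentile(samples, 0.95)                         // 2300
const avg = samples.reduce((a, b) => a + b, 0) / samples.length  // 496.5
// The average looks merely "slow-ish"; the p95 shows users waiting 2.3s.
```

This is why "make it fast" has to become "p95 for route X under Yms": the tail is where the complaints come from.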

Fix 1: Measure Before You Optimize — Find the Actual Bottleneck

// Step 1: Instrument every endpoint with timing data
app.use((req, res, next) => {
  const start = Date.now()

  res.on('finish', () => {
    const duration = Date.now() - start

    // Log slow requests for investigation
    if (duration > 1000) {
      logger.warn({
        method: req.method,
        path: req.route?.path ?? req.path,
        statusCode: res.statusCode,
        duration,
        userId: req.user?.id,
        queryCount: req.queryCount ?? 0,
      }, 'Slow request')
    }

    // Track percentiles in Prometheus
    httpDuration.observe(
      { method: req.method, route: req.route?.path ?? 'unknown', status: res.statusCode.toString() },
      duration / 1000
    )
  })

  next()
})
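The `httpDuration` used above is assumed to be a Prometheus histogram (e.g. prom-client's `Histogram` with `name`, `help`, `labelNames`, and `buckets` options). As a dependency-free illustration of the same `observe(labels, value)` shape:

```typescript
// Stand-in with prom-client Histogram's observe() interface, so the
// middleware above is runnable without the real dependency.
class SimpleHistogram {
  private samples = new Map<string, number[]>()

  observe(labels: Record<string, string>, seconds: number) {
    const key = JSON.stringify(labels)
    const bucket = this.samples.get(key) ?? []
    bucket.push(seconds)
    this.samples.set(key, bucket)
  }

  // Observation count for a label set — handy for sanity checks
  count(labels: Record<string, string>): number {
    return (this.samples.get(JSON.stringify(labels)) ?? []).length
  }
}

const httpDuration = new SimpleHistogram()
```

In production you would use the real client so Prometheus can scrape and compute percentiles server-side; the stand-in only exists to make the shape concrete.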
// Step 2: Track query count per request — find N+1 problems
// Middleware that wraps db.query to count calls
function instrumentDatabase(db: Pool) {
  const originalQuery = db.query.bind(db)

  db.query = function(text: string, values?: any[]) {
    const req = getCurrentRequest()  // AsyncLocalStorage
    if (!req) return originalQuery(text, values)

    req.queryCount = (req.queryCount ?? 0) + 1

    // Accumulate time spent in the database for this request
    const start = Date.now()
    return originalQuery(text, values).then((result: any) => {
      req.queryDuration = (req.queryDuration ?? 0) + (Date.now() - start)
      return result
    })
  }

  return db
}
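The `getCurrentRequest()` helper above is assumed to be backed by Node's built-in `AsyncLocalStorage`. A minimal sketch (here the counters live on a plain context object rather than directly on `req`; wiring them back onto the request is left to taste):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Per-request counters that the DB wrapper reads and writes
interface RequestContext {
  queryCount?: number
  queryDuration?: number
}

const requestContext = new AsyncLocalStorage<RequestContext>()

// Express-style middleware: everything downstream of next() — handlers,
// awaited promises, db calls — sees this request's store
function contextMiddleware(_req: unknown, _res: unknown, next: () => void) {
  requestContext.run({ queryCount: 0, queryDuration: 0 }, next)
}

// What instrumentDatabase() calls to find "the current request"
function getCurrentRequest(): RequestContext | undefined {
  return requestContext.getStore()
}
```

Outside any request, `getStore()` returns `undefined`, which is why the wrapper falls through to the original query in that case.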

// After one week of data, you can answer:
// - Which endpoints are slowest (p95)?
// - Which endpoints make the most DB queries?
// - Which pages are slow only on mobile?
// - Are specific users affected?

Fix 2: Structured Performance Investigation Process

// performance-audit.ts — systematic approach to "make it fast"

interface PerformanceFinding {
  endpoint: string
  metric: string
  current: string
  target: string
  rootCause: string
  estimatedImpact: string
  effort: 'hours' | 'days' | 'weeks'
  priority: 1 | 2 | 3
}

async function runPerformanceAudit(): Promise<PerformanceFinding[]> {
  const findings: PerformanceFinding[] = []

  // 1. Find slowest endpoints by p95 latency
  const slowEndpoints = await db.query(`
    SELECT
      path,
      COUNT(*) as request_count,
      PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY duration) as p50,
      PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration) as p95,
      PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration) as p99,
      AVG(query_count) as avg_queries_per_request
    FROM request_logs
    WHERE created_at > NOW() - INTERVAL '7 days'
    GROUP BY path
    ORDER BY p95 DESC
    LIMIT 20
  `)

  for (const endpoint of slowEndpoints.rows) {
    // node-postgres returns numeric aggregates as strings — convert first
    const avgQueries = Number(endpoint.avg_queries_per_request)
    if (avgQueries > 10) {
      findings.push({
        endpoint: endpoint.path,
        metric: `${avgQueries.toFixed(0)} queries/request`,
        current: `p95: ${endpoint.p95}ms`,
        target: 'p95: < 200ms',
        rootCause: 'N+1 query pattern — likely missing DataLoader or eager loading',
        estimatedImpact: 'High — affects every request to this endpoint',
        effort: 'hours',
        priority: 1,
      })
    }
  }

  // 2. Find missing database indexes
  const slowQueries = await db.query(`
    SELECT query, calls, mean_exec_time, total_exec_time
    FROM pg_stat_statements
    WHERE mean_exec_time > 100
    ORDER BY total_exec_time DESC
    LIMIT 10
  `)

  // 3. Check for sequential scans on large tables
  const seqScans = await db.query(`
    SELECT schemaname, relname, seq_scan, idx_scan,
           seq_scan::float / NULLIF(seq_scan + idx_scan, 0) as seq_ratio
    FROM pg_stat_user_tables
    WHERE seq_scan > 100
    AND n_live_tup > 10000  -- Only tables with > 10k rows
    ORDER BY seq_ratio DESC
  `)

  // (Turn the slow-query and seq-scan results into findings the same way:
  // root cause "missing index", effort usually 'hours'.)
  return findings.sort((a, b) => a.priority - b.priority)
}

Fix 3: Present Findings as a Prioritized Roadmap

// Turn raw data into a presentation for the founder/CEO
// They need to understand what "fast" means concretely

function generatePerformanceRoadmap(findings: PerformanceFinding[]): string {
  const quick = findings.filter(f => f.effort === 'hours')
  const medium = findings.filter(f => f.effort === 'days')
  const large = findings.filter(f => f.effort === 'weeks')

  // The figures below are illustrative — substitute your own RUM data
  return `
# Performance Improvement Roadmap

## Current State
- Homepage p95 load time: 3.2 seconds
- Checkout p95 load time: 4.8 seconds
- Mobile users experiencing 2x slower load times

## Quick Wins (This Week — 80% of improvement)
${quick.map(f => `- **${f.endpoint}**: ${f.rootCause}. ${f.estimatedImpact}`).join('\n')}

## Medium Work (Next 2 Weeks)
${medium.map(f => `- ${f.endpoint}: ${f.rootCause}`).join('\n')}

## Large Projects (If Needed)
${large.map(f => `- ${f.endpoint}: ${f.rootCause} (${f.effort})`).join('\n')}

## Concrete Targets
- Week 1: Homepage p95 < 1.5s (from 3.2s)
- Week 2: Checkout p95 < 2s (from 4.8s)
- Week 4: All pages p95 < 1s for logged-in users

## What We're NOT Doing (and Why)
- No database upgrade: bottleneck is query structure, not hardware
- No full rewrite: targeted fixes will get 80% improvement in 20% of the time
`
}

Fix 4: The Top 5 Fixes That Cover 80% of Slowness

// Ranked by impact-to-effort ratio:

// 1. Add missing indexes — highest ROI, takes hours
// Run EXPLAIN ANALYZE on your top 10 slowest queries
// Missing index on a 1M row table: query goes from 2s → 2ms

await db.query(`
  -- Inspect the plan for a suspect query on a large table
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM orders
  WHERE user_id = $1 AND status = 'pending'
  -- If this shows "Seq Scan" → add index: CREATE INDEX CONCURRENTLY ...
`, [userId])  // the parameter value must be supplied, or $1 fails
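For the query above, the matching composite index would look like this (index name is a hypothetical; `CONCURRENTLY` builds without blocking writes, but cannot run inside a transaction):

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_user_id_status
  ON orders (user_id, status);
```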

// 2. Fix N+1 queries — second highest ROI
// Symptom: 100 DB queries for a page that should need 3
// Fix: use DataLoader pattern or SQL JOINs
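The batch half of that fix can be sketched in plain TypeScript (`Order`/`OrderItem` are hypothetical row shapes): instead of one items query per order, fetch all items with a single `WHERE order_id = ANY($1)` query, then group in memory.

```typescript
// Hypothetical row shapes for the sketch
interface Order { id: number }
interface OrderItem { orderId: number; sku: string }

// Group a single batched result set back onto its parent rows —
// the in-memory step behind both DataLoader and the two-query pattern
function attachItems(
  orders: Order[],
  items: OrderItem[]
): (Order & { items: OrderItem[] })[] {
  const byOrder = new Map<number, OrderItem[]>()
  for (const item of items) {
    const bucket = byOrder.get(item.orderId) ?? []
    bucket.push(item)
    byOrder.set(item.orderId, bucket)
  }
  return orders.map(o => ({ ...o, items: byOrder.get(o.id) ?? [] }))
}
```

Two queries total, regardless of how many orders the page shows — that is the whole fix.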

// 3. Add Redis cache for repeated reads
// Any data that: doesn't change often + is read frequently
// Product catalog, user profiles, config — cache with TTL
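The read-through-with-TTL pattern, sketched with an in-memory map standing in for Redis (swap the `Map` for a Redis client's GET/SETEX in production; names here are illustrative):

```typescript
// Read-through cache: return a fresh hit, otherwise load and store with a TTL.
// `now` is injectable so expiry is testable without real time passing.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>()

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.store.get(key)
    if (hit && hit.expiresAt > this.now()) return hit.value  // cache hit
    const value = await load()                               // cache miss
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs })
    return value
  }
}
```

Usage is one line at each call site, e.g. `catalogCache.getOrLoad('products', () => fetchProductsFromDb())` — stale-but-bounded reads in exchange for skipping the database entirely on hits.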

// 4. Compress API responses (the `compression` middleware for Express)
import compression from 'compression'

app.use(compression({
  threshold: 1024,  // Only compress responses > 1KB
  level: 6,         // Balance CPU cost vs compression ratio
}))

// 5. Move heavy work to background queues
// Password hashing, email sending, PDF generation, image resizing
// None of these should block the HTTP response
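A minimal in-process sketch of that pattern — acknowledge the request now, do the work after the response has gone out. (In production this would be a real queue such as BullMQ or SQS with retries and persistence; this only shows the shape.)

```typescript
// A job is any deferred async unit of work
type Job = () => Promise<void>

const jobs: Job[] = []
let draining = false

function enqueue(job: Job) {
  jobs.push(job)
  setImmediate(drain)  // runs after the current response has been sent
}

async function drain() {
  if (draining) return
  draining = true
  while (jobs.length > 0) {
    const job = jobs.shift()!
    try {
      await job()
    } catch (err) {
      // In real life: log, retry with backoff, dead-letter after N attempts
    }
  }
  draining = false
}
```

The handler becomes `enqueue(() => sendWelcomeEmail(user)); res.status(202).end()` — the user sees 50ms, not the 2s the email provider takes.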

Performance Work Checklist

  • ✅ Measured before optimizing — know which endpoints are slow, not guessing
  • ✅ Query count per request tracked — N+1 queries visible in logs
  • ✅ Database slow query log enabled and reviewed
  • ✅ Performance findings ranked by p95 impact × request volume
  • ✅ Concrete targets defined: "p95 < 500ms for checkout" not "make it faster"
  • ✅ Quick wins separated from large projects — week 1 vs month 1
  • ✅ Progress tracked with before/after metrics, not vibes

Conclusion

"Make it fast" is a starting point, not a spec. The engineering response is to turn it into measurements: which pages, which percentiles, which user segments, which operations are slow. Then rank by impact — the two or three changes that fix 80% of the perceived slowness are almost always indexing, N+1 query elimination, and caching a handful of hot read paths. A one-week investigation with instrumentation data produces a concrete roadmap that satisfies the business pressure with targeted work, not speculative rewrites.