The Backend Performance Checklist for 2026 — From Database to Edge

Introduction

Performance is still the most measurable, and most underrated, optimization. A 100ms faster API isn't just a better user experience: it means lower error rates, higher conversion, and lower infrastructure cost.

This is a comprehensive checklist across every layer: database, application, caching, network, and edge. It isn't theoretical; it's drawn from production systems handling billions of requests.

Database Layer

Index Coverage

  • All WHERE columns are indexed
  • All JOIN columns are indexed
  • No sequential scans in slow queries (EXPLAIN ANALYZE)
  • Composite indexes for multi-column queries
  • No redundant or unused indexes (check pg_stat_user_indexes for zero-scan indexes)

Connection Pooling

  • Using PgBouncer or similar (never raw connections)
  • Pool size tuned (2x CPU cores is often right)
  • Connection idle timeout configured
  • Monitor active connections (should be < pool size)

Query Performance

  • Run EXPLAIN ANALYZE on all slow queries
  • Query plan shows index usage (no full scans)
  • No N+1 queries (batch load or join)
  • Pagination on large result sets (prefer keyset pagination; OFFSET degrades on deep pages)
  • Use SELECT specific columns (not *)

Read Replicas

  • Read-heavy queries go to replicas
  • Replication lag monitored (< 1s ideal)
  • Failover tested and documented
  • Load balanced across replicas

Application Layer

Async I/O

  • No blocking database calls in main thread
  • Database operations wrapped in promises/async-await
  • External API calls are non-blocking
  • Worker threads for CPU-bound work
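A minimal sketch of the non-blocking pattern from the list above: independent I/O calls should overlap rather than serialize. `fetchUser` and `fetchOrders` are hypothetical async loaders standing in for real database or API calls.

```javascript
// Sequential awaits would pay both latencies back to back;
// Promise.all pays only the slower of the two.
async function getDashboard(fetchUser, fetchOrders) {
  const [user, orders] = await Promise.all([fetchUser(), fetchOrders()]);
  return { user, orders };
}
```

The same shape applies to any set of independent external calls; only dependent calls need to be sequential.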

Streaming

  • Large responses streamed (not buffered in memory)
  • Database result sets streamed
  • File uploads streamed
  • Response size monitored
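One way to stream a large result set, sketched with a generator: emit rows as NDJSON lines instead of buffering one giant JSON array in memory. `rows` can be any iterable, including a streamed database cursor.

```javascript
// Yield one serialized row at a time; peak memory is one row, not the
// whole result set.
function* toNdjson(rows) {
  for (const row of rows) {
    yield JSON.stringify(row) + '\n';
  }
}

// In an HTTP handler you would write each chunk as it is produced:
//   for (const chunk of toNdjson(rows)) res.write(chunk);
```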

Worker Threads for CPU

  • Heavy computation offloaded to worker pool
  • Worker pool size tuned (CPU cores = pool size)
  • Monitor queue depth and processing time

Memory Management

  • No memory leaks (heap profiler run regularly)
  • Large objects not kept in memory
  • Proper cleanup in destructors
  • Memory usage stable over time

Caching Layer

In-Process Cache

  • Simple data cached in process (LRU)
  • Cache invalidation on data change
  • TTL configured appropriately
  • Cache size monitored (shouldn't grow unbounded)
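The in-process cache items above can be sketched in a few lines using a `Map`, whose insertion order gives LRU behavior almost for free; the size and TTL defaults here are arbitrary.

```javascript
// Bounded LRU cache with per-entry TTL: a cap on entries plus expiry
// keeps it from growing unbounded.
class LruCache {
  constructor(maxSize = 1000, ttlMs = 60_000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.map.delete(key);
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
    if (this.map.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is least recent.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```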

Redis Cache

  • Expensive computations cached (embedding generation, LLM calls)
  • Cache invalidation strategy documented
  • Eviction policy set (usually allkeys-lru)
  • Memory usage monitored
  • Replication configured for HA

HTTP Caching

  • Cache-Control headers set correctly
  • ETag or Last-Modified for revalidation
  • Public vs private cached content
  • No caching private data

CDN / Edge Caching

  • Static assets served from CDN
  • Cache headers optimized per content type
  • Purge strategy for updates
  • Monitor cache hit ratio (< 80% = problem)

Network Layer

HTTP/2 or HTTP/3

  • Upgrade from HTTP/1.1
  • Multiplexing working (faster page loads)
  • Server push avoided (deprecated and removed from major browsers; consider 103 Early Hints instead)
  • QUIC/HTTP/3 if infrastructure supports

Compression

  • Gzip or Brotli enabled
  • Compression ratio monitored
  • CPU cost of compression vs bandwidth savings balanced

Connection Reuse

  • Keep-alive enabled (requests don't reconnect)
  • Connection pooling on the client side
  • TLS session resumption working

DNS Performance

  • DNS resolution time < 50ms
  • DNS caching on client
  • Separate domain for static assets (parallel connections)

AI Performance

Semantic Caching

  • Semantically similar queries share a cache entry (match on embedding similarity, not exact text)
  • Cache hit rate monitored
  • Can cut embedding and LLM spend substantially on repetitive workloads
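A toy sketch of the idea: store responses keyed by query embedding and return a hit when a new query's embedding is close enough. The 0.95 cosine threshold and the tiny vectors are assumptions; real embeddings come from your embedding model, and at scale the linear scan becomes a vector index.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  constructor(threshold = 0.95) {
    this.threshold = threshold;
    this.entries = []; // linear scan for the sketch; use a vector index at scale
  }
  get(embedding) {
    for (const e of this.entries) {
      if (cosine(e.embedding, embedding) >= this.threshold) return e.response;
    }
    return undefined;
  }
  set(embedding, response) {
    this.entries.push({ embedding, response });
  }
}
```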

Streaming Responses

  • Token-level streaming (not buffering full response)
  • Client receives first token < 200ms
  • Reduces perceived latency dramatically
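Token-level streaming can be sketched with an async generator; `generateTokens` here is a stand-in for your LLM client's streaming API, and `write` would be `res.write` in a real handler.

```javascript
// Stand-in for an LLM stream: yields one token at a time.
async function* generateTokens(text) {
  for (const token of text.split(' ')) {
    yield token + ' ';
  }
}

// Forward each token to the client as it arrives instead of buffering
// the full completion; time-to-first-token is the perceived latency.
async function streamCompletion(prompt, write) {
  for await (const token of generateTokens(prompt)) {
    write(token);
  }
}
```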

Async LLM Calls

  • LLM inference doesn't block main request
  • Queue for async processing
  • Poll for results or webhook callback
  • Timeout configured (don't wait forever)

Model Selection

  • Right-sized model for task (Llama 3 8B often beats GPT-4 on cost)
  • Quantization reduces latency (Q4_K_M is usually good)
  • Batch requests when possible
  • Smaller models for lower-latency paths

Prompt Optimization

  • Prompts are as short as possible (fewer tokens = faster)
  • Few-shot examples included only if needed
  • Avoid over-engineering prompts

Observability for Performance

Latency Percentiles

  • Track p50, p95, p99 latency
  • Not just averages (average hides problems)
  • Disaggregated by endpoint
  • Alerts when p95 exceeds 1s (or your SLA)
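For intuition, the nearest-rank percentile over raw latency samples looks like this; real monitoring stacks compute the same idea from histograms or sketches rather than storing every sample.

```javascript
// Nearest-rank percentile: sort samples, take the value at rank
// ceil(p% of n). p99 is the latency 99% of requests beat.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

This is also why averages hide problems: one sample at 5000ms barely moves the mean of a thousand requests but shows up immediately in p99.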

Throughput

  • Requests per second by endpoint
  • Concurrent connections monitored
  • Bottlenecks identified (database, cache, network)

Resource Utilization

  • CPU usage < 70% (headroom for spikes)
  • Memory usage stable
  • Disk I/O monitored
  • Network bandwidth headroom

Distributed Tracing

  • End-to-end request flow traced
  • Latency breakdown by service/operation
  • Bottleneck identification
  • Example: Datadog APM, Jaeger

Performance Budget Per Layer

Define acceptable latency per layer:

Total API latency budget: 100ms

  • Database query: 20ms
  • Cache lookup: 5ms
  • Application logic: 30ms
  • External API calls: 30ms
  • Serialization: 15ms

If the database is slow, you don't have 30ms left for logic. Enforce budgets in code reviews.
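The budget above is easy to encode and check mechanically; the layer names mirror the example, and the measured timings would come from your tracing spans.

```javascript
// Per-layer latency budget in milliseconds, summing to the 100ms total.
const budgetMs = {
  database: 20,
  cache: 5,
  logic: 30,
  externalApi: 30,
  serialization: 15,
};

// Return the layers whose measured latency exceeds their budget.
function overBudget(measuredMs) {
  return Object.entries(budgetMs)
    .filter(([layer, limit]) => (measuredMs[layer] ?? 0) > limit)
    .map(([layer]) => layer);
}
```

A check like this can run in CI against benchmark traces, turning the budget from a document into a gate.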

Continuous Performance Testing in CI

Load Testing

  • Run before deployment (baseline)
  • Compare to previous version
  • Fail the build if performance degrades by more than 10%
  • Tools: k6, JMeter, Locust

Benchmark Suite

  • Critical path benchmarks
  • Run on every PR
  • Track results over time
  • Alert on regressions

Example (k6 load test):

import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 100,
  duration: '30s'
};

export default function () {
  let response = http.get('https://api.example.com/users');
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 100ms': (r) => r.timings.duration < 100,
  });
}

Run in CI on every PR. Fail if latency increases.

Profiling Tools

For Node.js:

  • node --prof (CPU profiling)
  • heapdump (memory profiling)
  • Clinic.js (visualized profiling)

For Databases:

  • EXPLAIN ANALYZE (query plans)
  • pg_stat_statements (aggregate query statistics; find the slowest queries)
  • New Relic or DataDog (APM)

Common Performance Anti-Patterns

Fetching and Discarding

// Bad: fetch all, filter in code
const allUsers = await db.query("SELECT * FROM users");
const active = allUsers.filter(u => u.status === 'active');

// Good: filter in database
const active = await db.query("SELECT * FROM users WHERE status = 'active'");

N+1 Queries

// Bad: 1 + N queries
const users = await db.query("SELECT * FROM users");
for (const user of users) {
  const posts = await db.query("SELECT * FROM posts WHERE user_id = ?", user.id);
}

// Good: 2 queries
const users = await db.query("SELECT * FROM users");
const posts = await db.query("SELECT * FROM posts WHERE user_id IN (?)", [users.map(u => u.id)]);

Serializing Large Objects

// Bad: building one giant string blocks the event loop and spikes memory
const json = JSON.stringify(largeObject);
response.end(json);

// Good: write items incrementally so memory use stays flat
response.write('[');
largeArray.forEach((item, i) => {
  response.write((i > 0 ? ',' : '') + JSON.stringify(item));
});
response.write(']');

Checklist

Before Optimization:

  • Measured baseline (p50, p95, p99 latency)
  • Identified bottleneck (database? cache? network?)
  • Set performance target (e.g., p95 < 100ms)

Database:

  • Index coverage complete
  • Connection pooling enabled
  • Slow queries optimized
  • Read replicas for heavy reads

Application:

  • Async I/O throughout
  • Streaming for large responses
  • No memory leaks
  • Worker threads for CPU work

Caching:

  • In-process cache for hot data
  • Redis for expensive operations
  • HTTP caching headers set
  • CDN configured

Network:

  • HTTP/2 enabled
  • Compression enabled
  • Keep-alive working
  • DNS fast

Monitoring:

  • Percentile latency tracked (p50, p95, p99)
  • Alerts on SLA breach
  • Distributed tracing enabled
  • Performance regressions caught in CI

Conclusion

Performance work is systematic, not magic. Database indexing, async I/O, caching, and monitoring get you 90% of the way; the last 10% comes from profiling and targeted optimization.

Use this checklist before you start optimizing. Measure first, optimize what actually matters, and ship a faster product to happier users.