The Backend Performance Checklist for 2026 — From Database to Edge

Sanjeev Sharma


Introduction

Performance is still the most measurable and most underrated optimization. A 100ms-faster API isn't just a better user experience: it means lower error rates, higher conversion, and lower infrastructure cost.

This is a comprehensive checklist across all layers: database, application, caching, network, and edge. It's not theoretical; it comes from production systems handling billions of requests.

Database Layer

Index Coverage

  • All WHERE columns are indexed
  • All JOIN columns are indexed
  • No sequential scans in slow queries (EXPLAIN ANALYZE)
  • Composite indexes for multi-column queries
  • No unused or redundant indexes (check pg_stat_user_indexes)
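
The "no sequential scans in slow queries" check can be automated. A minimal sketch that scans Postgres-style EXPLAIN ANALYZE text for Seq Scan nodes (the plan text and table names here are illustrative):

```javascript
// Flag Postgres EXPLAIN ANALYZE output that contains sequential scans.
// planText is the raw plan text returned by `EXPLAIN ANALYZE SELECT ...`.
function findSeqScans(planText) {
  return planText
    .split('\n')
    .filter((line) => line.includes('Seq Scan'))
    .map((line) => line.trim());
}

const plan = [
  'Nested Loop  (cost=0.29..16.34 rows=1 width=8)',
  '  ->  Seq Scan on users  (cost=0.00..12.00 rows=1 width=4)',
  '  ->  Index Scan using posts_user_id_idx on posts',
].join('\n');

console.log(findSeqScans(plan));
// One Seq Scan found: the users table needs an index for this query
```

Run this against your slowest queries in CI; any new Seq Scan on a large table is worth a look before merge.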

Connection Pooling

  • Using PgBouncer or similar (never raw connections)
  • Pool size tuned (2x CPU cores is often right)
  • Connection idle timeout configured
  • Monitor active connections (should be < pool size)

Query Performance

  • Run EXPLAIN ANALYZE on all slow queries
  • Query plan shows index usage (no full scans)
  • No N+1 queries (batch load or join)
  • Pagination on large result sets (prefer keyset pagination over LIMIT/OFFSET for deep pages)
  • Use SELECT specific columns (not *)

Read Replicas

  • Read-heavy queries go to replicas
  • Replication lag monitored (< 1s ideal)
  • Failover tested and documented
  • Load balanced across replicas
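
Load balancing reads across replicas can be as simple as round-robin. A sketch (the client objects here are placeholders for whatever your driver uses):

```javascript
// Round-robin reads across replicas; writes always go to the primary.
function makeRouter(primary, replicas) {
  let next = 0;
  return {
    write: () => primary,
    read: () => {
      if (replicas.length === 0) return primary; // degrade gracefully
      const replica = replicas[next % replicas.length];
      next += 1;
      return replica;
    },
  };
}

const router = makeRouter('primary', ['replica-1', 'replica-2']);
console.log(router.read(), router.read(), router.read());
// replica-1 replica-2 replica-1
```

In production you would also skip replicas whose measured replication lag exceeds your threshold, which fits naturally into the read() branch.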

Application Layer

Async I/O

  • No blocking database calls in main thread
  • Database operations wrapped in promises/async-await
  • External API calls are non-blocking
  • Worker threads for CPU-bound work

Streaming

  • Large responses streamed (not buffered in memory)
  • Database result sets streamed
  • File uploads streamed
  • Response size monitored

Worker Threads for CPU

  • Heavy computation offloaded to worker pool
  • Worker pool size tuned (CPU cores = pool size)
  • Monitor queue depth and processing time

Memory Management

  • No memory leaks (heap profiler run regularly)
  • Large objects not kept in memory
  • Proper cleanup in destructors
  • Memory usage stable over time

Caching Layer

In-Process Cache

  • Simple data cached in process (LRU)
  • Cache invalidation on data change
  • TTL configured appropriately
  • Cache size monitored (shouldn't grow unbounded)
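
All four bullets fit in a few lines of code. A minimal in-process LRU with TTL, built on Map's insertion-order guarantee (sizes and TTL are illustrative defaults):

```javascript
// Bounded in-process cache: LRU eviction plus per-entry TTL.
class LruCache {
  constructor(maxSize = 1000, ttlMs = 60_000) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) { // expired: drop it
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key); // re-insert to refresh recency
    this.map.set(key, entry);
    return entry.value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
    if (this.map.size > this.maxSize) {
      // Evict least recently used: the first key in insertion order.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Because maxSize bounds the Map, the "shouldn't grow unbounded" check is enforced structurally rather than monitored after the fact.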

Redis Cache

  • Expensive computations cached (embedding generation, LLM calls)
  • Cache invalidation strategy documented
  • Eviction policy set (usually allkeys-lru)
  • Memory usage monitored
  • Replication configured for HA

HTTP Caching

  • Cache-Control headers set correctly
  • ETag or Last-Modified for revalidation
  • Public vs private cached content
  • No caching private data
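
One way to keep public/private caching decisions consistent is to centralize the Cache-Control choice per content class. A sketch (the class names and durations are assumptions to adapt):

```javascript
// Map a content class to a Cache-Control header. Durations are examples.
function cacheControlFor(kind) {
  switch (kind) {
    case 'static':  // hashed assets: cache for a year, never revalidate
      return 'public, max-age=31536000, immutable';
    case 'api':     // shared API data: short cache, serve stale briefly
      return 'public, max-age=60, stale-while-revalidate=30';
    case 'private': // per-user data: never stored by shared caches
      return 'private, no-store';
    default:
      return 'no-cache';
  }
}

console.log(cacheControlFor('static'));
// public, max-age=31536000, immutable
```

Setting headers through one helper makes "no caching private data" a code-review check on a single function instead of a hunt across handlers.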

CDN / Edge Caching

  • Static assets served from CDN
  • Cache headers optimized per content type
  • Purge strategy for updates
  • Monitor cache hit ratio (< 80% = problem)

Network Layer

HTTP/2 or HTTP/3

  • Upgrade from HTTP/1.1
  • Multiplexing working (faster page loads)
  • Server push largely deprecated in browsers; consider 103 Early Hints instead
  • QUIC/HTTP/3 if infrastructure supports

Compression

  • Gzip or Brotli enabled
  • Compression ratio monitored
  • CPU cost of compression vs bandwidth savings balanced
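
The CPU-vs-bandwidth balance usually reduces to two rules: only compress compressible types, and only above a minimum size. A sketch (the type list and threshold are assumptions):

```javascript
// Compress only where bandwidth savings beat the CPU cost:
// text-like types above a minimum byte size.
const COMPRESSIBLE = [
  'text/html', 'text/css', 'application/json', 'application/javascript',
];

function shouldCompress(contentType, byteLength, minBytes = 1024) {
  if (byteLength < minBytes) return false; // header overhead not worth it
  return COMPRESSIBLE.some((t) => contentType.startsWith(t));
}

console.log(shouldCompress('application/json', 50_000)); // true
console.log(shouldCompress('image/jpeg', 50_000));       // false: already compressed
```

Images, video, and other pre-compressed formats gain nothing from gzip or Brotli, so filtering by type avoids pure CPU waste.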

Connection Reuse

  • Keep-alive enabled (requests don't reconnect)
  • Connection pooling on the client side
  • TLS session resumption working

DNS Performance

  • DNS resolution time < 50ms
  • DNS caching on client
  • Separate domain for static assets (helps HTTP/1.1 only; avoid sharding under HTTP/2)

AI Performance

Semantic Caching

  • Semantically similar queries cached (match by embedding similarity, not exact text)
  • Cache hit rate monitored
  • Can save on the order of 80% of embedding + LLM costs on repetitive traffic
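
A semantic cache looks up by embedding similarity rather than exact string match. A minimal sketch, assuming you already have an embedding call; the threshold is an assumption to tune against your traffic:

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Linear scan over cached entries; swap in a vector index at scale.
function lookup(cache, queryEmbedding, threshold = 0.95) {
  for (const { embedding, response } of cache) {
    if (cosine(embedding, queryEmbedding) >= threshold) return response;
  }
  return null; // miss: call the LLM, then push the new entry
}

const cache = [{ embedding: [1, 0], response: 'cached answer' }];
console.log(lookup(cache, [0.99, 0.01])); // 'cached answer'
console.log(lookup(cache, [0, 1]));       // null
```

Track the hit rate of lookup() to verify the savings claim against your own workload.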

Streaming Responses

  • Token-level streaming (not buffering full response)
  • Client receives first token < 200ms
  • Reduces perceived latency dramatically

Async LLM Calls

  • LLM inference doesn't block main request
  • Queue for async processing
  • Poll for results or webhook callback
  • Timeout configured (don't wait forever)
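
The timeout bullet is a one-liner with Promise.race. A sketch; callLlm is a placeholder for your actual inference client:

```javascript
// Never wait on an LLM call forever: race the inference promise
// against a timeout, and clean up the timer either way.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: const answer = await withTimeout(callLlm(prompt), 30_000);
```

Pair it with the queue above so a timed-out request can be retried or surfaced to the client instead of hanging a worker.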

Model Selection

  • Right-sized model for task (Llama 3 8B often beats GPT-4 on cost)
  • Quantization reduces latency (Q4_K_M is usually good)
  • Batch requests when possible
  • Smaller models for lower-latency paths

Prompt Optimization

  • Prompts are as short as possible (fewer tokens = faster)
  • Few-shot examples included only if needed
  • Avoid over-engineering prompts

Observability for Performance

Latency Percentiles

  • Track p50, p95, p99 latency
  • Not just averages (average hides problems)
  • Disaggregated by endpoint
  • Alerts when p95 > 1s (or your SLA)
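
The "averages hide problems" point is easy to see in code. A nearest-rank percentile over a window of latency samples (the sample values are illustrative):

```javascript
// Percentile over a window of latency samples (nearest-rank method).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [12, 15, 18, 22, 30, 45, 60, 90, 120, 800];
console.log(percentile(latencies, 50)); // 30
console.log(percentile(latencies, 95)); // 800
```

The average of those samples is 121.2ms, which makes the endpoint look merely slow; the p95 of 800ms shows that real users are hitting a serious tail.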

Throughput

  • Requests per second by endpoint
  • Concurrent connections monitored
  • Bottlenecks identified (database, cache, network)

Resource Utilization

  • CPU usage < 70% (headroom for spikes)
  • Memory usage stable
  • Disk I/O monitored
  • Network bandwidth headroom

Distributed Tracing

  • End-to-end request flow traced
  • Latency breakdown by service/operation
  • Bottleneck identification
  • Example: Datadog APM, Jaeger

Performance Budget Per Layer

Define acceptable latency per layer:

Total API latency budget: 100ms

  • Database query: 20ms
  • Cache lookup: 5ms
  • Application logic: 30ms
  • External API calls: 30ms
  • Serialization: 15ms

If the database is slow, you don't have 30ms left for logic. Enforce budgets in code reviews.
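
Enforcing the budget can be automated as well as reviewed. A sketch that mirrors the 100ms example above; the layer names are the ones from the list:

```javascript
// Per-layer latency budget in milliseconds (from the 100ms example above).
const BUDGET_MS = { database: 20, cache: 5, logic: 30, external: 30, serialization: 15 };

// Return the layers whose measured latency exceeds their budget.
function overBudget(measured) {
  return Object.entries(measured)
    .filter(([layer, ms]) => ms > (BUDGET_MS[layer] ?? 0))
    .map(([layer]) => layer);
}

console.log(overBudget({ database: 35, cache: 3, logic: 28 }));
// ['database'] — the query layer ate the logic layer's headroom
```

Wire this into the tracing output from your APM and a budget breach becomes an alert rather than a surprise in review.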

Continuous Performance Testing in CI

Load Testing

  • Run before deployment (baseline)
  • Compare to previous version
  • Fail if performance degrades by more than 10%
  • Tools: k6, JMeter, Locust

Benchmark Suite

  • Critical path benchmarks
  • Run on every PR
  • Track results over time
  • Alert on regressions
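
The regression gate itself is a one-line comparison against the stored baseline. A sketch (the 10% tolerance matches the load-testing rule above):

```javascript
// Fail the build when current p95 regresses past the tolerance vs baseline.
function hasRegressed(baselineP95, currentP95, tolerance = 0.10) {
  return currentP95 > baselineP95 * (1 + tolerance);
}

console.log(hasRegressed(100, 105)); // false: within 10%
console.log(hasRegressed(100, 120)); // true: 20% slower, fail the build
```

Store the baseline p95 per critical endpoint in the repo so the comparison is versioned alongside the code that changed it.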

Example (k6 load test):

import http from 'k6/http';
import { check } from 'k6';

export let options = {
  vus: 100,
  duration: '30s'
};

export default function () {
  let response = http.get('https://api.example.com/users');
  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 100ms': (r) => r.timings.duration < 100,
  });
}

Run in CI on every PR. Fail if latency increases.

Profiling Tools

For Node.js:

  • node --prof (CPU profiling)
  • heapdump (memory profiling)
  • Clinic.js (visualized profiling)

For Databases:

  • EXPLAIN ANALYZE (query plans)
  • pg_stat_statements (slow query log)
  • New Relic or DataDog (APM)

Common Performance Anti-Patterns

Fetching and Discarding

// Bad: fetch all, filter in code
const allUsers = await db.query("SELECT * FROM users");
const active = allUsers.filter(u => u.status === 'active');

// Good: filter in database
const active = await db.query("SELECT * FROM users WHERE status = 'active'");

N+1 Queries

// Bad: 1 + N queries
const users = await db.query("SELECT * FROM users");
for (const user of users) {
  const posts = await db.query("SELECT * FROM posts WHERE user_id = ?", user.id);
}

// Good: 2 queries
const users = await db.query("SELECT * FROM users");
const posts = await db.query("SELECT * FROM posts WHERE user_id IN (?)", [users.map(u => u.id)]);

Serializing Large Objects

// Bad: serializing one huge object blocks the event loop
const json = JSON.stringify(largeObject);

// Good: stream items in chunks so memory stays flat
response.write('[');
largeItems.forEach((item, i) => {
  response.write((i ? ',' : '') + JSON.stringify(item));
});
response.end(']');

Checklist

Before Optimization:

  • Measured baseline (p50, p95, p99 latency)
  • Identified bottleneck (database? cache? network?)
  • Set performance target (e.g., p95 < 100ms)

Database:

  • Index coverage complete
  • Connection pooling enabled
  • Slow queries optimized
  • Read replicas for heavy reads

Application:

  • Async I/O throughout
  • Streaming for large responses
  • No memory leaks
  • Worker threads for CPU work

Caching:

  • In-process cache for hot data
  • Redis for expensive operations
  • HTTP caching headers set
  • CDN configured

Network:

  • HTTP/2 enabled
  • Compression enabled
  • Keep-alive working
  • DNS fast

Monitoring:

  • Percentile latency tracked (p50, p95, p99)
  • Alerts on SLA breach
  • Distributed tracing enabled
  • Performance regressions caught in CI

Conclusion

Performance is systematic, not magic. Database indexing, async I/O, caching, and monitoring get you 90% of the way; the last 10% comes from profiling and targeted optimization.

Use this checklist before you start. Measure first, optimize what actually matters, and ship faster for happier users.


Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro