Docker Best Practices — Production Checklist

Running Docker in production requires careful attention to security, performance, and reliability. Use this checklist to verify that your containers are production-ready.

Introduction

Production Docker deployments face three recurring challenges: security vulnerabilities, resource constraints, and operational monitoring. This guide covers the essential practices for each, from building images to running and observing containers.

Contents

Image Security
  Use Minimal Base Images
  Scan Images for Vulnerabilities
  Don't Run as Root
  Keep Secrets Out of Images
Image Optimization
  Multi-Stage Builds
  Use .dockerignore
  Layer Caching
Resource Management
  Set Resource Limits
  Health Checks
Logging
  Structured Logging
  Access Logs
Networking
  Expose Only Necessary Ports
  Network Security
Data Management
  Use Named Volumes
  Backup Strategy
Production Deployment
  Image Tagging
  Restart Policies
  Environment Separation
Monitoring
  Container Metrics
  Log Aggregation
Production Checklist
FAQ

Image Security

Use Minimal Base Images

# Bad: large attack surface
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y nodejs

# Good: minimal base
FROM node:18-alpine

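Alpine-based images are typically much smaller than full distribution images (often tens of megabytes instead of several hundred) and ship fewer packages, which leaves fewer components that can contain vulnerabilities.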

Scan Images for Vulnerabilities

# Using Trivy
trivy image myapp:1.0

# Using Docker Scout
docker scout cves myapp:1.0

# Using Snyk
snyk container test myapp:1.0
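
Scanner output can also gate a CI pipeline. The sketch below assumes a report produced with `trivy image --format json`, whose top-level `Results` array holds per-target `Vulnerabilities` entries with a `Severity` field. The function names and the sample report are illustrative and not part of Trivy.

```javascript
// Count vulnerabilities per severity in a parsed Trivy JSON report.
// Assumed report shape: { Results: [{ Vulnerabilities: [{ Severity }] }] }.
function countBySeverity(report) {
  const counts = {};
  for (const result of report.Results || []) {
    // Targets without findings may omit the Vulnerabilities key.
    for (const vuln of result.Vulnerabilities || []) {
      counts[vuln.Severity] = (counts[vuln.Severity] || 0) + 1;
    }
  }
  return counts;
}

// Decide whether the build should fail for the given blocking severities.
function shouldFailBuild(report, blocking = ['HIGH', 'CRITICAL']) {
  const counts = countBySeverity(report);
  return blocking.some((severity) => (counts[severity] || 0) > 0);
}

// Illustrative sample, not real scan output.
const sampleReport = {
  Results: [
    { Target: 'myapp:1.0 (alpine)', Vulnerabilities: [{ Severity: 'LOW' }, { Severity: 'HIGH' }] },
    { Target: 'package-lock.json' },
  ],
};

console.log(countBySeverity(sampleReport));
console.log(shouldFailBuild(sampleReport));
```

A pipeline step could parse the report file, call `shouldFailBuild`, and exit non-zero when it returns true.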

Don't Run as Root

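# Create a non-root user and run the application as that user
FROM node:18-alpine
RUN addgroup -g 1001 -S nodejs && adduser -S -u 1001 -G nodejs nodejs
WORKDIR /app

# Copy files with the right ownership, then drop privileges
COPY --chown=nodejs:nodejs . .
USER nodejs
CMD ["node", "server.js"]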

Keep Secrets Out of Images

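# Bad: secret baked into an image layer (readable by anyone with the image)
ENV DATABASE_PASSWORD=secret123

# Good: the image contains no hardcoded secrets
FROM node:18-alpine

# Pass secrets at runtime (shell command, not a Dockerfile instruction).
# Note: values passed with -e are still visible in docker inspect.
docker run -e DATABASE_PASSWORD="$(cat /run/secrets/db_password)" myapp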
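
On the application side, a secret mounted as a file can be read directly, which keeps it out of the container's environment. The following is a minimal sketch, assuming a hypothetical `<NAME>_FILE` convention in which an environment variable points at the mounted file; `readSecret` and its parameters are illustrative names.

```javascript
const fs = require('fs');

// Resolve a secret by name. Prefers a file path given in <NAME>_FILE
// (for example a file mounted under /run/secrets), then falls back to <NAME>.
// The <NAME>_FILE convention is an assumption of this sketch.
function readSecret(name, env = process.env, readFile = fs.readFileSync) {
  const filePath = env[`${name}_FILE`];
  if (filePath) {
    // Secret files often end with a newline; strip surrounding whitespace.
    return readFile(filePath, 'utf8').trim();
  }
  if (env[name]) {
    return env[name];
  }
  throw new Error(`Secret ${name} is not configured`);
}

// Usage with in-memory stand-ins for the environment and the file system.
const fakeEnv = { DATABASE_PASSWORD_FILE: '/run/secrets/db_password' };
const fakeReadFile = (path) => (path === '/run/secrets/db_password' ? 's3cret\n' : '');
console.log(readSecret('DATABASE_PASSWORD', fakeEnv, fakeReadFile).length);
```

Failing fast when a secret is missing surfaces configuration mistakes at startup instead of at the first database call.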

Image Optimization

Multi-Stage Builds

# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
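RUN npm run build
# Remove devDependencies so the runtime stage copies only production modules
RUN npm prune --omit=dev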

# Runtime stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package*.json ./

EXPOSE 3000
CMD ["node", "dist/server.js"]

Use .dockerignore

node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.DS_Store
coverage
.next
dist
build
.vscode

Layer Caching

Order Dockerfile instructions from least to most frequently changed:

FROM node:18-alpine

# Stable layers first
WORKDIR /app
COPY package*.json ./
# Full install, because the build step below needs devDependencies
RUN npm ci

# Variable layers last
COPY . .
RUN npm run build

EXPOSE 3000
CMD ["node", "server.js"]

Resource Management

Set Resource Limits

# docker-compose.yml
services:
  web:
    image: myapp:1.0
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M

# docker run
docker run -m 512m --cpus 1 myapp:1.0
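
Inside the container, the application can check which memory limit actually applies. On hosts that use cgroup v2, the limit is exposed in `/sys/fs/cgroup/memory.max` as a byte count, or as the literal `max` when no limit is set. The sketch below parses that file; the helper names are hypothetical, and the path differs on cgroup v1 hosts.

```javascript
const fs = require('fs');

// Parse the contents of a cgroup v2 memory.max file.
// Returns the limit in bytes, or null when no limit is set ("max").
function parseMemoryMax(text) {
  const value = text.trim();
  if (value === 'max') return null;
  if (!/^\d+$/.test(value)) {
    throw new Error(`Unexpected memory.max contents: ${value}`);
  }
  return Number(value);
}

// Read the limit for the current container. Returns null when the file
// cannot be read (for example on a cgroup v1 host or outside a container).
function readMemoryLimit(path = '/sys/fs/cgroup/memory.max') {
  let text;
  try {
    text = fs.readFileSync(path, 'utf8');
  } catch {
    return null;
  }
  return parseMemoryMax(text);
}

const limit = readMemoryLimit();
console.log(limit === null ? 'no memory limit detected' : `${limit / (1024 * 1024)} MiB`);
```

Logging this value at startup makes it easy to confirm that the limits from the Compose file or `docker run` flags reached the container.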

Health Checks

FROM node:18-alpine

WORKDIR /app
# Install dependencies before copying source so the layer stays cached
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node healthcheck.js

CMD ["node", "server.js"]

// healthcheck.js
const http = require('http');

const req = http.get('http://localhost:3000/health', (res) => {
  if (res.statusCode === 200) {
    process.exit(0);
  } else {
    process.exit(1);
  }
});

req.on('error', () => process.exit(1));
setTimeout(() => process.exit(1), 5000);
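
The probe in healthcheck.js only works if the application serves a `/health` route. A minimal sketch of such a handler follows, using only Node's built-in `http` module; `isReady` is a hypothetical callback that reports whether dependencies such as the database are usable.

```javascript
const http = require('http');

// Build a request handler that answers GET /health.
// `isReady` is a hypothetical callback; healthcheck.js treats any
// status other than 200 as a failed check.
function createHealthHandler(isReady) {
  return (req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      const ok = Boolean(isReady());
      res.writeHead(ok ? 200 : 503, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: ok ? 'ok' : 'unavailable' }));
      return;
    }
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('Not found');
  };
}

// Wire the handler into a server; the real app would call server.listen(3000).
const server = http.createServer(createHealthHandler(() => true));
```

Returning 503 while dependencies are unavailable lets Docker mark the container as unhealthy after the configured number of retries.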

Logging

Structured Logging

FROM node:18-alpine
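WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Docker captures whatever the process writes to stdout/stderr,
# so the application should not write log files inside the container
ENV NODE_ENV=production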

CMD ["node", "server.js"]

// Avoid logging to files in containers
// Instead, log to stdout/stderr

console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  level: 'info',
  message: 'Server started',
  port: 3000
}));

// Docker/Kubernetes will handle log collection
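
The inline `console.log` call above can be wrapped in a small helper so that every log line has the same shape. This is a minimal sketch rather than a replacement for a logging library; `formatLogLine` and `log` are illustrative names, and sending errors to stderr is a design choice of this sketch.

```javascript
// Minimal structured logger: one JSON object per line on stdout/stderr.
// Field names (timestamp, level, message) follow the example above.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...fields,
  });
}

function log(level, message, fields) {
  const line = formatLogLine(level, message, fields);
  // Errors go to stderr so collectors can separate the two streams.
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(line + '\n');
}

log('info', 'Server started', { port: 3000 });
log('error', 'Database connection failed', { retryInMs: 5000 });
```

Because each entry is a single line of JSON, log collectors can parse it without multi-line handling.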

Access Logs

# View container logs
docker logs container_id

# Follow logs
docker logs -f container_id

# Get specific number of lines
docker logs --tail 100 container_id

# Include timestamps
docker logs -t container_id

Networking

Expose Only Necessary Ports

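# Declare only the application port. EXPOSE documents the port;
# publishing happens with -p or the Compose "ports" key.
EXPOSE 3000

# Don't expose or publish debug ports in production
# EXPOSE 9229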

Network Security

# docker-compose.yml
services:
  web:
    image: myapp:1.0
    ports:
      - "3000:3000"  # Only expose needed port
    networks:
      - internal

  db:
    image: postgres:15
    networks:
      - internal
    # Don't expose port; only web connects to it

networks:
  internal:
    driver: bridge

Data Management

Use Named Volumes

services:
  db:
    image: postgres:15
    volumes:
      # Named volume: data survives container re-creation and is easy to back up
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:
    driver: local

Backup Strategy

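# Stop the database container first so the files on disk are consistent

# Backup named volume
docker run --rm -v db_data:/data -v "$(pwd)":/backup \
  ubuntu tar czf /backup/db_backup.tar.gz -C /data .

# Restore from backup (into an empty volume, with the database stopped)
docker run --rm -v db_data:/data -v "$(pwd)":/backup \
  ubuntu tar xzf /backup/db_backup.tar.gz -C /data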

Production Deployment

Image Tagging

# Semantic versioning
docker build -t myapp:1.0.0 .
docker tag myapp:1.0.0 myregistry/myapp:1.0.0
docker tag myapp:1.0.0 myregistry/myapp:latest

# Push all tags
docker push --all-tags myregistry/myapp
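
A deployment script can check that an image reference carries a full version tag before rolling it out. The sketch below is one way to do that under simplifying assumptions: `getTag` and `isPinnedVersion` are hypothetical helpers, and the version pattern accepts only MAJOR.MINOR.PATCH with an optional pre-release suffix.

```javascript
// Extract the tag from an image reference such as "myregistry/myapp:1.0.0".
// Returns null when the reference has no tag. A colon before the last slash
// belongs to a registry port (e.g. "localhost:5000/myapp"), not to a tag.
function getTag(imageRef) {
  const withoutDigest = imageRef.split('@')[0];
  const lastSlash = withoutDigest.lastIndexOf('/');
  const lastColon = withoutDigest.lastIndexOf(':');
  return lastColon > lastSlash ? withoutDigest.slice(lastColon + 1) : null;
}

// True when the tag is a full MAJOR.MINOR.PATCH version (optionally with a
// pre-release suffix) rather than a floating tag such as "latest".
function isPinnedVersion(imageRef) {
  const tag = getTag(imageRef);
  return tag !== null && /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(tag);
}

for (const ref of ['myregistry/myapp:1.0.0', 'myregistry/myapp:latest', 'localhost:5000/myapp']) {
  console.log(ref, isPinnedVersion(ref));
}
```

An untagged reference is rejected as well, since Docker resolves it to `latest`.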

Restart Policies

services:
  web:
    image: myapp:1.0
    restart: unless-stopped  # Restart unless explicitly stopped

Options:

no: Don't restart automatically
always: Always restart if stopped
unless-stopped: Always restart unless explicitly stopped
on-failure: Restart only on non-zero exit code

Environment Separation

# docker-compose.prod.yml
services:
  web:
    image: myregistry/myapp:1.0.0
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
    healthcheck:
      # Alpine-based images do not include curl by default; reuse the Node-based check
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3

Monitoring

Container Metrics

# View container stats
docker stats

# Save stats to file
docker stats --no-stream > metrics.txt

# Monitor specific container
docker stats container_name
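
For simple alerting, the output of `docker stats --no-stream --format '{{json .}}'` can be parsed by a script, since it prints one JSON object per container per line. The sketch below assumes the `Name`, `CPUPerc`, and `MemPerc` fields with percentages formatted like `12.50%`; the sample lines are illustrative rather than captured output, and the function names are hypothetical.

```javascript
// Parse output of: docker stats --no-stream --format '{{json .}}'
// Each non-empty line is one JSON object; CPUPerc and MemPerc are
// strings such as "12.50%".
function parseStats(output) {
  return output
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const row = JSON.parse(line);
      return {
        name: row.Name,
        cpuPercent: parseFloat(row.CPUPerc),
        memPercent: parseFloat(row.MemPerc),
      };
    });
}

// Names of containers whose CPU usage exceeds the threshold (in percent).
function findBusyContainers(stats, cpuThreshold) {
  return stats.filter((s) => s.cpuPercent > cpuThreshold).map((s) => s.name);
}

// Illustrative sample lines.
const sampleOutput = [
  '{"Name":"web","CPUPerc":"85.20%","MemPerc":"40.10%"}',
  '{"Name":"db","CPUPerc":"3.05%","MemPerc":"22.00%"}',
].join('\n');

console.log(findBusyContainers(parseStats(sampleOutput), 80));
```

For continuous monitoring, a dedicated metrics system is a better fit than polling `docker stats`.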

Log Aggregation

# docker-compose.yml: json-file logging driver with rotation, so local
# logs stay bounded until a collector ships them elsewhere
services:
  app:
    image: myapp:1.0
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

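Production Checklist

Base image is minimal and scanned for vulnerabilities
Container runs as a non-root user
No secrets are stored in the image
Multi-stage build and .dockerignore keep the image small
CPU and memory limits are set
Health check is defined
Logs are structured and written to stdout/stderr
Only necessary ports are exposed
Persistent data lives in named volumes with a tested backup
Images use specific version tags, not latest
Restart policy is configured
Metrics and log rotation are in place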

FAQ

Q: Should I use the latest tag in production?
A: No. Use specific version tags such as 1.0.0 so that deployments are reproducible and rollbacks are straightforward.

Q: How do I handle database migrations?
A: Run migrations before the main application starts, for example in an init container or as a separate one-off job or init script.

Q: What's the recommended approach for secrets?
A: Use Docker secrets in Swarm mode, or inject values at runtime from a secrets management system. Never bake them into the image.