Kubernetes Horizontal Pod Autoscaling

Sanjeev Sharma


Automatically scale pod replicas based on metrics. Learn HPA configuration and advanced scaling strategies.

Introduction

Horizontal Pod Autoscaler (HPA) automatically adjusts the number of replicas of a Deployment, StatefulSet, or ReplicaSet based on observed metrics such as CPU and memory utilization.

Prerequisites: Metrics Server

HPA relies on the Kubernetes metrics API. Install metrics-server to provide resource metrics:

# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
kubectl top pods

Basic HPA

Simple CPU-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app

  minReplicas: 2
  maxReplicas: 10

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
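Under the hood, the controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal Python sketch of that calculation (the utilization numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """HPA core formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 2 replicas averaging 90% CPU against a 70% target: scale up to 3
print(desired_replicas(2, 90, 70))  # 3
# 4 replicas averaging 30% CPU against a 70% target: scale down to 2
print(desired_replicas(4, 30, 70))  # 2
```

Note that when the ratio is close to 1 (within the controller's default 10% tolerance), no scaling happens, which prevents constant churn.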

Memory-Based Scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor

  minReplicas: 1
  maxReplicas: 5

  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Multiple Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app

  minReplicas: 2
  maxReplicas: 20

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
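Each scale-up policy caps how many replicas may be added per period, and selectPolicy: Max picks the most permissive cap. A rough Python sketch of that selection for the behavior above (an approximation of the controller's logic, with illustrative numbers):

```python
import math

def scale_up_limit(current_replicas: int, percent: int, pods: int,
                   select_policy: str = "Max") -> int:
    """Highest replica count reachable in one period under both policies."""
    by_percent = math.ceil(current_replicas * (1 + percent / 100))
    by_pods = current_replicas + pods
    pick = max if select_policy == "Max" else min
    return pick(by_percent, by_pods)

# With 2 replicas: Percent=100 allows up to 4, Pods=4 allows up to 6
print(scale_up_limit(2, percent=100, pods=4))  # 6 (Max picks the larger cap)
```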

Custom Metrics

Application Metrics

Scale based on application-specific metrics. This requires a custom metrics API adapter (for example, prometheus-adapter) to expose the metric to the HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: request-rate-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server

  minReplicas: 1
  maxReplicas: 100

  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

External Metrics

Scale based on metrics from systems outside the cluster, such as a message queue. This requires an external metrics API adapter:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker

  minReplicas: 1
  maxReplicas: 50

  metrics:
  - type: External
    external:
      metric:
        name: queue_length
        selector:
          matchLabels:
            queue_name: tasks
      target:
        type: AverageValue
        averageValue: "30"
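With an AverageValue target, the controller sizes the deployment so each pod handles roughly averageValue worth of the metric. For the queue_length target of 30 above, the arithmetic looks like this (a sketch with illustrative queue sizes):

```python
import math

def replicas_for_queue(queue_length: int, per_pod_target: int,
                       min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Replicas needed so each worker averages ~per_pod_target queued tasks."""
    desired = math.ceil(queue_length / per_pod_target)
    return max(min_replicas, min(max_replicas, desired))

print(replicas_for_queue(250, 30))   # 9  (ceil(250 / 30))
print(replicas_for_queue(2000, 30))  # 50 (clamped to maxReplicas)
```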

Scaling Policies

Conservative Scaling Down

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
    - type: Pods
      value: 2
      periodSeconds: 60
    selectPolicy: Min
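The stabilizationWindowSeconds setting makes scale-down cautious: the controller uses the highest recommendation from the window, so replicas drop only when every recent recommendation is lower. A simplified sketch:

```python
def stabilized_scale_down(window_recommendations: list[int]) -> int:
    """Scale-down acts on the highest recommendation seen in the window."""
    return max(window_recommendations)

# Desired-replica recommendations collected over the 300s window
print(stabilized_scale_down([5, 3, 4]))  # 5: one recent spike blocks scale-down
```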

Aggressive Scaling Up

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    selectPolicy: Max

Advanced Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app

  minReplicas: 3
  maxReplicas: 50

  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 5
        periodSeconds: 60
      selectPolicy: Min

    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 10
        periodSeconds: 15
      selectPolicy: Max

Managing HPA

# Create HPA
kubectl apply -f hpa.yaml

# List HPAs
kubectl get hpa

# View HPA details
kubectl describe hpa web-app-hpa

# Current metrics
kubectl get hpa web-app-hpa --watch

# View detailed status
kubectl get hpa web-app-hpa -o yaml

Watch HPA in action:

# Terminal 1: Watch HPA
kubectl get hpa --watch

# Terminal 2: Generate load
kubectl run -it --rm load-generator \
  --image=busybox --restart=Never -- /bin/sh

# Inside the container:
while true; do wget -q -O- http://web-app:3000; done

Vertical Pod Autoscaling

For automatic resource request/limit adjustment:

# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Then define a VerticalPodAutoscaler resource:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: app

  updatePolicy:
    updateMode: "Auto"  # Can be: Off, Initial, Recreate, Auto

Troubleshooting

# Check metrics availability
kubectl top nodes
kubectl top pods

# View HPA events
kubectl describe hpa web-app-hpa

# Check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Manually query metrics
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes

# Debug HPA calculations
kubectl get hpa web-app-hpa -o yaml | grep -A 10 "status:"

Best Practices

  1. Set resource requests on your containers; utilization targets are calculated relative to requests
  2. Use multiple metrics (e.g., CPU and memory) where one alone is misleading
  3. Scale down conservatively to prevent flapping
  4. Monitor scaling events with kubectl describe hpa
  5. Test scaling behavior under realistic load
  6. Set reasonable min/max replica bounds
  7. Consider Pod Disruption Budgets to protect availability during disruptions

FAQ

Q: What's the minimum time for HPA to scale? A: By default the scale-down stabilization window is 300 seconds (5 minutes), while scale-up has no stabilization delay. Both are configurable via stabilizationWindowSeconds.

Q: Why isn't HPA scaling my pods? A: Ensure metrics-server is running and pods have resource requests defined. Check with kubectl top pods.

Q: Can HPA scale to zero? A: Not by default; minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled (with an object or external metric). Use the cluster autoscaler for node-level scaling.

Written by Sanjeev Sharma, Full Stack Engineer · E-mopro