KEDA — Event-Driven Autoscaling for Kubernetes Workloads

Introduction

The Horizontal Pod Autoscaler (HPA) natively scales on resource metrics such as CPU and memory. But many workloads don't correlate with resource utilization—they correlate with queue depth, message lag, time of day, or custom business metrics. KEDA (Kubernetes Event-driven Autoscaling) fills this gap, connecting Kubernetes to external event sources and scaling workloads based on their actual demand. This post covers KEDA architecture, common scalers (SQS, Kafka, cron), cold starts, combining KEDA with HPA, and production tuning.

KEDA Architecture

KEDA consists of two core resources: ScaledObject and ScaledJob. A ScaledObject targets a scalable workload (typically a Deployment, but anything with a /scale subresource) and adjusts its replica count. A ScaledJob spawns Kubernetes Jobs that run to completion.

Architecture diagram (conceptual):

  • The KEDA operator watches ScaledObjects and ScaledJobs
  • It queries each trigger's scaler (SQS, Kafka, custom endpoints) on the configured polling interval
  • For scaling between one and N replicas it creates an HPA and serves it the external metric values; scaling between zero and one is handled by the operator directly

ScaledObject for continuous workloads:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-processor
  namespace: processing
spec:
  scaleTargetRef:
    name: job-processor-deployment
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 100
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/processing"
      queueLength: "10"
      awsRegion: "us-east-1"
      identityOwner: "operator"
    authenticationRef:
      name: aws-credentials
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
          - type: Pods
            value: 4
            periodSeconds: 15
          selectPolicy: Max
  fallback:
    failureThreshold: 3
    replicas: 5

ScaledJob for batch workloads:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: batch-processor
  namespace: batch
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: batch-processor:v1.2.3
          env:
          - name: BATCH_SIZE
            value: "100"
          - name: TIMEOUT_SECONDS
            value: "1800"
          resources:
            requests:
              cpu: "1"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "1Gi"
        restartPolicy: Never
    backoffLimit: 3
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/batch-jobs"
      queueLength: "20"
      awsRegion: "us-east-1"
      identityOwner: "operator"

SQS Queue Depth Scaler

The most common use case: scale workers based on queue depth.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: queue-worker
    kind: Deployment
  minReplicaCount: 3
  maxReplicaCount: 200
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/tasks"
      queueLength: "5"
      awsRegion: "us-east-1"
      identityOwner: "operator"
    authenticationRef:
      name: keda-aws-credentials

How it works:

  1. KEDA queries SQS: GetQueueAttributes(ApproximateNumberOfMessages)
  2. It computes desired replicas = ceil(messages / queueLength), clamped between minReplicaCount and maxReplicaCount
  3. When the queue stays empty, it scales back down to minReplicaCount

The queueLength parameter is critical: it is the target number of messages per replica, not a cap on queue size. If set to 5 and the queue has 50 messages, KEDA scales to 10 replicas (50/5).
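That calculation can be sketched in a few lines of Python (a hypothetical helper, not KEDA's actual code; it mirrors the HPA formula desired = ceil(metricValue / target), clamped to the replica bounds):

```python
import math

def desired_replicas(messages: int, queue_length: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate KEDA/HPA math: ceil(metric / target), clamped to bounds."""
    desired = math.ceil(messages / queue_length)
    return max(min_replicas, min(desired, max_replicas))

print(desired_replicas(50, 5, 3, 200))    # 50 messages at 5 per pod -> 10
print(desired_replicas(0, 5, 3, 200))     # empty queue -> minReplicaCount (3)
```

Testing with known queue depths like this is a quick way to sanity-check a queueLength setting before deploying it.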

Kafka Consumer Lag Scaler

For event streaming workloads, scale based on consumer group lag.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer
  namespace: streaming
spec:
  scaleTargetRef:
    name: kafka-processor
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  pollingInterval: 30
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092
      consumerGroup: "order-processor"
      topic: "orders"
      lagThreshold: "100"
      offsetResetPolicy: "latest"
    authenticationRef:
      name: kafka-auth

Example consumer pod spec (in practice this would live in the pod template of the kafka-processor Deployment):

apiVersion: v1
kind: Pod
metadata:
  name: kafka-processor
  namespace: streaming
spec:
  containers:
  - name: processor
    image: kafka-processor:v1.2.3
    env:
    - name: KAFKA_BROKERS
      value: "kafka-broker-1:9092,kafka-broker-2:9092,kafka-broker-3:9092"
    - name: KAFKA_CONSUMER_GROUP
      value: "order-processor"
    - name: KAFKA_TOPICS
      value: "orders"
    - name: BATCH_SIZE
      value: "100"
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2"
        memory: "2Gi"
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10

KEDA computes lag per partition as (latest offset - committed offset), sums it across the topic, and scales to keep total lag at or below lagThreshold per replica.
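That lag math can be sketched as follows (hypothetical offsets; one detail worth knowing is that by default KEDA caps replicas at the topic's partition count, since a consumer group can't parallelize beyond it):

```python
import math

def total_lag(latest: dict, committed: dict) -> int:
    """Sum per-partition lag = latest offset - committed offset."""
    return sum(latest[p] - committed.get(p, 0) for p in latest)

def kafka_desired_replicas(lag: int, lag_threshold: int,
                           partitions: int, max_replicas: int) -> int:
    # By default KEDA won't scale past the partition count:
    # extra consumers in the group would sit idle anyway.
    return min(math.ceil(lag / lag_threshold), partitions, max_replicas)

latest = {0: 1200, 1: 900, 2: 1500}       # hypothetical latest offsets
committed = {0: 1000, 1: 800, 2: 1100}    # hypothetical committed offsets
lag = total_lag(latest, committed)        # 200 + 100 + 400 = 700
print(kafka_desired_replicas(lag, 100, 3, 50))   # capped at 3 partitions
```

The partition cap is why adding partitions is often the real scaling lever for Kafka consumers, not raising maxReplicaCount.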

Cron-Based Scaler

Scale workloads on a schedule (e.g., scale up before business hours).

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scheduled-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  # Business hours: 9 AM - 6 PM weekdays
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 9 * * 1-5"
      end: "0 18 * * 1-5"
      desiredReplicas: "20"
  # Night and weekends
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 18 * * 1-5"
      end: "0 9 * * 1-5"
      desiredReplicas: "5"
  # Weekend
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 0 * * 0,6"
      end: "0 23 * * 0,6"
      desiredReplicas: "3"

At 9 AM ET on weekdays, KEDA scales to 20 replicas; at 6 PM it drops to 5, and weekends run at 3. When no cron window is active, minReplicaCount applies. This saves cost during off-peak hours while keeping the service responsive during business hours.
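A rough sketch of how these windows resolve to a replica count (a hypothetical helper that ignores timezone handling and the overnight-window edge cases the real cron scaler deals with):

```python
from datetime import datetime

def cron_desired_replicas(now: datetime, min_replicas: int = 2) -> int:
    """Resolve the three windows above (weekday() is 0=Mon .. 6=Sun)."""
    wd, hr = now.weekday(), now.hour
    if wd < 5 and 9 <= hr < 18:
        return 20                  # business hours, Mon-Fri
    if wd < 5:
        return 5                   # weekday nights
    if hr < 23:
        return 3                   # weekend window (00:00-23:00)
    return min_replicas            # outside every window

print(cron_desired_replicas(datetime(2024, 3, 6, 10)))   # Wed 10 AM -> 20
print(cron_desired_replicas(datetime(2024, 3, 9, 12)))   # Sat noon  -> 3
```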

Scaling to Zero and Cold Start Implications

One of KEDA's powerful features: scale to zero. But cold starts have latency implications.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: eventless-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 0  # Scale to zero
  maxReplicaCount: 50
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "..."
      queueLength: "1"
      awsRegion: "us-east-1"

When the queue is empty, minReplicaCount: 0 scales the Deployment down to zero pods. When the first message arrives, KEDA detects it on the next poll and scales up from zero, but the new pod may take 10-30 seconds to pull its image and initialize before it processes anything.
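A back-of-envelope estimate for worst-case activation latency from zero (the image-pull and app-init figures here are illustrative assumptions, not measurements):

```python
def worst_case_activation(polling_interval_s: int, image_pull_s: int,
                          app_startup_s: int) -> int:
    """A message can land just after a poll, so in the worst case it waits
    a full polling interval before KEDA notices, then the pod must start."""
    return polling_interval_s + image_pull_s + app_startup_s

# 30s poll + assumed 10s image pull + assumed 20s app init
print(worst_case_activation(30, 10, 20))   # -> 60 seconds to first message
```

If that number violates your SLA, either keep warm pods or shrink one of the three terms.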

Mitigate cold starts:

Option 1: Keep warm pods:

minReplicaCount: 2  # Always keep 2 pods warm

Option 2: Optimize startup time:

apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: api
    image: api:v1
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5

The startupProbe allows up to 60 seconds (failureThreshold 30 × periodSeconds 2) for the container to come up before the kubelet restarts it. Beyond probes, optimize application initialization itself (lazy loading, async setup).

Combining KEDA with HPA

HPA can work alongside KEDA. HPA scales on CPU/memory; KEDA scales on events.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dual-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 3
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "..."
      queueLength: "5"
      awsRegion: "us-east-1"
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"

KEDA creates and manages an HPA internally. Each trigger (SQS queue depth, CPU) computes its own desired replica count independently, and the HPA applies the maximum across all triggers.
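A sketch of that max-across-triggers logic (hypothetical inputs, mirroring how the HPA evaluates metrics rather than reproducing its code; a CPU Utilization trigger scales relative to the current replica count):

```python
import math

def combined_desired(sqs_messages: int, queue_length: int,
                     cpu_pct: float, cpu_target: float, current: int) -> int:
    """Each trigger computes its own desired count; the HPA takes the max.
    The CPU Utilization trigger uses desired = ceil(current * actual / target)."""
    by_sqs = math.ceil(sqs_messages / queue_length)
    by_cpu = math.ceil(current * cpu_pct / cpu_target)
    return max(by_sqs, by_cpu)

# Queue nearly drained but pods running hot: the CPU trigger wins.
print(combined_desired(10, 5, 90, 70, current=10))   # max(2, 13) -> 13
```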

Prometheus Custom Metrics Scaler

Scale based on custom Prometheus metrics.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: custom_metric
      query: |
        sum(rate(api_requests_total[5m]))
      threshold: "1000"
    authenticationRef:
      name: prometheus-auth

Query any metric from Prometheus. KEDA treats the threshold as a per-replica target: if sum(rate(api_requests_total[5m])) returns 3000 against a threshold of 1000, it asks for 3 replicas. This enables sophisticated scaling on business metrics (orders/sec, revenue/sec, etc.).

Production Tuning

pollingInterval: How often KEDA queries each scaler. Lower values mean faster reaction to load changes, but more operator CPU and more API calls against the event source (which can carry real cost with SQS). Default: 30 seconds, which is reasonable for SQS.

cooldownPeriod: How long KEDA waits after the last trigger reports inactive before scaling the workload to zero. It applies only to the scale-to-zero transition; scale-down between non-zero replica counts is governed by the HPA's behavior settings. Default: 300 seconds (5 minutes).

fallback: The replica count to hold when a scaler fails to report metrics for failureThreshold consecutive polls.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: reliable-scaler
spec:
  scaleTargetRef:
    name: api-server
    kind: Deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  pollingInterval: 15  # Faster response
  cooldownPeriod: 120  # Shorter cooldown for responsive scaling
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: "..."
      queueLength: "10"
  fallback:
    failureThreshold: 3
    replicas: 5  # If the scaler keeps failing, hold 5 replicas

Advanced HPA behavior (scale-up speed, scale-down speed):

advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
        - type: Percent
          value: 200  # Add up to 200% of current replicas per period
          periodSeconds: 15
        - type: Pods
          value: 10
          periodSeconds: 15
        selectPolicy: Max  # Pick the policy that scales fastest
      scaleDown:
        stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
        policies:
        - type: Percent
          value: 50  # Cut replicas in half
          periodSeconds: 60

This scales up aggressively (up to +200% or +10 pods per 15 seconds, whichever is greater) and scales down conservatively (at most half per 60 seconds, after a 5-minute stabilization window). That asymmetry prevents thrashing.
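To see what those scale-up policies permit in practice, here is a sketch of the per-period replica ceiling (mirroring how the HPA evaluates policies, not its actual implementation):

```python
import math

def max_after_one_period(current: int) -> int:
    """Replica ceiling after one 15s scale-up period under the policies above:
    Percent 200 allows adding 200% of current; Pods allows adding 10;
    selectPolicy: Max picks whichever allows more."""
    by_percent = current + math.floor(current * 200 / 100)
    by_pods = current + 10
    return max(by_percent, by_pods)

print(max_after_one_period(4))    # small fleet: +10 pods wins -> 14
print(max_after_one_period(20))   # large fleet: +200% wins    -> 60
```

The Pods policy keeps small fleets from crawling up one replica at a time, while the Percent policy keeps large fleets growing fast enough to matter.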

Checklist

  • KEDA operator installed and running
  • ScaledObjects/ScaledJobs configured for all event-driven workloads
  • SQS queue length tuned (test with known queue sizes)
  • Kafka consumer lag scaler set with appropriate thresholds
  • Cron scalers configured for predictable load patterns
  • minReplicaCount balances cost vs cold start latency
  • Fallback replicas set to handle scaler failures gracefully
  • pollingInterval and cooldownPeriod tuned for stability
  • HPA behavior (scale-up/down policies) prevents flapping
  • Monitoring alerts on KEDA scaling decisions and failures
  • Cold start latency measured and acceptable for SLAs
  • Runbooks document how to manually override scaling

Conclusion

KEDA transforms event-driven workloads from static overprovisioning to dynamic, demand-responsive scaling. Queue-based scalers eliminate the guessing game of capacity planning. Combining KEDA with HPA ensures workloads scale on both event demand and resource utilization. Tune pollingInterval, cooldownPeriod, and HPA behavior carefully to balance responsiveness and stability. With KEDA, your clusters become leaner and more cost-efficient while maintaining SLAs.