eBPF for Backend Engineers — Zero-Instrumentation Observability

By Sanjeev Sharma (@webcoderspeed1)
Introduction
eBPF (extended Berkeley Packet Filter) runs sandboxed programs in the Linux kernel. For observability, eBPF intercepts syscalls, network packets, and kernel events without instrumenting your code. This post covers eBPF concepts, Cilium for networking, Hubble for service flows, and continuous profiling.
- What eBPF Is (Kernel Programs Without Kernel Modules)
- Cilium for Kubernetes Network Observability
- Hubble for Service-to-Service Flow Visibility
- Continuous Profiling with Parca/Pyroscope
- bpftrace for Ad-Hoc Investigation
- TCP Retransmit Tracing
- Latency Profiling at Syscall Level
- eBPF vs Sidecar Overhead Comparison
- Checklist
- Conclusion
What eBPF Is (Kernel Programs Without Kernel Modules)
eBPF programs:
- Run in kernel (privileged context)
- Are sandboxed (can't crash the kernel)
- Hook into kernel events (syscalls, network packets, function calls)
- Are verified before loading (safe to run)
Unlike kernel modules, eBPF programs don't require:
- Recompiling the kernel
- Rebooting the system
- Kernel version matching
Example: attach a compiled eBPF program to a network interface and inspect traffic without touching application code.
# Attach the compiled eBPF object to eth0 ingress
# No code instrumentation, no app restart
tc qdisc add dev eth0 ingress
tc filter add dev eth0 ingress bpf direct-action object-file tcptrack.o section trace_connect
Cilium for Kubernetes Network Observability
Cilium uses eBPF to replace iptables and observe all network flows in Kubernetes. It provides:
- Network policy enforcement (no iptables needed)
- Service load balancing (faster than kube-proxy)
- Network observability (every packet is visible)
# Deploy Cilium with Hubble enabled via Helm
# (Cilium's datapath is eBPF by default; no extra flag needed)
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
Cilium monitors:
- Pod-to-pod traffic
- Pod-to-external traffic
- DNS queries
- HTTP requests (L7 visibility)
- Policy verdicts and dropped packets (useful for spotting port scans)
Hubble for Service-to-Service Flow Visibility
Hubble (built on Cilium) visualizes service flows. Export to observability backends (Prometheus, Elasticsearch).
# View traffic in real time with the Hubble CLI
# (Hubble Relay was enabled via the Cilium Helm values above;
#  expose it locally with `cilium hubble port-forward` if needed)
hubble observe --label app=api-server --output json
# Example output (simplified; real flow records carry more fields):
# {
#   "time": "2026-03-15T10:30:00Z",
#   "source": {"namespace": "default", "pod_name": "api-server-1"},
#   "destination": {"namespace": "default", "pod_name": "db-postgres-1"},
#   "l4": {"TCP": {"destination_port": 5432}},
#   "verdict": "FORWARDED"
# }
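JSON output is easy to post-process. A minimal sketch that counts flows per source→destination edge, assuming newline-delimited records shaped like the simplified output above (adapt the field paths to your actual Hubble schema):

```python
import json
from collections import Counter

def flows_per_edge(lines):
    """Count flows per (source pod -> destination pod) edge."""
    counts = Counter()
    for line in lines:
        flow = json.loads(line)
        edge = (flow["source"]["pod_name"], flow["destination"]["pod_name"])
        counts[edge] += 1
    return counts

# Three hypothetical flow records
sample = [
    '{"source": {"pod_name": "api-server-1"}, "destination": {"pod_name": "db-postgres-1"}}',
    '{"source": {"pod_name": "api-server-1"}, "destination": {"pod_name": "db-postgres-1"}}',
    '{"source": {"pod_name": "api-server-2"}, "destination": {"pod_name": "redis-1"}}',
]
print(flows_per_edge(sample).most_common(1))
# [(('api-server-1', 'db-postgres-1'), 2)]
```

The same pattern works for any aggregation (per-namespace, per-verdict) once flows are exported as JSON.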
Export to Prometheus:
# hubble-prometheus.yaml - scrape Hubble metrics
# (the Cilium agent serves Hubble metrics on port 9965 when hubble.metrics is enabled)
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'hubble-metrics'
        static_configs:
          - targets: ['localhost:9965']
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
Continuous Profiling with Parca/Pyroscope
Always-on profiling captures CPU usage at function granularity. eBPF enables zero-instrumentation profiling.
# parca-deployment.yaml - deploy the Parca agent as a DaemonSet
# (flag names vary between parca-agent releases; check your version's --help)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: parca-agent
spec:
  selector:
    matchLabels:
      app: parca-agent
  template:
    metadata:
      labels:
        app: parca-agent
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: parca-agent
          image: ghcr.io/parca-dev/parca-agent:latest
          args:
            - "--node=$(NODE_NAME)"
            - "--remote-store-address=parca:7070"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            privileged: true
Query Parca for bottlenecks through its web UI: pick a profile type (for example, CPU samples), select a time range to surface the top consumers, and use the compare view to diff profiles from before and after an optimization. Parca also exposes a gRPC API if you want to automate queries.
bpftrace for Ad-Hoc Investigation
bpftrace is a high-level tracing language. Write one-liners to investigate system behavior.
# Trace file opens by nginx (openat covers most modern opens)
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "nginx"/ { printf("%s %s\n", comm, str(args->filename)); }'
# Count slow syscalls (>10ms) per process
bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
  if (nsecs - @start[tid] > 10000000) { @slow[comm] = count(); }
  delete(@start[tid]);
}
'
# Histogram memory allocation sizes
# (kprobes have no named args; use the kmem tracepoint instead)
bpftrace -e '
tracepoint:kmem:kmalloc { @bytes = hist(args->bytes_req); }
END { print(@bytes); }
'
# Trace TCP socket teardown (kprobes expose raw args as arg0, arg1, ...)
bpftrace -e '
kprobe:tcp_close { printf("tcp_close on socket %p\n", arg0); }
'
TCP Retransmit Tracing
High TCP retransmits indicate network problems. Use eBPF to detect and locate them.
# Monitor TCP retransmits in real time
# (the tcp:* tracepoints require kernel 4.16+)
bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb {
  printf("retransmit %s:%d -> %s:%d\n",
    ntop(args->saddr), args->sport,
    ntop(args->daddr), args->dport);
}
'
# Export to Prometheus
# (bpftrace output → custom exporter → Prometheus)
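The "custom exporter" step can be sketched in a few lines. Assuming the tracer emits one line per event in a hypothetical `retransmit <src> -> <dst>` format, aggregate counts per destination before exposing them as a metric:

```python
from collections import Counter

def retransmits_by_dest(lines):
    """Count retransmit events per destination address:port."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Expected shape: ["retransmit", "<src>", "->", "<dst>"]
        if len(parts) == 4 and parts[0] == "retransmit":
            counts[parts[3]] += 1
    return counts

# Hypothetical tracer output
sample = [
    "retransmit 10.0.0.5:43122 -> 10.0.0.9:5432",
    "retransmit 10.0.0.6:51000 -> 10.0.0.9:5432",
    "retransmit 10.0.0.5:43123 -> 10.0.0.7:6379",
]
print(retransmits_by_dest(sample).most_common(1))
# [('10.0.0.9:5432', 2)]
```

A destination that dominates this count is a good first place to look for a lossy path or an overloaded peer.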
Latency Profiling at Syscall Level
Identify bottlenecks by measuring time spent in syscalls.
# Syscall latency histograms, keyed by syscall number
bpftrace -e '
tracepoint:raw_syscalls:sys_enter {
  @syscall_start[tid] = nsecs;
}
tracepoint:raw_syscalls:sys_exit /@syscall_start[tid]/ {
  @latency[args->id] = hist(nsecs - @syscall_start[tid]);
  delete(@syscall_start[tid]);
}
'
# Map syscall numbers to names with `ausyscall --dump` (audit package)
# Example output (232 = epoll_wait on x86_64; values in nanoseconds):
# @latency[232]:
# [256, 512)      10234 |@@@@@@@@@@@@@@@@@@@@@@@@@@|
# [512, 1K)        5432 |@@@@@@@@@@@@@@            |
# [1K, 2K)          123 |                          |
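bpftrace's hist() buckets values into power-of-two ranges; the same bucketing is easy to reproduce offline when you capture raw latency samples instead. A minimal sketch:

```python
from collections import Counter

def log2_hist(samples):
    """Bucket positive integer samples into power-of-two ranges,
    like bpftrace's hist(): bucket k holds values in [2^(k-1), 2^k)."""
    buckets = Counter(v.bit_length() for v in samples)
    return dict(sorted(buckets.items()))

# Four latency samples in nanoseconds
print(log2_hist([300, 900, 1500, 70_000]))
# {9: 1, 10: 1, 11: 1, 17: 1}
```

Log-scale buckets are what make in-kernel histograms cheap: a fixed handful of counters instead of one slot per distinct latency value.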
eBPF vs Sidecar Overhead Comparison
| Approach | CPU Overhead | Memory | Latency Impact | Deployment |
|---|---|---|---|---|
| eBPF | 2-5% | ~100MB per node | < 1µs | In-kernel programs + node agent |
| Sidecar (Envoy) | 10-20% | 1GB+ (per pod) | 5-10µs | Container per pod |
| Code instrumentation | 5-15% | ~200MB | 2-5µs | Recompile and redeploy app |
| eBPF + Sidecar | 15-25% | 1.1GB+ | 10µs+ | Both |
These figures are rough orders of magnitude; measure in your own cluster.
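To see why per-node beats per-pod at scale, a back-of-envelope calculation using the (illustrative) memory figures from the table, for a hypothetical 20-node cluster running 500 pods:

```python
def observability_memory_gb(pods, nodes, per_pod_mb, per_node_mb):
    """Total memory consumed by the observability layer, in GB."""
    return (pods * per_pod_mb + nodes * per_node_mb) / 1024

# Sidecars cost memory per pod; eBPF agents cost memory per node
sidecar_gb = observability_memory_gb(pods=500, nodes=20, per_pod_mb=1024, per_node_mb=0)
ebpf_gb = observability_memory_gb(pods=500, nodes=20, per_pod_mb=0, per_node_mb=100)
print(sidecar_gb, round(ebpf_gb, 2))
# 500.0 1.95
```

The gap widens as pod density grows, because eBPF's footprint scales with nodes, not pods.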
eBPF wins on efficiency; sidecars win on control.
Checklist
- Deploy Cilium for network observability in Kubernetes
- Use Hubble to visualize service flows
- Set up Parca for always-on CPU/memory profiling
- Write bpftrace one-liners for quick investigations
- Monitor TCP retransmits as a network health indicator
- Profile syscall latency to find bottlenecks
- Compare eBPF vs instrumentation (eBPF usually cheaper)
- Test eBPF programs in staging before production
- Monitor Cilium CPU overhead
- Export Hubble/Parca data to long-term storage
Conclusion
eBPF provides observability without instrumenting code. Cilium and Hubble visualize network flows. Parca profiles CPU/memory continuously. For diagnosing production issues, bpftrace is unmatched. Start with Cilium + Hubble for network visibility; add Parca for continuous profiling; use bpftrace for investigations.