OpenTelemetry for AI Systems — Tracing LLM Calls, Token Usage, and Agent Loops
Trace LLM inference with OpenTelemetry semantic conventions. Monitor token counts, latency, agent loops, and RAG pipeline steps with structured observability.
Deploy OpenTelemetry with auto-instrumentation, custom spans, metrics, and the Collector pipeline. Export to Jaeger, Tempo, or Datadog.
Complete OpenTelemetry setup for Node.js, auto-instrumentation, custom spans, trace propagation, OTLP export to Tempo/Jaeger, sampling strategies, and production alerting.
Eliminate dual-write problems with the outbox pattern. Learn polling publishers, CDC with Debezium, and building reliable event-driven systems.
A junior engineer with access to production and insufficient guardrails runs a database migration directly on prod. Or force-pushes to main. Or deletes an S3 bucket thinking it was the staging one. The fix isn't surveillance; it's systems that make the catastrophic mistake require extra steps.