OpenTelemetry Full Setup — Vendor-Neutral Observability for Node.js

Introduction

OpenTelemetry (OTel) is a vendor-neutral instrumentation standard. It auto-instruments popular libraries (Express, Postgres, Redis), captures custom spans, emits metrics, and routes data through the Collector before export. This post covers setup, semantic conventions, trace context propagation, and sampling strategies.

OTel SDK Auto-Instrumentation Setup

OpenTelemetry provides auto-instrumentation packages that hook into popular libraries without code changes. Initialize OTel before importing your app.

// tracing.ts - Initialize OTel
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { ConsoleSpanExporter } from '@opentelemetry/sdk-trace-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

// Create resource describing your service
const resource = Resource.default().merge(
  new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-api',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
    'deployment.environment': process.env.NODE_ENV || 'development',
  })
);

// Initialize SDK with auto-instrumentation
const sdk = new NodeSDK({
  resource,
  instrumentations: [getNodeAutoInstrumentations()],
  traceExporter: new ConsoleSpanExporter(), // Replace with OTel Collector
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', async () => {
  await sdk.shutdown();
  process.exit(0);
});

export { sdk };

This setup auto-instruments:

  • HTTP/gRPC servers (Express, Fastify, etc.)
  • Database drivers (Postgres, MySQL, MongoDB)
  • HTTP clients (fetch, axios, undici)
  • Redis, Memcached, AWS SDKs
  • Message queues (Kafka, SQS, RabbitMQ)

No changes to your application code are needed.

Custom Spans with Semantic Conventions

Auto-instrumentation handles standard flows. For business logic, create custom spans with semantic attributes.

// services/order-service.ts - Custom spans
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('order-service');

export class OrderService {
  async createOrder(input: OrderInput): Promise<Order> {
    // Create a named span and make it active for the callback
    return tracer.startActiveSpan('createOrder', async (span) => {
      try {
        // Set semantic attributes
        span.setAttributes({
          'order.user_id': input.userId,
          'order.total_amount': input.totalAmount,
          'order.item_count': input.items.length,
        });

        // Child span for validation
        const order = await tracer.startActiveSpan('validateOrder', async (validationSpan) => {
          try {
            validationSpan.addEvent('order_validation_started');
            return await validateOrder(input);
          } finally {
            validationSpan.end();
          }
        });

        // Child span for the database write
        const savedOrder = await tracer.startActiveSpan('saveOrder', async (dbSpan) => {
          try {
            dbSpan.setAttributes({
              'db.system': 'postgresql',
              'db.operation': 'insert',
              'db.sql.table': 'orders',
            });
            return await saveToDatabase(order);
          } finally {
            dbSpan.end();
          }
        });

        span.setStatus({ code: SpanStatusCode.OK });
        return savedOrder;
      } catch (err) {
        span.recordException(err as Error);
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: (err as Error).message,
        });
        throw err;
      } finally {
        // Spans must be ended explicitly or they are never exported
        span.end();
      }
    });
  }
}

async function validateOrder(input: OrderInput): Promise<Order> {
  // Validation logic
  return {} as Order;
}

async function saveToDatabase(order: Order): Promise<Order> {
  // Database logic
  return order;
}

interface OrderInput {
  userId: string;
  totalAmount: number;
  items: Array<{ productId: string; quantity: number }>;
}

interface Order extends OrderInput {
  id: string;
  createdAt: Date;
}

Trace Context Propagation via HTTP Headers

Traces span multiple services. Propagate trace context via HTTP headers so downstream services link to the parent trace.

// middleware/trace-context.ts - Propagate across services
import { context, propagation } from '@opentelemetry/api';
import express from 'express';

// The NodeSDK registers the W3C trace context propagator globally by default,
// so propagation.extract() and propagation.inject() use it with no extra setup.
export function traceContextMiddleware(req: express.Request, res: express.Response, next: express.NextFunction) {
  // Extract trace context (traceparent/tracestate) from incoming headers
  const extractedContext = propagation.extract(context.active(), req.headers);

  // Run the rest of the request pipeline inside the extracted context,
  // so every span created downstream joins the caller's trace
  context.with(extractedContext, () => {
    next();
  });
}

// Usage in your Express app: inject the active context into outgoing requests.
// (HTTP-client auto-instrumentations already do this; shown here for calls
// made outside instrumented clients.)
export async function fetchUserData(userId: string) {
  // inject() writes the full traceparent header (version-traceId-spanId-flags)
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);

  const response = await fetch(`https://auth-service.internal/users/${userId}`, {
    headers,
  });

  return response.json();
}
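For reference, the injected `traceparent` header follows the W3C Trace Context format `{version}-{trace-id}-{parent-id}-{trace-flags}`; sending only the trace ID is not a valid header and breaks propagation. A small hypothetical helper (not part of any OTel package) for building and checking the shape in tests:

```typescript
// traceparent.ts - W3C Trace Context header shape (hypothetical test helpers)

// Build a traceparent value: 2-hex version, 32-hex trace id,
// 16-hex parent span id, 2-hex flags (01 = sampled)
export function formatTraceparent(traceId: string, spanId: string, sampled: boolean): string {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// Validate the header shape without interpreting its semantics
export function isValidTraceparent(header: string): boolean {
  return /^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/.test(header);
}
```

Downstream services reject or ignore malformed values, so a shape check like this is a cheap integration-test guard.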

Metrics with OTel (Counters, Histograms, Gauges)

Metrics measure numeric values over time. OpenTelemetry provides three types:

  • Counter: Monotonically increasing value (requests, errors)
  • Histogram: Distribution of values (request duration, payload size)
  • Gauge: Point-in-time value (memory usage, active connections)

// metrics.ts - OTel metrics
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';

// Create meter
const meterProvider = new MeterProvider({
  readers: [
    new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({
        url: 'http://localhost:4318/v1/metrics',
      }),
    }),
  ],
});

const meter = meterProvider.getMeter('api-service');

// Counter: increment on events
export const httpRequestCounter = meter.createCounter('http.requests.total', {
  description: 'Total HTTP requests',
  unit: '1',
});

export const httpErrorCounter = meter.createCounter('http.requests.errors', {
  description: 'Total HTTP errors',
  unit: '1',
});

// Histogram: distribution of values
export const httpDurationHistogram = meter.createHistogram('http.requests.duration', {
  description: 'HTTP request duration',
  unit: 'ms',
});

// Gauge: point-in-time value
export const activeConnectionsGauge = meter.createObservableGauge('http.connections.active', {
  description: 'Active HTTP connections',
});

// Register gauge callback; the observed instruments must be listed explicitly
meter.addBatchObservableCallback(
  (observableResult) => {
    // Query current connection count
    const activeCount = getActiveConnectionCount();
    observableResult.observe(activeConnectionsGauge, activeCount);
  },
  [activeConnectionsGauge]
);

// Usage in request handler
import express from 'express';

const app = express();

app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - start;
    
    httpRequestCounter.add(1, {
      method: req.method,
      route: req.route?.path || 'unknown',
      status_code: res.statusCode,
    });

    if (res.statusCode >= 400) {
      httpErrorCounter.add(1, {
        status_code: res.statusCode,
      });
    }

    httpDurationHistogram.record(duration, {
      method: req.method,
      route: req.route?.path || 'unknown',
    });
  });

  next();
});

function getActiveConnectionCount(): number {
  // Implementation: count active connections
  return 0;
}
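Under the hood, a histogram aggregates each recorded value into explicit buckets: conceptually, every `record()` call increments the counter of the first bucket whose upper boundary the value does not exceed, plus an overflow bucket for everything larger. A simplified sketch of that bucketing (boundary values here are illustrative, not the SDK defaults):

```typescript
// histogram-buckets.ts - simplified view of explicit-bucket aggregation

// Index of the bucket a value falls into, given sorted upper boundaries.
// Values above every boundary land in the overflow bucket (index = boundaries.length).
export function bucketIndex(value: number, boundaries: number[]): number {
  for (let i = 0; i < boundaries.length; i++) {
    if (value <= boundaries[i]) return i;
  }
  return boundaries.length;
}

// Aggregate a batch of recorded durations into per-bucket counts
export function aggregate(values: number[], boundaries: number[]): number[] {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const v of values) counts[bucketIndex(v, boundaries)]++;
  return counts;
}
```

This is why choosing boundaries that match your latency SLOs matters: percentiles are interpolated from these bucket counts, so a p99 beyond your largest boundary is invisible.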

OTel Collector as Pipeline

The Collector receives telemetry, processes it, and exports to backends. This decouples your app from exporters.

# otel-collector-config.yaml - Production Collector
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 1024
    timeout: 10s
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  
  # Filter sensitive data
  attributes:
    actions:
      - key: http.request.body
        action: delete
      - key: db.statement
        action: hash_sha256

exporters:
  # Note: recent Collector releases drop the dedicated Jaeger exporter;
  # Jaeger also accepts OTLP directly on 4317
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  
  prometheus:
    endpoint: 0.0.0.0:8888
    namespace: app
  
  otlp:
    # A downstream OTLP backend (e.g. Tempo) - not the Collector's own receiver
    endpoint: tempo:4317
    tls:
      insecure: true

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, attributes, batch]
      exporters: [jaeger, otlp]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus, otlp]

Exporting to Jaeger/Tempo/Datadog

Configure your SDK to export to the Collector (or directly to backends).

// exporter-config.ts - Export to different backends
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { NodeSDK } from '@opentelemetry/sdk-node';

// Export to OTel Collector (handles Jaeger, Tempo, Datadog).
// Note the signal-specific paths on the OTLP/HTTP endpoint.
const traceExporter = new OTLPTraceExporter({
  url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318'}/v1/traces`,
  headers: {
    'Authorization': `Bearer ${process.env.OTEL_AUTH_TOKEN}`,
  },
});

const metricExporter = new OTLPMetricExporter({
  url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318'}/v1/metrics`,
});

// Or export directly to the Datadog Agent, which ingests OTLP natively
// (enable OTLP ingest on the Agent; 4318 is its OTLP/HTTP port).
// Service name, env, and version come from your resource attributes.
const datadogExporter = new OTLPTraceExporter({
  url: `http://${process.env.DD_AGENT_HOST || 'localhost'}:4318/v1/traces`,
});

const sdk = new NodeSDK({
  traceExporter: datadogExporter,
});

sdk.start();

Baggage for Cross-Service Context

Baggage carries metadata across service boundaries without spans. Use for user ID, request ID, or customer tier.

// baggage-context.ts - Propagate user context
import { context, propagation } from '@opentelemetry/api';

// Returns a new context carrying the baggage; callers must activate it
// with context.with() for it to take effect
export function setUserBaggage(userId: string, customerId: string) {
  let b = propagation.getBaggage(context.active()) ?? propagation.createBaggage();
  
  b = b.setEntry('user.id', { value: userId });
  b = b.setEntry('customer.id', { value: customerId });
  
  return propagation.setBaggage(context.active(), b);
}

export function getUserBaggage() {
  const b = propagation.getActiveBaggage();
  if (!b) return {};
  
  return {
    userId: b.getEntry('user.id')?.value,
    customerId: b.getEntry('customer.id')?.value,
  };
}

// Usage in middleware (assumes req.user was set by earlier auth middleware)
import express from 'express';

app.use((req, res, next) => {
  const userId = req.user?.id || 'anonymous';
  const customerId = (req.headers['x-customer-id'] as string) || 'unknown';
  
  // Baggage rides on the context, so next() must run inside the new context
  context.with(setUserBaggage(userId, customerId), () => next());
});

// Access in service
export async function processRequest() {
  const { userId, customerId } = getUserBaggage();
  console.log(`Processing for user ${userId}, customer ${customerId}`);
}
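On the wire, the propagator serializes baggage into the W3C `baggage` header as comma-separated `key=value` pairs with percent-encoding. A simplified encoder/decoder sketch (the real propagator also enforces size limits and entry metadata):

```typescript
// baggage-header.ts - simplified W3C baggage header encoding

export function encodeBaggageHeader(entries: Record<string, string>): string {
  return Object.entries(entries)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join(',');
}

export function decodeBaggageHeader(header: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of header.split(',')) {
    const [key, value] = pair.split('=');
    if (key && value !== undefined) {
      out[decodeURIComponent(key.trim())] = decodeURIComponent(value.trim());
    }
  }
  return out;
}
```

Because every entry travels on every downstream request, keep baggage small: a few short identifiers, never payloads or secrets.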

Sampling Strategies (Tail-Based for Errors)

Sampling reduces data volume. Tail-based sampling makes the decision after the whole trace has been collected, so it can keep every error or slow trace while sampling down the healthy majority.

# sampler-config.yaml - Collector tail-based sampling
processors:
  # Drop health-check noise before sampling (OTTL span conditions)
  filter:
    traces:
      span:
        - attributes["http.target"] == "/health"
        - attributes["http.target"] == "/metrics"
  
  tail_sampling:
    # Buffer each trace this long before deciding
    decision_wait: 10s
    policies:
      # Always sample traces containing errors
      - name: error_status
        type: status_code
        status_code:
          status_codes: [ERROR]
      
      # Always sample slow requests
      - name: slow_requests
        type: latency
        latency:
          threshold_ms: 1000
      
      # Sample 10% of the remaining traces
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
Checklist

  • Initialize OTel with auto-instrumentation before app start
  • Use semantic conventions for attribute names
  • Propagate trace context via W3C headers across services
  • Create custom spans for business logic (validation, payment)
  • Export to Collector for flexibility in backends
  • Monitor Collector memory usage and buffering
  • Use tail-based sampling to retain error traces
  • Set up Jaeger/Tempo for trace visualization
  • Export metrics to Prometheus/Datadog for alerting
  • Test trace context propagation with multi-service request flows

Conclusion

OpenTelemetry eliminates vendor lock-in and provides observability foundations. Start with auto-instrumentation to get insights without code changes. Add custom spans for business workflows. Export via the Collector to gain flexibility in backends. Combined with metrics and logs, OTel provides complete visibility into your system.