Distributed tracing is the observability tool that microservices force you to care about. Without it, debugging a latency spike that spans five services takes hours. With it, you can pin it down in minutes. Once you set it up properly, you'll wonder how you ever debugged anything without it.

The trace context problem

For distributed tracing to work, every request needs to carry trace context (a trace ID and a parent span ID) through every hop. That means your HTTP clients must include trace context in outgoing headers, your message queue consumers must read it from message headers, and every service must create a child span for each operation. The W3C Trace Context specification (finalized in 2020) defines the standard header format (traceparent, tracestate) so you're not locked into vendor-specific formats. Standards matter here.
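To make the header format concrete, here's a minimal sketch of building and parsing a traceparent header using only the Python standard library. The function names are mine, not from any tracing SDK; a real service would use its tracing library's propagator instead of hand-rolling this.

```python
import re
import secrets

def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00" # bit 0 = "sampled"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header: str):
    """Extract (trace_id, parent_span_id, sampled) from a traceparent header.

    Returns None for malformed headers so callers can fall back to
    starting a new trace.
    """
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if m is None:
        return None
    trace_id, parent_span_id, flags = m.groups()
    sampled = (int(flags, 16) & 0x01) == 0x01
    return trace_id, parent_span_id, sampled
```

A downstream service would call `parse_traceparent` on the incoming request, reuse the trace ID, generate a fresh span ID for its own span, and emit a new traceparent on every outgoing call.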

Sampling costs money

At high volumes, tracing every single request gets expensive fast, so you need a sampling strategy. Head-based sampling (1-10% of requests) is simple and cheap, but you'll miss the latency spikes in the tail. Priority sampling is better: always capture requests with errors or high latency, and propagate the sampling decision through the trace so every service keeps or drops the same traces. Tail-based sampling happens at the collector: buffer all spans for a trace, then make the sampling decision after the full trace arrives, based on its outcome. The OpenTelemetry Collector supports tail-based sampling and it's worth learning.
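A priority-sampling decision can be sketched in a few lines. This is an illustrative function, not any library's API; the parameter names and the 1s latency threshold are assumptions, and real SDKs make the error/latency "upgrade" decision at span end rather than at request start.

```python
import random

def should_sample(parent_sampled, has_error, duration_ms,
                  base_rate=0.05, latency_threshold_ms=1000):
    """Priority sampling sketch.

    - Honor an upstream decision so the whole trace is kept or dropped
      together (this is the propagated part).
    - At the trace root, always keep errors and slow requests.
    - Everything else falls back to a low base sampling rate.
    """
    if parent_sampled is not None:
        return parent_sampled  # propagate the upstream decision
    if has_error or duration_ms >= latency_threshold_ms:
        return True            # always capture interesting traces
    return random.random() < base_rate
```

The key property is the first branch: once the root decides, every downstream service inherits that decision via the sampled flag in the traceparent header, so you never end up with half a trace.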

How to deploy Jaeger

Jaeger's all-in-one deployment is a single binary with in-memory storage. Perfect for development and testing. Production needs separation: a dedicated collector for high-throughput span ingestion, a separate query service for the UI, and a durable backend (Elasticsearch for full-text search over spans, Cassandra for lower cost at scale). Once you hit hundreds of thousands of spans per second, the storage backend becomes your bottleneck.
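A production-style split might look like the following docker-compose sketch. Treat it as a starting point, not a reference deployment: the image tags and single-node Elasticsearch settings are assumptions for illustration, and a real cluster needs resource limits, persistence, and security configured.

```yaml
# Sketch: separate collector and query services backed by Elasticsearch.
services:
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.57   # tag is an assumption
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "4317:4317"    # OTLP gRPC span ingestion
  jaeger-query:
    image: jaegertracing/jaeger-query:1.57
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"  # Jaeger UI
  elasticsearch:
    image: elasticsearch:8.13.0                  # single node for the sketch only
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
```

Scaling the collector and the storage backend independently is the whole point of the split: ingestion load and query load grow at different rates.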

Connect traces to logs

The magic moment is jumping from a trace in Jaeger directly to the matching log lines in Elasticsearch or Loki. To make this work: inject the trace ID into your structured logs so every line carries it. Use Serilog, NLog, or your language's structured logging. Then in Grafana, click a trace and jump to the correlated logs. It's a small detail that makes debugging microservices feel almost natural.
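In Python, the same trick works with a stdlib `logging.Filter` that stamps the active trace ID onto every record. The `get_current_trace_id` callable here is a placeholder for however your tracing library exposes the current span context; the field names in the format string are assumptions.

```python
import logging

class TraceContextFilter(logging.Filter):
    """Attach the current trace ID to every log record so log lines
    can be correlated with traces in Grafana, Loki, or Elasticsearch."""

    def __init__(self, get_current_trace_id):
        super().__init__()
        # Callable returning the active trace ID, or None outside a trace.
        self._get_trace_id = get_current_trace_id

    def filter(self, record):
        record.trace_id = self._get_trace_id() or "-"
        return True  # never drop records, only annotate them

# Wiring it up: a JSON-ish formatter that emits the trace_id field.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    '{"ts": "%(asctime)s", "level": "%(levelname)s", '
    '"trace_id": "%(trace_id)s", "msg": "%(message)s"}'))

logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter(
    lambda: "4bf92f3577b34da6a3ce929d0e0e4736"))  # stub trace ID for the demo
logger.setLevel(logging.INFO)
logger.info("payment authorized")
```

Once every log line carries `trace_id`, the Grafana trace-to-logs jump is just a query filtered on that field.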