Azure Service Bus for Event-Driven Microservices

I've seen a lot of teams struggle with tightly coupled microservices. The promise of event-driven architectures is loose coupling and reliable communication, and Azure Service Bus is a solid tool for this. It's not magic, but the patterns for building production-quality systems with it are well-understood.

For distributing events to multiple interested parties, Service Bus topics with subscriptions are the way to go. Imagine an OrderService publishing an OrderPlaced message. The InventoryService, NotificationService, and FulfillmentService can all have their own subscriptions to that topic. Each service gets the message independently. You can even use subscription filter rules to narrow down what each service receives, like a notification subscription that only gets messages for premium customers.
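To make the fan-out concrete, here is a minimal in-memory sketch of a topic with filtered subscriptions. It is not the Service Bus SDK: the Topic class, the dictionary-based subscriptions, and the customerTier property are all assumptions made for illustration. On a real namespace the premium rule would be a SQL filter on the subscription, something along the lines of customerTier = 'premium'.

```python
class Topic:
    """In-memory stand-in for a topic; NOT the Service Bus SDK."""

    def __init__(self):
        self.subscriptions = []

    def subscribe(self, name, rule=lambda props: True):
        # 'rule' plays the role of a subscription filter; the default matches everything.
        sub = {"name": name, "rule": rule, "inbox": []}
        self.subscriptions.append(sub)
        return sub

    def publish(self, body, properties):
        # Every subscription whose rule matches gets its own copy of the message.
        for sub in self.subscriptions:
            if sub["rule"](properties):
                sub["inbox"].append((body, dict(properties)))


topic = Topic()
inventory = topic.subscribe("inventory")
notification = topic.subscribe(
    "notification",
    # Hypothetical property name; stands in for a SQL filter rule.
    rule=lambda props: props.get("customerTier") == "premium",
)

topic.publish("OrderPlaced#1", {"customerTier": "standard"})
topic.publish("OrderPlaced#2", {"customerTier": "premium"})
```

After the two publishes, the inventory inbox holds both messages while the notification inbox holds only the premium one. A message that fails a rule is simply not copied into that subscription.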

In one production system we ran about 1,200 OrderPlaced events per second across three subscriptions. The filter rule that limited notifications to premium customers was a simple SQL expression, but we quickly learned that, in our setup, each additional rule added a few milliseconds of evaluation on the broker. When we hit a spike of 2,000 messages per second, delivery latency jumped from 30 ms to 120 ms and the subscription backlog grew. The fix was to move the premium filter into the consumer code and keep the broker subscription broad, which brought the per-message overhead back down. We also enabled dead-lettering on filter evaluation exceptions and forwarded dead-lettered messages to a dedicated queue we could monitor with Service Bus Explorer. Keep in mind that a message that merely fails to match a filter is never dead-lettered; it just isn't copied into that subscription. Only a message whose filter evaluation throws an exception ends up in the dead-letter queue.

When you need to process messages in parallel and scale out your consumers, a Service Bus queue with multiple consumer instances is your answer. This is the competing consumers pattern. In peek-lock mode, each message is locked to a single consumer at a time, which is what makes horizontal scaling safe. The key here is setting the lock duration correctly: it should be longer than your longest expected message processing time. The lock duration is capped at five minutes, so any handler that can run longer has to renew its lock while it works.
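The interaction between competing consumers and lock duration is easier to see in a toy model. The sketch below is an in-memory approximation of peek-lock semantics, not the Service Bus client: LockingQueue, its method names, and the explicit now argument (used instead of a real clock) are all assumptions for illustration.

```python
import itertools


class LockingQueue:
    """Toy peek-lock queue; an approximation, NOT the Service Bus SDK."""

    def __init__(self, lock_duration):
        self.lock_duration = lock_duration
        self._entries = []
        self._tokens = itertools.count(1)

    def send(self, body):
        self._entries.append(
            {"body": body, "token": None, "locked_until": 0.0, "delivery_count": 0}
        )

    def receive(self, now):
        # Hand out the first message that is not currently locked.
        for entry in self._entries:
            if entry["locked_until"] <= now:
                entry["token"] = next(self._tokens)  # a fresh lock token per delivery
                entry["locked_until"] = now + self.lock_duration
                entry["delivery_count"] += 1
                return {
                    "body": entry["body"],
                    "token": entry["token"],
                    "delivery_count": entry["delivery_count"],
                }
        return None

    def complete(self, received, now):
        # Settling succeeds only with the current lock token and an unexpired lock.
        for entry in self._entries:
            if entry["token"] == received["token"] and now < entry["locked_until"]:
                self._entries.remove(entry)
                return True
        return False


queue = LockingQueue(lock_duration=30)
queue.send("order-1")
queue.send("order-2")

first = queue.receive(now=0)        # consumer A takes order-1
second = queue.receive(now=1)       # consumer B gets order-2, not a copy of order-1
nothing = queue.receive(now=2)      # both messages are locked
queue.complete(second, now=10)      # B finishes in time

redelivered = queue.receive(now=31)             # A's lock expired at t=30
late_result = queue.complete(first, now=45)     # A's settle attempt comes too late
```

Consumer A took 45 time units against a 30-unit lock, so order-1 was handed out a second time and A's completion was rejected. That is the failure mode a lock duration that is too short produces.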

One way to implement a multi-step saga with Service Bus is choreography. Each step publishes a message to the next step's queue upon success. If a step fails, it publishes a compensation message to undo prior work. The state of the saga is distributed across these messages, not held in a central orchestrator. This is wonderfully loosely coupled, but it makes tracing and debugging harder because the state is implicit in the message flow.
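Here is a rough sketch of that choreography using plain in-memory queues. The step names (payment, inventory, shipping, refund), the event names, and the in_stock flag are hypothetical. Real handlers would be separate services reading from Service Bus queues rather than functions in one process.

```python
from collections import deque


def run_choreography(order_id, in_stock):
    """Drive one order through hypothetical payment -> inventory -> shipping steps."""
    queues = {name: deque() for name in ("payment", "inventory", "shipping", "refund")}
    events = []  # ordered record of what each step did

    def handle_payment(msg):
        events.append("PaymentConfirmed")
        queues["inventory"].append(msg)   # success: hand off to the next step's queue

    def handle_inventory(msg):
        if in_stock:
            events.append("InventoryReserved")
            queues["shipping"].append(msg)
        else:
            events.append("InventoryRejected")
            queues["refund"].append(msg)  # failure: publish a compensation message

    def handle_shipping(msg):
        events.append("Shipped")

    def handle_refund(msg):
        events.append("PaymentRefunded")  # compensation undoes the payment step

    handlers = {
        "payment": handle_payment,
        "inventory": handle_inventory,
        "shipping": handle_shipping,
        "refund": handle_refund,
    }

    queues["payment"].append({"order_id": order_id})
    # Pump messages until every queue is empty. No single component holds the
    # saga state; it lives entirely in which messages are in flight.
    while any(queues.values()):
        for name, pending in queues.items():
            if pending:
                handlers[name](pending.popleft())
    return events


happy_path = run_choreography("order-42", in_stock=True)
compensated = run_choreography("order-43", in_stock=False)
```

Notice that nothing in the code records "this saga is at step two". The only way to reconstruct progress is from the events themselves, which is exactly why tracing matters so much with this approach.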

During a saga that coordinated payment, inventory reservation, and shipping, we ran into a subtle race condition. The payment service would complete and publish a PaymentConfirmed message, but the inventory service sometimes took longer than the lock duration we had set (45 seconds). The lock would expire mid-processing, the message would become visible again and be redelivered, and the shipping service would end up acting on the duplicate and try to ship twice. The cure was to enable automatic lock renewal in the .NET client and to set a correlation ID on every message. We also pushed the correlation ID into Application Insights custom dimensions, which let us stitch the whole saga together in a single trace. That instrumentation cost us a few extra bytes per message but saved us from costly duplicate shipments.
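The stitching itself is conceptually simple. The sketch below groups log records by correlation ID and flags any step that shows up twice within one saga. The record shape (correlation_id, service, event) and the function names are assumptions for illustration. In our case Application Insights did this grouping for us through queries over custom dimensions.

```python
from collections import defaultdict


def stitch_traces(records):
    """Group (correlation_id, service, event) records into one ordered trace per saga."""
    traces = defaultdict(list)
    for correlation_id, service, event in records:
        traces[correlation_id].append((service, event))
    return dict(traces)


def duplicated_steps(trace):
    """Return steps that occur more than once within a single saga trace."""
    seen, duplicates = set(), []
    for step in trace:
        if step in seen and step not in duplicates:
            duplicates.append(step)
        seen.add(step)
    return duplicates


# Hypothetical log records from three services, interleaved across two sagas.
records = [
    ("c-1", "payment", "PaymentConfirmed"),
    ("c-2", "payment", "PaymentConfirmed"),
    ("c-1", "inventory", "InventoryReserved"),
    ("c-1", "inventory", "InventoryReserved"),  # redelivery after a lock expired
    ("c-1", "shipping", "Shipped"),
]

traces = stitch_traces(records)
suspicious = duplicated_steps(traces["c-1"])
```

Without the correlation ID, the two InventoryReserved records are just noise in three separate service logs. With it, the duplicate stands out as soon as the records are grouped.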

Service Bus has built-in handling for retries and failed messages. By default, it will deliver a message up to ten times, since MaxDeliveryCount defaults to 10. If the message still hasn't been processed successfully after the last attempt, it moves to the dead-letter queue. You should tune MaxDeliveryCount based on how often you expect transient failures and how long they typically last.
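The delivery-count behavior can be modeled in a few lines. This is a simplified simulation rather than broker code: deliver_with_retries and the result dictionary are hypothetical. The only real identifier here is the reason string MaxDeliveryCountExceeded, which is what Service Bus records on messages dead-lettered this way.

```python
def deliver_with_retries(handler, body, max_delivery_count=10):
    """Simulate redelivery until the handler succeeds or deliveries run out."""
    last_error = None
    for delivery_count in range(1, max_delivery_count + 1):
        try:
            handler(body)
            return {"outcome": "completed", "delivery_count": delivery_count}
        except Exception as exc:
            # A failed attempt releases the message, so it gets delivered again.
            last_error = exc
    return {
        "outcome": "dead-lettered",
        "delivery_count": max_delivery_count,
        "reason": "MaxDeliveryCountExceeded",
        "error": repr(last_error),
    }


attempts = {"count": 0}


def flaky_handler(body):
    # Hypothetical handler that hits a transient fault on its first two attempts.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("downstream dependency not ready")


def broken_handler(body):
    # Hypothetical handler that can never process this payload.
    raise ValueError("malformed payload")


recovered = deliver_with_retries(flaky_handler, "OrderPlaced#1")
poisoned = deliver_with_retries(broken_handler, "OrderPlaced#2", max_delivery_count=4)
```

The flaky handler recovers on the third delivery, well inside the budget. The broken one burns through every attempt and is dead-lettered. A count that is too low turns short outages into dead letters, while one that is too high keeps poison messages circulating.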

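Monitoring the dead-letter queue is crucial. A growing queue depth means business operations are failing. The reason a message ended up there, recorded in its DeadLetterReason property, tells you how to approach fixing it: an exceeded delivery count points at a handler that keeps failing on that message, while an expired time-to-live points at consumers that aren't picking messages up fast enough.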
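A first-pass triage can be as simple as counting dead letters by reason. In the sketch below, plain dictionaries stand in for received messages and the triage notes are hypothetical suggestions, not an official runbook. The two reason strings are the values Service Bus sets for delivery-count and TTL dead-lettering.

```python
from collections import Counter

# Hypothetical triage notes; adapt these to your own runbook.
TRIAGE_NOTES = {
    "MaxDeliveryCountExceeded": "handler keeps failing: inspect payload and error, fix, then resubmit",
    "TTLExpiredException": "expired before pickup: check consumer throughput and TTL settings",
}


def summarize_dead_letters(dead_letters):
    """Count dead-lettered messages per DeadLetterReason."""
    return Counter(msg.get("DeadLetterReason", "Unknown") for msg in dead_letters)


def triage(reason):
    """Map a dead-letter reason to a suggested next step."""
    return TRIAGE_NOTES.get(
        reason, "unrecognized reason: inspect DeadLetterErrorDescription manually"
    )


# Plain dicts standing in for messages read from a dead-letter queue.
sample = [
    {"DeadLetterReason": "MaxDeliveryCountExceeded"},
    {"DeadLetterReason": "TTLExpiredException"},
    {"DeadLetterReason": "MaxDeliveryCountExceeded"},
    {},
]

summary = summarize_dead_letters(sample)
for reason, count in summary.most_common():
    print(f"{count:>3}  {reason}: {triage(reason)}")
```

A breakdown like this makes a good alert payload, because the dominant reason usually tells you whether to look at the handler code or at consumer capacity.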