When you split a monolith into microservices, communication becomes one of your hardest problems. Services need to talk to each other reliably. Get this wrong and you end up with slow, fragile systems that are harder to debug than the monolith you replaced.

Here are the strategies that matter.

Service discovery

In a static deployment, you can hardcode service addresses. That breaks as soon as services scale up, restart, or move. Service discovery solves this. Instances register on startup (or the platform registers them for you), and callers look up other services by name. Kubernetes has this built in through its DNS and Service abstraction. For non-Kubernetes environments, Consul and Netflix Eureka are common choices.
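
To make the register-and-look-up flow concrete, here is a minimal in-memory sketch. It is not how Kubernetes, Consul, or Eureka work internally. `ServiceRegistry`, the heartbeat TTL, and the addresses are all hypothetical names chosen for illustration.

```python
import random
import time


class ServiceRegistry:
    """Toy registry: instances register by name and expire without heartbeats."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._instances = {}         # name -> {address: last_heartbeat_time}

    def register(self, name, address):
        self._instances.setdefault(name, {})[address] = self._clock()

    # A heartbeat just refreshes the timestamp, so it shares the implementation.
    heartbeat = register

    def deregister(self, name, address):
        self._instances.get(name, {}).pop(address, None)

    def lookup(self, name):
        """Return the addresses whose last heartbeat is within the TTL."""
        now = self._clock()
        seen = self._instances.get(name, {})
        return sorted(a for a, t in seen.items() if now - t <= self._ttl)

    def pick(self, name):
        """Client-side load balancing: choose one live instance at random."""
        live = self.lookup(name)
        if not live:
            raise LookupError(f"no live instances of {name!r}")
        return random.choice(live)
```

A caller asks for `registry.pick("orders")` instead of holding a fixed address. An instance that stops sending heartbeats drops out of `lookup` once the TTL passes.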

API gateway

As the number of services grows, you don't want every client to know about every service. An API gateway is the single entry point. It handles routing, load balancing, authentication, and protocol translation. Kong, Ambassador, and Azure API Management are all solid options. The gateway also gives you one place to enforce rate limiting and collect traffic metrics.
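
Two of those gateway jobs, routing and rate limiting, fit in a short sketch. This is an illustration under assumptions, not how Kong or any other product is implemented. The `Gateway` and `TokenBucket` classes, the route table, and the upstream URLs are hypothetical, and the sketch only decides where a request should go without proxying it.

```python
import time


class TokenBucket:
    """Allows `capacity` requests in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock):
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated_at = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    def __init__(self, routes, rate_per_sec=5, burst=10, clock=time.monotonic):
        # Longest prefix first, so "/orders/export" wins over "/orders".
        self._routes = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)
        self._rate_per_sec = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._buckets = {}  # client_id -> TokenBucket

    def resolve(self, client_id, path):
        """Return (status, upstream): 429 if throttled, 404 if unrouted, else 200."""
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = self._buckets[client_id] = TokenBucket(
                self._rate_per_sec, self._burst, self._clock
            )
        if not bucket.allow():
            return 429, None
        for prefix, upstream in self._routes:
            # Match whole path segments only, so "/ordersx" does not hit "/orders".
            if path == prefix or path.startswith(prefix + "/"):
                return 200, upstream
        return 404, None
```

Clients only ever see the gateway's paths. Which service sits behind `/orders` can change without any client noticing.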

Communication protocols

HTTP/HTTPS is the default for synchronous communication between services. It is simple and works with most tooling. gRPC is a better choice when you need high throughput or are calling across language boundaries, because clients and servers in different languages are generated from one shared schema. It runs over HTTP/2 and uses Protocol Buffers for serialization, a binary format that is typically smaller and faster to parse than JSON. For async communication, use a message broker. Apache Kafka handles high-volume event streams well. RabbitMQ is simpler and works for most workloads.
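
To see why a binary encoding tends to be more compact than JSON, here is a rough sketch using Python's `struct` module. This is not Protocol Buffers, which uses tagged, variable-length fields and supports schema evolution. The fixed record layout and field names are assumptions made up for the comparison.

```python
import json
import struct

# Hypothetical layout: order_id (uint64), quantity (uint32), price_cents (uint32),
# little-endian with no padding, so every record is exactly 16 bytes.
ORDER_LAYOUT = struct.Struct("<QII")


def encode_json(order):
    """Text encoding: field names travel with every message."""
    return json.dumps(order, separators=(",", ":")).encode("utf-8")


def encode_binary(order):
    """Binary encoding: both sides must already agree on the layout."""
    return ORDER_LAYOUT.pack(order["order_id"], order["quantity"], order["price_cents"])


def decode_binary(data):
    order_id, quantity, price_cents = ORDER_LAYOUT.unpack(data)
    return {"order_id": order_id, "quantity": quantity, "price_cents": price_cents}
```

The binary form is smaller because the schema lives in the code on both ends instead of in each message. That shared schema is also the cost: you can't read the payload without it.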

Synchronous vs. asynchronous

Synchronous communication is simple but creates tight coupling. If Service B is slow or down, Service A blocks. Use it when you need an immediate response. Async messaging decouples services: Service A publishes an event and Service B processes it when ready. This improves resilience and throughput but makes debugging harder, because cause and effect are no longer in the same call chain.
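
The decoupling is easiest to see when the consumer is not even running at publish time. A minimal sketch, with an in-process queue standing in for a real broker such as Kafka or RabbitMQ. `InMemoryBroker` and the event shape are hypothetical.

```python
import queue
import threading


class InMemoryBroker:
    """Stand-in for a real broker: one in-process FIFO queue."""

    def __init__(self):
        self._queue = queue.Queue()

    def publish(self, event):
        # Returns immediately; no consumer needs to be running yet.
        self._queue.put(event)

    def consume_until_closed(self, handler):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel used by this sketch to stop the worker
                break
            handler(event)


def demo():
    broker = InMemoryBroker()
    processed = []

    # Service A publishes while Service B is still down.
    for order_id in (1, 2, 3):
        broker.publish({"type": "order_placed", "order_id": order_id})
    broker.publish(None)

    # Service B starts later and drains the backlog in order.
    worker = threading.Thread(
        target=broker.consume_until_closed, args=(processed.append,)
    )
    worker.start()
    worker.join()
    return [event["order_id"] for event in processed]
```

Service A never waited on Service B here. The trade-off is that Service A also has no idea whether, or when, its events were handled.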

Circuit breaker

In distributed systems, failures cascade. If Service B is slow, requests pile up in Service A. Eventually Service A runs out of threads and fails too. The circuit breaker pattern stops this. When calls to a service fail repeatedly, the circuit "opens" and further calls fail fast instead of waiting for a timeout. After a cooldown, the circuit goes "half-open" and lets a few calls through to check recovery, closing again if they succeed. Resilience4j implements this for JVM services. Netflix Hystrix does too, but it is in maintenance mode, so prefer Resilience4j for new code. Polly is the .NET option I use at Microsoft.
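
The state machine is small enough to sketch in full. This is a simplified, single-threaded illustration, not the Hystrix, Resilience4j, or Polly implementation. The class name, the thresholds, and the choice to count consecutive failures are assumptions. Production libraries usually track failure rates over a sliding window and cap the number of trial calls.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a downstream that is presumed unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_seconds:
            return "half-open"  # cooldown over: trial calls are allowed
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("failing fast without calling downstream")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures while
            # closed, (re)opens the circuit and restarts the cooldown.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrap each outbound call as `breaker.call(fetch_inventory, item_id)`, where `fetch_inventory` is whatever client function you already have. While the circuit is open, callers get `CircuitOpenError` right away and can fall back to a cached or default response.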

Observability

Distributed systems are hard to debug without good tooling. Set up distributed tracing with Jaeger or Azure Application Insights so you can follow a request across service boundaries. Use Prometheus and Grafana for metrics. Structure logs with correlation IDs so you can tie entries from different services to the same request.
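
Correlation IDs are the cheapest of these to add yourself. A minimal sketch with the standard `logging` module follows. The `X-Correlation-ID` header name is a common convention rather than a standard, and `make_logger`, `handle_request`, and the service names are hypothetical.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line, always including the correlation ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
        })


def make_logger(service_name, stream):
    logger = logging.getLogger(service_name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(incoming_headers, logger):
    # Reuse the caller's ID if present; otherwise this service starts the chain.
    cid = incoming_headers.get("X-Correlation-ID") or uuid.uuid4().hex
    correlation_id.set(cid)
    logger.info("request received")
    return {"X-Correlation-ID": cid}  # headers to forward on outgoing calls


def demo():
    stream = io.StringIO()
    orders = make_logger("orders", stream)
    billing = make_logger("billing", stream)
    outgoing = handle_request({}, orders)   # request enters at "orders"
    handle_request(outgoing, billing)       # "orders" then calls "billing"
    return [json.loads(line) for line in stream.getvalue().splitlines()]
```

Both entries returned by `demo()` carry the same `correlation_id`, so a log search on that one value pulls up the whole request across services.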

Security

Never trust inter-service communication by default. Use mutual TLS (mTLS) so services verify each other's identity. Use OAuth 2.0 with JWTs for authorization, and RBAC to limit what each service can do. Encrypt everything in transit. In Kubernetes, network policies let you restrict which pods can talk to which.
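
Here is roughly what the token check plus RBAC looks like on the receiving side. Treat this as a teaching sketch only. It hand-rolls HS256 signing with a shared secret so it stays self-contained, whereas real deployments should use a vetted JWT library and usually asymmetric keys issued by an identity provider. The roles, permissions, and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical role table: which permissions each calling service holds.
ROLE_PERMISSIONS = {
    "billing-service": {"orders:read"},
    "orders-service": {"orders:read", "orders:write"},
}


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(header, payload, secret):
    return hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()


def sign(claims, secret):
    """Build a compact HS256 token: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_b64url(_signature(header, payload, secret))}"


def verify(token, secret, now=None):
    """Check signature and expiry; return the claims or raise PermissionError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header, payload, signature = parts
    expected = _signature(header, payload, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    return claims


def authorize(token, secret, permission, now=None):
    """RBAC check: the caller's role must grant the requested permission."""
    claims = verify(token, secret, now)
    if permission not in ROLE_PERMISSIONS.get(claims.get("role"), set()):
        raise PermissionError(f"role lacks {permission!r}")
    return claims
```

The order matters: verify who is calling first, then check what that caller is allowed to do. An unknown role gets an empty permission set, so the default is deny.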

Good microservice communication design means being deliberate about each of these decisions, not leaving them as an afterthought.