Kafka has become the default choice for event streaming. It's battle-tested, widely understood, and powerful. But running Kafka in production is where you separate the teams who've thought things through from the teams who are still learning. The patterns were well established by 2021, but knowing which ones to apply, and when, makes the difference between a smooth system and a nightmare at 3am.

The topic partition model

Kafka parallelism is driven by partition count: a topic with N partitions can be consumed in parallel by up to N consumers in a consumer group. Choosing the partition count at topic creation is an important decision because it is difficult to change later without disrupting consumers: adding partitions remaps keys to different partitions, breaking per-key ordering. The rule of thumb: the partition count should be at least the maximum number of consumers you expect to run in parallel for that topic. Over-partitioning has costs too: more open file handles on the brokers and longer leader elections after a failure.
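The key-to-partition mapping can be sketched as follows. This is a simplified stand-in for Kafka's default partitioner (the real one uses murmur2 hashing; CRC32 here is just for illustration), but it shows the two properties that matter: the same key always lands on the same partition, and the mapping depends on the partition count.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition: same key -> same partition.

    Illustrative only; Kafka's Java client uses murmur2, not CRC32.
    """
    return zlib.crc32(key) % num_partitions

# Records sharing a key always land on the same partition, which is
# what gives Kafka its per-key ordering guarantee.
p1 = partition_for(b"user-42", 12)
p2 = partition_for(b"user-42", 12)
assert p1 == p2

# Note: partition_for(key, 12) and partition_for(key, 16) generally
# differ, which is why growing a topic breaks key-based ordering.
```

This is also why keyless (round-robin) topics are much easier to repartition than keyed ones.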

Consumer group offset management

Kafka consumers track their position in each partition via committed offsets. At-least-once delivery (the safe default) means a consumer may process a message more than once if it fails after processing but before committing the offset. Every Kafka consumer should therefore be designed for idempotent message processing. Exactly-once semantics are available via transactions but add significant complexity; most production systems stick with at-least-once delivery and handle idempotency at the application layer.
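A minimal sketch of application-level idempotency under at-least-once delivery. The message IDs and the in-memory set are illustrative assumptions; a production system would typically use a database unique constraint or a key-value store for the dedup record, so that it survives restarts.

```python
class IdempotentProcessor:
    """Skips redelivered messages by remembering processed message IDs."""

    def __init__(self):
        self.seen_ids = set()  # stand-in for a durable dedup store
        self.results = []      # side effects we must not apply twice

    def handle(self, message_id: str, payload: str) -> bool:
        """Process a message once; return False for a duplicate."""
        if message_id in self.seen_ids:
            # Redelivery: the consumer crashed after processing but
            # before committing the offset, so Kafka sent it again.
            return False
        self.seen_ids.add(message_id)
        self.results.append(payload)
        return True

proc = IdempotentProcessor()
proc.handle("m1", "order created")   # processed
proc.handle("m1", "order created")   # redelivered: skipped
```

The important design point is that the dedup check and the side effect should be atomic (e.g. one database transaction); otherwise a crash between them reintroduces the duplicate.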

Confluent Cloud and managed Kafka

Operating Kafka yourself requires real expertise: partition rebalancing, broker scaling, disk management, and version upgrades are non-trivial operational tasks. Confluent Cloud (Confluent's managed Kafka), Amazon MSK (managed Kafka on AWS), and Azure Event Hubs with its Kafka-compatible API offer managed alternatives. The economics favour managed services for organisations without a Kafka-specialist operations team.

Schema Registry for contract enforcement

The Confluent Schema Registry enforces schema contracts for Kafka messages using Avro, Protobuf, or JSON Schema. Producers register their schema; consumers validate incoming messages against the registered schema. Schema evolution rules (backward, forward, or full compatibility) are enforced by the registry, preventing producers from publishing messages that would break existing consumers. This is essential infrastructure for Kafka topics shared across teams.
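The flavour of check the registry performs can be illustrated with Avro's core backward-compatibility rule: a new schema can read data written with the old one only if every field it adds carries a default value. The dict-based schema representation below is a hypothetical simplification, not the real Avro or Schema Registry API.

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Toy backward-compatibility check in the style of Avro's rules.

    Each schema maps field name -> {"default": ...} or {} if the field
    has no default. New fields without defaults break old-data reads.
    """
    added = set(new_fields) - set(old_fields)
    return all("default" in new_fields[f] for f in added)

old = {"id": {}, "amount": {}}

# Adding a field with a default: old records can still be read.
compatible = {"id": {}, "amount": {}, "currency": {"default": "USD"}}

# Adding a required field: old records lack it, so reads would fail.
breaking = {"id": {}, "amount": {}, "currency": {}}

assert is_backward_compatible(old, compatible)
assert not is_backward_compatible(old, breaking)
```

The real registry evaluates these rules on every schema registration and rejects the incompatible version before any producer can publish with it, which is what turns the contract into an enforced gate rather than a convention.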