Azure Cosmos DB is a globally distributed, multi-model NoSQL database. Its unique capabilities are genuine; so are the operational complexity and cost that come with them.
Cosmos DB replicates data across any number of Azure regions transparently. Adding a read region is a portal toggle that requires no application changes. The consistency level, one of five (strong, bounded staleness, session, consistent prefix, and eventual), dictates how the application experiences replication. Strong consistency guarantees that reads reflect the latest committed write, but it adds latency, mostly on writes, which must be replicated across regions before they are acknowledged. Eventual consistency offers the lowest latency but may serve stale data. Session consistency, the default, guarantees that a client reads its own writes within a single session at a reasonable cost.
For example, I have seen strong consistency add as much as 20 milliseconds of latency within a single region, and up to 100 milliseconds when spanning multiple regions. This may be acceptable for some use cases, but for real-time applications the added latency can be detrimental. Eventual consistency, on the other hand, can deliver latency as low as 5 milliseconds, at the risk of serving stale data. In one production scenario, we opted for session consistency, which struck a good balance between latency and data freshness, with an average latency of 15 milliseconds.
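To make the difference between eventual and session consistency concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how Cosmos DB is implemented: the `ToyStore` class, its two replicas, and the use of a sequence number as a session token are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica: a key-value map plus the last sequence number applied."""
    data: dict = field(default_factory=dict)
    lsn: int = 0


class ToyStore:
    """Hypothetical two-replica store; writes land on the primary first."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self._pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary.lsn += 1
        self.primary.data[key] = value
        self._pending.append((self.primary.lsn, key, value))
        return self.primary.lsn  # the caller keeps this as its session token

    def replicate(self):
        """Apply all pending writes to the secondary (replication catches up)."""
        for lsn, key, value in self._pending:
            self.secondary.data[key] = value
            self.secondary.lsn = lsn
        self._pending.clear()

    def read_eventual(self, key):
        # Eventual: return whatever the secondary has, even if it is stale.
        return self.secondary.data.get(key)

    def read_session(self, key, session_token):
        # Session: use the secondary only if it has caught up to the caller's
        # own writes; otherwise fall back to the primary.
        if self.secondary.lsn >= session_token:
            return self.secondary.data.get(key)
        return self.primary.data.get(key)


store = ToyStore()
token = store.write("cart", ["book"])

stale = store.read_eventual("cart")         # secondary has not caught up yet
fresh = store.read_session("cart", token)   # the client still sees its own write
```

In this toy, the eventual read returns nothing until `replicate()` runs, while the session read always reflects the client's own write. The cost of that guarantee is the occasional fallback to a replica that is further away or busier, which is one way to think about the latency gap described above.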
Cosmos DB charges in Request Units (RUs), a normalized measure of the CPU, memory, and I/O consumed per operation. A single point read of a 1 KB item costs 1 RU; a complex cross-partition query can consume hundreds of RUs. Understanding and optimizing RU consumption is crucial for cost management. Design patterns that minimize RU usage include preferring point reads over queries, denormalizing data so that related data can be read from a single item or partition instead of through cross-partition queries, and selecting partition keys that distribute reads evenly. We have used tools like Azure Monitor to track RU consumption and identify areas for optimization, often finding that a small change in the data model or query pattern yields significant cost savings.
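The arithmetic behind that last point can be sketched in a few lines of Python. The workload rates and the 150 RU query charge below are hypothetical placeholders; only the 1 RU charge for a 1 KB point read comes from the text above. Real charges should be measured per operation rather than assumed.

```python
# Assumed per-operation charges in RUs. Only the 1 RU point read comes from
# the text above; the query figure is a hypothetical placeholder.
RU_PER_OP = {
    "point_read_1kb": 1.0,
    "cross_partition_query": 150.0,  # assumption: real charges vary by query
}


def required_ru_per_second(ops_per_second: dict[str, float],
                           ru_per_op: dict[str, float] = RU_PER_OP) -> float:
    """Sum of (operations per second x RUs per operation) over the workload."""
    return sum(rate * ru_per_op[op] for op, rate in ops_per_second.items())


# Hypothetical workload: 500 point reads/s plus 20 cross-partition queries/s.
before = required_ru_per_second({"point_read_1kb": 500, "cross_partition_query": 20})

# Same workload after denormalizing so that each query becomes one point read.
after = required_ru_per_second({"point_read_1kb": 520})

print(before, after)  # 3500.0 520.0
```

Under these assumptions, the 20 queries per second account for most of the provisioned throughput even though they make up under 4% of the operations. The "after" figure is optimistic: a denormalized item is usually larger than 1 KB, so its point read costs more than 1 RU, and the extra writes needed to keep copies in sync are not modeled here.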
The partition key determines how data is distributed across logical partitions and, in turn, across physical partitions within Cosmos DB. A poor partition key choice leads to hot partitions (where a disproportionate share of traffic targets a single key) or sparse partitions, which limits throughput and increases costs. The partition key should have high cardinality (many distinct values), be evenly distributed in access frequency, and align with the primary access pattern, typically the entity's owning ID. In one instance, we found that using a composite (synthetic) partition key built from the user ID and a secondary attribute reduced hot partitioning by 90%, resulting in a 30% decrease in RU consumption.
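A quick way to evaluate a candidate partition key before committing to it is to replay a sample of traffic and measure how much of it lands on the busiest key. The sketch below does this in Python; the write log, the `hottest_share` and `synthetic_key` helpers, and the `user-day` key format are all hypothetical and stand in for whatever attributes the real workload has.

```python
from collections import Counter


def hottest_share(partition_keys: list[str]) -> float:
    """Fraction of all operations that land on the single busiest key."""
    counts = Counter(partition_keys)
    return max(counts.values()) / len(partition_keys)


def synthetic_key(user_id: str, secondary: str) -> str:
    """Combine two attributes into one partition key value (hypothetical format)."""
    return f"{user_id}-{secondary}"


# Hypothetical write log of (user_id, day) pairs. User "u1" is far busier than "u2".
days = ["mon", "tue", "wed", "thu"]
writes = [("u1", d) for d in days for _ in range(20)]   # 80 writes
writes += [("u2", d) for d in days for _ in range(5)]   # 20 writes

by_user = hottest_share([user for user, _ in writes])
by_synthetic = hottest_share([synthetic_key(user, day) for user, day in writes])

print(by_user, by_synthetic)  # 0.8 0.2
```

With the user ID alone, one key absorbs 80% of the writes; the combined key caps the busiest key at 20%. The trade-off is that reading everything for one user now touches several logical partitions, so the secondary attribute should be one the main queries already filter on.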
Cosmos DB justifies its complexity when active-active multi-region writes are a requirement, a feature few databases support. It is also suitable when latency targets are below 10 ms in every region served, or when its multi-model support for document, key-value, graph, and column-family data provides genuine flexibility. For workloads that don't demand these specific capabilities, Azure SQL, Azure Database for PostgreSQL, or even Redis can often meet requirements with lower cost and less operational overhead.