The Kubernetes operator pattern, introduced by CoreOS in 2016, enables management of complex stateful applications using the same reconciliation model that Kubernetes uses for built-in resources.

What an operator does

A Kubernetes operator extends the Kubernetes API with a custom resource definition (CRD) and a controller that manages the lifecycle of the corresponding custom resources. The controller observes the desired state (declared in the custom resource) and the actual state (what is running in the cluster), then acts to bring the actual state in line with the desired state. This is the same reconciliation loop that the built-in Kubernetes controllers use for Deployments and StatefulSets.
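The loop described above can be sketched without any Kubernetes dependency. The following is a minimal illustration, not controller-runtime's API: the DesiredState and Cluster types and the one-step-per-pass policy are assumptions made for the example.

```go
package main

import "fmt"

// DesiredState stands in for the spec of a custom resource (hypothetical type).
type DesiredState struct {
	Replicas int
}

// Cluster stands in for the actual state observed in the cluster (hypothetical type).
type Cluster struct {
	Running int
}

// Reconcile compares desired and actual state and takes at most one
// corrective step. It returns true when another pass is still needed.
func Reconcile(desired DesiredState, c *Cluster) (requeue bool) {
	switch {
	case c.Running < desired.Replicas:
		c.Running++ // start one missing replica
	case c.Running > desired.Replicas:
		c.Running-- // stop one surplus replica
	default:
		return false // actual already matches desired; nothing to do
	}
	return c.Running != desired.Replicas
}

func main() {
	desired := DesiredState{Replicas: 3}
	cluster := &Cluster{Running: 1}

	passes := 1
	for Reconcile(desired, cluster) {
		passes++
	}
	fmt.Println("running:", cluster.Running, "passes:", passes)
}
```

Note that Reconcile never receives an instruction such as "scale up by two". It only sees the two states and works out the difference itself, which is what makes the loop safe to re-run after a crash or a missed event.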

When to build an operator

Operators are appropriate for: stateful applications with complex lifecycle management (databases, message queues, monitoring stacks), applications where the operational knowledge of how to upgrade, scale, and recover needs to be encoded in software, and abstractions that your organisation will use across many clusters. Building an operator for a simple deployment that a Helm chart handles adequately adds complexity without benefit.

kubebuilder and controller-runtime

The kubebuilder framework is the standard approach for writing Go-based Kubernetes operators. Its CLI scaffolds the project layout, the API types, and a controller with an empty reconciliation function; its companion tool controller-gen then generates the CRD schema and RBAC manifests from marker comments in the Go code. controller-runtime, the underlying library, provides the client-go wrappers, informer-backed caches, and reconciliation work queue that every controller would otherwise have to assemble by hand. For Go operators, the Operator SDK builds on kubebuilder and adds tooling for testing and for packaging and publishing operators.
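One property of the reconciliation queue is worth illustrating: keys are deduplicated, so a burst of events for the same object collapses into a single reconcile. The sketch below is a simplified, single-goroutine stand-in written for this article. The Queue type and its methods are hypothetical and are not the client-go or controller-runtime API, which additionally handles concurrency, rate limiting, and retries.

```go
package main

import "fmt"

// Queue is a simplified stand-in for a controller work queue (hypothetical type).
// It holds namespace/name keys rather than full objects.
type Queue struct {
	order   []string        // keys in arrival order
	pending map[string]bool // keys currently waiting in the queue
}

// NewQueue returns an empty queue.
func NewQueue() *Queue {
	return &Queue{pending: make(map[string]bool)}
}

// Add enqueues key unless the same key is already waiting.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return // duplicate event for an object that is already queued
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are waiting.
func (q *Queue) Len() int { return len(q.order) }

func main() {
	q := NewQueue()
	// Three events arrive, two of them for the same object.
	for _, key := range []string{"default/db-a", "default/db-b", "default/db-a"} {
		q.Add(key)
	}
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("reconcile", key)
	}
}
```

Queueing keys instead of event payloads is what pushes a reconciler towards reading current state on every pass rather than reacting to the details of an individual change.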

Production operator patterns

Production operators typically implement four patterns: idempotent reconciliation (the reconciler produces the same result no matter how many times it runs against the same state), finalizers (to run cleanup logic before a resource is deleted), status conditions (to communicate the resource's state to humans and other controllers), and event recording (for debugging and auditing). Operators that skip these patterns are harder to debug and operate.
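Three of these patterns (idempotent reconciliation, finalizers, and status conditions) can be shown together in a dependency-free sketch. Everything here is an assumption made for illustration: the Resource and Condition types only loosely mirror Kubernetes object metadata, the finalizer name example.com/cleanup is hypothetical, and the Deleting flag stands in for a set deletion timestamp. Event recording is left out.

```go
package main

import "fmt"

// finalizerName is a hypothetical finalizer owned by this operator.
const finalizerName = "example.com/cleanup"

// Condition is a simplified status condition (hypothetical type).
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// Resource is a simplified custom resource (hypothetical type).
type Resource struct {
	Finalizers []string
	Deleting   bool // stands in for a set deletion timestamp
	Conditions []Condition
}

func hasFinalizer(r *Resource) bool {
	for _, f := range r.Finalizers {
		if f == finalizerName {
			return true
		}
	}
	return false
}

func removeFinalizer(r *Resource) {
	var kept []string
	for _, f := range r.Finalizers {
		if f != finalizerName {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

// setCondition updates the condition of the same Type in place, or appends it,
// so repeated calls never create duplicates.
func setCondition(r *Resource, c Condition) {
	for i := range r.Conditions {
		if r.Conditions[i].Type == c.Type {
			r.Conditions[i] = c
			return
		}
	}
	r.Conditions = append(r.Conditions, c)
}

// Reconcile is safe to call any number of times: each step first checks
// whether its work has already been done.
func Reconcile(r *Resource, cleanup func()) {
	if r.Deleting {
		if hasFinalizer(r) {
			cleanup()          // release external resources exactly once
			removeFinalizer(r) // then allow the deletion to complete
		}
		return
	}
	if !hasFinalizer(r) {
		r.Finalizers = append(r.Finalizers, finalizerName)
	}
	setCondition(r, Condition{Type: "Ready", Status: "True", Reason: "Reconciled"})
}

func main() {
	cleanups := 0
	cleanup := func() { cleanups++ }

	r := &Resource{}
	Reconcile(r, cleanup)
	Reconcile(r, cleanup) // second pass changes nothing
	fmt.Println("finalizers:", len(r.Finalizers), "conditions:", len(r.Conditions))

	r.Deleting = true
	Reconcile(r, cleanup)
	Reconcile(r, cleanup) // cleanup is not repeated
	fmt.Println("cleanups:", cleanups, "finalizers:", len(r.Finalizers))
}
```

The design choice to note is that cleanup is guarded by the presence of the finalizer rather than by the deletion flag alone. That guard is what keeps a repeated reconcile from running the cleanup twice.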