The Kubernetes operator pattern extends Kubernetes with custom resources and controllers that manage the lifecycle of complex applications. Building a production-grade operator requires understanding the reconciliation model.

The reconciliation loop

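A Kubernetes controller implements a reconciliation loop: observe the desired state (from the custom resource spec), observe the actual state (from the cluster and external systems), and take actions to make the actual state match the desired state. The reconcile function is called when a watched resource changes and on a periodic resync. The loop must be idempotent: running it multiple times against the same state produces the same result. It must also tolerate repeated and redundant invocations, because the controller may receive several reconcile requests in quick succession. In controller-runtime, the work queue collapses duplicate requests for the same object and never reconciles a single object concurrently, but different objects can be reconciled in parallel when MaxConcurrentReconciles is greater than 1, so any state shared across objects must be safe for concurrent use.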
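The idempotency requirement can be illustrated with a small in-memory model. This is a sketch rather than controller-runtime code: the Cluster class, the reconcile function, and the action strings are all hypothetical names chosen for the example, and the "cluster" is just a dictionary of replica counts.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for actual state: replica counts keyed by app name."""
    replicas: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Drive the cluster toward `desired` and return the actions taken.

    Decisions depend only on the current difference between desired and
    actual state, never on which event triggered the call, so repeating
    the call after convergence does nothing.
    """
    actions = []
    for name, want in desired.items():
        have = cluster.replicas.get(name)
        if have is None:
            cluster.replicas[name] = want
            actions.append(f"create {name} replicas={want}")
        elif have != want:
            cluster.replicas[name] = want
            actions.append(f"scale {name} {have}->{want}")
    # Remove anything that exists but is no longer desired.
    for name in sorted(set(cluster.replicas) - set(desired)):
        del cluster.replicas[name]
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    cluster = Cluster(replicas={"web": 1, "old": 2})
    desired = {"web": 3, "api": 1}
    print(reconcile(desired, cluster))  # first pass performs the changes
    print(reconcile(desired, cluster))  # second pass finds nothing left to do
```

Because the function compares states instead of reacting to individual events, a missed or duplicated request is harmless: the next invocation sees the same difference and either repairs it or does nothing.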

Status subresource

Custom resources should enable the status subresource and use it to report the actual state of what the controller manages: for example, the generation of the Deployment it is managing, the current replica count, and any error conditions. The controller updates the status after reconciliation, which lets users compare the current state with the desired state. Conditions, a conventional list of entries that each carry a type, a status (True, False, or Unknown), a reason, and a message, provide machine-readable status for tooling and automation.
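As an illustration of how a conditions list is usually maintained, the sketch below upserts one condition per type and only moves lastTransitionTime when the status value actually flips. The set_condition helper and its now parameter are hypothetical; the dictionary keys follow the field names commonly used for Kubernetes conditions, and the condition values shown are made up for the example.

```python
from datetime import datetime, timezone


def set_condition(conditions, type_, status, reason, message,
                  observed_generation, now=None):
    """Insert or update the condition of the given type, in place.

    lastTransitionTime changes only when `status` flips, so it records when
    the condition last changed state, not when it was last written.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    for cond in conditions:
        if cond["type"] == type_:
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = now
            cond["reason"] = reason
            cond["message"] = message
            cond["observedGeneration"] = observed_generation
            return conditions
    # No condition of this type yet: append a new entry.
    conditions.append({
        "type": type_,
        "status": status,
        "reason": reason,
        "message": message,
        "observedGeneration": observed_generation,
        "lastTransitionTime": now,
    })
    return conditions


if __name__ == "__main__":
    conds = []
    set_condition(conds, "Ready", "False", "Progressing", "1/3 replicas ready", 2)
    set_condition(conds, "Ready", "True", "AllReplicasReady", "3/3 replicas ready", 2)
    print(conds)
```

Keeping a single entry per type, instead of appending a new one on every reconcile, is what lets tooling look up a condition such as Ready by name and trust its transition time.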

Finalizers for cleanup

When a custom resource is deleted, the operator may need to clean up external resources before the Kubernetes object is removed. Finalizers are strings in the resource's metadata.finalizers list. While the list is non-empty, a delete request only sets metadata.deletionTimestamp; the API server removes the object once every finalizer has been removed. The operator adds its finalizer when it first reconciles the resource, performs cleanup when it observes a deletion timestamp, and removes the finalizer once cleanup is complete.
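The add, clean up, remove sequence can be sketched as a single reconcile branch. This is a simplified model with assumptions stated up front: the object is a plain dictionary, the external system is a set of names, and the finalizer name example.com/cleanup and the return strings are hypothetical.

```python
FINALIZER = "example.com/cleanup"  # hypothetical finalizer name


def reconcile_finalizer(obj: dict, external: set) -> str:
    """Handle one reconcile pass for `obj`, tracking external resources in `external`."""
    meta = obj["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    name = meta["name"]

    if meta.get("deletionTimestamp") is None:
        # Object is live: register the finalizer before creating anything
        # external, so a later delete cannot skip cleanup.
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
            return "finalizer-added"
        external.add(name)
        return "synced"

    # Object is being deleted: clean up, then release it.
    if FINALIZER in finalizers:
        external.discard(name)  # discard is a no-op if already gone, so retries are safe
        finalizers.remove(FINALIZER)
        return "cleaned-up"
    return "nothing-to-do"


if __name__ == "__main__":
    obj = {"metadata": {"name": "db-1"}}
    external = set()
    print(reconcile_finalizer(obj, external))
    print(reconcile_finalizer(obj, external))
    obj["metadata"]["deletionTimestamp"] = "2024-01-01T00:00:00Z"
    print(reconcile_finalizer(obj, external))
```

The cleanup step is written to succeed when the external resource is already gone. If the controller crashes after cleanup but before removing the finalizer, the next reconcile repeats the cleanup safely and then releases the object.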

Controller-runtime testing

The controller-runtime envtest package runs a local Kubernetes API server and etcd for integration testing of controllers. Because the API server is real, tests can create custom resources, trigger reconciliations, and assert on the resulting state. envtest does not run the built-in controllers or a kubelet, so objects such as Deployments are stored but never produce running Pods. Unit testing the reconcile function directly, without an API server, requires a substitute for the client interface; controller-runtime provides a fake client (in the pkg/client/fake package) for this.
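The unit-testing approach, calling the reconcile function against an in-memory client and asserting on what it stored, can be modeled in a few lines. The sketch below is a language-neutral illustration in Python and does not use the controller-runtime API: FakeClient, its get and create methods, the App kind, and the return strings are all hypothetical.

```python
class FakeClient:
    """Hypothetical in-memory client: objects keyed by (kind, name)."""

    def __init__(self, *objects):
        self.store = {(o["kind"], o["metadata"]["name"]): o for o in objects}

    def get(self, kind, name):
        return self.store.get((kind, name))

    def create(self, obj):
        key = (obj["kind"], obj["metadata"]["name"])
        if key in self.store:
            raise KeyError(f"{key} already exists")
        self.store[key] = obj


def reconcile(client, name):
    """Ensure a Deployment exists for the named App custom resource."""
    app = client.get("App", name)
    if app is None:
        return "not-found"  # resource was deleted; nothing to do
    if client.get("Deployment", name) is None:
        client.create({
            "kind": "Deployment",
            "metadata": {"name": name},
            "spec": {"replicas": app["spec"]["replicas"]},
        })
        return "created"
    return "unchanged"


def test_creates_deployment_once():
    app = {"kind": "App", "metadata": {"name": "web"}, "spec": {"replicas": 2}}
    client = FakeClient(app)
    assert reconcile(client, "web") == "created"
    assert client.get("Deployment", "web")["spec"]["replicas"] == 2
    # A second pass must not try to create the Deployment again.
    assert reconcile(client, "web") == "unchanged"


def test_missing_resource_is_ignored():
    assert reconcile(FakeClient(), "gone") == "not-found"


if __name__ == "__main__":
    test_creates_deployment_once()
    test_missing_resource_is_ignored()
    print("ok")
```

Tests of this kind run quickly and need no external binaries, but a fake client does not reproduce API server behavior such as validation or admission, so they complement envtest-based integration tests and do not replace them.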