Kubernetes RBAC in Production

Kubernetes RBAC controls access to the Kubernetes API, which makes it central to securing production clusters. Implementing least-privilege RBAC means systematically analyzing what each workload and human operator actually needs.

In recent Kubernetes versions the default service account in each namespace has no RBAC permissions of its own, but I've come across legacy configurations and some cluster setups that grant it broad permissions. This is a problem because any pod that does not name a service account runs as the default one and inherits whatever it has been granted. Applications that do not need Kubernetes API access should instead use a dedicated service account with no RBAC permissions.
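To find such legacy grants, one option is to export every binding and look for subjects that name the default service account. The following is a minimal sketch, assuming the input is the parsed output of `kubectl get rolebindings,clusterrolebindings -A -o json`; the function name `default_sa_bindings` is my own and not part of any tool.

```python
def default_sa_bindings(binding_list):
    """Return (kind, namespace, binding name, role name) for every binding
    that grants a role to a ServiceAccount named "default"."""
    findings = []
    for item in binding_list.get("items", []):
        # "subjects" can be missing or null on a binding without subjects.
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount" and subject.get("name") == "default":
                meta = item.get("metadata", {})
                findings.append((
                    item.get("kind"),
                    meta.get("namespace", ""),  # empty for ClusterRoleBindings
                    meta.get("name"),
                    item.get("roleRef", {}).get("name"),
                ))
                break  # one finding per binding is enough
    return findings
```

Each finding is a candidate for removal once the workload has been moved to its own service account. For workloads that never call the API, setting `automountServiceAccountToken: false` on the service account or pod spec also keeps the token out of the pod.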

Roles and ClusterRoles differ in scope, with Roles being namespace-scoped and ClusterRoles being cluster-scoped. I think application RBAC should use namespace-scoped Roles, so the permissions for a service account apply only within its namespace. ClusterRoles are better suited for cluster-wide resources and operators that need cluster-wide visibility, such as Prometheus or cert-manager.

For example, when using a tool like Prometheus, you would create a ClusterRole that grants get, list, and watch permissions on nodes and pods across all namespaces, and then bind that ClusterRole to the Prometheus service account. This allows Prometheus to scrape metrics from all nodes and pods in the cluster, without granting it unnecessary permissions.
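As a rough illustration of that setup, the sketch below renders the ClusterRole and ClusterRoleBinding as JSON, which kubectl accepts just like YAML. The object name `prometheus-read`, the `monitoring` namespace, and the `prometheus` service account are assumptions for the example, and the rule covers only the nodes and pods mentioned above; a real Prometheus installation usually needs a few more read-only rules.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def prometheus_rbac(namespace="monitoring", service_account="prometheus"):
    """Build a read-only ClusterRole and the ClusterRoleBinding that grants it."""
    cluster_role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "prometheus-read"},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["nodes", "pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": "prometheus-read"},
        "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": "prometheus-read"},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [cluster_role, binding]}


print(json.dumps(prometheus_rbac(), indent=2))
```

Saved to a file, the output can be piped straight into `kubectl apply -f -`.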

Binding a ClusterRole cluster-wide to a namespace-level workload is a common over-permissioning pattern that I've seen. Instead, you should use namespace-scoped Roles to limit the permissions of service accounts to their own namespace, so that a compromised workload cannot reach sensitive resources elsewhere in the cluster.

When creating Roles, it's also important to consider the specific verbs that are required for each operation. For instance, an application may only need update and patch permissions on its own config map, rather than full edit permissions. kubectl auth can-i does not discover what a workload needs, but it lets you check what a service account is and is not allowed to do, so you can confirm that a Role grants only the permissions you intended.
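Here is a minimal sketch of such a narrowly scoped Role, again rendered as JSON for kubectl. The namespace `payments`, the service account `payments-api`, and the config map `payments-config` are hypothetical names chosen for illustration.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def configmap_writer_rbac(namespace, service_account, configmap):
    """Build a Role limited to update/patch on one ConfigMap, plus its RoleBinding."""
    name = f"{service_account}-configmap-writer"
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["configmaps"],
                "resourceNames": [configmap],  # restrict the rule to this one object
                "verbs": ["update", "patch"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "roleRef": {"apiGroup": RBAC_API, "kind": "Role", "name": name},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


# Hypothetical names, used only for illustration.
print(json.dumps(configmap_writer_rbac("payments", "payments-api", "payments-config"), indent=2))
```

After applying it, `kubectl auth can-i patch configmaps/payments-config --as=system:serviceaccount:payments:payments-api -n payments` should answer yes and the same check with `delete` should answer no, assuming no other binding grants that account more. A client that reads the config map before writing it would also need the get verb.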

To implement the principle of least privilege in practice, you need to identify every service account in the cluster, determine what Kubernetes API operations each service actually performs, and create Roles with exactly those permissions. Audit logs show which operations a service account really performs, and kubectl auth can-i shows what it is currently allowed to do; comparing the two tells you which permissions and ClusterRole bindings are unused and can be removed.
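The audit-log half of that process can be sketched as follows. This assumes the API server writes audit events as JSON lines (the log backend's usual format) and that the log covers a representative period; the function name `rules_from_audit_log` is hypothetical.

```python
import json
from collections import defaultdict

SA_PREFIX = "system:serviceaccount:"


def rules_from_audit_log(lines):
    """Map each service account to the Role rules implied by its audited requests.

    `lines` is an iterable of JSON-encoded audit events, one per line.
    Returns {(namespace, name): [rule, ...]}, one rule per (apiGroup, resource).
    """
    observed = defaultdict(lambda: defaultdict(set))
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        username = event.get("user", {}).get("username", "")
        ref = event.get("objectRef")
        if not username.startswith(SA_PREFIX) or not ref:
            continue  # skip humans, other components, and non-resource URLs
        parts = username[len(SA_PREFIX):].split(":", 1)
        if len(parts) != 2:
            continue
        resource = ref.get("resource", "")
        if ref.get("subresource"):
            resource += "/" + ref["subresource"]  # e.g. pods/log
        key = (ref.get("apiGroup", ""), resource)
        observed[tuple(parts)][key].add(event["verb"])

    return {
        account: [
            {"apiGroups": [group], "resources": [resource], "verbs": sorted(verbs)}
            for (group, resource), verbs in sorted(seen.items())
        ]
        for account, seen in observed.items()
    }
```

Feeding it an open audit log file yields, per service account, a rule list that can go straight into a Role. The sketch ignores which namespace each request targeted and counts denied requests too, so the result is a starting point for review rather than a finished policy.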

In one production cluster I managed, we had over 50 service accounts, each with its own set of permissions. By using namespace-scoped Roles and carefully limiting the permissions of each service account, we were able to reduce the number of ClusterRole bindings by over 70%. This significantly reduced the risk of a security breach, and made it easier to manage and audit our RBAC configuration.

The kubectl auth reconcile command takes a manifest of RBAC objects and reconciles the rules and subjects in the cluster with it, which is useful for keeping the live RBAC configuration consistent with your intended permissions. Regular RBAC audits using tools like rbac-audit or Rakkess can also help identify over-privileged accounts.

Human operator access should be time-limited and scoped by role. I think the common operator roles are read-only, developer, and platform admin, each with their own set of permissions. For example, a read-only role might have get, list, and watch permissions on namespaced resources for debugging purposes.
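One way to express the read-only role is a single ClusterRole that is bound per namespace with RoleBindings, so the same rule set is reused without granting cluster-wide access. The sketch below assumes operators arrive as members of a group supplied by your identity provider; the group name, the `operator-read-only` object name, and the exact resource list are illustrative choices of mine, and secrets are deliberately left out.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"
READ_VERBS = ["get", "list", "watch"]


def read_only_rbac(group, namespaces):
    """Build one read-only ClusterRole and a RoleBinding for `group` in each namespace."""
    cluster_role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "operator-read-only"},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["pods", "pods/log", "services", "configmaps", "events"],
                "verbs": READ_VERBS,
            },
            {
                "apiGroups": ["apps"],
                "resources": ["deployments", "replicasets", "statefulsets", "daemonsets"],
                "verbs": READ_VERBS,
            },
        ],
    }
    bindings = [
        {
            "apiVersion": f"{RBAC_API}/v1",
            "kind": "RoleBinding",  # namespaced binding, so access stops at this namespace
            "metadata": {"name": "operator-read-only", "namespace": namespace},
            "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": "operator-read-only"},
            "subjects": [{"kind": "Group", "apiGroup": RBAC_API, "name": group}],
        }
        for namespace in namespaces
    ]
    return {"apiVersion": "v1", "kind": "List", "items": [cluster_role] + bindings}


# Hypothetical group and namespaces, used only for illustration.
print(json.dumps(read_only_rbac("oidc:debuggers", ["payments", "web"]), indent=2))
```

Because a RoleBinding that references a ClusterRole grants its rules only inside the binding's own namespace, adding or removing a namespace from the list is all it takes to widen or narrow an operator group's view.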

Using tools like Kubernetes RBAC manager, you can also automate the process of creating and managing Roles and ClusterRoles, and ensure that your RBAC configuration is consistent across all clusters and namespaces. This can help reduce the administrative burden of managing RBAC, and make it easier to scale your Kubernetes deployment.

Integrating the cluster with an OIDC identity provider such as Azure AD helps you manage human operator access by tying Kubernetes RBAC to your corporate identity management. This way, you can avoid long-lived kubeconfig files with cluster-admin credentials on developer machines, which are a security risk.

I've seen this approach work well in practice, where developers are granted time-limited access to the cluster for debugging purposes, and then have their access revoked when they're done. This helps prevent unauthorized access to the cluster, and reduces the risk of a security breach.

In short, I recommend keeping long-lived kubeconfig files with cluster-admin credentials off developer machines and giving human operators time-limited, scoped access instead.