When it comes to container security, the smallest attack surface often begins with the smallest base image. Google's distroless images are a good example - they contain only the application runtime and its dependencies, leaving out shells, package managers, and other unnecessary system utilities. This makes it harder for an attacker who gains code execution in a distroless container to run a shell or install tools.
Another common minimal base image is Alpine Linux, often used for interpreted languages that still need a small userland. For statically linked compiled binaries, a scratch image - essentially an empty base image - is the way to go. The key idea is to minimize what's in the base image to reduce the attack surface.
In my experience, the most challenging part of adopting minimal base images is managing dependencies and making sure the application still works as expected. Take, for instance, an application whose dependencies are installed with a package manager like pip or npm. A distroless image ships no package manager, so you cannot install those dependencies inside the final image; they have to be installed somewhere else and copied in.
The usual way to handle this is a multi-stage build: run the package manager in a full-featured build stage, then copy only the installed dependencies and the application into the minimal final stage. Docker's BuildKit builds each stage in isolation and can cache dependency layers so they are reused across builds.
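To make the build-stage split concrete, here is a minimal sketch of a Python helper that renders a two-stage Dockerfile. The helper name, image tags, paths, and entry script are all assumptions chosen for illustration. It also assumes that the distroless Python image's entrypoint is the Python interpreter and that the builder's Python version matches the runtime's.

```python
from textwrap import dedent


def render_dockerfile(builder_image: str, runtime_image: str, entry_script: str) -> str:
    """Render a two-stage Dockerfile in which pip runs only in the first stage.

    Hypothetical helper: the layout (/app, /app/deps) is an illustrative choice.
    """
    return dedent(f"""\
        # Stage 1: full image with pip, used only to install dependencies.
        FROM {builder_image} AS build
        WORKDIR /app
        COPY requirements.txt .
        RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
        COPY . .

        # Stage 2: minimal runtime image with no shell or package manager.
        FROM {runtime_image}
        WORKDIR /app
        COPY --from=build /app /app
        ENV PYTHONPATH=/app/deps
        USER 1001
        CMD ["{entry_script}"]
        """)


if __name__ == "__main__":
    # Image tags below are assumptions; pin the ones your project actually uses.
    print(render_dockerfile("python:3.11-slim",
                            "gcr.io/distroless/python3-debian12",
                            "main.py"))
```

Only the second stage ends up in the shipped image, so the package manager never reaches production.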
One of the most fundamental security best practices for containers is to run them as non-root users. Most containers default to running as root, but you can explicitly override this in a Dockerfile with something like USER 1001. Kubernetes' PodSecurityContext also allows you to enforce non-root execution with settings like runAsNonRoot: true and runAsUser: 1001.
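A check like this can also run in CI before a manifest is applied. The sketch below assumes the pod spec has already been parsed into a Python dict (for example from YAML). The function name is hypothetical, and it only looks at runAsNonRoot and runAsUser, with container-level values overriding pod-level ones.

```python
from typing import Any


def nonroot_violations(pod_spec: dict[str, Any]) -> list[str]:
    """Return the names of containers that are not guaranteed to run as non-root."""
    pod_ctx = pod_spec.get("securityContext") or {}
    violations = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        # A container-level value, when present, overrides the pod-level one.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot"))
        if run_as_user == 0:
            # Explicitly running as UID 0 is always a violation.
            violations.append(container["name"])
        elif run_as_user is None and not run_as_non_root:
            # Nothing pins the UID and nothing forbids root.
            violations.append(container["name"])
    return violations


if __name__ == "__main__":
    spec = {
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1001},
        "containers": [
            {"name": "app"},
            {"name": "sidecar", "securityContext": {"runAsUser": 0}},
        ],
    }
    print(nonroot_violations(spec))
```

This is a lint-style check only; enforcement in the cluster still comes from the security context itself.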
Non-root execution reduces but does not eliminate the risk of escalation to root on the host, particularly in non-default configurations such as privileged containers or containers granted extra capabilities. Seccomp profiles applied by the container runtime narrow this further by restricting the system calls a container can make.
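As a rough illustration of what such a profile looks like, the sketch below builds an allowlist-style seccomp profile in the JSON layout Docker accepts: a default action, plus a list of syscall names that are allowed. The helper name and the tiny syscall list are assumptions; a real application needs a much longer allowlist, usually derived from the runtime's default profile.

```python
import json


def build_seccomp_profile(allowed_syscalls: list[str]) -> dict:
    """Build a deny-by-default seccomp profile that allows only the named syscalls."""
    return {
        # Any syscall not listed below fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative list only; far too small for a real workload.
    profile = build_seccomp_profile(["read", "write", "exit_group", "futex"])
    print(json.dumps(profile, indent=2))
```

With Docker, a profile saved to a file can be applied using the --security-opt seccomp=<path> option of docker run.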
Container image scanning tools like Trivy, Clair, and Anchore can identify known vulnerabilities in base image packages and application dependencies. Integrating these tools into your CI pipeline with policies that fail builds on high or critical CVEs helps catch vulnerabilities before they reach production.
In practice, this means configuring the CI gate so that it prioritizes findings and fails only on the most critical issues. For example, if you're using Trivy, you can pass --severity HIGH,CRITICAL together with --exit-code 1 so that the build fails only when a high or critical vulnerability is found, and add --ignore-unfixed to skip vulnerabilities that have no fix available yet. This cuts down on noise from low-severity and unactionable findings and keeps attention on the issues that matter most.
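A severity gate can also be applied to a saved scan report, which is handy when the same report feeds several checks. The sketch below assumes Trivy's JSON output layout (a top-level Results list whose entries carry a Vulnerabilities list with VulnerabilityID, Severity, and FixedVersion fields). The function name and the policy itself are illustrative.

```python
import json

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict, ignore_unfixed: bool = True) -> list[str]:
    """Return IDs of vulnerabilities that should fail the build."""
    ids = []
    for result in report.get("Results") or []:
        # A result with no findings may omit the Vulnerabilities key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in BLOCKING_SEVERITIES:
                continue
            if ignore_unfixed and not vuln.get("FixedVersion"):
                # No fix available yet, so failing the build would not be actionable.
                continue
            ids.append(vuln["VulnerabilityID"])
    return ids


if __name__ == "__main__":
    import sys

    # Usage (file name is an assumption): python gate.py report.json
    with open(sys.argv[1]) as fh:
        found = blocking_findings(json.load(fh))
    if found:
        print("Blocking vulnerabilities:", ", ".join(found))
        sys.exit(1)
```

Exiting non-zero is what lets the CI system mark the job as failed.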
While static image scanning is crucial, it doesn't detect runtime attacks. Runtime security tools like Falco, Aqua Security, and Sysdig Secure monitor container behavior, watching for system calls, file access, and network connections. Rules can be set up to fire when a container exhibits unexpected behavior, such as spawning a shell or modifying sensitive files.
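The shape of such a rule can be sketched in a few lines. This is not the rule language of Falco or any other tool named above. The event schema, field names, shell list, and sensitive paths are all hypothetical, chosen only to show the pattern of matching observed behavior against expectations.

```python
from dataclasses import dataclass

# Assumed lists for illustration; real rule sets are much broader.
SHELLS = {"sh", "bash", "dash", "ash", "zsh"}
SENSITIVE_PATHS = ("/etc/shadow", "/etc/passwd", "/etc/sudoers")


@dataclass(frozen=True)
class Event:
    """Hypothetical runtime event emitted by a monitoring agent."""
    container_id: str
    kind: str    # "exec" or "open_write"
    target: str  # process name for exec, file path for open_write


def alerts_for(event: Event) -> list[str]:
    """Return alert messages for behavior a hardened container should not show."""
    alerts = []
    if event.kind == "exec" and event.target in SHELLS:
        alerts.append(f"shell '{event.target}' spawned in container {event.container_id}")
    if event.kind == "open_write" and event.target.startswith(SENSITIVE_PATHS):
        alerts.append(f"write to '{event.target}' in container {event.container_id}")
    return alerts


if __name__ == "__main__":
    for ev in (Event("c1", "exec", "bash"), Event("c1", "open_write", "/tmp/cache")):
        print(ev, alerts_for(ev))
```

A shell-spawn rule is especially useful alongside distroless images, where no shell should exist in the container at all.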
Runtime security is the final layer of defense after image hardening and admission control. It's essential to remember that container security is a layered problem, and each component - image build, runtime configuration, and orchestration layer - presents different attack surfaces that require distinct security measures.