eBPF Rewrites Linux

I've been following eBPF's progress and it's clear that 2021 was a breakout year. eBPF became the foundational technology for a new generation of Linux observability and networking tools, allowing safe, programmable code to run in the Linux kernel without modifying kernel source or loading kernel modules.

So what does eBPF actually do? It lets you load small programs into the Linux kernel and attach them to hooks such as system calls, network events, and function entry and exit points. Before loading, the kernel's verifier checks each program to ensure it can't crash the kernel or loop infinitely.
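To make the "can't loop infinitely" guarantee concrete, here is a toy model in plain Python. It is not the kernel's verifier, which also tracks register state, memory bounds, and (in newer kernels) bounded loops. The `Insn` type, its opcode names, and the `max_insns` default are hypothetical and exist only for illustration. The sketch rejects any program with a backward jump, which is one simple way to prove termination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    op: str          # "alu", "jmp", or "exit" (hypothetical opcode names)
    offset: int = 0  # relative jump distance, only meaningful for "jmp"


def verify(program, max_insns=4096):
    """Toy check: accept only programs that must terminate.

    Returns (accepted, reason). With forward-only, in-range jumps and a
    final exit, the program counter strictly increases until an exit.
    """
    if not program:
        return False, "empty program"
    if len(program) > max_insns:
        return False, "program too large"
    if program[-1].op != "exit":
        return False, "last instruction must be exit"
    for pc, insn in enumerate(program):
        if insn.op != "jmp":
            continue
        if insn.offset < 0:
            # A backward jump could form a loop, so reject it outright.
            return False, f"back-edge at instruction {pc}"
        if pc + 1 + insn.offset >= len(program):
            return False, f"jump out of range at instruction {pc}"
    return True, "ok"
```

A program such as `[Insn("alu"), Insn("jmp", -2), Insn("exit")]` is rejected before it ever runs, which is the essential idea: unsafe code is refused at load time rather than caught at run time.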

One of the key advantages of eBPF is its performance. Tracing setups that ship every event to a userspace process for filtering pay for context switches and data copies between kernel and userspace, while tools like SystemTap get code into the kernel only by compiling scripts into kernel modules. eBPF avoids much of this cost by running verified code in kernel context, with very low overhead compared to userspace tracing. For example, I recall a project where we used eBPF to collect network metrics on a 100-node Kubernetes cluster. The eBPF program added less than 1 ms of latency to packet processing, which was negligible compared to the existing iptables overhead.
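One way to picture where the savings come from: eBPF programs commonly aggregate data into maps inside the kernel, and userspace reads only the summary. The sketch below is a plain-Python model of that pattern, not real eBPF code. The function names and the "crossings" counter are hypothetical stand-ins for kernel-to-userspace transfers.

```python
from collections import Counter


def stream_every_event(latencies_us):
    """Model A: each event is copied to userspace, which then aggregates."""
    crossings = 0
    hist = Counter()
    for value in latencies_us:
        crossings += 1                   # one kernel-to-user copy per event
        hist[value.bit_length()] += 1    # log2 bucket computed in userspace
    return hist, crossings


def aggregate_in_kernel(latencies_us):
    """Model B: kernel-side code updates a map; userspace reads it once."""
    kernel_map = Counter()
    for value in latencies_us:
        kernel_map[value.bit_length()] += 1  # stays in kernel context
    crossings = len(kernel_map)              # one read per histogram bucket
    return kernel_map, crossings
```

Both models produce the same histogram, but the second crosses the boundary once per bucket rather than once per event, so the gap widens as event volume grows.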

Once loaded, these programs execute in kernel context with access to kernel data structures, right where events occur. This capability has significant implications for tools like Cilium, which uses eBPF to implement Kubernetes network policies and service mesh functionality without sidecar proxies.

Traditional Kubernetes networking relies on iptables rules that are evaluated sequentially for every packet, so per-packet cost grows linearly with rule count. eBPF replaces these rule chains with in-kernel programs that look up policy in hash maps, so lookup cost stays roughly constant as rules are added, which makes a significant performance difference at cluster scale. For instance, on a cluster with 1000 pods and 10000 iptables rules, the evaluation time increased by a factor of 10. By contrast, eBPF-based Cilium network policies showed minimal performance degradation even at large scale.
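The difference between walking a rule list and doing a keyed lookup can be sketched in a few lines of plain Python. This is a model of the two access patterns only, not iptables or Cilium code. The rule format, the helper names, and the default-drop verdict are assumptions made for illustration.

```python
def build_rules(n):
    """Generate n hypothetical allow rules keyed by (destination IP, port)."""
    return [((f"10.0.{i // 256}.{i % 256}", 8080), "ACCEPT") for i in range(n)]


def match_sequential(rules, key):
    """Walk the rules in order, the way a chain of rules is evaluated."""
    comparisons = 0
    for rule_key, verdict in rules:
        comparisons += 1
        if rule_key == key:
            return verdict, comparisons
    return "DROP", comparisons


def match_hashed(table, key):
    """One keyed lookup, the access pattern a hash map provides."""
    return table.get(key, "DROP"), 1


rules = build_rules(10000)
table = dict(rules)  # the same rules, indexed by key
```

With 10000 rules, a packet matching the last rule costs 10000 comparisons in the sequential model and a single lookup in the hashed one. A packet that matches nothing is the worst case for the list, since every rule is checked before the default verdict applies.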

Another tool that takes advantage of eBPF is Pixie, which collects telemetry from Kubernetes workloads without any application-level instrumentation. By intercepting system calls, Pixie captures HTTP, gRPC, SQL, Redis, and Kafka traffic, producing request and response traces and latency metrics for all services automatically.
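To illustrate how request and response traces can be derived from bytes observed at the system call boundary, here is a small plain-Python sketch. It is not Pixie's implementation. The `Capture` record, the field names, and the single-connection pairing logic are all hypothetical, and it handles only HTTP/1.x start lines.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    timestamp_ns: int  # when the syscall was observed
    direction: str     # "request" or "response"
    payload: bytes     # bytes seen in the read/write buffer


def _first_line(payload):
    return payload.split(b"\r\n", 1)[0].decode("ascii", errors="replace")


def parse_request(payload):
    """Return (method, path) from an HTTP/1.x request line, or None."""
    parts = _first_line(payload).split(" ")
    if len(parts) == 3 and parts[2].startswith("HTTP/1."):
        return parts[0], parts[1]
    return None


def parse_status(payload):
    """Return the integer status code from an HTTP/1.x status line, or None."""
    parts = _first_line(payload).split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("HTTP/1.") and parts[1].isdigit():
        return int(parts[1])
    return None


def build_traces(captures):
    """Pair each request with the next response on one connection."""
    traces = []
    pending = None  # (start timestamp, (method, path)) awaiting a response
    for cap in sorted(captures, key=lambda c: c.timestamp_ns):
        if cap.direction == "request":
            request = parse_request(cap.payload)
            pending = (cap.timestamp_ns, request) if request else None
        elif cap.direction == "response" and pending is not None:
            status = parse_status(cap.payload)
            if status is not None:
                start, (method, path) = pending
                traces.append({
                    "method": method,
                    "path": path,
                    "status": status,
                    "latency_ns": cap.timestamp_ns - start,
                })
                pending = None
    return traces
```

The application never participates: everything in the trace is reconstructed from the payloads and timestamps, which is why no code changes are needed on the service side.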

For organisations that haven't instrumented their applications, Pixie provides immediate visibility with no code changes, which is especially valuable for complex, distributed systems. It's worth noting, though, that Pixie relies on eBPF's ability to intercept system calls, which can be affected by kernel version and configuration.

Running eBPF programs in kernel space also comes with production considerations. While the verifier prevents crashes, a poorly written program can still consume significant CPU. The tooling for debugging eBPF programs is also less mature than userspace debugging, which can make troubleshooting more difficult.
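One way to keep an eye on CPU cost is the kernel's per-program run statistics, which are collected when the `kernel.bpf_stats_enabled` sysctl is on and reported by `bpftool prog show`. The sketch below computes the mean time per invocation from that kind of data. It assumes the JSON field names `run_time_ns` and `run_cnt`, and the sample records are made up for illustration, not real measurements.

```python
import json


def average_runtime_ns(programs):
    """Map program name to mean nanoseconds per invocation.

    Programs that have never run (run_cnt of 0 or missing) are skipped.
    """
    averages = {}
    for prog in programs:
        run_cnt = prog.get("run_cnt", 0)
        if run_cnt > 0:
            name = prog.get("name", str(prog.get("id")))
            averages[name] = prog.get("run_time_ns", 0) / run_cnt
    return averages


# Illustrative sample only; the values and program names are invented.
SAMPLE = json.loads("""
[
  {"id": 42, "name": "trace_connect", "run_time_ns": 1500000, "run_cnt": 3000},
  {"id": 43, "name": "idle_prog", "run_time_ns": 0, "run_cnt": 0}
]
""")
```

For the sample above, `average_runtime_ns(SAMPLE)` reports 500 ns per call for `trace_connect`. Multiplying that figure by the event rate of the attached hook gives a rough estimate of the CPU time the program consumes.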

Additionally, eBPF requires a recent Linux kernel (5.x or later for the most capable features), which limits adoption in organisations running older kernels on long-term-support distributions. Despite these challenges, the benefits of eBPF make it an attractive technology for improving Linux observability and networking.
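A quick pre-flight check for the kernel requirement might look like the sketch below. The helper names and the `(5, 0)` default are assumptions for illustration. Distribution kernels often backport features, so a version comparison is only a first approximation of what a given host supports.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor) from a string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release, minimum=(5, 0)):
    """True if the release is at least the given (major, minor) version."""
    return parse_kernel_release(release) >= minimum


def running_kernel_ok(minimum=(5, 0)):
    """Check the kernel of the host this script runs on."""
    return meets_minimum(platform.release(), minimum)
```

Calling `running_kernel_ok()` on a host checks its own kernel, while `meets_minimum("4.18.0-513.el8.x86_64")` shows how an inventory of release strings could be screened without logging in to each machine.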