When you deploy .NET applications to Kubernetes, you'll find it's not quite the same as deploying, say, Node.js or Java. The .NET runtime itself has characteristics – its startup speed, how it manages memory, and its threading model – that mean you need to think about deployment patterns differently.

Optimizing your container images is a good place to start. For .NET, multi-stage Docker builds are key: a first stage uses the SDK image to build and publish the application, and a second stage starts from a much smaller runtime image and copies in only the published output. The ASP.NET Core runtime image is significantly leaner than the SDK image. If you're aiming for the smallest images, consider self-contained single-file publishing on top of the runtime-deps image instead of the full runtime image. Adding ReadyToRun compilation on top of that trades a somewhat larger binary for faster startup.
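
As a rough sketch of that layout, a multi-stage Dockerfile might look like the following. The project name `MyApi`, the `8.0` image tags, the `linux-x64` runtime identifier, and port 8080 are all assumptions for illustration; adjust them to your own project and .NET version.

```dockerfile
# Build stage: the full SDK image is used only to compile and publish.
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .

# Self-contained, single-file, ReadyToRun publish for the target platform.
# MyApi.csproj is a hypothetical project name.
RUN dotnet publish MyApi.csproj -c Release -r linux-x64 \
    --self-contained true \
    -p:PublishSingleFile=true \
    -p:PublishReadyToRun=true \
    -o /app/publish

# Final stage: runtime-deps carries only the native dependencies, not the
# .NET runtime, which is why the app has to be published self-contained.
FROM mcr.microsoft.com/dotnet/runtime-deps:8.0 AS final
WORKDIR /app
COPY --from=build /app/publish ./
EXPOSE 8080
ENTRYPOINT ["./MyApi"]
```

If you publish framework-dependent instead, the final stage would need the `aspnet` image, since runtime-deps has no runtime to host the app.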

I ran into a nasty surprise when we first switched our CI pipeline to the multi-stage approach on a 2 GHz build agent. The SDK stage pulled a 1.5 GB image and the runtime-deps stage was only 120 MB, but the copy step added about 30 seconds because we had no .dockerignore, so the obj and bin folders were sent along with the rest of the build context. Adding a .dockerignore that skips those folders and enabling BuildKit cut the build time from 2 minutes to under 45 seconds. The trade-off was a larger image when we turned on self-contained publishing – we went from 140 MB to 210 MB – but the pod started up 40% faster under load, which mattered during our nightly scaling tests.

Speaking of startup time, recent .NET releases have made big strides. With ReadyToRun compilation, an ASP.NET Core app can start in roughly 100 to 200 milliseconds. Compare that to the 1 to 2 seconds it might take with just-in-time (JIT) compilation from a cold start. ReadyToRun pre-compiles the intermediate language (IL) into native code for the target platform, so most of the JIT overhead disappears when the pod first fires up. This is crucial for Kubernetes workloads that need to scale out quickly; new instances should be ready to handle traffic within seconds, not minutes.

One thing I learned the hard way is that ReadyToRun only helps with startup: code that wasn't precompiled, such as some generic instantiations, still goes through the JIT, and tiered compilation re-JITs hot methods once they warm up. In a production cluster of 50 pods handling a burst of 5k requests per second, we saw CPU spike to 85% during the first 10 seconds of a scale-out event, even with ReadyToRun. The fix was to keep a small pool of warm pods and to hold new pods out of rotation for a few seconds before they reported ready, giving the runtime a chance to JIT the hot paths. We also set the GC mode to Server and the thread pool minimum to 4, which shaved another 15 ms off the median latency.
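
For reference, here is a minimal sketch of the thread pool and GC side of that tuning, placed at the top of Program.cs. The value 4 comes from the story above; everything else is an assumption. Server GC cannot be switched on from code, so the sketch only reports whether it is active. It is normally enabled through configuration, for example the `ServerGarbageCollector` project property or the `DOTNET_gcServer=1` environment variable.

```csharp
using System;
using System.Runtime;
using System.Threading;

// Read the current minimums so the I/O completion-port value stays unchanged.
ThreadPool.GetMinThreads(out int workerMin, out int ioMin);

// Raise the worker-thread minimum to at least 4. The default minimum equals
// the processor count, so this only changes anything on pods with fewer cores.
bool applied = ThreadPool.SetMinThreads(Math.Max(workerMin, 4), ioMin);

Console.WriteLine($"Thread pool minimum applied: {applied}");
Console.WriteLine($"Server GC active: {GCSettings.IsServerGC}");
```

Logging both values at startup makes it easy to confirm from the pod logs that the container's CPU limits and configuration produced the settings you expected.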

Kubernetes needs to know whether your application is healthy and ready to receive traffic, and ASP.NET Core's health check framework is built for this. You expose health endpoints such as /health that Kubernetes probes can query. A liveness probe checks that the application process is running and hasn't frozen, usually by expecting a simple OK response. A readiness probe goes further, verifying that the application's dependencies (databases, caches, upstream services) are reachable before Kubernetes starts sending traffic to the pod. The Microsoft.Extensions.Diagnostics.HealthChecks NuGet package provides the core abstractions, and ready-made checks for common dependencies, including Azure services, are available in the community AspNetCore.Diagnostics.HealthChecks packages.
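
A minimal sketch of that setup in a Program.cs using the minimal hosting model could look like this. The endpoint paths, the `ready` tag, and the `DatabaseHealthCheck` class are illustrative assumptions, and the check body is a placeholder rather than a real connectivity test.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Hypothetical dependency check, tagged so only the readiness endpoint runs it.
    .AddCheck<DatabaseHealthCheck>("database", tags: new[] { "ready" });

var app = builder.Build();

// Liveness: runs no registered checks, so a 200 only shows the process can serve HTTP.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

// Readiness: runs only the checks tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();

// Placeholder check; a real one would run a cheap query against the dependency.
public sealed class DatabaseHealthCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        bool canConnect = true; // Assumption: replace with an actual connectivity test.
        return Task.FromResult(canConnect
            ? HealthCheckResult.Healthy("Database reachable.")
            : HealthCheckResult.Unhealthy("Database unreachable."));
    }
}
```

Filtering by tag keeps the liveness endpoint independent of downstream services, so a dependency outage takes the pod out of rotation without also triggering restarts.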

The health endpoint can become a hidden source of instability if you let it call out to every downstream service. In one incident, a flaky Redis node caused the readiness probe to return 500 for a few minutes, and Kubernetes kept the pod in a NotReady state, starving the service of capacity. We re‑architected the check to only ping Redis with a 50 ms timeout and to fall back to a cached connection string. Adding a separate /health/live endpoint that only returns OK if the process is alive, and keeping /health/ready for dependency checks, gave us the granularity we needed. We also wrapped the external calls with Polly retries to avoid transient failures tripping the probe.
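
On the Kubernetes side, the two endpoints map onto separate probes. The fragment below is a sketch of the container section of a Deployment; the container name, image, port, and all timing values are assumptions to tune for your own service.

```yaml
# Fragment of spec.template.spec in a Deployment (names and values are illustrative).
containers:
  - name: my-api
    image: registry.example.com/my-api:1.0.0
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3   # restart only after three consecutive failures
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3   # a single slow dependency check does not mark the pod NotReady
```

A failure threshold above 1 on the readiness probe gives a brief dependency blip room to recover before the pod drops out of the Service endpoints.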

For .NET web APIs, the Horizontal Pod Autoscaler (HPA) works best when you feed it custom metrics. Relying solely on CPU utilization can be misleading for I/O-bound workloads: if your pods are waiting on database responses, CPU can stay low even though they're struggling. Better signals include the request rate per pod, which you can expose from Prometheus through a metrics adapter such as kube-metrics-adapter, or the queue depth for background worker pods. Business-level throughput metrics are even better. For event-driven workloads, KEDA (Kubernetes Event-Driven Autoscaling) scales through the HPA based on metrics like Azure Service Bus queue depth or Kafka lag.
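
To make the two approaches concrete, here is a sketch of an HPA driven by a per-pod request-rate metric, followed by a KEDA ScaledObject for a queue worker. The resource names, the metric name `http_requests_per_second`, the queue name, the thresholds, and the `servicebus-auth` TriggerAuthentication are all hypothetical, and the HPA assumes a metrics adapter is already exposing that metric through the custom metrics API.

```yaml
# HPA scaling a web API on average requests per second per pod.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed metric name from the adapter
        target:
          type: AverageValue
          averageValue: "100"
---
# KEDA ScaledObject scaling a background worker on Azure Service Bus queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker
spec:
  scaleTargetRef:
    name: order-worker
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders          # hypothetical queue
        messageCount: "50"         # target messages per replica
      authenticationRef:
        name: servicebus-auth      # hypothetical TriggerAuthentication
```

KEDA creates and manages an HPA for the ScaledObject's target, so avoid pointing a hand-written HPA at the same Deployment.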