By 2021, after years of hype, serverless had finally settled into a more practical form. The use cases where it shines are now well understood, as are the pitfalls to avoid.

Serverless functions excel at event-driven, stateless, and intermittent workloads such as processing uploaded files, responding to queue messages, or handling webhook callbacks. Pay-per-invocation pricing, automatic scaling, and the absence of idle capacity costs make serverless more economical than always-on servers for these patterns.

In my experience, the event-driven model works particularly well with Amazon S3: an uploaded object can trigger an AWS Lambda function that processes the file and stores the result in a database or another S3 bucket. For instance, I've seen an architecture in which an image upload triggers a Lambda function that resizes the image and writes the result to a second S3 bucket, all without a dedicated server.
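The upload-then-resize flow above could be sketched roughly as follows. This is a minimal illustration, not the system described: `DEST_BUCKET`, the `resized/` key prefix, and the injected `resize_and_store` callable are all hypothetical names. In a real deployment that callable would wrap the S3 client and an image library. The only provider detail relied on is the shape of the S3 event notification, whose object keys arrive URL-encoded.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import unquote_plus

# Hypothetical destination bucket. It should differ from the source bucket,
# otherwise every write would trigger the function again.
DEST_BUCKET = "resized-images-example"


def extract_objects(event: Dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in S3 notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def make_handler(resize_and_store: Callable[[str, str, str, str], None]):
    """Build a Lambda-style handler around an injected processing function.

    resize_and_store(src_bucket, src_key, dest_bucket, dest_key) is a
    hypothetical callable that downloads, resizes, and uploads the image.
    """
    def handler(event, context):
        processed = []
        for bucket, key in extract_objects(event):
            dest_key = f"resized/{key}"
            resize_and_store(bucket, key, DEST_BUCKET, dest_key)
            processed.append(dest_key)
        return {"processed": processed}

    return handler
```

Injecting the processing function keeps the handler testable without network access, and writing to a separate bucket avoids the function re-triggering itself.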

Cold start latency, the initialisation time incurred when a function handles its first request after a period of inactivity, remains the primary technical limitation of serverless. For .NET and JVM runtimes, a cold start can add hundreds of milliseconds, and sometimes more, to the first response. The Node.js and Python runtimes I've worked with have significantly lower cold start times, typically in the range of tens to a few hundred milliseconds depending on package size and initialisation work. However, these runtimes do not suit every workload, and the trade-off between cold start time and runtime performance must be evaluated carefully. To mitigate cold starts, consider provisioned concurrency, lightweight runtimes, or user flows designed to tolerate the initial latency.
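One way to make cold starts visible is to record whether an invocation is the first one in its execution environment. The sketch below relies on module-level code running once per environment and being reused by later warm invocations. The `_CONFIG` loading step and the returned field names are placeholders for illustration.

```python
import time

# Module-level code runs once per execution environment, i.e. on a cold start.
_init_started = time.perf_counter()
_CONFIG = {"loaded": True}  # placeholder for expensive setup (clients, models, ...)
_INIT_SECONDS = time.perf_counter() - _init_started

# Mutable state that survives between warm invocations of the same environment.
_STATE = {"cold": True}


def handler(event, context):
    """Report whether this invocation paid the initialisation cost."""
    was_cold = _STATE["cold"]
    _STATE["cold"] = False
    return {
        "cold_start": was_cold,
        # Only the first invocation in an environment pays for initialisation.
        "init_seconds": _INIT_SECONDS if was_cold else 0.0,
    }
```

Logging these two fields per invocation gives a rough picture of how often users hit a cold start and how long the initialisation takes.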

A serverless architecture that decomposes a workflow into many small functions connected by queues and event buses inherits the complexity of distributed systems without the benefits of stateful recovery mechanisms. Idempotency is crucial: queues and event buses commonly deliver at least once, so each function must tolerate re-delivery of the same event without repeating its side effects. Distributed tracing requires structured trace context propagation across function boundaries, and dead-letter queues must be monitored and processed rather than left to fill up. Tools like AWS X-Ray and New Relic can help with distributed tracing, but they add overhead and cost that must be factored into the overall system design. In one system I worked on, we used Apache Kafka as the event bus and had to carefully tune the producer and consumer settings to avoid message loss and ensure reliable delivery.
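A common way to get idempotency is to claim each event's ID before doing any work and to skip events that have already been claimed. The sketch below is only an illustration: `InMemoryDedupStore` stands in for a durable store with an atomic "insert if absent" operation, and the `id` field on the event is an assumed convention, not something a particular queue guarantees.

```python
from typing import Callable, Dict, Set


class InMemoryDedupStore:
    """Stand-in for a durable store offering an atomic 'insert if absent'."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def claim(self, event_id: str) -> bool:
        """Return True only the first time an event ID is claimed."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True

    def release(self, event_id: str) -> None:
        """Forget a claim so that a later re-delivery can be processed."""
        self._seen.discard(event_id)


def handle_once(event: Dict, store: InMemoryDedupStore,
                side_effect: Callable[[Dict], None]) -> str:
    """Apply side_effect at most once per event ID, even on re-delivery."""
    event_id = event["id"]  # assumed: the producer attaches a stable ID
    if not store.claim(event_id):
        return "duplicate"
    try:
        side_effect(event)
    except Exception:
        # Release the claim so the queue's retry is not silently dropped.
        store.release(event_id)
        raise
    return "processed"
```

Releasing the claim on failure is a design choice: it favours reprocessing over losing an event, which only works if the side effect itself is safe to attempt again.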

Retry policies and error handling deserve particular attention when designing a serverless workflow. A poorly designed system can suffer cascading failures, where a single failed function sets off a chain of failures downstream. I've seen this happen in production: a simple error in a Lambda function caused a queue to grow without bound, which drove up costs and destabilised the system. To avoid this, implement retry policies with exponential backoff, and monitor queue lengths and function error rates closely.
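A retry policy with exponential backoff can be sketched as below. The function names, default delays, and attempt limit are illustrative assumptions, and the random jitter is an addition beyond plain exponential backoff, used here to keep many failing callers from retrying in lockstep. Managed queues and function platforms usually expose their own retry settings, which would normally be preferred over hand-rolled loops.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_ceiling(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound in seconds on the delay after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call operation, retrying failures with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error so it can be routed
                # to a dead-letter queue instead of retrying forever.
                raise
            # Sleep for a random fraction of the exponential ceiling.
            sleep(rng() * backoff_ceiling(attempt, base, cap))
    raise AssertionError("unreachable")
```

Bounding the number of attempts matters as much as the backoff itself: an unbounded retry loop is one way a single failing function turns into an ever-growing queue.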

The cases for choosing containers over serverless are clear: long-running background workloads, applications with tight latency requirements, workloads that need persistent in-memory state, and predictable high-throughput workloads are all better suited to containers. At sustained high throughput, a container running continuously at capacity is cheaper than paying per invocation for the same volume. For instance, a workload that sustains 1000 requests per second amounts to about 2.6 billion invocations over a 30-day month, and it is likely more cost-effective to serve that from containers running at capacity than to pay for each invocation.
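The 1000 requests per second comparison can be worked through with a small calculation. Every price, duration, memory size, and instance count below is an illustrative placeholder rather than a figure from any provider's price list, so the output only shows the shape of the comparison.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def serverless_monthly_cost(
    rps: float,
    avg_duration_s: float,
    memory_gb: float,
    price_per_million_requests: float,
    price_per_gb_second: float,
) -> float:
    """Per-invocation pricing: a request charge plus a compute charge."""
    invocations = rps * SECONDS_PER_MONTH
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def container_monthly_cost(instances: int, price_per_instance_hour: float) -> float:
    """Always-on pricing: instances are billed whether or not they are busy."""
    return instances * price_per_instance_hour * SECONDS_PER_MONTH / 3600


# Placeholder inputs for the sustained 1000 requests-per-second workload.
example_serverless = serverless_monthly_cost(
    rps=1000,
    avg_duration_s=0.1,               # assumed 100 ms per invocation
    memory_gb=0.125,                  # assumed 128 MB function
    price_per_million_requests=0.20,  # placeholder price
    price_per_gb_second=0.0000167,    # placeholder price
)
example_containers = container_monthly_cost(
    instances=4,                      # assumed fleet size for this load
    price_per_instance_hour=0.10,     # placeholder price
)
```

With these made-up inputs the serverless side comes to roughly 1,060 per month against roughly 290 for the containers. The gap narrows or reverses as traffic becomes intermittent, since the container cost stays fixed while the per-invocation cost falls with volume.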