Azure Functions for Event-Driven Workloads

Over the past couple of years, I've seen Azure Functions improve significantly as a compute primitive for event-driven workloads. Its triggers for Azure Event Grid, Service Bus, and the Cosmos DB change feed give it a robust event-driven programming model.

Durable Functions extends Functions with an orchestration model for stateful, long-running workflows. An orchestrator function defines the workflow, while activity functions perform the actual work. The framework checkpoints orchestration state to its storage backend (Azure Storage by default) and handles replay, retries, and patterns such as fan-out/fan-in, making it possible to run stateful workflows without designing and managing a state database of your own.
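
To make the state handling concrete, here is a toy model of replay in plain Python. It is not the Durable Functions API: the `orchestrator` generator, the `ACTIVITIES` table, and `run_episode` are hypothetical names I made up for illustration. The idea it sketches is that each completed activity result is recorded in a history, and when the orchestrator is loaded again those results are fed back to it instead of re-running the work.

```python
from typing import Any, Generator, List, Optional, Tuple


def orchestrator() -> Generator[Tuple[str, Any], Any, str]:
    # Each yield asks the "framework" to run an activity and receives its result.
    resized = yield ("resize", "photo.png")
    tagged = yield ("tag", resized)
    return f"done:{tagged}"


# Hypothetical activities; real ones would do I/O or call other services.
ACTIVITIES = {
    "resize": lambda name: f"small-{name}",
    "tag": lambda name: f"{name}#cat",
}


def run_episode(history: List[Any], max_new_steps: int) -> Tuple[List[Any], Optional[str]]:
    """Replay recorded results, then execute at most max_new_steps new activities.

    Returns the updated history and the final output (None if not finished).
    """
    gen = orchestrator()
    history = list(history)
    executed = 0
    step = 0
    try:
        request = next(gen)
        while True:
            if step < len(history):
                result = history[step]  # replay: the activity is not re-run
            elif executed < max_new_steps:
                name, arg = request
                result = ACTIVITIES[name](arg)
                history.append(result)  # checkpoint the new result
                executed += 1
            else:
                return history, None  # "unloaded": progress lives only in history
            step += 1
            request = gen.send(result)
    except StopIteration as finished:
        return history, finished.value


# First episode runs one activity and stops; the second resumes from history.
history, output = run_episode([], max_new_steps=1)
history, output = run_episode(history, max_new_steps=1)
```

The orchestrator itself holds no durable state between episodes, which is also why real orchestrator code has to be deterministic: it must make the same requests in the same order on every replay.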

A concrete example of Durable Functions in action is a workflow that processes a batch of images uploaded to Blob Storage. The orchestrator function coordinates the processing of each image by activity functions. All functions in a single function app share one language runtime, such as C# or Python, so mixing languages means splitting the work across function apps. The activities themselves can use different tools, like Azure Computer Vision or TensorFlow, to analyze the images and extract relevant information.
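
The image workflow is a fan-out/fan-in pattern, which can be sketched as follows. The orchestrator is written in the generator style that the Python Durable Functions programming model uses (`get_input`, `call_activity`, `task_all`), but `FakeContext` and `run_locally` are hypothetical stand-ins I added so the sketch runs without Azure, and `analyze_image` is a placeholder for a real vision call.

```python
from typing import Any, Callable, Dict, List, Tuple


def analyze_image(blob_name: str) -> Dict[str, Any]:
    """Hypothetical activity; a real one might call a vision service."""
    return {"blob": blob_name, "is_png": blob_name.endswith(".png")}


def orchestrator(context):
    blob_names = context.get_input()
    # Fan out: schedule one activity per image without waiting in between.
    tasks = [context.call_activity("analyze_image", name) for name in blob_names]
    # Fan in: resume only when every activity has completed.
    results = yield context.task_all(tasks)
    return {"processed": len(results), "results": results}


class FakeContext:
    """Local stand-in, NOT the real orchestration context class."""

    def __init__(self, input_: Any, activities: Dict[str, Callable[[Any], Any]]):
        self._input = input_
        self.activities = activities

    def get_input(self) -> Any:
        return self._input

    def call_activity(self, name: str, input_: Any) -> Tuple[str, Any]:
        return (name, input_)  # just describe the work; run_locally executes it

    def task_all(self, tasks: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
        return list(tasks)


def run_locally(orchestrator_fn, context: FakeContext) -> Any:
    """Toy driver that supports an orchestrator with a single yield."""
    gen = orchestrator_fn(context)
    pending = next(gen)  # list of (activity_name, input) pairs
    results = [context.activities[name](arg) for name, arg in pending]
    try:
        gen.send(results)
    except StopIteration as done:
        return done.value
    raise RuntimeError("this toy driver supports a single yield only")


ctx = FakeContext(["a.png", "b.jpg"], {"analyze_image": analyze_image})
summary = run_locally(orchestrator, ctx)
```

The toy driver runs the activities one after another; in the real service they are dispatched to workers and can run in parallel.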

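Another key aspect of Durable Functions is the trade-off around the partition count of the task hub. With the default Azure Storage backend, the partition count sets the number of control queues, which caps how many workers can process orchestrator functions in parallel. It does not control activity concurrency: activity functions are dispatched through a separate work-item queue and scale out independently, and running them in parallel is a matter of fanning out from the orchestrator. More partitions let busy orchestrations spread across more workers, which can reduce latency under load, but they also add storage polling overhead and more moving parts to reason about when debugging. For instance, a workflow that fans out to 10 activity functions can run all 10 concurrently even with a single partition, whereas a large number of simultaneous orchestration instances benefits from a higher count. In my experience, a small partition count is a good starting point (the default is 4, and the allowed range is 1 to 16); adjust as needed based on the specific requirements of the workflow, keeping in mind that changing the count requires a new task hub.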
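
As far as I understand the Azure Storage backend, partition count and activity concurrency are two separate `host.json` settings. The sketch below builds that fragment from Python so the values can be validated before deployment. The setting names follow the Durable Functions 2.x `host.json` schema as I know it, so verify them against the extension version you run; the helper name `durable_host_config` and the chosen numbers are my own.

```python
import json


def durable_host_config(partition_count: int, max_concurrent_activities: int) -> dict:
    """Build a host.json fragment for the Durable Functions extension."""
    # Assumption: the Azure Storage provider accepts 1 to 16 partitions.
    if not 1 <= partition_count <= 16:
        raise ValueError("partitionCount must be between 1 and 16")
    if max_concurrent_activities < 1:
        raise ValueError("maxConcurrentActivityFunctions must be positive")
    return {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                # Number of control queues: bounds orchestrator scale-out.
                "storageProvider": {"partitionCount": partition_count},
                # Per-instance limit on activities running at the same time.
                "maxConcurrentActivityFunctions": max_concurrent_activities,
            }
        },
    }


print(json.dumps(durable_host_config(4, 10), indent=2))
```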

Azure Functions supports a variety of triggers, including HTTP, Timer, and connections to Azure services like Blob Storage, Queue Storage, and Service Bus. The service-backed triggers handle polling, connection management, and message leasing automatically, allowing developers to focus on the function logic without worrying about the underlying integration details. For example, the Service Bus trigger can be configured to deliver messages to the function in batches rather than one at a time, which can improve performance by reducing the number of receive operations and function invocations. However, this approach requires careful consideration of the batch size: large batches increase memory usage per invocation, while small batches increase the number of requests.
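
The batch-size trade-off is easy to put numbers on. This is a back-of-the-envelope sketch in plain Python, not Service Bus code: `batch_tradeoff` is a hypothetical helper, and it assumes every batch is full and that the whole batch is held in memory at once.

```python
import math


def batch_tradeoff(total_messages: int, batch_size: int, avg_message_bytes: int) -> dict:
    """Estimate receive calls and peak in-memory payload for a given batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        # Fewer, larger batches mean fewer round trips to the queue.
        "receive_calls": math.ceil(total_messages / batch_size),
        # But each invocation then holds more message payload in memory.
        "peak_batch_bytes": min(batch_size, total_messages) * avg_message_bytes,
    }


# Compare a few batch sizes for 10,000 messages of about 2 KiB each.
for size in (1, 10, 100, 1000):
    print(size, batch_tradeoff(10_000, size, 2048))
```

Real throughput also depends on lock duration, prefetch settings, and how long each message takes to process, none of which this model covers.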

Output bindings enable Functions to write to Azure services without the need for explicit SDK code in the function body. For example, a Function triggered by a blob upload can write a message to a Service Bus queue by declaring an output binding, eliminating the need for Service Bus client code. However, output bindings can also obscure the dependencies of a function, making complex workflows more challenging to maintain. To mitigate this issue, I recommend using tools like Azure Monitor and Application Insights to track the dependencies and performance of each function, which can help identify bottlenecks and areas for improvement.
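
Here is a minimal sketch of the blob-to-queue example, assuming the `function.json` style of binding declaration. The binding types (`blobTrigger`, `serviceBus`) are real, but the container, queue, and connection setting names are hypothetical, and `FakeBlob` and `FakeOut` are stand-ins I added so the function body can run locally. In a deployed app, `myblob` would be an `azure.functions.InputStream` and `msg` an `azure.functions.Out[str]`.

```python
import json
from typing import List

# Contents of a hypothetical function.json, shown as a Python dict.
FUNCTION_JSON = {
    "scriptFile": "__init__.py",
    "bindings": [
        {"type": "blobTrigger", "direction": "in", "name": "myblob",
         "path": "uploads/{name}", "connection": "AzureWebJobsStorage"},
        {"type": "serviceBus", "direction": "out", "name": "msg",
         "queueName": "uploaded-images", "connection": "ServiceBusConnection"},
    ],
}


def main(myblob, msg) -> None:
    """Function body: no Service Bus client code, only msg.set(...)."""
    msg.set(json.dumps({"blob": myblob.name, "size": myblob.length}))


def declared_outputs(function_json: dict) -> List[str]:
    """List output dependencies so they stay visible in reviews and docs."""
    return [
        f'{b["type"]}:{b.get("queueName", b["name"])}'
        for b in function_json["bindings"]
        if b["direction"] == "out"
    ]


class FakeBlob:
    """Local stand-in exposing the two attributes the function reads."""
    name = "uploads/cat.png"
    length = 2048


class FakeOut:
    """Local stand-in for the output binding parameter."""
    def set(self, value: str) -> None:
        self.value = value


out = FakeOut()
main(FakeBlob(), out)
print(out.value, declared_outputs(FUNCTION_JSON))
```

A small helper like `declared_outputs` is one cheap way to counter the hidden-dependency problem: it turns the binding declarations into an explicit list that can be checked in code review alongside the telemetry.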

One key consideration when using Azure Functions is the Consumption plan's trade-offs. This plan scales to zero and charges only when functions are executing, but it comes with a cold start latency cost. When a function app that has been idle receives a request, the Functions host must be loaded, the language runtime (for example, .NET) initialised, and any startup code executed before the function can handle the request. For functions on .NET Core 3.1 and .NET 5, cold starts typically add 1-3 seconds. In my experience, this is a significant issue for latency-sensitive applications such as real-time analytics or live updates. If an application needs to respond in under 1 second, the Consumption plan is probably not the best choice, and the Premium plan or App Service plan may be more suitable.

The Premium plan mitigates this issue by keeping pre-warmed instances ready, while the App Service plan runs on dedicated instances that stay loaded when the Always On setting is enabled. The right plan choice depends on the specific latency requirements of the application. For instance, if an application has a steady stream of requests, the App Service plan may be the most cost-effective option, as it avoids cold start latency and provides a fixed cost per instance. On the other hand, if an application has a variable workload with periods of high activity and idle time, the Premium plan may be a better choice, as it can still scale out on demand while avoiding cold starts, balancing cost and performance.

The choice of plan also affects the cost of the application, as each plan has a different pricing model. The Consumption plan charges per execution plus the resources consumed while executing (memory multiplied by execution time, measured in GB-seconds). The Premium plan charges for the vCPU and memory allocated to its instances, with at least one instance allocated at all times, and the App Service plan charges per instance per hour. For example, if an application has 1000 executions per hour, the Consumption plan is likely the most cost-effective option, as it charges only for the actual executions. However, if the application has a sustained high number of executions, the Premium plan or App Service plan may be more cost-effective, as their cost depends on the allocated instances rather than on the number of executions.
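
The break-even point can be estimated with a few lines of arithmetic. The sketch below is a rough model under loud assumptions: the rates in `Rates` are illustrative placeholders and not quoted prices, the fixed monthly figure is invented, any free grant is ignored, and a single fixed-cost instance is assumed to handle the load. Substitute numbers from the current pricing page for your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Illustrative placeholder rates, NOT quoted prices.
    per_million_executions: float = 0.20
    per_gb_second: float = 0.000016
    fixed_instance_per_month: float = 150.0  # hypothetical always-allocated instance


def consumption_cost(executions: int, avg_seconds: float, memory_gb: float,
                     rates: Rates) -> float:
    """Monthly pay-per-use cost: execution count plus GB-seconds consumed."""
    gb_seconds = executions * avg_seconds * memory_gb
    return (executions / 1_000_000 * rates.per_million_executions
            + gb_seconds * rates.per_gb_second)


def cheaper_plan(executions: int, avg_seconds: float, memory_gb: float,
                 rates: Rates) -> str:
    """Return which pricing model is cheaper for this monthly workload."""
    usage = consumption_cost(executions, avg_seconds, memory_gb, rates)
    return "consumption" if usage < rates.fixed_instance_per_month else "fixed"


rates = Rates()
low = 1000 * 730  # 1000 executions per hour over a 730-hour month
high = 500_000_000  # a sustained heavy workload
print(cheaper_plan(low, 0.5, 0.5, rates), cheaper_plan(high, 0.5, 0.5, rates))
```

With these placeholder rates the light workload costs a few units per month on pay-per-use, while the heavy one costs far more than the fixed instance, which matches the qualitative argument above.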