Reusable Terraform Modules on Azure

The AzureRM Terraform provider stands out as one of the most actively maintained cloud providers in the Terraform ecosystem. Over time, the patterns for building reusable and maintainable infrastructure modules have evolved alongside it.

When building Terraform modules, it's essential to assign a single, clear responsibility to each module. For instance, a module designed for an AKS cluster might encompass node pool configuration, identity setup, monitoring integration, and network configuration. Similarly, a module for an Azure SQL database could cover the server, database, firewall rules, and diagnostic settings. By composing these modules in a top-level configuration, you can construct a complete environment from independently testable components.
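
As a rough illustration of that composition, a root configuration might look like the sketch below. The module paths, input names, and the node_subnet_id output are hypothetical placeholders, not the interface of any published module.

```hcl
# Root configuration: composes two single-purpose modules into one environment.
# Module paths, input names, and output names are hypothetical.

provider "azurerm" {
  features {}
}

variable "environment" {
  type        = string
  description = "Short environment name, for example dev or prod."
}

variable "location" {
  type        = string
  description = "Azure region for all resources in this environment."
}

resource "azurerm_resource_group" "env" {
  name     = "rg-${var.environment}"
  location = var.location
}

module "aks" {
  source = "./modules/aks" # hypothetical local module

  name                = "aks-${var.environment}"
  resource_group_name = azurerm_resource_group.env.name
  location            = azurerm_resource_group.env.location
}

module "sql" {
  source = "./modules/sql" # hypothetical local module

  server_name         = "sql-${var.environment}"
  resource_group_name = azurerm_resource_group.env.name
  location            = azurerm_resource_group.env.location

  # Modules are wired together only through declared outputs,
  # which keeps each one testable on its own.
  allowed_subnet_ids = [module.aks.node_subnet_id]
}
```

Each module hides its internal resources, so the root configuration reads as a list of components and the connections between them.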

In practice we treat every module as a versioned package, stored in a private Git repository and published to a private Terraform registry through its VCS integration. By tagging releases with semantic versions, say v1.3.2, we can pin the module version in consuming configurations and avoid accidental breakage when an upstream change renames a variable or adds a required one. Provider version pinning is equally important; in our production pipelines we lock azurerm to 2.93.0 and azuread to 2.30.0, which prevents subtle API changes from surfacing mid-release. The downside is that upgrading requires a coordinated plan across all environments, but the predictability this gives during a disaster-recovery drill outweighs the occasional upgrade friction.
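
A minimal sketch of both kinds of pinning follows. The provider versions and the v1.3.2 tag come from the text above; the repository URL and the module's inputs are assumptions made up for illustration, and the example assumes the module is referenced directly by Git tag.

```hcl
terraform {
  required_providers {
    # Exact pins: no other provider release satisfies these constraints.
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "= 2.93.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = "= 2.30.0"
    }
  }
}

module "aks" {
  # Hypothetical repository URL. For Git sources the version is selected
  # with ?ref=<tag>; the separate "version" argument applies only to
  # modules installed from a registry.
  source = "git::https://example.com/platform/terraform-azurerm-aks.git?ref=v1.3.2"

  # Hypothetical inputs.
  name                = "aks-prod"
  resource_group_name = "rg-prod"
}
```

Committing the generated .terraform.lock.hcl file alongside the configuration additionally records provider checksums, so every pipeline run installs the same provider builds.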

Terraform's variable validation block lets you define constraints that are checked at plan time, before any resource is created or changed. For example, a module that accepts a VM SKU variable can validate that the SKU belongs to a set of approved sizes, which catches misconfigurations before they reach Azure. Together with variable descriptions, validation gives consumers both documentation and constraint checking, so they can use a module without reading its implementation.
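
The VM SKU check could be sketched roughly as follows. The variable name vm_size and the three approved sizes are assumptions chosen for illustration; a real list would come from your own platform standards.

```hcl
variable "vm_size" {
  type        = string
  description = "Azure VM SKU for the node pool. Must be one of the approved sizes."

  validation {
    # contains() returns true only when var.vm_size is in the approved list.
    condition = contains(
      ["Standard_D2s_v3", "Standard_D4s_v3", "Standard_D8s_v3"],
      var.vm_size
    )
    error_message = "The vm_size value must be one of: Standard_D2s_v3, Standard_D4s_v3, Standard_D8s_v3."
  }
}
```

A caller passing an unapproved size gets the error message at plan time, with no request sent to Azure for that resource.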

Remote state management is another hidden source of pain. We store the state file in an Azure Storage Account with a dedicated container, rely on the server-side encryption that Azure Storage applies by default, and use the azurerm backend's blob lease mechanism for locking. As the number of environments grew to 45, the state file swelled to roughly 12 MB, and we saw plan times nearly double, from 8 seconds to 15 seconds, because Terraform had to download and lock the larger blob on each run. To mitigate this we split the state along logical boundaries (network, compute, data) so that each file stays under 4 MB and lock contention drops dramatically. The trade-off is an extra layer of configuration, but it saved us from occasional state corruption that would have required a full restore from backup.
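
One way to express the split is to give each logical layer its own root configuration with its own state key. The sketch below shows the backend block for a network layer; the resource group, storage account, container, and key names are hypothetical.

```hcl
# backend.tf in the network layer's root configuration.
# All names below are hypothetical placeholders.
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstateexample"
    container_name       = "tfstate"

    # One blob per layer. The compute and data layers would use the same
    # container with keys such as prod/compute.tfstate and prod/data.tfstate.
    key = "prod/network.tfstate"
  }
}
```

Because each layer holds a lease only on its own blob, a long-running apply in one layer no longer blocks plans in the others. Layers that need values from each other can read them through the terraform_remote_state data source or through data sources that look up the Azure resources directly.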

Terratest is a Go library for testing Terraform configurations. A Terratest test provisions real infrastructure, runs assertions against it, and then destroys it. These tests are slow, because provisioning takes several minutes, and they cost money, because real resources are created. Their value grows with the module's reuse: a module used across 20 environments justifies the investment in automated tests that verify it still works after every change.

In addition to full-stack Terratest runs, we run a lighter suite that scans the module source with tfsec and evaluates the generated plan, exported as JSON with terraform show -json, against Open Policy Agent policies. This approach creates no resources, costs virtually nothing, and runs in under a minute, catching about 70% of the security regressions we care about. When a change passes these checks but fails the full Terratest run, we have a clear signal that the issue is not a policy violation but a real runtime problem, such as a missing role assignment that only surfaces after the resource is created. Balancing cheap static analysis against the expensive end-to-end test keeps our CI cost under $0.30 per PR while still giving us confidence in high-impact modules.

A well-structured CI/CD pipeline for infrastructure runs terraform plan when a PR is opened, posts the plan output as a comment on the PR for review, and runs terraform apply against the target environment on merge. The state file is stored in Azure Blob Storage with locking. Before apply is permitted, Sentinel policies (on HCP Terraform or Terraform Enterprise) or custom scripting with Open Policy Agent can enforce constraints on the plan, which keeps deployments controlled and compliant.