Terraform modules are the building blocks of infrastructure as code at scale. Done right, they eliminate duplication, enforce standards, and let teams self-serve infrastructure in minutes. Done wrong, they become an unmaintainable maze of spaghetti HCL that nobody wants to touch. After building a library of more than 40 modules used across multiple client engagements, here's how I approach module design.
Principle 1: One Module, One Purpose
The most common mistake is building "god modules" that try to do everything. A module that creates a VPC, subnets, NAT gateways, route tables, security groups, and an EKS cluster is too large. When something breaks, you can't reason about it. When requirements change, you're afraid to touch it.
I follow the Unix philosophy: each module does one thing well. My VPC module creates networking. My EKS module consumes the VPC outputs and creates the cluster. My RDS module takes subnet IDs and security group IDs as inputs. They compose together cleanly because their boundaries are clear.
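As a rough sketch of that composition, a root configuration might wire the three modules together like this. The module paths, input names, and output names (vpc_id, private_subnet_ids, database_security_group_id) are assumptions for illustration, not the interface of any particular module.

```hcl
# Root configuration wiring three single-purpose modules together.
# All paths, inputs, and outputs here are hypothetical.

module "vpc" {
  source = "./modules/vpc"

  cidr_block = "10.0.0.0/16"
}

module "eks" {
  source = "./modules/eks"

  cluster_name = "payments"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids
}

module "rds" {
  source = "./modules/rds"

  database_name      = "payments"
  subnet_ids         = module.vpc.private_subnet_ids
  security_group_ids = [module.vpc.database_security_group_id]
}
```

Because each module only sees the values passed to it, swapping the VPC module for a different implementation only requires that the replacement expose the same outputs.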
Principle 2: Sensible Defaults, Full Override
Good modules work out of the box with minimal configuration but let power users customize everything. I achieve this with default variable values that encode best practices:
variable "instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.medium"
}

variable "multi_az" {
  description = "Enable Multi-AZ deployment"
  type        = bool
  default     = true
}

variable "backup_retention_period" {
  description = "Days to retain backups"
  type        = number
  default     = 7
}
A new team can use the module with just a database name and credentials. An experienced team can override instance class, enable cross-region replicas, or customize parameter groups without forking the module.
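To illustrate, here is how the two kinds of callers might look. The module path and the database_name, master_username, and master_password inputs are assumed names. Only instance_class, multi_az, and backup_retention_period come from the variable definitions shown earlier.

```hcl
# Hypothetical variable holding the database password; in practice this
# would usually come from a secrets manager rather than a plain variable.
variable "db_password" {
  type      = string
  sensitive = true
}

# Minimal caller: relies on every default.
module "orders_db" {
  source = "./modules/rds"

  database_name   = "orders"
  master_username = "orders_admin"
  master_password = var.db_password
}

# Power user: same module, selected defaults overridden.
module "analytics_db" {
  source = "./modules/rds"

  database_name   = "analytics"
  master_username = "analytics_admin"
  master_password = var.db_password

  instance_class          = "db.r6g.xlarge"
  backup_retention_period = 30
}
```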
Principle 3: Consistent Naming and Tagging
Every module should enforce a consistent naming convention and tagging strategy. I pass a context object through all modules that includes environment, project, team, and cost center:
module "vpc" {
  source = "./modules/vpc"

  context = {
    environment = "production"
    project     = "payments"
    team        = "platform"
    cost_center = "eng-001"
  }
}
Inside each module, resources are named using this context: ${var.context.project}-${var.context.environment}-vpc. Tags are applied automatically. This consistency makes cost allocation, access control, and cleanup straightforward across hundreds of resources.
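A minimal sketch of how a module might consume that context, assuming the variable is declared as an object. The resource name aws_vpc.this, the cidr_block variable, and the tag keys are illustrative choices, not a fixed convention.

```hcl
variable "context" {
  description = "Shared naming and tagging context"
  type = object({
    environment = string
    project     = string
    team        = string
    cost_center = string
  })
}

# Hypothetical input for the VPC address range.
variable "cidr_block" {
  description = "CIDR range for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

locals {
  # Shared prefix for every resource name, e.g. "payments-production".
  name_prefix = "${var.context.project}-${var.context.environment}"

  # Tags applied to every resource in the module.
  common_tags = {
    Environment = var.context.environment
    Project     = var.context.project
    Team        = var.context.team
    CostCenter  = var.context.cost_center
  }
}

resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  # Merge the shared tags with the resource-specific Name tag.
  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-vpc"
  })
}
```

Keeping the prefix and tag map in locals means a change to the convention touches one place per module rather than every resource.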
Principle 4: Test Everything
Untested modules are time bombs. I use a three-layer testing strategy:
- Static analysis: terraform validate, tflint, and checkov run on every PR to catch syntax errors, best-practice violations, and security issues before anything gets deployed.
- Unit tests: Terratest (Go) or pytest-terraform deploys the module to a sandbox account, validates the outputs and resource configurations, then tears everything down. Each module has its own test suite.
- Integration tests: Composition tests deploy multiple modules together (VPC + EKS + RDS) to validate they work as a system. These run nightly or before major releases.
The CI pipeline runs static analysis on every commit, unit tests on every PR, and integration tests on merges to main. This catches breaking changes before they reach production.
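For quick plan-time checks that don't need a sandbox account, Terraform's native test framework is an alternative to the Terratest suites described above. It runs with terraform test from Terraform 1.6, and mock_provider requires 1.7 or later. The sketch below assumes a VPC module that accepts the context object and declares a resource named aws_vpc.this with a Name tag. The resource name and the file name are hypothetical.

```hcl
# tests/naming.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock so no credentials are needed.
mock_provider "aws" {}

# Inputs passed to the module under test.
variables {
  context = {
    environment = "production"
    project     = "payments"
    team        = "platform"
    cost_center = "eng-001"
  }
}

run "vpc_name_follows_convention" {
  # Plan only: nothing is created in any account.
  command = plan

  assert {
    condition     = aws_vpc.this.tags["Name"] == "payments-production-vpc"
    error_message = "VPC Name tag must be <project>-<environment>-vpc."
  }
}
```

Checks like this are cheap enough to run alongside static analysis, leaving the slower sandbox deployments to cover behavior that only shows up after apply.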
Principle 5: Version and Document
Modules should be versioned semantically (major.minor.patch) and published to a private registry. Consumers pin to a version constraint so that module updates don't break their infrastructure unexpectedly. The constraint below accepts any 2.x release from 2.1 onward but never a new major version:
module "vpc" {
  source  = "app.terraform.io/acme/vpc/aws"
  version = "~> 2.1"
}
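The constraint operators behave differently, and the choice determines how much drift a consumer accepts. The snippet below reuses the registry address from the example. The module labels and version numbers are illustrative.

```hcl
# Exact pin: only 2.1.4 is ever selected.
module "vpc_exact" {
  source  = "app.terraform.io/acme/vpc/aws"
  version = "2.1.4"
}

# Pessimistic constraint on the minor version: >= 2.1 and < 3.0.
module "vpc_minor" {
  source  = "app.terraform.io/acme/vpc/aws"
  version = "~> 2.1"
}

# Pessimistic constraint on the patch version: >= 2.1.0 and < 2.2.0.
module "vpc_patch" {
  source  = "app.terraform.io/acme/vpc/aws"
  version = "~> 2.1.0"
}
```

An exact pin gives the most reproducible plans but needs a deliberate bump for every fix. The patch-level constraint is a common middle ground when the module author follows semantic versioning strictly.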
Each module includes a README.md generated by terraform-docs with input/output descriptions, usage examples, and architecture notes. I've found that if a module doesn't have a clear example, engineers will either use it incorrectly or build their own; neither outcome is good.
Principle 6: State Isolation
Never store all your infrastructure in a single state file. I organize state by environment and by service boundary. Each root configuration (one per environment and service) gets its own state file in S3 with DynamoDB locking:
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "production/vpc/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
This means a broken state for the VPC doesn't block deployments to EKS, and different teams can work on different services without stepping on each other's state locks.
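Isolated state files still need to share values. One common way to do that is the terraform_remote_state data source. The sketch below assumes the VPC configuration exports root-level outputs named vpc_id and private_subnet_ids. The EKS module path and its inputs are hypothetical.

```hcl
# In the EKS configuration, which has its own state file.
# Read-only view of the outputs stored in the VPC state.
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "acme-terraform-state"
    key    = "production/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

module "eks" {
  source = "./modules/eks"

  vpc_id     = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
}
```

Only root-level outputs of the VPC configuration are visible this way, so the outputs block doubles as the contract between the two state files.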
Putting It All Together
A well-designed module library transforms infrastructure provisioning from a weeks-long process into a 30-minute self-service operation. The investment in testing, documentation, and versioning pays for itself many times over in reduced toil, fewer misconfigurations, and faster onboarding for new team members. Start small, iterate often, and resist the urge to abstract before you have at least three concrete use cases.