#disclaimer: Opinions are my own and not the views of my employer
We all know that code should be checked into version control. Terraform code is no different.
There are many Git branching strategies out in the wild, such as the popular Git flow and GitHub flow.
However, when it comes to collaborating on Terraform code, there is an additional element to manage and consider: Terraform State.
What is Terraform State?
“Terraform requires some sort of database to map Terraform config to the real world. When you have a resource resource "aws_instance" "foo" in your configuration, Terraform uses this map to know that instance i-abcd1234 is represented by that resource.”
Purpose of State from Terraform Documentation
When collaborating on Terraform code, HashiCorp recommends using Remote State to store State files remotely, outside of Version Control (Git). All team members need to have access to this remote state storage if they want to test or apply their code contributions.
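As a rough illustration, a remote backend is declared inside the terraform block. The sketch below assumes an Amazon S3 backend; the bucket name, key and region are hypothetical placeholders, not values from any real setup.

```hcl
# Minimal sketch of a remote backend; all values are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket  = "example-team-terraform-state" # assumed bucket name
    key     = "dev/terraform.tfstate"        # assumed path of the state file in the bucket
    region  = "us-east-1"                    # assumed region
    encrypt = true                           # encrypt the state file at rest
  }
}
```

Every collaborator who runs terraform init against this configuration reads and writes the same state file, which is why they all need access to the bucket.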
Feature Branch Problem
When feature branches are worked on for a short period of time before merging into the main branch, there are usually no huge integration issues.
However, in the age of Continuous Integration/Continuous Delivery (CI/CD), multiple new features could be worked on in parallel and delivered whenever they are ready. This creates a problem for Terraform and Terraform State management.
Imagine this situation: Alice and Bob are collaborating on a Terraform code repository and share the same development remote backend on Amazon S3. Alice has a new requirement to introduce load balancing capabilities, so she checks out a new feature branch to work on it.
# existing resources
resource "aws_instance" "app_server" {
  ami = "ami-0518bb0e75d3619ca"
  ....
}

# new load balancer resource by Alice
resource "aws_lb" "alice_test_lb" {
  subnet_mapping {
    ...
  }
}
Typically with feature branches, Alice would apply her feature branch changes and provision the load balancer. Terraform state in the remote backend would be updated with Alice’s load balancer.
At the same time, Bob also checks out another new feature branch to modify the existing EC2 instance to test a new AMI as part of immutable infrastructure operations.
# existing resources
resource "aws_instance" "app_server" {
  ami = "ami-0fab0953c3bb514a9" # new AMI
  ....
}
When Bob is ready to test and apply his change, Terraform compares his branch’s configuration against the shared state. Alice’s load balancer is recorded in the state but missing from Bob’s code, so Terraform treats it as removed and plans to destroy it, alongside recreating the existing EC2 instance with the new AMI ID.
This is highly undesirable and an absolute nightmare for integration testing and Terraform plan/apply.
1 State File, 1 Branch
Terraform code should represent the live infrastructure provisioned, and Terraform state files record the details of that live infrastructure. Every Terraform state file should have exactly 1 source of truth and should reference exactly 1 Git branch.
Avoiding Integration Hell
I first realized the problems with feature branches when I stumbled across Rod Hilton’s blog post.
Two engineers are happily working away making commit after commit to their own respective feature branches, but neither of their branches are seeing the other’s code. Even if they’re regularly pulling off mainline, they’re still only seeing the commits that make it into the main branch, not each others. Developer A merges their code into mainline, then Developer B pulls and merges theirs, but now they have to deal with tons of merge conflicts. Developer B might not be in the best position to understand and resolve those conflicts if they don’t fully understand what Developer A is doing, and depending on how long these branches have been alive, they might have tons of code to resolve.
A Branching Strategy Simpler than GitFlow: Three-Flow by Rod Hilton
Feature branches provide little value when it comes to integration testing, and even less for Terraform, which breaks down when multiple branches modify a single state file.
Commit Directly to Development Branch
The simplest way to prevent Terraform code from being applied from feature branches is to do away with feature branches completely.
In the age of Continuous Integration, pushing and pulling changes to and from the main development branch frequently avoids the inevitable integration hell of merging multiple weeks-old feature branches.
Instead of feature branches, make use of feature toggles to “hide” your features in plain sight until they are ready for testing. Terraform supports feature toggles well through count, for_each and conditional expressions.
Using the previous example, Alice and Bob could’ve pushed their code directly into the development branch with feature toggles, and it would look something like this:
# feature toggles
locals {
  lb_feature  = false # change to true when ready
  ami_feature = false # change to true when ready
}

# existing resources
resource "aws_instance" "app_server" {
  # new AMI when the toggle is on, current AMI otherwise
  ami = local.ami_feature ? "ami-0fab0953c3bb514a9" : "ami-0518bb0e75d3619ca"
  ....
}

# new load balancer resource by Alice
resource "aws_lb" "alice_test_lb" {
  # created only when the toggle is on
  count = local.lb_feature ? 1 : 0
  subnet_mapping {
    ...
  }
}
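The example above uses count and a conditional expression. For completeness, here is a hedged sketch of the same toggle expressed with for_each. The lb_names list, the subnet_ids variable and the resource arguments are assumptions made for illustration, not part of the original example.

```hcl
variable "subnet_ids" {
  type        = list(string)
  description = "Hypothetical input: subnets to attach the load balancer to"
}

locals {
  lb_feature = false # change to true when ready

  # Hypothetical list of load balancer names managed by this toggle
  lb_names = ["alice-test-lb"]

  # Empty set while the toggle is off, so for_each creates nothing
  enabled_lbs = toset([for name in local.lb_names : name if local.lb_feature])
}

resource "aws_lb" "alice_test_lb" {
  for_each = local.enabled_lbs

  name               = each.key
  load_balancer_type = "application"
  subnets            = var.subnet_ids
}
```

With count, the resource is addressed as aws_lb.alice_test_lb[0]; with for_each, as aws_lb.alice_test_lb["alice-test-lb"]. Either way, anything that references a toggled resource has to handle the case where it does not exist yet.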
Some good examples of Git branching strategies that do away with feature branches are the Three-Flow model, the Cactus model and trunk-based development. I highly recommend considering these Git branching strategies for your next Terraform project collaboration.
Some Caveats
Merge Directly into Dev Branch only
Merging directly from local into the main branch should only be done for the development environment. Production and staging environments should have different merge and release processes, like those in some of the Git branching examples I’ve shared in the previous section. I am a strong advocate of having 1 branch per environment and I’ll share more on this in a future blog post.
Code Review Processes
Merging feature code directly into the development/main branch means that the typical code review process with pull requests isn’t possible. If code reviews are a necessity, use short-lived feature branches solely for the purpose of merging code. You can also consider shifting your team’s code review process to just before releasing to your staging environment.
Terraform v0.12
Using count and for_each for feature toggles presents some challenges if your team is still on an older Terraform version. Modules do not have native count and for_each support in v0.12, which is something to keep in mind. You would have to bake the feature toggle into the module itself by passing in true or false values that decide whether to create the resources within the module. It’s not ideal, but still doable.
Native module count and for_each are supported in Terraform v0.13 and higher.
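To make the v0.12 workaround concrete, here is a minimal sketch of a toggle baked into a module. The module path, the enabled and subnet_ids variables and the resource arguments are all hypothetical; the two sections below would live in separate files.

```hcl
# --- modules/load_balancer/main.tf (hypothetical module) ---

variable "enabled" {
  type        = bool
  default     = false
  description = "Feature toggle passed in by the caller"
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_lb" "this" {
  # count on resources works in v0.12, so the toggle lives inside the module
  count = var.enabled ? 1 : 0

  name               = "alice-test-lb"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
}

# --- main.tf in the root configuration (hypothetical) ---

locals {
  lb_feature = false # change to true when ready
}

variable "subnet_ids" {
  type = list(string)
}

module "load_balancer" {
  source     = "./modules/load_balancer"
  enabled    = local.lb_feature # the module decides what to create
  subnet_ids = var.subnet_ids
}
```

On v0.13 and higher the enabled variable is no longer needed: count = local.lb_feature ? 1 : 0 can be set directly on the module block.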
Collaborators are Trusted
To commit directly into the main branch, all collaborators must be on the same team and trusted. This Git branching strategy is not suitable for public repositories.
Closing Thoughts
I’d like to end this lengthy blog post with this quote from Rod Hilton.
Thank you for taking the time to read this. Reach out to me if you would like to discuss further.