You Are Not Saved By IaC

Tech as a Tool for Simplifying Human Challenges

Technology exists to simplify human challenges, and as tech professionals we should also use it to solve our own problems. One area we deal with daily is Infrastructure as Code (IaC), which raises frequent questions such as which tool is better: AWS CDK, CloudFormation, Serverless Framework, or Terraform.

However, we often overlook the foundational goals of IaC: fast recovery, fast deployment, resiliency, and a shorter time-to-market (TTM). For instance, if you decide to implement multi-region failover, can you deploy the second region without manual effort? How quickly can you recover if your production environment, a region, or an entire account goes down?

Common Roadblocks

To navigate the complexities of IaC, vigilance and discipline are essential. Let’s explore some common roadblocks and how to address them effectively:

  • Historical manual interventions

  • Lack of configurable and parameterized code

  • Lost secrets that cannot be restored

  • Hard dependencies between stacks

  • Circular dependencies

Human Actions

Human intervention is a regular part of our jobs: quick fixes for temporary issues are often applied by hand with the intention of automating them later. Unfortunately, these "notes for later" sometimes get forgotten and turn into major pain points down the road. Identifying these recurring manual actions early saves time, effort, and frustration.

Recommendations:

  • Use tags to identify automated resources.

  • Regularly explore untagged resources to detect those not yet automated.

  • Review generated IaC templates to find missing tags.

  • Foster a tech-driven culture within your team.
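The tagging recommendations above can be checked mechanically. The sketch below assumes you have already exported a resource inventory (for example, from your cloud provider's tagging or config APIs) into a simple list of ARNs and tags. The `ResourceRecord` shape, the `managed-by` tag key, and the example ARNs are hypothetical conventions for illustration, not part of any AWS API.

```typescript
// Hypothetical inventory shape; how you collect it is up to you.
interface ResourceRecord {
  arn: string;
  tags: Record<string, string>;
}

// Hypothetical convention: resources created by IaC carry this tag.
const MANAGED_TAG = "managed-by";
const ALLOWED_TOOLS = ["cdk", "cloudformation", "terraform", "serverless"];

// Returns resources that are untagged or tagged with an unknown tool,
// i.e. candidates for manual changes that were never automated.
function findUnmanaged(resources: ResourceRecord[]): ResourceRecord[] {
  return resources.filter((r) => {
    const tool = r.tags[MANAGED_TAG];
    return tool === undefined || ALLOWED_TOOLS.indexOf(tool) === -1;
  });
}

// Made-up example inventory.
const inventory: ResourceRecord[] = [
  { arn: "arn:aws:sqs:eu-west-1:111111111111:orders", tags: { "managed-by": "cdk" } },
  { arn: "arn:aws:sns:eu-west-1:111111111111:hotfix-topic", tags: {} },
  { arn: "arn:aws:s3:::legacy-bucket", tags: { "managed-by": "console" } },
];

for (const r of findUnmanaged(inventory)) {
  console.log(`not managed by IaC: ${r.arn}`);
}
```

Running a report like this on a schedule turns "explore untagged resources" into a routine check rather than a one-off cleanup.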

Configuration Shortcomings

One frequent issue in IaC is hardcoding values in stacks or nested stacks, which complicates configuration management. Whether it is a queue name, a topic name, or an HTTP endpoint, hunting for it manually across CloudFormation templates and AWS CDK code slows you down. Centralizing these values in one parameterized configuration simplifies future changes, such as moving from "me.mycompany.com" to "me.mycompany.org", because you can locate and update them in one place.
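A minimal sketch of centralized, parameterized configuration, assuming a single shared module that every stack imports. The `EnvConfig` fields, environment names, and the `loadConfig`/`apiEndpoint` helpers are hypothetical; in practice the same idea could be backed by SSM parameters or a CDK context file.

```typescript
// Hypothetical per-environment settings, defined in exactly one place.
interface EnvConfig {
  baseDomain: string;
  ordersQueueName: string;
  eventsTopicName: string;
}

const CONFIG: Record<string, EnvConfig> = {
  staging: {
    baseDomain: "staging.mycompany.com",
    ordersQueueName: "orders-staging",
    eventsTopicName: "events-staging",
  },
  production: {
    baseDomain: "mycompany.com",
    ordersQueueName: "orders",
    eventsTopicName: "events",
  },
};

// Every stack calls this instead of hardcoding values.
function loadConfig(env: string): EnvConfig {
  const cfg = CONFIG[env];
  if (!cfg) {
    throw new Error(`no configuration for environment "${env}"`);
  }
  return cfg;
}

// Derived values are computed from the config, never copy-pasted.
function apiEndpoint(cfg: EnvConfig, subdomain: string): string {
  return `https://${subdomain}.${cfg.baseDomain}`;
}

const prod = loadConfig("production");
console.log(apiEndpoint(prod, "me")); // https://me.mycompany.com
```

With this layout, moving to "mycompany.org" means changing `baseDomain` once; every endpoint derived from it follows automatically.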

Losing Secrets

Managing secrets securely is crucial. Storing sensitive information such as API keys or credentials in a secret manager helps, but what happens if you lose the account itself? What about the secrets your partners issued to you, or the ones you shared with them? The solution is to keep a backup of every necessary secret outside the software environment, ideally in a dedicated vault. This is more an organizational practice than a technical one.
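One way to keep that practice honest is to compare the names of the secrets your deployments reference against the names recorded in the external vault's inventory. The sketch below works on plain name lists only; how you obtain them (for example, by listing secrets in your secret manager and exporting the vault's index) is assumed, the secret names are made up, and no secret values are handled.

```typescript
// Names of secrets referenced by deployed stacks (assumed to be collected
// from your secret manager or IaC code; values are never needed here).
const secretsInUse = ["payments/api-key", "partner-x/webhook-token", "db/admin"];

// Names recorded in the external backup vault's inventory (hypothetical).
const backedUp = ["payments/api-key", "db/admin"];

// Returns the secrets that would be lost if the account became unavailable.
function missingBackups(inUse: string[], inventory: string[]): string[] {
  const known = new Set(inventory);
  return inUse.filter((name) => !known.has(name));
}

const missing = missingBackups(secretsInUse, backedUp); // ["partner-x/webhook-token"]
console.log(missing);
```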

Managing Dependencies

Dependencies in IaC can be categorized into three levels:

  1. Light Dependencies: Passed as environment variables (e.g., to a Lambda function). These won’t break your deployment but can affect testing and runtime behavior.

  2. Soft Dependencies: Tied to infrastructure services but manageable, such as subscribing to an SNS topic, although permission issues may arise from historical actions that were never automated.

  3. Hard Dependencies: These will prevent deployment if not properly handled. For example, an EventBridge rule may require an EventBus that isn’t yet deployed. The key here is identifying priority stacks and documenting these relationships, often using dependency graphs or architecture diagrams.
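Documenting hard dependencies as a graph also lets you compute a safe deployment order. The sketch below applies Kahn's topological sort to a hypothetical map of stack names to the stacks they hard-depend on (only hard dependencies are modeled); the stack names are illustrative, and the function throws if no valid order exists.

```typescript
// Hypothetical stack graph: each key lists the stacks it hard-depends on
// (e.g. the EventBridge rule stack needs the EventBus stack deployed first).
type StackGraph = Record<string, string[]>;

// Kahn's algorithm: returns an order in which every stack comes after its
// dependencies, or throws if the graph contains a cycle.
function deploymentOrder(graph: StackGraph): string[] {
  const stacks = Object.keys(graph);
  const remaining: Record<string, number> = {}; // unmet dependencies per stack
  const dependents: Record<string, string[]> = {}; // reverse edges

  for (const s of stacks) {
    remaining[s] = 0;
    dependents[s] = [];
  }
  for (const s of stacks) {
    for (const dep of graph[s]) {
      if (!(dep in graph)) {
        throw new Error(`${s} depends on unknown stack ${dep}`);
      }
      remaining[s] += 1;
      dependents[dep].push(s);
    }
  }

  // Start with stacks that have no dependencies (the priority stacks).
  const ready = stacks.filter((s) => remaining[s] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const s = ready.shift() as string;
    order.push(s);
    for (const d of dependents[s]) {
      remaining[d] -= 1;
      if (remaining[d] === 0) ready.push(d);
    }
  }

  // Stacks left over are members of a cycle or blocked behind one.
  if (order.length !== stacks.length) {
    const stuck = stacks.filter((s) => remaining[s] > 0);
    throw new Error(`circular dependency involving or blocking: ${stuck.join(", ")}`);
  }
  return order;
}

const graph: StackGraph = {
  "event-bus": [],
  "event-rules": ["event-bus"],
  "notifications": ["event-bus"],
  "api": ["event-rules", "notifications"],
};
console.log(deploymentOrder(graph));
```

Running a check like this in CI surfaces a newly introduced cycle before a production release does.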

Circular Dependencies

Over time, as requirements evolve, stacks can develop circular dependencies. Imagine planning a production release only to find that it fails because of a circular dependency between two stacks. For instance, Stack A may define a CloudFront distribution that needs an upstream domain name for CORS, while the record set for that domain is managed in Stack B, which in turn references the distribution. Neither stack can be deployed first, which leads to a deadlock.

To avoid such issues, actively identify and break circular dependencies. Split stacks if needed, or apply predictable naming conventions. For example, referencing the well-known name "products.mycompany.com" directly, instead of importing it from another stack, removes the direct dependency between the two stacks.
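A minimal sketch of the naming-convention approach, assuming a hypothetical shared `serviceDomain` helper that both stacks import. Each stack computes the same fully qualified name from the service name and base domain, so the CloudFront/CORS stack no longer needs to import anything from the DNS stack.

```typescript
// Hypothetical shared convention, imported by every stack that needs the name.
const BASE_DOMAIN = "mycompany.com";

function serviceDomain(service: string, baseDomain: string = BASE_DOMAIN): string {
  // Lowercase, then validate as a DNS label (letters, digits, hyphens;
  // no leading/trailing hyphen; at most 63 characters).
  const label = service.toLowerCase();
  if (!/^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/.test(label)) {
    throw new Error(`"${service}" is not a valid DNS label`);
  }
  return `${label}.${baseDomain}`;
}

// Stack A (CloudFront): builds the CORS origin from the convention.
const corsOrigin = `https://${serviceDomain("products")}`;

// Stack B (DNS): creates the record set for the same name.
const recordName = serviceDomain("products");

console.log(corsOrigin); // https://products.mycompany.com
console.log(recordName); // products.mycompany.com
```

If the DNS stack still needs to point at the distribution, that reference is now one-way, so the two stacks can simply be deployed in order.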


By proactively addressing these common challenges, we can build more resilient, efficient, and scalable infrastructure, reducing downtime and increasing the speed of recovery when issues arise.