Infrastructure as Code, Security Blind Spots, and the Messy Reality of DevOps

I’ve spent most of my career in infrastructure and reliability engineering, from years as an SRE at Google to leading teams at Facebook and now building Spacelift. Along the way, I’ve watched entire paradigms rise and fall, seen technologies mature, and learned some hard lessons about where DevOps and security meet—and where they clash.

The interesting thing is that despite all the noise, infrastructure as code (IaC) hasn’t really changed in a decade. Terraform and CloudFormation are still the foundations. Yes, new features get added, and yes, we now have layers of automation, wrappers, and commercial tooling, but the core workflow remains constant. That stability isn’t a weakness; it’s actually a strength. It means the skills you build are transferable across jobs and organizations, unlike in front-end development where the framework of the week can make yesterday’s expertise obsolete.

But stability doesn’t mean simplicity. The reality in most organizations is far messier than the marketing slides suggest. There is no linear progression from one technology to the next; instead, you find overlapping eras, conflicting tools, and processes that look beautiful on paper but fall apart in production. And when you add security into the mix, the friction becomes obvious.

The Ceremony of Audit

Let’s start with audits. In theory, audits are supposed to strengthen organizations by identifying weaknesses and validating controls. In practice, they often turn into ceremonies. Think of it as a school play: no one enjoys it, but everyone goes through the motions because regulators and customers demand the certification.

If your organization isn’t already in a decent place, audits don’t teach you much. They just become processes written for auditors rather than engineers. I’ve seen countless cases where teams follow the letter of the audit process while ignoring the spirit. People fix problems quickly under pressure, then backfill paperwork later to make it look like the official procedure was followed.

This doesn’t mean audits are useless. When teams have trust and good communication, audits can provide genuine learning opportunities. But to get there, the gap between stated policies and reality has to be small enough that findings are actionable rather than overwhelming.

Security Processes vs. Human Behavior

Security teams often design processes that are logical, comprehensive, and compliant—yet completely impractical when the production deployment is burning down at 3 a.m. on a holiday weekend. Humans under stress take the path of least resistance. If the secure path is the easiest path, people will follow it. If not, they’ll bypass it.

This is why so many DevOps and security teams end up at odds. The security processes are optimized for auditors, not operators. And procedures aren’t just about compliance; they’re also job insurance. If you can point to the plan you reviewed and the checklist you followed, you’re protected even if production still went down.

But the only way processes consistently work in real incidents is if they are simple, automated, and designed with the operator in mind.

Automation as a Lifeline

Humans are terrible at multitasking, yet incidents demand juggling multiple priorities. That’s why automation matters so much. Offload what you can to machines, whether it’s incident coordination, credential rotation, or infrastructure workflows. Automation doesn’t just reduce toil; it makes the secure way the easy way. If obtaining temporary credentials requires six approvals and three logins, people will find workarounds. If a tool can handle the process in seconds, suddenly compliance isn’t painful.

We’ve built flows that make secure operations simple, because the alternative is predictable, and usually looks like engineers doing “infra archeology,” trying to reconstruct what’s really running in production after drift and ad hoc fixes have accumulated. We’ve also built a new tool that makes it trivial to turn your day two operations into proper products.

Infra Archeology: When You Don’t Know What’s Running

The aforementioned infra archeology is what happens when your version control history no longer matches your production reality. Maybe someone ran Terraform locally and never checked in the code. Maybe hotfixes were applied manually. Maybe no one even remembers who owns a particular service.
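To make the mismatch concrete, here is a minimal sketch of a drift check. It assumes you can already export two inventories as dictionaries keyed by resource ID: one from the checked-in code and one from the live environment. The function name, the input shape, and the three bucket names are hypothetical, chosen only for illustration; real tools work on much richer state.

```python
from typing import Any, Dict, List


def find_drift(
    declared: Dict[str, Dict[str, Any]],
    actual: Dict[str, Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare what version control declares with what is actually running.

    Both inputs are hypothetical inventories: resource ID -> attributes.
    """
    declared_ids = set(declared)
    actual_ids = set(actual)
    return {
        # Running in production but absent from the checked-in code.
        "unmanaged": sorted(actual_ids - declared_ids),
        # Declared in code but not found in production.
        "missing": sorted(declared_ids - actual_ids),
        # Present in both, but the attributes differ (e.g. a manual hotfix).
        "changed": sorted(
            rid for rid in declared_ids & actual_ids
            if declared[rid] != actual[rid]
        ),
    }
```

Every entry in "unmanaged" or "changed" is a small archeology project waiting to happen, which is why the list is worth keeping empty.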

I saw this at Google, where the rule was “if it isn’t checked in, it isn’t in production.” But rules without enforcement don’t hold. Sometimes the only way to figure out what a service did was to shut it down and see whose pager went off and why. For today’s IaC, I believe strongly in enforcing version control. If infrastructure changes aren’t checked in, they don’t run, period. Customers sometimes ask for exceptions, but for every one legitimate case, there are 99 that would turn into security nightmares. A strict stance saves everyone from painful archeology later. The infrastructure management of tomorrow may not have the same constraints, but until it arrives and matures, I’d suggest that your production environment and Git become best buddies.
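One way to picture "if it isn't checked in, it doesn't run" is a gate that sits in front of every apply. The sketch below is an assumption-heavy illustration, not how any particular product does it: it takes the text printed by `git status --porcelain` (which is empty for a clean working tree), the current HEAD commit, and a set of commits known to exist on the shared remote. The function names and the way the inputs are gathered are hypothetical.

```python
from typing import Set, Tuple


def is_clean(porcelain_output: str) -> bool:
    """A clean working tree makes `git status --porcelain` print nothing."""
    return porcelain_output.strip() == ""


def may_apply(
    porcelain_output: str,
    head_commit: str,
    pushed_commits: Set[str],
) -> Tuple[bool, str]:
    """Allow an infrastructure run only for code that is committed and pushed."""
    if not is_clean(porcelain_output):
        return False, "working tree has uncommitted changes"
    if head_commit not in pushed_commits:
        return False, "HEAD commit is not on the shared remote"
    return True, "ok"
```

The important design choice is that the gate has no override flag. An exception path would need its own review and its own audit trail, which is exactly the ceremony people try to skip under pressure.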

Context Is Everything in Logging

Another frequent blind spot is audit logging without context. It’s not enough to know that an API call was made at a certain time by a certain role. Without context, you end up playing detective, trying to guess intent from timestamps.

What’s really needed is the ability to backtrace from an API call all the way to the human and reason behind it. You need the run ID, the code version, the pull request, the JIRA ticket, the approver. That kind of chain transforms logs from raw data into an explanation of “why.” Without it, all you know is that something happened, and that’s almost useless.
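As a rough sketch of that chain, the record below attaches the run ID, code version, pull request, ticket, and approver to a raw API event. The class, field names, and event shape are hypothetical and not taken from any specific logging system; the point is only that the context travels with the event instead of being reconstructed afterwards.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChangeContext:
    """The 'why' behind an API call (all fields are illustrative)."""
    run_id: str
    commit_sha: str
    pull_request: str
    ticket: str
    approver: str


def annotate(event: dict, context: ChangeContext) -> dict:
    """Return a copy of the raw event with its change context attached."""
    return {**event, "context": asdict(context)}


def explain(record: dict) -> str:
    """Turn a log record into a one-line explanation, if context exists."""
    who = f"{record['action']} by {record['role']}"
    ctx = record.get("context")
    if ctx is None:
        return f"{who}: no context recorded"
    return (
        f"{who}: run {ctx['run_id']}, commit {ctx['commit_sha']}, "
        f"PR {ctx['pull_request']}, ticket {ctx['ticket']}, "
        f"approved by {ctx['approver']}"
    )
```

A record without context still tells you that something happened; a record with it tells you which change, which review, and which person to ask.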

The Dangers of What You Don’t See

Security professionals often over-engineer the visible parts of their systems while ignoring hidden weaknesses. I’ve seen AWS accounts locked down with meticulous IAM policies, only to be blown wide open by a vendor that required a single unscoped API key for programmatic access.

The biggest risks usually come from what you don’t see: shared credentials in Zapier, secrets passed around Slack, or third-party tools with poor credential hygiene. Those blind spots undermine months of careful architecture.

This is why eliminating static credentials and adopting federated identity is so important. Federated identity systems (like OIDC) remove the need for long-lived keys, yet adoption is slower than it should be simply because many engineers don’t know the concept exists. Awareness is the first step.
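For readers new to the idea, here is a simplified sketch of the trust decision at the heart of federated identity. Instead of storing a long-lived key, the workload presents a short-lived token, and the receiving side checks its claims. The claim names (`iss`, `sub`, `aud`, `exp`) are the standard ones; everything else, including the function name and the example values, is hypothetical. The sketch also assumes the token's signature has already been verified by a proper library, which it does not attempt to do itself.

```python
import time
from typing import Optional, Set


def claims_allowed(
    claims: dict,
    issuer: str,
    audience: str,
    allowed_subjects: Set[str],
    now: Optional[float] = None,
) -> bool:
    """Decide whether already-verified token claims may assume a role.

    ASSUMPTION: signature verification happened before this call.
    """
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # The audience claim may be a single string or a list of strings.
    audiences = aud if isinstance(aud, list) else [aud]
    exp = claims.get("exp")
    return (
        claims.get("iss") == issuer
        and audience in audiences
        and claims.get("sub") in allowed_subjects
        # Short-lived by construction: an expired token is simply rejected.
        and isinstance(exp, (int, float))
        and exp > now
    )
```

Nothing here needs to be rotated, shared over chat, or pasted into a third-party tool, because there is no standing secret to leak.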

Policy as Code and Its Limits: Keep it Simple

Policy as code is a valuable tool, but it’s not a silver bullet. It acts as a force multiplier, enforcing guardrails and preventing obvious mistakes. But no tool replaces understanding. A policy written without comprehension is just another box to check. This applies broadly: whether you’re using Vault, IAM roles, or federated identity, you need to understand the underlying principles. Otherwise, you’ll miss the bigger risks—like vendors whose APIs are fundamentally insecure.
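To show what a guardrail looks like in practice, here is a deliberately tiny policy check. It is written in plain Python rather than a dedicated policy language, and the resource shape (`name`, `type`, `public`, `owner`) is a hypothetical stand-in for whatever your planning tool actually emits.

```python
from typing import Dict, List


def evaluate(planned_resources: List[Dict]) -> List[str]:
    """Return human-readable violations for a list of planned resources."""
    violations = []
    for res in planned_resources:
        name = res.get("name", "<unnamed>")
        # Guardrail 1: block the obvious mistake of a public bucket.
        if res.get("type") == "storage_bucket" and res.get("public", False):
            violations.append(f"{name}: public buckets are not allowed")
        # Guardrail 2: every resource must say who owns it.
        if not res.get("owner"):
            violations.append(f"{name}: every resource needs an owner")
    return violations
```

Two rules like these catch the obvious mistakes, and that is all they do. They say nothing about a vendor integration that holds an unscoped key, which is why the policy is only as good as the understanding behind it.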

Complex processes invite workarounds. Beautifully designed procedures often fail because real humans don’t have the time or patience to follow them under stress. Every backdoor someone creates to bypass complexity becomes a permanent vulnerability. So, the lesson is, don’t overcomplicate. Design processes that humans will actually use. Favor simplicity, clarity, and automation over ceremony and bureaucracy.

Where AI Fits

Like every other tool, AI is a force multiplier. It can help validate assumptions, suggest optimizations, or flag anomalies. But it won’t save you if you lack the basics. It’s like using a phrasebook in a foreign country: you might ask the question correctly, but if you don’t understand the answer, you’re still lost. AI won’t replace the need to learn security fundamentals.

Small Wins for Teams

Where should teams start? First, learn the concepts. Read the HashiCorp Vault manual. Understand why static credentials are dangerous. Learn how federated identity works. Second, eliminate static credentials everywhere you can. Even small steps (like replacing one API key with a scoped role) reduce risk. Third, keep it simple. Specifically, avoid “security theater” and build processes that are usable under real-world stress.

Also, with or without AI, automate wherever you can. Let machines handle toil so humans can focus on judgment. Finally, pay attention to your vendors. Your posture is only as strong as your weakest link. Don’t let someone else’s bad API design become your backdoor.

The Human Factor

At the end of the day, security is about people. Engineers under pressure won’t follow processes that are cumbersome or incomprehensible. Leaders who design systems must respect human limitations. Attention is the scarcest currency in technology.

The job of anyone building infrastructure, platforms, or security is to make the secure path the easiest path. If it isn’t easy, it won’t be followed.

That’s the messy reality of DevOps and security. Not a clean progression of tools, but a patchwork of systems, shortcuts, and trade-offs. Success doesn’t come from ceremonies or checklists; it comes from building processes that work for humans as well as auditors.