
Terraform + AI: How Agents Are Making Infrastructure as Code Accessible

Diego Herrera

Creative technologist writing about AI agents in design and content.

April 30, 2026 · 13 min read

How AI Agents Are Democratizing Infrastructure as Code

The IaC Accessibility Problem Nobody Talks About

Infrastructure as Code solved a real problem: treating servers like cattle instead of pets. But it created a new one. Writing production-grade Terraform or Pulumi requires deep knowledge of provider APIs, state management, resource dependencies, and an ever-expanding syntax that changes with every release. The HashiCorp provider registry alone lists over 3,800 providers. The AWS provider has more than 900 resource types, each with dozens of attributes.

The result is a steep learning curve that locks infrastructure management behind a relatively small pool of experienced practitioners. A backend developer who needs a new S3 bucket and CloudFront distribution for their feature branch shouldn't have to spend two weeks learning HCL syntax and IAM policy structure. But they often do.

AI agents are changing this equation — not by replacing IaC expertise, but by compressing the gap between intent and working configuration. This article examines four specific areas where that compression is happening, what tools are driving it, and where the limitations still bite.


Natural Language to Terraform: From Intent to HCL

The most visible application of AI agents in IaC is translating plain English descriptions into working Terraform configurations. The premise is straightforward: describe what you want, get the code.

How It Works in Practice

Tools like HashiCorp's Terraform AI assistant (integrated into HCP Terraform), GitHub Copilot with Terraform extensions, and Amazon Q Developer all offer some form of natural language to HCL translation. The underlying mechanism is typically an LLM fine-tuned or prompted with Terraform documentation, provider schemas, and example configurations.

Here's what the workflow looks like with a tool like Copilot in VS Code. You write a comment describing your intent:

# Create a VPC with public and private subnets across 3 AZs
# Include a NAT gateway, internet gateway, and route tables
# The VPC should use 10.0.0.0/16 CIDR with /20 subnets

The agent generates a multi-resource configuration:

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "main-vpc"
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

resource "aws_subnet" "public" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 4, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet("10.0.0.0/16", 4, count.index + 3)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}

resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  tags = {
    Name = "main-nat-gw"
  }
}

This is genuinely useful output. It gets the resource relationships right, uses cidrsubnet() for proper CIDR math, and follows AWS naming conventions. A developer who has never written Terraform can have a working VPC in minutes.

Where This Breaks Down

The problems emerge at the edges — which is exactly where production infrastructure lives.

State management is invisible. The generated code doesn't tell you about terraform import for existing resources, state locking, or remote state backends. A new user might run terraform apply against local state and create a mess that takes hours to untangle.
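
For production use, a reviewer typically has to add a remote backend by hand. A minimal sketch, assuming an S3 bucket and DynamoDB lock table already exist for state storage (the names here are placeholders):

terraform {
  backend "s3" {
    bucket         = "example-org-terraform-state"   # placeholder bucket
    key            = "network/vpc/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-org-terraform-locks"   # placeholder lock table
    encrypt        = true
  }
}

After adding a block like this, terraform init -migrate-state moves any existing local state into the backend, which is exactly the step a first-time user won't know to run.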

Security defaults are inconsistent. Some models generate S3 buckets with public access blocks; others don't. Security group rules sometimes include 0.0.0.0/0 ingress because the prompt didn't specify otherwise. The agent optimizes for "does this work" not "is this safe."
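
The fix is usually a small change the agent won't make unless the prompt asks for it. An illustrative sketch, reusing the VPC from the earlier example and a placeholder admin CIDR:

variable "admin_cidr" {
  description = "CIDR range allowed to reach the bastion (placeholder value)"
  type        = string
  default     = "203.0.113.0/24"
}

resource "aws_security_group" "bastion" {
  name   = "bastion-ssh"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "SSH from the admin range only, not 0.0.0.0/0"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}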

Provider version pinning is often missing. Generated code frequently lacks required_providers blocks with version constraints, which means your configuration might break when a provider releases a breaking change.
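
Adding a constraints block is cheap insurance; the version numbers below are illustrative, not a recommendation:

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # allow minor and patch updates, block the next major
    }
  }
}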

Complex modules get hallucinated. Ask for something niche — say, a Transit Gateway with route propagation across multiple VPCs and conditional attachment logic — and the model starts inventing attributes that don't exist. I've seen configurations referencing aws_transit_gateway_route_propagation.propagations with attributes that have never appeared in any version of the AWS provider.

The honest assessment: natural language to Terraform works well for standard patterns (VPCs, basic compute, storage, DNS). It degrades rapidly for anything involving complex networking, IAM policy composition, or multi-account architectures. Treat it as a starting point, not a finished product.


AI-Powered Code Review for Infrastructure

Generating infrastructure code is one problem. Reviewing it is another, and arguably more important. A misconfigured security group or an overly permissive IAM role can be the difference between a secure deployment and a breach.

What AI Review Actually Catches

Tools like Bridgecrew/Prisma Cloud, tfsec (now integrated into Trivy), Checkov, and newer AI-native tools like Snyk's AI-powered IaC scanning analyze Terraform configurations against security policies and best practices.

The traditional tools use static analysis with predefined rules. The AI-enhanced versions add contextual understanding. Here's the difference:

A traditional linter catches this:

resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"
}

Flag: "S3 bucket does not have server-side encryption enabled." That's useful but mechanical.

An AI-enhanced reviewer can look at the broader context and flag more nuanced issues:

resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
      kms_master_key_id = aws_kms_key.s3_key.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket = aws_s3_bucket.data.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# But then later in the same configuration:
resource "aws_s3_bucket_policy" "data" {
  bucket = aws_s3_bucket.data.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = "s3:GetObject"
        Resource  = "${aws_s3_bucket.data.arn}/*"
      }
    ]
  })
}

The AI agent can flag the contradiction: you've enabled a full public access block (including block_public_policy and restrict_public_buckets) but then attached a bucket policy with Principal = "*". The public access block will either reject the policy or keep it from taking effect, which might be intentional, but probably isn't. A static rule might flag each piece independently; an AI agent can reason about the interaction.

The Terraform Plan Review Pattern

The most powerful application I've seen is AI agents that review terraform plan output rather than just the source code. This catches a different class of issues — things that are syntactically valid but operationally dangerous.

Several teams are building internal tooling that pipes terraform plan -json output into an LLM with a prompt like:

Review this Terraform plan output. Flag any resources that will be 
destroyed and recreated (rather than updated in-place), any changes 
to IAM policies that expand permissions, and any modifications to 
production databases. For each flagged item, explain the risk and 
suggest a safer alternative.

This catches things like:

  • A change to an RDS instance parameter group that will trigger a database restart during business hours
  • An EC2 instance replacement because someone changed the AMI ID
  • A security group rule change that opens a port to the entire internet

The AI agent doesn't replace the human reviewer's judgment, but it surfaces the right information in the right context. Instead of scrolling through 500 lines of plan output, the reviewer gets a focused summary of the three changes that actually matter.
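
A minimal sketch of that glue code, assuming the Anthropic Python SDK and a plan saved with terraform plan -out=tfplan; the prompt mirrors the one above, and the model name matches the drift example later in this article:

import subprocess
from anthropic import Anthropic

def review_plan(plan_file: str = "tfplan") -> str:
    """Render a saved plan as JSON and ask an LLM for a risk-focused summary."""
    # terraform show -json renders the saved plan file as a single JSON document
    plan_json = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        capture_output=True, text=True, check=True
    ).stdout

    client = Anthropic()
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                "Review this Terraform plan output. Flag any resources that will be "
                "destroyed and recreated, any changes to IAM policies that expand "
                "permissions, and any modifications to production databases. For each "
                "flagged item, explain the risk and suggest a safer alternative.\n\n"
                + plan_json
            ),
        }],
    )
    return message.content[0].text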

Limitations Worth Acknowledging

AI code review for IaC has real blind spots. The models struggle with:

  • Multi-file context. A security issue might span a VPC definition in networking.tf, a security group in compute.tf, and an IAM role in iam.tf. Current tools often analyze files in isolation.
  • Custom module internals. If your organization wraps AWS resources in internal modules, the AI can't see inside the module to verify what it actually provisions.
  • Intent. The model can't distinguish between "I deliberately want this port open to the internet because it's a public web server" and "I accidentally left this open." Context matters, and models don't have it.

Drift Detection and Remediation

Infrastructure drift — where the actual state of resources diverges from what's defined in code — is one of the most persistent operational headaches in IaC. Someone makes a manual change in the AWS console to debug an issue, forgets to update the Terraform code, and six months later nobody knows what the actual configuration is supposed to be.

Traditional Drift Detection

The standard approach is terraform plan run on a schedule, which compares the state file to real infrastructure. But interpreting the output requires expertise. A plan might show 47 differences, of which 45 are benign tag updates and 2 are critical security changes. Without context, they all look the same.

How AI Agents Improve This

AI agents add value in drift detection through classification and prioritization. Instead of presenting raw diff output, they categorize changes by severity and suggest remediation paths.

The emerging pattern looks like this:

import json
import subprocess
from anthropic import Anthropic

def detect_drift():
    """Run terraform plan and capture structured output."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-json"],
        capture_output=True, text=True
    )
    
    changes = []
    for line in result.stdout.strip().split('\n'):
        try:
            event = json.loads(line)
            if event.get("@message") == "diff":
                changes.append(event["resource"])
        except json.JSONDecodeError:
            continue
    
    return changes

def classify_drift(changes):
    """Use an AI agent to classify and prioritize drift."""
    client = Anthropic()
    
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Analyze these infrastructure drift changes and classify each as:
            - CRITICAL: Security or compliance impact (IAM, networking, encryption)
            - OPERATIONAL: May cause service disruption if reverted
            - COSMETIC: Tags, descriptions, naming conventions
            - INTENTIONAL: Likely a deliberate manual override
            
            For each change, recommend: REVERT_TO_CODE, UPDATE_CODE, or INVESTIGATE.
            
            Changes:
            {json.dumps(changes, indent=2)}"""
        }]
    )
    
    return message.content[0].text

This is a simplified example, but the pattern is real. The AI agent examines each drift item, considers the resource type and the nature of the change, and produces a prioritized action list.

The Auto-Remediation Question

Some teams are going further: using AI agents to automatically generate the Terraform code changes needed to either revert drift or update the codebase to match reality. This is where things get interesting — and risky.

The safe version of this workflow:

  1. AI agent detects drift
  2. AI agent classifies the drift
  3. For "cosmetic" drift, AI agent generates a PR to revert the manual change
  4. For "operational" drift, AI agent creates a ticket with context and suggested fix
  5. For "critical" drift, AI agent pages an on-call engineer immediately

The dangerous version skips steps 3 through 5 and auto-applies changes. Don't do this. AI agents can misclassify drift, and auto-reverting a "cosmetic" change that was actually an emergency hotfix is a career-limiting move.
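
A minimal sketch of the routing step, assuming the classifier emits one of the severity labels above; create_pr, create_ticket, and page_oncall are hypothetical stand-ins for whatever your team actually uses (GitHub, Jira, PagerDuty):

# Hypothetical stand-ins for real integrations; replace with your own tooling.
def create_pr(title: str, body: str) -> None:
    print(f"[PR] {title}")

def create_ticket(resource: str, context: str) -> None:
    print(f"[TICKET] {resource}")

def page_oncall(summary: str) -> None:
    print(f"[PAGE] {summary}")

def route_drift(item: dict) -> None:
    """Route one classified drift item. Every branch keeps a human in the loop."""
    severity = item["severity"]
    if severity == "COSMETIC":
        # Low-risk: open a revert PR, but a human still reviews and merges it.
        create_pr(f"Revert drift on {item['resource']}", item["summary"])
    elif severity in ("OPERATIONAL", "INTENTIONAL"):
        # Needs judgment: file a ticket with the agent's context attached.
        create_ticket(item["resource"], item["summary"])
    elif severity == "CRITICAL":
        # Security or compliance impact: page someone immediately.
        page_oncall(f"Critical drift on {item['resource']}")
    # Deliberately no auto-apply branch.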


Cost Optimization Through Intelligent Analysis

Cloud cost management is another area where AI agents are providing genuine value, particularly when integrated with IaC workflows.

The Problem: Cost Visibility in Code

Terraform configurations don't have price tags. You can't look at an aws_db_instance resource and immediately know it costs $1,200/month. The resource definition says db.r6g.2xlarge, but translating that to cost requires knowledge of current AWS pricing, region-specific rates, reserved instance discounts, and usage patterns.

Tools in This Space

Infracost is the leading open-source tool for attaching cost estimates to Terraform. It parses HCL and produces cost breakdowns before you apply changes. The newer Infracost Cloud adds AI-powered recommendations on top of the raw estimates.

AWS Cost Explorer with Amazon Q integration can analyze your actual spend and correlate it with infrastructure changes, but it operates at the billing level rather than the code level.

Kion (formerly cloudtamer) and CloudHealth offer AI-driven recommendations that connect to your IaC pipelines.

What AI Agents Actually Optimize

The most effective cost optimization agents I've seen work at three levels:

1. Right-sizing recommendations based on utilization data

# Before: AI agent identifies this instance averages 12% CPU utilization
resource "aws_instance" "worker" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "m5.4xlarge"  # 16 vCPUs, 64GB RAM — $0.768/hr
}

# After: AI agent recommends downgrade
resource "aws_instance" "worker" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "m5.xlarge"   # 4 vCPUs, 16GB RAM — $0.192/hr
}

The AI agent correlates CloudWatch metrics with the Terraform resource to produce this recommendation. The savings: ~$420/month per instance. This is not new — AWS Compute Optimizer does this — but integrating it into the IaC workflow means the recommendation becomes a PR rather than a report that nobody reads.

2. Identifying unused or orphaned resources

AI agents can analyze your Terraform state and cross-reference it with actual usage metrics to find resources that are provisioned but idle. Detached EBS volumes, unused Elastic IPs, idle NAT Gateways, and empty RDS instances are common findings.
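
A sketch of the data-gathering half of that idea, assuming boto3 credentials are configured and the working directory holds the relevant Terraform configuration; it lists detached EBS volumes and notes whether each appears in state, so an agent (or a human) can decide what to do with them:

import subprocess
import boto3

def find_detached_volumes(region: str = "us-east-1") -> list[dict]:
    """List EBS volumes that exist but are attached to nothing."""
    ec2 = boto3.client("ec2", region_name=region)
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]  # "available" = detached
    )["Volumes"]

    # Pull the current state so managed-but-idle volumes can be told apart from
    # volumes that were created outside Terraform entirely.
    state_json = subprocess.run(
        ["terraform", "show", "-json"],
        capture_output=True, text=True, check=True
    ).stdout

    return [
        {
            "volume_id": vol["VolumeId"],
            "size_gb": vol["Size"],
            "in_terraform_state": vol["VolumeId"] in state_json,  # crude substring check
        }
        for vol in volumes
    ]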

3. Spot and Savings Plan recommendations

More sophisticated agents analyze usage patterns across your infrastructure and recommend purchasing strategies. For example, identifying that your batch processing fleet consistently runs for 8 hours daily and recommending a mix of Savings Plans and Spot Instances.

The Honest Limitation

Cost optimization AI agents work best with historical data and stable workloads. They're much less effective at predicting costs for new architectures or handling bursty, unpredictable workloads. They also can't account for organizational constraints — that "unused" NAT Gateway might exist because of a compliance requirement, not because someone forgot to delete it.


The Current State: What Works and What Doesn't

Let me be direct about where things stand today.

What works well:

  • Generating boilerplate infrastructure (VPCs, basic compute, storage, DNS)
  • Catching common security misconfigurations in static analysis
  • Classifying and prioritizing drift events
  • Attaching cost estimates to Terraform plans
  • Generating documentation for existing infrastructure code

What doesn't work reliably yet:

  • Complex multi-resource architectures with intricate dependencies
  • State management strategies and migration planning
  • Multi-cloud configurations (models trained heavily on AWS, less so on GCP/Azure specifics)
  • Understanding organizational context and constraints
  • Generating correct IAM policies from scratch (the blast radius is too high to get wrong)

What's dangerous to delegate:

  • Auto-applying any AI-generated infrastructure change
  • Auto-reverting drift without human review
  • Letting AI agents manage state file operations
  • Blindly trusting AI-generated security group rules or IAM policies

The Path Forward

The most effective integration pattern I've seen treats AI agents as accelerators within a human-supervised workflow, not as autonomous operators. The developer writes a natural language description, the AI generates a starting configuration, the developer reviews and adjusts, the AI reviews the final code for security and cost issues, and a human approves the apply.

This isn't the fully autonomous future that some vendors pitch. But it's real, it works today, and it genuinely makes IaC more accessible without sacrificing the safety guarantees that make IaC valuable in the first place.

The gap between "I know what infrastructure I need" and "I can write production-grade Terraform to provision it" is shrinking. That's a meaningful change for the industry, even if it's less dramatic than "AI will replace DevOps engineers."

It won't. But it will make every DevOps engineer significantly more productive, and it will let developers outside the infrastructure team participate meaningfully in infrastructure decisions. That's the real democratization.

Keywords

AI agent · devops-agents