// Terraform Guide · Hands-On

Terraform Hands-On Guide: State, Modules, Workspaces & Real AWS Examples

📅 Updated April 2026 · ⏱ 13 min read · 🏷 Terraform · IaC · AWS · DevOps

Dhanush R

Senior DevOps Engineer · 4.5+ Years Experience · Bengaluru

Terraform is one of the most widely used Infrastructure as Code tools. I use it daily in production to provision and manage AWS infrastructure: VPCs, EKS clusters, RDS databases, ALBs, and IAM roles. This guide covers everything from the basic plan-apply workflow to production-grade module design, with real AWS examples you can adapt.

How Terraform Works — The Core Model

Terraform uses a declarative model: you describe the desired end state of your infrastructure in HCL (HashiCorp Configuration Language), and Terraform figures out how to get there. It maintains a state file that maps your HCL declarations to real resources in the cloud. On each run, Terraform by default refreshes that state from the provider APIs, compares your HCL (desired state) against the refreshed state (last-known actual state), and plans the changes needed to reconcile them.

The three-step workflow:

terraform init — Download providers, initialise backend

terraform plan — Preview what will change (no changes applied)

terraform apply — Apply the planned changes to real infrastructure
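To try the three commands without any cloud credentials, a minimal sketch like the one below may help. It assumes Terraform 1.4 or later (for the built-in terraform_data resource); the directory, resource, and output names are hypothetical.

```hcl
# workflow-demo/main.tf (hypothetical file; needs no cloud credentials)
terraform {
  required_version = ">= 1.4.0" # terraform_data is built in from 1.4
}

# A placeholder resource that exists only in Terraform state
resource "terraform_data" "example" {
  input = "hello"
}

# terraform_data exposes its input again as the "output" attribute
output "echo" {
  value = terraform_data.example.output
}

# Run from this directory:
#   terraform init    # prepares the working directory (local backend here)
#   terraform plan    # previews one resource to add; nothing changes yet
#   terraform apply   # creates the resource and writes terraform.tfstate
```

After the apply, opening terraform.tfstate shows how the resource address is mapped to its recorded attributes.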

Terraform State — The Most Important Concept

The state file (terraform.tfstate) is the most critical component of any Terraform setup. It is the source of truth mapping your HCL resource declarations to actual cloud resources (EC2 instance IDs, RDS endpoint URLs, etc.). Without the state file, Terraform cannot know what already exists and would try to create everything from scratch.

By default, state is stored locally as a JSON file. This is fine for solo learning but dangerous in teams: each engineer ends up with a separate copy of the state, so applies from different machines drift apart and can create duplicate resources, and nothing prevents two applies from running against the same infrastructure at once.

Remote State with S3 + DynamoDB

For every team environment, always use remote state. The standard AWS setup is S3 for storage and DynamoDB for state locking:

# backend.tf — Remote state configuration
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/eks-cluster/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true               # encrypt state at rest
    dynamodb_table = "terraform-locks"  # prevents concurrent applies
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"              # allow 5.x releases, block 6.0
    }
  }

  required_version = ">= 1.6.0"       # minimum Terraform version
}

The DynamoDB lock table prevents two engineers from running terraform apply simultaneously. When any state-modifying operation starts, Terraform writes a lock record to DynamoDB. If another operation finds an existing lock, it waits or errors. This prevents race conditions that can corrupt state.
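Newer Terraform releases also offer S3-native locking, which removes the need for the DynamoDB table. The sketch below assumes Terraform 1.10 or later, where the use_lockfile argument was introduced; check the S3 backend documentation for your version before switching, since the DynamoDB arguments are marked deprecated in later releases.

```hcl
# backend.tf (alternative sketch; assumes Terraform 1.10 or later)
terraform {
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "company-terraform-state"
    key          = "production/eks-cluster/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # lock via a .tflock object in the bucket instead of DynamoDB
  }
}
```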

State security warning: Terraform state files often contain sensitive values in plaintext — database passwords, generated credentials, private keys. Restrict S3 bucket access with IAM policies. Enable S3 versioning to recover from accidental state corruption. Enable S3 access logging for audit trails.
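The recommendations in the warning above could be sketched roughly as follows. This is an illustration, not a complete hardening guide: the resource labels are hypothetical, the bucket name is reused from the backend example, and the bucket is assumed to live in a separate bootstrap configuration rather than in the configuration whose state it stores.

```hcl
# state-bucket.tf (sketch; manage in a separate bootstrap configuration)
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"

  lifecycle {
    prevent_destroy = true # Terraform rejects any plan that would delete the bucket
  }
}

# Versioning lets you restore an earlier state file after corruption
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt every object at rest by default
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State must never be publicly readable
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

Access logging (aws_s3_bucket_logging) is left out here because it needs a second target bucket; IAM policies limiting who can read the state would sit alongside these resources.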

Real AWS Infrastructure Example — VPC + EKS

# main.tf — VPC and EKS cluster
provider "aws" {
  region = var.region
}

# VPC with public and private subnets across 3 AZs
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.2"

  name = "${var.env}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = var.env != "production"  # HA NAT only in prod
  enable_dns_hostnames   = true

  # Required tags for EKS to discover subnets
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }
  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }
}

# EKS cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.0.0"

  cluster_name    = "${var.env}-cluster"
  cluster_version = "1.29"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # IRSA — pods assume IAM roles without node-level credentials
  enable_irsa = true

  eks_managed_node_groups = {
    general = {
      min_size       = 2
      max_size       = 10
      desired_size   = 3
      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"
    }
    spot = {
      min_size       = 0
      max_size       = 20
      desired_size   = 0
      instance_types = ["t3.large", "t3a.large"]
      capacity_type  = "SPOT"    # Spot discount varies (AWS quotes up to 90%); nodes can be reclaimed
    }
  }
}
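The example above references var.region and var.env without declaring them. A companion file along these lines would make it self-contained; the allowed environment names, the default region, and the output names are assumptions for illustration.

```hcl
# variables.tf (hypothetical companion to main.tf)
variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1" # must agree with the hard-coded AZs in the vpc module
}

variable "env" {
  type        = string
  description = "Environment name, used as a prefix in resource names"

  validation {
    condition     = contains(["dev", "staging", "production"], var.env)
    error_message = "env must be one of: dev, staging, production."
  }
}

# outputs.tf (values other configurations or CI jobs typically need)
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

output "private_subnet_ids" {
  value = module.vpc.private_subnets
}
```

The validation block catches a mistyped environment name at plan time, which matters here because the NAT gateway layout depends on the exact string "production".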

Modules — Building Reusable Infrastructure

Modules are the primary mechanism for reuse in Terraform. A module is just a directory of .tf files with defined input variables and output values. You can create internal modules for your organisation's standard patterns, or use published modules from the Terraform Registry.

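# modules/rds/main.tf - Reusable RDS module
variable "identifier"      { type = string }
variable "env"             { type = string }
variable "instance_class"  { type = string }
variable "subnet_ids"      { type = list(string) }
variable "vpc_id"          { type = string } # reserved for a security group; unused in this sketch

# Place the instance in the subnets passed in by the caller
resource "aws_db_subnet_group" "this" {
  name       = "${var.identifier}-subnets"
  subnet_ids = var.subnet_ids
}

resource "aws_db_instance" "this" {
  identifier            = var.identifier
  engine                = "postgres"
  engine_version        = "15.4"
  instance_class        = var.instance_class
  allocated_storage     = 20
  max_allocated_storage = 100 # enables storage autoscaling up to 100 GiB
  db_subnet_group_name  = aws_db_subnet_group.this.name

  # Master credentials: RDS generates the password and keeps it in Secrets Manager
  username                    = "dbadmin" # illustrative name
  manage_master_user_password = true

  multi_az                  = var.env == "production"  # HA only in prod
  deletion_protection       = var.env == "production"
  skip_final_snapshot       = var.env != "production"
  final_snapshot_identifier = "${var.identifier}-final" # needed when skip_final_snapshot is false

  backup_retention_period = 7
  backup_window           = "03:00-04:00"

  tags = { Environment = var.env }
}

output "endpoint" { value = aws_db_instance.this.endpoint }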

# Usage in root module:
# module "prod_db" {
#   source         = "./modules/rds"
#   identifier     = "prod-postgres"
#   env            = "production"
#   instance_class = "db.t3.medium"
#   subnet_ids     = module.vpc.private_subnets
#   vpc_id         = module.vpc.vpc_id
# }

Workspaces vs Directory Structure for Environments

Terraform workspaces allow multiple state files from the same configuration directory. The common pattern: use workspaces for identical environments (same infra, different sizing/counts). Use separate directories for structurally different environments (staging might not have some services).

In practice, most teams use separate directories per environment (environments/staging/, environments/production/) with shared modules. This provides better isolation — a broken production apply cannot affect staging state.
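For the workspace approach, a rough sketch of "same infra, different sizing" might look like this. The workspace names and instance classes are assumptions, and module.vpc is assumed to be defined elsewhere in the same configuration.

```hcl
# Sketch: per-workspace sizing from a single configuration
locals {
  # Keys are workspace names, created with `terraform workspace new <name>`
  instance_class_by_env = {
    staging    = "db.t3.small"
    production = "db.t3.medium"
  }

  # terraform.workspace holds the name of the currently selected workspace
  env = terraform.workspace
}

module "db" {
  source         = "./modules/rds"
  identifier     = "${local.env}-postgres"
  env            = local.env
  instance_class = local.instance_class_by_env[local.env]
  subnet_ids     = module.vpc.private_subnets
  vpc_id         = module.vpc.vpc_id
}
```

Planning in a workspace that is missing from the map, including the built-in "default" workspace, fails with an invalid index error, which works as a guard against applying to an unintended environment.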

Interview Q&A

Q1: What is Terraform state and what happens if it is lost?

State is a JSON file mapping HCL resource declarations to real cloud resources (EC2 IDs, ARNs, etc.). Without state, Terraform cannot track existing resources, so running apply would try to create everything from scratch, causing duplicate resources or failures on name conflicts. If state is lost without backup, you must import existing resources into a new state file using terraform import, resource by resource. To avoid that, enable S3 versioning on the state bucket so earlier state versions can be restored, and protect the bucket itself against deletion, for example with lifecycle { prevent_destroy = true } and a restrictive bucket policy.

Q2: How do you import existing infrastructure into Terraform?

Write the HCL resource block for the resource you want to import, then run terraform import aws_instance.my_instance i-1234567890abcdef0. Terraform reads the real resource from AWS and writes its state to the state file. The HCL must match the actual resource configuration exactly, or the next plan will show a diff. For large existing infrastructures, tools like terraformer can auto-generate both the HCL and the import commands by scanning your AWS account.

Q3: What is the difference between terraform taint and -replace?

terraform taint marks a resource for replacement on the next apply (deprecated in Terraform 0.15.2). The modern equivalent is terraform apply -replace="aws_instance.my_instance". Both force destruction and recreation of a specific resource, even if its configuration has not changed. Use when a resource has entered a broken state that Terraform cannot detect from configuration alone — for example, an EC2 instance with a corrupted OS or a broken Kubernetes node.

Terraform State — The Complete Picture

Terraform state is the most important concept in infrastructure-as-code. Every terraform plan compares three things: your HCL files, real cloud infrastructure, and the state file. Misunderstanding state is the root cause of most production Terraform incidents.

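What state actually stores: resource IDs and the last-recorded attributes of each resource (so Terraform can find and update existing resources), dependency metadata, and outputs for use by other configurations. State does not store your HCL code; it stores Terraform's last-known view of what is deployed, which is also why sensitive attribute values end up in it.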
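One way another configuration can consume the outputs recorded in state is the terraform_remote_state data source. The sketch below assumes a separate VPC configuration that writes its state to the key shown and declares a vpc_id output; the key and resource names are hypothetical.

```hcl
# Read outputs from another configuration's state (read-only)
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "production/vpc/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

# Use an output of the network configuration as an input here
resource "aws_security_group" "app" {
  name        = "app-sg"
  description = "Application security group"
  vpc_id      = data.terraform_remote_state.network.outputs.vpc_id
}
```

Only root-module outputs are exposed this way, but the caller still needs read access to the whole state object, so the same access controls apply.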

# backend.tf — remote state with S3 + DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/eks/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# DynamoDB table for state locking
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}

Modules — Reusable Infrastructure

Modules encapsulate a complete logical component (VPC, EKS cluster, RDS instance) with configurable inputs and useful outputs. Most mature DevOps teams maintain an internal module registry.

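# modules/vpc/main.tf
variable "vpc_cidr"    { default = "10.0.0.0/16" }
variable "environment" { type = string }

data "aws_availability_zones" "available" { state = "available" }

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  tags = { Name = "${var.environment}-vpc" }
}

# Three private /24 subnets, one per AZ (assumes a /16 VPC CIDR and 3+ AZs)
resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 1)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

output "vpc_id"          { value = aws_vpc.main.id }
output "private_subnets" { value = aws_subnet.private[*].id }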

# Consuming the module
module "vpc" {
  source      = "./modules/vpc"
  environment = "production"
  vpc_cidr    = "10.10.0.0/16"
}

module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
}

Workspaces vs Separate Directories

Approach	When to use	Limitation
Workspaces	Identical infra, different environments	Hard to have environment-specific config
Separate dirs	Environments with significant config differences	Code duplication without shared modules
Terragrunt	Large organisations needing DRY multi-env IaC	Adds tool dependency and learning curve

Terraform Interview Questions & Answers

Q: What happens if two engineers run terraform apply simultaneously?

With local state, each engineer applies against their own copy of the state file, so neither run sees the other's changes; the result is diverging state and potentially duplicated infrastructure. With remote state and DynamoDB locking, the first apply writes a lock record. The second finds the lock and either waits (with -lock-timeout) or exits with an error. Once the first run completes, it releases the lock. This is why remote state with locking is mandatory for any team environment.

Q: How do you import existing infrastructure into Terraform state?

terraform import resource_type.resource_name cloud_resource_id. Example: terraform import aws_instance.web i-0abc1234. This adds the existing resource to state. Important: import only adds to state. You must also write HCL configuration that matches the imported resource, then run terraform plan to verify no unintended changes are planned. Terraform 1.5+ added import blocks in HCL; combined with terraform plan -generate-config-out=<file>, Terraform can also draft the matching configuration for you to review.
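An import block for the same example might look like the sketch below. The AMI ID and instance type are placeholders and would need to match the real instance; the file names are hypothetical.

```hcl
# imports.tf (requires Terraform 1.5 or later)
import {
  to = aws_instance.web
  id = "i-0abc1234" # instance ID from the CLI example above
}

# Option A: write the matching resource block yourself.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder; must match the real instance
  instance_type = "t3.micro"              # placeholder; must match the real instance
}

# Option B: leave out the resource block and let Terraform draft it:
#   terraform plan -generate-config-out=generated.tf
# Review generated.tf before applying; it is a starting point, not final code.
```

Unlike the CLI command, the import happens as part of a normal plan and apply, so it goes through code review like any other change.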

Q: How do you handle secrets in Terraform?

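Never store secrets in plaintext in Terraform code, and treat the state file as sensitive, because any secret value Terraform reads or generates is recorded there. Use environment variables for provider credentials. For application secrets, pass around secret ARNs or names (for example from the aws_secretsmanager_secret data source) and let the application fetch the value at runtime; data sources that read the value itself, such as aws_secretsmanager_secret_version or vault_generic_secret, copy it into state. Mark sensitive variables and outputs with sensitive = true to keep them out of plan and apply output; this does not encrypt them in state. Use server-side encryption on the remote backend. Consider SOPS for encrypting sensitive tfvars files in version control.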
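A small sketch of these practices, with hypothetical variable, secret, and output names:

```hcl
# Supply the value outside version control, e.g. export TF_VAR_db_password=...
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan/apply output, but still written to state
}

# Look up an existing secret by name; this reads metadata, not the secret value
data "aws_secretsmanager_secret" "db" {
  name = "prod/db/password" # hypothetical secret name
}

# Safe to expose: an ARN the application can use to fetch the value at runtime
output "db_secret_arn" {
  value = data.aws_secretsmanager_secret.db.arn
}

# Outputs derived from sensitive values must be marked sensitive as well
output "db_password" {
  value     = var.db_password
  sensitive = true
}
```

Passing the ARN rather than the value keeps the secret itself out of both the plan output and the state file.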

Q: What is the difference between data sources and resources in Terraform?

Resources (resource block) create, update, and delete infrastructure; Terraform manages their lifecycle. Data sources (data block) read existing infrastructure that Terraform does not manage; they are read-only lookups. Example: you might use a data source to look up a VPC ID by tag name (data "aws_vpc" "existing") and then pass it as input to a resource you are creating. Data sources are read on every plan. Managed resources are also refreshed on every plan, but they are only modified when their configuration changes or when drift from that configuration is detected.
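The VPC lookup described above could be sketched like this; the tag value and security group details are assumptions for illustration.

```hcl
# Data source: read-only lookup of a VPC that this configuration does not manage
data "aws_vpc" "existing" {
  tags = {
    Name = "production-vpc" # hypothetical tag value
  }
}

# Resource: created, updated, and destroyed by this configuration
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Allow HTTPS from inside the VPC"
  vpc_id      = data.aws_vpc.existing.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.existing.cidr_block]
  }
}
```

Running terraform destroy here removes the security group but leaves the VPC untouched, since the data source never takes ownership of it.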
