
Terraform Complete Guide 2026: State, Modules, CI/CD, Multi-Account & Interview Q&A

Updated May 2026 · 23 min read · Terraform · IaC · AWS · DevOps · Cloud
Dhanush R, Senior DevOps Engineer
4.5+ years writing production Terraform: EKS clusters, multi-account AWS organisations, VPC architectures, and CI/CD pipelines. Every pattern here is from real-world usage, not documentation paraphrasing.
// Table of Contents
  1. What is Terraform and Why Is It the Standard?
  2. Core Concepts: State, Plan, Apply, Destroy
  3. Terraform State: The Most Critical Concept
  4. Variables, Outputs, and Locals
  5. Modules β€” Reusable Infrastructure Components
  6. Workspaces vs Separate State Files
  7. Terraform in CI/CD: The GitOps Workflow
  8. Production Best Practices
  9. Common Terraform Errors and Fixes
  10. 12 Terraform Interview Questions with Expert Answers

Terraform is the industry-standard Infrastructure as Code tool for DevOps and platform engineering. I use Terraform daily in production, provisioning EKS clusters, VPCs, RDS instances, IAM roles, and CloudFront distributions across multiple AWS accounts. This guide covers Terraform from core concepts to advanced patterns used in real production environments, with the depth needed to pass senior DevOps interviews.

What is Terraform and Why Is It the Standard?

Terraform by HashiCorp lets you define cloud infrastructure as declarative HCL (HashiCorp Configuration Language) code. You describe the desired end state of your infrastructure, and Terraform figures out how to create, update, or delete resources to reach that state. This is fundamentally different from scripting (Bash, Python boto3), where you write imperative steps: Terraform handles the dependency resolution, ordering, parallelisation, and state tracking for you.

Terraform won the IaC landscape because of three things: the provider ecosystem (2,000+ providers covering every cloud, SaaS tool, database, and monitoring platform), the plan/apply workflow (you see exactly what will change before it changes, so there are no surprises), and the module system (reusable, versioned infrastructure components that standardise patterns across teams). AWS CloudFormation covers AWS only. Pulumi is a code-based alternative but has a smaller community. Terraform has the largest ecosystem, most job postings, and most production deployments.

Key insight: Terraform is declarative but not magic. It only knows about resources it created and tracks in its state file. Resources created outside Terraform are invisible to it unless imported. This is why "Terraform drift" (someone manually changes a resource in the AWS console) is dangerous: Terraform will revert it on the next apply. Always enforce a policy: all infrastructure changes go through Terraform.

Core Concepts: State, Plan, Apply, Destroy

Terraform's workflow has four phases that you must understand deeply for interviews:

Terraform State: The Most Critical Concept

Terraform state is the source of truth for what Terraform has created. It maps your HCL resource definitions to real infrastructure resource IDs. Without state, Terraform cannot know what already exists and would try to create everything from scratch on every apply, causing duplicate resources and errors.

Never store Terraform state locally in production. Local state is not shared between team members, is lost if the developer's machine is lost, and cannot support locking (preventing simultaneous applies). Always use remote state with state locking:

# backend.tf: S3 remote state with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/eks-cluster/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true                        # encrypt state at rest with KMS
    kms_key_id     = "alias/terraform-state-key"
    dynamodb_table = "terraform-state-locks"     # prevents concurrent applies
  }
}

# Create the S3 bucket and DynamoDB table for state (bootstrapping)
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  # HCL does not allow ";" between arguments: one argument per line
  attribute {
    name = "LockID"
    type = "S"
  }
}

State manipulation warning: Never directly edit the state file. If you need to fix state issues, use the terraform state subcommands: terraform state list (see all tracked resources), terraform state show <address> (inspect a resource's state), terraform state mv (rename a resource in state without destroying it), terraform state rm (remove a resource from state without destroying the real resource), and terraform import (import an existing resource into state). These operations require careful execution, so always back up state before manipulating it.

Variables, Outputs, and Locals

# variables.tf: typed, validated, documented inputs
variable "environment" {
  description = "Deployment environment: dev, staging, or production"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

# Declared here because locals.tf below references var.project
variable "project" {
  description = "Project name used in resource names and tags"
  type        = string
}

variable "eks_node_instance_types" {
  description = "EC2 instance types for EKS managed node group"
  type        = list(string)
  default     = ["m6i.large"]
}

variable "db_password" {
  description = "RDS master password. Provide via the TF_VAR_db_password env var"
  type        = string
  sensitive   = true # redacted in plan and apply output (still stored in state)
}

# locals.tf: computed values derived from variables
locals {
  name_prefix = "${var.environment}-${var.project}"

  common_tags = {
    Environment = var.environment
    Project     = var.project
    ManagedBy   = "terraform"
    Owner       = "platform-team"
  }
}

# outputs.tf: values exposed for use by other modules or CI/CD
output "eks_cluster_name" {
  description = "EKS cluster name for kubectl config update"
  value       = module.eks.cluster_name
}

output "rds_endpoint" {
  description = "RDS instance endpoint for application config"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}

Modules β€” Reusable Infrastructure Components

Modules are Terraform's code reuse mechanism. A module is a directory of .tf files with a defined interface (input variables and output values). The same module can be called multiple times with different inputs to create similar infrastructure for different environments or teams. Well-designed modules reduce duplication, enforce standards, and make large Terraform codebases manageable.

# Calling a VPC module from the Terraform registry
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # stay on 5.x: allows minor and patch updates, blocks 6.0

  name = "${local.name_prefix}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
  private_subnets = ["10.0.10.0/23", "10.0.12.0/23", "10.0.14.0/23"]
  public_subnets  = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false # one NAT GW per AZ for HA
  enable_dns_hostnames = true

  tags = local.common_tags

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1" # required for ALB Ingress Controller
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}

Workspaces vs Separate State Files

Terraform workspaces allow multiple state files within a single backend, identified by workspace name. The default workspace is called "default". You can create a "staging" and "production" workspace within the same backend bucket. Workspaces are most useful for short-lived feature environments (PR environments) where the infrastructure is identical but needs isolation.

For long-lived environments (dev, staging, production) most experienced teams use separate state file paths instead of workspaces. Separate paths give clearer isolation, simpler blast radius, and avoid the common mistake of accidentally applying production changes to staging (or vice versa) by being in the wrong workspace. Use workspaces for ephemeral environments; use separate backends or state key paths for permanent environments.
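The two approaches look roughly like this in configuration. This is a minimal sketch, assuming the S3 backend from the state section; the state keys, the `preview_assets` bucket, and the `environments/staging` directory layout are hypothetical names chosen for illustration.

```hcl
# Option A: workspaces for ephemeral environments.
# One configuration and one backend block. The S3 backend stores each
# non-default workspace under its own prefix (by default "env:/<workspace>/").
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "preview/app/terraform.tfstate" # hypothetical key
    region = "ap-south-1"
  }
}

locals {
  # terraform.workspace is the name of the currently selected workspace,
  # for example "pr-123" after `terraform workspace new pr-123`.
  env_name = terraform.workspace
}

resource "aws_s3_bucket" "preview_assets" {
  # Hypothetical resource: the workspace name keeps each PR environment's
  # bucket distinct.
  bucket = "company-preview-assets-${local.env_name}"
}
```

```hcl
# Option B: separate state key paths for permanent environments.
# environments/staging/backend.tf (hypothetical layout). The production
# directory holds the same block with key = "production/eks-cluster/terraform.tfstate".
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "staging/eks-cluster/terraform.tfstate"
    region = "ap-south-1"
  }
}
```

With Option B, the environment is determined by the directory you run Terraform in rather than by a piece of mutable CLI state, which is what makes wrong-environment applies harder.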

Terraform in CI/CD: The GitOps Workflow

Running Terraform manually is only acceptable for initial bootstrapping. Production infrastructure changes must go through a CI/CD pipeline with plan review and approval gates. The standard pattern I use with GitHub Actions:

# .github/workflows/terraform.yml
name: Terraform
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
jobs:
  terraform:
    runs-on: ubuntu-latest
    permissions:
      id-token: write        # for OIDC auth to AWS, no stored keys
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-terraform-role
          aws-region: ap-south-1
      - uses: hashicorp/setup-terraform@v3
        with: { terraform_version: "1.8.0" }
      - run: terraform init
      - run: terraform validate
      - run: terraform fmt -check -recursive
      # Plan runs on both events: the apply step needs a tfplan file
      # produced in the same workflow run.
      - name: Terraform Plan
        shell: bash          # explicit bash enables pipefail, so a failed plan is not hidden by tee
        run: terraform plan -out=tfplan -no-color 2>&1 | tee plan_output.txt
      - name: Comment Plan on PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('plan_output.txt', 'utf8');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '```\n' + plan.slice(-60000) + '\n```'
            });
      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'

Production Best Practices

Common Terraform Errors and How to Fix Them

12 Terraform Interview Questions with Expert Answers

Q1: What is Terraform state and why is it critical?
Terraform state is a JSON file that maps your HCL resource definitions to real infrastructure resource IDs. It is Terraform's memory of what it has created. Without state, Terraform cannot determine what already exists and would attempt to create everything from scratch on every apply, causing duplicate resources and conflicts. State also tracks resource dependencies for correct destroy ordering. Remote state (stored in S3 with DynamoDB locking) is mandatory for teams: it enables shared state access, prevents concurrent applies from corrupting state, and survives developer machine failures. The state file may contain sensitive values (passwords, private keys), so always encrypt it at rest and restrict access via IAM policies.
Q2: What is the difference between terraform taint and terraform apply -replace?
Both force recreation of a specific resource. terraform taint resource.name (deprecated since Terraform 0.15.2) marked a resource as "tainted" in state, causing it to be destroyed and recreated on the next apply. terraform apply -replace="resource.name" is the modern replacement: it generates a plan showing the destroy/create for that resource and applies it in one step. Use -replace when a resource is in a broken state that can't be fixed by updating its configuration: a corrupted EC2 instance, a Kubernetes node in NotReady state that won't recover, or a resource whose cloud-side state has diverged from what Terraform expects. Always prefer targeted replacement over manual resource deletion, which would leave the old resource in state pointing to a non-existent resource.
Q3: How do you manage secrets in Terraform without exposing them in state?
Three strategies: (1) Mark variables as sensitive = true so values never appear in plan/apply output or logs, but note that the value IS still stored in state. (2) Use AWS Secrets Manager or Parameter Store as the source of truth and read values with data sources such as data "aws_secretsmanager_secret_version". The secret stays out of your Terraform code and variable files, although any value read through a data source is still written to state. (3) Use Vault dynamic secrets: Terraform's Vault provider generates short-lived credentials on demand (e.g., temporary database credentials) that expire after use. On Terraform 1.10 and later, ephemeral resources can read a secret without persisting it to state. For the state file itself: always encrypt with KMS, restrict S3 bucket access to only the CI/CD role and senior engineers, and consider HCP Terraform (formerly Terraform Cloud), which encrypts state at rest and controls who can read it.
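Strategy (2) might look like the fragment below. It is a sketch only: the secret name, the instance identifier, and the sizing arguments are hypothetical, and the secret is assumed to be created and rotated outside Terraform.

```hcl
# Read the current version of a secret that lives in AWS Secrets Manager.
# "production/rds/master-password" is a hypothetical secret name.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "production/rds/master-password"
}

resource "aws_db_instance" "main" {
  identifier        = "production-main" # hypothetical
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "app_admin"

  # The password never appears in .tf or .tfvars files, and secret_string is
  # redacted in plan output. It is still recorded in the state file, so the
  # state encryption and access controls remain necessary.
  password = data.aws_secretsmanager_secret_version.db.secret_string

  # Remaining arguments (networking, snapshots, backups) omitted for brevity.
}
```

The design benefit is that rotation and access auditing happen in Secrets Manager, while Terraform only consumes whatever the current version is at apply time.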
Q4: Explain the difference between count and for_each. When would you use each?
Both create multiple instances of a resource. count creates N instances indexed by integer (0, 1, 2...). for_each creates instances keyed by the keys of a map or the elements of a set of strings. The critical difference is what happens when you remove an item from the middle: with count, removing the item at index 1 from a list of 3 shifts the item at index 2 down to index 1, which Terraform interprets as modifying the resource at index 1 and destroying the one at index 2, potentially destroying and recreating resources you never meant to touch. With for_each using stable string keys (like a map or set of strings), removing one key only destroys that specific resource. Use count only when you need simple integer-indexed identical resources (like N EC2 instances with no identity). Use for_each for resources with distinct identities (IAM users by name, S3 buckets by purpose, security group rules by protocol). In practice, prefer for_each in almost all cases.
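The index-shifting problem is easiest to see side by side. A minimal sketch, assuming a hypothetical `user_names` variable; the two resources are alternatives, so keep only one in a real configuration (both would try to create the same user names).

```hcl
variable "user_names" {
  description = "Hypothetical list of IAM user names"
  type        = list(string)
  default     = ["alice", "bob", "carol"]
}

# count: instances are addressed by position:
#   aws_iam_user.by_index[0], [1], [2]
# Removing "bob" moves "carol" from index 2 to index 1, so the plan changes
# [1] to hold carol and destroys [2], even though carol was not edited.
resource "aws_iam_user" "by_index" {
  count = length(var.user_names)
  name  = var.user_names[count.index]
}

# for_each: instances are addressed by key:
#   aws_iam_user.by_name["alice"], ["bob"], ["carol"]
# Removing "bob" plans exactly one destroy; the other addresses are unchanged.
resource "aws_iam_user" "by_name" {
  for_each = toset(var.user_names)
  name     = each.key
}
```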
Q5: What happens if Terraform state is lost or corrupted?
If state is lost, Terraform has no memory of what it created. Running apply again will try to create all resources from scratch, and most will fail with "already exists" errors. Recovery steps: (1) If using S3 backend with versioning (which you always should), retrieve the previous state version from S3 versioning. (2) If the state is unrecoverable, use terraform import to manually import each existing resource back into a new empty state file. This is extremely time-consuming for large infrastructure. (3) Use terraform refresh (deprecated) or terraform apply -refresh-only to reconcile state with real infrastructure after recovery. Prevention: enable S3 versioning, enable S3 MFA Delete for state buckets, use DynamoDB state locking to prevent concurrent corruption, and back up state to a second bucket in a different region.
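For the re-import step, Terraform 1.5 and later also accept declarative import blocks, which can be reviewed in a plan before anything is written to state. A minimal sketch reusing the state bucket name from the backend example earlier in this guide:

```hcl
# Requires Terraform 1.5 or later.
# Tells Terraform that the existing bucket belongs at this resource address.
import {
  to = aws_s3_bucket.tf_state
  id = "company-terraform-state" # for aws_s3_bucket the import ID is the bucket name
}

# The matching resource block must exist (or be generated) for the import to plan.
resource "aws_s3_bucket" "tf_state" {
  bucket = "company-terraform-state"
}
```

Running terraform plan shows the resource as "will be imported", and terraform apply records it in state. When the resource blocks themselves were lost too, terraform plan -generate-config-out=generated.tf can draft them from the imported objects; treat the generated file as a starting point to review, not finished code.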
Q6: What is a Terraform module and how do you design a good one?
A module is a reusable, encapsulated Terraform configuration with a defined interface (input variables, output values, required providers). A well-designed module: has a clear, single responsibility (a VPC module, an EKS module, a PostgreSQL RDS module, not a "company infrastructure" module); exposes all customisable parameters as typed, documented, validated variables; provides sensible defaults for optional parameters; outputs all values that callers might need (IDs, ARNs, endpoints); pins its required_providers versions; and is versioned (via Git tags) so callers can pin to a specific version. Avoid: modules that call other modules deeply (hard to debug), modules that use hardcoded values that should be variables, and modules that produce so many resources that plan output becomes incomprehensible.
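As an illustration of that checklist, here is a sketch of a small single-purpose module and a caller pinned to a Git tag. Everything named here is hypothetical: the `s3-bucket` module, its variables, the repository URL, and the `v1.2.0` tag. File boundaries are marked with comments.

```hcl
# --- modules/s3-bucket/versions.tf: pin the provider requirement ---
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

# --- modules/s3-bucket/variables.tf: typed, documented, validated inputs ---
variable "name" {
  description = "Globally unique bucket name"
  type        = string

  validation {
    condition     = length(var.name) >= 3 && length(var.name) <= 63
    error_message = "Bucket names must be between 3 and 63 characters."
  }
}

variable "versioning_enabled" {
  description = "Whether object versioning is enabled"
  type        = bool
  default     = true # sensible default for an optional parameter
}

variable "tags" {
  description = "Tags applied to the bucket"
  type        = map(string)
  default     = {}
}

# --- modules/s3-bucket/main.tf: one clear responsibility ---
resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = var.versioning_enabled ? "Enabled" : "Suspended"
  }
}

# --- modules/s3-bucket/outputs.tf: everything a caller might need ---
output "bucket_id" {
  description = "Name of the bucket"
  value       = aws_s3_bucket.this.id
}

output "bucket_arn" {
  description = "ARN of the bucket, e.g. for IAM policies"
  value       = aws_s3_bucket.this.arn
}

# --- Caller (separate root configuration): pinned to a Git tag ---
module "logs_bucket" {
  source = "git::https://example.com/platform/terraform-modules.git//s3-bucket?ref=v1.2.0"

  name = "company-app-logs"
  tags = { ManagedBy = "terraform" }
}
```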
Q7: How do you handle Terraform drift?
Drift occurs when real infrastructure is modified outside Terraform (manual console changes, AWS auto-healing changes, external automation). Detect drift with terraform plan -refresh-only, which shows differences between state and real infrastructure without proposing any configuration changes. In CI/CD, run a scheduled drift detection job daily: terraform plan -detailed-exitcode returns exit code 2 if there are changes (including drift), exit code 0 if there are none, and exit code 1 on error, so alert on exit code 2. When drift is detected: if the manual change was intentional, codify it in Terraform and apply. If it was unintentional, run apply to revert it. Prevent drift with SCPs in AWS Organizations that block console write actions on production resources (only allow the Terraform CI/CD role to make changes).
Q8: What is the moved block in Terraform and when do you use it?
The moved block (introduced in Terraform 1.1) tells Terraform that a resource has been renamed or moved within the configuration, without destroying and recreating it. When you rename a resource block (from aws_instance.web to aws_instance.api_server) or move a resource into a module during refactoring, Terraform would normally plan a destroy of the old resource and a create of the new one. A moved block records the rename in the configuration so Terraform updates state without touching the real infrastructure. Example: a moved block with from = aws_s3_bucket.old_name and to = aws_s3_bucket.new_name (each argument on its own line, since HCL has no ";" separator). It is also essential when refactoring flat resources into modules or splitting a module into child modules. After the moved block has been applied, the entry in state is updated; once every state that uses the configuration has been migrated, you can remove the moved block in a later commit.
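Written out as valid HCL, the rename from this answer and a move into a module look like the sketch below. The AMI ID, the `storage` module, and its `assets` bucket are hypothetical placeholders.

```hcl
# Rename: the block that used to be `resource "aws_instance" "web"`.
resource "aws_instance" "api_server" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
}

# Tells Terraform the object tracked as aws_instance.web now lives at the
# new address, so the plan shows a move instead of destroy + create.
moved {
  from = aws_instance.web
  to   = aws_instance.api_server
}

# Refactor into a module: the bucket that used to be declared at the root
# as aws_s3_bucket.assets is now declared inside ./modules/storage.
module "storage" {
  source = "./modules/storage" # hypothetical local module
}

moved {
  from = aws_s3_bucket.assets
  to   = module.storage.aws_s3_bucket.assets
}
```

A plan with these blocks in place should report the affected resources as moved, with no create or destroy actions for them.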
Q9: How do you use Terraform in a multi-account AWS organisation?
Multi-account Terraform architecture: separate state files per account and environment (dev account state in dev S3 bucket, prod state in prod S3 bucket; never share state across accounts). Use IAM role assumption in provider configuration: the CI/CD pipeline runs in a central tooling account and assumes a terraform-apply role in each target account via cross-account IAM role assumption. Define provider aliases for multi-account resources in a single configuration: a provider "aws" block with alias = "production" and an assume_role block whose role_arn is arn:aws:iam::PROD:role/terraform. Use AWS Organizations + SCPs managed by Terraform to enforce guardrails across all accounts. Use Terragrunt (a thin wrapper around Terraform) to DRY up multi-account, multi-region configurations: it handles backend configuration generation, dependency management between stacks, and parallel apply across multiple modules.
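A sketch of the provider-alias pattern follows. The account ID `111111111111`, the role name, the session name, and the bucket are placeholders, and the pipeline is assumed to start with credentials for the tooling account.

```hcl
# Default provider: whatever credentials the pipeline already has
# (the central tooling account in this sketch).
provider "aws" {
  region = "ap-south-1"
}

# Aliased provider: assumes a role in the production account.
provider "aws" {
  alias  = "production"
  region = "ap-south-1"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/terraform-apply" # placeholder account ID
    session_name = "terraform-ci"                                   # hypothetical session name
  }
}

# A single resource opts in to the aliased provider explicitly.
resource "aws_s3_bucket" "prod_artifacts" {
  provider = aws.production
  bucket   = "company-prod-artifacts" # hypothetical
}

# A module receives the aliased provider through the providers map, so
# everything inside it is created in the production account.
module "prod_vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  providers = {
    aws = aws.production
  }

  name = "production-vpc"
  cidr = "10.1.0.0/16"
}
```

Resources without a provider argument keep using the default provider, which is why the alias has to be passed explicitly at every resource or module that should land in the other account.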
Q10: What is the purpose of terraform validate and terraform fmt?
terraform validate checks the configuration for syntax errors and internal consistency: references to undefined variables, incorrect argument types, invalid resource configurations. It does not call the cloud provider API, so it needs no credentials, but it does need an initialised working directory with providers installed. In CI/CD, run terraform init first (terraform init -backend=false is enough if you only want to validate), then validate as an early gate before plan. terraform fmt automatically formats .tf files to the canonical HCL style: consistent indentation (2 spaces) and alignment of = signs in blocks. Run terraform fmt -check -recursive in CI to fail the pipeline if any file is not formatted, which enforces a consistent code style across the team. Both are fast, free, and should be gates in every Terraform CI/CD pipeline before plan is allowed to run.
Q11: What is Terragrunt and why do teams use it with Terraform?
Terragrunt is an open-source wrapper around Terraform (by Gruntwork) that solves the problem of DRY (Don't Repeat Yourself) configuration at scale. Without Terragrunt, every environment directory (dev/staging/prod, per region, per service) contains duplicated backend configuration, duplicated provider blocks, and similar variable values. Terragrunt provides: automatic backend configuration generation per environment (no more copy-pasted backend.tf files), a dependency system for referencing outputs of other Terraform modules (e.g., the EKS module references the VPC module's subnet IDs without manual copy-paste), run-all commands that apply all modules in a directory tree in parallel with correct dependency ordering, and environment-specific variable inheritance (globals → environment → region → module). Teams adopt Terragrunt at the point where they manage 10+ Terraform modules across 3+ environments and the boilerplate maintenance becomes painful.
Q12: How do you test Terraform code?
Four levels of Terraform testing: (1) Static analysis: terraform validate (syntax), tflint (linting, provider-specific rules), tfsec or Checkov (security policy scanning that finds publicly exposed S3 buckets, unencrypted RDS instances, overly permissive security groups). Run in CI on every PR. (2) Unit tests: Terraform 1.6+ has a native test framework (terraform test) that lets you write .tftest.hcl files testing specific module outputs; from Terraform 1.7 these tests can use mock providers (no real cloud resources required). (3) Integration tests: Terratest (Go library by Gruntwork) provisions real infrastructure in a test account, runs assertions, then destroys everything. Slower (minutes to hours) but catches real provider behaviour. (4) Policy-as-code: Sentinel (Terraform Cloud) or OPA/Conftest enforces organisational policies: "all S3 buckets must have versioning enabled", "no security groups can allow 0.0.0.0/0 on port 22". Run as a pre-apply gate in your pipeline.
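A level (2) unit test could look like the sketch below. It assumes the module under test declares the `environment` and `project` variables from the variables section and exposes a hypothetical `name_prefix` output; the file name is also an assumption.

```hcl
# tests/naming.tftest.hcl (hypothetical file name)
# mock_provider requires Terraform 1.7+; with it, no AWS credentials or
# real resources are needed to run the test.
mock_provider "aws" {}

# Default inputs for every run block in this file.
variables {
  environment = "dev"
  project     = "demo"
}

run "name_prefix_combines_environment_and_project" {
  command = plan

  assert {
    # Assumes the module declares: output "name_prefix" { value = local.name_prefix }
    condition     = output.name_prefix == "dev-demo"
    error_message = "name_prefix should be <environment>-<project>"
  }
}

run "rejects_unknown_environment" {
  command = plan

  # Override one input for this run only.
  variables {
    environment = "qa"
  }

  # The run passes only if the validation block on var.environment fails.
  expect_failures = [
    var.environment,
  ]
}
```

Run it with terraform test from the module directory after terraform init. Using command = plan keeps both runs fast, at the cost of only being able to assert on values that are known at plan time.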

πŸ—‚οΈ Explore Terraform on the Interactive Mind Map

See how Terraform connects to AWS, Kubernetes, CI/CD pipelines, and GitOps workflows.

Open Interactive Mind Map ☁️ AWS Guide β†’
// More Guides
πŸ“– DevOps ☸️ Kubernetes 🐳 Docker βš™οΈ CI/CD ☁️ AWS πŸ—‚οΈ Terraform πŸ“Š Prometheus 🐧 Linux 🌿 Git
Advertisement
β˜• Support Master DevOps

All guides are 100% free. If this helped you learn or land an interview, your support keeps the project alive.

β˜• Ko-fi β€” International πŸ’³ Razorpay β€” India
πŸ—‚οΈ
Written by Dhanush R
Senior DevOps Engineer · 4.5+ Years · Bengaluru · AWS · Kubernetes · Terraform
