
AWS DevOps Roadmap: EKS, IAM, VPC, IRSA & Real Interview Questions

📅 Updated April 2026 · ⏱ 12 min read · 🏷 AWS · EKS · IAM · Cloud · DevOps

👨‍💻

Dhanush R

Senior DevOps Engineer · 4.5+ Years Experience · Bengaluru

AWS is the dominant cloud platform in DevOps, with the widest service catalogue and the largest ecosystem of tools and integrations. At a top enterprise, I work with AWS daily — provisioning EKS clusters with Terraform, configuring IRSA for pod-level IAM authentication, designing multi-AZ VPC architectures, and debugging IAM permission issues. This guide covers the AWS services and concepts that appear most in DevOps interviews.

Core AWS Services Every DevOps Engineer Must Know

EKS (Elastic Kubernetes Service) — Managed Kubernetes. AWS runs the control plane (API server, etcd) across multiple AZs; you run the worker nodes as managed or self-managed node groups, or use Fargate to run pods without managing nodes.
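EC2 (Elastic Compute Cloud) — Virtual machines and the foundation of most AWS workloads. Know the main instance families and when to use each: t3 (burstable, for low or spiky load), m5 (general purpose), c5 (compute-optimised), r5 (memory-optimised).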
IAM (Identity and Access Management) — Authentication and authorisation for all AWS APIs. Every action in AWS goes through IAM.
VPC (Virtual Private Cloud) — Isolated virtual network. Every production workload lives in a VPC with private subnets.
S3 (Simple Storage Service) — Object storage. Terraform state, build artifacts, static assets, log archives, and data lake storage.
RDS (Relational Database Service) — Managed SQL databases. Multi-AZ for HA, Read Replicas for read scaling.
ELB (Elastic Load Balancing) — Application Load Balancer (ALB) for L7 HTTP/HTTPS routing by host and path. Network Load Balancer (NLB) for L4 TCP/UDP traffic that needs high throughput and low latency.
CloudWatch — Metrics, logs, alarms, and dashboards. The primary observability tool in AWS environments.
Lambda — Serverless functions. Event-driven processing without managing servers.

VPC Architecture — Production Design

Every production AWS workload should live in a properly designed VPC. The standard multi-AZ architecture:

3 public subnets (one per AZ) — Internet-facing resources only: NAT Gateways, Application Load Balancers, bastion hosts.
3 private subnets (one per AZ) — Application layer: EC2 instances, EKS worker nodes, Lambda functions. No direct internet access.
3 database subnets (one per AZ) — Database layer: RDS, ElastiCache. No route to internet at all.
NAT Gateway in each public subnet — One per AZ, so losing an AZ does not cut outbound access for the private subnets in the other AZs. It lets private subnet resources initiate outbound internet connections (package downloads, AWS API calls) without accepting inbound connections.
Internet Gateway attached to VPC — Required for public subnet resources to communicate with the internet.
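To make the subnet layout concrete, here is a minimal Python sketch that carves one VPC CIDR into one non-overlapping subnet per tier per AZ using the standard-library ipaddress module. The 10.0.0.0/16 range, the /20 subnet size and the us-east-1a/b/c AZ names are illustrative assumptions, and plan_subnets is a hypothetical helper, not an AWS API.

```python
import ipaddress

# Illustrative assumptions: a /16 VPC, three AZ names and three subnet tiers.
VPC_CIDR = "10.0.0.0/16"
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]
TIERS = ["public", "private", "database"]


def plan_subnets(vpc_cidr, azs, tiers, new_prefix=20):
    """Hypothetical helper: map each (tier, az) pair to a non-overlapping CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = 2 ** (new_prefix - vpc.prefixlen)  # equal-size blocks that fit in the VPC
    needed = len(tiers) * len(azs)
    if needed > available:
        raise ValueError(f"need {needed} /{new_prefix} subnets, only {available} fit in {vpc}")
    blocks = vpc.subnets(new_prefix=new_prefix)  # yields blocks in address order
    plan = {}
    for tier in tiers:
        for az in azs:
            plan[(tier, az)] = str(next(blocks))
    return plan


if __name__ == "__main__":
    for (tier, az), cidr in plan_subnets(VPC_CIDR, AZS, TIERS).items():
        print(f"{tier:<9} {az:<11} {cidr}")
```

With these inputs the public subnets get 10.0.0.0/20, 10.0.16.0/20 and 10.0.32.0/20, and seven /20 blocks stay free for growth. On EKS the Amazon VPC CNI assigns pod IPs from the node's subnet, so private subnets are often sized larger than this uniform split.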

    Security Group vs NACL: Security Groups are stateful — return traffic is automatically allowed. NACLs are stateless — you must explicitly allow both inbound and outbound. Security Groups operate at the ENI (instance) level; NACLs operate at the subnet level. Use Security Groups for application-level rules (port 8080 from ALB). Use NACLs as a blunt subnet-level defence (block entire IP ranges).
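The stateful/stateless difference is easiest to see with a toy model. The Python sketch below is a simplification, not how AWS implements filtering: it ignores protocols and CIDRs, models a Security Group with no outbound rules (real ones allow all outbound by default), and treats a NACL as plain per-direction port lists. The IP addresses and the 1024-65535 ephemeral port range are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


@dataclass
class StatefulFilter:
    """Security-Group-like: allow rules only; replies to tracked flows pass automatically."""
    inbound_ports: set
    tracked: set = field(default_factory=set)

    def inbound(self, p: Packet) -> bool:
        if p.dst_port in self.inbound_ports:
            # Record the flow so its reply can leave without any outbound rule.
            self.tracked.add((p.src, p.src_port, p.dst, p.dst_port))
            return True
        return False

    def outbound(self, p: Packet) -> bool:
        # A reply reverses the 4-tuple of a tracked inbound flow.
        return (p.dst, p.dst_port, p.src, p.src_port) in self.tracked


@dataclass
class StatelessFilter:
    """NACL-like: each direction is checked on its own, with no memory of earlier packets."""
    inbound_ranges: list
    outbound_ranges: list

    def inbound(self, p: Packet) -> bool:
        return any(p.dst_port in r for r in self.inbound_ranges)

    def outbound(self, p: Packet) -> bool:
        # The reply's destination port is the client's ephemeral port.
        return any(p.dst_port in r for r in self.outbound_ranges)


request = Packet("203.0.113.10", 50000, "10.0.48.10", 8080)
reply = Packet("10.0.48.10", 8080, "203.0.113.10", 50000)

sg = StatefulFilter(inbound_ports={8080})
print(sg.inbound(request), sg.outbound(reply))      # True True

nacl = StatelessFilter([range(8080, 8081)], [])
print(nacl.inbound(request), nacl.outbound(reply))  # True False

nacl_fixed = StatelessFilter([range(8080, 8081)], [range(1024, 65536)])
print(nacl_fixed.outbound(reply))                   # True
```

The NACL only lets the reply out once an explicit outbound rule covers the client's ephemeral port, which is exactly the extra rule a stateful Security Group never needs.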
  

IAM — Identity and Access Management

IAM is one of the most complex AWS services, and misconfigured IAM is a common source of security vulnerabilities. Apply the principle of least privilege everywhere: users, roles, and services should have exactly the permissions they need and nothing more.

Key IAM Concepts

IAM User — Long-term credentials for a human or non-federated service. Avoid creating IAM Users for applications — use IAM Roles instead.
IAM Role — Temporary credentials assumed by services, applications, or users. No long-term access keys. A Lambda function, EC2 instance, or EKS pod assumes a role to get temporary credentials.
IAM Policy — JSON document defining allowed/denied actions on specific resources. Attach to users, groups, or roles.
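SCP (Service Control Policy) — Organisation-level guardrails that cap what accounts in an AWS Organisation can do. An SCP never grants permissions: an action must be allowed by both the applicable SCPs and the IAM policies, so an SCP deny overrides any IAM allow. Use SCPs to restrict root user actions in member accounts, restrict regions, or mandate encryption.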
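How these pieces interact can be sketched as a simplified evaluation function: an explicit Deny anywhere wins, SCPs must also allow the action, and anything not explicitly allowed is implicitly denied. This Python sketch covers only identity policies and a flat list of SCPs; it ignores Conditions, resource-based policies, permission boundaries and session policies, and approximates IAM wildcard matching with fnmatchcase. The function names, policies and ARNs are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def _applies(stmt, action, resource):
    # Actions match case-insensitively; resource ARNs are matched case-sensitively here.
    # fnmatchcase approximates IAM's '*' and '?' wildcards.
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
    resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt.get("Resource", "*")))
    return action_ok and resource_ok


def _has(statements, effect, action, resource):
    return any(s["Effect"] == effect and _applies(s, action, resource) for s in statements)


def is_allowed(action, resource, identity_policies, scps=()):
    """Simplified IAM decision: explicit deny > SCP ceiling > identity allow > implicit deny."""
    identity_stmts = [s for p in identity_policies for s in p["Statement"]]
    scp_stmts = [s for p in scps for s in p["Statement"]]
    if _has(identity_stmts + scp_stmts, "Deny", action, resource):
        return False  # an explicit Deny always wins
    if scp_stmts and not _has(scp_stmts, "Allow", action, resource):
        return False  # SCPs never grant, but they must permit the action
    return _has(identity_stmts, "Allow", action, resource)  # default is an implicit deny


identity = [{"Statement": [{"Effect": "Allow",
                            "Action": ["s3:GetObject"],
                            "Resource": "arn:aws:s3:::company-artifacts/*"}]}]
deny_s3_scp = [{"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"},
                              {"Effect": "Deny", "Action": "s3:*", "Resource": "*"}]}]

obj = "arn:aws:s3:::company-artifacts/app.zip"
print(is_allowed("s3:GetObject", obj, identity))               # True
print(is_allowed("s3:DeleteObject", obj, identity))            # False: never allowed
print(is_allowed("s3:GetObject", obj, identity, deny_s3_scp))  # False: SCP deny wins
```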

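# Minimal IAM policy example: least privilege for S3 object access.
# JSON allows no comments, so the scoping lives in the Resource ARN itself:
# only objects under api/releases/ in the company-artifacts bucket.
# (The s3:prefix condition key only applies to bucket listing such as s3:ListBucket,
# so it cannot restrict GetObject/PutObject.)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::company-artifacts/api/releases/*"
    }
  ]
}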
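Because the s3:prefix condition key is only present in bucket-listing requests such as s3:ListBucket, one way to scope object reads and writes is to put the prefix in the Resource ARN itself. Generating the document in code also guarantees valid JSON, which has no comment syntax. The Python sketch below uses a hypothetical helper, s3_prefix_policy, and checks coverage with fnmatchcase as an approximation of IAM wildcard matching.

```python
import json
from fnmatch import fnmatchcase


def s3_prefix_policy(bucket, prefix, actions):
    """Hypothetical helper: object-level Allow scoped to one bucket prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": sorted(actions),
            # The prefix is part of the ARN; s3:prefix is not present in object requests.
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        }],
    }


def covers(policy, object_arn):
    """True if any statement's Resource pattern matches the object ARN (approximation)."""
    return any(fnmatchcase(object_arn, s["Resource"]) for s in policy["Statement"])


policy = s3_prefix_policy("company-artifacts", "api/releases/", ["s3:PutObject", "s3:GetObject"])
print(json.dumps(policy, indent=2))
print(covers(policy, "arn:aws:s3:::company-artifacts/api/releases/v1.2.0.tar.gz"))  # True
print(covers(policy, "arn:aws:s3:::company-artifacts/api/dev/build.tar.gz"))        # False
```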

IRSA — IAM Roles for Service Accounts

IRSA (IAM Roles for Service Accounts) is one of the most important EKS security features. Without it, the usual way to give a Kubernetes pod AWS API access (short of storing long-lived access keys in a Kubernetes Secret) is to attach an IAM role to the EC2 node's instance profile, which gives every pod on that node the same permissions. This violates least privilege.

IRSA solves this by federating Kubernetes Service Accounts with IAM Roles using OIDC. Each Service Account maps to its own IAM role, so the pods running under it get precisely the permissions they need. The mechanism:

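The EKS cluster exposes an OIDC issuer URL, which you register once as an IAM OIDC identity provider in the account (for example with eksctl utils associate-iam-oidc-provider --cluster <cluster-name> --approve).
An IAM Role is created with a trust policy that lets tokens from that OIDC provider, issued for the specific Service Account, call sts:AssumeRoleWithWebIdentity.
The Service Account is annotated with the IAM Role ARN.
When a Pod using that Service Account is created, the EKS Pod Identity Webhook injects the role ARN and a token file path as environment variables, plus a projected Service Account token. The AWS SDK exchanges that token for temporary credentials, so no access keys are stored anywhere.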
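The trust policy from the second step can be generated as below. This is a sketch: irsa_trust_policy is a hypothetical helper, and the account ID, region and OIDC issuer ID are placeholders. The sub condition pins the role to one namespace/ServiceAccount pair, and the aud condition checks that the token was issued for STS.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer_url, namespace, service_account):
    """Hypothetical helper: trust policy letting one ServiceAccount assume the role."""
    issuer = oidc_issuer_url.removeprefix("https://")  # IAM keys omit the scheme
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{issuer}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


if __name__ == "__main__":
    trust = irsa_trust_policy(
        "123456789012",                                         # placeholder account ID
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",  # placeholder issuer URL
        "production",
        "api-sa",
    )
    print(json.dumps(trust, indent=2))
```

The printed document can be saved as trust.json and passed to aws iam create-role --role-name api-s3-role --assume-role-policy-document file://trust.json.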

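# Annotate the Kubernetes Service Account
kubectl annotate serviceaccount api-sa \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/api-s3-role \
  -n production
# The webhook only mutates pods at creation, so restart pods that already use api-sa

# Verify the pod gets credentials (requires the AWS CLI inside the container image)
kubectl exec -it api-pod -n production -- aws sts get-caller-identity
# Should show an assumed-role ARN for api-s3-role, not the node role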
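To script this check, for example in a smoke test, the JSON printed by aws sts get-caller-identity can be parsed to confirm which role the pod is really using. A minimal Python sketch follows; the sample outputs are trimmed to the Account and Arn fields, and the session names and the eks-node-role name are hypothetical.

```python
import json
from typing import Optional


def assumed_role_name(caller_identity_json: str) -> Optional[str]:
    """Return the role name from an assumed-role ARN, or None for other identities."""
    arn = json.loads(caller_identity_json)["Arn"]
    # Assumed-role ARNs look like arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] != "assumed-role" or len(parts) < 3:
        return None
    return parts[1]


irsa_output = '{"Account": "123456789012", "Arn": "arn:aws:sts::123456789012:assumed-role/api-s3-role/session-1"}'
node_output = '{"Account": "123456789012", "Arn": "arn:aws:sts::123456789012:assumed-role/eks-node-role/i-0abc"}'

print(assumed_role_name(irsa_output) == "api-s3-role")  # True: IRSA is working
print(assumed_role_name(node_output) == "api-s3-role")  # False: the pod fell back to the node role
```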

S3 vs EBS vs EFS — When to Use Each

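S3 (Object Storage) — Virtually unlimited capacity, designed for 99.999999999% (11 nines) durability, accessed via an HTTP API. Use for: build artifacts, Terraform state, static website assets, log archives, data lake, backups. Not a native filesystem; FUSE tools such as s3fs or Mountpoint for Amazon S3 can mount a bucket, but they lack full POSIX semantics and are slower than block or file storage.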
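EBS (Elastic Block Store) — Block storage attached to one EC2 instance at a time (io1/io2 Multi-Attach aside) and confined to a single AZ. In Kubernetes, the EBS CSI driver attaches the volume to the node running the pod as a ReadWriteOnce PersistentVolume. Low latency, high IOPS. Use for: OS volumes, databases that need block storage (Postgres or MySQL running on EC2), Kubernetes PersistentVolumes for StatefulSets.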
EFS (Elastic File System) — Managed NFS filesystem. Can be mounted by thousands of EC2 instances and pods simultaneously. Use for: shared content (CMS media, machine learning datasets), configuration files shared across instances, ReadWriteMany Kubernetes PersistentVolumes.
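The rules of thumb above can be condensed into a small decision function. This is a deliberately simplified Python sketch: StorageNeeds, choose_storage and K8S_ACCESS_MODE are hypothetical names, and real choices also weigh cost, latency, AZ scope and throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNeeds:
    needs_filesystem: bool      # must be mounted and used with normal file I/O
    shared_writers: bool        # many instances or pods mount it at the same time
    block_device: bool = False  # needs a raw block device (OS volume, self-managed DB)


# How each filesystem-backed option usually surfaces as a Kubernetes access mode.
K8S_ACCESS_MODE = {"EBS": "ReadWriteOnce", "EFS": "ReadWriteMany"}


def choose_storage(needs: StorageNeeds) -> str:
    """Hypothetical rule of thumb mirroring the S3 / EBS / EFS guidance."""
    if needs.block_device:
        return "EBS"
    if needs.needs_filesystem:
        return "EFS" if needs.shared_writers else "EBS"
    return "S3"  # object access over HTTP: artifacts, state, backups, archives


print(choose_storage(StorageNeeds(needs_filesystem=False, shared_writers=False)))  # S3
print(choose_storage(StorageNeeds(needs_filesystem=True, shared_writers=True)))    # EFS
print(choose_storage(StorageNeeds(needs_filesystem=True, shared_writers=False)))   # EBS
```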

RDS — Multi-AZ vs Read Replicas

This is a very common AWS interview question, and the answer requires understanding the purpose of each:

Multi-AZ — High availability (HA) within a region. A standby instance in a different AZ receives synchronous replication. On primary failure, RDS automatically fails over to the standby, typically in 1-2 minutes, and repoints the endpoint's DNS name so applications reconnect without config changes. In a classic Multi-AZ instance deployment the standby serves no reads; it exists purely for failover. Use in all production environments.
Read Replica — Read scalability. Asynchronous replication to one or more read-only instances, so replicas can lag behind the primary. Applications route read queries to replicas, reducing load on the primary. Replicas can be in the same AZ, a different AZ, or a different region, and can be promoted to a standalone database if needed.

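    In production: Use both. Multi-AZ for HA, Read Replicas for read-heavy applications. Aurora (PostgreSQL- or MySQL-compatible) combines the two: up to 15 Aurora Replicas that serve reads and double as failover targets, with failover typically completing in under a minute and often under 30 seconds. It is a strong option when the budget allows.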
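Using replicas means the application, or a proxy in front of it, has to decide where each query goes. Here is a minimal Python sketch of that routing, assuming hypothetical endpoint names and a crude rule that only statements starting with SELECT count as reads: writes, and reads that must see the caller's own latest write, go to the primary; other reads rotate across replicas. With Aurora, the cluster's reader endpoint can take over the rotation.

```python
import itertools


class DbRouter:
    """Hypothetical router: writes to the primary, reads spread across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql, needs_fresh_read=False):
        words = sql.split()
        # Crude classification: anything not starting with SELECT (CTEs included) is treated as a write.
        is_read = bool(words) and words[0].upper() == "SELECT"
        # Replication is asynchronous, so replicas may lag; reads that must see a
        # just-committed write go to the primary.
        if not is_read or needs_fresh_read or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = DbRouter("db-primary.internal", ["db-replica-1.internal", "db-replica-2.internal"])
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))                # db-primary.internal
print(router.endpoint_for("SELECT * FROM orders"))                         # db-replica-1.internal
print(router.endpoint_for("select count(*) from orders"))                  # db-replica-2.internal
print(router.endpoint_for("SELECT * FROM orders", needs_fresh_read=True))  # db-primary.internal
```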
  

Interview Q&A

Q1: How does IRSA work under the hood?

EKS exposes an OIDC issuer for the cluster, registered in IAM as an OIDC identity provider. When a Pod whose ServiceAccount carries the role annotation is created, the EKS Pod Identity Webhook (a mutating admission webhook) injects three things: an AWS_ROLE_ARN environment variable, an AWS_WEB_IDENTITY_TOKEN_FILE environment variable pointing at the token path, and a projected volume containing a short-lived ServiceAccount token issued for the sts.amazonaws.com audience. The AWS SDK in the pod reads the token file and calls STS AssumeRoleWithWebIdentity, exchanging the Kubernetes token for temporary IAM credentials. The IAM role's trust policy must name the OIDC provider as the federated principal and add a condition on the token's sub claim (system:serviceaccount:<namespace>:<name>) so that only that specific ServiceAccount can assume the role.
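The token exchange in this answer can be made visible with a short Python sketch that gathers the inputs the SDK uses from the injected environment variables. Normally the SDK's default credential chain does this without any code; web_identity_params is a hypothetical helper, and the fallback session name is an arbitrary assumption.

```python
import os
from typing import Mapping


def web_identity_params(environ: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: build AssumeRoleWithWebIdentity inputs from IRSA env vars."""
    role_arn = environ["AWS_ROLE_ARN"]                   # injected by the webhook
    token_path = environ["AWS_WEB_IDENTITY_TOKEN_FILE"]  # path of the projected token
    with open(token_path, encoding="utf-8") as f:
        token = f.read().strip()  # short-lived ServiceAccount JWT, refreshed by the kubelet
    return {
        "RoleArn": role_arn,
        # AWS_ROLE_SESSION_NAME is optional; the fallback value is an arbitrary choice.
        "RoleSessionName": environ.get("AWS_ROLE_SESSION_NAME", "irsa-sketch-session"),
        "WebIdentityToken": token,
    }
```

The keys match the parameters of boto3's STS client method assume_role_with_web_identity, so inside a pod the dictionary could be passed to it as keyword arguments to obtain temporary credentials, with no pre-existing AWS credentials needed.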

Q2: What is the difference between a Security Group and a NACL?

Security Groups are stateful: if you allow inbound traffic on port 8080, the return traffic is automatically allowed without an explicit outbound rule. They operate at the instance/ENI level and can only allow (not deny) traffic. NACLs are stateless: you must explicitly allow both directions, including outbound return traffic to the client's ephemeral ports (typically 1024-65535). They operate at the subnet level, support both allow and deny rules, and are evaluated in ascending rule-number order, with the first matching rule applied. Use Security Groups for application-level access control. Use NACLs for broad subnet-level controls such as blocking known malicious IP ranges.
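The NACL evaluation order can be sketched as a first-match loop. This Python sketch is simplified (one direction only, no protocol field), NaclRule and evaluate_nacl are hypothetical names, and 198.51.100.0/24 is a documentation range standing in for a blocked network.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str
    ports: range
    action: str   # "allow" or "deny"


def evaluate_nacl(rules, src_ip, port):
    """Return the action of the first matching rule, else the catch-all deny."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr in ipaddress.ip_network(rule.cidr) and port in rule.ports:
            return rule.action  # first match wins; later rules are never consulted
    return "deny"  # the final '*' rule denies anything that matched nothing


rules = [
    NaclRule(200, "0.0.0.0/0", range(443, 444), "allow"),
    NaclRule(100, "198.51.100.0/24", range(0, 65536), "deny"),
]
print(evaluate_nacl(rules, "198.51.100.7", 443))  # deny: rule 100 matches before rule 200
print(evaluate_nacl(rules, "203.0.113.5", 443))   # allow
print(evaluate_nacl(rules, "203.0.113.5", 22))    # deny: falls through to the catch-all
```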

Q3: How do you design a multi-AZ high-availability architecture on AWS?

Deploy the application tier across at least 2 AZs (3 recommended) using an Auto Scaling Group or EKS node groups whose subnets span multiple AZs. Put an Application Load Balancer in front, enabled in all of those AZs; its health checks route traffic away from unhealthy targets. Database layer: RDS Multi-AZ for automatic failover, or Aurora with replicas in other AZs. Cache: ElastiCache with Multi-AZ enabled and replicas in different AZs (cluster mode also spreads shards across them). Use Route 53 health checks for DNS-level failover. In EKS, set PodDisruptionBudgets so voluntary disruptions such as node drains and cluster upgrades never evict too many replicas at once, and use topologySpreadConstraints with topologyKey topology.kubernetes.io/zone to spread pods evenly across AZs.
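Both EKS controls in this answer come down to simple arithmetic. In this Python sketch, skew is the difference between the most- and least-populated zones (Kubernetes compares each zone's matching pods against the minimum across zones), and allowed disruptions follow the PodDisruptionBudget rule of healthy pods minus minAvailable. The zone names and function names are hypothetical, and the real scheduler also weighs node affinity, taints and other constraints.

```python
from collections import Counter

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]  # illustrative AZ names


def zone_skew(pod_zones, zones=ZONES):
    """Max minus min matching pods per zone; zones with no pods count as 0."""
    counts = Counter(pod_zones)
    per_zone = [counts[z] for z in zones]
    return max(per_zone) - min(per_zone)


def best_zone_for_next_pod(pod_zones, zones=ZONES):
    """Place the next replica in the least-populated zone to keep skew low."""
    counts = Counter(pod_zones)
    return min(zones, key=lambda z: counts[z])


def allowed_disruptions(healthy, min_available):
    """PDB arithmetic: how many pods may be voluntarily evicted right now."""
    return max(0, healthy - min_available)


print(zone_skew(["us-east-1a", "us-east-1b", "us-east-1c"]))  # 0
print(zone_skew(["us-east-1a", "us-east-1a", "us-east-1b"]))  # 2, violates maxSkew: 1
print(best_zone_for_next_pod(["us-east-1a", "us-east-1b"]))   # us-east-1c
print(allowed_disruptions(healthy=6, min_available=4))        # 2
```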
