// Kubernetes Guide · Intermediate–Advanced

Kubernetes Complete Guide 2026: Architecture, YAML, RBAC & Production Patterns

Updated May 2026 · 25 min read · Kubernetes · DevOps · SRE · Containers
Dhanush R, Senior DevOps Engineer
4.5+ years of production experience with Kubernetes on AWS EKS and Azure AKS. Every section of this guide is written from real incident response, cluster management, and senior DevOps interview experience.
// Table of Contents
  1. What is Kubernetes and Why Does It Exist?
  2. Kubernetes Architecture: Control Plane & Worker Nodes
  3. Core Workload Objects: Pod, Deployment, StatefulSet, DaemonSet
  4. Networking: Services, Ingress, and Network Policies
  5. Production YAML Examples with Real Annotations
  6. Health Probes: The Most Misunderstood Feature
  7. Security, RBAC, and Secrets Management
  8. Autoscaling: HPA, VPA, and KEDA
  9. Storage: PersistentVolumes and StorageClasses
  10. Debugging Kubernetes in Production: Systematic Approach
  11. Essential kubectl Command Reference
  12. Common Production Errors and How to Fix Them
  13. 15 Kubernetes Interview Questions with Expert Answers

Kubernetes (K8s) is the most important skill in modern DevOps and SRE. I have used it daily in production for 4.5 years: managing clusters on AWS EKS and Azure AKS, responding to production incidents at 2am, tuning HPA policies for high-traffic services, and designing multi-AZ high-availability architectures. This guide is written from that real-world experience, not from copying documentation.

Originally built by Google based on their internal cluster management system called Borg, Kubernetes was open-sourced in 2014 and donated to the newly formed Cloud Native Computing Foundation (CNCF) in 2015. It has since become the industry standard for running containerised workloads at scale. If you are preparing for a Senior DevOps, Platform Engineer, or SRE role in 2026, a deep and practical understanding of Kubernetes is non-negotiable. This guide covers the core topics you need.

What is Kubernetes and Why Does It Exist?

Before Kubernetes, running Docker containers at scale exposed a fundamental gap: Docker solved packaging, but not orchestration. When you have 50 Docker containers running across 10 servers, who restarts a container that crashes? Who redistributes load when a server goes down? How do you roll out a new version of your application without downtime across all instances? How do you automatically add more containers during a traffic spike and remove them when traffic drops?

Kubernetes answers all of these questions. It is an orchestration platform: a system that manages containers across a cluster of machines. You declare what you want (three replicas of my API service, always running, with 512MB of RAM each, accessible at this DNS name) and Kubernetes makes it happen and maintains it continuously, even when servers fail, containers crash, or traffic triples unexpectedly.

The key insight is that Kubernetes is a desired-state system. You describe the desired state of your infrastructure in YAML manifest files. Kubernetes continuously monitors the actual state of the cluster and reconciles it toward the desired state. This reconciliation runs constantly: controllers watch the API server and react to every change, with periodic resyncs as a safety net. It is the architectural foundation of everything in the platform and explains why Kubernetes is self-healing by design.

Why Kubernetes replaced Docker Swarm and Mesos: Kubernetes won the container orchestration wars by 2018 because of its extensible API, rich ecosystem (Helm, Operators, service meshes), strong RBAC model, and the backing of every major cloud provider. AWS, GCP, and Azure all offer managed Kubernetes services (EKS, GKE, AKS). Today, Kubernetes is essentially synonymous with container orchestration in production environments.

Kubernetes Architecture: Control Plane & Worker Nodes

A Kubernetes cluster consists of two types of infrastructure: the control plane, which manages the cluster state and makes scheduling decisions, and worker nodes, which run the actual application workloads. Expect a question on this architecture early in most Kubernetes interviews, so understand it in depth.

Control Plane Components

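In a managed Kubernetes service like AWS EKS or GKE, the control plane is fully managed by the cloud provider. You pay for it but never SSH into it. In self-managed clusters (kubeadm, Rancher), you manage these components yourself:

kube-apiserver: the front door of the cluster; kubectl, controllers, and kubelets all talk to it.
etcd: the consistent key-value store that holds all cluster state.
kube-scheduler: assigns unscheduled Pods to nodes.
kube-controller-manager: runs the built-in controllers (Deployment, ReplicaSet, Node, Job, and others).
cloud-controller-manager: integrates the cluster with the cloud provider's APIs (load balancers, node lifecycle).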

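Worker Node Components

kubelet: the node agent that starts and monitors the containers of Pods assigned to its node.
kube-proxy: programs the node's routing rules so that Service IPs reach healthy Pods.
Container runtime: runs the containers (containerd or CRI-O in most clusters).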

Core Workload Objects

Pod: The Atomic Unit

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share a network namespace (they reach each other via localhost) and can share storage volumes. In practice, most Pods contain a single application container. The sidecar pattern, where a second container handles a cross-cutting concern like log shipping, secret injection, or service mesh proxying, is the main use case for multi-container Pods.

Pods are ephemeral and are not self-healing. The kubelet restarts a crashed container in place according to the Pod's restartPolicy, but a bare Pod that is deleted, evicted, or lost with its node is never recreated. Never run bare Pods in production; always use a controller (Deployment, StatefulSet, DaemonSet, or Job) that will recreate the Pod if it fails. The scheduler assigns each Pod to a node once at creation time; after that, the Pod is permanently bound to that node. If the node dies, the Pod is gone, and the controller creates a replacement Pod on a healthy node.

Deployment: For Stateless Applications

A Deployment is the standard way to run stateless applications. You declare the desired number of replicas and the container image. The Deployment controller creates a ReplicaSet, which creates and maintains the specified number of Pods. If a Pod is deleted or fails, the ReplicaSet controller creates a replacement immediately.

Deployments also manage rolling updates. When you update the image version, the Deployment creates a new ReplicaSet with the new image and gradually scales it up while scaling down the old ReplicaSet, all while maintaining the number of available Pods specified by the rolling update strategy. Rolling back is fast: the old ReplicaSet is kept at zero replicas with its Pod template intact (up to revisionHistoryLimit revisions, 10 by default), so a rollback is simply another rolling update that scales the old ReplicaSet back up and the current one down.

StatefulSet: For Stateful Applications

StatefulSets provide three guarantees that Deployments do not: stable, persistent Pod identities (pod-0, pod-1, pod-2; names never change even after rescheduling), stable per-Pod PersistentVolumeClaims that remain bound to the same Pod even when it is rescheduled to a different node, and ordered, sequential startup and shutdown (pod-0 must be Running and Ready before pod-1 is started; pod-1 is stopped before pod-0 during scale-down).

Use StatefulSets for: PostgreSQL, MySQL, MongoDB, Kafka, Redis Cluster, Cassandra, Elasticsearch, or any workload where instance identity matters. For example, a primary-replica PostgreSQL setup needs a stable pod-0 that is always the primary, with pod-1 and pod-2 as replicas that connect to pod-0. With a Deployment, Pod names are random and change on rescheduling, making this impossible.
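To make the three guarantees concrete, here is a minimal sketch of a three-replica PostgreSQL StatefulSet with its headless Service. The names (postgres, postgres-headless, postgres-secret), the image tag, and the gp3 StorageClass are illustrative assumptions, not taken from a real cluster.

```yaml
# Headless Service: gives each Pod a stable DNS name such as
# postgres-0.postgres-headless.production.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless          # hypothetical name
  namespace: production
spec:
  clusterIP: None                  # headless: DNS resolves to the Pod IPs directly
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres-headless   # must match the headless Service above
  replicas: 3                      # creates postgres-0, postgres-1, postgres-2 in order
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16       # assumed image tag
          ports:
            - containerPort: 5432
              name: postgres
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret   # hypothetical Secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC per Pod: data-postgres-0, data-postgres-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3      # assumed StorageClass
        resources:
          requests:
            storage: 20Gi
```

This manifest only provides identity, storage, and ordering. Setting up replication between the primary and its replicas is still the job of the database image or an Operator.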

DaemonSet: One Pod Per Node

DaemonSets ensure exactly one Pod runs on every node in the cluster (or a filtered subset). When a new node joins the cluster, the DaemonSet controller automatically schedules its Pod on it. When a node is removed, its DaemonSet Pod is garbage collected. Use DaemonSets for: log collectors (Fluentd, Filebeat, Promtail), monitoring agents (Prometheus Node Exporter, Datadog Agent), network plugins (Calico, Cilium CNI agents), and security tools (Falco runtime security).
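As an illustration, a monitoring agent such as Prometheus Node Exporter could be deployed with a DaemonSet along these lines. The monitoring namespace, the image tag, and the resource figures are assumptions chosen for the sketch.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring            # assumed namespace
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
        - operator: Exists         # tolerate every taint so the agent runs on all nodes
      containers:
        - name: node-exporter
          image: quay.io/prometheus/node-exporter:v1.8.2   # assumed tag
          ports:
            - containerPort: 9100
              name: metrics
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
            limits:
              memory: "128Mi"
```

To target only a filtered subset of nodes, add a nodeSelector or node affinity to the Pod template and drop the blanket toleration.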

Job and CronJob: For Batch Workloads

A Job runs one or more Pods to completion: it creates Pods until the specified number of successful completions is reached, then stops. Use Jobs for database migrations, data processing tasks, and one-time setup operations. A CronJob creates Jobs on a schedule using standard cron syntax. Kubernetes CronJobs replace traditional server cron entries with the reliability and observability of the Kubernetes platform.
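A nightly batch task could be expressed as a CronJob like the sketch below. The job name, image, and arguments are hypothetical; the fields themselves are standard batch/v1 fields.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report             # hypothetical name
  namespace: production
spec:
  schedule: "0 2 * * *"            # 02:00 daily, in the kube-controller-manager's time zone unless spec.timeZone is set
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2              # retry a failed Pod at most twice
      activeDeadlineSeconds: 1800  # fail the Job if it runs longer than 30 minutes
      template:
        spec:
          restartPolicy: Never     # Jobs require Never or OnFailure
          containers:
            - name: report
              image: registry.company.com/report-job:1.0.0   # hypothetical image
              args: ["--date", "yesterday"]                  # hypothetical arguments
```

The jobTemplate section is an ordinary Job spec, so the same block works as a standalone Job for one-off tasks such as a database migration.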

Networking: Services, Ingress, and Network Policies

Kubernetes networking is built on a flat network model: every Pod in the cluster gets a unique IP address, and any Pod can reach any other Pod's IP directly, without NAT, regardless of which node each Pod runs on. This simplicity is powerful but has implications for security: by default, all Pods can talk to all other Pods. Network Policies address this.

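Service Types

ClusterIP (default): a stable virtual IP reachable only from inside the cluster; use it for service-to-service traffic.
NodePort: exposes the Service on a static port on every node; mostly useful for development or behind an external load balancer.
LoadBalancer: provisions an external load balancer from the cloud provider that forwards to the Service.
ExternalName: maps the Service to an external DNS name through a CNAME record, with no proxying.
Headless (clusterIP: None): no virtual IP; DNS returns the Pod IPs directly, which is what StatefulSets rely on.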
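As a minimal sketch of the most common type, a ClusterIP Service in front of the api-server Pods used elsewhere in this guide could look like this. The port numbers mirror the Deployment and Ingress examples; treat the rest as illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
spec:
  type: ClusterIP          # the default; reachable only inside the cluster
  selector:
    app: api-server        # must match the Pod labels, not the Deployment name
  ports:
    - name: http
      port: 80             # port clients connect to on the Service
      targetPort: http     # named containerPort (8080) on the Pod
```

Inside the cluster this Service resolves as api-service.production.svc.cluster.local. Only Pods that pass their readiness probe are listed as endpoints.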

Network Policies: Zero-Trust Networking

By default, all Pods in a Kubernetes cluster can communicate with all other Pods. Network Policies are namespace-scoped firewall rules that restrict this. A best-practice production cluster has a default-deny policy that blocks all ingress and egress, then explicit allow policies for required communication paths. The CNI plugin must support Network Policies: Calico, Cilium, and Weave all do; Flannel does not.

# Default-deny all ingress to a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # applies to ALL pods in namespace
  policyTypes:
    - Ingress            # blocks all inbound traffic by default
---
# Allow ingress only from the api-gateway pod
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-gateway
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
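The egress half of a default-deny setup can be sketched the same way. Denying all egress also blocks DNS, so a companion policy usually re-allows lookups. The k8s-app: kube-dns label is an assumption that holds for CoreDNS on most distributions; verify it on your cluster before relying on it.

```yaml
# Default-deny all egress from a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}          # applies to ALL pods in namespace
  policyTypes:
    - Egress
---
# Re-allow DNS lookups to the cluster DNS Pods in kube-system
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns    # assumed label of the cluster DNS Pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Because namespaceSelector and podSelector sit in the same list item, both must match: only the DNS Pods in kube-system are allowed, not every Pod in that namespace.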

Production YAML Examples with Real Annotations

Production-Grade Deployment

# production-deployment.yaml: all production-critical fields included
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
    version: "2.1.0"
    team: platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # never drop below 3 available pods
      maxSurge: 1                # create one extra pod during update
  template:
    metadata:
      labels:
        app: api-server
        version: "2.1.0"
    spec:
      serviceAccountName: api-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 2000
      topologySpreadConstraints: # spread pods across AZs
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api
          image: registry.company.com/api:2.1.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
            - name: APP_ENV
              value: "production"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          startupProbe:          # allow slow startup without CrashLoopBackOff
            httpGet:
              path: /health/ready
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]   # drain in-flight requests before SIGTERM
      terminationGracePeriodSeconds: 30

Ingress with TLS, Rate Limiting, and CORS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"       # requests per second per client IP
    nginx.ingress.kubernetes.io/enable-cors: "true"    # required for the cors-* annotations to apply
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.company.com
      secretName: api-tls-cert
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /api/v1
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80

Health Probes: The Most Misunderstood Feature

Health probe misconfiguration is the single most common cause of production incidents I have investigated over four years. There are three probe types and they have fundamentally different behaviours; confusing them creates cascading failures.

Liveness probe failure → container is killed and restarted by kubelet. Use for detecting process deadlocks, memory corruption, or stuck goroutines. If the app process is running but not making progress, kill it and start fresh.
Readiness probe failure → Pod is removed from Service endpoints. No traffic. No restart. Use for: app still warming up caches, database connection not yet established, feature flag service unavailable. The Pod is live but not ready to serve traffic.
Startup probe → liveness and readiness are suspended until startup probe succeeds. Use for slow-starting applications (Java services with JVM warm-up, apps loading large ML models) to prevent CrashLoopBackOff during first startup.
Critical mistake to avoid: Never put an external dependency check in a liveness probe. If your liveness probe calls your database and the database has a 30-second outage, Kubernetes will restart every single Pod in your Deployment simultaneously. The result is worse than the original outage: a thundering herd of restarting Pods all simultaneously hammering the recovering database, causing the database to crash again. External dependency checks belong in the readiness probe only.
Second critical mistake: Setting initialDelaySeconds too low on the liveness probe for slow-starting applications. A Java Spring Boot service may take 45 to 90 seconds to start on a cold JVM. If your liveness probe starts checking at 10 seconds and the app isn't responding yet, Kubernetes kills it, and you enter CrashLoopBackOff before the app has ever started successfully. Use a startup probe with generous failure thresholds for slow-starting apps.

Security, RBAC, and Secrets Management

Role-Based Access Control (RBAC) is the primary access control mechanism inside a Kubernetes cluster. The principle of least privilege must be applied everywhere: each ServiceAccount should only have access to the exact Kubernetes API resources it needs, and nothing more. RBAC is always a topic in senior DevOps and SRE interviews.

The RBAC model has four resource types: Role (namespace-scoped permissions), ClusterRole (cluster-wide permissions), RoleBinding (assigns a Role to a user, group, or ServiceAccount), and ClusterRoleBinding (assigns a ClusterRole cluster-wide). Start with Role/RoleBinding unless you specifically need cluster-wide access.

# Role: read-only access to pods and logs in one namespace
# (pods/exec is deliberately NOT listed: exec access is not read-only)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-pod-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: api-sa
    namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

# Always verify your RBAC rules with auth can-i
kubectl auth can-i get pods \
  --as=system:serviceaccount:production:api-sa -n production
# Expected: yes
kubectl auth can-i delete deployments \
  --as=system:serviceaccount:production:api-sa -n production
# Expected: no
Kubernetes Secrets are NOT encrypted at rest by default. They are base64-encoded, which any user with kubectl get secret access (or etcd access) can trivially decode. For production security, use one of: HashiCorp Vault with the Vault Agent Injector (secrets are rendered into files on an in-memory volume inside the Pod at startup and never stored in etcd), External Secrets Operator (syncs secrets from AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault into Kubernetes Secrets), or Sealed Secrets (Bitnami) for GitOps workflows where encrypted secrets are safe to commit. Enable etcd encryption at rest as a baseline hardening measure.
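On a self-managed cluster, etcd encryption at rest is switched on by pointing kube-apiserver at an EncryptionConfiguration file through its --encryption-provider-config flag. A minimal sketch follows; the key name and the key value are placeholders you must generate yourself.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:                    # first provider is used for all new writes
          keys:
            - name: key1           # placeholder key name
              secret: <BASE64_ENCODED_32_BYTE_KEY>   # placeholder, never commit a real key
      - identity: {}               # fallback so Secrets written before encryption stay readable
```

Existing Secrets are only encrypted once they are rewritten. On managed services such as EKS, this is a cluster-level setting backed by the provider's KMS rather than a file you maintain.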

Autoscaling: HPA, VPA, and KEDA

Kubernetes provides three autoscaling mechanisms, each operating at a different level:

Horizontal Pod Autoscaler (HPA) scales the number of Pod replicas based on metrics. CPU and memory utilisation are the built-in metrics. Custom metrics (HTTP request rate, queue depth, latency) are available via the Custom Metrics API (Prometheus Adapter, KEDA). HPA works best when your application can scale horizontally and each Pod has well-calibrated resource requests.

Vertical Pod Autoscaler (VPA) adjusts the CPU and memory requests of existing Pods based on historical usage. It solves the "how much resource should I request?" question automatically. In its automatic update modes VPA evicts and recreates Pods to apply new resource requests, so it is a poor fit for production services under constant load. Run it in recommendation-only mode instead (for example through Goldilocks) to generate recommendations, then bake them into your deployment manifests. Avoid letting VPA and HPA act on the same CPU or memory metric for one workload, because they will work against each other.
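A recommendation-only VPA for the api-server Deployment could be sketched as follows. It assumes the VPA custom resource definitions and recommender are installed in the cluster, which is not the case by default; the object name is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                  # hypothetical name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"            # compute recommendations only, never evict Pods
```

The recommendations appear in the object's status and can be read with kubectl describe before being copied into the Deployment's resource requests.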

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA to scale on virtually any event source: Kafka consumer lag, SQS queue depth, Redis queue length, Cron schedules, Prometheus metrics, and 60+ other scalers. KEDA can scale down to zero (for cost saving) and back up again, which HPA cannot do natively.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60             # scale down max 10% per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30             # can double replicas every 30s
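The HPA above scales on CPU and memory. For the event-driven case, a KEDA ScaledObject that scales a consumer on Kafka lag might look like the sketch below. It assumes KEDA is installed; the Deployment name, broker address, topic, consumer group, and threshold are all hypothetical.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker-scaler      # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: order-worker           # hypothetical Deployment to scale
  minReplicaCount: 0             # scale to zero when the topic is idle
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.production.svc.cluster.local:9092   # assumed broker address
        consumerGroup: order-worker                                 # assumed consumer group
        topic: orders                                               # assumed topic
        lagThreshold: "50"       # target lag per replica
```

KEDA creates and manages an HPA for the target behind the scenes, so do not attach a separate HPA to the same Deployment.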

Storage: PersistentVolumes and StorageClasses

Kubernetes abstracts storage through three resource types. A PersistentVolume (PV) is a piece of storage provisioned in the cluster: an EBS volume, an NFS share, a Ceph pool. A PersistentVolumeClaim (PVC) is a request for storage by a user or workload: "I need 10GB of ReadWriteOnce storage." A StorageClass enables dynamic provisioning: when a PVC is created, the StorageClass automatically provisions a matching PV without manual admin intervention.

In production on AWS EKS, you use the EBS CSI Driver (for block storage, ReadWriteOnce: the volume can be mounted by Pods on a single node at a time) and EFS CSI Driver (for shared file storage, ReadWriteMany: Pods on many nodes can mount the same volume). Always use dynamic provisioning with StorageClasses rather than manually creating PVs. Set reclaimPolicy: Retain for production databases so volumes are not deleted when the PVC is deleted.

Production tip: Set volumeBindingMode: WaitForFirstConsumer on your StorageClass. This delays PV provisioning until a Pod is scheduled, ensuring the volume is created in the same Availability Zone as the Pod. Without this, the EBS volume is provisioned as soon as the PVC is created, in an AZ chosen without regard to where the Pod will run, and Pods may fail to schedule because the volume is in a different AZ from the only available nodes.
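Putting the storage recommendations together, a StorageClass for the EBS CSI Driver and a claim against it could be sketched like this. The class name, claim name, and gp3 parameters are assumptions for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain                        # hypothetical class name
provisioner: ebs.csi.aws.com              # EBS CSI Driver
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain                     # keep the EBS volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the AZ where the Pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data                     # hypothetical claim name
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-retain
  resources:
    requests:
      storage: 10Gi
```

With WaitForFirstConsumer the claim stays Pending until a Pod that uses it is scheduled. That is expected behaviour, not an error.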

Debugging Kubernetes in Production

This is the section that separates senior engineers from juniors in interviews and on-call. When a Pod is broken in production at 3am, you need a systematic, fast debugging methodology, not random kubectl commands fired in panic. Here is the exact flow I use in production incidents:

# Step 1: Get Pod status and identify the failure mode
kubectl get pods -n production -o wide
# States: Pending, Running, Terminating, CrashLoopBackOff, Error, OOMKilled, ImagePullBackOff

# Step 2: Describe the Pod; the Events section is the most valuable
kubectl describe pod api-7d4b-xyz -n production
# Events tell you: why it's Pending (scheduling failure), what image pull error occurred, etc.

# Step 3: Get logs; use --previous for the LAST crash, not the current run
kubectl logs api-7d4b-xyz -n production --previous
kubectl logs api-7d4b-xyz -n production --tail=200 --timestamps

# Step 4: Check cluster-wide events sorted by time
kubectl get events --sort-by='.lastTimestamp' -n production

# Step 5: Check resource usage; OOMKilled means memory limit too low
kubectl top pods -n production --sort-by=memory
kubectl top nodes

# Step 6: Check if Service has healthy endpoints
kubectl get endpoints api-service -n production
# Empty endpoints = readiness probe failing or Pod label selector mismatch

# Step 7: Exec into container for live debugging
kubectl exec -it api-7d4b-xyz -n production -- sh

# Step 8: Port-forward for local testing
kubectl port-forward svc/api-service 8080:80 -n production

# Step 9: Debug a node-level issue
kubectl debug node/ip-10-0-1-42.ec2.internal -it --image=busybox

Essential kubectl Command Reference

Command | What it does
kubectl get pods -A | List all Pods in all namespaces
kubectl get pods -n prod -o wide | Wide output including node and IP
kubectl describe pod <name> | Full details including Events; start debugging here
kubectl logs <pod> --previous | Logs from the last crashed container instance
kubectl rollout status deploy/<name> | Wait for rollout to complete
kubectl rollout undo deploy/<name> | Roll back to the previous ReplicaSet
kubectl scale deploy/<name> --replicas=5 | Imperatively scale a Deployment
kubectl set image deploy/<name> app=image:tag | Trigger a rolling update to a new image
kubectl apply -f manifest.yaml | Declarative create or update (idempotent)
kubectl delete -f manifest.yaml | Delete all resources defined in the file
kubectl exec -it <pod> -- bash | Interactive shell in a running container
kubectl cp <pod>:/path ./local | Copy files from container to local
kubectl auth can-i <verb> <resource> | Test RBAC permissions for the current user
kubectl top pods -n prod | Live CPU and memory usage per Pod
kubectl get events --sort-by=.lastTimestamp | Cluster events sorted by time; best for incident triage
kubectl drain <node> --ignore-daemonsets | Safely evict all Pods before node maintenance
kubectl cordon <node> | Mark node unschedulable (no new Pods)
kubectl uncordon <node> | Return node to schedulable state

Common Production Errors and How to Fix Them

❌ CrashLoopBackOff
Cause: Container crashes immediately after start and Kubernetes keeps restarting it with exponential backoff (10s, 20s, 40s, up to 5 minutes). Common causes: application startup error, missing environment variable, database not yet available, liveness probe firing too aggressively before app is ready, OOMKilled (memory limit too low).

Fix: Run kubectl logs <pod> --previous to see the crash output. Check the exit code with kubectl describe pod: exit code 137 means the container was killed with SIGKILL (usually OOMKilled), exit code 1 means application error. Fix the root cause: increase memory limit, fix missing env var, add a startup probe for slow-starting apps.
❌ ImagePullBackOff / ErrImagePull
Cause: Kubernetes cannot pull the container image. Either the image does not exist, the tag is wrong, the registry is private and credentials are missing, or the registry is unreachable from the cluster network.

Fix: Verify image tag exists in the registry. Check that the imagePullSecret is created and referenced in the Pod spec. Verify network connectivity from nodes to the registry. For ECR on EKS, verify the node IAM role has the ecr:GetAuthorizationToken, ecr:BatchGetImage, and ecr:GetDownloadUrlForLayer permissions.
❌ Pending: FailedScheduling
Cause: No node in the cluster satisfies all the Pod's scheduling constraints. Common causes: insufficient CPU or memory on all nodes, node selector or affinity rules that no node matches, taints on all nodes that the Pod doesn't tolerate, or topology spread constraints that can't be satisfied.

Fix: Run kubectl describe pod <pending-pod> and read the Events section; the scheduler will tell you exactly why it couldn't schedule. Add more nodes, adjust resource requests, or relax affinity rules as appropriate.
❌ OOMKilled (Out of Memory)
Cause: The container exceeded its memory limit. The Linux OOM killer terminates the process. The container exit code is 137 (128 + 9, the signal number of SIGKILL). This is common for Java apps where the JVM allocates more memory than the container limit, or for memory leaks that grow over time.

Fix: Check kubectl top pods for current memory usage. Increase the memory limit in the resource spec. For Java, set explicit heap limits with -Xmx that are lower than the container limit (leaving 200-300MB for JVM overhead). Investigate memory leaks in long-running Pods with heap dumps.
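For the Java case, the heap cap and the container limit can be declared side by side. The fragment below is a sketch of a container spec, with sizes chosen only to illustrate the headroom rule; the right values depend on the application.

```yaml
# Fragment of a Pod template: spec.containers
containers:
  - name: api
    image: registry.company.com/api:2.1.0
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "1Gi"                  # container is OOMKilled above this
    env:
      - name: JAVA_TOOL_OPTIONS        # picked up automatically by the JVM at startup
        value: "-Xmx768m"              # heap cap leaves 256Mi for metaspace, threads, and native buffers
```

Keeping the heap flag in an environment variable rather than in the image makes it easy to change together with the memory limit.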

15 Kubernetes Interview Questions with Expert Answers

Q1: What is CrashLoopBackOff and how do you systematically debug it?
CrashLoopBackOff means the container keeps crashing shortly after start, and Kubernetes is backing off restart attempts exponentially (10s, 20s, 40s, up to 5 minutes between restarts). Start with kubectl logs <pod> --previous. This shows the output of the last crashed instance, which almost always contains the error. Then run kubectl describe pod to see the exit code and events. Exit code 137 means the container was killed with SIGKILL, usually OOMKilled (raise the memory limit). Exit code 1 typically means an application startup error (missing config, failed database connection). Exit codes 126 and 127 usually mean the command or entrypoint is not executable or was not found. Use a startup probe with generous failure thresholds to prevent CrashLoopBackOff during legitimate slow startups.
Q2: Explain the difference between a Deployment and a StatefulSet. When would you choose each?
Use a Deployment for stateless applications where Pods are interchangeable: a web API, a worker process, a caching layer. Pods get random names, share PVCs if needed, and can be replaced in any order. Use a StatefulSet when each instance needs a stable, persistent identity. StatefulSets guarantee: ordered, predictable Pod names (pod-0, pod-1, pod-2, unchanged across restarts), per-Pod PersistentVolumeClaims that follow the Pod across rescheduling, and ordered startup/shutdown. Use StatefulSets for: PostgreSQL primary/replica clusters (pod-0 must always be primary), Kafka brokers (each broker needs stable broker ID), Redis Cluster (slots are pinned to specific nodes), Elasticsearch (shard allocation depends on node identity). Never use StatefulSet for stateless apps: it makes rolling updates slower without any benefit.
Q3: Walk me through what happens when you run kubectl apply -f deployment.yaml
kubectl parses the YAML, converts it to JSON, and sends an HTTP request to the API server. The API server authenticates the request (checks the client certificate or bearer token), authorises it via RBAC (does this user have verb=create/update on resource=deployments in this namespace?), runs mutating admission controllers (MutatingAdmissionWebhooks may modify the object), validates the resource schema, runs validating admission controllers (ValidatingAdmissionWebhooks may reject it), and finally persists the object to etcd. Once in etcd, the Deployment controller (watching for Deployment changes) is notified. It creates or updates the ReplicaSet for the current Pod template, and the ReplicaSet controller creates the desired number of Pods. The scheduler watches for unscheduled Pods, filters and scores the nodes, and assigns each Pod to the best node. The kubelet on each assigned node pulls the image and starts the containers. kube-proxy updates routing rules when Pods become Ready and are added to Service endpoints.
Q4: What is IRSA and why is it important for EKS security?
IRSA (IAM Roles for Service Accounts) is the mechanism that allows EKS Pods to authenticate to AWS services (S3, DynamoDB, Secrets Manager, ECR) using IAM roles, without embedding AWS credentials in the Pod spec or environment variables. Without IRSA, Pods would need either: a long-lived IAM access key (a security risk, because keys can be exfiltrated from the container), or node-level IAM roles (which grant every Pod on that node the same permissions, violating least privilege). IRSA works by annotating a Kubernetes ServiceAccount with the ARN of an IAM role that trusts the cluster's OIDC provider. Pods using that ServiceAccount receive a signed JWT token that AWS STS validates, then issues temporary credentials for the IAM role. This way, each Pod can have its own minimum-privilege IAM role with short-lived credentials that auto-rotate.
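On the Kubernetes side, IRSA comes down to one annotation on the ServiceAccount. In the sketch below the account ID and role name are placeholders; the IAM role and its trust policy for the cluster's OIDC provider must be created separately in AWS.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: production
  annotations:
    # Placeholder ARN: substitute your own account ID and role name
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/api-s3-reader
```

Any Pod that sets serviceAccountName: api-sa then gets the projected token and the environment variables that the AWS SDKs use to assume the role.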
Q5: How do you achieve zero-downtime deployments in Kubernetes?
Five things must all be correct simultaneously. First, set maxUnavailable: 0 and maxSurge: 1 in the rolling update strategy so capacity never drops below 100%. Second, configure a readiness probe so new Pods only receive traffic after they are genuinely ready to serve, not just after the container starts. Third, add a preStop lifecycle hook (sleep 5). The hook runs before the container receives SIGTERM, which gives endpoint removal time to propagate so that no new requests arrive while the app shuts down. Fourth, handle SIGTERM in the app and set terminationGracePeriodSeconds long enough for it to finish in-flight requests gracefully (typically 30-60 seconds). Fifth, configure a PodDisruptionBudget with minAvailable or maxUnavailable to prevent voluntary disruptions (node drains, cluster upgrades) from taking down more Pods than intended. Missing any one of these five conditions can cause at least brief downtime during deployments.
Q6: What is an etcd backup and why is it critical?
etcd is the cluster's single source of truth: it stores every Kubernetes resource definition, including all Deployments, Services, Secrets, ConfigMaps, RBAC policies, and custom resources. If etcd data is corrupted or lost without a backup, the entire cluster configuration is irrecoverable; you cannot recreate it from the worker nodes. Back up etcd using etcdctl snapshot save, and store the snapshot outside the cluster (S3, GCS). For production, take hourly snapshots with a retention policy. Test restores regularly: a backup you have never tested is not a backup. In managed Kubernetes (EKS, GKE, AKS), the cloud provider handles etcd backup. In self-managed clusters (kubeadm), etcd backup is your responsibility.
Q7: What is a PodDisruptionBudget and when do you need it?
A PodDisruptionBudget (PDB) limits the number of Pods that can be simultaneously unavailable during voluntary disruptions: node drains for maintenance, cluster version upgrades, cluster autoscaler scale-down events. Without a PDB, draining a node that runs two out of three replicas of your Deployment will evict both replicas simultaneously, causing downtime. A PDB with minAvailable: 2 prevents the drain from proceeding unless at least 2 Pods will remain healthy. Create a PDB for every production Deployment with more than one replica. Keep minAvailable at least one below the replica count, or set maxUnavailable to a small number; a PDB that allows zero disruptions blocks node drains indefinitely. Note: PDBs only protect against voluntary disruptions. Involuntary disruptions (node hardware failure, kernel panic) can still cause downtime regardless of PDB settings.
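For the three-replica api-server Deployment used in this guide, such a budget could be sketched as follows (the PDB name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb           # hypothetical name
  namespace: production
spec:
  minAvailable: 2                # with 3 replicas, at most one Pod may be evicted at a time
  selector:
    matchLabels:
      app: api-server            # must match the Deployment's Pod labels
```

If an HPA changes the replica count, a percentage or maxUnavailable: 1 is often easier to keep correct than a fixed minAvailable.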
Q8: How does Kubernetes handle node failures?
The Node controller in the controller manager monitors node health via node heartbeats (the kubelet renews its node Lease and updates node status every few seconds). If a node stops sending heartbeats, the controller marks it as NotReady after 40 seconds (configurable) and taints it with node.kubernetes.io/not-ready or node.kubernetes.io/unreachable. Pods tolerate those taints for 300 seconds by default; after that they are evicted from the node and marked Terminating. The controllers managing those Pods (Deployment controller, StatefulSet controller) then create replacement Pods on healthy nodes. This process typically completes within 5-7 minutes of node failure. To reduce this window, tune --node-monitor-grace-period on the controller manager and the tolerationSeconds of those two tolerations on your Pods (taint-based eviction replaced the older --pod-eviction-timeout flag), but lowering these too aggressively can cause false evictions during transient network issues.
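The per-Pod part of that tuning is expressed as tolerations in the Pod template. The fragment below is a sketch that shortens the default 300-second wait to 60 seconds; the value is an example, not a recommendation for every workload.

```yaml
# Fragment of a Pod template: spec.tolerations
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60        # evict 60s after the node is marked NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60        # evict 60s after the node becomes unreachable
```

When a Pod does not declare these tolerations, Kubernetes adds them automatically with tolerationSeconds: 300, which is where the default five-minute delay comes from.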
Q9: Explain Kubernetes resource requests and limits. What happens when a container exceeds its memory limit?
Resource requests tell the scheduler how much CPU and memory a container needs: the scheduler only places a Pod on a node with at least that much unreserved capacity. Resource limits are the hard ceiling a container cannot exceed. If a container tries to use more CPU than its limit, it is throttled (CPU cycles are taken away). If a container exceeds its memory limit, the Linux OOM killer kills the container process with SIGKILL (exit code 137, reason OOMKilled), and kubelet restarts it. If OOMKills happen repeatedly, raise the memory limit or investigate a leak. Requests and limits that are set equal for every container (Guaranteed QoS class) are most predictable. Limits higher than requests (Burstable QoS) allow bursting but risk OOMKills under load. No requests or limits at all (BestEffort QoS) means the container can use any available resources but is the first to be evicted under memory pressure.
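As a small sketch, this is what the Guaranteed case looks like in a container spec; the figures are placeholders.

```yaml
# Fragment of a container spec: requests equal to limits gives the Guaranteed QoS class
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"                  # same as the request
    memory: "512Mi"              # same as the request
```

The Pod is only classed as Guaranteed if every container in it sets CPU and memory this way. The assigned class is visible in the qosClass field of the Pod status.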
Q10: What is a Kubernetes Operator and when should you build one?
A Kubernetes Operator is a controller that automates the lifecycle management of a complex, stateful application using the Kubernetes API and custom resources defined through CustomResourceDefinitions (CRDs). It encodes the operational knowledge (how to deploy, scale, back up, upgrade, and recover) that a human operator would otherwise need to apply manually. Build an Operator when: the application has complex day-2 operations (automated failover, online schema migrations, rolling upgrades with specific ordering), you need a Kubernetes-native API for the application, or you are distributing the application to many clusters. Well-known Operators include the Prometheus Operator (manages Prometheus, Alertmanager, and ServiceMonitors via CRDs), cert-manager (manages TLS certificate issuance and renewal), and the Strimzi Kafka Operator. For most in-house applications, Helm charts with Deployments and StatefulSets are sufficient. Operators are warranted for genuinely complex stateful systems.
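To make the custom resource idea concrete, here is a sketch of a ServiceMonitor, one of the CRD-backed kinds the Prometheus Operator reconciles. It assumes the Prometheus Operator CRDs are installed; the names, the release label, and the port name metrics are placeholders:

```yaml
# Requires the Prometheus Operator CRDs; all names and labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-metrics
  labels:
    release: prometheus   # assumption: the label your Prometheus resource selects on
spec:
  selector:
    matchLabels:
      app: web            # Services carrying this label get scraped
  endpoints:
    - port: metrics       # named port on the Service
      interval: 30s
```

You declare what should be scraped, and the Operator translates that into Prometheus scrape configuration. That declarative loop is the pattern an Operator of your own would follow.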
Q11: How do you implement canary deployments in Kubernetes?
The simplest canary approach uses two Deployments whose Pods share a common label (for example app: web) plus a distinguishing label such as track: stable or track: canary: a stable Deployment with 9 replicas and a canary Deployment with 1 replica. A single Service selects only the shared label, so it matches both and roughly 10% of traffic routes to the canary. Once validated, scale up the canary and scale down the stable version. For more precise traffic splitting (by percentage, by header value, by user cohort), use an Ingress controller with canary annotations (NGINX Ingress supports canary-weight annotations), or a service mesh like Istio (VirtualService with traffic weights). Argo Rollouts is a dedicated progressive delivery controller that implements canary and blue-green deployments with automated analysis. It integrates with Prometheus metrics to auto-promote or auto-rollback based on error rate SLOs.
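A minimal sketch of the replica-ratio canary, assuming a shared label app: web and a hypothetical track label to keep the two Deployment selectors from overlapping. Image tags and names are placeholders:

```yaml
# Hypothetical names, labels, and images throughout.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: web, track: stable}
  template:
    metadata:
      labels: {app: web, track: stable}
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0   # current version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-canary
spec:
  replicas: 1
  selector:
    matchLabels: {app: web, track: canary}
  template:
    metadata:
      labels: {app: web, track: canary}
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.1   # candidate version
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # no track label, so both Deployments receive traffic
  ports:
    - port: 80
      targetPort: 8080
```

The split is only approximate because it depends on connection-level load balancing across 10 Pods, which is why weighted routing at the Ingress or mesh layer is preferred when exact percentages matter.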
Q12: What is the difference between ConfigMap and Secret? When should you use each?
ConfigMaps store non-sensitive configuration data as plaintext key-value pairs: environment-specific config files, feature flags, application settings. Secrets store sensitive data and are base64-encoded in etcd (not encrypted by default). Use ConfigMaps for: database hostnames, log levels, feature toggle values, nginx configuration files. Use Secrets for: passwords, API keys, TLS certificates, SSH keys, OAuth tokens. Both can be consumed as environment variables or mounted as files in the container. Important: base64 is encoding, not encryption. For genuine secret security in production, use Vault Agent Injector, External Secrets Operator, or enable etcd encryption at rest. Never commit Secret YAML files to Git (even base64 encoded). Use Sealed Secrets or External Secrets Operator for GitOps workflows.
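A sketch of both objects and how a Pod consumes them. All names, keys, and values are made up for illustration, and the Secret is shown inline only to explain the shape (in a real repository it would come from Sealed Secrets or External Secrets Operator instead):

```yaml
# Hypothetical names and values; do not commit a real Secret like this to Git.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  DB_HOST: db.internal.example.com
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:                 # plain text here; the API server stores it base64-encoded
  DB_PASSWORD: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.27     # placeholder image
      envFrom:
        - configMapRef:
            name: app-config        # every key becomes an environment variable
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secret
              key: DB_PASSWORD
```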
Q13: How does the Kubernetes Ingress controller work?
An Ingress controller is a reverse proxy (NGINX, HAProxy, Traefik, or a cloud-native LB) that typically runs inside the cluster as a Deployment and watches for Ingress resource changes. When you create or update an Ingress resource, the controller reads the routing rules and reconfigures its proxy configuration accordingly, without restarts or downtime. An Ingress object itself is just a routing rule definition (match this host and path, send to this Service). It does nothing without a controller to implement it. On AWS EKS, the AWS Load Balancer Controller creates an Application Load Balancer per Ingress by default, with path-based routing rules matching the Ingress spec. Several Ingresses can share one ALB when they are placed in the same IngressGroup via the alb.ingress.kubernetes.io/group.name annotation. The Ingress controller pattern lets you use a single cloud load balancer for all HTTP/HTTPS Services, with SSL termination, rate limiting, authentication, and canary routing handled at the Ingress layer.
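For reference, a sketch of the kind of routing rule an Ingress controller consumes. It assumes an NGINX Ingress controller registered under the ingressClassName nginx; the hostname, Service names, and TLS Secret name are placeholders:

```yaml
# Hypothetical host, Services, and TLS Secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx          # assumption: which controller implements this Ingress
  tls:
    - hosts:
        - app.example.com
      secretName: web-tls          # TLS is terminated at the Ingress layer
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 80
```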
Q14: What are taints and tolerations? Give a real production use case.
Taints are applied to nodes to repel Pods that do not explicitly tolerate them. Tolerations are applied to Pods to allow scheduling on tainted nodes. Together, they implement node specialisation: only designated Pods may run on certain nodes. Real production use case: GPU nodes for ML workloads. GPU nodes are expensive. Taint them with nvidia.com/gpu=present:NoSchedule. Only ML inference Pods with a matching toleration can schedule on GPU nodes, and all other Pods are automatically repelled without needing nodeSelectors. A toleration only permits placement, it does not attract the Pod, so the ML Pods themselves still need a nodeSelector, node affinity, or a GPU resource request to land on those nodes. Another use case: dedicated nodes for your ingress controller (taint with ingress=true:NoSchedule, add the toleration only to ingress Pods) so ingress processing doesn't share CPU with application workloads. Taints are also used for spot/preemptible nodes: taint them with spot=true:NoSchedule so only fault-tolerant batch workloads run on them.
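A sketch of the Pod side of the GPU example. It assumes the node was tainted beforehand with kubectl taint nodes <node-name> nvidia.com/gpu=present:NoSchedule, that the NVIDIA device plugin exposes the nvidia.com/gpu resource, and that GPU nodes carry a hypothetical accelerator: nvidia label:

```yaml
# Hypothetical Pod name, image, and node label.
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: present
      effect: NoSchedule        # permits scheduling onto the tainted GPU nodes
  nodeSelector:
    accelerator: nvidia         # assumption: label on GPU nodes; steers the Pod there
  containers:
    - name: model
      image: registry.example.com/inference:1.0
      resources:
        limits:
          nvidia.com/gpu: 1     # assumption: resource exposed by the NVIDIA device plugin
```

The toleration and the nodeSelector do different jobs: the first lets the Pod past the taint, the second makes sure it does not land on an ordinary node.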
Q15: How do you debug a situation where a Service is not routing traffic to Pods?
Systematic approach: First, check kubectl get endpoints <service-name>. If the endpoints list is empty, the Service selector does not match any Pod labels, or all matching Pods have failing readiness probes. Second, compare the Service's selector (kubectl describe svc <name>) with the actual Pod labels (kubectl get pods --show-labels). A single typo or missing label breaks routing. Also confirm that the Service's targetPort matches the port the container actually listens on. Third, if endpoints exist but traffic still fails, the issue is at the kube-proxy / iptables level or the application itself. Test by curling directly to the Pod IP (bypassing the Service): kubectl exec <pod> -- curl http://<pod-ip>:8080. If that works, the app is fine and the problem is in Service routing. kubectl port-forward svc/<name> <local-port>:<service-port> is a quick check of the selector and port mapping from your machine, but it tunnels straight to one backing Pod and bypasses kube-proxy. To exercise the Service IP itself, curl the Service's DNS name or ClusterIP from another Pod inside the cluster.
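The checks above come down to three fields lining up. As a sketch (names, image, port 8080, and the /healthz path are assumptions for illustration), these are the pieces to compare when endpoints come back empty:

```yaml
# Hypothetical Service and Pod; compare the commented fields on a real cluster.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web              # (1) must equal the Pod's labels exactly
  ports:
    - port: 80
      targetPort: 8080    # (2) must be the port the container listens on
---
apiVersion: v1
kind: Pod
metadata:
  name: web-1
  labels:
    app: web              # (1) matched by the Service selector
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0
      ports:
        - containerPort: 8080     # (2) matches targetPort
      readinessProbe:             # (3) a failing probe removes the Pod from endpoints
        httpGet:
          path: /healthz
          port: 8080
```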

Written by Dhanush R
Senior DevOps Engineer · 4.5+ Years · Bengaluru · AWS EKS · Kubernetes · Terraform

DevOps engineer with 4.5+ years of hands-on production experience in Kubernetes, AWS, CI/CD, Terraform, and SRE. Every guide on this platform comes from real incident response, cluster management, and interview experience, not from copying documentation. Last updated: May 2026.

πŸ“Έ Instagram ▢️ YouTube πŸ’Ό LinkedIn About β†’
πŸŒ™