Docker Complete Guide 2026: Images, Multi-Stage Builds, Networking & Security
- What is Docker? Containers vs Virtual Machines
- Image Layers, Caching, and How Docker Builds Work
- Dockerfile Instructions Deep Dive
- Multi-Stage Builds: The Production Standard
- Docker Networking Modes Explained
- Volumes and Persistent Data Management
- Container Security Hardening Checklist
- Docker Compose for Local Development
- Working with Container Registries
- Debugging Running Containers
- Essential Docker Command Reference
- 12 Docker Interview Questions with Expert Answers
Docker is the foundation of modern DevOps. Most CI/CD pipelines build container images, and every Kubernetes workload runs in containers. I have built, secured, and debugged Docker images daily in production for 4.5 years: managing multi-stage Dockerfiles for Java, Go, Node.js, and Python services, enforcing container security policies, and investigating production incidents caused by Dockerfile anti-patterns. This guide covers what you need to use Docker correctly and to answer Docker interview questions confidently.
What is Docker? Containers vs Virtual Machines
Docker packages an application and all its dependencies (libraries, runtime, configuration) into a portable image that runs as a self-contained unit called a container. Containers solve the "works on my machine" problem: the exact same image runs identically on a developer's laptop, a GitHub Actions CI runner, and a production Kubernetes node.
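Under the hood, Docker is not magic. It builds on two Linux kernel features that predate Docker itself: namespaces, which first appeared in 2002, and cgroups, which were merged into the mainline kernel in 2008. Namespaces provide isolation: each container gets its own process tree (PID namespace), network stack (network namespace), filesystem view (mount namespace), and hostname (UTS namespace). cgroups (control groups) provide resource limits: the kernel enforces the CPU and memory limits you configure, killing processes that exceed their memory limit (OOMKill). Docker is a user-friendly management layer on top of these kernel primitives. Unlike a virtual machine, which boots its own guest kernel on virtualized hardware, a container is an ordinary set of host processes sharing the host kernel, which is why containers start faster and carry less overhead.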
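To make these primitives a little more concrete, here is a minimal shell sketch. It assumes a Linux host with a /proc filesystem; `list_namespaces` and `run_limited` are hypothetical helper names, not Docker commands, and the limit values are arbitrary.

```shell
# Print the namespace types the current process belongs to (Linux only).
list_namespaces() {
  if [ -d /proc/self/ns ]; then
    ls /proc/self/ns
  else
    echo "no /proc/self/ns on this host (not Linux?)"
  fi
}

# Hypothetical helper: start a throwaway container whose memory and CPU
# limits are enforced by the kernel through cgroups.
# Usage: run_limited IMAGE [COMMAND...]
run_limited() {
  docker run --rm --memory=256m --cpus=0.5 "$@"
}

list_namespaces
```

Running `readlink /proc/self/ns/pid` on the host and again inside a container started with `run_limited` should show two different namespace IDs, which is the isolation described above.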
Image Layers, Caching, and How Docker Builds Work
A Docker image is a stack of read-only layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer on top of the base image named in FROM; other instructions only record metadata. When you run a container, Docker adds a thin read-write layer on top of the image layers, and all writes inside the container go to this writable layer. When the container is deleted, this layer is deleted with it. The read-only image layers are shared between all containers using the same image, which is why Docker is storage-efficient even when running dozens of containers.
Layer caching is what makes Docker builds fast. If an instruction and all its inputs are identical to a previous build, Docker reuses the cached layer and skips re-running the instruction. Cache is invalidated when: the instruction text changes, any input file changes (for COPY/ADD), or any preceding layer's cache is invalidated. This has a critical implication: the order of instructions in your Dockerfile directly determines how effective layer caching is.
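The ordering rule can be sketched with a small Dockerfile, written here from a shell heredoc into a scratch directory so the snippet runs without a Docker daemon. The Node.js project layout (`package.json`, `package-lock.json`, `server.js`) is an assumption for illustration.

```shell
# Write an example Dockerfile into a scratch directory (no daemon needed).
cache_demo="$(mktemp -d)"
cat > "$cache_demo/Dockerfile" <<'EOF'
FROM node:20.11-alpine3.19
WORKDIR /app

# 1. Dependency manifests change rarely, so copy them first.
COPY package.json package-lock.json ./

# 2. This expensive step is reused from cache until a manifest changes.
RUN npm ci

# 3. Source code changes often; only layers from here down are rebuilt.
COPY . .

CMD ["node", "server.js"]
EOF
echo "wrote $cache_demo/Dockerfile"
```

If `COPY . .` came before `RUN npm ci`, every source edit would invalidate the install layer and force a full dependency install on each build. To try the file, copy it into a project that has the assumed manifests and run `docker build` there.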
Dockerfile Instructions Deep Dive
Every Dockerfile instruction has specific semantics that affect build performance, image size, and container runtime behaviour. Here are the ones that come up most in production and interviews:
- FROM: The base image. Always pin to a specific digest or version tag in production (`node:20.11-alpine3.19`, not `node:latest`). `latest` is a moving target that breaks builds when the upstream image changes. Use `alpine` variants for a smaller attack surface and image size: a `node:20-alpine` image is roughly 180MB versus roughly 1.1GB for `node:20`.
- RUN: Executes a command and creates a new layer. Chain related commands with `&&` and clean up in the same RUN instruction to avoid persisting temporary files in a layer: `RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*`. Each RUN creates exactly one layer regardless of how many commands are chained.
- COPY vs ADD: Use COPY for copying files from the build context. ADD has additional magic (automatic tar extraction, URL downloads) that makes it unpredictable. Only use ADD when you specifically need its extra features. COPY is explicit and predictable.
- CMD vs ENTRYPOINT: ENTRYPOINT defines the executable that always runs. CMD provides default arguments that can be overridden at runtime. Always use exec form (`["node", "server.js"]`), not shell form (`node server.js`). Shell form wraps your process in `/bin/sh -c`, making your app a child of sh. When Kubernetes sends SIGTERM for graceful shutdown, sh receives it as PID 1 and does not forward it, so your application never receives the signal.
- USER: Sets the user for all subsequent RUN, CMD, and ENTRYPOINT instructions. Create a non-root user and switch to it before the final CMD. Running as root in a container is a critical security risk.
- WORKDIR: Sets the working directory for subsequent instructions. Always set it explicitly; never rely on implicit working directories.
- ARG vs ENV: ARG variables are available only during the build and are not set in the running container. ENV variables are available at build time and at runtime inside the container. Never put secrets in either: both can surface in `docker history`, and ENV values are also visible in `docker inspect`.
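The instructions above can be combined into one small Dockerfile. This is a sketch under assumptions, not a canonical template: the Node.js app layout, the `APP_VERSION` build argument, and the file names are hypothetical, and the snippet only writes the file to a scratch directory.

```shell
# Write an illustrative Dockerfile that uses the instructions discussed above.
instr_demo="$(mktemp -d)"
cat > "$instr_demo/Dockerfile" <<'EOF'
# Pinned version tag instead of node:latest.
FROM node:20.11-alpine3.19

# ARG exists only at build time; ENV is also set in the running container.
ARG APP_VERSION=dev
ENV NODE_ENV=production APP_VERSION=${APP_VERSION}

# Explicit working directory.
WORKDIR /app

# Install and clean up inside a single RUN so no cache files persist in a layer.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev && npm cache clean --force
COPY . .

# Drop root: the official node images ship a non-root "node" user.
USER node

# Exec form: node runs as PID 1 and receives SIGTERM directly.
ENTRYPOINT ["node"]
CMD ["server.js"]
EOF
echo "wrote $instr_demo/Dockerfile"
```

With this split, `docker run myimage other.js` replaces only the CMD argument while `node` stays the executable.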
Create a `.dockerignore` file in your project root. Without it, `COPY . .` sends your entire project directory as the build context, including `node_modules`, `.git`, `*.log` files, and environment files containing secrets. A proper `.dockerignore` file is as important as `.gitignore`.
Multi-Stage Builds: The Production Standard
Multi-stage builds are the single most important Dockerfile pattern for production images. They allow you to use a large, build-tool-heavy image to compile and test your application, then produce a minimal runtime image containing only the compiled output and its runtime dependencies. The final image contains zero build tools, test frameworks, or source code, dramatically reducing image size and attack surface.
In production, I have seen multi-stage builds reduce Java Spring Boot images from 800MB to 120MB and Go images from 1.2GB to 12MB. Smaller images mean faster pulls, faster deployments, less storage cost, and fewer CVEs to patch.
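A minimal sketch of the pattern for a Go service follows, together with the `.dockerignore` mentioned earlier. The module layout (`./cmd/app`), the stage name `build`, and the ignore entries are assumptions for illustration; the snippet writes both files to a scratch directory and does not invoke Docker.

```shell
# Write a two-stage Dockerfile and a .dockerignore into a scratch directory.
multistage_demo="$(mktemp -d)"
cat > "$multistage_demo/Dockerfile" <<'EOF'
# Stage 1: full Go toolchain, used only to compile.
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary so it can run on a base image without libc.
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: minimal runtime image; no compiler, shell, or source code.
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
EOF

cat > "$multistage_demo/.dockerignore" <<'EOF'
.git
.env
*.log
bin/
EOF
echo "wrote $multistage_demo/Dockerfile and .dockerignore"
```

Only the last stage ends up in the tagged image; everything in the `build` stage is discarded unless explicitly copied with `COPY --from=build`.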
Docker Networking Modes Explained
Docker provides several networking modes, each with different isolation characteristics and use cases. Understanding these is essential for both production container design and security interviews.
- bridge (default): Docker creates a virtual network bridge (`docker0`) on the host, and each container gets a unique IP on this bridge network. External traffic reaches containers via port mapping (`-p 8080:80` maps host port 8080 to container port 80). Every container you run without specifying a network joins the default bridge, where containers can reach each other only by IP. On user-defined bridge networks, Docker's embedded DNS also lets containers reach each other by container name. Docker Compose automatically creates a dedicated bridge network per Compose project, so services can reach each other by service name.
- host: The container shares the host machine's network stack entirely, with no network isolation. The container uses the host's IP and ports directly. Use only for monitoring agents or performance-critical network tools where even bridge overhead is unacceptable. Avoid it for application containers: a vulnerability in the app could directly expose the host network.
- none: No networking. The container has a loopback interface only. Use for offline batch processing jobs that read local files and produce local output with no network requirement.
- overlay: Multi-host networking for Docker Swarm. Creates a VXLAN tunnel between Docker hosts so containers on different machines can communicate; encryption of that traffic is optional (`--opt encrypted`). Kubernetes uses its own CNI-based network model and does not use Docker overlay networks.
- macvlan: Assigns a real MAC address to each container, making it appear as a physical device on the network. Use when containers must be reachable on the physical network with their own IPs, for example legacy applications that bind to specific IPs.
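As a rough illustration of bridge networking, the helpers below create a user-defined bridge, attach two containers, and resolve one from the other by name. The network name `appnet`, the container names, the image tags, and the password value are all hypothetical; the functions are only defined here and need a running Docker daemon when called.

```shell
# Hypothetical helper: create a user-defined bridge and attach two containers.
network_demo() {
  docker network create appnet
  # No -p flag: the database is reachable only from containers on appnet.
  docker run -d --name db --network appnet -e POSTGRES_PASSWORD=example postgres:16
  # -p 8080:80 publishes container port 80 on host port 8080.
  docker run -d --name web --network appnet -p 8080:80 nginx:1.25-alpine
  # Embedded DNS on the user-defined bridge resolves "db" by container name.
  docker run --rm --network appnet alpine:3.19 ping -c 1 db
}

# Hypothetical helper: remove everything network_demo created.
network_cleanup() {
  docker rm -f web db
  docker network rm appnet
}
```

Call `network_demo` to try it and `network_cleanup` afterwards. The same `ping -c 1 db` from a container on the default bridge would be expected to fail name resolution, since the default bridge has no embedded DNS for container names.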
Volumes and Persistent Data Management
Containers are ephemeral by design. All data written inside a container's writable layer is lost when the container is removed. For data that must survive container restarts and removals, Docker provides three storage mechanisms:
- Named Volumes: Managed by Docker and stored in Docker's dedicated storage area (`/var/lib/docker/volumes/`). Referenced by name: `docker run -v pgdata:/var/lib/postgresql/data postgres`. Named volumes persist across container stop/start and container removal. They can be shared between multiple containers. This is the preferred approach for production persistent data.
- Bind Mounts: Map a specific host path into the container: `docker run -v /host/path:/container/path myapp`. Ideal for development (mount your source code directory into the container for live reloading). Problematic in production: the container depends on a specific host path, making it non-portable and risking host filesystem access if the container is compromised.
- tmpfs Mounts: In-memory storage that never touches disk: `docker run --tmpfs /tmp myapp`. Data is cleared when the container stops. Use for temporary files, session data, or any sensitive data (secrets, tokens) that should not be persisted to disk.
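The three mechanisms can be compared side by side with a few helper functions. This is a sketch: the helper names, the `pg` container name, the image tags, and the password are assumptions, and the functions only run Docker when you call them.

```shell
# Named volume: Docker manages the storage; data survives container removal.
run_with_named_volume() {
  docker volume create pgdata
  docker run -d --name pg -e POSTGRES_PASSWORD=example \
    -v pgdata:/var/lib/postgresql/data postgres:16
}

# Bind mount: the current host directory appears at /app inside the container.
run_with_bind_mount() {
  docker run --rm -v "$PWD":/app -w /app node:20.11-alpine3.19 node server.js
}

# tmpfs mount: /tmp lives in memory and disappears when the container stops.
run_with_tmpfs() {
  docker run --rm --tmpfs /tmp alpine:3.19 \
    sh -c 'echo scratch > /tmp/note && cat /tmp/note'
}
```

After `run_with_named_volume`, `docker volume inspect pgdata` shows where Docker keeps the data on the host.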
If you run `docker-compose down -v` (the `-v` flag removes volumes), you lose all data. For production, databases should not run as standalone Docker containers: use a managed service (AWS RDS, Cloud SQL) or a Kubernetes StatefulSet with proper backup automation.
Container Security Hardening Checklist
Container security is tested in every Senior DevOps and platform engineering interview. I use this checklist during production container security reviews:
- Never run as root. Add `USER 1001` (or a named non-root user) to your Dockerfile before the final CMD/ENTRYPOINT. Running as root inside a container means a container escape gives the attacker root on the host. Most application processes have no legitimate need for root access.
- Use minimal base images. Alpine-based images, distroless images (Google's gcr.io/distroless), or scratch (for statically compiled Go binaries) dramatically reduce the number of installed packages and therefore the number of CVEs. Every package you don't install is a vulnerability you don't have.
- Pin image versions. Use `FROM node:20.11.0-alpine3.19` or, even better, image digests (`FROM node@sha256:abc123...`). Never use `latest`: it changes without notice and can silently introduce breaking changes or vulnerabilities.
- Scan images for CVEs. Integrate `docker scout`, Trivy, Grype, or Snyk into your CI/CD pipeline. Block deployments when critical CVEs are found. Scan both base images and application dependencies.
- Never embed secrets in images. Secrets passed as build ARGs are stored in the image layer history and are visible to anyone with `docker history`. Pass secrets at runtime via environment variables (from Kubernetes Secrets or Vault) or use Docker BuildKit's secret mounts (`RUN --mount=type=secret`), which are never persisted in layers.
- Enable read-only filesystem. Run containers with `--read-only` and explicit `--tmpfs` for directories that need write access. This prevents malware from modifying the container filesystem if the application is compromised.
- Drop capabilities. Kubernetes and Docker both support dropping Linux capabilities. Drop all capabilities and add back only what the application specifically needs: `--cap-drop=ALL --cap-add=NET_BIND_SERVICE`.
- Set resource limits. Always set CPU and memory limits (`docker run --memory=512m --cpus=0.5`). Without limits, a single runaway container can exhaust all host resources and take down every other container on the machine.
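Several checklist items map directly onto `docker run` flags, so they can be bundled into one wrapper. The sketch below assumes Trivy is installed for the scanning step; `run_hardened` and `scan_image` are hypothetical helper names, and the UID and limit values simply reuse the ones from the checklist.

```shell
# Hypothetical helper: run an image with the runtime hardening flags above.
# Usage: run_hardened IMAGE [COMMAND...]
run_hardened() {
  docker run -d \
    --user 1001 \
    --read-only --tmpfs /tmp \
    --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
    --memory=512m --cpus=0.5 \
    "$@"
}

# Hypothetical helper: exit non-zero when Trivy reports critical CVEs,
# which is enough to fail a CI job.
# Usage: scan_image IMAGE
scan_image() {
  trivy image --severity CRITICAL --exit-code 1 "$1"
}
```

In a pipeline, something like `scan_image myimage:1.0 && run_hardened myimage:1.0` would block the run step when the scan fails. Drop `--cap-add=NET_BIND_SERVICE` if the application does not bind to ports below 1024.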
Docker Compose for Local Development
Docker Compose defines and runs multi-container applications from a single YAML file. It is the standard tool for local development environments where you need your application, a database, a cache, and perhaps a message broker all running together with a single command.
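As an illustration, the snippet below writes a small Compose file for an application, a PostgreSQL database, and a Redis cache. The service names, credentials, ports, and the `DATABASE_URL` variable are placeholders chosen for this sketch, and the file is written to a scratch directory rather than a real project.

```shell
# Write an example Compose file into a scratch directory.
compose_demo="$(mktemp -d)"
cat > "$compose_demo/compose.yaml" <<'EOF'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      # "db" resolves through the project's dedicated bridge network.
      DATABASE_URL: postgres://app:example@db:5432/app
    depends_on:
      - db
      - cache
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: example
      POSTGRES_DB: app
    volumes:
      # Named volume so data survives "down" (but not "down -v").
      - pgdata:/var/lib/postgresql/data
  cache:
    image: redis:7-alpine
volumes:
  pgdata:
EOF
echo "wrote $compose_demo/compose.yaml"
```

Placed next to a Dockerfile, a file like this starts the whole stack with `docker compose up -d --build` (or `docker-compose up -d --build` with the older standalone binary).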
Working with Container Registries
A container registry stores and distributes Docker images. Docker Hub is the public default. For production, use a private registry: AWS ECR (Elastic Container Registry), Google Artifact Registry, Azure Container Registry, or self-hosted Harbor. Private registries give you access control, vulnerability scanning, and keep proprietary images private.
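Pushing to a private registry comes down to tagging the image with the registry's hostname and pushing that tag. The helper below is a sketch; `push_image` is a hypothetical name and `registry.example.com/team/myapp` is a placeholder repository.

```shell
# Hypothetical helper: tag a local image for a registry and push it.
# Usage: push_image LOCAL_IMAGE REMOTE_REPO TAG
push_image() {
  local_image="$1"
  remote_repo="$2"
  tag="$3"
  # The registry hostname is part of the image name; that is how Docker
  # knows where to push.
  docker tag "$local_image" "$remote_repo:$tag"
  docker push "$remote_repo:$tag"
}
```

Typical use, after authenticating with `docker login registry.example.com`, would be `push_image myapp:dev registry.example.com/team/myapp 1.4.2`. Cloud registries such as ECR usually need their own credential step before the push.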
Debugging Running Containers
Effective container debugging separates senior DevOps engineers from juniors. Here are the techniques I use most in production incidents:
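The helpers below sketch a handful of commonly used debugging commands; they are illustrative and not necessarily the exact workflow referred to above. The function names are hypothetical, and each takes a container name such as `myapp` as its argument.

```shell
# Recent log output with timestamps.
debug_logs() {
  docker logs --tail 100 --timestamps "$1"
}

# Container status, exit code, and whether the kernel OOM-killed it.
debug_state() {
  docker inspect --format '{{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$1"
}

# One-shot snapshot of CPU, memory, and I/O usage.
debug_stats() {
  docker stats --no-stream "$1"
}

# Interactive shell inside the container (requires a shell in the image).
debug_shell() {
  docker exec -it "$1" sh
}

# Debug container sharing the target's PID and network namespaces,
# for images that ship without a shell.
debug_sidecar() {
  docker run --rm -it --pid="container:$1" --net="container:$1" busybox sh
}
```

A reasonable order during an incident is `debug_state myapp` first (to see whether the container exited or was OOM-killed), then `debug_logs myapp`, and only then an interactive session.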
Essential Docker Command Reference
- `docker build --no-cache -t name:tag .`: Build image forcing full rebuild (skip cache)
- `docker images`: List all local images with size
- `docker image prune -a`: Remove all unused images (reclaim disk space)
- `docker ps -a`: List all containers including stopped ones
- `docker run -d --name myapp -p 8080:8080 --restart=unless-stopped myimage`: Run detached with restart policy
- `docker stop container && docker rm container`: Stop then remove a container
- `docker-compose up -d --build`: Rebuild and start all Compose services in background
- `docker-compose down -v`: Stop and remove containers AND volumes (data loss risk!)
- `docker system prune -af --volumes`: Nuclear clean: remove all unused images, containers, networks, volumes
- `docker history image:tag`: See all layers with their sizes (spot oversized layers)
- `docker save image:tag | gzip > image.tar.gz`: Export image to a file for air-gapped transfer
- `docker load < image.tar.gz`: Import a previously exported image
12 Docker Interview Questions with Expert Answers
[...] `docker run` time. Together, ENTRYPOINT + CMD = the full command. Exec form (`["node", "server.js"]`) runs the process directly as PID 1. Shell form (`node server.js`) runs as `/bin/sh -c "node server.js"`, making your app a child process of sh. The critical runtime difference: when Docker or Kubernetes sends SIGTERM for graceful shutdown, shell form containers have sh as PID 1. sh doesn't forward signals to child processes by default, so your app never receives SIGTERM and eventually gets SIGKILL after the grace period. This causes ungraceful shutdowns, dropped requests, and slow rolling deployments. Always use exec form for production containers.

[...] `--restart=unless-stopped`, Docker restarts the container, which will OOMKill again if the memory limit is still too low, creating a restart loop. In Kubernetes, this shows as OOMKilled in `kubectl describe pod` and CrashLoopBackOff if it keeps happening. Root cause is almost always: memory limit set too low for actual usage, a memory leak in the application, or a Java application not respecting container memory limits (set `-XX:MaxRAMPercentage=75` for JVM containers to leave headroom for non-heap memory).

[...] `RUN --mount=type=cache,target=/root/.m2 mvn package`) to persist Maven/npm caches between pipeline runs even when source changes.

[...] `kubectl debug`). For Python and Node.js apps with many dynamic dependencies, distroless is harder to use correctly; Alpine is a good alternative.

[...] `docker run -e DB_PASSWORD=$SECRET` or Kubernetes Secrets mounted as env vars. The secret is never in the image. (2) BuildKit secret mounts for build-time secrets: `RUN --mount=type=secret,id=npmrc,dst=/root/.npmrc npm install` mounts the secret only during that RUN instruction and never persists it in the image layer. (3) Volume-mounted secret files: mount secrets from Kubernetes Secrets or Vault Agent as files at a path inside the container at runtime. Never use ENV or ARG for secrets: both are visible in `docker history` and `docker inspect`. Never commit .env files containing real secrets to Git.

[...] `/var/lib/docker/volumes/`), its lifecycle, and provides commands to inspect, backup, and remove it. Volumes are portable (work the same on any Docker host), can use volume drivers for cloud storage backends, and are the recommended approach for production persistent data. A bind mount maps a specific, absolute path on the host filesystem directly into the container. The container can read and write files on the host. Bind mounts are ideal for development (mount your source code for live reload) but problematic in production: they create a tight dependency on host filesystem layout, don't work well in container orchestration, and give the container potential access to sensitive host directories.

[...] `http://postgres:5432`, Docker's embedded DNS resolves "postgres" to the postgres container's IP on the bridge network, with no port mapping or external DNS required. This works only on user-defined bridges, not the default bridge (where containers can only reach each other by IP). On the default bridge, you must use `--link` (deprecated) or explicitly use IPs. For production multi-container applications, always define an explicit network in Docker Compose or your orchestration layer.

[...] `DOCKER_BUILDKIT=1 docker build .` or `docker buildx build .`. In CI/CD, use `docker buildx build --cache-from type=registry,ref=registry/image:cache --cache-to type=registry,ref=registry/image:cache,mode=max .` to share build cache across pipeline runs.

[...] `docker run -it --pid=container:myapp --net=container:myapp busybox sh`, attaching a debug container to the same PID and network namespaces as the target container. In Kubernetes, use `kubectl debug -it pod/myapp --image=busybox --target=myapp`. (2) Override the entrypoint at run time: `docker run --entrypoint sh myimage` only works if the image contains a shell. (3) Use nsenter on the host: `docker inspect myapp | grep Pid` gives the host PID, then `nsenter -t PID -m -u -i -n -p sh` enters all namespaces. Method 1 is the most practical for production debugging without modifying the running container.

[...] `curl` in a RUN instruction with checksum verification instead.

[...] `docker history image:tag` to identify which layers are largest. (2) Switch to a minimal base image: from ubuntu to debian-slim to alpine to distroless. (3) Use multi-stage builds to separate build from runtime; this is the most impactful change. (4) Chain RUN commands and clean up in the same layer: `RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*`. The cleanup must be in the same RUN instruction, otherwise the deleted files still exist in the previous layer. (5) Use `--no-install-recommends` with apt-get. (6) Delete downloaded archives, test files, documentation, and examples in the same RUN instruction that installs them. (7) Use Dive (a CLI tool) to inspect every layer and find hidden large files.