Linux Commands for DevOps Engineers: Essential Guide with Real Examples
📅 Updated April 2026 · ⏱ 10 min read · 🏷 Linux · DevOps · SRE · Shell
master.devops
Practising DevOps Engineer with deep hands-on experience in Kubernetes, AWS, CI/CD, and SRE. Every guide is written from real production work.
Linux is the operating system that runs every server, container, and Kubernetes node in production.
Every DevOps engineer spends hours in a terminal every week. In my work at a large enterprise, I use Linux daily
for debugging production issues, managing servers, writing automation scripts, and investigating
performance problems. This guide covers the commands and concepts that appear most often in
DevOps interviews and real on-call situations.
File System Navigation and Permissions
# Navigate and explore
pwd # print working directory
ls -lah # long listing with hidden files and human-readable sizes
find /var/log -name "*.log" -mtime -1 # find logs modified in last 24h
find / -perm /4000 2>/dev/null # find setuid files (security audit)
du -sh /var/log/* # disk usage per entry under /var/log
df -hT # disk free with filesystem type
File Permissions — chmod, chown, umask
Linux permissions use a three-group model: owner, group,
others. Each group has three bits: read (4), write (2), execute (1).
chmod 755 means owner=7(rwx), group=5(r-x), others=5(r-x).
# Octal notation
chmod 755 script.sh # rwxr-xr-x
chmod 600 ~/.ssh/id_rsa # rw------- (SSH key must be 600)
chmod 644 /etc/nginx/nginx.conf
# Symbolic notation
chmod u+x script.sh # add execute for owner
chmod g-w file.txt # remove write from group
chmod o=r file.txt # set others to read-only
chmod -R 755 /var/www/html # recursive
# Change ownership
chown appuser:appgroup app.jar
chown -R nginx:nginx /var/www
# umask — default permission mask
umask 022 # files created as 644, dirs as 755
umask 027 # more restrictive: files 640, dirs 750
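A quick way to convince yourself of the arithmetic: create files and directories under each mask in a scratch directory and inspect the resulting modes (stat -c is the GNU coreutils syntax).

```shell
# Files start from 666 and directories from 777; the umask bits are masked off
dir=$(mktemp -d) && cd "$dir"
umask 022
touch f022 && mkdir d022   # 666 & ~022 = 644, 777 & ~022 = 755
umask 027
touch f027 && mkdir d027   # 666 & ~027 = 640 (files have no x bit), 777 & ~027 = 750
stat -c '%a %n' f022 d022 f027 d027
# → 644 f022 / 755 d022 / 640 f027 / 750 d027
```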
Process Management
# View processes
ps aux # all processes with CPU and memory
ps aux | grep java # filter for Java processes
top # live viewer (press 1 for per-CPU, M for memory sort)
htop # interactive top (install separately)
# Kill processes
kill -15 PID # SIGTERM — graceful shutdown (try this first)
kill -9 PID # SIGKILL — force kill (last resort)
kill -9 $(lsof -t -i:8080) # kill process on port 8080
pkill -f "java.*api" # kill by process name pattern
# Process priority
nice -n 10 ./heavy-script.sh # start with lower priority
renice -n 5 -p PID # change running process priority
# Background jobs
nohup ./long-script.sh & # run detached from terminal
./script.sh > output.log 2>&1 & # redirect stdout+stderr, background
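The SIGTERM-then-SIGKILL order matters because well-behaved processes install cleanup handlers for SIGTERM, while nothing can catch SIGKILL. A minimal sketch of such a handler in Bash:

```shell
# A worker that traps SIGTERM (kill -15) for a graceful exit.
# kill -9 would bypass the trap entirely — SIGKILL cannot be caught.
worker() {
  trap 'echo "caught SIGTERM, flushing state"; exit 0' TERM
  while true; do sleep 0.2; done
}
worker &
WPID=$!
sleep 0.5
kill -15 "$WPID"   # graceful: the trap runs before the process exits
wait "$WPID"
echo "worker exit code: $?"   # → 0, because the trap called exit 0
```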
Systemd — Managing Services
Systemd is the init system on most modern Linux distributions. In DevOps work, you use systemd to
manage long-running services, investigate service failures, and view structured logs.
# Service management
systemctl start nginx
systemctl stop nginx
systemctl restart nginx
systemctl reload nginx # reload config without restart (if supported)
systemctl status nginx # detailed status with recent log lines
systemctl enable nginx # start on boot
systemctl disable nginx
# View logs with journalctl
journalctl -u nginx # all logs for nginx service
journalctl -u nginx -f # follow (tail -f equivalent)
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx -p err # errors only
journalctl --disk-usage # how much disk logs use
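A unit file is what ties these commands together. A minimal sketch — the service name, user, and paths here are hypothetical, so adjust them for your application — saved as /etc/systemd/system/myapp.service:

```ini
# /etc/systemd/system/myapp.service — hypothetical name and paths
[Unit]
Description=My App API
After=network-online.target
Wants=network-online.target

[Service]
User=appuser
ExecStart=/usr/bin/java -jar /opt/myapp/app.jar
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After creating or editing a unit file, run systemctl daemon-reload before systemctl start myapp.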
Networking Commands
# Ports and connections
ss -tulnp # listening ports with process names (modern netstat)
ss -tulnp | grep :8080 # which process is on port 8080?
netstat -tulnp # older equivalent (deprecated but still common)
lsof -i :8080 # processes using port 8080
# IP and routing
ip addr show # interface IPs (replaces ifconfig)
ip route show # routing table
ip link show # interface status
# DNS debugging
dig api.company.com # full DNS response
dig +short api.company.com # IP only
dig @8.8.8.8 api.company.com # query specific DNS server
nslookup api.company.com
cat /etc/resolv.conf # which DNS servers this machine uses
# HTTP testing
curl -v https://api.company.com/health
curl -H "Authorization: Bearer $TOKEN" -X POST https://api/data -d '{"key":"val"}'
wget -O- https://api.company.com/health
# Network path tracing
traceroute api.company.com
mtr api.company.com # interactive traceroute
ping -c 4 api.company.com
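One trick worth knowing for minimal container images where none of these tools are installed: bash itself can open TCP connections through the /dev/tcp pseudo-device. A sketch — the host and port are placeholders:

```shell
# Pure-bash TCP probe via /dev/tcp — no curl, nc, or ss needed
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "closed: $host:$port"
  fi
}
check_port 127.0.0.1 9   # port 9 (discard) is almost never listening
```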
Log Analysis — grep, awk, sed
These three tools are the bread and butter of log analysis and data extraction in any Linux environment. In production SRE work you will use them daily — parsing Nginx access logs, extracting error rates, transforming configuration files, and building ad-hoc monitoring scripts.
# grep — search text
grep "ERROR" /var/log/app.log # find lines containing ERROR
grep -i "error" app.log # case-insensitive
grep -r "NullPointerException" /var/log/ # recursive in a directory
grep -c "ERROR" app.log # count matching lines
grep -v "DEBUG" app.log # exclude DEBUG lines
grep -E "ERROR|WARN" app.log # regex: ERROR or WARN
grep -B 3 -A 5 "FATAL" app.log # 3 lines before and 5 after each match
# Count ERROR lines in a given time window (here 15:00-16:59) from timestamped logs
awk '/2026-04-13 1[5-6]:/ && /ERROR/' app.log | wc -l
# awk — field processing
awk '{print $1, $4, $9}' /var/log/nginx/access.log # IP, timestamp, status
awk -F: '{print $1}' /etc/passwd # colon delimiter, usernames only
awk '$9 == 500' access.log # lines where field 9 is 500 (HTTP 500s)
awk '{sum+=$10} END {print sum}' access.log # sum column 10 (bytes sent)
# Top 10 IPs hitting your server
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# sed — stream editor
sed -i 's/old-value/new-value/g' config.yaml # replace in file (in-place)
sed -n '100,200p' large.log # print lines 100-200
sed '/^#/d' config.txt # delete comment lines
Real production use — top 10 URLs returning HTTP 500 in an Nginx log:
grep " 500 " /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
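As a runnable illustration of awk's field handling, here is a sketch that computes a 5xx error rate; the here-doc sample lines stand in for a real /var/log/nginx/access.log:

```shell
# Sketch: 5xx error rate, where $9 is the status field in combined log format
awk '{ total++; if ($9 ~ /^5/) errors++ }
     END { printf "5xx rate: %.1f%% (%d of %d)\n", 100*errors/total, errors, total }' <<'EOF'
10.0.0.1 - - [13/Apr/2026:15:01:02 +0000] "GET /api HTTP/1.1" 200 512
10.0.0.2 - - [13/Apr/2026:15:01:03 +0000] "GET /api HTTP/1.1" 500 128
10.0.0.1 - - [13/Apr/2026:15:01:04 +0000] "GET /pay HTTP/1.1" 502 64
10.0.0.3 - - [13/Apr/2026:15:01:05 +0000] "GET /api HTTP/1.1" 200 512
EOF
# → 5xx rate: 50.0% (2 of 4)
```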
Performance Analysis
When a service is slow or a node is under pressure, these commands help you pinpoint the bottleneck within seconds. Kubernetes nodes are simply Linux servers — the same tools apply for debugging pod-level and node-level performance issues.
# Memory
free -h # RAM and swap summary
cat /proc/meminfo # detailed memory info
vmstat 1 5 # system stats every 1s, 5 times
# CPU
top # live view — press 1 for per-core, M to sort by memory
mpstat -P ALL 1 # per-CPU stats
grep "model name" /proc/cpuinfo | head -1
# Disk I/O
iostat -xz 1 # extended I/O stats per device (await = latency in ms)
iotop # per-process disk I/O (like top for disk)
lsblk # block devices and mount points
# Network
iftop -i eth0 # live bandwidth per connection
ss -s # socket statistics summary (connections by state)
# Load average
uptime # 1min, 5min, 15min load averages
# Load average above the number of CPU cores means the system is overloaded
Interview tip: when asked "how do you debug a slow server?", answer in layers: CPU (top/vmstat) → Memory (free) → Disk I/O (iostat) → Network (ss, iftop) → Application logs (journalctl, grep). Interviewers want to see a systematic approach.
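The load-average rule of thumb can be scripted as a quick check (a sketch; real monitoring would tune the threshold per workload):

```shell
# Compare the 1-minute load average against the core count
cores=$(nproc)
read -r one _ < /proc/loadavg   # first field is the 1-minute average
echo "1m load: $one on $cores cores"
awk -v l="$one" -v c="$cores" 'BEGIN { print (l > c ? "runnable tasks are queuing" : "headroom available") }'
```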
Interview Q&A
Q1: What does chmod 777 do and why is it dangerous?
chmod 777 gives read, write, and execute permission to owner, group, and all other users. It means any user on the system can modify or execute the file. This is dangerous because: any compromised process can modify the file, a web application running as www-data can write malicious code, and it violates the principle of least privilege. In production, files should typically be 644 (readable by all, writable by owner only) and scripts 755 (executable by all, writable by owner only). Config files with credentials should be 600 (only owner can read/write).
Q2: How do you find which process is using port 8080?
ss -tulnp | grep :8080 on modern systems. Or lsof -i :8080. Both show the PID and process name. Add sudo to see processes owned by other users. Once you have the PID, use ps -p PID -o cmd to see the full command with arguments. If you want to kill it: kill -15 $(lsof -t -i:8080) — try SIGTERM (15) first, then SIGKILL (9) only if the process does not respond.
Q3: What is a zombie process and how do you handle it?
A zombie process (shown as Z in ps output) is a process that has finished execution but whose entry in the process table has not been cleaned up — because the parent process has not called wait() to read the exit status. Zombies consume no CPU or memory, just a process table slot. They cannot be killed with kill -9 (they are already dead). The solution is to fix the parent process to properly collect child exit codes. If the parent is not fixable, killing the parent causes its zombie children to be adopted by init (PID 1), which does collect exit codes.
Shell Scripting Fundamentals
Shell scripting is how DevOps engineers automate repetitive tasks — health checks, log rotation,
deployment helpers, and on-call runbooks. A well-written Bash script saves an entire team hours per week.
#!/bin/bash
set -euo pipefail # exit on error, undefined var, pipe failure
ENV=${1:-staging} # first arg, default to staging
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
check_health() {
  local URL=$1
  local STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$URL/health")
  [[ "$STATUS" == "200" ]] && echo "OK: $URL" || echo "FAIL: $URL returned $STATUS"
}

for HOST in api.staging payment.staging; do
  check_health "https://${HOST}.internal"
done
Cron Jobs and Scheduling
Cron is the standard scheduler on Linux systems. DevOps engineers use cron for log rotation, backup scripts,
health checks, certificate renewal, and metric collection.
Common mistake: Cron runs with a minimal environment — $PATH is not your interactive shell's PATH.
Always use absolute paths (/usr/bin/python3 not python3). Check journalctl -u cron when cron jobs fail silently.
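A sketch of what a production crontab might look like following those rules — the script names and endpoint are hypothetical:

```cron
# m h dom mon dow  command — absolute paths everywhere, output captured
0 2 * * *   /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
*/5 * * * * /usr/bin/curl -fsS https://api.company.com/health > /dev/null 2>&1 || echo "health check failed" | /usr/bin/logger -t healthcheck
30 3 * * 0  /usr/sbin/logrotate -f /etc/logrotate.conf
```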
Linux Interview Questions & Answers
Q: What is the difference between a hard link and a symbolic link?
A hard link is another directory entry pointing to the same inode — both point to the same data on disk. Deleting one does not affect the other. Hard links cannot span filesystems or point to directories. A symbolic link (symlink) is a special file containing a path to another file. If the target is deleted, the symlink breaks. Symlinks can cross filesystems and point to directories. In DevOps, symlinks are commonly used to manage versioned binaries (/usr/local/bin/python → python3.11).
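A scratch-directory sketch that demonstrates both behaviours:

```shell
# Hard links survive deletion of the original name; symlinks dangle
cd "$(mktemp -d)"
echo "payload" > original.txt
ln original.txt hard.txt         # second name for the same inode
ln -s original.txt soft.txt      # a small file containing the path
ls -li original.txt hard.txt     # same inode number, link count 2
rm original.txt
cat hard.txt                     # → payload (data still referenced)
cat soft.txt 2>/dev/null || echo "soft.txt is dangling"
```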
Q: What does set -euo pipefail mean?
set -e exits immediately if any command returns non-zero. set -u treats undefined variables as errors — prevents bugs where a misspelled variable silently evaluates to empty. set -o pipefail makes a pipeline fail if any command in it fails, not just the last one. Without pipefail, false | true would succeed. Every production Bash script should start with this combination.
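You can verify the pipefail behaviour in two throwaway shells:

```shell
# Run each case in its own bash so the option does not leak into your shell
bash -c 'false | true; echo "default:  exit $?"'                   # → default:  exit 0
bash -c 'set -o pipefail; false | true; echo "pipefail: exit $?"'  # → pipefail: exit 1
```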
Q: A server is responding slowly. What is your diagnostic process?
I follow a layered approach: 1) Check CPU with top or vmstat 1 — is any process at 100%? 2) Check memory with free -h — is the server swapping? Swapping kills performance. 3) Check disk I/O with iostat -xz 1 — is await (disk latency) high? 4) Check network with ss -s — thousands of CLOSE_WAIT connections indicate connection pool exhaustion. 5) Check application logs with journalctl -u app -p err. Each layer narrows the root cause.
Master DevOps is a community of practising DevOps and SRE engineers sharing real production knowledge —
from Kubernetes internals to CI/CD pipeline design. All content is written from hands-on experience,
not copied from documentation. Our mission: make senior-level DevOps knowledge free for everyone.