
Prometheus & Grafana Complete Guide: PromQL, Alerting & SLO Monitoring

📅 Updated April 2026 · ⏱ 12 min read · 🏷 Prometheus · Grafana · Observability · SRE · Monitoring
👨‍💻 master.devops
Practising DevOps Engineer with deep hands-on experience in Kubernetes, AWS, CI/CD, and SRE. Every guide is written from real production work.

Prometheus and Grafana are the standard observability stack for Kubernetes environments. I have deployed and tuned Prometheus for monitoring production EKS clusters — writing PromQL queries for SLO burn-rate alerts, building Grafana dashboards for engineering teams, and configuring Alertmanager for on-call routing. This guide covers everything from the data model to production SLO alerting.

How Prometheus Works — The Pull Model

Prometheus uses a pull model — it scrapes metrics from targets on a schedule (every 15 seconds by default). This is fundamentally different from push-based systems like StatsD or InfluxDB, where applications push metrics to the monitoring system. The pull model means: Prometheus controls the scrape rate, a failed scrape is immediately visible (the target's up metric drops to 0, which you can alert on), and no agent needs to be installed in your application (just expose a /metrics endpoint).
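A minimal prometheus.yml sketch of the pull model (job names and target addresses are illustrative):

# prometheus.yml: Prometheus pulls every target on the global cadence
global:
  scrape_interval: 15s              # the default pull cadence
scrape_configs:
  - job_name: 'api'
    static_configs:
      - targets: ['api:8080']       # scraped at http://api:8080/metrics
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod                   # targets discovered via the Kubernetes API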

Pull vs Push — the interview answer: Pull is easier to reason about (Prometheus knows exactly what it is monitoring), makes target discovery natural (Prometheus finds your pods via Kubernetes service discovery), and avoids push storms where all services push simultaneously. Push works better for ephemeral jobs (batch jobs, cron jobs) — use the Pushgateway for these.
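For example, a nightly backup job can push its final metric to the Pushgateway, which Prometheus then scrapes like any other target (host, port and job name are illustrative):

# push one gauge sample under the job label "nightly_backup"
echo "backup_last_success_timestamp $(date +%s)" \
  | curl --data-binary @- http://pushgateway:9091/metrics/job/nightly_backup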

Prometheus Data Model

Every metric in Prometheus is a time series identified by a metric name and a set of key-value labels. Labels are what make Prometheus powerful — they allow you to slice and aggregate metrics by any dimension.

# Example: HTTP request counter with labels
http_requests_total{
  method="GET", path="/api/users", status="200",
  service="api", namespace="production"
} 1847 1713000000000

# The same metric with different label combinations
http_requests_total{method="POST", path="/api/users", status="201", ...} 234
http_requests_total{method="GET", path="/api/users", status="500", ...}  12

Four Metric Types

Type       What it measures                                              Example
Counter    Monotonically increasing; never goes down (except on restart)  http_requests_total, errors_total
Gauge      Current value; goes up or down                                memory_bytes, active_connections
Histogram  Distribution across configurable buckets                      request latency (p50/p95/p99)
Summary    Quantiles pre-calculated client-side                          GC pause time, response size

PromQL — Essential Queries

# Request rate (requests per second over last 5 minutes)
rate(http_requests_total{namespace="production"}[5m])

# Error rate (percentage of 5xx responses)
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m])

# p99 latency from histogram
histogram_quantile(0.99,
  rate(http_request_duration_seconds_bucket{
    namespace="production", service="api"
  }[5m])
)

# Top 10 memory-consuming pods
topk(10,
  container_memory_working_set_bytes{namespace="production", container!=""}
)

# CPU utilisation per pod (% of request)
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])
/ on(pod, namespace)
kube_pod_container_resource_requests{resource="cpu"}

# Pods not ready in production
kube_pod_status_ready{namespace="production", condition="true"} == 0

# SLO burn rate alert query (1-hour window, 14x burn rate)
(
  rate(http_requests_total{status=~"5.."}[1h])
  /
  rate(http_requests_total[1h])
) > (1 - 0.999) * 14    # 0.999 = 99.9% SLO, 14x = fast burn

Recording Rules — Performance Optimisation

Complex PromQL queries run on every panel load and every alert evaluation. For expensive queries (wide range vectors, many series), use recording rules to pre-compute the result on a fixed evaluation interval (the rule group's interval, 30s in the example below). This dramatically reduces query time for dashboards and ensures alerts evaluate quickly.

# rules/slo_rules.yaml
groups:
  - name: slo.rules
    interval: 30s
    rules:
      - record: job:http_requests_total:rate5m
        expr: sum(rate(http_requests_total[5m])) by (service, namespace)
      - record: job:http_errors_total:rate5m
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (service, namespace)
      - record: job:http_error_ratio:rate5m
        expr: |
          job:http_errors_total:rate5m
          /
          job:http_requests_total:rate5m

Alertmanager — Alert Routing

# alertmanager.yaml — route alerts by team
global:
  slack_api_url: 'https://hooks.slack.com/services/...'
route:
  group_by: ['alertname', 'namespace']
  group_wait: 30s          # wait 30s to group related alerts
  group_interval: 5m
  repeat_interval: 4h      # resend unresolved alert every 4h
  receiver: 'slack-general'
  routes:
    - match:
        severity: critical
        namespace: production
      receiver: 'pagerduty-oncall'
    - match:
        team: platform
      receiver: 'slack-platform'
receivers:
  - name: 'slack-general'
    slack_configs:
      - channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
  - name: 'pagerduty-oncall'
    pagerduty_configs:
      - service_key: '$PD_SERVICE_KEY'
inhibit_rules:             # suppress warning if critical already firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'namespace']
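Before rolling out a routing change, amtool (the Alertmanager CLI) can show which receiver a given label set would reach, as a sanity check against the file above:

# should print: pagerduty-oncall
amtool config routes test --config.file=alertmanager.yaml severity=critical namespace=production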

Grafana Dashboards

Grafana queries Prometheus (and other data sources like Loki, Tempo, CloudWatch) and renders visualisations. In production, store dashboards as JSON in Git and provision them via ConfigMaps in Kubernetes — this is "dashboard-as-code" and means dashboards are version-controlled and reproducible.
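A minimal sketch of that provisioning flow, assuming the dashboard sidecar from the kube-prometheus-stack Helm chart (it loads any ConfigMap carrying a grafana_dashboard label; names and namespace are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: api-slo-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"          # the sidecar watches for this label
data:
  api-slo.json: |                   # the dashboard JSON exported from Grafana
    {"title": "API SLOs", "panels": []}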

The Four Golden Signals (Google SRE Book)

Latency (how long requests take), Traffic (how much demand the system handles, e.g. requests/sec), Errors (the rate of failed requests), and Saturation (how "full" the service is: CPU, memory pressure, queue depth). Every service dashboard should cover these four first; the Q&A below expands on each.

Interview Q&A

Q1: Counter vs Gauge vs Histogram — when to use each?
Counter: for things that only increase — request count, error count, bytes transferred. Always query with rate() over a time window. Gauge: for values that fluctuate up and down — current memory usage, active connections, queue depth, number of running pods. Query directly. Histogram: for measuring distributions — request latency, request size. Allows calculating any percentile (p50, p95, p99) via histogram_quantile(). Use histogram for any SLO that involves latency. Never use Summary when you need to aggregate across multiple instances — only Histogram supports cross-instance aggregation.
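As a quick illustration of all three types in application code, here is a minimal sketch using the official Python client, prometheus_client (metric names mirror the examples above; the traffic loop is synthetic):

from prometheus_client import Counter, Gauge, Histogram, start_http_server
import random, time

# Counter: only increases; query with rate()
REQUESTS = Counter('http_requests_total', 'Total HTTP requests', ['method', 'status'])
# Gauge: current state; query directly
ACTIVE = Gauge('active_connections', 'Connections currently open')
# Histogram: bucketed distribution; query with histogram_quantile()
LATENCY = Histogram('http_request_duration_seconds', 'Request latency in seconds')

if __name__ == '__main__':
    start_http_server(8000)          # exposes /metrics on :8000
    while True:
        with LATENCY.time():         # records how long the block takes
            time.sleep(random.random() / 10)
        REQUESTS.labels(method='GET', status='200').inc()
        ACTIVE.set(random.randint(0, 50))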
Q2: How do you implement SLO burn rate alerting?
Burn rate = current error rate / (1 - SLO target). For a 99.9% SLO, the error budget is 0.1% per month. A burn rate of 1 means you are exactly consuming budget at the sustainable rate. A burn rate of 14 means you will exhaust the entire month's budget in 2 days. The standard approach (from Google's SRE Workbook) uses two burn rate windows: a fast window (1h) catches fast-burning outages quickly, a slow window (6h) catches slow burns. Alert when BOTH windows exceed the burn rate threshold — this reduces false positives from brief spikes.
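A hedged sketch of the fast-burn alert, assuming the earlier recording rules are extended with a 1h variant (job:http_error_ratio:rate1h is an assumed name following the same convention):

# added to the slo.rules group above
- alert: ErrorBudgetFastBurn
  expr: |
    job:http_error_ratio:rate1h > (14 * (1 - 0.999))
    and
    job:http_error_ratio:rate5m > (14 * (1 - 0.999))
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Fast error-budget burn for {{ $labels.service }}"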
Q3: What is Grafana Loki and how does it differ from Elasticsearch?
Loki is a log aggregation system designed to work with Prometheus — it uses the same label model and ships with Grafana. Unlike Elasticsearch, Loki does NOT index log content — it only indexes labels (pod name, namespace, app). Full-text search uses streaming grep over compressed log chunks. This makes Loki dramatically cheaper (10x less storage, no Lucene indexing overhead) but slower for ad-hoc full-text search. Use Loki when: you already have Prometheus/Grafana, cost is a concern, your team queries logs by service/pod rather than arbitrary full-text search. Use Elasticsearch when: you need powerful full-text search, complex aggregations, or compliance requirements for searchable audit logs.
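For comparison, two typical LogQL queries; labels select the stream cheaply, and the |= filter greps the line content (label values are illustrative):

# label match + line filter: "grep error" scoped to one service
{namespace="production", app="api"} |= "error"

# error log rate per pod over 5 minutes (a LogQL metric query)
sum(rate({namespace="production", app="api"} |= "error" [5m])) by (pod)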

Prometheus Data Model & Metric Types

Prometheus stores data as time series — streams of timestamped float64 values identified by a metric name and key-value labels. Understanding this model is essential before writing PromQL, because it determines what queries are possible.

Type       What it measures                                              Example
Counter    Monotonically increasing; never goes down (except on restart)  http_requests_total, errors_total
Gauge      Current value — goes up or down                               memory_bytes, active_connections
Histogram  Distribution across configurable buckets                      request latency (p50/p95/p99)
Summary    Pre-calculated quantiles, client-side                         GC pause time, response size

Essential PromQL Queries

These are the queries every SRE and DevOps engineer needs to know — they cover the four golden signals and appear regularly in interviews and on-call runbooks.

# Request rate (per second, 5-min window)
rate(http_requests_total[5m])

# Error rate as percentage
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m]) * 100

# p99 latency — the #1 interview PromQL question
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)

# CPU usage per pod in Kubernetes
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)

# Memory usage MB per pod
container_memory_working_set_bytes{namespace="production"} / 1024 / 1024

# Node disk usage percentage
(node_filesystem_size_bytes - node_filesystem_free_bytes)
/ node_filesystem_size_bytes * 100
Key rule: Always use rate() on counters for dashboards (averages over window — stable for graphs). Use irate() only when you need per-second precision on the last two data points (alerts on sudden spikes).
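Side by side, the difference looks like this:

rate(http_requests_total[5m])    # average per-second rate over the full 5m window: smooth
irate(http_requests_total[5m])   # rate from only the last two samples in the window: spiky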

Prometheus Alerting Rules — Production Examples

# prometheus-rules.yaml
groups:
  - name: api-slo-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          rate(http_requests_total{status=~"5.."}[5m])
          /
          rate(http_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $labels.service }}"
          description: "Error rate {{ $value | humanizePercentage }}"
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 1.0
        for: 3m
        labels:
          severity: warning
      - alert: PodCrashLooping
        expr: increase(kube_pod_container_status_restarts_total[30m]) > 3
        for: 0m
        labels:
          severity: critical
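Both alerting and recording rule files can be validated before Prometheus loads them, using promtool, which ships with Prometheus (the test file name is hypothetical):

promtool check rules prometheus-rules.yaml   # validates YAML structure and PromQL syntax
promtool test rules rules_test.yaml          # unit-tests alert behaviour against synthetic series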

Prometheus & Grafana Interview Questions

Q: Counter vs Gauge — when do you use each?
A Counter only ever increases (or resets to zero on restart). Use it for things that accumulate: total requests, total errors, total bytes. Always query counters with rate() or increase(). A Gauge represents a current value that can go up or down — memory usage, active connections, queue depth. Query gauges directly. The key rule: if you are counting events, use Counter. If you are measuring current state, use Gauge.
Q: How do you write a p99 latency PromQL query?
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])). You must use rate() on the _bucket metric before passing to histogram_quantile. The le label must be included in any aggregation: sum(...) by (le, service). Omitting le in the by() clause breaks the quantile calculation — this is the most common PromQL mistake in interviews.
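To make the failure mode concrete (the service label is illustrative):

# Correct: le survives the aggregation
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

# Broken: summing away le collapses the buckets into one series
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (service))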
Q: What are recording rules and why do you need them?
Recording rules pre-compute expensive PromQL queries and store results as new time series. Without them, a dashboard with 20 panels each running complex histogram_quantile queries across millions of series will time out. With recording rules, the query runs once on schedule (e.g., every 30s) and the result is a simple gauge that dashboards can query instantly. Naming convention: level:metric:operations, e.g., job:http_requests_total:rate5m.
Q: What are the four golden signals?
Defined by Google's SRE book: Latency — how long requests take (distinguish successful vs error latency); Traffic — how much demand the system handles (requests/sec, queries/sec); Errors — rate of requests that fail (explicit 5xx, implicit wrong content, policy violations); Saturation — how "full" the service is (CPU utilisation, memory pressure, queue depth). If you can only instrument four things, make it these four.
