Module 1.1: PromQL Deep Dive
PCA Track | Complexity: [COMPLEX] | Time: 50-60 min
Prerequisites
Before starting this module:
- Prometheus Module — architecture, pull model, basic PromQL
- Observability Theory — metrics concepts
- Basic Kubernetes knowledge
- A running Prometheus instance (kind/minikube with kube-prometheus-stack)
What You’ll Be Able to Do
After completing this module, you will be able to:
- Construct PromQL queries using range vectors, aggregation operators, and binary operations to answer production questions about latency, error rates, and saturation
- Apply `histogram_quantile()` and `rate()` correctly to compute percentile latencies and per-second rates from counter and histogram metrics
- Build recording rules that pre-compute expensive queries for dashboard performance and SLO tracking
- Diagnose misleading metric behavior (counter resets, label cardinality explosions, stale markers) by reasoning about PromQL evaluation mechanics
It was 2:00 AM on Black Friday at one of Europe’s largest e-commerce platforms. Traffic was 12x normal. The platform team had dashboards everywhere — CPU, memory, pod counts, request rates. Everything looked green.
Then customer support started reporting: “Users say checkout is slow.” The dashboard showed average latency at 200ms. Well within SLA. The on-call engineer almost went back to sleep.
But something nagged him. He opened Prometheus and typed:
```promql
histogram_quantile(0.99, sum by (le)(rate(http_request_duration_seconds_bucket{service="checkout"}[5m])))
```

The P99 was 14 seconds. Average was fine because 95% of requests were fast — but 5% of users were waiting 14 seconds for checkout. On Black Friday. With millions of users, that “5%” was 50,000 people per hour abandoning carts.
He dug deeper:
```promql
histogram_quantile(0.99, sum by (le, payment_method)(rate(http_request_duration_seconds_bucket{service="checkout"}[5m])))
```

The P99 for `payment_method="credit_card"` was 200ms. For `payment_method="paypal"`, it was 14 seconds. A PayPal integration timeout was cascading into retries, holding connections, and starving the connection pool for everyone.
The average hid the problem. PromQL revealed it. That single query saved an estimated $2.1 million in abandoned carts that night.
Why This Module Matters
PromQL is 28% of the PCA exam — the single largest domain. But more importantly, PromQL is the language you use to answer questions during outages. Can you write a query that shows the error rate by service? Can you compute the 99th percentile latency? Can you join two metrics to calculate resource utilization as a percentage?
This module takes you beyond the basics covered in the Prometheus fundamentals module. You’ll learn every selector type, master rate functions, build complex aggregations, work with histograms deeply, use binary operators for metric joins, write recording rules, and understand subqueries.
Did You Know?
- PromQL processes queries in a “pull” fashion too — when you query, Prometheus reads time-series data from disk/memory and evaluates expressions. It doesn’t pre-compute results (that’s what recording rules are for).
- The `rate()` function is mathematically sophisticated — it handles counter resets automatically by detecting decreases in values and compensating. This means you never need to worry about pod restarts breaking your rate calculations.
- `histogram_quantile()` uses linear interpolation between bucket boundaries. If your buckets are `[0.1, 0.5, 1.0, +Inf]` and the 95th percentile falls between 0.5 and 1.0, Prometheus draws a straight line between those points to estimate the value. Poor bucket choices = poor accuracy.
- PromQL has no `JOIN` keyword — but binary operators with `on()` and `group_left()` effectively give you the same power as SQL joins, just with different syntax.
Selectors: Finding Your Data
Instant Vector Selectors
An instant vector selector returns the most recent sample for each matching time series.
```promql
# Select all series with this metric name
http_requests_total

# Filter by exact label match
http_requests_total{method="GET"}

# Filter by multiple labels (AND logic)
http_requests_total{method="GET", status="200"}

# Regex match (RE2 syntax)
http_requests_total{status=~"2.."}

# Negative match
http_requests_total{status!="500"}

# Negative regex match
http_requests_total{method!~"OPTIONS|HEAD"}
```

Label matcher types:
| Matcher | Meaning | Example |
|---|---|---|
| `=` | Exact match | `{job="api"}` |
| `!=` | Not equal | `{job!="test"}` |
| `=~` | Regex match | `{status=~"5.."}` |
| `!~` | Negative regex | `{path!~"/health\|/ready"}` |
Important: Every selector must have at least one matcher that doesn’t match the empty string. `{job=~".*"}` alone is invalid because it matches everything, including the empty label — use `{job=~".+"}` instead.
Range Vector Selectors
A range vector returns a window of samples for each series. Required for functions like `rate()`.
```promql
# Last 5 minutes of samples
http_requests_total{method="GET"}[5m]

# Last 1 hour
http_requests_total[1h]

# Valid time durations: ms, s, m, h, d, w, y
# 5m = 5 minutes, 1h30m = 90 minutes, 1d = 1 day
```

You cannot graph a range vector directly. Range vectors are inputs to functions:

```promql
# WRONG: Cannot graph this
http_requests_total[5m]

# RIGHT: rate() converts range vector to instant vector
rate(http_requests_total[5m])
```
Offset Modifier

Compare current values to historical data:
```promql
# Current request rate
rate(http_requests_total[5m])

# Request rate 1 hour ago
rate(http_requests_total[5m] offset 1h)

# Request rate 1 week ago (for week-over-week comparison)
rate(http_requests_total[5m] offset 7d)

# How much has the rate changed compared to 1 hour ago?
rate(http_requests_total[5m]) - rate(http_requests_total[5m] offset 1h)
```
The @ Modifier

Pin a query to a specific timestamp (useful for debugging past incidents):
```promql
# Value at a specific Unix timestamp
http_requests_total @ 1704067200

# Value at the start of the query range
http_requests_total @ start()

# Value at the end of the query range
http_requests_total @ end()
```
Rate Functions: The Counter Toolkit

Counters only go up (except on reset). You almost never want the raw counter value — you want the rate of change.
rate()
Calculates the average per-second rate of increase over the range:
```promql
# Average requests per second over last 5 minutes
rate(http_requests_total[5m])
# If the counter went from 1000 to 1300 over 5 min:
# rate = (1300 - 1000) / 300 seconds = 1.0 req/s

# CPU usage rate (seconds of CPU per second of wall time)
rate(process_cpu_seconds_total[5m])
# A result of 0.25 means 25% of one CPU core
```

How rate() handles counter resets:
```
COUNTER RESET HANDLING
──────────────────────────────────────────────────────────────
Normal:     100 → 200 → 300 → 400
            rate = (400 - 100) / time = normal calculation

With reset: 100 → 200 → 50 → 150
            Prometheus detects 200 → 50 (a decrease = reset)
            It assumes the counter restarted at 0 and adds the
            pre-reset value (200) to later samples: 100, 200, 250, 350
            Effective increase: 350 - 100 = 250
            rate = 250 / time
```
This is why `rate()` is safe to use even when pods restart!
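The reset arithmetic above can be sketched in a few lines of Python. This is a simplified illustration, not Prometheus's actual implementation (which also extrapolates the result to the window boundaries):

```python
def rate_with_reset_handling(samples, window_seconds):
    """Approximate rate(): undo counter resets, then divide the
    effective increase by the window length. Prometheus additionally
    extrapolates to the window boundaries; this sketch omits that."""
    adjusted = [samples[0]]
    correction = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:             # a decrease means the counter reset
            correction += prev     # assume it restarted from 0
        adjusted.append(cur + correction)
    return (adjusted[-1] - adjusted[0]) / window_seconds

# 100 -> 200 -> 50 -> 150 over 60s: effective increase = 250, not -50
print(rate_with_reset_handling([100, 200, 50, 150], 60))  # ~4.17 req/s
```

Without the correction step, the naive slope would go negative at the restart; with it, the restart is invisible in the result.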
irate()

Uses only the last two data points to compute an instantaneous rate:
```promql
# Instantaneous request rate
irate(http_requests_total[5m])
# Only uses the last 2 samples within the 5m window
# Much more volatile than rate()
```

When to use which:

```
RATE vs IRATE DECISION
──────────────────────────────────────────────────────────────
rate(metric[5m])
├── Smoothed average over the range
├── Stable, predictable values
├── USE FOR: alerting rules, SLO calculations, recording rules
├── USE FOR: dashboard panels showing trends
└── The range [5m] matters — it's the averaging window

irate(metric[5m])
├── Instantaneous rate (last 2 points only)
├── Volatile, shows spikes and drops
├── USE FOR: debugging during incidents
├── USE FOR: "what's happening right now?" panels
└── The range [5m] only sets lookback for finding 2 points
```
increase()

Total increase over the range (like rate × seconds):
```promql
# Total requests in the last hour
increase(http_requests_total[1h])
# If rate was 100 req/s, increase ≈ 100 * 3600 = 360,000

# Total errors in the last 24 hours
increase(http_errors_total[24h])

# Useful for human-readable counts:
# "We processed 1.2 million requests today"
increase(http_requests_total[24h])
```

Relationship between rate and increase:

```promql
# These are approximately equivalent:
increase(http_requests_total[1h]) ≈ rate(http_requests_total[1h]) * 3600
```
resets()

Count the number of counter resets (pod restarts, process crashes):
```promql
# How many times has this counter reset in the last hour?
resets(http_requests_total[1h])

# A high restart count may indicate crash loops
# (process_start_time_seconds is a gauge that jumps UP on restart,
# so count value changes with changes() rather than counter resets)
changes(process_start_time_seconds[1h]) > 5
```
Common Pitfall: Range Too Short

```
THE 4x RULE
──────────────────────────────────────────────────────────────
If scrape_interval = 15s, minimum useful range for rate() = 60s (4 × 15s)

Why? rate() needs at least 2 data points in the range. With a 15s
interval, a 30s range may contain only 1-2 points. A 60s range
guarantees at least 4 points for reliable calculation.

Rule of thumb: range >= 4 × scrape_interval

rate(metric[15s])  ← BAD:   might have only 1 point
rate(metric[30s])  ← RISKY: might have only 2 points
rate(metric[1m])   ← OK:    typically 4 points
rate(metric[5m])   ← SAFE:  ~20 points, good smoothing
```
Aggregation Operators

Aggregation operators combine multiple time series into fewer series.
Core Aggregation Functions
```promql
# SUM: total across all series
sum(rate(http_requests_total[5m]))
# Result: single number — total requests/sec across all pods

# AVG: average across all series
avg(rate(http_requests_total[5m]))
# Result: average requests/sec per pod

# MIN / MAX: extremes
min(node_filesystem_avail_bytes)
max(container_memory_usage_bytes)

# COUNT: number of series
count(up == 1)
# Result: how many targets are up

# STDDEV / STDVAR: statistical spread
stddev(rate(http_requests_total[5m]))
# High stddev = uneven load distribution

# TOPK / BOTTOMK: highest/lowest N series
topk(5, rate(http_requests_total[5m]))
# Top 5 pods by request rate

# QUANTILE: compute quantile across series
quantile(0.95, rate(http_requests_total[5m]))
# 95th percentile of request rate across all pods
# (NOT histogram_quantile — this works across series, not buckets)

# COUNT_VALUES: count unique values
count_values("version", build_info)
# How many instances are running each version
```
Grouping: by and without

```promql
# GROUP BY specific labels (keep only these)
sum by (method)(rate(http_requests_total[5m]))
# Result: one series per method (GET, POST, PUT, etc.)

sum by (method, status)(rate(http_requests_total[5m]))
# Result: one series per method+status combination

# EXCLUDE specific labels (keep all others)
sum without (instance)(rate(http_requests_total[5m]))
# Result: removes instance label, keeps everything else

# Equivalent forms:
sum by (method)(metric) = sum without (instance, job, ...)(metric)
```

Real-world aggregation examples:
```promql
# Request rate per service
sum by (service)(rate(http_requests_total[5m]))

# Error rate per service (percentage)
sum by (service)(rate(http_requests_total{status=~"5.."}[5m]))
/
sum by (service)(rate(http_requests_total[5m]))
* 100

# Top 10 pods by memory usage
topk(10, container_memory_usage_bytes{container!=""})

# Average CPU per namespace
avg by (namespace)(rate(container_cpu_usage_seconds_total[5m]))

# Total network received per node
sum by (node)(rate(node_network_receive_bytes_total[5m]))
```
Binary Operators and Vector Matching

Arithmetic Operators
```promql
# Simple arithmetic with scalars
node_memory_MemTotal_bytes / 1024 / 1024 / 1024
# Convert bytes to GiB

# Arithmetic between two vectors
container_memory_usage_bytes / container_spec_memory_limit_bytes * 100
# Memory utilization percentage
# Labels must match on both sides!
```
Comparison Operators

```promql
# Filter: only series where value > threshold
http_requests_total > 1000
# Returns only series with value > 1000

# Boolean mode: returns 1 or 0 instead of filtering
http_requests_total > bool 1000
# Returns 1 (true) or 0 (false) for each series

# Useful in alerting:
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m])
> 0.05
# Only returns series where error rate exceeds 5%
```
Logical/Set Operators

```promql
# AND: returns left side where right side also has matches
up == 1 and on(job) rate(http_requests_total[5m]) > 100
# Targets that are up AND have high request rates

# OR: union of both sides
rate(http_requests_total{status="500"}[5m]) > 10
or
rate(http_requests_total{status="503"}[5m]) > 10
# Series matching either condition

# UNLESS: returns left side where right side has NO match
up == 1 unless on(job) alerts{alertname="Maintenance"}
# Targets that are up but NOT in maintenance
```
Vector Matching: on() and ignoring()

When binary operations involve two vectors, Prometheus must match series from each side. By default, all labels must match. Use `on()` or `ignoring()` to control matching.
```promql
# DEFAULT: all labels must match
container_memory_usage_bytes / container_spec_memory_limit_bytes
# Works if both sides have identical label sets

# ON: match only on specific labels
container_memory_usage_bytes / on(container, namespace) container_spec_memory_limit_bytes
# Match only on container + namespace, ignore other labels

# IGNORING: match on everything EXCEPT specific labels
http_requests_total / ignoring(status) group_left http_requests_total_sum
# Ignore the "status" label when matching
```
Many-to-One and One-to-Many: group_left() / group_right()

When one side has more series than the other (different cardinality), you need `group_left` or `group_right`:
```promql
# PROBLEM: node_info has labels (node, os, kernel_version)
# node_memory_MemTotal_bytes has labels (node)
# Many info series per node vs one memory series per node

# SOLUTION: group_left brings labels from the "one" side
node_memory_MemTotal_bytes
* on(node) group_left(os, kernel_version)
node_info
# Result has memory bytes with os and kernel_version labels added

# Real-world example: add service owner labels to metrics
rate(http_requests_total[5m])
* on(service) group_left(team, oncall)
service_owner_info
# Now your request rate has team and oncall labels!
```

```
VECTOR MATCHING VISUAL
──────────────────────────────────────────────────────────────
ONE-TO-ONE (default):
  Left: {method="GET",  status="200"} → matches → Right: {method="GET",  status="200"}
  Left: {method="POST", status="200"} → matches → Right: {method="POST", status="200"}

MANY-TO-ONE (group_left):
  Left: {node="a", cpu="0"} ─┐
  Left: {node="a", cpu="1"} ─┼── on(node) group_left ──→ Right: {node="a"}
  Left: {node="a", cpu="2"} ─┘

ONE-TO-MANY (group_right):
  Left: {node="a"} ──── on(node) group_right ──┬─ Right: {node="a", disk="sda"}
                                               └─ Right: {node="a", disk="sdb"}
```
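The many-to-one matching above can be simulated in plain Python. In this hypothetical sketch (not Prometheus internals), each series is a (labels, value) pair, and the arithmetic step is omitted since info metrics carry the value 1:

```python
def group_left_join(left, right, on, extra_labels):
    """Many-to-one match: each left series pairs with the single right
    series sharing its on() labels, copying extra labels across."""
    # index the "one" side by its join key
    index = {tuple(lbls[k] for k in on): lbls for lbls, _ in right}
    out = []
    for lbls, value in left:
        key = tuple(lbls[k] for k in on)
        if key in index:                 # unmatched series drop out
            merged = dict(lbls, **{k: index[key][k] for k in extra_labels})
            out.append((merged, value))
    return out

mem = [({"node": "a"}, 8e9), ({"node": "b"}, 4e9)]
info = [({"node": "a", "os": "linux", "kernel_version": "6.1"}, 1)]
# node_memory_MemTotal_bytes * on(node) group_left(os, kernel_version) node_info
print(group_left_join(mem, info, on=["node"], extra_labels=["os", "kernel_version"]))
```

Note that the series for node "b" disappears from the result, just as an unmatched series drops out of a PromQL binary operation.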
Histogram Queries

Understanding Histogram Metrics
A histogram metric generates three types of series:
```
HISTOGRAM STRUCTURE
──────────────────────────────────────────────────────────────
Metric: http_request_duration_seconds

Generated series:
  http_request_duration_seconds_bucket{le="0.005"} = 24054   (≤5ms)
  http_request_duration_seconds_bucket{le="0.01"}  = 33444   (≤10ms)
  http_request_duration_seconds_bucket{le="0.025"} = 100392  (≤25ms)
  http_request_duration_seconds_bucket{le="0.05"}  = 129389  (≤50ms)
  http_request_duration_seconds_bucket{le="0.1"}   = 133988  (≤100ms)
  http_request_duration_seconds_bucket{le="0.25"}  = 144320  (≤250ms)
  http_request_duration_seconds_bucket{le="0.5"}   = 144700  (≤500ms)
  http_request_duration_seconds_bucket{le="1"}     = 144838  (≤1s)
  http_request_duration_seconds_bucket{le="+Inf"}  = 144927  (all)
  http_request_duration_seconds_sum                = 53423.4 (total seconds)
  http_request_duration_seconds_count              = 144927  (total requests)

Key insight: buckets are CUMULATIVE. le="0.1" includes everything
from le="0.005" through le="0.1"
```
histogram_quantile()

```promql
# P50 (median) latency
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))

# P90 latency
histogram_quantile(0.90, rate(http_request_duration_seconds_bucket[5m]))

# P99 latency per service
histogram_quantile(0.99, sum by (le, service)(rate(http_request_duration_seconds_bucket[5m])))
# IMPORTANT: always keep "le" in the by() clause!
# histogram_quantile() needs the le label to work.

# P99.9 latency (the "three nines" percentile)
histogram_quantile(0.999, sum by (le)(rate(http_request_duration_seconds_bucket[5m])))
```

Critical rule: When aggregating before `histogram_quantile()`, you MUST keep the `le` label:
```promql
# WRONG: drops le label — histogram_quantile cannot work
histogram_quantile(0.99, sum by (service)(rate(http_request_duration_seconds_bucket[5m])))

# RIGHT: keeps le label
histogram_quantile(0.99, sum by (le, service)(rate(http_request_duration_seconds_bucket[5m])))
```
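The interpolation behind `histogram_quantile()` can be sketched in Python. This is a simplified model over assumed bucket data (Prometheus's real implementation has more special cases), but it shows why the `le` boundaries are indispensable:

```python
import math

def hist_quantile(q, buckets):
    """buckets: sorted (le, cumulative_count) pairs ending at +Inf.
    Linearly interpolate inside the bucket containing the quantile."""
    total = buckets[-1][1]                  # the +Inf bucket holds all
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):              # quantile in +Inf bucket:
                return prev_le              # return last finite bound
            # straight line between (prev_count, prev_le) and (count, le)
            frac = (rank - prev_count) / (count - prev_count)
            return prev_le + (le - prev_le) * frac
        prev_le, prev_count = le, count

buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
# P95 falls between le=0.5 (90 obs) and le=1.0 (99 obs)
print(hist_quantile(0.95, buckets))
```

Summing buckets without `le` collapses this list to a single cumulative count, leaving nothing to interpolate between.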
Average Latency from Histogram

```promql
# Average request duration (sum of all durations / count of requests)
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])

# Per service
sum by (service)(rate(http_request_duration_seconds_sum[5m]))
/
sum by (service)(rate(http_request_duration_seconds_count[5m]))
```
Apdex Score from Histogram

Application Performance Index — a user-satisfaction metric:
```promql
# Apdex with target = 300ms (satisfied ≤ 300ms, tolerating ≤ 1.2s)
(
  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))
  +
  sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m]))
)
/ 2
/
sum(rate(http_request_duration_seconds_count[5m]))

# Result interpretation:
# 1.0      = all users satisfied
# 0.85+    = excellent
# 0.7-0.85 = good
# < 0.5    = poor
```
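Because the buckets are cumulative, dividing their sum by 2 really does compute satisfied + tolerating/2. A quick Python check with made-up counts confirms the algebra:

```python
def apdex_textbook(satisfied_cum, tolerating_cum, total):
    """Apdex = (satisfied + tolerating/2) / total, from CUMULATIVE
    bucket counts (tolerating_cum already includes satisfied_cum)."""
    tolerating_only = tolerating_cum - satisfied_cum
    return (satisfied_cum + tolerating_only / 2) / total

def apdex_promql_shape(satisfied_cum, tolerating_cum, total):
    """The query's shape: (bucket{le="0.3"} + bucket{le="1.2"}) / 2 / total."""
    return (satisfied_cum + tolerating_cum) / 2 / total

# hypothetical counts: 800 requests <= 300ms, 950 <= 1.2s (cumulative), 1000 total
print(apdex_textbook(800, 950, 1000))      # 0.875
print(apdex_promql_shape(800, 950, 1000))  # 0.875 (identical)
```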
Histogram Bucket Selection Guidelines

```
CHOOSING HISTOGRAM BUCKETS
──────────────────────────────────────────────────────────────
Default Prometheus buckets:
  [.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]

Custom buckets (for a web API with 200ms SLO):
  [.01, .025, .05, .1, .2, .3, .5, .75, 1, 2, 5]
   ^^^^^^^^^^^^^^^^^  ^^^
   Fine resolution    Bucket AT your SLO target
   for fast path      for accurate SLO reporting

Rules:
1. Always have a bucket at or near your SLO target
2. More buckets near expected values = better accuracy
3. Wider gaps at extremes (>1s) are fine
4. Too many buckets = high cardinality (each bucket is a series)
5. The +Inf bucket is always auto-created
```
Subqueries

Subqueries let you evaluate an instant vector expression over a range, creating a range vector that can be fed to functions.
```promql
# Basic syntax: <instant_query>[<range>:<resolution>]

# Average of the max over last hour, sampled every 5 minutes
avg_over_time(max by (instance)(rate(http_requests_total[5m]))[1h:5m])

# Standard deviation of error rate over the last 6 hours
stddev_over_time(
  (
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    /
    sum(rate(http_requests_total[5m]))
  )[6h:1m]
)

# Min value over last hour (useful for detecting dips)
min_over_time(up[1h:1m])
# Returns 0 if target was down at any point in the last hour
```

Common `_over_time` functions:
| Function | Purpose | Example Use |
|---|---|---|
| `avg_over_time()` | Average over range | Smooth a volatile metric |
| `min_over_time()` | Minimum in range | Detect any downtime in window |
| `max_over_time()` | Maximum in range | Find peak usage |
| `sum_over_time()` | Sum of all samples | Total accumulation |
| `count_over_time()` | Count of samples | Detect missing scrapes |
| `quantile_over_time()` | Percentile over time | P95 of a gauge over 1 hour |
| `stddev_over_time()` | Standard deviation | Detect unusual variance |
| `last_over_time()` | Most recent value | Fill gaps in sparse metrics |
| `present_over_time()` | 1 if any sample exists | Check metric existence |
Subquery vs recording rule: Subqueries are evaluated at query time (expensive). If you use a subquery frequently, convert it to a recording rule.
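Conceptually, a subquery just evaluates the inner expression at every resolution step and collects the samples. In this hypothetical Python model, the `error_rate` function stands in for the inner PromQL expression:

```python
def subquery(inner, start, end, step):
    """Evaluate an instant expression at each step across [start, end],
    producing a range vector of (timestamp, value) samples."""
    return [(t, inner(t)) for t in range(start, end + 1, step)]

def max_over_time(samples):
    """Maximum value across a range vector's samples."""
    return max(v for _, v in samples)

# stand-in for the inner expression: error ratio spiking at t=1800
def error_rate(t):
    return 0.30 if t == 1800 else 0.02

# (error_rate)[1h:5m] — evaluate every 300s over a 3600s range
print(max_over_time(subquery(error_rate, 0, 3600, 300)))  # 0.3
```

The cost is visible in the model: the inner expression runs once per step, which is why frequently used subqueries belong in recording rules.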
Recording Rules
Naming Convention
```
RECORDING RULE NAMING: level:metric:operations
──────────────────────────────────────────────────────────────
level      = aggregation level (e.g., job, instance, cluster)
metric     = the original metric name
operations = list of operations applied (e.g., rate5m)

Examples:
  job:http_requests:rate5m
  instance:node_cpu:ratio
  cluster:http_errors:rate5m_ratio

IMPORTANT: Use colons (:) as separators. Raw metrics use
underscores (_). Recording rules use colons (:). This makes it
instantly clear which metrics are computed.
```
Recording Rule Examples

```yaml
groups:
  - name: http_recording_rules
    interval: 30s
    rules:
      # Request rate per job
      - record: job:http_requests:rate5m
        expr: sum by (job)(rate(http_requests_total[5m]))

      # Error rate per job
      - record: job:http_errors:rate5m
        expr: sum by (job)(rate(http_requests_total{status=~"5.."}[5m]))

      # Error ratio per job (for SLO dashboards)
      - record: job:http_error_ratio:rate5m
        expr: |
          job:http_errors:rate5m
          /
          job:http_requests:rate5m

      # P99 latency per job
      - record: job:http_latency_p99:rate5m
        expr: |
          histogram_quantile(0.99,
            sum by (job, le)(rate(http_request_duration_seconds_bucket[5m]))
          )

      # Memory utilization per namespace
      - record: namespace:container_memory_utilization:ratio
        expr: |
          sum by (namespace)(container_memory_usage_bytes{container!=""})
          /
          sum by (namespace)(container_spec_memory_limit_bytes{container!=""} > 0)

      # CPU utilization per node
      - record: node:node_cpu_utilization:ratio_rate5m
        expr: |
          1 - avg by (node)(rate(node_cpu_seconds_total{mode="idle"}[5m]))
```
When to Create Recording Rules

```
CREATE A RECORDING RULE WHEN:
──────────────────────────────────────────────────────────────
1. Dashboard query takes > 1 second to execute
2. Same query is used in multiple dashboards
3. Query is used in alerting rules (pre-compute = faster evaluation)
4. You need to aggregate high-cardinality metrics down
5. You want consistent values across different consumers
6. You need longer time ranges on an expensive query

DON'T create a recording rule when:
1. Query is simple and fast (e.g., up == 0)
2. Only used in one place
3. The metric is already low-cardinality
```
Common Mistakes

| Mistake | Problem | Solution |
|---|---|---|
| Using `rate()` on gauges | Nonsensical results | Use `rate()` only on counters; use `deriv()` for gauge rate-of-change |
| Forgetting `rate()` on counters | Dashboard shows ever-increasing line | Always wrap counter queries in `rate()` or `increase()` |
| Range too short for `rate()` | Missing or inaccurate results | Range >= 4x scrape_interval (e.g., `[1m]` for 15s scrape) |
| Dropping `le` in histogram agg | `histogram_quantile()` fails | Always include `le` in the `by()` clause |
| Alerting on `irate()` | Flapping alerts | Use `rate()` for alerts; `irate()` for debugging only |
| High-cardinality `by()` | Slow queries, memory issues | Group by low-cardinality labels; drop instance/pod where possible |
| `sum without ()` vs `sum()` | Unexpected label retention | `sum()` drops all labels; `sum without(x)` drops only `x` |
| Dividing without matching labels | Empty result or wrong joins | Use `on()` and `group_left()` for cross-metric division |
| `count()` instead of `count_over_time()` | Counts series, not samples | `count(up)` = number of series; `count_over_time(up[1h])` = samples per series |
| Subquery without resolution | Prometheus picks default | Always specify resolution: `metric[1h:5m]` not `metric[1h:]` |
Test your PromQL knowledge — these reflect PCA exam difficulty:
1. What is the difference between an instant vector and a range vector? When does PromQL require each?

Answer:

- Instant vector: Returns one sample per time series (the most recent value within the lookback window). Used for graphing, comparison, and most operations.
  - Example: `http_requests_total{method="GET"}`
- Range vector: Returns multiple samples per time series over a time window. Cannot be graphed directly — must be passed to a function.
  - Example: `http_requests_total{method="GET"}[5m]`

When each is required:

- `rate()`, `increase()`, `irate()`, `resets()`, and `*_over_time()` functions require range vectors
- Arithmetic, comparison, aggregation, and `histogram_quantile()` operate on instant vectors
- A range vector is only valid as a direct argument to a function that expects one
2. Write a PromQL query to calculate the error rate (as a percentage) for each service, where errors are HTTP 5xx responses.

Answer:

```promql
sum by (service)(rate(http_requests_total{status=~"5.."}[5m]))
/
sum by (service)(rate(http_requests_total[5m]))
* 100
```

Key points:

- Use `rate()` on the counter, not the raw counter value
- Use `sum by (service)` to aggregate across instances/pods
- Both numerator and denominator must have the same `by()` clause
- Multiply by 100 to convert the ratio to a percentage
- `status=~"5.."` matches 500, 501, 502, etc.
3. Why must you always include the `le` label when aggregating before `histogram_quantile()`? What happens if you don't?

Answer:

`histogram_quantile()` works by examining the cumulative bucket counts across different `le` (less-than-or-equal) boundaries. It uses these boundaries to interpolate where the requested percentile falls.

If you drop the `le` label during aggregation:

- All buckets get summed together into a single number
- `histogram_quantile()` has no bucket boundaries to interpolate between
- The result will be meaningless or produce an error

```promql
# WRONG — le is dropped, buckets are merged:
histogram_quantile(0.99, sum by (service)(rate(metric_bucket[5m])))

# RIGHT — le is preserved, buckets remain separate:
histogram_quantile(0.99, sum by (le, service)(rate(metric_bucket[5m])))
```

4. Explain the difference between `quantile()` and `histogram_quantile()`. When would you use each?
Answer:

- `quantile(phi, instant_vector)`: Computes the phi-quantile across series — it takes a set of instant vector values and finds the value at the given quantile.
  - Example: `quantile(0.95, rate(http_requests_total[5m]))` — “the 95th percentile of request rates across all pods”
  - Input: instant vector (multiple series, each with one value)
- `histogram_quantile(phi, instant_vector)`: Computes the phi-quantile within a histogram, using cumulative bucket counts to estimate the value at the given quantile.
  - Example: `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` — “the request duration that 95% of requests were faster than”
  - Input: instant vector that MUST contain `le` labels (histogram buckets)

Use `quantile()` when you want a percentile across a set of gauge-like values (e.g., “what’s the 95th percentile CPU usage across my fleet?”).

Use `histogram_quantile()` when computing latency percentiles or any distribution metric stored as a histogram.
5. You have two metrics: `container_memory_usage_bytes` (labels: namespace, pod, container) and `kube_pod_info` (labels: namespace, pod, node, host_ip). Write a query that shows memory usage per node.
Answer:
```promql
sum by (node)(
  container_memory_usage_bytes{container!=""}
  * on(namespace, pod) group_left(node)
  kube_pod_info
)
```

Explanation:

- `container_memory_usage_bytes` has no `node` label
- `kube_pod_info` has the `node` label and can be joined on `namespace, pod`
- `on(namespace, pod)` specifies the join keys
- `group_left(node)` brings the `node` label from the right side (many-to-one: many containers per pod info)
- `sum by (node)` aggregates the result by node
- `container!=""` excludes the pause container
6. What is a subquery? Write a subquery that finds the maximum error rate over the last 6 hours, sampled every 5 minutes.
Answer:

A subquery evaluates an instant vector expression repeatedly over a range at a specified resolution, producing a range vector. This range vector can then be passed to `*_over_time()` functions.

Syntax: `<instant_query>[<range>:<resolution>]`

```promql
max_over_time(
  (
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    /
    sum(rate(http_requests_total[5m]))
  )[6h:5m]
)
```

This evaluates the error rate expression every 5 minutes over the past 6 hours, then takes the maximum of all those values. Useful for answering: “What was the peak error rate in the last 6 hours?”

Note: Subqueries are expensive because Prometheus must evaluate the inner expression at every step. For frequently-used subqueries, consider a recording rule instead.
7. What is the recording rule naming convention? Why do recording rules use colons while raw metrics use underscores?
Answer:

Naming convention: `level:metric:operations`

- `level`: The aggregation level (e.g., `job`, `instance`, `namespace`, `cluster`)
- `metric`: The base metric name
- `operations`: Functions applied (e.g., `rate5m`, `ratio`)

Examples:

- `job:http_requests:rate5m` — request rate aggregated to job level
- `instance:node_cpu:ratio` — CPU ratio at instance level
- `namespace:container_memory:sum` — memory summed per namespace

Why colons? Convention separates raw metrics (underscores only) from computed metrics (colons). When you see `job:http_requests:rate5m`, you immediately know:

- This is a recording rule, not a raw metric
- It’s aggregated to the job level
- It’s a 5-minute rate

This makes it easy to audit which metrics are “real” vs. pre-computed.
8. Given a 15-second scrape interval, explain why `rate(metric[30s])` might give inaccurate results. What range should you use?
Answer:

With a 15-second scrape interval, a 30-second range window will contain at most 2-3 data points (depending on alignment):

```
Time:    0s    15s    30s    45s    60s
Scrape:  |      |      |      |      |
         [--- 30s window ---]

Best case:  3 points (0s, 15s, 30s)
Worst case: 2 points (if the window doesn't align perfectly)
```

Problems:

- With only 2 points, `rate()` becomes equivalent to `irate()` — it’s just the slope between two points with no smoothing
- A single anomalous scrape will dominate the result
- If one scrape is missed (network blip), you might have only 1 point — `rate()` returns nothing

Minimum recommended range: 4 × scrape_interval = 60s

```promql
rate(metric[1m])   # 4 points minimum — acceptable
rate(metric[5m])   # ~20 points — good smoothing, standard choice
```

The `[5m]` range is the de facto standard because it balances responsiveness with stability.
Hands-On Exercise: PromQL Workout
Practice PromQL on a live Prometheus instance with real metrics.
```bash
# Create a kind cluster if you don't have one
kind create cluster --name promql-lab

# Install kube-prometheus-stack (includes Prometheus, Grafana, node-exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set prometheus.prometheusSpec.scrapeInterval=15s

# Wait for all pods to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/instance=monitoring \
  -n monitoring --timeout=120s
```
Step 1: Access Prometheus UI

```bash
# Port-forward to Prometheus
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090

# Open http://localhost:9090 in your browser
```
Step 2: Selector Practice

Type these queries in the Prometheus UI expression browser:

```promql
# 1. Find all targets
up

# 2. Filter by job
up{job="kubelet"}

# 3. Regex: find all kube-state-metrics series starting with "kube_pod"
{__name__=~"kube_pod.*"}

# 4. Negative filter: all HTTP metrics except health checks
{__name__=~"http.*", handler!~"/health|/ready"}
```
Step 3: Rate Function Practice

```promql
# 5. CPU usage rate per node
rate(node_cpu_seconds_total{mode!="idle"}[5m])

# 6. Compare rate vs irate (use the Graph tab to see the difference)
rate(node_cpu_seconds_total{mode="user", cpu="0"}[5m])
irate(node_cpu_seconds_total{mode="user", cpu="0"}[5m])

# 7. Network bytes received (increase over 1 hour)
increase(node_network_receive_bytes_total{device="eth0"}[1h])
```
Step 4: Aggregation Practice

```promql
# 8. Total CPU usage per node (aggregate across CPU cores)
sum by (instance)(rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# 9. Top 3 pods by memory usage
topk(3, container_memory_usage_bytes{container!=""})

# 10. Count running pods per namespace
count by (namespace)(kube_pod_status_phase{phase="Running"})

# 11. Average memory usage per namespace
avg by (namespace)(container_memory_usage_bytes{container!=""})
```
Step 5: Binary Operators and Joins

```promql
# 12. Memory utilization percentage per container
container_memory_usage_bytes{container!=""}
/ on(namespace, pod, container)
container_spec_memory_limit_bytes{container!=""}
* 100

# 13. Node CPU utilization (1 - idle ratio)
1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m]))

# 14. Filesystem usage percentage per node
(node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"})
/
node_filesystem_size_bytes{mountpoint="/"}
* 100
```
Step 6: Histogram Queries

```promql
# 15. P99 API server request latency
histogram_quantile(0.99, sum by (le)(rate(apiserver_request_duration_seconds_bucket[5m])))

# 16. P50 vs P99 comparison — add both to the Graph tab
histogram_quantile(0.5, sum by (le)(rate(apiserver_request_duration_seconds_bucket[5m])))

# 17. Average request duration from histogram
rate(apiserver_request_duration_seconds_sum[5m])
/
rate(apiserver_request_duration_seconds_count[5m])
```
Success Criteria

You’ve completed this exercise when you can:

- Write instant and range vector selectors with all four matcher types
- Use `rate()`, `irate()`, and `increase()` and explain when to use each
- Aggregate with `sum by`, `avg by`, `topk` and understand `by` vs `without`
- Compute histogram percentiles with `histogram_quantile()`
- Join two metrics using `on()` and `group_left()`
- Explain why the 4x scrape_interval rule matters for `rate()` ranges
- Convert a complex query into a recording rule with proper naming
Key Takeaways
Before moving on, ensure you understand:

- Selector types: Instant vectors (one sample per series) vs range vectors (multiple samples per series over a time window)
- Label matchers: `=`, `!=`, `=~`, `!~` — regex uses RE2 syntax, and every selector needs at least one non-empty matcher
- rate() vs irate() vs increase(): `rate` = smoothed per-second average (alerting), `irate` = instantaneous (debugging), `increase` = total count over range
- 4x rule: The range for `rate()` should be >= 4 times the scrape interval
- Aggregation operators: `sum`, `avg`, `min`, `max`, `count`, `topk`, `quantile` with `by`/`without` for grouping
- Binary operators: Arithmetic, comparison, and logical operators with vector matching via `on()`/`ignoring()` and `group_left()`/`group_right()`
- histogram_quantile(): Needs the `le` label, uses linear interpolation between buckets, and bucket selection affects accuracy
- Recording rules: `level:metric:operations` naming convention with colons; pre-compute expensive queries
- Subqueries: `expr[range:resolution]` syntax for evaluating an expression over time; feed the result to `*_over_time()` functions
Further Reading
- PromQL Cheat Sheet — Quick reference by Julius Volz
- Prometheus Querying Documentation — Official PromQL reference
- PromLabs Blog — Deep PromQL articles
- Robust Perception Blog — Brian Brazil’s PromQL patterns
- Recording Rules Best Practices — Official conventions
Next Module
Continue to Module 2: Instrumentation & Alerting to learn about client libraries, metric naming, exporters, and Alertmanager configuration.