
Module 2.2: Control Groups (cgroups)

Linux Foundations | Complexity: [MEDIUM] | Time: 30-35 min

Before starting this module:


After this module, you will be able to:

  • Configure cgroup resource limits for CPU, memory, and I/O
  • Explain how Kubernetes requests and limits map to cgroup settings
  • Diagnose OOMKilled containers by reading cgroup memory accounting
  • Compare cgroups v1 and v2 and explain why the industry is migrating to v2

While namespaces provide isolation (what a process can see), cgroups provide limits (how much a process can use).

Every Kubernetes resource request and limit, every Docker memory constraint, every container CPU throttle—they all use cgroups.

Understanding cgroups helps you:

  • Debug OOM kills — Why did my container get killed?
  • Tune resource limits — Set appropriate requests and limits
  • Understand throttling — Why is my container slow but not using 100% CPU?
  • Troubleshoot systemd — Services use cgroups too

When a pod is evicted for memory pressure or your application is mysteriously slow, cgroups are involved.


  • cgroups were originally developed by Google engineers Paul Menage and Rohit Seth in 2006. They wanted a way to control resource usage in their massive data centers.

  • Kubernetes memory limits trigger the OOM killer — When a container exceeds its memory limit, the kernel’s Out-Of-Memory killer terminates it. There’s no gradual slowdown—it’s sudden death.

  • CPU limits use CFS bandwidth control — Your container might use less than 100% CPU even under load because it’s being “throttled” for using too much CPU in a given period.

  • cgroups v2 is now the standard: Kubernetes support for cgroups v2 has been stable since 1.25, and current mainstream distributions boot with v2 by default. It provides a unified hierarchy and better resource control, but some older tools may not support it.


Control groups (cgroups) organize processes into hierarchical groups whose resource usage can be limited, monitored, and controlled.

┌─────────────────────────────────────────────────────────────────┐
│ CGROUP HIERARCHY │
│ │
│ / (root) │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ system │ │ user │ │ kubepods │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ┌────┴────┐ ... ┌─────┴─────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌──────────┐ │
│ │ sshd │ │ docker │ │ burstable│ │guaranteed│ │
│ │ 512 MB │ │ 2 GB │ │ │ │ │ │
│ └─────────┘ └─────────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ pod-abc... pod-xyz... │
└─────────────────────────────────────────────────────────────────┘

| Resource | Controller | What It Controls |
|----------|------------|------------------|
| CPU | cpu, cpuset | CPU time, CPU cores |
| Memory | memory | RAM usage, swap |
| I/O | io (v2), blkio (v1) | Disk bandwidth |
| PIDs | pids | Number of processes |
| Network | net_cls, net_prio | Network priority (limited) |
| Devices | devices | Access to devices |
| Freezer | freezer | Suspend/resume processes |

Critical: Kubernetes 1.35+ Requires cgroup v2

Starting with Kubernetes 1.35 (December 2025), cgroup v1 support is disabled by default. The kubelet will fail to start on cgroup v1 nodes. If you’re running Kubernetes, you must be on cgroup v2. Check with: stat -fc %T /sys/fs/cgroup — must return cgroup2fs. Affected OS versions: CentOS 7, RHEL 7, Ubuntu 18.04.
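
The check above can be wrapped in a small helper for node preflight scripts. This is only a sketch: the function name `cgroup_version_from_fstype` is hypothetical, and it assumes that a v1 (or hybrid) layout reports `tmpfs` at `/sys/fs/cgroup`, which is the usual result but not something this module states.

```shell
# Map the filesystem type reported for /sys/fs/cgroup to a cgroup version.
# "cgroup2fs" is the unified v2 hierarchy; "tmpfs" is assumed to mean v1 or hybrid.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# On a real node, feed it the live value (prints "unknown" if the path is missing).
cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
```

Keeping the classification separate from the `stat` call makes the logic easy to check without touching a real node.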

cgroups v1 had separate hierarchies per controller:

# v1: Multiple hierarchies (messy)
/sys/fs/cgroup/
├── cpu/                  ← CPU hierarchy
│   └── docker/
│       └── container1/
├── memory/               ← Memory hierarchy
│   └── docker/
│       └── container1/
└── pids/                 ← PIDs hierarchy
    └── docker/
        └── container1/

Each controller had its own tree, leading to:

  • Complex management
  • Inconsistent process groupings
  • No single place to see a process’s resources

# v2: Single hierarchy (clean)
/sys/fs/cgroup/
└── docker/
    └── container1/
        ├── cpu.max       ← CPU settings
        ├── memory.max    ← Memory settings
        └── pids.max      ← PIDs settings

# Check cgroup version
mount | grep cgroup
# v1 shows: cgroup on /sys/fs/cgroup/cpu type cgroup
# v2 shows: cgroup2 on /sys/fs/cgroup type cgroup2
# Or check directly
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null && echo "v2" || echo "v1 or mixed"

| Feature | v1 | v2 |
|---------|----|----|
| Hierarchy | Multiple (per controller) | Single (unified) |
| Process membership | Can be in different groups per controller | One group for all controllers |
| Memory pressure | Not available | Available (memory.pressure) |
| I/O control | Limited (blkio) | Better (io) |
| Kubernetes support | Legacy | Default (1.25+) |

Stop and think: If a Java application with a 512MB heap size is placed in a container with a 512MB cgroup memory limit, it will almost certainly be OOMKilled. Why? Consider what else inside the container’s environment or the JVM process requires memory beyond just the allocated heap space.

┌────────────────────────────────────────────────────────────────┐
│ CONTAINER MEMORY │
│ │
│ memory.max = 512MB │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Used: 400MB │ │
│ │ ███████████████████████████████████████░░░░░░░░░░░░░░ │ │
│ │ ◄──────── 400MB ─────────►│◄── 112MB ──► │ │
│ │ Used │ Available │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ If usage reaches 512MB → OOM KILL │
└────────────────────────────────────────────────────────────────┘

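When usage hits the limit and the kernel cannot reclaim enough memory:

  1. The kernel invokes the OOM killer for that cgroup
  2. A process in the cgroup is killed immediately (SIGKILL)
  3. The container is restarted, subject to its restart policy
  4. No graceful shutdown is possible
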
# Check if OOM killed
dmesg | grep -i "oom"
# or
journalctl -k | grep -i "oom"
# In Kubernetes
kubectl describe pod <pod-name> | grep -i oom
# Look for: OOMKilled
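
On cgroups v2 the kernel also keeps a per-cgroup counter of OOM kills in the `memory.events` file, which survives after the log lines have rotated away. The sketch below reads that counter; the function name `oom_kill_count` is hypothetical, and the demo uses made-up sample data rather than a live cgroup.

```shell
# Print the oom_kill counter from a cgroup v2 memory.events file.
# Usage: oom_kill_count /sys/fs/cgroup/<path>/memory.events
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) print 0 }' "$1"
}

# Demo on sample data shaped like a memory.events file (values are invented).
sample=$(mktemp)
printf 'low 0\nhigh 0\nmax 14\noom 2\noom_kill 2\n' > "$sample"
oom_kill_count "$sample"   # prints 2
rm -f "$sample"
```

A non-zero value means at least one process in that cgroup was killed for exceeding the memory limit.
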
# v2
cat /sys/fs/cgroup/user.slice/user-1000.slice/memory.max
cat /sys/fs/cgroup/user.slice/user-1000.slice/memory.current
# For a container (path varies)
# Find container cgroup
find /sys/fs/cgroup -name "memory.max" 2>/dev/null | head -5
# v2 memory statistics
cat /sys/fs/cgroup/user.slice/memory.stat
# Key values:
# anon - anonymous memory (heap, stack)
# file - file cache
# kernel - kernel memory
# shmem - shared memory
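
To see how close a cgroup is to an OOM kill, `memory.current` can be compared against `memory.max`. The following is a minimal sketch with a hypothetical helper name, `memory_usage_percent`; the demo builds a fake cgroup directory that mirrors the 400MB of 512MB picture from the diagram earlier in this section.

```shell
# Print memory usage as a percentage of the limit for one cgroup v2 directory.
# Prints "unlimited" when memory.max contains the literal string "max".
memory_usage_percent() {
    dir=$1
    limit=$(cat "$dir/memory.max")
    current=$(cat "$dir/memory.current")
    if [ "$limit" = "max" ]; then
        echo "unlimited"
    else
        awk -v c="$current" -v l="$limit" 'BEGIN { printf "%.1f\n", c * 100 / l }'
    fi
}

# Demo with a fake cgroup directory: 400 MiB used out of a 512 MiB limit.
demo=$(mktemp -d)
echo 536870912 > "$demo/memory.max"
echo 419430400 > "$demo/memory.current"
memory_usage_percent "$demo"   # prints 78.1
rm -rf "$demo"
```

On a real host, pass a cgroup directory such as the `user.slice` path used above.
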

Pause and predict: If you set a CPU limit of 0.5 (500m) for a single-threaded Node.js application, and it receives a massive spike in traffic, what will happen to the response time? Will the container crash, or will something else occur at the kernel level?

Unlike memory, CPU doesn’t trigger kills—it throttles.

┌────────────────────────────────────────────────────────────────┐
│ CPU CFS BANDWIDTH │
│ │
│ Period: 100ms │
│ Quota: 50ms (50% of one CPU = "500m" in Kubernetes) │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Time: 0ms 100ms │ │
│ │ ├─────────────────────────────────────────────────────►│ │
│ │ │ │
│ │ ████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░ │ │
│ │ ◄──── Running (50ms) ────►│◄──── Throttled ────────► │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ After using 50ms in 100ms period → throttled until next period│
└────────────────────────────────────────────────────────────────┘

| Kubernetes | Meaning | cgroup quota/period |
|------------|---------|---------------------|
| 1 | 1 full CPU | 100000/100000 |
| 500m | 0.5 CPU (50%) | 50000/100000 |
| 100m | 0.1 CPU (10%) | 10000/100000 |
| 2 | 2 full CPUs | 200000/100000 |
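
The arithmetic behind the table is quota = millicores × period / 1000. As a rough sketch, with the hypothetical helper name `millicores_to_cpu_max` and the 100000 µs period assumed as the default:

```shell
# Convert a Kubernetes CPU limit in millicores to a cpu.max line
# ("quota period", both in microseconds). The period defaults to 100000.
millicores_to_cpu_max() {
    millicores=$1
    period=${2:-100000}
    echo "$(( millicores * period / 1000 )) $period"
}

millicores_to_cpu_max 500    # prints: 50000 100000
millicores_to_cpu_max 2000   # prints: 200000 100000
```

A limit above 1000m simply yields a quota larger than the period, which lets the cgroup run on more than one core at a time.
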
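# v2 CPU controls (cpu.max exists only on non-root cgroups)
cat /sys/fs/cgroup/user.slice/cpu.max
# Format: quota period
# "50000 100000" = 50ms per 100ms = 50%
# "max 100000" = no quota (unlimited)
# Check throttling stats (v2)
cat /sys/fs/cgroup/user.slice/cpu.stat
# Look for:
# nr_periods - number of enforcement periods elapsed
# nr_throttled - number of periods in which the group was throttled
# throttled_usec - total time throttled (microseconds)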
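
A raw `nr_throttled` count is hard to judge on its own; dividing it by `nr_periods` (another field in the same `cpu.stat` file) gives the share of periods in which the cgroup ran out of quota. The sketch below uses a hypothetical function name, `throttled_percent`, and invented sample values.

```shell
# Print the percentage of CFS periods in which the cgroup was throttled.
# Usage: throttled_percent /sys/fs/cgroup/<path>/cpu.stat
throttled_percent() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0) printf "%.1f\n", throttled * 100 / periods
            else print "0.0"
        }' "$1"
}

# Demo on sample data shaped like a cpu.stat file (values are invented).
sample=$(mktemp)
printf 'usage_usec 9000000\nnr_periods 200\nnr_throttled 50\nthrottled_usec 2500000\n' > "$sample"
throttled_percent "$sample"   # prints 25.0
rm -f "$sample"
```

A persistently high percentage suggests the CPU limit is too low for the workload, even if average CPU usage looks modest.
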
# Run a CPU-intensive process
stress --cpu 1 --timeout 30 &
# Watch throttling (in another terminal)
watch -n1 'cat /sys/fs/cgroup/user.slice/cpu.stat | grep throttled'

resources:
  requests:            # Scheduling guarantee
    memory: "256Mi"
    cpu: "250m"
  limits:              # Hard limit (cgroup)
    memory: "512Mi"
    cpu: "500m"

| Setting | Purpose | cgroup Behavior |
|---------|---------|-----------------|
| request.memory | Scheduling | Not directly enforced by cgroup |
| request.cpu | Scheduling | Sets cpu.weight (shares) |
| limit.memory | Hard limit | memory.max |
| limit.cpu | Throttling | cpu.max |

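
To make the mapping concrete, the sketch below converts the example values into the numbers that would land in the cgroup files. The helper names are hypothetical. The shares-to-weight formula is an assumption: it is the linear mapping long used by container runtimes, and newer runtime versions may compute `cpu.weight` differently, so treat the result as illustrative.

```shell
# CPU request in millicores -> v1-style cpu.shares (1024 per core, minimum 2).
millicores_to_shares() {
    shares=$(( $1 * 1024 / 1000 ))
    if [ "$shares" -lt 2 ]; then
        shares=2
    fi
    echo "$shares"
}

# cpu.shares (2..262144) -> cpu.weight (1..10000).
# Assumed linear mapping; your runtime version may use a different formula.
shares_to_weight() {
    echo $(( 1 + ( ($1 - 2) * 9999 ) / 262142 ))
}

# Memory quantity in Mi -> bytes, as written to memory.max.
mebibytes_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

shares_to_weight "$(millicores_to_shares 250)"   # prints 10
mebibytes_to_bytes 512                            # prints 536870912
```

The weight matters only under contention: it sets the relative share of CPU among sibling cgroups, whereas `cpu.max` is an absolute ceiling.
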
┌─────────────────────────────────────────────────────────────────┐
│ KUBEPODS CGROUP HIERARCHY │
│ │
│ /sys/fs/cgroup/kubepods.slice/ │
│ │ │
│ ├── kubepods-burstable.slice/ ← Burstable pods │
│ │ └── kubepods-burstable-pod<uid>/ ← Individual pod │
│ │ └── cri-containerd-<id>/ ← Container │
│ │ │
│ ├── kubepods-besteffort.slice/ ← BestEffort pods │
│ │ └── ... │
│ │ │
│ └── kubepods-pod<uid>/ ← Guaranteed pods │
│ └── ... (directly under kubepods)│
└─────────────────────────────────────────────────────────────────┘
# Find cgroup for a container
# 1. Get container ID
docker ps
crictl ps
# 2. Find its cgroup
cat /proc/<container-pid>/cgroup
# 3. Or search
find /sys/fs/cgroup -name "*<container-id-prefix>*" 2>/dev/null
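
On cgroups v2, the line printed by `/proc/<pid>/cgroup` has the form `0::/<path>`, and the matching directory is that path appended to the cgroup mount point. A small sketch of the translation follows; the function name `cgroup_dir_from_proc_line` and the container ID `abc123` are hypothetical.

```shell
# Turn the v2 entry of /proc/<pid>/cgroup ("0::/some/path") into the
# corresponding directory under the cgroup mount (default /sys/fs/cgroup).
cgroup_dir_from_proc_line() {
    line=$1
    mount_point=${2:-/sys/fs/cgroup}
    case "$line" in
        0::*) echo "${mount_point}${line#0::}" ;;
        *)    echo "not a cgroup v2 entry: $line" >&2; return 1 ;;
    esac
}

cgroup_dir_from_proc_line "0::/system.slice/docker-abc123.scope"
# prints /sys/fs/cgroup/system.slice/docker-abc123.scope
```

For a live process, the input could come from `grep '^0::' /proc/<container-pid>/cgroup`; files such as `cpu.max` and `memory.max` then sit directly inside the printed directory.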

systemd uses cgroups extensively for service management.

# See cgroup hierarchy
systemd-cgls
# Resource usage by service
systemd-cgtop
# Specific service resources
systemctl show docker.service | grep -E "(Memory|CPU)"

# /etc/systemd/system/myapp.service
[Service]
MemoryMax=512M
CPUQuota=50%
TasksMax=100

# Apply
systemctl daemon-reload
systemctl restart myapp

# Set runtime limit
sudo systemctl set-property docker.service MemoryMax=2G
# View current settings
systemctl show docker.service -p MemoryMax

| Mistake | Problem | Solution |
|---------|---------|----------|
| No memory limit | Pod can consume all node memory | Always set memory limits |
| Limit = Request | No burst capacity | Set limit > request for burstable |
| Ignoring throttling | Slow application, blame network | Check cpu.stat for throttling |
| OOM without logs | Don’t know why container died | Check dmesg, set proper logging |
| Not understanding v1 vs v2 | Tooling differences | Check version, use appropriate paths |
| Too low CPU limit | Constant throttling | Monitor and adjust based on usage |

You are deploying a critical database pod and set its memory request to 4GB and memory limit to 8GB. During a sudden traffic spike, the pod’s memory usage reaches 6GB, and the Node it is running on experiences severe memory pressure, running out of allocatable RAM. What will the kubelet and the kernel do to this pod, and why?

Answer:

The pod is at risk of being evicted or OOMKilled even though it is under its 8GB limit. Why? When a node comes under memory pressure, the kubelet evicts pods to reclaim memory. Because the request (4GB) is lower than the limit (8GB), the pod is in the “Burstable” QoS class, and at 6GB it is consuming 2GB more than its guaranteed baseline. The kubelet eviction manager and the kernel’s OOM killer both target pods that exceed their requests before touching pods that stay within them. Setting requests equal to limits (Guaranteed QoS) would have protected this critical database from being among the first victims.

Your team deploys a video encoding application. The developers complain that the application is running slowly, but when they check monitoring tools, the container is only using 40% of the node’s CPU capacity. They insist the node must have a hardware issue. How do you explain this behavior using cgroups?

Answer:

The container is experiencing CPU throttling enforced by the Completely Fair Scheduler (CFS) bandwidth control in cgroups. Why? The deployment likely has a CPU limit set (e.g., limit: 400m on a 1-core node). The cgroup translates this limit into a specific quota of CPU time allowed per period (usually 100ms). Once the video encoder uses up its allotted quota (e.g., 40ms) within that period, the kernel pauses the process until the next period begins. This manifests as the application artificially running slowly without ever reaching 100% host CPU utilization, as the cgroup restricts its access to the physical cores.

A developer sets a memory limit of 512MB for a Python data processing container. The application attempts to load a 600MB dataset entirely into RAM. The developer expects the application to throw a catchable MemoryError exception so they can log it and gracefully exit. Instead, the container simply vanishes and restarts. What kernel mechanism caused this, and why didn’t the application catch the error?

Answer:

The kernel’s Out-Of-Memory (OOM) killer terminated the container abruptly. Why? cgroup memory limits are a hard boundary enforced by the Linux kernel, not by the language runtime. Python raises MemoryError only when an allocation call fails, but under a cgroup limit allocations generally still succeed; the limit is enforced when the pages are actually touched. When the container’s footprint would exceed memory.max and the kernel cannot reclaim enough memory, it sends SIGKILL to the process. SIGKILL cannot be caught, blocked, or handled by application code (unlike an exception raised by the runtime), so the Python process never gets a chance to log the error or shut down gracefully before Kubernetes or Docker restarts the container.

You are upgrading your Kubernetes cluster to a version that enforces cgroups v2. A legacy monitoring daemonset in your cluster fails to start, complaining that it cannot find /sys/fs/cgroup/memory/memory.usage_in_bytes. What architectural change between cgroups v1 and v2 is causing this failure?

Answer:

The monitoring tool is failing because cgroups v2 uses a unified hierarchy, whereas cgroups v1 used separate hierarchies for every resource controller. Why? In cgroups v1, CPU, memory, and PIDs were mounted in different directory trees (e.g., /sys/fs/cgroup/memory/... and /sys/fs/cgroup/cpu/...), allowing a process to belong to different groups for different resources. cgroups v2 simplifies this by placing a process in exactly one cgroup path (e.g., /sys/fs/cgroup/user.slice/...), and all resource controllers (memory, CPU, I/O) are managed via files in that single directory (like memory.current and cpu.max). The legacy tool is hardcoded to look for the v1 split-directory structure and specific v1 filenames, which no longer exist in a v2 environment.

You are investigating a node where a specific Docker container is mysteriously running very slowly. You want to manually check the raw kernel values to see if the container’s CPU quota has been artificially restricted. Walk through the exact steps you would take on the host system to find this container’s specific CPU limit configuration in cgroups v2.

Answer:

You must first find the process ID (PID) of the container and then trace it to its cgroup path. Why? Containers are just isolated processes to the kernel, so their cgroup configurations are tied to their PID. First, you would run docker inspect <container_name> --format '{{.State.Pid}}' to get the host PID. Next, you read /proc/<PID>/cgroup to discover the exact unified cgroup path assigned to that process (e.g., 0::/system.slice/docker-<id>.scope). Finally, you append that path to the cgroup mount point (/sys/fs/cgroup) and inspect the cpu.max file (e.g., cat /sys/fs/cgroup/system.slice/docker-<id>.scope/cpu.max) to see the raw quota and period values causing the throttling.


Objective: Understand cgroup structure, limits, and throttling.

Environment: Linux system with cgroups (v1 or v2)

# 1. Check mount type
mount | grep cgroup
# 2. Check for v2
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "cgroups v2"
    cat /sys/fs/cgroup/cgroup.controllers
else
    echo "cgroups v1 or mixed"
    ls /sys/fs/cgroup/
fi
# 1. Your process's cgroup
cat /proc/$$/cgroup
# 2. View hierarchy (v2)
ls /sys/fs/cgroup/
# 3. See systemd slices
systemd-cgls | head -50
# 4. Resource usage
systemd-cgtop
# 1. Find memory settings (v2)
cat /sys/fs/cgroup/user.slice/memory.max 2>/dev/null || \
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null
# 2. Current usage
cat /sys/fs/cgroup/user.slice/memory.current 2>/dev/null || \
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes 2>/dev/null
# 3. Memory statistics
cat /sys/fs/cgroup/user.slice/memory.stat 2>/dev/null | head -10
# 1. CPU settings (v2)
cat /sys/fs/cgroup/user.slice/cpu.max 2>/dev/null
# 2. CPU statistics
cat /sys/fs/cgroup/user.slice/cpu.stat 2>/dev/null
# 3. Check for throttling
cat /sys/fs/cgroup/user.slice/cpu.stat 2>/dev/null | grep throttled

Part 5: Create a cgroup (v2, requires root)

# 1. Create a test cgroup
sudo mkdir /sys/fs/cgroup/test-cgroup
# 2. Enable controllers for children of the root cgroup
#    (write to the parent's subtree_control; a cgroup that delegates
#    controllers to its own children cannot hold processes itself)
echo "+memory +cpu" | sudo tee /sys/fs/cgroup/cgroup.subtree_control
# 3. Set memory limit (100MB)
echo "104857600" | sudo tee /sys/fs/cgroup/test-cgroup/memory.max
# 4. Check it
cat /sys/fs/cgroup/test-cgroup/memory.max
# 5. Move current shell to this cgroup
echo $$ | sudo tee /sys/fs/cgroup/test-cgroup/cgroup.procs
# 6. Verify
cat /proc/$$/cgroup
# 7. Check memory usage
cat /sys/fs/cgroup/test-cgroup/memory.current
# 8. Exit the shell to leave the cgroup, then clean up from another shell
#    (rmdir fails while the cgroup still contains processes)
exit
sudo rmdir /sys/fs/cgroup/test-cgroup

  • Identified cgroup version on your system
  • Explored the cgroup hierarchy
  • Found memory and CPU settings
  • Understood throttling statistics
  • (Optional) Created and tested a custom cgroup

  1. cgroups limit resources — While namespaces isolate views, cgroups enforce limits

  2. Memory limits are fatal — Exceeding them triggers OOM kill (SIGKILL)

  3. CPU limits cause throttling — Process is paused, not killed

  4. v2 is the future — Single hierarchy, better features, Kubernetes default

  5. Kubernetes uses cgroups — Every limit you set becomes a cgroup configuration


In Module 2.3: Capabilities & LSMs, you’ll learn how Linux provides fine-grained privilege control beyond just root/non-root.