Module 4.1: Security Contexts
Complexity:
[MEDIUM]- Core CKS skillTime to Complete: 45-50 minutes
Prerequisites: CKA pod spec knowledge, basic Linux security concepts
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Configure pod and container security contexts to enforce non-root, read-only, and drop-all-capabilities
- Audit workloads for dangerous security context settings like privileged mode and hostPID
- Implement defense-in-depth security contexts combining runAsNonRoot, readOnlyRootFilesystem, and capability drops
- Debug application failures caused by security context restrictions
Why This Module Matters
Section titled “Why This Module Matters”Security contexts define privilege and access control settings for pods and containers. They’re your first line of defense against container breakout—controlling whether containers run as root, can access host resources, or have elevated privileges.
CKS heavily tests security context configuration.
What Security Contexts Control
Section titled “What Security Contexts Control”┌─────────────────────────────────────────────────────────────┐│ SECURITY CONTEXT SCOPE │├─────────────────────────────────────────────────────────────┤│ ││ Pod-Level (spec.securityContext): ││ ├── runAsUser - UID for all containers ││ ├── runAsGroup - GID for all containers ││ ├── fsGroup - GID for volume ownership ││ ├── runAsNonRoot - Prevent running as root ││ ├── supplementalGroups - Additional group memberships ││ ├── seccompProfile - Seccomp profile ││ └── sysctls - Kernel parameters ││ ││ Container-Level (containers[].securityContext): ││ ├── runAsUser - Override pod-level UID ││ ├── runAsGroup - Override pod-level GID ││ ├── runAsNonRoot - Container-specific check ││ ├── privileged - Full host access (dangerous!) ││ ├── allowPrivilegeEscalation - Prevent privilege gain ││ ├── capabilities - Linux capabilities ││ ├── readOnlyRootFilesystem - Immutable container ││ ├── seccompProfile - Container-specific seccomp ││ └── seLinuxOptions - SELinux labels ││ ││ Container settings OVERRIDE pod settings ││ │└─────────────────────────────────────────────────────────────┘Critical Security Context Settings
Section titled “Critical Security Context Settings”runAsNonRoot
Section titled “runAsNonRoot”apiVersion: v1kind: Podmetadata: name: non-root-podspec: securityContext: runAsNonRoot: true # Pod-level enforcement containers: - name: app image: nginx securityContext: runAsUser: 1000 # Must specify non-root UID runAsGroup: 1000
# If image tries to run as root (UID 0), pod fails to start:# Error: container has runAsNonRoot and image will run as rootallowPrivilegeEscalation
Section titled “allowPrivilegeEscalation”apiVersion: v1kind: Podmetadata: name: no-escalationspec: containers: - name: app image: nginx securityContext: allowPrivilegeEscalation: false # Prevent setuid, setgid
# This prevents:# - setuid binaries from gaining privileges# - Container processes from becoming root# - Exploits that rely on privilege escalationStop and think: A container runs as root by default unless you explicitly set
runAsNonRoot: true. But many popular images (nginx, redis, postgres) are built to run as root. What happens when you enforcerunAsNonRoot: trueon these images, and what’s the proper fix?
readOnlyRootFilesystem
Section titled “readOnlyRootFilesystem”apiVersion: v1kind: Podmetadata: name: readonly-podspec: containers: - name: app image: nginx securityContext: readOnlyRootFilesystem: true # Can't write to container filesystem volumeMounts: - name: tmp mountPath: /tmp # Must provide writable volume for temp files - name: cache mountPath: /var/cache/nginx - name: run mountPath: /var/run volumes: - name: tmp emptyDir: {} - name: cache emptyDir: {} - name: run emptyDir: {}privileged (AVOID!)
Section titled “privileged (AVOID!)”# DON'T DO THIS in productionapiVersion: v1kind: Podmetadata: name: privileged-podspec: containers: - name: app image: nginx securityContext: privileged: true # Full access to host!
# privileged: true means:# - Access to all host devices# - Can load kernel modules# - Can modify iptables# - Can escape container entirely# - ONLY use for system-level daemons (CNI, CSI drivers)Linux Capabilities
Section titled “Linux Capabilities”┌─────────────────────────────────────────────────────────────┐│ LINUX CAPABILITIES │├─────────────────────────────────────────────────────────────┤│ ││ Capabilities split root powers into fine-grained units: ││ ││ CAP_NET_BIND_SERVICE - Bind to ports < 1024 ││ CAP_NET_ADMIN - Configure network interfaces ││ CAP_NET_RAW - Use raw sockets (ping, etc.) ││ CAP_SYS_ADMIN - Many syscalls (mount, etc.) ││ CAP_SYS_PTRACE - Debug other processes ││ CAP_CHOWN - Change file ownership ││ CAP_DAC_OVERRIDE - Bypass file permissions ││ CAP_SETUID/SETGID - Change UID/GID ││ CAP_KILL - Send signals to any process ││ ││ Default container capabilities (Docker): ││ CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, ││ SETGID, SETUID, SETPCAP, NET_BIND_SERVICE, NET_RAW, ││ SYS_CHROOT, MKNOD, AUDIT_WRITE, SETFCAP ││ ││ Best practice: Drop ALL, add only what's needed ││ │└─────────────────────────────────────────────────────────────┘Configuring Capabilities
Section titled “Configuring Capabilities”apiVersion: v1kind: Podmetadata: name: minimal-capsspec: containers: - name: app image: nginx securityContext: capabilities: drop: - ALL # Drop all capabilities add: - NET_BIND_SERVICE # Only add what's neededPause and predict: You set
capabilities: drop: ["ALL"]on a container that needs to bind to port 80. The container crashes with “permission denied” on startup. What single capability do you need to add back, and why is the “drop all, add back” approach safer than the default?
Complete Secure Pod Example
Section titled “Complete Secure Pod Example”apiVersion: v1kind: Podmetadata: name: hardened-podspec: securityContext: runAsNonRoot: true runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000 seccompProfile: type: RuntimeDefault containers: - name: app image: myapp:1.0 securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true capabilities: drop: - ALL resources: limits: memory: "128Mi" cpu: "500m" requests: memory: "64Mi" cpu: "250m" volumeMounts: - name: tmp mountPath: /tmp volumes: - name: tmp emptyDir: {}Pod vs Container Security Context
Section titled “Pod vs Container Security Context”apiVersion: v1kind: Podmetadata: name: mixed-contextspec: securityContext: runAsUser: 1000 # Default for all containers runAsGroup: 1000 containers: - name: app image: myapp # Inherits runAsUser: 1000 from pod - name: sidecar image: sidecar securityContext: runAsUser: 2000 # Overrides pod-level setting # This container runs as UID 2000, not 1000Real-World War Story: The Privileged Container Escape
Section titled “Real-World War Story: The Privileged Container Escape”In a well-documented incident at a major tech company, a security team investigating a breached Kubernetes cluster discovered that attackers had completely compromised a worker node by escaping a seemingly harmless application container. The root cause? During initial troubleshooting months earlier, developers had temporarily set privileged: true in the pod’s security context to “make it work quickly” and forgot to remove it before deploying to production.
When the attackers found a basic remote code execution (RCE) vulnerability in the web application, they dropped into a shell. Because the container was privileged, it had all Linux capabilities enabled, including CAP_SYS_MODULE and CAP_SYS_ADMIN. The attackers compiled a malicious Linux kernel module, loaded it directly into the host’s kernel from inside the container, and shattered the container’s isolation boundary. From the host, they accessed the kubelet’s TLS certificates and pivoted to compromise the entire cluster.
Stop and think: If the developers had instead run the container as root (
runAsUser: 0) but withoutprivileged: trueand with default capabilities, would the attackers have been able to load the kernel module?
The Lesson: If the developers had used capabilities: drop: ["ALL"] and explicitly added only the specific network capability they were trying to troubleshoot, the CAP_SYS_MODULE capability would have been dropped. The kernel module injection would have failed, trapping the attackers inside the container. Never use privileged: true as a shortcut for missing capabilities.
Real Exam Scenarios
Section titled “Real Exam Scenarios”Scenario 1: Fix Pod Running as Root
Section titled “Scenario 1: Fix Pod Running as Root”# Before (insecure)apiVersion: v1kind: Podmetadata: name: insecure-podspec: containers: - name: app image: nginx
# After (secure)apiVersion: v1kind: Podmetadata: name: secure-podspec: securityContext: runAsNonRoot: true runAsUser: 1000 containers: - name: app image: nginx securityContext: allowPrivilegeEscalation: falseScenario 2: Add Minimal Capabilities
Section titled “Scenario 2: Add Minimal Capabilities”# Pod needs to bind to port 80 but shouldn't run as rootcat <<EOF | kubectl apply -f -apiVersion: v1kind: Podmetadata: name: web-serverspec: securityContext: runAsNonRoot: true runAsUser: 1000 containers: - name: nginx image: nginx securityContext: capabilities: drop: - ALL add: - NET_BIND_SERVICE # Allow binding to port 80 allowPrivilegeEscalation: falseEOFScenario 3: Make Filesystem Read-Only
Section titled “Scenario 3: Make Filesystem Read-Only”# Add read-only filesystem with required writable mountscat <<EOF | kubectl apply -f -apiVersion: v1kind: Podmetadata: name: readonly-nginxspec: containers: - name: nginx image: nginx securityContext: readOnlyRootFilesystem: true volumeMounts: - name: cache mountPath: /var/cache/nginx - name: run mountPath: /var/run volumes: - name: cache emptyDir: {} - name: run emptyDir: {}EOFPause and predict: You deploy a pod with
readOnlyRootFilesystem: trueandrunAsNonRoot: true. The application writes temporary files to/tmpand logs to/var/log/app/. Without any volume mounts, which writes will fail and why?
Debugging Security Context Issues
Section titled “Debugging Security Context Issues”# Check pod's effective security contextkubectl get pod mypod -o yaml | grep -A 20 securityContext
# Check container's security contextkubectl get pod mypod -o jsonpath='{.spec.containers[0].securityContext}' | jq .
# Check if pod failed due to security contextkubectl describe pod mypod | grep -i error
# Common errors:# "container has runAsNonRoot and image will run as root"# "unable to write to read-only filesystem"# "operation not permitted" (missing capability)Did You Know?
Section titled “Did You Know?”-
runAsNonRoot doesn’t pick a UID for you. If the image runs as root and you set runAsNonRoot: true without runAsUser, the pod fails to start.
-
Docker drops many capabilities by default. Containers don’t get the full root power set even when running as UID 0—but privileged: true gives everything back.
-
The nobody user is UID 65534 on most systems. It’s a common choice for runAsUser when you don’t have a specific user in mind.
-
fsGroup changes volume ownership. When you mount a volume, Kubernetes changes the group ownership to fsGroup and sets group-writable permissions.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Hurts | Solution |
|---|---|---|
| runAsNonRoot without runAsUser | Pod fails if image defaults to root | Specify runAsUser: 1000 |
| Using privileged: true | Full host access | Never use unless required |
| Not dropping capabilities | Excessive permissions | Drop ALL, add what’s needed |
| Forgetting emptyDir with readOnly | App can’t write temp files | Mount writable volumes |
| Only setting pod-level context | Container can override | Set at both levels |
-
You deploy an nginx pod with
runAsNonRoot: truebut without specifyingrunAsUser. The pod fails with “container has runAsNonRoot and image will run as root.” A developer says “just remove runAsNonRoot.” What’s the correct fix that maintains security?Answer
Don't remove `runAsNonRoot` -- add `runAsUser: 1000` (or any non-zero UID) to force the container to run as a non-root user. The nginx official image defaults to root, so `runAsNonRoot: true` correctly catches this. Better yet, use the `nginx:alpine` image with a non-root user configured, or build a custom image with `USER 1000` in the Dockerfile. The `runAsNonRoot` check is a safety net -- it validates at pod admission time that the container won't run as UID 0, preventing privilege escalation if an attacker compromises the application. -
During a security audit, you find a container running with all default Linux capabilities (14 capabilities including NET_RAW, SYS_CHROOT, and SETUID). The application only serves HTTP on port 8080. A penetration tester uses NET_RAW to perform ARP spoofing within the pod network. How should the security context be configured to prevent this?
Answer
Use `capabilities: drop: ["ALL"]` to remove all 14 default capabilities. Since the app runs on port 8080 (above 1024), it doesn't even need `NET_BIND_SERVICE`. The "drop all, add back only what's needed" approach is the gold standard. NET_RAW enables packet crafting and ARP spoofing, SETUID enables privilege escalation, and SYS_CHROOT can be used for container escape techniques. By dropping all capabilities, the penetration tester's ARP spoofing attack is blocked because NET_RAW is unavailable. Always combine with `allowPrivilegeEscalation: false` to prevent regaining capabilities through setuid binaries. -
Your application writes logs to
/var/log/app/, caches data in/tmp/, and needs to update/etc/nginx/conf.d/at startup. You setreadOnlyRootFilesystem: true. The pod crashes. What emptyDir volumes do you need, and is there a path you should NOT make writable?Answer
Mount emptyDir volumes at `/var/log/app/` and `/tmp/` for logs and cache. For `/etc/nginx/conf.d/`, mount an emptyDir there too if nginx must write configs at startup. However, be cautious about making `/etc/` paths writable -- an attacker who compromises the application could modify nginx configuration to redirect traffic or expose sensitive data. The better approach is to use an initContainer to generate configs into a shared emptyDir volume, then mount it read-only in the main container. `readOnlyRootFilesystem: true` is powerful because it prevents attackers from writing malware, scripts, or modified binaries to the container filesystem. -
A multi-container pod has security context set at both pod and container level. The pod-level sets
runAsUser: 1000, but one container setsrunAsUser: 0(root). Which setting takes effect for that container, and how do you prevent individual containers from overriding pod-level security?Answer
The container-level setting wins -- that container runs as root (UID 0) despite the pod-level setting of 1000. Container-level security context always overrides pod-level for the same field. To prevent this override, use Pod Security Admission with the `restricted` profile in `enforce` mode on the namespace. PSA validates the final effective security context (including container-level overrides) and rejects pods where any container runs as root. This is why security contexts alone are insufficient -- they can be set by anyone who can create pods. PSA provides the policy enforcement layer that prevents bypasses.
Hands-On Exercise
Section titled “Hands-On Exercise”Task: Create a fully hardened pod.
# Step 1: Create an insecure pod firstcat <<EOF | kubectl apply -f -apiVersion: v1kind: Podmetadata: name: insecurespec: containers: - name: app image: nginxEOF
# Step 2: Check its security contextkubectl get pod insecure -o yaml | grep -A 20 securityContext# (Likely empty or minimal)
# Step 3: Create hardened versioncat <<EOF | kubectl apply -f -apiVersion: v1kind: Podmetadata: name: hardenedspec: securityContext: runAsNonRoot: true runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000 seccompProfile: type: RuntimeDefault containers: - name: app image: nginx securityContext: allowPrivilegeEscalation: false readOnlyRootFilesystem: true capabilities: drop: - ALL add: - NET_BIND_SERVICE volumeMounts: - name: cache mountPath: /var/cache/nginx - name: run mountPath: /var/run - name: tmp mountPath: /tmp volumes: - name: cache emptyDir: {} - name: run emptyDir: {} - name: tmp emptyDir: {}EOF
# Step 4: Wait for podskubectl wait --for=condition=Ready pod/hardened --timeout=60s
# Step 5: Verify security contextkubectl get pod hardened -o jsonpath='{.spec.securityContext}' | jq .kubectl get pod hardened -o jsonpath='{.spec.containers[0].securityContext}' | jq .
# Step 6: Test that writing to root filesystem failskubectl exec hardened -- touch /etc/test 2>&1 || echo "Write blocked (expected)"
# Step 7: Test that writable volume workskubectl exec hardened -- touch /tmp/test && echo "Write to /tmp succeeded"
# Cleanupkubectl delete pod insecure hardenedSuccess criteria: Hardened pod runs with all security settings applied.
Summary
Section titled “Summary”Essential Security Context Settings:
runAsNonRoot: true+runAsUser: 1000allowPrivilegeEscalation: falsereadOnlyRootFilesystem: truecapabilities: drop: [ALL]
Never Use:
privileged: true(unless absolutely required)- Running as root without justification
Best Practices:
- Set at both pod and container level
- Drop all capabilities, add selectively
- Mount emptyDir for writable paths
Exam Tips:
- Know the full hardened pod template
- Understand pod vs container context
- Debug common errors
Next Module
Section titled “Next Module”Module 4.2: Pod Security Admission - Enforcing security standards at namespace level.