Module 3.2: Seccomp Profiles
Complexity: [MEDIUM] - System-level security
Time to Complete: 45-50 minutes
Prerequisites: Module 3.1 (AppArmor), Linux system calls knowledge
What You’ll Be Able to Do
After completing this module, you will be able to:
- Create custom seccomp profiles that allow only required system calls for a workload
- Configure pods to use seccomp profiles via the securityContext field
- Trace blocked syscalls to diagnose application failures under seccomp enforcement
- Audit running containers for missing or overly permissive seccomp profiles
Why This Module Matters
Imagine a nightclub with a strict, unbribable bouncer at the door to the VIP lounge (the Linux kernel). Instead of checking IDs, this bouncer checks every single request a guest (container process) makes. Want to read a file? “Allowed.” Want to change the system clock or trace another process? “Absolutely not.” That bouncer is Seccomp (Secure Computing Mode).
Containers share the host kernel, so a compromised container can weaponize obscure or dangerous system calls to attack the host or escape its isolation. By strictly limiting which syscalls a process can make, Seccomp dramatically shrinks the attack surface. The CKS exam tests your ability to configure this bouncer by applying default policies and writing custom rules.
What is Seccomp?
```
SECCOMP OVERVIEW

Seccomp = Secure Computing Mode
  • Linux kernel feature (since 2.6.12)
  • Filters system calls at kernel level
  • Very low overhead
  • Works with Docker, containerd, CRI-O

Application ──► syscall ──► Seccomp Filter ──► Kernel
                                   │
                          ┌────────┴────────┐
                          ▼                 ▼
                     ┌─────────┐       ┌─────────┐
                     │  ALLOW  │       │  BLOCK  │
                     │ execute │       │ or KILL │
                     └─────────┘       └─────────┘

Actions when a syscall matches:
  • SCMP_ACT_ALLOW - Allow the syscall
  • SCMP_ACT_ERRNO - Block it and return an error
  • SCMP_ACT_KILL  - Kill the calling thread (SCMP_ACT_KILL_PROCESS kills the whole process)
  • SCMP_ACT_TRAP  - Send a SIGSYS signal
  • SCMP_ACT_LOG   - Log the syscall, then allow it
```

Seccomp vs AppArmor
| Seccomp | AppArmor |
|---|---|
| Filters syscalls | Filters file/network access |
| Very low level | Higher-level abstraction |
| JSON profiles | Text-based profiles |
| No file path awareness | File-path-based rules |
| Lightweight | More complex rules |
| Defense in depth | Defense in depth |

Best practice: use BOTH together. Seccomp blocks dangerous syscalls; AppArmor controls resource access.

Stop and think: A typical container application needs only about 40-50 of the 300+ system calls available in the Linux kernel; the rest are potential attack surface. If you set `defaultAction: SCMP_ACT_ERRNO` (deny all by default) and allow only the 50 syscalls your app needs, what percentage of the kernel’s syscall attack surface have you eliminated?
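To check your estimate, the arithmetic is just the share of syscalls left out of the allow list. The helper below is a hypothetical sketch; the total of 300 stands in for "300+" and is an assumption, since the real count depends on the kernel version and architecture.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of syscalls denied by a deny-by-default
# profile that allows only ALLOWED out of TOTAL syscalls.
# TOTAL is an assumption: use the syscall count of your kernel/architecture.
blocked_pct() {
  allowed=$1
  total=$2
  awk -v a="$allowed" -v t="$total" 'BEGIN { printf "%.1f\n", (t - a) * 100 / t }'
}

blocked_pct 50 300   # prints 83.3 (250 of 300 syscalls denied)
```

With a larger total, the denied share only grows, so "roughly 83% or more" is a fair answer.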
Default Seccomp Profile
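War Story: Stopping Dirty COW and Container Escapes

In 2016, the “Dirty COW” vulnerability (CVE-2016-5195), a race condition in the kernel’s copy-on-write handling, allowed local privilege escalation. Some public exploit variants wrote to read-only memory through the `ptrace` system call, so an attacker in a compromised container could use `ptrace` to tamper with memory they should never have been able to write. A seccomp profile that blocked `ptrace` stopped those variants in their tracks, long before patches were applied. Variants that wrote through `/proc/self/mem` did not need `ptrace` at all, a reminder that seccomp shrinks the attack surface but does not replace kernel patching.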
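Kubernetes does not apply a seccomp profile by default: a pod that doesn’t set `seccompProfile` runs `Unconfined`, unless the kubelet is started with `--seccomp-default` (the `SeccompDefault` feature, alpha in 1.22 and GA in 1.27), in which case it gets `RuntimeDefault`. Pod Security Admission does not apply a profile either; its `restricted` level rejects pods that don’t set `RuntimeDefault` or `Localhost`.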
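```bash
# Check the pod-level seccomp profile
# (container-level settings live under .spec.containers[*].securityContext.seccompProfile)
kubectl get pod mypod -o jsonpath='{.spec.securityContext.seccompProfile}'
```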
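The pod spec only shows what was requested, and it stays empty when the kubelet applies `RuntimeDefault` through `--seccomp-default`. To confirm a filter is actually loaded, you can read the `Seccomp:` field of `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter, per proc(5)). The function below is a minimal sketch of that check; its name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the seccomp mode of a process, read from the
# "Seccomp:" field of /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter).
seccomp_mode() {
  local pid=${1:-$$}   # default: this shell's own PID
  local mode
  mode=$(awk '/^Seccomp:/ { print $2 }' "/proc/$pid/status") || return 1
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

echo "This shell's seccomp mode: $(seccomp_mode)"
```

Inside a pod, the same field can be read directly, for example `kubectl exec mypod -- grep Seccomp /proc/1/status` (assuming the image ships `grep`); `Seccomp: 2` means a filter is active for the container's main process.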
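```bash
# The RuntimeDefault profile (Docker/containerd defaults) typically blocks:
# - keyctl (kernel keyring)
# - ptrace (process tracing; blocked only on kernels older than 4.8)
# - personality (change execution domain; only a few safe values are allowed)
# - unshare (namespace manipulation)
# - mount/umount (filesystem mounting)
# - clock_settime (change system time)
# ...and roughly 40 more dangerous syscalls.
# Some are re-allowed when the container is granted the matching capability
# (for example CAP_SYS_ADMIN for mount, CAP_SYS_TIME for clock_settime).
```

Operational Overhead: Custom vs. RuntimeDefault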
Writing custom Seccomp profiles for every application gives the smallest possible attack surface, but it comes with heavy operational overhead. Every time an application updates a library or changes its behavior, it may start using a new syscall (like `epoll_wait` instead of `select`), and if the profile doesn’t allow it, the app breaks in production.
For most workloads, the `RuntimeDefault` profile strikes a good balance. It blocks roughly 40 of the most dangerous system calls (such as `mount` and `kexec_load`, which could help an attacker escape a container or replace the running kernel) while allowing the roughly 260 syscalls that normal applications need. Maintain custom profiles only for highly sensitive, stable workloads whose exact syscall footprint is known and thoroughly tested.
Seccomp Profile Location
```bash
# Kubernetes looks for Localhost profiles in the kubelet's seccomp directory:
# <kubelet root-dir>/seccomp, which is /var/lib/kubelet/seccomp/ by default

# The profile path in the pod spec is relative to this directory
# Example:   profiles/my-profile.json
# Full path: /var/lib/kubelet/seccomp/profiles/my-profile.json
```
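Because the path is relative, one way to catch mistakes early is to compute the full path the kubelet will check before creating the pod. This is a minimal sketch assuming the default kubelet root directory; the function name and the `SECCOMP_ROOT` variable are hypothetical, and the two rules it enforces mirror the API requirement that `localhostProfile` be a descending relative path (no leading `/`, no `..`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the full node path the kubelet will look for
# when a pod sets  localhostProfile: <rel>
# Assumes the default kubelet root dir; override SECCOMP_ROOT if yours differs.
SECCOMP_ROOT="${SECCOMP_ROOT:-/var/lib/kubelet/seccomp}"

resolve_localhost_profile() {
  rel=$1
  case "$rel" in
    /*)
      echo "error: '$rel' is absolute; localhostProfile must be relative to $SECCOMP_ROOT" >&2
      return 1 ;;
  esac
  case "/$rel/" in
    */../*)
      echo "error: '$rel' must not contain '..'" >&2
      return 1 ;;
  esac
  printf '%s/%s\n' "$SECCOMP_ROOT" "$rel"
}

resolve_localhost_profile profiles/my-profile.json
# prints /var/lib/kubelet/seccomp/profiles/my-profile.json
```

On the node, `ls -l "$(resolve_localhost_profile profiles/my-profile.json)"` then confirms the file sits exactly where the kubelet expects it.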
```bash
# Create the directory if it doesn't exist
sudo mkdir -p /var/lib/kubelet/seccomp/profiles
```

Profile Structure
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": [
        "accept",
        "access",
        "arch_prctl",
        "bind",
        "brk"
      ],
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "names": [
        "ptrace"
      ],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1
    }
  ]
}
```

Profile Fields Explained
```
SECCOMP PROFILE FIELDS

defaultAction
└── What to do for syscalls not explicitly listed
    SCMP_ACT_ALLOW = allow by default (the listed rules act as a denylist)
    SCMP_ACT_ERRNO = deny by default (the listed rules act as an allowlist)

architectures
└── CPU architectures the filter applies to (x86_64, arm64, etc.)

syscalls
└── Array of syscall rules:
    names:    ["syscall1", "syscall2"]
    action:   SCMP_ACT_ALLOW | SCMP_ACT_ERRNO | etc.
    errnoRet: error number to return (optional; 1 = EPERM)
    args:     filter on syscall arguments (optional)
```

Applying Seccomp in Kubernetes
Method 1: Pod Security Context (Recommended)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Use the container runtime's default profile
  containers:
  - name: app
    image: nginx
```

Method 2: Localhost Profile
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-seccomp-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/custom.json  # Relative to /var/lib/kubelet/seccomp/
  containers:
  - name: app
    image: nginx
```

Method 3: Container-Level Profile
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:
  - name: app
    image: nginx
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  - name: sidecar
    image: busybox
    command: ["sleep", "3600"]  # keep the busybox sidecar running
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/sidecar.json
```

Seccomp Profile Types
```yaml
# RuntimeDefault - the container runtime's default profile
seccompProfile:
  type: RuntimeDefault

# Localhost - custom profile from the node filesystem
seccompProfile:
  type: Localhost
  localhostProfile: profiles/my-profile.json

# Unconfined - no seccomp filtering (dangerous!)
seccompProfile:
  type: Unconfined
```

What would happen if: You create a custom seccomp profile and place it in `/etc/seccomp/profiles/custom.json` on the node. Your pod spec references `localhostProfile: profiles/custom.json`. The pod fails to start. The profile JSON is valid. What path mistake did you make?
Creating Custom Profiles
Profile That Blocks ptrace
```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1
    }
  ]
}
```

Profile That Only Allows Specific Syscalls
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "fstat", "lseek",
        "mmap", "mprotect", "munmap", "brk", "exit_group"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

This list is illustrative: a real container also needs syscalls such as `execve`, `openat`, and `futex` just to start and run, so build the allow list from a trace of your workload.

Profile That Logs Suspicious Calls
```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace", "process_vm_readv", "process_vm_writev"],
      "action": "SCMP_ACT_LOG"
    },
    {
      "names": ["mount", "umount2", "pivot_root"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
```

Real Exam Scenarios
Scenario 1: Apply RuntimeDefault
```bash
# Create a pod with RuntimeDefault seccomp
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginx
EOF
```
```bash
# Verify
kubectl get pod secure-pod -o jsonpath='{.spec.securityContext.seccompProfile}' | jq .
```

Scenario 2: Apply Custom Profile
```bash
# Create the profile on the node that will run the pod
sudo mkdir -p /var/lib/kubelet/seccomp/profiles
sudo tee /var/lib/kubelet/seccomp/profiles/block-chmod.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["chmod", "fchmod", "fchmodat"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1
    }
  ]
}
EOF
```
```bash
# Apply to a pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: no-chmod-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/block-chmod.json
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF
```
```bash
# Test that chmod is blocked
kubectl exec no-chmod-pod -- chmod 777 /tmp
# Should fail with "Operation not permitted"
```

Scenario 3: Debug Seccomp Issues
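```bash
# Check if seccomp is applied
kubectl get pod mypod -o yaml | grep -A5 seccompProfile

# Check node logs for seccomp events (kernel audit type 1326 = SECCOMP)
sudo dmesg | grep -iE 'seccomp|type=1326'
sudo journalctl | grep -i seccomp
# Kill and SCMP_ACT_LOG events are logged by default; SCMP_ACT_ERRNO denials
# usually are not, so switch the profile to SCMP_ACT_LOG to see them
```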
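When records do show up, they are dense. The sketch below, with hypothetical function names, condenses them into "process, syscall number, action" lines. It assumes each record carries the `comm=`, `syscall=`, and `code=` fields the kernel emits, and decodes `code` using the `SECCOMP_RET_*` action values from `<linux/seccomp.h>`. Syscall numbers are architecture-specific: with `arch=c000003e` (x86_64), 90 is `chmod`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: summarize kernel seccomp audit records (type=1326,
# type=SECCOMP in auditd logs) as "comm syscall=N action=X" lines.

seccomp_action_name() {
  # The upper 16 bits of the filter's return value select the action.
  case $(( $1 & 0xffff0000 )) in
    $((0x80000000))) echo "KILL_PROCESS" ;;
    0)               echo "KILL_THREAD" ;;
    $((0x00030000))) echo "TRAP" ;;
    $((0x00050000))) echo "ERRNO" ;;
    $((0x7fc00000))) echo "USER_NOTIF" ;;
    $((0x7ff00000))) echo "TRACE" ;;
    $((0x7ffc0000))) echo "LOG" ;;
    $((0x7fff0000))) echo "ALLOW" ;;
    *)               echo "UNKNOWN" ;;
  esac
}

summarize_seccomp_events() {
  local line comm syscall code
  while IFS= read -r line; do
    # Keep only lines that look like seccomp audit records.
    case $line in
      *" syscall="*" code=0x"*) ;;
      *) continue ;;
    esac
    comm=$(printf '%s\n' "$line" | sed -n 's/.*comm="\([^"]*\)".*/\1/p')
    syscall=$(printf '%s\n' "$line" | sed -n 's/.* syscall=\([0-9]*\).*/\1/p')
    code=$(printf '%s\n' "$line" | sed -n 's/.* code=\(0x[0-9a-fA-F]*\).*/\1/p')
    printf '%s syscall=%s action=%s\n' "${comm:-?}" "$syscall" "$(seccomp_action_name "$code")"
  done
}

# Illustrative record (values are made up):
printf '%s\n' 'audit: type=1326 audit(1700000000.123:456): pid=4242 comm="chmod" exe="/bin/busybox" sig=0 arch=c000003e syscall=90 compat=0 ip=0x4d3c1b code=0x7ffc0000' \
  | summarize_seccomp_events
# prints: chmod syscall=90 action=LOG
```

On a node this could be fed with, for example, `sudo dmesg | summarize_seccomp_events`.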
```bash
# What a seccomp denial typically looks like
# - In the application: "Operation not permitted" (EPERM, the usual errnoRet 1)
# - In node logs: audit records with syscall=<number> (for logged actions only)
```

Pause and predict: You apply a seccomp profile with `defaultAction: SCMP_ACT_KILL` instead of `SCMP_ACT_ERRNO`. Your application makes an unlisted syscall. What happens to the container process compared to using `SCMP_ACT_ERRNO`?
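Once you have made your prediction, the difference shows up in how the container ends. A kill action terminates the process with SIGSYS, which appears as exit code 159 (128 + 31, the SIGSYS number on x86_64 and arm64), while an ERRNO action only returns an error that the application may handle or turn into its own exit status. The helper below is a hypothetical sketch for decoding an exit code read with, for example, `kubectl get pod mypod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'`. Note that `SCMP_ACT_KILL` kills only the offending thread, so a multi-threaded app may hang or misbehave instead of exiting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret a container exit code.
# Codes above 128 mean "terminated by signal (code - 128)"; SIGSYS is the
# signal the kernel uses for seccomp kill actions.
explain_exit_code() {
  code=$1
  if [ "$code" -gt 128 ] && sig=$(kill -l $(( code - 128 )) 2>/dev/null); then
    if [ "$sig" = "SYS" ]; then
      echo "killed by SIGSYS: likely a seccomp kill action"
    else
      echo "killed by SIG$sig"
    fi
  else
    echo "exited with status $code"
  fi
}

explain_exit_code 159   # prints: killed by SIGSYS: likely a seccomp kill action
```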
Finding Syscalls Used by Application
```bash
# Use strace to find syscalls (on a test system, not production)
strace -c -f <command>

# Example output:
# % time     seconds  usecs/call     calls    errors syscall
# ------ ----------- ----------- --------- --------- ----------------
#  25.00    0.000010           0        50           read
#  25.00    0.000010           0        30           write
#  12.50    0.000005           0        20           open
# ...
```
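The summary can be turned into a first-draft allow list. The sketch below assumes the default `strace -c` layout shown above, where the syscall name is the last column of each data row (saved, for example, with `strace -c -f -o summary.txt <command>`), and assumes an x86_64 node; the function names are hypothetical. Treat the result as a starting point: one trace only covers the code paths it exercised, and the runtime needs extra syscalls (such as `execve`) to start the workload, so test the profile with `SCMP_ACT_LOG` before switching to `SCMP_ACT_ERRNO`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build a deny-by-default seccomp profile from an
# `strace -c` summary. Assumes the syscall name is the last column of each
# data row and that the workload runs on x86_64.

syscalls_from_strace_summary() {
  # Data rows start with a numeric "% time"; skip the header, rulers and "total".
  awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $NF }' "$@" | sort -u
}

allowlist_profile() {
  # Reads syscall names (one per line) on stdin and prints a JSON profile.
  names=$(awk 'NF { printf "%s\"%s\"", (n++ ? ", " : ""), $1 }')
  cat <<EOF
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    { "names": [${names}], "action": "SCMP_ACT_ALLOW" }
  ]
}
EOF
}

# Example with a sample summary:
sample=' % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000010           0        50           read
 25.00    0.000010           0        30           write
 12.50    0.000005           0        20           open'
printf '%s\n' "$sample" | syscalls_from_strace_summary | allowlist_profile
```

The sample run prints a profile whose allow list is `["open", "read", "write"]`; on a real node you would write the output under `/var/lib/kubelet/seccomp/profiles/`.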
```bash
# Or use sysdig
sysdig -p "%proc.name %syscall.type" container.name=mycontainer
```

Did You Know?
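- Docker’s default seccomp profile blocks about 44 syscalls out of 300+. It’s a good baseline but may need customization.
- Seccomp-bpf (Berkeley Packet Filter, Linux 3.5+) is the modern implementation. The original strict mode allowed only `read`, `write`, `_exit`, and `sigreturn`; BPF filters can match on the syscall number and its argument values and choose among actions such as allow, errno, log, trap, or kill.
- Bypassing an installed seccomp filter from inside the container is very difficult: the kernel checks every syscall against it, and a filter cannot be removed once it is loaded. Because it matches syscall numbers and arguments rather than file paths, path tricks such as symlinks don’t affect it. It is not infallible, though: on kernels older than 4.8, a process allowed to use `ptrace` could bypass seccomp, which is why older default profiles blocked `ptrace`.
- Kubernetes 1.22 introduced the kubelet’s `SeccompDefault` feature (alpha), which makes `RuntimeDefault` the default for pods that don’t request a profile. It became GA in 1.27 but still has to be enabled with `--seccomp-default`; without it, such pods run `Unconfined`.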
Common Mistakes
| Mistake | Why It Hurts | Solution |
|---|---|---|
| Profile path wrong | Pod fails to start | Check /var/lib/kubelet/seccomp/ |
| Missing syscall | App crashes | Audit with strace first |
| Using Unconfined | No protection | Use RuntimeDefault minimum |
| Profile not on all nodes | Pod fails to start on nodes without the file (the scheduler doesn’t check) | Deploy profiles with a DaemonSet |
| JSON syntax error | Profile fails to load | Validate JSON |
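For the last row, a minimal sketch that checks every profile file on a node before pods reference it. It assumes `jq` or `python3` is installed on the node (`jq empty` and `python3 -m json.tool` both exit non-zero on malformed JSON); the function name is hypothetical, and the `defaultAction` check is only a basic sanity test, not full schema validation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that every seccomp profile in a directory parses
# as JSON and sets a defaultAction. Assumes jq or python3 is installed.
validate_profiles() {
  dir=${1:-/var/lib/kubelet/seccomp/profiles}
  status=0
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue            # glob matched nothing
    if command -v jq >/dev/null 2>&1; then
      jq empty "$f" >/dev/null 2>&1    # non-zero exit on malformed JSON
    else
      python3 -m json.tool "$f" >/dev/null 2>&1
    fi || { echo "invalid JSON: $f"; status=1; continue; }
    grep -q '"defaultAction"' "$f" || { echo "missing defaultAction: $f"; status=1; }
  done
  return "$status"
}
```

Run as root on each node, `validate_profiles && echo "all profiles OK"` checks the default directory; pass another directory as the first argument to check a staging copy.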
- An application container keeps crashing with “operation not permitted” errors. The pod has a custom seccomp profile applied. The same container runs fine without the profile. How do you identify which syscalls the profile is blocking, and what’s the safest debugging approach?

  Answer: Temporarily change the profile’s default action to `SCMP_ACT_LOG`: unlisted syscalls are then allowed but logged, so the app keeps running while you see exactly what it needs. Check the node’s kernel logs with `dmesg | grep -iE 'seccomp|type=1326'` or `journalctl | grep -i seccomp`; `SCMP_ACT_ERRNO` denials usually aren’t logged, which is why the switch to `SCMP_ACT_LOG` matters. Alternatively, run `strace -c -f` on a test system to enumerate every syscall the application uses. Once you know the needed syscalls, add them to the allow list and restore the deny-by-default action. Never debug in production by switching to `Unconfined`; `SCMP_ACT_LOG` keeps visibility while temporarily allowing the traffic. The safest approach is to trace the app in a staging environment and build the profile from that data.
- During a CKS exam, you create a seccomp profile at `/var/lib/kubelet/seccomp/profiles/block-mount.json` and reference it as `localhostProfile: block-mount.json` in the pod spec. The pod enters `CreateContainerError`. What’s wrong with the path?

  Answer: The `localhostProfile` path is relative to `/var/lib/kubelet/seccomp/`, so the full path Kubernetes looks for is `/var/lib/kubelet/seccomp/block-mount.json`, but your file is at `/var/lib/kubelet/seccomp/profiles/block-mount.json`. The correct reference is `localhostProfile: profiles/block-mount.json` (include the `profiles/` subdirectory). This is a common exam gotcha because the path is relative, not absolute. Always verify the file exists at the expected full path: `ls /var/lib/kubelet/seccomp/`.
- Your security team wants to block the `ptrace` syscall cluster-wide because it enables container escape techniques. You have 50 namespaces with different workloads. What’s the most efficient way to enforce this without creating 50 individual seccomp profiles?

  Answer: First check whether `RuntimeDefault` already covers it: the Docker and containerd default profiles block `ptrace` only on kernels older than 4.8, and on newer kernels rely on the container lacking `CAP_SYS_PTRACE`. To block `ptrace` explicitly, create a single profile, ideally a copy of the runtime’s default profile with `ptrace` denied so you keep its other protections, deploy it to every node with a DaemonSet, and reference it with `type: Localhost` in pod specs. One shared profile replaces 50 per-namespace ones. Then configure Pod Security Admission with the `restricted` level in `enforce` mode on all workload namespaces so no pod can run `Unconfined`, since `restricted` requires `RuntimeDefault` or `Localhost`. Pod Security Admission only checks that a profile is set, not which one, so requiring this specific profile needs an additional admission policy.
- You have a multi-container pod with an nginx reverse proxy and a Python application. The nginx container needs `accept`, `bind`, and `listen` syscalls for networking. The Python container needs `fork` and `execve` for subprocesses. Can you apply different seccomp profiles to each container, and how?

  Answer: Yes. Seccomp profiles can be set at the container level, not just the pod level. Set `securityContext.seccompProfile` on each container individually: nginx gets a profile that allows its networking syscalls, and the Python container gets one that allows its process-related syscalls. Place each profile in `/var/lib/kubelet/seccomp/profiles/` and reference them separately. If a profile is set at both the pod and container level, the container-level setting takes precedence. This follows least privilege: each container gets only the syscalls it needs, so an attacker who compromises one container can’t use syscalls that only the other container’s profile allows. In practice, both lists also need the syscalls the runtime uses to start any container (such as `execve`), and nginx itself forks worker processes, so build each list from a trace rather than from the syscalls named in the question.
Hands-On Exercise
Task: Create and apply a seccomp profile that blocks the `ptrace` syscall.
```bash
# Step 1: Create the profile directory on the node (every node that may run the pod)
sudo mkdir -p /var/lib/kubelet/seccomp/profiles

# Step 2: Create the profile
sudo tee /var/lib/kubelet/seccomp/profiles/no-ptrace.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1
    }
  ]
}
EOF

# Step 3: Verify the file exists
cat /var/lib/kubelet/seccomp/profiles/no-ptrace.json

# Step 4: Create a pod with the profile
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: no-ptrace-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/no-ptrace.json
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF

# Step 5: Wait for the pod
kubectl wait --for=condition=Ready pod/no-ptrace-pod --timeout=60s

# Step 6: Verify seccomp is applied in the spec
kubectl get pod no-ptrace-pod -o jsonpath='{.spec.securityContext.seccompProfile}' | jq .

# Step 7: Confirm a filter is loaded for the container's main process
# ("Seccomp: 2" means filter mode)
kubectl exec no-ptrace-pod -- grep Seccomp /proc/1/status
# busybox does not ship strace, so running strace here would fail with
# "not found" whether or not ptrace is blocked. To watch ptrace itself fail,
# repeat the exercise with an image that includes strace and run: strace -f echo test

# Step 8: Compare with a pod that sets no seccomp profile
# (expect "Seccomp: 0" unless the kubelet runs with --seccomp-default)
kubectl run allowed-pod --image=busybox --rm -it --restart=Never -- \
  grep Seccomp /proc/self/status

# Cleanup
kubectl delete pod no-ptrace-pod
```

Success criteria: The pod runs, a seccomp filter is loaded (`Seccomp: 2`), and `ptrace` calls from the container fail with “Operation not permitted”.
Summary
Seccomp Basics:
- Linux kernel syscall filter
- JSON-based profiles
- Low overhead, high security
Profile Types:
- `RuntimeDefault` - Use the runtime’s default profile
- `Localhost` - Custom profile from the node
- `Unconfined` - No filtering (avoid!)
Profile Location:
- `/var/lib/kubelet/seccomp/` - the path in the pod spec is relative to this directory
Actions:
- `SCMP_ACT_ALLOW` - Allow the syscall
- `SCMP_ACT_ERRNO` - Block it with an error
- `SCMP_ACT_KILL` - Kill the thread (`SCMP_ACT_KILL_PROCESS` kills the whole process)
Exam Tips:
- Know profile syntax
- Practice creating profiles
- Understand RuntimeDefault
Next Module
Module 3.3: Linux Kernel Hardening - Reducing OS attack surface.