Module 4.4: Runtime Sandboxing
Complexity: MEDIUM - Advanced container isolation
Time to Complete: 40-45 minutes
Prerequisites: Module 4.3 (Secrets Management), container runtime concepts
What You’ll Be Able to Do
After completing this module, you will be able to:
- Configure gVisor (runsc) and Kata Containers as alternative container runtimes
- Deploy workloads with RuntimeClass to select sandboxed runtime environments
- Compare isolation guarantees of standard runc, gVisor, and Kata Containers
- Evaluate when runtime sandboxing is worth the performance overhead for sensitive workloads
Why This Module Matters
Standard containers share the host kernel directly. If an attacker exploits a kernel vulnerability from within a container, they can escape to the host and compromise all workloads. Runtime sandboxing adds an extra isolation layer between containers and the kernel.
The CKS exam tests your understanding of container isolation techniques.
The Container Isolation Problem
Standard containers (runc):

```
  Container A        Container B        Container C (attacker)
       │                  │                  │
       └──────────────────┼──────────────────┘
                          ▼
                    HOST KERNEL

      🎯 Kernel exploit from any container
         = access to ALL containers and the host

  ⚠️ Single point of failure: the shared kernel
```

Sandboxing Solutions
gVisor (runsc):
- User-space kernel written in Go
- Intercepts syscalls and implements them in user space
- Low overhead, medium isolation
- Good for: untrusted workloads, multi-tenant clusters

Kata Containers:
- Lightweight VM per container
- Real Linux kernel per container
- Higher overhead, maximum isolation
- Good for: strict isolation requirements

Firecracker:
- MicroVM technology (used by AWS Lambda)
- Minimal virtual machine monitor
- Fast boot, small footprint

Stop and think: Standard containers share the host kernel directly — all 300+ syscalls go straight to the kernel. gVisor intercepts these syscalls and reimplements them in userspace. What does this mean for an attacker trying to exploit a kernel vulnerability from inside a gVisor-sandboxed container?
gVisor Architecture
```
  Container application
           │  syscalls
           ▼
  gVisor Sentry (user space)
    • Implements ~300 Linux syscalls
    • Runs in user space, not in the kernel
    • Written in Go (memory-safe)
    • Kernel CVEs can't be exploited through it
           │  limited syscalls
           ▼
  Host kernel
    • Sentry uses only ~50 syscalls on the host
    • Much smaller attack surface
```

Kata Containers Architecture
```
  Container A                Container B
       │                          │
       ▼                          ▼
  Guest VM                   Guest VM
  (own guest kernel)         (own guest kernel)
       │                          │
       └────────────┬─────────────┘
                    ▼
  Hypervisor (QEMU/Cloud Hypervisor)
                    │
                    ▼
  Host kernel

  Each container has its own kernel - full isolation
```

What would happen if: You deploy a high-performance database (PostgreSQL) inside a gVisor sandbox. The database uses memory-mapped files and direct I/O heavily. Would you expect the same performance as runc, and what trade-off are you making?
RuntimeClass
Kubernetes uses RuntimeClass to specify which container runtime to use.
Define RuntimeClass
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc       # Name in containerd config
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu   # Name in containerd config
```

Use RuntimeClass in Pod
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor   # Use gVisor instead of runc
  containers:
  - name: app
    image: nginx
```

Installing gVisor
Section titled “Installing gVisor”On the Node
```bash
# Add the gVisor repository (Debian/Ubuntu)
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list

# Install
sudo apt update && sudo apt install -y runsc

# Verify
runsc --version
```

Configure containerd
```toml
# /etc/containerd/config.toml
# Add after [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
```

Restart containerd
```bash
sudo systemctl restart containerd
```

Create RuntimeClass
```bash
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
```

Using gVisor
Create Sandboxed Pod
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
spec:
  runtimeClassName: gvisor
  containers:
  - name: test
    image: nginx
```

Verify gVisor is Running
```bash
# Create the pod
kubectl apply -f gvisor-pod.yaml

# Check the runtime
kubectl get pod gvisor-test -o jsonpath='{.spec.runtimeClassName}'
# Output: gvisor

# Inside the container, check the kernel version
kubectl exec gvisor-test -- uname -a
# Output shows "gVisor" instead of the host kernel version

# Check dmesg (gVisor intercepts this)
kubectl exec gvisor-test -- dmesg 2>&1 | head -5
# Output shows gVisor's simulated kernel messages
```

gVisor Limitations
Not all syscalls are supported:
- Some advanced syscalls are not implemented
- May break certain applications
- Check compatibility before using

Performance overhead:
- ~5-15% for compute workloads
- Higher for I/O-intensive workloads
- Syscall interception has a cost

Not compatible with:
- Host networking (hostNetwork: true)
- Host PID namespace (hostPID: true)
- Privileged containers
- Some volume types

Good for:
- Web applications
- Microservices
- Untrusted workloads
- Multi-tenant environments

Scheduling with RuntimeClass
Section titled “Scheduling with RuntimeClass”NodeSelector for RuntimeClass
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    gvisor.kubernetes.io/enabled: "true"   # Only schedule on these nodes
  tolerations:
  - key: "gvisor"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

Ensure Workloads Use Correct Nodes
```bash
# Label nodes that have gVisor installed
kubectl label node worker1 gvisor.kubernetes.io/enabled=true

# Now pods with runtimeClassName: gvisor will only schedule on labeled nodes
```

Real Exam Scenarios
Section titled “Real Exam Scenarios”Scenario 1: Create RuntimeClass and Use It
```bash
# Step 1: Create the RuntimeClass
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Step 2: Create a pod that uses the RuntimeClass
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx
EOF

# Step 3: Verify
kubectl get pod untrusted-workload -o yaml | grep runtimeClassName
kubectl exec untrusted-workload -- uname -a   # Shows gVisor
```

Scenario 2: Identify Pods Not Using Sandboxing
```bash
# Find all pods without a runtimeClassName
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(.spec.runtimeClassName == null)
  | "\(.metadata.namespace)/\(.metadata.name)"'
```
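The `select` logic in these filters can be sanity-checked offline by feeding jq a small stand-in for `kubectl get pods -A -o json` output. A sketch; the sample pods and names below are made up:

```shell
#!/bin/sh
# Hypothetical stand-in for `kubectl get pods -A -o json` output
SAMPLE='{
  "items": [
    {"metadata": {"namespace": "default", "name": "plain-pod"}, "spec": {}},
    {"metadata": {"namespace": "ci", "name": "runner"}, "spec": {"runtimeClassName": "gvisor"}}
  ]
}'

# Pods with no runtimeClassName (running under the default runc)
echo "$SAMPLE" | jq -r '
  .items[]
  | select(.spec.runtimeClassName == null)
  | "\(.metadata.namespace)/\(.metadata.name)"'
# → default/plain-pod

# Pods sandboxed with gVisor
echo "$SAMPLE" | jq -r '
  .items[]
  | select(.spec.runtimeClassName == "gvisor")
  | "\(.metadata.namespace)/\(.metadata.name)"'
# → ci/runner
```

The same filters run unchanged against a live cluster once real `kubectl` output replaces the sample input.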
```bash
# Find pods with a specific RuntimeClass
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(.spec.runtimeClassName == "gvisor")
  | "\(.metadata.namespace)/\(.metadata.name)"'
```

Scenario 3: Enforce RuntimeClass for Namespace
Section titled “Scenario 3: Enforce RuntimeClass for Namespace”# Use a ValidatingAdmissionPolicy (K8s 1.28+) or OPA/Gatekeeper# Example with namespace annotation for documentation
apiVersion: v1kind: Namespacemetadata: name: untrusted-workloads labels: security.kubernetes.io/sandbox-required: "true"Pause and predict: Your cluster runs both trusted internal microservices and untrusted customer-submitted code (like a CI/CD runner). Which workloads benefit most from runtime sandboxing, and would you sandbox everything or just specific workloads?
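For actual enforcement, the ValidatingAdmissionPolicy route could look like the following sketch. The policy and binding names are illustrative, and the manifest assumes Kubernetes 1.30+ for the `v1` API (on 1.28 and 1.29, use `admissionregistration.k8s.io/v1beta1`):

```yaml
# Illustrative policy: deny pods that don't set runtimeClassName: gvisor
# in namespaces labeled security.kubernetes.io/sandbox-required=true
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-gvisor-sandbox
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  validations:
  - expression: "has(object.spec.runtimeClassName) && object.spec.runtimeClassName == 'gvisor'"
    message: "Pods in sandbox-required namespaces must set runtimeClassName: gvisor"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-gvisor-sandbox-binding
spec:
  policyName: require-gvisor-sandbox
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        security.kubernetes.io/sandbox-required: "true"
```

The `has(...)` guard matters: without it, the CEL expression errors on pods that omit `runtimeClassName` instead of cleanly denying them.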
Comparison: runc vs gVisor vs Kata
| Feature | runc (default) | gVisor | Kata |
|---|---|---|---|
| Isolation | Namespaces only | User-space kernel | VM per pod |
| Kernel sharing | Shared | Intercepted | Not shared |
| Overhead | Minimal | Low-Medium | Medium-High |
| Boot time | ~100ms | ~200ms | ~500ms |
| Memory | Low | Low-Medium | Higher |
| Compatibility | Full | Most apps | Most apps |
| Use case | General | Untrusted workloads | High security |

Did You Know?
- gVisor was developed by Google and is used in Google Cloud Run and other GCP services. It intercepts about 300 Linux syscalls and implements them in user space.
- Kata Containers merged from Intel Clear Containers and Hyper runV. It uses the same OCI interface as runc, so it’s a drop-in replacement.
- The handler name in a RuntimeClass must match the runtime name configured in containerd/CRI-O. Common names: `runsc` (gVisor), `kata-qemu` or `kata` (Kata).
- AWS Fargate uses Firecracker, another micro-VM technology similar to Kata but optimized for fast boot times.
Common Mistakes
| Mistake | Why It Hurts | Solution |
|---|---|---|
| Wrong handler name | Pod fails to schedule | Match containerd config |
| No RuntimeClass | Uses default runc | Create RuntimeClass first |
| gVisor on incompatible workload | App crashes | Test compatibility first |
| Missing node selector | Schedules on wrong node | Use scheduling in RuntimeClass |
| Expecting full syscall support | App fails | Check gVisor syscall table |
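The first mistake in the table (a handler name that doesn't match containerd's config) can be checked mechanically. A sketch below runs the check against a sample config written to a temp file, standing in for a live node's /etc/containerd/config.toml:

```shell
#!/bin/sh
# Sample containerd config standing in for a node's /etc/containerd/config.toml
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF

HANDLER="runsc"   # must equal .handler in the RuntimeClass

# A pod using the RuntimeClass only starts if containerd defines
# a runtimes.<handler> table; otherwise the kubelet reports "handler not found"
if grep -q "runtimes\.$HANDLER\]" "$CONFIG"; then
  STATUS="configured"
else
  STATUS="missing"
fi
echo "handler '$HANDLER': $STATUS"
# → handler 'runsc': configured

rm -f "$CONFIG"
```

On a real node you would point `CONFIG` at the actual config file and repeat the check on every node where sandboxed pods may schedule.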
1. A critical kernel CVE is announced that allows container escape via a specific syscall. Your cluster runs 200 pods with standard runc and 10 pods with gVisor. Which pods are vulnerable, and why does gVisor protect against this class of attack?

   Answer: The 200 runc pods are vulnerable because their syscalls go directly to the host kernel, so the CVE exploit works as-is. The 10 gVisor pods are likely protected because gVisor intercepts syscalls in its own userspace "Sentry" process, reimplementing them without touching the host kernel for most operations. The vulnerable syscall either isn't implemented by gVisor (blocked by default) or is handled in userspace where the kernel exploit doesn't apply. This is gVisor's core security model: reducing the kernel attack surface from 300+ syscalls to the ~50 that actually reach the host kernel.

2. Your team wants to sandbox CI/CD runner pods that execute untrusted customer code. They test with gVisor but the runners fail because they need to build Docker images (which requires `mount` syscalls and `overlayfs`). What alternative sandboxing approach would work for this use case?

   Answer: Kata Containers would be a better fit. Kata runs each pod in a lightweight VM with its own kernel, providing hardware-level isolation while supporting the full Linux syscall interface (including `mount`). gVisor doesn't support all syscalls needed for container-in-container builds. Alternatively, use rootless BuildKit or Kaniko for image building inside gVisor (they don't need privileged syscalls). Another option is dedicating specific nodes with the Kata runtime to CI/CD workloads and using RuntimeClass (`spec.runtimeClassName: kata`) to schedule them appropriately. The trade-off with Kata is higher resource overhead (each pod gets a VM) but full syscall compatibility.

3. You create a RuntimeClass called `gvisor` and a pod with `runtimeClassName: gvisor`. The pod starts on `node-1` successfully but fails on `node-2` with "handler not found." What's the likely cause, and how do you ensure consistent runtime availability?

   Answer: The gVisor runtime handler (`runsc`) is installed and configured in containerd on `node-1` but not on `node-2`. RuntimeClass is a cluster-level resource, but the actual runtime binary must be installed on each node. Fix: (1) install gVisor on all nodes, or (2) use the RuntimeClass `scheduling` field with `nodeSelector` to ensure gVisor pods only schedule on nodes with the runtime installed. Label gVisor-capable nodes (e.g., `runtime/gvisor: "true"`) and set `scheduling.nodeSelector` in the RuntimeClass. This prevents scheduling failures and ensures consistent behavior.

4. Your security architect says "sandbox everything with gVisor for maximum security." Your performance team objects because database pods show a 30% I/O latency increase under gVisor. How do you balance security and performance across different workload types?

   Answer: Don't sandbox everything uniformly. Use a risk-based approach: (1) high-risk workloads (untrusted code execution, public-facing services, multi-tenant workloads) get gVisor or Kata sandboxing via RuntimeClass; (2) performance-sensitive workloads (databases, caches, message queues) stay on runc but are hardened with seccomp, AppArmor, non-root users, read-only filesystems, and dropped capabilities; (3) internal trusted services get standard security contexts without sandboxing. Create multiple RuntimeClasses (`standard`, `gvisor`, `kata`) and assign them based on workload risk profile. The 30% I/O overhead for databases is unacceptable, but for a web frontend handling untrusted input it's a worthwhile security trade-off.
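The tiered approach in the last answer can be captured as multiple RuntimeClasses; a sketch, assuming `runc`, `runsc`, and `kata-qemu` are the handler names your nodes' containerd config actually registers:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: standard
handler: runc        # default isolation, full performance
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc       # user-space kernel for untrusted workloads
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu   # VM-backed isolation for the strictest requirements
```

Workloads then pick a tier via `spec.runtimeClassName`, so the security/performance decision lives in one reviewable field per manifest.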
Hands-On Exercise
Task: Create and use a RuntimeClass for sandboxed workloads.
```bash
# Step 1: Check if gVisor is available (on the lab environment)
runsc --version 2>/dev/null || echo "gVisor not installed (expected in exam environment)"

# Step 2: Check existing RuntimeClasses
kubectl get runtimeclass

# Step 3: Create a RuntimeClass (works even if gVisor is not installed)
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Step 4: Create a pod without sandboxing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: standard-pod
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF

# Step 5: Create a pod with sandboxing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF

# Step 6: Compare pod specs
echo "=== Standard Pod ==="
kubectl get pod standard-pod -o jsonpath='{.spec.runtimeClassName}'
echo ""
echo "=== Sandboxed Pod ==="
kubectl get pod sandboxed-pod -o jsonpath='{.spec.runtimeClassName}'
echo ""

# Step 7: List all RuntimeClasses
kubectl get runtimeclass -o wide

# Cleanup
kubectl delete pod standard-pod sandboxed-pod
kubectl delete runtimeclass gvisor
```

Success criteria: Understand RuntimeClass configuration and pod assignment.
Summary
Why Sandboxing?
- Containers share host kernel
- Kernel exploit = escape to host
- Sandboxing adds isolation layer
gVisor:
- User-space kernel
- Intercepts syscalls
- Low overhead
- Good for untrusted workloads
Kata Containers:
- VM per container
- Full kernel isolation
- Higher overhead
- Maximum security
RuntimeClass:
- Kubernetes abstraction for runtimes
- Handler matches containerd config
- Pod uses `runtimeClassName`
Exam Tips:
- Know RuntimeClass YAML format
- Understand gVisor vs Kata tradeoffs
- Be able to apply RuntimeClass to pods
Part 4 Complete!
You’ve finished Minimize Microservice Vulnerabilities (20% of CKS). You now understand:
- Security Contexts for pods and containers
- Pod Security Admission standards
- Secrets management and encryption
- Runtime sandboxing with gVisor
Next Part: Part 5: Supply Chain Security - Securing container images and the software supply chain.