
Module 4.4: Runtime Sandboxing

Hands-On Lab Available: K8s Cluster, advanced, 30 min (opens in Killercoda in a new tab).

Complexity: [MEDIUM] - Advanced container isolation

Time to Complete: 40-45 minutes

Prerequisites: Module 4.3 (Secrets Management), container runtime concepts


After completing this module, you will be able to:

  1. Configure gVisor (runsc) and Kata Containers as alternative container runtimes
  2. Deploy workloads with RuntimeClass to select sandboxed runtime environments
  3. Compare isolation guarantees of standard runc, gVisor, and Kata Containers
  4. Evaluate when runtime sandboxing is worth the performance overhead for sensitive workloads

Standard containers share the host kernel directly. If an attacker exploits a kernel vulnerability from within a container, they can escape to the host and compromise all workloads. Runtime sandboxing adds an extra isolation layer between containers and the kernel.

The CKS exam tests your understanding of container isolation techniques.


┌─────────────────────────────────────────────────────────────┐
│ STANDARD CONTAINER ISOLATION │
├─────────────────────────────────────────────────────────────┤
│ │
│ Standard containers (runc): │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Container A │ │ Container B │ │ Container C │ │
│ │ │ │ │ │ (attacker) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ HOST KERNEL │ │
│ │ │ │
│ │ 🎯 Kernel exploit from any container │ │
│ │ = Access to ALL containers and host │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ⚠️ Single point of failure: the shared kernel │
│ │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ RUNTIME SANDBOXING OPTIONS │
├─────────────────────────────────────────────────────────────┤
│ │
│ gVisor (runsc) │
│ ───────────────────────────────────────────────────────── │
│ • User-space kernel written in Go │
│ • Intercepts syscalls, implements in user space │
│ • Low overhead, medium isolation │
│ • Good for: untrusted workloads, multi-tenant │
│ │
│ Kata Containers │
│ ───────────────────────────────────────────────────────── │
│ • Lightweight VM per container │
│ • Real Linux kernel per container │
│ • Higher overhead, maximum isolation │
│ • Good for: strict isolation requirements │
│ │
│ Firecracker │
│ ───────────────────────────────────────────────────────── │
│ • MicroVM technology (used by AWS Lambda) │
│ • Minimal virtual machine monitor │
│ • Fast boot, small footprint │
│ │
└─────────────────────────────────────────────────────────────┘

Stop and think: Standard containers share the host kernel directly — all 300+ syscalls go straight to the kernel. gVisor intercepts these syscalls and reimplements them in userspace. What does this mean for an attacker trying to exploit a kernel vulnerability from inside a gVisor-sandboxed container?

┌─────────────────────────────────────────────────────────────┐
│ gVisor (runsc) ARCHITECTURE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Container │ │
│ │ Application │ │
│ └───────────────────────┬───────────────────────────────┘ │
│ │ syscalls │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ gVisor Sentry (user-space) │ │
│ │ │ │
│ │ • Implements ~300 Linux syscalls │ │
│ │ • Runs in user space, not kernel │ │
│ │ • Written in Go (memory-safe) │ │
│ │ • Can't be exploited by kernel CVEs │ │
│ │ │ │
│ └───────────────────────┬───────────────────────────────┘ │
│ │ limited syscalls │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Host Kernel │ │
│ │ │ │
│ │ Sentry only uses ~50 syscalls from host │ │
│ │ Much smaller attack surface │ │
│ │ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ KATA CONTAINERS ARCHITECTURE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Container A │ │ Container B │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Guest VM │ │ Guest VM │ │
│ │ ┌───────────┐ │ │ ┌───────────┐ │ │
│ │ │ Guest │ │ │ │ Guest │ │ │
│ │ │ Kernel │ │ │ │ Kernel │ │ │
│ │ └───────────┘ │ │ └───────────┘ │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ └────────┬───────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Hypervisor (QEMU/Cloud Hypervisor) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Host Kernel │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Each container has its own kernel - full isolation │
│ │
└─────────────────────────────────────────────────────────────┘

What would happen if: You deploy a high-performance database (PostgreSQL) inside a gVisor sandbox. The database uses memory-mapped files and direct I/O heavily. Would you expect the same performance as runc, and what trade-off are you making?

Kubernetes uses RuntimeClass to specify which container runtime to use.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc        # Must match the runtime name in the containerd config
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu    # Must match the runtime name in the containerd config
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor    # Use gVisor instead of runc
  containers:
  - name: app
    image: nginx

# Add gVisor repository (Debian/Ubuntu)
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list
# Install
sudo apt update && sudo apt install -y runsc
# Verify
runsc --version
# /etc/containerd/config.toml
# Add under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
sudo systemctl restart containerd
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

apiVersion: v1
kind: Pod
metadata:
  name: gvisor-test
spec:
  runtimeClassName: gvisor
  containers:
  - name: test
    image: nginx
# Create the pod
kubectl apply -f gvisor-pod.yaml
# Check runtime
kubectl get pod gvisor-test -o jsonpath='{.spec.runtimeClassName}'
# Output: gvisor
# Inside the container, check kernel version
kubectl exec gvisor-test -- uname -a
# Output shows "gVisor" instead of host kernel version
# Check dmesg (gVisor intercepts this)
kubectl exec gvisor-test -- dmesg 2>&1 | head -5
# Output shows gVisor's simulated kernel messages
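The `uname` check above is easy to script. A minimal sketch (the `detect_runtime` helper and the sample strings are hypothetical; it relies only on gVisor reporting the string "gVisor" in its emulated `uname` output, as shown above):

```shell
# Classify a kernel string as gVisor-sandboxed or standard.
detect_runtime() {
  case "$1" in
    *gVisor*) echo "sandboxed (gVisor)" ;;
    *)        echo "standard (host kernel)" ;;
  esac
}

# Hypothetical sample outputs; in a real cluster you would feed in
# "$(kubectl exec <pod> -- uname -a)" instead.
detect_runtime "Linux gvisor-test 4.4.0 #1 SMP gVisor x86_64 Linux"
# -> sandboxed (gVisor)
detect_runtime "Linux node-1 5.15.0-91-generic #101-Ubuntu SMP x86_64"
# -> standard (host kernel)
```

This kind of check is a quick sanity test after rollout, but it is not a security control; verify the RuntimeClass and node configuration too.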

┌─────────────────────────────────────────────────────────────┐
│ gVisor LIMITATIONS │
├─────────────────────────────────────────────────────────────┤
│ │
│ Not all syscalls supported: │
│ ├── Some advanced syscalls not implemented │
│ ├── May break certain applications │
│ └── Check compatibility before using │
│ │
│ Performance overhead: │
│ ├── ~5-15% for compute workloads │
│ ├── Higher for I/O intensive workloads │
│ └── Syscall interception has cost │
│ │
│ Not compatible with: │
│ ├── Host networking (hostNetwork: true) │
│ ├── Host PID namespace (hostPID: true) │
│ ├── Privileged containers │
│ └── Some volume types │
│ │
│ Good for: │
│ ├── Web applications │
│ ├── Microservices │
│ ├── Untrusted workloads │
│ └── Multi-tenant environments │
│ │
└─────────────────────────────────────────────────────────────┘
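To see the incompatibilities in practice, here is a hypothetical pod spec (the pod name is illustrative) that combines gVisor with features it does not support; a pod like this would fail to start under the gvisor RuntimeClass:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-incompatible    # illustrative example, do not deploy
spec:
  runtimeClassName: gvisor
  hostNetwork: true            # host networking is not supported under gVisor
  containers:
  - name: app
    image: nginx
    securityContext:
      privileged: true         # privileged containers are also unsupported
```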

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    gvisor.kubernetes.io/enabled: "true"    # Only schedule on these nodes
  tolerations:
  - key: "gvisor"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
# Label nodes that have gVisor installed
kubectl label node worker1 gvisor.kubernetes.io/enabled=true
# Now pods with runtimeClassName: gvisor will only schedule on labeled nodes

Scenario 1: Create RuntimeClass and Use It

# Step 1: Create RuntimeClass
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
# Step 2: Create pod using RuntimeClass
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx
EOF
# Step 3: Verify
kubectl get pod untrusted-workload -o yaml | grep runtimeClassName
kubectl exec untrusted-workload -- uname -a  # Shows gVisor

Scenario 2: Identify Pods Not Using Sandboxing

# Find all pods without runtimeClassName
kubectl get pods -A -o json | jq -r '
.items[] |
select(.spec.runtimeClassName == null) |
"\(.metadata.namespace)/\(.metadata.name)"
'
# Find pods with specific RuntimeClass
kubectl get pods -A -o json | jq -r '
.items[] |
select(.spec.runtimeClassName == "gvisor") |
"\(.metadata.namespace)/\(.metadata.name)"
'
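A variant of the queries above summarizes pods per runtime class. This sketch runs against a canned pod list (the sample pods are made up) so the jq logic can be tested offline:

```shell
# Sample standing in for `kubectl get pods -A -o json` (hypothetical pods).
cat <<'EOF' > /tmp/pods.json
{"items": [
  {"metadata": {"namespace": "default", "name": "web"},    "spec": {}},
  {"metadata": {"namespace": "ci",      "name": "runner"}, "spec": {"runtimeClassName": "gvisor"}},
  {"metadata": {"namespace": "db",      "name": "pg"},     "spec": {}}
]}
EOF

# Count pods per runtime class; pods with no runtimeClassName fall back
# to the cluster's default runtime (runc on most clusters).
jq -r '[.items[] | .spec.runtimeClassName // "default (runc)"]
       | group_by(.) | map("\(.[0]): \(length)") | .[]' /tmp/pods.json
# default (runc): 2
# gvisor: 1
```

In a real cluster, pipe `kubectl get pods -A -o json` into the same jq program instead of reading the sample file.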

Scenario 3: Enforce RuntimeClass for Namespace

# Use a ValidatingAdmissionPolicy (K8s 1.28+) or OPA/Gatekeeper to enforce this.
# Example namespace with a label for documentation:
apiVersion: v1
kind: Namespace
metadata:
  name: untrusted-workloads
  labels:
    security.kubernetes.io/sandbox-required: "true"
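A sketch of the ValidatingAdmissionPolicy approach (the stable v1 API shown here is Kubernetes 1.30+; the policy name and namespace label are illustrative): reject pod creation in labeled namespaces unless a runtimeClassName is set.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-sandbox          # illustrative name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  validations:
  - expression: "has(object.spec.runtimeClassName) && object.spec.runtimeClassName != ''"
    message: "Pods in this namespace must set a runtimeClassName (e.g. gvisor)."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-sandbox-binding
spec:
  policyName: require-sandbox
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        security.kubernetes.io/sandbox-required: "true"
```

The binding's namespaceSelector scopes enforcement to namespaces carrying the label, so trusted namespaces are unaffected.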

Pause and predict: Your cluster runs both trusted internal microservices and untrusted customer-submitted code (like a CI/CD runner). Which workloads benefit most from runtime sandboxing, and would you sandbox everything or just specific workloads?

┌───────────────────────────────────────────────────────────────────┐
│ RUNTIME COMPARISON │
├─────────────────┬─────────────────┬─────────────────┬─────────────┤
│ Feature │ runc (default) │ gVisor │ Kata │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Isolation │ Namespaces only │ User-space │ VM per pod │
│ │ │ kernel │ │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Kernel sharing │ Shared │ Intercepted │ Not shared │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Overhead │ Minimal │ Low-Medium │ Medium-High │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Boot time │ ~100ms │ ~200ms │ ~500ms │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Memory │ Low │ Low-Medium │ Higher │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Compatibility │ Full │ Most apps │ Most apps │
├─────────────────┼─────────────────┼─────────────────┼─────────────┤
│ Use case │ General │ Untrusted │ High │
│ │ │ workloads │ security │
└─────────────────┴─────────────────┴─────────────────┴─────────────┘
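One related knob worth knowing: RuntimeClass can declare a sandbox's extra resource cost via the `overhead` field, which the scheduler adds to each pod's requests. A sketch for Kata (the values are illustrative, not measured):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu
overhead:
  podFixed:
    memory: "120Mi"   # illustrative per-pod VM cost
    cpu: "250m"
```

Without `overhead`, nodes running many Kata pods can be overcommitted because each guest VM consumes memory the pod spec never accounted for.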

  • gVisor was developed by Google and is used in Google Cloud Run and other GCP services. It intercepts about 300 Linux syscalls and implements them in user space.

  • Kata Containers merged Intel Clear Containers and Hyper.sh's runV. It exposes the same OCI interface as runc, so it's a drop-in replacement.

  • The handler name in RuntimeClass must match the runtime name configured in containerd/CRI-O. Common names: runsc (gVisor), kata-qemu or kata (Kata).

  • AWS Fargate uses Firecracker, another micro-VM technology similar to Kata but optimized for fast boot times.


Common Mistakes:

Mistake                         | Why It Hurts                           | Solution
--------------------------------|----------------------------------------|------------------------------------
Wrong handler name              | Pod fails to start                     | Match the containerd config
No RuntimeClass created         | Pod falls back to default runc         | Create the RuntimeClass first
gVisor on incompatible workload | App crashes                            | Test compatibility first
Missing node selector           | Schedules on nodes without the runtime | Add scheduling to the RuntimeClass
Expecting full syscall support  | App fails                              | Check gVisor's syscall table

  1. A critical kernel CVE is announced that allows container escape via a specific syscall. Your cluster runs 200 pods with standard runc and 10 pods with gVisor. Which pods are vulnerable, and why does gVisor protect against this class of attack?

    Answer The 200 runc pods are vulnerable because their syscalls go directly to the host kernel -- the CVE exploit works directly. The 10 gVisor pods are likely protected because gVisor intercepts syscalls in its own userspace "Sentry" process, reimplementing them without touching the host kernel for most operations. The vulnerable syscall either isn't implemented by gVisor (blocked by default) or is handled in userspace where the kernel exploit doesn't apply. This is gVisor's core security model: reducing the kernel attack surface from 300+ syscalls to ~50 that actually reach the host kernel.
  2. Your team wants to sandbox CI/CD runner pods that execute untrusted customer code. They test with gVisor but the runners fail because they need to build Docker images (which requires mount syscalls and overlayfs). What alternative sandboxing approach would work for this use case?

    Answer Kata Containers would be a better fit. Kata runs each pod in a lightweight VM with its own kernel, providing hardware-level isolation while supporting the full Linux syscall interface (including `mount`). gVisor doesn't support all syscalls needed for container-in-container builds. Alternatively, use rootless BuildKit or Kaniko for image building inside gVisor (they don't need privileged syscalls). Another option is dedicating specific nodes with Kata runtime for CI/CD workloads and using RuntimeClass (`spec.runtimeClassName: kata`) to schedule them appropriately. The trade-off with Kata is higher resource overhead (each pod gets a VM) but full syscall compatibility.
  3. You create a RuntimeClass called gvisor and a pod with runtimeClassName: gvisor. The pod starts on node-1 successfully but fails on node-2 with “handler not found.” What’s the likely cause, and how do you ensure consistent runtime availability?

    Answer The gVisor runtime handler (`runsc`) is installed and configured in containerd on `node-1` but not on `node-2`. RuntimeClass is a cluster-level resource, but the actual runtime binary must be installed on each node. Fix: (1) Install gVisor on all nodes, or (2) Use RuntimeClass `scheduling` field with `nodeSelector` to ensure gVisor pods only schedule on nodes with the runtime installed. Label gVisor-capable nodes (e.g., `runtime/gvisor: "true"`) and set `scheduling.nodeSelector` in the RuntimeClass. This prevents scheduling failures and ensures consistent behavior.
  4. Your security architect says “sandbox everything with gVisor for maximum security.” Your performance team objects because database pods show 30% I/O latency increase under gVisor. How do you balance security and performance across different workload types?

    Answer Don't sandbox everything uniformly. Use a risk-based approach: (1) High-risk workloads (untrusted code execution, public-facing services, multi-tenant workloads) get gVisor or Kata sandboxing via RuntimeClass. (2) Performance-sensitive workloads (databases, caches, message queues) stay on runc but get hardened with seccomp, AppArmor, non-root, read-only filesystem, and dropped capabilities. (3) Internal trusted services get standard security contexts without sandboxing. Create multiple RuntimeClasses (`standard`, `gvisor`, `kata`) and assign them based on workload risk profile. The 30% I/O overhead for databases is unacceptable, but for a web frontend handling untrusted input, it's a worthwhile security trade-off.

Task: Create and use a RuntimeClass for sandboxed workloads.

# Step 1: Check if gVisor is available (on lab environment)
runsc --version 2>/dev/null || echo "gVisor not installed (expected in exam environment)"
# Step 2: Check existing RuntimeClasses
kubectl get runtimeclass
# Step 3: Create RuntimeClass (works even if gVisor not installed)
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
# Step 4: Create pod without sandboxing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: standard-pod
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF
# Step 5: Create pod with sandboxing
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
EOF
# Step 6: Compare pod specs
echo "=== Standard Pod ==="
kubectl get pod standard-pod -o jsonpath='{.spec.runtimeClassName}'
echo ""
echo "=== Sandboxed Pod ==="
kubectl get pod sandboxed-pod -o jsonpath='{.spec.runtimeClassName}'
echo ""
# Step 7: List all RuntimeClasses
kubectl get runtimeclass -o wide
# Cleanup
kubectl delete pod standard-pod sandboxed-pod
kubectl delete runtimeclass gvisor

Success criteria: Understand RuntimeClass configuration and pod assignment.


Why Sandboxing?

  • Containers share host kernel
  • Kernel exploit = escape to host
  • Sandboxing adds isolation layer

gVisor:

  • User-space kernel
  • Intercepts syscalls
  • Low overhead
  • Good for untrusted workloads

Kata Containers:

  • VM per container
  • Full kernel isolation
  • Higher overhead
  • Maximum security

RuntimeClass:

  • Kubernetes abstraction for runtimes
  • Handler matches containerd config
  • Pod uses runtimeClassName

Exam Tips:

  • Know RuntimeClass YAML format
  • Understand gVisor vs Kata tradeoffs
  • Be able to apply RuntimeClass to pods

You’ve finished Minimize Microservice Vulnerabilities (20% of CKS). You now understand:

  • Security Contexts for pods and containers
  • Pod Security Admission standards
  • Secrets management and encryption
  • Runtime sandboxing with gVisor

Next Part: Part 5: Supply Chain Security - Securing container images and the software supply chain.