Policy as Code & Governance
Kubernetes RBAC controls who can perform an action, but it cannot inspect the payload of that action. If a developer has permission to create a Deployment, RBAC cannot prevent them from running the container as root, using the latest image tag, or mounting the host filesystem.
Policy as Code fills this gap by intercepting API requests and evaluating them against predefined rules. On bare-metal environments, where you lack cloud provider guardrails (like AWS IAM roles for service accounts or managed security hubs), robust cluster governance and runtime enforcement are mandatory to prevent node compromise and lateral movement.
Learning Outcomes
- Evaluate and select between OPA Gatekeeper and Kyverno based on operational constraints and declarative paradigms.
- Deploy and configure validating and mutating admission webhooks without degrading control plane latency.
- Implement runtime security enforcement policies using eBPF-based tooling (Falco and Tetragon).
- Design deterministic exemption workflows that balance developer velocity with strict security guardrails.
- Construct automated CI/CD pipelines to test and validate policies against static manifests before cluster deployment.
The Governance Architecture
Kubernetes implements policy primarily through Admission Controllers. When an authenticated and authorized API request reaches the API server, it passes through two sequential webhook phases before persisting to etcd.
```mermaid
sequenceDiagram
    participant User
    participant API Server
    participant Mutating Webhook
    participant Object Schema Validation
    participant Validating Webhook
    participant Etcd

    User->>API Server: POST /api/v1/pods
    API Server->>Mutating Webhook: AdmissionReview (Request)
    Mutating Webhook-->>API Server: AdmissionReview (Patch)
    API Server->>Object Schema Validation: Validate OpenAPI v3 schema
    Object Schema Validation-->>API Server: Schema OK
    API Server->>Validating Webhook: AdmissionReview (Request)
    Validating Webhook-->>API Server: AdmissionReview (Allowed/Denied)
    API Server->>Etcd: Persist Object
    API Server-->>User: 201 Created (or 403 Forbidden)
```

- Mutating Admission: Modifies the incoming object (e.g., injecting sidecars, appending default labels).
- Validating Admission: Inspects the final state of the object and strictly allows or denies the request.
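Both phases exchange `AdmissionReview` objects with the API server. As an illustrative sketch (field names follow the `admission.k8s.io/v1` API; the patch content is an assumption), a mutating webhook's response carrying a JSONPatch might look like:

```yaml
# Illustrative AdmissionReview response from a mutating webhook
# (admission.k8s.io/v1). The uid must echo the request's uid; the
# patch is a base64-encoded JSONPatch document.
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  uid: "705ab4f5-6393-11e8-b7cc-42010a800002"  # echoed from the request
  allowed: true
  patchType: JSONPatch
  # base64 of: [{"op":"add","path":"/metadata/labels/team","value":"platform"}]
  patch: "<base64-encoded JSONPatch>"
```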
Admission Control: OPA Gatekeeper vs Kyverno
The ecosystem has converged on two primary engines for Kubernetes admission control: OPA Gatekeeper and Kyverno.
OPA Gatekeeper
Gatekeeper is a Kubernetes-specific implementation of the Open Policy Agent (OPA). It uses Rego, a purpose-built query language, to evaluate policies.
Gatekeeper separates policy logic from policy instantiation:
- ConstraintTemplate: The CRD defining the Rego logic and the schema for parameters.
- Constraint: The CRD that instantiates the template, binding it to specific Kubernetes resources and supplying parameters.
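As a sketch of the second half of that pair, a Constraint instantiating a required-labels template could look like the following (the template kind matches the example below; the Deployment targeting and `owner` label are assumptions):

```yaml
# Hypothetical Constraint instantiating the K8sRequiredLabels template:
# it binds the template's Rego to Deployments and supplies parameters.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: deployments-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["owner"]  # assumed label requirement for this sketch
```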
Example ConstraintTemplate (Rego):
```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```

Kyverno
Kyverno is designed specifically for Kubernetes. Instead of a bespoke language like Rego, policies are written as native Kubernetes YAML using overlays, variables, and wildcards.
Example Kyverno Policy (YAML):
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-for-labels
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label `app.kubernetes.io/name` is required."
        pattern:
          metadata:
            labels:
              app.kubernetes.io/name: "?*"
```

Comparison Matrix
| Feature | OPA Gatekeeper | Kyverno |
|---|---|---|
| Language | Rego (Declarative Query Language) | Native YAML |
| Learning Curve | High (Requires learning Rego) | Low (Familiar to K8s engineers) |
| Mutation | Supported (via distinct Mutation CRDs) | Native, highly capable (JSONPatches) |
| External Data | `external_data` provider framework | API calls via `context` natively in YAML |
| Generation | Not supported natively | Native (can generate RoleBindings, ConfigMaps) |
| Performance | Extremely high (Rego is optimized) | Moderate (Heavy regex/API calls can slow it) |
Architectural Recommendation: For pure validation with complex logical conditions across multiple data structures, Gatekeeper’s Rego offers a more rigorous, heavily tested evaluation model. For teams prioritizing speed of policy authoring, mutation, and resource generation, Kyverno is preferred.
Policy Libraries and Violation Dashboards
Do not write policies from scratch. Both projects maintain extensive libraries covering the Pod Security Standards (Restricted/Baseline).
- Gatekeeper Library: Deploy the `library/pod-security-policy` directory.
- Kyverno Policies: Install via the `kyverno-policies` Helm chart.
Exemption Workflows
In production, you will encounter vendor Helm charts that violate strict policies (e.g., running as root). You must design deterministic exemptions rather than modifying the core policy logic.
Best Practice: Label-based exemptions via MatchConditions
Starting in Kubernetes v1.28, MatchConditions natively filter requests at the API server level before sending them to the webhook, significantly reducing webhook latency and failure-open risks.
```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration
webhooks:
  - name: validation.gatekeeper.sh
    matchConditions:
      - name: exclude-exempt-namespaces
        expression: "request.namespace != 'kube-system' && request.namespace != 'monitoring'"
```

If using older clusters or tool-specific exclusions, use `namespaceSelector` or dedicated exemption CRDs. Never hardcode exemptions inside Rego or Kyverno rule blocks; abstract them to a ConfigMap or custom resource that can be audited independently.
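On the Kyverno side, exemptions can live in a dedicated, independently auditable resource. A sketch of a `PolicyException` (available in recent Kyverno releases; the API version varies by release, and the policy and rule names here are hypothetical):

```yaml
# Sketch of a Kyverno PolicyException; the API version differs across
# Kyverno releases, and the policy/rule names are assumptions.
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: legacy-app-root-exception
  namespace: legacy-apps
spec:
  exceptions:
    - policyName: disallow-root-user   # hypothetical policy name
      ruleNames:
        - require-run-as-nonroot       # hypothetical rule name
  match:
    any:
      - resources:
          kinds:
            - Deployment
          namespaces:
            - legacy-apps
```

Because the exception is a separate object, it can carry its own labels, owners, and expiry conventions, and be reviewed in isolation from the policy it relaxes.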
Policy CI/CD and Testing
A broken policy can bring down a cluster. Policies must be treated as application code: versioned, linted, and tested against a suite of valid and invalid Kubernetes manifests in CI.
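Both engines ship CLIs (covered in the following subsections) that slot into any pipeline runner. A hypothetical GitHub Actions workflow, assuming the `kyverno` and `gator` binaries are preinstalled on the runner and a `policies/` directory layout:

```yaml
# Hypothetical CI workflow; the directory layout and preinstalled
# CLIs are assumptions of this sketch, not project conventions.
name: policy-ci
on:
  pull_request:
    paths:
      - "policies/**"
jobs:
  test-policies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Kyverno policy tests
        run: kyverno test ./policies/kyverno
      - name: Gatekeeper verify suites
        run: gator verify ./policies/gatekeeper
```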
Testing Gatekeeper Policies
Use `gator`, the CLI tool for Gatekeeper. You define a suite of tests providing the ConstraintTemplate, the Constraint, and dummy Kubernetes manifests.
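A suite file for `gator verify` uses the `Suite` kind from the `test.gatekeeper.sh/v1alpha1` API; a sketch with hypothetical file names:

```yaml
# Sketch of a gator verify suite; all file paths are hypothetical.
apiVersion: test.gatekeeper.sh/v1alpha1
kind: Suite
tests:
  - name: required-labels
    template: template.yaml      # the ConstraintTemplate under test
    constraint: constraint.yaml  # the Constraint instance
    cases:
      - name: missing-label-denied
        object: bad-deployment.yaml
        assertions:
          - violations: yes
      - name: labeled-allowed
        object: good-deployment.yaml
        assertions:
          - violations: no
```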
```bash
# Verify a policy suite locally
gator verify ./policies/
```

Testing Kyverno Policies
Kyverno provides a standalone CLI to run policies against local manifests without a cluster.
```yaml
name: require-labels-test
policies:
  - require-labels.yaml
resources:
  - bad-pod.yaml
  - good-pod.yaml
results:
  - policy: require-labels
    rule: check-for-labels
    resource: bad-pod
    kind: Pod
    result: fail
  - policy: require-labels
    rule: check-for-labels
    resource: good-pod
    kind: Pod
    result: pass
```

Execute in CI:
```bash
kyverno test .
```

Runtime Security: Falco vs Tetragon
Admission controllers only evaluate resources at creation or update. They cannot detect an attacker who exploits a vulnerability in a running application to gain a shell, or malware that executes unauthorized system calls.
Runtime security monitors the underlying Linux kernel to detect and prevent anomalous behavior in real time. On bare-metal deployments this is critical, as node compromise grants direct access to the physical hardware and network.
Falco (a CNCF graduated project) hooks into the Linux kernel (via kernel module or eBPF probe) to parse system calls. It evaluates these syscalls against a rules engine to detect threats.
Falco Architecture:
- Event Source: eBPF probe captures syscalls (`execve`, `open`, `socket`).
- Rules Engine: Compares events against `falco_rules.yaml`.
- Outputs: Sends alerts to stdout, file, gRPC, or external tools (Slack, PagerDuty, Falco Talon for response).
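The output channels are configured in `falco.yaml`; a fragment enabling JSON output and an HTTP sink (the Falcosidekick URL is an assumption of this sketch) might look like:

```yaml
# Fragment of falco.yaml; json_output, stdout_output, and http_output
# are standard Falco config keys. The sidekick URL is an assumption.
json_output: true
stdout_output:
  enabled: true
http_output:
  enabled: true
  url: "http://falcosidekick:2801/"
```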
Example Falco Rule:
```yaml
- rule: Terminal shell in container
  desc: A shell was used as the entrypoint/exec point into a container with an attached terminal.
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and container_entrypoint
  output: >
    A shell was spawned in a container with an attached terminal
    (user=%user.name user_loginuid=%user.loginuid %container.info
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline
    terminal=%proc.tty container_id=%container.id image=%container.image.repository)
  priority: NOTICE
  tags: [container, shell, mitre_execution]
```

Tetragon (Cilium)
Tetragon (part of the Cilium family) is a pure eBPF-based runtime security enforcement and observability tool.
Unlike Falco, which relies on asynchronous ring buffers to analyze syscalls (meaning the malicious action often completes before the alert fires), Tetragon hooks deep into kernel functions and can block the system call synchronously.
Example Tetragon TracingPolicy (Enforcement):
```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-shell-in-pod
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values:
                - "/bin/bash"
                - "/bin/sh"
          matchActions:
            - action: Sigkill
```

Comparison: Runtime Engines
| Capability | Falco | Tetragon |
|---|---|---|
| Primary Paradigm | Audit and Alert | Enforce and Block |
| Instrumentation | Kernel Module or eBPF | eBPF only |
| Ecosystem Maturity | Very High (Standardized rulesets) | Growing rapidly (Tied to Cilium ecosystem) |
| Overhead | Moderate (Moves data to userspace) | Low (Filters/blocks in kernel space) |
Hands-on Lab
In this lab, we will deploy Kyverno, implement a strict policy, test an exemption, and deploy Falco to monitor a runtime violation.
Prerequisites:
- `kind` (Kubernetes IN Docker) v0.20+
- `kubectl` v1.32+
- `helm` v3.14+
Step 1: Bootstrap the Cluster
```bash
kind create cluster --name policy-lab
kubectl cluster-info
```

Step 2: Install Kyverno
Deploy Kyverno using the official Helm chart. We configure it with high availability disabled for the lab environment.
```bash
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno \
  -n kyverno --create-namespace \
  --set admissionController.replicas=1 \
  --set backgroundController.replicas=1 \
  --set cleanupController.replicas=1 \
  --set reportsController.replicas=1
```

Wait for the webhooks to become active:
```bash
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=admission-controller -n kyverno --timeout=90s
```

Step 3: Apply the “Disallow Latest Tag” Policy
Create a policy that prevents any pod from using the `:latest` image tag.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Using 'latest' image tag is prohibited."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
EOF
```

Step 4: Verify Enforcement
Attempt to run an NGINX pod with the `latest` tag.
```bash
kubectl run test-nginx --image=nginx:latest
```

Expected Output:
```text
Error from server: admission webhook "validate.kyverno.svc-fail" denied the request:
resource Pod/default/test-nginx was blocked due to the following policies
disallow-latest-tag: require-image-tag: "validation error: Using 'latest' image tag is prohibited. rule require-image-tag failed at path /spec/containers/0/image/"
```

Now try with a specific tag:
```bash
kubectl run test-nginx-good --image=nginx:1.25
```

Expected Output: `pod/test-nginx-good created`
Step 5: Test Exemption Workflow
We will exempt a specific namespace, `legacy-apps`, from this policy. Update the policy:
```bash
cat <<EOF | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - legacy-apps
      validate:
        message: "Using 'latest' image tag is prohibited."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
EOF
```

Test the exemption:
```bash
kubectl create namespace legacy-apps
kubectl run legacy-nginx --image=nginx:latest -n legacy-apps
```

Expected Output: `pod/legacy-nginx created`
Step 6: Install Falco
Install Falco via Helm. We use the eBPF probe configuration. Note: `kind` shares the host kernel, so Falco will monitor events across the Docker daemon running the `kind` nodes.
```bash
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  -n falco --create-namespace \
  --set driver.kind=ebpf \
  --set tty=true
```

Wait for Falco to deploy:
```bash
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=falco -n falco --timeout=120s
```

Step 7: Trigger and View a Runtime Alert
Exec into our running `test-nginx-good` pod and read a sensitive file. This violates the default “Read sensitive file untrusted” Falco rule.
```bash
kubectl exec -it test-nginx-good -- cat /etc/shadow
```

Output will likely be a permission denied error from the container OS itself, but the `openat` syscall was still attempted.
Check the Falco logs to see the detection:
```bash
kubectl logs -l app.kubernetes.io/name=falco -n falco | grep "shadow"
```

Expected Output:
```json
{"output":"Warning Sensitive file opened for reading by non-trusted program (file=/etc/shadow gparent=<NA> ggparent=<NA> gggparent=<NA> fd.name=/etc/shadow...
```

Lab Cleanup
```bash
kind delete cluster --name policy-lab
```

Troubleshooting the Lab
- Webhook Timeout creating Pods: If Kyverno is installed but pods hang during creation, the Kyverno admission controller pods might be crashing or OOMing. Check `kubectl get pods -n kyverno`.
- Falco eBPF Driver Fails to Load: If the Falco pod is in `CrashLoopBackOff` with driver errors, your host OS (Mac/Windows running Docker Desktop) might have a kernel incompatible with the eBPF probe. Switch to `--set driver.kind=modern_ebpf`, or fall back to the generic kernel module if testing on a full Linux VM.
Practitioner Gotchas
- Failing Open vs Failing Closed: Setting a validating webhook’s `failurePolicy` to `Fail` ensures absolute security but guarantees cluster outages if the policy engine becomes unavailable (e.g., node rotation, CNI failure). Best practice: use `Ignore` but alert heavily on webhook reachability metrics. If compliance mandates `Fail`, ensure the policy engine runs with high availability, PodDisruptionBudgets, and strict anti-affinity rules.
- Resource Exhaustion from Audit Scans: Both Gatekeeper and Kyverno periodically fetch all objects matching a policy to check for violations. On clusters with large object counts (e.g., heavily sharded databases generating thousands of Secrets), this causes memory spikes and OOMKills. Tune the `auditInterval` and restrict memory limits.
- Regex ReDoS in Rego: OPA Rego regex parsing can fall victim to Regular Expression Denial of Service (ReDoS). A poorly written regex evaluating a 50KB ConfigMap string will lock the single evaluation thread, causing webhook timeouts and API server backpressure. Always bound regex logic and benchmark Rego policies locally using `opa bench`.
- eBPF Overhead on Heavy Workloads: Tools like Falco attach to system calls. If you write a custom rule monitoring `read` or `write` syscalls without extremely tight pre-filtering, a database pod doing heavy I/O will overwhelm the eBPF ring buffer, leading to dropped events and measurable node CPU spikes. Monitor `falco_drop_count` metrics closely.
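If compliance does force `failurePolicy: Fail`, the availability guardrails from the first gotcha can be sketched declaratively (the namespace and label selector are assumptions matching a Kyverno install):

```yaml
# Keeps at least one admission-controller replica available during
# node drains; the selector labels are assumptions for this sketch.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: policy-engine-pdb
  namespace: kyverno
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: admission-controller
```

Pair this with pod anti-affinity so replicas never co-locate on a single node.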
1. A cluster upgrade has stalled because new kube-system pods cannot be scheduled. The API server logs indicate timeouts reaching the Gatekeeper validating webhook. What is the most robust architectural fix to prevent this?
A) Scale Gatekeeper to 5 replicas.
B) Modify the Gatekeeper ValidatingWebhookConfiguration to include a namespaceSelector that ignores the kube-system namespace.
C) Change the Kubernetes API server configuration to bypass webhooks during upgrades.
D) Convert all Gatekeeper policies to Kyverno policies.
Answer
B

2. In OPA Gatekeeper, what is the correct relationship between a ConstraintTemplate and a Constraint?
A) The Constraint contains the Rego code, and the ConstraintTemplate defines the parameters.
B) The ConstraintTemplate contains the Rego code and parameter schema, while the Constraint instantiates the policy against specific Kubernetes resources.
C) The ConstraintTemplate generates Kyverno YAML, which is executed by the Constraint.
D) The ConstraintTemplate dictates Mutating policies, while the Constraint dictates Validating policies.
Answer
B

3. You require an admission controller that can automatically inject a sidecar container into any Pod annotated with `sidecar.io/inject: "true"`. Which tool provides the most native and straightforward approach for this mutation?
A) OPA Gatekeeper (using Rego)
B) Falco
C) Kyverno (using JSONPatch overlays)
D) Tetragon
Answer
C

4. A critical zero-day vulnerability requires you to block the execution of a specific binary (`/usr/bin/vuln`) inside all running containers immediately, without restarting the containers. Which technology is capable of this?
A) Validating Admission Webhooks
B) Kyverno Enforce Policies
C) Falco alerting rules
D) Tetragon TracingPolicies with Sigkill actions
Answer
D

5. You need to implement an exemption for a legacy application that violates your “No Root” policy. According to best practices, how should this exemption be handled?
A) Hardcode the pod name into the Rego/YAML policy logic.
B) Use MatchConditions in the webhook configuration or exclude blocks to bypass the namespace/labels before evaluation.
C) Disable the “No Root” policy cluster-wide until the application is fixed.
D) Assign the application deployment to the kube-system namespace.