Module 4.7: Kyverno
Toolkit Track | Complexity: [MEDIUM] | Time: 45-55 minutes
Prerequisites: Security Principles Foundations, Module 4.2: OPA & Gatekeeper recommended, and a working understanding of Kubernetes admission controllers.
Learning Outcomes
After completing this module, you will be able to:
- Design Kyverno validation, mutation, generation, and image verification policies that match real platform security requirements without surprising application teams.
- Debug policy behavior by reading admission errors, PolicyReport resources, match and exclude blocks, and Kyverno CLI output.
- Evaluate when Kyverno is a better fit than OPA Gatekeeper, and when Gatekeeper or another policy engine remains the stronger choice.
- Implement a safe rollout path that starts in audit mode, measures existing violations, and moves selected rules to enforcement with clear exceptions.
- Compare policy choices against operational risks such as webhook availability, namespace blast radius, background scans, and multi-team ownership.
Why This Module Matters
A platform team at a payments company inherited a cluster where every team could deploy directly from its CI pipeline. The system looked healthy during ordinary releases, but the guardrails were mostly social agreements: “use non-root containers,” “do not deploy latest,” and “remember resource limits.” Those rules lived in docs, pull request comments, and incident retrospectives, which meant they failed exactly when the delivery pressure increased.
One Friday afternoon, a service owner shipped a container running as root, with no CPU limit, and with a floating image tag from a public registry. The change passed CI because the YAML was syntactically valid, and it passed deployment because the API server had no policy saying the workload was unsafe. By Monday morning, the team was investigating noisy neighbors, unexplained cost growth, and an uncomfortable security review that asked why the cluster accepted a workload the organization already knew was forbidden.
Kyverno turns those platform agreements into Kubernetes-native policy. It reads and writes ordinary Kubernetes resources, uses YAML instead of a separate policy language, and can validate, mutate, generate, and verify resources at the point where they enter the cluster. The important lesson is not that Kyverno is “easy”; the important lesson is that a readable policy engine gives a platform team a practical path from documented intent to enforceable behavior.
This module teaches Kyverno as a production control system, not as a command reference. You will start with the admission path, then write policies that solve realistic problems, then learn how to roll them out without breaking existing workloads. By the end, you should be able to reason about both the YAML and the operational consequences of installing it.
1. Admission Control as a Platform Contract
Kyverno sits on the Kubernetes admission path, which means it evaluates resources after authentication and authorization but before the object is stored in etcd. That location matters because a policy engine can stop unsafe resources before controllers schedule them, mutate resources before other controllers observe them, and create companion resources when a trigger appears. When you use Kyverno well, the API server becomes a contract boundary between application teams and the platform.
The contract should be explicit and observable. A rule that denies every Pod without telling the developer what to fix creates friction, even if the security goal is valid. A rule that audits first, reports violations clearly, and then moves to enforcement after teams have remediated existing workloads is easier to defend during a production incident.
Kyverno is implemented through admission webhooks plus background controllers. The validating webhook can allow, deny, or audit a request. The mutating webhook can patch the resource before it is persisted. The generate controller can create dependent resources such as NetworkPolicies, RoleBindings, or ConfigMaps after a matching trigger resource is created.
```text
KYVERNO ADMISSION AND BACKGROUND FLOW
===============================================================

Developer or CI
kubectl apply / Helm / GitOps
        |
        v
+------------------+        AuthN/AuthZ       +------------------+
|    API Server    |------------------------->| Kubernetes RBAC  |
+------------------+                          +------------------+
        |
        | AdmissionReview
        v
+----------------------------------------------------------------+
|                            Kyverno                             |
|                                                                |
|  +----------------+  +----------------+  +----------------+    |
|  | Validate       |  | Mutate         |  | Generate       |    |
|  | allow/deny     |  | patch object   |  | create related |    |
|  | audit reports  |  | before store   |  | resources      |    |
|  +----------------+  +----------------+  +----------------+    |
|                                                                |
|  ClusterPolicy / Policy resources drive each decision.         |
+----------------------------------------------------------------+
        |
        | allow, deny, mutate, or reconcile later
        v
+------------------+       Stored Object      +------------------+
|       etcd       |<-------------------------|   Controllers    |
+------------------+                          +------------------+
```

The first design decision is not which YAML field to type. The first decision is whether the rule should block an admission request, modify it, create something related, or verify supply-chain evidence. Those actions look similar in a policy file, but they create different responsibilities for the platform team.
| Kyverno action | Production purpose | Typical example | Operational risk to manage |
|---|---|---|---|
| Validate | Reject or audit resources that violate a rule | Deny Pods without resource limits | A broad rule can block emergency releases if rolled out too quickly |
| Mutate | Patch resources before they are stored | Add default labels or security settings | Hidden mutation can surprise teams if it changes ownership-sensitive fields |
| Generate | Create related resources from a trigger | Create a default-deny NetworkPolicy for each new namespace | Generated resources need clear ownership and synchronization choices |
| VerifyImages | Check signatures and attestations for container images | Require signed images from a trusted issuer | Signature policies can block builds when the signing pipeline is unavailable |
A good platform policy has three layers of clarity. It states the intent in the policy name and message, scopes the rule narrowly enough that the blast radius is known, and exposes results through PolicyReport resources or CI output. If one of those layers is missing, the rule may still work technically while failing as a platform product.
Pause and predict: A team creates a Pod that violates a Kyverno validate rule configured with validationFailureAction: Audit. Will the API server store the Pod, and where should the platform team look afterward to see that it violated policy?
The answer is that the Pod is stored because audit mode does not block admission, and the violation appears in a PolicyReport in the Pod’s namespace; a ClusterPolicyReport would hold the result only for a cluster-scoped resource. This is why audit mode is useful during rollout: it lets you discover the current state before deciding what to enforce. It is also why audit mode is not a complete control for a high-risk rule; it observes violations, but it does not prevent them.
Kyverno has two policy resource scopes, and the scope should match the ownership model. A ClusterPolicy applies across the cluster and is normally owned by the platform or security team. A namespaced Policy applies only inside its namespace and can be useful for team-specific requirements that should not become global rules.
| Resource kind | Scope | Best use | Ownership signal |
|---|---|---|---|
| ClusterPolicy | Cluster-wide | Shared baselines such as non-root containers and disallowed registries | Platform team owns the default behavior |
| Policy | One namespace | Team-specific rules such as required labels for a single product area | Namespace owner can adapt rules inside their boundary |
The mistake to avoid is treating scope as a convenience setting. A ClusterPolicy with weak exceptions can accidentally govern system namespaces, observability workloads, and third-party controllers that were never part of the original requirement. A namespaced Policy can be safer during experimentation, but it will not protect the cluster if critical namespaces are forgotten.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-platform-labels
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: require-app-and-team
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must include app and team labels so ownership is visible during incidents."
        pattern:
          metadata:
            labels:
              app: "?*"
              team: "?*"
```

This first policy is intentionally gentle. It audits instead of enforcing, and its message explains the operational reason for the labels. The pattern checks that both labels exist and are not empty, which gives responders enough metadata to group workload issues by owning team.
Use kubectl first, then you may use the k alias if your shell defines it. The module examples spell out kubectl in setup commands because clarity matters when learners copy commands into a fresh environment.
```bash
kubectl apply -f require-platform-labels.yaml
kubectl run unlabeled-demo --image=nginx:1.27
# Pods are namespaced, so the result lands in a PolicyReport rather than a ClusterPolicyReport
kubectl get policyreport -A -o wide
```

The expected behavior in audit mode is that the Pod is created and the report records a failure. If you change validationFailureAction to Enforce, the same Pod should be denied at admission with the message from the policy. That small field change is a large operational change, which is why rollout discipline matters.
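The namespaced Policy kind described earlier in this section has the same rule shape as a ClusterPolicy. The following is only a sketch: the namespace team-checkout, the policy name, and the tier label are hypothetical and were chosen for illustration.

```yaml
# Hypothetical team-scoped rule; it only evaluates Pods in team-checkout.
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: require-service-tier-label   # assumed name
  namespace: team-checkout           # assumed namespace
spec:
  validationFailureAction: Audit     # start in audit, as with the cluster-wide example
  background: true
  rules:
    - name: require-tier
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods in team-checkout must set a tier label so on-call routing works."
        pattern:
          metadata:
            labels:
              tier: "?*"             # label must exist and be non-empty
```

Because the resource lives in one namespace, the namespace owner can iterate on it without changing behavior for any other team.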
2. Writing Validation Policies That Explain the Rule
Validation is the most familiar Kyverno capability because it maps directly to platform guardrails. A request comes in, Kyverno compares the object with the rule, and the request is allowed, denied, or recorded as a violation. The quality difference between a useful validation policy and an irritating one is usually the message, the scope, and the rollout path.
Start validation policies with one concrete risk. “All Pods must be secure” is too broad to express or debug cleanly. “Application Pods must not use floating image tags” is specific enough to test, explain, and tune. The tighter rule also helps you decide what exceptions are legitimate.
A strong validation policy includes an actionable failure message. Developers should know whether they need to add a field, change an image, use a different namespace, or ask for an exception. A message that only says “policy violation” transfers the diagnosis work to the person trying to ship.
The following worked example blocks floating latest image tags for ordinary application Pods while excluding system namespaces. It uses enforcement because the risk is easy to understand and the remediation is usually straightforward: pin the image to an immutable version or digest.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-floating-image-tags
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-explicit-container-tags
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kube-public
                - kyverno
      validate:
        message: "Floating image tags are not allowed. Use a version tag or digest so rollbacks and audits are deterministic."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

The pattern block describes what must be true. The image expression !*:latest means each matched container image must not end with the forbidden tag. This is not a complete supply-chain strategy, but it removes a common source of nondeterministic deploys and makes later image verification policies easier to reason about.
```bash
kubectl apply -f disallow-floating-image-tags.yaml
kubectl run bad-tag --image=nginx:latest --labels="app=demo,team=platform"
kubectl run pinned-tag --image=nginx:1.27 --labels="app=demo,team=platform"
```

The first Pod should be denied and the second should be accepted. If both are accepted, check whether the policy was applied, whether Kyverno pods are ready, and whether the resource kind and namespace match the rule. If both are denied, inspect the failure message and confirm whether another ClusterPolicy is also evaluating the Pod.
A common validation problem appears when a policy checks only containers and forgets initContainers or ephemeralContainers. Attackers and ordinary mistakes do not care which array holds the risky image or security context. A production policy should cover the workload shapes your cluster actually allows.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: require-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Every application container must define CPU and memory limits so one workload cannot starve the node."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
            =(initContainers):
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
```

The =(initContainers) anchor means the pattern applies if initContainers exists. Without that conditional anchor, the rule could require every Pod to have init containers, which is not the intent. This detail is a good example of why Kyverno is approachable but still deserves careful testing.
Pause and predict: A Pod has two regular containers. One has both CPU and memory limits, and the other has only a memory limit. Under the policy above, should the Pod pass or fail, and what should the failure message help the developer fix?
The Pod should fail because every matched container must satisfy the pattern. The message should point the developer toward adding the missing CPU limit, not toward learning Kyverno syntax. In a real rollout, you would also inspect PolicyReports to see whether a small number of shared Helm charts are responsible for many violations.
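Returning to the coverage gap noted earlier for initContainers and ephemeralContainers, the same conditional anchor can extend the floating-tag check to every container array. This is a sketch rather than a drop-in replacement: the policy and rule names are hypothetical, and it starts in Audit on the assumption that existing workloads have not been checked yet.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-floating-tags-all-containers   # assumed name
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: no-latest-in-any-container-array    # assumed name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Floating image tags are not allowed in containers, initContainers, or ephemeralContainers."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
            # =() applies each sub-pattern only when the array is present
            =(initContainers):
              - image: "!*:latest"
            =(ephemeralContainers):
              - image: "!*:latest"
```

Note that an image reference with no tag at all does not match *:latest as a string, so it passes this pattern even though the runtime resolves it to latest; a separate check would be needed to require a tag.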
Some rules need richer comparisons than a simple pattern provides. Kyverno supports deny conditions with variables and operators, which are useful when the decision depends on request context or field comparisons. Use deny rules carefully because they can become harder to read than patterns when conditions pile up.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-loadbalancer-services
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: platform-team-approves-loadbalancers
      match:
        any:
          - resources:
              kinds:
                - Service
      validate:
        message: "Only platform-owned namespaces may create LoadBalancer Services; request an ingress or an approved exception."
        deny:
          conditions:
            all:
              - key: "{{ request.object.spec.type || '' }}"
                operator: Equals
                value: LoadBalancer
              - key: "{{ request.namespace || '' }}"
                operator: NotIn
                value:
                  - platform-ingress
                  - edge-services
```

This policy is intentionally set with background: false because it relies on admission request context. Background scans evaluate existing resources, but not every request variable is meaningful outside an admission request. When you design policies, decide whether you need to scan historical objects or only control future writes.
A senior platform engineer reads a Kyverno validation rule as an operational promise. The policy name tells what it protects, match and exclude blocks tell where it applies, the validation block tells how it decides, and the failure message tells how humans recover. If those four parts disagree, the policy will cause confusion even when the YAML is valid.
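The policies in this section express exceptions inline through exclude blocks. For a narrower exception that can be reviewed on its own, Kyverno also provides a PolicyException resource. The sketch below is hedged: the namespace legacy-batch and the Pod name prefix are hypothetical, the apiVersion differs between Kyverno releases (earlier versions used v2beta1 or v2alpha1), and the feature may need to be enabled in the Kyverno installation before it takes effect.

```yaml
apiVersion: kyverno.io/v2            # check the version served by your Kyverno release
kind: PolicyException
metadata:
  name: allow-legacy-latest-tag      # assumed name
  namespace: legacy-batch            # assumed namespace
spec:
  exceptions:
    # Exempt only the named rule of the named policy
    - policyName: disallow-floating-image-tags
      ruleNames:
        - require-explicit-container-tags
  match:
    any:
      - resources:
          kinds:
            - Pod
          namespaces:
            - legacy-batch
          names:
            - nightly-import-*       # assumed Pod name prefix
```

Keeping the exception in its own resource leaves the baseline policy unchanged and makes the exempted scope visible in review.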
3. Mutating Policies Without Hiding Ownership
Mutation changes a resource before it is stored. This is powerful because it can make safe defaults automatic, but it is risky because application teams may not realize the stored object differs from the submitted manifest. Use mutation for defaults that are boring, documented, and safe to apply consistently.
A good mutation policy does not become a secret deployment system. Adding missing labels, setting default annotations, or inserting a known image pull secret may be reasonable. Injecting complex sidecars into arbitrary workloads can be appropriate in a service mesh or logging platform, but it requires versioning, observability, and an escape hatch.
Kyverno supports strategic merge patches, JSON patches, and foreach patterns for mutation. The most common beginner-friendly approach is patchStrategicMerge, which looks like the shape of the resource you want to modify. The +() notation means “add this field only if it is missing,” which protects explicit choices made by the workload owner.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-platform-labels
spec:
  rules:
    - name: add-managed-labels
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kyverno
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              +(managed-by): kyverno
              +(cost-center): "shared"
```

This policy is a safe first mutation because it adds metadata rather than changing runtime behavior. If a team already sets cost-center, Kyverno leaves it alone. That makes the policy a defaulting mechanism rather than an override mechanism, which is usually easier to justify.
```bash
kubectl apply -f add-default-platform-labels.yaml
kubectl run mutate-demo --image=nginx:1.27 --labels="app=web,team=platform"
kubectl get pod mutate-demo -o jsonpath='{.metadata.labels.managed-by}{"\n"}'
kubectl get pod mutate-demo -o jsonpath='{.metadata.labels.cost-center}{"\n"}'
```

The expected output is kyverno and shared unless the submitted manifest already supplied a different cost-center. If the labels are missing, inspect the mutating webhook configuration and confirm that the Pod was created after the policy existed. Mutation happens during admission; it does not retroactively patch objects that were already stored.
The next mutation example is more sensitive because it changes workload composition. The policy injects a logging sidecar only when the workload opts in with an annotation. Opt-in scoping gives application teams a clear signal and reduces the chance that the platform team changes the behavior of workloads that were never tested with the sidecar.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-logging-sidecar
spec:
  rules:
    - name: add-fluent-bit-when-requested
      match:
        any:
          - resources:
              kinds:
                - Pod
              annotations:
                observability.kubedojo.io/inject-logger: "true"
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - name: platform-log-forwarder
                image: fluent/fluent-bit:3.0
                resources:
                  requests:
                    cpu: "25m"
                    memory: "32Mi"
                  limits:
                    cpu: "50m"
                    memory: "64Mi"
```

This policy should trigger a platform design conversation before it reaches production. Who owns sidecar upgrades? How will teams know the image version changed? What happens when the sidecar crashes? Can a workload opt out during an incident? Kyverno can perform the mutation, but the platform team still owns the product behavior created by that mutation.
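For reference, a workload opts in by carrying the annotation that the rule matches on. The Pod below is a hypothetical example: its name, namespace, and image path are assumptions, and only the annotation key and value come from the policy.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: orders-api                     # assumed name
  namespace: payments-dev              # assumed namespace
  labels:
    app: orders-api
    team: payments
  annotations:
    # The opt-in signal the mutate rule matches; the value must be the string "true"
    observability.kubedojo.io/inject-logger: "true"
spec:
  containers:
    - name: app
      image: registry.kubedojo.internal/payments/orders-api:1.4.2   # assumed image
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "250m"
          memory: "256Mi"
```

Removing the annotation and recreating the Pod is then the opt-out path, since mutation only happens at admission.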
What would happen if: A GitOps controller continuously applies a manifest without the managed-by label, while Kyverno mutates the live object to add that label. Would you expect drift noise, and how could you reduce it?
The answer depends on the GitOps tool and diff configuration, but drift noise is possible when the live object contains fields that are absent from Git. You can reduce the problem by committing the defaulted fields, configuring diff ignores for known mutation fields, or limiting mutation to fields that your deployment tooling already treats as server-side defaults. The key is to test policy behavior in the same delivery path teams actually use.
A mutation policy should be documented as part of the platform API. Developers need to know which fields are defaulted, which fields are respected when explicitly set, and which annotations opt into heavier behavior. Without that documentation, mutation can feel like the cluster is rewriting manifests behind their backs.
4. Generate Policies for Namespace Guardrails
Generate policies create related Kubernetes resources when another resource appears. This capability is one of Kyverno’s major differences from Gatekeeper, and it is especially useful for namespace onboarding. A new namespace can automatically receive a default NetworkPolicy, a ResourceQuota, a LimitRange, or a RoleBinding that matches your platform baseline.
Generation is not validation. A generate rule does not deny a namespace because it lacks a NetworkPolicy; it creates the NetworkPolicy when the namespace is created. That difference matters because generated resources need ownership rules, synchronization behavior, and naming conventions that fit your operational model.
The following policy creates a default-deny NetworkPolicy whenever a new ordinary namespace appears. It excludes system and Kyverno namespaces so the policy does not interfere with cluster infrastructure. It also sets synchronize: true, which tells Kyverno to keep the generated resource aligned with the policy over time.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny-networkpolicy
spec:
  rules:
    - name: create-default-deny-for-new-namespaces
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
                - kyverno
      generate:
        synchronize: true
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{ request.object.metadata.name }}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```

This policy is a strong platform default because network isolation is easy to forget during namespace creation. However, it should be paired with onboarding documentation that tells teams how to add allow rules for required traffic. A default-deny policy without a path to expected connectivity can turn security into a deployment blocker.
```bash
kubectl apply -f generate-default-deny-networkpolicy.yaml
kubectl create namespace payments-dev
kubectl get networkpolicy -n payments-dev default-deny-all -o yaml
```

The expected result is a NetworkPolicy named default-deny-all in the new namespace. If it does not appear immediately, wait briefly and check Kyverno controller logs because generation is reconciled by a controller after admission. If it appears and then is removed by another controller, you need to inspect ownership and synchronization behavior.
```text
GENERATE POLICY OWNERSHIP MODEL
===============================================================

Namespace created: payments-dev
        |
        v
+-----------------------------+
| Kyverno generate rule       |
| name: default-deny-all      |
| synchronize: true           |
+-----------------------------+
        |
        v
+-----------------------------+
| payments-dev namespace      |
|                             |
| NetworkPolicy               |
|   default-deny-all          |
|                             |
| Team-owned allow policies   |
|   allow-dns                 |
|   allow-ingress-controller  |
+-----------------------------+
```

The generated default should be stable, small, and easy to recognize. Team-owned allow policies should be separate resources so teams can manage their own connectivity without editing the platform baseline. This separation makes audits clearer because the generated resource states the default posture while team resources state intentional exceptions.
Generate policies are also useful for ResourceQuota and LimitRange defaults. Those resources are not security controls in the same way as NetworkPolicy, but they protect cluster reliability by preventing accidental resource exhaustion. The same ownership pattern applies: platform defaults should be generated predictably, while team-specific exceptions should be explicit.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-namespace-resource-defaults
spec:
  rules:
    - name: create-default-quota
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
                - kyverno
      generate:
        synchronize: true
        apiVersion: v1
        kind: ResourceQuota
        name: namespace-default-quota
        namespace: "{{ request.object.metadata.name }}"
        data:
          spec:
            hard:
              requests.cpu: "4"
              requests.memory: "8Gi"
              limits.cpu: "8"
              limits.memory: "16Gi"
              pods: "40"
```

Be careful with synchronized generation when teams need to customize generated objects. If synchronize: true is used, Kyverno may restore the generated object to the policy-defined state after someone changes it. That is exactly what you want for a security baseline, but it may frustrate teams if the generated object is supposed to be a starting template rather than a managed control.
Pause and decide: Your organization wants every new namespace to start with a ResourceQuota, but product teams need different quota sizes after onboarding. Should the generated quota use synchronize: true, or should the platform use another pattern?
A reasonable answer is to avoid synchronized generation for a quota that teams are expected to customize, or to generate a conservative default and require teams to request a managed override through a clear workflow. If the platform wants quotas to remain centrally managed, synchronization is appropriate, but then the exception process must be part of the design. The right answer depends on ownership, not only on Kyverno syntax.
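One way to express the "starting template" option is the same generate rule with synchronization turned off. This is a sketch under that assumption; the policy, rule, and quota names and the conservative values are hypothetical.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-starter-quota           # assumed name
spec:
  rules:
    - name: create-starter-quota         # assumed name
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
                - kyverno
      generate:
        # false: create once, then leave later edits by the namespace owner alone
        synchronize: false
        apiVersion: v1
        kind: ResourceQuota
        name: namespace-starter-quota    # assumed name
        namespace: "{{ request.object.metadata.name }}"
        data:
          spec:
            hard:
              requests.cpu: "2"
              requests.memory: "4Gi"
              pods: "20"
```

The trade-off is that the platform loses drift correction for this object, so who may change the quota afterward has to be controlled some other way, for example through RBAC.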
Generation policies are most successful when they make the secure path the default path. A namespace should not be born empty and then wait for a human to remember ten setup steps. Kyverno lets the platform team encode those steps as reconciliation behavior, but the generated resources still need names, labels, and documentation that humans can understand during incidents.
5. Image Verification and Supply Chain Controls
Container image policy has two layers. Basic validation checks the image string, such as whether it uses a forbidden tag or registry. Image verification checks evidence about the image, such as whether it was signed by a trusted identity or has an attestation matching your build requirements.
Kyverno’s verifyImages rules help enforce supply-chain policy at admission time. The goal is not to make Kubernetes a full artifact security platform. The goal is to ensure that the cluster only accepts images that came through the approved build and signing process.
A simple progression works best. First block obviously unsafe image references, such as latest or unknown registries. Then require images from approved registries. Then require signatures for sensitive namespaces. Finally, add attestation checks when the build pipeline reliably produces the required metadata.
The following policy sketches a registry restriction. It is not a signature policy yet, but it is a useful stepping stone because it reduces the number of places from which workloads can pull code.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-approved-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from approved registries so vulnerability scanning and provenance controls apply."
        pattern:
          spec:
            containers:
              - image: "registry.kubedojo.internal/* | ghcr.io/kube-dojo/*"
```

Registry restrictions are valuable, but they do not prove that the image was built by your pipeline. An attacker who can push to an approved registry could still publish an image with a valid-looking name. Signature verification closes part of that gap by requiring cryptographic evidence from a trusted signer.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-platform-images
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 10
  rules:
    - name: require-signed-platform-images
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - production
                - payments
      verifyImages:
        - imageReferences:
            - "registry.kubedojo.internal/platform/*"
          attestors:
            - entries:
                - keyless:
                    subject: "https://github.com/kube-dojo/platform/.github/workflows/release.yaml@refs/heads/main"
                    issuer: "https://token.actions.githubusercontent.com"
```

Treat this example as a teaching shape, not as a universal copy-paste policy. Real signature verification depends on your signing tool, identity provider, registry, and Kyverno version. The important design point is that image verification policies encode trust in the build identity, not just trust in the image name.
Verification policies can fail closed during registry, transparency log, identity provider, or network issues depending on configuration. That may be the correct security posture for production namespaces, but it should be a deliberate decision. For lower-risk environments, teams often start with audit mode or narrower matching until the signing pipeline is stable.
| Supply-chain control | What it proves | What it does not prove | Rollout advice |
|---|---|---|---|
| Disallow latest | Deploys refer to a stable tag shape | The tag cannot be moved in the registry | Enforce early because remediation is simple |
| Restrict registries | Images come from known locations | The image was built by an approved workflow | Audit current usage before enforcing |
| Require signatures | A trusted identity signed the image | The image is free of vulnerabilities | Enforce by namespace after signing is reliable |
| Require attestations | Build metadata or SBOM evidence exists | The application is logically secure | Start with critical services and clear exceptions |
What would happen if: You enforce signed images in the production namespace before every production build job signs images consistently. What failure pattern would application teams see, and what should the platform team measure before enforcement?
Teams would see admission denials for images that may otherwise be valid application releases. Before enforcement, the platform team should measure unsigned image frequency, identify which pipelines fail to sign, and verify that emergency patch workflows can produce signed artifacts. The policy is only as usable as the supply chain behind it.
Image verification is where Kyverno stops being only a YAML-friendly admission tool and becomes part of the delivery trust boundary. That boundary includes CI identities, registry permissions, artifact retention, signing keys or keyless identities, and incident response. Kyverno enforces the final admission decision, but the evidence must be produced earlier in the software delivery system.
6. Rollout, Reports, CLI Testing, and Tool Choice
The safest Kyverno rollout starts outside enforcement. Install Kyverno, deploy a small number of audit policies, inspect reports, fix the most common violations, then move selected rules to enforcement. This path gives the platform team evidence and gives application teams time to adapt.
PolicyReports are the feedback loop for audit mode and background scans. Namespaced PolicyReport resources show violations for namespaced resources, while ClusterPolicyReport resources cover cluster-scoped results. In production, many teams connect these reports to dashboards, alerts, or periodic review jobs rather than expecting humans to run kubectl manually.
```bash
kubectl get clusterpolicyreport
kubectl get policyreport -A
kubectl get policyreport -n default -o yaml
```

When a report shows many failures, do not immediately assume teams ignored guidance. Look for shared templates, stale Helm chart values, generated manifests, or defaults that your platform can improve centrally. Policy data is most useful when it helps you find leverage points instead of only assigning blame.
The Kyverno CLI lets you test policies against manifests before they reach the cluster. This is useful in CI because a pull request can fail with a policy explanation rather than waiting for a deployment failure. It is also useful for policy authors because it shortens the edit-test loop.
brew install kyverno
kyverno apply policies/ --resource k8s-manifests/ --detailed-results
kyverno test .

A minimal repository layout keeps policies, test resources, and expected results together. The exact shape can vary, but the principle is stable: every policy that blocks production should have at least one passing resource and one failing resource in version control. Without tests, policy changes become risky refactors.
KYVERNO POLICY REPOSITORY SHAPE
===============================================================

platform-policies/
|
+-- policies/
|   +-- disallow-floating-image-tags.yaml
|   +-- require-resource-limits.yaml
|   +-- generate-default-deny-networkpolicy.yaml
|
+-- tests/
|   +-- good-pod.yaml
|   +-- bad-latest-tag-pod.yaml
|   +-- namespace-trigger.yaml
|   +-- kyverno-test.yaml
|
+-- README.md

A GitHub Actions workflow can run the CLI against manifests before merge. This does not replace admission control, because clusters still need to reject unsafe changes from every path. It does reduce feedback time and catches problems before they interrupt a deployment.
name: Kyverno Policy Check

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Release asset names may include the version and architecture.
      # Confirm the exact file name on the Kyverno releases page before relying on this URL.
      - name: Install Kyverno CLI
        run: |
          curl -LO https://github.com/kyverno/kyverno/releases/latest/download/kyverno-cli_linux_amd64.tar.gz
          tar -xzf kyverno-cli_linux_amd64.tar.gz
          sudo mv kyverno /usr/local/bin/kyverno

      - name: Validate Kubernetes manifests
        run: kyverno apply policies/ --resource k8s-manifests/ --detailed-results

Kyverno and OPA Gatekeeper overlap, but they are not interchangeable in every situation. Kyverno is often easier for Kubernetes platform teams because policies look like Kubernetes resources and support validation, mutation, generation, and image verification in one model. Gatekeeper is often stronger when teams need highly expressive logic, mature Rego expertise, or policy reuse beyond Kubernetes.
| Decision point | Kyverno is usually stronger when… | Gatekeeper is usually stronger when… | Senior trade-off |
|---|---|---|---|
| Policy authoring | Kubernetes engineers will maintain policies in YAML | A policy team already writes Rego well | Choose the language your maintainers can review correctly |
| Mutation and generation | Defaults and companion resources are core requirements | Validation is the main requirement | Generation can simplify onboarding but creates ownership questions |
| Complex logic | Rules are mostly structural and admission-context based | Rules need advanced data structures or cross-domain logic | Readability matters, but expressiveness may be necessary |
| Tool reach | Kubernetes is the primary enforcement target | Policies must also cover Terraform, Envoy, or services outside Kubernetes | Avoid forcing one tool into a domain it was not built to own |
| Developer experience | Teams need clear YAML examples and fast adoption | Teams can absorb Rego training and centralized policy review | The best control is the one teams can understand under pressure |
Many organizations use both tools. For example, Kyverno can own Kubernetes-native defaults, image verification, and namespace generation, while OPA handles broader policy-as-code across infrastructure repositories. The important architectural choice is to avoid duplicate rules that produce conflicting messages or inconsistent decisions.
A production Kyverno installation also needs operational care. Admission webhooks are part of the write path, so availability, timeouts, and failure policies matter. Policies should be reviewed like application code because a small match change can affect every namespace in the cluster.
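One way to see how the webhooks would behave during an outage is to list their failure policy and timeout. The sketch below assumes the Kyverno webhook names contain the string "kyverno", which may differ between installations. The function name `kyverno_webhook_settings` is hypothetical.

```shell
# Hypothetical helper: print name, failurePolicy, and timeoutSeconds for
# admission webhooks whose name mentions Kyverno.
# The name filter is an assumption; adjust it for your installation.
kyverno_webhook_settings() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\t"}{.timeoutSeconds}{"\n"}{end}{end}' |
    grep -i 'kyverno'
}
```

A webhook with failurePolicy Fail rejects matching requests when Kyverno is unreachable, while Ignore admits them without policy evaluation. Either choice is a trade-off between availability and enforcement that the platform team should make deliberately.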
SAFE ROLLOUT LADDER
===============================================================

1. Draft policy locally
   |
   v
2. Test with Kyverno CLI against passing and failing manifests
   |
   v
3. Deploy to a non-production cluster in Audit mode
   |
   v
4. Review PolicyReports and fix shared templates
   |
   v
5. Enforce in one low-risk namespace
   |
   v
6. Expand enforcement with documented exceptions
   |
   v
7. Monitor reports, admission denials, and webhook health

This ladder is slower than dropping a ClusterPolicy into production, but it is faster than recovering from a broken deployment path. It also creates artifacts that reviewers can inspect: test cases, audit reports, exception decisions, and measured impact. Senior engineers care about that evidence because policy engines fail socially as often as they fail technically.
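Step 5 of the ladder can be sketched with a per-namespace override, so the policy audits everywhere and enforces in a single namespace. This assumes the installed ClusterPolicy CRD supports `spec.validationFailureActionOverrides`. Newer Kyverno releases may prefer rule-level settings instead, so confirm against the CRD in your cluster. The namespace `sandbox` and the file name are hypothetical.

```shell
# Write a phased variant of a label policy: Audit by default,
# Enforce only in one low-risk namespace.
cat > require-platform-labels-phased.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-platform-labels
spec:
  # Cluster-wide default: record violations only.
  validationFailureAction: Audit
  # Assumption: "sandbox" is a hypothetical low-risk namespace.
  validationFailureActionOverrides:
    - action: Enforce
      namespaces:
        - sandbox
  background: true
  rules:
    - name: require-app-and-team
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must include app and team labels so ownership is visible during incidents."
        pattern:
          metadata:
            labels:
              app: "?*"
              team: "?*"
EOF
```

Expanding enforcement then means adding namespaces to the override list in reviewed pull requests, which keeps each step of the rollout visible in version control.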
Did You Know?
- Kyverno policies are Kubernetes resources: You can manage them with GitOps tools such as Argo CD or Flux, review them with ordinary pull requests, and apply RBAC around who can change cluster-wide policy.
- Generate rules make Kyverno more than a validator: A platform team can create default NetworkPolicies, ResourceQuotas, and other namespace scaffolding automatically when teams onboard.
- Audit mode is a rollout tool, not a security boundary: It records violations so teams can measure impact, but it still allows violating resources to be admitted.
- PolicyReports follow a shared Kubernetes policy reporting model: This makes it easier to build dashboards and workflows around pass, fail, warn, error, and skip results.
Common Mistakes
| Mistake | Problem | Better approach |
|---|---|---|
| Enforcing a new ClusterPolicy immediately across every namespace | Existing workloads can be blocked without warning, and emergency releases may fail for reasons teams have never seen | Start in Audit, review PolicyReports, fix shared templates, then enforce in scoped phases |
| Forgetting system namespace exclusions | Controllers in kube-system, Kyverno itself, or node-related components can be affected by application-focused rules | Add explicit excludes for system namespaces and test platform add-ons before broad enforcement |
| Treating mutation as invisible convenience | Developers may see GitOps drift or runtime behavior that differs from submitted manifests | Document defaults, prefer add-if-missing patches, and test mutation through the real delivery path |
| Using generate rules without deciding ownership | Teams may edit generated resources and then have Kyverno revert them, or they may assume the platform owns every related exception | Decide whether generated resources are managed controls or starting templates, then set synchronize accordingly |
| Checking only containers in security rules | Risky settings can remain in initContainers or other supported container arrays | Include conditional patterns for relevant container arrays and test manifests with multiple workload shapes |
| Writing vague denial messages | Developers cannot tell whether to change a field, request an exception, or contact the platform team | Write messages that explain the risk and the concrete remediation path |
| Skipping CLI tests for policies | A small match, exclude, or pattern mistake can affect many namespaces before anyone notices | Keep passing and failing manifests beside each policy and run kyverno apply or kyverno test in CI |
| Choosing Kyverno only because YAML feels easy | YAML policies can still encode subtle logic, broad blast radius, and operational coupling | Evaluate maintainability, expressiveness, ownership, rollout path, and integration needs before standardizing |
Question 1
Your platform team wants to require app and team labels on Pods because incident responders struggle to find owners. The cluster already has hundreds of unlabeled Pods across many namespaces. How should you roll out the Kyverno policy, and what evidence should you collect before switching to enforcement?
Show Answer
Start with a ClusterPolicy using validationFailureAction: Audit and background: true, then review PolicyReport and ClusterPolicyReport results to identify existing violations. Group failures by namespace, team, and shared deployment template so you can fix common sources instead of treating every workload as a separate exception. After the violation count is low and teams know the remediation path, enforce in a limited set of namespaces before expanding cluster-wide.
Question 2
A developer reports that their Pod was denied with the message “Floating image tags are not allowed,” even though the image field is registry.kubedojo.internal/api:v1.8.2. What should you check before assuming Kyverno has a bug?
Show Answer
Check whether another container, init container, or generated sidecar uses latest, and inspect the full admission error to identify the rule that denied the request. Then review all matching ClusterPolicies because more than one policy may evaluate the same Pod. Also verify that the submitted manifest is the one that reached the API server, since Helm, Kustomize, or CI templating may have rendered a different image than the developer expected.
Question 3
A namespace onboarding policy generates a default-deny NetworkPolicy with synchronize: true. A team deletes the generated NetworkPolicy during troubleshooting, and it reappears. Was Kyverno behaving correctly, and how should the platform team handle the team’s need for connectivity?
Show Answer
Kyverno behaved correctly because synchronize: true tells it to reconcile the generated resource. The platform team should keep the generated default-deny policy as the baseline and help the team add separate allow policies for required traffic, such as DNS or ingress-controller access. If teams are expected to customize the generated object itself, then synchronization may be the wrong ownership model.
Question 4
Your organization wants to inject a logging sidecar into selected workloads. A teammate proposes a Kyverno mutation policy that adds the sidecar to every Pod in every namespace. What risks should you identify, and what safer design would you recommend?
Show Answer
The broad mutation could change runtime behavior, resource consumption, startup time, and failure modes for workloads that were never tested with the sidecar. It could also create GitOps drift and unclear ownership for sidecar upgrades. A safer design is to scope the policy by namespace or annotation, require explicit opt-in at first, document the injected container, set resource requests and limits, and provide an emergency opt-out process.
Question 5
A production signature verification policy blocks a release because the image is unsigned. The application team says the image came from the approved registry and passed vulnerability scanning. How should you explain the denial and what should you investigate?
Show Answer
Registry approval and vulnerability scanning do not prove that the image was signed by the trusted build identity required by the policy. The denial means the admission evidence did not match the verification rule, not necessarily that the image contains a known vulnerability. Investigate whether the build workflow signed the image, whether the subject and issuer in the policy match the CI identity, whether the image reference points to the signed artifact, and whether emergency release workflows also produce signatures.
Question 6
A Kyverno policy works when tested against a new Pod creation, but existing violating Pods do not appear in reports. The policy has validationFailureAction: Audit, but the team forgot to set another field. What is likely missing, and why does it matter?
Show Answer
The policy likely lacks background: true, or the rule uses admission-only request context that cannot be evaluated during background scans. Background scanning matters because it evaluates existing resources and reports violations that were already present before the policy was deployed. Without it, audit mode may show only future admission activity, which gives an incomplete picture of production risk.
Question 7
A platform team is choosing between Kyverno and Gatekeeper for a new policy program. Most rules are Kubernetes-specific defaults, namespace scaffolding, image checks, and simple structural validation. However, the security team already uses Rego for infrastructure policy outside Kubernetes. How would you recommend approaching the decision?
Show Answer
Kyverno is a strong fit for Kubernetes-native defaults, mutation, generation, and image verification because those requirements map directly to its resource model. Gatekeeper or OPA may remain valuable for cross-domain policy where Rego is already established. A pragmatic recommendation is to assign clear ownership boundaries, such as Kyverno for cluster admission defaults and generation, while OPA handles broader infrastructure policy, avoiding duplicate rules that produce conflicting decisions.
Hands-On Exercise
Objective
Install Kyverno in a local Kubernetes cluster, deploy validation and generation policies, test admission behavior, inspect policy reports, and practice a safe audit-to-enforce rollout. The exercise assumes Kubernetes 1.35+ behavior and uses ordinary kubectl commands so you can adapt the flow to Kind, Minikube, or a managed development cluster.
Create a local cluster and install Kyverno with Helm. If you already have a disposable lab cluster, you may skip the cluster creation step, but do not run this exercise against a shared production cluster.
kind create cluster --name kyverno-lab
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno \
  --namespace kyverno \
  --create-namespace
kubectl wait --for=condition=ready pod \
  -l app.kubernetes.io/instance=kyverno \
  -n kyverno \
  --timeout=180s
kubectl get pods -n kyverno

Task 1: Create an audit-mode label policy
Create require-platform-labels.yaml and apply it. Keep the policy in audit mode so you can observe behavior before enforcement.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-platform-labels
spec:
  validationFailureAction: Audit
  background: true
  rules:
    - name: require-app-and-team
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must include app and team labels so ownership is visible during incidents."
        pattern:
          metadata:
            labels:
              app: "?*"
              team: "?*"

kubectl apply -f require-platform-labels.yaml
kubectl run audit-unlabeled --image=nginx:1.27
kubectl get policyreport -A
kubectl get clusterpolicyreport

The unlabeled Pod should be admitted because the policy is in audit mode. The important result is the report entry, because that is what tells the platform team how much work enforcement would create. If no report appears immediately, wait briefly and check Kyverno pods before changing the policy.
Task 2: Move the label policy to enforcement and compare behavior
Edit the policy so validationFailureAction is Enforce, then apply it again. Use one bad Pod and one good Pod so you see both sides of the admission decision.
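If you prefer a scripted edit over opening the file, the following sketch rewrites the validationFailureAction value in place. The helper name `set_failure_action` is hypothetical, and it assumes the field appears on its own line as it does in the Task 1 policy.

```shell
# Hypothetical helper: rewrite validationFailureAction in a policy file.
# Usage: set_failure_action FILE Audit|Enforce
set_failure_action() {
  file="$1"
  action="$2"
  tmp="$(mktemp)" || return 1
  # Keep the original indentation and key, replace only the value.
  sed "s/^\([[:space:]]*validationFailureAction:\).*/\1 ${action}/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For this task the call would be `set_failure_action require-platform-labels.yaml Enforce`. The same helper can switch the policy back to Audit if the enforcement test produces unexpected denials.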
kubectl apply -f require-platform-labels.yaml
kubectl run enforce-unlabeled --image=nginx:1.27
kubectl run enforce-labeled \
  --image=nginx:1.27 \
  --labels="app=web,team=platform"

The unlabeled Pod should be denied with the policy message, while the labeled Pod should be created. If both fail, inspect the exact error and check whether another policy is involved. If both pass, confirm that the policy is still present and that Kyverno admission webhooks are configured.
Task 3: Add a policy that blocks floating image tags
Create disallow-floating-image-tags.yaml and apply it. This policy should be enforced because the remediation is simple and the error message is specific.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-floating-image-tags
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-explicit-container-tags
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kube-public
                - kyverno
      validate:
        message: "Floating image tags are not allowed. Use a version tag or digest so rollbacks and audits are deterministic."
        pattern:
          spec:
            containers:
              - image: "!*:latest"

kubectl apply -f disallow-floating-image-tags.yaml
kubectl run bad-image \
  --image=nginx:latest \
  --labels="app=web,team=platform"
kubectl run good-image \
  --image=nginx:1.27 \
  --labels="app=web,team=platform"

The bad-image Pod should be denied, and the good-image Pod should be created. Notice that the label policy still applies, which is why the commands include labels. In production, multiple policies combine to define the admission contract.
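The Task 3 pattern only inspects spec.containers. A possible extension, sketched below, also checks initContainers and ephemeralContainers when they are present, using Kyverno's `=(...)` conditional anchor. The file name is hypothetical, and the policy keeps the same metadata.name, so applying it would replace the Task 3 policy rather than add a second one.

```shell
# Write a variant of the Task 3 policy that also covers optional container arrays.
cat > disallow-floating-image-tags-all-containers.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-floating-image-tags
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-explicit-container-tags
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kube-public
                - kyverno
      validate:
        message: "Floating image tags are not allowed. Use a version tag or digest so rollbacks and audits are deterministic."
        pattern:
          spec:
            # =(key) applies the check only when the array exists.
            =(initContainers):
              - image: "!*:latest"
            =(ephemeralContainers):
              - image: "!*:latest"
            containers:
              - image: "!*:latest"
EOF
```

Note that this pattern, like the original, only rejects an explicit :latest suffix. An image reference with no tag at all still resolves to latest at pull time and would need a separate check if you want to block it.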
Task 4: Generate a default-deny NetworkPolicy for new namespaces
Create generate-default-deny-networkpolicy.yaml and apply it. Then create a namespace and confirm that Kyverno adds the NetworkPolicy.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-deny-networkpolicy
spec:
  rules:
    - name: create-default-deny-for-new-namespaces
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              names:
                - kube-system
                - kube-public
                - kube-node-lease
                - kyverno
      generate:
        synchronize: true
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny-all
        namespace: "{{ request.object.metadata.name }}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress

kubectl apply -f generate-default-deny-networkpolicy.yaml
kubectl create namespace generated-demo
kubectl get networkpolicy -n generated-demo
kubectl get networkpolicy -n generated-demo default-deny-all -o yaml

The namespace should contain a default-deny-all NetworkPolicy. If you delete it, Kyverno should recreate it because synchronization is enabled. That behavior is desirable for a managed security baseline, but you should be able to explain why it might be wrong for a team-customized template.
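Because the generated baseline denies all egress, Pods in generated-demo cannot resolve DNS until a separate allow policy exists. The sketch below writes one such policy next to the baseline instead of editing the generated object. It assumes cluster DNS runs in kube-system and that namespaces carry the standard kubernetes.io/metadata.name label. The policy and file names are hypothetical.

```shell
# Write an additive allow policy for DNS. The generated default-deny-all
# policy stays untouched, so synchronization does not revert this change.
cat > allow-dns-egress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: generated-demo
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # Assumption: the cluster DNS service runs in kube-system.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```

Applying it with `kubectl apply -f allow-dns-egress.yaml` leaves the platform-owned baseline in place while the team owns the exception, which keeps the ownership split clear.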
Task 5: Test policies locally with the Kyverno CLI
Install the Kyverno CLI and test at least one policy against a manifest before applying it to the cluster. This step reinforces the shift-left workflow that production teams use in pull requests.
brew install kyverno
mkdir -p kyverno-lab/policies kyverno-lab/resources
cp require-platform-labels.yaml kyverno-lab/policies/
cat > kyverno-lab/resources/bad-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.27
EOF

cat > kyverno-lab/resources/good-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: good-pod
  labels:
    app: web
    team: platform
spec:
  containers:
    - name: nginx
      image: nginx:1.27
EOF

kyverno apply kyverno-lab/policies/ --resource kyverno-lab/resources/ --detailed-results

The CLI should show different results for the labeled and unlabeled resources. If it does not, inspect whether your policy is still in audit mode, whether the resource kind matches, and whether the manifest contains the fields the pattern expects. Use CLI testing to debug policy intent before you debug cluster behavior.
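To turn the same two manifests into a repeatable check, a test file can record the expected result for each resource. The sketch below assumes a recent Kyverno CLI that accepts the cli.kyverno.io/v1alpha1 Test format. Older releases used a different layout, so check the CLI documentation for your version. The test name is hypothetical.

```shell
# Write a declarative test next to the lab policies and resources.
mkdir -p kyverno-lab
cat > kyverno-lab/kyverno-test.yaml <<'EOF'
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: require-platform-labels-test
policies:
  - policies/require-platform-labels.yaml
resources:
  - resources/good-pod.yaml
  - resources/bad-pod.yaml
results:
  # The labeled Pod is expected to satisfy the rule.
  - policy: require-platform-labels
    rule: require-app-and-team
    kind: Pod
    resources:
      - good-pod
    result: pass
  # The unlabeled Pod is expected to violate it.
  - policy: require-platform-labels
    rule: require-app-and-team
    kind: Pod
    resources:
      - bad-pod
    result: fail
EOF
```

Running `kyverno test kyverno-lab/` should then compare actual results with the expected ones, so a later change to the match block or pattern that flips either outcome shows up as a failed test.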
Task 6: Clean up the lab
Delete the lab resources when you are finished. If you created a disposable Kind cluster, deleting the cluster is the simplest cleanup.
kind delete cluster --name kyverno-lab

Success Criteria
- Kyverno pods are running in the kyverno namespace before policies are applied.
- An audit-mode label policy admits an unlabeled Pod and records a report result.
- The same label policy blocks an unlabeled Pod after switching to enforcement.
- A labeled Pod with a pinned image is admitted successfully.
- A Pod using nginx:latest is denied with a clear remediation message.
- A new namespace receives a generated default-deny-all NetworkPolicy.
- You can explain why synchronize: true recreates the generated NetworkPolicy after deletion.
- The Kyverno CLI evaluates at least one passing manifest and one failing manifest before cluster admission.
Bonus Challenge
Write a namespaced Policy for a namespace called development that adds environment: dev to Pods only when the label is missing. Then create one Pod without the label and one Pod with environment: test, and verify that Kyverno defaults only the missing value instead of overwriting the explicit one.
Next Module
Continue to the Security Tools README to review the rest of the security toolkit modules and choose the next control to practice.
Sources
- Kubernetes Admission Controllers: Canonical reference for where admission policy runs in the API request path.
- Kyverno README: Primary upstream overview of Kyverno’s core capabilities and positioning.
- Kyverno ClusterPolicy CRD: Schema-level source for policy settings such as admission, background scanning, and rule types.
- Kubernetes Images: Authoritative source for mutable tags, immutable digests, and production guidance around :latest.
- kubernetes.io, network policies: The Kubernetes NetworkPolicy documentation includes the default deny all ingress and egress pattern.
- kubernetes.io, resource quotas: The Kubernetes ResourceQuota documentation directly defines namespace-level aggregate resource constraints.
- kubernetes.io, limit range: The Kubernetes LimitRange documentation directly describes these constraints and defaulting behavior.
- nist.gov, security considerations code signing: NIST’s code-signing publication directly describes integrity and source-authentication guarantees, which supports the module’s distinction between signing and vulnerability assessment.
- raw.githubusercontent.com, howto.md: The Gatekeeper how-to documentation directly explains ConstraintTemplates, constraints, and Rego use.
- raw.githubusercontent.com, README.md: The OPA README directly describes OPA as a general-purpose policy engine with integrations across the stack.
- kubernetes.io, admission webhooks good practices: The Kubernetes admission webhook good-practices page directly covers latency, timeouts, availability, and failure-policy concerns.