Skip to content

Module 10.2: Cloud Governance & Policy as Code

Complexity: [COMPLEX] | Time to Complete: 2.5h | Prerequisites: Enterprise Landing Zones (Module 10.1), Kubernetes RBAC basics, YAML/JSON fundamentals

After completing this module, you will be able to:

  • Implement policy-as-code using OPA Gatekeeper, Kyverno, and cloud-native policy engines for Kubernetes governance
  • Configure cloud governance frameworks (AWS Config Rules, Azure Policy, GCP Organization Policies) for Kubernetes infrastructure
  • Design tag-based governance strategies that enforce cost allocation, ownership, and compliance across clusters
  • Deploy automated remediation workflows that detect and correct governance violations without human intervention

In September 2022, a mid-size e-commerce company experienced a data breach that exposed 2.3 million customer records. The root cause was embarrassingly simple: a developer created an S3 bucket with public read access to store product images, then accidentally uploaded a database export to the same bucket. The company had an AWS SCP that was supposed to prevent public S3 buckets, but it had been written eighteen months earlier and only covered the s3:PutBucketAcl action — it did not account for the newer s3:PutBucketPublicAccessBlock API. Meanwhile, their EKS clusters had no admission control at all. Kubernetes Services of type LoadBalancer were routinely creating internet-facing ALBs without any review, and environment variables in pod specs regularly contained plaintext database credentials.

The investigation revealed a pattern that is alarmingly common: the cloud governance team and the Kubernetes platform team operated in completely separate silos. The SCP team did not know what Kyverno was. The platform team did not know what SCPs were. There was no unified view of “what policies exist” or “are we compliant” across the stack. The breach cost the company $4.2 million in regulatory fines, notification costs, and lost customer trust.

Policy as Code solves this problem by treating governance rules the same way you treat application code: version-controlled, tested, peer-reviewed, and automatically enforced. In this module, you will learn how cloud provider policy systems work (SCPs, Azure Policy, GCP Org Policies), how Kubernetes policy engines (Kyverno, OPA Gatekeeper) complement them, how to build a unified governance model that covers the entire stack, and how to manage exceptions without creating security holes.


Governance in a cloud-native enterprise is not a single layer. It is a pyramid with each layer handling different concerns at different points in the resource lifecycle.

┌─────────────────────────────────────────────────┐
│ │
│ Layer 5: RUNTIME DETECTION │
│ Falco, KubeArmor, GuardDuty │
│ "Is something bad happening NOW?" │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Layer 4: K8s ADMISSION CONTROL │ │
│ │ Kyverno, OPA Gatekeeper │ │
│ │ "Should this K8s resource be │ │
│ │ created?" │ │
│ └──────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────┐ │
│ │ Layer 3: IaC VALIDATION │ │
│ │ Checkov, tfsec, OPA/Conftest │ │
│ │ "Is the Terraform/Bicep correct │ │
│ │ before we apply it?" │ │
│ └───────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Layer 2: CLOUD PROVIDER POLICIES │ │
│ │ SCPs, Azure Policy, GCP Org Policies │ │
│ │ "What API calls are allowed?" │ │
│ └──────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Layer 1: IDENTITY & ACCESS (IAM) │ │
│ │ "Who can do what?" │ │
│ └──────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────┘

Each layer catches things the layer below cannot. IAM controls who can call APIs, but it cannot enforce that every S3 bucket has encryption. SCPs can enforce encryption, but they cannot inspect Kubernetes manifests. Admission control can reject bad manifests, but it cannot stop a running container from spawning a reverse shell. Runtime detection catches that.


SCPs are the top-level preventive control in AWS Organizations. They define the maximum permissions available to any principal in an account. Think of them as a ceiling — even if an IAM policy grants s3:*, an SCP that denies s3:DeleteBucket wins.

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnapprovedRegions",
"Effect": "Deny",
"Action": "*",
"Resource": "*",
"Condition": {
"StringNotEquals": {
"aws:RequestedRegion": [
"us-east-1",
"us-west-2",
"eu-west-1"
]
},
"ForAllValues:StringNotLike": {
"aws:PrincipalArn": [
"arn:aws:iam::*:role/OrganizationAdmin"
]
}
}
},
{
"Sid": "DenyPublicEKS",
"Effect": "Deny",
"Action": [
"eks:CreateCluster",
"eks:UpdateClusterConfig"
],
"Resource": "*",
"Condition": {
"StringEquals": {
"eks:endpointPublicAccess": "true"
}
}
},
{
"Sid": "RequireIMDSv2",
"Effect": "Deny",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:*:*:instance/*",
"Condition": {
"StringNotEquals": {
"ec2:MetadataHttpTokens": "required"
}
}
},
{
"Sid": "DenyLeaveOrganization",
"Effect": "Deny",
"Action": "organizations:LeaveOrganization",
"Resource": "*"
}
]
}

SCP gotchas that trip up most teams:

  • SCPs do not grant permissions, they only restrict. An SCP that allows s3:* does not give anyone S3 access — it just does not restrict it.
  • SCPs do not apply to the management account. This is by design but is a common blindspot.
  • SCPs have a 5,120 character limit per policy. Complex organizations often hit this and need to split policies across multiple SCPs.
  • SCPs are evaluated with an implicit deny. If an action is not explicitly allowed by an SCP in the path from root to account, it is denied.

Azure Policy takes a different approach: instead of simple allow/deny, it supports multiple effects that determine what happens when a resource violates the policy.

{
"properties": {
"displayName": "AKS clusters must use Azure CNI with network policy",
"policyType": "Custom",
"mode": "Indexed",
"description": "Ensures all AKS clusters use Azure CNI networking with Calico or Azure network policy enabled.",
"parameters": {},
"policyRule": {
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.ContainerService/managedClusters"
},
{
"anyOf": [
{
"field": "Microsoft.ContainerService/managedClusters/networkProfile.networkPlugin",
"notEquals": "azure"
},
{
"field": "Microsoft.ContainerService/managedClusters/networkProfile.networkPolicy",
"exists": "false"
}
]
}
]
},
"then": {
"effect": "Deny"
}
}
}
}

Azure Policy effects:

EffectBehaviorUse Case
DenyBlock resource creation/updatePreventive: enforce hard requirements
AuditAllow but log non-complianceDetective: visibility without blocking
DeployIfNotExistsAuto-remediate by deploying a resourceEnforce logging, diagnostics, extensions
ModifyAlter resource properties during creationAdd tags, enable encryption automatically
DisabledPolicy exists but is not enforcedTesting or temporary exception
DenyActionBlock specific actions on existing resourcesPrevent deletion of critical resources

The DeployIfNotExists effect is particularly powerful for Kubernetes. You can create a policy that automatically installs Azure Policy for AKS (which runs OPA Gatekeeper inside the cluster) whenever an AKS cluster is created:

{
"if": {
"field": "type",
"equals": "Microsoft.ContainerService/managedClusters"
},
"then": {
"effect": "DeployIfNotExists",
"details": {
"type": "Microsoft.ContainerService/managedClusters/extensions",
"name": "microsoft-azure-policy",
"existenceCondition": {
"field": "Microsoft.ContainerService/managedClusters/extensions/extensionType",
"equals": "microsoft.policyinsights"
},
"deployment": {
"properties": {
"mode": "incremental",
"template": {
"resources": [
{
"type": "Microsoft.ContainerService/managedClusters/extensions",
"apiVersion": "2023-05-01",
"name": "[concat(field('name'), '/microsoft-azure-policy')]",
"properties": {
"extensionType": "microsoft.policyinsights",
"autoUpgradeMinorVersion": true
}
}
]
}
}
}
}
}
}

GCP Organization Policies use constraints — predefined or custom rules that restrict resource configurations across the organization hierarchy.

# Deny public GKE clusters
constraint: constraints/container.restrictPublicCluster
listPolicy:
allValues: DENY
---
# Restrict which regions can host GKE clusters
constraint: constraints/gcp.resourceLocations
listPolicy:
allowedValues:
- us-central1
- us-east4
- europe-west1
---
# Require Shielded GKE Nodes
constraint: constraints/container.requireShieldedNodes
booleanPolicy:
enforced: true

GCP also supports custom organization policy constraints using Common Expression Language (CEL):

# Custom constraint: GKE clusters must have Binary Authorization enabled
apiVersion: orgpolicy.googleapis.com/v2
kind: CustomConstraint
metadata:
name: custom.gkeRequireBinaryAuthorization
spec:
resourceTypes:
- container.googleapis.com/Cluster
condition: >
resource.binaryAuthorization.evaluationMode == "PROJECT_SINGLETON_POLICY_ENFORCE"
actionType: DENY
displayName: "GKE clusters must enable Binary Authorization"
description: "All GKE clusters must have Binary Authorization enabled to enforce container image signing."

Kubernetes Policy Engines: Kyverno vs OPA Gatekeeper

Section titled “Kubernetes Policy Engines: Kyverno vs OPA Gatekeeper”

Cloud provider policies stop at the cloud API boundary. Once a Kubernetes cluster exists, you need an in-cluster policy engine to govern what gets deployed inside it.

Kyverno uses Kubernetes-native YAML to define policies. If you can write a Kubernetes manifest, you can write a Kyverno policy. This is its primary advantage: the learning curve is minimal for teams already fluent in YAML.

# Kyverno: Deny containers running as root
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: deny-run-as-root
annotations:
policies.kyverno.io/title: Deny Running as Root
policies.kyverno.io/severity: high
spec:
validationFailureAction: Enforce
background: true
rules:
- name: check-run-as-non-root
match:
any:
- resources:
kinds:
- Pod
validate:
message: >
Running as root is not allowed. Set
spec.securityContext.runAsNonRoot to true or
spec.containers[*].securityContext.runAsNonRoot to true.
pattern:
spec:
=(initContainers):
- securityContext:
runAsNonRoot: true
containers:
- securityContext:
runAsNonRoot: true
---
# Kyverno: Mutate to add default resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: add-default-resources
spec:
rules:
- name: add-default-limits
match:
any:
- resources:
kinds:
- Pod
mutate:
patchStrategicMerge:
spec:
containers:
- (name): "*"
resources:
limits:
+(memory): "256Mi"
+(cpu): "200m"
requests:
+(memory): "128Mi"
+(cpu): "100m"
---
# Kyverno: Generate NetworkPolicy for every new namespace
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: generate-default-networkpolicy
spec:
rules:
- name: default-deny-ingress
match:
any:
- resources:
kinds:
- Namespace
exclude:
any:
- resources:
namespaces:
- kube-system
- kube-public
generate:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
name: default-deny-ingress
namespace: "{{request.object.metadata.name}}"
synchronize: true
data:
spec:
podSelector: {}
policyTypes:
- Ingress

Gatekeeper uses Rego, a purpose-built policy language from Open Policy Agent. Rego is more powerful than YAML patterns but has a steeper learning curve. Gatekeeper separates the policy template (the logic) from the constraint (the configuration).

# Gatekeeper: ConstraintTemplate (the logic)
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: k8sdenypublicloadbalancer
spec:
crd:
spec:
names:
kind: K8sDenyPublicLoadBalancer
validation:
openAPIV3Schema:
type: object
properties:
allowedNamespaces:
type: array
items:
type: string
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package k8sdenypublicloadbalancer
violation[{"msg": msg}] {
input.review.object.kind == "Service"
input.review.object.spec.type == "LoadBalancer"
not input.review.object.metadata.annotations["service.beta.kubernetes.io/aws-load-balancer-internal"]
namespace := input.review.object.metadata.namespace
not namespace_allowed(namespace)
msg := sprintf("Public LoadBalancer services are not allowed in namespace '%v'. Add annotation 'service.beta.kubernetes.io/aws-load-balancer-internal: true' or use an allowed namespace.", [namespace])
}
namespace_allowed(namespace) {
allowed := input.parameters.allowedNamespaces[_]
namespace == allowed
}
---
# Gatekeeper: Constraint (the configuration)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyPublicLoadBalancer
metadata:
name: deny-public-lb-except-ingress
spec:
match:
kinds:
- apiGroups: [""]
kinds: ["Service"]
parameters:
allowedNamespaces:
- ingress-nginx
- istio-system
FeatureKyvernoOPA Gatekeeper
Policy languageKubernetes-native YAMLRego (purpose-built)
Learning curveLow (YAML knowledge sufficient)High (Rego is a new language)
ValidationYesYes
MutationYes (native)Yes (external data, more complex)
GenerationYes (create resources from policies)No
Image verificationYes (cosign, Notary)Via external data
Background scanningYes (audit existing resources)Yes (audit existing resources)
Policy exceptionsPolicyException resourceConfig constraint match exclusions
Multi-tenancyNamespace-scoped policiesNamespace-scoped constraints
CNCF statusIncubatingGraduated (via OPA)
Best forTeams wanting simplicity and mutation/generationTeams needing complex logic or already using OPA

Mapping Cloud Policies to Kubernetes Policies

Section titled “Mapping Cloud Policies to Kubernetes Policies”

The real power of Policy as Code comes from creating a unified governance model where cloud policies and Kubernetes policies reinforce each other. Here is a mapping of common enterprise requirements across both layers:

RequirementCloud PolicyKubernetes Policy
No public endpointsSCP: deny public EKS/AKS/GKEKyverno: deny Service type LoadBalancer without internal annotation
Encryption at restSCP: deny unencrypted EBS/disksKyverno: require encrypted StorageClass
Image provenanceECR/ACR/Artifact Registry policiesKyverno: verify image signatures (cosign)
Resource taggingSCP: deny untagged resourcesKyverno: require labels matching cloud tags
Network segmentationSCP: deny public subnets in EKS VPCsKyverno: generate NetworkPolicy on namespace creation
Least privilegeIAM: minimal role permissionsKyverno: deny privileged containers, deny hostNetwork
LoggingSCP: require CloudTrail/audit logsKyverno: require sidecar logging or FluentBit DaemonSet
Cost controlAWS Budgets / Azure Cost AlertsKyverno: enforce resource limits, deny unrestricted replicas

Example: Unified Policy for Image Provenance

Section titled “Example: Unified Policy for Image Provenance”

This is how a single governance requirement (“all container images must be signed”) spans both cloud and Kubernetes layers:

Terminal window
# Layer 1 (Cloud): ECR repository policy requiring signed images
aws ecr put-lifecycle-policy --repository-name my-app \
--lifecycle-policy-text '{
"rules": [{
"rulePriority": 1,
"description": "Keep only signed images",
"selection": {
"tagStatus": "tagged",
"tagPrefixList": ["v"],
"countType": "imageCountMoreThan",
"countNumber": 50
},
"action": { "type": "expire" }
}]
}'
# Layer 2 (Kubernetes): Kyverno policy verifying cosign signatures
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: verify-image-signatures
spec:
validationFailureAction: Enforce
webhookTimeoutSeconds: 30
rules:
- name: verify-cosign-signature
match:
any:
- resources:
kinds:
- Pod
verifyImages:
- imageReferences:
- "123456789012.dkr.ecr.*.amazonaws.com/*"
attestors:
- entries:
- keyless:
subject: "https://github.com/my-org/*"
issuer: "https://token.actions.githubusercontent.com"
rekor:
url: "https://rekor.sigstore.dev"
mutateDigest: true
verifyDigest: true

Every governance system needs a way to handle legitimate exceptions. The question is whether exceptions are managed through bureaucratic approval processes or through code.

BAD: Exception via email
1. Developer emails security team: "I need public LB for 2 weeks"
2. Security team discusses in Slack
3. Someone says "ok" in a thread
4. Developer manually edits the policy
5. Exception is never removed
6. 18 months later, auditor finds 200 "temporary" exceptions
# Kyverno PolicyException (the right way)
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
name: allow-public-lb-for-demo
namespace: demo-team
labels:
exception-owner: security-team
exception-ticket: SEC-4521
exception-expires: "2025-04-15"
spec:
exceptions:
- policyName: deny-public-loadbalancer
ruleNames:
- check-internal-annotation
match:
any:
- resources:
kinds:
- Service
namespaces:
- demo-team
names:
- demo-frontend-svc
conditions:
any:
- key: "{{request.object.metadata.annotations.exception-approved-by}}"
operator: Equals
value: "security-team"
# Automated exception lifecycle with expiry checking
cat <<'SCRIPT' > /tmp/check-expired-exceptions.sh
#!/bin/bash
# Run this in CI/CD or as a CronJob
TODAY=$(date +%Y-%m-%d)
kubectl get policyexception -A -o json | jq -r \
'.items[] |
select(.metadata.labels."exception-expires" != null) |
select(.metadata.labels."exception-expires" < "'$TODAY'") |
"\(.metadata.namespace)/\(.metadata.name) expired on \(.metadata.labels."exception-expires")"'
SCRIPT

Shift-Left: Catching Policy Violations Before Deployment

Section titled “Shift-Left: Catching Policy Violations Before Deployment”

The most effective governance catches violations as early as possible — ideally before the code leaves the developer’s machine.

┌──────────────────────────────────────────────────────────┐
│ SHIFT-LEFT POLICY ENFORCEMENT PIPELINE │
│ │
│ Local Dev ──► CI Pipeline ──► Admission ──► │
│ (pre-commit) (PR checks) (runtime) │
│ │
│ kyverno-cli kyverno-cli Kyverno webhook │
│ conftest checkov Gatekeeper │
│ kubeconform tfsec webhook │
│ OPA/Conftest │
│ │
│ Cost: $0 Cost: $0 Cost: Blocked │
│ (instant fix) (PR feedback) deployment │
└──────────────────────────────────────────────────────────┘
Terminal window
# Pre-commit hook: validate K8s manifests against Kyverno policies
# .pre-commit-config.yaml entry:
# - repo: local
# hooks:
# - id: kyverno-validate
# name: Kyverno Policy Check
# entry: bash -c 'kyverno apply policies/ --resource $@' --
# files: '\.ya?ml$'
# CI pipeline step: validate with kyverno-cli
kyverno apply policies/ \
--resource deployment.yaml \
--policy-report \
--output /tmp/policy-report.json
# Check the report
cat /tmp/policy-report.json | jq '.summary'
# { "pass": 12, "fail": 0, "warn": 2, "error": 0 }
# CI pipeline step: validate Terraform with Checkov
checkov -d ./terraform \
--framework terraform \
--check CKV_AWS_39,CKV_AWS_58,CKV_AWS_337 \
--output json
# CI pipeline step: validate with Conftest (OPA for config files)
conftest test deployment.yaml \
--policy ./opa-policies/ \
--output json

  1. AWS SCPs were originally limited to 2,048 characters when launched in 2017. Enterprise customers hit this limit so frequently that AWS increased it to 5,120 characters in 2019. Even at 5,120 characters, complex organizations regularly hit the ceiling and must split single logical policies across multiple SCP documents. Azure Policy has no such character limit, which is one reason Azure policies tend to be more detailed than AWS SCPs.

  2. OPA (Open Policy Agent) evaluates over 10 billion policy decisions per day across its production deployments globally. Netflix alone processes over 500 million OPA decisions daily for API authorization. The Rego language was designed specifically because existing policy languages (like AWS IAM JSON) could not express complex cross-resource relationships needed for modern authorization.

  3. Kyverno was created in 2019 by Nirmata specifically because OPA Gatekeeper’s Rego language was a barrier to adoption for Kubernetes platform teams. In a 2024 CNCF survey, 58% of organizations using Kubernetes policy engines chose Kyverno, compared to 34% for OPA Gatekeeper. The most commonly cited reason was “we already know YAML.”

  4. The concept of “Policy as Code” was formalized by HashiCorp’s Sentinel in 2017, but the practice dates back to SELinux mandatory access control policies in the early 2000s. The key innovation was not writing policies in code (that was always possible) but treating policies with the same software engineering practices as application code: version control, testing, peer review, and continuous deployment.


MistakeWhy It HappensHow to Fix It
Cloud policies and K8s policies designed in silosDifferent teams own each layer. Cloud team does not understand K8s. Platform team does not understand SCPs.Create a unified governance council. Map every compliance requirement across both layers. Use the mapping table approach from this module.
Audit-only policies that never become enforcedTeams set policies to “audit” mode for testing and never flip to “enforce” because they fear breaking things.Set a timeline: 30 days audit, then enforce. Use CI/CD policy validation (shift-left) to catch violations before they hit the cluster.
Exception without expiryRushed exception granted to unblock a release. No one tracks the cleanup.Every exception must have a ticket number, an owner, and an expiry date. Automate expiry checking with a CronJob or CI pipeline.
Over-relying on cloud policies, ignoring K8s”Our SCPs are comprehensive, we do not need Kyverno.” But SCPs cannot see inside a Kubernetes cluster.Cloud policies protect cloud resources. K8s policies protect workloads. You need both. A public LoadBalancer can bypass cloud firewall rules entirely.
Writing Rego when YAML would sufficeEngineering team defaults to Gatekeeper because “OPA is the standard.” But their policies are simple pattern matching.Evaluate both Kyverno and Gatekeeper honestly. If your policies are mostly “require label X” or “deny privilege Y,” Kyverno is simpler. Use Gatekeeper when you need cross-resource logic.
Not testing policies before deploymentPolicy deployed directly to production cluster. A mutation policy with a typo adds invalid annotations to every pod. Cluster-wide outage.Always test policies in a staging cluster first. Use kyverno apply or gator test in CI. Run in audit mode before enforce mode.
SCP character limit workarounds using wildcardsSCP too long, so engineer replaces specific actions with * wildcards. This makes the policy either too permissive or too restrictive.Split complex SCPs into multiple policies. Use condition keys instead of action lists where possible. AWS allows up to 5 SCPs per OU.
Forgetting the management accountSCPs do not apply to the management account. Critical governance gaps exist there.Use the management account only for Organizations management. Move all workloads to member accounts. Apply detective controls (Config rules, GuardDuty) to the management account separately.

Question 1: An AWS SCP denies all actions in region ap-southeast-1. A developer assumes an IAM role that has full S3 access (s3:*) and tries to create a bucket in ap-southeast-1. What happens and why?

The request is denied. SCPs define the maximum available permissions for all principals in an account. Even though the IAM policy grants s3:*, the SCP creates a ceiling that blocks all actions in ap-southeast-1. The effective permissions are the intersection of the IAM policy and all SCPs in the path from the organization root to the account. SCPs do not grant permissions — they only restrict them. This is a fundamental difference from IAM policies and is the reason SCPs are so effective as guardrails: no one in the account, regardless of their IAM permissions, can bypass the SCP.

Question 2: Your company uses Azure Policy with a "Deny" effect to prevent AKS clusters without network policy enabled. A team creates an AKS cluster via Terraform and the deployment fails. They request an exception. How should you handle this?

Create a policy exemption in Azure Policy using the Microsoft.Authorization/policyExemptions resource. Azure Policy supports two exemption categories: Waiver (the resource is non-compliant but exempted) and Mitigated (the requirement is met through other means). The exemption should be scoped to the specific resource, have an expiration date, and reference a tracking ticket. In code:

{
"properties": {
"policyAssignmentId": "/providers/Microsoft.Management/managementGroups/.../providers/Microsoft.Authorization/policyAssignments/...",
"exemptionCategory": "Waiver",
"expiresOn": "2025-06-15T00:00:00Z",
"description": "Temporary: team-beta needs kubenet for legacy migration. Ticket: SEC-2341"
}
}

Never disable the entire policy. Scope exceptions to the narrowest possible resource.

Question 3: Explain the difference between Kyverno's "validate", "mutate", and "generate" rule types. Give one example of when you would use each.

Validate checks whether a resource meets criteria and blocks it if not. Example: require all pods to have runAsNonRoot: true. Mutate modifies a resource during admission, adding or changing fields. Example: automatically inject a sidecar container or add default resource limits to pods that lack them. Generate creates a new resource when a triggering resource is created. Example: automatically create a NetworkPolicy with default-deny-ingress whenever a new namespace is created. Validate is the most common (enforcement), mutate reduces developer burden (sane defaults), and generate ensures baseline configuration (consistency). An enterprise typically uses all three: generate for namespace baselines, mutate for defaults, and validate for hard security requirements.

Question 4: Your CI pipeline uses Checkov to validate Terraform and kyverno-cli to validate Kubernetes manifests. A developer's PR passes both checks, but the deployment is still rejected by the Kyverno webhook in the cluster. How is this possible?

Several scenarios can cause this: (1) The cluster has newer policies that the CI pipeline does not have. The pipeline validates against a snapshot of policies, but someone added a new policy to the cluster after the CI policy set was last updated. (2) The Kyverno webhook evaluates the final rendered resource, which may differ from the manifest file. Helm templating, Kustomize overlays, or ArgoCD sync might modify the manifest between CI validation and cluster admission. (3) The policy uses external data (like ConfigMap references or API calls) that produce different results in the cluster than in CI. The fix is to keep CI policy sets in sync with cluster policies via GitOps, and to validate the fully-rendered manifests (post-Helm, post-Kustomize), not the source templates.

Question 5: An organization has 300 Kubernetes clusters across 3 cloud providers. They want a single policy set that works everywhere. Should they use Kyverno or OPA Gatekeeper? Justify your answer.

For pure Kubernetes policy, either can work since both run as admission webhooks and are cloud-agnostic. However, if the goal is a single policy language across the entire stack (cloud resources + Kubernetes + CI/CD + API authorization), OPA Gatekeeper has an advantage because Rego policies can be reused with standalone OPA for Terraform validation (Conftest), API authorization, and custom services. Kyverno is Kubernetes-specific — it cannot validate Terraform or authorize API calls. That said, at 300 clusters, operational simplicity matters enormously. Kyverno’s YAML-based policies are easier for platform teams to maintain across a large fleet without deep Rego expertise. The pragmatic choice for most organizations: Kyverno for Kubernetes admission control, Conftest/OPA for IaC validation, and a mapping document that ensures both policy sets enforce the same governance requirements.

Question 6: What happens if a Kyverno mutation policy and a validation policy conflict? For example, the mutation policy adds a label, but the validation policy requires a different value for that label.

Kyverno processes mutations before validations in a deterministic order. If a mutation policy adds label env: default and a validation policy requires env to match the namespace annotation environment, the validation will fail because the mutated value does not match. This is a policy design bug, not a Kyverno bug. The fix is to ensure mutation and validation policies are designed as a coherent set. Best practice: the mutation policy should read the desired value from context (like namespace labels) rather than hardcoding. Use kyverno apply in CI to test the full pipeline (mutation + validation) together against sample resources to catch these conflicts before deployment.


Hands-On Exercise: Build a Unified Cloud + K8s Governance Pipeline

Section titled “Hands-On Exercise: Build a Unified Cloud + K8s Governance Pipeline”

In this exercise, you will create a multi-layer governance system that validates both infrastructure code and Kubernetes manifests, implements shift-left policy checking, and manages exceptions properly.

What you will build:

┌──────────────────────────────────────────────────────┐
│ Governance Pipeline │
│ │
│ 1. Pre-commit: validate YAML with kubeconform │
│ 2. CI: check manifests against Kyverno policies │
│ 3. CI: check Terraform against OPA/Conftest │
│ 4. Cluster: enforce with Kyverno admission webhook │
│ 5. Audit: scan existing resources for compliance │
│ 6. Exception: managed PolicyException with expiry │
└──────────────────────────────────────────────────────┘

Task 1: Set Up the Policy Test Environment

Section titled “Task 1: Set Up the Policy Test Environment”

Create a kind cluster and install Kyverno for admission control.

Solution
Terminal window
# Create the governance lab cluster
kind create cluster --name governance-lab
# Install Kyverno
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace --wait
# Verify Kyverno is running
k get pods -n kyverno

Deploy a set of enterprise policies that cover security, cost, and operational concerns.

Solution
Terminal window
cat <<'EOF' | k apply -f -
# Policy 1: Deny privileged containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: deny-privileged
spec:
validationFailureAction: Enforce
rules:
- name: deny-privileged-containers
match:
any:
- resources:
kinds:
- Pod
exclude:
any:
- resources:
namespaces:
- kube-system
- kyverno
validate:
message: "Privileged containers are denied by governance policy GOV-001."
pattern:
spec:
containers:
- securityContext:
privileged: "!true"
---
# Policy 2: Require resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-resource-limits
spec:
validationFailureAction: Enforce
rules:
- name: check-limits
match:
any:
- resources:
kinds:
- Pod
exclude:
any:
- resources:
namespaces:
- kube-system
- kyverno
validate:
message: "All containers must have CPU and memory limits (GOV-002)."
pattern:
spec:
containers:
- resources:
limits:
memory: "?*"
cpu: "?*"
---
# Policy 3: Deny latest tag
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: deny-latest-tag
spec:
validationFailureAction: Enforce
rules:
- name: check-image-tag
match:
any:
- resources:
kinds:
- Pod
exclude:
any:
- resources:
namespaces:
- kube-system
- kyverno
validate:
message: "Images must use a specific tag, not ':latest' (GOV-003)."
pattern:
spec:
containers:
- image: "!*:latest"
---
# Policy 4: Require team and cost-center labels on namespaces
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-namespace-labels
spec:
validationFailureAction: Enforce
rules:
- name: check-labels
match:
any:
- resources:
kinds:
- Namespace
exclude:
any:
- resources:
names:
- kube-*
- default
- kyverno
validate:
message: "Namespaces must have 'team' and 'cost-center' labels (GOV-004)."
pattern:
metadata:
labels:
team: "?*"
cost-center: "?*"
---
# Policy 5: Generate default NetworkPolicy on namespace creation
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: generate-network-policy
spec:
rules:
- name: default-deny-ingress
match:
any:
- resources:
kinds:
- Namespace
exclude:
any:
- resources:
names:
- kube-*
- default
- kyverno
generate:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
name: default-deny-ingress
namespace: "{{request.object.metadata.name}}"
synchronize: true
data:
spec:
podSelector: {}
policyTypes:
- Ingress
---
# Policy 6: Mutate to add standard labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: add-governance-labels
spec:
rules:
- name: add-managed-by-label
match:
any:
- resources:
kinds:
- Deployment
- StatefulSet
mutate:
patchStrategicMerge:
metadata:
labels:
+(governance.company.com/policy-version): "2.1"
+(governance.company.com/managed-by): "platform-team"
EOF
echo "Policies deployed:"
k get clusterpolicy

Install kyverno-cli and validate manifests before they reach the cluster.

Solution
Terminal window
# Install kyverno CLI
# On macOS:
brew install kyverno
# Or download directly:
# curl -LO https://github.com/kyverno/kyverno/releases/latest/download/kyverno-cli_v1.13.0_linux_amd64.tar.gz
# Create a sample set of manifests to validate
mkdir -p /tmp/governance-lab/manifests /tmp/governance-lab/policies
# Export cluster policies to local files for CLI validation
k get clusterpolicy deny-privileged -o yaml > /tmp/governance-lab/policies/deny-privileged.yaml
k get clusterpolicy require-resource-limits -o yaml > /tmp/governance-lab/policies/require-limits.yaml
k get clusterpolicy deny-latest-tag -o yaml > /tmp/governance-lab/policies/deny-latest.yaml
# Create a GOOD manifest
cat <<'EOF' > /tmp/governance-lab/manifests/good-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
namespace: team-alpha
spec:
replicas: 3
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
containers:
- name: web
image: nginx:1.27.3
securityContext:
privileged: false
runAsNonRoot: true
runAsUser: 1000
resources:
limits:
cpu: 200m
memory: 256Mi
requests:
cpu: 100m
memory: 128Mi
EOF
# Create a BAD manifest (multiple violations)
cat <<'EOF' > /tmp/governance-lab/manifests/bad-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: yolo-app
namespace: team-beta
spec:
replicas: 1
selector:
matchLabels:
app: yolo
template:
metadata:
labels:
app: yolo
spec:
containers:
- name: danger
image: nginx:latest
securityContext:
privileged: true
EOF
# Validate with kyverno-cli
echo "=== Validating GOOD manifest ==="
kyverno apply /tmp/governance-lab/policies/ \
--resource /tmp/governance-lab/manifests/good-deployment.yaml 2>&1 || true
echo ""
echo "=== Validating BAD manifest ==="
kyverno apply /tmp/governance-lab/policies/ \
--resource /tmp/governance-lab/manifests/bad-deployment.yaml 2>&1 || true

Task 4: Test Policy Enforcement in the Cluster

Section titled “Task 4: Test Policy Enforcement in the Cluster”

Create a namespace and test that policies are enforced at admission time.

Solution
Terminal window
# Create a compliant namespace
cat <<'EOF' | k apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: team-alpha
labels:
team: alpha
cost-center: cc-4521
EOF
# Verify NetworkPolicy was auto-generated
echo "Auto-generated NetworkPolicy:"
k get networkpolicy -n team-alpha
# Test: Deploy a non-compliant pod (should fail)
echo "--- Testing non-compliant pod (expect failure) ---"
cat <<'EOF' | k apply -f - 2>&1 || true
apiVersion: v1
kind: Pod
metadata:
name: bad-pod
namespace: team-alpha
spec:
containers:
- name: bad
image: nginx:latest
securityContext:
privileged: true
EOF
# Test: Deploy a compliant pod (should succeed)
echo "--- Testing compliant pod (expect success) ---"
cat <<'EOF' | k apply -f -
apiVersion: v1
kind: Pod
metadata:
name: good-pod
namespace: team-alpha
spec:
containers:
- name: web
image: nginx:1.27.3
securityContext:
privileged: false
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
EOF
k get pods -n team-alpha

Create a managed exception for a legitimate use case with tracking and expiry.

Solution
Terminal window
# Scenario: team-alpha needs a privileged init container for a volume
# permission fix (a common pattern with persistent volumes)
cat <<'EOF' | k apply -f -
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
name: allow-init-privileged-alpha
namespace: team-alpha
labels:
exception-owner: security-team
exception-ticket: SEC-8823
exception-expires: "2025-06-01"
spec:
exceptions:
- policyName: deny-privileged
ruleNames:
- deny-privileged-containers
match:
any:
- resources:
kinds:
- Pod
namespaces:
- team-alpha
conditions:
any:
- key: "{{request.object.metadata.labels.exception-approved}}"
operator: Equals
value: "SEC-8823"
EOF
# Now test: a pod with the exception label should be allowed
echo "--- Testing with exception label ---"
cat <<'EOF' | k apply -f -
apiVersion: v1
kind: Pod
metadata:
name: volume-fix-pod
namespace: team-alpha
labels:
exception-approved: "SEC-8823"
spec:
containers:
- name: app
image: nginx:1.27.3
securityContext:
privileged: true
resources:
limits:
cpu: 100m
memory: 128Mi
EOF
# List all exceptions
echo "Active exceptions:"
k get policyexception -A

Task 6: Generate a Governance Compliance Report

Section titled “Task 6: Generate a Governance Compliance Report”

Build a script that reports on policy compliance across all namespaces.

Solution
cat <<'SCRIPT' > /tmp/governance-report.sh
#!/bin/bash
echo "============================================="
echo " GOVERNANCE COMPLIANCE REPORT"
echo " Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "============================================="
echo ""
echo "--- Cluster Policies ---"
kubectl get clusterpolicy -o custom-columns=\
NAME:.metadata.name,\
ACTION:.spec.validationFailureAction,\
READY:.status.ready,\
RULES:.status.rulecount.validate
echo ""
echo "--- Policy Reports by Namespace ---"
for NS in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep -v kube- | grep -v default | grep -v kyverno); do
PASS=$(kubectl get policyreport -n $NS -o jsonpath='{.items[*].summary.pass}' 2>/dev/null)
FAIL=$(kubectl get policyreport -n $NS -o jsonpath='{.items[*].summary.fail}' 2>/dev/null)
WARN=$(kubectl get policyreport -n $NS -o jsonpath='{.items[*].summary.warn}' 2>/dev/null)
echo " $NS: pass=${PASS:-0} fail=${FAIL:-0} warn=${WARN:-0}"
done
echo ""
echo "--- Active Exceptions ---"
kubectl get policyexception -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
TICKET:.metadata.labels.exception-ticket,\
EXPIRES:.metadata.labels.exception-expires 2>/dev/null || echo " No exceptions found"
echo ""
echo "--- Non-Compliant Resources (from background scan) ---"
kubectl get clusterpolicyreport -o json 2>/dev/null | \
jq -r '.items[].results[] | select(.result == "fail") | " \(.policy): \(.resources[0].namespace)/\(.resources[0].name) - \(.message)"' 2>/dev/null \
|| echo " No violations found in background scan"
echo ""
echo "============================================="
SCRIPT
chmod +x /tmp/governance-report.sh
bash /tmp/governance-report.sh
Terminal window
kind delete cluster --name governance-lab
rm -rf /tmp/governance-lab /tmp/governance-report.sh
  • I deployed 6 Kyverno policies covering security, cost, and operational concerns
  • I validated manifests locally using kyverno-cli (shift-left)
  • I verified that non-compliant resources are blocked at admission
  • I confirmed that compliant resources are admitted
  • I verified that namespace creation auto-generates a NetworkPolicy
  • I created a managed PolicyException with ticket tracking and expiry
  • I generated a governance compliance report across all namespaces
  • I can explain the five layers of the Policy Pyramid

Now that you understand how to enforce governance across cloud and Kubernetes, it is time to connect these policies to compliance frameworks. Head to Module 10.3: Continuous Compliance & CSPM to learn how to map your policies to SOC2, PCI-DSS, and HIPAA controls, automate evidence collection, and integrate Trivy and Falco with cloud security hubs.