Module 10.1: Enterprise Landing Zones & Account Vending
Complexity: [COMPLEX] | Time to Complete: 3h | Prerequisites: Cloud Essentials (AWS/Azure/GCP), Kubernetes Basics, Cloud Architecture Patterns
What You’ll Be Able to Do
After completing this module, you will be able to:
- Design enterprise landing zones using AWS Control Tower, Azure Landing Zones, and GCP Organization Hierarchy
- Implement automated account vending machines that provision cloud accounts with Kubernetes clusters in under 30 minutes
- Configure guardrails (SCPs, Azure Policy, Organization Policies) that enforce security baselines across all accounts
- Deploy landing zone customizations that integrate Kubernetes cluster bootstrapping with GitOps from day zero
Why This Module Matters
In March 2023, a Fortune 500 insurance company attempted to launch a new Kubernetes-based claims processing platform. The development team had been building for nine months. When they requested a production AWS account, the cloud team told them the wait time was fourteen weeks. The reason: every account was manually provisioned. A senior cloud architect had to create the VPC, configure the Transit Gateway attachment, set up IAM roles, create the SCPs, register the account in the CMDB, provision the DNS delegation, and configure logging to the central SIEM. This architect handled three accounts per week. There were twenty-two teams in the queue.
The claims platform missed its launch window. A competitor released an equivalent product. The insurance company later estimated the delay cost them $8.6 million in lost first-mover revenue. The problem was not cloud technology. The problem was that the organization treated cloud account creation as an artisanal craft instead of an automated factory line.
Enterprise Landing Zones solve this exact problem. They are the foundational architecture that defines how an organization uses cloud at scale — the account structure, the networking topology, the security guardrails, the identity model, and the automation that provisions all of it in minutes instead of weeks. When Kubernetes enters the picture, Landing Zones become even more critical: every cluster needs networking, identity, logging, and policy from day zero. In this module, you will learn how AWS Control Tower, Azure Landing Zones, and GCP Organization Hierarchy work, how to automate account vending with Kubernetes bootstrap included, and how to wire it all together so a team can go from “I need a cluster” to “I have a production-ready cluster” in under thirty minutes.
The Landing Zone Mental Model
Before diving into specific cloud implementations, you need to understand what a Landing Zone actually is. Think of it as the building code for a city. Before anyone constructs a building, the city has already defined the zoning regulations, the sewer and electrical grid connections, the fire code, and the permit process. A Landing Zone does the same thing for cloud infrastructure.
The Four Pillars
Every enterprise Landing Zone, regardless of cloud provider, addresses four pillars:

    flowchart TD
      subgraph LZ [ENTERPRISE LANDING ZONE]
        direction TB
        subgraph Pillars [The Four Pillars]
          direction LR
          ID["Identity & Access<br/>- SSO/IdP<br/>- IAM roles<br/>- RBAC<br/>- Federation"]
          NET["Network Topology<br/>- Hub-spoke<br/>- Transit GW<br/>- DNS<br/>- Firewall"]
          SEC["Security & Compliance<br/>- SCPs/Policy<br/>- Guardrails<br/>- Logging<br/>- Encryption"]
        end
        subgraph AVM [ACCOUNT VENDING MACHINE]
          Flow["Template → Provision → Wire → Validate → Deliver"]
        end
        Pillars --> AVM
        Time["Time: Request to Ready = < 30 minutes"]
        AVM --- Time
        style Time fill:none,stroke:none
      end

Identity and Access: Who can do what, across every account, with centralized SSO and federated identity. This must extend from cloud IAM into Kubernetes RBAC seamlessly.
Network Topology: How accounts connect to each other, to on-premises data centers, and to the internet. Every Kubernetes cluster needs a network that is already wired into this topology from birth.
Security and Compliance: The guardrails that prevent teams from doing dangerous things (like opening port 22 to the internet) while enabling them to move fast on everything else. These guardrails must cover both cloud resources and Kubernetes configurations.
Account Vending: The automation that provisions new accounts (or subscriptions, or projects) with all three pillars pre-configured. This is the factory line that eliminates the fourteen-week wait.
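The vending flow in the diagram (Template → Provision → Wire → Validate → Deliver) can be sketched as an ordered list of stages. This is a minimal illustration in Python, not any provider's implementation: `VendingState` and every stage function are hypothetical names, and each stage only records that it ran instead of calling a cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class VendingState:
    """Tracks one account request as it moves through the pipeline (hypothetical)."""
    team: str
    environment: str
    completed: list = field(default_factory=list)
    network_attached: bool = False


def render_template(state):
    # A real stage would render infrastructure code; here it only records itself.
    state.completed.append("template")


def provision(state):
    state.completed.append("provision")


def wire(state):
    # Stands in for attaching the new network to the hub.
    state.network_attached = True
    state.completed.append("wire")


def validate(state):
    # Refuse to hand over an account that was never wired into the network.
    if not state.network_attached:
        raise RuntimeError("account is not attached to the hub network")
    state.completed.append("validate")


def deliver(state):
    state.completed.append("deliver")


PIPELINE = [render_template, provision, wire, validate, deliver]


def vend(team, environment, stages=PIPELINE):
    """Run every stage in order; an exception in any stage stops the run."""
    state = VendingState(team=team, environment=environment)
    for stage in stages:
        stage(state)
    return state
```

Keeping validation as its own stage means a skipped or failed wiring step stops delivery instead of producing a half-configured account.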
AWS Control Tower and Account Factory
AWS Control Tower is Amazon’s opinionated Landing Zone solution. It builds on top of AWS Organizations, AWS SSO (now IAM Identity Center), AWS Config, and AWS CloudTrail to create a multi-account environment with pre-configured guardrails.
Architecture Overview
    flowchart TD
      Root[Root OU] --> Sec[Security OU]
      Root --> Infra[Infrastructure OU]
      Root --> Sand[Sandbox OU]
      Root --> Work[Workloads OU]
      Root --> Susp[Suspended OU]

      Sec --> LogArchive["Log Archive Account<br/>(CloudTrail, Config logs)"]
      Sec --> Audit["Audit Account<br/>(Security Hub, GuardDuty)"]

      Infra --> NetHub["Network Hub Account<br/>(Transit GW, DNS, firewalls)"]
      Infra --> Shared["Shared Services Account<br/>(CI/CD, container registry)"]

      Sand --> DevSand[Developer sandbox accounts]

      Work --> Prod["Production OU<br/>(strict guardrails)"]
      Work --> NonProd[Non-Production OU]

      Prod --> AlphaProd[Team-Alpha-Prod]
      Prod --> BetaProd[Team-Beta-Prod]

      NonProd --> AlphaDev[Team-Alpha-Dev]
      NonProd --> BetaStage[Team-Beta-Staging]

      Susp --> Decomm[Decommissioned accounts]

Pause and predict: If your organization acquires a startup running a legacy, high-risk monolithic application, which AWS Organizational Unit (OU) would you place their accounts in to isolate them from your core workloads?
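Placement in the OU tree matters because deny-style guardrails attached to a parent also apply to everything beneath it. The sketch below models that accumulation for the hierarchy shown above. The OU names come from the diagram, but the guardrail names are invented for illustration, and the model ignores allow-list SCPs.

```python
# Parent of each OU in the diagram above (the root has no parent).
OU_PARENT = {
    "Root": None,
    "Security": "Root",
    "Infrastructure": "Root",
    "Sandbox": "Root",
    "Workloads": "Root",
    "Suspended": "Root",
    "Production": "Workloads",
    "Non-Production": "Workloads",
}

# Hypothetical deny guardrails attached directly to each OU.
OU_GUARDRAILS = {
    "Root": ["deny-leave-organization"],
    "Workloads": ["restrict-regions"],
    "Production": ["deny-public-eks-endpoint"],
    "Sandbox": ["budget-cap"],
}


def effective_guardrails(ou):
    """Collect guardrails from the root down to the given OU."""
    chain = []
    while ou is not None:
        chain.append(ou)
        ou = OU_PARENT[ou]
    guardrails = []
    for name in reversed(chain):  # root first, target OU last
        guardrails.extend(OU_GUARDRAILS.get(name, []))
    return guardrails
```

An account in `Production` inherits three guardrails in this model, while one in `Sandbox` inherits two, which is why the choice of OU is itself a security decision.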
Setting Up Control Tower
    # Control Tower is set up via the AWS Console, but you can manage it via CLI after setup

    # List the controls (guardrails) enabled on an OU
    aws controltower list-enabled-controls \
      --target-identifier "arn:aws:organizations::123456789012:ou/o-abc123/ou-xyz789"

    # Check guardrail status
    aws controltower list-enabled-controls \
      --target-identifier "arn:aws:organizations::123456789012:ou/o-abc123/ou-xyz789" \
      --query 'enabledControls[*].{Control:controlIdentifier, Status:statusSummary.status}'

Account Factory for Terraform (AFT)
The real power comes from Account Factory for Terraform (AFT), which turns account vending into a GitOps workflow. You define an account in a Terraform file, push to a repo, and AFT provisions the account with all Landing Zone configurations.
    module "team_alpha_prod" {
      source = "./modules/aft-account-request"

      control_tower_parameters = {
        AccountEmail              = "team-alpha-prod@company.com"
        AccountName               = "Team-Alpha-Production"
        ManagedOrganizationalUnit = "Workloads/Production"
        SSOUserEmail              = "team-alpha-lead@company.com"
        SSOUserFirstName          = "Platform"
        SSOUserLastName           = "Team"
      }

      account_tags = {
        team        = "alpha"
        environment = "production"
        cost-center = "CC-4521"
        data-class  = "confidential"
      }

      # Custom fields that trigger account customizations
      account_customizations_name = "k8s-production-baseline"

      change_management_parameters = {
        change_requested_by = "platform-team"
        change_reason       = "New production workload account for Team Alpha"
      }
    }

Kubernetes Bootstrap in Account Vending
The critical extension for Kubernetes-centric organizations is wiring cluster provisioning into the account vending pipeline. When an account is created, the customization pipeline can automatically:
- Create a VPC with the standard CIDR from the IPAM pool
- Attach the VPC to the Transit Gateway
- Provision an EKS cluster with the organization’s baseline configuration
- Install mandatory add-ons (logging, monitoring, policy enforcement)
- Configure Access Entries for the team’s IAM roles
- Register the cluster with the central Backstage catalog
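The first step in this list, drawing a non-overlapping CIDR from a central pool, can be sketched with Python's standard `ipaddress` module. This is only a model of the idea, assuming a hypothetical `10.64.0.0/16` pool carved into `/20` blocks. The `CidrPool` class is not part of any AWS SDK.

```python
import ipaddress


class CidrPool:
    """Hands out fixed-size, non-overlapping blocks from one parent CIDR."""

    def __init__(self, pool_cidr, prefix_len):
        # subnets() lazily yields consecutive blocks of the requested size.
        self._free = ipaddress.ip_network(pool_cidr).subnets(new_prefix=prefix_len)
        self.allocations = {}

    def allocate(self, account_id):
        # Idempotent: asking again for the same account returns the same block.
        if account_id in self.allocations:
            return self.allocations[account_id]
        try:
            block = next(self._free)
        except StopIteration:
            raise RuntimeError("IP pool exhausted") from None
        self.allocations[account_id] = block
        return block
```

Because every block comes from the same iterator, two accounts can never receive overlapping ranges, which is the property that keeps later Transit Gateway routing unambiguous.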
    #!/bin/bash
    # AFT account customization script: k8s-production-baseline
    # This runs automatically after account creation
    set -euo pipefail  # stop at the first failed step instead of continuing half-provisioned

    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    REGION="us-east-1"

    # Step 1: Allocate the VPC CIDR from the IPAM pool
    VPC_CIDR=$(aws ec2 allocate-ipam-pool-cidr \
      --ipam-pool-id ipam-pool-0abc123 \
      --netmask-length 20 \
      --query 'IpamPoolAllocation.Cidr' --output text)

    # Step 2: Deploy baseline infrastructure via Terraform
    cd /opt/aft/customizations/k8s-baseline
    terraform init
    terraform apply -auto-approve \
      -var="account_id=${ACCOUNT_ID}" \
      -var="vpc_cidr=${VPC_CIDR}" \
      -var="cluster_name=eks-${ACCOUNT_ID}-prod" \
      -var="cluster_version=1.32"

    # Step 3: Register cluster in Backstage catalog
    CLUSTER_ENDPOINT=$(aws eks describe-cluster \
      --region "${REGION}" \
      --name "eks-${ACCOUNT_ID}-prod" \
      --query 'cluster.endpoint' --output text)

    # The endpoint path is illustrative; match it to your Backstage catalog ingestion setup
    curl -X POST "https://backstage.internal.company.com/api/catalog/entities" \
      -H "Content-Type: application/json" \
      -d "{
        \"apiVersion\": \"backstage.io/v1alpha1\",
        \"kind\": \"Resource\",
        \"metadata\": {
          \"name\": \"eks-${ACCOUNT_ID}-prod\",
          \"annotations\": {
            \"kubernetes.io/cluster-name\": \"eks-${ACCOUNT_ID}-prod\"
          }
        },
        \"spec\": {
          \"type\": \"kubernetes-cluster\",
          \"owner\": \"team-alpha\",
          \"lifecycle\": \"production\"
        }
      }"

    echo "Account ${ACCOUNT_ID} fully provisioned with EKS cluster"

Azure Landing Zones and Subscription Vending
Azure takes a similar but structurally different approach. Instead of accounts, Azure uses Subscriptions organized under Management Groups. The Azure Landing Zone architecture (formerly known as Enterprise-Scale) is a reference architecture maintained by Microsoft’s Cloud Adoption Framework team.
Azure Landing Zone Architecture
    flowchart TD
      Tenant[Azure AD Tenant] --> RootMG[Root Management Group]

      RootMG --> Platform[Platform]
      RootMG --> LZ[Landing Zones]
      RootMG --> Sandbox[Sandbox]
      RootMG --> Decomm[Decommissioned]

      Platform --> Mgmt["Management<br/>(Log Analytics, Automation)"]
      Platform --> Ident["Identity<br/>(Active Directory, DNS)"]
      Platform --> Conn["Connectivity<br/>(Hub VNet, ExpressRoute, Firewall)"]

      LZ --> Corp["Corp<br/>(internal apps, private networking)"]
      LZ --> Online["Online<br/>(internet-facing apps)"]

      Corp --> AlphaCorp[Team-Alpha-Prod Subscription]
      Corp --> BetaCorp[Team-Beta-Prod Subscription]

      Online --> AlphaWeb[Team-Alpha-Web Subscription]
      Online --> BetaAPI[Team-Beta-API Subscription]

      Sandbox --> DevSubs["Developer subscriptions<br/>(relaxed policies)"]

Subscription Vending with Bicep
    targetScope = 'managementGroup'

    @description('Name of the workload team')
    param teamName string

    @description('Environment: dev, staging, production')
    param environment string

    @description('Whether to provision an AKS cluster')
    param provisionAKS bool = true

    @description('Second octet of the spoke VNet range, unique per subscription')
    param uniqueOctet int

    // Create the subscription
    module subscription 'modules/subscription.bicep' = {
      name: 'sub-${teamName}-${environment}'
      params: {
        subscriptionName: 'sub-${teamName}-${environment}'
        managementGroupId: environment == 'production' ? 'mg-landing-zones-corp' : 'mg-landing-zones-sandbox'
        billingScope: '/providers/Microsoft.Billing/billingAccounts/1234/enrollmentAccounts/5678'
        tags: {
          team: teamName
          environment: environment
          costCenter: 'CC-${teamName}'
        }
      }
    }

    // Deploy networking into the new subscription
    // Simplified for readability: a real deployment must scope this module to the new subscription's ID
    module networking 'modules/spoke-vnet.bicep' = {
      name: 'net-${teamName}-${environment}'
      scope: subscription
      params: {
        vnetName: 'vnet-${teamName}-${environment}'
        vnetAddressSpace: '10.${uniqueOctet}.0.0/16'
        hubVnetId: '/subscriptions/hub-sub-id/resourceGroups/rg-hub/providers/Microsoft.Network/virtualNetworks/vnet-hub'
        firewallPrivateIp: '10.0.1.4'
      }
    }

    // Deploy AKS if requested
    module aks 'modules/aks-baseline.bicep' = if (provisionAKS) {
      name: 'aks-${teamName}-${environment}'
      scope: subscription
      params: {
        clusterName: 'aks-${teamName}-${environment}'
        kubernetesVersion: '1.32'
        subnetId: networking.outputs.aksSubnetId
        aadAdminGroupId: '${teamName}-k8s-admins' // Azure AD group
        enableDefender: environment == 'production'
        enablePolicyAddon: true
      }
    }

Identity Integration: Azure AD to AKS
Azure’s biggest advantage for enterprises already using Microsoft is the seamless identity chain from Azure AD through to Kubernetes RBAC:
    # Azure AD Group → AKS RBAC (no aws-auth equivalent needed)
    # The AKS cluster natively understands Azure AD tokens

    # Create an Azure AD group for cluster admins
    az ad group create --display-name "aks-team-alpha-admins" \
      --mail-nickname "aks-team-alpha-admins"

    # Assign the group as AKS cluster admin
    az role assignment create \
      --assignee-object-id "$(az ad group show -g "aks-team-alpha-admins" --query id -o tsv)" \
      --role "Azure Kubernetes Service Cluster Admin Role" \
      --scope "/subscriptions/$SUB_ID/resourceGroups/rg-alpha/providers/Microsoft.ContainerService/managedClusters/aks-alpha-prod"

    # Developers get the Cluster User role, which lets them fetch a kubeconfig;
    # their namespace-scoped permissions come from Kubernetes RoleBindings on the group
    az role assignment create \
      --assignee-object-id "$(az ad group show -g "aks-team-alpha-devs" --query id -o tsv)" \
      --role "Azure Kubernetes Service Cluster User Role" \
      --scope "/subscriptions/$SUB_ID/resourceGroups/rg-alpha/providers/Microsoft.ContainerService/managedClusters/aks-alpha-prod"

Stop and think: Look at the Azure identity integration. If an engineer transfers from Team Alpha to Team Beta, how many Kubernetes role bindings need to be updated to revoke their old access and grant their new access?
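One way to reason about that question is to model access as a lookup through group membership. In the sketch below, the cluster and group names follow the examples above, while `ROLE_BINDINGS`, `accessible_clusters`, and `transfer` are hypothetical helpers, not Azure or Kubernetes APIs.

```python
# Each cluster binds a role to a directory group, never to an individual user.
ROLE_BINDINGS = {
    "aks-alpha-prod": "aks-team-alpha-devs",
    "aks-beta-prod": "aks-team-beta-devs",
}


def accessible_clusters(user, group_members):
    """Return the clusters a user can reach through their group memberships."""
    return sorted(
        cluster
        for cluster, group in ROLE_BINDINGS.items()
        if user in group_members.get(group, set())
    )


def transfer(user, old_group, new_group, group_members):
    """Move a user between groups; the bindings themselves are never edited."""
    updated = {group: set(members) for group, members in group_members.items()}
    updated.setdefault(old_group, set()).discard(user)
    updated.setdefault(new_group, set()).add(user)
    return updated
```

In this model a team transfer is purely a directory change: the user's reachable clusters flip from Alpha's to Beta's while `ROLE_BINDINGS` stays untouched.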
GCP Organization Hierarchy and Project Factory
Google Cloud organizes resources under an Organization, with Folders providing the hierarchy and Projects serving as the account boundary.
GCP Landing Zone Structure
    flowchart TD
      Org["GCP Organization<br/>org-policies/"] --> Folders

      subgraph Folders [Folders]
        direction TB
        Boot["Bootstrap<br/>(Terraform state, CI/CD)"]
        Common["Common<br/>(shared VPC, logging, DNS)"]
        Prod["Production"]
        NonProd["Non-Production"]
        Sand["Sandbox"]
      end

      Prod --> AlphaProd[team-alpha-prod]
      Prod --> BetaProd[team-beta-prod]

      NonProd --> AlphaDev[team-alpha-dev]
      NonProd --> BetaStaging[team-beta-staging]

      Sand --> DevSand[developer-sandbox-*]

Project Factory with Terraform
Google’s Cloud Foundation Toolkit provides a Project Factory module that automates project vending:
    module "team_alpha_prod" {
      source  = "terraform-google-modules/project-factory/google"
      version = "~> 15.0"

      name                    = "team-alpha-prod"
      org_id                  = "123456789"
      folder_id               = google_folder.production.id
      billing_account         = "AABBCC-112233-DDEEFF"
      default_service_account = "disable"

      activate_apis = [
        "container.googleapis.com",
        "compute.googleapis.com",
        "monitoring.googleapis.com",
        "logging.googleapis.com",
        "dns.googleapis.com",
      ]

      shared_vpc = "vpc-host-project"
      shared_vpc_subnets = [
        "projects/vpc-host-project/regions/us-central1/subnetworks/team-alpha-prod-nodes",
        "projects/vpc-host-project/regions/us-central1/subnetworks/team-alpha-prod-pods",
        "projects/vpc-host-project/regions/us-central1/subnetworks/team-alpha-prod-services",
      ]

      labels = {
        team        = "alpha"
        environment = "production"
        cost_center = "cc_4521"
      }
    }

    # GKE cluster in the vended project
    module "gke_alpha_prod" {
      source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
      version = "~> 33.0"

      project_id        = module.team_alpha_prod.project_id
      name              = "gke-alpha-prod"
      region            = "us-central1"
      network           = "vpc-host-network"
      subnetwork        = "team-alpha-prod-nodes"
      ip_range_pods     = "team-alpha-prod-pods"
      ip_range_services = "team-alpha-prod-services"

      enable_private_nodes    = true
      enable_private_endpoint = false
      master_ipv4_cidr_block  = "172.16.0.0/28"

      release_channel = "REGULAR"

      node_pools = [
        {
          name         = "general"
          machine_type = "e2-standard-4"
          min_count    = 2
          max_count    = 10
          auto_upgrade = true
        }
      ]
    }

Guardrails: Preventive and Detective Controls
Landing Zones without guardrails are just organized chaos. Guardrails come in two flavors: preventive (stop bad things before they happen) and detective (find bad things after they happen and alert).
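The difference between the two flavors is where the check runs. The sketch below applies one hypothetical rule, "no public cluster API endpoints", in both ways. The dictionary shape and both function names are assumptions made for illustration, not a cloud provider API.

```python
def preventive_check(request):
    """Runs before creation: a non-compliant request never becomes a resource."""
    if request.get("endpoint_public_access"):
        raise PermissionError("public API endpoints are denied by policy")
    return request


def detective_scan(existing_clusters):
    """Runs after creation: reports violations but changes nothing."""
    return [
        cluster["name"]
        for cluster in existing_clusters
        if cluster.get("endpoint_public_access")
    ]
```

The preventive form raises an error at request time, while the detective form returns a list of names for someone (or something) to act on, so existing workloads keep running until they are remediated.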
Preventive Guardrails Across Clouds
| Guardrail | AWS (SCP) | Azure (Policy) | GCP (Org Policy) |
|---|---|---|---|
| Deny public S3/Storage buckets | SCP on OU | Deny effect policy | constraints/storage.publicAccessPrevention |
| Require encryption at rest | SCP deny unencrypted | DeployIfNotExists | constraints/gcp.restrictNonCmekServices |
| Restrict regions | SCP deny non-approved regions | AllowedLocations | constraints/gcp.resourceLocations |
| Deny privilege escalation | SCP deny IAM:* except break-glass | Custom policy definition | constraints/iam.disableServiceAccountKeyCreation |
| Require tags/labels | SCP deny untagged resources | Require tag initiative | Custom org policy |
| Block public Kubernetes API | SCP deny public EKS endpoint | Deny public AKS | constraints/container.restrictPublicCluster |
Example: AWS SCP for Kubernetes Guardrails
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyPublicEKSEndpoint",
          "Effect": "Deny",
          "Action": [
            "eks:CreateCluster",
            "eks:UpdateClusterConfig"
          ],
          "Resource": "*",
          "Condition": {
            "ForAnyValue:StringEquals": {
              "eks:endpointPublicAccess": "true"
            }
          }
        },
        {
          "Sid": "DenyEKSWithoutLogging",
          "Effect": "Deny",
          "Action": "eks:CreateCluster",
          "Resource": "*",
          "Condition": {
            "Null": {
              "eks:logging": "true"
            }
          }
        },
        {
          "Sid": "RequireEKSEncryption",
          "Effect": "Deny",
          "Action": "eks:CreateCluster",
          "Resource": "*",
          "Condition": {
            "Null": {
              "eks:encryptionConfig": "true"
            }
          }
        }
      ]
    }

Connecting Cloud Guardrails to Kubernetes Policy
The key insight that most organizations miss is that cloud guardrails and Kubernetes policy engines must work together as a unified system. A cloud SCP can prevent a public EKS endpoint, but it cannot prevent a Kubernetes Service of type LoadBalancer from provisioning an internet-facing load balancer (on AWS, a Network or Classic Load Balancer; ALBs are created from Ingress resources). For that, you need an in-cluster policy engine.
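To make the in-cluster half concrete, here is a rough Python model of the check an admission policy would perform on a Service manifest. It is not Kyverno or Gatekeeper syntax. It assumes the cluster uses the AWS Load Balancer Controller, whose `service.beta.kubernetes.io/aws-load-balancer-scheme` annotation selects an internal or internet-facing load balancer, and `validate_service` is a hypothetical function name.

```python
SCHEME_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-scheme"


def validate_service(manifest):
    """Return a list of violations; an empty list means the manifest is admitted."""
    if manifest.get("kind") != "Service":
        return []
    if manifest.get("spec", {}).get("type") != "LoadBalancer":
        return []
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    if annotations.get(SCHEME_ANNOTATION) != "internal":
        name = manifest.get("metadata", {}).get("name", "<unnamed>")
        return [f"Service {name}: LoadBalancer must set {SCHEME_ANNOTATION}=internal"]
    return []
```

The cloud-level policy never sees this manifest, because the load balancer is requested by a controller inside the cluster, which is why the check has to live at the admission layer.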
    flowchart TD
      L1["Layer 1: Cloud Provider (Preventive)<br/>SCPs / Azure Policy / Org Policy<br/>What resources can be created?"]
      L2["Layer 2: Infrastructure as Code<br/>Terraform/Crossplane validation (pre-apply)<br/>Is the configuration correct?"]
      L3["Layer 3: Kubernetes Admission<br/>Kyverno / OPA Gatekeeper (ValidatingWebhook)<br/>Is the K8s manifest compliant?"]
      L4["Layer 4: Runtime Detection<br/>Falco / KubeArmor (eBPF runtime policy)<br/>Is the workload behaving correctly?"]

      L1 --> L2 --> L3 --> L4

Pause and predict: Before we look at Backstage, list out the automated steps a pipeline should take to fulfill a ‘New Kubernetes Cluster’ request. What needs to happen between the developer clicking ‘Submit’ and them receiving a kubeconfig?
Backstage as the Enterprise Front Door
Backstage, originally built by Spotify and now a CNCF incubating project, has become a common foundation for the Internal Developer Platform (IDP) that fronts an enterprise Landing Zone. It serves as the self-service portal where teams request infrastructure without needing to understand the underlying automation.
How Backstage Fits Into Account Vending
    flowchart TD
      User(["Developer clicks 'New Project' in Backstage"]) --> Wizard[Backstage Template Wizard]
      Wizard --> Engine[Software Template Engine]
      Engine --> Repo["Git Repo<br/>(with TF/Crossplane)"]
      Repo --> Pipeline["CI/CD Pipeline<br/>(AFT / Azure Pipelines)"]
      Pipeline --> Output["Provisioned Account/Subscription/Project<br/>+ VPC/VNet + EKS/AKS/GKE Cluster<br/>+ GitOps repo + ArgoCD Application<br/>+ Registered in Backstage Catalog"]

Backstage Software Template for K8s Environment
    apiVersion: scaffolder.backstage.io/v1beta3
    kind: Template
    metadata:
      name: new-k8s-environment
      title: Request New Kubernetes Environment
      description: Provision a new cloud account with a production-ready K8s cluster
      tags:
        - kubernetes
        - infrastructure
    spec:
      owner: platform-team
      type: environment

      parameters:
        - title: Team Information
          required:
            - teamName
            - costCenter
          properties:
            teamName:
              title: Team Name
              type: string
              pattern: '^[a-z][a-z0-9-]{2,20}$'
            costCenter:
              title: Cost Center
              type: string

        - title: Environment Configuration
          required:
            - environment
            - cloudProvider
            - region
          properties:
            environment:
              title: Environment
              type: string
              enum: ['development', 'staging', 'production']
            cloudProvider:
              title: Cloud Provider
              type: string
              enum: ['aws', 'azure', 'gcp']
            region:
              title: Region
              type: string
              enum: ['us-east-1', 'eu-west-1', 'ap-southeast-1']

        - title: Cluster Configuration
          properties:
            clusterSize:
              title: Cluster Size
              type: string
              enum: ['small', 'medium', 'large']
              default: 'medium'
              description: |
                small: 2-5 nodes, dev/test workloads
                medium: 3-20 nodes, production services
                large: 5-100 nodes, high-traffic production
            enableServiceMesh:
              title: Enable Istio Service Mesh
              type: boolean
              default: false
            enableGPU:
              title: Include GPU Node Pool
              type: boolean
              default: false

      steps:
        - id: generate-terraform
          name: Generate Infrastructure Code
          action: fetch:template
          input:
            url: ./skeleton
            values:
              teamName: ${{ parameters.teamName }}
              environment: ${{ parameters.environment }}
              cloudProvider: ${{ parameters.cloudProvider }}
              region: ${{ parameters.region }}
              clusterSize: ${{ parameters.clusterSize }}

        - id: create-repo
          name: Create Infrastructure Repository
          action: publish:github
          input:
            repoUrl: github.com?owner=company-infra&repo=env-${{ parameters.teamName }}-${{ parameters.environment }}
            defaultBranch: main

        - id: trigger-pipeline
          name: Trigger Provisioning Pipeline
          action: github:actions:dispatch
          input:
            repoUrl: github.com?owner=company-infra&repo=env-${{ parameters.teamName }}-${{ parameters.environment }}
            workflowId: provision.yml
            branchOrTagName: main

        - id: register-catalog
          name: Register in Backstage Catalog
          action: catalog:register
          input:
            repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
            catalogInfoPath: /catalog-info.yaml

      output:
        links:
          - title: Infrastructure Repository
            url: ${{ steps['create-repo'].output.remoteUrl }}
          - title: Provisioning Pipeline
            url: ${{ steps['trigger-pipeline'].output.runUrl }}

War Story: A telecommunications company with 2,300 engineers implemented Backstage-driven account vending in 2024. Before Backstage, their average time from “team needs infrastructure” to “team has a working cluster” was 23 business days. After implementing the template system, it dropped to 38 minutes. The platform team reported that the most surprising benefit was not speed but consistency: every cluster came out identical, with the same monitoring, the same policies, and the same security baseline. The number of “snowflake cluster” incidents dropped by 91%.
Did You Know?
- AWS Control Tower manages over 350,000 organizational accounts as of early 2025. The Account Factory for Terraform (AFT) was originally an internal AWS tool used by their own teams to provision accounts for new AWS services. They open-sourced it after realizing that enterprise customers were building inferior versions of the same thing independently.
- Azure Landing Zones were redesigned three times between 2019 and 2023. The original “Enterprise-Scale” architecture was so complex that Microsoft found only 12% of enterprises could implement it successfully. The current “Azure Landing Zones” approach reduced the minimum viable deployment from 6 weeks to 3 days by making more decisions opinionated rather than configurable.
- The concept of “guardrails vs. gates” revolutionized how enterprises think about cloud governance. Gates require approval before proceeding (slow, bottleneck). Guardrails prevent dangerous actions automatically but allow everything else (fast, scalable). The term was popularized by AWS in 2019, and within two years, every major cloud provider adopted the language. The distinction matters for Kubernetes too: Kyverno and Gatekeeper are guardrails, while manual YAML review is a gate.
- Backstage crossed 2,800 adopting companies in 2025 and has over 200 community plugins. The most popular plugin category is “infrastructure provisioning,” which directly maps to the account vending pattern. Spotify’s internal Backstage instance has over 4,500 registered software templates, and their average developer provisions new infrastructure 3.2 times per month through it.
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| One giant account for everything | Simplicity. “We only have 3 teams, we do not need multiple accounts.” Then the organization grows to 30 teams. | Start with multi-account from day one. The overhead is minimal with automation, and retrofitting is extremely painful. |
| Landing Zone without Kubernetes integration | The Landing Zone team is a separate group from the Kubernetes platform team. They design the zone without considering K8s networking, identity, or policy needs. | Include Kubernetes architects in Landing Zone design. Every account template should include VPC sizing for pod CIDRs, IAM roles for cluster operations, and policy baseline for K8s. |
| Manual account vending | “We only create accounts once a quarter, automation is overkill.” Then demand spikes and the queue grows to months. | Automate account vending from the start. Even if you provision one account per month, the automation ensures consistency and eliminates human error. |
| Guardrails too restrictive | Security team designs guardrails without developer input. Developers cannot deploy basic workloads. Shadow IT begins. | Co-design guardrails with developers. Start permissive and tighten based on actual incidents. Monitor guardrail denials to find legitimate use cases being blocked. |
| No DNS strategy in the Landing Zone | DNS is treated as an afterthought. Each account manages its own DNS, leading to naming conflicts and resolution failures across the hub-spoke network. | Design DNS delegation as part of the Landing Zone: a central Route53/Azure DNS/Cloud DNS zone with automatic subdomain delegation per account. |
| Ignoring IPAM from the start | VPC CIDR ranges assigned ad-hoc. Within 18 months, overlapping CIDRs prevent Transit Gateway peering, and pod IP exhaustion appears in tightly-sized clusters. | Use AWS VPC IPAM, Azure IPAM, or a third-party tool like NetBox. Assign CIDRs from a centralized pool that accounts for node IPs, pod IPs, and service IPs per cluster. |
| Backstage template without validation | Templates allow any input. Teams create clusters with names that violate DNS conventions or sizes that exceed their budget approval. | Add JSON Schema validation to Backstage templates. Implement approval workflows for production environments. Connect cost estimation to the template wizard. |
| No Landing Zone lifecycle plan | Landing Zone is deployed once and never updated. Cloud providers release new features (like EKS Pod Identity or AKS Workload Identity) but the baseline never adopts them. | Treat the Landing Zone as a product with a roadmap. Quarterly reviews of cloud provider releases. Automated testing of Landing Zone updates before rollout. |
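The "Backstage template without validation" row can be illustrated with a small server-side check that mirrors the template's own constraints. The `teamName` pattern and the enum values are taken from the software template earlier in this module. The `requires_approval` rule (production plus a large or GPU cluster) is an assumed policy added only to show where an approval gate could hook in.

```python
import re

# Same constraints as the template's JSON Schema.
TEAM_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,20}$")
ENVIRONMENTS = {"development", "staging", "production"}
CLUSTER_SIZES = {"small", "medium", "large"}


def validate_request(params):
    """Return human-readable errors; an empty list means the request is valid."""
    errors = []
    if not TEAM_NAME_PATTERN.fullmatch(params.get("teamName", "")):
        errors.append(
            "teamName must be 3-21 characters: a lowercase letter, "
            "then lowercase letters, digits, or hyphens"
        )
    if params.get("environment") not in ENVIRONMENTS:
        errors.append("environment must be one of: " + ", ".join(sorted(ENVIRONMENTS)))
    if params.get("clusterSize", "medium") not in CLUSTER_SIZES:
        errors.append("clusterSize must be one of: " + ", ".join(sorted(CLUSTER_SIZES)))
    return errors


def requires_approval(params):
    """Hypothetical policy: costly production requests wait for a human."""
    is_production = params.get("environment") == "production"
    is_expensive = params.get("clusterSize") == "large" or bool(params.get("enableGPU"))
    return is_production and is_expensive
```

Repeating the schema checks in the pipeline guards against requests that bypass the form, and keeping the approval rule separate leaves ordinary requests fully self-service.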
Question 1: You are the lead architect for a retail company moving to Kubernetes. A colleague suggests saving time by creating a single AWS account containing one massive EKS cluster, and using Kubernetes namespaces to isolate the 15 different product teams. Why is this a dangerous architectural decision for an enterprise?
A single account creates an insurmountable blast radius problem and an IAM complexity nightmare. First, all teams share the same AWS service quotas (like EC2 instance limits, EBS volumes, and VPC IP addresses). One team’s runaway autoscaling event can easily exhaust quotas, causing outages for all other teams sharing the account. Second, restricting AWS API access via IAM requires writing incredibly complex, error-prone resource-level conditions to ensure teams cannot modify each other’s cloud resources outside the cluster. Finally, a security breach escaping one team’s namespace or a compromised node could potentially expose the IAM credentials used by other teams, making the entire organization vulnerable to a single point of failure.
Question 2: Your security team discovers that several development clusters were accidentally provisioned with public API endpoints. They want to ensure this never happens again, but they also want to audit existing clusters. Which types of guardrails should you implement for each requirement, and how do they function differently?
To stop new public endpoints from being created, you must implement a preventive guardrail, such as an AWS Service Control Policy (SCP) or an Azure Policy with a Deny effect. Preventive guardrails actively intercept and block non-compliant API requests before the resource is ever provisioned, ensuring the problem cannot occur. To audit the existing clusters, you need a detective guardrail, such as AWS Config rules or Azure Policy in Audit mode. Detective guardrails scan already-provisioned resources, identify non-compliant configurations, and generate alerts without breaking existing workloads. Using both in tandem provides a comprehensive governance strategy.
Question 3: A new engineering team joins the company and urgently needs a staging environment. They log into the Backstage portal and submit a 'New Kubernetes Environment' request. Describe the exact automated sequence of events that translates this web form submission into a fully provisioned, registered Kubernetes cluster.
The process begins when Backstage takes the form inputs and uses its software template engine to generate infrastructure-as-code files tailored to the team’s parameters. Next, Backstage creates a new Git repository and commits these generated files to it. The creation of this repository triggers a CI/CD pipeline (such as GitHub Actions or AFT) which acts as the vending machine. This pipeline executes the Terraform or Bicep code to provision the cloud account, establish the VPC network topology, deploy the Kubernetes cluster, and configure identity integrations. Finally, the pipeline concludes by making an API call back to Backstage to register the newly created cluster in the service catalog, completing the self-service loop.
Question 4: The networking team has assigned your new AWS account a /24 VPC CIDR block (256 IP addresses) because they assume you are only deploying a single EKS cluster with 10 worker nodes. Six weeks later, your cluster networking completely fails. What architectural reality of cloud-native Kubernetes did the networking team fail to account for?
The networking team failed to account for the fact that in cloud-native networking models like AWS VPC CNI, every single Kubernetes pod is assigned a real, routable IP address directly from the VPC subnet. If a node runs 30 pods, that single node consumes 30+ IP addresses. A relatively small cluster of 10 nodes running standard microservices can easily consume 400 or more IP addresses, completely exhausting a /24 allocation. Enterprise landing zones must utilize centralized IP Address Management (IPAM) to assign large CIDR blocks (typically /16 or /17) to Kubernetes accounts to prevent this exact type of catastrophic IP exhaustion.
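The arithmetic behind this answer is easy to check. The sketch below counts one address per node plus one per pod and subtracts the five addresses AWS reserves in every subnet. It deliberately ignores the warm IP pools the VPC CNI keeps on each node, which push real consumption higher, and the function names are illustrative.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves five addresses in every subnet


def usable_ips(cidr, subnets=1):
    """Addresses left for nodes and pods when the CIDR is split into `subnets` subnets."""
    total = ipaddress.ip_network(cidr).num_addresses
    return total - AWS_RESERVED_PER_SUBNET * subnets


def ips_needed(nodes, pods_per_node):
    """One address per node plus one per pod (a lower bound)."""
    return nodes * (1 + pods_per_node)


def fits(cidr, nodes, pods_per_node, subnets=1):
    return ips_needed(nodes, pods_per_node) <= usable_ips(cidr, subnets)
```

With 10 nodes at 30 pods each, `ips_needed` already exceeds what a `/24` offers, while a `/20` leaves ample headroom for scaling and rolling upgrades.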
Question 5: You are designing the identity architecture for a multi-cloud landing zone spanning AWS and Azure. The security mandate requires that a user's corporate identity directly maps to their Kubernetes namespace permissions. Contrast how you will implement this identity propagation mechanism in AWS EKS versus Azure AKS.
In Azure AKS, the implementation is direct because AKS natively integrates with Microsoft Entra ID (formerly Azure AD). You can reference Entra ID group object IDs as Group subjects inside your Kubernetes RoleBinding manifests, and AKS validates the Entra ID tokens presented by developers. In contrast, AWS EKS requires a translation layer to bridge AWS IAM and Kubernetes RBAC. You must configure EKS Access Entries (or the legacy aws-auth ConfigMap) to explicitly map an AWS IAM role ARN to a Kubernetes username and group. In Azure the identity therefore flows from tenant to cluster without extra mapping, whereas AWS requires your vending pipeline to build and maintain the mapping configuration.
Question 6: The central IT department spent six months building a pristine GCP Landing Zone with strict organizational policies, centralized networking, and standardized service accounts. They hand it over to the Kubernetes platform team to deploy GKE. Within days, the platform team reports they are completely blocked. What is the most likely architectural cause of this failure?
The failure occurred because the Landing Zone was designed without accommodating the specific, complex infrastructure requirements of a Kubernetes control plane and its add-ons. Common blind spots include strict firewall policies that break webhook communication between the GKE control plane and worker nodes, or Shared VPC subnet allocations that are far too small for alias IP ranges required by pods. Furthermore, organization-level policies might inadvertently deny the creation of internal load balancers or restrict the service account permissions required by the cluster autoscaler to provision new nodes. To prevent this, enterprise landing zones must be co-designed with Kubernetes architects to ensure the foundation actually supports the intended workloads.
Question 7: A junior developer uses the Backstage portal to request a 100-node production Kubernetes cluster with expensive GPU instances, intended to process a massive new data pipeline. As the platform architect, how should you design the account vending workflow to handle this specific request safely while still maintaining automated self-service?
The workflow should process this request using an automated business approval gate rather than blocking it for a manual infrastructure code review. Because the Backstage template generates standardized, pre-approved infrastructure code with built-in guardrails, the technical correctness of the cluster is already guaranteed. However, because this request targets a production environment and incurs massive cost, the workflow should pause and automatically route an approval request to the team’s cost center owner and the security lead. Once those stakeholders approve the business case and budget, the CI/CD pipeline should resume and automatically provision the cluster without any human engineer needing to touch the provisioning tools.
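The routing decision described above could be sketched as a small shell function that the pipeline calls before provisioning. This is an illustration under stated assumptions: the function name, the 20-node threshold, and the two return strings are hypothetical policy choices, not values from Backstage or any CI system.

```bash
#!/usr/bin/env bash
# Decide whether a vending request can proceed automatically or must pause
# for business approval. Thresholds and labels are hypothetical.
approval_route() {
  local environment=$1 node_count=$2 gpu=$3
  if [[ "$environment" == "production" || "$gpu" == "true" || "$node_count" -gt 20 ]]; then
    # Route to the cost center owner and security lead, then resume on approval.
    echo "approval-required"
  else
    echo "auto-approve"
  fi
}

# The request from the question: 100 GPU nodes in production.
approval_route production 100 true

# A small development cluster passes straight through.
approval_route development 3 false
```

The gate only decides who must sign off. The infrastructure code itself still comes from the pre-approved template, so no engineer reviews or edits it by hand.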
Hands-On Exercise: Build a Mini Landing Zone with Account Vending
In this exercise, you will simulate an enterprise Landing Zone using local tools. You will create a multi-account structure, implement guardrails, and build a self-service vending pipeline that provisions a Kubernetes cluster.
What you will build:
flowchart TD
    subgraph Local Environment
        direction TB
        subgraph Mgmt["Management Cluster (kind)"]
            direction TB
            CP["Crossplane (infrastructure provisioner)"]
            KY["Kyverno (guardrails)"]
            BS["Backstage (self-service portal)"]
        end
        subgraph Pipeline["Vending Pipeline"]
            direction TB
            S1["1. Create namespace (simulates account)"]
            S2["2. Apply network policies (simulates VPC)"]
            S3["3. Deploy workload cluster (kind-in-kind)"]
            S4["4. Install baseline (monitoring, policy)"]
            S5["5. Register in catalog"]

            S1 --> S2 --> S3 --> S4 --> S5
        end
    end

Task 1: Create the Management Cluster
Set up the local management cluster that will serve as your Landing Zone control plane.
Solution
# Create a kind cluster to act as the management cluster
cat <<'EOF' > /tmp/mgmt-cluster.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: landing-zone-mgmt
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF

kind create cluster --config /tmp/mgmt-cluster.yaml

# Verify the cluster is running
k get nodes
# NAME                              STATUS   ROLES           AGE   VERSION
# landing-zone-mgmt-control-plane   Ready    control-plane   45s   v1.32.0
# landing-zone-mgmt-worker          Ready    <none>          30s   v1.32.0
# landing-zone-mgmt-worker2         Ready    <none>          30s   v1.32.0

Task 2: Install the Guardrail Layer
Deploy Kyverno and create policies that simulate enterprise guardrails (no privileged containers, mandatory labels, resource limits required).
Solution
# Install Kyverno
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno -n kyverno --create-namespace --wait

# Create enterprise guardrail policies
cat <<'EOF' | k apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
  annotations:
    policies.kyverno.io/description: "All namespaces must have a team label"
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Namespace
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kube-public
                - kube-node-lease
                - kyverno
                - default
      validate:
        message: "Namespace must have a 'team' label. This is required by the Landing Zone policy."
        pattern:
          metadata:
            labels:
              team: "?*"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-privileged
spec:
  validationFailureAction: Enforce
  rules:
    - name: deny-privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kyverno
      validate:
        message: "Privileged containers are not allowed by Landing Zone policy."
        pattern:
          spec:
            containers:
              # The =() anchors make the check conditional: pods that omit
              # securityContext pass, pods that set privileged: true are denied.
              - =(securityContext):
                  =(privileged): "false"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system
                - kyverno
      validate:
        message: "All containers must have CPU and memory limits set."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
EOF

# Test the guardrails
echo "Testing: namespace without team label (should fail)"
k create namespace bad-namespace 2>&1 || true

# Build the manifest client-side first; a server-side dry run of the unlabeled
# namespace would itself be rejected by the team-label policy.
echo "Testing: namespace with team label (should succeed)"
k create namespace good-namespace --dry-run=client -o yaml \
  | k label --local -f - team=alpha --dry-run=client -o yaml \
  | k apply -f - --dry-run=server

Task 3: Create an Account Vending Script
Build a script that simulates account vending by creating a namespace with all the Landing Zone baseline configurations.
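Because the vended namespace name is assembled from user input, it is worth rejecting names Kubernetes would refuse before any resource is created. The helper below is an optional sketch, not part of the exercise solution; `valid_dns_label` is a hypothetical name. It applies the RFC 1123 label rule that namespace names must follow: lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, at most 63 characters.

```bash
#!/usr/bin/env bash
# Return 0 if the argument is a valid Kubernetes namespace name
# (RFC 1123 DNS label), non-zero otherwise. Helper name is illustrative.
valid_dns_label() {
  local name=$1
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  [[ ${#name} -le 63 && $name =~ $re ]]
}

# Example: check a few candidate "<team>-<environment>" names.
for candidate in "alpha-production" "Alpha_Prod" "-beta-dev"; do
  if valid_dns_label "$candidate"; then
    echo "ok:      $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

A vending script could call this on the combined name and exit early with a clear message instead of surfacing an API server validation error halfway through provisioning.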
Solution
cat <<'SCRIPT' > /tmp/vend-account.sh
#!/bin/bash
set -euo pipefail

# Fail with a usage message if either argument is missing
TEAM_NAME=${1:?usage: vend-account.sh <team-name> <environment>}
ENVIRONMENT=${2:?usage: vend-account.sh <team-name> <environment>}

NAMESPACE="${TEAM_NAME}-${ENVIRONMENT}"
echo "=== Vending account: ${NAMESPACE} ==="

# Step 1: Create namespace with required labels
echo "[1/5] Creating namespace with Landing Zone labels..."
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    team: ${TEAM_NAME}
    environment: ${ENVIRONMENT}
    managed-by: landing-zone
    cost-center: "cc-${TEAM_NAME}"
EOF

# Step 2: Apply network policies (simulates VPC isolation)
echo "[2/5] Applying network isolation policies..."
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ${NAMESPACE}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: ${NAMESPACE}
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
  policyTypes:
    - Ingress
EOF

# Step 3: Create resource quotas
echo "[3/5] Setting resource quotas..."
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: landing-zone-quota
  namespace: ${NAMESPACE}
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
    services.loadbalancers: "2"
EOF

# Step 4: Create RBAC for the team
echo "[4/5] Configuring RBAC..."
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-developer
  namespace: ${NAMESPACE}
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "services", "configmaps", "secrets", "jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log", "pods/exec"]
    verbs: ["get", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-developer-binding
  namespace: ${NAMESPACE}
subjects:
  - kind: Group
    name: "team-${TEAM_NAME}-developers"
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-developer
  apiGroup: rbac.authorization.k8s.io
EOF

# Step 5: Deploy baseline monitoring
echo "[5/5] Deploying baseline services..."
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: landing-zone-config
  namespace: ${NAMESPACE}
data:
  team: "${TEAM_NAME}"
  environment: "${ENVIRONMENT}"
  provisioned-at: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  landing-zone-version: "2.1.0"
EOF

echo ""
echo "=== Account vended successfully ==="
echo "Namespace: ${NAMESPACE}"
echo "Team: ${TEAM_NAME}"
echo "Environment: ${ENVIRONMENT}"
echo "Quotas: CPU 8/16 req/limit, Memory 16/32Gi req/limit"
echo "Network: Default deny ingress, allow same-namespace"
echo "RBAC: team-${TEAM_NAME}-developers -> team-developer role"
SCRIPT

chmod +x /tmp/vend-account.sh

# Vend accounts for two teams
/tmp/vend-account.sh alpha production
/tmp/vend-account.sh beta development

# Verify the vended accounts
k get namespaces -l managed-by=landing-zone
k get resourcequota -n alpha-production
k get networkpolicy -n alpha-production

Task 4: Test Guardrail Enforcement
Verify that the guardrails prevent non-compliant resources in vended accounts.
Solution
# Test 1: Try to create a privileged pod (should be denied)
echo "--- Test: Privileged pod (expect DENIED) ---"
cat <<'EOF' | k apply -f - 2>&1 || true
apiVersion: v1
kind: Pod
metadata:
  name: bad-privileged-pod
  namespace: alpha-production
spec:
  containers:
    - name: evil
      image: nginx:1.27
      securityContext:
        privileged: true
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
EOF

# Test 2: Try to create a pod without resource limits (should be denied)
echo "--- Test: Pod without limits (expect DENIED) ---"
cat <<'EOF' | k apply -f - 2>&1 || true
apiVersion: v1
kind: Pod
metadata:
  name: no-limits-pod
  namespace: alpha-production
spec:
  containers:
    - name: wasteful
      image: nginx:1.27
EOF

# Test 3: Create a compliant pod (should succeed)
echo "--- Test: Compliant pod (expect SUCCESS) ---"
cat <<'EOF' | k apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: good-pod
  namespace: alpha-production
spec:
  containers:
    - name: web
      image: nginx:1.27
      securityContext:
        privileged: false
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 50m
          memory: 64Mi
EOF

# Verify the compliant pod is running
k get pods -n alpha-production

Task 5: Audit the Landing Zone
Generate a compliance report for all vended accounts.
Solution
cat <<'SCRIPT' > /tmp/audit-landing-zone.sh
#!/bin/bash
echo "========================================="
echo " LANDING ZONE COMPLIANCE AUDIT REPORT"
echo " Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "========================================="
echo ""

# List all vended namespaces
NAMESPACES=$(kubectl get namespaces -l managed-by=landing-zone -o jsonpath='{.items[*].metadata.name}')

for NS in $NAMESPACES; do
  echo "--- Namespace: $NS ---"
  TEAM=$(kubectl get namespace "$NS" -o jsonpath='{.metadata.labels.team}')
  ENV=$(kubectl get namespace "$NS" -o jsonpath='{.metadata.labels.environment}')
  echo " Team: $TEAM | Environment: $ENV"

  # Check network policies
  NP_COUNT=$(kubectl get networkpolicy -n "$NS" --no-headers 2>/dev/null | wc -l)
  if [ "$NP_COUNT" -ge 1 ]; then
    echo " Network Policies: PASS ($NP_COUNT policies)"
  else
    echo " Network Policies: FAIL (no policies found)"
  fi

  # Check resource quotas
  RQ_COUNT=$(kubectl get resourcequota -n "$NS" --no-headers 2>/dev/null | wc -l)
  if [ "$RQ_COUNT" -ge 1 ]; then
    echo " Resource Quotas: PASS ($RQ_COUNT quotas)"
  else
    echo " Resource Quotas: FAIL (no quotas found)"
  fi

  # Check RBAC
  ROLE_COUNT=$(kubectl get role -n "$NS" --no-headers 2>/dev/null | wc -l)
  if [ "$ROLE_COUNT" -ge 1 ]; then
    echo " RBAC Roles: PASS ($ROLE_COUNT roles)"
  else
    echo " RBAC Roles: FAIL (no roles found)"
  fi

  # Check Kyverno policy reports. A namespace can hold several reports,
  # so sum their fail counts instead of comparing the raw list to "0".
  VIOLATIONS=$(kubectl get policyreport -n "$NS" -o jsonpath='{.items[*].summary.fail}' 2>/dev/null \
    | tr ' ' '\n' | awk '{s += $1} END {print s + 0}')
  if [ "$VIOLATIONS" -eq 0 ]; then
    echo " Policy Violations: PASS (0 violations)"
  else
    echo " Policy Violations: WARN ($VIOLATIONS violations)"
  fi

  echo ""
done

echo "========================================="
echo " Guardrail Policy Summary"
echo "========================================="
kubectl get clusterpolicy -o custom-columns=NAME:.metadata.name,ACTION:.spec.validationFailureAction,READY:.status.ready
SCRIPT

chmod +x /tmp/audit-landing-zone.sh
bash /tmp/audit-landing-zone.sh

Clean Up
kind delete cluster --name landing-zone-mgmt
rm /tmp/mgmt-cluster.yaml /tmp/vend-account.sh /tmp/audit-landing-zone.sh

Success Criteria
- I created a management cluster with Kyverno guardrails installed
- I deployed three guardrail policies (team label, no privileged, resource limits)
- I built and ran an account vending script that provisions namespaces with full baseline
- I successfully vended accounts for two teams
- I verified that guardrails block non-compliant resources
- I generated a compliance audit report for all vended accounts
- I can explain the four pillars of an enterprise Landing Zone
Next Module
With the Landing Zone foundation in place, it is time to go deeper into the policy layer. Head to Module 10.2: Cloud Governance & Policy as Code to learn how AWS SCPs, Azure Policies, and GCP Organization Policies map to Kubernetes policy engines like Kyverno and OPA Gatekeeper, and how to build a unified governance model across cloud and cluster.