Multi-Cluster & Platform

On-premises organizations rarely run a single Kubernetes cluster. As teams grow and workloads diversify, the need for multiple clusters emerges: dev/staging/prod separation, regional deployments, tenant isolation, or simply blast-radius reduction. But unlike the cloud, where creating a new cluster takes minutes and little more than a few API calls, on-premises multi-cluster means managing physical servers, control plane placement, and lifecycle automation with limited hardware.

This section covers the infrastructure platforms that sit beneath Kubernetes (vSphere, OpenStack, Harvester), the control plane strategies that let you run many clusters on few servers (vCluster, Kamaji), and the declarative lifecycle tools that treat clusters as cattle (Cluster API on bare metal). It also covers managing fleets of clusters, Kubernetes-as-a-Service control planes such as Gardener, and high availability across geographically separate sites.

By the end of this section, you will understand how to design, deploy, and operate a multi-cluster architecture on bare metal or private cloud infrastructure, moving away from fragile “pet” clusters to a robust, automated platform engineering approach.

Modules

Module	Description	Time
Module 5.1: Private Cloud Platforms	VMware vSphere + Tanzu, OpenStack + Magnum, Harvester	45 min
Module 5.2: Multi-Cluster Control Planes	vCluster, Kamaji, shared vs dedicated control planes	50 min
Module 5.3: Cluster API on Bare Metal	CAPM3, CAPV, declarative lifecycle, GitOps-driven clusters	50 min
Module 5.4: Fleet Management	Managing multiple clusters at scale, policy distribution, and centralized observability	45 min
Module 5.5: Active-Active Multi-Site	Disaster recovery, cross-cluster networking, global load balancing, and state replication	60 min
Module 5.6: Gardener	Open-source Kubernetes-as-a-Service; Gardens/Seeds/Shoots architecture; cluster lifecycle at scale; comparison vs Cluster API and Crossplane	60 min
Module 5.7: Multi-Cluster On-Prem	kube-vip virtual IPs (L2/BGP), Karmada federation policy, Liqo transparent offloading; layered architecture for the on-prem multi-cluster stack	60-70 min
Module 5.8: OpenStack on Kubernetes	Architectural inversion: OpenStack control plane as K8s workloads (OpenStack-Helm, Loci, Atmosphere); Ceph+Rook storage; OVN-Kubernetes+Neutron convergence; Magnum as K8s-on-OpenStack; CERN/Walmart/AT&T production realities	60-70 min
Module 5.9: VMware Tanzu	Enterprise Kubernetes portfolio map (TKG, vSphere with Tanzu, TMC, TAP); Supervisor + workload cluster architecture; Cluster API foundations; Broadcom acquisition licensing reality; when Tanzu wins vs alternatives (Rancher, Gardener, OpenShift, vanilla CAPI)	55-65 min
Module 5.10: Edge Fleet Patterns	Edge-scale fleet GitOps for hundreds-to-thousands of store, branch, and IoT clusters; Fleet, ApplicationSet, Flux, CAPI bootstrap, bandwidth-aware sync, per-site overrides, and ring-based rollout isolation	60-75 min
Module 5.11: Disconnected & Air-gapped K8s Ops	Air-gapped, intermittent, and low-bandwidth edge operations; image and Helm mirroring, offline GitOps, data sync, OS updates, local PKI, telemetry buffering, and restricted k3s pulls from Harbor	55 min