Multi-Cluster & Platform

On-premises organizations rarely run a single Kubernetes cluster. As teams grow and workloads diversify, the need for multiple clusters emerges: dev/staging/prod separation, regional deployments, tenant isolation, or simply blast-radius reduction. But unlike the cloud, where a new cluster is a few API calls and a few minutes away, on-premises multi-cluster means managing physical servers, control plane placement, and lifecycle automation with limited hardware.

This section covers the infrastructure platforms that sit beneath Kubernetes (vSphere, OpenStack, Harvester), the control plane strategies that let you run many clusters on few servers (vCluster, Kamaji), and the declarative lifecycle tools that treat clusters as cattle (Cluster API on bare metal). We will also explore fleet management across many clusters, Kubernetes-as-a-Service control planes like Gardener, and high availability across geographically separate sites.

By the end of this section, you will understand how to design, deploy, and operate a multi-cluster architecture on bare metal or private cloud infrastructure, moving away from fragile “pet” clusters to a robust, automated platform engineering approach.

| Module | Description | Time |
|---|---|---|
| Module 5.1: Private Cloud Platforms | VMware vSphere + Tanzu, OpenStack + Magnum, Harvester | 45 min |
| Module 5.2: Multi-Cluster Control Planes | vCluster, Kamaji, shared vs dedicated control planes | 50 min |
| Module 5.3: Cluster API on Bare Metal | CAPM3, CAPV, declarative lifecycle, GitOps-driven clusters | 50 min |
| Fleet Management | Managing multiple clusters at scale, policy distribution, and centralized observability | 45 min |
| Active-Active Multi-Site | Disaster recovery, cross-cluster networking, global load balancing, and state replication | 60 min |
| Module 5.6: Gardener | Open-source Kubernetes-as-a-Service; Gardens/Seeds/Shoots architecture; cluster lifecycle at scale; comparison vs Cluster API and Crossplane | 60 min |
| Module 5.7: Multi-Cluster On-Prem | kube-vip virtual IPs (L2/BGP), Karmada federation policy, Liqo transparent offloading; layered architecture for the on-prem multi-cluster stack | 60-70 min |
| Module 5.8: OpenStack on Kubernetes | Architectural inversion: OpenStack control plane as K8s workloads (OpenStack-Helm, Loci, Atmosphere); Ceph+Rook storage; OVN-Kubernetes+Neutron convergence; Magnum as K8s-on-OpenStack; CERN/Walmart/AT&T production realities | 60-70 min |
| Module 5.9: VMware Tanzu | Enterprise Kubernetes portfolio map (TKG, vSphere with Tanzu, TMC, TAP); Supervisor + workload cluster architecture; Cluster API foundations; Broadcom acquisition licensing reality; when Tanzu wins vs alternatives (Rancher, Gardener, OpenShift, vanilla CAPI) | 55-65 min |
| Module 5.10: Edge Fleet Patterns | Edge-scale fleet GitOps for hundreds-to-thousands of store, branch, and IoT clusters; Fleet, ApplicationSet, Flux, CAPI bootstrap, bandwidth-aware sync, per-site overrides, and ring-based rollout isolation | 60-75 min |
| Module 5.11: Disconnected & Air-gapped K8s Ops | Air-gapped, intermittent, and low-bandwidth edge operations; image and Helm mirroring, offline GitOps, data sync, OS updates, local PKI, telemetry buffering, and restricted k3s pulls from Harbor | 55 min |