On-Premises Kubernetes
12 modules are currently being reworked. Watch this section over the next few days.
Run Kubernetes where the cloud can’t go.
Not every workload belongs in the cloud. Data sovereignty, latency requirements, regulatory constraints, and economics drive enterprises to run Kubernetes on their own hardware. This track covers everything from datacenter planning to day-2 operations: the unglamorous knowledge that most free resources skip, even though a large share of production Kubernetes runs on exactly this kind of infrastructure.
Learning Path

    Planning & Economics (5 modules)
            │
            ▼
    Bare Metal Provisioning (4 modules)
            │
            ├── Networking (6 modules)
            ├── Storage (5 modules)
            └── Multi-Cluster & Platform (5 modules)
                    │
                    ▼
            Security & Compliance (8 modules)
                    │
                    ▼
            Day-2 Operations (9 modules)
                    │
                    ▼
            Resilience & Migration (3 modules)
                    │
                    ▼
            AI/ML Infrastructure (6 modules)

Sections
| Section | Modules | Focus |
|---|---|---|
| Planning & Economics | 5 | Server sizing, cluster topology, TCO, cloud vs on-prem, FinOps & chargeback |
| Bare Metal Provisioning | 4 | PXE, MAAS, Talos, Sidero/Metal3 |
| Networking | 6 | Spine-leaf, BGP, MetalLB, DNS/certs, cross-cluster, service mesh |
| Storage | 5 | Ceph/Rook, local storage, object storage (MinIO), database operators |
| Multi-Cluster & Platform | 5 | vSphere/OpenStack, vCluster/Kamaji, Cluster API, fleet management, active-active |
| Security & Compliance | 8 | Air-gapped, HSM/TPM, AD/LDAP, SPIFFE, Vault, policy-as-code, zero-trust |
| Day-2 Operations | 9 | Upgrades, firmware, observability, capacity, self-hosted CI/CD & registry, serverless |
| Resilience & Migration | 3 | Multi-site DR, hybrid connectivity, cloud repatriation |
| AI/ML Infrastructure | 6 | GPU nodes, private training, LLM serving, MLOps, AIOps, HPC storage |
51 modules total (30 existing + 21 new from #197). From “should we go on-prem?” to “how do we train LLMs on our own GPUs.”
Readiness Check
This is an advanced track. You should already be comfortable with:
- core Kubernetes concepts and troubleshooting
- Linux networking, storage, and security basics
- day-2 operational thinking such as upgrades, observability, and failure handling
If you are not there yet, strengthen those areas first through Prerequisites, Linux, Kubernetes Certifications, and optionally Platform Engineering.
Safest Route Into On-Prem

    Prerequisites
          |
    Linux
          |
    CKA-level Kubernetes understanding
          |
    Platform / SRE thinking
          |
    On-Premises

You do not need to finish every module in those tracks first, but you do need the operational maturity they represent.
Red Flags That You Are Entering Too Early
- you still struggle with Linux networking, storage, or service management basics
- you have not yet operated Kubernetes under failure, upgrade, or capacity pressure
- you mainly want a simpler first cluster rather than private-platform design
If any of those apply, stay in Prerequisites, Linux, or Kubernetes Certifications a bit longer.
Common Bridge Routes Into On-Prem
| Coming from | Safest bridge | What to prove before going deeper |
|---|---|---|
| Kubernetes Certifications | CKA -> Linux depth -> On-Prem Planning/Provisioning | your kubeadm, networking, storage, and troubleshooting skills are solid, not fragile |
| Cloud | managed Kubernetes -> Architecture Patterns -> On-Prem Planning | you can reason about tradeoffs without assuming the cloud will always provide the control plane around you |
| Platform Engineering | SRE / GitOps / Networking -> On-Prem Operations | you already think in day-2 systems, not only cluster setup |
| AI/ML Engineering | local-first AI -> AI Infrastructure -> On-Prem AI/ML Infrastructure | you understand the jump from a workstation or notebook workflow to private GPU fleet operations |
Prerequisites
- Fundamentals — Cloud Native 101, K8s Basics
- Linux — networking, storage, security hardening (includes LFCS)
- Certifications — CKA (cluster architecture, kubeadm) is required
- Recommended: CKS for security modules
- Recommended: Platform Engineering for SRE and operations modules
Who This Is For
- Infrastructure engineers building private Kubernetes platforms
- Platform teams evaluating on-prem vs cloud for their organization
- SREs operating bare metal or private cloud Kubernetes clusters
- Architects designing multi-site, air-gapped, or hybrid environments
- Budget owners calculating TCO and making build-vs-buy decisions
Common Next Steps
- go to Platform Engineering if you want deeper reliability, delivery, and platform-discipline context around private infrastructure
- go to AI/ML Engineering if your on-prem goal includes private model training or local-first AI systems
- go back to Cloud if you need a cleaner contrast between managed-cloud assumptions and private-infrastructure tradeoffs before committing to this path