Day-2 Operations
Two modules in this section are currently being reworked. Check back over the next few days.
Day-2 operations on bare metal Kubernetes clusters are fundamentally different from those on managed cloud services. There is no “upgrade cluster” button, no automatic replacement of failed nodes, no built-in observability stack, and no default registries or pipelines. You own the hardware, the operating system, the control plane, and every component in between.
These modules cover the operational practices that keep on-premises clusters healthy, current, and scalable over multi-year hardware lifecycles. You will learn how to handle node failures, execute zero-downtime upgrades, manage hardware firmware, and build out the essential developer platform services—such as self-hosted CI/CD, image registries, and serverless capabilities—that make bare metal feel like a fully integrated cloud environment.
Modules
| Module | Description | Time |
|---|---|---|
| Module 7.1: Kubernetes Upgrades on Bare Metal | kubeadm upgrade path, drain strategies, rollback, version skew | 60 min |
| Module 7.2: Hardware Lifecycle & Firmware | BIOS/firmware updates, disk replacement, SMART monitoring, Redfish API | 60 min |
| Module 7.3: Node Failure & Auto-Remediation | Machine Health Checks, node problem detector, automated reboot/reprovision | 60 min |
| Module 7.4: Observability Without Cloud Services | Self-hosted Prometheus + Thanos, Grafana, Loki, IPMI exporter | 60 min |
| Module 7.5: Capacity Expansion & Hardware Refresh | Adding racks, mixed CPU generations, topology spread, refresh cycles | 60 min |
| Self-Hosted CI/CD | Building robust pipeline infrastructure, runners, and GitOps on bare metal | 60 min |
| Self-Hosted Container Registry | Deploying Harbor or similar registries, replication, caching, and vulnerability scanning | 60 min |
| Observability at Scale | Long-term metric retention, high availability logging, and distributed tracing architectures | 60 min |
| Serverless on Bare Metal | Implementing Knative, OpenFaaS, and event-driven architectures without cloud primitives | 60 min |