Day-2 Operations

Day-2 operations on bare-metal Kubernetes clusters are fundamentally different from those on managed cloud services. There is no “upgrade cluster” button, no automatic replacement of failed nodes, no built-in observability stack, and no default registry or pipeline. You own the hardware, the operating system, the control plane, and every component in between.

These modules cover the operational practices that keep on-premises clusters healthy, current, and scalable over multi-year hardware lifecycles. You will learn how to handle node failures, execute zero-downtime upgrades, and manage hardware firmware. You will also build out the essential developer platform services (self-hosted CI/CD, image registries, and serverless capabilities) that make bare metal feel like a fully integrated cloud environment.

| Module | Description | Time |
|--------|-------------|------|
| Module 7.1: Kubernetes Upgrades on Bare Metal | kubeadm upgrade path, drain strategies, rollback, version skew | 60 min |
| Module 7.2: Hardware Lifecycle & Firmware | BIOS/firmware updates, disk replacement, SMART monitoring, Redfish API | 60 min |
| Module 7.3: Node Failure & Auto-Remediation | Machine Health Checks, node problem detector, automated reboot/reprovision | 60 min |
| Module 7.4: Observability Without Cloud Services | Self-hosted Prometheus + Thanos, Grafana, Loki, IPMI exporter | 60 min |
| Module 7.5: Capacity Expansion & Hardware Refresh | Adding racks, mixed CPU generations, topology spread, refresh cycles | 60 min |
| Self-Hosted CI/CD | Building robust pipeline infrastructure, runners, and GitOps on bare metal | 60 min |
| Self-Hosted Container Registry | Deploying Harbor or similar registries, replication, caching, and vulnerability scanning | 60 min |
| Observability at Scale | Long-term metric retention, high-availability logging, and distributed tracing architectures | 60 min |
| Serverless on Bare Metal | Implementing Knative, OpenFaaS, and event-driven architectures without cloud primitives | 60 min |