Module 5.4: Worker Node Failures
Complexity: MEDIUM - Critical for cluster operations
Time to Complete: 45-55 minutes
Prerequisites: Module 5.1 (Methodology), Module 1.1 (Cluster Architecture)
Why This Module Matters
In 2018, a major online retailer experienced a catastrophic global outage during their peak holiday sales event. The root cause was not a complex network intrusion or a database corruption, but a simple memory leak in a third-party logging daemon deployed across their worker nodes. As the daemon consumed RAM, individual worker nodes sequentially exhausted their memory capacity. Each node hit MemoryPressure, stopped accepting new pods, and began aggressively evicting existing workloads.
Because the underlying issue was not immediately diagnosed, the Kubernetes scheduler desperately scrambled to place the newly evicted pods onto the remaining healthy nodes. This cascading failure created a massive “thundering herd” effect. The surviving worker nodes were instantaneously overwhelmed by the flood of rescheduled pods, causing them to run out of memory as well. Within minutes, the entire e-commerce platform collapsed, resulting in six hours of downtime and an estimated $15 million in lost revenue. This incident underscores a brutal truth in distributed systems: a localized node failure, if left unchecked, can quickly metastasize into a global cluster outage.
Worker nodes are the fundamental workhorses of your Kubernetes cluster. They are where your applications actually execute. When a node fails, the applications running on it suffer immediately. Understanding how to definitively diagnose and fix worker node issues—whether it is a crashed kubelet agent, an unresponsive container runtime, or critical resource exhaustion—is essential for maintaining cluster health. This module prepares you to jump into a failing node, interpret the low-level system signals, and confidently restore service before the cascading effects take hold of your infrastructure.
The Factory Floor Analogy
If the control plane is management, worker nodes are the factory floor. The kubelet is the floor supervisor - if they’re out, nothing gets done. The container runtime is the machinery - if it breaks, production stops. Node resources (CPU, memory, disk) are the raw materials - run out, and the factory grinds to a halt.
What You’ll Learn
- Diagnose the root cause of NotReady and Unknown node states using systematic debugging techniques and system logs.
- Evaluate node resource pressure conditions and implement immediate remediation strategies to prevent cascading failures across the cluster.
- Debug kubelet and container runtime integration failures by analyzing systemd service states, journalctl logs, and CRI socket configurations.
- Implement safe node recovery and maintenance procedures, including cordoning, draining, and component restarts while respecting workload disruption budgets.
What You’ll Be Able to Do
- Diagnose worker node NotReady status by checking kubelet, container runtime, and network
- Fix kubelet failures caused by configuration errors, certificate expiry, and resource pressure
- Recover a node from disk pressure, memory pressure, and PID pressure conditions
- Drain and cordon nodes safely during maintenance while respecting PodDisruptionBudgets
Did You Know?
- 10-second heartbeats: The kubelet reports its node status to the API server every 10 seconds. If 40 seconds pass without a heartbeat, the node is marked Unknown.
- 5-minute eviction threshold: By default, pods running on a NotReady node are tolerated for 300 seconds (5 minutes) before the control plane initiates eviction.
- 15 percent disk threshold: By default, the kubelet triggers DiskPressure and begins garbage collecting unused container images when the image filesystem drops below 15% available space.
- 32768 PID limit: On many default Linux distributions, pid_max is set to 32768, which can easily be exhausted by rogue microservices, causing PIDPressure.
Part 1: Node Status Overview
Before diving into the command line, it is critical to understand how Kubernetes thinks about node health. The Kubernetes control plane does not actively poll the worker nodes; instead, it relies on a push-based mechanism. The kubelet agent running on each worker node is responsible for periodically evaluating the node's health and pushing a status update (a heartbeat) back to the API server. The Node Controller, running inside the kube-controller-manager on the control plane, monitors these heartbeats. If the heartbeats stop, or if the kubelet explicitly reports a problem, the Node Controller changes the node's status to reflect the failure.
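The staleness test the Node Controller applies is easy to reason about in a quick sketch. The timestamp below is fabricated for illustration; in a real cluster it comes from `.status.conditions[].lastHeartbeatTime` on the Node object, and 40 seconds matches the default grace period:

```shell
# Sketch: is a heartbeat stale? The node controller marks a node Unknown when
# its last heartbeat is older than the grace period (40s by default).
last_heartbeat="2018-11-20T10:00:00Z"   # fabricated example value
grace_seconds=40

now=$(date -u +%s)
beat=$(date -u -d "$last_heartbeat" +%s)   # GNU date
age=$((now - beat))

if [ "$age" -gt "$grace_seconds" ]; then
  echo "stale: node would be marked Unknown (heartbeat ${age}s old)"
else
  echo "fresh: heartbeat ${age}s old"
fi
```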
The node’s status is expressed through a set of Node Conditions. These conditions are boolean flags that describe specific aspects of the node’s health.
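The healthy/unhealthy mapping is mechanical: Ready should be True, and every pressure/network condition should be False. As a rough illustration (not a real kubectl feature), that logic can be expressed as a small awk filter over fabricated name/status pairs:

```shell
# Sketch: flag unhealthy conditions. The sample data below is fabricated;
# in practice these pairs come from the node's .status.conditions list.
conditions='Ready True
MemoryPressure True
DiskPressure False
PIDPressure False
NetworkUnavailable False'

echo "$conditions" | awk '
  $1 == "Ready" && $2 != "True" { print $1, "is unhealthy" }
  $1 != "Ready" && $2 == "True" { print $1, "is unhealthy" }
'
```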
```
┌──────────────────────────────────────────────────────────────┐
│                       NODE CONDITIONS                        │
│                                                              │
│  Condition           Healthy   Meaning                       │
│  ──────────────────────────────────────────────────────────  │
│  Ready               True      Node is healthy, can run pods │
│  MemoryPressure      False     Memory is sufficient          │
│  DiskPressure        False     Disk space is sufficient      │
│  PIDPressure         False     Process IDs are available     │
│  NetworkUnavailable  False     Network is configured         │
│                                                              │
│  Any unhealthy condition → scheduling problems               │
│  Ready=False or Unknown → node is NotReady                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

Architecturally, we can visualize the relationship between these conditions and the overall node readiness as follows:
```mermaid
graph TD
  A[Node Conditions] --> B{Ready?}
  B -->|True| C[Healthy, can run pods]
  B -->|False/Unknown| D[NotReady, scheduling problems]
  A --> E[Resource Pressures]
  E --> F[MemoryPressure]
  E --> G[DiskPressure]
  E --> H[PIDPressure]
  E --> I[NetworkUnavailable]
  F -.->|True| D
  G -.->|True| D
  H -.->|True| D
  I -.->|True| D
```

To diagnose a node, you must first interrogate the API server to see what it believes the node's state is. You can start with a broad overview and then drill down into the specific conditions of a problematic node.
```bash
# Quick status
k get nodes

# Detailed conditions
k describe node <node-name> | grep -A 10 Conditions

# All nodes with conditions
k get nodes -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,REASON:.status.conditions[?(@.type=="Ready")].reason'

# Check for resource pressure
k describe node <node-name> | grep -E "MemoryPressure|DiskPressure|PIDPressure"
```

When you query the node status, you will typically see one of the following states. Understanding the distinction between NotReady and Unknown is particularly important for troubleshooting.
| Status | Meaning | Common Causes |
|---|---|---|
| Ready | Healthy and accepting pods | Normal operation |
| NotReady | Unhealthy | kubelet down, network issues |
| Unknown | No heartbeat received | Node unreachable, kubelet crashed |
| SchedulingDisabled | Cordoned | Manual cordon or maintenance |
Stop and think: If a node transitions to the Unknown state, does that mean the applications running on it have crashed? Think about the separation of concerns between the control plane and the runtime before moving on.
Part 2: kubelet Troubleshooting
The kubelet is the most critical Kubernetes component running on a worker node. It is the primary node agent, the direct representative of the control plane on the factory floor. If the kubelet is not functioning, the node is effectively severed from the cluster, regardless of whether the physical server is perfectly healthy.
```
┌──────────────────────────────────────────────────────────────┐
│                  KUBELET RESPONSIBILITIES                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │ kubelet                                              │    │
│  │                                                      │    │
│  │ • Registers node with API server                     │    │
│  │ • Watches for pod assignments                        │    │
│  │ • Manages container lifecycle (via runtime)          │    │
│  │ • Reports node/pod status                            │    │
│  │ • Handles probes (liveness, readiness)               │    │
│  │ • Mounts volumes                                     │    │
│  │ • Runs static pods                                   │    │
│  │                                                      │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  If kubelet fails → Node goes NotReady → Pods stop working   │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

Here is a structural view of the kubelet's responsibilities and the consequences of its failure:
```mermaid
flowchart TD
  K[kubelet] --> R[Registers node with API server]
  K --> W[Watches for pod assignments]
  K --> M[Manages container lifecycle via runtime]
  K --> S[Reports node/pod status]
  K --> H[Handles probes: liveness, readiness]
  K --> V[Mounts volumes]
  K --> P[Runs static pods]
  K -.->|Fails| N[Node goes NotReady]
  N -.->|Result| X[Pods stop working or face eviction]
```

When a node is NotReady, your very first step should be to bypass Kubernetes entirely, SSH directly into the affected node, and check the health of the kubelet service using the Linux system manager, systemd.
```bash
# SSH to the node first
ssh <node-name>

# Check kubelet service status
sudo systemctl status kubelet

# Check if kubelet is running
ps aux | grep kubelet

# Check kubelet logs
sudo journalctl -u kubelet -f

# Check recent kubelet errors
sudo journalctl -u kubelet --since "10 minutes ago" | grep -i error
```

Based on the output of the commands above, you can categorize the failure into one of several common buckets.
| Issue | Symptom | Diagnosis | Fix |
|---|---|---|---|
| kubelet stopped | Node NotReady | systemctl status kubelet | systemctl start kubelet |
| kubelet crash loop | Node flapping | journalctl -u kubelet | Fix config, check logs |
| Wrong config | Can’t start | Error in logs | Fix /var/lib/kubelet/config.yaml |
| Can’t reach API | NotReady | Network timeout in logs | Check network, firewall |
| Certificate issues | TLS errors | Cert errors in logs | Renew certs |
| Container runtime down | Can’t create pods | Runtime errors | Fix containerd/docker |
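Since the service-state rows in the table reduce to a systemd query, a quick triage loop over both node agents can save a step. This is a sketch; `2>/dev/null` only keeps it quiet on hosts without systemctl:

```shell
# Sketch: one-shot triage of the two critical node services.
# `systemctl is-active` exits 0 only when the unit is running.
for svc in kubelet containerd; do
  if systemctl is-active --quiet "$svc" 2>/dev/null; then
    echo "$svc: active"
  else
    echo "$svc: NOT active - try: journalctl -u $svc -n 50"
  fi
done
```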
If the kubelet is simply stopped (perhaps due to an accidental administrative command or an abrupt system restart where the service wasn’t enabled), starting it is straightforward:
```bash
# Start kubelet
sudo systemctl start kubelet

# Enable on boot
sudo systemctl enable kubelet

# Check status
sudo systemctl status kubelet
```

More often, the kubelet is in a crash loop due to a configuration error. The kubelet's configuration is typically split between a YAML file and a set of systemd drop-in arguments. A typo in either will prevent the daemon from starting.
```bash
# Check kubelet config file
cat /var/lib/kubelet/config.yaml

# Check kubelet flags
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

# After fixing config, reload and restart
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

Another pernicious issue is certificate expiration. The kubelet authenticates to the API server using client certificates. If these expire (usually after one year in a kubeadm-provisioned cluster), the API server will reject the kubelet's heartbeats, and the node will drop offline silently.
```bash
# Check certificate paths
cat /var/lib/kubelet/config.yaml | grep -i cert

# Verify certificates exist
ls -la /var/lib/kubelet/pki/
```
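To confirm that expiry is actually the problem, ask openssl for the certificate's end date. The path below is the kubeadm default for the rotated client certificate; the guard and fallback message are there only so the sketch degrades gracefully if the file lives elsewhere:

```shell
# Sketch: print the kubelet client certificate's expiry date.
# /var/lib/kubelet/pki/kubelet-client-current.pem is the kubeadm default path.
cert="${KUBELET_CERT:-/var/lib/kubelet/pki/kubelet-client-current.pem}"
if [ -f "$cert" ]; then
  sudo openssl x509 -in "$cert" -noout -enddate   # prints a notAfter= line
else
  echo "cert not found at $cert"
fi
```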
```bash
# For expired certs, you may need to rejoin the node
# On control plane: kubeadm token create --print-join-command
# On worker: kubeadm reset && kubeadm join ...
```

Part 3: Container Runtime Troubleshooting
The kubelet does not actually run containers itself; it delegates that responsibility to a Container Runtime via the Container Runtime Interface (CRI). If the container runtime crashes, hangs, or corrupts its local storage, the kubelet will be unable to spin up new pods or retrieve the status of existing ones.
```
┌──────────────────────────────────────────────────────────────┐
│                  CONTAINER RUNTIME STACK                     │
│                                                              │
│  kubelet                                                     │
│    │                                                         │
│    │ CRI (Container Runtime Interface)                       │
│    ▼                                                         │
│  containerd (or docker, cri-o)                               │
│    │                                                         │
│    │ OCI (Open Container Initiative)                         │
│    ▼                                                         │
│  runc (low-level runtime)                                    │
│    │                                                         │
│    ▼                                                         │
│  Linux kernel (cgroups, namespaces)                          │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

The flow of instructions down the stack looks like this:
```mermaid
flowchart TD
  K[kubelet] -->|CRI - gRPC via unix socket| C[containerd / cri-o]
  C -->|OCI - JSON spec| R[runc / crun - low-level runtime]
  R -->|System Calls| L[Linux kernel: cgroups, namespaces]
```

To troubleshoot the runtime, we use crictl, a CLI tool specifically designed for CRI-compatible runtimes. It is invaluable because it allows you to inspect the state of containers directly on the node without needing the Kubernetes API server to be reachable.
```bash
# Check containerd (most common)
sudo systemctl status containerd
sudo crictl info

# Check container runtime socket
ls -la /run/containerd/containerd.sock

# List containers with crictl
sudo crictl ps

# List images
sudo crictl images
```

Runtime issues often manifest as pods stuck in the ContainerCreating state, or as cryptic CRI integration errors inside the kubelet logs.
| Issue | Symptom | Diagnosis | Fix |
|---|---|---|---|
| containerd stopped | Pods ContainerCreating | systemctl status containerd | systemctl start containerd |
| Socket missing | kubelet errors | Check socket path | Restart containerd |
| Disk full | Container create fails | df -h | Clean up disk |
| Image pull fails | ImagePullBackOff | Check registry access | Fix registry auth |
| Resource exhausted | Random container failures | Check cgroups | Increase resources |
If containerd has crashed, restarting it via systemd is the immediate remediation:
```bash
# Start containerd
sudo systemctl start containerd

# Check status
sudo systemctl status containerd

# Check logs for issues
sudo journalctl -u containerd --since "10 minutes ago"
```

If you need to dig deeper into why a specific container is failing to start, configuring and using crictl is your best path forward. Ensure crictl knows where your CRI socket is located by writing a quick config file.
```bash
# Configure crictl for containerd
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
```
```bash
# List all containers (including stopped)
sudo crictl ps -a

# Get container logs
sudo crictl logs <container-id>

# Inspect container
sudo crictl inspect <container-id>
```

Part 4: Node Resource Exhaustion
Worker nodes possess finite physical resources. When a node begins to run out of memory, disk space, or process IDs, the kubelet detects this via cAdvisor (Container Advisor, which is embedded in the kubelet) and asserts a resource pressure condition.
```
┌──────────────────────────────────────────────────────────────┐
│                  RESOURCE PRESSURE TYPES                     │
│                                                              │
│  MEMORY PRESSURE                                             │
│  • Available memory below threshold                          │
│  • Triggers pod eviction                                     │
│  • Check: free -m, cat /proc/meminfo                         │
│                                                              │
│  DISK PRESSURE                                               │
│  • Disk usage above threshold                                │
│  • Triggers image garbage collection                         │
│  • Check: df -h                                              │
│                                                              │
│  PID PRESSURE                                                │
│  • Process IDs exhausted                                     │
│  • Can't fork new processes                                  │
│  • Check: cat /proc/sys/kernel/pid_max                       │
│                                                              │
│  When any pressure is True, node may not accept new pods     │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

```mermaid
mindmap
  root((Resource Pressure))
    Memory
      Available memory below threshold
      Triggers pod eviction
      Check: free -m
    Disk
      Usage above threshold
      Triggers image GC
      Check: df -h
    PID
      Process IDs exhausted
      Cannot fork processes
      Check: pid_max
```

When diagnosing resource exhaustion, you must check both the Kubernetes API's view of the node and the raw operating system metrics.
```bash
# Check node conditions
k describe node <node> | grep -A 10 Conditions

# On the node - check memory
free -m
cat /proc/meminfo | grep -E "MemTotal|MemFree|MemAvailable"

# Check disk
df -h
du -sh /var/lib/containerd/*  # Container storage
du -sh /var/log/*             # Log storage

# Check PIDs
cat /proc/sys/kernel/pid_max
ps aux | wc -l
```

The kubelet determines when a node is under pressure based on configured eviction thresholds. These are defined in the kubelet's configuration YAML.
```yaml
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
```

When these thresholds are crossed, the kubelet acts defensively:
- The node condition is set to True (e.g., MemoryPressure=True).
- The scheduler stops assigning new pods to the node.
- The kubelet begins evicting existing pods to reclaim resources, starting with BestEffort pods.
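These threshold comparisons are simple enough to replicate by hand. Here is a rough sketch of the nodefs.available check (10% by default) using GNU `df`; the real kubelet computes this from cAdvisor filesystem stats rather than shelling out:

```shell
# Sketch: would the default nodefs.available threshold (10%) trip on /?
threshold=10
used_pct=$(df --output=pcent / | tail -1 | tr -dc '0-9')   # GNU df
avail_pct=$((100 - used_pct))

if [ "$avail_pct" -lt "$threshold" ]; then
  echo "DiskPressure territory: only ${avail_pct}% available on nodefs"
else
  echo "nodefs OK: ${avail_pct}% available"
fi
```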
Pause and predict: If a runaway pod is consuming all the memory on a node, and the kubelet decides to evict it, what happens to the pod's data if it was using an emptyDir volume? Think about the ephemeral nature of local storage before proceeding.
To fix memory pressure, you need to identify the culprit and intervene:
```bash
# Find memory-hungry processes
ps aux --sort=-%mem | head -20

# Find pods using most memory
k top pods -A --sort-by=memory

# Options:
# 1. Kill unnecessary processes
# 2. Evict low-priority pods
# 3. Add more memory to the node
```
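To see how close the node is to the default memory.available hard threshold (100Mi), you can read /proc/meminfo directly. This is a sketch of the comparison the kubelet makes, not its actual implementation:

```shell
# Sketch: compare MemAvailable against the default 100Mi eviction threshold.
threshold_kib=$((100 * 1024))
avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

if [ "$avail_kib" -lt "$threshold_kib" ]; then
  echo "MemoryPressure territory: ${avail_kib} KiB available"
else
  echo "memory OK: ${avail_kib} KiB available"
fi
```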
```bash
# Find large files
sudo find / -type f -size +100M -exec ls -lh {} \;

# Clean up container images
sudo crictl rmi --prune

# Clean up old logs
sudo journalctl --vacuum-time=3d

# Clean up unused containers
sudo crictl rm $(sudo crictl ps -a -q --state exited)
```

PID pressure is an insidious problem. The Linux kernel limits the maximum number of Process IDs that can exist simultaneously. If a container forks processes rapidly without cleaning them up (a fork bomb), the node will hit its PID limit, preventing any new processes (including basic shell commands) from running.
```bash
# Check current PID limit
cat /proc/sys/kernel/pid_max

# Increase limit temporarily
echo 65536 | sudo tee /proc/sys/kernel/pid_max

# Count processes per user to find the culprit
ps aux | awk '{print $1}' | sort | uniq -c | sort -rn | head
```

Part 5: Node Network Issues
Even if the kubelet is healthy and resources are abundant, a node must have robust network connectivity to function within the cluster. It must be able to reach the API server to send heartbeats, reach other nodes for overlay networking, and reach container registries to pull images.
```
┌──────────────────────────────────────────────────────────────┐
│                 NODE NETWORK REQUIREMENTS                    │
│                                                              │
│  Node needs connectivity to:                                 │
│  ┌─────────────────────────────────────────────────────┐     │
│  │ API Server (port 6443)  - Required                  │     │
│  │ Other nodes (varies)    - For pod networking        │     │
│  │ DNS servers (port 53)   - For name resolution       │     │
│  │ Registry (port 443)     - For pulling images        │     │
│  └─────────────────────────────────────────────────────┘     │
│                                                              │
│  Network failures → Node NotReady or Unknown                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

```mermaid
flowchart LR
  Node -->|port 6443| API[API Server]
  Node -->|varies based on CNI| Nodes[Other Nodes]
  Node -->|port 53| DNS[DNS Servers]
  Node -->|port 443| Reg[Container Registry]
  style API stroke:#f66,stroke-width:2px
```

When diagnosing a network partition, use standard Linux networking tools directly from the affected worker node to trace the connection failure.
```bash
# Check basic connectivity
ping <api-server-ip>

# Check API server reachability
curl -k https://<api-server>:6443/healthz

# Check DNS
nslookup kubernetes.default.svc.cluster.local
cat /etc/resolv.conf

# Check firewall
sudo iptables -L -n
sudo firewall-cmd --list-all  # If using firewalld

# Check network interfaces
ip addr
ip route
```

Common network issues range from aggressive firewall rules dropping packets to asymmetric routing configurations causing silent timeouts.
| Issue | Symptom | Diagnosis | Fix |
|---|---|---|---|
| Firewall blocking | Can’t reach API | telnet api-server 6443 | Open firewall ports |
| DNS failure | Name resolution fails | nslookup | Fix /etc/resolv.conf |
| IP address change | Node NotReady | Check IP in node spec | Reconfigure or rejoin |
| CNI plugin issues | Pod networking fails | Check CNI pods | Restart CNI, fix config |
| MTU mismatch | Intermittent failures | Check MTU settings | Align MTU values |
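The firewall checks in the table boil down to "can I open a TCP connection to the port". This sketch uses bash's built-in `/dev/tcp` so no extra tooling is needed; `API_SERVER` defaults to `127.0.0.1` purely so the sketch runs anywhere, so point it at your real API server address:

```shell
# Sketch: probe the API server and kubelet ports with bash's /dev/tcp.
api_server="${API_SERVER:-127.0.0.1}"   # replace with your API server address
for port in 6443 10250; do
  if timeout 2 bash -c "echo > /dev/tcp/${api_server}/${port}" 2>/dev/null; then
    echo "port ${port}: reachable"
  else
    echo "port ${port}: blocked or closed"
  fi
done
```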
Familiarize yourself with the default ports required for Kubernetes components to communicate securely.
| Port | Protocol | Component | Purpose |
|---|---|---|---|
| 6443 | TCP | API Server | Kubernetes API |
| 10250 | TCP | kubelet | kubelet API |
| 10259 | TCP | kube-scheduler | Scheduler metrics |
| 10257 | TCP | kube-controller-manager | Controller metrics |
| 2379-2380 | TCP | etcd | Client and peer |
| 30000-32767 | TCP | NodePort | Service NodePorts |
Part 6: Node Recovery Procedures
When you have exhausted your troubleshooting options and need to perform deep maintenance on a node, you must follow safe recovery procedures. A chaotic recovery approach causes more downtime than the initial failure.
```
┌──────────────────────────────────────────────────────────────┐
│                NODE RECOVERY DECISION TREE                   │
│                                                              │
│  Node NotReady?                                              │
│    │                                                         │
│    ├── Can SSH to node?                                      │
│    │     │                                                   │
│    │     ├── YES → Check kubelet, runtime, network           │
│    │     │                                                   │
│    │     └── NO  → Check physical/VM, cloud console          │
│    │                                                         │
│    ├── kubelet running?                                      │
│    │     │                                                   │
│    │     ├── YES → Check logs, certs, API connectivity       │
│    │     │                                                   │
│    │     └── NO  → Start kubelet                             │
│    │                                                         │
│    └── Still NotReady after fixes?                           │
│          │                                                   │
│          └── Drain and rejoin node                           │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

```mermaid
flowchart TD
  A{Node NotReady?} -->|Yes| B{Can SSH to node?}
  B -->|YES| C{kubelet running?}
  B -->|NO| D[Check physical/VM, cloud console]
  C -->|YES| E[Check logs, certs, API connectivity]
  C -->|NO| F[Start kubelet]
  E --> G{Still NotReady after fixes?}
  F --> G
  G -->|Yes| H[Drain and rejoin node]
```

Before rebooting a node or ripping out its configuration, you must safely remove the workloads it is hosting. The cordon command marks the node as unschedulable, and the drain command safely evicts all running pods (respecting PodDisruptionBudgets and graceful termination periods).
```bash
# Drain node (evicts pods safely)
k drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Cordon only (prevent new pods)
k cordon <node-name>

# Uncordon (allow scheduling again)
k uncordon <node-name>
```

If the node's local configuration is entirely corrupted (e.g., broken certificates or network configuration), the fastest path to recovery is often to wipe the node's Kubernetes state and rejoin it to the cluster from scratch.
```bash
# On the worker node
sudo kubeadm reset

# On control plane - generate new join token
kubeadm token create --print-join-command

# On worker - rejoin
sudo kubeadm join <api-server>:6443 --token <token> --discovery-token-ca-cert-hash <hash>
```

If a node has suffered catastrophic hardware failure and will never return, you must clean it out of the API server to prevent the cluster from waiting for it forever.
```bash
# Drain first
k drain <node> --ignore-daemonsets --delete-emptydir-data

# Delete node from cluster
k delete node <node-name>

# On the node itself
sudo kubeadm reset
```

Common Mistakes
When troubleshooting worker nodes, panic often leads to rushed commands that exacerbate the problem. Avoid these common pitfalls:
| Mistake | Why | Fix |
|---|---|---|
| Not checking kubelet first | Miss obvious issue | Always start with systemctl status kubelet |
| Ignoring node conditions | Miss resource pressure | Check all conditions, not just Ready |
| Deleting node before drain | Pod disruption | Always drain before delete |
| Forgetting DaemonSet pods | Drain fails | Use --ignore-daemonsets |
| Not checking runtime | Blame kubelet | Check containerd status too |
| Ignoring disk usage | Node degradation | Monitor disk, clean regularly |
| Restarting without reload | Changes to systemd drop-ins do not take effect | Always run systemctl daemon-reload before restart |
| Skipping CNI checks | Assume node is broken when only pod networking is down | Verify CNI binary paths and configurations |
Q1: Node Heartbeat
You receive an alert that a production worker node has transitioned to the Unknown state. The node was perfectly healthy a minute ago. Based on the Kubernetes node controller architecture, how many consecutive seconds of missed heartbeats does this state represent, and what happens next?
Answer
It represents 40 seconds of missed heartbeats. The kubelet sends heartbeats to the API server every 10 seconds. After 4 missed heartbeats (40s), the node-controller marks the node as Unknown. After 5 minutes (by default) in this state, pods on that node are forcefully scheduled for eviction to ensure application availability.
Q2: kubelet vs Static Pods
You discover a critical control plane component is failing on a master node, but when you run crictl ps, you see the container still running. You then check systemctl status kubelet and see it has crashed. What is the key difference between how the kubelet and control plane components run that explains this?
Answer
The kubelet runs as a systemd service directly on the host OS, while control plane components (API server, scheduler, etc.) run as static pods managed by the kubelet. If the kubelet crashes, the container runtime (containerd) keeps the existing static pods running independently. However, the kubelet is no longer around to report their status or apply updates, meaning Kubernetes loses management visibility over them.
Q3: MemoryPressure
A node in your cluster is marked with MemoryPressure=True. A developer complains that their newly deployed pod is stuck in the Pending state. How does the node's condition directly affect the scheduler's behavior regarding new workloads?
Answer
New pods will not be scheduled to this node. The kube-scheduler actively filters out nodes that have resource pressure conditions set to True during its predicate evaluation phase. Additionally, the kubelet on that node will begin actively evicting existing pods to free up memory, starting with BestEffort pods, to prevent the entire node from freezing.
Q4: crictl vs kubectl
During a severe API server outage, you need to inspect the logs of a failing ingress controller pod on a worker node. kubectl commands are timing out globally. What is the most effective approach to retrieve these logs?
Answer
You should use crictl directly on the worker node. Use crictl ps to find the container ID, and crictl logs <container-id> to view the output. Because crictl communicates directly with the container runtime (containerd) over the local Unix socket, it bypasses the Kubernetes API entirely, allowing you to debug even when the control plane is unreachable.
Q5: Drain vs Cordon
You need to perform emergency kernel patching on a worker node. You execute kubectl cordon <node-name>. Five minutes later, you notice that all the original pods are still running on the node, blocking your maintenance window. What operational misunderstanding caused this delay?
Answer
The cordon command only marks the node as unschedulable (preventing new pods from arriving); it does not stop or move existing workloads. To properly clear a node for maintenance, you must use the drain command. Draining will cordon the node AND safely evict all existing pods (except DaemonSets, which you ignore with a flag) while respecting PodDisruptionBudgets.
Q6: Container Runtime Socket
A worker node's kubelet fails to start, throwing errors in journalctl about missing files in /run/containerd/. You suspect the container runtime interface socket is unavailable. Where exactly should you look to verify the socket's existence?
Answer
You should check for the socket at /run/containerd/containerd.sock. This Unix socket is created and listened on by the containerd systemd service. If it is missing, it typically means containerd has crashed or failed to start, which subsequently prevents the kubelet from initializing because it cannot connect via the CRI.
Q7: Certificate Expiration
You notice a node flipping between Ready and NotReady states every few minutes. Upon inspecting the kubelet logs, you see repeated TLS handshake timeouts. The cluster was provisioned exactly one year ago. What is the most probable root cause of this flapping behavior?
Answer
The most probable root cause is an expired kubelet client certificate. By default, kubeadm-provisioned clusters issue kubelet client certificates with a one-year validity period. When the certificate expires, the kubelet can no longer authenticate with the API server to send heartbeats, causing total communication failure until the certificate is rotated or the node is rejoined.
Hands-On Exercise: Node Troubleshooting Simulation
Scenario
You are the on-call engineer. Monitoring has alerted you that a critical worker node is experiencing intermittent instability and resource spikes. You need to log into the environment, systematically diagnose the health of the node, inspect its core services, evaluate its resources, and safely prepare it for maintenance.
Prerequisites
- Access to a Kubernetes cluster
- SSH access to at least one worker node
Task 1: Node Health Assessment
Begin by evaluating the cluster-wide state from the perspective of the control plane. Identify the node you want to investigate.
Solution
```bash
# Check all nodes
k get nodes -o wide

# Get detailed node information
k describe node <node-name>

# Check node conditions specifically
k get node <node-name> -o jsonpath='{.status.conditions[*].type}' | tr ' ' '\n'
```

Task 2: kubelet Investigation
Assume the node is showing signs of distress. SSH directly into the node and interrogate the primary agent.
Solution
```bash
# SSH to a worker node
ssh <node>

# Check kubelet status
sudo systemctl status kubelet

# View recent kubelet logs
sudo journalctl -u kubelet --since "5 minutes ago" | tail -50

# Check kubelet configuration
cat /var/lib/kubelet/config.yaml | head -30
```

Task 3: Container Runtime Check
The kubelet relies entirely on the container runtime. Verify that containerd is healthy and properly managing containers.
Solution
```bash
# Check containerd status
sudo systemctl status containerd

# List running containers
sudo crictl ps

# Check container runtime info
sudo crictl info

# List images on node
sudo crictl images
```

Task 4: Resource Assessment
The node is healthy at the service level, but it might be starving for resources. Check the physical resource consumption.
Solution
```bash
# Check memory
free -m

# Check disk
df -h

# Check what's using resources
k top node <node-name>

# See allocated resources
k describe node <node-name> | grep -A 10 "Allocated resources"
```

Task 5: Cordon and Uncordon (Safe)
You have decided the node needs a reboot to clear a suspected memory leak. Safely cordon the node and verify that the scheduler respects your command.
Solution
```bash
# Cordon a node (prevents new scheduling)
k cordon <node-name>

# Verify it's unschedulable
k get node <node-name>

# Try to schedule a pod
k run test-pod --image=nginx
k get pods test-pod -o wide  # Should NOT be on the cordoned node

# Uncordon
k uncordon <node-name>

# Cleanup
k delete pod test-pod
```

Success Criteria
- Checked node conditions for all nodes using jsonpath.
- Verified kubelet is running and inspected the systemd logs.
- Verified containerd is running and used crictl to list images.
- Assessed node resource usage at both the OS and cluster levels.
- Successfully cordoned a node, tested scheduler avoidance, and uncordoned it.
Cleanup
Ensure the node is uncordoned and the test pod is deleted after completing the exercise.
Practice Drills
Develop muscle memory for node troubleshooting by executing these rapid-fire drills.
Drill 1: Node Status Check (30 sec)
```bash
# Task: List all nodes with their status
k get nodes
```

Drill 2: Node Conditions (1 min)
```bash
# Task: Check all conditions for a specific node
k describe node <node> | grep -A 10 Conditions
```

Drill 3: kubelet Status (30 sec)
```bash
# Task: Check if kubelet is running (on the node)
sudo systemctl status kubelet
```

Drill 4: kubelet Logs (1 min)
```bash
# Task: View the last 20 lines of kubelet logs
sudo journalctl -u kubelet -n 20
```

Drill 5: Container Runtime Status (30 sec)
```bash
# Task: Check containerd and list containers
sudo systemctl status containerd
sudo crictl ps
```

Drill 6: Resource Usage (1 min)
```bash
# Task: Check node resource usage
k top nodes
k describe node <node> | grep -A 5 "Allocated resources"
```

Drill 7: Drain Node (1 min)
```bash
# Task: Safely drain a node
k drain <node> --ignore-daemonsets --delete-emptydir-data
```

Drill 8: Disk Usage (30 sec)
```bash
# Task: Check disk usage on the node
df -h
du -sh /var/lib/containerd/
```

Next Module
Continue to Module 5.5: Network Troubleshooting to learn how to diagnose and fix pod-to-pod, pod-to-service, and external connectivity issues that plague distributed systems.