Module 1.1: Control Plane Deep-Dive
Complexity: [MEDIUM] - Conceptual understanding required
Time to Complete: 35-45 minutes
Prerequisites: Module 0.1 (working cluster)
What You’ll Be Able to Do
After this module, you will be able to:
- Explain the role of each control plane component (API server, etcd, scheduler, controller manager) and how they interact
- Diagnose control plane failures by checking static pod manifests, component logs, and health endpoints
- Trace a kubectl request from client through API server to etcd and back
- Recover a control plane component by restoring its static pod manifest
Why This Module Matters
Every kubectl command you run talks to the control plane. Every pod that schedules, every service that routes traffic, every secret that stores credentials: it all happens because control plane components are working together.
When troubleshooting fails, when pods won’t schedule, when your cluster “just stops working”—you need to understand what’s actually running your cluster. The CKA exam tests this. Real-world incidents demand it.
This module takes you inside the machine.
The Air Traffic Control Analogy
Think of Kubernetes as an airport. The control plane is air traffic control—it doesn’t fly planes, but nothing flies without it. The API server is the control tower (single point of communication). The scheduler decides which runway (node) each plane (pod) uses. The controller manager monitors everything and calls for help when planes deviate from flight plans. etcd is the flight log—every decision recorded, every state tracked. Workers (nodes) are the runways where actual planes land.
What You’ll Learn
By the end of this module, you’ll understand:
- What each control plane component does (and doesn’t do)
- How they communicate with each other
- What happens when each component fails
- How to check component health
- Where component logs live
Part 1: The Control Plane Overview
1.1 Architecture at a Glance

    ┌─────────────────────────────────────────────────────────────────────┐
    │                            CONTROL PLANE                            │
    │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────┐ │
    │  │ API Server  │  │    etcd     │  │    Controller Manager        │ │
    │  │ (kube-api)  │◄─┤  (storage)  │  │  ┌────────────────────────┐  │ │
    │  │             │  │             │  │  │ Deployment Controller  │  │ │
    │  └──────┬──────┘  └─────────────┘  │  │ ReplicaSet Controller  │  │ │
    │         │                          │  │ Node Controller        │  │ │
    │         │  ┌─────────────────────┐ │  │ Job Controller         │  │ │
    │         │  │      Scheduler      │ │  │ ... (40+ controllers)  │  │ │
    │         │  │  (kube-scheduler)   │ │  └────────────────────────┘  │ │
    │         │  └─────────────────────┘ └──────────────────────────────┘ │
    └─────────┼───────────────────────────────────────────────────────────┘
              │
              │ kubelet talks to API server
              ▼
    ┌─────────────────────────────────────────────────────────────────────┐
    │                            WORKER NODES                             │
    │ ┌─────────────────────────────────────────────────────────────────┐ │
    │ │  Node 1                     Node 2                     Node 3   │ │
    │ │  ┌─────────┐ ┌──────────┐   ┌─────────┐ ┌──────────┐            │ │
    │ │  │ kubelet │ │kube-proxy│   │ kubelet │ │kube-proxy│    ...     │ │
    │ │  └─────────┘ └──────────┘   └─────────┘ └──────────┘            │ │
    │ │  ┌──────────────────────┐   ┌──────────────────────┐            │ │
    │ │  │  Container Runtime   │   │  Container Runtime   │            │ │
    │ │  │    (containerd)      │   │    (containerd)      │            │ │
    │ │  └──────────────────────┘   └──────────────────────┘            │ │
    │ └─────────────────────────────────────────────────────────────────┘ │
    └─────────────────────────────────────────────────────────────────────┘

1.2 Control Plane vs. Worker Nodes

| Component | Runs On | Purpose |
|---|---|---|
| kube-apiserver | Control plane | API gateway, all communication |
| etcd | Control plane | Cluster state storage |
| kube-scheduler | Control plane | Pod placement decisions |
| kube-controller-manager | Control plane | Reconciliation loops |
| kubelet | Every node | Container lifecycle |
| kube-proxy | Every node | Network rules |
| Container runtime | Every node | Actually runs containers |
Did You Know?
In production, etcd often runs on dedicated machines separate from other control plane components. A three-node etcd cluster can handle thousands of Kubernetes nodes. Google’s Borg (Kubernetes’ predecessor) inspired this separation.
Part 2: kube-apiserver - The Front Door
2.1 What It Does

The API server is the only component that talks directly to etcd. Everything else talks to the API server.

    ┌─────────────────────────────────────────────────────────────────┐
    │                  All Roads Lead to API Server                   │
    │                                                                 │
    │   kubectl ────────┐                                             │
    │   Scheduler ──────┼────► kube-apiserver ◄───► etcd              │
    │   Controllers ────┤                                             │
    │   kubelet ────────┤                                             │
    │   Dashboard ──────┘                                             │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Key responsibilities:
- Authenticate requests (who are you?)
- Authorize requests (can you do this?)
- Validate requests (is this valid YAML/JSON?)
- Persist to etcd (store the desired state)
- Serve as the cluster’s REST API
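Because the API server is a REST API, every object has a predictable URL path. The sketch below shows how those paths are put together; `api_path` is a hypothetical helper written for this module, not part of kubectl. It assumes namespaced resources only, with the core group served under `/api/<version>` and named groups under `/apis/<group>/<version>`.

```shell
#!/bin/sh
# api_path: hypothetical helper (not part of kubectl) that prints the REST
# path of a namespaced resource.
#   $1 = apiVersion ("v1" for the core group, or "<group>/<version>")
#   $2 = namespace   $3 = plural resource name   $4 = optional object name
api_path() {
  case "$1" in
    */*) prefix="/apis/$1" ;;   # named API group, e.g. apps/v1
    *)   prefix="/api/$1"  ;;   # core group, e.g. v1
  esac
  path="$prefix/namespaces/$2/$3"
  if [ -n "${4:-}" ]; then
    path="$path/$4"
  fi
  printf '%s\n' "$path"
}

api_path v1 default pods nginx
api_path apps/v1 production deployments web-app
```

On a live cluster, a path built this way can be passed to the same `--raw` option used for the health endpoints, for example `kubectl get --raw "$(api_path v1 default pods nginx)"`.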
2.2 API Request Flow
When you run `kubectl create -f pod.yaml`:

    1. kubectl → API Server: "Create this pod please"
    2. API Server: Authentication check ✓
    3. API Server: Authorization check ✓
    4. API Server: Admission controllers run
    5. API Server: Validation check ✓
    6. API Server → etcd: "Store this pod spec"
    7. API Server → kubectl: "Pod created (pending)"

The pod doesn’t exist yet as a running container; it’s just stored in etcd. The scheduler and kubelet take it from there.
Pause and predict: If the API server is the only component that talks to etcd, what happens to the rest of the cluster when the API server goes down? Can existing pods keep running? Think about it before reading on.
2.3 Checking API Server Health
    # Is the API server responding?
    kubectl cluster-info

    # Check API server component status (legacy)
    kubectl get componentstatuses  # Deprecated, may not work on all clusters

    # Modern health endpoints (preferred)
    kubectl get --raw='/readyz?verbose'
    kubectl get --raw='/livez?verbose'

    # Direct health endpoint
    kubectl get --raw='/healthz'

    # Detailed health
    kubectl get --raw='/healthz?verbose'

2.4 API Server Logs

    # If running as a static pod (kubeadm setup)
    kubectl logs -n kube-system kube-apiserver-<control-plane-node>

    # If running as systemd service
    journalctl -u kube-apiserver

    # Static pod manifest location
    cat /etc/kubernetes/manifests/kube-apiserver.yaml

Gotcha: API Server Unavailable

If the API server is down, `kubectl` won’t work at all. You’ll need to SSH into the control plane node and check logs directly with `crictl` or `journalctl`. This is a common CKA troubleshooting scenario.
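When `kubectl` is unusable, the container runtime can still show what the API server logged. The following is a minimal sketch, assuming `crictl` is installed on the control plane node, is configured for the node's runtime, and is run as root. `apiserver_logs` is a hypothetical wrapper name, not an existing tool.

```shell
#!/bin/sh
# apiserver_logs [LINES]: hypothetical helper for the control plane node.
# Finds a kube-apiserver container through the runtime and prints the tail
# of its log, bypassing the (unavailable) Kubernetes API entirely.
apiserver_logs() {
  # -a includes exited containers; -q prints only container IDs.
  id=$(crictl ps -a --name kube-apiserver -q | head -n 1)
  if [ -z "$id" ]; then
    echo "no kube-apiserver container found; try: journalctl -u kubelet" >&2
    return 1
  fi
  crictl logs --tail "${1:-50}" "$id"
}

# Only attempt the call where crictl actually exists.
if command -v crictl >/dev/null 2>&1; then
  apiserver_logs 50 || true
fi
```

If no container is found at all, the kubelet log is usually the next place to look, since kubelet is what starts the static pod.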
Part 3: etcd - The Source of Truth
3.1 What It Does

etcd is a distributed key-value store. It holds all cluster state:
- All resource definitions (pods, services, secrets, etc.)
- Cluster configuration
- Current state of everything
If etcd loses data, your cluster loses its memory.
3.2 How Kubernetes Uses etcd
    Key format: /registry/<resource-type>/<namespace>/<name>

    Examples:
    /registry/pods/default/nginx
    /registry/services/kube-system/kube-dns
    /registry/secrets/default/my-secret
    /registry/deployments/production/web-app

3.3 etcd Architecture

    ┌─────────────────────────────────────────────────────────────────┐
    │                  etcd Cluster (Raft Consensus)                  │
    │                                                                 │
    │    ┌──────────┐      ┌──────────┐      ┌──────────┐             │
    │    │  etcd-1  │◄────►│  etcd-2  │◄────►│  etcd-3  │             │
    │    │ (Leader) │      │(Follower)│      │(Follower)│             │
    │    └──────────┘      └──────────┘      └──────────┘             │
    │                                                                 │
    │    Writes go to leader, replicated to followers                 │
    │    Reads can go to any node                                     │
    │    Survives loss of 1 node (quorum = 2/3)                       │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

3.4 Checking etcd Health

    # etcd member list (if you have etcdctl configured)
    ETCDCTL_API=3 etcdctl member list \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key

    # Check etcd pod
    kubectl get pods -n kube-system | grep etcd
    kubectl logs -n kube-system etcd-<control-plane-node>

Did You Know?
etcd uses the Raft consensus algorithm. It requires a majority (quorum) to operate. A 3-node cluster tolerates 1 failure. A 5-node cluster tolerates 2 failures. This is why production clusters use odd numbers of etcd nodes.
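The majority rule is simple integer arithmetic, and working it through shows why even member counts buy nothing. This is a small sketch; the function names `quorum` and `tolerated` are made up for this example.

```shell
#!/bin/sh
# quorum N: votes needed for a majority in an N-member cluster.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: members that can fail while a majority still remains.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

A 4-member cluster needs 3 votes and so tolerates only 1 failure, the same as a 3-member cluster, while adding one more machine that can fail.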
War Story: The etcd Disk Full Incident
A team ran a cluster for months without monitoring etcd disk usage. etcd keeps a history of all changes (for watch operations). One day, etcd’s disk filled up. The entire cluster became read-only—no new pods, no updates, no deletes. The fix? Emergency disk cleanup and enabling etcd auto-compaction. They lost 4 hours of productivity because they didn’t monitor a 10GB disk.
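A basic disk check would have caught this incident early. The sketch below is one possible approach, not a replacement for real monitoring: it assumes the kubeadm default data directory `/var/lib/etcd`, and the 80% threshold and the function names are arbitrary choices for illustration.

```shell
#!/bin/sh
# disk_used_pct PATH: print the used percentage (integer) of the filesystem
# that holds PATH, read from POSIX "df -P" output.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD: warn when usage is at or above THRESHOLD percent.
check_disk() {
  used=$(disk_used_pct "$1")
  if [ "$used" -ge "$2" ]; then
    echo "WARNING: $1 is at ${used}% (threshold ${2}%)"
    return 1
  fi
  echo "OK: $1 is at ${used}%"
}

# /var/lib/etcd is the kubeadm default data directory (an assumption here);
# fall back to / so the sketch also runs on a machine without etcd.
dir=/var/lib/etcd
[ -d "$dir" ] || dir=/
check_disk "$dir" 80 || true
```

Run from cron or a monitoring agent, the non-zero return code of `check_disk` can drive an alert.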
Part 4: kube-scheduler - The Matchmaker
4.1 What It Does

The scheduler watches for pods with no assigned node and finds the best node for them.

    ┌────────────────────────────────────────────────────────────────┐
    │                       Scheduling Process                       │
    │                                                                │
    │  1. New pod created (no nodeName) ─────────────────────┐       │
    │                                                        │       │
    │  2. Scheduler watches API server                       ▼       │
    │     "Any pods need scheduling?" ◄────────────────── Pod Queue  │
    │                                                                │
    │  3. Filtering: Which nodes CAN run this pod?                   │
    │     - Enough CPU/memory?                                       │
    │     - Taints/tolerations match?                                │
    │     - Node selectors match?                                    │
    │     - Affinity rules satisfied?                                │
    │                                                                │
    │  4. Scoring: Which node is BEST?                               │
    │     - Resource balance                                         │
    │     - Spreading across zones                                   │
    │     - Custom priorities                                        │
    │                                                                │
    │  5. Binding: Assign pod to winning node                        │
    │     Scheduler → API Server: "pod X goes to node Y"             │
    │                                                                │
    └────────────────────────────────────────────────────────────────┘

Stop and think: A pod requests 4 CPU cores and 8Gi of memory, but no single node in your cluster has that much available. What state will the pod be in, and what message will you see in `kubectl describe pod`?
4.2 Filtering vs. Scoring
Filtering (hard constraints): “Can this node run the pod at all?”
- Does it have enough resources?
- Does it match nodeSelector?
- Does it tolerate the node’s taints?
- Does it satisfy affinity requirements?
Scoring (soft constraints): “Which eligible node is best?”
- Balance resource utilization
- Spread pods across failure domains
- Prefer nodes with image already pulled
- Custom scoring plugins
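The two phases can be modelled in a few lines. This is a toy sketch only: the real kube-scheduler uses a plugin framework with many more checks. The input format (node name, free CPU in millicores, free memory in Mi, tainted yes/no), the single scoring rule, and the name `pick_node` are all assumptions made for this example.

```shell
#!/bin/sh
# Toy model of filter-then-score. NOT the real scheduler algorithm.
# Input lines: <node> <free_cpu_millicores> <free_mem_Mi> <tainted: yes|no>
# pick_node CPU MEM: drop nodes that cannot run the pod, then pick the
# remaining node with the most CPU headroom.
pick_node() {
  awk -v cpu="$1" -v mem="$2" '
    $4 == "yes"          { next }   # filter: taint the pod does not tolerate
    $2 < cpu || $3 < mem { next }   # filter: not enough free resources
    {
      score = $2 - cpu              # score: CPU left over after placement
      if (best == "" || score > best_score) { best = $1; best_score = score }
    }
    END { if (best == "") print "Pending: 0 nodes available"; else print best }
  '
}

nodes="cp1 3000 6000 yes
node1 1500 4096 no
node2 2500 2048 no
node3 500 8192 no"

printf '%s\n' "$nodes" | pick_node 1000 1024   # pod requesting 1 CPU, 1Gi
printf '%s\n' "$nodes" | pick_node 4000 8192   # pod requesting 4 CPU, 8Gi
```

The first pod lands on `node2` (most headroom among the nodes that pass filtering); the second passes no filter and stays Pending, which is the situation in the "Stop and think" question.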
4.3 When Scheduling Fails
    # Pod stuck in Pending
    kubectl describe pod <pod-name>

    # Look for Events section:
    # "0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/control-plane: },
    #  2 node(s) didn't match Pod's node affinity/selector"

Common reasons:
- Insufficient resources: No node has enough CPU/memory
- Taints not tolerated: Node has taints, pod lacks tolerations
- Affinity not satisfied: Pod requires specific node labels
- PVC not bound: Pod needs storage that doesn’t exist
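The scheduler packs several reasons into one event message, which is easy to misread under time pressure. The sketch below splits such a message into one reason per line. `scheduling_reasons` is a hypothetical helper, and it assumes the message layout of the example event shown above, which may differ between Kubernetes versions.

```shell
#!/bin/sh
# scheduling_reasons: read a FailedScheduling message on stdin and print one
# reason per line. Assumes the "N/M nodes are available: a, b, c" layout.
scheduling_reasons() {
  awk '{
    sub(/^[^:]*: /, "")             # drop the "0/3 nodes are available: " prefix
    n = split($0, parts, /, /)      # reasons are separated by ", "
    for (i = 1; i <= n; i++) print parts[i]
  }'
}

msg="0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector"
printf '%s\n' "$msg" | scheduling_reasons
```

Each output line accounts for some of the nodes; together the counts should add up to the total in the prefix (here 1 + 2 = 3).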
4.4 Checking Scheduler Health
    # Scheduler pod
    kubectl get pods -n kube-system | grep scheduler
    kubectl logs -n kube-system kube-scheduler-<control-plane-node>

    # Scheduler leader election (in HA setups); current releases record the leader in a Lease
    kubectl get lease kube-scheduler -n kube-system -o yaml

Part 5: kube-controller-manager - The Reconciler
5.1 What It Does

The controller manager runs controllers: reconciliation loops that watch the current state and work toward the desired state.

    ┌────────────────────────────────────────────────────────────────┐
    │                    Controller Loop Pattern                     │
    │                                                                │
    │                      ┌─────────────────┐                       │
    │                      │  Desired State  │                       │
    │                      │    (in etcd)    │                       │
    │                      └────────┬────────┘                       │
    │                               │                                │
    │                       Compare │                                │
    │                               ▼                                │
    │               Is current state = desired state?                │
    │                               │                                │
    │                ┌──────────────┴──────────────┐                 │
    │                │ YES                      NO │                 │
    │                ▼                             ▼                 │
    │           Do nothing                    Take action            │
    │         (wait & watch)            (create/delete/update)       │
    │                                                                │
    └────────────────────────────────────────────────────────────────┘

5.2 Built-in Controllers

There are 40+ controllers. Key ones:
| Controller | Watches | Does |
|---|---|---|
| Deployment | Deployments | Creates/updates ReplicaSets |
| ReplicaSet | ReplicaSets | Ensures correct pod count |
| Node | Nodes | Monitors node health, evicts pods from dead nodes |
| Job | Jobs | Creates pods, tracks completion |
| Endpoint | Services, Pods | Updates Service endpoints |
| ServiceAccount | Namespaces | Creates default ServiceAccount |
| Namespace | Namespaces | Cleans up resources when namespace deleted |
What would happen if: The controller manager crashes but the API server, scheduler, and etcd are still running. Can you still create new pods manually? Can Deployments scale automatically? Think about which operations depend on the controller manager.
5.3 Example: ReplicaSet Controller
    # You create this:
    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: nginx
            image: nginx

    Controller loop:
    1. Watch: "ReplicaSet 'web' wants 3 pods"
    2. Check: "How many pods with label 'app=web' exist?"
    3. Compare: "0 exist, 3 desired"
    4. Act: "Create 3 pods"
    5. Repeat forever...

    Later:
    - Pod dies → Controller sees 2 pods → Creates 1 more
    - You scale to 5 → Controller sees 3 pods → Creates 2 more
    - You scale to 2 → Controller sees 5 pods → Deletes 3

5.4 Checking Controller Manager

    # Controller manager pod
    kubectl get pods -n kube-system | grep controller-manager
    kubectl logs -n kube-system kube-controller-manager-<control-plane-node>

    # Check for specific controller issues in logs
    kubectl logs -n kube-system kube-controller-manager-<control-plane-node> | grep -i "error\|failed"

Gotcha: Controller Manager Down
If the controller manager stops, nothing actively breaks immediately—existing pods keep running. But nothing new happens: deployments won’t create pods, dead pods won’t be replaced, jobs won’t start. The cluster becomes “frozen.”
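The compare-and-act step from the ReplicaSet example in 5.3 reduces to a single comparison. Here is a toy sketch: `reconcile` is a made-up function, and a real controller watches the API server and creates or deletes pods rather than printing text.

```shell
#!/bin/sh
# reconcile DESIRED CURRENT: print the action a ReplicaSet-style controller
# would take to move the current pod count toward the desired count.
reconcile() {
  desired=$1
  current=$2
  if [ "$current" -lt "$desired" ]; then
    echo "create $(( desired - current ))"
  elif [ "$current" -gt "$desired" ]; then
    echo "delete $(( current - desired ))"
  else
    echo "noop"
  fi
}

reconcile 3 0   # new ReplicaSet, no pods yet
reconcile 3 2   # one pod died
reconcile 5 3   # scaled up to 5
reconcile 2 5   # scaled down to 2
reconcile 3 3   # steady state
```

With the controller manager down, nothing runs this comparison, so the desired and current counts simply drift apart until it returns.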
Part 6: Node Components
6.1 kubelet - The Node Agent

kubelet runs on every node (including control plane nodes). It’s responsible for:
- Registering the node with the cluster
- Watching for pods assigned to its node
- Starting/stopping containers via the container runtime
- Reporting node and pod status back to API server
- Running liveness/readiness probes
    # Check kubelet status
    systemctl status kubelet

    # kubelet logs
    journalctl -u kubelet -f

    # kubelet configuration
    cat /var/lib/kubelet/config.yaml

6.2 kube-proxy - The Network Plumber

kube-proxy runs on every node. It maintains network rules so that Services work:
- Watches Services and EndpointSlices (Endpoints on older clusters)
- Creates iptables/IPVS rules to forward traffic
- Enables ClusterIP, NodePort, LoadBalancer services
    # Check kube-proxy
    kubectl get pods -n kube-system | grep kube-proxy
    kubectl logs -n kube-system kube-proxy-<id>

    # See iptables rules kube-proxy created (iptables mode only)
    iptables -t nat -L KUBE-SERVICES

6.3 Container Runtime

The actual software that runs containers. Kubernetes supports:
- containerd (most common, default in kubeadm)
- CRI-O (used by OpenShift)
- Docker Engine (the built-in dockershim was removed in K8s 1.24, so it now needs the cri-dockerd adapter; images built with Docker still work)
    # Check containerd
    systemctl status containerd
    crictl ps       # List running containers
    crictl images   # List images

Part 7: Putting It All Together

7.1 What Happens When You Create a Deployment

    kubectl create deployment nginx --image=nginx --replicas=3

    ┌─────────────────────────────────────────────────────────────────┐
    │                       Timeline of Events                        │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │ 0ms    kubectl → API Server: "Create Deployment nginx"          │
    │ 5ms    API Server → etcd: Store Deployment                      │
    │                                                                 │
    │ 10ms   Deployment Controller sees new Deployment                │
    │ 15ms   Deployment Controller → API: "Create ReplicaSet"         │
    │ 20ms   API Server → etcd: Store ReplicaSet                      │
    │                                                                 │
    │ 25ms   ReplicaSet Controller sees new ReplicaSet                │
    │ 30ms   ReplicaSet Controller → API: "Create Pod 1, 2, 3"        │
    │ 35ms   API Server → etcd: Store 3 Pods (Pending)                │
    │                                                                 │
    │ 40ms   Scheduler sees 3 unscheduled Pods                        │
    │ 50ms   Scheduler → API: "Pod 1→node1, Pod 2→node2, Pod 3→node1" │
    │ 55ms   API Server → etcd: Update Pods with nodeName             │
    │                                                                 │
    │ 60ms   kubelet on node1 sees 2 Pods assigned to it              │
    │ 65ms   kubelet on node2 sees 1 Pod assigned to it               │
    │ 70ms   kubelets → containerd: "Start nginx containers"          │
    │                                                                 │
    │ 500ms  Containers running                                       │
    │ 505ms  kubelets → API: "Pods are Running"                       │
    │ 510ms  API Server → etcd: Update Pod status                     │
    │                                                                 │
    │ Done!  kubectl get pods shows 3/3 Running                       │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Did You Know?
- Static pods are special pods managed directly by kubelet, not the API server. Control plane components (API server, scheduler, controller manager, etcd) run as static pods in kubeadm clusters. Their manifests live in `/etc/kubernetes/manifests/`.
- The API server is stateless. All state is in etcd. You can restart the API server and lose nothing. This is why Kubernetes is resilient.
- Controllers use leader election in HA setups. Only one controller manager is active at a time; the others are on standby. The same applies to the scheduler.
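In an HA cluster you can ask which instance currently holds leadership. The following is a sketch under the assumption that the cluster records leader election in Lease objects named after each component in `kube-system`; `leader_of` is a hypothetical wrapper, and the output is only meaningful against a live cluster.

```shell
#!/bin/sh
# leader_of COMPONENT: print the identity that currently holds the
# leader-election Lease for COMPONENT (kube-scheduler or
# kube-controller-manager). Assumes Lease-based leader election.
leader_of() {
  kubectl get lease "$1" -n kube-system --request-timeout=5s \
    -o jsonpath='{.spec.holderIdentity}'
}

# Only query where kubectl exists; print "unknown" if the lookup fails.
if command -v kubectl >/dev/null 2>&1; then
  for c in kube-scheduler kube-controller-manager; do
    echo "$c leader: $(leader_of "$c" 2>/dev/null || echo unknown)"
  done
fi
```

The holder identity normally starts with the hostname of the control plane node running the active instance, which tells you whose logs to read.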
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Thinking kubelet runs in a pod | kubelet is a systemd service | Check with systemctl status kubelet |
| Ignoring etcd health | etcd issues cascade to everything | Monitor etcd metrics and disk |
| Not checking component logs | Miss root cause during troubleshooting | Always check logs in kube-system |
| Confusing control plane with worker | Different components, different issues | Know what runs where |
| Forgetting static pods | Can’t delete them with kubectl | Edit/delete manifest in /etc/kubernetes/manifests/ |
- Your monitoring alert fires: “etcd latency exceeding 500ms.” Within minutes, developers report that `kubectl` commands are slow or timing out. Why does etcd latency affect kubectl, and which other cluster behaviors would degrade?

  Answer: The kube-apiserver is the only component that communicates directly with etcd, and every kubectl command goes through the API server. When etcd is slow, the API server blocks waiting for reads and writes, causing kubectl timeouts. Beyond kubectl, the scheduler cannot persist pod binding decisions, controllers cannot update resource status, and kubelets cannot report node conditions. Essentially, the entire control loop stalls because etcd is the single source of truth and all state changes must flow through it.

- A developer reports that their new Deployment shows `0/3` replicas ready. You run `kubectl get pods` and see three pods stuck in Pending. The scheduler pod in kube-system is Running. What are the most likely causes, and how would you investigate?

  Answer: Even though the scheduler is running, Pending pods mean the scheduler cannot find a suitable node. Run `kubectl describe pod` and check the Events section for scheduling failure messages. The most likely causes are: insufficient CPU or memory on all nodes ("Insufficient cpu/memory"), taints on nodes that the pods don't tolerate (e.g., control-plane taint), node affinity or nodeSelector rules that no node satisfies, or unbound PersistentVolumeClaims. Check node capacity with `kubectl describe node` and compare against pod resource requests.

- During an incident, you discover the kube-controller-manager pod has been down for 10 minutes. Existing pods are still running and serving traffic. However, you notice a Deployment was scaled from 3 to 5 replicas 8 minutes ago, but only 3 pods exist. Explain why existing pods survived but the scale-up didn’t happen.

  Answer: Existing pods continue running because the controller manager doesn't directly manage running containers; kubelet does that independently on each node. The controller manager runs reconciliation loops that compare desired state (in etcd) with current state. Without it, no Deployment controller is running to raise the replica count on the Deployment's ReplicaSet, and no ReplicaSet controller is running to create the additional pods. The scale-up was written to etcd via the API server, but nothing was running to act on it. Once the controller manager restarts, it will detect the discrepancy (3 pods vs 5 desired) and create the missing 2 pods.

- A colleague accidentally deleted the file `/etc/kubernetes/manifests/kube-scheduler.yaml` on the control plane node. You try `kubectl delete pod kube-scheduler -n kube-system` to “restart” it, but nothing happens. What went wrong with this recovery approach, and what is the correct fix?

  Answer: Static pods are managed by kubelet directly, not by the API server. The pod you see in `kubectl get pods -n kube-system` is a "mirror pod", a read-only representation. Deleting the mirror pod cannot restart anything because kubelet is the actual manager, and without the manifest file, kubelet has nothing to run. The correct fix is to restore the manifest file to `/etc/kubernetes/manifests/kube-scheduler.yaml` (from a backup, another control plane node, or by recreating it). Once the file is back, kubelet detects it and automatically starts the scheduler pod. This is why backing up the manifests directory is critical.
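A manifest backup takes only a few lines of shell. The sketch below demonstrates the idea in a scratch directory so that it touches nothing real; `backup_manifests` and `restore_manifest` are hypothetical helper names. On a kubeadm node the source directory would be `/etc/kubernetes/manifests`, the commands would need root, and the backup should live outside that directory.

```shell
#!/bin/sh
# backup_manifests SRC DEST: copy every manifest in SRC into a timestamped
# directory under DEST and print the path of that directory.
backup_manifests() {
  target="$2/manifests-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$target" || return 1
  cp -p "$1"/*.yaml "$target"/ || return 1
  printf '%s\n' "$target"
}

# restore_manifest BACKUP_DIR NAME DEST: put one manifest back in place.
restore_manifest() {
  cp -p "$1/$2" "$3/$2"
}

# Demo in a scratch directory (no real cluster files are touched).
demo=$(mktemp -d)
mkdir "$demo/manifests" "$demo/backup"
echo "kind: Pod" > "$demo/manifests/kube-scheduler.yaml"

saved=$(backup_manifests "$demo/manifests" "$demo/backup")
rm "$demo/manifests/kube-scheduler.yaml"        # simulate the accident
restore_manifest "$saved" kube-scheduler.yaml "$demo/manifests"
ls "$demo/manifests"
```

Once the file is restored on a real node, kubelet notices it on its next scan of the manifests directory and starts the static pod again.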
Hands-On Exercise
Task: Explore your cluster’s control plane components.

Steps:

- List all control plane pods:

      kubectl get pods -n kube-system

- Check component health:

      kubectl get componentstatuses   # Deprecated, may not work on all clusters
      kubectl get --raw='/healthz?verbose'

- View API server configuration:

      # On control plane node
      sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep -A5 "command:"

- Check scheduler logs for recent activity:

      kubectl logs -n kube-system -l component=kube-scheduler --tail=20

- Watch controller manager in action:

      # Terminal 1: Watch controller logs
      kubectl logs -n kube-system -l component=kube-controller-manager -f

      # Terminal 2: Create and delete a deployment
      kubectl create deployment test --image=nginx --replicas=2
      kubectl delete deployment test

- Explore etcd (if available):

      # On control plane node with etcdctl
      sudo ETCDCTL_API=3 etcdctl get /registry/namespaces/default \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key

Success Criteria:
- Can identify all control plane components and their pods
- Can check health of API server
- Can find and read control plane component logs
- Understand what each component does in pod creation
Cleanup:
    # Remove test deployment if created
    kubectl delete deployment test --ignore-not-found

Practice Drills

Drill 1: Component Identification Race (Target: 2 minutes)

Without looking at notes, identify which component handles each scenario:
| Scenario | Component |
|---|---|
| Stores all cluster state | ___ |
| Decides which node runs a new pod | ___ |
| Authenticates kubectl requests | ___ |
| Creates pods when ReplicaSet changes | ___ |
| Reports node status to control plane | ___ |
| Maintains iptables rules for Services | ___ |
Answers
- etcd
- kube-scheduler
- kube-apiserver
- kube-controller-manager (ReplicaSet controller)
- kubelet
- kube-proxy
Drill 2: Troubleshooting - Missing Scheduler (Target: 5 minutes)
Scenario: Pods are stuck in Pending forever.

    # Setup: Break the scheduler
    sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/

    # Create a test pod
    kubectl run drill-pod --image=nginx

    # Observe the problem
    kubectl get pods  # Pending forever
    kubectl describe pod drill-pod | grep -A5 Events

    # YOUR TASK: Diagnose and fix
    # 1. What's missing?
    # 2. How do you restore it?

Solution

    # Check control plane pods
    kubectl get pods -n kube-system | grep scheduler  # Nothing!

    # Restore scheduler
    sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/

    # Wait for scheduler and verify
    kubectl get pods -n kube-system | grep scheduler  # Running!
    kubectl get pod drill-pod  # Now Running

    # Cleanup
    kubectl delete pod drill-pod

Drill 3: Troubleshooting - Controller Manager Down (Target: 5 minutes)
Scenario: Deployments are accepted, but no ReplicaSets or pods ever appear.

    # Setup
    sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/

    # Create deployment
    kubectl create deployment drill-deploy --image=nginx --replicas=3

    # Observe
    kubectl get deploy  # Shows 0/3 ready
    kubectl get rs      # No ReplicaSet: the Deployment controller is not running
    kubectl get pods    # No pods!

    # YOUR TASK: Diagnose and fix

Solution

    # Check controller manager
    kubectl get pods -n kube-system | grep controller  # Nothing!

    # Restore controller manager
    sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/

    # Watch pods appear
    kubectl get pods -w  # 3 pods created

    # Cleanup
    kubectl delete deployment drill-deploy

Drill 4: API Server Health Deep Dive (Target: 3 minutes)
Check API server health using multiple methods:

    # Method 1: Basic connectivity
    kubectl cluster-info

    # Method 2: Health endpoints
    kubectl get --raw='/healthz'
    kubectl get --raw='/readyz'
    kubectl get --raw='/livez'

    # Method 3: Detailed health
    kubectl get --raw='/readyz?verbose' | grep -E "^\[|ok|failed"

    # Method 4: Direct curl (from control plane)
    curl -k https://localhost:6443/healthz

    # Method 5: Check API server logs for errors
    kubectl logs -n kube-system -l component=kube-apiserver --tail=20 | grep -i error

Drill 5: Watch the Reconciliation Loop (Target: 5 minutes)
See controllers in action:

    # Terminal 1: Watch controller manager logs
    kubectl logs -n kube-system -l component=kube-controller-manager -f | grep -i "replicaset\|deployment"

    # Terminal 2: Create, scale, delete deployment
    kubectl create deployment watch-me --image=nginx --replicas=2
    sleep 5
    kubectl scale deployment watch-me --replicas=5
    sleep 5
    kubectl delete deployment watch-me

    # Observe logs in Terminal 1 - see the controller react to each change

Drill 6: etcd Exploration (Target: 5 minutes)
Explore what etcd stores (requires etcdctl on control plane):

    # Set up etcdctl alias
    export ETCDCTL_API=3
    alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key'

    # List all keys (be careful in production!)
    etcdctl get / --prefix --keys-only | head -50

    # Find all pods
    etcdctl get /registry/pods --prefix --keys-only

    # Get a specific pod's data
    etcdctl get /registry/pods/default/<pod-name>

Drill 7: Challenge - Full Control Plane Restart
Advanced: Restart all control plane components and verify cluster recovery.

    # WARNING: Only do this on practice clusters!

    # 1. Note current state
    kubectl get nodes
    kubectl get pods -A | wc -l

    # 2. Restart kubelet, which manages the control plane static pods
    sudo systemctl restart kubelet
    # kubelet re-reads /etc/kubernetes/manifests/ on startup. Containers that are
    # already running may be left in place; to force a component to restart, move
    # its manifest out of that directory and back (as in Drills 2 and 3).

    # 3. Wait and verify recovery
    sleep 30
    kubectl get nodes                 # All Ready?
    kubectl get pods -n kube-system   # All Running?

    # 4. Test workload scheduling
    kubectl run recovery-test --image=nginx
    kubectl get pods                  # Running?
    kubectl delete pod recovery-test

Next Module

Module 1.2: Extension Interfaces (CNI, CSI, CRI) - How Kubernetes plugs in networking, storage, and container runtimes.