
Module 0.1: Cluster Setup

Hands-On Lab Available: K8s Cluster (beginner, 30 min) — runs in Killercoda.

Complexity: [MEDIUM] - Takes time but is straightforward if you follow the steps

Time to Complete: 45-60 minutes (first time), 15-20 minutes (once familiar)

Prerequisites: Two or more machines (physical, VMs, or cloud instances)


After this module, you will be able to:

  • Build a multi-node kubeadm cluster from scratch (control plane + 2 workers)
  • Diagnose a node stuck in NotReady by checking kubelet, CNI, and system pod health
  • Recover from cluster failures (missing scheduler, crashed worker, expired token)
  • Explain the kubeadm init/join workflow and what each component does during bootstrap

You can’t practice Kubernetes administration without a Kubernetes cluster. Sounds obvious, right? Yet many CKA candidates rely entirely on managed clusters (EKS, GKE, AKS) or single-node setups (minikube, kind) and then freeze when the exam asks them to troubleshoot kubelet on a worker node.

The CKA exam runs on kubeadm-provisioned clusters. Not managed Kubernetes. Not Docker Desktop. Real kubeadm clusters with separate control plane and worker nodes.

This module teaches you to build exactly what you’ll encounter in the exam.

The Orchestra Analogy

Think of a Kubernetes cluster like an orchestra. The control plane is the conductor—it doesn’t play any instruments (run your apps), but it coordinates everything: who plays when, how loud, when to start and stop. The worker nodes are the musicians—they do the actual work of producing music (running containers). Without a conductor, you have chaos. Without musicians, you have silence. You need both, working together, communicating constantly.


┌─────────────────────────────────────────────────────────────────┐
│ Your Practice Cluster │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ cp-node │ │ worker-01 │ │ worker-02 │ │
│ │ (control) │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ │ • API Server│ │ • kubelet │ │ • kubelet │ │
│ │ • etcd │ │ • kube-proxy│ │ • kube-proxy│ │
│ │ • scheduler │ │ • containerd│ │ • containerd│ │
│ │ • ctrl-mgr │ │ │ │ │ │
│ │ • containerd│ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └───────────────────┴───────────────────┘ │
│ Pod Network (Calico) │
│ │
└─────────────────────────────────────────────────────────────────┘

1 control plane node + 2 worker nodes = realistic cluster for CKA practice.


You need 3 machines. Here are your options:

Option                       Pros                         Cons                  Cost
VMs on Mac (UTM/Parallels)   Local, no network issues     Resource heavy        Free (UTM)
VMs on Linux (KVM/libvirt)   Native performance           Linux host required   Free
Cloud VMs (AWS/GCP/Azure)    Closest to exam environment  Costs money           ~$0.10/hr
Bare metal                   Best performance             Need hardware         Existing
Raspberry Pi cluster         Fun project, low power       ARM quirks            ~$200

Minimum specs per node:

Resource   Control Plane      Worker
CPU        2 cores            2 cores
RAM        2 GB               2 GB
Disk       20 GB              20 GB
OS         Ubuntu 22.04 LTS   Ubuntu 22.04 LTS

Did You Know?

The CKA exam environment uses Ubuntu-based nodes. While Kubernetes runs on many distributions, practicing on Ubuntu means fewer surprises on exam day.


Run these steps on ALL THREE nodes (control plane AND workers).

On each node, set a meaningful hostname:

Terminal window
# On control plane node
sudo hostnamectl set-hostname cp-node
# On first worker
sudo hostnamectl set-hostname worker-01
# On second worker
sudo hostnamectl set-hostname worker-02

Add all nodes to /etc/hosts on EACH machine:

Terminal window
sudo tee -a /etc/hosts << EOF
192.168.1.10 cp-node
192.168.1.11 worker-01
192.168.1.12 worker-02
EOF

Replace the IPs with your actual node IPs.

Kubernetes requires swap to be disabled. This is non-negotiable.

Terminal window
# Disable swap immediately
sudo swapoff -a
# Disable swap permanently (survives reboot)
sudo sed -i '/ swap / s/^/#/' /etc/fstab

War Story: The Mysterious OOMKill

A team spent days debugging why their pods kept getting OOMKilled despite having plenty of memory. The culprit? Swap was enabled. When the kubelet reported memory to the scheduler, it didn’t account for swap, leading to over-scheduling and eventual memory pressure. Kubernetes doesn’t manage swap—it expects you to disable it.
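The failure mode in that war story is easy to rule out with a quick check. A minimal verification that swap is actually off:

```shell
# Both checks should confirm zero swap:
free -h          # the Swap row should show 0B across all columns
swapon --show    # prints nothing when no swap devices are active
```

If `swapon --show` prints anything, re-run the `swapoff` and `sed` steps above and confirm the fstab entry is commented out.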

Kubernetes networking requires specific kernel modules:

Terminal window
# Load modules now
sudo modprobe overlay
sudo modprobe br_netfilter
# Ensure they load on boot
cat << EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

Enable IP forwarding and bridge netfilter:

Terminal window
cat << EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply immediately
sudo sysctl --system

Starting with Kubernetes 1.35, cgroup v1 support is disabled by default. Your nodes must run cgroup v2 or the kubelet will fail to start.

Terminal window
# Check cgroup version (must show "cgroup2fs")
stat -fc %T /sys/fs/cgroup
# Expected output: cgroup2fs
# If it shows "tmpfs", you're on cgroup v1 — you need a newer OS
# Affected: CentOS 7, RHEL 7, Ubuntu 18.04
# Supported: Ubuntu 22.04+, Debian 12+, RHEL 9+, Rocky 9+

Breaking Change Alert: If stat -fc %T /sys/fs/cgroup returns tmpfs instead of cgroup2fs, upgrade your OS before proceeding. Kubernetes 1.35 will not start on cgroup v1 nodes.

Kubernetes needs a container runtime. Install containerd 2.0 or later — Kubernetes 1.35 is the last release that supports the containerd 1.x series:

Terminal window
# Install containerd (ensure version 2.0+)
sudo apt-get update
sudo apt-get install -y containerd
# Verify version
containerd --version
# Should be 2.0.0 or later
# Create default config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Enable systemd cgroup driver (IMPORTANT!)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Restart containerd
sudo systemctl restart containerd
sudo systemctl enable containerd

Gotcha: SystemdCgroup

If you skip setting SystemdCgroup = true, you’ll get cryptic errors later. The kubelet and containerd must agree on the cgroup driver. Modern systems use systemd. Don’t miss this step.
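Before moving on, it's worth a one-line check that the `sed` edit above actually took effect:

```shell
# Should print the line containing "SystemdCgroup = true";
# if it still says false, the sed pattern didn't match your config layout
grep -n 'SystemdCgroup' /etc/containerd/config.toml
```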

Gotcha: containerd 2.0 and old images

containerd 2.0 removes support for Docker Schema 1 images. If you have very old images (pushed 5+ years ago), they will fail to pull. Rebuild or re-push them with a modern Docker/buildkit.
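If you want to sanity-check the runtime directly (this assumes outbound network access from the node), containerd's bundled `ctr` client can pull a modern image without involving Kubernetes at all:

```shell
# Pull a current image through containerd itself, bypassing kubelet/kubectl
sudo ctr images pull docker.io/library/nginx:latest
# Confirm it landed in the runtime's image store
sudo ctr images ls | grep nginx
```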

Terminal window
# Install dependencies
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
# Add Kubernetes repository key (create the keyring directory first — it may not exist)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add Kubernetes repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install components
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
# Prevent automatic updates (version consistency matters)
sudo apt-mark hold kubelet kubeadm kubectl

Run on each node:

Terminal window
# Check containerd
sudo systemctl status containerd
# Check kubelet (will be inactive until cluster is initialized)
sudo systemctl status kubelet
# Check kubeadm version
kubeadm version

Stop and think: You’ve installed kubelet, containerd, and kubeadm, but systemctl status kubelet shows it is activating/crashlooping. Why is this expected behavior right now? The kubelet is constantly restarting because it’s looking for its configuration file (/var/lib/kubelet/config.yaml), which won’t exist until kubeadm init or kubeadm join is run.
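You can see this crash loop for yourself in the kubelet logs (the exact error text varies by version, but it points at the missing config file):

```shell
# Show the most recent kubelet log lines;
# look for a complaint about /var/lib/kubelet/config.yaml not existing
sudo journalctl -u kubelet --no-pager -n 20
```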


Run these steps ONLY on the control plane node (cp-node).

Terminal window
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--control-plane-endpoint=cp-node:6443

This takes 2-3 minutes. When complete, you’ll see output like:

Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cp-node:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:...

SAVE THE JOIN COMMAND! You’ll need it for the workers.
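If you do lose it, it's recoverable — but a good habit is to save it to a file immediately (the file path here is just a suggestion):

```shell
# On the control plane: regenerate the join command and keep a copy
sudo kubeadm token create --print-join-command | tee ~/join-command.txt
```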

As a regular user (not root):

Terminal window
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Terminal window
kubectl get nodes

Output:

NAME STATUS ROLES AGE VERSION
cp-node NotReady control-plane 1m v1.35.0

The node shows NotReady because we haven’t installed a network plugin yet.

Pause and predict: Why would a freshly initialized Kubernetes node be NotReady? It has an API server, etcd, scheduler, and controller manager — all running. What’s missing? The answer: without a CNI (network plugin), pods can’t get IP addresses, and the node can’t report as healthy. This is the #1 “gotcha” for first-time kubeadm users, and it’s a common CKA troubleshooting scenario.
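You can verify this explanation directly on the control plane node — the kubelet reports NotReady precisely because no CNI configuration exists yet:

```shell
# Before a CNI is installed, this directory is empty or absent
ls /etc/cni/net.d/ 2>/dev/null || echo "no CNI config yet"
# The node condition spells out the same root cause
kubectl describe node cp-node | grep -A2 'Ready'
```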


Kubernetes doesn’t come with networking. You must install a CNI plugin. We’ll use Calico (widely used, exam-friendly).

Why Doesn’t Kubernetes Include Networking?

This surprises everyone at first. Kubernetes made a deliberate choice to define a networking model (every pod gets an IP, pods can reach each other) but not implement it. Why? Because networking needs vary wildly—some need advanced policies, some need high performance, some need cloud integration. By using the CNI (Container Network Interface) standard, Kubernetes lets you choose. Calico, Flannel, Cilium, Weave—they all implement the same interface but with different superpowers. It’s like USB: the standard defines how to connect, but you choose your device.

On the control plane node:

Terminal window
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

Wait for Calico pods to be ready:

Terminal window
kubectl get pods -n kube-system -w

After 1-2 minutes, check node status:

Terminal window
kubectl get nodes

Output:

NAME STATUS ROLES AGE VERSION
cp-node Ready control-plane 5m v1.35.0

Ready! The control plane is operational.


Run these steps on EACH worker node (worker-01 and worker-02).

Use the join command from kubeadm init output:

Terminal window
sudo kubeadm join cp-node:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:...

Gotcha: Token Expired?

Tokens expire after 24 hours. If your token expired:

Terminal window
# On control plane, generate new token
kubeadm token create --print-join-command

On the control plane node:

Terminal window
kubectl get nodes

Output:

NAME STATUS ROLES AGE VERSION
cp-node Ready control-plane 10m v1.35.0
worker-01 Ready <none> 2m v1.35.0
worker-02 Ready <none> 1m v1.35.0

All nodes Ready! Your cluster is operational.

War Story: The Phantom Node

An engineer once spent an hour trying to figure out why their “3-node cluster” only had 2 nodes showing. They ran kubeadm join on all three machines. Turns out, they ran it on the control plane node by mistake (instead of a worker), which silently failed because that node was already in the cluster. The lesson: always verify which node you’re SSH’d into before running commands. The hostname in your terminal prompt is your friend.

4.3 Label Worker Nodes (Optional but Recommended)
Terminal window
kubectl label node worker-01 node-role.kubernetes.io/worker=
kubectl label node worker-02 node-role.kubernetes.io/worker=

Now kubectl get nodes shows:

NAME STATUS ROLES AGE VERSION
cp-node Ready control-plane 10m v1.35.0
worker-01 Ready worker 3m v1.35.0
worker-02 Ready worker 2m v1.35.0

Run these tests to confirm everything works:

Terminal window
kubectl create deployment nginx --image=nginx --replicas=3
kubectl expose deployment nginx --port=80 --type=NodePort
Terminal window
kubectl get pods -o wide

You should see pods running on different worker nodes:

NAME READY STATUS NODE
nginx-77b4fdf86c-abc12 1/1 Running worker-01
nginx-77b4fdf86c-def34 1/1 Running worker-02
nginx-77b4fdf86c-ghi56 1/1 Running worker-01
Terminal window
# Get NodePort
kubectl get svc nginx
# Test from any node
curl http://worker-01:<nodeport>
Terminal window
kubectl delete deployment nginx
kubectl delete svc nginx

Quick Reference: Commands You’ll Use Often

Terminal window
# Check cluster status
kubectl cluster-info
kubectl get nodes
kubectl get pods -A
# Check component health
kubectl get componentstatuses # deprecated but still works
# SSH to nodes for troubleshooting
ssh worker-01 "sudo systemctl status kubelet"
ssh worker-01 "sudo journalctl -u kubelet -f"
# Reset a node (start over)
sudo kubeadm reset

  • kubeadm was created specifically to make cluster setup straightforward. Before kubeadm, setting up Kubernetes involved manually generating certificates, writing systemd unit files, and configuring each component by hand. Some people still do this (“Kubernetes the Hard Way”) for learning, but kubeadm is the production standard.

  • The CKA exam uses kubeadm clusters. You won’t see managed Kubernetes (EKS/GKE/AKS) on the exam. Everything is kubeadm-based, which is why practicing on kubeadm matters.

  • containerd replaced Docker as the default container runtime in Kubernetes 1.24. Docker still works (via cri-dockerd), but containerd is simpler and what you’ll encounter in the exam.
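Since containerd speaks CRI rather than the Docker API, node-level container debugging uses `crictl` instead of `docker` commands. A couple of equivalents (the socket path assumes a default containerd install):

```shell
# List running containers on this node via the CRI socket
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps
# List images the runtime has pulled
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock images
```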


Problem                      Cause                     Solution
kubelet keeps restarting     Swap enabled              sudo swapoff -a
Nodes stuck in NotReady      No CNI installed          Install Calico/Flannel
kubeadm init hangs           Firewall blocking ports   Open ports 6443, 10250
Token expired                Tokens last 24h           kubeadm token create --print-join-command
connection refused to API    Wrong kubeconfig          Check ~/.kube/config
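For the firewall row above, here is one way to open the standard kubeadm ports, assuming your nodes use ufw (cloud security groups or firewalld setups will differ):

```shell
# Control plane node
sudo ufw allow 6443/tcp           # kube-apiserver
sudo ufw allow 2379:2380/tcp      # etcd client/peer
sudo ufw allow 10250/tcp          # kubelet API
sudo ufw allow 10257/tcp          # kube-controller-manager
sudo ufw allow 10259/tcp          # kube-scheduler
# Worker nodes
sudo ufw allow 10250/tcp          # kubelet API
sudo ufw allow 30000:32767/tcp    # NodePort service range
```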

  1. Scenario: Your team is provisioning new bare-metal servers for a Kubernetes cluster. A systems engineer suggests leaving 16GB of swap space enabled to prevent out-of-memory kernel panics. You advise against this. Why must swap be disabled for the kubelet to function correctly?

    Answer: Kubernetes expects to manage memory allocation directly and definitively for all scheduled pods. When swap is enabled, the underlying operating system can silently move memory pages to disk, completely blinding the kubelet to the true memory utilization of the node. This breaks the Kubernetes scheduler's ability to make accurate placement decisions and guarantees, leading to severe performance degradation and unpredictable out-of-memory (OOM) behavior. If the kubelet detects swap is enabled without explicit overrides, it will immediately crash to prevent the cluster from entering this degraded state.
  2. Scenario: You are initializing a new cluster with kubeadm init and plan to use Flannel for your CNI. A colleague asks why you are explicitly defining --pod-network-cidr=10.244.0.0/16 instead of just running kubeadm init without flags. What is the technical reason for providing this flag?

    Answer: The control plane needs to know which IP addresses are reserved for pods so it doesn't assign overlapping addresses to different nodes. The `--pod-network-cidr` flag reserves a massive block of IPs for the entire cluster, which the Kubernetes controller manager then carves up into smaller subnets (/24 blocks) for each individual node. The Container Network Interface (CNI) plugin, like Flannel or Calico, reads this configuration to know exactly which IP addresses it is legally allowed to assign to the pods running on that specific host. Without this flag, the CNI wouldn't know the network boundaries and pod-to-pod routing would fail.
  3. Scenario: You successfully joined worker-02 to the cluster using the kubeadm token. However, 15 minutes later, kubectl get nodes still shows worker-02 with a status of NotReady. You verify the kubelet is running on the node. What is the most likely architectural component missing or failing?

    Answer: The most likely cause is that a Container Network Interface (CNI) plugin has not been properly deployed, or its pods are crashing. When a kubelet starts up, it checks for a valid CNI configuration file in `/etc/cni/net.d/`. If this configuration is missing, the kubelet intentionally marks the node as `NotReady` because it physically cannot assign IP addresses to any pods scheduled there. You must apply a CNI manifest (like Calico) to the cluster, which will deploy DaemonSet pods to configure the network on each node and transition them to a `Ready` state.
  4. Scenario: Three days after creating your cluster, you decide to scale out by adding a new worker node. You SSH into the new machine, install containerd and kubelet, but realize you didn’t save the original kubeadm join output. How do you generate the exact command and token needed to authenticate this new node to the API server?

    Answer: You must run `kubeadm token create --print-join-command` on the control plane node. Bootstrap tokens generated during `kubeadm init` have a hardcoded security lifespan of exactly 24 hours to prevent unauthorized machines from joining the cluster if the token is leaked. Because three days have passed, the original token has expired and been purged from etcd. This command generates a fresh, cryptographically secure token and immediately outputs the full `kubeadm join` string, complete with the API server endpoint and the required CA certificate hash for secure mutual TLS authentication.
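The per-node carve-up described in answer 2 is visible on any running cluster:

```shell
# Show the /24 podCIDR each node was allocated from the cluster-wide /16
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
```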

Task: Build your practice cluster following this guide.

Success Criteria:

  • 3 nodes showing Ready in kubectl get nodes
  • Calico pods running in kube-system namespace
  • Can deploy a pod and have it scheduled to a worker node
  • Can SSH to worker and check kubelet status

Verification:

Terminal window
# All nodes ready?
kubectl get nodes | grep -cw "Ready" # Should output: 3 (-w so "NotReady" doesn't also match)
# Calico running?
kubectl get pods -n kube-system | grep calico
# Pods scheduling to workers?
kubectl run test --image=nginx
kubectl get pod test -o wide # Should show worker node
kubectl delete pod test

Before you drill: These drills simulate real CKA exam scenarios. Time yourself — the exam gives you ~5 minutes per question on average. If Drill 1 takes you 10 minutes now, that’s fine. By exam day, it should take 2.

Drill 1: Node Health Check (Target: 2 minutes)


Verify your cluster is healthy. Run these commands and confirm expected output:

Terminal window
# All nodes Ready?
kubectl get nodes
# Expected: 3 nodes, all STATUS=Ready
# All system pods running?
kubectl get pods -n kube-system --no-headers | grep -v Running
# Expected: No output (all pods Running)
# Can schedule workloads?
kubectl run test --image=nginx --rm -it --restart=Never -- echo "Cluster healthy"
# Expected: "Cluster healthy" then pod deleted

Drill 2: Troubleshooting - Node NotReady (Target: 5 minutes)


Scenario: Simulate a node going NotReady and fix it.

Terminal window
# On worker-01, stop kubelet
sudo systemctl stop kubelet
# On control plane, watch node status
kubectl get nodes -w
# Wait until worker-01 shows NotReady
# Diagnose the issue
kubectl describe node worker-01 | grep -A5 Conditions
# Fix: Restart kubelet on worker-01
sudo systemctl start kubelet
# Verify recovery
kubectl get nodes

What you learned: kubelet health directly affects node status.

Drill 3: Troubleshooting - CNI Failure (Target: 5 minutes)


Scenario: Pods stuck in ContainerCreating after CNI issues.

Terminal window
# Create a test pod
kubectl run cni-test --image=nginx
# Check status (should be Running if CNI works)
kubectl get pod cni-test
# If ContainerCreating, diagnose:
kubectl describe pod cni-test | grep -A10 Events
kubectl get pods -n kube-system | grep calico
# Common fix: Restart CNI pods
kubectl delete pods -n kube-system -l k8s-app=calico-node
# Cleanup
kubectl delete pod cni-test

Drill 4: Reset and Rebuild (Target: 15 minutes)


Challenge: Practice cluster recovery by resetting a worker and rejoining.

Terminal window
# On worker-01: Reset the node
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d
# On control plane: Remove the node
kubectl delete node worker-01
# On control plane: Generate new join command
kubeadm token create --print-join-command
# On worker-01: Rejoin
sudo kubeadm join <command-from-above>
# Verify
kubectl get nodes

Drill 5: Challenge - Add a Third Worker (Target: 20 minutes)


No guidance provided. Using only what you learned in this module:

  1. Prepare a new VM with the same base setup
  2. Join it to the cluster as worker-03
  3. Verify it’s Ready and can schedule pods
  4. Label it with node-role.kubernetes.io/worker=
Hints (only if stuck)
  1. Run all Part 1 steps (1.1-1.8) on the new node
  2. Get join command: kubeadm token create --print-join-command
  3. Label: kubectl label node worker-03 node-role.kubernetes.io/worker=

Scenario: Your colleague broke something. Fix it.

Terminal window
# Setup: Run this to break the cluster (on control plane)
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
# Problem: New pods won't schedule
kubectl run broken-test --image=nginx
kubectl get pods # STATUS: Pending forever
# YOUR TASK: Figure out why and fix it
# Hint: Check control plane components
Solution
Terminal window
# Check what's running in kube-system
kubectl get pods -n kube-system
# Notice: No scheduler pod!
# Check manifest directory
ls /etc/kubernetes/manifests/
# Notice: kube-scheduler.yaml is missing
# Fix: Restore the scheduler
sudo mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
# Wait for scheduler to restart
kubectl get pods -n kube-system | grep scheduler
# Verify pod now schedules
kubectl get pods # Should transition to Running
kubectl delete pod broken-test

Module 0.2: Shell Mastery - Aliases, autocomplete, and shell optimization for speed.