Module 2.2: OS Provisioning & PXE Boot
Complexity: [COMPLEX] | Time: 60 minutes
Prerequisites: Module 2.1: Datacenter Fundamentals, Linux: Kernel Architecture
What You’ll Be Able to Do
After completing this module, you will be able to:
- Implement a PXE boot infrastructure with DHCP, TFTP, and HTTP servers for automated OS provisioning
- Configure kickstart/preseed/autoinstall files that produce consistent, repeatable node installations
- Deploy bare-metal servers from power-on to Kubernetes-ready state without manual intervention
- Troubleshoot PXE boot failures across DHCP relay, TFTP timeouts, and UEFI/BIOS compatibility issues
Why This Module Matters
When you buy 20 servers, they arrive as blank hardware: no operating system, no configuration, no identity. In the cloud, you click “Launch Instance” and an OS appears in 30 seconds. On-premises, you need to solve the bootstrapping problem: how do you install an OS on 20 servers that have no OS?
You could walk to each server with a USB stick. For 3 servers, this is annoying but workable. For 20, it is a full day of repetitive work. For 200, it is impossible. And every time you need to reprovision a node — after a disk failure, a security incident, or a Kubernetes version change — you would need to do it again.
PXE (Preboot Execution Environment) solves this by booting servers over the network. The server’s NIC downloads a boot image from a central server, installs the OS automatically, and the machine is ready to join your Kubernetes cluster — all without anyone touching it.
The Vending Machine Analogy
PXE is like a vending machine for operating systems. The server walks up (boots from network), identifies itself (MAC address), receives its order (DHCP offer + boot image), and gets its product (fully installed OS). The vending machine (PXE server) can serve hundreds of customers simultaneously. Compare this to a human (USB stick) who can only serve one customer at a time.
What You’ll Learn
- How PXE boot works (DHCP → TFTP → kernel → installer)
- Setting up a basic PXE server for Ubuntu/RHEL autoinstall
- MAAS (Metal as a Service) for fleet management
- Tinkerbell for cloud-native bare metal provisioning
- How to integrate PXE with Kubernetes cluster bootstrapping
How PXE Boot Works
The Boot Sequence

    ┌─────────────────────────────────────────────────────────────┐
    │                      PXE BOOT SEQUENCE                      │
    │                                                             │
    │  1. Server powers on (via BMC or button)                    │
    │     └── BIOS/UEFI starts POST (Power-On Self-Test)          │
    │                                                             │
    │  2. BIOS tries boot devices in order:                       │
    │     └── PXE Network Boot (configured in BIOS boot order)    │
    │                                                             │
    │  3. NIC broadcasts DHCP Discover                            │
    │     └── "I need an IP address and a boot file"              │
    │                                                             │
    │  4. DHCP server responds with:                              │
    │     ├── IP address (10.0.1.50)                              │
    │     ├── Gateway, DNS                                        │
    │     └── Next-server: 10.0.1.1 (TFTP server)                 │
    │         Filename: pxelinux.0 (boot loader)                  │
    │                                                             │
    │  5. NIC downloads boot loader via TFTP                      │
    │     └── pxelinux.0 or grubx64.efi (for UEFI)                │
    │                                                             │
    │  6. Boot loader downloads kernel + initrd                   │
    │     └── vmlinuz + initrd.img via TFTP or HTTP               │
    │                                                             │
    │  7. Kernel starts, runs installer (autoinstall/kickstart)   │
    │     └── Downloads packages from HTTP repo                   │
    │     └── Partitions disks, installs OS                       │
    │     └── Runs post-install scripts (join K8s cluster)        │
    │                                                             │
    │  8. Server reboots into installed OS                        │
    │     └── Ready for kubeadm join or Cluster API enrollment    │
    │                                                             │
    │  Total time: 5-15 minutes per server (parallel)             │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

UEFI vs Legacy BIOS PXE

| Aspect | Legacy BIOS PXE | UEFI PXE |
|---|---|---|
| Boot loader | pxelinux.0 (SYSLINUX) | grubx64.efi or shimx64.efi |
| Protocol | TFTP only | TFTP or HTTP (faster) |
| Secure Boot | Not supported | Supported (recommended) |
| Disk support | MBR (2 TiB limit) | GPT (no practical size limit) |
| Status | Legacy, being phased out | Current standard |
Modern servers ship with UEFI firmware. If your servers support it (enterprise servers built since roughly 2015 generally do), use UEFI PXE. It supports Secure Boot, larger disks, and HTTP boot (faster than TFTP).
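A PXE server that has to serve both firmware types must pick the boot file per client. The sketch below maps the DHCP client-architecture code (option 93, the value dnsmasq matches with `option:client-arch`) to a boot loader name. The helper name `bootfile_for_arch` is hypothetical, and the file names simply follow the table above. Codes other than 0, 7 and 9 are rejected rather than guessed.

```sh
# Map a DHCP option 93 (client architecture) code to a boot loader file.
# bootfile_for_arch is a hypothetical helper; file names follow the table above.
bootfile_for_arch() {
    case "$1" in
        0)   echo "pxelinux.0" ;;    # legacy BIOS (Intel x86PC)
        7|9) echo "grubx64.efi" ;;   # 64-bit UEFI; both codes occur in practice
        *)   echo "unsupported client-arch: $1" >&2; return 1 ;;
    esac
}

bootfile_for_arch 0   # prints pxelinux.0
bootfile_for_arch 7   # prints grubx64.efi
```

With Secure Boot enabled, the UEFI branch would hand out `shimx64.efi` first, which in turn loads `grubx64.efi`.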
Setting Up a Basic PXE Server
Components Needed

    ┌─────────────────────────────────────────────────────────────┐
    │                    PXE SERVER COMPONENTS                    │
    │                                                             │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌─────────────┐  │
    │  │ DHCP     │  │ TFTP     │  │ HTTP     │  │ Autoinstall │  │
    │  │ Server   │  │ Server   │  │ Server   │  │ Config      │  │
    │  │          │  │          │  │          │  │             │  │
    │  │ Assigns  │  │ Serves   │  │ Serves   │  │ Answers     │  │
    │  │ IPs +    │  │ boot     │  │ OS repo  │  │ all         │  │
    │  │ boot     │  │ loader   │  │ packages │  │ installer   │  │
    │  │ filename │  │ + kernel │  │          │  │ questions   │  │
    │  └──────────┘  └──────────┘  └──────────┘  └─────────────┘  │
    │                                                             │
    │  Can be one server or split across multiple                 │
    │  In practice: dnsmasq handles DHCP + TFTP in one process    │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

Pause and predict: You have 20 new servers that just arrived in the datacenter. They have no operating system. You need them running Ubuntu with containerd by end of day. If you used USB sticks, how long would it take? What if a server fails next month and needs reprovisioning? How does PXE change that recovery time?
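One rough way to put numbers on the question above is sketched below. The 30 minutes per USB install is purely an assumed figure for illustration. The 15 minutes for PXE is the upper end of the 5-15 minute range from the boot sequence diagram, and the sketch assumes all PXE installs run in parallel.

```sh
servers=20
usb_minutes_each=30    # assumption: one technician, one server at a time
pxe_minutes_each=15    # upper bound from the boot sequence diagram

usb_total=$((servers * usb_minutes_each))   # sequential: times add up
pxe_total=$pxe_minutes_each                 # parallel: bounded by the slowest install

echo "USB: ${usb_total} min, PXE: ${pxe_total} min"
```

In practice the PXE server's network and disk throughput limit how many installs can run at full speed, so the parallel figure is a best case.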
Quick PXE Server with dnsmasq
We use dnsmasq rather than separate DHCP and TFTP servers because it handles both protocols in a single lightweight process. This simplifies the PXE infrastructure to a single daemon that manages IP assignment and boot file delivery:

    # Install dnsmasq (handles DHCP + TFTP)
    apt-get install dnsmasq

    # Create directory structure
    mkdir -p /srv/tftp/pxelinux.cfg
    mkdir -p /srv/http/ubuntu

    # Download Ubuntu 22.04 server ISO and extract
    # (ISO file names on releases.ubuntu.com include the point release; adjust if this one is gone)
    wget https://releases.ubuntu.com/22.04/ubuntu-22.04-live-server-amd64.iso
    mount -o loop ubuntu-22.04-live-server-amd64.iso /mnt
    cp -r /mnt/* /srv/http/ubuntu/
    umount /mnt

    # Copy the kernel and initrd into the TFTP root
    cp /srv/http/ubuntu/casper/vmlinuz /srv/tftp/
    cp /srv/http/ubuntu/casper/initrd /srv/tftp/
    # The boot loader named in dhcp-boot (grubx64.efi) must also be in /srv/tftp.
    # On Ubuntu the signed network GRUB image ships in the grub-efi-amd64-signed
    # package as /usr/lib/grub/x86_64-efi-signed/grubnetx64.efi.signed.
    # GRUB additionally needs a menu file (typically grub/grub.cfg under the
    # TFTP root), which is not shown here.

    # Configure dnsmasq
    cat > /etc/dnsmasq.d/pxe.conf << 'EOF'
    # DHCP range for PXE clients
    dhcp-range=10.0.1.50,10.0.1.150,255.255.255.0,1h

    # PXE boot options
    dhcp-boot=grubx64.efi
    enable-tftp
    tftp-root=/srv/tftp

    # UEFI-specific boot
    dhcp-match=set:efi-x86_64,option:client-arch,7
    dhcp-boot=tag:efi-x86_64,grubx64.efi
    EOF

    systemctl restart dnsmasq

Stop and think: The dnsmasq configuration above responds to any DHCP request on the PXE network with a boot image. What would happen if a production server accidentally rebooted with PXE as its first boot device? How would you prevent this?
Ubuntu Autoinstall Configuration
The autoinstall file below answers every question the Ubuntu installer would normally ask interactively. This is what makes the installation fully hands-off, from disk partitioning to user creation to Kubernetes prerequisite packages. The late-commands section runs after the OS is installed and configures the kernel modules and sysctl settings that Kubernetes requires:

    #cloud-config
    autoinstall:
      version: 1
      locale: en_US.UTF-8
      keyboard:
        layout: us

      # Network: DHCP on first interface
      network:
        version: 2
        ethernets:
          id0:
            match:
              name: en*
            dhcp4: true

      # Storage: entire first disk
      storage:
        layout:
          name: lvm
          sizing-policy: all

      # Users
      identity:
        hostname: k8s-node
        username: kubedojo
        # password: "changeme" (hashed)
        password: "$6$rounds=4096$xyz$..."

      # SSH
      ssh:
        install-server: true
        authorized-keys:
          - ssh-ed25519 AAAA... admin@kubedojo

      # Packages for Kubernetes
      packages:
        - containerd
        - apt-transport-https
        - curl

      # Post-install: prepare for K8s
      late-commands:
        - |
          cat > /target/etc/modules-load.d/k8s.conf << 'MODULES'
          overlay
          br_netfilter
          MODULES
        - |
          cat > /target/etc/sysctl.d/k8s.conf << 'SYSCTL'
          net.bridge.bridge-nf-call-iptables = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward = 1
          SYSCTL
        # Disable swap
        - curtin in-target -- swapoff -a
        - curtin in-target -- sed -i '/swap/d' /etc/fstab

MAAS (Metal as a Service)
MAAS by Canonical provides a full lifecycle management platform for bare metal:

    ┌─────────────────────────────────────────────────────────────┐
    │                      MAAS ARCHITECTURE                      │
    │                                                             │
    │  ┌──────────────────────────────────────┐                   │
    │  │  MAAS Region Controller              │                   │
    │  │  ┌──────────┐  ┌──────────┐          │                   │
    │  │  │ REST API │  │ Web UI   │          │                   │
    │  │  └──────────┘  └──────────┘          │                   │
    │  │  ┌──────────┐  ┌──────────┐          │                   │
    │  │  │PostgreSQL│  │ Image    │          │                   │
    │  │  │ (state)  │  │ Store    │          │                   │
    │  │  └──────────┘  └──────────┘          │                   │
    │  └───────────────────┬──────────────────┘                   │
    │                      │                                      │
    │  ┌───────────────────▼──────────────────┐                   │
    │  │  MAAS Rack Controller                │                   │
    │  │  ┌──────┐  ┌──────┐  ┌──────┐        │                   │
    │  │  │ DHCP │  │ TFTP │  │ HTTP │        │                   │
    │  │  └──────┘  └──────┘  └──────┘        │                   │
    │  │  ┌──────┐  ┌──────┐                  │                   │
    │  │  │ DNS  │  │ Proxy│                  │                   │
    │  │  └──────┘  └──────┘                  │                   │
    │  └───────────────────┬──────────────────┘                   │
    │                      │                                      │
    │  ┌───────┐  ┌───────┐  ┌───────┐  ┌───────┐                 │
    │  │Server │  │Server │  │Server │  │Server │                 │
    │  │  01   │  │  02   │  │  03   │  │  04   │                 │
    │  └───────┘  └───────┘  └───────┘  └───────┘                 │
    │                                                             │
    │  Machine States:                                            │
    │  New → Commissioning → Ready → Deploying → Deployed         │
    │                                               ↓             │
    │                        Ready (recycled) ← Releasing         │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

MAAS Key Features

| Feature | Description |
|---|---|
| Discovery | Automatically detects new servers via DHCP |
| Commissioning | Inventories hardware (CPU, RAM, disks, NICs) |
| Deployment | Installs Ubuntu, CentOS, RHEL, or custom images |
| Networking | Manages VLANs, bonds, bridges, DNS |
| Storage | LVM, RAID, bcache configuration |
| API | Full REST API for automation |
| Juju integration | Deploy applications via Juju charms |

    # Install MAAS (snap)
    sudo snap install maas --channel=3.4

    # Initialize
    sudo maas init region+rack \
      --database-uri "postgres://maas:password@localhost/maas"

    # Create admin user
    sudo maas createadmin \
      --username admin \
      --password secure-password \
      --email admin@kubedojo.local

    # Access web UI: http://maas-server:5240/MAAS/

Tinkerbell: Cloud-Native Bare Metal
Tinkerbell is a CNCF project for declarative bare metal provisioning. It treats hardware the way Kubernetes treats pods:

    ┌─────────────────────────────────────────────────────────────┐
    │                   TINKERBELL ARCHITECTURE                   │
    │                                                             │
    │  ┌──────────────────────────────────────┐                   │
    │  │  Tinkerbell Stack                    │                   │
    │  │                                      │                   │
    │  │  ┌──────────┐  Workflow engine       │                   │
    │  │  │ Tink     │  Defines provisioning  │                   │
    │  │  │ Server   │  steps as containers   │                   │
    │  │  └──────────┘                        │                   │
    │  │  ┌──────────┐  DHCP + PXE + OSIE     │                   │
    │  │  │ Boots    │  Handles network boot  │                   │
    │  │  └──────────┘                        │                   │
    │  │  ┌──────────┐  Object storage        │                   │
    │  │  │ Hegel    │  Metadata service      │                   │
    │  │  └──────────┘  (like cloud metadata) │                   │
    │  └──────────────────────────────────────┘                   │
    │                                                             │
    │  Provisioning defined as Kubernetes CRDs:                   │
    │  - Hardware: describes physical machine                     │
    │  - Template: defines provisioning steps                     │
    │  - Workflow: links Hardware to Template                     │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

Pause and predict: Tinkerbell defines provisioning steps as container actions. How does this differ from a traditional kickstart/autoinstall approach? What advantage does containerized provisioning give you for reproducibility and testing?
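Because the Hardware objects listed in the diagram above are plain Kubernetes manifests, a fleet inventory can be turned into them mechanically. The sketch below is one possible approach and not part of Tinkerbell itself: `render_hardware` is a hypothetical helper, the inventory values are made up, and the fields it emits mirror the Hardware definition used in the workflow example in the next section.

```sh
# render_hardware NAME MAC IP: print a minimal Tinkerbell Hardware manifest.
# Hypothetical helper; netmask and gateway are fixed to the example subnet.
render_hardware() {
    cat <<EOF
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: $1
spec:
  disks:
    - device: /dev/sda
  interfaces:
    - dhcp:
        mac: "$2"
        ip:
          address: $3
          netmask: 255.255.255.0
          gateway: 10.0.1.1
EOF
}

# Made-up inventory: one "name mac ip" triple per line.
while read -r name mac ip; do
    render_hardware "$name" "$mac" "$ip"
    echo "---"
done <<'INVENTORY'
worker-01 aa:bb:cc:dd:ee:01 10.0.1.51
worker-02 aa:bb:cc:dd:ee:02 10.0.1.52
INVENTORY
```

The output can be committed to Git or piped to `kubectl apply -f -`, which keeps the inventory as the single source of truth.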
Tinkerbell Workflow Example
The Tinkerbell workflow below shows the declarative approach to provisioning. Each action is a container image that performs one step: streaming the OS image, writing configuration files, or setting up networking. Because these are standard OCI containers, you can test and version them independently:

    # Hardware definition (like a cloud instance profile)
    apiVersion: tinkerbell.org/v1alpha1
    kind: Hardware
    metadata:
      name: worker-01
    spec:
      disks:
        - device: /dev/sda
      metadata:
        facility:
          plan_slug: "c3.small.x86"
        instance:
          hostname: k8s-worker-01
          operating_system:
            slug: ubuntu_22_04
      interfaces:
        - dhcp:
            mac: "aa:bb:cc:dd:ee:01"
            ip:
              address: 10.0.1.51
              netmask: 255.255.255.0
              gateway: 10.0.1.1
    ---
    # Template (provisioning steps as container actions)
    apiVersion: tinkerbell.org/v1alpha1
    kind: Template
    metadata:
      name: ubuntu-k8s
    spec:
      data: |
        version: "0.1"
        name: ubuntu-k8s-install
        global_timeout: 1800
        tasks:
          - name: os-installation
            worker: "{{.device_1}}"
            actions:
              - name: stream-ubuntu-image
                image: quay.io/tinkerbell-actions/image2disk:v1.0.0
                timeout: 600
                environment:
                  IMG_URL: http://10.0.1.1/images/ubuntu-22.04.raw.gz
                  DEST_DISK: /dev/sda
                  COMPRESSED: true
              # Despite its name, this step only writes the kernel-module list;
              # containerd itself must already be part of the streamed image.
              - name: install-containerd
                image: quay.io/tinkerbell-actions/writefile:v1.0.0
                timeout: 90
                environment:
                  DEST_DISK: /dev/sda1
                  DEST_PATH: /etc/modules-load.d/k8s.conf
                  CONTENTS: |
                    overlay
                    br_netfilter
              - name: configure-network
                image: quay.io/tinkerbell-actions/writefile:v1.0.0
                timeout: 90
                environment:
                  DEST_DISK: /dev/sda1
                  DEST_PATH: /etc/netplan/01-netcfg.yaml
                  CONTENTS: |
                    network:
                      version: 2
                      ethernets:
                        eno1:
                          addresses: [10.0.1.51/24]
                          gateway4: 10.0.1.1
                          nameservers:
                            addresses: [10.0.1.1]
    ---
    # Workflow (connects hardware to template)
    apiVersion: tinkerbell.org/v1alpha1
    kind: Workflow
    metadata:
      name: provision-worker-01
    spec:
      templateRef: ubuntu-k8s
      hardwareRef: worker-01
      # Resolves the {{.device_1}} placeholder in the template to this machine's MAC
      hardwareMap:
        device_1: "aa:bb:cc:dd:ee:01"

Did You Know?
- PXE was introduced by Intel as part of the Wired for Management specification, and the PXE 2.1 specification still in use today dates from 1999. Over 25 years later, it remains the standard way to network-boot servers. The protocol has barely changed and still relies on TFTP, a protocol from 1981.
- MAAS manages over 1 million machines worldwide according to Canonical. It was originally built for Ubuntu’s own infrastructure and later open-sourced. The largest known MAAS deployment manages 30,000+ servers.
- HTTP Boot (UEFI) is replacing TFTP for PXE. UEFI firmware can download boot files via HTTP instead of TFTP, which is significantly faster: HTTP runs over TCP and streams data, whereas TFTP sends small blocks and waits for an acknowledgement after each one. Most servers from 2020 onward support HTTP Boot.
- Tinkerbell was created by Equinix Metal (formerly Packet) to manage their bare metal cloud fleet. It is now a CNCF Sandbox project and is used by Spectro Cloud, Platform9, and other bare metal Kubernetes providers.
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| PXE on production VLAN | Any new server auto-installs, destroying data | Isolate PXE DHCP to a dedicated provisioning VLAN |
| No DHCP relay | Servers in other VLANs cannot PXE boot | Configure DHCP relay (ip helper-address) on switches |
| TFTP through firewall | TFTP data transfers use ephemeral UDP ports, which stateful firewalls block | Use HTTP Boot, or load the firewall's TFTP connection-tracking helper (e.g. `nf_conntrack_tftp` on Linux) |
| Static autoinstall | Every server gets identical config | Template with MAC-based customization (hostname, IP, role) |
| No post-install validation | Server installs but has wrong config | Add verification step (check containerd, sysctl, hostname) |
| Skipping Secure Boot | Anyone can PXE boot malicious images | Enable UEFI Secure Boot with signed boot chain |
| Manual BIOS config | Each server has different BIOS settings | Use Redfish API to configure BIOS programmatically |
| Not backing up PXE server | PXE server dies = cannot provision new nodes | Replicate PXE config via Git; have a standby PXE server |
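The "static autoinstall" row above can be addressed by rendering one user-data file per machine. What follows is a minimal sketch, assuming a hypothetical inventory of MAC address and hostname pairs and a hypothetical layout of one directory per MAC under an HTTP root. Only the `identity.hostname` field from the autoinstall example is customized, and the rendered file is deliberately incomplete (a real one also needs the password hash, storage, and so on).

```sh
# render_user_data HOSTNAME: print a tiny autoinstall user-data document.
# Only the hostname varies; everything else would come from your real template.
render_user_data() {
    cat <<EOF
#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: $1
    username: kubedojo
EOF
}

# write_per_mac OUT_DIR: read "mac hostname" lines from stdin and write
# OUT_DIR/<mac>/user-data for each one (hypothetical layout).
write_per_mac() {
    while read -r mac host; do
        [ -n "$mac" ] || continue
        mkdir -p "$1/$mac"
        render_user_data "$host" > "$1/$mac/user-data"
        : > "$1/$mac/meta-data"   # NoCloud expects a meta-data file; empty is fine
    done
}

out_dir=$(mktemp -d)
write_per_mac "$out_dir" <<'INVENTORY'
aa:bb:cc:dd:ee:01 k8s-worker-01
aa:bb:cc:dd:ee:02 k8s-worker-02
INVENTORY
ls "$out_dir"
```

Each machine's boot entry would then point the installer at its own directory, for example through a kernel argument such as `ds=nocloud-net;s=http://<server>/<mac>/`. How the MAC ends up in that URL depends on the boot loader in use.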
Question 1
A server is PXE booting but hangs at “Waiting for DHCP…” for 60 seconds. What are the most likely causes?
Answer
In order of likelihood:
- DHCP server not running or not configured for the PXE VLAN. Check `systemctl status dnsmasq` and verify the DHCP range covers the server’s subnet.
- VLAN mismatch: The server’s port is in a different VLAN than the DHCP server. Check the switch port VLAN assignment and ensure DHCP relay is configured if they are on different VLANs.
- Firewall blocking DHCP: UDP ports 67/68 must be open between the server and the DHCP server.
- DHCP range exhausted: All IPs in the range are leased. Check `cat /var/lib/misc/dnsmasq.leases` for active leases.
- NIC boot order: The server is trying to PXE boot from the wrong NIC (e.g., a port cabled to a different network rather than the one on the provisioning VLAN). Check BIOS boot order.
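The range-exhaustion check from the list above can be scripted. The sketch assumes the usual dnsmasq lease file layout (expiry time, MAC, IP, hostname, client ID on each line) and, for simplicity, a range inside a single /24 so that only the last octet needs comparing. `leases_in_range` is a hypothetical helper and the sample lease file is invented.

```sh
# leases_in_range FILE PREFIX FIRST LAST
# Count leases whose IP is PREFIX.<n> with FIRST <= n <= LAST.
leases_in_range() {
    awk -v prefix="$2" -v first="$3" -v last="$4" '
        {
            n = split($3, octet, ".")
            if (n != 4) next
            if (octet[1] "." octet[2] "." octet[3] != prefix) next
            if (octet[4] + 0 >= first + 0 && octet[4] + 0 <= last + 0) count++
        }
        END { print count + 0 }
    ' "$1"
}

# Invented sample; on a real server the file is /var/lib/misc/dnsmasq.leases.
lease_file=$(mktemp)
cat > "$lease_file" <<'EOF'
1700003600 aa:bb:cc:dd:ee:01 10.0.1.50 k8s-node *
1700003600 aa:bb:cc:dd:ee:02 10.0.1.51 * 01:aa:bb:cc:dd:ee:02
1700003600 aa:bb:cc:dd:ee:03 10.0.2.60 other *
EOF

used=$(leases_in_range "$lease_file" 10.0.1 50 150)
pool=$((150 - 50 + 1))
echo "leases in use: $used of $pool"
```

If `used` approaches `pool`, widening `dhcp-range` or shortening the lease time are the usual remedies.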
Debug:

    # On the DHCP server, watch for DHCP requests:
    tcpdump -i eth0 -n port 67 or port 68
    # You should see DHCPDISCOVER from the server's MAC

Question 2
Why should PXE provisioning use a dedicated VLAN instead of the production network?
Answer
Safety and security:
- Accidental reimaging: If a production server reboots and its BIOS has PXE as the first boot device, it will DHCP discover and potentially start a fresh OS install, wiping the existing OS and all data. A dedicated provisioning VLAN prevents this because production servers are not on the PXE VLAN.
- DHCP conflicts: PXE requires a DHCP server with `next-server` and `filename` options. Running this on the production VLAN risks conflicting with the production DHCP server, causing IP assignment issues for existing infrastructure.
- Attack surface: The PXE server can push arbitrary OS images to any server that PXE boots. An attacker on the PXE VLAN could install a compromised OS on any server that reboots with PXE enabled.
Best practice:
- Provisioning VLAN (e.g., VLAN 100) — only new/reprovisioning servers
- Production VLAN (e.g., VLAN 200) — running K8s nodes
- After OS installation, the server’s network config switches to the production VLAN
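VLAN isolation can be combined with a second guard: answering DHCP only for machines that are explicitly queued for provisioning. What follows is a minimal sketch, assuming dnsmasq as used in this module; `render_allowlist`, the `provision` tag name, and the MAC list are hypothetical. It relies on two dnsmasq options: `dhcp-host` declares a known MAC, and `dhcp-ignore=tag:!known` ignores every client without a `dhcp-host` entry.

```sh
# render_allowlist: read one MAC per line on stdin and print dnsmasq
# configuration that serves only those machines.
render_allowlist() {
    echo '# Ignore any client that has no dhcp-host entry below'
    echo 'dhcp-ignore=tag:!known'
    while read -r mac; do
        [ -n "$mac" ] || continue
        echo "dhcp-host=$mac,set:provision"
    done
}

printf '%s\n' aa:bb:cc:dd:ee:01 aa:bb:cc:dd:ee:02 | render_allowlist
```

Writing the output to a file under `/etc/dnsmasq.d/` and removing a MAC once its install has finished means a stray PXE boot from an installed server gets no answer at all.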
Question 3
Compare MAAS and Tinkerbell. When would you choose each?
Answer
MAAS:
- Full lifecycle management with Web UI
- Ubuntu-centric (Ubuntu is the best-supported target; CentOS, RHEL, and custom images can also be deployed)
- Includes networking, DNS, DHCP, storage management
- PostgreSQL backend, monolithic architecture
- Best for: organizations standardizing on Ubuntu, teams that want a GUI, environments where MAAS manages the full network stack
Tinkerbell:
- Kubernetes-native (CRD-based, declarative)
- OS-agnostic (streams raw disk images)
- Lightweight, microservices architecture
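- Integrates with Cluster API through the Cluster API Provider Tinkerbell (CAPT); Sidero and Metal3 are separate bare metal projects with their own Cluster API providers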
- Best for: GitOps-driven environments, teams already running Kubernetes, integration with Cluster API for declarative cluster lifecycle
Decision guide:
- If you want a “cloud-like” bare metal experience with a UI → MAAS
- If you want declarative, Kubernetes-native provisioning → Tinkerbell
- If you are using Cluster API for cluster lifecycle → Tinkerbell with its Cluster API provider (CAPT)
Question 4
Your organization requires Secure Boot for all production servers. How does this affect PXE provisioning?
Answer
With Secure Boot enabled, the UEFI firmware verifies the cryptographic signature of every boot component. Only signed code can execute:
- Boot loader must be signed: Use `shimx64.efi` (signed by Microsoft’s UEFI CA), which then loads `grubx64.efi` (signed by your distribution’s key).
- Kernel must be signed: Ubuntu and RHEL ship signed kernels. Custom kernels must be signed with a key that has been enrolled as a Machine Owner Key (MOK).
- initrd is not signed, and in the standard shim → GRUB → kernel chain it is not verified by Secure Boot, so serve it only from a trusted provisioning server.
- TFTP path changes: Instead of `pxelinux.0` (unsigned BIOS loader), use `shimx64.efi` → `grubx64.efi` → signed kernel.
Impact on provisioning:
- Cannot boot unsigned custom images
- Must use distribution-provided boot chain
- Custom kernels require MOK enrollment (`mokutil --import` queues the request, but it must be confirmed at the console on the next boot, so plan for a manual step per server)
- Increases security but adds complexity to the boot chain
Talos Linux and Flatcar (covered in Module 2.3) both provide Secure Boot-compatible images.
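The boot chain described above can be turned into a quick preflight check of the TFTP root before a node is powered on. In this minimal sketch, `check_tftp_root` is a hypothetical helper and the file names are the ones used in this module. It only tests that the files exist, not that their signatures are valid.

```sh
# check_tftp_root DIR MODE
#   MODE "secureboot": shimx64.efi -> grubx64.efi -> kernel
#   MODE "bios":       pxelinux.0 -> kernel
# Prints each missing file and returns non-zero if any is absent.
check_tftp_root() (
    case "$2" in
        secureboot) files="shimx64.efi grubx64.efi vmlinuz initrd" ;;
        bios)       files="pxelinux.0 vmlinuz initrd" ;;
        *)          echo "unknown mode: $2" >&2; exit 2 ;;
    esac
    status=0
    for f in $files; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    exit "$status"
)

# Demo against a throwaway directory that lacks the shim.
demo_root=$(mktemp -d)
touch "$demo_root/grubx64.efi" "$demo_root/vmlinuz" "$demo_root/initrd"
check_tftp_root "$demo_root" secureboot || echo "TFTP root is not ready"
```

The function body runs in a subshell, so its `exit` only sets the function's return status and does not end the calling script.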
Hands-On Exercise: PXE Boot a Virtual Machine
Task: Set up a minimal PXE server and boot a VM from it.
Note: This exercise uses QEMU/KVM to simulate a PXE-booting bare metal server. No physical hardware required.
- Create the PXE server (Ubuntu host):
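
      # Install dependencies (pxelinux and syslinux-common provide the BIOS boot loader)
      sudo apt-get install -y dnsmasq nginx qemu-kvm pxelinux syslinux-common

      # Create TFTP directory
      sudo mkdir -p /srv/tftp

      # Copy the BIOS PXE boot loader and the core module it loads
      sudo cp /usr/lib/PXELINUX/pxelinux.0 /srv/tftp/
      sudo cp /usr/lib/syslinux/modules/bios/ldlinux.c32 /srv/tftp/

      # Download Ubuntu 22.04 Live Server ISO and extract boot files
      # Note: Canonical removed debian-installer netboot images after 20.04.
      # Extract vmlinuz and initrd from the ISO's casper directory instead.
      # (ISO file names on releases.ubuntu.com include the point release; adjust if needed.)
      wget https://releases.ubuntu.com/22.04/ubuntu-22.04-live-server-amd64.iso
      sudo mkdir -p /mnt/iso
      sudo mount -o loop ubuntu-22.04-live-server-amd64.iso /mnt/iso
      sudo cp /mnt/iso/casper/vmlinuz /srv/tftp/vmlinuz
      sudo cp /mnt/iso/casper/initrd /srv/tftp/initrd
      sudo umount /mnt/iso

- Configure dnsmasq for PXE: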
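
      # Assumes the bridge virbr1 already exists on the host
      sudo tee /etc/dnsmasq.d/pxe.conf > /dev/null << 'EOF'
      interface=virbr1
      dhcp-range=192.168.100.50,192.168.100.100,255.255.255.0,1h
      dhcp-boot=pxelinux.0
      enable-tftp
      tftp-root=/srv/tftp
      EOF

      # Serve the ISO over HTTP so the live installer can fetch its root filesystem
      # (/var/www/html is the default nginx document root on Ubuntu)
      sudo cp ubuntu-22.04-live-server-amd64.iso /var/www/html/

      # Create PXE menu
      # 192.168.100.1 is an assumed address for the host side of virbr1; adjust to yours
      sudo mkdir -p /srv/tftp/pxelinux.cfg
      sudo tee /srv/tftp/pxelinux.cfg/default > /dev/null << 'EOF'
      DEFAULT install
      LABEL install
        KERNEL vmlinuz
        APPEND initrd=initrd ip=dhcp url=http://192.168.100.1/ubuntu-22.04-live-server-amd64.iso
      EOF

      sudo systemctl restart dnsmasq

- Boot a VM via PXE: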

      # Create a disk for the VM
      qemu-img create -f qcow2 /tmp/pxe-test.qcow2 20G

      # Boot VM with PXE (network boot)
      # 4 GB of RAM leaves room for the live installer's RAM-based root filesystem
      qemu-system-x86_64 \
        -m 4096 \
        -boot n \
        -net nic \
        -net bridge,br=virbr1 \
        -drive file=/tmp/pxe-test.qcow2,format=qcow2 \
        -nographic

- Watch the PXE boot process: You should see DHCP discovery, TFTP download, and the Ubuntu installer starting.
Success Criteria
- dnsmasq running with DHCP + TFTP
- VM successfully PXE boots from network
- DHCP lease visible in dnsmasq logs
- TFTP transfer visible (boot loader download)
- OS installer starts automatically
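Two of the criteria above (DHCP lease and TFTP transfer) can be checked from the dnsmasq log. The sketch below assumes the log has been saved to a file, for example with `journalctl -u dnsmasq > dnsmasq.log`. `pxe_log_summary` is a hypothetical helper, and the sample lines are invented, modelled on the `DHCPACK(...)` and `sent ... to ...` messages dnsmasq writes.

```sh
# pxe_log_summary FILE: report whether a lease was acknowledged and whether
# the boot loader was fetched over TFTP.
pxe_log_summary() {
    if grep -q 'DHCPACK(' "$1"; then
        echo "dhcp: ok"
    else
        echo "dhcp: no lease"
    fi
    if grep -q 'dnsmasq-tftp.*sent .*pxelinux\.0' "$1"; then
        echo "tftp: ok"
    else
        echo "tftp: no transfer"
    fi
}

# Invented sample log for demonstration.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
dnsmasq-dhcp[812]: DHCPDISCOVER(virbr1) 52:54:00:12:34:56
dnsmasq-dhcp[812]: DHCPOFFER(virbr1) 192.168.100.50 52:54:00:12:34:56
dnsmasq-dhcp[812]: DHCPREQUEST(virbr1) 192.168.100.50 52:54:00:12:34:56
dnsmasq-dhcp[812]: DHCPACK(virbr1) 192.168.100.50 52:54:00:12:34:56
dnsmasq-tftp[812]: sent /srv/tftp/pxelinux.0 to 192.168.100.50
EOF
pxe_log_summary "$sample_log"
```

A "dhcp: ok" with "tftp: no transfer" usually points at the TFTP root or the boot file name rather than at the network.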
Next Module
Continue to Module 2.3: Immutable OS for Kubernetes to learn why Talos Linux and Flatcar Container Linux are better choices than traditional distributions for bare metal Kubernetes.