Module 2.2: OS Provisioning & PXE Boot
Complexity: Complex | Time: 60 minutes | Prerequisites: Module 2.1: Datacenter Fundamentals, Linux: Kernel Architecture
Why This Module Matters
In August 2012, Knight Capital Group deployed a new software update to their SMARS high-frequency trading servers. Because the deployment and configuration process involved manual steps instead of an automated, strictly declarative provisioning pipeline, a technician missed one of the eight core servers. The stale code on that single server went rogue, causing erratic trading behavior that resulted in a staggering loss of $460 million in just 45 minutes, a loss that forced an emergency recapitalization and, within months, the sale of the firm.
While Knight Capital’s failure manifested at the application layer, the root cause—manual, inconsistent server configuration—plagues infrastructure teams daily. When you purchase 20 bare-metal servers for a Kubernetes cluster, they arrive as completely blank hardware. On-premises, you must solve the fundamental bootstrapping problem: how do you install an operating system on 20 servers that have no OS, ensuring absolute consistency across the entire fleet?
You could walk to each server with a bootable USB stick. For three servers, this is annoying but workable. For 20 servers, it is a full day of repetitive, error-prone work. For 200 servers, it is impossible. Furthermore, every time you need to reprovision a node—after a hardware failure, a security incident, or a major Kubernetes version upgrade—you would need to perform this manual process again.
PXE (Preboot Execution Environment) solves this fundamental infrastructure challenge by booting servers directly over the network. The server’s Network Interface Card (NIC) downloads a boot image from a central provisioning server, automatically installs the operating system, applies declarative configurations, and prepares the machine to join your Kubernetes cluster—all without a single human touching the physical machine.
The Vending Machine Analogy
PXE operates like a vending machine for operating systems. The bare-metal server powers on (boots from the network), identifies itself via its MAC address, receives its specific order (DHCP offer plus a boot image), and receives its product (a fully installed OS). The vending machine (PXE server) can serve hundreds of customers simultaneously, whereas a human operator with a USB stick can only serve one customer at a time.
What You’ll Be Able to Do
After completing this module, you will be able to:
- Design a resilient PXE boot infrastructure using DHCP, TFTP, and UEFI HTTP Boot to automate bare-metal OS provisioning.
- Implement unattended installation configurations using cloud-init, autoinstall, or Ignition to bootstrap identical Kubernetes cluster nodes.
- Compare legacy BIOS and UEFI PXE bootloaders, evaluating the security benefits of modern bootloaders like iPXE and Secure Boot integrations.
- Evaluate bare-metal provisioning platforms such as MAAS, Metal3, and Tinkerbell for declarative lifecycle management.
- Diagnose common network boot failures by analyzing DHCP relay misconfigurations, TFTP timeouts, and firmware architecture mismatches.
Core Content 1: The Boot Sequence & PXE Fundamentals
The canonical PXE specification is version 2.1, published by Intel on September 20, 1999; no newer official PXE specification has been published. Despite its age, it remains the backbone of datacenter automation. The traditional PXE boot process relies heavily on a sequence of network protocols working in tandem.
Pause and predict: You have 20 new servers that just arrived in the datacenter. They have no operating system. You need them running Ubuntu with containerd by end of day. If you used USB sticks, how long would it take? What if a server fails next month and needs reprovisioning — how does PXE change that recovery time?
When a bare-metal server first powers on and attempts to network boot, it broadcasts a DHCP Discover packet. However, a standard DHCP Offer containing just an IP address and a default gateway is insufficient for PXE. The DHCP server must provide specific PXE-related options to guide the bare-metal hardware.
DHCP option 66 (TFTP Server Name) and option 67 (Boot File Name) are used by PXE clients to locate the TFTP server and download the Network Bootstrap Program (NBP). Furthermore, RFC 4578 defines PXE-specific DHCP options including option 93 (Client System Architecture Type), which is used to differentiate BIOS from UEFI clients. This allows the DHCP server to dynamically assign the correct bootloader based on the hardware requesting it.
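In dnsmasq syntax, that per-architecture selection looks roughly like the fragment below. This is an illustrative sketch: the tag names are arbitrary, the boot file names are the conventional ones, and the option 93 values come from the IANA processor-architecture registry seeded by RFC 4578.

```
# Option 93 (client-arch) identifies the requesting firmware:
#   0  = legacy BIOS x86
#   7  = UEFI x86-64
#   11 = UEFI ARM64
dhcp-match=set:bios,option:client-arch,0
dhcp-match=set:efi-x64,option:client-arch,7
dhcp-match=set:efi-arm64,option:client-arch,11

# Hand each architecture its matching Network Bootstrap Program
dhcp-boot=tag:bios,pxelinux.0
dhcp-boot=tag:efi-x64,grubx64.efi
dhcp-boot=tag:efi-arm64,grubaa64.efi
```

With this in place, a single DHCP service can boot a mixed fleet of BIOS and UEFI machines without manual intervention.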
Once the DHCP lease is acquired, the server must download the bootloader. TFTP is defined by RFC 1350 (published July 1992) and remains the primary transport protocol for the legacy PXE network boot file transfer stage.
The Network Boot Sequence
Legacy ASCII representation (preserved for historical reference):

```text
PXE BOOT SEQUENCE

1. Server powers on (via BMC or button)
   └── BIOS/UEFI starts POST (Power-On Self-Test)

2. BIOS tries boot devices in order:
   └── PXE Network Boot (configured in BIOS boot order)

3. NIC broadcasts DHCP Discover
   └── "I need an IP address and a boot file"

4. DHCP server responds with:
   ├── IP address (10.0.5.50)
   ├── Gateway, DNS
   └── Next-server: 10.0.5.1 (TFTP server)
       Filename: pxelinux.0 (boot loader)

5. NIC downloads boot loader via TFTP
   └── pxelinux.0 or grubx64.efi (for UEFI)

6. Boot loader downloads kernel + initrd
   └── vmlinuz + initrd.img via TFTP or HTTP

7. Kernel starts, runs installer (autoinstall/kickstart)
   └── Downloads packages from HTTP repo
   └── Partitions disks, installs OS
   └── Runs post-install scripts (join K8s cluster)

8. Server reboots into installed OS
   └── Ready for kubeadm join or Cluster API enrollment

Total time: 5-15 minutes per server (parallel)
```

Modern Architectural View:

```mermaid
sequenceDiagram
    participant S as Server (BIOS/UEFI)
    participant N as NIC
    participant D as DHCP Server
    participant T as TFTP/HTTP Server
    S->>S: 1. Power on & POST
    S->>N: 2. Try PXE Network Boot
    N->>D: 3. DHCP Discover (Need IP & boot file)
    D-->>N: 4. DHCP Offer (IP: 10.0.5.50, TFTP: 10.0.5.1, File: pxelinux.0)
    N->>T: 5. Download boot loader (TFTP/HTTP)
    T-->>N: pxelinux.0 or grubx64.efi
    N->>T: 6. Download kernel + initrd
    T-->>N: vmlinuz + initrd.img
    N->>S: 7. Start kernel & run installer
    S->>S: Download packages, partition disks
    S->>S: Run post-install scripts
    S->>S: 8. Reboot into installed OS
```

Essential PXE Server Components
To implement this flow, you need several interconnected services.
Legacy ASCII representation:

```text
PXE SERVER COMPONENTS

┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐
│ DHCP     │  │ TFTP     │  │ HTTP     │  │ Autoinstall│
│ Server   │  │ Server   │  │ Server   │  │ Config     │
│          │  │          │  │          │  │            │
│ Assigns  │  │ Serves   │  │ Serves   │  │ Answers    │
│ IPs +    │  │ boot     │  │ OS repo  │  │ all        │
│ boot     │  │ loader   │  │ packages │  │ installer  │
│ filename │  │ + kernel │  │          │  │ questions  │
└──────────┘  └──────────┘  └──────────┘  └────────────┘

Can be one server or split across multiple.
In practice: dnsmasq handles DHCP + TFTP in one process.
```

Modern Architectural View:

```mermaid
flowchart TD
    subgraph PXEServer[PXE SERVER COMPONENTS]
        direction LR
        A[DHCP Server<br/>Assigns IPs & boot filename]
        B[TFTP Server<br/>Serves boot loader & kernel]
        C[HTTP Server<br/>Serves OS repo packages]
        D[Autoinstall Config<br/>Answers installer questions]
    end
    note[Can be one server or split across multiple.<br/>In practice: dnsmasq handles DHCP + TFTP in one process.]
    PXEServer --- note
```

Quick PXE Server with dnsmasq
We use dnsmasq rather than separate DHCP and TFTP servers because it handles both protocols elegantly in a single lightweight process. This simplifies the PXE infrastructure to a single daemon that manages IP assignment and boot file delivery:
```shell
# Install dnsmasq (handles DHCP + TFTP)
sudo apt-get install -y dnsmasq

# Create directory structure
sudo mkdir -p /srv/tftp/pxelinux.cfg
sudo mkdir -p /srv/http/ubuntu

# Download Ubuntu 22.04 server ISO and extract
wget https://releases.ubuntu.com/22.04/ubuntu-22.04.5-live-server-amd64.iso
sudo mount -o loop ubuntu-22.04.5-live-server-amd64.iso /mnt
sudo cp -r /mnt/* /srv/http/ubuntu/
sudo umount /mnt

# Copy UEFI boot files
sudo cp /srv/http/ubuntu/casper/vmlinuz /srv/tftp/
sudo cp /srv/http/ubuntu/casper/initrd /srv/tftp/

# Configure dnsmasq
cat | sudo tee /etc/dnsmasq.d/pxe.conf << 'EOF'
# DHCP range for PXE clients
dhcp-range=10.0.5.50,10.0.5.150,255.255.255.0,1h

# PXE boot options
dhcp-boot=grubx64.efi
enable-tftp
tftp-root=/srv/tftp

# UEFI-specific boot
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,grubx64.efi
EOF

# NOTE: grubx64.efi itself must also be placed in /srv/tftp,
# e.g. copied from your distribution's signed GRUB packages.
sudo systemctl restart dnsmasq
```

Core Content 2: UEFI vs Legacy BIOS PXE
As physical servers evolved, the limitations of Legacy BIOS became apparent. Virtually all enterprise servers shipped since roughly 2015 support UEFI. Legacy BIOS is restricted to MBR partitioning (which has a 2TB limit) and lacks cryptographic verification mechanisms.
The current UEFI specification version is 2.11, released December 17, 2024. UEFI completely overhauls the boot process. One of the most significant changes is how network payloads are delivered. While older servers rely entirely on TFTP, modern UEFI firmware supports HTTP Boot, which entered the specification with UEFI 2.5 (2015) and is now a standard part of modern UEFI firmware. UEFI HTTP Boot allows the firmware to fetch OS images directly from an HTTP server, eliminating the need for TFTP entirely.
It is critical to note that many current firmware implementations of UEFI HTTP Boot support only plain HTTP, not HTTPS, and do not follow HTTP redirects (301/302). You must configure your web servers accordingly.
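Those firmware limitations translate directly into web server configuration: serve the boot assets over plain HTTP, from stable URLs, with no redirects. A minimal nginx sketch (the paths mirror the `/srv/http` layout used earlier; adapt to your environment):

```nginx
# /etc/nginx/conf.d/httpboot.conf (illustrative sketch)
server {
    listen 80;          # plain HTTP: much HTTP Boot firmware cannot do TLS
    server_name _;
    root /srv/http;     # /srv/http/ubuntu/... as populated earlier

    location / {
        autoindex off;
        # No rewrite or "return 301" here: most HTTP Boot firmware
        # will not follow redirects, so files must be served directly.
    }
}
```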
For bootloaders, PXELINUX (part of the Syslinux project) is the traditional legacy-BIOS NBP; however, Syslinux is effectively unmaintained, with no stable release since 6.03 in 2014. Modern infrastructure uses GRUB2, which supports network booting for both legacy BIOS systems (core.0) and UEFI systems (core.efi) via the grub-mknetdir command (grub2-mknetdir on RHEL-family distributions).
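After grub-mknetdir populates the TFTP root, a `grub.cfg` inside the generated tree drives the netboot menu. The entry below is an illustrative sketch only: the server address, file paths, and kernel arguments are assumptions you would adapt to your own layout.

```
# /srv/tftp/grub/grub.cfg (illustrative sketch)
set default=0
set timeout=5

menuentry "Install Ubuntu 22.04 (autoinstall)" {
    # Kernel and initrd come over TFTP from the netboot tree;
    # the installer payload and autoinstall answers come over HTTP.
    linux  /vmlinuz ip=dhcp url=http://10.0.5.1/ubuntu/ubuntu-22.04.5-live-server-amd64.iso autoinstall ds=nocloud-net\;s=http://10.0.5.1/autoinstall/
    initrd /initrd
}
```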
| Aspect | Legacy BIOS PXE | UEFI PXE |
|---|---|---|
| Boot loader | pxelinux.0 (SYSLINUX) | grubx64.efi or shimx64.efi |
| Protocol | TFTP only | TFTP or HTTP (faster) |
| Secure Boot | Not supported | Supported (recommended) |
| Disk support | MBR (2TB limit) | GPT (no size limit) |
| Status | Legacy, being phased out | Current standard |
War Story: The Legacy Transfer Bottleneck In a massive cluster deployment, an engineering team attempted to PXE boot 500 servers simultaneously. Because they were using legacy TFTP, which operates on a strict stop-and-wait mechanism (acknowledging every 512-byte block before sending the next), the network saturated with acknowledgments. Transferring a single 1 GB boot image took hours. The immediate mitigation was implementing the TFTP windowsize option (RFC 7440), which was published in January 2015 and substantially improves PXE boot transfer performance by allowing multiple data packets in flight. Eventually, they migrated to UEFI HTTP Boot to leverage TCP’s native windowing, dropping the boot time to under two minutes.
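The arithmetic behind that war story is easy to reproduce. The numbers below are assumptions for illustration (a 1 GiB payload, 512-byte blocks, a 0.5 ms round trip), but they show why paying one round trip per block is so costly even before 500 concurrent clients congest the network:

```shell
#!/bin/sh
# Stop-and-wait TFTP pays one ACK round trip per 512-byte block.
SIZE=$((1024 * 1024 * 1024))   # 1 GiB payload (assumed)
BLOCK=512                      # classic TFTP block size
RTT_US=500                     # 0.5 ms round trip (assumed)

TRIPS=$((SIZE / BLOCK))
echo "round trips:        $TRIPS"
echo "stop-and-wait (s):  $((TRIPS * RTT_US / 1000000))"
# RFC 7440 windowsize=64: one ACK per 64 blocks in flight
echo "windowsize=64 (s):  $((TRIPS / 64 * RTT_US / 1000000))"
```

On these assumptions a single stop-and-wait transfer costs over 17 minutes; windowing collapses it to seconds, and HTTP over TCP does better still.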
Core Content 3: Modern Bootloaders: iPXE and netboot.xyz
Standard firmware PXE implementations are often limited in scope. To gain advanced scripting, protocol support, and cryptographic verification, administrators often chain-load into iPXE.
The latest stable iPXE release is v2.0.0, released March 6, 2026. This is a monumental release because iPXE v2.0.0 is the first iPXE release to include official UEFI Secure Boot support via a dedicated shim. This allows you to construct a fully trusted, end-to-end verified boot chain without disabling Secure Boot in the physical server’s BIOS.
Another excellent tool built on top of iPXE is netboot.xyz. netboot.xyz is an iPXE-based multi-OS network bootloader; its latest release is version 3.0.0 (released January 24, 2026). It provides a dynamic menu system to pull down hundreds of different operating system images directly from the internet, making it fantastic for lab environments.
Core Content 4: Unattended Installation Formats
Once the kernel boots, the operating system installer takes over. Without intervention, it would halt and wait for a human to select a language, partition disks, and set a password. We bypass this using unattended installation files.
- Ubuntu Autoinstall: Ubuntu adopted the autoinstall (Subiquity) YAML format for unattended installs starting with Ubuntu Server 20.04 and Ubuntu Desktop 23.04. Autoinstall uses a YAML configuration file passed via a cloud-init datasource or kernel parameter (e.g., `autoinstall ds=nocloud`).
- Debian Preseed: Debian uses the preseed format (debian-installer/d-i) for unattended installation; preseed is actively maintained and not deprecated for Debian as of 2026.
- RHEL/Fedora Kickstart: RHEL and Fedora use the Kickstart format for unattended installation, handled by the pykickstart library and invoked through the Anaconda installer.
- Ignition: Ignition is the first-boot provisioning format for Flatcar Container Linux and Fedora CoreOS; the current specification is version 3.3.0. The older Ignition spec v2.x (last version 2.3.0) is no longer actively developed but remains supported in existing Flatcar and RHCOS deployments.
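To make the contrast with the YAML-based installers concrete, here is a minimal Ignition v3 sketch; the SSH key and hostname are placeholders. Ignition is JSON, runs exactly once on first boot, and is typically generated from Butane YAML rather than written by hand:

```json
{
  "ignition": { "version": "3.3.0" },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": ["ssh-ed25519 AAAA... admin@example"]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "path": "/etc/hostname",
        "mode": 420,
        "contents": { "source": "data:,k8s-node-01" }
      }
    ]
  }
}
```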
These tools heavily integrate with cloud-init, which is the industry standard for cloud and bare-metal instance initialization; its latest stable release is 26.1, released February 27, 2026.
Ubuntu Autoinstall Configuration Example
The following configuration answers every prompt the Ubuntu installer would normally ask interactively:
```yaml
#cloud-config
autoinstall:
  version: 1
  locale: en_US.UTF-8
  keyboard:
    layout: us

  # Network: DHCP on first interface
  network:
    version: 2
    ethernets:
      id0:
        match:
          name: en*
        dhcp4: true

  # Storage: entire first disk
  storage:
    layout:
      name: lvm
      sizing-policy: all

  # Users
  identity:
    hostname: k8s-node
    username: kubedojo
    # password: "changeme" (hashed)
    password: "$6$rounds=4096$xyz$..."

  # SSH
  ssh:
    install-server: true
    authorized-keys:
      - ssh-ed25519 AAAA... admin@kubedojo

  # Packages for Kubernetes
  packages:
    - containerd
    - apt-transport-https
    - curl

  # Post-install: prepare for K8s
  late-commands:
    - |
      cat > /target/etc/modules-load.d/k8s.conf << 'MODULES'
      overlay
      br_netfilter
      MODULES
    - |
      cat > /target/etc/sysctl.d/k8s.conf << 'SYSCTL'
      net.bridge.bridge-nf-call-iptables = 1
      net.bridge.bridge-nf-call-ip6tables = 1
      net.ipv4.ip_forward = 1
      SYSCTL
    # Disable swap
    - curtin in-target -- swapoff -a
    - curtin in-target -- sed -i '/swap/d' /etc/fstab
```

Core Content 5: Bare-Metal Provisioning Platforms
Managing DHCP, TFTP, and configuration files manually scales poorly. Enterprise environments utilize orchestrators that act as the control plane for bare metal. These tools often integrate closely with Baseboard Management Controllers (BMCs) using modern API standards. For example, the DMTF Redfish standard for out-of-band hardware management (BMC API) has its latest protocol specification at DSP0266 version 1.23.1 (December 4, 2025), which allows orchestrators to power cycle nodes, mount virtual media, and configure BIOS settings programmatically.
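As a concrete illustration of what "programmatic" means here, the sketch below builds the standard Redfish boot-override payload and shows (commented out) how an orchestrator might PATCH it to a BMC. The BMC address and credentials are hypothetical placeholders:

```shell
#!/bin/sh
# One-shot PXE boot override, per the Redfish ComputerSystem schema.
PAYLOAD='{"Boot":{"BootSourceOverrideEnabled":"Once","BootSourceOverrideTarget":"Pxe"}}'

# Sanity-check the JSON before sending it anywhere
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Hypothetical BMC endpooint and credentials; adjust for your fleet:
# curl -ks -u admin:password \
#   -H 'Content-Type: application/json' \
#   -X PATCH -d "$PAYLOAD" \
#   https://10.0.5.200/redfish/v1/Systems/1
```

With `BootSourceOverrideEnabled` set to `Once`, the server PXE boots exactly one time and then reverts to its normal boot order, which is precisely what a reprovisioning workflow wants.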
MAAS (Metal as a Service)
Canonical MAAS (Metal as a Service) is an active bare-metal lifecycle management tool; its latest stable release is version 3.7 (released February 13, 2026). It treats physical servers like cloud instances. It was originally built for Canonical's own infrastructure and later open-sourced.
Legacy ASCII representation:

```text
MAAS ARCHITECTURE

┌──────────────────────────────────────┐
│ MAAS Region Controller               │
│   ┌──────────┐      ┌──────────┐     │
│   │ REST API │      │ Web UI   │     │
│   └──────────┘      └──────────┘     │
│   ┌──────────┐      ┌──────────┐     │
│   │PostgreSQL│      │ Image    │     │
│   │ (state)  │      │ Store    │     │
│   └──────────┘      └──────────┘     │
└───────────────────┬──────────────────┘
                    │
┌───────────────────▼──────────────────┐
│ MAAS Rack Controller                 │
│   ┌──────┐  ┌──────┐  ┌──────┐       │
│   │ DHCP │  │ TFTP │  │ HTTP │       │
│   └──────┘  └──────┘  └──────┘       │
│   ┌──────┐  ┌──────┐                 │
│   │ DNS  │  │ Proxy│                 │
│   └──────┘  └──────┘                 │
└───────────────────┬──────────────────┘
                    │
 ┌───────┐  ┌───────┐  ┌───────┐  ┌───────┐
 │Server │  │Server │  │Server │  │Server │
 │  01   │  │  02   │  │  03   │  │  04   │
 └───────┘  └───────┘  └───────┘  └───────┘

Machine States:
  New → Commissioning → Ready → Deploying → Deployed
                          ↑                     │
                          └──── Releasing ◄─────┘
```

Modern Architectural View:

```mermaid
flowchart TD
    subgraph Region[MAAS Region Controller]
        RAPI[REST API]
        RUI[Web UI]
        RPDB[(PostgreSQL\nstate)]
        RIMG[(Image Store)]
    end
    subgraph Rack[MAAS Rack Controller]
        RDHCP[DHCP]
        RTFTP[TFTP]
        RHTTP[HTTP]
        RDNS[DNS]
        RPRX[Proxy]
    end
    Region --> Rack
    Rack --> S1[Server 01]
    Rack --> S2[Server 02]
    Rack --> S3[Server 03]
    Rack --> S4[Server 04]

    subgraph States[Machine States]
        direction LR
        N[New] --> C[Commissioning] --> R[Ready] --> D[Deploying] --> DP[Deployed]
        DP --> RL[Releasing] --> R
    end
```

| Feature | Description |
|---|---|
| Discovery | Automatically detects new servers via DHCP |
| Commissioning | Inventories hardware (CPU, RAM, disks, NICs) |
| Deployment | Installs Ubuntu, CentOS, RHEL, or custom images |
| Networking | Manages VLANs, bonds, bridges, DNS |
| Storage | LVM, RAID, bcache configuration |
| API | Full REST API for automation |
| Juju integration | Deploy applications via Juju charms |
MAAS is incredibly easy to bootstrap on an Ubuntu host:
```shell
# Install MAAS (snap)
sudo snap install maas --channel=3.7

# Initialize
sudo maas init region+rack \
  --database-uri "postgres://maas:password@localhost/maas"

# Create admin user
sudo maas createadmin \
  --username admin \
  --password secure-password \
  --email admin@kubedojo.local

# Access web UI: http://localhost:5240/MAAS/
```

Tinkerbell, Metal3, and Matchbox
While MAAS relies heavily on a monolithic Postgres database and Ubuntu-centric tooling, the Cloud Native Computing Foundation (CNCF) hosts several alternative orchestrators.
Metal3 (Metal Cubed) is a CNCF Incubating project for Kubernetes-native bare-metal provisioning; it moved from Sandbox to Incubating on August 27, 2025. It integrates tightly with the Cluster API framework.
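In practice, Metal3 models each physical machine as a `BareMetalHost` custom resource that its operator reconciles against the BMC. A sketch with placeholder names, addresses, and image URLs:

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-02
spec:
  online: true
  bootMACAddress: "aa:bb:cc:dd:ee:02"
  bmc:
    address: redfish://10.0.5.201/redfish/v1/Systems/1
    credentialsName: worker-02-bmc-secret   # Secret holding BMC username/password
  image:
    url: http://10.0.5.1/images/ubuntu-22.04.raw.gz
    checksum: http://10.0.5.1/images/ubuntu-22.04.raw.gz.sha256sum
```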
Tinkerbell is a CNCF Sandbox bare-metal provisioning project originally donated by Equinix Metal (formerly Packet) to manage their bare metal cloud fleet. It remains at Sandbox maturity as of April 2026 and is used by Spectro Cloud, Platform9, and other bare metal Kubernetes providers. The project has undergone significant architectural shifts; its tink (workflow engine) repository was archived by the project on December 2, 2025 and is now read-only.
For immutable operating systems, Matchbox (poseidon/matchbox) is an active bare-metal provisioner that matches machines by MAC/UUID and serves iPXE scripts and Ignition configs specifically designed for Flatcar and Fedora CoreOS clusters.
Pause and predict: Tinkerbell defines provisioning steps as container actions. How does this differ from a traditional kickstart/autoinstall approach? What advantage does containerized provisioning give you for reproducibility and testing?
Legacy ASCII representation (Tinkerbell):
```text
TINKERBELL ARCHITECTURE

┌──────────────────────────────────────┐
│ Tinkerbell Stack                     │
│                                      │
│   ┌──────────┐   Workflow engine     │
│   │  Tink    │   Defines provisioning│
│   │  Server  │   steps as containers │
│   └──────────┘                       │
│   ┌──────────┐   DHCP + PXE + OSIE   │
│   │  Boots   │   Handles network boot│
│   └──────────┘                       │
│   ┌──────────┐                       │
│   │  Hegel   │   Metadata service    │
│   └──────────┘   (cloud metadata)    │
└──────────────────────────────────────┘

Provisioning defined as Kubernetes CRDs:
- Hardware: describes physical machine
- Template: defines provisioning steps
- Workflow: links Hardware to Template
```

Modern Architectural View:

```mermaid
flowchart TD
    subgraph Stack[Tinkerbell Stack]
        TINK[Tink Server\nWorkflow engine\nDefines steps as containers]
        BOOTS[Boots\nDHCP + PXE + OSIE\nHandles network boot]
        HEGEL[Hegel\nMetadata service\nLike cloud instance metadata]
    end

    subgraph CRDs[Provisioning defined as Kubernetes CRDs]
        H[Hardware: describes physical machine]
        T[Template: defines provisioning steps]
        W[Workflow: links Hardware to Template]
        H --- W
        T --- W
    end
    Stack --- CRDs
```

The Tinkerbell workflow below shows the declarative approach to provisioning. Each action is an OCI container image that performs one specific step—streaming the OS image, writing configuration files, or setting up networking.
```yaml
# Hardware definition (like a cloud instance profile)
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: worker-01
spec:
  disks:
    - device: /dev/sda
  metadata:
    facility:
      plan_slug: "c3.small.x86"
    instance:
      hostname: k8s-worker-01
      operating_system:
        slug: ubuntu_22_04
  interfaces:
    - dhcp:
        mac: "aa:bb:cc:dd:ee:01"
        ip:
          address: 10.0.5.51
          netmask: 255.255.255.0
          gateway: 10.0.5.1
---
# Template (provisioning steps as container actions)
apiVersion: tinkerbell.org/v1alpha1
kind: Template
metadata:
  name: ubuntu-k8s
spec:
  data: |
    version: "3.0"
    name: ubuntu-k8s-install
    global_timeout: 1800
    tasks:
      - name: os-installation
        worker: "{{.device_1}}"
        actions:
          - name: stream-ubuntu-image
            image: quay.io/tinkerbell-actions/image2disk:v3.0.0
            timeout: 600
            environment:
              IMG_URL: http://10.0.5.1/images/ubuntu-22.04.raw.gz
              DEST_DISK: /dev/sda
              COMPRESSED: true
          - name: install-containerd
            image: quay.io/tinkerbell-actions/writefile:v3.0.0
            timeout: 90
            environment:
              DEST_DISK: /dev/sda1
              DEST_PATH: /etc/modules-load.d/k8s.conf
              CONTENTS: |
                overlay
                br_netfilter
          - name: configure-network
            image: quay.io/tinkerbell-actions/writefile:v3.0.0
            timeout: 90
            environment:
              DEST_DISK: /dev/sda1
              DEST_PATH: /etc/netplan/01-netcfg.yaml
              CONTENTS: |
                network:
                  version: 2
                  ethernets:
                    eno1:
                      addresses: [10.0.5.51/24]
                      gateway4: 10.0.5.1
                      nameservers:
                        addresses: [10.0.5.1]
---
# Workflow (connects hardware to template)
apiVersion: tinkerbell.org/v1alpha1
kind: Workflow
metadata:
  name: provision-worker-01
spec:
  templateRef: ubuntu-k8s
  hardwareRef: worker-01
```

Hands-On Exercise: PXE Boot a Virtual Machine
Task: Set up a minimal PXE server and boot a virtual machine from it entirely over the network.
Note: This exercise uses QEMU/KVM to simulate a PXE-booting bare metal server. No physical hardware is required to complete this task.
Step 1: Install Required Dependencies and Prepare Directory Structures
First, we need to install our core networking and virtualization tools. As in the earlier example, dnsmasq handles both DHCP and TFTP in a single lightweight process, so one daemon covers IP assignment and boot file delivery.
```shell
# Install dependencies
sudo apt-get install -y dnsmasq nginx qemu-kvm

# Create TFTP directory
sudo mkdir -p /srv/tftp

# Download Ubuntu 22.04 Live Server ISO and extract boot files
# Note: Canonical removed debian-installer netboot images after 20.04.
# Extract vmlinuz and initrd from the ISO's casper directory instead.
wget https://releases.ubuntu.com/22.04/ubuntu-22.04.5-live-server-amd64.iso
sudo mkdir -p /mnt/iso
sudo mount -o loop ubuntu-22.04.5-live-server-amd64.iso /mnt/iso
sudo cp /mnt/iso/casper/vmlinuz /srv/tftp/vmlinuz
sudo cp /mnt/iso/casper/initrd /srv/tftp/initrd
sudo umount /mnt/iso

# Install and copy pxelinux bootloader files
sudo apt-get install -y pxelinux syslinux-common
sudo cp /usr/lib/PXELINUX/pxelinux.0 /srv/tftp/
sudo cp /usr/lib/syslinux/modules/bios/ldlinux.c32 /srv/tftp/

# Create bridge interface for QEMU and assign IP for dnsmasq
sudo ip link add name virbr1 type bridge
sudo ip addr add 192.168.100.1/24 dev virbr1
sudo ip link set dev virbr1 up

# Allow QEMU to use the bridge
sudo mkdir -p /etc/qemu
echo "allow virbr1" | sudo tee -a /etc/qemu/bridge.conf
```

Checkpoint Verification: Verify that the `virbr1` bridge interface is active and has the correct IP address assigned before continuing:

```shell
ip addr show dev virbr1
```

Step 2: Configure the dnsmasq Service for Network Booting
Stop and think: The dnsmasq configuration below responds to any DHCP request on the PXE network with a boot image. What would happen if a production server accidentally rebooted with PXE as its first boot device? How would you prevent this?
```shell
# Configure dnsmasq
cat | sudo tee /etc/dnsmasq.d/pxe.conf << 'EOF'
interface=virbr1
dhcp-range=192.168.100.50,192.168.100.100,255.255.255.0,1h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp
EOF

# Create PXE menu
sudo mkdir -p /srv/tftp/pxelinux.cfg
cat | sudo tee /srv/tftp/pxelinux.cfg/default << 'EOF'
DEFAULT install
LABEL install
  KERNEL vmlinuz
  APPEND initrd=initrd
EOF

sudo systemctl restart dnsmasq
```

Checkpoint Verification: Ensure that the dnsmasq service is active and listening before attempting to boot the VM:

```shell
sudo systemctl status dnsmasq --no-pager
```

Step 3: Execute the Virtual Machine Network Boot
With the backend services running, we will instruct QEMU to launch a virtual machine without any local ISO attached, forcing it to fall back to a network boot.
```shell
# Create a disk for the VM
qemu-img create -f qcow2 /tmp/pxe-test.qcow2 20G

# Boot VM with PXE (network boot)
sudo qemu-system-x86_64 \
  -m 2048 \
  -boot n \
  -net nic \
  -net bridge,br=virbr1 \
  -drive file=/tmp/pxe-test.qcow2,format=qcow2 \
  -nographic
```

Step 4: Watch the PXE boot process
You should see DHCP discovery, TFTP download, and the Ubuntu installer starting.
Success Criteria Checklist
- The `dnsmasq` service is actively running and listening on both DHCP and TFTP ports.
- The QEMU virtual machine successfully initiates a PXE boot sequence over the bridge interface.
- A new DHCP lease is visibly assigned and logged within the `/var/lib/misc/dnsmasq.leases` file.
- The TFTP transfer of the `pxelinux.0` (or EFI) boot loader completes successfully.
- The target operating system installer initiates its automated routine over the network.
Did You Know?
- The canonical PXE specification is version 2.1, published by Intel on September 20, 1999 as part of the Wired for Management specification; remarkably, no newer official PXE spec has ever been published since. The protocol has barely changed — it still uses TFTP, a protocol from 1981.
- The latest stable iPXE release is v2.0.0, released on March 6, 2026, marking the first time the bootloader included official UEFI Secure Boot support via a dedicated shim.
- The TFTP windowsize option (RFC 7440) was published in January 2015 and substantially improves PXE boot transfer performance by allowing multiple data packets in flight over the network.
- Canonical MAAS (Metal as a Service) is an active bare-metal lifecycle management tool; its latest stable release is version 3.7, released on February 13, 2026. MAAS manages over 1 million machines worldwide according to Canonical, with the largest known deployment managing 30,000+ servers.
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| PXE on production VLAN | Any new server auto-installs, destroying data | Isolate PXE DHCP to a dedicated provisioning VLAN |
| No DHCP relay | Servers in other VLANs cannot PXE boot | Configure DHCP relay (ip helper-address) on switches |
| TFTP through firewall | TFTP replies come from ephemeral UDP ports; stateless firewalls block them | Use HTTP Boot or load the TFTP connection-tracking helper (nf_conntrack_tftp) |
| Static autoinstall | Every server gets identical config | Template with MAC-based customization (hostname, IP, role) |
| No post-install validation | Server installs but has wrong config | Add verification step (check containerd, sysctl, hostname) |
| Skipping Secure Boot | Anyone can PXE boot malicious images | Enable UEFI Secure Boot with signed boot chain |
| Manual BIOS config | Each server has different BIOS settings | Use Redfish API to configure BIOS programmatically |
| Not backing up PXE server | PXE server dies = cannot provision new nodes | Replicate PXE config via Git; have a standby PXE server |
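For the TFTP-through-firewall mistake specifically, the usual Linux-side fix is the TFTP connection-tracking helper, which classifies the server's reply from an ephemeral port as RELATED to the original request on udp/69. The interface name below is illustrative:

```shell
# Load the TFTP conntrack helper
sudo modprobe nf_conntrack_tftp

# On newer kernels automatic helper assignment is disabled by default;
# explicitly attach the helper to inbound TFTP requests:
sudo iptables -t raw -A PREROUTING -p udp --dport 69 -j CT --helper tftp

# Allow the initial request plus the helper-tracked reply flows
sudo iptables -A INPUT -i eth0 -p udp --dport 69 -j ACCEPT
sudo iptables -A INPUT -i eth0 -m conntrack \
  --ctstate RELATED,ESTABLISHED -j ACCEPT
```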
Question 1
A newly racked physical server is attempting to PXE boot but inexplicably hangs at “Waiting for DHCP…” for 60 seconds before failing. What are the most likely causes?
Answer
In order of likelihood:
1. DHCP server not running or not configured for the PXE VLAN. Check `systemctl status dnsmasq` and verify the DHCP range covers the server’s subnet.
2. VLAN mismatch: The server’s port is in a different VLAN than the DHCP server. Check the switch port VLAN assignment and ensure DHCP relay is configured if they are on different VLANs.
3. Firewall blocking DHCP: UDP ports 67/68 must be open between the server and the DHCP server.
4. DHCP range exhausted: All IPs in the range are leased. Check `cat /var/lib/misc/dnsmasq.leases` for active leases.
5. NIC boot order: The server is trying to PXE boot from the wrong NIC (e.g., the BMC NIC instead of the production NIC). Check BIOS boot order.
Debug:
```shell
# On the DHCP server, watch for DHCP requests:
tcpdump -i eth0 -n port 67 or port 68
# You should see DHCPDISCOVER from the server's MAC
```

Question 2
Why is it a critical security and operational requirement to restrict PXE provisioning to a dedicated VLAN rather than exposing it on the production network?
Answer
Safety and security:
- Accidental reimaging: If a production server reboots and its BIOS has PXE as the first boot device, it will DHCP discover and potentially start a fresh OS install — wiping the existing OS and all data. A dedicated provisioning VLAN prevents this because production servers are not on the PXE VLAN.
- DHCP conflicts: PXE requires a DHCP server with `next-server` and `filename` options. Running this on the production VLAN risks conflicting with the production DHCP server, causing IP assignment issues for existing infrastructure.
- Attack surface: The PXE server can push arbitrary OS images to any server that PXE boots. An attacker on the PXE VLAN could install a compromised OS on any server that reboots with PXE enabled.
Best practice:
- Provisioning VLAN (e.g., VLAN 100) — only new/reprovisioning servers
- Production VLAN (e.g., VLAN 200) — running K8s nodes
- After OS installation, the server’s network config switches to the production VLAN
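A complementary dnsmasq-side guard against accidental reimaging is to answer only machines you have explicitly enrolled. The MAC addresses below are placeholders; `dhcp-ignore=tag:!known` tells dnsmasq to ignore DHCP requests from any client without a matching `dhcp-host` entry:

```
# Only lease (and therefore boot) machines explicitly listed here
dhcp-host=aa:bb:cc:dd:ee:01,192.168.100.51
dhcp-host=aa:bb:cc:dd:ee:02,192.168.100.52

# Ignore DHCP requests from every unknown MAC
dhcp-ignore=tag:!known
```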
Question 3
Your infrastructure team is evaluating bare-metal orchestrators. Compare the design philosophies of MAAS and Tinkerbell. When would you strategically choose each?
Answer
MAAS:
- Full lifecycle management with Web UI
- Ubuntu-centric (best for Ubuntu/RHEL)
- Includes networking, DNS, DHCP, storage management
- PostgreSQL backend, monolithic architecture
- Best for: organizations standardizing on Ubuntu, teams that want a GUI, environments where MAAS manages the full network stack
Tinkerbell:
- Kubernetes-native (CRD-based, declarative)
- OS-agnostic (streams raw disk images)
- Lightweight, microservices architecture
- Integrates with Cluster API (Sidero/Metal3)
- Best for: GitOps-driven environments, teams already running Kubernetes, integration with Cluster API for declarative cluster lifecycle
Decision guide:
- If you want a “cloud-like” bare metal experience with a UI → MAAS
- If you want declarative, Kubernetes-native provisioning → Tinkerbell
- If you are using Cluster API for cluster lifecycle → Tinkerbell + Sidero
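To illustrate Tinkerbell's declarative, CRD-based model, here is a hedged sketch of a `Hardware` resource. The shape follows the `tinkerbell.org/v1alpha1` API, but exact field names vary between Tinkerbell releases, and the MAC and IP values are placeholders:

```yaml
apiVersion: tinkerbell.org/v1alpha1
kind: Hardware
metadata:
  name: node-01
spec:
  interfaces:
    - dhcp:
        mac: "aa:bb:cc:dd:ee:01"
        ip:
          address: 10.0.100.21
          netmask: 255.255.255.0
          gateway: 10.0.100.1
      netboot:
        allowPXE: true        # permit this machine to network boot
```

Because hardware inventory lives in Kubernetes objects like this, it can be stored in Git and reconciled like any other cluster resource, which is what makes Tinkerbell a natural fit for GitOps workflows.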
Question 4
Your organization strictly requires UEFI Secure Boot to be enabled for all production servers before they enter the cluster. How does this cryptographic requirement affect your PXE provisioning pipeline?
Answer
With Secure Boot enabled, the UEFI firmware cryptographically verifies the signature of every boot component. Only code signed by a trusted certificate authority can execute:
- Boot loader must be signed: Use `shimx64.efi` (signed by Microsoft’s UEFI CA), which then loads `grubx64.efi` (signed by your distribution’s key).
- Kernel must be signed: Ubuntu and RHEL ship signed kernels. Custom kernels must be enrolled in the Secure Boot database via a Machine Owner Key (MOK).
- initrd is not signature-verified by default: it is loaded after the signed kernel takes control, a known gap in the standard Secure Boot chain.
- TFTP path changes: Instead of `pxelinux.0` (an unsigned legacy BIOS loader), you must serve `shimx64.efi` → `grubx64.efi` → signed kernel.
Impact on provisioning:
- Cannot boot unsigned custom images.
- Must use the distribution-provided, cryptographically signed boot chain.
- Custom kernels require MOK enrollment, which adds manual intervention unless automated via BMC tooling or `mokutil`.
- This substantially increases security, but it also adds complexity to the boot chain.
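To make the signed chain concrete, here is a sketch of the GRUB side, assuming `shimx64.efi` is served as DHCP option 67 and loads `grubx64.efi`, which in turn reads this config. All paths and the install-data URL are placeholders:

```
# /srv/tftp/grub/grub.cfg — signed UEFI PXE chain:
# shimx64.efi (option 67) -> grubx64.efi -> signed kernel + initrd
menuentry "Ubuntu autoinstall" {
    # Some distributions use linuxefi/initrdefi instead of linux/initrd
    linux  /ubuntu/vmlinuz ip=dhcp autoinstall ds=nocloud-net;s=http://10.0.100.1/cloud-init/
    initrd /ubuntu/initrd
}
```

Under Secure Boot, GRUB refuses to load a kernel whose signature does not validate against the firmware database (or an enrolled MOK), so an unsigned custom `vmlinuz` here would fail at this step.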
For Talos Linux and Flatcar (covered in Module 2.3), both provide Secure Boot-compatible signed images out of the box.
Question 5
You are configuring a fleet of Fedora CoreOS servers for a Kubernetes cluster using the Matchbox provisioner. The servers successfully pull their initial binaries via iPXE, but the operating system completely fails to apply your declarative configurations upon the first boot. What is the most likely provisioning format mismatch?
Answer
You are likely attempting to use a traditional cloud-init or Kickstart format rather than Ignition, the first-boot provisioning format required by Flatcar Container Linux and Fedora CoreOS (current configs use the 3.x specification series, e.g., 3.3.0). Unlike cloud-init, which typically executes late in the boot process (after the network stack is up), Ignition runs inside the initramfs, before the real root filesystem is mounted. This architectural difference makes Ignition ideal for immutable operating systems.
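For reference, a minimal Ignition config sketch against the 3.3.0 spec, provisioning an SSH key for the default `core` user (the key itself is a truncated placeholder):

```json
{
  "ignition": { "version": "3.3.0" },
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": ["ssh-ed25519 AAAA... admin@example"]
      }
    ]
  }
}
```

Matchbox serves a file like this per machine group; handing the same machine a cloud-init YAML document instead would simply be ignored by Ignition, producing exactly the silent first-boot failure described above.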
Question 6
A major datacenter refresh replaces all legacy physical hardware with modern UEFI systems. Your existing deployment infrastructure relies heavily on pxelinux.0 and the TFTP protocol. Why will your provisioning pipelines suddenly fail, and what modern equivalents should you architect to resolve this?
Answer
Modern UEFI firmware will not execute legacy BIOS bootstrap programs. It requires a UEFI-compatible bootloader such as grubx64.efi or a Secure Boot shim (shimx64.efi) rather than the legacy pxelinux.0, which belongs to the now largely unmaintained Syslinux project. Relying solely on TFTP is also outdated and slow. Modern UEFI HTTP Boot (introduced in UEFI 2.5) allows the firmware to download the bootloader and kernel directly over HTTP (or HTTPS), which is significantly faster thanks to larger transfer sizes and TCP windowing. To remediate the failure, update DHCP option 67 to point to a valid EFI binary and move your file hosting from a TFTP root to a standard HTTP web server.
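With dnsmasq, the client architecture reported in DHCP option 93 can select the right loader, so legacy BIOS, UEFI PXE, and UEFI HTTP Boot clients can coexist during a migration. A hedged sketch (the server URL is a placeholder, and architecture codes follow the common x86 values):

```ini
# Classify clients by DHCP option 93 (client system architecture)
dhcp-match=set:bios,option:client-arch,0        # legacy BIOS PXE
dhcp-match=set:efi64,option:client-arch,7       # x86-64 UEFI PXE
dhcp-match=set:efi-http,option:client-arch,16   # x86-64 UEFI HTTP Boot

# Serve each class its matching bootloader (DHCP option 67)
dhcp-boot=tag:bios,pxelinux.0
dhcp-boot=tag:efi64,grubx64.efi
dhcp-option-force=tag:efi-http,60,HTTPClient    # HTTP Boot clients expect this vendor class
dhcp-boot=tag:efi-http,"http://10.0.100.1/boot/grubx64.efi"
```

Note that HTTP Boot clients receive a full URL as the boot filename, while PXE clients receive a bare filename fetched from the TFTP (or `next-server`) host.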
Next Module
Continue to Module 2.3: Immutable OS for Kubernetes to learn why Talos Linux and Flatcar Container Linux are fundamentally better architectural choices than traditional distributions for operating bare-metal Kubernetes clusters.