Module 5.4: MetalLB - Load Balancing for Bare-Metal Kubernetes
Цей контент ще не доступний вашою мовою.
Toolkit Track | Complexity:
[MEDIUM]| Time: ~35 minutes
The Dreaded “Pending” That Wouldn’t Go Away
Section titled “The Dreaded “Pending” That Wouldn’t Go Away”Three weeks. Three weeks of LoadBalancer services stuck in Pending.
[Week 1, Monday]@platform-team Just deployed our first on-prem Kubernetes cluster!@platform-team Moving the payment gateway from cloud to our datacenter.@dev-lead Sweet. I'll deploy the app. Same manifests as GKE, right?@dev-lead ...why is the Service stuck on <pending>?@platform-team Probably just needs a minute.
[Week 1, Friday]@dev-lead Still pending. I've restarted everything. Twice.@platform-team The YAML looks fine. It works on GKE.@dev-lead "Works on GKE" doesn't help me here.
[Week 2, Wednesday]@dev-lead I switched everything to NodePort. It's ugly but it works.@platform-team We now have 14 services on random high ports. The firewall team is going to love us.@firewall-team We do not love you.
[Week 3, Thursday]@new-hire Hey, has anyone heard of MetalLB?@new-hire It gives you LoadBalancer services on bare metal.@platform-team ...@dev-lead ...@platform-team Install it. Install it right now.
[Week 3, Thursday, 20 minutes later]@new-hire Done. All 14 services have external IPs.@dev-lead I'm buying you lunch for a month.That new hire saved the team from an architectural dead end. The problem was never Kubernetes—it was that nobody told them LoadBalancer services need an external implementation, and clouds just hide that fact from you.
What You’ll Learn:
- Why LoadBalancer services don’t work on bare metal out of the box
- How MetalLB fills the gap with Layer 2 and BGP modes
- Configuring IP address pools, advertisements, and failover
- When to choose L2 vs BGP (and why it matters)
- Integrating MetalLB with Ingress controllers and Gateway API
Prerequisites:
- Kubernetes Services (ClusterIP, NodePort, LoadBalancer) — see CKA Module 3.1: Services
- Basic networking concepts (IP addresses, ARP, TCP/IP)
- Helm basics for installation
- A kind or minikube cluster for the exercise
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Deploy MetalLB for bare-metal Kubernetes clusters with Layer 2 and BGP load balancing modes
- Configure MetalLB address pools and BGP peering for production bare-metal service exposure
- Implement IP address management strategies for multi-tenant bare-metal Kubernetes environments
- Evaluate MetalLB’s L2 versus BGP modes for different network topology requirements
Why This Module Matters
Section titled “Why This Module Matters”In the cloud, creating a LoadBalancer Service is magic. You write type: LoadBalancer, and within seconds, AWS gives you an ELB, GCP gives you a Network LB, Azure gives you a Load Balancer. Behind the scenes, a cloud controller manager talks to the provider API and provisions real infrastructure.
On bare metal, there is no cloud API. There is no magical load balancer fairy. When you create a LoadBalancer Service, Kubernetes dutifully sets the type and then waits. And waits. And waits. The EXTERNAL-IP column says <pending> forever, because nothing is listening for that request.
This is not a bug. It is by design. Kubernetes defines the interface (type: LoadBalancer), but the implementation is pluggable. Cloud providers ship their own. Bare metal ships nothing.
MetalLB is that missing piece. It watches for LoadBalancer Services, assigns IP addresses from a pool you define, and announces those IPs to the network so traffic can reach your cluster. It turns bare-metal Kubernetes into a first-class citizen for external traffic.
Did You Know?
MetalLB was created by Google engineer Dave Anderson in 2017 because he was frustrated that his home lab cluster couldn’t use LoadBalancer services. A personal itch became one of the most-used bare-metal Kubernetes tools in production.
As of v0.13, MetalLB uses custom resources (IPAddressPool, L2Advertisement, BGPAdvertisement) instead of a ConfigMap. Older tutorials still reference the ConfigMap approach—if you see
configin a ConfigMap, you are reading outdated documentation.Lightweight distributions like k3s and k0s ship with their own built-in load balancer (Klipper LB for k3s, for example). MetalLB is the standard choice when you need more control, BGP support, or are running upstream Kubernetes or kubeadm clusters.
MetalLB became a CNCF Sandbox project in 2021, signaling broad community adoption. Major companies including Equinix Metal and Scaleway run it in production to provide LoadBalancer services across thousands of bare-metal nodes.
Part 1: Why Bare Metal Needs MetalLB
Section titled “Part 1: Why Bare Metal Needs MetalLB”The Cloud vs Bare-Metal Gap
Section titled “The Cloud vs Bare-Metal Gap”CLOUD KUBERNETES BARE-METAL KUBERNETES───────────────── ──────────────────────
kubectl create svc kubectl create svc type: LoadBalancer type: LoadBalancer │ │ ▼ ▼ Cloud Controller Manager Nothing. Silence. Void. │ │ ▼ ▼ Cloud API provisions EXTERNAL-IP: <pending> a real load balancer ...forever │ ▼ EXTERNAL-IP: 34.120.x.xThe LoadBalancer service type is part of the Kubernetes specification, but the spec says nothing about how to implement it. That is the job of an external controller. In the cloud, the cloud-controller-manager handles it. On bare metal, you need MetalLB.
What MetalLB Actually Does
Section titled “What MetalLB Actually Does”MetalLB runs as a set of pods in your cluster and does two things:
- Address allocation — When a LoadBalancer Service is created, MetalLB assigns it an external IP from a pool you configure.
- Address advertisement — MetalLB announces that IP to the local network so that external clients can route traffic to it.
The “how” of advertisement is where MetalLB’s two modes come in.
Part 2: Layer 2 Mode — Simple and Effective
Section titled “Part 2: Layer 2 Mode — Simple and Effective”How It Works
Section titled “How It Works”In Layer 2 (L2) mode, MetalLB uses standard ARP (IPv4) or NDP (IPv6) to announce IP addresses. One node in the cluster becomes the “owner” of each service IP by responding to ARP requests for that address. All traffic for that IP flows to the owning node, which then uses kube-proxy (or Cilium) to distribute it to the backend pods.
LAYER 2 MODE───────────────────────────────────────────────────── Client sends ARP: "Who has 192.168.1.240?" │ ▼ MetalLB speaker on Node 2 responds: "That's me! MAC: aa:bb:cc:dd:ee:ff" │ ▼ All traffic for 192.168.1.240 → Node 2 │ ▼ kube-proxy on Node 2 distributes to backend pods (which may be on Node 1, Node 2, or Node 3)Speaker Election
Section titled “Speaker Election”MetalLB runs a “speaker” DaemonSet on every node. For each service IP, speakers use a deterministic hash to elect one node as the leader. Only the leader responds to ARP requests.
If the leader node goes down, the remaining speakers detect the failure and a new leader is elected. The new leader sends a gratuitous ARP to update the network’s ARP caches, and traffic shifts to the new node. Failover typically takes 10-30 seconds depending on client ARP cache timeouts.
L2 Mode Trade-offs
Section titled “L2 Mode Trade-offs”Strengths:
- Zero network infrastructure requirements—works on any flat L2 network
- No router configuration needed
- Dead simple to set up
Weaknesses:
- Single node bottleneck: all traffic for a service IP goes through one node
- Failover is not instant (ARP cache expiry)
- Does not scale horizontally for bandwidth-heavy services
Part 3: BGP Mode — Production-Grade Load Distribution
Section titled “Part 3: BGP Mode — Production-Grade Load Distribution”How It Works
Section titled “How It Works”In BGP mode, MetalLB peers with your network routers using the Border Gateway Protocol. Every node establishes a BGP session with the router and announces service IPs. The router sees multiple paths to the same IP and uses ECMP (Equal-Cost Multi-Path) to distribute traffic across all announcing nodes.
BGP MODE───────────────────────────────────────────────────── MetalLB speakers peer with Top-of-Rack router
Node 1 ──BGP──▶ Router: "I can reach 192.168.1.240" Node 2 ──BGP──▶ Router: "I can reach 192.168.1.240" Node 3 ──BGP──▶ Router: "I can reach 192.168.1.240"
Router ECMP table: 192.168.1.240 → Node 1 (cost: 1) → Node 2 (cost: 1) → Node 3 (cost: 1)
Traffic is distributed across all three nodes!Why BGP Matters for Production
Section titled “Why BGP Matters for Production”With ECMP, traffic is spread evenly across nodes. There is no single-node bottleneck. If a node goes down, the BGP session drops and the router removes that path from its ECMP table within seconds. Failover is fast and clean, limited only by BGP hold timers (typically 3-10 seconds).
Part 4: L2 vs BGP Decision Guide
Section titled “Part 4: L2 vs BGP Decision Guide”| Criteria | Layer 2 | BGP |
|---|---|---|
| Network requirements | Flat L2 network, no special hardware | BGP-capable router (ToR switch) |
| Setup complexity | Minimal | Requires router configuration |
| Traffic distribution | Single node per service | All nodes via ECMP |
| Failover speed | 10-30 seconds (ARP cache) | 3-10 seconds (BGP hold timer) |
| Scalability | Limited by single-node bandwidth | Scales with node count |
| Best for | Dev/test, small clusters, homelab | Production, high traffic |
| Router config needed | No | Yes (BGP peering, ECMP) |
Rule of thumb: Start with L2 mode. Move to BGP when you need horizontal traffic distribution or faster failover, and when your network team can configure BGP peering on the router.
Part 5: Installation and Configuration
Section titled “Part 5: Installation and Configuration”Installing MetalLB via Helm
Section titled “Installing MetalLB via Helm”# Add the MetalLB Helm repohelm repo add metallb https://metallb.github.io/metallbhelm repo update
# Install into its own namespacehelm install metallb metallb/metallb \ --namespace metallb-system \ --create-namespace \ --waitVerify the installation:
k get pods -n metallb-system# NAME READY STATUS RESTARTS# metallb-controller-5f9bb7cdb-x7kfn 1/1 Running 0# metallb-speaker-7gqrt 1/1 Running 0# metallb-speaker-kd4v2 1/1 Running 0# metallb-speaker-wn8bx 1/1 Running 0The controller handles IP allocation. The speakers (DaemonSet) handle network advertisement.
Configuring an IP Address Pool (L2 Mode)
Section titled “Configuring an IP Address Pool (L2 Mode)”apiVersion: metallb.io/v1beta1kind: IPAddressPoolmetadata: name: default-pool namespace: metallb-systemspec: addresses: - 192.168.1.240-192.168.1.250 # Range of IPs to assign---apiVersion: metallb.io/v1beta1kind: L2Advertisementmetadata: name: default-l2 namespace: metallb-systemspec: ipAddressPools: - default-poolk apply -f ip-address-pool.yamlThat is it. Any LoadBalancer Service created in the cluster will now receive an IP from 192.168.1.240-192.168.1.250.
Configuring BGP Mode
Section titled “Configuring BGP Mode”apiVersion: metallb.io/v1beta2kind: BGPPeermetadata: name: router-peer namespace: metallb-systemspec: myASN: 64500 # Your cluster's AS number peerASN: 64501 # Your router's AS number peerAddress: 10.0.0.1 # Router IP---apiVersion: metallb.io/v1beta1kind: IPAddressPoolmetadata: name: bgp-pool namespace: metallb-systemspec: addresses: - 10.0.100.0/24---apiVersion: metallb.io/v1beta1kind: BGPAdvertisementmetadata: name: bgp-advert namespace: metallb-systemspec: ipAddressPools: - bgp-poolPart 6: Integration with Ingress and Gateway API
Section titled “Part 6: Integration with Ingress and Gateway API”MetalLB does not replace Ingress controllers or Gateway API — it complements them. A common production pattern:
MetalLB assigns external IP │ ▼Internet ──▶ LoadBalancer Service (e.g., 192.168.1.240) │ ▼ Ingress Controller (nginx, Traefik) or Gateway API (Envoy, Cilium) │ ┌─────────┼─────────┐ ▼ ▼ ▼ app-a app-b app-cYour Ingress controller runs as a Deployment with a type: LoadBalancer Service. MetalLB gives that Service an external IP. The Ingress controller then routes traffic to backend services based on hostnames and paths.
This pattern works identically with Kubernetes Gateway API—the Gateway resource gets a LoadBalancer Service, and MetalLB assigns the IP.
Common Mistakes
Section titled “Common Mistakes”| Mistake | What Happens | Fix |
|---|---|---|
| IP pool overlaps with DHCP range | Two devices claim the same IP, intermittent connectivity | Coordinate with network team; use IPs outside the DHCP scope |
| Forgetting to create L2Advertisement or BGPAdvertisement | IPs are assigned but never announced; traffic cannot reach the cluster | Always pair an IPAddressPool with an advertisement resource |
| Using old ConfigMap-based configuration | Works on older versions but breaks on v0.13+ | Migrate to CRD-based config (IPAddressPool, L2Advertisement) |
| Running MetalLB alongside k3s Klipper LB | Both controllers fight over LoadBalancer services | Disable Klipper (--disable servicelb) before installing MetalLB |
| Expecting L2 mode to load-balance across nodes | All traffic hits a single node | Use BGP mode with ECMP for true multi-node distribution |
| IP pool exhaustion with no monitoring | New services get stuck on Pending again | Size your pool appropriately; set alerts on pool utilization |
Test your understanding before moving on.
Q1: Why do LoadBalancer services stay “Pending” on bare-metal clusters?
Show Answer
There is no cloud controller manager to provision an external load balancer. The LoadBalancer service type requires an external implementation to allocate and advertise IPs. On bare metal, nothing fulfills that role unless you install something like MetalLB.
Q2: In L2 mode, what happens when the node owning a service IP goes down?
Show Answer
The remaining MetalLB speakers detect the failure and elect a new leader node for that service IP. The new leader sends a gratuitous ARP to update network ARP caches and begins responding to ARP requests for the IP. Failover takes approximately 10-30 seconds depending on client ARP cache timeouts.
Q3: Why might you choose BGP mode over L2 mode?
Show Answer
BGP mode provides true load distribution across multiple nodes via ECMP, faster failover (3-10 seconds vs 10-30 seconds), and better scalability for high-traffic workloads. L2 mode funnels all traffic for a given service through a single node, which becomes a bottleneck at scale.
Q4: You install MetalLB and create an IPAddressPool, but LoadBalancer services still show Pending. What did you forget?
Show Answer
You forgot to create an L2Advertisement or BGPAdvertisement resource. An IPAddressPool alone only defines which IPs are available. You must also create an advertisement resource to tell MetalLB how to announce those IPs to the network.
Q5: Your team runs k3s and wants to use MetalLB. What must you do first?
Show Answer
Disable k3s’s built-in Klipper LB by starting k3s with --disable servicelb. Otherwise, both MetalLB and Klipper will compete to handle LoadBalancer services, causing unpredictable behavior. See Module 14.1: k3s for details.
Hands-On Exercise: MetalLB on kind
Section titled “Hands-On Exercise: MetalLB on kind”Goal: Deploy MetalLB in L2 mode on a kind cluster, create a LoadBalancer service, and reach it from your host.
Time: ~15 minutes
Success Criteria: curl returns a response from the service using the MetalLB-assigned external IP.
Step 1: Create a kind Cluster
Section titled “Step 1: Create a kind Cluster”kind create cluster --name metallb-labStep 2: Determine the kind Docker Network Range
Section titled “Step 2: Determine the kind Docker Network Range”kind runs inside Docker. We need IPs from the same Docker bridge network.
# Get the kind network subnetdocker network inspect kind -f '{{(index .IPAM.Config 0).Subnet}}'# Example output: 172.18.0.0/16
# Pick an unused range within that subnet (e.g., 172.18.255.200-172.18.255.250)Step 3: Install MetalLB
Section titled “Step 3: Install MetalLB”helm repo add metallb https://metallb.github.io/metallbhelm repo updatehelm install metallb metallb/metallb \ --namespace metallb-system \ --create-namespace \ --waitStep 4: Configure IP Pool and L2 Advertisement
Section titled “Step 4: Configure IP Pool and L2 Advertisement”Adjust the IP range based on your kind network from Step 2:
kubectl apply -f - <<'EOF'apiVersion: metallb.io/v1beta1kind: IPAddressPoolmetadata: name: kind-pool namespace: metallb-systemspec: addresses: - 172.18.255.200-172.18.255.250---apiVersion: metallb.io/v1beta1kind: L2Advertisementmetadata: name: kind-l2 namespace: metallb-systemspec: ipAddressPools: - kind-poolEOFStep 5: Deploy a Test Application
Section titled “Step 5: Deploy a Test Application”k create deployment web --image=nginx --port=80k expose deployment web --type=LoadBalancer --port=80Step 6: Verify the External IP
Section titled “Step 6: Verify the External IP”k get svc web# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE# web LoadBalancer 10.96.45.123 172.18.255.200 80:31234/TCP 10sThe EXTERNAL-IP should no longer say <pending>. If it does, wait a few seconds and check again.
Step 7: Test Connectivity
Section titled “Step 7: Test Connectivity”curl http://172.18.255.200# You should see the default nginx welcome page HTMLCleanup
Section titled “Cleanup”kind delete cluster --name metallb-labIf you completed this exercise successfully, you have just solved the exact problem that kept that team stuck for three weeks. Not bad for 15 minutes.
Next Steps
Section titled “Next Steps”- Module 5.1: Cilium — MetalLB handles external traffic; Cilium handles everything inside the cluster
- Module 5.2: Service Mesh — For mTLS and advanced traffic management after traffic enters the cluster
- CKA Module 3.1: Services — Deep dive into Kubernetes service types
- CKA Module 3.5: Gateway API — The next-generation Ingress that pairs well with MetalLB
- Module 14.1: k3s / Module 14.2: k0s — Lightweight distros with built-in LB alternatives
“The difference between a toy cluster and a production cluster is often just one missing component. MetalLB is that component for bare-metal load balancing.”