Module 5.1: Cilium - The Kernel-Powered Network Revolution
Toolkit Track | Complexity: [COMPLEX] | Time: 60-75 minutes
The 3 AM Wake-Up Call
Your phone buzzes. Production is down. The ops channel is on fire.
[03:12 AM] @oncall ALERT: Payment service timeouts
[03:14 AM] @oncall Network team says "looks fine on their end"
[03:17 AM] @oncall It's DNS
[03:18 AM] @oncall It's always DNS
[03:23 AM] @oncall Wait, it's not DNS. Something is dropping packets.
[03:31 AM] @oncall Running tcpdump on all 47 pods. Send coffee.
[03:52 AM] @oncall Found it. NetworkPolicy was blocking the new service.
[03:54 AM] @oncall We have 200+ NetworkPolicies. Which one? No idea.
[04:23 AM] @oncall Fixed by adding another allow rule. We'll clean up later.
[04:24 AM] @oncall We never clean up later.
Sound familiar?
This is what Kubernetes networking feels like without proper tooling. You’re blind. Packets vanish into the void. Policies are write-only—you create them but often can’t tell which one is actually doing what.
Cilium changes everything. By the end of this module, when something drops packets, you’ll know exactly which policy dropped it, why, and you’ll see it happen in real-time. No more 4 AM tcpdump sessions.
What You’ll Learn:
- Why traditional networking can’t keep up with Kubernetes
- How eBPF lets you program the Linux kernel (without being a kernel developer)
- Identity-based security that actually makes sense
- Hubble: seeing every packet, every decision, every drop
- Replacing kube-proxy and why you’ll never miss it
Prerequisites:
- Kubernetes networking basics (Services, Pods)
- eBPF Fundamentals for programs, maps, helpers, and verifier vocabulary
- Security Principles Foundations
- A healthy frustration with iptables (optional but helps)
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy Cilium as a CNI plugin with eBPF-based networking and transparent encryption
- Configure Cilium network policies using L3/L4 and L7 identity-aware filtering rules
- Implement Cilium’s service mesh capabilities with sidecar-free mTLS and load balancing
- Monitor network flows and troubleshoot connectivity using Hubble’s observability dashboards
Why This Module Matters
Let me tell you about the moment I fell in love with Cilium.
We had a microservices architecture with enough moving parts that a network-policy mistake was hard to isolate. One service was failing health checks even though direct application tests passed, and each team initially believed its own layer was fine.
With traditional tools, we would’ve spent hours with tcpdump and iptables debugging. Instead, I ran one command:
hubble observe --pod production/payment-service --verdict DROPPED
Three seconds later:
production/payment-service → production/health-checker DROPPED
Policy: production/legacy-lockdown (ingress)
An older policy was the culprit. It had outlived the assumptions it was originally written for and blocked a dependency the team had overlooked.
The root cause was visible quickly, turning a long network investigation into a straightforward policy fix.
💡 Did You Know? eBPF-based dataplanes are attractive in Kubernetes because they reduce dependence on large iptables rule sets and enable richer observability.
Part 1: Understanding the Problem (Before We Solve It)
The IPTables Nightmare
Before we talk about Cilium’s solution, you need to feel the pain of the old way.
Most Kubernetes clusters run kube-proxy, and in its default iptables mode every Service you create turns into more iptables rules. Let’s see what that actually looks like:
# On a modest cluster with 500 services:
iptables-save | wc -l
# Output: 12,847 lines

# On a large cluster with 5,000 services:
iptables-save | wc -l
# Output: 147,291 lines
Very large iptables rule sets are common in bigger clusters.
Now imagine debugging why one specific packet was dropped.
THE IPTABLES DEBUGGING EXPERIENCE
═══════════════════════════════════════════════════════════════════

You: "Why was my packet dropped?"

iptables: "Let me check...
           Chain PREROUTING →
           Chain KUBE-SERVICES →
           Chain KUBE-SVC-XYZABC123 →
           Chain KUBE-SEP-DEF456 →
           Chain KUBE-POSTROUTING →
           Actually I lost track. Somewhere in these 147,000 rules."

You: "Which rule specifically?"

iptables: "¯\_(ツ)_/¯"

You: "How do I see what's being blocked?"

iptables: "Add LOG rules everywhere. Parse the logs yourself.
           Good luck with the performance impact."

You: [opens job listings]
And it gets worse. When you update a Service:
TIME TO UPDATE 147,000 IPTABLES RULES
═══════════════════════════════════════════════════════════════════

1. kube-proxy receives Service update
2. kube-proxy rewrites the rule set (historically a full rewrite; newer releases can sync only what changed)
3. Takes ~5-30 seconds on large clusters
4. During rewrite: connections drop, new connections may fail
5. All nodes do this simultaneously
6. Your monitoring alerts go crazy

This happens every time:
- A pod scales up/down
- A service is created/deleted
- An endpoint changes

At scale: dozens of times per minute
This isn’t a hypothetical. Datadog wrote about hitting this limit. So did Shopify. At larger cluster sizes, iptables-based service routing can become a real operational bottleneck.
The NetworkPolicy Problem
Standard Kubernetes NetworkPolicies have a different problem: you write them with labels, but most traditional CNI plugins enforce them with IP addresses.
# This NetworkPolicy looks reasonable:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
Under the hood, this becomes:
"Allow traffic from IP 10.244.1.45 to the backend pods"
"Allow traffic from IP 10.244.2.23 to the backend pods"
"Allow traffic from IP 10.244.3.67 to the backend pods"
Now the frontend pod crashes and restarts. New IP: 10.244.1.99.
The CNI has to:
- Detect the IP change
- Update every policy that references frontend
- Push those updates to every node
- Hope nothing breaks during the transition
This happens constantly in Kubernetes. Pods restart, scale, move between nodes. IP addresses are ephemeral by design.
Building security on IP addresses is like building a house on quicksand.
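To make the churn concrete, here is a minimal Python sketch of what a single pod restart costs when a label-based policy has been expanded into per-IP rules. The function name and the backend IP are hypothetical, chosen only for illustration; real CNI plugins track this state in their own ways.

```python
def expand_to_ip_rules(source_ips, dest_ips):
    """Expand a label-based allow rule into the per-IP pairs a node would enforce."""
    return {(src, dst) for src in source_ips for dst in dest_ips}


backend_ips = {"10.244.5.10"}  # hypothetical backend pod IP

before = expand_to_ip_rules({"10.244.1.45", "10.244.2.23", "10.244.3.67"}, backend_ips)

# The frontend pod at 10.244.1.45 restarts and comes back as 10.244.1.99.
after = expand_to_ip_rules({"10.244.1.99", "10.244.2.23", "10.244.3.67"}, backend_ips)

stale_rules = before - after  # must be removed on every node
new_rules = after - before    # must be added on every node

print("remove:", sorted(stale_rules))
print("add:", sorted(new_rules))
```

The policy itself never changed, yet every node still has to delete one rule and add another. Multiply that by every restart, scale event, and policy that selects the pod.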
Part 2: Enter eBPF - Programming the Unprogrammable
What is eBPF?
eBPF stands for “extended Berkeley Packet Filter,” but that name is misleading. It’s evolved far beyond packet filtering.
Here’s the mental model that helped me understand it:
THE JAVASCRIPT OF THE LINUX KERNEL
═══════════════════════════════════════════════════════════════════

Remember when browsers only displayed static HTML?
Then JavaScript came along: "What if we could run code IN the browser?"
Suddenly browsers could do anything.

eBPF is JavaScript for the Linux kernel.

Before eBPF:
- Want to change how networking works? Modify kernel code, recompile, reboot.
- Want to add tracing? Load a kernel module, pray it doesn't crash.
- Want custom packet processing? Install a userspace proxy, accept the overhead.

With eBPF:
- Write small programs that run INSIDE the kernel
- [Load them dynamically, no reboot needed](https://github.com/cilium/cilium/blob/main/Documentation/overview/component-overview.rst)
- Kernel verifies they're safe before running
- Run at kernel speed (no userspace context switches)
Here’s a concrete example. Traditional packet processing:
TRADITIONAL PACKET FLOW
═══════════════════════════════════════════════════════════════════

  Packet arrives at network card
        │
        ▼
  Kernel receives packet
        │
        ▼
  iptables chain 1 (PREROUTING)
        │
        ▼
  Routing decision
        │
        ▼
  iptables chain 2 (INPUT/FORWARD)
        │
        ▼
  iptables chain 3 (OUTPUT)
        │
        ▼
  iptables chain 4 (POSTROUTING)
        │
        ▼
  Copy packet to userspace       ← EXPENSIVE! (only when a userspace proxy is in the path)
        │
        ▼
  Userspace proxy (e.g. an Envoy sidecar)
        │
        ▼
  Copy packet back to kernel     ← EXPENSIVE!
        │
        ▼
  Finally reaches destination

Cost: ~50-100 microseconds per packet (illustrative)
      Multiple memory copies
      CPU cache thrashing
With eBPF:
eBPF PACKET FLOW
═══════════════════════════════════════════════════════════════════

  Packet arrives at network card
        │
        ▼
  eBPF program runs (in kernel)
    - Looks up destination in hash map: O(1)
    - Applies policy: O(1)
    - Rewrites headers if needed
    - Decides: forward, drop, or redirect
        │
        ▼
  Packet reaches destination

Cost: ~5-10 microseconds per packet (illustrative)
      Zero memory copies
      Runs in kernel context

Roughly 10x faster in this illustration. Zero userspace involvement for most packets.
Pause and predict: in your own cluster, before reading on, what would iptables-save | wc -l return today, and at what rule count do you expect kube-proxy reconciliation to start visibly stalling Service updates? Jot a number, then compare it to the figures in this module. The gap between intuition and reality is exactly the gap eBPF closes.
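The difference between walking a rule chain and doing a hash-map lookup can be sketched in a few lines of Python. This is a toy model, not how either dataplane is implemented: the service IPs and backend names are made up, and a Python dict merely stands in for an eBPF hash map.

```python
def linear_match(rules, dst_ip):
    """Walk rules in order, iptables-style; return (backend, rules examined)."""
    for examined, (service_ip, backend) in enumerate(rules, start=1):
        if service_ip == dst_ip:
            return backend, examined
    return None, len(rules)


def map_lookup(service_map, dst_ip):
    """Single keyed lookup, standing in for an eBPF hash map."""
    return service_map.get(dst_ip)


# 5,000 hypothetical services, one rule each.
rules = [(f"10.96.{i // 256}.{i % 256}", f"backend-{i}") for i in range(5000)]
service_map = dict(rules)

target_ip = rules[-1][0]  # worst case for the linear walk
backend, examined = linear_match(rules, target_ip)

print(f"linear walk: {backend} after examining {examined} rules")
print(f"map lookup:  {map_lookup(service_map, target_ip)} in one step")
```

The linear walk gets slower as services are added, while the keyed lookup stays flat. That is the property the O(1) annotations in the diagram refer to.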
Why eBPF is Safe (Despite Running in the Kernel)
“Wait,” I hear you thinking, “running arbitrary code in the kernel sounds terrifying.”
You’re right. That’s why eBPF has a verifier:
THE eBPF VERIFIER: YOUR KERNEL'S BOUNCER
═══════════════════════════════════════════════════════════════════

Before ANY eBPF program runs, the verifier checks:

✓ Does it terminate? (No infinite loops allowed)
✓ Does it access only allowed memory? (No kernel crashes)
✓ Does it use only allowed kernel functions?
✓ Does it handle all code paths? (No undefined behavior)
✓ Is the complexity bounded? ([Max 1 million instructions](https://github.com/torvalds/linux/blob/master/include/linux/bpf.h))

If ANY check fails: program is rejected, never runs.

This is why eBPF programs can be loaded on production systems
with far less risk than a kernel module. The verifier rejects
anything it cannot prove safe.
💡 Did You Know? The eBPF verifier is so strict that it sometimes rejects valid programs that the human eye can see are safe. The Cilium team has contributed extensively to the Linux kernel to make the verifier smarter while maintaining safety. Writing eBPF programs that pass the verifier is an art. Cilium handles this complexity so you don’t have to.
Part 3: Cilium Architecture - The Big Picture
Now that you understand eBPF, let’s see how Cilium uses it:
CILIUM: THE COMPLETE PICTURE
═══════════════════════════════════════════════════════════════════

                  ┌─────────────────────────────┐
                  │       KUBERNETES API        │
                  │ (Pods, Services, Policies)  │
                  └──────────────┬──────────────┘
                                 │
             ┌───────────────────┼───────────────────┐
             │                   │                   │
    ┌────────▼────────┐ ┌────────▼────────┐  ┌───────▼────────┐
    │ CILIUM OPERATOR │ │  HUBBLE RELAY   │  │   HUBBLE UI    │
    │ (1 per cluster) │ │  (aggregation)  │  │ (visualization)│
    └─────────────────┘ └────────┬────────┘  └────────────────┘
                                 │
═════════════════════════════════╧══════════════════════════════════
                        PER-NODE COMPONENTS
════════════════════════════════════════════════════════════════════

        NODE 1                  NODE 2                 NODE 3
┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│    CILIUM AGENT     │ │    CILIUM AGENT     │ │   CILIUM AGENT   │
│  ┌─────────────┐    │ │  ┌─────────────┐    │ │ ┌─────────────┐  │
│  │ Policy      │    │ │  │ Policy      │    │ │ │ Policy      │  │
│  │ Engine      │    │ │  │ Engine      │    │ │ │ Engine      │  │
│  ├─────────────┤    │ │  ├─────────────┤    │ │ ├─────────────┤  │
│  │ Identity    │    │ │  │ Identity    │    │ │ │ Identity    │  │
│  │ Manager     │    │ │  │ Manager     │    │ │ │ Manager     │  │
│  ├─────────────┤    │ │  ├─────────────┤    │ │ ├─────────────┤  │
│  │ Hubble      │    │ │  │ Hubble      │    │ │ │ Hubble      │  │
│  │ Observer    │    │ │  │ Observer    │    │ │ │ Observer    │  │
│  └──────┬──────┘    │ │  └──────┬──────┘    │ │ └──────┬──────┘  │
│         │           │ │         │           │ │        │         │
│  ┌──────▼──────┐    │ │  ┌──────▼──────┐    │ │ ┌──────▼──────┐  │
│  │ eBPF        │    │ │  │ eBPF        │    │ │ │ eBPF        │  │
│  │ DATAPLANE   │    │ │  │ DATAPLANE   │    │ │ │ DATAPLANE   │  │
│  │             │    │ │  │             │    │ │ │             │  │
│  │ • Networking│    │ │  │ • Networking│    │ │ │ • Networking│  │
│  │ • Policies  │    │ │  │ • Policies  │    │ │ │ • Policies  │  │
│  │ • Load Bal. │    │ │  │ • Load Bal. │    │ │ │ • Load Bal. │  │
│  │ • Encryption│    │ │  │ • Encryption│    │ │ │ • Encryption│  │
│  └─────────────┘    │ │  └─────────────┘    │ │ └─────────────┘  │
│                     │ │                     │ │                  │
│ ┌──────┐ ┌──────┐   │ │ ┌──────┐ ┌──────┐   │ │ ┌──────┐┌──────┐ │
│ │Pod A │ │Pod B │   │ │ │Pod C │ │Pod D │   │ │ │Pod E ││Pod F │ │
│ │id=123│ │id=456│   │ │ │id=789│ │id=123│   │ │ │id=456││id=999│ │
│ └──────┘ └──────┘   │ │ └──────┘ └──────┘   │ │ └──────┘└──────┘ │
└─────────────────────┘ └─────────────────────┘ └──────────────────┘
The Components Explained (Like You’re New Here)
Cilium Agent (DaemonSet) - The worker bee on each node:
- Watches Kubernetes for pod/service/policy changes
- Compiles eBPF programs and loads them into the kernel
- Assigns identities to pods (more on this soon)
- Runs Hubble observer for local visibility
Cilium Operator - The coordinator (1 per cluster):
- Manages IP address allocation (IPAM)
- Handles garbage collection of stale resources
- Manages CRDs and cluster-wide operations
Hubble - The observability layer:
- Hubble (per-node): Captures flows from eBPF in real-time
- Hubble Relay: Aggregates flows from all nodes
- Hubble UI: Beautiful web interface for visualization
Installation: Your First Cilium Cluster
# Step 1: Install Cilium CLI
# (The CLI makes installation and management much easier)
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail -o cilium-linux-amd64.tar.gz "https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz"
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz

# Step 2: Install Cilium with the good defaults
cilium install \
  --set kubeProxyReplacement=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Step 3: Wait for it to be ready
cilium status --wait

# Step 4: Verify everything works
cilium connectivity test
What cilium connectivity test actually does:
This isn’t a simple ping test. It deploys test workloads and verifies:
- Pod-to-pod connectivity (same node and cross-node)
- Pod-to-Service connectivity
- Pod-to-external connectivity
- Network policies are enforced correctly
- DNS resolution works
- Hubble observability captures flows
If this test passes, your networking is solid. If it fails, you’ll know exactly what’s broken.
Part 4: Identity-Based Security - The Game Changer
This is where Cilium fundamentally changes how you think about network security.
The Problem with IPs
Remember this scenario?
# You write a policy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
Behind the scenes, your CNI translates this to IP rules. Frontend pods have IPs 10.244.1.5 and 10.244.2.12, so the rule becomes “allow from 10.244.1.5 and 10.244.2.12.”
Now frontend scales from 2 pods to 20 pods. Each new pod needs to be added. Pod crashes and restarts with new IP? Rule needs updating. Rolling deployment? Constant IP churn.
Cilium throws this model away entirely.
How Cilium Identity Works
CILIUM IDENTITY: THE "AHA!" MOMENT
═══════════════════════════════════════════════════════════════════

Step 1: Pod is created with labels
┌─────────────────────────────────────────────────────────────────┐
│ Pod: frontend-7b9f8c4d5-x2k9p                                   │
│ Labels:                                                         │
│   app: frontend                                                 │
│   env: production                                               │
│   team: checkout                                                │
└─────────────────────────────────────────────────────────────────┘

Step 2: [Cilium creates a NUMERIC IDENTITY from the labels](https://github.com/cilium/cilium/blob/main/Documentation/gettingstarted/terminology.rst)
┌─────────────────────────────────────────────────────────────────┐
│ Identity 48291 = {app=frontend, env=production, team=checkout}  │
│                                                                 │
│ This identity is:                                               │
│   • Cluster-wide (same on all nodes)                            │
│   • Stable (doesn't change when pod restarts)                   │
│   • Shared (all pods with same labels = same identity)          │
└─────────────────────────────────────────────────────────────────┘

Step 3: Every packet is tied to an identity, NOT just an IP
┌─────────────────────────────────────────────────────────────────┐
│ Network Packet                                                  │
│ ┌─────────────────────────────────────────────────────────┐     │
│ │ Source Identity: 48291                                  │     │
│ │ Dest Identity:   73842                                  │     │
│ │ Payload:         HTTP GET /api/checkout                 │     │
│ └─────────────────────────────────────────────────────────┘     │
│                                                                 │
│ The IP is still there for routing, but POLICY uses identity.    │
│ (Identity rides in tunnel metadata or is resolved from the      │
│ source IP via Cilium's ipcache.)                                │
└─────────────────────────────────────────────────────────────────┘

Step 4: Policy enforcement uses identity
┌─────────────────────────────────────────────────────────────────┐
│ eBPF Policy Check:                                              │
│                                                                 │
│ "Is identity 48291 allowed to reach identity 73842?"            │
│                                                                 │
│ Lookup in eBPF hash map: O(1) ← Constant time!                  │
│ Answer: ALLOW or DENY                                           │
│                                                                 │
│ No rule scanning. Instant decision.                             │
└─────────────────────────────────────────────────────────────────┘
Why this matters:
- Pod restarts: Same labels = same identity. No policy updates needed.
- Scaling: 1 pod or 1000 pods with same labels = same identity. No rule explosion.
- Cross-cluster: Identity follows the workload. Works in multi-cluster setups.
- Debugging: “Who is identity 48291?” → cilium identity get 48291 → Instant answer.
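The four steps above can be sketched as a toy allocator plus a set-based policy map. This is an illustration only: the class and function names are hypothetical, Cilium's real allocation is coordinated cluster-wide, and its policy maps are keyed on more than an identity pair. The starting ID of 256 mirrors the reserved range this module mentions below.

```python
from itertools import count


class IdentityAllocator:
    """Toy allocator: one numeric identity per unique label set."""

    def __init__(self, first_id=256):  # assume lower IDs are reserved
        self._next_id = count(first_id)
        self._by_labels = {}

    def identity_for(self, labels):
        key = frozenset(labels.items())  # label order does not matter
        if key not in self._by_labels:
            self._by_labels[key] = next(self._next_id)
        return self._by_labels[key]


alloc = IdentityAllocator()
frontend = alloc.identity_for({"app": "frontend", "env": "production"})
backend = alloc.identity_for({"app": "backend", "env": "production"})

# A restarted or scaled-out frontend pod has the same labels, so the same identity.
restarted_frontend = alloc.identity_for({"env": "production", "app": "frontend"})

# Policy is a set of allowed (source, destination) identity pairs.
allowed_pairs = {(frontend, backend)}


def is_allowed(src_identity, dst_identity):
    """Constant-time membership check, standing in for the eBPF map lookup."""
    return (src_identity, dst_identity) in allowed_pairs


print(frontend, restarted_frontend, backend)
print(is_allowed(restarted_frontend, backend))
```

Notice that the policy set never has to change when pods restart or scale, because nothing in it refers to an IP address.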
Seeing Identities in Action
# These are agent commands: run them inside a Cilium agent pod
# (the in-pod binary is named cilium-dbg in newer releases)

# List all identities in your cluster
kubectl exec -n kube-system ds/cilium -- cilium identity list

# Output:
# IDENTITY   LABELS
# 1          reserved:host
# 2          reserved:world
# 4          reserved:health
# 48291      k8s:app=frontend,k8s:env=production,k8s:team=checkout
# 73842      k8s:app=backend,k8s:env=production
# 99103      k8s:app=database,k8s:env=production

# Get details on a specific identity
kubectl exec -n kube-system ds/cilium -- cilium identity get 48291

# See which endpoints have this identity
kubectl exec -n kube-system cilium-xxxxx -- cilium endpoint list | grep 48291
💡 Did You Know? Cilium reserves identity numbers 1-255 for special purposes. Identity 1 is always the host (the node itself), identity 2 is “world” (anything external to the cluster), and identity 4 is for health checks. This means you can write policies like “allow health checks” without knowing which IP ranges your health checkers use. It’s beautiful.
Part 5: Network Policies - From Basic to “Wow”
Standard Kubernetes NetworkPolicy (Cilium Implements These)
Cilium fully supports standard Kubernetes NetworkPolicies. If you have existing policies, they keep working:
# Standard NetworkPolicy - Cilium handles this perfectly
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
CiliumNetworkPolicy - The Enhanced Version
This is where it gets interesting. Cilium extends NetworkPolicies with features Kubernetes doesn’t support:
# Layer 7 (HTTP) Policy - Kubernetes can't do this
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-http-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        # Only allow specific HTTP methods and paths
        - method: "GET"
          path: "/api/v1/products.*"
        - method: "GET"
          path: "/api/v1/users/[0-9]+"
        - method: "POST"
          path: "/api/v1/orders"
          headers:
          - 'Content-Type: application/json'
What this policy says in plain English:
“Frontend pods can connect to the API server on port 8080, but ONLY for:
- GET requests to /api/v1/products* (list/view products)
- GET requests to /api/v1/users/<id> (view specific user)
- POST requests to /api/v1/orders with JSON content type (create orders)
Any other HTTP request? DENIED at the network layer.”
This is insanely powerful. An attacker who compromises your frontend can’t hit /api/v1/admin or send DELETE requests—the network itself blocks them.
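To see how the three HTTP rules above decide individual requests, here is a small Python sketch of the matching logic. It is an approximation under stated assumptions: paths are treated as regular expressions that must match the whole path, methods are compared literally, and headers must match exactly. Cilium's actual L7 enforcement runs in its proxy and may differ in detail; the function name is hypothetical.

```python
import re

# The three rules from the policy above, expressed as plain data.
HTTP_RULES = [
    {"method": "GET", "path": r"/api/v1/products.*"},
    {"method": "GET", "path": r"/api/v1/users/[0-9]+"},
    {
        "method": "POST",
        "path": r"/api/v1/orders",
        "headers": {"Content-Type": "application/json"},
    },
]


def request_allowed(method, path, headers=None):
    """Return True if any rule matches the method, full path, and required headers."""
    headers = headers or {}
    for rule in HTTP_RULES:
        if rule["method"] != method:
            continue
        if not re.fullmatch(rule["path"], path):  # assumption: anchored match
            continue
        required = rule.get("headers", {})
        if all(headers.get(name) == value for name, value in required.items()):
            return True
    return False  # no rule matched: denied


print(request_allowed("GET", "/api/v1/products/42"))
print(request_allowed("GET", "/api/v1/admin"))
print(request_allowed("DELETE", "/api/v1/orders"))
```

Anything that falls through all three rules is denied, which is why the compromised-frontend scenario described above cannot reach the admin path.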
DNS-Based Egress Policies
One of my favorite Cilium features. Most security teams want to control what external services pods can reach:
# Allow pods to reach only specific external services
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-egress
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: payment-processor
  egress:
  # Allow internal services
  - toEndpoints:
    - matchLabels:
        app: order-service
  # Allow specific external APIs
  - toFQDNs:
    - matchName: "api.stripe.com"
    - matchName: "api.paypal.com"
    - matchPattern: "*.amazonaws.com"  # AWS services
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
  # Allow DNS (required for FQDN resolution)
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      # The DNS rule lets Cilium observe lookups, which toFQDNs depends on
      rules:
        dns:
        - matchPattern: "*"
How FQDN policies work under the hood:
FQDN POLICY MAGIC
═══════════════════════════════════════════════════════════════════

1. Policy says: "Allow egress to api.stripe.com"

2. Cilium intercepts DNS queries from the pod

3. Pod asks: "What's the IP of api.stripe.com?"

4. DNS responds: "It's 52.84.150.1, 52.84.150.2, 52.84.150.3"

5. Cilium automatically adds these IPs to the allow list
   (stored in eBPF maps for O(1) lookup)

6. Pod connects to 52.84.150.1:443 → ALLOWED

7. Later, Stripe changes IPs (they do this a lot)

8. Next DNS query returns new IPs

9. Cilium updates the allow list automatically

10. You never have to touch the policy!
No more hardcoding CIDR blocks that break when cloud providers change IPs. No more overly permissive “allow all egress to 0.0.0.0/0” rules.
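The ten steps above boil down to an allow list that is fed by observed DNS answers. The Python sketch below models that idea with a simple TTL-based expiry. The class and method names are hypothetical, and the expiry rule is an assumption made for illustration: Cilium's real handling of TTLs and of connections that are already open is more involved.

```python
class FqdnAllowList:
    """Toy model: IPs become reachable only after an allowed name resolves to them."""

    def __init__(self, allowed_names):
        self.allowed_names = set(allowed_names)
        self.ip_expiry = {}  # ip -> time at which the entry stops being valid

    def observe_dns(self, name, ips, ttl, now):
        """Record a DNS answer seen on the wire; ignore names the policy does not list."""
        if name not in self.allowed_names:
            return
        for ip in ips:
            self.ip_expiry[ip] = now + ttl

    def egress_allowed(self, ip, now):
        expiry = self.ip_expiry.get(ip)
        return expiry is not None and now < expiry


policy = FqdnAllowList({"api.stripe.com"})
policy.observe_dns("api.stripe.com", ["52.84.150.1", "52.84.150.2"], ttl=30, now=0)
policy.observe_dns("evil.example.com", ["203.0.113.9"], ttl=30, now=0)

print(policy.egress_allowed("52.84.150.1", now=10))   # learned from DNS, still fresh
print(policy.egress_allowed("203.0.113.9", now=10))   # name not in the policy
print(policy.egress_allowed("52.84.150.1", now=45))   # entry has aged out in this model
```

The important design point is that the allow list follows what DNS actually returned, so a pod that connects to an IP it never looked up (or looked up long ago) does not match.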
Pause and predict: imagine your payment service has a toFQDNs: api.stripe.com egress rule. The DNS record’s TTL is 30 seconds, but your pod cached the answer for 5 minutes due to a stale resolver. Stripe rotates IPs. What does Hubble show — and is the dropped flow Cilium’s fault or the application’s? Answer in your head before continuing — the resolution path matters more than the policy syntax.
Cluster-Wide Policies
For policies that should apply everywhere (like “default deny”):
# Cluster-wide baseline: once a pod is selected, anything not listed here is denied
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: default-deny
spec:
  endpointSelector: {}  # Applies to ALL pods
  ingress:
  - fromEndpoints:
    - {}  # Only allow from endpoints with Cilium identity
  egress:
  - toEndpoints:
    - {}  # Includes the kube-dns/CoreDNS pods, so DNS needs no separate rule here
  # Always allow essential services
  - toEntities:
    - kube-apiserver  # Pods need to reach API server

---
# Explicitly allow health checks (they'd be denied by default-deny)
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-health-checks
spec:
  endpointSelector: {}
  ingress:
  - fromEntities:
    - health  # Cilium's reserved identity for health checks
The power of toEntities:
Instead of figuring out which IPs your kube-apiserver uses or where health checks come from, Cilium provides semantic entities:
| Entity | What it means |
|---|---|
| host | The node the pod runs on |
| remote-node | Other nodes in the cluster |
| kube-apiserver | Kubernetes API server |
| health | Health check probes |
| world | Everything outside the cluster |
DNS is not one of these entities. Allow it by selecting the kube-dns/CoreDNS pods with toEndpoints on port 53, as in the payment-egress policy earlier.
Part 6: Hubble - Seeing the Invisible
If Cilium is the brain, Hubble is the eyes.
The Old Way vs. The Hubble Way
DEBUGGING NETWORK ISSUES: OLD VS NEW
═══════════════════════════════════════════════════════════════════

THE OLD WAY:
───────────────────────────────────────────────────────────────────
1. Get alert: "Service unreachable"
2. SSH into pod: kubectl exec -it pod -- sh
3. Run tcpdump: tcpdump -i eth0 port 8080
4. Wait for traffic...
5. Stare at hex dumps
6. Realize you need tcpdump on the OTHER pod too
7. SSH into other pod
8. Run tcpdump there
9. Try to correlate timestamps across pods
10. Give up, ask network team
11. Network team says "network is fine"
12. Cry

THE HUBBLE WAY:
───────────────────────────────────────────────────────────────────
1. Get alert: "Service unreachable"
2. Run: hubble observe --from-pod web --to-pod api --verdict DROPPED
3. See exact policy that dropped the traffic
4. Fix policy
5. Go back to bed
Installing and Accessing Hubble
# Install Hubble CLI
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/main/stable.txt)
curl -L --fail -o hubble-linux-amd64.tar.gz "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
sudo tar xzvfC hubble-linux-amd64.tar.gz /usr/local/bin
rm hubble-linux-amd64.tar.gz

# Port-forward to Hubble Relay (needed to aggregate from all nodes)
cilium hubble port-forward &

# Now you can use hubble observe
hubble observe

# Access the UI (optional but beautiful)
cilium hubble ui
# Opens browser to http://localhost:12000
Hubble CLI - Your New Best Friend
# See ALL traffic in real-time
hubble observe

# Filter by namespace
hubble observe --namespace production

# Filter by specific pod
hubble observe --pod production/frontend-abc

# See only DROPPED traffic (the gold mine for debugging)
hubble observe --verdict DROPPED

# See traffic between two specific services
hubble observe \
  --from-pod production/frontend \
  --to-pod production/backend

# Filter by protocol
hubble observe --protocol http
hubble observe --protocol dns
hubble observe --protocol tcp

# See HTTP requests with details
hubble observe --protocol http -o json | jq

# See DNS queries
hubble observe --protocol dns --namespace production

# Output format options
hubble observe -o compact  # One line per flow
hubble observe -o dict     # Readable dictionary format
hubble observe -o json     # JSON for scripting
hubble observe -o table    # Table format
Understanding Hubble Output
HUBBLE FLOW ANATOMY
═══════════════════════════════════════════════════════════════════

Dec 9 10:23:45.123 production/frontend-7b9f8c4d5-x2k9p:46532 (ID:48291)
  -> production/backend-5d8f7b3a2-k9p2m:8080 (ID:73842)
  http-request FORWARDED (HTTP/1.1 GET /api/users)

Let's break this down:
───────────────────────────────────────────────────────────────────

TIMESTAMP           SOURCE
Dec 9 10:23:45.123  production/frontend-7b9f8c4d5-x2k9p:46532 (ID:48291)
                    │          │                        │      │
                    namespace  pod name                 │      └─ Cilium identity!
                                                        └─ source port

DESTINATION
-> production/backend-5d8f7b3a2-k9p2m:8080 (ID:73842)
   │          │                       │     │
   namespace  pod name                port  └─ Cilium identity

FLOW TYPE & VERDICT
http-request FORWARDED (HTTP/1.1 GET /api/users)
│            │         │
protocol     │         └─ HTTP details (method, path)
             └─ FORWARDED = allowed
                DROPPED   = blocked by policy
                ERROR     = something went wrong
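If you want to pull the fields from the anatomy above into a script, a regular expression over that one-line layout is enough for a quick look. The sketch below parses the exact sample line from this module; the pattern and function name are my own, and the layout is assumed to match the illustration. For anything durable, the JSON output (-o json) is the safer interface.

```python
import re

# Pattern for the one-line layout shown above (assumed, based on the sample line).
FLOW_RE = re.compile(
    r"(?P<src_ns>[\w-]+)/(?P<src_pod>[\w-]+):(?P<src_port>\d+) \(ID:(?P<src_id>\d+)\)"
    r" -> "
    r"(?P<dst_ns>[\w-]+)/(?P<dst_pod>[\w-]+):(?P<dst_port>\d+) \(ID:(?P<dst_id>\d+)\)"
    r" (?P<flow_type>\S+) (?P<verdict>FORWARDED|DROPPED|ERROR)"
)

SAMPLE = (
    "Dec 9 10:23:45.123 production/frontend-7b9f8c4d5-x2k9p:46532 (ID:48291) -> "
    "production/backend-5d8f7b3a2-k9p2m:8080 (ID:73842) "
    "http-request FORWARDED (HTTP/1.1 GET /api/users)"
)


def parse_flow(line):
    """Return the named fields of a flow line, or None if it does not match."""
    match = FLOW_RE.search(line)
    return match.groupdict() if match else None


flow = parse_flow(SAMPLE)
print(flow["src_pod"], "->", flow["dst_pod"], flow["verdict"])
```

The two identity fields are the ones worth keeping: they let you join a dropped flow back to the label sets that the policy was written against.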
Real Debugging Scenarios
Scenario 1: “My pod can’t reach the database”
# Step 1: See what's being dropped
hubble observe \
  --from-pod production/myapp \
  --to-pod production/postgres \
  --verdict DROPPED

# Output:
# production/myapp-xxx -> production/postgres-yyy
#   policy-verdict:none DROPPED (Policy denied)

# The "policy-verdict:none" tells you there's no ALLOW rule
# You need to add a policy to permit this traffic
Scenario 2: “External API calls are failing”
# Check egress traffic
hubble observe \
  --from-pod production/myapp \
  --verdict DROPPED \
  --type policy-verdict

# Output:
# production/myapp-xxx -> 52.84.150.1:443
#   policy-verdict:none DROPPED (Policy denied)

# Your egress policy doesn't allow this IP
# Check if you need to add FQDN rules
Scenario 3: “DNS is slow/failing”
# Watch DNS queries
hubble observe --protocol dns --namespace production

# Output:
# production/myapp -> kube-system/coredns
#   dns-request FORWARDED (Query api.stripe.com A)
# kube-system/coredns -> production/myapp
#   dns-response FORWARDED (Answer: 52.84.150.1)

# If you see DROPPED DNS queries, check your egress policies

# Enable metrics during Cilium install
cilium install \
  --set hubble.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"

# Or upgrade existing installation
cilium upgrade \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"
Key metrics to alert on:
# Prometheus alert examples
groups:
- name: cilium
  rules:
  # Alert on packet drops (excluding expected drops)
  - alert: HighPacketDropRate
    expr: rate(hubble_drop_total{reason!="Policy denied"}[5m]) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High packet drop rate on {{ $labels.instance }}"

  # Alert on DNS failures
  - alert: DNSErrors
    expr: rate(hubble_dns_responses_total{rcode!="No Error"}[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "DNS errors detected: {{ $labels.rcode }}"

  # Alert on HTTP 5xx errors
  - alert: HTTP5xxErrors
    expr: rate(hubble_http_responses_total{status=~"5.."}[5m]) > 10
    for: 5m
    labels:
      severity: critical
💡 Did You Know? Hubble captures flows using eBPF rather than by sampling packets, so it gives detailed flow-level visibility that is especially useful for troubleshooting and auditing. Each node keeps only recent flows in a fixed-size buffer, so export them if you need a long-term record of network communication.
Part 7: Replacing Kube-Proxy
Why This Matters
Remember those 147,000 iptables rules? Let’s get rid of them.
# Install Cilium as kube-proxy replacement
cilium install --set kubeProxyReplacement=true

# Verify it's working (agent status, run inside a Cilium pod;
# the in-pod binary is named cilium-dbg in newer releases)
kubectl exec -n kube-system ds/cilium -- cilium status | grep KubeProxyReplacement
# KubeProxyReplacement: True [eth0 (Direct Routing)]

# See all Services handled by Cilium
kubectl exec -n kube-system ds/cilium -- cilium service list

# Compare the difference:
# BEFORE (kube-proxy):
# iptables-save | wc -l
# 147,291

# AFTER (Cilium):
# iptables-save | wc -l
# 127 ← Only basic rules remain
Performance Comparison
How the two approaches typically compare (qualitative; measure in your own environment):
| Metric | kube-proxy (iptables) | Cilium eBPF | Improvement |
|---|---|---|---|
| Service lookup latency | Can increase as iptables rule sets grow | Often lower with eBPF-based service handling | Context-dependent |
| Memory usage | Often grows with rule-set size | Often more predictable with eBPF maps | Workload-dependent |
| Rule update time | Can slow down noticeably on large rule sets | Usually faster with eBPF-based updates | Environment-dependent |
| Connection drops on update | More likely during disruptive rule churn | Typically reduced with eBPF-based updates | Depends on configuration and rollout path |
| CPU usage at scale | Can rise with service and rule volume | Can be lower with eBPF-based handling | Depends on traffic and cluster shape |
DIRECT SERVER RETURN (DSR)
═══════════════════════════════════════════════════════════════════

Without DSR (traditional):
───────────────────────────────────────────────────────────────────
Client → Load Balancer → Backend Pod
Client ← Load Balancer ← Backend Pod
                ↑
       Return traffic goes through LB too
       (extra hop, extra latency)

With DSR (Cilium):
───────────────────────────────────────────────────────────────────
Client → Load Balancer → Backend Pod
Client ←──────────────── Backend Pod
                ↑
       Return traffic goes DIRECTLY to client
       (faster response, less LB load)
Enable DSR:
cilium install \
  --set kubeProxyReplacement=true \
  --set loadBalancer.mode=dsr
Part 8: Transparent Encryption with WireGuard
Encrypting all pod-to-pod traffic sounds hard. With Cilium, it takes two install flags.
The Problem
UNENCRYPTED CLUSTER TRAFFIC
═══════════════════════════════════════════════════════════════════

Pod A ─────────────────────────────────────────────▶ Pod B
  │                                                    │
  │   Network traffic crosses:                         │
  │   • Virtual switches                               │
  │   • Physical switches                              │
  │   • Sometimes public internet                      │
  │     (cross-AZ, cross-region)                       │
  │                                                    │
  └──── All visible to anyone with network access ─────┘

Attackers can:
• Read sensitive data
• Capture credentials
• Mount man-in-the-middle attacks
```bash
# Enable WireGuard encryption
# (see https://github.com/cilium/cilium/blob/main/Documentation/security/network/encryption-wireguard.rst)
cilium install \
  --set encryption.enabled=true \
  --set encryption.type=wireguard

# Verify encryption status
cilium status | grep Encryption
# Encryption: Wireguard [NodeEncryption: Disabled, cilium_wg0 (Pubkey: xxx)]

# Check WireGuard peers
kubectl exec -n kube-system ds/cilium -- cilium encrypt status
```
What happens now:
```
ENCRYPTED CLUSTER TRAFFIC
═══════════════════════════════════════════════════════════════════

Pod A ══════════════════════════════════════════════▶ Pod B
      │                                    │
      │  All traffic encrypted with        │
      │  WireGuard (state-of-art crypto)   │
      │                                    │
      │  • No app changes needed           │
      │  • No sidecar containers           │
      │  • Kernel-level encryption         │
      │  • Modest overhead                 │
      │    (workload-dependent)            │
      │                                    │
      └──── Attackers see garbage ─────────┘
```
Zero application changes. Your apps don’t know encryption is happening. It’s transparent at the kernel level.
Part 9: Common Mistakes (Learn From Others’ Pain)
| Mistake | Why It Hurts | How To Avoid |
|---|---|---|
| Skipping connectivity test | You think it’s working, it’s not | Always run cilium connectivity test after install |
| Installing over existing CNI | CNI conflicts break everything | Remove old CNI completely first, or use fresh cluster |
| No default deny | Wide open by default = security hole | In most production setups, set a cluster-wide default deny |
| Forgetting DNS in egress | Pods can’t resolve any hostnames | Always allow egress to kube-dns/CoreDNS on port 53 in egress policies (Cilium has no `dns` entity; select the DNS pods with toEndpoints) |
| Overly broad FQDN patterns | *.com defeats the purpose | Use specific FQDNs: api.stripe.com not *.stripe.com |
| Not enabling Hubble | Flying blind | Hubble is free, so enable it in most cases |
| Ignoring Hubble metrics | Miss issues until they’re incidents | Alert on hubble_drop_total and hubble_dns_* |
War Story: The Policy That Ate Christmas
A realistic failure mode: a policy change can accidentally block an overlooked dependency during a busy production period.
A restrictive CiliumNetworkPolicy that looked correct in staging was deployed to production.
```yaml
# The policy that ruined Christmas
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: database-security
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: postgres
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: backend
        environment: production
```
What they missed: The caching service (Redis) also needed database access. It had app: cache, not app: backend.
Soon after deployment:
- Cache invalidation failed
- Stale product data started serving
- Wrong prices shown to customers
A few minutes later:
- Monitoring detected increased error rates
- On-call engineer paged
When the on-call engineer checked Hubble:
- Engineer ran: `hubble observe --to-pod production/postgres --verdict DROPPED`
- Output showed: `production/redis-xxx -> production/postgres DROPPED`
- Root cause identified quickly from the drop verdict and destination
After the policy was updated:
- Policy updated to include cache service
- Traffic restored
The outage stayed short because the policy drop was visible quickly.
Without clear policy visibility, this kind of problem can take much longer to isolate because teams often start by checking other layers first.
Lessons:
- Test policies against all relevant services, not just the obvious ones
- Hubble is not optional—it’s your incident response tool
- `--verdict DROPPED` is the most important filter you’ll ever use
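When a drop query returns many lines, tallying them by source often points straight at the offender. The following is a minimal sketch, assuming flow lines simplified to the `source -> destination VERDICT` shape shown in the story above. Real `hubble observe` output carries more fields (timestamps, ports, identities), so the field positions here are an assumption, and `count_drops_by_source`, `sample_flows`, and the pod names are hypothetical.

```shell
#!/bin/sh
# count_drops_by_source: read simplified flow lines ("src -> dst VERDICT")
# on stdin and print "count source" for every DROPPED flow, busiest first.
count_drops_by_source() {
  awk '$2 == "->" && $4 == "DROPPED" { drops[$1]++ }
       END { for (src in drops) print drops[src], src }' |
    sort -k1,1nr -k2,2
}

# Sample input in the simplified shape (hypothetical pod names).
sample_flows() {
  cat <<'EOF'
production/redis-xxx -> production/postgres DROPPED
production/backend-abc -> production/postgres FORWARDED
production/redis-xxx -> production/postgres DROPPED
production/report-job -> production/postgres DROPPED
EOF
}

sample_flows | count_drops_by_source
```

In the sample, the Redis pod tops the list with two drops, which is the same signal the on-call engineer used in the story.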
Did You Know?
- Cilium graduated from the CNCF in October 2023, becoming the first graduated project in the cloud native networking category. Graduation is the CNCF’s highest maturity level and signals that the project meets enterprise governance, security, and contributor-diversity bars, meaning Cilium is officially “boring infrastructure” in the best possible sense.
- WireGuard, the protocol Cilium uses for transparent encryption, was merged into the Linux kernel in version 5.6 (released March 2020). That mainline merge is why Cilium can turn on encryption with a couple of Helm values: there is no out-of-tree module to install, no DKMS pain, just kernel-native crypto running in roughly a few thousand lines of audited code.
- The eBPF verifier enforces a hard upper bound on program complexity, traditionally measured by the number of instructions it analyzes during static verification. This is why an eBPF program cannot contain unbounded loops or unverifiable memory accesses — the kernel literally refuses to load it. Cilium’s compiler is structured to keep generated bytecode well under this ceiling so the dataplane stays loadable across kernel versions.
- Hubble was announced and open-sourced by Isovalent in November 2019 as the observability layer purpose-built on Cilium’s eBPF dataplane. Because flow capture happens in-kernel, Hubble does not sample — every flow Cilium handles is observable, which is what makes “show me every dropped packet from this pod in the last minute” a one-line command instead of a tcpdump expedition.
Question 1
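You deploy a default-deny policy and suddenly nothing works. Not even DNS. What’s the minimum policy you need to restore basic functionality?
Show Answer
```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-essential
spec:
  endpointSelector: {}
  egress:
  # Allows DNS queries to CoreDNS. Cilium has no "dns" entity, so select the
  # DNS pods by label (assumes the standard k8s-app=kube-dns label).
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
  - toEntities:
    - kube-apiserver   # Allows pods to reach API server
  ingress:
  - fromEntities:
    - health           # Allows health probes
```
This restores: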
- DNS resolution (pods can resolve names)
- API server access (service accounts work)
- Health checks (probes don’t fail)
From here, add specific policies for your workloads.
Question 2
A pod is failing to connect to api.stripe.com. How do you debug this with Hubble?
Show Answer
```bash
# Step 1: Check if connection attempts are being dropped
hubble observe \
  --from-pod production/payment-service \
  --verdict DROPPED

# Step 2: Check DNS is resolving
hubble observe \
  --from-pod production/payment-service \
  --protocol dns

# Step 3: Check specific destination
hubble observe \
  --from-pod production/payment-service \
  --to-fqdn api.stripe.com

# Common issues:
# - DNS queries dropped → Allow egress to kube-dns on port 53
# - Connection dropped → Add toFQDNs with matchName: api.stripe.com
# - Policy denied → Check your CiliumNetworkPolicy
```
Question 3
Why does Cilium use identity numbers instead of IP addresses for policy enforcement?
Show Answer
IP-based problems:
- Pods get new IPs when restarting
- Scaling creates new IPs constantly
- Rolling updates = continuous IP churn
- Policies must be updated for every IP change
- Can’t express “frontend talks to backend” semantically
Identity-based advantages:
- Identity is based on labels, not IPs
- Same labels = same identity, regardless of IP
- 1 pod or 1000 pods = same identity if labels match
- Policies are stable (no updates needed when IPs change)
- Human-readable: “identity 48291 = frontend” makes sense
- O(1) lookup in eBPF hash maps
Example:
```
Pod with labels {app: frontend, env: prod} → Identity 48291

This pod can:
- Restart 100 times
- Scale to 50 replicas
- Move across nodes

Identity stays 48291. Policies keep working.
```
Question 4
Your team rolled out a new checkout-worker Deployment yesterday. It mounts a service account, can resolve DNS, and kubectl logs shows it starting cleanly, but every call from checkout-worker to the existing orders-api Service times out. cilium endpoint list shows the new pods have a different identity number than the old checkout-api pods you previously allowlisted. What do you check, in order, and which Hubble filter pinpoints the problem fastest?
Show Answer
The fastest path is identity-aware, not IP-aware. Run:
```bash
hubble observe \
  --from-pod production/checkout-worker \
  --to-pod production/orders-api \
  --verdict DROPPED -o dict
```
You will see a policy-verdict:none DROPPED (Policy denied) line that shows the source identity number, and that identity number will be different from the one your orders-api ingress policy allowlists. The root cause is that the new Deployment was given new labels (e.g. app=checkout-worker instead of app=checkout-api), so Cilium minted a fresh identity, and your fromEndpoints matchLabels selector does not match it. The fix is either to relabel checkout-worker so it shares the existing identity, or to update the policy’s fromEndpoints to also match the new label set.

The teaching point: in Cilium, “I deployed something new and it can’t reach the API” is almost never a DNS or IP problem. It is an identity-membership problem, and Hubble’s policy-verdict is the one signal that tells you which side of the identity boundary the drop happened on.
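The label mismatch described above can be checked on paper before touching the cluster. Below is a toy sketch of matchLabels semantics (every selector pair must be present on the pod), not Cilium's implementation; `selector_matches` and the `env=prod` label are hypothetical.

```shell
#!/bin/sh
# selector_matches POD_LABELS SELECTOR
# Both arguments are comma-separated key=value lists. Returns 0 only when
# every selector pair is present in the pod's labels (AND semantics).
selector_matches() {
  pod_labels=",$1,"
  old_ifs=$IFS
  IFS=,
  for pair in $2; do
    case "$pod_labels" in
      *",$pair,"*) ;;                   # pair present, keep checking
      *) IFS=$old_ifs; return 1 ;;      # one missing pair fails the match
    esac
  done
  IFS=$old_ifs
  return 0
}

allow="app=checkout-api"   # what the orders-api ingress policy allowlists

selector_matches "app=checkout-api,env=prod" "$allow" \
  && echo "checkout-api: matches the allowlist"
selector_matches "app=checkout-worker,env=prod" "$allow" \
  || echo "checkout-worker: no match, so traffic is denied"
```

Running the same check against the selector from the war story (`app=backend,environment=production`) shows why a pod labelled `app=cache` was dropped.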
Question 5
You enabled kubeProxyReplacement=true and removed the kube-proxy DaemonSet. Cluster-internal Service traffic is faster than ever. But your team complains that a NodePort Service exposed on port 30080, which used to be reachable from a CI runner outside the cluster, now intermittently fails: sometimes it connects, sometimes it hangs. Pod-to-Pod is fine. What is most likely happening, and which Cilium config knob do you reach for?
Show Answer
The most common cause is that Cilium’s kube-proxy replacement is configured in a mode that only handles NodePort traffic on a specific device (often the primary node interface), and the CI runner is hitting nodes via a path Cilium isn’t programmed to load-balance — for example, a secondary NIC, an overlay address, or a Service that routes asymmetrically to a backend on another node when DSR is enabled but the return path isn’t set up. Check cilium status | grep KubeProxyReplacement to see which device(s) Cilium is bound to, then look at the devices (or older nodePort.directRoutingDevice) Helm value. Either widen devices to include every interface that receives external NodePort traffic, or switch the load-balancer mode away from DSR back to SNAT for NodePort while you debug. The teaching point: kube-proxy in iptables mode worked on every interface by accident because iptables hooks the netfilter chain globally; Cilium’s eBPF kube-proxy replacement is explicit about which devices it programs, so “intermittent NodePort hangs after kube-proxy removal” almost always means a missing device binding rather than a policy bug.
Question 6
A security engineer turns on a CiliumNetworkPolicy with an L7 HTTP rule that only allows GET /api/v1/products.* on the products-api Service. Within minutes, the SRE on call sees hubble_drop_total{reason="Policy denied"} start climbing, but the application’s own error rate stays flat and no users complain. What is the most likely explanation, and how would you confirm it before either rolling back or tightening further?
Show Answer
The most likely explanation is that the L7 policy is correctly blocking traffic the application was already silently tolerating — for example, periodic health-check probes that hit a non-allowlisted path, an internal admin sidecar polling /metrics, or a forgotten cron job calling POST /api/v1/products/refresh. Because Cilium denies these at L7, the application server never sees them, so application metrics stay green; only the network drop counter rises. Confirm by running hubble observe --pod production/products-api --verdict DROPPED --protocol http -o json | jq '.l7.http | {method, url}' and looking at which methods and paths are being denied. If the dropped requests are legitimate (health checks, metrics scrapes), expand the policy with explicit rules for those paths. If they are not (a stale debug sidecar, an attacker probing for /admin), the rising drop counter is exactly the signal you wanted — the policy is working as intended. The teaching point: an L7 policy moving traffic from “silently 200/404 at the app” to “explicitly dropped at the network” surfaces calls the application was masking, which is a feature, not a regression — but you have to read the dropped flows before deciding which side of that line each call belongs on.
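Before tightening or rolling back, it can help to replay candidate requests against the rule locally. This sketch approximates the single rule from the question (method GET, path regex `/api/v1/products.*`) with `grep -E`. It assumes the path regex must match the whole path, and it is not Cilium's actual matcher; `l7_allowed` and the sample requests are hypothetical.

```shell
#!/bin/sh
# One allowlisted HTTP rule, as in the question.
ALLOWED_METHOD="GET"
ALLOWED_PATH='/api/v1/products.*'

# l7_allowed METHOD PATH -> exit 0 if the request would pass the rule.
l7_allowed() {
  [ "$1" = "$ALLOWED_METHOD" ] || return 1
  # Anchor the regex so it has to cover the entire path (assumption).
  printf '%s\n' "$2" | grep -Eq "^(${ALLOWED_PATH})\$"
}

for req in "GET /api/v1/products/42" "GET /metrics" \
           "POST /api/v1/products/refresh" "GET /healthz"; do
  set -- $req   # split into method ($1) and path ($2)
  if l7_allowed "$1" "$2"; then
    echo "ALLOW $req"
  else
    echo "DENY  $req"
  fi
done
```

Only the product lookup passes; the metrics scrape, the refresh POST, and the health check are exactly the kinds of calls that show up as new drops.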
Question 7
You enabled WireGuard transparent encryption with encryption.type=wireguard. Initial smoke tests pass. A day later, a batch job that ships several-hundred-megabyte payloads between pods on different nodes starts failing with “connection reset” partway through transfers, while small JSON API calls work fine. What is the prime suspect, and what would you measure to confirm?
Show Answer
The prime suspect is MTU (maximum transmission unit). WireGuard prepends an encryption header to every encapsulated packet, which shrinks the usable payload by roughly 60 bytes for IPv4 (more for IPv6). If your pod MTU was not lowered to account for this overhead, large packets that previously fit now exceed the link MTU and either get fragmented, get dropped silently because the packet has the DF (don’t fragment) bit set, or trigger a TCP reset on path-MTU-discovery failure. Small requests fit under the MTU and look fine; large bulk transfers blow up.

Confirm by running `ip link show cilium_wg0` on a node and comparing the WireGuard interface MTU to the pod interface MTU, then run `ping -M do -s 1400 <pod-ip>` from one pod to another to find the actual max packet size that gets through. The fix is to reduce the pod MTU (Cilium auto-configures this when the agent restarts, but pre-existing pods inherit the old MTU until they are recreated).

The teaching point: encryption is not free even when the CPU overhead is negligible. Every encapsulation tunnel eats header bytes, and the typical symptom is “small things work, big things break” rather than a clean total outage.
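The overhead arithmetic is easy to work through by hand. A minimal sketch, assuming a 1500-byte link MTU (an illustrative value; check your own with `ip link`); the helper names are hypothetical:

```shell
#!/bin/sh
# WireGuard per-packet overhead = outer IP header + 8-byte UDP header
# + 32 bytes of WireGuard framing (16-byte data header + 16-byte auth tag).
#   IPv4 underlay: 20 + 8 + 32 = 60 bytes
#   IPv6 underlay: 40 + 8 + 32 = 80 bytes
wg_overhead() {
  case "$1" in
    4) echo 60 ;;
    6) echo 80 ;;
    *) echo "usage: wg_overhead 4|6" >&2; return 1 ;;
  esac
}

# wg_inner_mtu LINK_MTU IP_VERSION -> largest packet that fits in the tunnel
wg_inner_mtu() {
  echo $(( $1 - $(wg_overhead "$2") ))
}

# ping_payload MTU -> value for `ping -M do -s` over IPv4
# (MTU minus the 20-byte IPv4 header and the 8-byte ICMP header)
ping_payload() {
  echo $(( $1 - 28 ))
}

echo "inner MTU, IPv4 underlay: $(wg_inner_mtu 1500 4)"
echo "inner MTU, IPv6 underlay: $(wg_inner_mtu 1500 6)"
echo "largest ping that fits:   ping -M do -s $(ping_payload "$(wg_inner_mtu 1500 6)") <pod-ip>"
```

With a 1500-byte link this gives inner MTUs of 1440 (IPv4) and 1420 (IPv6). A `ping -s 1400` carries 1428 bytes on the wire once headers are added, so it fails against a 1420-byte tunnel while a 1392-byte payload passes.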
Hands-On Exercise: Build a Secure Microservices Setup
Objective
Deploy a three-tier application with Cilium, implement zero-trust networking, and observe traffic with Hubble.
Scenario
You’re deploying a web application with:
- Frontend: Nginx serving static content
- API: Node.js backend
- Database: PostgreSQL
Security requirements:
- Default deny all traffic
- Frontend can only reach API on port 3000
- API can only reach database on port 5432
- All pods can reach DNS
- No direct frontend-to-database access
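Writing the requirements down as an expected-verdict table before creating any policy gives you something concrete to compare test results against later. A small sketch that encodes the list above (DNS is left out because it applies to every pod); `expected_verdict` is a hypothetical helper:

```shell
#!/bin/sh
# expected_verdict SRC DST PORT -> ALLOWED or BLOCKED, per the requirements.
expected_verdict() {
  case "$1>$2:$3" in
    "frontend>api:3000") echo ALLOWED ;;
    "api>database:5432") echo ALLOWED ;;
    *)                   echo BLOCKED ;;   # default deny covers the rest
  esac
}

for flow in "frontend api 3000" "api database 5432" \
            "frontend database 5432" "database api 3000"; do
  set -- $flow   # split into source, destination, port
  printf '%-9s -> %-9s :%-5s %s\n' "$1" "$2" "$3" \
    "$(expected_verdict "$1" "$2" "$3")"
done
```

Each row should line up with what the connectivity tests and Hubble verdicts report once the policies are in place.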
Part 1: Set Up the Cluster
```bash
# Create a kind cluster without default CNI
cat > kind-config.yaml << 'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  kubeProxyMode: none
nodes:
- role: control-plane
- role: worker
- role: worker
EOF

kind create cluster --config kind-config.yaml --name cilium-lab

# Install Cilium
cilium install \
  --set kubeProxyReplacement=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true

# Wait for Cilium to be ready
cilium status --wait

# Verify installation
cilium connectivity test
```
Part 2: Deploy the Application
```bash
# Create namespace
kubectl create namespace demo

# Deploy database
kubectl -n demo apply -f - << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: database
  labels:
    app: database
    tier: data
spec:
  containers:
  - name: postgres
    image: postgres:15
    env:
    - name: POSTGRES_PASSWORD
      value: "secret"
    ports:
    - containerPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  selector:
    app: database
  ports:
  - port: 5432
EOF

# Deploy API (nginx stands in for the Node.js backend)
kubectl -n demo apply -f - << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
  labels:
    app: api
    tier: backend
spec:
  containers:
  - name: api
    image: nginx
    # nginx listens on 80 by default; move the default server to 3000
    # so it matches the Service and the policies below.
    command: ["/bin/sh", "-c"]
    args:
    - |
      sed -i 's/listen  *80;/listen 3000;/' /etc/nginx/conf.d/default.conf
      exec nginx -g 'daemon off;'
    ports:
    - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
  - port: 3000
EOF

# Deploy frontend
kubectl -n demo apply -f - << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
    tier: web
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF
```
Part 3: Test Without Policies (Everything Works)
```bash
# Start Hubble port-forward in background
cilium hubble port-forward &

# Test frontend → api (should work)
kubectl -n demo exec frontend -- curl -s --max-time 5 api:3000 \
  && echo "Frontend → API: SUCCESS"

# Test frontend → database (should also work - this is the problem!)
# The stock nginx image typically does not ship nc, so probe the port
# with bash's /dev/tcp instead.
kubectl -n demo exec frontend -- timeout 2 bash -c '</dev/tcp/database/5432' \
  && echo "Frontend → Database: SUCCESS (but shouldn't be allowed!)"

# Test api → database (should work)
kubectl -n demo exec api -- timeout 2 bash -c '</dev/tcp/database/5432' \
  && echo "API → Database: SUCCESS"

# Watch traffic with Hubble
hubble observe --namespace demo
```
Part 4: Implement Zero-Trust Policies
```bash
# Step 1: Default deny everything
# An empty rule in each direction puts the selected endpoints into default deny.
kubectl -n demo apply -f - << 'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny
spec:
  endpointSelector: {}
  ingress:
  - {}
  egress:
  - {}
EOF

# Test again - everything should fail now
kubectl -n demo exec frontend -- curl -s --max-time 5 api:3000 \
  || echo "Frontend → API: BLOCKED (expected)"
kubectl -n demo exec api -- timeout 2 bash -c '</dev/tcp/database/5432' \
  || echo "API → Database: BLOCKED (expected)"

# Watch the drops!
hubble observe --namespace demo --verdict DROPPED
```
```bash
# Step 2: Allow DNS (required for name resolution)
# Cilium has no "dns" entity, so select the CoreDNS pods by label.
kubectl -n demo apply -f - << 'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-dns
spec:
  endpointSelector: {}
  egress:
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
EOF

# Step 3: Allow frontend → api
kubectl -n demo apply -f - << 'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "3000"
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-egress
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        app: api
    toPorts:
    - ports:
      - port: "3000"
EOF

# Step 4: Allow api → database
kubectl -n demo apply -f - << 'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-to-database
spec:
  endpointSelector:
    matchLabels:
      app: database
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: api
    toPorts:
    - ports:
      - port: "5432"
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-egress
spec:
  endpointSelector:
    matchLabels:
      app: api
  egress:
  - toEndpoints:
    - matchLabels:
        app: database
    toPorts:
    - ports:
      - port: "5432"
EOF
```
Part 5: Verify Security
```bash
# Frontend → API: Should work
kubectl -n demo exec frontend -- curl -s --max-time 5 api:3000 \
  && echo "✓ Frontend → API: ALLOWED"

# API → Database: Should work
kubectl -n demo exec api -- timeout 2 bash -c '</dev/tcp/database/5432' \
  && echo "✓ API → Database: ALLOWED"

# Frontend → Database: Should be BLOCKED
kubectl -n demo exec frontend -- timeout 2 bash -c '</dev/tcp/database/5432' \
  || echo "✓ Frontend → Database: BLOCKED (as intended!)"

# Watch the flow in Hubble
hubble observe --namespace demo

# See what's being dropped
hubble observe --namespace demo --verdict DROPPED
```
Success Criteria
- Cilium installed and connectivity test passes
- Default deny policy blocks all traffic
- Hubble shows DROPPED verdict for blocked traffic
- Frontend can reach API on port 3000
- API can reach Database on port 5432
- Frontend CANNOT reach Database directly
- Hubble shows FORWARDED for allowed traffic
Bonus Challenge
Section titled “Bonus Challenge”Add an L7 policy that only allows HTTP GET requests from frontend to api:
```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-api-l7
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "3000"
      rules:
        http:
        - method: "GET"
          path: "/.*"
```
Test that POST requests are blocked:
```bash
# Cilium's L7 proxy answers a denied request with HTTP 403, and curl still
# exits 0 on a 403, so check the status code rather than the exit code.
kubectl -n demo exec frontend -- curl -s -o /dev/null -w '%{http_code}\n' -X POST api:3000   # expect 403
kubectl -n demo exec frontend -- curl -s -o /dev/null -w '%{http_code}\n' api:3000           # expect 200
```
Cleanup
```bash
# Delete the lab cluster
kind delete cluster --name cilium-lab
```
Further Reading
- Cilium Documentation - The official docs, well-written
- eBPF.io - Deep dive into eBPF technology
- Cilium Network Policy Editor - Visual policy builder (great for learning)
- Hubble Documentation
- Isovalent Blog - Advanced Cilium use cases from the creators
Next Module
Continue to Module 5.2: Service Mesh to learn about service mesh patterns with Istio, and when sidecar-free approaches make sense.
“The network that explains itself is the network you can actually secure.”
Sources
- Cilium Repository README - Upstream overview of Cilium’s eBPF dataplane, identity model, policy features, and observability integrations.
- Cilium CLI Repository README - Upstream reference for installing Cilium and running `cilium connectivity test` checks.
- Hubble Repository README - Upstream overview of Hubble’s flow visibility, troubleshooting workflow, and service-map style observability.
- Linux Kernel eBPF Verifier Documentation - Primary kernel documentation for how the eBPF verifier checks program safety and memory access.
- kubernetes.io: nftables kube proxy - The Kubernetes nftables blog explains iptables-mode kube-proxy rule scaling, O(n) lookup, and large-cluster programming latency.
- github.com: component overview.rst - Cilium’s component overview describes eBPF’s packet-filter origin, extensions, verifier, JIT compiler, and kernel hook points.
- github.com: bpf.h - The Linux kernel header defines BPF_COMPLEXITY_LIMIT_INSNS as 1000000.
- github.com: terminology.rst - Cilium terminology docs directly define identity derivation, cluster-wide numeric identifiers, shared identities, and policy use.
- github.com: numericidentity.go - Cilium’s numeric identity code defines MinimalNumericIdentity as 256 and enumerates the reserved host, world, and health identities.
- github.com: policy.rst - The Cilium Kubernetes policy docs list all three policy formats and their scope.
- github.com: layer7.rst - The L7 policy docs define HTTP method/path/header matching and state that requests not matching rules are denied.
- github.com: layer3.rst - The Layer 3 policy docs describe DNS-based rules, DNS proxy data collection, cached DNS responses, and TTL handling.
- github.com: setup.rst - The Hubble setup docs cover enabling Hubble and troubleshooting metrics configuration.
- github.com: kubeproxy free.rst - The kube-proxy-free docs state that Cilium’s eBPF kube-proxy replacement handles those Kubernetes Service types.
- github.com: encryption wireguard.rst - The WireGuard transparent encryption docs show the same Cilium install options and explain node-to-node tunnel setup.
- cncf.io: cloud native computing foundation announces cilium graduation - The CNCF announcement directly states the graduation date and describes the due diligence, audit, and maturity validation.
- github.com: v5.6 - The Linux v5.6 GitHub release tag shows the Linux 5.6 release date.
- github.com: wireguard - The Linux v5.6 source tree contains the drivers/net/wireguard implementation.