Module 5.2: EKS Networking Deep Dive (VPC CNI)
Complexity: [COMPLEX] | Time to Complete: 3.5h | Prerequisites: Module 5.1 (EKS Architecture & Control Plane)
What You’ll Be Able to Do
After completing this module, you will be able to:
- Configure the AWS VPC CNI plugin with custom networking, prefix delegation, and secondary CIDR ranges for large clusters
- Implement EKS networking with security groups for pods, network policies, and pod-level traffic isolation
- Deploy AWS Load Balancer Controller to provision ALB ingress and NLB services from Kubernetes manifests
- Diagnose pod networking failures related to IP exhaustion, ENI limits, and subnet routing misconfigurations
Why This Module Matters
In March 2023, a major European e-commerce platform running 800 pods across 40 EKS nodes hit a wall during their annual spring sale. At 09:12, Kubernetes could not schedule new pods. The error was not about CPU or memory. It was `FailedCreatePodSandBox: failed to setup network for sandbox: no available IP addresses`. Their VPC subnets had run out of IP addresses. The VPC CNI plugin assigns a real VPC IP address to every single pod — and with each m5.xlarge node supporting up to 58 pods (4 ENIs x 14 pod-assignable secondary IPs, plus 2 host-network pods), their /24 subnets were mathematically exhausted. The sale was live, customers were clicking, and the platform could not scale. The engineering team spent 90 minutes frantically adding secondary CIDR blocks and reconfiguring subnets while losing an estimated EUR 2.3 million in revenue.
This is the most common production failure mode specific to EKS. Unlike most Kubernetes distributions that use overlay networks (where pod IPs are virtual and unlimited), EKS uses the VPC CNI plugin, which gives every pod a routable VPC IP address. This is both a superpower (native VPC networking, security groups on pods, no overlay overhead) and a trap (finite IP address space that can run out at the worst possible moment).
In this module, you will master the VPC CNI mechanics, understand IP allocation modes including Prefix Delegation (which multiplies your IP capacity per ENI slot by 16), learn how to solve IP exhaustion with Custom Networking and secondary CIDRs, configure Security Groups for Pods, set up the AWS Load Balancer Controller for ALB and NLB ingress, and understand EKS IPv6 networking.
VPC CNI: How Pods Get Their IP Addresses
The Amazon VPC CNI plugin (aws-node DaemonSet) is the default networking solution for EKS. Unlike overlay networks (Calico, Cilium in overlay mode, Flannel), the VPC CNI assigns each pod a real, routable IP address from your VPC subnet. This means pods can communicate directly with any VPC resource — RDS databases, ElastiCache clusters, Lambda functions — without NAT or encapsulation.
Secondary IP Mode (Default)
In the default mode, the VPC CNI pre-allocates secondary IP addresses on each node’s Elastic Network Interfaces (ENIs). When a pod is scheduled, it receives one of these pre-allocated IPs.
```
EC2 Instance (m5.xlarge)
├── ENI-0 (Primary)
│   ├── Primary IP:   10.0.10.5  (node IP)
│   ├── Secondary IP: 10.0.10.6  → Pod A
│   ├── Secondary IP: 10.0.10.7  → Pod B
│   └── Secondary IP: 10.0.10.8  → (warm pool)
├── ENI-1 (Secondary)
│   ├── Primary IP:   10.0.10.20 (ENI primary, not used by pods)
│   ├── Secondary IP: 10.0.10.21 → Pod C
│   ├── Secondary IP: 10.0.10.22 → Pod D
│   └── Secondary IP: 10.0.10.23 → (warm pool)
└── ENI-2 (Secondary)
    ├── Primary IP:   10.0.10.35
    ├── Secondary IP: 10.0.10.36 → Pod E
    ├── Secondary IP: 10.0.10.37 → (warm pool)
    └── Secondary IP: 10.0.10.38 → (warm pool)
```

The number of pods a node can run is directly limited by the formula:
```
Max Pods = (Number of ENIs x (IPs per ENI - 1)) + 2

For m5.xlarge: ENIs = 4, IPs per ENI = 15
Max Pods = (4 x (15 - 1)) + 2 = 58
```

The -1 accounts for the primary IP on each ENI (used by the ENI itself, not assignable to pods). The +2 accounts for the node’s host-networking pods (kube-proxy and aws-node themselves).
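To make the arithmetic concrete, here is a small sketch of the formula in Python. The m5.xlarge limits (4 ENIs, 15 IPs per ENI) come from the text above; the m5.large limits (3 ENIs, 10 IPs per ENI) are an assumption drawn from the AWS ENI limits table.

```python
def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Max Pods = (ENIs x (IPs per ENI - 1)) + 2."""
    # -1: each ENI's primary IP is not assignable to pods.
    # +2: host-network pods (aws-node, kube-proxy) use the node IP.
    return enis * (ips_per_eni - 1) + 2

print(max_pods_secondary_ip(4, 15))  # m5.xlarge -> 58
print(max_pods_secondary_ip(3, 10))  # m5.large  -> 29
```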
The Warm Pool: WARM_ENI_TARGET and WARM_IP_TARGET
The VPC CNI pre-allocates IPs to reduce pod startup latency. By default, it maintains one “warm” ENI (an ENI with all its IPs pre-allocated but unassigned to pods). This means a fresh node immediately consumes IPs for the entire warm ENI, even if no pods are scheduled.
```bash
# Check current VPC CNI configuration
k get daemonset aws-node -n kube-system -o json | \
  jq '.spec.template.spec.containers[0].env[] | select(.name | startswith("WARM"))'
```

Tuning the warm pool is critical for IP conservation:
| Variable | Default | Effect |
|---|---|---|
| `WARM_ENI_TARGET` | 1 | Number of warm (fully pre-allocated) ENIs to keep ready |
| `WARM_IP_TARGET` | Not set | Number of warm IPs to keep ready (overrides `WARM_ENI_TARGET`) |
| `MINIMUM_IP_TARGET` | Not set | Minimum IPs to keep allocated at all times |
For IP-constrained environments, set WARM_IP_TARGET instead of WARM_ENI_TARGET:
```bash
# Configure VPC CNI to keep only 2 warm IPs instead of an entire warm ENI
k set env daemonset aws-node -n kube-system \
  WARM_IP_TARGET=2 \
  WARM_ENI_TARGET=0 \
  MINIMUM_IP_TARGET=4
```

This reduces IP waste from ~15 IPs per node (one warm ENI) to just 2, but pod startup may be slightly slower when new ENIs need to be attached.
Prefix Delegation Mode
Prefix Delegation fundamentally changes the IP math. Instead of assigning individual secondary IPs to each ENI slot, the VPC CNI assigns /28 prefixes (16 IP addresses each) to each ENI slot. This multiplies your pod capacity by up to 16x per node.
```
Secondary IP Mode (default):          Prefix Delegation Mode:
ENI Slot → 1 IP address               ENI Slot → /28 prefix (16 IPs)

m5.xlarge:                            m5.xlarge:
4 ENIs x 15 slots = 60 IPs max        4 ENIs x 15 slots x 16 = 960 IPs max
Max pods: ~58                         Max pods: 110 (capped by EKS)
```

EKS caps the maximum pods at 110 for most instance types (250 for some larger types), even if Prefix Delegation provides more IPs than that. The bottleneck shifts from IP addresses to node CPU and memory.
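The 16x multiplier and the EKS cap can be sketched as follows (all numbers are taken from the comparison above):

```python
def pod_ip_capacity(enis: int, slots_per_eni: int,
                    prefix_delegation: bool) -> int:
    """Raw pod-IP capacity provided by a node's ENI slots."""
    ips_per_slot = 16 if prefix_delegation else 1  # a /28 holds 16 IPs
    return enis * slots_per_eni * ips_per_slot

EKS_POD_CAP = 110  # EKS-imposed cap for most instance types

raw = pod_ip_capacity(4, 15, prefix_delegation=True)
print(raw)                    # 960 raw IPs on m5.xlarge
print(min(raw, EKS_POD_CAP))  # 110 effective max pods
```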
Stop and think: If Prefix Delegation multiplies IP capacity by 16x, why does EKS still cap an m5.xlarge at 110 pods instead of the theoretical 960? (Hint: IP addresses are not the only resource a pod consumes on a node).
```bash
# Enable Prefix Delegation
k set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1

# IMPORTANT: Update your node group's max-pods setting
# For managed node groups, use a launch template with custom user data:
#   --kubelet-extra-args '--max-pods=110'

# Verify prefix delegation is active
k get ds aws-node -n kube-system -o json | \
  jq '.spec.template.spec.containers[0].env[] | select(.name=="ENABLE_PREFIX_DELEGATION")'
```

After enabling Prefix Delegation, you must also update the max-pods setting on your nodes. Without this, the kubelet still uses the old secondary-IP-based pod limit, and the extra IPs go to waste.
How Prefix Delegation looks on a node:
```
EC2 Instance (m5.xlarge) with Prefix Delegation
├── ENI-0 (Primary)
│   ├── Primary IP: 10.0.10.5 (node IP)
│   ├── Prefix: 10.0.10.16/28  → 16 IPs for pods
│   ├── Prefix: 10.0.10.32/28  → 16 IPs for pods
│   └── Prefix: 10.0.10.48/28  → 16 IPs (warm pool)
├── ENI-1 (Secondary)
│   ├── Primary IP: 10.0.10.100
│   ├── Prefix: 10.0.10.112/28 → 16 IPs for pods
│   └── Prefix: 10.0.10.128/28 → 16 IPs (warm pool)
...
```

Solving IP Exhaustion
Even with Prefix Delegation, large clusters can exhaust their subnet IP space. Here are the production-grade solutions.
Solution 1: Secondary CIDR Blocks
Add a non-routable (RFC 6598) CIDR block to your VPC specifically for pod IPs. The 100.64.0.0/10 range is commonly used because it does not conflict with typical RFC 1918 ranges.
```bash
# Add secondary CIDR to VPC
aws ec2 associate-vpc-cidr-block \
  --vpc-id $VPC_ID \
  --cidr-block 100.64.0.0/16

# Create new subnets in the secondary CIDR range
POD_SUB1=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID \
  --cidr-block 100.64.0.0/19 \
  --availability-zone us-east-1a \
  --query 'Subnet.SubnetId' --output text)

POD_SUB2=$(aws ec2 create-subnet \
  --vpc-id $VPC_ID \
  --cidr-block 100.64.32.0/19 \
  --availability-zone us-east-1b \
  --query 'Subnet.SubnetId' --output text)

# Tag for EKS
aws ec2 create-tags --resources $POD_SUB1 $POD_SUB2 \
  --tags Key=Name,Value=EKS-Pod-Subnet
```

Solution 2: Custom Networking (ENIConfig)
Custom Networking tells the VPC CNI to place pod ENIs in different subnets than the node’s primary ENI. Combined with secondary CIDRs, this gives pods a massive, separate IP space.
```bash
# Enable custom networking on the VPC CNI
k set env daemonset aws-node -n kube-system \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone
```

Create ENIConfig resources that map Availability Zones to pod subnets:
```yaml
# eniconfig-us-east-1a.yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a
spec:
  subnet: subnet-aaa111        # Pod subnet in 100.64.0.0/19
  securityGroups:
    - sg-0abc123def456         # Security group for pods
---
# eniconfig-us-east-1b.yaml
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1b
spec:
  subnet: subnet-bbb222        # Pod subnet in 100.64.32.0/19
  securityGroups:
    - sg-0abc123def456
```

```bash
k apply -f eniconfig-us-east-1a.yaml
k apply -f eniconfig-us-east-1b.yaml
```

After enabling Custom Networking, the architecture looks like this:
```
┌────────────────────────────────────────────────────────────────┐
│ VPC: Primary CIDR 10.0.0.0/16 + Secondary CIDR 100.64.0.0/16   │
│                                                                │
│  ┌────── Node Subnet (10.0.10.0/24) ──────┐                    │
│  │ Node Primary ENI: 10.0.10.x            │                    │
│  │ (only node IPs live here)              │                    │
│  └────────────────────────────────────────┘                    │
│                                                                │
│  ┌────── Pod Subnet (100.64.0.0/19) ──────┐                    │
│  │ Pod ENIs: 100.64.x.x                   │                    │
│  │ 8,192 IPs available for pods!          │                    │
│  └────────────────────────────────────────┘                    │
└────────────────────────────────────────────────────────────────┘
```

Pause and predict: If we place pod ENIs into a separate subnet from the node’s primary ENI, what happens to the ENI slot that the node’s primary interface occupies? Can pods still use it?
Important: With Custom Networking, the node’s primary ENI is NOT used for pod IPs. This means the max-pods formula loses one ENI:
```
Max Pods = ((ENIs - 1) x (IPs per ENI - 1)) + 2
```

For m5.xlarge, that drops from 58 to 44 in secondary IP mode. Combine Custom Networking with Prefix Delegation to get the best of both worlds.
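A sketch of the adjusted formula, using the same m5.xlarge limits as earlier in this module:

```python
def max_pods_custom_networking(enis: int, ips_per_eni: int) -> int:
    """Custom Networking: the primary ENI carries no pod IPs, so one ENI is lost."""
    return (enis - 1) * (ips_per_eni - 1) + 2

print(max_pods_custom_networking(4, 15))  # m5.xlarge -> 44 (down from 58)
```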
War Story: The e-commerce company from the opening eventually implemented Custom Networking with a 100.64.0.0/16 secondary CIDR. Their pod subnets went from 251 IPs (a /24) to 8,192 IPs per AZ (a /19). Combined with Prefix Delegation, they ran their next spring sale with 2,400 pods and never came close to exhaustion. The total migration took two weeks, including new node groups — you cannot enable Custom Networking on existing nodes.
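The subnet sizes in the war story can be checked with a quick sketch. AWS reserves 5 addresses per subnet, which is why a /24 yields 251 usable IPs; note the story quotes the raw /19 block size of 8,192 rather than the usable count.

```python
# Usable IPv4 addresses in an AWS subnet of a given prefix length.
# AWS reserves 5 addresses per subnet (network, broadcast, and 3 internal).
def usable_ips(prefix_len: int, aws_reserved: int = 5) -> int:
    return 2 ** (32 - prefix_len) - aws_reserved

print(usable_ips(24))  # 251 usable IPs in a /24
print(usable_ips(19))  # 8187 usable IPs in a /19 (raw block size: 8,192)
```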
Security Groups for Pods
By default, all pods on a node share the node’s security groups. Security Groups for Pods allows you to assign VPC security groups directly to individual pods, enabling network-level isolation at the pod granularity rather than the node level.
How It Works
Security Groups for Pods uses a feature called “branch ENIs,” built on a “trunk ENI.” The VPC CNI creates a trunk ENI on the node and then creates branch ENIs off that trunk, each with its own security group.
```
Without SG for Pods:          With SG for Pods:
┌────────────────────┐        ┌────────────────────┐
│ Node SG: sg-node   │        │ Trunk ENI          │
│                    │        │ ├─ Branch ENI      │
│ Pod A ─┐           │        │ │  SG: sg-frontend │
│ Pod B ─┤ All use   │        │ │  → Pod A         │
│ Pod C ─┘ sg-node   │        │ ├─ Branch ENI      │
│                    │        │ │  SG: sg-backend  │
└────────────────────┘        │ │  → Pod B         │
                              │ └─ Branch ENI      │
                              │    SG: sg-db       │
                              │    → Pod C         │
                              └────────────────────┘
```

Enabling Security Groups for Pods
```bash
# Enable the feature on the VPC CNI
k set env daemonset aws-node -n kube-system \
  ENABLE_POD_ENI=true \
  POD_SECURITY_GROUP_ENFORCING_MODE=standard
```

Create a SecurityGroupPolicy resource that maps pods to security groups:
```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: backend-sgp
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-service
  securityGroups:
    groupIds:
      - sg-0abc123def456   # Allow only port 8080 from ALB
      - sg-0def789ghi012   # Allow only port 5432 to RDS
```

Any pod in the production namespace with the label `app: payment-service` will now use these specific security groups instead of the node’s security groups.
Limitations to Know
- Pods with security groups cannot use NodePort or HostPort services
- Nodes must use a Nitro-based instance type (m5, m6i, c5, r5, etc.)
- Each branch ENI consumes one of the node’s ENI slots, reducing pod capacity
- Security group changes require pod restart (not hot-reloaded)
AWS Load Balancer Controller
The AWS Load Balancer Controller is the modern replacement for the legacy in-tree AWS cloud provider integration for Service type LoadBalancer. It provisions and configures AWS Application Load Balancers (ALBs) for Ingress resources and Network Load Balancers (NLBs) for Services of type LoadBalancer.
Installation
Section titled “Installation”# Install via Helmhelm repo add eks https://aws.github.io/eks-chartshelm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \ -n kube-system \ --set clusterName=my-cluster \ --set serviceAccount.create=true \ --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/AWSLoadBalancerControllerRoleALB for HTTP/HTTPS Traffic
The controller creates an ALB when you create an Ingress resource with the alb ingress class:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc-123
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
    alb.ingress.kubernetes.io/group.name: shared-alb
spec:
  ingressClassName: alb
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80
```

Key annotations explained:
| Annotation | Purpose |
|---|---|
| `scheme: internet-facing` | Public ALB (vs. `internal` for private) |
| `target-type: ip` | Route directly to pod IPs (vs. `instance` for NodePort) |
| `group.name` | Share one ALB across multiple Ingress resources (cost savings) |
| `ssl-redirect` | Automatic HTTP-to-HTTPS redirect |
| `certificate-arn` | ACM certificate for TLS termination |
The target-type: ip annotation is critical for EKS. It tells the ALB to send traffic directly to pod IP addresses (which are real VPC IPs, thanks to the VPC CNI). This bypasses the kube-proxy hop and gives you direct pod-level health checking.
NLB for gRPC and TCP Traffic
For non-HTTP workloads (gRPC, TCP, UDP), use a Service type LoadBalancer that creates an NLB:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: grpc-service
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: HTTP
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /grpc.health.v1.Health/Check
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  selector:
    app: grpc-backend
  ports:
    - name: grpc
      port: 443
      targetPort: 8443
      protocol: TCP
```

ALB vs. NLB Decision Matrix
| Feature | ALB (Application LB) | NLB (Network LB) |
|---|---|---|
| OSI Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) |
| Protocols | HTTP, HTTPS, gRPC (HTTP/2) | TCP, UDP, TLS |
| Path routing | Yes (host, path, header) | No |
| WebSocket | Yes | Yes (TCP) |
| Static IP | No (use Global Accelerator) | Yes (Elastic IP per AZ) |
| Latency | ~1-5ms added | ~100us added |
| gRPC | ALB supports gRPC natively | NLB via TLS passthrough |
| Cost | $0.0225/hr + LCU | $0.0225/hr + NLCU |
| Best for | Web apps, REST APIs | gRPC, databases, gaming, IoT |
Pause and predict: If your application uses WebSockets which require long-lived persistent connections, which load balancer type would provide the most efficient routing without connection drops during scaling events?
IPv6 on EKS
EKS supports IPv6-only pods, which eliminates IP exhaustion entirely by giving every pod a unique IPv6 address from your VPC’s /56 range (roughly 4.7 sextillion addresses).
Enabling IPv6
IPv6 must be configured at cluster creation — you cannot migrate an existing IPv4 cluster to IPv6.
```bash
# Create an IPv6 cluster
aws eks create-cluster \
  --name ipv6-cluster \
  --role-arn $EKS_ROLE_ARN \
  --kubernetes-network-config ipFamily=ipv6 \
  --resources-vpc-config subnetIds=$SUB1,$SUB2,endpointPublicAccess=true,endpointPrivateAccess=true \
  --kubernetes-version 1.32
```

In IPv6 mode:
- Pods get IPv6 addresses only
- Services get both IPv4 and IPv6 cluster IPs (dual-stack)
- Node-to-node traffic uses IPv6
- External traffic uses IPv6 (requires IPv6-capable VPC and subnets)
- No IP exhaustion (a /56 provides 4,722,366,482,869,645,213,696 addresses)
The trade-off is that many AWS services and third-party tools still have limited IPv6 support. Test thoroughly before adopting.
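The address count quoted above is simply 2^(128-56); a one-liner to sanity-check it:

```python
# A VPC IPv6 /56 leaves 128 - 56 = 72 host bits.
addresses_in_slash56 = 2 ** (128 - 56)
print(addresses_in_slash56)  # 4722366482869645213696
```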
Did You Know?
- The VPC CNI’s default behavior of pre-allocating one warm ENI per node means a 100-node cluster with `m5.xlarge` instances wastes approximately 1,400 IP addresses just on warm pools (14 IPs per warm ENI x 100 nodes). By switching to `WARM_IP_TARGET=2` and `WARM_ENI_TARGET=0`, you can recover roughly 1,200 of those IPs immediately. For teams hitting IP exhaustion, this is often the fastest fix before migrating to Prefix Delegation.
- Prefix Delegation was introduced in 2021 and is now the AWS-recommended default for new clusters. A single `m5.xlarge` node goes from supporting 58 pods to 110 pods (the EKS-imposed cap), while consuming fewer IP reservation calls because a `/28` prefix is allocated atomically rather than as 15 individual IPs. This also reduces EC2 API throttling during large-scale node launches.
- The AWS Load Balancer Controller’s `group.name` annotation lets you share a single ALB across dozens of Ingress resources. Without it, every Ingress creates its own ALB at $16/month minimum. A team with 30 microservices each exposing an Ingress could be paying $480/month in ALB charges when a single shared ALB with path-based routing would cost $16/month plus traffic.
- Security Groups for Pods use Nitro’s “branch ENI” capability, which was originally designed for AWS ECS. The trunk/branch architecture allows up to 110 branch ENIs on a single node (depending on instance type), each with its own security group. This is the same technology that makes ECS task-level networking work, repurposed for Kubernetes pods.
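The ALB cost figures in the third point can be reproduced with a quick sketch. The $16/month minimum is the figure quoted above; LCU/traffic charges are excluded.

```python
ALB_MONTHLY_MINIMUM = 16  # USD, fixed hourly charge only (no LCU/traffic)

def monthly_alb_cost(ingresses: int, shared_alb: bool) -> int:
    """One ALB total when shared via group.name, else one ALB per Ingress."""
    albs = 1 if shared_alb else ingresses
    return albs * ALB_MONTHLY_MINIMUM

print(monthly_alb_cost(30, shared_alb=False))  # 480
print(monthly_alb_cost(30, shared_alb=True))   # 16
```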
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Not enabling Prefix Delegation on new clusters | Unaware it exists, or using default VPC CNI settings from older guides. | Enable ENABLE_PREFIX_DELEGATION=true and update max-pods in your node group launch template. This should be default for all new clusters. |
| IP exhaustion from warm ENI pre-allocation | Default WARM_ENI_TARGET=1 wastes 14+ IPs per node on pre-allocated but unused ENIs. | Set WARM_IP_TARGET=2 and WARM_ENI_TARGET=0 in the aws-node DaemonSet environment variables. |
| Using `target-type: instance` with ALB | Copying old examples that pre-date the `ip` target type. Instance mode adds a NodePort hop and loses pod-level health checks. | Always use `target-type: ip` with the AWS Load Balancer Controller. It routes directly to pod IPs and enables pod-level health checking. |
| Creating a separate ALB per Ingress | Not knowing about the group.name annotation for ALB sharing. | Add alb.ingress.kubernetes.io/group.name: shared-alb to Ingress annotations. Multiple Ingress resources share one ALB. |
| Forgetting max-pods after enabling Prefix Delegation | Enabling PD on the VPC CNI but not updating the kubelet configuration on nodes. | Use a launch template with --kubelet-extra-args '--max-pods=110' or use the EKS-recommended max-pods calculator script. |
| Custom Networking without new node groups | Enabling AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true on existing nodes that were not provisioned with ENIConfig. | Custom Networking requires rolling out new node groups. Existing nodes must be drained and replaced. |
| NLB with missing cross-zone annotation | Assuming NLB distributes evenly across AZs by default. NLB is zonal by default — each AZ node gets equal share regardless of pod count. | Set aws-load-balancer-cross-zone-load-balancing-enabled: "true" for even distribution. |
| Security Groups for Pods on non-Nitro instances | Using t2 or m4 instance types that do not support trunk/branch ENIs. | Use Nitro-based instances (m5, m6i, c5, r5, t3, and newer). Check the instance type compatibility matrix. |
Question 1: Your EKS cluster runs on m5.xlarge nodes. In secondary IP mode, each node can run 58 pods. After enabling Prefix Delegation, you expect 110 pods per node, but nodes still cap at 58 pods. What did you miss?
You forgot to update the max-pods setting on the nodes. Prefix Delegation changes how the VPC CNI allocates IPs, but the kubelet enforces its own pod limit independently. You need to update the launch template’s user data to include --kubelet-extra-args '--max-pods=110' and roll out new nodes. The VPC CNI can allocate hundreds of IPs via prefix delegation, but if the kubelet still thinks the max is 58, it will reject any scheduling beyond that limit.
Question 2: Your EKS cluster is running 50 nodes of `m5.xlarge`. You notice that even though you only have 100 pods deployed across the entire cluster, you have exhausted over 700 IPs from your VPC subnet. The cluster is using default VPC CNI settings. A colleague suggests changing `WARM_ENI_TARGET` to 0 and setting `WARM_IP_TARGET=2`. Will this resolve the IP exhaustion, and what trade-off are you making?
Yes, this will immediately recover a massive number of IPs. By default, WARM_ENI_TARGET=1 keeps an entire ENI (up to 14 secondary IPs on an m5.xlarge) fully pre-allocated per node, which means 50 nodes waste about 700 IPs just sitting idle in the warm pool. By switching to WARM_IP_TARGET=2, you instruct the VPC CNI to only keep 2 IPs pre-allocated per node, returning the rest to the VPC. The trade-off is that when a node needs to schedule a 3rd pod rapidly, it must make an AWS API call to attach a new ENI or assign a new IP, introducing 1-2 seconds of pod startup latency.
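A one-line check of the arithmetic in this answer (14 idle IPs per warm ENI is the m5.xlarge figure used throughout this module):

```python
# 50 nodes, each idling one warm ENI of ~14 pre-allocated secondary IPs.
nodes, ips_per_warm_eni = 50, 14
idle_ips = nodes * ips_per_warm_eni
print(idle_ips)  # 700
```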
Question 3: You just migrated your EKS cluster to use Custom Networking to solve IP exhaustion, mapping pod IPs to a massive `100.64.0.0/16` secondary CIDR. However, immediately after rolling out the new node groups, you get alerts that `m5.xlarge` nodes are failing to schedule more than 44 pods, even though they used to schedule 58 pods before the migration. What is causing this capacity reduction, and how can you fix it?
The reduction is happening because Custom Networking reserves the node’s primary ENI exclusively for node-level communication in the primary subnet, completely removing it from the pod IP allocation pool. Previously, the primary ENI could host secondary IPs for pods, but now only the secondary ENIs (which are attached to the Custom Networking subnets) can host pods. For an m5.xlarge, this reduces the usable ENIs from 4 to 3, dropping max pods from 58 to 44. To fix this and massively increase capacity, you should enable Prefix Delegation alongside Custom Networking, which will assign /28 prefixes to those remaining ENI slots and allow the node to easily hit the EKS hard cap of 110 pods.
Question 4: You have a Kubernetes Ingress with annotation `target-type: instance` and pods running on 10 nodes across 3 AZs. A pod fails its health check. What happens to traffic?
With target-type: instance, the ALB targets the NodePort on each node, not individual pods. The ALB health checks the NodePort — and if any pod behind that NodePort on a specific node fails, kube-proxy may still route traffic to the unhealthy pod because the ALB only sees the node as healthy or unhealthy. This means traffic can reach unhealthy pods until kube-proxy removes the endpoint. With target-type: ip, the ALB health-checks each pod directly and stops sending traffic to failed pods within seconds, regardless of the node.
Question 5: Your team uses Security Groups for Pods to isolate a payment service. After applying the SecurityGroupPolicy, the payment pods cannot resolve DNS. What went wrong?
When you assign security groups to pods via SecurityGroupPolicy, those pods use the specified security groups instead of the node’s security groups. If the pod-specific security groups do not include an outbound rule allowing DNS traffic (UDP port 53 to the CoreDNS service IP, typically 10.100.0.10), DNS resolution fails. The fix is to add an outbound rule for UDP/TCP port 53 to the CoreDNS cluster IP CIDR (or the VPC CIDR) in the pod’s security group.
Question 6: Your platform hosts 45 different microservices, each with its own standard Kubernetes Ingress resource using the `alb` ingress class. Finance just flagged your AWS bill because you are spending over $700 per month just on Application Load Balancers. You need to reduce this cost immediately without changing the routing behavior for the clients. How can you architect this change using the AWS Load Balancer Controller, and what operational risk does it introduce?
You can consolidate all 45 microservices behind a single Application Load Balancer by adding the alb.ingress.kubernetes.io/group.name: shared-alb annotation to all 45 Ingress resources. The AWS Load Balancer Controller will merge these into a single ALB with path-based or host-based listener rules, reducing your fixed ALB hourly costs from 45 LBs down to just 1. However, this introduces a shared blast radius risk: if someone deploys a misconfigured Ingress that breaks the ALB listener rules, or if you exceed the AWS quota of 100 rules per ALB, all 45 microservices could experience routing failures simultaneously. It is best practice to group non-critical services together while keeping highly critical domains on dedicated ALBs.
Question 7: Your VPC uses 10.0.0.0/16 and you have exhausted all IPs in your EKS subnets. You need more IPs immediately. What are your two fastest options?
Option 1: Tune the VPC CNI warm pool. Set WARM_IP_TARGET=1 and WARM_ENI_TARGET=0 on the aws-node DaemonSet. This immediately releases pre-allocated but unused IPs across all nodes, often recovering hundreds of IPs within minutes. Option 2: Enable Prefix Delegation (ENABLE_PREFIX_DELEGATION=true). This changes the allocation from individual IPs to /28 prefixes, dramatically reducing the number of IPs consumed per ENI slot while increasing pod capacity. Both changes take effect within minutes as the aws-node DaemonSet rolls out, though Prefix Delegation requires updating max-pods on nodes (meaning a rolling restart). For a longer-term fix, add a secondary CIDR (e.g., 100.64.0.0/16) with Custom Networking.
Hands-On Exercise: Prefix Delegation + ALB for Web + NLB for gRPC
In this exercise, you will configure an EKS cluster with Prefix Delegation for maximum IP efficiency, deploy a web application behind an ALB, and a gRPC service behind an NLB.
What you will build:
```
┌────────────────────────────────────────────────────────────────┐
│                        Internet                                │
│             │                        │                         │
│             ▼                        ▼                         │
│       ┌──────────┐            ┌──────────┐                     │
│       │   ALB    │            │   NLB    │                     │
│       │ (HTTPS)  │            │  (TCP)   │                     │
│       └────┬─────┘            └────┬─────┘                     │
│            │                       │                           │
│            ▼                       ▼                           │
│       ┌─────────┐             ┌─────────┐                      │
│       │Web Pods │             │gRPC Pods│                      │
│       │(IP mode)│             │(IP mode)│                      │
│       └─────────┘             └─────────┘                      │
│                                                                │
│  VPC CNI: Prefix Delegation enabled                            │
│  Max Pods: 110 per node                                        │
└────────────────────────────────────────────────────────────────┘
```

Task 1: Enable Prefix Delegation on the VPC CNI
Configure the VPC CNI for Prefix Delegation and verify it is working.
Solution
```bash
# Enable Prefix Delegation
k set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1

# Wait for the DaemonSet to roll out
k rollout status daemonset aws-node -n kube-system --timeout=120s

# Verify on a node (check that prefixes are assigned, not individual IPs)
NODE_NAME=$(k get nodes -o jsonpath='{.items[0].metadata.name}')
k get node $NODE_NAME -o json | jq '.status.allocatable["vpc.amazonaws.com/pod-eni"]'

# Check ENI details via AWS CLI
INSTANCE_ID=$(k get node $NODE_NAME -o json | jq -r '.spec.providerID' | cut -d'/' -f5)
aws ec2 describe-instances --instance-ids $INSTANCE_ID \
  --query 'Reservations[0].Instances[0].NetworkInterfaces[*].{ENI:NetworkInterfaceId, Ipv4Prefixes:Ipv4Prefixes[*].Ipv4Prefix}' \
  --output json

# You should see /28 prefixes instead of individual secondary IPs
```

Task 2: Update Node Group Max-Pods
Ensure the kubelet allows 110 pods to take advantage of Prefix Delegation.
Solution
```bash
# Create a new launch template with updated max-pods
cat > /tmp/eks-userdata.txt << 'USERDATA'
#!/bin/bash
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--max-pods=110'
USERDATA

USERDATA_B64=$(base64 -i /tmp/eks-userdata.txt)

# Create launch template
LT_ID=$(aws ec2 create-launch-template \
  --launch-template-name eks-prefix-delegation \
  --launch-template-data "{
    \"UserData\": \"$USERDATA_B64\",
    \"InstanceType\": \"m6i.large\"
  }" \
  --query 'LaunchTemplate.LaunchTemplateId' --output text)

# Update the node group to use the new launch template
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name standard-workers \
  --launch-template id=$LT_ID,version=1

# Wait for the update (this triggers a rolling replacement)
aws eks wait nodegroup-active \
  --cluster-name my-cluster \
  --nodegroup-name standard-workers

# Verify max-pods on a new node
k get node -o json | jq '.items[0].status.allocatable.pods'
# Should show "110"
```

Task 3: Install the AWS Load Balancer Controller
Solution
```bash
# Add the EKS Helm repo
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Create the IAM policy for the controller
curl -o /tmp/iam_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json

aws iam create-policy \
  --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file:///tmp/iam_policy.json

# Install the controller
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/AWSLoadBalancerControllerRole

# Verify the controller is running
k get deployment aws-load-balancer-controller -n kube-system
k get pods -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
```

Task 4: Deploy a Web Application Behind an ALB
Solution
```bash
# Create namespace
k create namespace web-demo

# Deploy the web application
cat <<'EOF' | k apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: web-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
  namespace: web-demo
spec:
  selector:
    app: web-app
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: web-demo
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/group.name: dojo-shared-alb
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app-svc
                port:
                  number: 80
EOF
```
```bash
# Wait for ALB to provision (takes 2-3 minutes)
echo "Waiting for ALB to provision..."
sleep 30
ALB_URL=$(k get ingress web-app-ingress -n web-demo -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "ALB URL: http://$ALB_URL"
```
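A fixed sleep is a guess: an ALB that is still provisioning makes the test below return 000. A small polling helper is more robust — a sketch, where `retry` and `wait_for_200` are hypothetical helper names, not AWS or kubectl tooling:

```shell
# Poll a command until it succeeds or attempts run out.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds only once the URL answers with HTTP 200.
wait_for_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}

# Usage: poll the ALB for up to 5 minutes (30 tries, 10s apart)
# retry 30 10 wait_for_200 "http://$ALB_URL"
```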
```bash
# Test (may take a minute for DNS propagation)
curl -s -o /dev/null -w "%{http_code}" http://$ALB_URL
```

Task 5: Deploy a gRPC Service Behind an NLB
Solution
```bash
# Deploy a gRPC health check service (using grpcbin as example)
cat <<'EOF' | k apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc-service
  namespace: web-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: grpc-service
  template:
    metadata:
      labels:
        app: grpc-service
    spec:
      containers:
        - name: grpcbin
          image: moul/grpcbin:latest
          ports:
            - containerPort: 9000
              name: grpc
            - containerPort: 9001
              name: grpc-insecure
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: grpc-nlb
  namespace: web-demo
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  selector:
    app: grpc-service
  ports:
    - name: grpc
      port: 9000
      targetPort: 9000
      protocol: TCP
EOF
```
```bash
# Wait for NLB to provision
sleep 30
NLB_HOST=$(k get svc grpc-nlb -n web-demo -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "NLB hostname: $NLB_HOST"
```
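Because the target type is `ip`, the NLB registers pod IPs directly rather than node IPs. One way to confirm that is to compare the registered targets against the pod IPs — a sketch, where `same_set` is a hypothetical order-insensitive comparison helper:

```shell
# Compare two whitespace-separated lists as sets.
# (Deliberately unquoted $1/$2 so the shell word-splits each list.)
same_set() {
  [ "$(printf '%s\n' $1 | sort)" = "$(printf '%s\n' $2 | sort)" ]
}

# Usage against the live cluster (TG_ARN resolved via describe-target-groups,
# as in the verification step of this task):
# TARGETS=$(aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
#   --query 'TargetHealthDescriptions[*].Target.Id' --output text)
# POD_IPS=$(k get pods -n web-demo -l app=grpc-service \
#   -o jsonpath='{.items[*].status.podIP}')
# same_set "$TARGETS" "$POD_IPS" && echo "NLB targets are pod IPs"
```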
```bash
# Verify NLB targets are healthy
NLB_ARN=$(aws elbv2 describe-load-balancers \
  --query "LoadBalancers[?DNSName=='$NLB_HOST'].LoadBalancerArn" --output text)
TG_ARN=$(aws elbv2 describe-target-groups \
  --load-balancer-arn $NLB_ARN \
  --query 'TargetGroups[0].TargetGroupArn' --output text)
aws elbv2 describe-target-health --target-group-arn $TG_ARN \
  --query 'TargetHealthDescriptions[*].{Target:Target.Id, Port:Target.Port, Health:TargetHealth.State}' \
  --output table
```

Task 6: Verify Pod IP Allocation with Prefix Delegation
Confirm that pods are using IPs from allocated prefixes, not individual secondary IPs.
Solution
```bash
# Get pod IPs
k get pods -n web-demo -o wide

# Pick a node and inspect its ENI prefixes
NODE=$(k get pods -n web-demo -o jsonpath='{.items[0].spec.nodeName}')
INSTANCE_ID=$(k get node $NODE -o json | jq -r '.spec.providerID' | cut -d'/' -f5)

# Show allocated prefixes on the instance
aws ec2 describe-instances --instance-ids $INSTANCE_ID \
  --query 'Reservations[0].Instances[0].NetworkInterfaces[*].{
    ENI: NetworkInterfaceId,
    Prefixes: Ipv4Prefixes[*].Ipv4Prefix,
    SecondaryIPs: PrivateIpAddresses[?Primary==`false`].PrivateIpAddress
  }' --output json

# You should see Prefixes populated and SecondaryIPs empty (or minimal)
# Each prefix is a /28 = 16 IPs
```
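Prefix allocation is chunkier than per-IP allocation, which is worth quantifying. A sketch of the per-node subnet consumption at max-pods=110:

```shell
# Each /28 prefix covers 16 addresses; a node running 110 pods needs
# ceil(110 / 16) = 7 prefixes, reserving 112 subnet addresses whether
# or not every slot is used.
MAX_PODS=110
PREFIX_SIZE=16

PREFIXES_NEEDED=$(( (MAX_PODS + PREFIX_SIZE - 1) / PREFIX_SIZE ))
echo "prefixes per node: $PREFIXES_NEEDED"                        # 7
echo "addresses reserved: $(( PREFIXES_NEEDED * PREFIX_SIZE ))"   # 112

# Caveat: each prefix must be an aligned, contiguous /28 block. In a
# fragmented subnet the CNI can fail to allocate a prefix (ipamd logs
# show InsufficientCidrBlocks) even when many scattered IPs are free.
```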
```bash
# Verify max-pods
k get node $NODE -o json | jq '.status.allocatable.pods'
```

Clean Up
```bash
k delete namespace web-demo
helm uninstall aws-load-balancer-controller -n kube-system
# Clean up ALB/NLB if they persist (check the AWS console)
```

Success Criteria
- I enabled Prefix Delegation on the VPC CNI and verified `/28` prefixes on node ENIs
- I updated node max-pods to 110 to take advantage of Prefix Delegation
- I installed the AWS Load Balancer Controller via Helm
- I deployed a web application accessible through an ALB with `target-type: ip`
- I deployed a gRPC service accessible through an NLB with cross-zone load balancing
- I verified ALB/NLB target health shows pod IPs (not node IPs)
- I can explain why Prefix Delegation solves IP exhaustion for most clusters
Next Module
Your pods have IP addresses and your traffic flows through load balancers. But how do those pods authenticate to AWS services like S3, DynamoDB, and SQS? Head to Module 5.3: EKS Identity (IRSA vs Pod Identity) to master the transition from IRSA to the simpler Pod Identity system.