Module 4.4: Cloud-Native Networking and VPC Topologies

Complexity: [COMPLEX]

Time to Complete: 3.5 hours

Prerequisites: Module 4.1: Managed vs Self-Managed Kubernetes, Module 4.2: Multi-Cluster and Multi-Region Architectures

Track: Cloud Architecture Patterns

After completing this module, you will be able to:

  • Design cloud-native VPC/VNet topologies optimized for Kubernetes cluster networking across availability zones
  • Configure CIDR planning strategies that accommodate pod networking, service ranges, and future cluster growth
  • Implement private cluster architectures with private API endpoints and VPC-native routing
  • Compare overlay vs native CNI networking models across EKS (VPC CNI), GKE (Dataplane V2), and AKS (Azure CNI)

September 2022. A logistics company running 14 Kubernetes clusters on AWS.

The platform team got a Slack message at 2:15 AM: “Pods stuck in ContainerCreating.” The on-call engineer checked. Eighteen pods had been scheduled but could not start, and every attempt to create a new pod failed with the same error: failed to assign an IP address to the pod. The error was clear, but the cause wasn’t obvious: the nodes had available CPU and memory.

The root cause took three hours to diagnose. The EKS clusters used the VPC CNI plugin in its default mode, where every pod receives a real VPC IP address. The team had provisioned their subnets with /24 CIDR blocks — 251 usable IPs per subnet. Each m5.2xlarge node could attach 4 Elastic Network Interfaces with 15 IPs each, consuming 60 IPs per node. With 12 nodes in the subnet, they needed 720 IPs. The subnet had 251.

The cluster had been slowly approaching this limit for months. Node autoscaling added more nodes, each consuming more IPs. Nobody monitored subnet IP utilization. When the threshold was crossed, new pods couldn’t get IPs, deployments failed, and the horizontal pod autoscaler’s scale-up attempts made the problem worse by requesting more pods that couldn’t get IPs.

The fix was an emergency subnet expansion — adding secondary CIDR blocks to the VPC and creating new, larger subnets. This is a disruptive change in production. Nodes had to be drained and relaunched in the new subnets. The full recovery took 11 hours.

This incident was entirely preventable with proper IPAM planning. In this module, you’ll learn to design VPC topologies that accommodate Kubernetes’ IP consumption patterns from day one, architect egress and ingress correctly, and connect multiple environments without the subnet overlaps that make peering impossible.


VPC Design for Kubernetes: IPAM Fundamentals

IP Address Management (IPAM) for Kubernetes is different from traditional infrastructure. In a VM-based world, each machine gets one IP. In Kubernetes with VPC-native networking (the default on all major managed platforms), each pod gets its own VPC IP address. This fundamentally changes how you plan subnets.

How Many IPs Does Kubernetes Actually Need?

IP CONSUMPTION: TRADITIONAL VS KUBERNETES
═══════════════════════════════════════════════════════════════
Traditional (VM-based):
  10 servers = 10 IPs
  Planning: /24 subnet (251 IPs) lasts years
Kubernetes (VPC CNI, per-pod IP):
  10 nodes × 30 pods each = 300 pod IPs + 10 node IPs = 310 IPs
  Planning: /24 subnet (251 IPs) exhausted before you reach 10 nodes
Kubernetes (VPC CNI with prefix delegation):
  10 nodes × 110 pods each = 1,100 pod IPs + 10 node IPs = 1,110 IPs
  Planning: Need /20 (4,091 IPs) minimum for a medium cluster
Kubernetes (overlay network, e.g., Calico VXLAN):
  10 nodes = 10 VPC IPs (pods use overlay, invisible to VPC)
  Planning: /24 subnet is fine, but overlay adds network latency

On EKS with the default VPC CNI plugin, every pod gets a VPC IP address. Here’s exactly how IPs are consumed per node:

AWS VPC CNI: IP ALLOCATION PER NODE
═══════════════════════════════════════════════════════════════
Instance type: m5.2xlarge
Max ENIs: 4
Max IPs per ENI: 15
Max pods: (4 ENIs × (15 IPs - 1)) + 2 = 58 pods
How it works:
┌──────────────────────────────────────────────────────┐
│ Node: m5.2xlarge │
│ │
│ ENI 0 (primary) ENI 1 │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ IP 1: node addr │ │ IP 1: ENI addr │ │
│ │ IP 2: pod-a │ │ IP 2: pod-f │ │
│ │ IP 3: pod-b │ │ IP 3: pod-g │ │
│ │ ... │ │ ... │ │
│ │ IP 15: pod-n │ │ IP 15: pod-z │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ ENI 2 ENI 3 │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ IP 1: ENI addr │ │ IP 1: ENI addr │ │
│ │ IP 2: pod-aa │ │ IP 2: pod-ff │ │
│ │ ... │ │ ... │ │
│ │ IP 15: pod-nn │ │ IP 15: pod-zz │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ Total IPs consumed from VPC: 60 │
│ (4 ENIs × 15 IPs each) │
└──────────────────────────────────────────────────────┘
With prefix delegation (recommended):
  Each ENI slot holds a /28 prefix (16 IPs) instead of a single IP
  Max pods: 110 (Kubernetes default limit, not ENI limit)
  IPs consumed: allocated from the subnet in /28 blocks, so
    110 pods need at least 7 prefixes (112 IPs)
  Effective: higher pod density per node and fewer allocation calls,
    but the subnet must have contiguous free /28 blocks available
# Enable prefix delegation on EKS (recommended for new clusters)
kubectl set env daemonset aws-node \
  -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1
# Check the pod capacity a node advertises (node name is a placeholder)
kubectl get node ip-10-0-1-42.ec2.internal \
  -o jsonpath='{.status.allocatable.pods}'
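
The per-node arithmetic in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the function names are hypothetical, and the ENI and IP limits are the figures quoted in this module for m5.2xlarge and m5.large.

```python
def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Pod limit in the VPC CNI's default (secondary IP) mode.

    Each ENI keeps its first IP for itself, and the +2 covers
    host-network pods that do not need a VPC IP of their own.
    """
    return enis * (ips_per_eni - 1) + 2


def vpc_ips_when_full(enis: int, ips_per_eni: int) -> int:
    """VPC addresses a node holds once every ENI slot is populated."""
    return enis * ips_per_eni


# m5.2xlarge: 4 ENIs x 15 IPs (figures from the diagram above)
print(max_pods_secondary_ip(4, 15), vpc_ips_when_full(4, 15))
# m5.large: 3 ENIs x 10 IPs
print(max_pods_secondary_ip(3, 10), vpc_ips_when_full(3, 10))
```

Running the same two functions against your own instance types gives a quick worst-case IP budget per node before you pick a subnet size.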

Let’s walk through sizing a subnet for a new production cluster.

The Scenario: You are provisioning an EKS cluster using VPC CNI (without prefix delegation). The cluster will scale up to 20 worker nodes (m5.large, which support up to 3 ENIs and 10 IPs per ENI). You want to ensure the subnet can handle this maximum capacity plus 100% headroom for future growth.

Step 1: Calculate IPs per node. An m5.large can attach 3 ENIs, each with 10 IPs. Total IPs per node = 3 × 10 = 30 IPs.

Step 2: Calculate total IPs for maximum capacity. 20 nodes × 30 IPs/node = 600 IPs required from the VPC.

Step 3: Add headroom. 100% headroom means we need space for 1,200 IPs.

Step 4: Select the subnet CIDR

  • A /24 provides 251 usable IPs (Too small)
  • A /23 provides 507 usable IPs (Too small)
  • A /22 provides 1,019 usable IPs (Too small)
  • A /21 provides 2,043 usable IPs (Perfect)

For this cluster, a /21 subnet is the minimum safe choice to guarantee 1,200 IPs are available even if other resources are deployed in the same subnet.
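
The four sizing steps can also be expressed as a short Python sketch. The helper names are hypothetical; the only assumptions are that AWS reserves 5 addresses in every subnet (which is where the 251 figure for a /24 comes from) and that subnets range from /28 to /16.

```python
AWS_RESERVED_PER_SUBNET = 5  # addresses AWS holds back in every subnet


def usable_ips(prefix_len: int) -> int:
    """Usable addresses in an AWS subnet of the given prefix length."""
    return 2 ** (32 - prefix_len) - AWS_RESERVED_PER_SUBNET


def smallest_prefix_for(required_ips: int) -> int:
    """Smallest subnet (largest prefix length) that fits the requirement."""
    # Walk from /28 (smallest AWS subnet) toward /16 (largest).
    for prefix_len in range(28, 15, -1):
        if usable_ips(prefix_len) >= required_ips:
            return prefix_len
    raise ValueError("requirement does not fit in a single /16 subnet")


# Scenario from the text: 20 m5.large nodes, 3 ENIs x 10 IPs, 100% headroom
nodes, enis, ips_per_eni, headroom_factor = 20, 3, 10, 2
required = nodes * enis * ips_per_eni * headroom_factor
print(required, f"/{smallest_prefix_for(required)}")
```

Changing `headroom_factor` or the node count shows how quickly the answer moves from a /22 to a /21 or larger.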

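| Cluster Size | Nodes | Pods (est.) | VPC CNI (standard) | VPC CNI (prefix delegation) | Overlay |
|---|---|---|---|---|---|
| Small (dev) | 3-5 | ~100 | /24 (tight) | /24 (comfortable) | /27 |
| Medium | 10-20 | ~500 | /21 minimum | /23 | /25 |
| Large | 50-100 | ~3,000 | /19 minimum | /21 | /24 |
| Very Large | 200+ | ~10,000 | /17 minimum | /19 | /23 |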

The golden rule: always provision subnets at least 2x larger than your current needs. IP address space is free. Expanding subnets later is painful.

Pause and predict: If you use a /24 subnet (251 IPs) for a cluster configured with Calico VXLAN (overlay) and scale to 50 nodes running 2,000 pods total, will you exhaust the VPC subnet? Why or why not?


Overlay vs Underlay: The Networking Architecture Choice

This is the foundational networking decision for your Kubernetes clusters. It affects performance, IP consumption, observability, and cloud integration.

Pods get real VPC IP addresses. Cloud network infrastructure routes pod traffic natively.

UNDERLAY NETWORKING (VPC CNI)
═══════════════════════════════════════════════════════════════
┌─────────── VPC: 10.0.0.0/16 ────────────────────────┐
│ │
│ Node A (10.0.1.10) Node B (10.0.1.20) │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ Pod 1: 10.0.1.15│ │ Pod 3: 10.0.1.25│ │
│ │ Pod 2: 10.0.1.16│ │ Pod 4: 10.0.1.26│ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ─────────┴───────────────────────────┴────────── │
│ VPC Router │
│ Pod-to-pod traffic: Routed natively by VPC │
│ No encapsulation. No tunnel. Full line speed. │
└──────────────────────────────────────────────────────┘
Cloud load balancers → target pods directly by IP
Cloud security groups → applied to pod IPs
VPC Flow Logs → show individual pod traffic
Network ACLs → filter pod traffic natively

Pods get IPs from a separate, private address space. Traffic between nodes is encapsulated in tunnels (VXLAN, Geneve, or IP-in-IP).

OVERLAY NETWORKING (Calico VXLAN)
═══════════════════════════════════════════════════════════════
┌─────────── VPC: 10.0.0.0/16 ────────────────────────┐
│ │
│ Node A (10.0.1.10) Node B (10.0.1.20) │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ Pod 1: 192.168.│ │ Pod 3: 192.168.│ │
│ │ 1.15 (overlay)│ │ 2.25 (overlay)│ │
│ │ Pod 2: 192.168.│ │ Pod 4: 192.168.│ │
│ │ 1.16 (overlay)│ │ 2.26 (overlay)│ │
│ └───────┬────────┘ └────────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ VXLAN Tunnel ════════════════ VXLAN Tunnel │
│ Outer: 10.0.1.10 → 10.0.1.20 │
│ Inner: 192.168.1.15 → 192.168.2.25 │
│ │
│ VPC only sees: Node A (10.0.1.10) → Node B │
│ VPC cannot see: Individual pod traffic │
└──────────────────────────────────────────────────────┘
Cloud load balancers → must target nodes (extra hop)
Cloud security groups → applied to nodes, not pods
VPC Flow Logs → show node-to-node, not pod-to-pod
Network ACLs → cannot filter individual pod traffic
| Factor | Underlay (VPC CNI) | Overlay (Calico/Cilium VXLAN) |
|---|---|---|
| Performance | Native wire speed, no overhead | 5-15% throughput overhead (encapsulation) |
| IP consumption | High (1 VPC IP per pod) | Low (pods use private range) |
| Cloud integration | Full (LB targets pods, SGs per pod) | Limited (LB targets nodes, SGs per node) |
| Observability | VPC Flow Logs show pod traffic | Need CNI-level logs for pod traffic |
| Multi-cluster | VPC peering routes pod IPs natively | Overlay IPs not routable cross-VPC by default |
| Subnet planning | Critical (must plan for pod growth) | Simple (overlay range is independent) |
| Network policy | Enforced at VPC + Calico/Cilium | Enforced at CNI level only |
| Best for | Cloud-native apps needing deep cloud integration | Multi-cloud, IP-constrained environments |

Most teams on a single cloud provider should use underlay (VPC-native) networking with prefix delegation. The cloud integration benefits — direct pod targeting by load balancers, security group per pod, native VPC Flow Logs — outweigh the IP planning overhead.


Private Cluster Architectures: Securing the API Server

By default, managed Kubernetes services like EKS and GKE provision the cluster API server endpoint with a public IP address. This means kubectl commands traverse the public internet to reach your cluster. For enterprise environments, this is often unacceptable.

When configuring your cluster, you have three primary architectural choices for the API server endpoint:

  1. Public Only (Default but Risky): The API server is accessible from the internet. Security relies entirely on Kubernetes RBAC and IAM authentication. If a vulnerability is found in the API server itself, your cluster is immediately exposed to the world.

  2. Public and Private (The Compromise): The API server has both a public IP and a private IP within your VPC. Nodes use the private IP to communicate with the control plane, keeping node-to-control-plane traffic off the internet. Developers can still use the public endpoint from their laptops (often restricted by a CIDR allowlist).

  3. Private Only (Enterprise Standard): The API server only has a private IP within your VPC. There is no public routing to the control plane. This is the most secure posture but requires additional architecture for developer access.

When you choose a fully private cluster, how do developers and CI/CD pipelines run kubectl apply? You must provide a secure path into the VPC:

  • VPN / Direct Connect: Developers connect to the corporate VPN, which is peered to the VPC. Traffic flows privately.
  • Bastion Host: A hardened EC2 instance in a public subnet. Users SSH into the bastion (or use AWS Systems Manager Session Manager) and run kubectl from there.
  • CI/CD Runners in VPC: GitHub Actions runners or GitLab runners are deployed as EC2 instances or pods within the VPC itself, allowing them to communicate natively with the private API server.
# EKS: Update cluster to Private-Only mode
aws eks update-cluster-config \
--name production-cluster \
--resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true

Stop and think: If you switch an existing cluster to “Private Only” without having a VPN or Bastion host set up, what will happen to your current kubectl session? How will the worker nodes be affected?


Egress Architecture: How Traffic Leaves Your Cluster

Every pod that calls an external API, downloads a package, or talks to a SaaS service needs an egress path. This path has cost, security, and compliance implications.

NAT Gateway: The Default (and Expensive) Path

NAT GATEWAY EGRESS
═══════════════════════════════════════════════════════════════
Pod (10.0.2.15) Internet
┌──────────┐ ┌──────────┐
│ curl │──▶ Route Table ──▶ NAT GW ──▶│ api. │
│ api.com │ 0.0.0.0/0 (public │ example │
└──────────┘ → nat-gw-id subnet) │ .com │
│ └──────────┘
Elastic IP
52.1.2.3
(your public IP)
Cost:
NAT Gateway hourly: $0.045/hr × 730 hrs = $32.85/mo
Data processing: $0.045/GB
At 1TB/month egress: $32.85 + $45.00 = $77.85/mo per AZ
With 3 AZs: $233.55/mo JUST for NAT
(plus standard data transfer charges on top)

NAT Gateways are the single most expensive surprise in AWS Kubernetes deployments. A medium cluster pulling container images, calling external APIs, and sending logs to a SaaS observability platform can easily generate 5-10 TB of NAT data processing per month.
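
To see how quickly this scales, the cost figures above can be turned into a small Python sketch. The rates are the ones quoted in the diagram ($0.045/hr, $0.045/GB, 730 hours per month); the function name is hypothetical, and actual pricing varies by region and over time.

```python
NAT_HOURLY_USD = 0.045   # per NAT Gateway, from the figures above
NAT_PER_GB_USD = 0.045   # data processing, from the figures above
HOURS_PER_MONTH = 730


def nat_monthly_cost(gb_per_az: float, az_count: int = 1) -> float:
    """Monthly NAT Gateway cost, excluding standard data transfer charges."""
    per_az = NAT_HOURLY_USD * HOURS_PER_MONTH + NAT_PER_GB_USD * gb_per_az
    return round(per_az * az_count, 2)


print(nat_monthly_cost(1_000))              # 1 TB through one NAT Gateway
print(nat_monthly_cost(1_000, az_count=3))  # 1 TB per AZ across 3 AZs
print(nat_monthly_cost(10_000))             # 10 TB through one NAT Gateway
```

At 10 TB through a single gateway, data processing is more than 90% of the bill, which is why the strategies below focus on keeping traffic off the NAT path rather than on the hourly charge.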

COST-OPTIMIZED EGRESS ARCHITECTURE
═══════════════════════════════════════════════════════════════
Strategy 1: VPC Endpoints (eliminate NAT for AWS services)
┌──────────┐ ┌──────────────────┐
│ Pod │──▶ VPC Endpoint ──▶│ S3 (no NAT) │
│ │ (Gateway type) │ Free data path │
└──────────┘ └──────────────────┘
┌──────────┐ ┌──────────────────┐
│ Pod │──▶ VPC Endpoint ──▶│ ECR (no NAT) │
│ │ (Interface type)│ $0.01/hr per AZ │
└──────────┘ $7.30/mo each │ + $0.01/GB data │
└──────────────────┘
Strategy 2: ECR pull-through cache (reduce image pulls)
First pull: ECR → upstream registry → cache
Subsequent: ECR → local cache (in-VPC, no NAT)
Strategy 3: NAT Instance (cheaper for low traffic)
t4g.nano: $3.02/mo (vs $32.85/mo for NAT GW)
Trade-off: No HA, lower bandwidth, you manage it
# Create VPC endpoints for common AWS services
# These eliminate NAT Gateway data processing charges
# S3 Gateway Endpoint (free)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-private-1 rtb-private-2
# ECR API endpoint (Interface type, $7.30/mo per AZ; image layers are served from S3, so keep the S3 endpoint above)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.ecr.api \
--subnet-ids subnet-private-1a subnet-private-1b \
--security-group-ids sg-vpce-ecr
# ECR Docker endpoint
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.ecr.dkr \
--subnet-ids subnet-private-1a subnet-private-1b \
--security-group-ids sg-vpce-ecr
# CloudWatch Logs endpoint
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.logs \
--subnet-ids subnet-private-1a subnet-private-1b \
--security-group-ids sg-vpce-logs
# STS endpoint (needed for IRSA token exchange)
aws ec2 create-vpc-endpoint \
--vpc-id vpc-12345 \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.us-east-1.sts \
--subnet-ids subnet-private-1a subnet-private-1b \
--security-group-ids sg-vpce-sts

Pause and predict: Your monthly AWS bill shows a $4,000 charge for NAT Gateway Data Processing. Your cluster heavily uses S3 and DynamoDB. What single architectural change would drastically reduce this cost tomorrow without changing any application code?

Some regulated environments require all egress traffic to flow through an inspection proxy. This provides URL-level filtering, TLS inspection, and logging.

PROXY-BASED EGRESS
═══════════════════════════════════════════════════════════════
Pod → Proxy (Squid/Envoy) → Internet
├── Allow: api.stripe.com (payment processor)
├── Allow: registry.npmjs.org (package registry)
├── Allow: *.datadog.com (observability)
├── Block: * (everything else)
└── Full URL logging for audit trail
Implementation:
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Pod │────▶│ Egress │────▶│ NAT GW │──▶ Internet
│ │ │ Proxy │ │ │
│ HTTP_ │ │ (Envoy) │ └──────────┘
│ PROXY= │ │ - Allow │
│ proxy: │ │ list │
│ 3128 │ │ - Logging│
└──────────┘ │ - TLS │
│ inspect│
└──────────┘
# Kubernetes: Force egress through proxy using NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Allow DNS
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow traffic to egress proxy only
    - to:
        - podSelector:
            matchLabels:
              app: egress-proxy
      ports:
        - protocol: TCP
          port: 3128
    # Allow in-cluster traffic
    - to:
        - namespaceSelector: {}

Ingress Architecture: How Traffic Reaches Your Cluster

Ingress is the mirror of egress. It’s how external traffic reaches your Kubernetes services. The architecture differs significantly between cloud providers and use cases.

INGRESS PATH: CLOUD LB → KUBERNETES
═══════════════════════════════════════════════════════════════
Option A: NLB → NodePort (L4)
┌────────┐ ┌──────┐ ┌──────┐ ┌─────┐
│ Client │────▶│ NLB │────▶│ Node │────▶│ Pod │
└────────┘ │ (L4) │ │ Port │ └─────┘
└──────┘ │30080 │
└──────┘
Pros: Simple, preserves source IP (with externalTrafficPolicy: Local)
Cons: Extra hop (NodePort), uneven distribution
Option B: NLB → Pod IP directly (L4, IP target mode)
┌────────┐ ┌──────┐ ┌─────┐
│ Client │────▶│ NLB │──────────────────▶│ Pod │
└────────┘ │ (L4) │ (pod IP is LB │10.0.│
└──────┘ target) │1.42 │
└─────┘
Pros: No extra hop, even distribution, lower latency
Cons: Requires VPC CNI (underlay networking)
Option C: ALB → Pod IP (L7, via Ingress/Gateway API)
┌────────┐ ┌──────┐ ┌─────┐
│ Client │────▶│ ALB │──────────────────▶│ Pod │
└────────┘ │ (L7) │ (TLS terminated │ │
│ WAF │ at ALB, routes │ │
│ Auth │ by path/host) └─────┘
└──────┘
Pros: L7 routing, WAF integration, auth offloading
Cons: ALB cost ($16/mo + LCU charges)
# Gateway API is replacing Ingress as the standard
# More expressive, role-oriented, portable
# Infrastructure admin creates the Gateway
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: infrastructure
  # Load balancer type and scheme are controller-specific settings.
  # The service.beta.kubernetes.io/* annotations apply to Service objects,
  # not Gateways, so set these through your GatewayClass configuration.
spec:
  gatewayClassName: aws-alb # or istio, cilium, nginx, etc.
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: production-tls
            namespace: infrastructure
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              gateway-access: "true"
---
# Application team creates HTTPRoutes
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payment-api-route
  namespace: production
spec:
  parentRefs:
    - name: production-gateway
      namespace: infrastructure
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/payments
      backendRefs:
        - name: payment-api
          port: 8080
          weight: 100
    - matches:
        - path:
            type: PathPrefix
            value: /v1/orders
      backendRefs:
        - name: order-api
          port: 8080
          weight: 100

Web Application Firewall (WAF) should sit in front of any public-facing Kubernetes service.

# AWS WAF with the AWS Load Balancer Controller (formerly ALB Ingress Controller)
# The ALB created by the controller can have a WAF Web ACL attached
# Create a WAF Web ACL
aws wafv2 create-web-acl \
--name production-waf \
--scope REGIONAL \
--default-action Allow={} \
--rules '[
{
"Name": "RateLimit",
"Priority": 1,
"Action": {"Block": {}},
"Statement": {
"RateBasedStatement": {
"Limit": 2000,
"AggregateKeyType": "IP"
}
},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "RateLimit"
}
},
{
"Name": "AWSManagedRulesCommonRuleSet",
"Priority": 2,
"OverrideAction": {"None": {}},
"Statement": {
"ManagedRuleGroupStatement": {
"VendorName": "AWS",
"Name": "AWSManagedRulesCommonRuleSet"
}
},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "CommonRules"
}
}
]' \
--visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=production-waf

When you have multiple VPCs (dev, staging, production, shared services), they need to communicate. The two primary mechanisms are VPC Peering and Transit Gateways.

VPC PEERING: DIRECT CONNECTIONS
═══════════════════════════════════════════════════════════════
2 VPCs = 1 peering connection
3 VPCs = 3 peering connections
4 VPCs = 6 peering connections
N VPCs = N×(N-1)/2 connections
┌─────────────┐ ┌─────────────┐
│ Production │◀──────▶│ Staging │
│ 10.1.0.0/16│ │ 10.2.0.0/16│
└──────┬──────┘ └──────┬──────┘
│ │
│ ┌─────────────┐ │
└───▶│ Shared Svc │◀──┘
│ 10.10.0.0/16│
└─────────────┘
3 VPCs = 3 peering connections. Manageable.
With 10 VPCs:
10 × 9 / 2 = 45 peering connections.
Not manageable.
TRANSIT GATEWAY: CENTRALIZED ROUTING
═══════════════════════════════════════════════════════════════
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Prod │ │ Staging │ │ Dev │ │ Shared │
│10.1.0/16│ │10.2.0/16│ │10.3.0/16│ │10.10.0/16│
└────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │ │
└──────────────┼──────────────┼──────────────┘
│ │
┌───────▼──────────────▼───────┐
│ Transit Gateway │
│ │
│ Route Tables: │
│ Prod → Shared, Staging │
│ Staging → Shared, Prod │
│ Dev → Shared only │
│ Shared → All │
│ │
│ + On-Premises via VPN/DX │
└───────────────────────────────┘
Any number of VPCs: 1 TGW attachment per VPC.
Centralized routing policies.
Route table segmentation (Dev can't reach Prod).
# Create a Transit Gateway
aws ec2 create-transit-gateway \
--description "Production TGW" \
--options "AmazonSideAsn=64512,AutoAcceptSharedAttachments=disable,DefaultRouteTableAssociation=disable,DefaultRouteTablePropagation=disable,DnsSupport=enable"
# Attach VPCs
aws ec2 create-transit-gateway-vpc-attachment \
--transit-gateway-id tgw-12345 \
--vpc-id vpc-prod \
--subnet-ids subnet-prod-1a subnet-prod-1b
aws ec2 create-transit-gateway-vpc-attachment \
--transit-gateway-id tgw-12345 \
--vpc-id vpc-staging \
--subnet-ids subnet-staging-1a subnet-staging-1b
# Create separate route tables for segmentation
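aws ec2 create-transit-gateway-route-table \
  --transit-gateway-id tgw-12345 \
  --tag-specifications 'ResourceType=transit-gateway-route-table,Tags=[{Key=Name,Value=prod-routes}]'
aws ec2 create-transit-gateway-route-table \
  --transit-gateway-id tgw-12345 \
  --tag-specifications 'ResourceType=transit-gateway-route-table,Tags=[{Key=Name,Value=dev-routes}]'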
| Component | Cost |
|---|---|
| TGW VPC attachment (per attachment, regardless of AZ count) | $0.05/hr (~$36.50/mo) |
| Data processing | $0.02/GB |
| 5 VPC attachments | ~$182.50/mo just for attachments |
| 1 TB cross-VPC traffic | $20/mo data processing |

Transit Gateway is worth it when you have 4+ VPCs or need centralized routing policies. Below that, VPC Peering is cheaper and simpler.
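
The scaling difference between the two models comes down to the N×(N-1)/2 formula shown earlier. A minimal Python sketch (function names are hypothetical) makes the comparison concrete:

```python
def peering_connections(vpc_count: int) -> int:
    """Connections needed for a full mesh of VPC peering: N x (N-1) / 2."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count: int) -> int:
    """A Transit Gateway needs one attachment per VPC."""
    return vpc_count


for n in (3, 4, 8, 10):
    print(f"{n} VPCs: {peering_connections(n)} peerings "
          f"vs {tgw_attachments(n)} TGW attachments")
```

Peering grows quadratically while attachments grow linearly, which is the reasoning behind the "4+ VPCs" rule of thumb.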


Connecting Kubernetes clusters to on-premises data centers requires choosing between VPN (encrypted over internet) and Direct Connect (dedicated private link).

CONNECTIVITY OPTIONS
═══════════════════════════════════════════════════════════════
Option 1: Site-to-Site VPN
┌────────────┐ IPSec Tunnel ┌────────────┐
│ On-Prem │◀═══════════════════▶│ AWS VPC │
│ Datacenter │ (over internet) │ / TGW │
└────────────┘ └────────────┘
Cost: $0.05/hr (~$36.50/mo) + data transfer
Bandwidth: Up to 1.25 Gbps per tunnel (2 tunnels for HA)
Latency: Variable (internet-dependent)
Setup time: Hours
Option 2: AWS Direct Connect
┌────────────┐ Dedicated Fiber ┌────────────┐
│ On-Prem │◀═══════════════════▶│ AWS DX │
│ Datacenter │ (private circuit) │ Location │
└────────────┘ └────────────┘
Cost: $0.30/hr (1Gbps port) + data transfer
Bandwidth: 1, 10, or 100 Gbps dedicated
Latency: Consistent (no internet hops)
Setup time: Weeks to months
Option 3: Direct Connect + VPN Backup
Primary: Direct Connect (high bandwidth, consistent latency)
Backup: Site-to-Site VPN (automatic failover if DX fails)
Best for: Production workloads needing reliability + performance

When connecting cloud VPCs to on-premises networks, CIDR overlap is the most common and painful mistake. If your on-prem network uses 10.0.0.0/8 and your VPC uses 10.0.0.0/16, which sits inside that range, routing breaks: traffic destined for 10.0.1.5 could mean a pod in your cluster or a server in your data center.

THE OVERLAPPING CIDR DISASTER
═══════════════════════════════════════════════════════════════
Before connecting (both sides drew from 10.0.0.0/8):
On-Prem: 10.0.0.0/8 VPC Prod: 10.0.0.0/16
Server: 10.0.1.50 Pod: 10.0.1.50
Peering attempt → REJECTED
"CIDR blocks overlap. Cannot create peering connection."
Fix: Re-IP one side. In production. With zero downtime.
Difficulty: Nightmare. This is a multi-month project.
Correct planning from day one:
On-Prem: 172.16.0.0/12 (172.16.0.0 - 172.31.255.255)
AWS Prod: 10.1.0.0/16
AWS Staging: 10.2.0.0/16
AWS Dev: 10.3.0.0/16
GCP: 10.100.0.0/16
Azure: 10.200.0.0/16
No overlaps. Everything can peer with everything.
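
A plan like this is easy to validate before any VPC exists. The sketch below uses Python's standard `ipaddress` module to flag overlapping ranges; the `find_overlaps` helper and the plan dictionaries are illustrative names, with CIDRs taken from the diagram above.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(plan: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of environments whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in plan.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


bad_plan = {"on-prem": "10.0.0.0/8", "aws-prod": "10.0.0.0/16"}
good_plan = {
    "on-prem": "172.16.0.0/12",
    "aws-prod": "10.1.0.0/16",
    "aws-staging": "10.2.0.0/16",
    "aws-dev": "10.3.0.0/16",
    "gcp": "10.100.0.0/16",
    "azure": "10.200.0.0/16",
}

print(find_overlaps(bad_plan))   # the on-prem/aws-prod pair is reported
print(find_overlaps(good_plan))  # empty list: every range can be connected
```

A check along these lines could run in CI against whatever file holds your IPAM allocations, so an overlapping range is rejected before it is provisioned.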

  • AWS NAT Gateway data processing charges are the number one surprise cost for Kubernetes teams. A single EKS cluster pulling container images, sending logs to Datadog, and communicating with managed services can generate $500-$2,000/month in NAT charges alone. VPC endpoints for S3, ECR, CloudWatch, and STS can reduce this by 60-80%.

  • The largest CIDR block a single AWS VPC can use is a /16 (65,536 addresses). You can associate secondary CIDR blocks (up to 4 additional by default), but many teams hit IP limits long before that because they under-sized their subnets. GCP has it easier: a VPC network is global, and a subnet’s primary range can be expanded in place without recreating it.

  • Kubernetes pod-to-pod traffic within the same AZ on AWS is free, but cross-AZ traffic costs $0.01/GB in each direction ($0.02/GB round trip). For a cluster spanning 3 AZs with chatty microservices, this adds up. Topology-aware routing (topology.kubernetes.io/zone) can reduce cross-AZ traffic by preferring same-zone backends.

  • The Gateway API specification reached GA (v1.0) in October 2023 after several years of development. Unlike the Ingress resource (whose behavior depends heavily on controller-specific annotations and varies across controllers), Gateway API has formal conformance tests. Conformant implementations are tested against the same expected behavior for core features, making configurations far more portable across providers.


| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Using /24 subnets for EKS | Treating K8s like VMs where each host gets one IP | Size subnets for pod count: /20 or larger for production clusters |
| Not enabling prefix delegation | Using default VPC CNI settings | Enable ENABLE_PREFIX_DELEGATION=true on the aws-node DaemonSet. Raises pod density per node and speeds up IP assignment |
| Skipping VPC endpoints | Don’t realize NAT Gateway processes AWS service traffic | Create Gateway endpoints (S3, DynamoDB) and Interface endpoints (ECR, STS, CloudWatch) |
| Overlapping CIDRs across environments | Using default 10.0.0.0/16 everywhere | Plan a global IPAM scheme before creating the first VPC. Document it. Enforce it |
| Single NAT Gateway (one AZ) | “We only need one” | Deploy NAT Gateway per AZ for HA. One NAT GW failure shouldn’t break all egress |
| No network policies | “We’ll add them later” | Start with a default-deny policy per namespace. Explicitly allow required traffic |
| ALB per service | Each Ingress creates a new ALB ($16/mo each) | Use a shared ALB with path-based or host-based routing. One ALB can serve many services |
| Ignoring cross-AZ transfer costs | Free within AZ, $0.02/GB cross-AZ seems small | At 10TB/month cross-AZ: $200/month. Use topology-aware routing to keep traffic local |

1. An EKS cluster uses VPC CNI (default mode, no prefix delegation) with m5.xlarge nodes. Each node has 4 ENIs with 15 IPs each. The subnet is a /24 (251 usable IPs). How many nodes can fit before IP exhaustion?

Each m5.xlarge node can consume up to 60 VPC IPs (4 ENIs x 15 IPs), because the VPC CNI attaches additional ENIs and pre-allocates their secondary IPs as pods are scheduled. Budgeting for that worst case, a /24 subnet with 251 usable IPs fits 251 / 60 = 4.18, so only 4 fully loaded nodes before exhaustion. A 5th node could not acquire all of its ENI IPs, so new pods scheduled on it would fail to get addresses. In practice, because some IPs are consumed by internal load balancers or VPC endpoints, you might hit the limit even sooner, which is why a /24 is dangerously small for EKS.

2. Your application team is building a microservice that heavily reads from AWS S3 and pushes metrics to CloudWatch. During a cost audit, you notice a massive spike in NAT Gateway data processing charges. What specific architectural changes should you implement to eliminate these costs?

You should implement VPC Gateway endpoints for S3 and VPC Interface endpoints for CloudWatch. Gateway endpoints are free and route traffic to S3 directly over the AWS network, bypassing the NAT Gateway completely. Interface endpoints create an ENI in your subnet for services like CloudWatch, redirecting traffic privately for a small hourly fee plus a per-GB charge that is well below the NAT data processing rate. Because this heavy AWS-bound traffic no longer passes through the NAT Gateway, the NAT data processing charges for those services disappear.

3. A security compliance auditor mandates that every individual pod's network traffic must be fully logged and subject to VPC-level Network ACLs. Your current clusters run on standard EC2 instances. Which network architecture must you choose to satisfy this requirement?

You must choose an Underlay (VPC-Native) networking architecture, such as the AWS VPC CNI. With underlay networking, every pod receives a native IP address from the VPC subnet. Because the traffic is not encapsulated in tunnels (like it would be with an overlay network such as VXLAN), the VPC fabric sees every packet’s true source and destination IP. This visibility allows VPC Flow Logs to record individual pod traffic and enables Network ACLs to filter traffic at the pod IP level, directly satisfying the auditor’s requirements.

4. Your company has 8 VPCs that need to communicate. You are debating between using VPC Peering or a Transit Gateway. Why is Transit Gateway the better architectural choice for this scenario?

With 8 VPCs, a full mesh of VPC Peering would require 28 separate peering connections (8 x 7 / 2), each needing custom route table entries and complex security group management. Adding a 9th VPC later would require 8 more distinct peering connections, creating an operational nightmare. Transit Gateway simplifies this by acting as a central hub where each VPC only requires a single attachment. Route tables are managed centrally on the Transit Gateway, allowing for clean network segmentation and vastly simpler scaling as new environments are added.

5. A team plans to connect their AWS VPCs to an on-premises data center using a Site-to-Site VPN. Both the AWS environments and the on-premises network use the 10.0.0.0/8 CIDR range. What routing problem will occur, and how must it be resolved?

Direct routing will fail because the CIDR ranges overlap completely, meaning the routers cannot determine whether a packet destined for 10.0.1.5 belongs to a cloud pod or an on-premises server. To fix this without re-IPing either side, you must deploy a NAT solution at the network boundary. The VPN configuration would need to NAT the on-premises 10.x range to a non-overlapping range (such as 100.64.0.0/10) from the perspective of the cloud VPC. The most sustainable long-term solution, however, is to plan a global IPAM scheme before creating infrastructure so that every environment uses a distinct CIDR block.

6. Your e-commerce platform spans three Availability Zones. During a load test, you notice that cross-AZ data transfer costs are excessively high, even though the total number of requests is expected. You are currently using default Kubernetes Services for internal routing. How can you modify the Kubernetes configuration to reduce this cloud infrastructure cost?

You should implement topology-aware routing by adding the service.kubernetes.io/topology-mode: Auto annotation to your Kubernetes Services. By default, kube-proxy distributes internal service traffic randomly across all healthy endpoints in the cluster, meaning roughly 67% of traffic crosses AZ boundaries in a 3-AZ setup. Topology-aware routing instructs kube-proxy to prefer backend pods in the same Availability Zone as the client pod. This change keeps the majority of internal traffic local to the AZ, sharply reducing cross-AZ data transfer fees (AWS bills $0.01/GB in each direction, so $0.02/GB in total) while also slightly improving request latency.
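The 67% figure follows from a simple model: if endpoints are spread evenly across zones and chosen uniformly at random, a request stays in its own zone with probability 1/N. The sketch below assumes that model; the 4,500 GB traffic volume is a made-up input, and the $0.02/GB default is the combined cross-AZ rate used in this module's cost estimate.

```python
def cross_az_fraction(az_count: int) -> float:
    """Expected share of requests that leave the client's AZ under random routing."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return 1 - 1 / az_count


def estimated_cross_az_cost(total_gb: float, az_count: int, price_per_gb: float = 0.02) -> float:
    """Rough monthly cross-AZ charge for a given volume of service-to-service traffic."""
    return total_gb * cross_az_fraction(az_count) * price_per_gb


print(f"{cross_az_fraction(3):.0%} of traffic crosses AZs with 3 zones")
# Hypothetical volume: 4,500 GB of internal traffic per month.
print(f"${estimated_cross_az_cost(4500, 3):.2f} per month before topology-aware routing")
```

Real savings depend on how evenly pods are distributed, since topology hints are only applied when each zone has enough endpoints.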


Hands-On Exercise: Design a Multi-Environment Subnet Plan

You’re designing the network architecture for a company that runs Kubernetes across three environments (development, staging, production) plus a shared services VPC. The company also has an on-premises data center that must connect to all cloud environments.

  • Cloud provider: AWS, us-east-1
  • On-premises data center CIDR: 172.16.0.0/12
  • Each environment runs EKS with VPC CNI (prefix delegation enabled)
  • Production: 50 nodes, ~2,500 pods
  • Staging: 15 nodes, ~500 pods
  • Development: 10 nodes, ~300 pods
  • Shared services: monitoring stack, CI/CD, artifact registry
  • Future: eu-west-1 region for production DR

Create a non-overlapping CIDR scheme that accommodates all current and future environments without conflicts.

Solution
GLOBAL CIDR ALLOCATION
═══════════════════════════════════════════════════════════════
On-Premises (existing):
  172.16.0.0/12    (172.16.0.0 - 172.31.255.255)

AWS us-east-1:
  10.1.0.0/16      Production VPC         (65,536 IPs)
  10.2.0.0/16      Staging VPC            (65,536 IPs)
  10.3.0.0/16      Development VPC        (65,536 IPs)
  10.10.0.0/16     Shared Services VPC    (65,536 IPs)

AWS eu-west-1 (future DR):
  10.101.0.0/16    Production DR VPC      (65,536 IPs)
  10.110.0.0/16    Shared Services DR     (65,536 IPs)

Reserved for future regions:
  10.201.0.0/16    Asia-Pacific Prod
  10.210.0.0/16    Asia-Pacific Shared

Reserved for other cloud providers:
  10.50.0.0/16     GCP (if needed)
  10.60.0.0/16     Azure (if needed)

Kubernetes Pod CIDRs (if using overlay -- not needed with VPC CNI):
  192.168.0.0/16   Reserved, not used with VPC CNI

Key design decisions:
  - The second octet (the one after 10.) encodes the purpose
  - 1-9: environments, 10-19: shared services
  - 100+: DR regions mirror primary with +100 offset
  - 200+: additional regions
  - 50-60: other clouds
  - No overlap with on-prem 172.16.0.0/12
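A plan like this is easy to verify mechanically before any infrastructure exists. The following sketch uses Python's standard `ipaddress` module to check every pair of allocations for overlap; the short labels in `PLAN` are illustrative names for the ranges listed in the solution.

```python
import ipaddress
from itertools import combinations

PLAN = {
    "on-prem": "172.16.0.0/12",
    "prod-us-east-1": "10.1.0.0/16",
    "staging-us-east-1": "10.2.0.0/16",
    "dev-us-east-1": "10.3.0.0/16",
    "shared-us-east-1": "10.10.0.0/16",
    "prod-dr-eu-west-1": "10.101.0.0/16",
    "shared-dr-eu-west-1": "10.110.0.0/16",
    "apac-prod": "10.201.0.0/16",
    "apac-shared": "10.210.0.0/16",
    "gcp": "10.50.0.0/16",
    "azure": "10.60.0.0/16",
    "overlay-pods": "192.168.0.0/16",
}


def overlapping_pairs(plan: dict) -> list:
    """Return every pair of named CIDR blocks that share at least one address."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


conflicts = overlapping_pairs(PLAN)
print("No overlaps" if not conflicts else f"Conflicts: {conflicts}")
```

Running a check like this in CI whenever a new VPC is proposed is one way to keep the scheme honest as it grows.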

Task 2: Design Subnet Layout for the Production VPC

Create the subnet layout for the production VPC (10.1.0.0/16) across 3 AZs, with separate tiers for pods, nodes, and internal load balancers.

Solution
PRODUCTION VPC: 10.1.0.0/16
═══════════════════════════════════════════════════════════════
Availability Zone us-east-1a:
  10.1.0.0/19      Pod subnet       (8,187 IPs)   ← EKS pods
  10.1.32.0/22     Node subnet      (1,019 IPs)   ← EC2 instances
  10.1.36.0/24     Internal LB      (251 IPs)     ← NLB/ALB
  10.1.37.0/24     Public subnet    (251 IPs)     ← NAT GW, bastion
  10.1.38.0/24     VPC endpoints    (251 IPs)     ← Interface endpoints
  10.1.39.0/24     Reserved         (future use)

Availability Zone us-east-1b:
  10.1.64.0/19     Pod subnet       (8,187 IPs)
  10.1.96.0/22     Node subnet      (1,019 IPs)
  10.1.100.0/24    Internal LB      (251 IPs)
  10.1.101.0/24    Public subnet    (251 IPs)
  10.1.102.0/24    VPC endpoints    (251 IPs)
  10.1.103.0/24    Reserved

Availability Zone us-east-1c:
  10.1.128.0/19    Pod subnet       (8,187 IPs)
  10.1.160.0/22    Node subnet      (1,019 IPs)
  10.1.164.0/24    Internal LB      (251 IPs)
  10.1.165.0/24    Public subnet    (251 IPs)
  10.1.166.0/24    VPC endpoints    (251 IPs)
  10.1.167.0/24    Reserved

Total pod IPs: 3 × 8,187 = 24,561
Supports: 50 nodes × 110 pods = 5,500 pods (using <25%)
Growth capacity: ~4x before needing subnet expansion

Why /19 for pods?
  50 nodes × 110 max pods = 5,500 pod IPs needed now
  /19 per AZ = 8,187 usable IPs per AZ = 24,561 total
  Leaves ~75% headroom for growth

Why separate pod and node subnets?
  - Different security groups for pods vs nodes
  - Pods need VPC CNI with prefix delegation
  - Nodes have SSH access, pods don't
  - Monitoring IP exhaustion separately is easier
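The layout can be sanity-checked in code: every subnet should sit inside the VPC, no two subnets should overlap, and pod capacity should cover the expected load. This is a sketch under the assumption that AWS reserves five addresses in every subnet (the same rule that yields 251 usable IPs in a /24); the tier names are shorthand for the rows above.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # assumption: network, router, DNS, future use, broadcast
VPC = ipaddress.ip_network("10.1.0.0/16")

SUBNETS = {
    "us-east-1a": {"pods": "10.1.0.0/19", "nodes": "10.1.32.0/22", "internal-lb": "10.1.36.0/24",
                   "public": "10.1.37.0/24", "endpoints": "10.1.38.0/24", "reserved": "10.1.39.0/24"},
    "us-east-1b": {"pods": "10.1.64.0/19", "nodes": "10.1.96.0/22", "internal-lb": "10.1.100.0/24",
                   "public": "10.1.101.0/24", "endpoints": "10.1.102.0/24", "reserved": "10.1.103.0/24"},
    "us-east-1c": {"pods": "10.1.128.0/19", "nodes": "10.1.160.0/22", "internal-lb": "10.1.164.0/24",
                   "public": "10.1.165.0/24", "endpoints": "10.1.166.0/24", "reserved": "10.1.167.0/24"},
}


def usable_ips(cidr: str) -> int:
    """Addresses left for workloads after the per-subnet AWS reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def validate(vpc, subnets: dict) -> list:
    """Return human-readable problems: subnets outside the VPC or overlapping each other."""
    flat = [(f"{az}/{tier}", ipaddress.ip_network(cidr))
            for az, tiers in subnets.items() for tier, cidr in tiers.items()]
    problems = [f"{name} is outside {vpc}" for name, net in flat if not net.subnet_of(vpc)]
    for i, (name_a, net_a) in enumerate(flat):
        for name_b, net_b in flat[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"{name_a} overlaps {name_b}")
    return problems


pod_capacity = sum(usable_ips(tiers["pods"]) for tiers in SUBNETS.values())
pods_needed = 50 * 110  # nodes x max pods per node, from the exercise
print(validate(VPC, SUBNETS) or "layout OK")
print(f"pod IP utilization at full density: {pods_needed / pod_capacity:.0%}")
```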

Task 3: Design the Transit Gateway Routing

Configure the Transit Gateway route tables to enforce environment isolation: development cannot reach production directly.

Solution
TRANSIT GATEWAY ROUTE TABLE DESIGN
═══════════════════════════════════════════════════════════════
TGW Route Table: production-routes
  Associated: Production VPC
  Routes:
    10.10.0.0/16   → Shared Services attachment (monitoring, CI/CD)
    10.2.0.0/16    → Staging attachment (for promotion testing)
    172.16.0.0/12  → On-prem VPN attachment (database migration)
    # NO route to 10.3.0.0/16 (Development) ← ISOLATION

TGW Route Table: staging-routes
  Associated: Staging VPC
  Routes:
    10.10.0.0/16   → Shared Services attachment
    10.1.0.0/16    → Production attachment (read replicas)
    172.16.0.0/12  → On-prem VPN attachment
    # NO route to 10.3.0.0/16 (Development)

TGW Route Table: development-routes
  Associated: Development VPC
  Routes:
    10.10.0.0/16   → Shared Services attachment (CI/CD, registry)
    # NO route to 10.1.0.0/16 (Production) ← ISOLATION
    # NO route to 10.2.0.0/16 (Staging) ← ISOLATION
    # NO route to 172.16.0.0/12 (On-prem) ← ISOLATION

TGW Route Table: shared-services-routes
  Associated: Shared Services VPC
  Routes:
    10.1.0.0/16    → Production attachment
    10.2.0.0/16    → Staging attachment
    10.3.0.0/16    → Development attachment
    172.16.0.0/12  → On-prem VPN attachment
    # Shared services can reach everything (monitoring, CI/CD)

TGW Route Table: onprem-routes
  Associated: VPN attachment
  Routes:
    10.1.0.0/16    → Production attachment
    10.2.0.0/16    → Staging attachment
    10.10.0.0/16   → Shared Services attachment
    # NO route to 10.3.0.0/16 (Development)

This design enforces: Development is completely isolated from Production, Staging, and on-prem. It can only reach Shared Services (for pulling images, CI/CD). Production and Staging can reach each other (for promotion testing) and on-prem (for database connectivity). Shared Services is the hub that can reach everything.
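The isolation rules can be expressed as a small reachability model and tested. This is a simplified sketch: it only models which attachments each Transit Gateway route table can send traffic to, and it ignores VPC route tables, security groups, and NACLs. The `ROUTE_TABLES` names are shorthand for the tables in the solution.

```python
# Each key is the attachment a route table is associated with;
# each value is the set of attachments that table has routes to.
ROUTE_TABLES = {
    "production": {"shared-services", "staging", "on-prem"},
    "staging": {"shared-services", "production", "on-prem"},
    "development": {"shared-services"},
    "shared-services": {"production", "staging", "development", "on-prem"},
    "on-prem": {"production", "staging", "shared-services"},
}


def can_communicate(a: str, b: str, tables: dict = ROUTE_TABLES) -> bool:
    """Two-way traffic needs a route in each direction (request and reply)."""
    return b in tables[a] and a in tables[b]


def one_way_routes(tables: dict = ROUTE_TABLES) -> list:
    """Routes with no matching return route, which usually indicate a design mistake."""
    return sorted((src, dst) for src, dsts in tables.items()
                  for dst in dsts if src not in tables[dst])


for peer in ("production", "staging", "on-prem", "shared-services"):
    print(f"development <-> {peer}: {can_communicate('development', peer)}")
print("one-way routes:", one_way_routes())
```

Checking for one-way routes is a useful habit: a missing return route produces timeouts that are much harder to diagnose than an outright block.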

Task 4: Calculate the Monthly Networking Cost

Estimate the monthly cost for the complete network architecture, including NAT Gateways, Transit Gateway, VPC endpoints, and data transfer.

Solution
| Component | Quantity | Unit Cost | Monthly Cost |
| --- | --- | --- | --- |
| NAT Gateway (prod, 3 AZs) | 3 | $32.85/mo | $98.55 |
| NAT Gateway (staging, 2 AZs) | 2 | $32.85/mo | $65.70 |
| NAT Gateway (dev, 1 AZ) | 1 | $32.85/mo | $32.85 |
| NAT data processing (est. 2TB total) | 2,000 GB | $0.045/GB | $90.00 |
| Transit Gateway attachments (4 VPCs x 2 AZs avg) | 8 | $36.50/mo | $292.00 |
| TGW VPN attachment | 1 | $36.50/mo | $36.50 |
| TGW data processing (est. 500GB) | 500 GB | $0.02/GB | $10.00 |
| VPC endpoints - S3 Gateway (all VPCs) | 4 | Free | $0.00 |
| VPC endpoints - Interface (ECR, STS, CW per VPC) | 12 (x 2 AZs avg) | $7.30/mo per AZ | $175.20 |
| Cross-AZ data transfer (est. 3TB) | 3,000 GB | $0.02/GB | $60.00 |
| Site-to-Site VPN | 1 | $36.50/mo | $36.50 |
| Total Monthly Network Cost | | | $897.30 |
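The estimate is straightforward to reproduce and adjust in code. The sketch below re-adds the line items from the table using `Decimal` to avoid floating-point rounding, then subtracts the savings listed under the cost optimization opportunities. The prices are the module's estimates, not live AWS pricing, and the interface endpoint row is entered as 24 endpoint-AZ units (12 endpoints across an average of 2 AZs).

```python
from decimal import Decimal

# (component, quantity, unit cost in USD per month or per GB)
LINE_ITEMS = [
    ("NAT Gateway (prod)", 3, "32.85"),
    ("NAT Gateway (staging)", 2, "32.85"),
    ("NAT Gateway (dev)", 1, "32.85"),
    ("NAT data processing (GB)", 2000, "0.045"),
    ("Transit Gateway attachments", 8, "36.50"),
    ("TGW VPN attachment", 1, "36.50"),
    ("TGW data processing (GB)", 500, "0.02"),
    ("S3 gateway endpoints", 4, "0"),
    ("Interface endpoints (endpoint-AZ units)", 24, "7.30"),
    ("Cross-AZ data transfer (GB)", 3000, "0.02"),
    ("Site-to-Site VPN", 1, "36.50"),
]

# Savings from the optimization list in this exercise.
SAVINGS = ["29.83", "40", "20", "87.60"]

total = sum((Decimal(qty) * Decimal(unit) for _, qty, unit in LINE_ITEMS), Decimal("0"))
optimized = total - sum((Decimal(s) for s in SAVINGS), Decimal("0"))

for name, qty, unit in LINE_ITEMS:
    print(f"{name:<42} {Decimal(qty) * Decimal(unit):>9.2f}")
print(f"{'Total':<42} {total:>9.2f}")
print(f"{'After optimizations':<42} {optimized:>9.2f}")
```

Keeping the estimate as data makes it simple to rerun when traffic volumes or the number of AZs change.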

Cost optimization opportunities:

  • Replace dev NAT GW with a t4g.nano NAT instance: save $29.83/mo
  • Use VPC endpoints to reduce NAT data processing: save ~$40/mo
  • Enable topology-aware routing to reduce cross-AZ: save ~$20/mo
  • Consolidate dev+staging VPC endpoints: save $87.60/mo
  • Optimized total: ~$720/mo

Success criteria:

  • Global CIDR scheme has no overlaps between any environments or on-premises
  • Production subnet plan accommodates 4x growth without re-architecting
  • Pod and node subnets are separated with appropriate sizing
  • Transit Gateway routing enforces development isolation from production
  • VPC endpoints reduce NAT Gateway dependency for AWS service traffic
  • Cost estimate includes all networking components

This is the final module in the Cloud Architecture Patterns series. You now have the knowledge to design Kubernetes deployments that are well-managed (Module 4.1), resilient across regions (Module 4.2), secured with identity federation (Module 4.3), and networked correctly from day one (Module 4.4). Consider exploring the Platform Engineering Track for deeper dives into GitOps, observability, and security tooling.