Module 1.5: Multi-Cluster & Hybrid Networking

Discipline Module | Complexity: [COMPLEX] | Time: 60-70 min

Before starting this module:


After completing this module, you will be able to:

  • Design multi-cluster networking architectures that connect services across clusters, regions, and clouds
  • Implement cross-cluster service discovery using DNS, service mesh federation, or Submariner
  • Configure network policies that span cluster boundaries while maintaining security isolation
  • Evaluate multi-cluster networking trade-offs — latency, complexity, cost — for your distributed architecture

In June 2023, a global payments company ran two Kubernetes clusters — one in us-east-1 (AWS) and one in eu-west-1. Each cluster was self-contained, with its own databases, caches, and application stacks. The company’s European customers were routing through the US cluster because a DNS misconfiguration in their global load balancer pointed payments.example.com to the US cluster only.

When the US cluster experienced a 40-minute outage (AZ failure in us-east-1a), European customers lost service entirely even though the EU cluster was perfectly healthy. The company had invested $800K building the second cluster for disaster recovery, but the networking layer — the piece that connects clusters and routes traffic intelligently — was an afterthought. The failover was manual, undocumented, and had never been tested.

Multi-cluster networking is not just about connecting clusters. It’s about building a resilient service topology where traffic flows to the right cluster based on latency, availability, and business rules. This module covers the tools, patterns, and operational practices that make multi-cluster Kubernetes work in production.


Cilium ClusterMesh can connect up to 255 clusters in a single mesh with a shared Pod identity system. Pods in any cluster can reach Pods in any other cluster using standard Kubernetes Service names, with no application changes. Each cluster maintains its own control plane — there is no single point of failure.

Submariner (a CNCF Sandbox project) creates encrypted tunnels between clusters using IPsec or WireGuard. It supports non-overlapping and overlapping Pod CIDRs through its Globalnet component, which assigns unique “global” IPs to exported Services. This means you can connect clusters that were provisioned with the same default 10.244.0.0/16 range.

The external-dns project can synchronize Kubernetes Service and Ingress resources with over 30 DNS providers (Route53, Cloudflare, Google Cloud DNS, Azure DNS, etc.). In a multi-cluster setup, the external-dns instance in each cluster creates DNS records that point to that cluster’s Services. Combined with provider-side health checks, this enables simple DNS-based failover: if a cluster goes down, its health check fails, the provider stops returning its records, and clients shift to the healthy cluster as cached answers reach their TTL.

The Multi-Cluster Services (MCS) API, implemented by products such as GKE Multi-Cluster Services, was proposed as a Kubernetes Enhancement Proposal (KEP-1645) and is being standardized as the ServiceExport/ServiceImport pattern. Once it reaches GA, it should provide a vendor-neutral way for clusters to share Services, replacing the current vendor-specific approaches.


Model 1: Flat Networking (Shared Pod CIDR Space)

All clusters share a routable Pod network. Pods can reach each other directly by IP.

┌─────────────────────┐ ┌─────────────────────┐
│ Cluster A │ │ Cluster B │
│ Pods: 10.1.0.0/16 │ │ Pods: 10.2.0.0/16 │
│ │ │ │
│ ┌───┐ ┌───┐ │ │ ┌───┐ ┌───┐ │
│ │ P │ │ P │ │ │ │ P │ │ P │ │
│ │.1 │────→│.5 │ │ │ │.3 │ │.7 │ │
│ └───┘ └───┘ │ │ └───┘ └───┘ │
│ └──────────────┼─────┼──→ │
│ Direct Pod-to-Pod │ │ Direct Pod-to-Pod │
└─────────────────────┘ └─────────────────────┘
│ │
└──────────┬──────────────────┘
VPC Peering / VPN / Direct Connect
(non-overlapping CIDRs required)

Requirements:

  • Non-overlapping Pod, Service, and Node CIDRs across all clusters
  • Network connectivity (VPC peering, VPN, dedicated interconnect)
  • Routing rules for cross-cluster Pod CIDRs

Best for: Clusters in the same cloud provider/region where VPC peering is simple.

Model 2: Overlay Networking

Cross-cluster traffic is encapsulated in tunnels. Pod CIDRs can overlap.

┌─────────────────────┐ ┌─────────────────────┐
│ Cluster A │ │ Cluster B │
│ Pods: 10.244.0.0/16│ │ Pods: 10.244.0.0/16│
│ (same CIDR!) │ │ (same CIDR!) │
│ │ │ │
│ ┌────────────────┐ │ │ ┌────────────────┐ │
│ │ Submariner GW │──┼─────┼──│ Submariner GW │ │
│ │ (IPsec/WG) │ │ │ │ (IPsec/WG) │ │
│ └────────────────┘ │ │ └────────────────┘ │
│ Globalnet: 242.x │ │ Globalnet: 243.x │
└─────────────────────┘ └─────────────────────┘

Requirements:

  • A tunnel solution (Submariner, WireGuard mesh)
  • Gateway nodes with connectivity to other clusters
  • Higher latency than flat networking (encapsulation overhead)

Best for: Clusters with overlapping CIDRs, different cloud providers, or restricted network environments.

Model 3: Service-Level Connectivity

Only Services (not individual Pods) are shared across clusters. Traffic goes through a gateway.

┌──────────────────────┐ ┌──────────────────────┐
│ Cluster A │ │ Cluster B │
│ │ │ │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ api-service │◄───┼─────┼──│ payment-svc │ │
│ │ (ClusterIP) │ │ │ │ (exported) │ │
│ └──────────────┘ │ │ └──────────────┘ │
│ ▲ │ │ │
│ ServiceImport │ │ ServiceExport │
│ (from Cluster B) │ │ (to Cluster A) │
└──────────────────────┘ └──────────────────────┘

Requirements:

  • Multi-cluster service discovery (MCS API, Istio, Cilium ClusterMesh)
  • Gateway or proxy for cross-cluster traffic
  • Only exported Services are reachable, not all Pods

Best for: Security-conscious environments where you want explicit control over what’s shared.

| Factor | Flat | Overlay | Service-Level |
| --- | --- | --- | --- |
| CIDR overlap OK | No | Yes | N/A |
| Performance | Best | Good (5-10% overhead) | Good |
| Security posture | Low (all Pods reachable) | Medium | Highest |
| Complexity | Low | Medium | Medium-High |
| Cross-cloud | Needs VPN/peering | Works anywhere | Works anywhere |

ClusterMesh connects Cilium-managed clusters with a shared identity system and cross-cluster Service discovery.

# Prerequisites: Cilium installed on both clusters with unique cluster IDs
# Cluster A:
cilium install --cluster-name cluster-a --cluster-id 1 \
--set cluster.name=cluster-a \
--set cluster.id=1
# Cluster B:
cilium install --cluster-name cluster-b --cluster-id 2 \
--set cluster.name=cluster-b \
--set cluster.id=2
# Enable ClusterMesh on both clusters
cilium clustermesh enable --service-type LoadBalancer
# (Use NodePort if no LoadBalancer available)
# Wait for ClusterMesh API server to be ready
cilium clustermesh status --wait
# Connect the clusters
cilium clustermesh connect --destination-context cluster-b

Sharing Services across clusters:

# In each cluster that serves or consumes the Service: define it with the
# same name and namespace, and annotate it as global
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: production
  annotations:
    service.cilium.io/global: "true"
    # Optional: prefer local endpoints, fall back to remote
    service.cilium.io/affinity: "local"
spec:
  selector:
    app: payment
  ports:
    - port: 8080

Once the Service is defined and annotated in both clusters, Pods in Cluster A can reach payment-service.production.svc.cluster.local and traffic will be load balanced across endpoints in both clusters. With the local affinity shown here, local endpoints are preferred while any are healthy.
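The local-affinity behavior can be pictured with a small model. The sketch below is not Cilium's implementation; `Endpoint` and `select_backends` are hypothetical names, and the addresses are illustrative. It only shows the selection rule described in this module: prefer healthy local backends, and fall back to remote ones when none are left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    cluster: str          # cluster the backend Pod runs in
    address: str          # illustrative Pod IP and port
    healthy: bool = True


def select_backends(endpoints, local_cluster, affinity="none"):
    """Return the backends eligible for load balancing.

    affinity="local": healthy local backends if any exist, else all healthy ones.
    affinity="none":  every healthy backend in every cluster.
    """
    healthy = [e for e in endpoints if e.healthy]
    if affinity == "local":
        local = [e for e in healthy if e.cluster == local_cluster]
        if local:
            return local
    return healthy


endpoints = [
    Endpoint("cluster-a", "10.1.0.5:8080"),
    Endpoint("cluster-b", "10.2.0.7:8080"),
]

# Seen from cluster-a with local affinity, only the local backend is used.
print([e.cluster for e in select_backends(endpoints, "cluster-a", "local")])
```

If the cluster-a backend is marked unhealthy, the same call returns only the cluster-b backend, which is the failover path exercised later in the hands-on section.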

# Verify cross-cluster connectivity
cilium clustermesh status
# Shows: connected clusters, shared services, endpoint counts
# View cross-cluster endpoints
kubectl get ciliumendpoints -A | grep -i payment

Submariner creates IPsec or WireGuard tunnels between clusters and supports the Multi-Cluster Services (MCS) API.

# Install subctl CLI
curl -Ls https://get.submariner.io | VERSION=0.23.1 bash
# Deploy the broker (coordination component) on Cluster A
subctl deploy-broker --kubeconfig kubeconfig-cluster-a
# Join Cluster A to the broker
subctl join broker-info.subm --kubeconfig kubeconfig-cluster-a \
--clusterid cluster-a \
--nattport 4500
# Join Cluster B
subctl join broker-info.subm --kubeconfig kubeconfig-cluster-b \
--clusterid cluster-b \
--nattport 4500
# Verify connectivity
subctl show all
subctl verify --kubeconfig kubeconfig-cluster-a \
--toconfig kubeconfig-cluster-b --only connectivity

Exporting Services with MCS API:

# In Cluster B: export the service
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: payment-service
  namespace: production
# In Cluster A: the ServiceImport is created automatically
# Pods can now reach:
# payment-service.production.svc.clusterset.local
# (Note: .clusterset.local instead of .cluster.local)

# Verify service export
kubectl get serviceexport -n production
kubectl get serviceimport -n production # On the consuming cluster

Skupper uses an application-layer Virtual Application Network (VAN) to connect services without VPN or special network configuration.

# Install Skupper CLI
curl https://skupper.io/install.sh | sh
# In Cluster A: initialize Skupper
skupper init --site-name cluster-a
# In Cluster B: initialize and create a link token
skupper init --site-name cluster-b
skupper token create cluster-b-token.yaml
# In Cluster A: use the token to establish the link
skupper link create cluster-b-token.yaml
# In Cluster B: expose a service
skupper expose deployment payment-service --port 8080
# In Cluster A: the service is now accessible
kubectl get services # payment-service appears as a local ClusterIP

When Skupper shines: connecting Kubernetes clusters to non-Kubernetes workloads (VMs, bare metal), or connecting clusters across restrictive firewalls where VPN setup is impossible.

| Feature | Cilium ClusterMesh | Submariner | Skupper |
| --- | --- | --- | --- |
| Max clusters | 255 | 20-30 (practical) | 50+ |
| Connectivity | Direct (flat or tunnel) | IPsec/WireGuard tunnel | AMQP application layer |
| Overlapping CIDRs | No | Yes (Globalnet) | Yes |
| Network policies cross-cluster | Yes | No | No |
| MCS API (ServiceExport) | No (own annotation) | Yes | No (own model) |
| Non-K8s workloads | No | No | Yes |
| Requires CNI change | Yes (Cilium) | No (any CNI) | No (any CNI) |
| Performance overhead | Minimal | 5-10% (tunnel) | 10-20% (app layer) |

Kubernetes uses CoreDNS for in-cluster DNS. For multi-cluster, you can configure CoreDNS to forward queries for other clusters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
    # Separate server block (server blocks cannot be nested):
    # forward queries for cluster-b.local to Cluster B's DNS.
    # Assumes Cluster B uses cluster-b.local as its cluster domain.
    cluster-b.local:53 {
        errors
        cache 30
        # Cluster B's CoreDNS IP; must be reachable from this cluster
        forward . 10.100.0.10
    }

external-dns synchronizes Kubernetes resources with external DNS providers:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns:v0.15.1
          args:
            - --source=service
            - --source=ingress
            - --source=gateway-httproute # Gateway API support
            - --provider=aws # or google, azure, cloudflare
            - --domain-filter=example.com
            - --aws-zone-type=public
            - --txt-owner-id=cluster-a # Unique per cluster
            - --policy=upsert-only # Don't delete records

┌─────────────────────────────────────────────────────────────┐
│ Route53 (weighted routing) │
│ api.example.com │
│ ├── 50% → cluster-a.api.example.com (us-east-1) │
│ └── 50% → cluster-b.api.example.com (eu-west-1) │
│ │
│ Health checks: │
│ cluster-a: GET /healthz → 200 ✓ │
│ cluster-b: GET /healthz → 200 ✓ │
│ │
│ If cluster-a fails health check: │
│ 100% → cluster-b.api.example.com │
└─────────────────────────────────────────────────────────────┘

# AWS Route53 health check + weighted routing
aws route53 create-health-check --caller-reference "cluster-a-$(date +%s)" \
  --health-check-config '{
    "IPAddress": "203.0.113.10",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/healthz",
    "RequestInterval": 10,
    "FailureThreshold": 3
  }'
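To make the weighted-routing diagram concrete, here is a minimal sketch of how the traffic split changes when a health check fails. It is a simplified model, not the Route53 algorithm; `effective_shares` is a hypothetical helper, and the fail-open branch (serving all records when none are healthy) is an assumption of this sketch.

```python
def effective_shares(weights, healthy):
    """Return each record's share of DNS answers as a fraction of 1.0.

    weights: {record_name: configured_weight}
    healthy: {record_name: bool} from the latest health checks
    """
    live = {n: w for n, w in weights.items() if w > 0 and healthy.get(n, False)}
    if not live:
        # Assumption: fail open and answer with every weighted record
        # rather than returning nothing at all.
        live = {n: w for n, w in weights.items() if w > 0}
    total = sum(live.values())
    if total == 0:
        return {n: 0.0 for n in weights}
    return {n: live.get(n, 0) / total for n in weights}


weights = {"cluster-a": 50, "cluster-b": 50}

# Both clusters healthy: traffic splits evenly.
print(effective_shares(weights, {"cluster-a": True, "cluster-b": True}))

# cluster-a fails its health check: cluster-b receives all traffic.
print(effective_shares(weights, {"cluster-a": False, "cluster-b": True}))
```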

┌──────────────────┐ ┌──────────────────┐
│ Cloud (AWS) │ │ On-Prem DC │
│ VPC: 10.0.0.0/16│ │ Net: 172.16.0.0/12│
│ │ │ │
│ K8s Cluster │ IPsec │ K8s Cluster │
│ Pods: 10.1.0.0/16│←────────→│ Pods: 10.2.0.0/16 │
│ │ VPN │ │
│ AWS VPN Gateway │ │ On-prem VPN GW │
└──────────────────┘ └──────────────────┘

Key considerations:

  • Non-overlapping CIDRs — Plan CIDR allocation across all environments
  • Bandwidth — VPN throughput is typically 1-5 Gbps; Direct Connect/ExpressRoute for more
  • Latency — VPN adds 1-5ms per hop; measure and account for in timeout configs
  • Reliability — Use redundant VPN tunnels; monitor tunnel state

| Environment | Node CIDR | Pod CIDR | Service CIDR |
| --- | --- | --- | --- |
| Cluster A (us-east-1) | 10.0.0.0/16 | 10.1.0.0/16 | 10.96.0.0/16 |
| Cluster B (eu-west-1) | 10.10.0.0/16 | 10.11.0.0/16 | 10.97.0.0/16 |
| Cluster C (on-prem) | 172.16.0.0/16 | 172.17.0.0/16 | 172.18.0.0/16 |

Rules:

  1. No overlap between any CIDR ranges across all clusters
  2. Reserve space for future clusters (don’t use /8 ranges on a single cluster)
  3. Document all CIDRs in a central registry
  4. Use Submariner Globalnet if you cannot avoid overlap (legacy clusters)
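Rule 1 is easy to check mechanically before a cluster is created. The sketch below uses Python's standard `ipaddress` module to compare every pair of ranges in a registry; the `registry` dictionary mirrors the allocation table above, while `find_overlaps` and the label format are hypothetical names chosen for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlaps(registry):
    """Return pairs of labels whose CIDR ranges overlap.

    registry: {label: cidr_string}
    """
    nets = {label: ipaddress.ip_network(cidr) for label, cidr in registry.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


registry = {
    "cluster-a/node": "10.0.0.0/16",
    "cluster-a/pod": "10.1.0.0/16",
    "cluster-a/service": "10.96.0.0/16",
    "cluster-b/node": "10.10.0.0/16",
    "cluster-b/pod": "10.11.0.0/16",
    "cluster-b/service": "10.97.0.0/16",
    "cluster-c/node": "172.16.0.0/16",
    "cluster-c/pod": "172.17.0.0/16",
    "cluster-c/service": "172.18.0.0/16",
}

# The planned allocation has no conflicts.
print(find_overlaps(registry))

# A legacy cluster on the kubeadm/kind default collides with a /8 super-range.
risky = dict(registry, **{"legacy/pod": "10.244.0.0/16", "too-big/pod": "10.0.0.0/8"})
print(find_overlaps(risky))
```

Running a check like this in CI against the central CIDR registry is one way to enforce rules 1 and 3 together.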

# 1. Can nodes in Cluster A reach nodes in Cluster B?
ping <cluster-b-node-ip>
# 2. Is the tunnel/peering established?
# Submariner:
subctl show connections
# Cilium ClusterMesh:
cilium clustermesh status
# 3. Can Pods resolve cross-cluster DNS?
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
nslookup payment-service.production.svc.clusterset.local
# 4. Can Pods reach cross-cluster Services?
kubectl run net-test --rm -it --image=nicolaka/netshoot --restart=Never -- \
curl -v http://payment-service.production.svc.clusterset.local:8080/healthz
# 5. Check for MTU issues (common with tunnels)
kubectl run mtu-test --rm -it --image=nicolaka/netshoot --restart=Never -- \
ping -M do -s 1400 <remote-pod-ip>
# If this fails but ping -s 1300 works, you have an MTU issue
# 6. Check firewall rules
# Submariner needs: UDP 4500 (IPsec NAT-T), UDP 4490 (NAT discovery)
# Cilium ClusterMesh needs: TCP 2379 (etcd), TCP 4240 (health)

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| DNS resolution fails for .clusterset.local | Submariner CoreDNS plugin not installed | Run subctl diagnose all |
| Intermittent timeouts on large payloads | MTU mismatch (tunnel overhead) | Set MTU to 1400 (VXLAN) or 1380 (IPsec) |
| Service reachable from one cluster but not the other | Asymmetric routing or missing return route | Check route tables on gateway nodes |
| ClusterMesh shows “connected” but Services not shared | Missing service.cilium.io/global annotation | Add annotation and verify endpoint sync |
| VPN tunnel flaps | Keep-alive timeout too aggressive | Increase DPD interval, check cloud provider VPN limits |
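When reading the results of the `ping -M do -s <size>` probes, remember that `-s` sets the ICMP payload, not the packet size. For IPv4 without options, the packet on the wire is the payload plus a 20-byte IP header and an 8-byte ICMP header. The helper names below are hypothetical; the sketch only does this arithmetic.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_packet_size(payload):
    """IPv4 packet size produced by `ping -s <payload>`."""
    return payload + IPV4_HEADER + ICMP_HEADER


def max_ping_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def path_mtu_bounds(largest_ok_payload, smallest_failed_payload):
    """Bracket the path MTU (inclusive) from two don't-fragment probes."""
    lower = ping_packet_size(largest_ok_payload)
    upper = ping_packet_size(smallest_failed_payload) - 1
    return lower, upper


# -s 1300 succeeds and -s 1400 fails: the path MTU lies in this range.
print(path_mtu_bounds(1300, 1400))

# Largest probe payloads for the MTU values suggested in the table.
print(max_ping_payload(1400), max_ping_payload(1380))
```

One consequence: on an interface already set to MTU 1400, the largest payload that fits is 1372 bytes, so a failing `-s 1400` probe is expected there and the probe size should be chosen relative to the configured MTU.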

| Mistake | Why It Happens | How to Fix It |
| --- | --- | --- |
| Using the same Pod CIDR on all clusters | Default kubeadm/kind uses 10.244.0.0/16 everywhere | Plan CIDRs before cluster creation; use Submariner Globalnet if already deployed |
| Not testing failover | “We have two clusters, so we’re HA” | Schedule monthly failover drills; automate DNS failover with health checks |
| Running multi-cluster without monitoring cross-cluster latency | Teams monitor per-cluster metrics but not cross-cluster | Add cross-cluster latency probes (blackbox exporter) and SLOs |
| Opening all ports between clusters | “Just open everything for now” | Whitelist only required ports: 4240, 2379 (Cilium), 4500, 4490 (Submariner) |
| Ignoring DNS TTL in failover | High TTL means DNS failover takes minutes, not seconds | Set TTL to 30-60s for records used in failover |
| Not documenting which Services are exported | Cross-cluster dependencies become invisible | Maintain a registry of exported Services with ownership |

Exercise 1: Multi-Cluster with Cilium ClusterMesh (kind)

# Create two kind clusters
cat <<'EOF' > cluster-a.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.1.0.0/16"
  serviceSubnet: "10.96.0.0/16"
nodes:
  - role: control-plane
  - role: worker
EOF
cat <<'EOF' > cluster-b.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.2.0.0/16"
  serviceSubnet: "10.97.0.0/16"
nodes:
  - role: control-plane
  - role: worker
EOF
kind create cluster --name cluster-a --config cluster-a.yaml
kind create cluster --name cluster-b --config cluster-b.yaml

Task 1: Install Cilium on both clusters with unique cluster IDs.

# Cluster A
cilium install --context kind-cluster-a --cluster-name cluster-a --cluster-id 1 \
--set cluster.name=cluster-a --set cluster.id=1
# Cluster B
cilium install --context kind-cluster-b --cluster-name cluster-b --cluster-id 2 \
--set cluster.name=cluster-b --set cluster.id=2
# Wait for ready
cilium status --context kind-cluster-a --wait
cilium status --context kind-cluster-b --wait

Task 2: Enable ClusterMesh and connect the clusters.

cilium clustermesh enable --context kind-cluster-a --service-type NodePort
cilium clustermesh enable --context kind-cluster-b --service-type NodePort
cilium clustermesh status --context kind-cluster-a --wait
cilium clustermesh status --context kind-cluster-b --wait
cilium clustermesh connect --context kind-cluster-a --destination-context kind-cluster-b

Task 3: Deploy a global service and verify cross-cluster access.

# Deploy service in Cluster B
kubectl --context kind-cluster-b create namespace shared
kubectl --context kind-cluster-b run echo-server -n shared \
  --image=hashicorp/http-echo:0.2.3 -- -listen=:8080 -text="from-cluster-b"
kubectl --context kind-cluster-b expose pod echo-server -n shared --port=8080
# Annotate as global
kubectl --context kind-cluster-b annotate service echo-server -n shared \
  service.cilium.io/global="true"
# Create the matching namespace and Service in Cluster A (no local Pods).
# A global Service must exist with the same name and namespace in each
# cluster; without it, the name does not resolve in Cluster A.
kubectl --context kind-cluster-a create namespace shared
kubectl --context kind-cluster-a create service clusterip echo-server -n shared \
  --tcp=8080:8080
kubectl --context kind-cluster-a annotate service echo-server -n shared \
  service.cilium.io/global="true"
# Test from Cluster A
kubectl --context kind-cluster-a run test -n shared --rm -it --restart=Never \
  --image=busybox:1.36 -- wget -qO- http://echo-server.shared.svc.cluster.local:8080
# Expected: "from-cluster-b"

# Deploy the same service in BOTH clusters
kubectl --context kind-cluster-a create namespace app
kubectl --context kind-cluster-a run web -n app --image=hashicorp/http-echo:0.2.3 \
-- -listen=:8080 -text="cluster-a"
kubectl --context kind-cluster-a expose pod web -n app --port=8080
kubectl --context kind-cluster-b create namespace app
kubectl --context kind-cluster-b run web -n app --image=hashicorp/http-echo:0.2.3 \
-- -listen=:8080 -text="cluster-b"
kubectl --context kind-cluster-b expose pod web -n app --port=8080
# Annotate both as global with local affinity
for CTX in kind-cluster-a kind-cluster-b; do
kubectl --context $CTX annotate service web -n app \
service.cilium.io/global="true" \
service.cilium.io/affinity="local"
done

Task: Simulate a failure in Cluster A and verify traffic fails over to Cluster B.

# From Cluster A, traffic goes to local first
kubectl --context kind-cluster-a run test -n app --rm -it --restart=Never \
--image=busybox:1.36 -- wget -qO- http://web.app.svc.cluster.local:8080
# Expected: "cluster-a" (local affinity)
# Delete the local Pod to simulate failure
kubectl --context kind-cluster-a delete pod web -n app
# Test again — should fail over to Cluster B
kubectl --context kind-cluster-a run test2 -n app --rm -it --restart=Never \
--image=busybox:1.36 -- wget -qO- http://web.app.svc.cluster.local:8080
# Expected: "cluster-b" (failover)

Task: Intentionally break cross-cluster connectivity and diagnose it.

# Break connectivity by removing the ClusterMesh annotation
kubectl --context kind-cluster-b annotate service echo-server -n shared \
service.cilium.io/global-
# From Cluster A, try to reach the service
kubectl --context kind-cluster-a run test -n shared --rm -it --restart=Never \
--image=busybox:1.36 -- wget --timeout=3 -qO- http://echo-server.shared.svc.cluster.local:8080
# Expected: timeout (no local endpoint, global annotation removed)
# Diagnose
cilium clustermesh status --context kind-cluster-a
kubectl --context kind-cluster-a get endpoints echo-server -n shared
# Shows: no endpoints (service not global anymore)
# Fix: re-add annotation
kubectl --context kind-cluster-b annotate service echo-server -n shared \
service.cilium.io/global="true"

Success Criteria:

  • Two kind clusters running with Cilium and ClusterMesh connected
  • Global service accessible from both clusters
  • Local affinity routing verified (traffic prefers local cluster)
  • Failover tested by deleting local Pod
  • Cross-cluster connectivity diagnosed and repaired

The CIDR Collision That Nobody Saw Coming

A logistics company acquired a competitor in 2023. Both companies ran Kubernetes. Both used the default Pod CIDR: 10.244.0.0/16. Both used the default Service CIDR: 10.96.0.0/12. The merger integration plan called for connecting the two Kubernetes environments within 90 days so that applications could be gradually migrated.

Timeline:

  • Week 1: Network team discovers the CIDR overlap. Every Pod IP in Company A’s cluster could conflict with a Pod IP in Company B’s cluster.
  • Week 3: Team evaluates options: (A) rebuild one cluster with new CIDRs ($300K, 4 weeks downtime risk), (B) use Submariner Globalnet to NAT cross-cluster traffic.
  • Week 5: Submariner Globalnet deployed. Each cluster gets a unique GlobalCIDR (242.0.0.0/16 and 243.0.0.0/16). Cross-cluster Services are assigned global IPs from these ranges.
  • Week 8: Integration testing reveals that Globalnet adds 2-3ms of latency per request due to double NAT. The payments service, which makes 6 cross-cluster calls per transaction, sees 12-18ms of added latency — enough to breach SLOs.
  • Week 12: Team decides to rebuild Company B’s cluster with non-overlapping CIDRs during a weekend maintenance window. Total cost: $180K in engineering time plus $50K in cloud compute for the parallel environment.
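The Week 8 figure follows directly from multiplying the per-call penalty by the number of cross-cluster calls. The sketch below reproduces that arithmetic, assuming the six calls run sequentially so their delays add up; `added_latency_ms` is a hypothetical helper name.

```python
def added_latency_ms(calls_per_transaction, per_call_ms_range):
    """Added latency range (low, high) when calls run one after another."""
    low, high = per_call_ms_range
    return calls_per_transaction * low, calls_per_transaction * high


# 6 cross-cluster calls, each paying 2-3 ms for double NAT.
print(added_latency_ms(6, (2, 3)))  # (12, 18)
```

If some of those calls could be issued in parallel, the added latency would be lower, which is why call fan-out matters as much as the per-hop penalty when evaluating a NAT-based design.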

Lesson: CIDR allocation is a foundational decision that is extremely expensive to change later. Treat it like a database schema migration: plan it carefully at the beginning, document it centrally, and reserve enough address space for future growth. If you’re starting fresh, give every cluster, test or production, its own unique /16 blocks from 10.0.0.0/8. Do not use the RFC 5737 documentation ranges (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24) for Pod networks: each is only a /24, and they are reserved for examples in documentation.


1. What are the three multi-cluster networking models, and when would you choose each?

(1) Flat networking — Direct Pod-to-Pod routing via VPC peering or VPN. Best for clusters in the same cloud provider with non-overlapping CIDRs. Lowest latency, simplest, but requires CIDR planning. (2) Overlay networking — Tunneled connectivity (IPsec/WireGuard) between gateway nodes. Best when CIDRs overlap or clusters are in different providers. Adds 5-10% overhead. (3) Service-level connectivity — Only exported Services are reachable, not all Pods. Best for security-conscious environments where you want explicit control. Medium complexity but highest security.

2. How does Cilium ClusterMesh handle cross-cluster service discovery?

Cilium ClusterMesh uses an etcd-based ClusterMesh API server in each cluster to synchronize endpoint information across clusters. When a Service with the same name and namespace is defined in each cluster and annotated with service.cilium.io/global: "true", its endpoints are shared with all connected clusters. Pods in any cluster can resolve the Service using the standard <name>.<namespace>.svc.cluster.local DNS name. Cilium’s eBPF dataplane routes traffic to the appropriate endpoint, local or remote, based on the affinity configuration. With affinity: local, local endpoints are preferred; remote endpoints are used only when no local endpoints are available.

3. What is Submariner's Globalnet feature and why does it exist?

Globalnet solves the overlapping Pod CIDR problem. When two clusters use the same Pod CIDR (e.g., both use 10.244.0.0/16), direct routing is impossible because the same IP could exist in both clusters. Globalnet assigns each cluster a unique “global” CIDR (e.g., 242.0.0.0/16). When a Service is exported, it gets a global IP from this range. Cross-cluster traffic is NATed: source Pod IP is translated to a global IP, routed to the remote cluster, then NATed again to the actual Pod IP. The trade-off is added latency from the double NAT.
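The idea of giving each exported Service an address from a per-cluster global range can be sketched with a toy allocator. This is not Submariner's code; `GlobalIPAllocator` is a hypothetical class, and the two CIDRs are the example ranges used in this module.

```python
import ipaddress


class GlobalIPAllocator:
    """Toy model: hands out one stable global IP per exported Service."""

    def __init__(self, global_cidr):
        # iter() keeps this working whether hosts() returns a generator or a list
        self._free = iter(ipaddress.ip_network(global_cidr).hosts())
        self._assigned = {}

    def export(self, namespace, name):
        key = (namespace, name)
        if key not in self._assigned:
            # Raises StopIteration once the range is exhausted (not handled here)
            self._assigned[key] = next(self._free)
        return self._assigned[key]


cluster_a = GlobalIPAllocator("242.0.0.0/16")
cluster_b = GlobalIPAllocator("243.0.0.0/16")

# Both clusters may use 10.244.0.0/16 internally, yet the exported
# Services get distinct, routable global addresses.
print(cluster_a.export("production", "payment-service"))
print(cluster_b.export("production", "payment-service"))
```

Because the global ranges never overlap, the remote side can route on the global IP and translate back to the real Pod IP on arrival, which is the double NAT described above.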

4. Why is DNS TTL important for multi-cluster failover?

When a cluster fails, DNS-based failover removes or deprioritizes the failed cluster’s DNS records. However, clients and resolvers cache DNS responses for the duration of the TTL (Time To Live). If TTL is 300 seconds (5 minutes), clients continue sending traffic to the failed cluster for up to 5 minutes after the DNS record is updated. For fast failover, set TTL to 30-60 seconds. The trade-off is more DNS queries (higher load on DNS infrastructure and slightly higher latency for initial resolutions).
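A rough upper bound on failover time combines health-check detection with the cached TTL. The sketch below is a simplified model that assumes detection takes interval times failure threshold and that every client honors the TTL; `worst_case_failover_seconds` is a hypothetical helper, and the 10-second interval and threshold of 3 are taken from the Route53 health check example earlier in this module.

```python
def worst_case_failover_seconds(check_interval_s, failure_threshold, ttl_s):
    """Estimate the longest time clients may keep hitting a failed cluster."""
    detection = check_interval_s * failure_threshold  # consecutive failed checks
    return detection + ttl_s                          # plus cached DNS answers


# Health check every 10 s, 3 failures required.
print(worst_case_failover_seconds(10, 3, 300))  # 330 seconds with a 5-minute TTL
print(worst_case_failover_seconds(10, 3, 60))   # 90 seconds with a 60-second TTL
print(worst_case_failover_seconds(10, 3, 30))   # 60 seconds with a 30-second TTL
```

The estimate ignores resolvers that cache beyond the TTL and any propagation delay inside the DNS provider, so real failover can take longer.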

5. Scenario: You need to connect an on-premises Kubernetes cluster to an EKS cluster in AWS. The on-prem cluster uses 10.244.0.0/16 for Pods, and EKS uses the same range (aws-vpc-cni assigns VPC IPs, but you also have a secondary CIDR of 10.244.0.0/16). What are your options?

Three options: (1) Submariner with Globalnet — handles the CIDR overlap through NAT. Fastest to deploy but adds latency. (2) Rebuild one cluster with non-overlapping CIDRs. Best long-term but requires downtime or blue-green migration. (3) Skupper — operates at the application layer, so IP overlap doesn’t matter. Services are exposed individually. Lower performance but simplest network-wise. For EKS specifically, consider switching to VPC-native IPs only (no secondary CIDR) and using Skupper or Submariner Globalnet for the cross-environment link.

6. What firewall ports need to be opened for Cilium ClusterMesh between two clusters?

Cilium ClusterMesh requires: (1) TCP 2379 — etcd (ClusterMesh API server) for synchronizing endpoint and identity data between clusters. (2) TCP 4240 — Cilium health checks between nodes. (3) UDP 8472 (VXLAN) or UDP 51871 (WireGuard) — for actual Pod-to-Pod data traffic, depending on the tunnel mode. (4) TCP 4244 — Hubble relay, if using Hubble across clusters. Additionally, if using NodePort for the ClusterMesh API server, the assigned NodePort (typically 32379) must be reachable.
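The TCP ports in this list can be probed from a node or Pod with a few lines of standard-library Python. This is a minimal sketch: `tcp_port_open` and `check_ports` are hypothetical helpers, the port map contains only the TCP ports named above, and the UDP ports (VXLAN, WireGuard) cannot be verified with a TCP connect.

```python
import socket

# TCP ports named in the answer above; UDP ports need a different test.
CLUSTERMESH_TCP_PORTS = {
    2379: "ClusterMesh API server (etcd)",
    4240: "Cilium health checks",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=CLUSTERMESH_TCP_PORTS, timeout=3.0):
    """Return {port: reachable} for every port in the map."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `check_ports("<cluster-b-node-ip>")` from a node in Cluster A would show which of the two ports a firewall is blocking; the placeholder address must be replaced with a real node IP.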

7. How does external-dns enable multi-cluster failover without any multi-cluster networking tool?

external-dns runs in each cluster and creates DNS records for Services/Ingresses. For failover: (1) Deploy the same Service in both clusters with the same hostname. (2) Configure external-dns with a unique --txt-owner-id per cluster so they don’t conflict. (3) Use a DNS provider that supports health checks and weighted routing (Route53, Cloudflare). (4) Configure health checks for each cluster’s endpoint. When a cluster goes down, its health check fails, and the DNS provider stops routing traffic to it. This provides cluster-level failover without any cross-cluster networking — each cluster operates independently. The limitation is that it only works for external traffic (ingress), not east-west Pod-to-Pod traffic.

8. What is the biggest risk of not planning CIDR allocation before deploying multiple clusters?

The biggest risk is CIDR overlap, which makes direct cross-cluster networking impossible and forces you into overlay solutions (Submariner Globalnet, Skupper) that add latency and complexity. Changing a cluster’s Pod CIDR after deployment is effectively a full rebuild — you must drain all nodes, reconfigure the CNI, and recreate all Pods. For a production cluster with hundreds of services, this is a multi-day operation with significant outage risk. The cost of fixing CIDR overlap retroactively is 10-100x higher than planning it correctly from the start. Always maintain a centralized CIDR registry and allocate non-overlapping ranges for every cluster.


Multi-cluster and hybrid networking extends Kubernetes beyond a single cluster boundary. The key decisions are:

  1. Choose your connectivity model — flat (performance), overlay (flexibility), or service-level (security)
  2. Plan CIDRs first — non-overlapping ranges across all clusters. This is the most important networking decision you’ll make.
  3. Use the right tool — Cilium ClusterMesh for Cilium shops, Submariner for CNI-agnostic clusters, Skupper for hybrid/non-K8s integration
  4. DNS is your failover mechanism — external-dns + health checks for cluster-level failover, CoreDNS configuration for internal cross-cluster discovery
  5. Test failover regularly — having two clusters is not HA until you’ve proven traffic shifts correctly when one fails

Multi-cluster networking is hard because it touches every layer — DNS, routing, encryption, identity, and service discovery. But it’s essential for any organization running Kubernetes at scale or across regions.

Congratulations on completing the Kubernetes Networking discipline. You now have a comprehensive understanding of how traffic flows into, within, and between Kubernetes clusters.

Recommended next tracks: