Module 3.4: DNS & Certificate Infrastructure
Complexity:
[MEDIUM]| Time: 45 minutesPrerequisites: Module 3.3: Load Balancing, CKA: DNS
Why This Module Matters
Section titled “Why This Module Matters”On AWS, Route 53 provides DNS and ACM provides TLS certificates — both fully managed, both automatic. On bare metal, you need to run your own DNS infrastructure and manage your own certificate authority. If DNS is wrong, nothing works. If certificates are wrong, everything is “untrusted.”
A healthcare company deploying on-premises Kubernetes discovered this the hard way. They had CoreDNS inside the cluster for service discovery but forgot about external DNS — the names that humans and external systems use to reach services. Their monitoring system could not resolve grafana.internal.company.com because no DNS server was authoritative for internal.company.com. Their Jenkins pipelines failed because the internal container registry registry.internal.company.com had a self-signed certificate that curl rejected. Three weeks of “why doesn’t this work?” traced back to: no DNS zone for internal names, and no trusted CA for internal certificates.
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Implement internal DNS infrastructure with authoritative and recursive resolvers for Kubernetes service discovery
- Configure a private certificate authority and automate TLS certificate issuance with cert-manager
- Design split-horizon DNS that correctly resolves internal and external names from both inside and outside the cluster
- Troubleshoot DNS resolution failures and certificate trust chain issues in air-gapped or isolated networks
What You’ll Learn
Section titled “What You’ll Learn”- Internal DNS architecture (authoritative + recursive)
- Split-horizon DNS (internal vs external resolution)
- CoreDNS as external authoritative DNS
- Certificate management with cert-manager + internal CA
- HashiCorp Vault as a private PKI
- Automated certificate rotation for K8s components
Internal DNS Architecture
Section titled “Internal DNS Architecture”The Three DNS Layers
Section titled “The Three DNS Layers”┌─────────────────────────────────────────────────────────────┐│ DNS LAYERS FOR ON-PREM K8s ││ ││ Layer 1: Kubernetes Internal DNS (CoreDNS in cluster) ││ ├── Resolves: service.namespace.svc.cluster.local ││ ├── Managed by: Kubernetes automatically ││ └── Scope: pods and services within the cluster ││ ││ Layer 2: Internal Corporate DNS ││ ├── Resolves: *.internal.company.com ││ ├── Managed by: you (BIND, CoreDNS, PowerDNS) ││ └── Scope: internal services reachable by name ││ (grafana.internal.company.com → MetalLB VIP) ││ ││ Layer 3: External/Public DNS ││ ├── Resolves: *.company.com (public) ││ ├── Managed by: DNS provider (Cloudflare, Route53, etc.) ││ └── Scope: internet-facing services ││ ││ Pod DNS resolution chain: ││ Pod → CoreDNS (L1) → Corporate DNS (L2) → Public DNS (L3)││ │└─────────────────────────────────────────────────────────────┘Pause and predict: A developer deploys a new service and creates a Kubernetes Service with a MetalLB VIP. Other pods in the cluster can reach it via its ClusterIP. But when they try
curl https://myservice.internal.company.com, it fails with “could not resolve host.” What is missing from the DNS chain, and at which layer does the resolution break?
Split-Horizon DNS
Section titled “Split-Horizon DNS”Internal and external clients resolve the same name to different IPs:
┌─────────────────────────────────────────────────────────────┐│ SPLIT-HORIZON DNS ││ ││ Internal query: app.company.com ││ └── Resolved by internal DNS: 10.0.50.10 (MetalLB VIP) ││ ││ External query: app.company.com ││ └── Resolved by public DNS: 203.0.113.50 (public IP) ││ ││ Implementation: ││ ┌──────────────┐ ┌──────────────┐ ││ │ Internal DNS │ │ Public DNS │ ││ │ (BIND/CoreDNS)│ │ (Cloudflare) │ ││ │ │ │ │ ││ │ app.company │ │ app.company │ ││ │ → 10.0.50.10│ │ → 203.0.113.50│ ││ └──────────────┘ └──────────────┘ ││ ││ Corporate DNS servers: 10.0.10.1, 10.0.10.2 ││ K8s CoreDNS forwards to corporate DNS for non-cluster names││ │└─────────────────────────────────────────────────────────────┘CoreDNS Configuration for Forwarding
Section titled “CoreDNS Configuration for Forwarding”# CoreDNS ConfigMap — forward non-cluster queries to corporate DNSapiVersion: v1kind: ConfigMapmetadata: name: coredns namespace: kube-systemdata: Corefile: | .:53 { errors health { lameduck 5s } ready kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure fallthrough in-addr.arpa ip6.arpa ttl 30 } # Forward internal domains to corporate DNS forward internal.company.com 10.0.10.1 10.0.10.2 # Forward everything else to public DNS forward . 8.8.8.8 1.1.1.1 cache 30 loop reload loadbalance }External Authoritative DNS with CoreDNS
Section titled “External Authoritative DNS with CoreDNS”Run a second CoreDNS instance (outside the cluster) as the authoritative DNS for your internal zone:
# CoreDNS config for internal.company.com zone.:53 { file /etc/coredns/db.internal.company.com internal.company.com forward . 8.8.8.8 1.1.1.1 # Forward public queries upstream cache 300 log}; /etc/coredns/db.internal.company.com$ORIGIN internal.company.com.$TTL 300
@ IN SOA ns1.internal.company.com. admin.company.com. ( 2024010101 ; Serial 3600 ; Refresh 900 ; Retry 604800 ; Expire 300 ) ; Minimum TTL
IN NS ns1.internal.company.com. IN NS ns2.internal.company.com.
ns1 IN A 10.0.10.1ns2 IN A 10.0.10.2
; Kubernetes services (MetalLB VIPs)grafana IN A 10.0.50.10argocd IN A 10.0.50.11registry IN A 10.0.50.12vault IN A 10.0.50.13
; Wildcard for ingress*.apps IN A 10.0.50.20
; Infrastructureapi IN A 10.0.20.100 ; kube-vip API server VIPCertificate Infrastructure
Section titled “Certificate Infrastructure”The Problem
Section titled “The Problem”On bare metal, there is no AWS ACM, no GCP Certificate Manager. You need TLS certificates for:
- Kubernetes API server (kubelet-to-API, kubectl-to-API)
- etcd (peer-to-peer, client-to-server)
- Ingress (HTTPS for user-facing services)
- Service mesh (mTLS between pods)
- Internal tools (Grafana, ArgoCD, Vault, Harbor)
Stop and think: Your team has been using
curl --insecureandkubectl --insecure-skip-tls-verifyeverywhere because internal services use self-signed certificates. Beyond the inconvenience, what specific security risks does this create? How does a proper CA chain (shown below) eliminate these risks?
Option 1: cert-manager with Internal CA
Section titled “Option 1: cert-manager with Internal CA”The cert-manager setup below creates a two-tier certificate hierarchy: a self-signed root CA that signs an intermediate issuer, which then signs individual service certificates. This mirrors how public CAs work and allows you to rotate the intermediate without redistributing the root:
# Install cert-managerkubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
# Create a self-signed root CAapiVersion: cert-manager.io/v1kind: ClusterIssuermetadata: name: selfsigned-issuerspec: selfSigned: {}
---# Generate a CA certificateapiVersion: cert-manager.io/v1kind: Certificatemetadata: name: internal-ca namespace: cert-managerspec: isCA: true commonName: "KubeDojo Internal CA" secretName: internal-ca-secret duration: 87600h # 10 years renewBefore: 8760h # Renew 1 year before expiry privateKey: algorithm: ECDSA size: 256 issuerRef: name: selfsigned-issuer kind: ClusterIssuer
---# Create an issuer using the CAapiVersion: cert-manager.io/v1kind: ClusterIssuermetadata: name: internal-ca-issuerspec: ca: secretName: internal-ca-secret
---# Issue a certificate for an internal serviceapiVersion: cert-manager.io/v1kind: Certificatemetadata: name: grafana-tls namespace: monitoringspec: secretName: grafana-tls-secret duration: 8760h # 1 year renewBefore: 720h # Renew 30 days before expiry dnsNames: - grafana.internal.company.com - grafana.monitoring.svc.cluster.local issuerRef: name: internal-ca-issuer kind: ClusterIssuerOption 2: cert-manager with Vault PKI
Section titled “Option 2: cert-manager with Vault PKI”HashiCorp Vault provides a more robust PKI with audit logging, short-lived certificates, and HSM backing:
# Enable Vault PKI secrets enginevault secrets enable pki
# Configure max TTLvault secrets tune -max-lease-ttl=87600h pki
# Generate root CAvault write -field=certificate pki/root/generate/internal \ common_name="KubeDojo Root CA" \ ttl=87600h > root-ca.crt
# Enable intermediate PKIvault secrets enable -path=pki_int pkivault secrets tune -max-lease-ttl=43800h pki_int
# Generate intermediate CA (signed by root)vault write -format=json pki_int/intermediate/generate/internal \ common_name="KubeDojo Intermediate CA" | jq -r '.data.csr' > intermediate.csr
vault write -format=json pki/root/sign-intermediate \ csr=@intermediate.csr format=pem_bundle ttl=43800h \ | jq -r '.data.certificate' > intermediate.crt
vault write pki_int/intermediate/set-signed certificate=@intermediate.crt
# Create a role for K8s certificatesvault write pki_int/roles/kubernetes \ allowed_domains="internal.company.com,svc.cluster.local" \ allow_subdomains=true \ max_ttl=720h
# Enable Kubernetes auth method in Vaultvault auth enable kubernetes
# Configure Vault to talk to the K8s APIvault write auth/kubernetes/config \ kubernetes_host="https://$(kubectl get svc kubernetes -o jsonpath='{.spec.clusterIP}'):443"
# Create a policy allowing cert-manager to sign certificatesvault policy write cert-manager - <<POLICYpath "pki_int/sign/kubernetes" { capabilities = ["create", "update"]}POLICY
# Create a Kubernetes auth role for cert-managervault write auth/kubernetes/role/cert-manager \ bound_service_account_names=cert-manager \ bound_service_account_namespaces=cert-manager \ policies=cert-manager \ ttl=1h# cert-manager Vault issuerapiVersion: cert-manager.io/v1kind: ClusterIssuermetadata: name: vault-issuerspec: vault: server: https://vault.internal.company.com:8200 path: pki_int/sign/kubernetes auth: kubernetes: role: cert-manager mountPath: /v1/auth/kubernetes serviceAccountRef: name: cert-managerPause and predict: You have just created a beautiful internal CA and issued certificates for all your services. A new pod tries to connect to
registry.internal.company.comover HTTPS and gets “certificate signed by unknown authority.” The certificate is valid and the CA signed it correctly. What step did you miss?
Distributing the Internal CA
Section titled “Distributing the Internal CA”For internal certificates to be trusted by browsers, tools, and pods, the CA certificate must be distributed:
# Add to system trust store (Ubuntu)cp root-ca.crt /usr/local/share/ca-certificates/kubedojo-ca.crtupdate-ca-certificates
# Add to pods (Kubernetes ConfigMap)kubectl create configmap internal-ca \ --from-file=ca.crt=root-ca.crt \ -n default
# Mount in podsvolumes: - name: ca-certs configMap: name: internal-cavolumeMounts: - name: ca-certs mountPath: /etc/ssl/certs/kubedojo-ca.crt subPath: ca.crtDid You Know?
Section titled “Did You Know?”-
Let’s Encrypt certificates work on-premises too — if your internal services have public DNS names and are reachable from the internet for HTTP-01 validation. For truly internal services, you need your own CA.
-
Kubernetes automatically rotates kubelet certificates when
--rotate-certificatesis enabled. But etcd certificates, API server certificates, and webhook certificates require manual rotation or cert-manager. -
The default Kubernetes CA certificate expires after 10 years (kubeadm default). Many organizations will hit this limit on clusters deployed in 2015-2016. When it expires, every component that trusts it breaks simultaneously.
-
ACME protocol (used by Let’s Encrypt) can work with internal CAs via step-ca, an open source ACME server. This lets you use cert-manager’s ACME issuer with your own private CA, getting automatic renewal without exposing anything to the internet.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Problem | Solution |
|---|---|---|
| No internal DNS server | Services only reachable by IP, not name | Run CoreDNS/BIND for internal zone |
| Self-signed certs everywhere | Every tool shows “untrusted”, scripts need --insecure | Use a proper CA chain, distribute root CA |
| Forgetting cert rotation | Certificates expire, services stop | cert-manager with auto-renewal |
| Corporate DNS not redundant | Single DNS server = SPOF | At least 2 DNS servers, different racks |
| No split-horizon | Internal services resolved to public IPs | Separate internal/external DNS views |
| CA key on a shared server | CA compromise = all certs compromised | Vault + HSM for CA key protection |
| Not trusting CA in pods | Pods can’t verify internal services | Mount CA cert via ConfigMap or init container |
Question 1
Section titled “Question 1”A pod needs to reach grafana.internal.company.com. Trace the DNS resolution path.
Answer
-
Pod’s DNS resolver (
/etc/resolv.conf) points to CoreDNS cluster IP (e.g., 10.96.0.10) -
CoreDNS (in-cluster) receives the query. It checks:
- Is
grafana.internal.company.coma Kubernetes service? No (not*.svc.cluster.local) - Forward rule:
forward internal.company.com 10.0.10.1 10.0.10.2
- Is
-
Corporate DNS (10.0.10.1) receives the query. It is authoritative for
internal.company.com:- Looks up zone file:
grafana IN A 10.0.50.10 - Returns: 10.0.50.10
- Looks up zone file:
-
Pod receives the answer and connects to 10.0.50.10 (MetalLB VIP for Grafana)
Total chain: Pod → CoreDNS (cluster) → Corporate DNS → Answer
If the CoreDNS forward rule for internal.company.com is missing, the query falls through to the generic forwarder (. 10.0.10.1) and still works — but having the explicit forward is clearer and prevents leaking internal names to public DNS if the generic forwarder uses 8.8.8.8.
Question 2
Section titled “Question 2”Why use Vault PKI instead of a simple self-signed CA for internal certificates?
Answer
Self-signed CA limitations:
- CA private key stored in a Kubernetes Secret (accessible to cluster admins)
- No audit trail of which certificates were issued
- No revocation capability (CRL/OCSP)
- Renewing the CA requires manual intervention across all trust stores
- No role-based access control for certificate issuance
Vault PKI advantages:
- CA private key protected by Vault (sealed, audit-logged, optionally HSM-backed)
- Full audit trail of every certificate issued
- Short-lived certificates (hours instead of years) — reduces blast radius of compromise
- Role-based access: only cert-manager can issue K8s certs, only CI/CD can issue pipeline certs
- Automatic CRL/OCSP for revocation
- Integrates with cert-manager for automated renewal
When self-signed CA is fine: Dev/test environments, small clusters without compliance requirements.
When Vault is needed: Production, regulated industries, environments where certificate issuance must be audited.
Question 3
Section titled “Question 3”Your cert-manager certificate shows Ready: False with reason OrderFailed. What do you check?
Answer
Debug steps:
# Check certificate statuskubectl describe certificate grafana-tls -n monitoring
# Check the orderkubectl get orders -n monitoringkubectl describe order <order-name> -n monitoring
# Check the challenge (if ACME issuer)kubectl get challenges -n monitoring
# Common causes:# 1. Issuer not readykubectl describe clusterissuer internal-ca-issuer# Check: Status.Conditions[0].Type == "Ready"
# 2. Secret not found (CA issuer)kubectl get secret internal-ca-secret -n cert-manager# If missing: the CA certificate was not created
# 3. DNS name not allowed by issuer# Check Vault role's allowed_domains or CA issuer constraints
# 4. Vault authentication failed# Check cert-manager logskubectl logs -n cert-manager -l app=cert-managerQuestion 4
Section titled “Question 4”You need HTTPS for *.apps.internal.company.com (wildcard). How do you set this up with cert-manager?
Answer
apiVersion: cert-manager.io/v1kind: Certificatemetadata: name: wildcard-apps-tls namespace: monitoringspec: secretName: wildcard-apps-tls-secret duration: 2160h # 90 days renewBefore: 360h # Renew 15 days before expiry dnsNames: - "*.apps.internal.company.com" - "apps.internal.company.com" # Also include the bare domain issuerRef: name: internal-ca-issuer kind: ClusterIssuerThe wildcard certificate is stored as a Kubernetes Secret and can be referenced by your ingress controller:
apiVersion: networking.k8s.io/v1kind: Ingressmetadata: name: grafana namespace: monitoringspec: tls: - hosts: - grafana.apps.internal.company.com secretName: wildcard-apps-tls-secret rules: - host: grafana.apps.internal.company.com http: paths: - path: / pathType: Prefix backend: service: name: grafana port: number: 3000Important: The wildcard cert secret must be in the same namespace as the Ingress, or use a tool like reflector to copy it across namespaces.
Hands-On Exercise: Internal DNS and Certificates
Section titled “Hands-On Exercise: Internal DNS and Certificates”Task: Set up internal DNS resolution and issue a TLS certificate in a kind cluster.
# Create clusterkind create cluster --name dns-lab
# Install cert-managerkubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yamlkubectl wait --for=condition=Available deployment/cert-manager -n cert-manager --timeout=120s
# Create a self-signed CAkubectl apply -f - <<EOFapiVersion: cert-manager.io/v1kind: ClusterIssuermetadata: name: selfsignedspec: selfSigned: {}---apiVersion: cert-manager.io/v1kind: Certificatemetadata: name: lab-ca namespace: cert-managerspec: isCA: true commonName: "Lab CA" secretName: lab-ca-secret duration: 87600h issuerRef: name: selfsigned kind: ClusterIssuer---apiVersion: cert-manager.io/v1kind: ClusterIssuermetadata: name: lab-ca-issuerspec: ca: secretName: lab-ca-secretEOF
# Wait for the CA to be readykubectl wait --for=condition=Ready certificate/lab-ca -n cert-manager --timeout=60s
# Issue a certificate for a test servicekubectl create namespace demokubectl apply -f - <<EOFapiVersion: cert-manager.io/v1kind: Certificatemetadata: name: demo-tls namespace: demospec: secretName: demo-tls-secret dnsNames: - demo.apps.lab.local issuerRef: name: lab-ca-issuer kind: ClusterIssuerEOF
# Verify certificate was issuedkubectl get certificate demo-tls -n demo# NAME READY SECRET AGE# demo-tls True demo-tls-secret 10s
# Inspect the certificatekubectl get secret demo-tls-secret -n demo -o jsonpath='{.data.tls\.crt}' | \ base64 -d | openssl x509 -text -noout | head -20
# Cleanupkind delete cluster --name dns-labSuccess Criteria
Section titled “Success Criteria”- cert-manager installed and running
- Self-signed CA created (ClusterIssuer ready)
- TLS certificate issued for
demo.apps.lab.local - Certificate contains correct DNS SANs
- Certificate is signed by the Lab CA (not self-signed)
Next Module
Section titled “Next Module”Continue to Module 4.1: Storage Architecture Decisions to learn how to design storage for on-premises Kubernetes clusters.