Module 3.2: Endpoints & EndpointSlices
Complexity:
[MEDIUM]- Understanding service mechanicsTime to Complete: 30-40 minutes
Prerequisites: Module 3.1 (Services)
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After this module, you will be able to:
- Diagnose traffic routing failures by evaluating EndpointSlice conditions and pod readiness.
- Design manual endpoint configurations to integrate external databases into your Kubernetes service mesh.
- Compare the performance characteristics and limitations of legacy Endpoints versus modern EndpointSlices.
- Implement headless services for stateful workloads requiring direct pod addressing.
- Evaluate cluster topology to configure zone-aware routing using EndpointSlice hints.
Why This Module Matters
Section titled “Why This Module Matters”Hypothetical scenario: your team operates an online checkout API that normally runs a few dozen replicas, then a seasonal traffic event pushes the autoscaler into four-digit replica counts. The Service object still exists, DNS still resolves, and many pods are Ready, but traffic distribution becomes uneven because part of the routing data model cannot represent the backend set cleanly. Nothing about that failure looks dramatic at first; it appears as latency, hot pods, and confusing disagreement between different inspection commands.
That is why a CKA candidate needs to look beneath the Service abstraction. A Service is the stable address, but endpoints are the current answer to a more operational question: which network destinations should receive traffic right now? When selectors, readiness probes, EndpointSlice conditions, or topology hints disagree, the Service name can look healthy while the actual routing set is empty, stale, truncated, or intentionally narrowed.
Kubernetes 1.35 uses EndpointSlices as the scalable discovery API behind Services, while the older Endpoints API remains visible mostly for compatibility and troubleshooting. This module teaches you how those objects are produced, how to read their signals during an incident, and how to design selectorless or headless services without accidentally creating a dead name. You will practice the same mental model repeatedly: start with the stable Service, inspect the backing endpoints, explain why each backend is included or excluded, and then choose the lowest-risk correction.
Part 1: Endpoints Fundamentals
Section titled “Part 1: Endpoints Fundamentals”1.1 What Are Endpoints?
Section titled “1.1 What Are Endpoints?”Endpoints are the glue between Services and Pods. A Service gives clients a durable name and usually a stable virtual IP, but a Pod IP is temporary by design; it changes when a replica is rescheduled, replaced, or recreated. The endpoint layer bridges that gap by recording the currently eligible backend addresses for a Service, along with the port information needed to connect to them.
Think of the Service as a reception desk and the endpoint set as the current room assignment sheet. Callers do not need to know which room each engineer occupies today, but the receptionist needs a fresh sheet or the call goes nowhere. In Kubernetes, the controller manager continually reconciles Services and Pods so that the sheet reflects labels, readiness, deletion, and replacement. When the sheet is empty, no amount of confidence in the reception desk helps the caller reach a person.
Legacy Visualization Reference:
View original ASCII architecture diagram
┌────────────────────────────────────────────────────────────────┐│ Service → Endpoints → Pods ││ ││ ┌──────────────────┐ ┌──────────────────┐ ││ │ Service │ │ Endpoints │ ││ │ web-svc │ │ web-svc │ ││ │ │ │ │ ││ │ selector: │───►│ subsets: │ ││ │ app: web │ │ - addresses: │ ││ │ │ │ - 10.244.1.5 │───► Pod 1 ││ │ ports: │ │ - 10.244.2.8 │───► Pod 2 ││ │ - port: 80 │ │ - 10.244.1.12 │───► Pod 3 ││ │ │ │ ports: │ ││ │ │ │ - port: 8080 │ ││ └──────────────────┘ └──────────────────┘ ││ ││ Endpoints auto-created and updated by endpoint controller ││ │└────────────────────────────────────────────────────────────────┘Here is the modern routing flow visualized:
flowchart LR S[Service\nweb-svc\nselector: app=web] -->|Watches| E[Endpoints\nweb-svc] E -->|subsets.addresses| P1[Pod 1\n10.244.1.5] E -->|subsets.addresses| P2[Pod 2\n10.244.2.8] E -->|subsets.addresses| P3[Pod 3\n10.244.1.12] classDef k8s fill:#326ce5,stroke:#fff,stroke-width:2px,color:#fff; class S,E k8s;The legacy Endpoints object stores addresses in subsets, which group a set of IPs with a set of ports. That shape matters when you debug older tools because the object does not simply say “three pods”; it says “these addresses are ready for these ports, and these other addresses are not ready.” A selector-based Service normally receives this object automatically, so manually editing it is usually a sign that you are fighting the controller rather than fixing the workload.
Pause and predict: if a Deployment has three Pods, but only two pass readiness, how many Pod IPs should be usable through the Service and where would you expect to find the excluded address? The correct answer is not “all Running pods.” Kubernetes separates liveness, scheduling state, and readiness so that an alive container can be protected from user traffic while it starts, warms caches, or waits for a dependency.
1.2 Endpoint Lifecycle
Section titled “1.2 Endpoint Lifecycle”The Endpoint Controller operates in the Kubernetes control plane, constantly reconciling the state of the cluster to ensure Endpoints stay accurate. It watches Services because selectors define intent, and it watches Pods because labels, IPs, readiness, node placement, and deletion determine the actual backend set. The controller does not make traffic flow by itself; it writes discovery objects that other components, such as kube-proxy, CoreDNS, and ingress controllers, can consume.
The lifecycle is deliberately boring when everything is healthy. A Pod appears with labels that match a Service selector, the Pod becomes Ready, the controller records its IP as a backend, and clients can be routed to it. When the Pod fails readiness or starts terminating, the controller changes the discovery object so service proxies stop treating that endpoint as a normal destination. The practical diagnostic lesson is simple: always ask whether the Service selector matched the Pod, and then ask whether readiness allowed that match to become routable.
View original ASCII lifecycle diagram
┌────────────────────────────────────────────────────────────────┐│ Endpoint Controller ││ ││ Watches: Pods and Services ││ Updates: Endpoints objects ││ ││ Pod Created (label: app=web) ││ │ ││ ▼ ││ Controller finds Service with selector app=web ││ │ ││ ▼ ││ Adds Pod IP to Service's Endpoints ││ ││ Pod Deleted or Fails Readiness ││ │ ││ ▼ ││ Removes Pod IP from Endpoints ││ │└────────────────────────────────────────────────────────────────┘sequenceDiagram participant P as Pod (app=web) participant C as Endpoint Controller participant E as Endpoints Object
P->>C: Pod Created & Ready C->>C: Match Service Selector (app=web) C->>E: Add Pod IP to Addresses
P-->>C: Pod Deleted / Readiness Fails C->>E: Remove Pod IP from Addresses (or move to NotReady)One useful habit is to treat endpoint generation as a reconciliation chain rather than as a single command output. The Service must exist in the namespace you are querying, its selector must match the labels that are actually on the Pods, the Pods must have assigned IP addresses, and readiness must permit routing. If any link breaks, kubectl get svc can still look normal while kubectl get endpoints or kubectl get endpointslices reveals the real failure.
1.3 Viewing Endpoints
Section titled “1.3 Viewing Endpoints”You can inspect the legacy Endpoints API using standard kubectl commands. Even though EndpointSlices are the modern source for service proxies in current Kubernetes, legacy Endpoints output remains useful because many humans, scripts, and older controllers still expose it during troubleshooting. Use it as a quick compatibility view, then confirm anything scale-sensitive or condition-sensitive with EndpointSlices.
# List all endpointskubectl get endpointskubectl get ep # Short form
# Get specific endpointkubectl get endpoints web-svc
# Detailed viewkubectl describe endpoints web-svc
# Get endpoints as YAMLkubectl get endpoints web-svc -o yaml
# Wide output with pod IPskubectl get endpoints -o wideThe short form ep is fine because it is a real Kubernetes resource abbreviation, not a shell alias. Avoid using a local shorthand for kubectl in shared runbooks, copied exercises, or CI scripts because aliases often do not expand in non-interactive shells. A module that teaches reliable operations should make every command copy-pasteable without depending on a learner’s terminal profile.
1.4 Endpoint Structure
Section titled “1.4 Endpoint Structure”The YAML shape explains several common debugging surprises. Ready addresses and not-ready addresses can live in the same object, which means an endpoint object can exist without every listed Pod being eligible for normal traffic. The targetRef links an address back to the Pod that produced it, the nodeName shows where that backend resides, and ports describes the backend port, not just the front-door Service port.
# What an Endpoints object looks likeapiVersion: v1kind: Endpointsmetadata: name: web-svc # Must match Service name namespace: defaultsubsets:- addresses: # Ready pod IPs - ip: 10.244.1.5 nodeName: worker-1 targetRef: kind: Pod name: web-abc123 namespace: default - ip: 10.244.2.8 nodeName: worker-2 targetRef: kind: Pod name: web-def456 namespace: default notReadyAddresses: # Pods not passing readiness probe - ip: 10.244.1.12 nodeName: worker-1 targetRef: kind: Pod name: web-ghi789 namespace: default ports: - port: 8080 protocol: TCPWhen you read this object, do not stop at the address list. Ask whether the backend port matches the container port the application actually listens on, whether the endpoint is ready or not ready, and whether the Pod reference still exists. A stale-looking reference usually points to a reconciliation lag or a race you should recheck, while a stable not-ready entry points toward application health, probe configuration, or dependency readiness.
Pause and predict: You have a Service with 3 endpoints. You add a readiness probe to the deployment that checks
/healthz, but the endpoint on your app returns 500 for that path. What happens to the Service’s endpoints, and can clients still reach the app?
The key is that readiness is an admission control signal for Service traffic, not a statement about whether the process exists. If all replicas fail the probe, clients can resolve the Service name but receive no normal backend through the Service. That behavior is protective, because sending users to a Pod that says it is not ready is usually worse than surfacing a clean failure that pushes the operator toward the probe or dependency problem.
Part 2: The Shift to EndpointSlices
Section titled “Part 2: The Shift to EndpointSlices”As clusters grew, the legacy Endpoints API became a bottleneck because it represented an entire backend set in one object. Any Pod addition, deletion, readiness transition, or IP change could require a large object update, and every watcher had to process the new version. That design is tolerable for small Services, but it becomes wasteful when a single Service has hundreds or thousands of backends across many nodes and zones.
EndpointSlices solve the same discovery problem with smaller, shard-like objects under discovery.k8s.io/v1. Kubernetes has treated EndpointSlices as stable since v1.21, and in Kubernetes 1.35 they are the API you should consider first for scalable service discovery, dual-stack addressing, endpoint conditions, and topology-aware routing. The older Endpoints API remains relevant mainly because you still encounter it in commands, legacy integrations, and compatibility behavior.
2.1 Why EndpointSlices?
Section titled “2.1 Why EndpointSlices?”View original ASCII EndpointSlices diagram
┌────────────────────────────────────────────────────────────────┐│ Endpoints Problem ││ ││ Large Service with 5000 pods ││ ││ Single Endpoints object: ││ - Contains all 5000 IPs ││ - Any pod change = entire object update ││ - Large payload sent to all watchers ││ - API server and etcd strain ││ │└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐│ EndpointSlices Solution ││ ││ Same 5000 pods split across 50 slices ││ ││ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││ │ Slice 1 │ │ Slice 2 │ │ Slice 3 │ ... │Slice 50 │ ││ │ 100 IPs │ │ 100 IPs │ │ 100 IPs │ │ 100 IPs │ ││ └─────────┘ └─────────┘ └─────────┘ └─────────┘ ││ ││ Pod change = update only affected slice ││ Small payload, minimal API server load ││ │└────────────────────────────────────────────────────────────────┘flowchart TD subgraph Legacy Endpoints Bottleneck S1[Large Service\n5000 Pods] --> E1[Single Endpoints Object\n5000 IPs] E1 -->|Any single Pod change requires\ntransmitting entire 5000 IP list| API1[Massive API Server Strain] end subgraph Modern EndpointSlices Solution S2[Large Service\n5000 Pods] --> ES1[Slice 1\n100 IPs] S2 --> ES2[Slice 2\n100 IPs] S2 --> ESn[Slice 50\n100 IPs] ES2 -->|Pod Change| API2[Only affected 100 IP slice transmits\nMinimal Payload] endThe default EndpointSlice controller creates another slice when existing slices for a Service reach about 100 endpoints and one more endpoint is needed. The controller can be configured with --max-endpoints-per-slice, but larger slices trade fewer objects for larger updates, so the default is a reasonable operational compromise. The point is not that 100 is magic; the point is that a change to one backend should not force the whole cluster to reprocess every backend.
EndpointSlices also carry details that the legacy object either cannot express or expresses poorly. The addressType can distinguish IPv4, IPv6, and FQDN-shaped addresses, although Kubernetes service proxies process IPv4 and IPv6 for normal routing. Endpoint conditions add richer lifecycle meaning, while topology hints can help proxies prefer same-zone endpoints when the Service and cluster configuration allow it. Those fields turn EndpointSlices from a scale fix into the central service-discovery API.
Stop and think: Imagine a Service backed by 5,000 pods. Every time a single pod is added or removed, the entire Endpoints object containing all 5,000 IPs must be sent to every node watching it. What problem does this create, and how would you design a better solution?
The design pressure is bandwidth, CPU, watch latency, and unnecessary churn in components that need routing data. A better design splits the backend set into bounded chunks, lets each chunk carry enough metadata to be useful on its own, and updates only the affected chunk when one Pod changes. That is exactly the shape EndpointSlices provide, which is why modern debugging should include them even when the legacy output looks familiar.
2.2 EndpointSlice Structure
Section titled “2.2 EndpointSlice Structure”An EndpointSlice is a dedicated Kubernetes API resource under discovery.k8s.io/v1. A single Service may be associated with multiple EndpointSlice objects, dynamically tied via the kubernetes.io/service-name label. That label is not decoration; it is how you reliably list the slices for one Service without relying on generated object names.
# What an EndpointSlice looks likeapiVersion: discovery.k8s.io/v1kind: EndpointSlicemetadata: name: web-svc-abc12 # Auto-generated name labels: kubernetes.io/service-name: web-svcaddressType: IPv4ports:- name: "" port: 8080 protocol: TCPendpoints:- addresses: - 10.244.1.5 conditions: ready: true serving: true terminating: false nodeName: worker-1 targetRef: kind: Pod name: web-abc123 namespace: default- addresses: - 10.244.2.8 conditions: ready: true nodeName: worker-2The fields you inspect most often are ports, addressType, endpoints[].addresses, endpoints[].conditions, endpoints[].nodeName, and optionally endpoints[].zone or hints. Transient duplication can occur because controllers and watchers do not observe every update at the same instant, so do not build fragile automation that assumes a backend appears in exactly one slice at every millisecond. For operational debugging, compare the stable shape over repeated reads and focus on labels and conditions.
2.3 Viewing EndpointSlices
Section titled “2.3 Viewing EndpointSlices”EndpointSlices have their own resource names and generated object names, so the safest query starts with the Service label. When you only run kubectl get endpointslices, a busy namespace can produce a noisy list that hides the slice you care about. Filtering by kubernetes.io/service-name keeps the output tied to the Service under investigation and matches how controllers associate slices with Services.
# List all EndpointSliceskubectl get endpointsliceskubectl get eps # Short form (might conflict with endpoints)
# Get EndpointSlices for a servicekubectl get endpointslices -l kubernetes.io/service-name=web-svc
# Detailed viewkubectl describe endpointslice web-svc-abc12
# Get as YAMLkubectl get endpointslice web-svc-abc12 -o yamlBefore running a describe command, first list the slices and copy the generated name that currently exists. The generated suffix can change when the controller reshapes the slice set, and hard-coding it in runbooks creates brittle instructions. In troubleshooting notes, prefer the label selector query because it survives controller-generated name changes.
2.4 Endpoints vs EndpointSlices Comparison
Section titled “2.4 Endpoints vs EndpointSlices Comparison”| Aspect | Endpoints | EndpointSlices |
|---|---|---|
| Max entries | Unlimited (but problematic) | 100 per slice |
| Update scope | Entire object | Single slice |
| API version | v1 | discovery.k8s.io/v1 |
| Default since | Always | Kubernetes 1.21 |
| Dual-stack support | Limited | Full IPv4/IPv6 |
| Topology hints | No | Yes |
This comparison is useful, but it needs one correction in your mental model: “unlimited” in the legacy Endpoints row does not mean safe at any size. Kubernetes documents that the legacy Endpoints API truncates over-capacity backend sets at 1,000 endpoints and marks the object with the endpoints.kubernetes.io/over-capacity: truncated annotation. If a tool reads only the legacy object, it can miss backends that EndpointSlices still represent.
For exam and production work, treat EndpointSlices as the authoritative modern view while still knowing how to interpret Endpoints. You will often inspect both because old commands are fast and familiar, but your conclusion should account for which API has the fidelity needed for the situation. Large Services, dual-stack clusters, terminating endpoint behavior, and topology-aware routing all push you toward EndpointSlices.
Part 3: Debugging Routing and Conditions
Section titled “Part 3: Debugging Routing and Conditions”3.1 No Endpoints Means No Traffic
Section titled “3.1 No Endpoints Means No Traffic”When traffic drops, your first stop is usually verifying endpoint generation. A Service with no endpoints is like a load balancer with a listener and no healthy backend targets; the front door exists, but it has nowhere useful to send the request. This is one of the fastest failures to diagnose if you resist the temptation to start with application logs before checking the discovery layer.
# Service exists but has no endpointskubectl get svc web-svc# NAME TYPE CLUSTER-IP PORT(S)# web-svc ClusterIP 10.96.45.123 80/TCP
kubectl get endpoints web-svc# NAME ENDPOINTS AGE# web-svc <none> 5m ← Problem!An empty endpoint result does not prove the Service is broken by itself. It tells you the Service selector, the Pod labels, the namespace, the Pod phase, or readiness is preventing a backend from entering the routable set. The right move is to preserve the chain of evidence: record the Service selector, list the matching Pods, check readiness, and then inspect EndpointSlice conditions if the legacy output is ambiguous.
3.2 Common Causes of Missing Endpoints
Section titled “3.2 Common Causes of Missing Endpoints”| Symptom | Cause | Debug Command | Solution |
|---|---|---|---|
<none> endpoints | No pods match selector | kubectl get pods --show-labels | Fix selector or pod labels |
<none> endpoints | Pods not running | kubectl get pods | Fix pod issues |
<none> endpoints | Pods in wrong namespace | kubectl get pods -A | Check namespace |
| Partial endpoints | Some pods not ready | kubectl describe endpoints | Check readiness probes |
Selector problems are especially common after template rewrites because a Deployment can keep creating healthy Pods with labels the Service no longer selects. Namespace mistakes look similar because Services select Pods only in their own namespace, even when the same label exists elsewhere. Readiness problems are subtler because the Pod can be Running and still excluded from normal Service traffic; that is a feature, not a contradiction.
Before running this, what output do you expect if the Service selector is app: web but the Pods are labeled app: api? You should expect the selector query to return no Pods, and therefore the endpoint objects should have no ready backend addresses. If the selector query does return Pods, shift your attention to readiness, ports, and conditions rather than repeatedly recreating the Service.
3.3 Debugging Workflow
Section titled “3.3 Debugging Workflow”The workflow below starts with the smallest question and only moves deeper when the previous answer does not explain the failure. That matters during incidents because random command hopping creates contradictory notes and makes handoff harder. Each command either confirms the Service exists, confirms the selector, confirms matching Pods, or checks whether readiness keeps those Pods out of the endpoint set.
# Step 1: Check if endpoints existkubectl get endpoints web-svc# If <none>, proceed to step 2
# Step 2: Check service selectorkubectl get svc web-svc -o yaml | grep -A5 selector# selector:# app: web
# Step 3: Find pods with matching labelskubectl get pods --selector=app=web# Should list pods backing the service
# Step 4: If no pods found, check what labels pods havekubectl get pods --show-labels# Compare with service selector
# Step 5: If pods exist but not in endpoints, check pod statuskubectl get pods# Look for pods that aren't Running
# Step 6: Check for readiness probe failureskubectl describe pod <pod-name> | grep -A10 ReadinessThe workflow deliberately uses the Service selector as a boundary. If kubectl get pods --selector=app=web returns nothing, you have a selection problem; describing EndpointSlices will mostly confirm what you already know. If matching Pods exist and are Ready, then EndpointSlice inspection becomes more valuable because it can reveal port mismatches, condition values, terminating endpoints, or topology information that the old Endpoints view compresses away.
3.4 Endpoints with NotReady Pods and EndpointSlice Conditions
Section titled “3.4 Endpoints with NotReady Pods and EndpointSlice Conditions”Readiness is where many learners first realize that endpoints are not just a list of Pod IPs. Kubernetes can know about a Pod, include a reference to it in discovery data, and still protect clients from it. The legacy object shows this as NotReadyAddresses; EndpointSlices express the same idea through conditions that are more precise for modern service proxies.
# Describe shows both ready and not-ready addresseskubectl describe endpoints web-svc
# Output:# Name: web-svc# Subsets:# Addresses: 10.244.1.5,10.244.2.8# NotReadyAddresses: 10.244.1.12# Ports:# Name Port Protocol# ---- ---- --------# <unset> 8080 TCPPods listed in NotReadyAddresses are actively shielded from receiving normal traffic. In the modern EndpointSlice API, routing is driven by endpoint conditions: ready, serving, and terminating. The serving condition maps directly to pod readiness for pod-backed endpoints, while terminating indicates the endpoint is currently shutting down.
For pod-backed endpoints, nil values in ready or serving are safely interpreted as true, while nil in terminating is interpreted as false. Service proxies normally ignore terminating endpoints, but they may route to endpoints marked as both serving and terminating if all available endpoints are terminating. That fallback helps avoid an immediate total outage during aggressive rollouts, but you should still fix the rollout shape so healthy replacements exist before old Pods disappear.
When diagnosing a readiness-related outage, keep the human explanation tied to the condition. Say “the Pods are Running but excluded because readiness is false” instead of “Kubernetes lost the Pods.” That phrasing prevents the team from deleting random resources and points them toward probe paths, dependency checks, application startup time, or resource pressure. It also maps cleanly to the CKA troubleshooting style, where you need to identify the broken link and correct it directly.
Part 4: Manual Endpoints and External Services
Section titled “Part 4: Manual Endpoints and External Services”4.1 When to Use Manual Endpoints
Section titled “4.1 When to Use Manual Endpoints”Services without a label selector do not get EndpointSlice objects created automatically; their EndpointSlices or legacy Endpoints must be created manually. This pattern is useful when Kubernetes should provide the stable Service name, policy surface, or DNS name, but the backend does not come from Pods selected in the same namespace. Common examples include a database on virtual machines, a service in another cluster, or a static appliance that the application should reach through a Kubernetes name.
Manual endpoints are powerful because they let workloads use normal Service DNS for resources that Kubernetes does not own. They are risky for the same reason: the control plane cannot infer when an external IP changes, whether the remote process is healthy, or whether a listed address is still safe. If you choose this design, the operational owner of the external resource also owns the automation or runbook that keeps the endpoint data current.
EndpointSlices managed outside the control plane should explicitly use the endpointslice.kubernetes.io/managed-by label, and each namespace endpoint slice must have a unique name. That label helps humans and controllers distinguish manually maintained discovery data from slices produced by the built-in controller. Without a clear managed-by signal, future operators can misread the object as automatic and waste time wondering why the controller does not fix it.
4.2 Creating Manual Endpoints
Section titled “4.2 Creating Manual Endpoints”The legacy manual Endpoints pattern requires the Service and Endpoints object to share the same name in the same namespace. The Service has no selector, so the endpoint controller leaves it alone, and the manually created object supplies the backend addresses. This is a good way to learn the model, even though new integrations should consider manual EndpointSlices when they need modern fields.
# Step 1: Create service WITHOUT selectorapiVersion: v1kind: Servicemetadata: name: external-dbspec: ports: - port: 5432 targetPort: 5432 # No selector! This is intentional.---# Step 2: Create Endpoints with same nameapiVersion: v1kind: Endpointsmetadata: name: external-db # Must match service name exactlysubsets:- addresses: - ip: 192.168.1.100 # External database IP - ip: 192.168.1.101 # Backup database IP ports: - port: 5432# Apply bothkubectl apply -f external-db.yaml
# Verifykubectl get svc,endpoints external-db
# Now pods can reach external DB via:# external-db.default.svc.cluster.local:5432This pattern works because clients only need the Service name and port. They do not care whether the backend IP came from a Pod selector or a human-maintained endpoint object. The danger appears later, when the external database moves to a new address and no Kubernetes controller updates the object. At that point the Service name still resolves, but traffic follows stale endpoint data until someone patches or reapplies the backend list.
4.3 Manual Endpoints Use Cases
Section titled “4.3 Manual Endpoints Use Cases”The same technique can expose a stable name for an external HTTPS API. In production, prefer a DNS-backed design when the provider gives you a stable hostname, because raw public IPs can change and may represent a load-balanced edge rather than a durable server. Use static endpoint IPs only when your network team can guarantee that those addresses are meant to be targets.
# Example: External API endpointapiVersion: v1kind: Servicemetadata: name: external-apispec: ports: - port: 443---apiVersion: v1kind: Endpointsmetadata: name: external-apisubsets:- addresses: - ip: 52.84.123.45 # External API server ports: - port: 443Exercise scenario: your application team wants payments-db.default.svc.cluster.local to point at a managed database outside the cluster. Which approach would you choose here and why: a selectorless Service with manual endpoint IPs, an ExternalName Service that points at a provider hostname, or moving the database behind an internal load balancer? The best answer depends on who owns address changes, whether clients need a port-only Service abstraction, and whether your network policy or DNS requirements favor one layer over another.
When you design manual endpoint configurations, document ownership as carefully as YAML. A selector-based Service has obvious ownership because the Deployment template controls the selected Pods, but a selectorless Service can become an orphaned name if no one owns the external lifecycle. A good runbook states where the backend address comes from, how to validate it, how to update it, and how to confirm clients are reaching the intended service.
Part 5: Headless Services
Section titled “Part 5: Headless Services”5.1 What Is a Headless Service?
Section titled “5.1 What Is a Headless Service?”A headless service explicitly has no ClusterIP. Instead of returning a single virtual IP, Kubernetes DNS can return individual backend records so that clients can connect directly to selected Pods. This is most useful when the client must know the identity of the backend rather than treating all replicas as interchangeable.
apiVersion: v1kind: Servicemetadata: name: headless-svcspec: clusterIP: None # This makes it headless selector: app: web ports: - port: 80The distinction is not “headless means no Service.” The Service still defines a name, selector, and ports; it simply opts out of the normal ClusterIP load-balancing abstraction. For StatefulSets, that is exactly what you want because pod-0, pod-1, and pod-2 may have different identities, storage, replication roles, or quorum responsibilities. For a stateless web frontend, it is often unnecessary because most clients should not care which replica answered.
What would happen if: You set
clusterIP: Noneon a Service. A pod doesnslookupon that service name. Instead of getting one IP, it gets three. Why is this useful, and when would you NOT want this behavior?
The useful part is direct discovery: a client library can see all available backends and choose according to its own protocol rules. The risky part is that you have shifted some load-balancing responsibility away from Kubernetes service proxying and toward the client. If the client caches DNS forever, chooses poorly, or cannot retry across returned addresses, a headless Service can expose more failure modes than a regular ClusterIP Service.
5.2 Headless Service Behavior
Section titled “5.2 Headless Service Behavior”For headless services, selector-based services create EndpointSlices natively in the control plane, while selectorless headless services do not generate them automatically. That difference is easy to miss because both objects have clusterIP: None, but one has a selector Kubernetes can reconcile and the other does not. If there is no selector, you must provide the discovery data yourself.
View original ASCII headless flow
┌────────────────────────────────────────────────────────────────┐│ Regular vs Headless Service ││ ││ Regular Service (ClusterIP: 10.96.45.123) ││ ┌─────────────────────────────────────────────────────────┐ ││ │ DNS: web-svc.default.svc → 10.96.45.123 (Service IP) │ ││ │ Client → Service IP → kube-proxy → random Pod │ ││ └─────────────────────────────────────────────────────────┘ ││ ││ Headless Service (clusterIP: None) ││ ┌─────────────────────────────────────────────────────────┐ ││ │ DNS: web-svc.default.svc → │ ││ │ 10.244.1.5 (Pod 1) │ ││ │ 10.244.2.8 (Pod 2) │ ││ │ 10.244.1.12 (Pod 3) │ ││ │ Client gets ALL pod IPs, chooses one itself │ ││ └─────────────────────────────────────────────────────────┘ ││ │└────────────────────────────────────────────────────────────────┘flowchart TD subgraph Regular Service C1[Client] -->|DNS resolves to| IP[ClusterIP: 10.96.45.123] IP -->|kube-proxy load balances| P1[Pod 1 or Pod 2] end subgraph Headless Service C2[Client] -->|DNS resolves directly to A records| PIP[10.244.1.5 \n 10.244.2.8] PIP -->|Client manages connection directly| P2[Specific Pod] endThis behavior explains why headless Services pair naturally with StatefulSets. StatefulSet identity is stable, so DNS records such as kafka-0.kafka-headless.default.svc.cluster.local can resolve to a particular Pod identity rather than an arbitrary backend. That directness is valuable for databases, message brokers, and clustered systems that have leader, follower, shard, or replica identities.
5.3 Headless Service Use Cases
Section titled “5.3 Headless Service Use Cases”| Use Case | Why Headless? |
|---|---|
| StatefulSets | Need to address specific pods (pod-0, pod-1) |
| Client-side load balancing | Client needs all IPs to implement custom balancing |
| Service discovery | Discover all backend instances |
| Database clusters | Need direct connection to specific node |
Choose a headless Service when backend identity is part of the protocol, not just an implementation detail. If every replica is equivalent, a regular ClusterIP Service keeps client behavior simpler and lets Kubernetes handle proxy selection. If the client must choose a leader, connect to a shard owner, or list all peers, direct DNS records may be the right interface.
5.4 Headless Service Endpoints
Section titled “5.4 Headless Service Endpoints”Endpoints still matter for headless Services because DNS needs to know which backend records to return. A selector-based headless Service uses the same reconciliation chain as other selector Services, but the DNS response differs. You can inspect endpoints and then test DNS from inside the cluster to confirm whether clients will see individual Pod addresses.
# Endpoints still track podskubectl get endpoints headless-svc# NAME ENDPOINTS# headless-svc 10.244.1.5,10.244.2.8,10.244.1.12
# DNS returns multiple A recordskubectl run test --rm -i --image=busybox:1.36 --restart=Never -- \ nslookup headless-svc
# Output:# Name: headless-svc.default.svc.cluster.local# Address: 10.244.1.5# Address: 10.244.2.8# Address: 10.244.1.12DNS tests are most meaningful when run from inside the cluster because cluster DNS names and search paths are meant for Pods. If you test from your laptop, you may be querying a completely different resolver that knows nothing about svc.cluster.local. A CKA troubleshooting answer should be precise about vantage point: inspect Kubernetes objects with kubectl, then test in-cluster name resolution with a temporary Pod.
Part 6: Service Topology and Topology Hints
Section titled “Part 6: Service Topology and Topology Hints”6.1 Topology-Aware Routing
Section titled “6.1 Topology-Aware Routing”EndpointSlices natively support topology hints. This permits zone-aware routing, where service proxies can prefer endpoints in the same zone to reduce latency and cross-zone data transfer. The feature does not guarantee that all traffic always stays local, because Kubernetes must balance locality against availability, endpoint distribution, and Service configuration.
# EndpointSlice with hintsapiVersion: discovery.k8s.io/v1kind: EndpointSlicemetadata: name: web-svc-abc12endpoints:- addresses: - 10.244.1.5 zone: us-east-1a # Pod is in this zone hints: forZones: - name: us-east-1a # Prefer traffic from same zoneThe important word is “prefer.” A topology hint tells a proxy which endpoints are suitable for a zone-aware choice, but it is not a hard security boundary or a replacement for network policy. If a zone has no healthy endpoint, or if the distribution would make routing unsafe, Kubernetes can still send traffic elsewhere depending on configuration and implementation. Treat hints as an optimization signal, not as a correctness requirement.
6.2 Enabling Topology-Aware Hints
Section titled “6.2 Enabling Topology-Aware Hints”Kubernetes has evolved the naming around topology-aware routing, so be careful when reading older examples. You may see the older service.kubernetes.io/topology-mode: Auto annotation, while newer documentation also discusses trafficDistribution behavior on Services. In a Kubernetes 1.35 study context, the CKA-level skill is to recognize that EndpointSlices carry zone and hint data, and that service routing can use that data when the cluster and Service are configured for topology-aware behavior.
# Service with topology hintsapiVersion: v1kind: Servicemetadata: name: web-svc annotations: service.kubernetes.io/topology-mode: Auto # Enable hintsspec: selector: app: web ports: - port: 80Which approach would you choose here and why: same-zone preference for a latency-sensitive internal API, or ordinary cluster-wide balancing for a tiny service with one replica per region? Same-zone preference makes sense when you have enough healthy endpoints per zone and cross-zone traffic has measurable cost or latency. Ordinary balancing is simpler when replicas are sparse, traffic is low, or reliability would suffer if locality rules narrow the eligible backend set too much.
Patterns & Anti-Patterns
Section titled “Patterns & Anti-Patterns”Patterns
Section titled “Patterns”Use selector-based Services for normal Pod-backed workloads because they let Kubernetes own endpoint reconciliation. This pattern works best when Deployment labels are stable, readiness probes represent real serving ability, and the Service selector is intentionally narrow but not version-specific. It scales operationally because the controller updates EndpointSlices as Pods come and go, while your runbook stays focused on labels and readiness.
Use EndpointSlices as the primary diagnostic view for large or modern Services. The legacy Endpoints command can still be a quick first check, but it hides important condition and scale details. EndpointSlices let you inspect ready, serving, terminating, node, zone, and address family data in one place, which is exactly the information you need when traffic behaves differently across rollouts, zones, or IP families.
Use headless Services when backend identity is part of the protocol. StatefulSets, peer discovery systems, and client-side load balancers often need direct backend records instead of a single virtual IP. The pattern works when clients understand multiple DNS answers and can retry intelligently, but it is a poor fit for simple stateless clients that expect Kubernetes to hide backend choice.
Use selectorless Services only when an external lifecycle owns the backend. A manual endpoint can be clean and useful when a database, appliance, or other cluster cannot be selected as Pods in the namespace. The pattern needs explicit ownership because Kubernetes will not discover, health-check, or update those addresses automatically.
Anti-Patterns
Section titled “Anti-Patterns”Do not treat a Running Pod as proof that Service routing is healthy. Running only means the container exists and has not failed its liveness contract; readiness determines whether it should receive normal Service traffic. The better alternative is to compare Pod status, readiness, EndpointSlice conditions, and Service selector matches before changing any workload objects.
Do not build new automation that depends only on legacy Endpoints for large Services. Once a backend set crosses the documented legacy truncation behavior, older readers can observe an incomplete world and make bad routing or alerting decisions. The better alternative is to read EndpointSlices through the discovery API and use the Service label to gather all slices for the target Service.
Do not use a headless Service as a casual performance tweak. Direct DNS answers can make clients responsible for balancing, caching, retrying, and handling backend churn, and many simple clients do those things badly. The better alternative is to use regular ClusterIP Services unless backend identity is a genuine requirement of the application protocol.
Do not leave manual endpoints undocumented. A selectorless Service can look like a normal Kubernetes abstraction while secretly depending on a spreadsheet, a ticket, or a person remembering to update an IP address. The better alternative is to add clear labels, runbooks, and preferably automation that owns updates to manual EndpointSlices or Endpoints.
Decision Framework
Section titled “Decision Framework”Start with the workload source. If the backend is a normal set of Pods in the same namespace, use a selector-based Service and let Kubernetes create EndpointSlices automatically. If the backend is outside Kubernetes or outside the namespace selector model, use a selectorless Service and create the discovery data deliberately. If backend identity matters to clients, consider a headless Service; if identity does not matter, prefer ClusterIP because it keeps client behavior simpler.
Next evaluate scale and metadata needs. For a small Service, legacy Endpoints output may be enough for a quick glance, but EndpointSlices should still be your deeper diagnostic tool. For a large Service, dual-stack Service, topology-aware design, or rollout involving terminating endpoints, go directly to EndpointSlices because the old object can omit or flatten exactly the details you need. This distinction is especially important on the exam because the fastest command is not always the most complete explanation.
Finally evaluate who owns correctness after deployment. Selector-based Services have controller ownership, readiness-based admission, and automatic reconciliation. Manual endpoints need external ownership and a tested update process. Headless Services need clients that can handle multiple records and backend churn. Topology hints need enough endpoints per zone to preserve availability while improving locality. A good design answer names the owner of each moving part, not just the YAML field that enables it.
Use that framework during troubleshooting as well as design. If a Service has a selector, your first hypothesis should be reconciliation: labels, namespaces, Pod IPs, readiness, and EndpointSlice conditions. If a Service lacks a selector, your first hypothesis should be ownership: who created the endpoints, whether the names and labels connect them to the Service, and whether the external address is still valid. If a Service is headless, your first hypothesis should include client behavior because DNS may expose multiple backends instead of hiding selection behind a virtual IP.
Did You Know?
Section titled “Did You Know?”- EndpointSlices became stable in Kubernetes 1.21: The discovery API is not experimental in modern clusters; it is the scalable replacement for watching one giant legacy Endpoints object.
- Legacy Endpoints truncate over-capacity backends at 1,000 addresses: Kubernetes marks the object with
endpoints.kubernetes.io/over-capacity: truncated, which is a warning that legacy readers may see only part of the backend set. - EndpointSlices are normally split around 100 endpoints each: The controller creates additional slices as Services grow, reducing the update payload when one Pod changes.
- EndpointSlice conditions distinguish readiness from termination:
ready,serving, andterminatinglet service proxies reason about rollout edge cases that the legacy object cannot describe as clearly.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Wrong endpoint name | Manual Endpoints or EndpointSlices are created with a name that does not associate with the target Service. | Make the manual object name or kubernetes.io/service-name label match the Service name exactly, then verify with a label-filtered query. |
| Selector typo | A Deployment template label changes, but the Service selector still points at the old label set. | Compare kubectl get svc <name> -o yaml with kubectl get pods --show-labels, then patch the selector or labels intentionally. |
Missing targetRef | Manual endpoint authors focus only on IPs and lose traceability back to the producing object. | Include targetRef when an endpoint represents an in-cluster object, and label externally managed EndpointSlices with endpointslice.kubernetes.io/managed-by. |
Ignoring NotReadyAddresses | Engineers assume Running Pods are routable and overlook readiness probe failures. | Inspect NotReadyAddresses and EndpointSlice conditions, then debug the readiness probe path, dependency, or startup timing. |
| Confusing Endpoints and EndpointSlices | Older runbooks query only the legacy object and miss scale or condition details. | Use Endpoints for a quick compatibility glance, but treat EndpointSlices as the modern source of truth for large or nuanced Services. |
Missing managed-by label | Manually maintained slices look like controller-owned slices, confusing future operators. | Add a clear endpointslice.kubernetes.io/managed-by label and document the automation or team that owns updates. |
| Using legacy Endpoints for IPv6 planning | The old API lacks modern dual-stack and topology fields, so it becomes an incomplete design surface. | Use EndpointSlices when designing IPv4/IPv6 or topology-aware Service behavior. |
| Forgetting selectorless headless rules | A headless Service without a selector is expected to auto-discover backends, but no controller can infer them. | Create the EndpointSlice or Endpoints object manually, then test DNS from an in-cluster Pod. |
-
You deploy a new version of your app and
kubectl get endpoints my-svcsuddenly shows<none>, even thoughkubectl get podsshows 3 pods in Running state. The previous version worked fine. What is your debugging process?Answer
First, check whether the Pods are actually Ready, not merely Running, because readiness controls Service routing. Then compare the Service selector with the labels on the new Pods; a changed template label is a common reason the endpoint controller has no matching backends. If labels match, inspect readiness probe events with `kubectl describe pod` and confirm the target port still matches what the Service expects. This process diagnoses routing failures by following the Service-to-endpoint chain instead of guessing from application symptoms. -
Your company has a PostgreSQL database running on a VM at 192.168.1.50, outside the Kubernetes cluster. You want pods to reach it as
external-db.default.svc.cluster.local:5432. How do you set this up, and what happens if the database IP changes?Answer
Create a Service without a selector and provide matching manual discovery data, either as a legacy Endpoints object for the basic pattern or as an EndpointSlice when you need the modern API fields. The Service defines the stable name and port, while the endpoint object lists `192.168.1.50` as the backend address. If the database IP changes, Kubernetes will not update it automatically because no selector points to a Pod set. You need automation or a runbook that updates the endpoint data and verifies clients can still connect through the Service name. -
You run
kubectl describe endpoints my-svcand see 2 IPs underAddressesand 1 IP underNotReadyAddresses. A colleague says “just delete the not-ready pod to fix it.” Is this the right approach? What should you investigate first?Answer
Deleting the Pod may temporarily create a replacement, but it does not explain why Kubernetes excluded the backend from Service traffic. The Pod is alive enough to appear in discovery data, yet readiness says it should not receive normal requests. First inspect readiness probe events, application logs, the configured probe path and port, and any dependency the probe checks. Kubernetes is doing the protective thing here, so the fix should target the readiness failure rather than bypassing the signal. -
Your cluster has a Service backed by 3,000 pods. An SRE reports that every time a rolling update occurs, the API server’s memory spikes and kube-proxy takes 10+ seconds to update rules. What is causing this, and what Kubernetes feature addresses it?
Answer
A single legacy Endpoints object is a poor fit for thousands of backend addresses because any small change can require watchers to process the whole object again. EndpointSlices address this by splitting the backend set into smaller discovery objects that can be updated independently. With slices around 100 endpoints by default, one Pod change affects a much smaller payload. The practical fix is to use EndpointSlice-aware tooling and controllers rather than building new behavior around the legacy Endpoints object. -
You are deploying a StatefulSet for a Kafka cluster where each broker needs to be individually addressable. A regular ClusterIP Service gives you a single virtual IP. How do you configure DNS so that producers can connect to
kafka-0,kafka-1, andkafka-2individually?Answer
Create a headless Service by setting `clusterIP: None`, and set that Service as the StatefulSet `serviceName`. Kubernetes DNS can then publish records tied to individual StatefulSet Pod identities, such as `kafka-0.kafka-headless.default.svc.cluster.local`. This implements direct pod addressing for a protocol that needs stable broker identities. A regular ClusterIP Service would hide the backend choice, which is useful for stateless applications but wrong when each backend has a distinct role. -
You are reviewing the YAML of an EndpointSlice for a legacy application. Under one of the endpoints, the
readyandservingconditions are missing entirely (nil). A junior engineer worries that kube-proxy will drop traffic to this endpoint because it is not explicitly marked ready. Are they correct, and why?Answer
They are not correct for pod-backed endpoints. Kubernetes interprets nil `ready` or `serving` values as true for compatibility, while nil `terminating` is interpreted as false. That does not mean missing conditions are ideal for clarity, but it does mean the endpoint is not automatically dropped only because those fields are absent. You should still evaluate the broader EndpointSlice, Pod readiness, and rollout state before concluding the proxy behavior is wrong. -
During a traffic spike, your application auto-scales to 1,200 pods. Metrics show that 200 of the newest pods receive zero traffic while the older 1,000 pods are overloaded. You inspect the
Endpointsobject and notice anendpoints.kubernetes.io/over-capacity: truncatedannotation. What is the architectural root cause, and how do you route traffic to the stranded pods?Answer
The root cause is reliance on the legacy Endpoints API after the backend set exceeded its documented truncation behavior. The annotation tells you the old object is not representing every backend, so any component reading only that object can make incomplete routing decisions. EndpointSlices are designed to represent the full backend set in smaller objects, so use EndpointSlice-aware service routing and diagnostics. The stranded pods are not necessarily unhealthy; they are invisible to legacy readers that cannot represent the whole set. -
During an aggressive rolling update of a critical microservice, you observe a brief window where all available pods are terminating. To your surprise,
kube-proxyis still routing active traffic to these pods instead of dropping it immediately. The endpoints show bothserving: trueandterminating: true. Is this a bug in the proxy or intended behavior?Answer
This can be intended fallback behavior. Service proxies normally avoid terminating endpoints, but if all available endpoints are terminating, a proxy may continue routing to endpoints that are still serving to avoid an immediate total drop. The real fix is not to depend on that edge case; improve rollout settings so replacement Pods become Ready before every old endpoint is terminating. The EndpointSlice conditions help you distinguish a controlled fallback from a mysterious proxy bug.
Hands-On Exercise
Section titled “Hands-On Exercise”Task: Debug a service with endpoint issues by creating a selector mismatch, proving why the endpoint set is empty, repairing the Service, and then comparing normal EndpointSlice-backed routing with headless DNS behavior.
Steps: Work through the sequence in order and pause after each observation command to explain what the output proves. The goal is not just to make the final Service work; it is to build a repeatable diagnostic path from Service selector to Pod labels to endpoint objects.
- Create a deployment:
kubectl create deployment web --image=nginx --replicas=3kubectl rollout status deployment web --timeout=60s- Create a service with wrong selector:
cat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: broken-servicespec: selector: app: webapp # Wrong! Should be "web" ports: - port: 80EOF- Observe the problem:
kubectl get endpoints broken-service# Shows: <none>- Debug the issue:
# Check what selector the service haskubectl get svc broken-service -o yaml | grep -A2 selector
# Check what labels the pods havekubectl get pods --show-labels
# Find the mismatch!- Fix the service:
kubectl delete svc broken-servicekubectl expose deployment web --port=80 --name=broken-service- Verify endpoints exist:
kubectl get endpoints broken-service# Should show 3 pod IPs- Check EndpointSlices too:
kubectl get endpointslices -l kubernetes.io/service-name=broken-service- Test with a headless service:
cat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: headless-webspec: clusterIP: None selector: app: web ports: - port: 80EOF
# Check DNS returns multiple IPskubectl run test --rm -i --image=busybox:1.36 --restart=Never -- \ nslookup headless-web- Cleanup:
kubectl delete deployment webkubectl delete svc broken-service headless-webSuccess Criteria: Mark each item only when you can explain the evidence behind it. For example, “endpoints exist” should mean you can name the command that showed them, the selector that produced them, and the Pod labels that made the match valid.
- Can identify missing endpoints.
- Can debug selector mismatches confidently.
- Understand endpoints vs endpointslices outputs.
- Can manually create headless services.
- Understand DNS behavior differences for
clusterIP: None.
The exercise begins with a selector mismatch because that is the cleanest way to see the discovery chain fail without introducing application-specific noise. Once you repair the selector by recreating the Service from the Deployment, the endpoint objects should populate and EndpointSlices should appear under the Service label. The headless step then proves that the same backend set can be exposed through a different DNS behavior without changing the Pods.
Drill 1: Endpoint Inspection (Target: 2 minutes)
Section titled “Drill 1: Endpoint Inspection (Target: 2 minutes)”# Setupkubectl create deployment drill --image=nginx --replicas=2kubectl rollout status deployment drill --timeout=60skubectl expose deployment drill --port=80
# Check endpointskubectl get endpoints drill
# Get detailed endpoint infokubectl describe endpoints drill
# Get as YAML (see pod IPs)kubectl get endpoints drill -o yaml
# Check EndpointSliceskubectl get endpointslices -l kubernetes.io/service-name=drill
# Cleanupkubectl delete deployment drillkubectl delete svc drillDrill 2: Debug Missing Endpoints (Target: 3 minutes)
Section titled “Drill 2: Debug Missing Endpoints (Target: 3 minutes)”# Create deploymentkubectl create deployment debug-app --image=nginxkubectl rollout status deployment debug-app --timeout=60s
# Create service with typo in selectorcat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: debug-svcspec: selector: app: debug-apps # Typo: extra 's' ports: - port: 80EOF
# Observe problemkubectl get endpoints debug-svc# <none>
# Debugkubectl get pods --show-labelskubectl get svc debug-svc -o jsonpath='{.spec.selector}'
# Fixkubectl delete svc debug-svckubectl expose deployment debug-app --port=80 --name=debug-svc
# Verifykubectl get endpoints debug-svc
# Cleanupkubectl delete deployment debug-appkubectl delete svc debug-svcDrill 3: Manual Endpoints (Target: 4 minutes)
Section titled “Drill 3: Manual Endpoints (Target: 4 minutes)”# Create service without selectorcat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: external-svcspec: ports: - port: 80EOF
# Check - no endpoints yetkubectl get endpoints external-svc
# Create manual endpointscat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Endpointsmetadata: name: external-svcsubsets:- addresses: - ip: 1.2.3.4 - ip: 5.6.7.8 ports: - port: 80EOF
# Verify endpointskubectl get endpoints external-svckubectl describe endpoints external-svc
# Cleanupkubectl delete svc external-svckubectl delete endpoints external-svcDrill 4: Headless Service (Target: 3 minutes)
Section titled “Drill 4: Headless Service (Target: 3 minutes)”# Create deploymentkubectl create deployment headless-test --image=nginx --replicas=3kubectl rollout status deployment headless-test --timeout=60s
# Create headless servicecat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: headlessspec: clusterIP: None selector: app: headless-test ports: - port: 80EOF
# Verify no ClusterIPkubectl get svc headless# CLUSTER-IP should be "None"
# Check endpoints (still exist!)kubectl get endpoints headless
# Test DNS - should return multiple IPskubectl run test --rm -i --image=busybox:1.36 --restart=Never -- \ nslookup headless
# Cleanupkubectl delete deployment headless-testkubectl delete svc headlessDrill 5: EndpointSlice Analysis (Target: 3 minutes)
Section titled “Drill 5: EndpointSlice Analysis (Target: 3 minutes)”# Create deploymentkubectl create deployment slice-test --image=nginx --replicas=3kubectl rollout status deployment slice-test --timeout=60skubectl expose deployment slice-test --port=80
# Get EndpointSlice namekubectl get endpointslices -l kubernetes.io/service-name=slice-test
# Describe itSLICE_NAME=$(kubectl get endpointslices -l kubernetes.io/service-name=slice-test -o jsonpath='{.items[0].metadata.name}')kubectl describe endpointslice $SLICE_NAME
# Get YAMLkubectl get endpointslice $SLICE_NAME -o yaml
# Note the endpoints array with conditions
# Cleanupkubectl delete deployment slice-testkubectl delete svc slice-testDrill 6: Readiness and Endpoints (Target: 4 minutes)
Section titled “Drill 6: Readiness and Endpoints (Target: 4 minutes)”# Create pod with failing readiness probecat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Podmetadata: name: unready-pod labels: app: unreadyspec: containers: - name: nginx image: nginx readinessProbe: httpGet: path: /nonexistent port: 80 initialDelaySeconds: 1 periodSeconds: 2EOF
# Create servicekubectl expose pod unready-pod --port=80 --name=unready-svc
# Wait for the pod to be scheduled and probe to failkubectl wait --for=condition=ready pod/unready-pod --timeout=10s || truekubectl get endpoints unready-svc# Should be <none> or empty!
# Check whykubectl describe endpoints unready-svc# Look for notReadyAddresses
# Check pod statuskubectl get pod unready-pod# Not ready due to probe
# Cleanupkubectl delete pod unready-podkubectl delete svc unready-svcDrill 7: Scale and Watch Endpoints (Target: 3 minutes)
Section titled “Drill 7: Scale and Watch Endpoints (Target: 3 minutes)”# Create deploymentkubectl create deployment watch-test --image=nginx --replicas=1kubectl rollout status deployment watch-test --timeout=60skubectl expose deployment watch-test --port=80
# Watch endpoints in terminal 1 (or background)kubectl get endpoints watch-test -w &
# Scale up and observe endpoints changekubectl scale deployment watch-test --replicas=5kubectl rollout status deployment watch-test --timeout=60s
# Scale downkubectl scale deployment watch-test --replicas=2kubectl rollout status deployment watch-test --timeout=60s
# Stop the background watch processkill $!
# Cleanupkubectl delete deployment watch-testkubectl delete svc watch-testDrill 8: Challenge - Complete Endpoint Workflow
Section titled “Drill 8: Challenge - Complete Endpoint Workflow”Without looking at solutions:
- Create deployment
ep-challengewith 3 replicas of nginx - Create a service that intentionally has wrong selector
- Diagnose why endpoints are empty
- Fix the service
- Create a headless service for same deployment
- Verify DNS returns 3 IPs for headless service
- Create manual endpoints for IP 10.0.0.1
- Cleanup everything
# YOUR TASK: Complete in under 6 minutesSolution
# 1. Create deploymentkubectl create deployment ep-challenge --image=nginx --replicas=3kubectl rollout status deployment ep-challenge --timeout=60s
# 2. Create service with wrong selectorcat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: wrong-svcspec: selector: app: wrong ports: - port: 80EOF
# 3. Diagnosekubectl get endpoints wrong-svc# <none>kubectl get pods --show-labels# Labels show app=ep-challenge, not app=wrong
# 4. Fixkubectl delete svc wrong-svckubectl expose deployment ep-challenge --port=80 --name=fixed-svckubectl get endpoints fixed-svc# Shows 3 IPs
# 5. Create headless servicecat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: headless-challengespec: clusterIP: None selector: app: ep-challenge ports: - port: 80EOF
# 6. Verify DNSkubectl run test --rm -i --image=busybox:1.36 --restart=Never -- \ nslookup headless-challenge# Should show 3 IPs
# 7. Manual endpointscat << 'EOF' | kubectl apply -f -apiVersion: v1kind: Servicemetadata: name: manual-svcspec: ports: - port: 80---apiVersion: v1kind: Endpointsmetadata: name: manual-svcsubsets:- addresses: - ip: 10.0.0.1 ports: - port: 80EOFkubectl get endpoints manual-svc
# 8. Cleanupkubectl delete deployment ep-challengekubectl delete svc fixed-svc headless-challenge manual-svckubectl delete endpoints manual-svcThese drills are intentionally repetitive because endpoint troubleshooting is a chain-of-evidence skill. By the end, you should be able to explain which object owns the stable name, which object owns the backend list, which field excludes an unhealthy backend, and which command proves your claim. That is the difference between memorizing a Service definition and diagnosing Service routing under exam pressure.
If one of the drills behaves differently in your cluster, treat that as useful signal rather than noise. Resource names may collide with leftovers from a previous attempt, DNS Pods may take a few seconds to answer, and readiness probes may need one or two probe periods before the endpoint view changes. The disciplined response is to re-run the relevant inspection command, identify whether the object state or timing explains the difference, and clean up only the resources created by the drill.
Sources
Section titled “Sources”- EndpointSlices — Primary reference for slice sizing, conditions, duplication behavior, and controller-managed EndpointSlices.
- Service — Covers selectorless Services, headless Services, deprecated Endpoints behavior, and over-capacity truncation.
- DNS for Services and Pods — Explains how normal and headless Services resolve in cluster DNS, including multiple A/AAAA responses.
- Topology Aware Routing — Explains zone-aware routing behavior and EndpointSlice hints.
- Service Internal Traffic Policy — Covers how Service routing policy can narrow eligible endpoints.
- EndpointSlice API reference — Defines EndpointSlice fields, conditions, address types, ports, and topology fields.
- Endpoints API reference — Defines the legacy Endpoints object shape used in compatibility examples.
- Debug Services — Kubernetes task guide for tracing Service selector and endpoint failures.
- StatefulSet Basics — Demonstrates stable network identity and headless Service usage for StatefulSets.
- Virtual IPs and Service Proxies — Explains how Kubernetes service proxying uses Service and endpoint data.
Next Module
Section titled “Next Module”Module 3.3: DNS & CoreDNS - Deep-dive into Kubernetes DNS, service discovery, and resolving complex naming constraints.