
Module 5.5: EKS Production: Scaling, Observability & Cost

Complexity: [COMPLEX] | Time to Complete: 3h | Prerequisites: Module 5.1 (EKS Architecture), Module 5.2 (EKS Networking)

After completing this module, you will be able to:

  • Configure Karpenter for intelligent, constraint-based node autoscaling that optimizes cost and bin-packing on EKS
  • Implement EKS observability with CloudWatch Container Insights, AWS Distro for OpenTelemetry, and Prometheus
  • Deploy cost optimization strategies combining Spot instances, Savings Plans, and right-sizing for EKS workloads
  • Design production EKS upgrade runbooks that safely roll clusters through Kubernetes version upgrades

In September 2023, a video streaming company running on EKS launched a new series that went viral. Their backend scaled from 200 pods to 1,800 pods in under 20 minutes. The Horizontal Pod Autoscaler did its job — it created the pods. But the Cluster Autoscaler did not keep up. It took 8 minutes to detect the need for new nodes, 3 minutes to provision them through the Auto Scaling Group, and another 2 minutes for the nodes to join the cluster and become Ready. By the time capacity caught up, users had experienced 13 minutes of degraded service: buffering, failed playback starts, and error pages. Their social media mentions during that window were overwhelmingly negative.

The next quarter, they replaced Cluster Autoscaler with Karpenter. During a similar traffic spike, Karpenter detected the unschedulable pods within seconds, calculated the optimal instance types, and launched nodes directly through the EC2 Fleet API. New capacity was available in under 90 seconds. But scaling is only one pillar of production readiness. Without observability, you cannot diagnose what went wrong during the 13-minute degradation. Without cost management, that 1,800-pod spike on On-Demand instances cost $12,000 for a single day — when Spot instances would have handled the stateless video encoding pods at 70% less.

In this module, you will master Karpenter for intelligent, fast node provisioning. You will learn how to orchestrate Spot instances safely. You will configure control plane logging, Container Insights, and Prometheus-based monitoring. And you will implement cost allocation with Kubecost and OpenCost to finally answer the question: “How much does each team actually spend on Kubernetes?”


Karpenter: Next-Generation Node Provisioning


Karpenter is an open-source, high-performance Kubernetes node provisioner built by AWS. It replaces the Cluster Autoscaler with a fundamentally different approach: instead of scaling existing Auto Scaling Groups, Karpenter provisions individual nodes by directly calling the EC2 Fleet API based on the exact requirements of pending pods.

Cluster Autoscaler Workflow:
Pod Pending → CA checks ASG configs → Selects ASG → Increments desired capacity
→ ASG launches instance → Node joins cluster → Pod scheduled
Time: 3-10 minutes
Karpenter Workflow:
Pod Pending → Karpenter evaluates pod requirements → Calls EC2 Fleet API directly
→ Instance launches → Node joins cluster → Pod scheduled
Time: 30-90 seconds
| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning speed | 3-10 minutes | 30-90 seconds |
| Instance selection | Fixed per ASG (you pre-define) | Dynamic (evaluates pod needs per launch) |
| Bin packing | Limited (works within ASG constraints) | Optimized (selects exact instance type for pending pods) |
| Scale-down | Scans for underutilized nodes periodically | Continuous consolidation (TTL, empty node, and underutilization) |
| Spot handling | Via ASG mixed instances policy | Native Spot support with automatic fallback |
| Node groups | Required (one per instance type mix) | Not required (NodePools define constraints, not specific groups) |
| Maintenance | ASG + Launch Template management | NodePool + EC2NodeClass CRDs |
Terminal window
# Install Karpenter v1.x via Helm from the OCI registry (charts.karpenter.sh is deprecated)
# Karpenter needs specific IAM roles and instance profiles
# (Simplified here -- see Karpenter docs for full IAM setup)
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
--namespace kube-system \
--set settings.clusterName=my-cluster \
--set settings.clusterEndpoint=$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text) \
--set settings.isolatedVPC=false \
--version 1.1.0 \
--wait

A NodePool defines the constraints and preferences for the nodes Karpenter creates. Think of it as a menu of acceptable instance types, AZs, and architectures.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        team: platform
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"] # General, compute, memory
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"] # Only 6th gen and newer
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      expireAfter: 720h # Force node rotation every 30 days
  limits:
    cpu: "1000" # Max 1000 vCPUs across all nodes
    memory: 4000Gi # Max 4 TiB memory
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  weight: 50 # Priority vs other NodePools

The EC2NodeClass defines the AWS-specific configuration for nodes: AMI, subnets, security groups, instance profile, and user data.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole
  amiSelectorTerms:
    - alias: al2023@latest # Amazon Linux 2023, auto-updated
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1 # Restrict IMDS to the node itself (pods behind an extra network hop are blocked)
    httpTokens: required # Enforce IMDSv2
  tags:
    Environment: production
    ManagedBy: karpenter

When pods are pending, Karpenter evaluates their resource requests against the NodePool constraints and selects the cheapest instance type that fits. This is called “bin packing.”

Pending Pods:
Pod A: requests 2 CPU, 4Gi memory
Pod B: requests 1 CPU, 8Gi memory
Pod C: requests 4 CPU, 4Gi memory
Karpenter evaluates:
Option 1: 3x m6i.xlarge (4 CPU, 16Gi each) = $0.576/hr → one pod per node, mostly idle
Option 2: 1x m6i.2xlarge (8 CPU, 32Gi) = $0.384/hr → all 3 pods on 1 node
Option 3: 1x c6i.2xlarge (8 CPU, 16Gi) = $0.340/hr → all 3 pods, tighter fit
Karpenter selects Option 3 (cheapest that satisfies all pod requirements)
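The selection logic above can be sketched in a few lines of Python. This is a simplified, hypothetical model: real Karpenter also accounts for DaemonSet overhead, topology constraints, and splitting pods across multiple nodes, and the instance specs and prices here are illustrative.

```python
# Hypothetical sketch of Karpenter-style instance selection: choose the
# cheapest instance type whose capacity covers the summed pod requests.

def pick_cheapest(pods, instance_types):
    """Return the cheapest instance type that fits all pods on one node."""
    need_cpu = sum(p["cpu"] for p in pods)
    need_mem = sum(p["mem_gib"] for p in pods)
    candidates = [
        it for it in instance_types
        if it["cpu"] >= need_cpu and it["mem_gib"] >= need_mem
    ]
    return min(candidates, key=lambda it: it["price_hr"]) if candidates else None

pods = [
    {"name": "A", "cpu": 2, "mem_gib": 4},
    {"name": "B", "cpu": 1, "mem_gib": 8},
    {"name": "C", "cpu": 4, "mem_gib": 4},
]
instance_types = [
    {"name": "m6i.xlarge",  "cpu": 4, "mem_gib": 16, "price_hr": 0.192},
    {"name": "m6i.2xlarge", "cpu": 8, "mem_gib": 32, "price_hr": 0.384},
    {"name": "c6i.2xlarge", "cpu": 8, "mem_gib": 16, "price_hr": 0.340},
]

print(pick_cheapest(pods, instance_types)["name"])  # c6i.2xlarge
```

A single node won't do for every workload, but the principle scales: exclude what doesn't fit, then pick on price.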

Karpenter continuously evaluates whether the current node fleet is optimal. It consolidates by:

  1. Empty node removal: If a node has no non-DaemonSet pods, remove it after consolidateAfter
  2. Underutilization: If pods on a node could fit on other existing nodes, drain and terminate it
  3. Expiration: Force node rotation after expireAfter (ensures AMI freshness)
Before Consolidation: After Consolidation:
┌───────────────┐ ┌───────────────┐
│ Node A (25%) │ │ Node A (75%) │
│ ▓░░░░░░░░░░░ │ ──────────────► │ ▓▓▓▓▓▓▓▓░░░░ │
├───────────────┤ └───────────────┘
│ Node B (50%) │
│ ▓▓▓▓▓▓░░░░░░ │ (Node B drained, Node C terminated)
├───────────────┤
│ Node C (empty)│
│ ░░░░░░░░░░░░ │
└───────────────┘
3 nodes → 1 node, same pods served
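The underutilization check can be illustrated with a greedy first-fit sketch. This is an assumption-laden simplification (the pod and capacity figures are invented to match the diagram); Karpenter's real consolidation also simulates scheduling constraints, PodDisruptionBudgets, and the price of any replacement node.

```python
# Greedy first-fit sketch: a node is consolidatable if every pod on it can
# be repacked onto the spare capacity of the remaining nodes.
def can_consolidate(candidate_pods, other_nodes):
    spare = [{"cpu": n["cpu_free"], "mem": n["mem_free_gib"]} for n in other_nodes]
    for pod in sorted(candidate_pods, key=lambda p: -p["cpu"]):  # biggest first
        for s in spare:
            if s["cpu"] >= pod["cpu"] and s["mem"] >= pod["mem_gib"]:
                s["cpu"] -= pod["cpu"]
                s["mem"] -= pod["mem_gib"]
                break
        else:
            return False  # this pod has nowhere to go; keep the node

    return True

# Node B's pods vs Node A's spare capacity (illustrative figures)
node_b_pods = [{"cpu": 2, "mem_gib": 4}, {"cpu": 1, "mem_gib": 2}]
node_a_spare = [{"cpu_free": 6, "mem_free_gib": 24}]
print(can_consolidate(node_b_pods, node_a_spare))  # True: safe to drain Node B
```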

Spot instances cost 60-90% less than On-Demand but can be reclaimed by AWS with a two-minute warning. Karpenter makes Spot orchestration significantly simpler than managing it through ASGs.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
        # Diversify across many types to reduce interruption risk
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
  1. Diversify instance types: The more instance types you allow, the more Spot capacity pools Karpenter can draw from, reducing interruption probability
  2. Use taints for Spot nodes: Only schedule Spot-tolerant workloads on Spot nodes
  3. Handle interruptions gracefully: Use PodDisruptionBudgets and termination handlers
# PodDisruptionBudget for Spot workloads
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-processor-pdb
  namespace: batch
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: batch-processor
---
# Deployment tolerating Spot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: batch
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: spot
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/batch-processor:latest
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5 && /app/graceful-shutdown"]
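The diversification advice is easy to quantify with a back-of-the-envelope sketch. The model is deliberately crude and its assumptions are invented for illustration: nodes spread evenly across capacity pools, and one pool reclaimed at a time.

```python
# Why diversification matters: a Spot reclaim typically hits one capacity
# pool (instance type x AZ) at a time. Spreading nodes across more pools
# caps the fraction of the fleet lost in a single reclaim event.
def worst_case_fleet_loss(num_pools, nodes):
    """Fraction of nodes lost if one pool is fully reclaimed,
    assuming nodes are spread evenly across pools."""
    return (nodes / num_pools) / nodes  # == 1 / num_pools

print(f"{worst_case_fleet_loss(1, 20):.0%}")   # 100%: single pool, total loss
print(f"{worst_case_fleet_loss(20, 20):.0%}")  # 5%: 20 pools, one node lost
```

With 15-20 instance types across 3 AZs you get 45-60 pools, so any single reclaim removes only a sliver of capacity that the PodDisruptionBudget and replicas can absorb.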

For critical workloads that must keep running even if Spot is unavailable, use a two-NodePool strategy:

# Primary: Spot (cheap)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compute-spot
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
  weight: 100 # Higher weight = preferred
---
# Fallback: On-Demand (reliable)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compute-ondemand
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  weight: 10 # Lower weight = fallback only

Karpenter tries the higher-weight NodePool first. If Spot capacity is unavailable (InsufficientInstanceCapacity), it falls back to On-Demand automatically.
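A minimal sketch of that weighted-fallback behavior, with a capacity-availability map standing in for EC2's InsufficientInstanceCapacity responses (a hypothetical model, not Karpenter's actual code):

```python
# Try NodePools in descending weight order; fall back to the next pool
# when the preferred capacity type is unavailable.
def provision(nodepools, capacity_available):
    """capacity_available: map of capacity-type -> bool (Spot may be out)."""
    for pool in sorted(nodepools, key=lambda p: -p["weight"]):
        if capacity_available.get(pool["capacity_type"], False):
            return pool["name"]
    return None  # nothing launchable; pods stay Pending

pools = [
    {"name": "compute-spot", "capacity_type": "spot", "weight": 100},
    {"name": "compute-ondemand", "capacity_type": "on-demand", "weight": 10},
]

print(provision(pools, {"spot": True, "on-demand": True}))   # compute-spot
print(provision(pools, {"spot": False, "on-demand": True}))  # compute-ondemand
```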

While Spot instances offer the deepest discounts (up to 90%), they are not suitable for stateful, singleton, or latency-sensitive workloads that cannot tolerate interruptions. For these On-Demand workloads, AWS Compute Savings Plans provide the next best optimization.

By committing to a consistent amount of compute usage (e.g., $10/hour) for a 1- or 3-year term, you can reduce On-Demand costs by up to 50%. Savings Plans apply automatically across instance families, sizes, and regions. This flexibility is critical for Kubernetes clusters using Karpenter, as the specific instance types provisioned will constantly change based on dynamic bin-packing decisions.
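A rough model of how a commitment interacts with usage. The 40% discount rate is an assumption for illustration; actual Savings Plan rates depend on term, payment option, and instance family.

```python
# Savings Plan cost model (illustrative, assumed 40% discount): the hourly
# commitment buys discounted compute; any usage beyond what the commitment
# covers is billed at the normal On-Demand rate.
def monthly_cost(on_demand_usage_hr, commit_hr, discount=0.40):
    hours = 730  # average hours per month
    # A $commit_hr commitment covers commit_hr / (1 - discount) of
    # On-Demand-equivalent usage.
    covered = min(on_demand_usage_hr, commit_hr / (1 - discount))
    overflow = max(0.0, on_demand_usage_hr - covered)
    return (commit_hr + overflow) * hours

# $15/hr of steady On-Demand usage; a $9/hr commitment covers all of it at 40% off
print(f"${monthly_cost(15, 9):,.0f}/mo vs ${15 * 730:,.0f}/mo uncommitted")
```

Because the commitment is billed whether or not you use it, size it to your steady-state baseline and let Spot and On-Demand absorb the spiky remainder.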

Stop and think: If Karpenter provisions a Spot instance for your workload and AWS reclaims it with a two-minute warning, how does your application ensure zero downtime? (Hint: Think about PodDisruptionBudgets, replicas, and the pod lifecycle.)


EKS control plane logs provide visibility into API server activity, authentication events, and scheduler decisions. These logs are essential for security auditing and troubleshooting.

Terminal window
aws eks update-cluster-config --name my-cluster \
--logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
| Log Type | What It Contains | When to Use |
|---|---|---|
| api | API server request/response logs | Debugging API errors, rate limiting |
| audit | Who did what, when (all API calls) | Security compliance, forensics |
| authenticator | Authentication decisions (IAM → K8s RBAC) | Troubleshooting access denied errors |
| controllerManager | Controller loops (ReplicaSet, Deployment) | Why pods are not being created |
| scheduler | Scheduling decisions and failures | Why pods are Pending |

Logs are delivered to CloudWatch Logs under the group /aws/eks/my-cluster/cluster. They can be queried using CloudWatch Logs Insights:

# Find all failed API calls in the last hour
fields @timestamp, verb, requestURI, responseStatus.code, user.username
| filter responseStatus.code >= 400
| sort @timestamp desc
| limit 50
# Find who deleted a specific pod
fields @timestamp, verb, requestURI, user.username, sourceIPs.0
| filter verb = "delete" and requestURI like "/api/v1/namespaces/production/pods/"
| sort @timestamp desc
| limit 20

Cost warning: Control plane logs can be expensive at scale. An active cluster generating 50 GB/day of audit logs costs ~$25/day in CloudWatch ingestion alone. Consider enabling only audit and authenticator by default, adding the others temporarily for debugging.


Amazon CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from containerized workloads. It provides dashboards for cluster, node, pod, and container-level metrics.

Terminal window
# Install via the EKS add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name amazon-cloudwatch-observability \
--service-account-role-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/CloudWatchObservabilityRole

Container Insights deploys a CloudWatch Agent DaemonSet that collects:

  • Cluster metrics: Node count, pod count, CPU/memory utilization
  • Node metrics: CPU, memory, disk, network per node
  • Pod metrics: CPU, memory, network per pod
  • Container metrics: Resource usage per container within a pod
| Metric | Namespace | What to Alert On |
|---|---|---|
| node_cpu_utilization | ContainerInsights | > 80% sustained |
| node_memory_utilization | ContainerInsights | > 85% sustained |
| pod_cpu_utilization_over_pod_limit | ContainerInsights | > 90% (throttling imminent) |
| node_filesystem_utilization | ContainerInsights | > 80% (disk pressure) |
| cluster_failed_node_count | ContainerInsights | > 0 |
| pod_status_phase (Pending) | ContainerInsights | > 0 for > 5 min |

Container Insights sends custom metrics to CloudWatch, which charges $0.30 per metric per month for the first 10,000 metrics. A 50-node cluster with 500 pods easily generates 5,000+ metrics. Estimate $1,500-3,000/month for Container Insights on a medium-sized cluster. For cost-sensitive environments, consider Prometheus + Grafana as an alternative (covered next).
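A back-of-the-envelope cost model makes the estimate concrete. The per-node and per-pod metric counts here are assumptions for illustration; actual counts depend on which features are enabled.

```python
# Rough Container Insights cost model using the first-tier CloudWatch
# custom-metric price of $0.30/metric/month. metrics_per_node and
# metrics_per_pod are assumed figures, not published constants.
def container_insights_monthly(nodes, pods, metrics_per_node=25,
                               metrics_per_pod=10, price=0.30):
    return (nodes * metrics_per_node + pods * metrics_per_pod) * price

# 50-node cluster with 500 pods -> 6,250 metrics
print(f"${container_insights_monthly(50, 500):,.0f}/month")  # $1,875/month
```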


For teams that want full control over their metrics pipeline and lower costs at scale, self-managed Prometheus (or Amazon Managed Prometheus) is the standard choice.

AMP is a serverless, Prometheus-compatible monitoring service. You do not manage servers, storage, or scaling.

Terminal window
# Create a workspace
WORKSPACE_ID=$(aws amp create-workspace \
--alias eks-production \
--query 'workspaceId' --output text)
echo "Workspace: $WORKSPACE_ID"
echo "Endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID"

Deploying Prometheus to Scrape EKS Metrics

Terminal window
# Install the Prometheus stack using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# For self-managed Prometheus + Grafana:
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=15d \
--set prometheus.prometheusSpec.resources.requests.cpu=500m \
--set prometheus.prometheusSpec.resources.requests.memory=2Gi \
--set grafana.enabled=true \
--set grafana.adminPassword=DojoGrafana2024
# For remote-write to Amazon Managed Prometheus:
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.remoteWrite[0].url="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write" \
--set prometheus.prometheusSpec.remoteWrite[0].sigv4.region=us-east-1
# CPU usage by namespace (percentage of requests)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
/ sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
* 100
# Memory usage vs limits (OOM risk indicator)
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)
/ sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod)
* 100
# Pod restart rate (signals instability)
sum(rate(kube_pod_container_status_restarts_total[1h])) by (namespace, pod) > 0
# Nodes currently not Ready
kube_node_status_condition{condition="Ready", status="true"} == 0
# Karpenter provisioning latency (p99)
histogram_quantile(0.99, sum(rate(karpenter_provisioner_scheduling_duration_seconds_bucket[5m])) by (le))

Pause and predict: You notice that kube_pod_container_status_restarts_total is rapidly increasing for your core API namespace, but node_cpu_utilization is completely normal. What might be causing the pods to restart if it isn’t node-level resource starvation? (Hint: Think about memory limits, liveness probes, or application-level crashes.)


Cost Allocation with Kubecost and OpenCost


Kubernetes makes it remarkably difficult to answer “how much does team X’s workload cost?” Resources are shared across nodes, and a single node might run pods from five different teams. Kubecost and OpenCost solve this by allocating infrastructure costs to individual namespaces, deployments, and labels.

OpenCost is the CNCF-backed open-source standard for Kubernetes cost monitoring.

Terminal window
# Install OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace \
--set opencost.exporter.defaultClusterId=my-cluster \
--set opencost.exporter.aws.spot_data_region=us-east-1 \
--set opencost.exporter.aws.spot_data_bucket=my-spot-pricing-bucket \
--set opencost.ui.enabled=true

Kubecost provides a richer UI and recommendations on top of the OpenCost engine.

Terminal window
helm repo add kubecost https://kubecost.github.io/cost-analyzer
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="" \
--set prometheus.server.global.external_labels.cluster_id=my-cluster

The key to effective cost allocation is consistent labeling:

# Enforce cost-tracking labels on all pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
  labels:
    app: payment-service
    team: payments-team # Who owns this?
    environment: production # Prod vs staging vs dev
    cost-center: cc-1234 # Finance tracking code
spec:
  template:
    metadata:
      labels:
        app: payment-service
        team: payments-team
        environment: production
        cost-center: cc-1234

Enforce labels using admission controllers:

# Kyverno policy to require cost labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - DaemonSet
      validate:
        message: "All workloads must have 'team' and 'cost-center' labels."
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
Monthly Cost Breakdown (Example):

| Team | Namespace | CPU Cost | Mem Cost | Total |
|---|---|---|---|---|
| payments | payments | $1,240 | $890 | $2,130 |
| search | search | $3,100 | $2,450 | $5,550 |
| data-pipeline | batch | $890 | $1,200 | $2,090 |
| platform | monitoring | $420 | $680 | $1,100 |
| (idle) | (unallocated) | $1,800 | $2,300 | $4,100 |
| **TOTAL CLUSTER** | | $7,450 | $7,520 | $14,970 |

Key insight: $4,100 (27%) is idle/unallocated — this is your optimization target.

The “idle” cost represents resources (CPU, memory) that are provisioned but not requested by any pod. Reducing idle cost through better bin packing (Karpenter consolidation), right-sizing pod requests, and Spot adoption is the highest-leverage cost optimization.
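The calculation behind that idle line is simple; here is a sketch using the figures from the example breakdown above:

```python
# Idle cost = provisioned cluster cost minus cost attributed to pod requests.
# Figures mirror the example breakdown above (illustrative).
def idle_fraction(total_cluster_cost, allocated_costs):
    idle = total_cluster_cost - sum(allocated_costs)
    return idle / total_cluster_cost

allocated = [2130, 5550, 2090, 1100]  # payments, search, batch, platform
print(f"{idle_fraction(14970, allocated):.0%} idle")  # 27% idle
```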


Kubernetes releases minor versions three times a year, and AWS supports each EKS version for 14 months (with extended support available at an additional cost). Falling behind means risking security vulnerabilities and losing community support. A production upgrade runbook ensures safe, zero-downtime cluster upgrades.

  1. Pre-flight API Checks: Verify that no deprecated or removed Kubernetes APIs are in use by your manifests or Helm charts (using open-source tools like pluto or kubent). Update your CI/CD pipelines to deploy the new API versions.
  2. Control Plane Upgrade: Trigger the EKS control plane upgrade via the AWS API, CLI, or Terraform. This process takes 10-20 minutes. AWS manages a highly available control plane, so API requests may experience slight latency, but running pods are unaffected.
  3. Core Add-on Upgrades: Update the Amazon VPC CNI, CoreDNS, and kube-proxy add-ons to the specific versions recommended for your new Kubernetes control plane version.
  4. Data Plane (Node) Rotation:
    • With Karpenter: If using an AMI alias like al2023@latest in your EC2NodeClass, Karpenter automatically discovers the latest EKS Optimized AMI for the new control plane version. You can trigger rotation by setting expireAfter: 1m temporarily or using Karpenter’s node drift feature, which automatically replaces nodes when the underlying AMI changes.
    • With Managed Node Groups: Trigger a rolling update. EKS will automatically cordon, drain, and replace nodes one by one.
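The ordering of these steps (control plane before nodes) follows from the Kubernetes version skew policy. A sketch of the rule: since Kubernetes 1.28 the kubelet may lag the API server by up to three minor versions (two in earlier releases), and may never be newer than it.

```python
# Sketch of the kubelet/kube-apiserver version skew rule. Upgrading nodes
# first would leave kubelets newer than the API server, which is never allowed.
def skew_ok(apiserver_minor: int, kubelet_minor: int) -> bool:
    return apiserver_minor - 3 <= kubelet_minor <= apiserver_minor

print(skew_ok(30, 29))  # True:  nodes one minor behind is fine
print(skew_ok(29, 30))  # False: nodes newer than the control plane
```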

Stop and think: Why must you upgrade the EKS control plane before upgrading the worker nodes? (Hint: The Kubernetes version skew policy dictates compatibility rules between the kube-apiserver and the kubelet running on nodes.)


  1. Karpenter evaluates over 600 EC2 instance types when deciding what to launch. For each set of pending pods, it simulates packing them onto every compatible instance type, calculates the cost, and selects the cheapest option. This evaluation happens in milliseconds thanks to an in-memory instance type database that Karpenter refreshes from the EC2 pricing API every 6 hours.

  2. EKS control plane audit logs record every single API request, including the request body and response. For a cluster with 500 pods and active HPA/VPA controllers, the audit log volume can reach 10-15 GB per day. At CloudWatch’s $0.50/GB ingestion rate, that is $5-7.50/day or $150-225/month just for audit log storage. Many teams filter audit logs to only capture write operations and authentication events, reducing volume by 80%.

  3. Spot instance interruption rates vary dramatically by instance type and region. In 2024 data, the m5.xlarge in us-east-1 had a roughly 5% monthly interruption rate, while the m6i.xlarge in the same region was under 3%. By diversifying across 15-20 instance types and 3 AZs, you can achieve an effective interruption rate below 2% per node per month, making Spot viable even for user-facing services behind proper redundancy.

  4. Kubecost analysis across thousands of Kubernetes clusters found that the average cluster utilization (actual CPU/memory usage vs provisioned) is only 20-35%. This means 65-80% of infrastructure spend is wasted on idle resources. The three highest-impact optimizations are: (1) right-sizing pod resource requests using VPA recommendations, (2) enabling Karpenter consolidation to remove underutilized nodes, and (3) using Spot instances for stateless and fault-tolerant workloads.


| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Running Cluster Autoscaler and Karpenter simultaneously | Migrating incrementally but forgetting to remove the old autoscaler. Both fight over node lifecycle. | Remove Cluster Autoscaler completely before deploying Karpenter. They are mutually exclusive. |
| Karpenter NodePool with only one instance type | Over-constraining instance requirements, defeating Karpenter’s instance flexibility advantage. | Allow at least 10-15 instance types across multiple families (m, c, r) and generations. More diversity means better availability and pricing. |
| No expireAfter on NodePool | Nodes run indefinitely with stale AMIs and potential security vulnerabilities. | Set expireAfter: 720h (30 days) to force rotation. Karpenter gracefully drains nodes before termination. |
| Spot for stateful workloads without replication | Assuming Spot interruption will not happen. A database on a Spot node loses its EBS volume attachment during interruption. | Use On-Demand for databases and single-point-of-failure workloads. Spot is for stateless, redundant, or batch workloads only. |
| Enabling all 5 control plane log types permanently | ”We might need them someday.” Meanwhile paying $200/month for logs nobody reads. | Enable audit and authenticator always. Enable api, controllerManager, and scheduler only when actively debugging. |
| No resource requests on pods | Developers skip requests “to keep it simple.” Karpenter cannot bin-pack without knowing pod resource needs. | Require resource requests via admission controller (Kyverno/OPA). Use VPA recommendations to determine appropriate values. |
| Cost allocation without label enforcement | Labels are optional, most teams do not add them, and 60% of cost is “unallocated.” | Enforce team and cost-center labels using admission policies. Reject deployments that lack required labels. |
| Ignoring idle cost in Kubecost | Focusing on per-team costs while ignoring the 30%+ of cluster spend that is idle. | Review idle cost weekly. Right-size pod requests, enable Karpenter consolidation, and set aggressive consolidateAfter values. |

Question 1: During a massive traffic spike, your application needs 50 new nodes. With Cluster Autoscaler, this took 8 minutes, causing an outage. You switch to Karpenter and it takes 45 seconds. What architectural difference allows Karpenter to provision nodes so much faster in this scenario?

Cluster Autoscaler works by adjusting the desired capacity of existing Auto Scaling Groups. It must: (1) detect pending pods, (2) evaluate which ASG to scale, (3) increment the ASG desired count, (4) wait for the ASG to launch an instance. This involves multiple AWS API calls with propagation delays. Karpenter bypasses ASGs entirely and calls the EC2 Fleet API directly, requesting a specific instance type in a specific subnet. It also pre-calculates the optimal instance type based on pending pod requirements, eliminating the trial-and-error of ASG scaling. The result is 30-90 seconds vs 3-10 minutes.

Question 2: You configure a Karpenter NodePool for a batch processing job and include both `spot` and `on-demand` in the capacity-type requirements. When 100 pending batch pods appear, how does Karpenter decide whether to launch Spot or On-Demand instances to fulfill the request?

Karpenter evaluates the cost of both capacity types for each pending pod batch. It first attempts to provision Spot instances because they are cheaper. If Spot capacity is unavailable for the requested instance types (InsufficientInstanceCapacity error from EC2), Karpenter automatically falls back to On-Demand. You can influence this behavior using NodePool weight — create a Spot-preferred NodePool with weight 100 and an On-Demand fallback NodePool with weight 10. Karpenter tries higher-weight NodePools first. Pods can also force On-Demand using nodeSelector: {karpenter.sh/capacity-type: on-demand}.

Question 3: Your cluster has 30% idle cost according to Kubecost. What three actions would have the highest impact on reducing this waste?

(1) Right-size pod resource requests. Use Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual usage and reduce over-provisioned requests. If a pod requests 2 CPU but only uses 0.3 CPU on average, reducing the request to 0.5 CPU frees 1.5 CPU of capacity. (2) Enable Karpenter consolidation. Set consolidationPolicy: WhenEmptyOrUnderutilized with a short consolidateAfter (30s). Karpenter will continuously pack pods onto fewer, better-utilized nodes and terminate empty or underutilized ones. (3) Use Spot instances for stateless workloads. Spot prices are 60-90% lower than On-Demand. Move all fault-tolerant workloads (web frontends, batch jobs, CI/CD runners) to Spot, keeping On-Demand only for databases and critical singletons.

Question 4: You enabled all 5 EKS control plane log types and your CloudWatch bill spiked by $300/month. Which log types should you keep enabled permanently, and which should be on-demand only?

Keep audit and authenticator logs enabled permanently. Audit logs are essential for security compliance — they record who performed what action and when, and you cannot retroactively generate them. Authenticator logs help troubleshoot IAM-to-RBAC mapping issues. The api, controllerManager, and scheduler logs should be enabled only when actively debugging issues. API logs in particular are extremely high-volume (every GET, LIST, WATCH request is logged). Enable them temporarily, diagnose the issue, then disable them.

Question 5: A Spot instance running 8 pods receives a two-minute interruption notice. What happens to the pods, and how should your application handle this?

When a Spot interruption notice arrives, Karpenter (or the AWS Node Termination Handler if using Cluster Autoscaler) marks the node as unschedulable (cordons it) and begins draining pods. Kubernetes respects PodDisruptionBudgets during the drain, evicting pods gracefully. Each pod receives a SIGTERM signal and has until its terminationGracePeriodSeconds (default 30 seconds) to shut down cleanly. Applications should: (1) handle SIGTERM by finishing in-flight requests, (2) use a preStop lifecycle hook for cleanup, (3) have terminationGracePeriodSeconds set appropriately (no more than 90 seconds for Spot), and (4) be designed for redundancy so that losing one pod does not cause user-visible errors.

Question 6: Your startup just launched and needs robust EKS monitoring. The platform team is debating between enabling CloudWatch Container Insights or deploying a self-managed Prometheus stack. What scenario or cluster characteristics would make Prometheus the better choice?

Container Insights sends metrics to CloudWatch as custom metrics. It provides pre-built dashboards, integrates with CloudWatch Alarms, and requires no infrastructure to run (just the agent DaemonSet). However, it charges per-metric ($0.30/metric/month) and can become expensive for large clusters ($1,500-3,000/month). Prometheus is an open-source metrics system that stores metrics locally (or in AMP) with powerful PromQL querying. It is free to run (self-managed) or lower-cost (AMP charges $0.03/10M samples). Use Container Insights for small clusters (under 20 nodes) or teams that want zero-ops monitoring. Use Prometheus for larger clusters, teams that need custom metrics, or organizations with existing Grafana dashboards.

Question 7: A developer deploys a new memory-intensive application to your Karpenter-managed EKS cluster but forgets to define CPU and memory requests in the pod spec. What will happen when Karpenter attempts to provision capacity for these pods?

Karpenter uses pod resource requests (not limits) to determine what instance type to provision. If a pod has no CPU or memory request, Karpenter assumes it needs zero resources and may pack it onto an already-full node, leading to severe resource contention and OOM (Out of Memory) kills. Without accurate requests, Karpenter cannot effectively bin-pack pods—it might provision the wrong instance sizes, wasting money. Furthermore, the Kubernetes scheduler treats pods without requests as “BestEffort”, making them the first to be evicted under node pressure. Always set resource requests that reflect actual usage.


Hands-On Exercise: Replace Cluster Autoscaler with Karpenter + Spot Batch Processing

Section titled “Hands-On Exercise: Replace Cluster Autoscaler with Karpenter + Spot Batch Processing”

In this exercise, you will deploy Karpenter, create NodePools for general and Spot workloads, run a batch processing job on Spot instances, and set up basic cost visibility.

What you will build:

┌────────────────────────────────────────────────────────────────┐
│ EKS Cluster with Karpenter │
│ │
│ NodePool: general-purpose │
│ ┌──────────────────────────────────────┐ │
│ │ On-Demand m6i/c6i instances │ │
│ │ Web frontends, APIs │ │
│ └──────────────────────────────────────┘ │
│ │
│ NodePool: spot-batch │
│ ┌──────────────────────────────────────┐ │
│ │ Spot instances (diverse types) │ │
│ │ Batch processing jobs │ │
│ │ Tainted: spot=true:NoSchedule │ │
│ └──────────────────────────────────────┘ │
│ │
│ OpenCost: Cost allocation dashboard │
└────────────────────────────────────────────────────────────────┘

Task 1: Remove Cluster Autoscaler (If Present)

Section titled “Task 1: Remove Cluster Autoscaler (If Present)”
Solution
Terminal window
# Check if Cluster Autoscaler is running
k get deployment cluster-autoscaler -n kube-system 2>/dev/null
# If present, remove it
helm uninstall cluster-autoscaler -n kube-system 2>/dev/null || \
  k delete deployment cluster-autoscaler -n kube-system 2>/dev/null
# Verify it is gone
k get pods -n kube-system | grep autoscaler
# Should return nothing

Task 2: Install Karpenter

Section titled “Task 2: Install Karpenter”
Solution
Terminal window
# Tag your subnets for Karpenter discovery
CLUSTER_NAME="my-cluster"
SUBNET_IDS=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.resourcesVpcConfig.subnetIds[]' --output text)
for SUBNET in $SUBNET_IDS; do
  aws ec2 create-tags --resources $SUBNET \
    --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
done
# Tag the cluster security group
CLUSTER_SG=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
aws ec2 create-tags --resources $CLUSTER_SG \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
# Install Karpenter from the official OCI registry
# (the legacy charts.karpenter.sh Helm repo does not host v1.x charts)
CLUSTER_ENDPOINT=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.endpoint' --output text)
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --set settings.clusterName=$CLUSTER_NAME \
  --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
  --version 1.1.0 \
  --wait
# Verify Karpenter is running
k get pods -n kube-system -l app.kubernetes.io/name=karpenter
k logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=20

Task 3: Create the General Purpose NodePool

Section titled “Task 3: Create the General Purpose NodePool”
Solution
Terminal window
cat <<'EOF' | k apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 80Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  tags:
    ManagedBy: karpenter
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge"]
      expireAfter: 720h
  limits:
    cpu: "200"
    memory: 800Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
  weight: 50
EOF
echo "General purpose NodePool created."
k get nodepool general-purpose

Task 4: Create the Spot Batch NodePool

Section titled “Task 4: Create the Spot Batch NodePool”
Solution
Terminal window
cat <<'EOF' | k apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    metadata:
      labels:
        workload-type: batch
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
      expireAfter: 168h # Rotate Spot nodes weekly
  limits:
    cpu: "500"
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
  weight: 100
EOF
echo "Spot batch NodePool created."
k get nodepools

Task 5: Run a Batch Processing Job on Spot

Section titled “Task 5: Run a Batch Processing Job on Spot”
Solution
Terminal window
k create namespace batch
cat <<'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: video-encode-batch
  namespace: batch
  labels:
    team: media
    cost-center: cc-5678
spec:
  parallelism: 10
  completions: 50
  backoffLimit: 5
  template:
    metadata:
      labels:
        app: video-encoder
        team: media
        cost-center: cc-5678
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: spot
      terminationGracePeriodSeconds: 90
      containers:
        - name: encoder
          image: busybox:latest
          command:
            - /bin/sh
            - -c
            - |
              echo "Starting video encoding job..."
              echo "Processing batch item $((RANDOM % 1000))"
              sleep $((30 + RANDOM % 60))
              echo "Encoding complete."
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "echo 'Saving progress...' && sleep 5"]
      restartPolicy: OnFailure
EOF
# Watch Karpenter provision Spot nodes
echo "Watching Karpenter logs for node provisioning..."
k logs -n kube-system -l app.kubernetes.io/name=karpenter -f --tail=5 &
LOG_PID=$!
# Watch pods scheduling
k get pods -n batch -w &
POD_PID=$!
# Wait a moment then check what Karpenter provisioned
sleep 30
k get nodes -l karpenter.sh/capacity-type=spot \
  -o custom-columns='NAME:.metadata.name,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,CAPACITY:.metadata.labels.karpenter\.sh/capacity-type'
# Clean up background processes
kill $LOG_PID $POD_PID 2>/dev/null

Task 6: Install OpenCost and View Cost Allocation

Section titled “Task 6: Install OpenCost and View Cost Allocation”
Solution
Terminal window
# Install OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.exporter.defaultClusterId=my-cluster \
  --set opencost.ui.enabled=true
# Wait for OpenCost to be ready
k wait --for=condition=Ready pods -l app.kubernetes.io/name=opencost -n opencost --timeout=120s
# Port-forward the API (9003) and UI (9090) ports
k port-forward -n opencost svc/opencost 9003 9090 &
echo "OpenCost UI available at: http://localhost:9090"
# Query cost allocation via the API (-G sends the -d pairs as GET query parameters)
curl -sG http://localhost:9003/allocation/compute \
  -d window=1d \
  -d aggregate=namespace \
  -d accumulate=true | jq '.data[0] | to_entries[] | {namespace: .key, totalCost: .value.totalCost}'
# Check cost by team label
curl -sG http://localhost:9003/allocation/compute \
  -d window=1d \
  -d aggregate=label:team \
  -d accumulate=true | jq '.data[0] | to_entries[] | {team: .key, totalCost: .value.totalCost}'

Clean Up

Section titled “Clean Up”
Terminal window
# Delete the Job before its namespace
k delete job video-encode-batch -n batch 2>/dev/null
k delete namespace batch
helm uninstall opencost -n opencost
k delete namespace opencost
k delete nodepool spot-batch general-purpose
k delete ec2nodeclass default
helm uninstall karpenter -n kube-system
# Karpenter-managed nodes are terminated automatically when their NodePools are deleted

Exercise Checklist

  • I removed Cluster Autoscaler (if present) before installing Karpenter
  • I installed Karpenter and created an EC2NodeClass with IMDSv2 enforcement
  • I created a general-purpose NodePool with On-Demand instances and consolidation
  • I created a Spot batch NodePool with taints to isolate batch workloads
  • I ran a batch Job with 10 parallel pods on Spot instances
  • I verified Karpenter provisioned Spot nodes within 90 seconds
  • I installed OpenCost and viewed cost allocation by namespace and team label
  • I can explain why Karpenter is faster than Cluster Autoscaler and how Spot fallback works

You have completed the EKS Deep Dive series. You now understand EKS architecture, networking, identity, storage, and production operations. To continue your cloud Kubernetes journey, explore the AKS Deep Dive or GKE Deep Dive series, or compare all three providers with the Hyperscaler Rosetta Stone.