Module 5.5: EKS Production: Scaling, Observability & Cost
Complexity: [COMPLEX] | Time to Complete: 3h | Prerequisites: Module 5.1 (EKS Architecture), Module 5.2 (EKS Networking)
What You’ll Be Able to Do
After completing this module, you will be able to:
- Configure Karpenter for intelligent, constraint-based node autoscaling that optimizes cost and bin-packing on EKS
- Implement EKS observability with CloudWatch Container Insights, AWS Distro for OpenTelemetry, and Prometheus
- Deploy cost optimization strategies combining Spot instances, Savings Plans, and right-sizing for EKS workloads
- Design production EKS upgrade runbooks that safely roll clusters through Kubernetes version upgrades
Why This Module Matters
In September 2023, a video streaming company running on EKS launched a new series that went viral. Their backend scaled from 200 pods to 1,800 pods in under 20 minutes. The Horizontal Pod Autoscaler did its job — it created the pods. But the Cluster Autoscaler did not keep up. It took 8 minutes to detect the need for new nodes, 3 minutes to provision them through the Auto Scaling Group, and another 2 minutes for the nodes to join the cluster and become Ready. By the time capacity caught up, users had experienced 13 minutes of degraded service: buffering, failed playback starts, and error pages. Their social media mentions during that window were overwhelmingly negative.
The next quarter, they replaced Cluster Autoscaler with Karpenter. During a similar traffic spike, Karpenter detected the unschedulable pods within seconds, calculated the optimal instance types, and launched nodes directly through the EC2 Fleet API. New capacity was available in under 90 seconds. But scaling is only one pillar of production readiness. Without observability, you cannot diagnose what went wrong during the 13-minute degradation. Without cost management, that 1,800-pod spike on On-Demand instances cost $12,000 for a single day — when Spot instances would have handled the stateless video encoding pods at 70% less.
In this module, you will master Karpenter for intelligent, fast node provisioning. You will learn how to orchestrate Spot instances safely. You will configure control plane logging, Container Insights, and Prometheus-based monitoring. And you will implement cost allocation with Kubecost and OpenCost to finally answer the question: “How much does each team actually spend on Kubernetes?”
Karpenter: Next-Generation Node Provisioning
Karpenter is an open-source, high-performance Kubernetes node provisioner built by AWS. It replaces the Cluster Autoscaler with a fundamentally different approach: instead of scaling existing Auto Scaling Groups, Karpenter provisions individual nodes by directly calling the EC2 Fleet API based on the exact requirements of pending pods.
Karpenter vs Cluster Autoscaler
Cluster Autoscaler Workflow: Pod Pending → CA checks ASG configs → Selects ASG → Increments desired capacity → ASG launches instance → Node joins cluster → Pod scheduled. Time: 3-10 minutes

Karpenter Workflow: Pod Pending → Karpenter evaluates pod requirements → Calls EC2 Fleet API directly → Instance launches → Node joins cluster → Pod scheduled. Time: 30-90 seconds

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning speed | 3-10 minutes | 30-90 seconds |
| Instance selection | Fixed per ASG (you pre-define) | Dynamic (evaluates pod needs per launch) |
| Bin packing | Limited (works within ASG constraints) | Optimized (selects exact instance type for pending pods) |
| Scale-down | Scans for underutilized nodes periodically | Continuous consolidation (TTL, empty node, and underutilization) |
| Spot handling | Via ASG mixed instances policy | Native Spot support with automatic fallback |
| Node groups | Required (one per instance type mix) | Not required (NodePools define constraints, not specific groups) |
| Maintenance | ASG + Launch Template management | NodePool + EC2NodeClass CRDs |
Installing Karpenter
```bash
# Install Karpenter using Helm
# Install Karpenter v1.x from OCI registry (charts.karpenter.sh is deprecated)
# Karpenter needs specific IAM roles and instance profiles
# (Simplified here -- see Karpenter docs for full IAM setup)
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --set settings.clusterName=my-cluster \
  --set settings.clusterEndpoint=$(aws eks describe-cluster --name my-cluster --query 'cluster.endpoint' --output text) \
  --set settings.isolatedVPC=false \
  --version 1.1.0 \
  --wait
```

NodePool: What Karpenter Can Provision
A NodePool defines the constraints and preferences for the nodes Karpenter creates. Think of it as a menu of acceptable instance types, AZs, and architectures.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        team: platform
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]          # General, compute, memory
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]                    # Only 6th gen and newer
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      expireAfter: 720h                    # Force node rotation every 30 days
  limits:
    cpu: "1000"                            # Max 1000 vCPUs across all nodes
    memory: 4000Gi                         # Max 4 TiB memory
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
  weight: 50                               # Priority vs other NodePools
```

EC2NodeClass: How Nodes Are Configured
The EC2NodeClass defines the AWS-specific configuration for nodes: AMI, subnets, security groups, instance profile, and user data.
```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole
  amiSelectorTerms:
    - alias: al2023@latest        # Amazon Linux 2023, auto-updated
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1    # IMDSv2 enforcement
    httpTokens: required          # Require IMDSv2
  tags:
    Environment: production
    ManagedBy: karpenter
```

How Karpenter Selects Instance Types
When pods are pending, Karpenter evaluates their resource requests against the NodePool constraints and selects the cheapest instance type that fits. This is called “bin packing.”
```text
Pending Pods:
  Pod A: requests 2 CPU, 4Gi memory
  Pod B: requests 1 CPU, 8Gi memory
  Pod C: requests 4 CPU, 4Gi memory

Karpenter evaluates:
  Option 1: 3x m6i.large   (2 CPU, 8Gi each) = $0.288/hr → 1 pod per node
  Option 2: 1x m6i.2xlarge (8 CPU, 32Gi)     = $0.192/hr → all 3 pods on 1 node
  Option 3: 1x c6i.2xlarge (8 CPU, 16Gi)     = $0.170/hr → all 3 pods, tighter fit

Karpenter selects Option 3 (cheapest that satisfies all pod requirements)
```

Karpenter Disruption and Consolidation
Karpenter continuously evaluates whether the current node fleet is optimal. It consolidates by:
- Empty node removal: If a node has no non-DaemonSet pods, remove it after `consolidateAfter`
- Underutilization: If pods on a node could fit on other existing nodes, drain and terminate it
- Expiration: Force node rotation after `expireAfter` (ensures AMI freshness)
```text
Before Consolidation:                 After Consolidation:
┌───────────────┐                     ┌───────────────┐
│ Node A (25%)  │                     │ Node A (75%)  │
│ ▓░░░░░░░░░░░  │   ──────────────►   │ ▓▓▓▓▓▓▓▓░░░░  │
├───────────────┤                     └───────────────┘
│ Node B (50%)  │
│ ▓▓▓▓▓▓░░░░░░  │   (Node B drained, Node C terminated)
├───────────────┤
│ Node C (empty)│
│ ░░░░░░░░░░░░  │
└───────────────┘

3 nodes → 1 node, same pods served
```

Spot Instance Orchestration
Spot instances cost 60-90% less than On-Demand but can be reclaimed by AWS with a two-minute warning. Karpenter makes Spot orchestration significantly simpler than managing it through ASGs.
Configuring Spot in Karpenter
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
        # Diversify across many types to reduce interruption risk
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
```

Spot Best Practices
- Diversify instance types: The more instance types you allow, the more Spot capacity pools Karpenter can draw from, reducing interruption probability
- Use taints for Spot nodes: Only schedule Spot-tolerant workloads on Spot nodes
- Handle interruptions gracefully: Use PodDisruptionBudgets and termination handlers
```yaml
# PodDisruptionBudget for Spot workloads
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-processor-pdb
  namespace: batch
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: batch-processor
---
# Deployment tolerating Spot
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: batch
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: spot
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/batch-processor:latest
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5 && /app/graceful-shutdown"]
```

On-Demand Fallback Pattern
For critical workloads that must keep running even if Spot is unavailable, use a two-NodePool strategy:
```yaml
# Primary: Spot (cheap)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compute-spot
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
  weight: 100   # Higher weight = preferred
---
# Fallback: On-Demand (reliable)
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: compute-ondemand
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  weight: 10    # Lower weight = fallback only
```

Karpenter tries the higher-weight NodePool first. If Spot capacity is unavailable (InsufficientInstanceCapacity), it falls back to On-Demand automatically.
Compute Savings Plans
While Spot instances offer the deepest discounts (up to 90%), they are not suitable for stateful, singleton, or latency-sensitive workloads that cannot tolerate interruptions. For these On-Demand workloads, AWS Compute Savings Plans provide the next best optimization.
By committing to a consistent amount of compute usage (e.g., $10/hour) for a 1- or 3-year term, you can reduce On-Demand costs by up to 50%. Savings Plans apply automatically across instance families, sizes, and regions. This flexibility is critical for Kubernetes clusters using Karpenter, as the specific instance types provisioned will constantly change based on dynamic bin-packing decisions.
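Before committing, it helps to check how much of your recent On-Demand spend a plan would already cover. A minimal sketch using the Cost Explorer CLI, assuming Cost Explorer is enabled; the dates are placeholders and the exact output field names may vary by CLI version:

```bash
# Report Savings Plans coverage of eligible compute spend for one month
aws ce get-savings-plans-coverage \
  --time-period Start=2024-05-01,End=2024-06-01 \
  --granularity MONTHLY \
  --query 'SavingsPlansCoverages[].Coverage.CoveragePercentage'
```

A low coverage percentage alongside steady On-Demand usage is the signal that a commitment would pay for itself.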
Stop and think: If Karpenter provisions a Spot instance for your workload and AWS reclaims it with a two-minute warning, how does your application ensure zero downtime? (Hint: Think about PodDisruptionBudgets, replicas, and the pod lifecycle.)
Control Plane Logging
EKS control plane logs provide visibility into API server activity, authentication events, and scheduler decisions. These logs are essential for security auditing and troubleshooting.
Enabling Control Plane Logs
```bash
aws eks update-cluster-config --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
```

| Log Type | What It Contains | When to Use |
|---|---|---|
| `api` | API server request/response logs | Debugging API errors, rate limiting |
| `audit` | Who did what, when (all API calls) | Security compliance, forensics |
| `authenticator` | Authentication decisions (IAM → K8s RBAC) | Troubleshooting access denied errors |
| `controllerManager` | Controller loops (ReplicaSet, Deployment) | Why pods are not being created |
| `scheduler` | Scheduling decisions and failures | Why pods are Pending |
Logs are delivered to CloudWatch Logs under the group /aws/eks/my-cluster/cluster. They can be queried using CloudWatch Logs Insights:
```
-- Find all failed API calls in the last hour
fields @timestamp, verb, requestURI, responseStatus.code, user.username
| filter responseStatus.code >= 400
| sort @timestamp desc
| limit 50

-- Find who deleted a specific pod
fields @timestamp, verb, requestURI, user.username, sourceIPs.0
| filter verb = "delete" and requestURI like "/api/v1/namespaces/production/pods/"
| sort @timestamp desc
| limit 20
```

Cost warning: Control plane logs can be expensive at scale. An active cluster generating 50 GB/day of audit logs costs ~$25/day in CloudWatch ingestion alone. Consider enabling only `audit` and `authenticator` by default, adding the others temporarily for debugging.
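For example, the same `update-cluster-config` call can keep the two always-on log types enabled while explicitly disabling the rest:

```bash
aws eks update-cluster-config --name my-cluster \
  --logging '{"clusterLogging":[{"types":["audit","authenticator"],"enabled":true},{"types":["api","controllerManager","scheduler"],"enabled":false}]}'
```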
Container Insights: Node and Pod Metrics
Amazon CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from containerized workloads. It provides dashboards for cluster, node, pod, and container-level metrics.
Enabling Container Insights
```bash
# Install via the EKS add-on
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name amazon-cloudwatch-observability \
  --service-account-role-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/CloudWatchObservabilityRole
```

Container Insights deploys a CloudWatch Agent DaemonSet that collects:
- Cluster metrics: Node count, pod count, CPU/memory utilization
- Node metrics: CPU, memory, disk, network per node
- Pod metrics: CPU, memory, network per pod
- Container metrics: Resource usage per container within a pod
Key CloudWatch Metrics for EKS
| Metric | Namespace | What to Alert On |
|---|---|---|
| `node_cpu_utilization` | ContainerInsights | > 80% sustained |
| `node_memory_utilization` | ContainerInsights | > 85% sustained |
| `pod_cpu_utilization_over_pod_limit` | ContainerInsights | > 90% (throttling imminent) |
| `node_filesystem_utilization` | ContainerInsights | > 80% (disk pressure) |
| `cluster_failed_node_count` | ContainerInsights | > 0 |
| `pod_status_phase` (Pending) | ContainerInsights | > 0 for > 5 min |
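As a concrete starting point, the first row of this table can be turned into an alarm with a single CLI call. A sketch, assuming Container Insights is already publishing cluster-level metrics; the SNS topic ARN is a placeholder:

```bash
# Alarm when average node CPU stays above 80% for 15 minutes (3 x 5-minute periods)
aws cloudwatch put-metric-alarm \
  --alarm-name eks-node-cpu-high \
  --namespace ContainerInsights \
  --metric-name node_cpu_utilization \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:eks-alerts
```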
Container Insights Cost Considerations
Container Insights sends custom metrics to CloudWatch, which charges $0.30 per metric per month for the first 10,000 metrics. A 50-node cluster with 500 pods easily generates 5,000+ metrics. Estimate $1,500-3,000/month for Container Insights on a medium-sized cluster. For cost-sensitive environments, consider Prometheus + Grafana as an alternative (covered next).
Prometheus Integration
For teams that want full control over their metrics pipeline and lower costs at scale, self-managed Prometheus (or Amazon Managed Prometheus) is the standard choice.
Amazon Managed Prometheus (AMP)
AMP is a serverless, Prometheus-compatible monitoring service. You do not manage servers, storage, or scaling.
```bash
# Create a workspace
WORKSPACE_ID=$(aws amp create-workspace \
  --alias eks-production \
  --query 'workspaceId' --output text)

echo "Workspace: $WORKSPACE_ID"
echo "Endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID"
```

Deploying Prometheus to Scrape EKS Metrics
```bash
# Install the Prometheus stack using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# For self-managed Prometheus + Grafana:
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.resources.requests.cpu=500m \
  --set prometheus.prometheusSpec.resources.requests.memory=2Gi \
  --set grafana.enabled=true \
  --set grafana.adminPassword=DojoGrafana2024

# For remote-write to Amazon Managed Prometheus:
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.remoteWrite[0].url="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write" \
  --set prometheus.prometheusSpec.remoteWrite[0].sigv4.region=us-east-1
```

Essential PromQL Queries for EKS
```
# CPU usage by namespace (percentage of requests)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)
  / sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace)
  * 100

# Memory usage vs limits (OOM risk indicator)
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)
  / sum(kube_pod_container_resource_limits{resource="memory"}) by (namespace, pod)
  * 100

# Pod restart rate (signals instability)
sum(rate(kube_pod_container_status_restarts_total[1h])) by (namespace, pod) > 0

# Node not Ready duration
sum(kube_node_status_condition{condition="Ready", status="true"} == 0) by (node)

# Karpenter provisioning latency (p99)
histogram_quantile(0.99, sum(rate(karpenter_provisioner_scheduling_duration_seconds_bucket[5m])) by (le))
```

Pause and predict: You notice that `kube_pod_container_status_restarts_total` is rapidly increasing for your core API namespace, but `node_cpu_utilization` is completely normal. What might be causing the pods to restart if it isn’t node-level resource starvation? (Hint: Think about memory limits, liveness probes, or application-level crashes.)
Cost Allocation with Kubecost and OpenCost
Kubernetes makes it remarkably difficult to answer “how much does team X’s workload cost?” Resources are shared across nodes, and a single node might run pods from five different teams. Kubecost and OpenCost solve this by allocating infrastructure costs to individual namespaces, deployments, and labels.
OpenCost (Open Source)
OpenCost is the CNCF-backed open-source standard for Kubernetes cost monitoring.
```bash
# Install OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.exporter.defaultClusterId=my-cluster \
  --set opencost.exporter.aws.spot_data_region=us-east-1 \
  --set opencost.exporter.aws.spot_data_bucket=my-spot-pricing-bucket \
  --set opencost.ui.enabled=true
```

Kubecost (Commercial + Free Tier)
Kubecost provides a richer UI and recommendations on top of the OpenCost engine.
```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="" \
  --set prometheus.server.global.external_labels.cluster_id=my-cluster
```

Cost Allocation Strategies
The key to effective cost allocation is consistent labeling:
```yaml
# Enforce cost-tracking labels on all pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
  labels:
    app: payment-service
    team: payments-team        # Who owns this?
    environment: production    # Prod vs staging vs dev
    cost-center: cc-1234       # Finance tracking code
spec:
  template:
    metadata:
      labels:
        app: payment-service
        team: payments-team
        environment: production
        cost-center: cc-1234
```

Enforce labels using admission controllers:
```yaml
# Kyverno policy to require cost labels
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - DaemonSet
      validate:
        message: "All workloads must have 'team' and 'cost-center' labels."
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
```

Reading a Cost Report
```text
Monthly Cost Breakdown (Example):
┌──────────────────────────────────────────────────────────────┐
│ Team          │ Namespace    │ CPU Cost │ Mem Cost │ Total   │
├──────────────────────────────────────────────────────────────┤
│ payments      │ payments     │ $1,240   │ $890     │ $2,130  │
│ search        │ search       │ $3,100   │ $2,450   │ $5,550  │
│ data-pipeline │ batch        │ $890     │ $1,200   │ $2,090  │
│ platform      │ monitoring   │ $420     │ $680     │ $1,100  │
│ (idle)        │ (unallocated)│ $1,800   │ $2,300   │ $4,100  │
├──────────────────────────────────────────────────────────────┤
│ TOTAL CLUSTER                                        $14,970  │
└──────────────────────────────────────────────────────────────┘

Key insight: $4,100 (27%) is idle/unallocated — this is your optimization target.
```

The “idle” cost represents resources (CPU, memory) that are provisioned but not requested by any pod. Reducing idle cost through better bin packing (Karpenter consolidation), right-sizing pod requests, and Spot adoption is the highest-leverage cost optimization.
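If you already run the Prometheus stack from earlier, you can approximate idle capacity yourself before reaching for a cost tool. A sketch, assuming kube-state-metrics is installed (it ships with kube-prometheus-stack):

```
# Cluster-wide CPU cores that are allocatable but not requested by any pod (approximate idle CPU)
sum(kube_node_status_allocatable{resource="cpu"})
  - sum(kube_pod_container_resource_requests{resource="cpu"})
```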
EKS Upgrade Runbooks
Kubernetes releases minor versions three times a year, and AWS supports each EKS version for 14 months (with extended support available at an additional cost). Falling behind means risking security vulnerabilities and losing community support. A production upgrade runbook ensures safe, zero-downtime cluster upgrades.
The Standard Upgrade Sequence
- Pre-flight API Checks: Verify that no deprecated or removed Kubernetes APIs are in use by your manifests or Helm charts (using open-source tools like `pluto` or `kubent`). Update your CI/CD pipelines to deploy the new API versions.
- Control Plane Upgrade: Trigger the EKS control plane upgrade via the AWS API, CLI, or Terraform. This process takes 10-20 minutes. AWS manages a highly available control plane, so API requests may experience slight latency, but running pods are unaffected.
- Core Add-on Upgrades: Update the Amazon VPC CNI, CoreDNS, and kube-proxy add-ons to the specific versions recommended for your new Kubernetes control plane version.
- Data Plane (Node) Rotation (a condensed command sketch follows this list):
  - With Karpenter: If using an AMI alias like `al2023@latest` in your `EC2NodeClass`, Karpenter automatically discovers the latest EKS Optimized AMI for the new control plane version. You can trigger rotation by setting `expireAfter: 1m` temporarily or using Karpenter’s node drift feature, which automatically replaces nodes when the underlying AMI changes.
  - With Managed Node Groups: Trigger a rolling update. EKS will automatically cordon, drain, and replace nodes one by one.
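A hedged sketch of that sequence at the command line; the cluster name, node group name, and target version 1.30 are placeholders, and add-on versions should be taken from `describe-addon-versions` rather than hard-coded:

```bash
# 1. Pre-flight: scan the cluster for APIs removed in the target version (Fairwinds Pluto)
pluto detect-all-in-cluster --target-versions k8s=v1.30.0

# 2. Upgrade the control plane (one minor version per upgrade)
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.30
aws eks wait cluster-active --name my-cluster

# 3. Upgrade core add-ons to a version compatible with the new control plane
ADDON_VERSION=$(aws eks describe-addon-versions --addon-name vpc-cni \
  --kubernetes-version 1.30 \
  --query 'addons[0].addonVersions[0].addonVersion' --output text)
aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
  --addon-version $ADDON_VERSION --resolve-conflicts PRESERVE
# (repeat for coredns and kube-proxy)

# 4. Data plane: Karpenter drift-replaces nodes automatically with an AMI alias;
#    for managed node groups, trigger a rolling update instead
aws eks update-nodegroup-version --cluster-name my-cluster --nodegroup-name general
```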
Stop and think: Why must you upgrade the EKS control plane before upgrading the worker nodes? (Hint: The Kubernetes version skew policy dictates compatibility rules between the kube-apiserver and the kubelet running on nodes.)
Did You Know?
- Karpenter evaluates over 600 EC2 instance types when deciding what to launch. For each set of pending pods, it simulates packing them onto every compatible instance type, calculates the cost, and selects the cheapest option. This evaluation happens in milliseconds thanks to an in-memory instance type database that Karpenter refreshes from the EC2 pricing API every 6 hours.
- EKS control plane audit logs record every single API request, including the request body and response. For a cluster with 500 pods and active HPA/VPA controllers, the audit log volume can reach 10-15 GB per day. At CloudWatch’s $0.50/GB ingestion rate, that is $5-7.50/day or $150-225/month just for audit log ingestion. Many teams filter audit logs to only capture write operations and authentication events, reducing volume by 80%.
- Spot instance interruption rates vary dramatically by instance type and region. In 2024 data, the `m5.xlarge` in us-east-1 had a roughly 5% monthly interruption rate, while the `m6i.xlarge` in the same region was under 3%. By diversifying across 15-20 instance types and 3 AZs, you can achieve an effective interruption rate below 2% per node per month, making Spot viable even for user-facing services behind proper redundancy.
- Kubecost analysis across thousands of Kubernetes clusters found that the average cluster utilization (actual CPU/memory usage vs provisioned) is only 20-35%. This means 65-80% of infrastructure spend is wasted on idle resources. The three highest-impact optimizations are: (1) right-sizing pod resource requests using VPA recommendations, (2) enabling Karpenter consolidation to remove underutilized nodes, and (3) using Spot instances for stateless and fault-tolerant workloads.
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Running Cluster Autoscaler and Karpenter simultaneously | Migrating incrementally but forgetting to remove the old autoscaler. Both fight over node lifecycle. | Remove Cluster Autoscaler completely before deploying Karpenter. They are mutually exclusive. |
| Karpenter NodePool with only one instance type | Over-constraining instance requirements, defeating Karpenter’s instance flexibility advantage. | Allow at least 10-15 instance types across multiple families (m, c, r) and generations. More diversity means better availability and pricing. |
| No `expireAfter` on NodePool | Nodes run indefinitely with stale AMIs and potential security vulnerabilities. | Set `expireAfter: 720h` (30 days) to force rotation. Karpenter gracefully drains nodes before termination. |
| Spot for stateful workloads without replication | Assuming Spot interruption will not happen. A database on a Spot node loses its EBS volume attachment during interruption. | Use On-Demand for databases and single-point-of-failure workloads. Spot is for stateless, redundant, or batch workloads only. |
| Enabling all 5 control plane log types permanently | “We might need them someday.” Meanwhile paying $200/month for logs nobody reads. | Enable `audit` and `authenticator` always. Enable `api`, `controllerManager`, and `scheduler` only when actively debugging. |
| No resource requests on pods | Developers skip requests “to keep it simple.” Karpenter cannot bin-pack without knowing pod resource needs. | Require resource requests via admission controller (Kyverno/OPA). Use VPA recommendations to determine appropriate values. |
| Cost allocation without label enforcement | Labels are optional, most teams do not add them, and 60% of cost is “unallocated.” | Enforce team and cost-center labels using admission policies. Reject deployments that lack required labels. |
| Ignoring idle cost in Kubecost | Focusing on per-team costs while ignoring the 30%+ of cluster spend that is idle. | Review idle cost weekly. Right-size pod requests, enable Karpenter consolidation, and set aggressive consolidateAfter values. |
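Two of the mistakes above (missing resource requests and missing cost labels) can be rejected at admission time. A minimal Kyverno sketch in the same style as the cost-label policy earlier; the policy name is ours, and the pattern checks only the main `containers` list, not initContainers:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-requests
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-requests
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Every container must set CPU and memory requests."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
```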
Question 1: During a massive traffic spike, your application needs 50 new nodes. With Cluster Autoscaler, this took 8 minutes, causing an outage. You switch to Karpenter and it takes 45 seconds. What architectural difference allows Karpenter to provision nodes so much faster in this scenario?
Cluster Autoscaler works by adjusting the desired capacity of existing Auto Scaling Groups. It must: (1) detect pending pods, (2) evaluate which ASG to scale, (3) increment the ASG desired count, (4) wait for the ASG to launch an instance. This involves multiple AWS API calls with propagation delays. Karpenter bypasses ASGs entirely and calls the EC2 Fleet API directly, requesting a specific instance type in a specific subnet. It also pre-calculates the optimal instance type based on pending pod requirements, eliminating the trial-and-error of ASG scaling. The result is 30-90 seconds vs 3-10 minutes.
Question 2: You configure a Karpenter NodePool for a batch processing job and include both `spot` and `on-demand` in the capacity-type requirements. When 100 pending batch pods appear, how does Karpenter decide whether to launch Spot or On-Demand instances to fulfill the request?
Karpenter evaluates the cost of both capacity types for each pending pod batch. It first attempts to provision Spot instances because they are cheaper. If Spot capacity is unavailable for the requested instance types (InsufficientInstanceCapacity error from EC2), Karpenter automatically falls back to On-Demand. You can influence this behavior using NodePool weight — create a Spot-preferred NodePool with weight 100 and an On-Demand fallback NodePool with weight 10. Karpenter tries higher-weight NodePools first. Pods can also force On-Demand using nodeSelector: {karpenter.sh/capacity-type: on-demand}.
Question 3: Your cluster has 30% idle cost according to Kubecost. What three actions would have the highest impact on reducing this waste?
(1) Right-size pod resource requests. Use Vertical Pod Autoscaler (VPA) in recommendation mode to analyze actual usage and reduce over-provisioned requests. If a pod requests 2 CPU but only uses 0.3 CPU on average, reducing the request to 0.5 CPU frees 1.5 CPU of capacity. (2) Enable Karpenter consolidation. Set consolidationPolicy: WhenEmptyOrUnderutilized with a short consolidateAfter (30s). Karpenter will continuously pack pods onto fewer, better-utilized nodes and terminate empty or underutilized ones. (3) Use Spot instances for stateless workloads. Spot prices are 60-90% lower than On-Demand. Move all fault-tolerant workloads (web frontends, batch jobs, CI/CD runners) to Spot, keeping On-Demand only for databases and critical singletons.
Question 4: You enabled all 5 EKS control plane log types and your CloudWatch bill spiked by $300/month. Which log types should you keep enabled permanently, and which should be on-demand only?
Keep audit and authenticator logs enabled permanently. Audit logs are essential for security compliance — they record who performed what action and when, and you cannot retroactively generate them. Authenticator logs help troubleshoot IAM-to-RBAC mapping issues. The api, controllerManager, and scheduler logs should be enabled only when actively debugging issues. API logs in particular are extremely high-volume (every GET, LIST, WATCH request is logged). Enable them temporarily, diagnose the issue, then disable them.
Question 5: A Spot instance running 8 pods receives a two-minute interruption notice. What happens to the pods, and how should your application handle this?
When a Spot interruption notice arrives, Karpenter (or the AWS Node Termination Handler if using Cluster Autoscaler) marks the node as unschedulable (cordons it) and begins draining pods. Kubernetes respects PodDisruptionBudgets during the drain, evicting pods gracefully. Each pod receives a SIGTERM signal and has until its terminationGracePeriodSeconds (default 30 seconds) to shut down cleanly. Applications should: (1) handle SIGTERM by finishing in-flight requests, (2) use a preStop lifecycle hook for cleanup, (3) have terminationGracePeriodSeconds set appropriately (no more than 90 seconds for Spot), and (4) be designed for redundancy so that losing one pod does not cause user-visible errors.
Question 6: Your startup just launched and needs robust EKS monitoring. The platform team is debating between enabling CloudWatch Container Insights or deploying a self-managed Prometheus stack. What scenario or cluster characteristics would make Prometheus the better choice?
Container Insights sends metrics to CloudWatch as custom metrics. It provides pre-built dashboards, integrates with CloudWatch Alarms, and requires no infrastructure to run (just the agent DaemonSet). However, it charges per-metric ($0.30/metric/month) and can become expensive for large clusters ($1,500-3,000/month). Prometheus is an open-source metrics system that stores metrics locally (or in AMP) with powerful PromQL querying. It is free to run (self-managed) or lower-cost (AMP charges $0.03/10M samples). Use Container Insights for small clusters (under 20 nodes) or teams that want zero-ops monitoring. Use Prometheus for larger clusters, teams that need custom metrics, or organizations with existing Grafana dashboards.
Question 7: A developer deploys a new memory-intensive application to your Karpenter-managed EKS cluster but forgets to define CPU and memory requests in the pod spec. What will happen when Karpenter attempts to provision capacity for these pods?
Karpenter uses pod resource requests (not limits) to determine what instance type to provision. If a pod has no CPU or memory request, Karpenter assumes it needs zero resources and may pack it onto an already-full node, leading to severe resource contention and OOM (Out of Memory) kills. Without accurate requests, Karpenter cannot effectively bin-pack pods—it might provision the wrong instance sizes, wasting money. Furthermore, the Kubernetes scheduler treats pods without requests as “BestEffort”, making them the first to be evicted under node pressure. Always set resource requests that reflect actual usage.
Hands-On Exercise: Replace Cluster Autoscaler with Karpenter + Spot Batch Processing
In this exercise, you will deploy Karpenter, create NodePools for general and Spot workloads, run a batch processing job on Spot instances, and set up basic cost visibility.
What you will build:
```text
┌────────────────────────────────────────────────────────────────┐
│ EKS Cluster with Karpenter                                      │
│                                                                 │
│  NodePool: general-purpose                                      │
│  ┌──────────────────────────────────────┐                       │
│  │ On-Demand m6i/c6i instances          │                       │
│  │ Web frontends, APIs                  │                       │
│  └──────────────────────────────────────┘                       │
│                                                                 │
│  NodePool: spot-batch                                           │
│  ┌──────────────────────────────────────┐                       │
│  │ Spot instances (diverse types)       │                       │
│  │ Batch processing jobs                │                       │
│  │ Tainted: spot=true:NoSchedule        │                       │
│  └──────────────────────────────────────┘                       │
│                                                                 │
│  OpenCost: Cost allocation dashboard                            │
└────────────────────────────────────────────────────────────────┘
```

Task 1: Remove Cluster Autoscaler (If Present)
Solution
```bash
# Check if Cluster Autoscaler is running
k get deployment cluster-autoscaler -n kube-system 2>/dev/null

# If present, remove it
helm uninstall cluster-autoscaler -n kube-system 2>/dev/null || \
  k delete deployment cluster-autoscaler -n kube-system 2>/dev/null

# Verify it is gone
k get pods -n kube-system | grep autoscaler
# Should return nothing
```

Task 2: Install Karpenter
Solution
```bash
# Tag your subnets for Karpenter discovery
CLUSTER_NAME="my-cluster"
SUBNET_IDS=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.resourcesVpcConfig.subnetIds[]' --output text)

for SUBNET in $SUBNET_IDS; do
  aws ec2 create-tags --resources $SUBNET \
    --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
done

# Tag the cluster security group
CLUSTER_SG=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)
aws ec2 create-tags --resources $CLUSTER_SG \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME

# Install Karpenter via Helm (v1.x is published to the OCI registry;
# charts.karpenter.sh is deprecated)
CLUSTER_ENDPOINT=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query 'cluster.endpoint' --output text)

helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace kube-system \
  --set settings.clusterName=$CLUSTER_NAME \
  --set settings.clusterEndpoint=$CLUSTER_ENDPOINT \
  --version 1.1.0 \
  --wait

# Verify Karpenter is running
k get pods -n kube-system -l app.kubernetes.io/name=karpenter
k logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=20
```

Task 3: Create the General Purpose NodePool
Solution
```bash
cat <<'EOF' | k apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 80Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  tags:
    ManagedBy: karpenter
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge"]
      expireAfter: 720h
  limits:
    cpu: "200"
    memory: 800Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
  weight: 50
EOF

echo "General purpose NodePool created."
k get nodepool general-purpose
```

Task 4: Create the Spot Batch NodePool
Solution
```bash
cat <<'EOF' | k apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-batch
spec:
  template:
    metadata:
      labels:
        workload-type: batch
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
      expireAfter: 168h   # Rotate weekly for Spot
  limits:
    cpu: "500"
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
  weight: 100
EOF

echo "Spot batch NodePool created."
k get nodepools
```

Task 5: Run a Batch Processing Job on Spot
Solution
```bash
k create namespace batch

cat <<'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: video-encode-batch
  namespace: batch
  labels:
    team: media
    cost-center: cc-5678
spec:
  parallelism: 10
  completions: 50
  backoffLimit: 5
  template:
    metadata:
      labels:
        app: video-encoder
        team: media
        cost-center: cc-5678
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        karpenter.sh/capacity-type: spot
      terminationGracePeriodSeconds: 90
      containers:
        - name: encoder
          image: busybox:latest
          command:
            - /bin/sh
            - -c
            - |
              echo "Starting video encoding job..."
              echo "Processing batch item $((RANDOM % 1000))"
              sleep $((30 + RANDOM % 60))
              echo "Encoding complete."
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "echo 'Saving progress...' && sleep 5"]
      restartPolicy: OnFailure
EOF
```
```bash
# Watch Karpenter provision Spot nodes
echo "Watching Karpenter logs for node provisioning..."
k logs -n kube-system -l app.kubernetes.io/name=karpenter -f --tail=5 &
LOG_PID=$!

# Watch pods scheduling
k get pods -n batch -w &
POD_PID=$!

# Wait a moment then check what Karpenter provisioned
sleep 30
k get nodes -l karpenter.sh/capacity-type=spot \
  -o custom-columns='NAME:.metadata.name,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,AZ:.metadata.labels.topology\.kubernetes\.io/zone,CAPACITY:.metadata.labels.karpenter\.sh/capacity-type'

# Clean up background processes
kill $LOG_PID $POD_PID 2>/dev/null
```

Task 6: Install OpenCost and View Cost Allocation
Solution
```bash
# Install OpenCost
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.exporter.defaultClusterId=my-cluster \
  --set opencost.ui.enabled=true

# Wait for OpenCost to be ready
k wait --for=condition=Ready pods -l app.kubernetes.io/name=opencost -n opencost --timeout=120s

# Port-forward to access the UI
k port-forward -n opencost svc/opencost 9090:9090 &

echo "OpenCost UI available at: http://localhost:9090"

# Query cost allocation via API (GET with query parameters)
curl -sG http://localhost:9090/allocation/compute \
  -d window=1d \
  -d aggregate=namespace \
  -d accumulate=true | jq '.data[0] | to_entries[] | {namespace: .key, totalCost: .value.totalCost}'

# Check cost by team label
curl -sG http://localhost:9090/allocation/compute \
  -d window=1d \
  -d aggregate=label:team \
  -d accumulate=true | jq '.data[0] | to_entries[] | {team: .key, totalCost: .value.totalCost}'
```

Clean Up
```bash
k delete namespace batch
k delete job video-encode-batch -n batch 2>/dev/null
helm uninstall opencost -n opencost
k delete namespace opencost
k delete nodepool spot-batch general-purpose
k delete ec2nodeclass default
helm uninstall karpenter -n kube-system
# Karpenter-managed nodes will be terminated automatically when NodePools are deleted
```

Success Criteria
- I removed Cluster Autoscaler (if present) before installing Karpenter
- I installed Karpenter and created an EC2NodeClass with IMDSv2 enforcement
- I created a general-purpose NodePool with On-Demand instances and consolidation
- I created a Spot batch NodePool with taints to isolate batch workloads
- I ran a batch Job with 10 parallel pods on Spot instances
- I verified Karpenter provisioned Spot nodes within 90 seconds
- I installed OpenCost and viewed cost allocation by namespace and team label
- I can explain why Karpenter is faster than Cluster Autoscaler and how Spot fallback works
Next Module
You have completed the EKS Deep Dive series. You now understand EKS architecture, networking, identity, storage, and production operations. To continue your cloud Kubernetes journey, explore the AKS Deep Dive or GKE Deep Dive series, or compare all three providers with the Hyperscaler Rosetta Stone.