
Module 6.1: Karpenter

Toolkit Track | Complexity: [COMPLEX] | Time: 45-50 minutes

Cluster Autoscaler was fine. Karpenter is better. Instead of scaling node groups, Karpenter provisions individual nodes matched to pending pod requirements—in seconds, not minutes. This module covers Karpenter’s architecture, NodePools, and strategies for efficient autoscaling.

What You’ll Learn:

  • Karpenter architecture and how it differs from Cluster Autoscaler
  • NodePools and NodeClasses configuration
  • Consolidation and cost optimization
  • Multi-architecture and spot instance strategies

Prerequisites:

  • Kubernetes scheduling concepts
  • SRE Discipline — Capacity planning basics
  • Cloud provider fundamentals (EC2, instance types)

After completing this module, you will be able to:

  • Deploy Karpenter for intelligent Kubernetes node provisioning based on pending pod requirements
  • Configure Karpenter NodePools with instance type constraints, zones, and consolidation policies
  • Implement Karpenter’s spot instance strategies with fallback provisioning for cost-optimized clusters
  • Compare Karpenter’s pod-driven scaling against Cluster Autoscaler’s node-group approach for cost efficiency

Cluster Autoscaler thinks in node groups. “Need more capacity? Add another node from this pre-defined group.” Karpenter thinks in pods. “This pod needs 4 CPU, 8GB RAM, ARM64, and GPU? I’ll provision exactly that.” The result: faster scaling, better bin-packing, and lower costs.

💡 Did You Know? Karpenter was created by AWS and open-sourced in 2021. It can provision a node in under 60 seconds, compared to 5-10 minutes with Cluster Autoscaler. The difference is architectural: Cluster Autoscaler triggers ASG scaling and waits; Karpenter directly calls EC2 APIs to create instances.


CLUSTER AUTOSCALER (OLD WAY)
════════════════════════════════════════════════════════════════════
1. Pod pending → scheduler can't place it
2. Cluster Autoscaler detects pending pod
3. CA increases desired count in ASG
4. ASG launches new node (from fixed instance type)
5. Node joins cluster
6. Pod scheduled
Time: 5-10 minutes
Limitation: Must pre-define node groups
Problem: Instance type might not match pod needs
═══════════════════════════════════════════════════════════════════
KARPENTER (NEW WAY)
════════════════════════════════════════════════════════════════════
1. Pod pending → scheduler can't place it
2. Karpenter detects pending pod immediately
3. Karpenter analyzes pod requirements:
- CPU/memory requests
- Node selectors
- Tolerations
- Topology constraints
4. Karpenter selects optimal instance type
5. Karpenter calls EC2 API directly → node launches
6. Node joins cluster
7. Pod scheduled
Time: < 60 seconds
Advantage: Right-sized instances for actual needs
Benefit: No node groups to manage
Feature              Cluster Autoscaler        Karpenter
Scale-up time        5-10 minutes              < 60 seconds
Instance selection   Pre-defined node groups   Dynamic, per-pod
Bin-packing          Basic                     Intelligent
Spot handling        Separate ASGs             Native, mixed
Consolidation        Manual                    Automatic
Multi-arch           Separate node groups      Native
GPU workloads        Separate node groups      Native
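The pod spec itself is the scaling input. A hypothetical pending pod like the one below (name and image are illustrative) carries everything Karpenter reads when deciding what to launch — resource requests, an architecture selector, and tolerations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: analytics-worker            # illustrative name
spec:
  nodeSelector:
    kubernetes.io/arch: arm64       # steers Karpenter toward an ARM64 instance
  tolerations:
    - key: dedicated                # illustrative taint key
      operator: Equal
      value: analytics
      effect: NoSchedule
  containers:
    - name: worker
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
      resources:
        requests:
          cpu: "4"                  # Karpenter sizes the node from these requests
          memory: 8Gi
```

From these fields alone Karpenter can conclude: 4 CPU + 8Gi, ARM64, tolerates the `dedicated=analytics` taint — and pick the cheapest instance type that satisfies all three.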

KARPENTER ARCHITECTURE
════════════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────────┐
│                       KUBERNETES CLUSTER                        │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                   KARPENTER CONTROLLER                   │   │
│  │                                                          │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │ Provisioner │  │ Consolidator│  │ Disruption  │       │   │
│  │  │ (scale up)  │  │ (optimize)  │  │ (scale down)│       │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘       │   │
│  │                                                          │   │
│  └───────────────────────────┬──────────────────────────────┘   │
│                              │                                  │
│  ┌───────────────────────────▼──────────────────────────────┐   │
│  │                      KUBERNETES API                      │   │
│  │                                                          │   │
│  │  NodePool ─────▶ "What constraints?"                     │   │
│  │  EC2NodeClass ──▶ "How to launch?"                       │   │
│  │  Pending Pods ──▶ "What's needed?"                       │   │
│  └──────────────────────────────────────────────────────────┘   │
│                              │                                  │
└──────────────────────────────┼──────────────────────────────────┘
                               │
┌──────────────────────────────▼──────────────────────────────────┐
│                           AWS EC2 API                           │
│                                                                 │
│  Karpenter calls CreateFleet → EC2 launches instance            │
│  Instance registers with cluster via bootstrap script           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Component      Purpose
NodePool       Defines constraints and requirements for nodes (replaces the older Provisioner)
EC2NodeClass   AWS-specific launch configuration (AMI, subnets, security groups)
NodeClaim      Represents a request for a single node (created by Karpenter)
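Once Karpenter is installed, these are ordinary Kubernetes resources you can inspect directly. A quick sanity check (assuming the CRDs from the install step below are present):

```shell
# List Karpenter's custom resources
kubectl get nodepools
kubectl get ec2nodeclasses
kubectl get nodeclaims

# See why a specific node exists: each node maps to one NodeClaim
kubectl describe nodeclaim <nodeclaim-name>
```

The NodeClaim's status shows which instance type was chosen and why, which makes it the first place to look when provisioning behaves unexpectedly.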

# Set environment variables
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.0.1"
export CLUSTER_NAME="my-cluster"
export AWS_PARTITION="aws"
export AWS_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
# Install Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--version "${KARPENTER_VERSION}" \
--namespace "${KARPENTER_NAMESPACE}" \
--create-namespace \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
# Verify installation
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Instance types Karpenter can choose from
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"] # Compute, General, Memory optimized
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge", "2xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h # 30 days - nodes replaced to stay fresh
  limits:
    cpu: 1000      # Max 1000 vCPUs in this NodePool
    memory: 2000Gi # Max 2000 GiB of memory
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest # Amazon Linux 2023 (required in Karpenter v1)
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1 # IMDSv2 requirement
    httpTokens: required       # IMDSv2 requirement
  tags:
    Environment: production
    ManagedBy: karpenter

💡 Did You Know? Karpenter can choose from all 500+ EC2 instance types automatically. It calculates the best price-performance ratio for your specific workload requirements. You don’t need to manually select m5.large vs m5.xlarge—Karpenter does the math in real-time based on current pricing.
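You can still nudge that selection without pinning exact instance types. For example, the well-known label karpenter.k8s.aws/instance-generation lets you exclude older hardware while leaving everything else to Karpenter. A sketch of a requirement to add under a NodePool's spec.template.spec.requirements (the threshold is illustrative):

```yaml
# NodePool requirements fragment: only instances newer than generation 2
- key: karpenter.k8s.aws/instance-generation
  operator: Gt
  values: ["2"]   # e.g. allows m5/c6g, excludes m1/m2-era hardware
```

Broad constraints like this keep the candidate pool large, which matters most for spot availability.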


apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"] # GPU instance families
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Gt
          values: ["0"]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
  limits:
    nvidia.com/gpu: 100
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: graviton
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"] # Graviton versions of these families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: high-memory
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["r", "x"] # Memory-optimized families
        - key: karpenter.k8s.aws/instance-memory
          operator: Gt
          values: ["32768"] # > 32 GiB (value is in MiB)
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
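Because the gpu NodePool is tainted, only pods that tolerate the taint and request the GPU resource will land on it. A minimal sketch of such a workload (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job                 # illustrative name
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule         # matches the taint on the gpu NodePool
  containers:
    - name: trainer
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1      # triggers provisioning from the gpu pool
```

The taint keeps ordinary pods off expensive GPU hardware; the resource limit is what tells Karpenter a GPU-capable node is needed at all.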

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-first
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Wide instance type selection = better spot availability
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["nano", "micro", "small"] # Too small
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
# Karpenter prefers spot when both capacity types are allowed
# and falls back to on-demand when spot capacity is unavailable
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Options: WhenEmpty, WhenEmptyOrUnderutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m # Wait before consolidating
    # Budgets limit the disruption rate
    budgets:
      - nodes: "10%"             # At most 10% of nodes disrupted at once
      - nodes: "0"               # Block all voluntary disruption...
        schedule: "0 9 * * 1-5"  # ...starting 9 AM on weekdays
        duration: 8h             # ...for 8 hours
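Disruption budgets throttle node churn from Karpenter's side; PodDisruptionBudgets protect individual applications from losing too many replicas during drains, and Karpenter respects them. A sketch (the app label is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # keep at least 2 replicas running during node drains
  selector:
    matchLabels:
      app: my-app          # illustrative label; must match your Deployment's pods
```

With both in place, consolidation proceeds only as fast as your applications can tolerate.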
CONSOLIDATION EXAMPLE
════════════════════════════════════════════════════════════════════
BEFORE CONSOLIDATION:
─────────────────────────────────────────────────────────────────
Node 1 (m5.xlarge) Node 2 (m5.xlarge) Node 3 (m5.xlarge)
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Pod A (2 CPU) │ │ Pod C (1 CPU) │ │ Pod E (1 CPU) │
│ Pod B (1 CPU) │ │ │ │ │
│ │ │ │ │ │
│ (4 CPU total) │ │ (4 CPU total) │ │ (4 CPU total) │
│ 3 CPU used │ │ 1 CPU used │ │ 1 CPU used │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Total: 12 CPU capacity, 5 CPU used (42% utilization)
AFTER CONSOLIDATION:
─────────────────────────────────────────────────────────────────
Node 1 (m5.2xlarge)
┌──────────────────┐
│ Pod A (2 CPU)    │
│ Pod B (1 CPU)    │   5 CPU of pods won't fit on a single
│ Pod C (1 CPU)    │   m5.xlarge (4 CPU), so Karpenter replaces
│ Pod E (1 CPU)    │   Node 1 with an m5.2xlarge and
│                  │   TERMINATES Nodes 2 and 3
│ (8 CPU total)    │
│ 5 CPU used       │
└──────────────────┘
Total: 8 CPU capacity, 5 CPU used (63% utilization)
COST SAVINGS: 3x m5.xlarge → 1x m5.2xlarge ≈ 33% lower cost
              (and two fewer nodes to manage)

💡 Did You Know? Karpenter’s consolidation can save 30-50% on compute costs for typical workloads. It continuously re-evaluates the cluster as pods and nodes change, and automatically replaces underutilized nodes whenever a cheaper arrangement is found.


# Karpenter handles spot interruptions automatically
# when settings.interruptionQueue is configured:
#   1. EC2 sends an interruption notice (2-minute warning)
#   2. Karpenter cordons the node
#   3. Karpenter drains pods gracefully
#   4. A new node is provisioned for the displaced pods
#
# Your pods should handle graceful termination:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60 # Time to shut down gracefully
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 30"] # Drain connections
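For pods that must never be evicted mid-run (one-shot batch jobs, data migrations), Karpenter also honors a per-pod opt-out annotation that blocks voluntary disruption such as consolidation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job                          # illustrative one-shot job pod
  annotations:
    karpenter.sh/do-not-disrupt: "true"    # Karpenter won't voluntarily evict this pod
spec:
  containers:
    - name: job
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
```

Use it sparingly: every annotated pod pins its node and reduces how much consolidation can save.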

💡 Did You Know? Karpenter was built after AWS spent years running Cluster Autoscaler at scale and understanding its limitations. The key insight was that Cluster Autoscaler works at the node group level, but Karpenter works at the pod level—it looks at what pods need and provisions exactly the right nodes. This “just-in-time” approach is why provisioning is so fast.


Mistake                            Problem                          Solution
Too narrow instance types          Spot unavailable, higher costs   Allow many instance types
No CPU/memory limits on NodePool   Runaway scaling                  Set reasonable limits
No disruption budgets              Consolidation causes outages     Use budgets to limit churn
Single AZ                          AZ failure = total outage        Multi-AZ subnets in EC2NodeClass
No taints for specialized nodes    Wrong pods land on GPU nodes     Use taints + tolerations
IMDSv1 enabled                     Security risk                    Use httpTokens: required

A team deployed Karpenter without CPU limits. A bug in their HPA created infinite scaling. Monday morning: 2,000 nodes, $50,000 bill.

What went wrong:

  1. HPA misconfigured with wrong metric
  2. HPA kept requesting more replicas
  3. Karpenter dutifully provisioned nodes
  4. No alerts on node count or spend

The fix:

spec:
  limits:
    cpu: 500 # Hard limit on total vCPUs in this NodePool
    memory: 1000Gi
# Plus alerts:
# - Alert when node count > threshold
# - Alert when hourly spend > threshold
# - Alert when scaling rate > threshold
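The alerting half of the fix can be sketched as a Prometheus rule, assuming kube-state-metrics is being scraped (the group name, alert name, and threshold are all illustrative):

```yaml
# PrometheusRule fragment (requires Prometheus + kube-state-metrics)
groups:
  - name: karpenter-guardrails
    rules:
      - alert: NodeCountHigh
        expr: count(kube_node_info) > 100   # illustrative threshold
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Cluster has {{ $value }} nodes - possible runaway scaling"
```

The NodePool limit is the hard stop; the alert is what gets a human looking before the limit is reached.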

Why is Karpenter faster than Cluster Autoscaler?

Show Answer

Cluster Autoscaler:

  1. Detects pending pods
  2. Increases ASG desired count
  3. ASG calls EC2 to launch from launch template
  4. EC2 provisions instance
  5. Instance bootstraps, joins cluster

Steps 2-3 involve ASG reconciliation loops = slow

Karpenter:

  1. Detects pending pods
  2. Calls EC2 CreateFleet API directly
  3. EC2 provisions instance
  4. Instance bootstraps, joins cluster

Karpenter bypasses ASG entirely = faster

What’s the difference between a NodePool and EC2NodeClass?

Show Answer

NodePool (cloud-agnostic):

  • What kind of nodes are acceptable
  • Instance requirements (CPU, memory, arch)
  • Capacity limits
  • Disruption policies
  • Taints and labels

EC2NodeClass (AWS-specific):

  • How to launch nodes on AWS
  • AMI family and version
  • Subnets and security groups
  • IAM role
  • Block device configuration
  • User data/bootstrap scripts

One NodePool references one EC2NodeClass. Multiple NodePools can share an EC2NodeClass.

How does consolidation save money?

Show Answer

Consolidation identifies underutilized nodes and repacks pods:

  1. Detects nodes with low utilization
  2. Simulates where pods could move
  3. Cordons the underutilized node
  4. Drains pods to other nodes
  5. Terminates the empty node

Also:

  • Replaces larger nodes with smaller ones when possible
  • Combines multiple small nodes into fewer large ones
  • Respects disruption budgets to avoid outages

Typical savings: 30-50% compute cost reduction


Deploy Karpenter and observe dynamic node provisioning.

# For this exercise, you need an EKS cluster
# Follow AWS documentation to set up Karpenter prerequisites:
# https://karpenter.sh/docs/getting-started/
# Create basic NodePool
kubectl apply -f - <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 100
    memory: 200Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
EOF
  1. Watch Karpenter logs:

    kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f
  2. Create workload that needs new node:

    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: inflate
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: inflate
      template:
        metadata:
          labels:
            app: inflate
        spec:
          containers:
            - name: inflate
              image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
    EOF
  3. Observe node provisioning:

    kubectl get nodes -w
    # Watch for new node to appear
  4. Check NodeClaim:

    kubectl get nodeclaims
    kubectl describe nodeclaim <name>
  5. Scale down and observe consolidation:

    kubectl scale deployment inflate --replicas=0
    # Watch nodes get consolidated/terminated
    kubectl get nodes -w

Success Criteria:
  • Karpenter controller running
  • NodePool created successfully
  • New node provisioned in < 2 minutes
  • Node has correct labels (instance type, capacity type)
  • Consolidation removes empty node after scale-down

Create a separate NodePool for ARM64/Graviton instances and deploy a workload that specifically requests ARM64.
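If you get stuck, here is a possible starting point for the workload half of the bonus; the NodePool half mirrors the graviton example earlier in this module (the deployment name is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arm-inflate               # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: arm-inflate
  template:
    metadata:
      labels:
        app: arm-inflate
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64 # forces Karpenter to pick a Graviton instance
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

After applying it, check the new node's labels: kubectl get nodes -L kubernetes.io/arch,node.kubernetes.io/instance-type should show an arm64 node.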



Continue to Module 6.2: KEDA to learn event-driven autoscaling for workloads based on metrics, queues, and custom triggers.


“The best infrastructure is invisible. Karpenter makes capacity planning disappear.”