
Module 8.8: Cloud Cost Optimization (Advanced)

Complexity: [MEDIUM]

Time to Complete: 2 hours

Prerequisites: Basic understanding of Kubernetes resource requests/limits and cloud billing concepts

Track: Advanced Cloud Operations

After completing this module, you will be able to:

  • Implement FinOps practices with cloud-native cost allocation tagging, showback, and chargeback mechanisms
  • Configure Kubernetes cost visibility using Kubecost, OpenCost, or cloud-native cost tools across multi-cluster environments
  • Optimize compute costs using reserved instances, committed use discounts, Spot/preemptible instances, and right-sizing
  • Design automated cost anomaly detection and budget alerting pipelines that trigger remediation actions

Q1 2024. A Series C startup. $8 million annual cloud spend.

The CFO called an all-hands meeting. Cloud costs had grown 340% year-over-year while revenue grew 180%. The engineering team had no visibility into which teams, services, or features drove the cost. The finance team’s cloud bill showed 12,000 line items per month. When the VP of Engineering was asked “how much does the recommendation engine cost?”, the honest answer was “we have no idea.”

Three months of forensic analysis revealed: 38% of EC2 instances were running at under 10% CPU utilization. The company was paying on-demand prices for workloads that ran 24/7 (perfect candidates for reserved instances or savings plans). Twenty-six EBS volumes were orphaned — detached from any instance but still accruing charges. A development EKS cluster that was “temporary” had been running for 14 months. And the biggest surprise: cross-AZ data transfer for their Kubernetes pods cost $14,000 per month — a line item nobody had ever noticed because it was buried in the EC2 data transfer category.

After implementing the techniques in this module — right-sizing, committed use discounts, Kubecost for allocation, VPA for resource optimization, and spot instances for non-critical workloads — they reduced cloud spend by 42% ($3.36 million annually) without changing a single line of application code.


The Four Pillars of Cloud Cost Optimization

graph TD
classDef pillar fill:#f9f9f9,stroke:#333,stroke-width:2px;
classDef header fill:#e1f5fe,stroke:#0288d1,stroke-width:2px;
title[COST OPTIMIZATION FRAMEWORK]:::header
subgraph OptimizationProcess [" "]
direction LR
P1["1. VISIBILITY<br/>'Where does the money go?'<br/>- Cost allocation<br/>- Showback/chargeback<br/>- Kubecost/OpenCost"]:::pillar
P2["2. RIGHT-SIZING<br/>'Are resources matched to actual usage?'<br/>- CPU/memory utilization<br/>- VPA recommendations<br/>- Node right-sizing"]:::pillar
P3["3. RATE OPTIMIZATION<br/>'Are we paying the best price?'<br/>- Savings Plans/CUDs<br/>- Reserved Instances<br/>- Committed Use"]:::pillar
P4["4. ARCHITECTURAL<br/>'Can we change HOW we run things?'<br/>- Spot/preemptible instances<br/>- Topology-aware routing<br/>- Ephemeral environments<br/>- Orphaned resource cleanup"]:::pillar
P1 --> P2 --> P3 --> P4
end
note["Implementation order: 1 --> 2 --> 3 --> 4<br/>You can't optimize what you can't see."]
OptimizationProcess --> note

Pause and predict: If three teams share a single Kubernetes node, how can you determine who pays for what?

Pillar 1: Visibility with Kubecost and OpenCost


Kubernetes makes cost allocation hard because workloads share nodes. If three teams run pods on the same node, who pays for that node?

flowchart TD
classDef external fill:#fff3e0,stroke:#e65100,stroke-width:2px;
classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
B["Cloud Billing API<br/>(AWS CUR / GCP Billing / Azure Cost Export)"]:::external
M["Kubernetes Metrics<br/>(Prometheus / metrics-server)"]:::external
subgraph Kubecost ["Kubecost Allocation Engine"]
direction TB
S1["1. Get actual cloud cost per node"]
S2["2. Get resource usage per pod per node"]
S3["3. Allocate node cost to pods based on resource consumption"]
S4["4. Aggregate by namespace, label, team"]
S1 ~~~ S2 ~~~ S3 ~~~ S4
subgraph Example ["Example Scenario"]
direction TB
N["Node cost: $100/day (m7i.xlarge)"]
PA["Pod A uses 40% CPU, 30% memory<br/>Allocation: $100 * (0.4+0.3)/2 = $35/day"]
PB["Pod B uses 20% CPU, 50% memory<br/>Allocation: $100 * (0.2+0.5)/2 = $35/day"]
PC["Pod C uses 10% CPU, 10% memory<br/>Allocation: $100 * (0.1+0.1)/2 = $10/day"]
I["Idle: $100 - $35 - $35 - $10 = $20/day"]
N --> PA & PB & PC --> I
end
end
B --> Kubecost
M --> Kubecost
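The allocation arithmetic in the example scenario can be reproduced in a few lines. This is a simplified sketch, not Kubecost's actual implementation: it assumes each pod's share is the plain average of its CPU and memory fractions, as the diagram does, and the function name `allocate_node_cost` is made up for illustration.

```python
def allocate_node_cost(node_cost, pods):
    """Split a node's cost across pods by averaging CPU and memory fractions.

    pods maps a pod name to (cpu_fraction, memory_fraction) of the node.
    Returns (per-pod allocation dict, idle cost left unallocated).
    """
    allocations = {
        name: node_cost * (cpu + mem) / 2
        for name, (cpu, mem) in pods.items()
    }
    idle = node_cost - sum(allocations.values())
    return allocations, idle


# Figures from the example scenario: a $100/day node shared by three pods.
pods = {"pod-a": (0.4, 0.3), "pod-b": (0.2, 0.5), "pod-c": (0.1, 0.1)}
allocations, idle = allocate_node_cost(100.0, pods)
for name, cost in allocations.items():
    print(f"{name}: ${cost:.2f}/day")
print(f"idle: ${idle:.2f}/day")
```

How the idle remainder is handled (left unallocated or shared among tenants) is a policy decision that each organization has to make for itself.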
# Install Kubecost via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="YOUR_TOKEN" \
--set prometheus.server.retention="30d" \
--set kubecostProductConfigs.clusterName="prod-us-east-1"
# For multi-cluster, install the agent on each cluster
# and point to a central Kubecost instance
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set agent.enabled=true \
--set kubecostProductConfigs.clusterName="prod-eu-west-1" \
--set federatedETL.primaryCluster="https://kubecost.prod-us-east-1.internal"
# Access the Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# OpenCost is CNCF-supported and free
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace \
--set opencost.exporter.defaultClusterId="prod-us-east-1" \
--set opencost.ui.enabled=true
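# Query the API for cost allocation
# (assumes the OpenCost API is reachable on localhost:9003, e.g. via kubectl port-forward)
# -G sends the URL-encoded parameters as a GET query string instead of a POST body
curl -G http://localhost:9003/allocation/compute \
  --data-urlencode "window=7d" \
  --data-urlencode "aggregate=namespace" \
  --data-urlencode "accumulate=true" | jq '.data[0]'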
# Label-based cost allocation strategy
# Every workload MUST have these labels for cost tracking
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendation-engine
  namespace: ml-platform
  labels:
    team: ml-engineering
    cost-center: CC-4200
    product: recommendations
    environment: production
spec:
  template:
    metadata:
      labels:
        team: ml-engineering
        cost-center: CC-4200
        product: recommendations
        environment: production
    spec:
      containers:
        - name: engine
          image: company/rec-engine:v2.1.0
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
# Enforce required labels with Kyverno
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - Job
      validate:
        message: "All workloads must have 'team', 'cost-center', and 'environment' labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
              environment: "production|staging|development"
EOF

Stop and think: Why is over-provisioning a pod’s requested CPU worse than over-provisioning its limits?

The most common waste pattern in Kubernetes: developers set resource requests based on guesswork, then never revisit them.

Vertical Pod Autoscaler (VPA) for Right-Sizing

graph TD
classDef before fill:#ffebee,stroke:#c62828,stroke-width:2px,text-align:left;
classDef after fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,text-align:left;
classDef note fill:#fff9c4,stroke:#fbc02d,stroke-width:1px;
subgraph Before["Before VPA analysis"]
B_CPU["Request: 4 CPU<br/>████<br/>████<br/>████<br/>Actual: 600m"]:::before
B_MEM["Request: 8Gi mem<br/>████████████████<br/>Actual: 1.5Gi"]:::before
end
subgraph After["After VPA recommendation"]
A_CPU["Request: 800m CPU<br/>██<br/>Actual usage: 600m"]:::after
A_MEM["Request: 2Gi mem<br/>████<br/>Actual usage: 1.5Gi"]:::after
end
Before -->|Savings: 80% CPU, 75% memory| After
N["Over-provisioning wastes money because K8s schedules based on<br/>REQUESTS, not actual usage. A pod requesting 4 CPU blocks<br/>4 CPU from being used by other pods, even if it only uses 600m."]:::note
After --> N
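The savings percentages in the diagram follow directly from the before and after requests. A minimal sketch of that arithmetic, using the diagram's figures (the helper name `reduction_percent` is illustrative only):

```python
def reduction_percent(before, after):
    """Percentage by which a resource request shrinks."""
    return (before - after) / before * 100


# CPU in millicores, memory in MiB, taken from the before/after diagram.
cpu_saving = reduction_percent(before=4000, after=800)
mem_saving = reduction_percent(before=8192, after=2048)

# Headroom the new CPU request leaves above the observed 600m usage.
cpu_headroom = (800 - 600) / 600 * 100

print(f"CPU request reduced by {cpu_saving:.0f}%")
print(f"Memory request reduced by {mem_saving:.0f}%")
print(f"CPU headroom over actual usage: {cpu_headroom:.0f}%")
```

The recommended request still sits about a third above observed usage, which is the buffer that keeps right-sizing from turning into throttling.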
# VPA in recommendation mode (safe -- doesn't change anything)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recommendation-engine-vpa
  namespace: ml-platform
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recommendation-engine
  updatePolicy:
    updateMode: "Off" # Only recommend, don't auto-apply
  resourcePolicy:
    containerPolicies:
      - containerName: engine
        minAllowed:
          cpu: "100m"
          memory: "256Mi"
        maxAllowed:
          cpu: "8"
          memory: "16Gi"
# Check VPA recommendations
kubectl get vpa recommendation-engine-vpa -n ml-platform -o yaml
# The recommendation section shows:
# - lowerBound: minimum safe resources
# - target: recommended resources
# - upperBound: maximum expected resources
# - uncappedTarget: ideal without min/max constraints
# Example output:
#   recommendation:
#     containerRecommendations:
#       - containerName: engine
#         lowerBound:
#           cpu: 500m
#           memory: 1Gi
#         target:
#           cpu: 800m
#           memory: 2Gi
#         upperBound:
#           cpu: 1500m
#           memory: 4Gi
# HPA scaling on CPU and memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25 # Scale down max 25% at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100 # Can double immediately under load
          periodSeconds: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Target 70% CPU utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
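To see how the utilization target in the manifest above turns into a replica count, the sketch below applies the scaling formula from the Kubernetes HPA documentation: desired = ceil(current replicas x current metric / target metric), skipped when the ratio is within a tolerance of 1. It is a simplified model under stated assumptions: a single metric, a 0.1 tolerance, and none of the `behavior` rate limits. The function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=2, max_replicas=20, tolerance=0.1):
    """Single-metric HPA estimate, clamped to the min/max replica bounds."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Target is 70% CPU, as in the manifest.
print(desired_replicas(4, 140, 70))   # load doubled
print(desired_replicas(10, 35, 70))   # load halved
print(desired_replicas(4, 72, 70))    # within tolerance
```

With several metrics configured, the controller computes a proposal per metric and uses the largest one, and the `scaleDown` and `scaleUp` policies then cap how quickly the count may move toward it.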

graph TD
classDef provider fill:#eceff1,stroke:#607d8b,stroke-width:2px;
classDef strategy fill:#e3f2fd,stroke:#1565c0,stroke-width:2px;
subgraph AWS ["AWS (m7i.xlarge)"]
A1["On-Demand: $0.192/hr"]
A2["1yr Savings Plan: $0.121/hr (-37%)"]
A3["3yr Savings Plan: $0.077/hr (-60%)"]
A4["Spot: $0.058/hr (-70%, interruptible)"]
end
subgraph GCP ["GCP (n2-standard-4)"]
G1["On-Demand: $0.189/hr"]
G2["1yr CUD: $0.119/hr (-37%)"]
G3["3yr CUD: $0.085/hr (-55%)"]
G4["Spot: $0.057/hr (-70%)"]
G5["SUDs (automatic): $0.151/hr (-20%)"]
end
subgraph Azure ["Azure (D4s v5)"]
Z1["On-Demand: $0.192/hr"]
Z2["1yr Reserved: $0.124/hr (-35%)"]
Z3["3yr Reserved: $0.079/hr (-59%)"]
Z4["Spot: ~$0.038/hr (-80%)"]
end
subgraph STRATEGY ["Optimization Strategy"]
S1["Baseline (24/7 workloads) --> Savings Plan / CUD"]
S2["Bursty (predictable peaks) --> On-demand"]
S3["Fault-tolerant (batch, CI) --> Spot instances"]
S4["Development --> Spot + auto-shutdown"]
end
class AWS,GCP,Azure provider
class STRATEGY strategy
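The discount percentages in the pricing diagram, and the effect of mixing purchase options, can be checked with a short calculation. The rates below are the AWS m7i.xlarge figures from the diagram; the 10-node fleet split is a hypothetical example, and 730 hours per month is the same approximation the exercise later in this module uses.

```python
ON_DEMAND = 0.192        # $/hr, on-demand rate from the diagram
HOURS_PER_MONTH = 730    # approximation used throughout this module


def discount_percent(rate, on_demand=ON_DEMAND):
    """Discount of a given hourly rate relative to on-demand."""
    return (1 - rate / on_demand) * 100


def blended_monthly_cost(node_counts, rates):
    """Monthly cost of a fleet split across purchase options."""
    return sum(node_counts[option] * rates[option] * HOURS_PER_MONTH
               for option in node_counts)


rates = {"savings_plan_1yr": 0.121, "on_demand": 0.192, "spot": 0.058}
# Hypothetical fleet: baseline on a Savings Plan, bursts on-demand, batch on Spot.
fleet = {"savings_plan_1yr": 6, "on_demand": 2, "spot": 2}

blended = blended_monthly_cost(fleet, rates)
all_on_demand = 10 * ON_DEMAND * HOURS_PER_MONTH
print(f"1yr Savings Plan discount: {discount_percent(0.121):.0f}%")
print(f"Blended fleet: ${blended:,.2f}/month vs ${all_on_demand:,.2f}/month on-demand")
```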
# AWS: Analyze your usage to determine the right commitment
aws ce get-savings-plans-purchase-recommendation \
--savings-plans-type COMPUTE_SP \
--term-in-years ONE_YEAR \
--payment-option NO_UPFRONT \
--lookback-period-in-days SIXTY_DAYS \
--output json | jq '.SavingsPlansPurchaseRecommendation'
# The output tells you:
# - Recommended hourly commitment (e.g., $12.50/hr)
# - Estimated monthly savings (e.g., $2,800/month)
# - Coverage percentage (e.g., 72% of on-demand usage)
# GCP: Analyze committed use
gcloud billing accounts describe BILLING_ACCOUNT_ID --format=json
# Use the GCP Billing Console > Committed use discounts > Analysis

Spot instances (AWS), Spot and Preemptible VMs (GCP), and Spot VMs (Azure) offer 60-90% discounts but can be reclaimed on short notice: about two minutes on AWS and roughly 30 seconds on GCP and Azure. Kubernetes makes them practical by rescheduling evicted pods onto other nodes automatically.

# EKS managed node groups: on-demand for critical workloads, Spot for the rest
# (spot: true with instanceTypes is a managedNodeGroups feature in eksctl)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster
  region: us-east-1
managedNodeGroups:
  # On-demand for critical workloads
  - name: on-demand-critical
    instanceType: m7i.xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 6
    labels:
      node-type: on-demand
      workload-class: critical
    taints:
      - key: workload-class
        value: critical
        effect: NoSchedule
  # Spot for non-critical workloads
  - name: spot-general
    instanceTypes:
      - m7i.xlarge
      - m6i.xlarge
      - m5.xlarge
      - c7i.xlarge # Diversify instance types
    spot: true
    desiredCapacity: 5
    minSize: 2
    maxSize: 15
    labels:
      node-type: spot
      workload-class: general
# Non-critical workload: prefers Spot, tolerates interruption
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: data-pipeline
spec:
  replicas: 8
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Prefer Spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - spot
      # Tolerate Spot taints
      tolerations:
        - key: "kubernetes.io/spot"
          operator: "Exists"
          effect: "NoSchedule"
      # Handle graceful shutdown on Spot interruption
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          image: company/batch-processor:v1.8.0
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
          # Checkpoint progress periodically so interruption loses minimal work
          env:
            - name: CHECKPOINT_INTERVAL_SECONDS
              value: "30"
# AWS Node Termination Handler (NTH)
# Detects Spot interruption notices and gracefully drains nodes
# Install via Helm:
# helm install aws-node-termination-handler \
# eks/aws-node-termination-handler \
# --namespace kube-system
# Karpenter: Automatically replaces interrupted Spot nodes
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m7i.xlarge
            - m7i.2xlarge
            - m6i.xlarge
            - m6i.2xlarge
            - c7i.xlarge
            - r7i.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s
  limits:
    cpu: "100"
    memory: "400Gi"

Orphaned resources are cloud resources that are no longer attached to any active workload but continue accruing charges. They are the silent budget killer.

| Resource | How It Gets Orphaned | Monthly Cost (typical) |
| --- | --- | --- |
| Unattached EBS volumes | PVC deleted, PV not reclaimed | $8-$80 per volume |
| Unused Elastic IPs | Service deleted, EIP not released | $3.65 each |
| Old EBS snapshots | Backup policy with no expiry | $0.05/GB |
| Idle load balancers | Service deleted, LB remains | $16-$25 each |
| Stopped EC2 instances | “Paused” but never terminated | EBS costs continue |
| Orphaned NAT Gateways | VPC deleted, NAT GW remains | $32 each |
| Unused RDS snapshots | Manual snapshots accumulated | $0.095/GB |
# Find unattached EBS volumes
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime,AZ:AvailabilityZone}' \
--output table
# Find unused Elastic IPs
aws ec2 describe-addresses \
--query 'Addresses[?AssociationId==null].{IP:PublicIp,AllocID:AllocationId}' \
--output table
# Find load balancers with no targets
for LB_ARN in $(aws elbv2 describe-load-balancers --query 'LoadBalancers[*].LoadBalancerArn' --output text); do
  # Count the target groups attached to this load balancer
  TG_COUNT=$(aws elbv2 describe-target-groups \
    --load-balancer-arn "$LB_ARN" \
    --query 'length(TargetGroups)' --output text)
  if [ "$TG_COUNT" = "0" ]; then
    LB_NAME=$(aws elbv2 describe-load-balancers \
      --load-balancer-arns "$LB_ARN" \
      --query 'LoadBalancers[0].LoadBalancerName' --output text)
    echo "ORPHANED LB: $LB_NAME ($LB_ARN)"
  fi
done
# Find EBS snapshots older than 90 days
NINETY_DAYS_AGO=$(date -u -v-90d +%Y-%m-%dT%H:%M:%S 2>/dev/null || date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%S)
aws ec2 describe-snapshots \
--owner-ids self \
--query "Snapshots[?StartTime<='${NINETY_DAYS_AGO}'].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}" \
--output table
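Once the unattached volumes are listed, it helps to put a dollar figure on them. The sketch below takes the JSON that `aws ec2 describe-volumes --output json` returns (a top-level `Volumes` list whose entries carry `Size` in GiB and `VolumeType`) and estimates the monthly charge. The per-GB prices are assumptions for illustration and vary by region and over time; the function name is hypothetical.

```python
import json

# ASSUMED prices in $/GB-month; check current pricing for your region.
ASSUMED_PRICE_PER_GB_MONTH = {"gp3": 0.08, "gp2": 0.10}
DEFAULT_PRICE = 0.10  # fallback for volume types not listed above


def estimate_orphan_cost(describe_volumes_json):
    """Return (volume count, estimated monthly cost) for unattached volumes."""
    volumes = json.loads(describe_volumes_json)["Volumes"]
    total = 0.0
    for volume in volumes:
        price = ASSUMED_PRICE_PER_GB_MONTH.get(volume.get("VolumeType"), DEFAULT_PRICE)
        total += volume["Size"] * price
    return len(volumes), total


# Made-up sample in the shape of the describe-volumes response.
sample = json.dumps({"Volumes": [
    {"VolumeId": "vol-0aaa", "Size": 100, "VolumeType": "gp3"},
    {"VolumeId": "vol-0bbb", "Size": 500, "VolumeType": "gp2"},
]})
count, monthly = estimate_orphan_cost(sample)
print(f"{count} unattached volumes, about ${monthly:.2f}/month")
```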
# CronJob to detect and report orphaned resources
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orphan-detector
  namespace: finops
spec:
  schedule: "0 8 * * 1" # Every Monday at 8 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: orphan-detector
          containers:
            - name: detector
              image: company/orphan-detector:v1.2.0
              env:
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:
                      name: slack-webhook
                      key: url
                - name: STALE_THRESHOLD_DAYS
                  value: "30"
              command:
                - /bin/sh
                - -c
                - |
                  echo "Scanning for orphaned resources..."
                  # Detect unattached volumes
                  VOLUMES=$(aws ec2 describe-volumes --filters Name=status,Values=available \
                    --query 'length(Volumes)' --output text)
                  # Detect unused EIPs
                  EIPS=$(aws ec2 describe-addresses \
                    --query 'length(Addresses[?AssociationId==null])' --output text)
                  # Send report to Slack
                  curl -X POST "$SLACK_WEBHOOK" -H 'Content-type: application/json' \
                    --data "{\"text\":\"Orphan Report: $VOLUMES unattached volumes, $EIPS unused EIPs\"}"
          restartPolicy: OnFailure

  1. Kubernetes clusters typically run at 30-50% resource utilization according to data from Kubecost across thousands of clusters. This means 50-70% of compute spend is wasted on idle resources. The primary cause is over-provisioned resource requests: developers set CPU and memory requests based on worst-case scenarios and never revisit them. VPA in recommendation mode can identify right-sizing opportunities without any risk.

  2. AWS Spot instances have been interrupted less than 5% of the time for the most popular instance types (m5.xlarge, m6i.xlarge) in US regions, based on the AWS Spot Instance Advisor. The interruption rate varies dramatically by instance type and region: r5.8xlarge in ap-southeast-1 might see 15-20% interruption rate, while m7i.xlarge in us-east-1 sees under 3%. Diversifying across instance types and AZs is the key to reliable Spot usage.

  3. Cross-AZ data transfer is one of the top 5 cost categories for most Kubernetes deployments on AWS. A company running 20 microservices with 100 pods across 3 AZs can easily spend $2,000-$5,000/month on cross-AZ traffic alone. AWS bills this traffic per GB in each direction, so chatty service-to-service calls add up quickly. Inter-zone pricing differs between providers and changes over time, so check the current rates for your platform; on AWS, topology-aware routing remains a significant cost optimization lever.

  4. OpenCost became a CNCF Sandbox project in 2022 and reached Incubation status in 2024. It was originally developed by Kubecost as the open-source core of their commercial product. The CNCF adoption signaled that Kubernetes cost management was becoming a first-class concern alongside security and observability. OpenCost’s cost allocation API is now integrated into several commercial FinOps platforms.
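The cross-AZ figures in item 3 are easy to sanity-check. The sketch below assumes AWS's $0.01/GB charge in each direction for traffic between AZs in the same region, and assumes traffic is spread uniformly across zones unless told otherwise; the 300,000 GB/month volume and the 10% figure for topology-aware routing are hypothetical inputs, not measurements.

```python
def cross_az_monthly_cost(total_gb, az_count=3, cross_fraction=None,
                          price_per_gb_each_way=0.01):
    """Estimate monthly cross-AZ charges for service-to-service traffic.

    If cross_fraction is None, traffic is assumed uniformly spread, so
    (az_count - 1) / az_count of it leaves its zone.
    """
    if cross_fraction is None:
        cross_fraction = (az_count - 1) / az_count
    # Billed once on the way out of one zone and once on the way into the other.
    return total_gb * cross_fraction * price_per_gb_each_way * 2


before = cross_az_monthly_cost(300_000)                      # uniform spread
after = cross_az_monthly_cost(300_000, cross_fraction=0.10)  # mostly zone-local
print(f"Uniform routing: ${before:,.0f}/month")
print(f"Topology-aware routing (10% crossing): ${after:,.0f}/month")
```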


| Mistake | Why It Happens | How to Fix It |
| --- | --- | --- |
| Setting resource requests to match limits | “Same value means guaranteed QoS” | Requests should reflect typical usage, limits reflect peak. VPA recommendations help find the right values. Over-requesting wastes money. |
| Buying Savings Plans based on current usage | “We’re using $10K/month now, commit to $10K” | Usage fluctuates. Commit to 60-70% of your average usage. The rest stays on-demand for flexibility. Over-commitment is worse than no commitment. |
| Running dev/staging clusters 24/7 | “Someone might need them on weekends” | Implement auto-shutdown for non-production clusters. Scale to zero outside business hours. A $3,000/month staging cluster that runs only during business hours (about 27% of the week) costs roughly $800-$900. |
| Not diversifying Spot instance types | “We need m7i.xlarge specifically” | Spot pools with a single instance type have higher interruption rates. Specify 4-6 compatible instance types. Karpenter handles this automatically. |
| Ignoring namespace-level resource quotas | “Trust developers to be reasonable” | Without quotas, one team can consume the entire cluster. Set ResourceQuotas per namespace based on team budgets. |
| No cost alerts or budgets | “We check the bill monthly” | By the time you see the monthly bill, the damage is done. Set budget alerts at 50%, 80%, and 100% thresholds for each account. |
| Deleting Spot nodes during business hours | “Karpenter consolidated idle nodes” | Configure consolidation windows, for example with scheduled disruption budgets, to avoid Spot node replacement during peak hours. Use disruption.consolidateAfter to delay consolidation. |
| Not accounting for EBS costs separately from EC2 | “Compute is our biggest cost” | EBS volumes persist after pods are deleted. Monitor PVC lifecycle and implement reclaimPolicy: Delete for non-production volumes. |

1. Your CFO hands you the monthly AWS bill, pointing to a single $45,000 line item for EC2 instances in your production EKS cluster. She asks you to split this cost between the Data Science team and the Frontend team. Why is this impossible to do accurately using just the AWS Billing Console?

Cloud billing consoles only show costs per infrastructure resource (like EC2 instances or EBS volumes), not per Kubernetes workload. Because Kubernetes schedules pods from multiple teams onto the same shared nodes, a single $500/month EC2 instance might be running three Data Science jobs and two Frontend APIs simultaneously. To accurately split this cost, you need a tool like Kubecost or OpenCost that merges the billing data (the node’s actual price) with Kubernetes metrics (how much CPU and memory each team’s pods consumed on that specific node) and aggregates it via namespace or label. Without this workload-level correlation, any cost splitting is just a blind guess.

2. During a cost review, you notice the `recommendation-engine` deployment is consistently using only 15% of its requested CPU, while traffic patterns are highly unpredictable. Your junior engineer suggests implementing VPA in auto-update mode to fix the waste. Why might a combination of VPA (in recommendation mode) and HPA be a better financial and architectural decision?

If you use VPA in auto-update mode on an unpredictable workload, it will aggressively scale down the pod’s CPU requests during quiet periods, which can lead to severe CPU throttling and performance degradation when traffic spikes suddenly. Instead, you should use VPA in “Off” (recommendation) mode to determine the optimal baseline size for a single pod based on historical data. Then, implement HPA to dynamically add or remove those correctly-sized pods based on real-time traffic demand. By right-sizing the individual pods with VPA insights and scaling their count horizontally with HPA, you eliminate the baseline waste of over-provisioning while maintaining the elasticity needed to handle sudden traffic peaks gracefully.

3. You have $10,000/month in on-demand EC2 usage that runs 24/7. Should you commit to a $10,000/month Savings Plan?

No. Commit to $6,000-$7,000 (60-70% of current usage). Savings Plans commit you to a minimum hourly spend regardless of actual usage. If your usage drops (due to right-sizing, traffic changes, or migration), you still pay the committed amount. The remaining 30-40% stays on-demand, giving you flexibility. Over time, as you’re confident in your baseline, you can increase the commitment. Also consider: some of that $10K might be better served by Spot instances (for fault-tolerant workloads), which provide deeper discounts without long-term commitment. The optimal strategy is often: 60% Savings Plans + 20% Spot + 20% On-demand.
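The risk of over-commitment is easier to see with numbers. The sketch below is a simplified model, not AWS's billing logic: it assumes a flat 37% discount (the 1-year figure used earlier in this module), treats the commitment as a fixed monthly spend that is owed regardless of usage, and bills anything beyond the covered usage at on-demand rates. The function name is illustrative.

```python
def monthly_bill(on_demand_usage, commitment, discount=0.37):
    """Monthly cost given usage (valued at on-demand rates) and a commitment.

    commitment is the committed spend at discounted rates; it is paid in full
    even when usage falls below what it covers.
    """
    covered = commitment / (1 - discount)   # on-demand value the commitment covers
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow


full = 10_000 * (1 - 0.37)      # commitment covering 100% of today's usage
partial = 6_000 * (1 - 0.37)    # commitment covering 60% of today's usage

for usage in (10_000, 5_000):   # today, and after usage drops by half
    print(f"usage ${usage:,}: full={monthly_bill(usage, full):,.0f} "
          f"partial={monthly_bill(usage, partial):,.0f} "
          f"none={monthly_bill(usage, 0):,.0f}")
```

In this model the full commitment wins while usage holds, but once usage halves it costs more than having no commitment at all, while the partial commitment stays cheaper in both cases.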

4. Your team wants to migrate a legacy monolithic application to a Spot instance node group to save 70% on compute costs. The application takes 5 minutes to gracefully shut down, requires persistent local disk state, and runs as a single replica. Why will this migration result in a catastrophic production outage?

On AWS, Spot instances can be reclaimed with only a 2-minute interruption warning, and other providers give even less notice. Since the legacy monolith takes 5 minutes to shut down, it will be forcefully terminated before it finishes its shutdown sequence, leading to data corruption or incomplete transactions. Furthermore, because it relies on local disk state and runs as a single replica, the entire application will go offline and lose its state when the underlying node disappears. Spot instances are only safe for stateless, fault-tolerant workloads that can terminate gracefully within the notice period and have multiple replicas distributed across different nodes to ensure continuous availability during interruptions.

5. A development EKS cluster costs $3,000/month and is used Monday-Friday, 9AM-6PM. How much can you save?

Business hours represent roughly 45 hours per week out of 168 total hours (27% of the time). If you scale the cluster to zero (or minimum) outside business hours, you save approximately 73% of compute costs: $3,000 x 0.73 = $2,190/month saved. Implementation options: (a) Karpenter with consolidation + scheduled scaling to zero, (b) a CronJob that scales node groups to 0 at 6PM and back to desired count at 9AM, (c) tools like kube-downscaler that annotate deployments with shutdown schedules. Additional savings: shut down NAT Gateways and load balancers when the cluster is empty. Caveat: factor in the 10-15 minute spin-up time each morning.
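The same calculation as a small sketch, so the schedule can be varied. It assumes cost scales linearly with running hours and ignores fixed charges that continue while the cluster is scaled down; the function name is illustrative.

```python
HOURS_PER_WEEK = 168


def scheduled_savings(monthly_cost, hours_per_day=9, days_per_week=5):
    """Return (active fraction of the week, monthly savings from scaling to zero)."""
    active_fraction = hours_per_day * days_per_week / HOURS_PER_WEEK
    return active_fraction, monthly_cost * (1 - active_fraction)


active, saved = scheduled_savings(3000)
print(f"Cluster is needed {active:.0%} of the week")
print(f"Estimated savings: ${saved:,.2f}/month")
```

Without rounding the percentage first, the estimate comes to about $2,196/month, in line with the roughly $2,190 figure above.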

6. You recently deleted a large development namespace containing StatefulSets, LoadBalancer services, and hundreds of pods. A month later, your cloud bill shows an unexpected $800 charge associated with the deleted environment. What specific Kubernetes architectural mechanisms likely caused these resources to be orphaned and continue accruing charges?

When you delete Kubernetes resources, the underlying cloud infrastructure is not always cleaned up with them. The most likely culprit for the $800 charge is unattached EBS volumes left behind by the StatefulSets. If the StorageClass (or the PersistentVolume itself) uses reclaimPolicy: Retain, the cloud disk persists even after the PersistentVolumeClaim is deleted. Dynamically provisioned volumes default to Delete, but teams often switch to Retain to protect data and then forget the manual cleanup step. Additionally, if the LoadBalancer services were forcefully deleted or the namespace was abruptly terminated without allowing controllers to finalize cleanup, the cloud provider’s Load Balancers and associated Elastic IPs would remain active. To prevent this, configure reclaimPolicy: Delete for non-critical storage and implement automated scanning tools to detect and alert on unattached cloud resources.


Hands-On Exercise: Cost Optimization Audit


In this exercise, you will perform a cost optimization audit on a Kubernetes cluster.

  • A running Kubernetes cluster (kind, minikube, or cloud)
  • kubectl installed
  • Metrics server installed (for VPA)

Task 1: Identify Over-Provisioned Workloads


Deploy some intentionally over-provisioned workloads and use kubectl to identify waste.

Solution
# Create a kind cluster with metrics server
kind create cluster --name cost-lab
# Install metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Patch for kind (insecure kubelet)
kubectl patch deployment metrics-server -n kube-system \
--type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
# Wait for metrics server
sleep 30
kubectl wait --for=condition=Ready pod -l k8s-app=metrics-server -n kube-system --timeout=120s
# Deploy over-provisioned workloads
kubectl create namespace cost-audit
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-wasteful
  namespace: cost-audit
  labels:
    team: backend
    cost-center: CC-1000
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        team: backend
    spec:
      containers:
        - name: api
          image: nginx:stable
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-wasteful
  namespace: cost-audit
  labels:
    team: data
    cost-center: CC-2000
spec:
  replicas: 3
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
        team: data
    spec:
      containers:
        - name: worker
          image: nginx:stable
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
EOF
# Wait for pods (some will be Pending due to insufficient resources)
sleep 15
# Check actual usage vs requests
echo "=== Pod Resource Usage vs Requests ==="
kubectl top pods -n cost-audit 2>/dev/null || echo "Metrics not ready yet, wait 60s"
# Compare requests to actual usage
kubectl get pods -n cost-audit -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
STATUS:.status.phase
Solution
# Calculate total requested vs actual
echo "=== Requested Resources ==="
echo "api-server: 5 pods x 2 CPU = 10 CPU requested"
echo "api-server: 5 pods x 4Gi = 20Gi memory requested"
echo "worker: 3 pods x 1 CPU = 3 CPU requested"
echo "worker: 3 pods x 2Gi = 6Gi memory requested"
echo ""
echo "TOTAL REQUESTED: 13 CPU, 26Gi memory"
echo ""
# Single quotes keep the shell from expanding $0, $1, $4, and $5 inside the dollar amounts
echo 'At m7i.xlarge pricing ($0.192/hr, 4 CPU, 16Gi):'
echo "13 CPU / 4 CPU per node = 4 nodes needed (by CPU)"
echo "26Gi / 16Gi per node = 2 nodes needed (by memory)"
echo "Limiting factor: CPU (4 nodes)"
echo ""
echo 'Cost: 4 nodes x $0.192/hr x 730 hours = $561/month'
echo ""
echo "=== Actual Usage (nginx idle) ==="
echo "Each nginx pod uses ~5m CPU and ~5Mi memory"
echo "Total actual: ~40m CPU, ~40Mi memory"
echo "Actual need: 1 node (easily)"
echo ""
echo 'WASTE: $561 - $140 (1 node) = $421/month (75% waste)'
echo ""
echo "=== VPA Recommendations ==="
echo "api-server: request 50m CPU, 64Mi memory (from 2 CPU, 4Gi)"
echo "worker: request 50m CPU, 64Mi memory (from 1 CPU, 2Gi)"
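The node-count and cost arithmetic echoed above can also be computed rather than hard-coded. This sketch assumes the same node shape (4 CPU, 16Gi at $0.192/hr, 730 hours per month) and, like the walkthrough, ignores the capacity that Kubernetes reserves for system components; the function names are illustrative.

```python
import math


def nodes_needed(cpu_requested, mem_requested_gi, node_cpu=4, node_mem_gi=16):
    """Nodes required to satisfy total requests; the scarcer resource decides."""
    by_cpu = math.ceil(cpu_requested / node_cpu)
    by_mem = math.ceil(mem_requested_gi / node_mem_gi)
    return max(by_cpu, by_mem)


def node_monthly_cost(nodes, hourly_rate=0.192, hours=730):
    return nodes * hourly_rate * hours


# (replicas, cpu request, memory request in Gi) per deployment
before = [(5, 2, 4), (3, 1, 2)]              # as deployed
after = [(5, 0.1, 0.125), (3, 0.1, 0.125)]   # 100m CPU / 128Mi each

for label, workloads in (("before", before), ("after", after)):
    cpu = sum(replicas * c for replicas, c, _ in workloads)
    mem = sum(replicas * m for replicas, _, m in workloads)
    nodes = nodes_needed(cpu, mem)
    print(f"{label}: {cpu:g} CPU, {mem:g}Gi -> {nodes} node(s), "
          f"${node_monthly_cost(nodes):.0f}/month")
```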
Solution
# Right-size the deployments based on "VPA recommendations"
kubectl patch deployment api-server-wasteful -n cost-audit --type=json -p='[
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "100m"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "500m"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"}
]'
kubectl patch deployment worker-wasteful -n cost-audit --type=json -p='[
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "100m"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "500m"},
{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"}
]'
# Verify all pods are now Running (they fit on fewer nodes)
kubectl get pods -n cost-audit
echo "=== After Right-Sizing ==="
echo "api-server: 5 pods x 100m CPU = 500m CPU requested"
echo "worker: 3 pods x 100m CPU = 300m CPU requested"
echo "TOTAL: 800m CPU, ~1Gi memory"
echo "Fits on 1 node easily. Savings: 75%"

Write a cost optimization report for a fictional team based on the audit findings.

Solution
# Cost Optimization Report: Cost-Audit Namespace
## Executive Summary
Current monthly spend: ~$561 (4 nodes at on-demand pricing)
Optimized monthly spend: ~$140 (1 node at on-demand pricing)
Potential savings: $421/month ($5,052/year) -- 75% reduction
## Findings
### 1. Over-Provisioned Resources (Impact: $421/month)
- api-server requests 2 CPU per pod but uses ~5m (0.25%)
- worker requests 1 CPU per pod but uses ~5m (0.5%)
- Total CPU requested: 13 cores. Total used: 40 millicores.
- Recommendation: Reduce requests to 100m CPU, 128Mi memory
### 2. No Horizontal Pod Autoscaler (Impact: TBD)
- api-server runs 5 replicas constantly
- Likely needs 2 replicas at baseline, scale to 5 during peak
- Recommendation: Add HPA with min=2, max=8, target CPU=70%
- Estimated additional savings: 40% during off-peak
### 3. On-Demand Pricing (Impact: ~$50/month)
- Workloads run 24/7, perfect for Savings Plans
- With 1-year Compute Savings Plan: $140 * 0.63 = $88/month
- Savings: $52/month
## Recommended Actions (priority order)
1. Apply right-sized resource requests (immediate, $421/month)
2. Add HPA for api-server (1 day, ~$30/month additional)
3. Purchase Savings Plan for baseline compute (1 week, ~$50/month)
## Total Estimated Savings: $501/month ($6,012/year)
kind delete cluster --name cost-lab
  • Over-provisioned workloads deployed and identified
  • Waste quantified in dollar terms
  • Right-sized resource requests applied
  • All pods running after right-sizing (no OOM or throttling)
  • Cost optimization report includes specific dollar savings

Module 8.9: Large-Scale Observability & Telemetry — You can see where the money goes. Now learn how to see where the problems are. Multi-cluster Prometheus with Thanos, OpenTelemetry at scale, and the art of monitoring Kubernetes without drowning in data.