Module 8.8: Cloud Cost Optimization (Advanced)
Complexity: [MEDIUM]
Time to Complete: 2 hours
Prerequisites: Basic understanding of Kubernetes resource requests/limits and cloud billing concepts
Track: Advanced Cloud Operations
What You’ll Be Able to Do
After completing this module, you will be able to:
- Implement FinOps practices with cloud-native cost allocation tagging, showback, and chargeback mechanisms
- Configure Kubernetes cost visibility using Kubecost, OpenCost, or cloud-native cost tools across multi-cluster environments
- Optimize compute costs using reserved instances, committed use discounts, Spot/preemptible instances, and right-sizing
- Design automated cost anomaly detection and budget alerting pipelines that trigger remediation actions
Why This Module Matters
Q1 2024. A Series C startup. $8 million annual cloud spend.
The CFO called an all-hands meeting. Cloud costs had grown 340% year-over-year while revenue grew 180%. The engineering team had no visibility into which teams, services, or features drove the cost. The finance team’s cloud bill showed 12,000 line items per month. When the VP of Engineering was asked “how much does the recommendation engine cost?”, the honest answer was “we have no idea.”
Three months of forensic analysis revealed: 38% of EC2 instances were running at under 10% CPU utilization. The company was paying on-demand prices for workloads that ran 24/7 (perfect candidates for reserved instances or savings plans). Twenty-six EBS volumes were orphaned — detached from any instance but still accruing charges. A development EKS cluster that was “temporary” had been running for 14 months. And the biggest surprise: cross-AZ data transfer for their Kubernetes pods cost $14,000 per month — a line item nobody had ever noticed because it was buried in the EC2 data transfer category.
After implementing the techniques in this module — right-sizing, committed use discounts, Kubecost for allocation, VPA for resource optimization, and spot instances for non-critical workloads — they reduced cloud spend by 42% ($3.36 million annually) without changing a single line of application code.
The Four Pillars of Cloud Cost Optimization
```mermaid
graph TD
    classDef pillar fill:#f9f9f9,stroke:#333,stroke-width:2px;
    classDef header fill:#e1f5fe,stroke:#0288d1,stroke-width:2px;

    title[COST OPTIMIZATION FRAMEWORK]:::header

    subgraph Optimization Process [ ]
        direction LR
        P1["1. VISIBILITY<br/>'Where does the money go?'<br/>- Cost allocation<br/>- Showback/chargeback<br/>- Kubecost/OpenCost"]:::pillar
        P2["2. RIGHT-SIZING<br/>'Are resources matched to actual usage?'<br/>- CPU/memory utilization<br/>- VPA recommendations<br/>- Node right-sizing"]:::pillar
        P3["3. RATE OPTIMIZATION<br/>'Are we paying the best price?'<br/>- Savings Plans/CUDs<br/>- Reserved Instances<br/>- Committed Use"]:::pillar
        P4["4. ARCHITECTURAL<br/>'Can we change HOW we run things?'<br/>- Spot/preemptible instances<br/>- Topology-aware routing<br/>- Ephemeral environments<br/>- Orphaned resource cleanup"]:::pillar

        P1 --> P2 --> P3 --> P4
    end

    note[Implementation order: 1 --> 2 --> 3 --> 4<br/>You can't optimize what you can't see.]
    Optimization Process --> note
```

Pause and predict: If three teams share a single Kubernetes node, how can you determine who pays for what?
Pillar 1: Visibility with Kubecost and OpenCost
Kubernetes makes cost allocation hard because workloads share nodes. If three teams run pods on the same node, who pays for that node?
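One simple way to answer that question is to split each node's cost in proportion to what every pod consumes, and to report the remainder as idle. The sketch below mirrors the worked example in the Kubecost architecture diagram (a $100/day node shared by three pods) and assumes a plain average of CPU and memory share. The `PodUsage` type and `allocate_node_cost` function are hypothetical names for illustration; real allocation engines price CPU and memory separately, so treat this as a teaching model rather than Kubecost's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    cpu_share: float     # fraction of the node's CPU consumed (0.0-1.0)
    memory_share: float  # fraction of the node's memory consumed (0.0-1.0)


def allocate_node_cost(node_cost: float, pods: list[PodUsage]) -> dict[str, float]:
    """Split a node's cost across pods; the unallocated remainder is idle cost."""
    allocation = {
        pod.name: node_cost * (pod.cpu_share + pod.memory_share) / 2
        for pod in pods
    }
    allocation["idle"] = node_cost - sum(allocation.values())
    return allocation


pods = [
    PodUsage("pod-a", cpu_share=0.4, memory_share=0.3),
    PodUsage("pod-b", cpu_share=0.2, memory_share=0.5),
    PodUsage("pod-c", cpu_share=0.1, memory_share=0.1),
]
costs = allocate_node_cost(100.0, pods)
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}/day")
```

Summing the per-pod figures by namespace or by a `team` label gives a showback report; deciding who absorbs the idle share is a policy choice, not a technical one.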
Kubecost Architecture
```mermaid
flowchart TD
    classDef external fill:#fff3e0,stroke:#e65100,stroke-width:2px;
    classDef core fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;

    B["Cloud Billing API<br/>(AWS CUR / GCP Billing / Azure Cost Export)"]:::external
    M["Kubernetes Metrics<br/>(Prometheus / metrics-server)"]:::external

    subgraph Kubecost ["Kubecost Allocation Engine"]
        direction TB
        S1["1. Get actual cloud cost per node"]
        S2["2. Get resource usage per pod per node"]
        S3["3. Allocate node cost to pods based on resource consumption"]
        S4["4. Aggregate by namespace, label, team"]

        S1 ~~~ S2 ~~~ S3 ~~~ S4

        subgraph Example ["Example Scenario"]
            direction TB
            N["Node cost: $100/day (illustrative round number)"]
            PA["Pod A uses 40% CPU, 30% memory<br/>Allocation: $100 * (0.4+0.3)/2 = $35/day"]
            PB["Pod B uses 20% CPU, 50% memory<br/>Allocation: $100 * (0.2+0.5)/2 = $35/day"]
            PC["Pod C uses 10% CPU, 10% memory<br/>Allocation: $100 * (0.1+0.1)/2 = $10/day"]
            I["Idle: $100 - $35 - $35 - $10 = $20/day"]

            N --> PA & PB & PC --> I
        end
    end

    B --> Kubecost
    M --> Kubecost
```

Installing Kubecost
```bash
# Install Kubecost via Helm
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN" \
  --set prometheus.server.retention="30d" \
  --set kubecostProductConfigs.clusterName="prod-us-east-1"

# For multi-cluster, install the agent on each cluster
# and point to a central Kubecost instance
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set agent.enabled=true \
  --set kubecostProductConfigs.clusterName="prod-eu-west-1" \
  --set federatedETL.primaryCluster="https://kubecost.prod-us-east-1.internal"

# Access the Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
```

OpenCost: The Open-Source Alternative
```bash
# OpenCost is CNCF-supported and free
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.exporter.defaultClusterId="prod-us-east-1" \
  --set opencost.ui.enabled=true

# Query the API for cost allocation.
# -G sends the url-encoded parameters as a GET query string (without it curl would POST).
# The API must be reachable on localhost:9003, for example through a port-forward.
curl -G http://localhost:9003/allocation/compute \
  --data-urlencode "window=7d" \
  --data-urlencode "aggregate=namespace" \
  --data-urlencode "accumulate=true" | jq '.data[0]'
```

Multi-Tenant Cost Allocation
```yaml
# Label-based cost allocation strategy
# Every workload MUST have these labels for cost tracking
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recommendation-engine
  namespace: ml-platform
  labels:
    team: ml-engineering
    cost-center: CC-4200
    product: recommendations
    environment: production
spec:
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      product: recommendations
  template:
    metadata:
      labels:
        team: ml-engineering
        cost-center: CC-4200
        product: recommendations
        environment: production
    spec:
      containers:
        - name: engine
          image: company/rec-engine:v2.1.0
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
```

```bash
# Enforce required labels with Kyverno
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - Job
      validate:
        message: "All workloads must have 'team', 'cost-center', and 'environment' labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
              environment: "production|staging|development"
EOF
```

Stop and think: Why is over-provisioning a pod’s requested CPU worse than over-provisioning its limits?
Pillar 2: Right-Sizing with VPA and HPA
The most common waste pattern in Kubernetes: developers set resource requests based on guesswork, then never revisit them.
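To put a number on that waste, compare what a container requests with what it actually uses. The sketch below uses the figures from this section's before/after diagram (4 CPU requested against 600m used, 8Gi requested against 1.5Gi used) and shows both the unused share of the original request and the reduction achieved by the right-sized request. The function names are hypothetical helpers, not part of any Kubernetes tooling.

```python
def unused_pct(requested: float, used: float) -> float:
    """Percentage of a resource request that sits idle."""
    return 100 * (requested - used) / requested


def reduction_pct(old_request: float, new_request: float) -> float:
    """Percentage by which a right-sized request shrinks the original one."""
    return 100 * (old_request - new_request) / old_request


# CPU in millicores, memory in MiB
cpu_unused = unused_pct(requested=4000, used=600)
cpu_reduction = reduction_pct(old_request=4000, new_request=800)
mem_reduction = reduction_pct(old_request=8192, new_request=2048)

print(f"CPU request unused before right-sizing: {cpu_unused:.0f}%")
print(f"CPU request reduction after right-sizing: {cpu_reduction:.0f}%")
print(f"Memory request reduction after right-sizing: {mem_reduction:.0f}%")
```

The new requests deliberately sit above observed usage (800m against 600m) so the pod keeps some headroom for spikes.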
Vertical Pod Autoscaler (VPA) for Right-Sizing
```mermaid
graph TD
    classDef before fill:#ffebee,stroke:#c62828,stroke-width:2px,text-align:left;
    classDef after fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,text-align:left;
    classDef note fill:#fff9c4,stroke:#fbc02d,stroke-width:1px;

    subgraph Before["Before VPA analysis"]
        B_CPU["Request: 4 CPU<br/>████<br/>████<br/>████<br/>Actual: 600m"]:::before
        B_MEM["Request: 8Gi mem<br/>████████████████<br/>Actual: 1.5Gi"]:::before
    end

    subgraph After["After VPA recommendation"]
        A_CPU["Request: 800m CPU<br/>██<br/>Actual usage: 600m"]:::after
        A_MEM["Request: 2Gi mem<br/>████<br/>Actual usage: 1.5Gi"]:::after
    end

    Before -->|Savings: 80% CPU, 75% memory| After

    N["Over-provisioning wastes money because K8s schedules based on<br/>REQUESTS, not actual usage. A pod requesting 4 CPU blocks<br/>4 CPU from being used by other pods, even if it only uses 600m."]:::note
    After --> N
```

```yaml
# VPA in recommendation mode (safe -- doesn't change anything)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recommendation-engine-vpa
  namespace: ml-platform
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: recommendation-engine
  updatePolicy:
    updateMode: "Off"  # Only recommend, don't auto-apply
  resourcePolicy:
    containerPolicies:
      - containerName: engine
        minAllowed:
          cpu: "100m"
          memory: "256Mi"
        maxAllowed:
          cpu: "8"
          memory: "16Gi"
```

```bash
# Check VPA recommendations
kubectl get vpa recommendation-engine-vpa -n ml-platform -o yaml

# The recommendation section shows:
# - lowerBound: minimum safe resources
# - target: recommended resources
# - upperBound: maximum expected resources
# - uncappedTarget: ideal without min/max constraints

# Example output:
# recommendation:
#   containerRecommendations:
#     - containerName: engine
#       lowerBound:
#         cpu: 500m
#         memory: 1Gi
#       target:
#         cpu: 800m
#         memory: 2Gi
#       upperBound:
#         cpu: 1500m
#         memory: 4Gi
```

HPA for Cost-Efficient Scaling
```yaml
# HPA driven by CPU and memory resource metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25  # Scale down max 25% at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100  # Can double immediately under load
          periodSeconds: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Target 70% CPU utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```

Pillar 3: Rate Optimization
Savings Plans and Committed Use Discounts
```mermaid
graph TD
    classDef provider fill:#eceff1,stroke:#607d8b,stroke-width:2px;
    classDef strategy fill:#e3f2fd,stroke:#1565c0,stroke-width:2px;

    subgraph AWS ["AWS (m7i.xlarge)"]
        A1["On-Demand: $0.192/hr"]
        A2["1yr Savings Plan: $0.121/hr (-37%)"]
        A3["3yr Savings Plan: $0.077/hr (-60%)"]
        A4["Spot: $0.058/hr (-70%, interruptible)"]
    end:::provider

    subgraph GCP ["GCP (n2-standard-4)"]
        G1["On-Demand: $0.189/hr"]
        G2["1yr CUD: $0.119/hr (-37%)"]
        G3["3yr CUD: $0.085/hr (-55%)"]
        G4["Spot: $0.057/hr (-70%)"]
        G5["SUDs (automatic): $0.151/hr (-20%)"]
    end:::provider

    subgraph Azure ["Azure (D4s v5)"]
        Z1["On-Demand: $0.192/hr"]
        Z2["1yr Reserved: $0.124/hr (-35%)"]
        Z3["3yr Reserved: $0.079/hr (-59%)"]
        Z4["Spot: ~$0.038/hr (-80%)"]
    end:::provider

    subgraph STRATEGY ["Optimization Strategy"]
        S1["Baseline (24/7 workloads) --> Savings Plan / CUD"]
        S2["Bursty (predictable peaks) --> On-demand"]
        S3["Fault-tolerant (batch, CI) --> Spot instances"]
        S4["Development --> Spot + auto-shutdown"]
    end:::strategy
```

Calculating Your Savings Plan Commitment
```bash
# AWS: Analyze your usage to determine the right commitment
# (COMPUTE_SP is the Cost Explorer enum value for Compute Savings Plans)
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS \
  --output json | jq '.SavingsPlansPurchaseRecommendation'

# The output tells you:
# - Recommended hourly commitment (e.g., $12.50/hr)
# - Estimated monthly savings (e.g., $2,800/month)
# - Coverage percentage (e.g., 72% of on-demand usage)

# GCP: Analyze committed use
gcloud billing accounts describe BILLING_ACCOUNT_ID --format=json
# Use the GCP Billing Console > Committed use discounts > Analysis
```

Pillar 4: Spot Instance Lifecycle
Spot instances (AWS) / Preemptible VMs (GCP) / Spot VMs (Azure) offer 60-90% discounts but can be interrupted with short notice: about 2 minutes on AWS and about 30 seconds on GCP and Azure. Kubernetes makes them practical by rescheduling evicted pods onto other nodes automatically.
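Rescheduling only helps if the workload itself copes with being stopped. When a node is drained, Kubernetes sends each container SIGTERM and waits for the pod's termination grace period before killing it. The sketch below shows one way a batch worker might react: stop taking new items once SIGTERM arrives and save progress periodically so little work is lost. The `process` function and the `checkpoint` callback are hypothetical; a real worker would persist its position to durable storage such as an object store or a database.

```python
import signal
import threading

# Set when the container receives SIGTERM (for example during a node drain)
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request; the work loop exits on its own."""
    shutdown_requested.set()


def process(items, checkpoint, checkpoint_every=10):
    """Process items until done or until shutdown is requested; return items completed."""
    done = 0
    for _item in items:
        if shutdown_requested.is_set():
            break
        # ... real work on _item would happen here ...
        done += 1
        if done % checkpoint_every == 0:
            checkpoint(done)
    checkpoint(done)  # final checkpoint so a restart resumes from here
    return done


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    saved_positions = []
    completed = process(range(25), saved_positions.append)
    print(f"completed {completed} items, checkpoints at {saved_positions}")
```

The checkpoint interval trades write overhead against the amount of work repeated after an interruption, so it should be short relative to the interruption notice.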
Spot-Friendly Node Groups
```yaml
# EKS managed node groups: on-demand for critical workloads, Spot for the rest
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster
  region: us-east-1
# spot: true combined with instanceTypes is a managed node group feature in eksctl
managedNodeGroups:
  # On-demand for critical workloads
  - name: on-demand-critical
    instanceType: m7i.xlarge
    desiredCapacity: 3
    minSize: 3
    maxSize: 6
    labels:
      node-type: on-demand
      workload-class: critical
    taints:
      - key: workload-class
        value: critical
        effect: NoSchedule

  # Spot for non-critical workloads
  - name: spot-general
    instanceTypes:
      - m7i.xlarge
      - m6i.xlarge
      - m5.xlarge
      - c7i.xlarge  # Diversify instance types
    spot: true
    desiredCapacity: 5
    minSize: 2
    maxSize: 15
    labels:
      node-type: spot
      workload-class: general
```

Pod Scheduling for Spot
```yaml
# Non-critical workload: prefers Spot, tolerates interruption
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: data-pipeline
spec:
  replicas: 8
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Prefer Spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - spot
      # Tolerate Spot taints
      tolerations:
        - key: "kubernetes.io/spot"
          operator: "Exists"
          effect: "NoSchedule"
      # Handle graceful shutdown on Spot interruption
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          image: company/batch-processor:v1.8.0
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
          # Checkpoint progress periodically so interruption loses minimal work
          env:
            - name: CHECKPOINT_INTERVAL_SECONDS
              value: "30"
```

Spot Interruption Handling
```yaml
# AWS Node Termination Handler (NTH)
# Detects Spot interruption notices and gracefully drains nodes
# Install via Helm:
#   helm install aws-node-termination-handler \
#     eks/aws-node-termination-handler \
#     --namespace kube-system

# Karpenter: Automatically replaces interrupted Spot nodes
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m7i.xlarge
            - m7i.2xlarge
            - m6i.xlarge
            - m6i.2xlarge
            - c7i.xlarge
            - r7i.xlarge
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s
  limits:
    cpu: "100"
    memory: "400Gi"
```

Orphaned Resource Cleanup
Orphaned resources are cloud resources that are no longer attached to any active workload but continue accruing charges. They are the silent budget killer.
Common Orphaned Resources
| Resource | How It Gets Orphaned | Monthly Cost (typical) |
|---|---|---|
| Unattached EBS volumes | PVC deleted, PV not reclaimed | $8-$80 per volume |
| Unused Elastic IPs | Service deleted, EIP not released | $3.65 each |
| Old EBS snapshots | Backup policy with no expiry | $0.05/GB |
| Idle load balancers | Service deleted, LB remains | $16-$25 each |
| Stopped EC2 instances | “Paused” but never terminated | EBS costs continue |
| Orphaned NAT Gateways | Cluster or environment torn down, NAT GW left behind in the VPC | $32 each |
| Unused RDS snapshots | Manual snapshots accumulated | $0.095/GB |
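Individually these charges look small, which is why they go unnoticed; multiplied across an account they add up. The sketch below totals a monthly bill from an inventory of orphaned resources using the unit prices in the table above. The inventory numbers are made up for illustration, and prices vary by region, so substitute the output of your own scans and current rates.

```python
# Monthly unit prices in USD, taken from the table above
MONTHLY_PRICE = {
    "elastic_ip": 3.65,        # per unused address
    "nat_gateway": 32.0,       # per gateway
    "ebs_snapshot_gb": 0.05,   # per GB stored
    "rds_snapshot_gb": 0.095,  # per GB stored
}


def monthly_orphan_cost(inventory: dict[str, float]) -> float:
    """Sum the monthly cost of orphaned resources given counts or GB per kind."""
    return sum(MONTHLY_PRICE[kind] * quantity for kind, quantity in inventory.items())


# Hypothetical findings from a scan
inventory = {"elastic_ip": 4, "nat_gateway": 1, "ebs_snapshot_gb": 2000}
total = monthly_orphan_cost(inventory)
print(f"Orphaned resources cost about ${total:,.2f}/month (${total * 12:,.2f}/year)")
```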
Automated Cleanup
```bash
# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime,AZ:AvailabilityZone}' \
  --output table

# Find unused Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].{IP:PublicIp,AllocID:AllocationId}' \
  --output table

# Find load balancers with no target groups attached
for LB_ARN in $(aws elbv2 describe-load-balancers --query 'LoadBalancers[*].LoadBalancerArn' --output text); do
  TG_COUNT=$(aws elbv2 describe-target-groups \
    --load-balancer-arn "$LB_ARN" \
    --query 'length(TargetGroups)' --output text)
  if [ "$TG_COUNT" = "0" ]; then
    LB_NAME=$(aws elbv2 describe-load-balancers \
      --load-balancer-arns "$LB_ARN" \
      --query 'LoadBalancers[0].LoadBalancerName' --output text)
    echo "ORPHANED LB: $LB_NAME ($LB_ARN)"
  fi
done

# Find EBS snapshots older than 90 days
# (tries BSD/macOS date syntax first, then falls back to GNU date)
NINETY_DAYS_AGO=$(date -u -v-90d +%Y-%m-%dT%H:%M:%S 2>/dev/null || date -u -d '90 days ago' +%Y-%m-%dT%H:%M:%S)
aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[?StartTime<='${NINETY_DAYS_AGO}'].{ID:SnapshotId,Size:VolumeSize,Date:StartTime}" \
  --output table
```

```yaml
# CronJob to detect and report orphaned resources
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orphan-detector
  namespace: finops
spec:
  schedule: "0 8 * * 1"  # Every Monday at 8 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: orphan-detector
          containers:
            - name: detector
              image: company/orphan-detector:v1.2.0
              env:
                - name: SLACK_WEBHOOK
                  valueFrom:
                    secretKeyRef:
                      name: slack-webhook
                      key: url
                - name: STALE_THRESHOLD_DAYS
                  value: "30"
              command:
                - /bin/sh
                - -c
                - |
                  echo "Scanning for orphaned resources..."
                  # Detect unattached volumes
                  VOLUMES=$(aws ec2 describe-volumes --filters Name=status,Values=available \
                    --query 'length(Volumes)' --output text)
                  # Detect unused EIPs
                  EIPS=$(aws ec2 describe-addresses \
                    --query 'length(Addresses[?AssociationId==null])' --output text)
                  # Send report to Slack
                  curl -X POST "$SLACK_WEBHOOK" -H 'Content-type: application/json' \
                    --data "{\"text\":\"Orphan Report: $VOLUMES unattached volumes, $EIPS unused EIPs\"}"
          restartPolicy: OnFailure
```

Did You Know?
- Kubernetes clusters typically run at 30-50% resource utilization according to data from Kubecost across thousands of clusters. This means 50-70% of compute spend is wasted on idle resources. The primary cause is over-provisioned resource requests: developers set CPU and memory requests based on worst-case scenarios and never revisit them. VPA in recommendation mode can identify right-sizing opportunities without any risk.
- AWS Spot instances have been interrupted less than 5% of the time for the most popular instance types (m5.xlarge, m6i.xlarge) in US regions, based on the AWS Spot Instance Advisor. The interruption rate varies dramatically by instance type and region: r5.8xlarge in ap-southeast-1 might see a 15-20% interruption rate, while m7i.xlarge in us-east-1 sees under 3%. Diversifying across instance types and AZs is the key to reliable Spot usage.
- Cross-AZ data transfer is one of the top 5 cost categories for most Kubernetes deployments on AWS. A company running 20 microservices with 100 pods across 3 AZs can easily spend $2,000-$5,000/month on cross-AZ traffic alone. AWS bills this traffic at $0.01/GB in each direction; inter-zone pricing differs between providers and changes over time, so check the current rates for yours. On AWS, topology-aware routing is a significant cost optimization lever for Kubernetes deployments.
- OpenCost became a CNCF Sandbox project in 2022 and reached Incubation status in 2024. It was originally developed by Kubecost as the open-source core of their commercial product. The CNCF adoption signaled that Kubernetes cost management was becoming a first-class concern alongside security and observability. OpenCost’s cost allocation API is now integrated into several commercial FinOps platforms.
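The cross-AZ figure in the list above is easy to sanity-check. On AWS, traffic between Availability Zones is billed at $0.01/GB on each side of the transfer, so every gigabyte that crosses a zone boundary effectively costs $0.02. The sketch below converts a monthly traffic volume into a cost; the function name and the 100 TB volume are assumptions for illustration.

```python
def cross_az_monthly_cost(gb_per_month: float, price_per_gb_each_way: float = 0.01) -> float:
    """Cost of cross-AZ traffic when both sender and receiver sides are billed."""
    return gb_per_month * price_per_gb_each_way * 2


traffic_gb = 100_000  # roughly 100 TB of pod-to-pod traffic crossing zones per month
cost = cross_az_monthly_cost(traffic_gb)
print(f"{traffic_gb:,} GB/month across AZs costs about ${cost:,.0f}/month")
```

At that rate, the $2,000-$5,000/month range corresponds to roughly 100-250 TB of cross-zone traffic, which chatty microservices can reach without anyone noticing.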
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Setting resource requests to match limits | “Same value means guaranteed QoS” | Requests should reflect typical usage, limits reflect peak. VPA recommendations help find the right values. Over-requesting wastes money. |
| Buying Savings Plans based on current usage | “We’re using $10K/month now, commit to $10K” | Usage fluctuates. Commit to 60-70% of your average usage. The rest stays on-demand for flexibility. Over-commitment is worse than no commitment. |
| Running dev/staging clusters 24/7 | “Someone might need them on weekends” | Implement auto-shutdown for non-production clusters. Scale to zero outside business hours. A $3,000/month staging cluster running only during business hours (about 27% of the week) costs roughly $800. |
| Not diversifying Spot instance types | “We need m7i.xlarge specifically” | Spot pools with a single instance type have higher interruption rates. Specify 4-6 compatible instance types. Karpenter handles this automatically. |
| Ignoring namespace-level resource quotas | “Trust developers to be reasonable” | Without quotas, one team can consume the entire cluster. Set ResourceQuotas per namespace based on team budgets. |
| No cost alerts or budgets | “We check the bill monthly” | By the time you see the monthly bill, the damage is done. Set budget alerts at 50%, 80%, and 100% thresholds for each account. |
| Deleting Spot nodes during business hours | “Karpenter consolidated idle nodes” | Avoid Spot node replacement during peak hours: use Karpenter disruption budgets with a schedule to block voluntary disruption in those windows, and disruption.consolidateAfter to delay consolidation. |
| Not accounting for EBS costs separately from EC2 | “Compute is our biggest cost” | EBS volumes persist after pods are deleted. Monitor PVC lifecycle and implement reclaimPolicy: Delete for non-production volumes. |
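The budget-alert row above can be turned into a simple check. The sketch below reports which of the 50%, 80%, and 100% thresholds month-to-date spend has crossed, and projects month-end spend with a naive linear run rate so an alert can fire before the budget is actually exhausted. Both functions are hypothetical helpers; cloud budgeting services provide their own forecasts, which are usually more sophisticated than a straight line.

```python
def crossed_thresholds(spend: float, budget: float, thresholds=(0.5, 0.8, 1.0)) -> list[float]:
    """Return the budget thresholds that current spend has reached."""
    return [t for t in thresholds if spend >= budget * t]


def projected_month_end(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Linear projection of month-end spend from the average daily run rate so far."""
    return spend_to_date / day_of_month * days_in_month


budget = 10_000
alerts = crossed_thresholds(spend=8_500, budget=budget)
forecast = projected_month_end(spend_to_date=6_000, day_of_month=10, days_in_month=30)

print(f"Thresholds crossed at $8,500 spend: {[f'{t:.0%}' for t in alerts]}")
print(f"$6,000 by day 10 projects to ${forecast:,.0f} by month end")
```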
1. Your CFO hands you the monthly AWS bill, pointing to a single $45,000 line item for EC2 instances in your production EKS cluster. She asks you to split this cost between the Data Science team and the Frontend team. Why is this impossible to do accurately using just the AWS Billing Console?
Cloud billing consoles only show costs per infrastructure resource (like EC2 instances or EBS volumes), not per Kubernetes workload. Because Kubernetes schedules pods from multiple teams onto the same shared nodes, a single $500/month EC2 instance might be running three Data Science jobs and two Frontend APIs simultaneously. To accurately split this cost, you need a tool like Kubecost or OpenCost that merges the billing data (the node’s actual price) with Kubernetes metrics (how much CPU and memory each team’s pods consumed on that specific node) and aggregates it via namespace or label. Without this workload-level correlation, any cost splitting is just a blind guess.
2. During a cost review, you notice the `recommendation-engine` deployment is consistently using only 15% of its requested CPU, while traffic patterns are highly unpredictable. Your junior engineer suggests implementing VPA in auto-update mode to fix the waste. Why might a combination of VPA (in recommendation mode) and HPA be a better financial and architectural decision?
If you use VPA in auto-update mode on an unpredictable workload, it will shrink the pod’s CPU requests (and, proportionally, its limits) based on what it observed during quiet periods, and it restarts pods to apply each change. When traffic spikes suddenly, the undersized pods are throttled and performance degrades. VPA and HPA should also not both act on the same CPU or memory metric, because they work against each other. Instead, use VPA in “Off” (recommendation) mode to determine the optimal baseline size for a single pod based on historical data. Then implement HPA to dynamically add or remove those correctly-sized pods based on real-time traffic demand. By right-sizing the individual pods with VPA insights and scaling their count horizontally with HPA, you eliminate the baseline waste of over-provisioning while maintaining the elasticity needed to handle sudden traffic peaks gracefully.
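The HPA side of that answer follows a simple rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule with the 70% CPU target and the 2-20 replica bounds used in this module's HPA example. It deliberately leaves out the controller's tolerance band, stabilization windows, and scaling policies, so it approximates a single decision rather than the controller's full behavior.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Core HPA scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, current_utilization=90))   # traffic spike: scale out
print(desired_replicas(4, current_utilization=35))   # quiet period: scale in
print(desired_replicas(2, current_utilization=10))   # never below minReplicas
```

Because utilization is measured against the pod's request, a right-sized request makes this ratio meaningful: an inflated request keeps utilization artificially low and the HPA never scales.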
3. You have $10,000/month in on-demand EC2 usage that runs 24/7. Should you commit to a $10,000/month Savings Plan?
No. Commit to $6,000-$7,000 (60-70% of current usage). Savings Plans commit you to a minimum hourly spend regardless of actual usage. If your usage drops (due to right-sizing, traffic changes, or migration), you still pay the committed amount. The remaining 30-40% stays on-demand, giving you flexibility. Over time, as you’re confident in your baseline, you can increase the commitment. Also consider: some of that $10K might be better served by Spot instances (for fault-tolerant workloads), which provide deeper discounts without long-term commitment. The optimal strategy is often: 60% Savings Plans + 20% Spot + 20% On-demand.
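The asymmetry behind this answer is easier to see with numbers. The sketch below uses a simplified model: the commitment covers a fixed amount of on-demand-equivalent usage at a 37% discount (the 1-year figure quoted earlier in this module), you pay for it whether or not you use it, and anything above it is billed on-demand. Real Savings Plans are expressed as a discounted hourly spend, so treat `monthly_cost` as an illustration of the shape of the trade-off rather than a billing calculator.

```python
def monthly_cost(usage: float, committed: float, discount: float = 0.37) -> float:
    """Monthly bill: the discounted commitment is always paid, overflow is on-demand."""
    commitment_fee = committed * (1 - discount)
    overflow = max(0.0, usage - committed)
    return commitment_fee + overflow


for usage in (10_000, 5_000):
    full = monthly_cost(usage, committed=10_000)
    partial = monthly_cost(usage, committed=6_500)
    print(f"usage ${usage:,}: on-demand ${usage:,}, "
          f"100% commit ${full:,.0f}, 65% commit ${partial:,.0f}")
```

With stable usage the full commitment is cheapest, but if usage halves after right-sizing, the full commitment costs more than having no commitment at all, while the 65% commitment still saves money.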
4. Your team wants to migrate a legacy monolithic application to a Spot instance node group to save 70% on compute costs. The application takes 5 minutes to gracefully shut down, requires persistent local disk state, and runs as a single replica. Why will this migration result in a catastrophic production outage?
Spot instances can be reclaimed by the cloud provider at short notice: AWS gives a 2-minute interruption warning, and GCP and Azure give only about 30 seconds. Since the legacy monolith takes 5 minutes to shut down, it will be forcefully terminated before it finishes its shutdown sequence, leading to data corruption or incomplete transactions. Furthermore, because it relies on local disk state and runs as a single replica, the entire application will go offline and lose its state when the underlying node disappears. Spot instances are only safe for stateless, fault-tolerant workloads that can terminate gracefully within the interruption notice and have multiple replicas distributed across different nodes to ensure continuous availability during interruptions.
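The three criteria in this answer can be written down as a pre-migration checklist. The sketch below is a hypothetical helper, not part of any tool: it takes a workload's shutdown time, replica count, and whether it depends on local disk state, and lists the reasons it is unsafe on Spot. The default notice of 120 seconds matches AWS; pass a smaller value for providers with shorter notice.

```python
def spot_blockers(shutdown_seconds: int, replicas: int, uses_local_state: bool,
                  notice_seconds: int = 120) -> list[str]:
    """Return the reasons a workload is unsafe on Spot; an empty list means no blockers found."""
    blockers = []
    if shutdown_seconds > notice_seconds:
        blockers.append(
            f"shutdown takes {shutdown_seconds}s but the interruption notice is {notice_seconds}s"
        )
    if replicas < 2:
        blockers.append("single replica: losing the node takes the whole service offline")
    if uses_local_state:
        blockers.append("local disk state is lost when the node is reclaimed")
    return blockers


legacy_monolith = spot_blockers(shutdown_seconds=300, replicas=1, uses_local_state=True)
batch_processor = spot_blockers(shutdown_seconds=30, replicas=8, uses_local_state=False)

print("legacy monolith:", legacy_monolith or "no blockers")
print("batch processor:", batch_processor or "no blockers")
```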
5. A development EKS cluster costs $3,000/month and is used Monday-Friday, 9AM-6PM. How much can you save?
Business hours represent roughly 45 hours per week out of 168 total hours (27% of the time). If you scale the cluster to zero (or minimum) outside business hours, you save approximately 73% of compute costs: $3,000 x 0.73 = $2,190/month saved. Implementation options: (a) Karpenter with consolidation + scheduled scaling to zero, (b) a CronJob that scales node groups to 0 at 6PM and back to desired count at 9AM, (c) tools like kube-downscaler that annotate deployments with shutdown schedules. Additional savings: shut down NAT Gateways and load balancers when the cluster is empty. Caveat: factor in the 10-15 minute spin-up time each morning.
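The arithmetic in this answer generalizes to any schedule. The sketch below computes the fraction of the week a cluster needs to be up and the resulting saving, assuming cost scales linearly with running hours. That assumption ignores fixed charges such as the control plane and persistent storage, so the real saving will be somewhat lower.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def off_hours_savings(monthly_cost: float, hours_per_day: float = 9,
                      days_per_week: int = 5) -> float:
    """Monthly saving from running only during the given schedule."""
    active_fraction = hours_per_day * days_per_week / HOURS_PER_WEEK
    return monthly_cost * (1 - active_fraction)


saving = off_hours_savings(3_000)
print(f"Active {9 * 5} of {HOURS_PER_WEEK} hours: save about ${saving:,.0f}/month")
```

The unrounded result is about $2,196/month; the answer above arrives at $2,190 by rounding the idle fraction to 73% first.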
6. You recently deleted a large development namespace containing StatefulSets, LoadBalancer services, and hundreds of pods. A month later, your cloud bill shows an unexpected $800 charge associated with the deleted environment. What specific Kubernetes architectural mechanisms likely caused these resources to be orphaned and continue accruing charges?
When deleting Kubernetes resources, the underlying cloud infrastructure isn’t always cleaned up automatically. The most likely culprit for the $800 charge is unattached EBS volumes left behind by the StatefulSets. Dynamically provisioned volumes default to reclaimPolicy: Delete, but many StorageClasses are configured with reclaimPolicy: Retain to protect data, and in that case the cloud disk persists even after the PersistentVolumeClaim is deleted. Additionally, if the LoadBalancer services were forcefully deleted or the namespace was abruptly terminated without allowing controllers to finalize cleanup, the cloud provider’s Load Balancers and associated Elastic IPs would remain active. To prevent this, configure reclaimPolicy: Delete for non-critical storage and implement automated scanning tools to detect and alert on unattached cloud resources.
Hands-On Exercise: Cost Optimization Audit
In this exercise, you will perform a cost optimization audit on a Kubernetes cluster.
Prerequisites
- A running Kubernetes cluster (kind, minikube, or cloud)
- kubectl installed
- Metrics server installed (for VPA)
Task 1: Identify Over-Provisioned Workloads
Deploy some intentionally over-provisioned workloads and use kubectl to identify waste.
Solution
```bash
# Create a kind cluster with metrics server
kind create cluster --name cost-lab

# Install metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Patch for kind (insecure kubelet)
kubectl patch deployment metrics-server -n kube-system \
  --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# Wait for metrics server
sleep 30
kubectl wait --for=condition=Ready pod -l k8s-app=metrics-server -n kube-system --timeout=120s

# Deploy over-provisioned workloads
kubectl create namespace cost-audit

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-wasteful
  namespace: cost-audit
  labels:
    team: backend
    cost-center: CC-1000
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        team: backend
    spec:
      containers:
        - name: api
          image: nginx:stable
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-wasteful
  namespace: cost-audit
  labels:
    team: data
    cost-center: CC-2000
spec:
  replicas: 3
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
        team: data
    spec:
      containers:
        - name: worker
          image: nginx:stable
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
EOF

# Wait for pods (some will be Pending due to insufficient resources)
sleep 15

# Check actual usage vs requests
echo "=== Pod Resource Usage vs Requests ==="
kubectl top pods -n cost-audit 2>/dev/null || echo "Metrics not ready yet, wait 60s"

# Compare requests to actual usage
kubectl get pods -n cost-audit -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
STATUS:.status.phase
```

Task 2: Calculate the Waste
Solution

```bash
# Calculate total requested vs actual.
# A quoted heredoc is used so the shell does not expand $0, $5, $1, ... inside
# the dollar amounts (with double-quoted echo, "$561" would print as "61").
cat <<'EOF'
=== Requested Resources ===
api-server: 5 pods x 2 CPU = 10 CPU requested
api-server: 5 pods x 4Gi = 20Gi memory requested
worker: 3 pods x 1 CPU = 3 CPU requested
worker: 3 pods x 2Gi = 6Gi memory requested

TOTAL REQUESTED: 13 CPU, 26Gi memory

At m7i.xlarge pricing ($0.192/hr, 4 CPU, 16Gi):
13 CPU / 4 CPU per node = 4 nodes needed (by CPU)
26Gi / 16Gi per node = 2 nodes needed (by memory)
Limiting factor: CPU (4 nodes)

Cost: 4 nodes x $0.192/hr x 730 hours = $561/month

=== Actual Usage (nginx idle) ===
Each nginx pod uses ~5m CPU and ~5Mi memory
Total actual: ~40m CPU, ~40Mi memory
Actual need: 1 node (easily)

WASTE: $561 - $140 (1 node) = $421/month (75% waste)

=== VPA Recommendations ===
api-server: request 50m CPU, 64Mi memory (from 2 CPU, 4Gi)
worker: request 50m CPU, 64Mi memory (from 1 CPU, 2Gi)
EOF
```

Task 3: Apply Right-Sizing
Solution

```bash
# Right-size the deployments based on "VPA recommendations"
kubectl patch deployment api-server-wasteful -n cost-audit --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "100m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "500m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"}
]'

kubectl patch deployment worker-wasteful -n cost-audit --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "100m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "128Mi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "500m"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "256Mi"}
]'

# Verify all pods are now Running (they fit on fewer nodes)
kubectl get pods -n cost-audit

echo "=== After Right-Sizing ==="
echo "api-server: 5 pods x 100m CPU = 500m CPU requested"
echo "worker: 3 pods x 100m CPU = 300m CPU requested"
echo "TOTAL: 800m CPU, ~1Gi memory"
echo "Fits on 1 node easily. Savings: 75%"
```

Task 4: Create a Cost Optimization Report
Write a cost optimization report for a fictional team based on the audit findings.
Solution
```markdown
# Cost Optimization Report: Cost-Audit Namespace

## Executive Summary
Current monthly spend: ~$561 (4 nodes at on-demand pricing)
Optimized monthly spend: ~$140 (1 node at on-demand pricing)
Potential savings: $421/month ($5,052/year) -- 75% reduction

## Findings

### 1. Over-Provisioned Resources (Impact: $421/month)
- api-server requests 2 CPU per pod but uses ~5m (0.25%)
- worker requests 1 CPU per pod but uses ~5m (0.5%)
- Total CPU requested: 13 cores. Total used: 40 millicores.
- Recommendation: Reduce requests to 100m CPU, 128Mi memory

### 2. No Horizontal Pod Autoscaler (Impact: TBD)
- api-server runs 5 replicas constantly
- Likely needs 2 replicas at baseline, scale to 5 during peak
- Recommendation: Add HPA with min=2, max=8, target CPU=70%
- Estimated additional savings: 40% during off-peak

### 3. On-Demand Pricing (Impact: ~$50/month)
- Workloads run 24/7, perfect for Savings Plans
- With 1-year Compute Savings Plan: $140 * 0.63 = $88/month
- Savings: $52/month

## Recommended Actions (priority order)
1. Apply right-sized resource requests (immediate, $421/month)
2. Add HPA for api-server (1 day, ~$30/month additional)
3. Purchase Savings Plan for baseline compute (1 week, ~$50/month)

## Total Estimated Savings: $501/month ($6,012/year)
```

Clean Up
```bash
kind delete cluster --name cost-lab
```

Success Criteria
- Over-provisioned workloads deployed and identified
- Waste quantified in dollar terms
- Right-sized resource requests applied
- All pods running after right-sizing (no OOM or throttling)
- Cost optimization report includes specific dollar savings
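To double-check the dollar figures used in the exercise, the node count and monthly cost can be recomputed from the totals. The sketch below assumes m7i.xlarge nodes (4 CPU, 16Gi, $0.192/hr) and 730 hours per month, as in Task 2. It ignores system-reserved capacity and bin-packing effects, so a real scheduler may need more nodes than this lower bound suggests.

```python
import math


def nodes_needed(cpu_requested: float, memory_requested_gi: float,
                 node_cpu: float = 4, node_memory_gi: float = 16) -> int:
    """Lower bound on node count: whichever of CPU or memory needs more nodes."""
    by_cpu = math.ceil(cpu_requested / node_cpu)
    by_memory = math.ceil(memory_requested_gi / node_memory_gi)
    return max(by_cpu, by_memory)


def monthly_node_cost(nodes: int, hourly_price: float = 0.192, hours: int = 730) -> float:
    return nodes * hourly_price * hours


before_nodes = nodes_needed(cpu_requested=13, memory_requested_gi=26)
after_nodes = nodes_needed(cpu_requested=0.8, memory_requested_gi=1)
before_cost = monthly_node_cost(before_nodes)
after_cost = monthly_node_cost(after_nodes)

print(f"Before: {before_nodes} nodes, ${before_cost:,.0f}/month")
print(f"After:  {after_nodes} node, ${after_cost:,.0f}/month")
print(f"Saved:  ${before_cost - after_cost:,.0f}/month "
      f"({100 * (before_cost - after_cost) / before_cost:.0f}%)")
```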
Next Module
Module 8.9: Large-Scale Observability & Telemetry. You can see where the money goes. Now learn how to see where the problems are. Multi-cluster Prometheus with Thanos, OpenTelemetry at scale, and the art of monitoring Kubernetes without drowning in data.