Module 7.4: AKS Storage, Observability & Scaling
Complexity: [MEDIUM] | Time to Complete: 2.5h | Prerequisites: Module 7.1: AKS Architecture & Node Management
What You’ll Be Able to Do
After completing this module, you will be able to:
- Configure KEDA (Kubernetes Event-Driven Autoscaling) on AKS for scaling based on Azure service metrics
- Implement AKS observability with Azure Monitor Container Insights, Managed Prometheus, and Managed Grafana
- Deploy Azure Disk and Azure Files CSI drivers with storage classes optimized for performance and cost on AKS
- Design AKS cost optimization strategies using Spot node pools, cluster autoscaler tuning, and right-sizing
Why This Module Matters
In November 2023, an online retailer running on AKS experienced a catastrophic failure during their Black Friday sale. Their order processing service used Azure Premium SSD disks for a write-ahead log. When traffic spiked to 15x normal levels, the disk hit its IOPS ceiling and writes started queuing. The application had no metrics on disk I/O latency; their observability stack only monitored CPU and memory. Without visibility into the real bottleneck, the on-call engineer scaled the deployment from 6 to 30 replicas, which made things dramatically worse: 30 pods now competed for the same disk's IOPS budget. The queue grew, timeouts cascaded, and the entire order pipeline froze for 90 minutes during peak sales hours. Post-incident analysis estimated $4.2 million in lost revenue. The fix was straightforward: migrate to Ultra Disks with provisioned IOPS, add disk I/O metrics to their Grafana dashboards, and implement KEDA-based scaling that responded to queue depth rather than CPU utilization.
This story illustrates a pattern that repeats across organizations: storage, observability, and scaling are treated as afterthoughts during initial cluster setup, then become the root cause of the most painful production incidents. The three topics are deeply interconnected. Without proper observability, you cannot make informed scaling decisions. Without proper scaling, your storage layer gets overwhelmed. Without proper storage, your observability pipeline loses data during the exact moments you need it most.
In this module, you will learn how to choose between Azure Disks and Azure Files for different workload patterns, configure Container Insights with Managed Prometheus and Grafana for full-stack observability, and implement event-driven autoscaling with the KEDA add-on. By the end, you will have a cluster that monitors itself, scales based on real business signals, and stores data on the right tier for each workload.
Azure Storage for Kubernetes: Disks vs Files
AKS integrates with two primary Azure storage services for persistent volumes: Azure Disks and Azure Files. The choice between them depends on your access patterns, performance requirements, and cross-zone needs.
Azure Disks: Block Storage for Single-Pod Workloads
Azure Disks provide block-level storage that attaches to a single node at a time. This maps to the ReadWriteOnce (RWO) access mode in Kubernetes: the volume can be mounted read-write by only a single node at a time, so the pod using the disk is pinned to wherever the disk is attached.
Azure Disk types for AKS:

| | Standard HDD | Standard SSD | Premium SSD | Ultra Disk |
|---|---|---|---|---|
| Max IOPS | 2,000 | 6,000 | 20,000 | 160,000 |
| Max bandwidth | 500 MB/s | 750 MB/s | 900 MB/s | 4 GB/s |
| Latency | ~10ms | ~4ms | ~1ms | Sub-ms |
| Use for | Backups, cold data | Dev/test, light workloads | Most production databases | High-perf DBs, real-time analytics |
| Cost | $ | $$ | $$$ | $$$$ |

AKS uses CSI (Container Storage Interface) drivers for storage. The `disk.csi.azure.com` driver handles Azure Disks. You create a StorageClass that specifies the disk type, then reference it in PersistentVolumeClaims.
```yaml
# StorageClass for Premium SSD v2 with provisioned IOPS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-v2
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "5000"
  DiskMBpsReadWrite: "200"
  cachingMode: None
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
```yaml
---
# PVC using the StorageClass
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: database
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-ssd-v2
  resources:
    requests:
      storage: 256Gi
```

The `volumeBindingMode: WaitForFirstConsumer` setting is critical for AKS clusters with availability zones. It delays disk creation until a pod actually needs it, ensuring the disk is created in the same zone as the node where the pod is scheduled. Without this, the disk might be created in Zone 1 while the pod gets scheduled to Zone 2, causing a permanent scheduling failure.
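The zone-mismatch mechanics can be expressed as a toy model. This is a sketch with hypothetical function and zone names, not the CSI driver's actual logic, assuming a three-zone cluster:

```python
import random

ZONES = ["westeurope-1", "westeurope-2", "westeurope-3"]

def provision_disk(binding_mode, pod_zone=None):
    """Toy model of which zone an Azure Disk lands in for each binding mode.

    Immediate: the disk is created as soon as the PVC appears, before the
    scheduler has picked a node, so its zone is chosen with no knowledge
    of pod placement.
    WaitForFirstConsumer: creation waits until the pod is scheduled, so
    the disk lands in the pod's zone by construction.
    """
    if binding_mode == "WaitForFirstConsumer":
        if pod_zone is None:
            raise ValueError("creation is deferred until the pod is scheduled")
        return pod_zone
    return random.choice(ZONES)  # Immediate: zone picked independently of the pod

# WaitForFirstConsumer can never mismatch:
assert provision_disk("WaitForFirstConsumer", "westeurope-2") == "westeurope-2"

# Immediate mismatches roughly 2 times in 3 on a three-zone cluster:
mismatches = sum(provision_disk("Immediate") != "westeurope-2" for _ in range(10_000))
print(f"Immediate binding mismatch rate: {mismatches / 10_000:.0%}")
```

The simulation makes the failure probability concrete: with `Immediate` binding on a three-zone cluster, roughly two out of three disks end up in the wrong zone for the eventually scheduled pod.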
Ultra Disks: When Premium SSD Is Not Enough
Ultra Disks allow you to independently provision IOPS and throughput, decoupled from disk size. A 64 GB Ultra Disk can deliver 50,000 IOPS if you need it. This makes them ideal for databases like PostgreSQL, MySQL, and Cassandra that have high I/O requirements relative to their data size.
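To make the decoupling concrete, here is a sketch of the size-to-IOPS coupling on Premium SSD. The tier values are taken from Azure's published P-series specs at the time of writing; verify them against current Azure documentation:

```python
# Premium SSD managed-disk tiers: (size_gib, baseline_iops).
# Assumed from Azure's published P-series tiers; check current docs.
PREMIUM_SSD_TIERS = [
    (64, 240),     # P6
    (128, 500),    # P10
    (256, 1100),   # P15
    (512, 2300),   # P20
    (1024, 5000),  # P30
    (2048, 7500),  # P40
]

def min_premium_size_for_iops(required_iops):
    """Smallest Premium SSD size (GiB) whose baseline IOPS meet the target."""
    for size_gib, iops in PREMIUM_SSD_TIERS:
        if iops >= required_iops:
            return size_gib
    return None  # beyond these tiers: use Ultra Disk or Premium SSD v2

# A 50 GB database needing 5,000 IOPS forces a 1 TiB Premium SSD...
assert min_premium_size_for_iops(5000) == 1024
# ...while Ultra Disk provisions IOPS independently of size, so the same
# workload fits on a 64 GiB disk with DiskIOPSReadWrite: "5000".
```

This is exactly the trap in the Black Friday story: paying for a terabyte of capacity just to buy IOPS, when the actual dataset is a fraction of that size.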
```bash
# Enable Ultra Disk support on a node pool
az aks nodepool add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-westeurope \
  --name dbpool \
  --node-count 3 \
  --node-vm-size Standard_D8s_v5 \
  --zones 1 2 3 \
  --enable-ultra-ssd \
  --mode User \
  --node-taints "workload=database:NoSchedule" \
  --labels workload=database
```

```yaml
# StorageClass for Ultra Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ultra-disk
provisioner: disk.csi.azure.com
parameters:
  skuName: UltraSSD_LRS
  DiskIOPSReadWrite: "50000"
  DiskMBpsReadWrite: "1000"
  cachingMode: None
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Azure Files: Shared Storage for Multi-Pod Access
Pause and predict: If you have a legacy CMS that writes user uploads to a local filesystem and you want to scale it to 3 replicas across different nodes, which Azure storage solution must you use and why?
Azure Files provides SMB and NFS file shares that multiple pods across multiple nodes can mount simultaneously (ReadWriteMany / RWX). This is essential for workloads that need shared storage: CMS platforms, shared configuration files, machine learning training data, and legacy applications that expect a shared filesystem.
Azure Files access patterns:

| SMB Protocol (default) | NFS Protocol (Premium only) |
|---|---|
| Windows + Linux | Linux only |
| Broad compatibility | POSIX-compliant |
| AD-based authentication | No authentication overhead |
| Lower throughput | Higher throughput |
| Use: general shared storage, Windows workloads | Use: high-performance shared storage, ML training data, media processing |

```yaml
# StorageClass for Azure Files NFS (Premium tier)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-files-nfs-premium
provisioner: file.csi.azure.com
parameters:
  protocol: nfs
  skuName: Premium_LRS
mountOptions:
  - nconnect=4
  - noresvport
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
```
```yaml
---
# PVC for shared ML training data
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
  namespace: ml-pipeline
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azure-files-nfs-premium
  resources:
    requests:
      storage: 1Ti
```

Shared Disks for High Availability
Azure Shared Disks allow a single Premium SSD or Ultra Disk to be attached to multiple nodes simultaneously. This enables cluster-aware applications (like SQL Server Failover Cluster Instances or custom HA storage engines) to share a disk at the block level.
```yaml
# StorageClass for shared disks
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared-premium-disk
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  maxShares: "3"
  cachingMode: None
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```

Warning: Shared Disks do not provide a filesystem. The application must handle concurrent block-level access using a cluster filesystem (like GFS2) or its own coordination protocol. Do not mount a shared disk with ext4 or xfs from multiple nodes: you will corrupt your data.
The Storage Decision Matrix
| Criteria | Azure Disk (Premium) | Azure Disk (Ultra) | Azure Files (SMB) | Azure Files (NFS) |
|---|---|---|---|---|
| Access mode | RWO | RWO | RWX | RWX |
| Max IOPS | 20,000 | 160,000 | 10,000 | 100,000 |
| Cross-zone | No (zone-locked) | No (zone-locked) | Yes (ZRS available) | Yes (ZRS available) |
| Latency | ~1ms | Sub-ms | ~5-10ms | ~2-5ms |
| Windows support | Yes | Yes | Yes | No |
| Best for | Databases, stateful apps | High-IOPS databases | Shared config, CMS | ML data, media |
| Cost | $$$ | $$$$ | $$ | $$$ |
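The matrix above can be reduced to a first-pass selection function. This is illustrative only: it checks the SKU ceilings from the table and deliberately ignores the Premium SSD size-to-IOPS coupling discussed earlier:

```python
def choose_storage(access_mode, max_iops, needs_windows=False):
    """First-pass storage pick per the decision matrix (illustrative only)."""
    if access_mode == "ReadWriteMany":
        if needs_windows:
            return "Azure Files (SMB)"  # NFS shares are Linux-only
        # SMB tops out around 10k IOPS in the matrix; NFS goes to 100k
        return "Azure Files (NFS)" if max_iops > 10_000 else "Azure Files (SMB)"
    # ReadWriteOnce: block storage; Premium SSD caps at 20k IOPS
    return "Azure Disk (Ultra)" if max_iops > 20_000 else "Azure Disk (Premium)"

assert choose_storage("ReadWriteOnce", 5_000) == "Azure Disk (Premium)"
assert choose_storage("ReadWriteOnce", 50_000) == "Azure Disk (Ultra)"
assert choose_storage("ReadWriteMany", 50_000) == "Azure Files (NFS)"
```

In practice you would also factor in the size-to-IOPS coupling (a small disk with a high IOPS requirement may still need Ultra or Premium SSD v2) and cross-zone requirements.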
Container Insights and Azure Monitor
Container Insights is Azure's native observability solution for AKS. It collects logs, metrics, and performance data from your cluster and presents them in the Azure portal with pre-built dashboards and query capabilities.
Enabling Container Insights
```bash
# Create a Log Analytics workspace
az monitor log-analytics workspace create \
  --resource-group rg-aks-prod \
  --workspace-name law-aks-prod \
  --location westeurope \
  --retention-in-days 90

WORKSPACE_ID=$(az monitor log-analytics workspace show \
  -g rg-aks-prod -n law-aks-prod --query id -o tsv)

# Enable Container Insights
az aks enable-addons \
  --resource-group rg-aks-prod \
  --name aks-prod-westeurope \
  --addons monitoring \
  --workspace-resource-id "$WORKSPACE_ID"

# Verify the monitoring agent is running
k get pods -n kube-system -l component=ama-logs
```

What Container Insights Collects
Container Insights deploys a monitoring agent (Azure Monitor Agent) as a DaemonSet on each node. This agent collects:
- Node metrics: CPU, memory, disk I/O, network throughput per node
- Pod metrics: CPU/memory requests vs actual usage, restart counts, OOM kills
- Container logs: stdout/stderr from all containers (sent to Log Analytics)
- Kubernetes events: Pod scheduling, image pulls, resource quota violations
- Inventory data: Running pods, nodes, deployments, services
```bash
# Query container logs in Log Analytics.
# Note: this command takes the workspace GUID (customerId), not the ARM resource ID.
WORKSPACE_GUID=$(az monitor log-analytics workspace show \
  -g rg-aks-prod -n law-aks-prod --query customerId -o tsv)

az monitor log-analytics query \
  --workspace "$WORKSPACE_GUID" \
  --analytics-query "ContainerLogV2 | where ContainerName == 'payment-service' | where LogMessage contains 'error' | top 20 by TimeGenerated desc" \
  --timespan "PT6H"
```

Cost Control for Container Insights
Pause and predict: You just deployed Container Insights on a busy cluster and your Log Analytics bill spiked by $500 in one day. What is the most likely culprit, and what configuration component will fix it?
Container Insights can generate significant Log Analytics costs if you send every log line from every container. Use the ConfigMap to control what gets collected:
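A rough ingestion-cost model shows why filtering matters. The per-GB price here is an assumed pay-as-you-go figure, so substitute your region's current Azure Monitor pricing:

```python
def monthly_log_cost(gb_per_day, price_per_gb=2.76, excluded_fraction=0.0):
    """Estimate monthly Log Analytics ingestion cost.

    price_per_gb: assumed pay-as-you-go rate in USD (check current pricing).
    excluded_fraction: share of log volume dropped by ConfigMap filters
    (kube-system alone is often a large share of stdout volume).
    """
    ingested_gb_per_day = gb_per_day * (1 - excluded_fraction)
    return round(ingested_gb_per_day * price_per_gb * 30, 2)

# A busy cluster shipping 80 GB/day unfiltered vs. excluding ~40% noise:
assert monthly_log_cost(80) == 6624.0
assert monthly_log_cost(80, excluded_fraction=0.4) == 3974.4
```

Dropping noisy system namespaces is usually the single highest-leverage change: it cuts the bill without touching the application logs you actually query during incidents.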
```yaml
# Save as container-insights-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: v1
  log-data-collection-settings: |
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system", "gatekeeper-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.env_var]
        enabled = false
  prometheus-data-collection-settings: |
    [prometheus_data_collection_settings.cluster]
      interval = "60s"
      monitor_kubernetes_pods = true
```

```bash
k apply -f container-insights-config.yaml
```

Managed Prometheus and Grafana: Cloud-Native Monitoring
While Container Insights works well for logs and basic metrics, production teams often need Prometheus for application-specific metrics and Grafana for custom dashboards. Azure offers fully managed versions of both, eliminating the operational burden of running your own Prometheus server and Grafana instance.
Setting Up Managed Prometheus
Section titled “Setting Up Managed Prometheus”Azure Monitor managed service for Prometheus stores metrics in an Azure Monitor workspace. AKS ships metrics using a Prometheus-compatible agent.
```bash
# Create an Azure Monitor workspace (for Prometheus)
az monitor account create \
  --resource-group rg-aks-prod \
  --name amw-aks-prod \
  --location westeurope

MONITOR_WORKSPACE_ID=$(az monitor account show \
  -g rg-aks-prod -n amw-aks-prod --query id -o tsv)

# Enable Managed Prometheus on the cluster
az aks update \
  --resource-group rg-aks-prod \
  --name aks-prod-westeurope \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id "$MONITOR_WORKSPACE_ID"

# Verify the Prometheus agent is running
k get pods -n kube-system -l rsName=ama-metrics
```

Setting Up Managed Grafana
Section titled “Setting Up Managed Grafana”# Create a Managed Grafana instanceaz grafana create \ --resource-group rg-aks-prod \ --name grafana-aks-prod \ --location westeurope
# Link Grafana to the Azure Monitor workspaceGRAFANA_ID=$(az grafana show -g rg-aks-prod -n grafana-aks-prod --query id -o tsv)
az monitor account update \ --resource-group rg-aks-prod \ --name amw-aks-prod \ --linked-grafana "$GRAFANA_ID"
# Get the Grafana URLaz grafana show -g rg-aks-prod -n grafana-aks-prod --query "properties.endpoint" -o tsvOnce linked, Managed Grafana automatically discovers the Prometheus data source. Azure provides pre-built dashboards for Kubernetes cluster monitoring, node performance, pod resource usage, and more.
Custom Prometheus Metrics from Your Application
Section titled “Custom Prometheus Metrics from Your Application”Your application can expose custom Prometheus metrics, and the managed Prometheus agent will scrape them automatically if you annotate your pods correctly.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: payment
          image: myregistry.azurecr.io/payment-service:v2.1.0
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1"
              memory: "512Mi"
```

Creating Alert Rules
```bash
# Create a Prometheus alert rule for high error rate
az monitor metrics alert create \
  --resource-group rg-aks-prod \
  --name "payment-high-error-rate" \
  --scopes "$MONITOR_WORKSPACE_ID" \
  --condition "avg http_requests_total{status=~'5..',service='payment-service'} by (service) / avg http_requests_total{service='payment-service'} by (service) > 0.05" \
  --description "Payment service error rate exceeds 5%" \
  --severity 1 \
  --window-size 5m \
  --evaluation-frequency 1m
```

For more flexible alerting, use Prometheus-native alert rules through the Azure Monitor workspace:
```yaml
# PrometheusRuleGroup for custom alerts
apiVersion: alerts.monitor.azure.com/v1
kind: PrometheusRuleGroup
metadata:
  name: payment-alerts
spec:
  rules:
    - alert: PaymentServiceHighLatency
      expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{service="payment-service"}[5m])) > 2
      for: 3m
      labels:
        severity: warning
      annotations:
        summary: "Payment service p99 latency exceeds 2 seconds"
    - alert: PaymentServiceDown
      expr: up{job="payment-service"} == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Payment service is down"
```

KEDA: Event-Driven Autoscaling
The standard Kubernetes Horizontal Pod Autoscaler (HPA) scales based on CPU and memory utilization. This works for stateless web servers but fails spectacularly for event-driven workloads: message queue consumers, batch processors, and services that need to scale based on business metrics rather than infrastructure metrics.
KEDA (Kubernetes Event-Driven Autoscaler) extends the HPA with over 60 scalers that can trigger scaling from external event sources: Azure Service Bus queue depth, Azure Event Hubs partition lag, PostgreSQL query results, Prometheus metrics, and many more.
| Traditional HPA | KEDA |
|---|---|
| Metrics Server (CPU/memory only) | KEDA Operator (60+ scalers) |
| "Pod at 80% CPU" → scale up | "Queue has 500 messages" → scale up |
| Cannot scale to zero | Can scale to zero |

Enabling the KEDA Add-on
```bash
# Enable KEDA as an AKS add-on
az aks update \
  --resource-group rg-aks-prod \
  --name aks-prod-westeurope \
  --enable-keda

# Verify KEDA pods are running
k get pods -n kube-system -l app.kubernetes.io/name=keda-operator
```

Scaling Based on Azure Service Bus Queue Depth
This is the most common KEDA pattern in Azure: scale your consumer pods based on how many messages are waiting in a queue.
```yaml
# ScaledObject: scale order-processor based on Service Bus queue depth
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: orders
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 15
  cooldownPeriod: 120
  minReplicaCount: 0   # Scale to zero when queue is empty!
  maxReplicaCount: 50
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: incoming-orders
        namespace: sb-prod-westeurope
        messageCount: "10"   # 1 pod per 10 messages
      authenticationRef:
        name: servicebus-auth
```

The `messageCount: "10"` setting means KEDA targets 1 pod for every 10 messages in the queue. If there are 250 messages, KEDA will scale to 25 replicas. When the queue drains to zero, KEDA scales the deployment down to 0 replicas, saving costs entirely.
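The arithmetic KEDA performs here can be approximated as follows. This is a simplification: real KEDA also applies activation thresholds and HPA stabilization behavior:

```python
import math

def keda_target_replicas(queue_length, message_count,
                         min_replicas=0, max_replicas=50):
    """Approximate KEDA's scaling target for a queue-based scaler.

    KEDA exposes the queue length to the HPA as an external metric with
    messageCount as the per-pod target; the effective replica target is
    the ceiling of the ratio, clamped to the ScaledObject's bounds.
    """
    if queue_length == 0:
        return min_replicas  # scale-to-zero when the queue is empty
    desired = math.ceil(queue_length / message_count)
    return max(min_replicas or 1, min(desired, max_replicas))

assert keda_target_replicas(250, 10) == 25   # 1 pod per 10 messages
assert keda_target_replicas(1000, 10) == 50  # clamped at maxReplicaCount
assert keda_target_replicas(0, 10) == 0      # empty queue -> zero pods
```

Note how `maxReplicaCount` caps the burst: 1,000 messages would "want" 100 pods, but the ScaledObject holds the line at 50.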
KEDA Authentication with Workload Identity
KEDA needs credentials to check the queue depth. Using Workload Identity (from Module 7.3), you can avoid storing connection strings:
```yaml
# TriggerAuthentication using Workload Identity
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: servicebus-auth
  namespace: orders
spec:
  podIdentity:
    provider: azure-workload
    identityId: "<CLIENT_ID_OF_MANAGED_IDENTITY>"
```

The managed identity needs the "Azure Service Bus Data Receiver" role on the Service Bus namespace to check queue metrics.
Scaling Based on Prometheus Metrics
Section titled “Scaling Based on Prometheus Metrics”KEDA can also scale based on custom Prometheus metrics from your Azure Monitor workspace. This lets you scale on any business metric your application exposes.
```yaml
# Scale based on a custom Prometheus metric
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-scaler
  namespace: gateway
spec:
  scaleTargetRef:
    name: api-gateway
  pollingInterval: 30
  cooldownPeriod: 180
  minReplicaCount: 2
  maxReplicaCount: 30
  triggers:
    - type: prometheus
      metadata:
        serverAddress: "http://prometheus-server.monitoring:9090"
        metricName: http_requests_per_second
        query: "sum(rate(http_requests_total{service='api-gateway'}[2m]))"
        threshold: "100"   # 1 pod per 100 requests/sec
```

KEDA Scaling Strategies Compared
| Scaler | Trigger Source | Scale to Zero | Typical Use Case |
|---|---|---|---|
| azure-servicebus | Queue message count | Yes | Order processing, async tasks |
| azure-eventhub | Consumer group lag | Yes | Event streaming, IoT data |
| azure-queue | Storage queue length | Yes | Background jobs, batch processing |
| prometheus | Any Prometheus metric | Yes (below activation threshold) | RPS-based scaling, custom metrics |
| cron | Time schedule | Yes | Predictable traffic patterns |
| azure-monitor | Azure Monitor metrics | Yes | Infrastructure-based triggers |
Combining KEDA with Cluster Autoscaler
KEDA scales pods. The cluster autoscaler scales nodes. They work together beautifully:
- KEDA detects 500 messages in the queue and scales the deployment to 50 replicas
- The scheduler finds that existing nodes can only fit 30 of those pods
- 20 pods go to `Pending` state
- The cluster autoscaler detects pending pods and adds nodes to the VMSS
- New nodes register, and the scheduler places the remaining pods
- Messages get processed and the queue drains
- KEDA scales pods down to 0
- The cluster autoscaler detects underutilized nodes and removes them after the cool-down period
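The scheduling arithmetic in the sequence above reduces to simple capacity math. This is a toy model assuming uniformly sized pods and ignoring DaemonSet and system-pod overhead:

```python
import math

def autoscaler_step(target_pods, nodes, pods_per_node):
    """One round of the KEDA + cluster autoscaler interaction:
    how many pods schedule, how many stay Pending, and how many
    nodes the cluster autoscaler must add to absorb the backlog.
    """
    capacity = nodes * pods_per_node
    running = min(target_pods, capacity)
    pending = target_pods - running
    nodes_to_add = math.ceil(pending / pods_per_node) if pending else 0
    return running, pending, nodes_to_add

# KEDA targets 50 pods; 6 nodes fit 5 pods each (30 total), matching
# the timeline below: 30 run, 20 pend, 4 nodes get added.
assert autoscaler_step(50, 6, 5) == (30, 20, 4)
```

The key takeaway is the division of labor: KEDA only sets the replica target, the scheduler reveals the capacity shortfall as `Pending` pods, and the cluster autoscaler converts that shortfall into new VMSS instances.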
```
Queue depth: 500 messages

t=0s      KEDA: 0 pods → 50 pods (target)
t=10s     Scheduler: 30 pods running, 20 pending
t=20s     Cluster Autoscaler: adding 4 nodes to VMSS
t=80s     New nodes ready: 50/50 pods running
t=300s    Queue drained to 0 messages
t=420s    KEDA: 50 pods → 0 pods
t=1020s   Cluster Autoscaler: removing 4 underutilized nodes
```

Cost Optimization: Spot Instances and Right-Sizing
Compute costs dominate the typical Kubernetes bill. While auto-scaling ensures you only run the nodes you need, cost optimization ensures you pay the lowest possible price for those nodes and pack them as efficiently as possible.
Spot Node Pools
Azure Spot Virtual Machines offer unused Azure capacity at a deep discount of up to 90% off the pay-as-you-go rate. The trade-off is that Azure can evict these VMs at any time with only a 30-second warning when the capacity is needed for full-price customers.
Spot VMs are perfect for fault-tolerant, interruptible workloads:
- Batch processing and background jobs
- Stateless web servers (if you run enough replicas across both Spot and regular nodes)
- CI/CD build agents
- Machine learning training jobs
```bash
# Add a Spot node pool to an existing cluster
az aks nodepool add \
  --resource-group rg-aks-prod \
  --cluster-name aks-prod-westeurope \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10 \
  --node-vm-size Standard_D4s_v5
```

When you create a Spot node pool, AKS automatically adds the taint `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`. This prevents normal pods from being scheduled on Spot nodes unless they explicitly tolerate the taint.
```yaml
# Pod configured to run on Spot nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker
spec:
  selector:            # added: a Deployment requires a selector
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      tolerations:
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: kubernetes.azure.com/scalesetpriority
                    operator: In
                    values:
                      - spot
      containers:
        - name: worker
          image: myregistry.azurecr.io/batch-worker:v1  # placeholder image
```

Stop and think: If your entire web frontend is running on a Spot node pool and Azure experiences a sudden surge in demand for that VM size in your region, what happens to your application? How should you architect a production deployment to utilize Spot savings without risking downtime?
To use Spot instances safely in production, employ a mixed strategy: run your baseline minimum replicas on regular (On-Demand) nodes, and use KEDA or HPA to scale out onto Spot nodes during traffic spikes.
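A back-of-the-envelope comparison shows what the mixed strategy buys you. The hourly rate and the 70% Spot discount are illustrative assumptions; real Spot prices fluctuate by region, VM size, and available capacity:

```python
def monthly_compute_cost(baseline_nodes, burst_nodes, on_demand_rate,
                         spot_discount=0.7, hours=730):
    """Cost of the mixed strategy: baseline on On-Demand, burst on Spot.

    on_demand_rate: assumed USD/hour for the VM size.
    spot_discount: assumed fraction off On-Demand for Spot capacity.
    """
    on_demand = baseline_nodes * on_demand_rate * hours
    spot = burst_nodes * on_demand_rate * (1 - spot_discount) * hours
    return round(on_demand + spot, 2)

# 3 baseline On-Demand nodes + 5 Spot burst nodes at $0.20/h On-Demand:
mixed = monthly_compute_cost(3, 5, 0.20)
all_on_demand = monthly_compute_cost(8, 0, 0.20)
assert mixed < all_on_demand  # 657.0 vs 1168.0 under these assumptions
```

Even with only the burst capacity on Spot, the bill nearly halves under these assumptions, while the 3 On-Demand baseline nodes keep the service alive through any Spot eviction wave.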
Workload Right-Sizing
Running workloads with CPU and memory requests vastly larger than their actual usage creates "slack" capacity. The cluster autoscaler provisions new nodes because the requested resources exceed capacity, even if the existing nodes are physically sitting at 10% CPU utilization.
Right-sizing involves aligning your container requests with reality.
- Analyze Historical Usage: Use Azure Monitor Container Insights or Grafana dashboards to compare `kube_pod_container_resource_requests` against actual `container_cpu_usage_seconds_total`.
- Vertical Pod Autoscaler (VPA): Run the VPA in `Recommendation` mode. It analyzes pod metrics over time and suggests optimal CPU and memory requests without actively restarting your pods.
- Set Requests = Limits for Memory: To prevent unexpected Out-Of-Memory (OOM) kills during traffic spikes, a common best practice is to set memory requests equal to memory limits.
- Allow CPU Throttling (Carefully): Unlike memory, CPU is a compressible resource. Setting CPU limits higher than requests allows a pod to burst during startup or brief spikes, though aggressive throttling can cause latency.
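The recommendation step can be sketched as follows. This is greatly simplified: the real VPA builds decaying histograms over long windows and emits separate lower/target/upper bounds rather than a single number:

```python
def recommend_cpu_request(cpu_samples_millicores, safety_margin=1.15):
    """VPA-style sketch: take a high percentile of observed CPU usage
    (in millicores) and add headroom for spikes."""
    samples = sorted(cpu_samples_millicores)
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return round(p95 * safety_margin)

# A pod requesting 1000m whose observed usage hovers around 80m:
usage_mc = [60, 70, 75, 80, 80, 85, 90, 95, 100, 140]
recommended = recommend_cpu_request(usage_mc)
assert recommended < 1000  # far below the over-provisioned 1000m request
```

Applying such a recommendation (here, roughly a tenth of the original request) lets the scheduler pack many more pods per node, which is exactly the slack the cluster autoscaler would otherwise paper over with extra nodes.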
```yaml
# A well-sized container specification
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"        # 1/10th of a core for baseline
  limits:
    memory: "256Mi"    # Equal to request to prevent OOM
    cpu: "500m"        # Allowed to burst up to half a core
```

Did You Know?
- Azure Disk IOPS scale with disk size on Premium SSD, but Ultra Disk decouples them. A 256 GB Premium SSD v1 gets 1,100 IOPS. To get 5,000 IOPS you need a 1 TB disk, even if you only store 50 GB of data. Ultra Disk lets you provision 50,000 IOPS on a 64 GB disk. This decoupling can save thousands of dollars per month for I/O-intensive databases that do not need large storage volumes.
- KEDA can scale to zero replicas, which the standard HPA cannot do. The HPA requires a minimum of 1 replica. KEDA's ability to scale to zero is transformative for cost optimization on batch processing workloads. A cluster with 200 different queue consumers that are each idle 95% of the time can run zero pods for most of those consumers, only spinning them up when messages arrive. Combined with the cluster autoscaler, this means you can run a multi-tenant batch processing platform where idle tenants cost nothing.
- Azure Managed Prometheus stores metrics for 18 months at no additional retention cost. Self-hosted Prometheus typically requires careful capacity planning for long-term storage (using Thanos or Cortex). Azure Monitor workspace handles this natively, making it possible to query 18 months of historical metrics for capacity planning and trend analysis without managing any storage infrastructure.
- The `nconnect` mount option for Azure Files NFS multiplies throughput by opening multiple TCP connections. A single NFS connection typically tops out at 300-400 MB/s due to TCP window limitations. Setting `nconnect=4` in your StorageClass mount options opens 4 parallel TCP connections per mount, effectively quadrupling throughput. This is essential for ML training workloads that read large datasets from shared storage.
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Using Premium SSD when IOPS requirement exceeds the disk-size-to-IOPS ratio | Not understanding that Premium SSD IOPS are tied to disk size | Calculate required IOPS first. If you need high IOPS on small storage, use Ultra Disk or Premium SSD v2 |
| Mounting Azure Disks without WaitForFirstConsumer binding mode | Copying StorageClass examples that use Immediate binding | Always use volumeBindingMode: WaitForFirstConsumer on zone-aware clusters to prevent zone mismatches |
| Sending all container logs to Log Analytics without filtering | Default Container Insights config collects everything | Use the ConfigMap to exclude noisy namespaces (kube-system, monitoring) and disable env_var collection |
| Setting KEDA minReplicaCount to 0 for latency-sensitive services | Attracted by cost savings of scale-to-zero | Only scale to zero for batch/queue consumers. Latency-sensitive services need minReplicaCount >= 1 to avoid cold start delays |
| Not configuring PodDisruptionBudgets for KEDA-scaled workloads | PDBs seem unnecessary for “elastic” workloads | KEDA scales pods, but node upgrades drain them. Without PDBs, all replicas can be evicted simultaneously during cluster upgrades |
| Mounting Azure Files SMB when NFS would perform better | SMB is the default and works on both Windows and Linux | For Linux-only workloads needing high throughput, always use NFS with the nconnect mount option |
| Creating Grafana dashboards without alert rules | "We will check the dashboards when something is wrong" | If nobody is watching the dashboard when the incident starts, it has zero value. Always pair dashboards with alert rules |
| Ignoring disk I/O metrics in observability setup | CPU and memory are the default metrics; disk I/O requires explicit configuration | Add disk IOPS, throughput, and latency to your monitoring ConfigMap and Grafana dashboards |
1. Scenario: You deployed a StatefulSet using a Premium SSD StorageClass with `Immediate` binding mode across a 3-zone AKS cluster. The first pod comes up fine, but the second pod is permanently stuck in `Pending` state. What architectural constraint caused this, and how does `WaitForFirstConsumer` solve it?
Azure Disks are zone-locked resources, meaning a disk created in Availability Zone 1 can only be attached to a virtual machine physically located in Zone 1. When you use Immediate binding mode, the Kubernetes control plane creates the disk immediately upon seeing the PersistentVolumeClaim, without knowing which node the scheduler will eventually choose for the pod. If the disk happens to be created in Zone 1, but the pod is scheduled onto a node in Zone 2, the pod cannot mount the volume and remains stuck in Pending. Using WaitForFirstConsumer solves this by delaying the disk creation API call until the exact moment the scheduler places the pod on a specific node, ensuring the disk is provisioned in the correct matching zone.
2. Scenario: Your DBA team needs to migrate a high-transaction PostgreSQL database to AKS. The database is only 50 GB in size, but requires a guaranteed 15,000 IOPS to handle peak loads. Why would provisioning a 50 GB Premium SSD fail to meet this requirement, and what storage tier is mathematically required instead?
Standard Premium SSDs tie their IOPS and throughput performance directly to the provisioned capacity of the disk. A 64 GB Premium SSD (P6) provides only 240 IOPS, meaning you would have to provision and pay for a 1 TB disk just to achieve the 5,000 IOPS tier, and even larger to hit 15,000. Ultra Disks and Premium SSD v2 solve this by decoupling capacity from performance, allowing you to independently dial in exact IOPS and throughput metrics. By using Ultra Disk, you can provision a 50 GB disk but explicitly set the DiskIOPSReadWrite parameter to 15,000, paying only for the performance you need without wasting money on empty terabytes of storage.
3. Scenario: A machine learning pipeline needs to train a model using 5 TB of image data shared across 20 GPU pods simultaneously. The data scientists initially used Azure Files SMB but are complaining that the data loading phase takes hours due to network bottlenecking. Which Azure Files protocol should they switch to, and what specific mount option will drastically reduce their load times?
The data scientists should switch their StorageClass to use Azure Files with the NFS protocol, which avoids the authentication overhead and Windows-centric design of SMB. NFS on Azure Files Premium provides significantly higher throughput for Linux-based workloads like machine learning containers. Furthermore, they must add the nconnect=4 (or up to 16) setting in their StorageClass mount options. By default, an NFS mount uses a single TCP connection that tops out at around 300-400 MB/s due to TCP window limits; nconnect opens multiple parallel TCP connections to the storage account, multiplying the throughput and drastically reducing data load times.
4. Scenario: An e-commerce backend uses standard HPA (CPU/Memory) to scale its order processing workers. During a flash sale, 10,000 orders hit the Azure Service Bus queue in seconds. The workers process them so quickly that their CPU never exceeds 40%, so the HPA never scales them up, resulting in a 2-hour processing backlog. How would KEDA fundamentally change how this scaling decision is made?
The standard HPA is entirely blind to external business metrics like queue depth, relying solely on lagging infrastructure metrics like CPU utilization which may not correlate with the actual backlog. KEDA replaces this paradigm by connecting directly to the Azure Service Bus API and reading the exact number of pending messages waiting to be processed. Instead of waiting for CPU to spike, KEDA can be configured to instantly provision one worker pod for every 50 messages in the queue. This event-driven approach ensures the deployment scales out preemptively the moment the queue begins to fill, processing the 10,000 orders in minutes rather than hours, and then safely scaling back down to zero when the queue is empty.
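A minimal ScaledObject for this scenario might look like the following sketch (the names are illustrative, and the referenced TriggerAuthentication is assumed to already exist):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-worker-scaler      # illustrative name
spec:
  scaleTargetRef:
    name: order-worker           # the existing worker Deployment
  minReplicaCount: 0             # scale to zero when the queue is empty
  maxReplicaCount: 40
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "50"       # target: one pod per 50 pending messages
      authenticationRef:
        name: servicebus-auth    # assumed TriggerAuthentication
```

Note that the scaling signal here is the queue itself: CPU utilization never enters the decision.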
5. Scenario: You configure KEDA to scale a consumer deployment to 100 replicas based on queue depth, but your AKS cluster currently only has 3 nodes which can fit 30 pods total. Walk through the exact sequence of events that occurs between KEDA and the Cluster Autoscaler when 1,000 messages suddenly arrive in the queue.
When the messages arrive, the KEDA operator detects the queue depth and immediately updates the deployment’s target replica count to 100. The Kubernetes scheduler successfully places 30 pods on the existing 3 nodes, but the remaining 70 pods transition into a Pending state due to insufficient CPU or memory resources on the cluster. The Cluster Autoscaler constantly watches for Pending pods; upon detecting them, it calculates how many new nodes are required and makes an API call to Azure to expand the Virtual Machine Scale Set. Once the new VMs boot up and join the AKS cluster as Ready nodes, the scheduler automatically places the remaining 70 pods onto them, allowing all 100 consumers to process the queue in parallel.
6. Scenario: A junior engineer enables Container Insights on a production cluster with default settings to troubleshoot a specific microservice. A week later, the Azure Log Analytics bill arrives at $2,000. Why did this happen by default, and what specific configuration changes in the `container-azm-ms-agentconfig` ConfigMap are required to stop the bleeding while still monitoring the application?
By default, the Azure Monitor Agent deployed by Container Insights captures every single line of standard output (stdout) and standard error (stderr) from every container in the cluster, including incredibly noisy system components. This massive ingestion volume is billed per gigabyte by Log Analytics, leading to the rapid cost spike. To fix this, the engineer must deploy a custom ConfigMap named container-azm-ms-agentconfig in the kube-system namespace. In this configuration, they need to explicitly add kube-system and other high-volume namespaces to the exclude_namespaces array for stdout and stderr, and disable environment variable collection (env_var.enabled = false), ensuring only relevant application logs are ingested and billed.
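A trimmed-down version of that ConfigMap might look like this sketch; the keys follow the agent's documented TOML-in-ConfigMap format, but treat the exact namespace list as a starting point for your own cluster:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |-
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        # Skip noisy system namespaces; application namespaces still flow through
        exclude_namespaces = ["kube-system", "gatekeeper-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system", "gatekeeper-system"]
      [log_collection_settings.env_var]
        # Stop collecting environment variables from every container
        enabled = false
```

After applying this, the agent pods restart and ingestion volume (and the Log Analytics bill) drops accordingly.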
7. Scenario: To save money, a team creates a single 1 TB Premium SSD with `maxShares: 3` and mounts it to three different web server pods using the default `ext4` filesystem so they can share static assets. Within an hour, the filesystem is completely corrupted and the data is lost. What architectural rule of Shared Disks did they violate, and what is required to share block storage safely?
The team misunderstood the difference between block storage and file storage; Azure Shared Disks provide concurrent block-level access to the underlying storage device, not a managed filesystem. Standard Linux filesystems like ext4 or xfs cache data in memory and are completely unaware that other operating systems might be modifying the same underlying disk blocks simultaneously, inevitably leading to catastrophic data corruption. To share a disk safely, the pods must either utilize a specialized cluster-aware filesystem (like GFS2) that coordinates locks across nodes, or the application itself must be explicitly designed to manage concurrent block-level arbitration, such as SQL Server Failover Cluster Instances. For simple shared static assets, the team should have used Azure Files (NFS or SMB) instead.
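For the rare cases where shared block access is genuinely required, the disk must be consumed as a raw block device so that no node-local filesystem cache is involved. A sketch, with illustrative names:

```yaml
# StorageClass allowing up to 3 simultaneous attachments
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: shared-premium-disk      # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
  maxShares: "3"
  cachingMode: None              # host caching must be disabled for shared disks
---
# The PVC must request raw block mode, not a filesystem
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block              # pods see a /dev/... device; no ext4/xfs is created
  storageClassName: shared-premium-disk
  resources:
    requests:
      storage: 1Ti
```

The `volumeMode: Block` line is the safety-critical detail: it hands the application the raw device and makes it responsible for coordinating writes, rather than letting an unaware filesystem corrupt itself.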
Hands-On Exercise: KEDA + Azure Service Bus Queue Scaling + Monitor Alerts
In this exercise, you will set up event-driven autoscaling where a consumer deployment scales from zero to many replicas based on Azure Service Bus queue depth, with monitoring alerts that fire when the queue exceeds a threshold. You will also create a zone-aware StorageClass to properly deploy stateful workloads.
Prerequisites
- AKS cluster with KEDA add-on enabled
- Azure CLI authenticated
- Workload Identity configured (from Module 7.3)
Task 1: Create a Zone-Aware StorageClass and PVC
Before setting up scaling, provision a Premium SSD v2 StorageClass that correctly handles availability zones, and create a PersistentVolumeClaim.
Solution
```bash
# Create a zone-aware StorageClass
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd-v2-zone-aware
provisioner: disk.csi.azure.com
parameters:
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "3000"
  DiskMBpsReadWrite: "125"
  cachingMode: None
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF

# Create a PersistentVolumeClaim
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: order-db-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-ssd-v2-zone-aware
  resources:
    requests:
      storage: 100Gi
EOF

# Verify the PVC stays in Pending state (because WaitForFirstConsumer
# delays provisioning until a Pod uses it)
kubectl get pvc order-db-pvc
```

Task 2: Create the Azure Service Bus Namespace and Queue
Solution
```bash
# Create the Service Bus namespace
az servicebus namespace create \
  --resource-group rg-aks-prod \
  --name sb-aks-lab-$(openssl rand -hex 4) \
  --location westeurope \
  --sku Standard

SB_NAMESPACE=$(az servicebus namespace list -g rg-aks-prod \
  --query "[0].name" -o tsv)

# Create the queue
az servicebus queue create \
  --resource-group rg-aks-prod \
  --namespace-name "$SB_NAMESPACE" \
  --name incoming-orders \
  --max-size 1024 \
  --default-message-time-to-live "PT1H"

# Get the connection string for the producer script
SB_CONNECTION=$(az servicebus namespace authorization-rule keys list \
  --resource-group rg-aks-prod \
  --namespace-name "$SB_NAMESPACE" \
  --name RootManageSharedAccessKey \
  --query primaryConnectionString -o tsv)

echo "Service Bus Namespace: $SB_NAMESPACE"
```

Task 3: Set Up Workload Identity for KEDA and the Consumer
Create a managed identity that KEDA and the consumer pods will use to read from the queue.
Solution
```bash
# Get the OIDC issuer
OIDC_ISSUER=$(az aks show -g rg-aks-prod -n aks-prod-westeurope \
  --query "oidcIssuerProfile.issuerUrl" -o tsv)

# Create the managed identity
az identity create \
  --resource-group rg-aks-prod \
  --name id-order-processor \
  --location westeurope

SB_CLIENT_ID=$(az identity show -g rg-aks-prod -n id-order-processor \
  --query clientId -o tsv)
SB_PRINCIPAL_ID=$(az identity show -g rg-aks-prod -n id-order-processor \
  --query principalId -o tsv)

# Grant Service Bus Data Receiver role
SB_ID=$(az servicebus namespace show -g rg-aks-prod -n "$SB_NAMESPACE" --query id -o tsv)

az role assignment create \
  --assignee-object-id "$SB_PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --role "Azure Service Bus Data Receiver" \
  --scope "$SB_ID"

# Create federated credential
az identity federated-credential create \
  --name fed-order-processor \
  --identity-name id-order-processor \
  --resource-group rg-aks-prod \
  --issuer "$OIDC_ISSUER" \
  --subject "system:serviceaccount:orders:order-processor-sa" \
  --audiences "api://AzureADTokenExchange"

# Create the namespace and service account
kubectl create namespace orders

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-processor-sa
  namespace: orders
  annotations:
    azure.workload.identity/client-id: "$SB_CLIENT_ID"
  labels:
    azure.workload.identity/use: "true"
EOF
```

Task 4: Deploy the Consumer Application and KEDA ScaledObject
Deploy the consumer and configure KEDA to scale it based on queue depth.
Solution
```bash
# Deploy the order processor (a simple consumer simulator)
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: orders
spec:
  replicas: 0
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      serviceAccountName: order-processor-sa
      containers:
        - name: processor
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              echo "Order processor started. Processing messages..."
              while true; do
                echo "$(date): Processing order batch..."
                sleep 5
              done
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "250m"
              memory: "256Mi"
EOF

# Create the KEDA TriggerAuthentication
TENANT_ID=$(az account show --query tenantId -o tsv)

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: servicebus-workload-auth
  namespace: orders
spec:
  podIdentity:
    provider: azure-workload
    identityId: "$SB_CLIENT_ID"
EOF

# Create the ScaledObject
kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
  namespace: orders
spec:
  scaleTargetRef:
    name: order-processor
  pollingInterval: 10
  cooldownPeriod: 60
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: incoming-orders
        namespace: $SB_NAMESPACE
        messageCount: "5"
      authenticationRef:
        name: servicebus-workload-auth
EOF

# Verify KEDA is watching the queue
kubectl get scaledobject -n orders
kubectl get hpa -n orders
```

Task 5: Send Messages and Observe Scaling
Flood the queue with messages and watch KEDA scale the consumer.
Solution
```bash
# Verify current state: 0 replicas
kubectl get deployment order-processor -n orders

# Send 100 messages to the queue
for i in $(seq 1 100); do
  az servicebus queue message send \
    --resource-group rg-aks-prod \
    --namespace-name "$SB_NAMESPACE" \
    --queue-name incoming-orders \
    --body "{\"orderId\": \"ORD-$i\", \"amount\": $((RANDOM % 1000 + 1))}"
done

echo "Sent 100 messages. Watching KEDA scale..."

# Watch the scaling happen (KEDA polls every 10 seconds)
# Run this in a loop or use watch
kubectl get deployment order-processor -n orders -w

# After a few moments, you should see replicas increasing:
#   order-processor   0/20    0    0   0s
#   order-processor   20/20   20   0   15s
# (KEDA targets 1 pod per 5 messages: 100/5 = 20 pods)

# Check the HPA that KEDA created
kubectl describe hpa -n orders

# Check queue depth decreasing (in a real app, consumers would drain the queue)
az servicebus queue show \
  --resource-group rg-aks-prod \
  --namespace-name "$SB_NAMESPACE" \
  --name incoming-orders \
  --query "countDetails.activeMessageCount" -o tsv
```

Task 6: Set Up Azure Monitor Alert for Queue Backlog
Create an alert that fires when the queue depth exceeds a threshold, indicating consumers cannot keep up.
Solution
```bash
# Create an action group for notifications
az monitor action-group create \
  --resource-group rg-aks-prod \
  --name ag-aks-oncall \
  --short-name aks-oncall \
  --email-receiver name="Platform Team" address="platform-oncall@contoso.com"

ACTION_GROUP_ID=$(az monitor action-group show \
  -g rg-aks-prod -n ag-aks-oncall --query id -o tsv)

# Create metric alert on Service Bus queue depth
az monitor metrics alert create \
  --resource-group rg-aks-prod \
  --name "high-order-queue-depth" \
  --scopes "$SB_ID" \
  --condition "avg ActiveMessages > 200" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 2 \
  --description "Order queue has more than 200 active messages for 5 minutes. Consumers may not be keeping up." \
  --action "$ACTION_GROUP_ID"

# Verify the alert rule
az monitor metrics alert show \
  -g rg-aks-prod -n "high-order-queue-depth" -o table

# Create a second alert for KEDA scaling failures
# (when KEDA hits maxReplicaCount but the queue is still growing)
az monitor metrics alert create \
  --resource-group rg-aks-prod \
  --name "order-queue-critical" \
  --scopes "$SB_ID" \
  --condition "avg ActiveMessages > 1000" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 1 \
  --description "CRITICAL: Order queue exceeds 1000 messages. KEDA may have hit maxReplicaCount. Investigate immediately." \
  --action "$ACTION_GROUP_ID"
```

Task 7: Verify Scale-to-Zero
Drain the queue and confirm KEDA scales the deployment back to zero.
Solution
```bash
# In a real scenario, consumers process messages. For the lab, purge the queue:
az servicebus queue message purge \
  --resource-group rg-aks-prod \
  --namespace-name "$SB_NAMESPACE" \
  --queue-name incoming-orders

# Watch the deployment scale down (takes cooldownPeriod seconds: 60s in our config)
echo "Waiting for KEDA cooldown (60 seconds)..."
kubectl get deployment order-processor -n orders -w

# After ~60-90 seconds:
#   order-processor   20/0   20   20   2m
#   order-processor   0/0    0    0    3m

# Verify final state
kubectl get pods -n orders
# Expected: No resources found in orders namespace

# Verify the ScaledObject status
kubectl describe scaledobject order-processor-scaler -n orders | grep -A5 "Status:"

echo "Scale-to-zero verified. Clean up when ready:"
echo "az group delete --name rg-aks-prod --yes --no-wait"
```

Success Criteria
- Premium SSD v2 zone-aware StorageClass and PVC created
- Azure Service Bus namespace and queue created
- Workload Identity configured for the consumer (managed identity + federated credential + service account)
- Consumer deployment starts at 0 replicas
- KEDA ScaledObject and TriggerAuthentication deployed
- Sending 100 messages causes KEDA to scale to 20 replicas (100 messages / 5 per pod)
- HPA created by KEDA is visible with `kubectl get hpa`
- Azure Monitor alert configured for queue depth > 200 (warning) and > 1000 (critical)
- After queue is drained, deployment scales back to 0 replicas within the cooldown period
- No credentials stored in Kubernetes Secrets (Workload Identity used throughout)
Next Module
This is the final module in the AKS Deep Dive series. You now have the knowledge to architect, secure, network, observe, and scale production AKS clusters. For further learning, explore the Platform Engineering Track to deepen your understanding of SRE practices, GitOps workflows, and DevSecOps pipelines that build on this AKS foundation.