
Module 9.5: Advanced Caching Services (ElastiCache / Memorystore)

Complexity: [COMPLEX] | Time to Complete: 2h | Prerequisites: Module 9.1 (Databases), Module 9.4 (Object Storage), Redis fundamentals

After completing this module, you will be able to:

  • Configure Kubernetes pods to connect to managed caching services (ElastiCache, Memorystore, Azure Cache for Redis)
  • Implement cache-aside, write-through, and write-behind patterns for applications running on Kubernetes
  • Deploy Redis Sentinel or Cluster mode configurations via managed services with private endpoint connectivity
  • Diagnose cache performance issues including connection pooling, serialization overhead, and hot key distribution

In November 2023, an e-commerce platform ran their product catalog API on GKE. Every product page required three database queries: product details, pricing, and reviews. During a flash sale, traffic jumped from 3,000 to 45,000 requests per second. Cloud SQL hit its connection limit at 15,000 connections. The auto-scaler was adding pods, but each new pod opened more database connections, making the problem worse. The site went down for 28 minutes during peak sale time. Estimated lost revenue: $2.3 million.

The team had a Redis cluster running in GKE — but it was a self-managed StatefulSet with 3 nodes and no monitoring. They did not discover until the postmortem that Redis had silently evicted 60% of the cache 20 minutes before the sale started due to a memory limit that nobody had reviewed since initial deployment. The database was hit with the full 45,000 RPS because the cache was effectively empty.

They migrated to Google Memorystore for Redis with proper memory sizing, connection limits, and eviction monitoring. The next sale handled 62,000 RPS with the database seeing only 800 QPS — a 98% cache hit rate. Caching is not optional for production Kubernetes workloads. It is the difference between your database being a bottleneck and your database being a safety net.


| Factor | Redis | Memcached |
| --- | --- | --- |
| Data structures | Strings, hashes, lists, sets, sorted sets, streams | Strings only (key-value) |
| Persistence | Optional (RDB snapshots, AOF) | None (pure cache) |
| Replication | Primary-replica with automatic failover | None (each node independent) |
| Clustering | Redis Cluster (data sharding) | Client-side sharding |
| Pub/Sub | Built-in | Not available |
| Lua scripting | Yes | No |
| Max item size | 512 MB | 1 MB (default) |
| Multi-threaded | Single-threaded (I/O threads since 6.0) | Multi-threaded |
| Best for | Complex caching, sessions, leaderboards, pub/sub | Simple key-value, large working sets, multi-threaded reads |

For 90% of Kubernetes workloads, Redis is the right choice. Memcached is simpler but far less capable. Choose Memcached only when you need pure key-value caching at extremely high throughput with no need for data structures, persistence, or replication.

| Feature | AWS ElastiCache Redis | GCP Memorystore Redis | Azure Cache for Redis |
| --- | --- | --- | --- |
| Max memory | 6.1 TB (cluster mode) | 300 GB (standard) | 1.2 TB (Enterprise) |
| Cluster mode | Yes (up to 500 shards) | Yes (since 2024) | Yes (Premium/Enterprise) |
| Multi-AZ failover | Automatic | Automatic (Standard tier) | Automatic (Premium+) |
| Encryption at rest | Yes (KMS) | Yes (CMEK) | Yes (managed keys) |
| Encryption in transit | TLS | TLS | TLS |
| VPC integration | VPC subnets | VPC network | VNET injection |
The “right” caching strategy depends on your read/write ratio and consistency requirements.

The most common pattern. The application checks the cache first; on a miss, it reads from the database and populates the cache.

```mermaid
flowchart TD
    A[Client Request] --> B{Check Cache}
    B -- hit --> C[Return cached data]
    B -- miss --> D[Query Database]
    D --> E[Write to Cache]
    E --> F[Return data]
```
```python
import json

import redis

r = redis.Redis(host='redis-master.cache.svc', port=6379, decode_responses=True)

def get_product(product_id):
    # Step 1: Check cache
    cache_key = f"product:{product_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Step 2: Cache miss -- query database ('db' is your database client)
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    # Step 3: Populate cache with TTL
    r.setex(cache_key, 300, json.dumps(product))  # 5-minute TTL
    return product
```

Pros: Only caches data that is actually requested. Simple to implement. Cons: Cache miss penalty (extra latency on first request). Data can become stale until TTL expires.

Every write goes to both the cache and the database simultaneously.

```mermaid
flowchart TD
    A[Write Request] --> B[Write to Cache]
    B --> C[Write to Database]
    C --> D[Return success]
```
```python
def update_product_price(product_id, new_price):
    # Write to database first
    db.execute("UPDATE products SET price = %s WHERE id = %s", new_price, product_id)
    # Update cache (same transaction boundary)
    product = db.query("SELECT * FROM products WHERE id = %s", product_id)
    cache_key = f"product:{product_id}"
    r.setex(cache_key, 300, json.dumps(product))
    return product
```

Pros: Cache is always consistent with the database. No stale reads after writes. Cons: Write latency increases (two writes per operation). Caches data that may never be read.

Writes go to the cache immediately and are asynchronously flushed to the database.

```mermaid
flowchart TD
    A[Write Request] --> B[Write to Cache]
    B --> C[Return success fast!]
    B -. async, batched .-> D[Flush to Database]
```

Pros: Extremely fast writes. Database load is smoothed by batching. Cons: Risk of data loss if cache fails before flush. Complex to implement correctly. Not suitable for critical data.
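The write-behind flow above can be sketched in-process. This is a toy illustration under stated assumptions, not a reference implementation: the dicts stand in for Redis and the database, and a production system would typically buffer through Redis Streams or a message broker so queued writes survive a pod restart.

```python
import queue
import threading
import time

# Stand-ins for the cache and the database (assumption: real code would
# use redis-py and a DB driver instead of plain dicts).
cache = {}
database = {}

write_queue = queue.Queue()

def write_behind_set(key, value):
    """Write to the cache immediately; queue the DB write for later."""
    cache[key] = value             # fast path: cache only
    write_queue.put((key, value))  # async: drained by the flusher thread

def flusher(batch_size=100, interval=0.05):
    """Background worker: drain the queue in batches and write to the DB."""
    while True:
        batch = []
        try:
            batch.append(write_queue.get(timeout=interval))
            while len(batch) < batch_size:
                batch.append(write_queue.get_nowait())
        except queue.Empty:
            pass
        for key, value in batch:   # one batched DB round-trip in real code
            database[key] = value

threading.Thread(target=flusher, daemon=True).start()

write_behind_set("counter:views:prod-101", 42)
time.sleep(0.2)  # give the flusher time to drain the queue
print(database["counter:views:prod-101"])
```

Note the failure mode this sketch shares with real write-behind: anything still sitting in `write_queue` when the process dies is lost, which is why the pattern is reserved for data where eventual consistency is acceptable.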

| Scenario | Strategy | Why |
| --- | --- | --- |
| Product catalog (read-heavy) | Cache-aside | Most reads, occasional writes |
| User sessions | Write-through | Must be consistent after login/logout |
| Analytics counters | Write-behind | High write volume, eventual consistency OK |
| API rate limiting | Cache-aside + TTL | Natural expiration, no DB needed |
| Shopping cart | Write-through | Consistency critical for commerce |
| Leaderboard scores | Cache-aside + sorted sets | Redis sorted sets are purpose-built for this |
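The rate-limiting row deserves a quick illustration: a fixed-window limiter needs only INCR plus EXPIRE, because Redis's atomic increment and key TTL do all the bookkeeping with no database involved. The helper below is hypothetical; the `FakeRedis` stub mimics just the INCR/EXPIRE semantics so the logic is runnable without a server — with redis-py you would pass a real `redis.Redis` client instead.

```python
import time

class FakeRedis:
    """Tiny in-memory stand-in for redis-py, implementing only what the
    limiter needs (assumption: a real deployment uses redis.Redis)."""
    def __init__(self):
        self.data = {}  # key -> (count, expires_at)

    def incr(self, key):
        count, expires_at = self.data.get(key, (0, None))
        if expires_at is not None and time.time() >= expires_at:
            count, expires_at = 0, None  # window expired; key would be gone
        count += 1
        self.data[key] = (count, expires_at)
        return count

    def expire(self, key, seconds):
        count, _ = self.data[key]
        self.data[key] = (count, time.time() + seconds)

def allow_request(r, user_id, limit=100, window_s=60):
    """Fixed-window rate limit: at most `limit` requests per window."""
    key = f"ratelimit:{user_id}:{int(time.time() // window_s)}"
    count = r.incr(key)
    if count == 1:
        # First hit in this window: set the TTL so the key self-destructs.
        r.expire(key, window_s)
    return count <= limit

r = FakeRedis()
results = [allow_request(r, "user-42", limit=3) for _ in range(5)]
print(results)  # first 3 allowed, the rest rejected
```

The "no DB needed" column entry follows directly: expired windows simply vanish via TTL, so there is nothing to clean up or persist.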

Stop and think: You are designing a shopping cart service where every item added must be reliably recorded, but users frequently refresh the page to view their cart. Which caching strategy provides the necessary consistency while handling the read traffic?


```shell
# Create ElastiCache Redis cluster
aws elasticache create-replication-group \
  --replication-group-id app-cache \
  --replication-group-description "App caching layer" \
  --engine redis --engine-version 7.1 \
  --cache-node-type cache.r7g.large \
  --num-cache-clusters 3 \
  --multi-az-enabled \
  --automatic-failover-enabled \
  --at-rest-encryption-enabled \
  --transit-encryption-enabled \
  --cache-subnet-group-name eks-cache-subnets \
  --security-group-ids sg-0abc123def456
```

```yaml
# Kubernetes Services for the ElastiCache endpoints
apiVersion: v1
kind: Service
metadata:
  name: redis-primary
  namespace: cache
spec:
  type: ExternalName
  externalName: app-cache.abc123.ng.0001.use1.cache.amazonaws.com
---
apiVersion: v1
kind: Service
metadata:
  name: redis-reader
  namespace: cache
spec:
  type: ExternalName
  externalName: app-cache-ro.abc123.ng.0001.use1.cache.amazonaws.com
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: mycompany/api-server:3.0.0
          env:
            - name: REDIS_PRIMARY_HOST
              value: redis-primary.cache.svc.cluster.local
            - name: REDIS_READER_HOST
              value: redis-reader.cache.svc.cluster.local
            - name: REDIS_PORT
              value: "6379"
            - name: REDIS_TLS_ENABLED
              value: "true"
            - name: REDIS_AUTH_TOKEN
              valueFrom:
                secretKeyRef:
                  name: redis-auth
                  key: token
            - name: REDIS_MAX_CONNECTIONS
              value: "20"
            - name: REDIS_CONNECT_TIMEOUT_MS
              value: "2000"
            - name: REDIS_COMMAND_TIMEOUT_MS
              value: "500"
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```

Pause and predict: If you provision an ElastiCache Redis cluster with 3 nodes (1 primary, 2 replicas), how should your Kubernetes application route write commands versus read commands?


A cache stampede (also called “thundering herd”) happens when a popular cache key expires and hundreds of pods simultaneously query the database to rebuild it.

```mermaid
flowchart LR
    subgraph Normal
        A[100 pods] -->|Cache HIT| B[Return cached data]
        B -.- C[DB: 0 queries]
    end
    subgraph Stampede
        D[100 pods] -->|Cache MISS| E[100 database queries]
        E --> F[100 cache writes]
        F -.- G[DB: 100 queries]
    end
```

Refresh the cache before it expires, with a probability that increases as the TTL decreases:

```python
import json
import random

# 'r' is the Redis client created earlier; 'db' is your database client.

def get_with_per(key, ttl=300, beta=1.0):
    """Probabilistic early refresh to prevent stampedes."""
    cached = r.get(key)
    if cached:
        data = json.loads(cached)
        remaining_ttl = r.ttl(key)
        # As TTL decreases, probability of refresh increases.
        # beta controls aggressiveness (higher = earlier refresh).
        delta = ttl * beta * random.random()
        if remaining_ttl < delta:
            # This pod refreshes the cache early
            return refresh_cache(key, ttl)
        return data
    return refresh_cache(key, ttl)

def refresh_cache(key, ttl):
    data = db.query_product(key.split(':')[1])
    r.setex(key, ttl, json.dumps(data))
    return data
```

Only one pod rebuilds the cache; others wait or serve stale data:

```python
import json
import time

def get_with_lock(key, ttl=300):
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    lock_key = f"lock:{key}"
    # Try to acquire lock (NX = set if not exists, EX = expiry)
    acquired = r.set(lock_key, "1", nx=True, ex=10)
    if acquired:
        # This pod rebuilds the cache
        try:
            data = db.query_product(key.split(':')[1])
            r.setex(key, ttl, json.dumps(data))
            return data
        finally:
            r.delete(lock_key)
    else:
        # Another pod is rebuilding -- wait briefly, then retry
        time.sleep(0.1)
        cached = r.get(key)
        if cached:
            return json.loads(cached)
        # Fallback: query database directly (rare)
        return db.query_product(key.split(':')[1])
```

Cache entries never expire. A background process refreshes them on a schedule:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-warmer
  namespace: production
spec:
  schedule: "*/4 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: warmer
              image: mycompany/cache-warmer:1.0.0
              env:
                - name: REDIS_HOST
                  value: redis-primary.cache.svc.cluster.local
              command:
                - python
                - -c
                - |
                  import redis, json
                  r = redis.Redis(host='redis-primary.cache.svc.cluster.local')
                  # 'db' is your database client (illustrative)
                  # Refresh top 1000 products
                  products = db.query("SELECT id FROM products ORDER BY view_count DESC LIMIT 1000")
                  for p in products:
                      data = db.query_product(p['id'])
                      r.setex(f"product:{p['id']}", 600, json.dumps(data))
                  print(f"Warmed {len(products)} products")
```

A “hot key” occurs when a single Redis key receives a disproportionate amount of traffic. Because Redis is single-threaded for command execution, a hot key on a clustered Redis setup will overwhelm a single shard, causing high CPU utilization on one node while other nodes remain idle.

  1. CPU monitoring: Monitor CPU utilization per shard. If one shard is at 99% CPU while others are at 10%, you likely have a hot key.
  2. Redis CLI: Use `redis-cli --hotkeys` (requires `maxmemory-policy` to be an LFU policy such as `allkeys-lfu`).
  3. Command monitoring: Alternatively, use `OBJECT FREQ <key>` to check access frequencies. Avoid running the `MONITOR` command in production, as it drastically reduces performance.
  • Local Caching: Cache the hot key in the application’s memory (e.g., using a local in-memory cache variable) for a few seconds to absorb the read spike before it hits Redis.
  • Key Duplication: Create copies of the key (e.g., product:123:1, product:123:2) and have clients randomly read from one of the copies to distribute load across multiple shards.
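Key duplication can be sketched as follows. This is a hypothetical wrapper, and the dict stands in for a Redis Cluster client; in a real cluster the replica suffix changes the key's hash slot, so the copies land on different shards and reads spread across them.

```python
import random

NUM_REPLICAS = 4

# Stand-in for a Redis Cluster client (assumption: real code uses a
# cluster-aware client such as redis.cluster.RedisCluster).
cluster = {}

def set_replicated(key, value, replicas=NUM_REPLICAS):
    """Write N copies of a hot key so reads can spread across shards."""
    for i in range(replicas):
        cluster[f"{key}:{i}"] = value

def get_replicated(key, replicas=NUM_REPLICAS):
    """Read a random copy: each client request hits a random shard."""
    return cluster.get(f"{key}:{random.randrange(replicas)}")

set_replicated("product:123", '{"name": "Hot Item"}')
print(get_replicated("product:123"))
```

The trade-off is write amplification: every update must touch all N copies, so this technique suits read-heavy hot keys like a viral product page, not counters.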

Stop and think: If a celebrity tweets a link to a specific product, creating a sudden massive read spike on that single product’s cache key, why won’t simply adding more Redis cluster nodes solve the performance issue?


Managed Redis instances have maximum connection limits based on instance size. Exceeding them causes connection refused errors.

| Instance Type | Max Connections | With 50 pods (20 conn each) | Remaining |
| --- | --- | --- | --- |
| cache.r7g.large | 65,000 | 1,000 | 64,000 |
| cache.r7g.xlarge | 65,000 | 1,000 | 64,000 |
| cache.t4g.micro | 65,000 | 1,000 | 64,000 |

Redis connection limits are generous, but the bottleneck is often on the client side. Each connection consumes memory and a file descriptor in the pod.
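Connection budgeting is simple arithmetic, but it must include deployment surge: a rolling update temporarily runs extra pods, each with its own pool. A sketch of the worst-case calculation (the function, the 25% `maxSurge` default, and the overhead figure are illustrative assumptions; 65,000 is the maxclients figure from the table above):

```python
def redis_connection_budget(pods, pool_size, max_surge_pct=25,
                            overhead=200, max_clients=65_000):
    """Worst-case connection demand during a rolling deployment."""
    # Surge pods: old and new ReplicaSets overlap during the rollout.
    surge_pods = pods + (pods * max_surge_pct + 99) // 100  # ceiling division
    # Overhead covers exporters, sidecars, and ad-hoc redis-cli sessions.
    demand = surge_pods * pool_size + overhead
    return demand, demand <= max_clients

# 100 pods x 50 connections looks safe even with surge (6,450 of 65,000)...
print(redis_connection_budget(pods=100, pool_size=50))
# ...but scale the fleet and pool size up and the surge pushes you over:
print(redis_connection_budget(pods=1000, pool_size=60))
```

The same arithmetic explains why leaked connections from terminating pods matter: they count against the budget until the server times them out.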

```python
import redis

# Good: Connection pool (shared connections)
pool = redis.ConnectionPool(
    host='redis-primary.cache.svc.cluster.local',
    port=6379,
    max_connections=20,           # Per pod
    socket_timeout=2.0,           # Fail fast
    socket_connect_timeout=1.0,
    retry_on_timeout=True,
    health_check_interval=30,
    connection_class=redis.SSLConnection,  # TLS (ConnectionPool has no ssl flag)
)
r = redis.Redis(connection_pool=pool)

# Bad: New connection per request (connection leak)
# r = redis.Redis(host='redis-primary.cache.svc.cluster.local')  # DON'T
```
```yaml
# PrometheusRule for Redis connection alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: redis-alerts
  namespace: monitoring
spec:
  groups:
    - name: redis
      rules:
        - alert: RedisConnectionsHigh
          expr: redis_connected_clients / redis_config_maxclients > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Redis connection usage above 80%"
        - alert: RedisCacheHitRateLow
          expr: |
            rate(redis_keyspace_hits_total[5m]) /
            (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Redis cache hit rate below 90%"
```

Pause and predict: Your application is scaling up during a Black Friday event. If each of your 100 pods opens 50 concurrent Redis connections, and your Redis instance limit is 65,000, why might you still see connection errors during a rolling deployment?


For HTTP-based APIs, you can add caching at the proxy layer using Envoy as a sidecar. This caches responses without modifying application code.

```mermaid
flowchart TD
    A[Client] --> B[K8s Service]
    B --> C[Pod]
    subgraph Pod
        D[Envoy Sidecar port 8080]
        E[(Local Cache)]
        F[App Container port 8081]
        D -->|hit| E
        D -->|miss| F
    end
```
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-cache-config
  namespace: production
data:
  envoy.yaml: |
    static_resources:
      listeners:
        - name: listener_0
          address:
            socket_address:
              address: 0.0.0.0
              port_value: 8080
          filter_chains:
            - filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    stat_prefix: ingress
                    http_filters:
                      - name: envoy.filters.http.cache
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.cache.v3.CacheConfig
                          typed_config:
                            "@type": type.googleapis.com/envoy.extensions.http.cache.simple_http_cache.v3.SimpleHttpCacheConfig
                      - name: envoy.filters.http.router
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                    route_config:
                      virtual_hosts:
                        - name: backend
                          domains: ["*"]
                          routes:
                            - match:
                                prefix: "/"
                              route:
                                cluster: local_app
      clusters:
        - name: local_app
          type: STATIC
          load_assignment:
            cluster_name: local_app
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: 127.0.0.1
                          port_value: 8081
```

Your application must return proper Cache-Control headers:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/products/<product_id>')
def get_product(product_id):
    product = fetch_product(product_id)  # your existing data access code
    response = jsonify(product)
    response.headers['Cache-Control'] = 'public, max-age=60'
    return response
```

Stop and think: What HTTP headers are absolutely essential for the Envoy sidecar cache filter to know how long to retain a response?


  1. Redis processes over 100,000 operations per second on a single core in typical workloads. Despite being single-threaded for command execution, Redis achieves this throughput because it operates entirely in memory and uses efficient data structures like skip lists and hash tables. Since version 6.0, I/O threading handles network reads/writes on separate threads while command execution remains single-threaded.

  2. The term “cache stampede” was formally studied in a 2015 paper titled “Optimal Probabilistic Cache Stampede Prevention” by Vattani, Chierichetti, and Lowenstein. The “probabilistic early expiration” technique from that paper is now standard practice at companies like Facebook, where cache stampedes on popular content could otherwise bring down entire database clusters.

  3. AWS ElastiCache Serverless (launched 2023) automatically scales Redis with no capacity planning required. It charges per ECPU (ElastiCache Compute Unit) and per GB of storage, eliminating the need to choose instance types. For workloads with variable traffic, this can reduce costs by 40-60% compared to provisioned instances.

  4. Google Memorystore for Redis Cluster became generally available in 2024, supporting up to 25 shards with 250 GB total capacity. Before this, GCP customers who needed Redis clustering had to self-manage Redis on GKE or use third-party services — a gap that existed for over five years.


| Mistake | Why It Happens | How to Fix It |
| --- | --- | --- |
| Not setting TTL on cache entries | "We will invalidate manually" | Always set TTL as a safety net, even with manual invalidation |
| Using the same Redis for cache and persistent data | "One cluster is simpler" | Separate cache (can be flushed) from persistent data (sessions, queues) |
| Ignoring memory eviction policy | Default is `noeviction` (errors when full) | Set `maxmemory-policy allkeys-lru` for cache workloads |
| Opening a new Redis connection per request | Framework default or developer habit | Use connection pooling; configure `max_connections` per pod |
| No monitoring of cache hit rate | "It is just a cache, it either works or it does not" | Track hit rate, memory usage, evictions, and connection count |
| Caching errors/null results | Cache miss returns null, null gets cached | Check for valid data before caching; use a "negative cache" with short TTL only intentionally |
| No circuit breaker when Redis is down | Redis failure cascades to database overload | Implement a circuit breaker; serve stale data or degrade gracefully |
| Storing serialized objects larger than 100 KB | Convenient to cache entire API responses | Cache individual fields or use Redis hashes; large values cause latency spikes |
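The circuit-breaker row is worth expanding: when Redis is down, each request should fail fast and fall back, not wait out a timeout and then hammer the database anyway. A minimal sketch under stated assumptions (the class, threshold, and cooldown values are illustrative; libraries such as pybreaker provide production-grade versions):

```python
import time

class CacheCircuitBreaker:
    """Open the circuit after consecutive cache failures; while open,
    skip the cache entirely and go straight to the fallback."""
    def __init__(self, failure_threshold=5, reset_after_s=30):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None

    def call(self, cache_fn, fallback_fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after_s:
                return fallback_fn()   # circuit open: do not touch the cache
            self.opened_at = None      # cooldown elapsed: probe the cache again
            self.failures = 0
        try:
            result = cache_fn()
            self.failures = 0          # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            return fallback_fn()

breaker = CacheCircuitBreaker(failure_threshold=2)

def broken_cache():
    raise ConnectionError("redis down")

def fallback():
    return {"source": "db-or-stale"}

for _ in range(3):
    print(breaker.call(broken_cache, fallback))
print(breaker.opened_at is not None)  # circuit is now open
```

In practice `fallback_fn` would serve a locally cached stale copy or a rate-limited database query, so a Redis outage degrades latency instead of cascading into the database overload described in the table.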

1. An e-commerce site experiences heavy read traffic on its product catalog, but product details rarely change. They also have a shopping cart service that updates constantly. Which caching strategies should they apply to each service, and why?

For the product catalog, they should use the cache-aside (lazy loading) pattern. This pattern only caches data when it is requested, making it ideal for read-heavy workloads where most data is rarely accessed or updated, thus saving memory and reducing initial write overhead. For the shopping cart, they should use the write-through pattern. This pattern writes to both the cache and the database on every write operation. It ensures strict consistency between the cache and database, which is critical for commerce where reading stale cart data could lead to lost sales or customer frustration. The higher write latency is an acceptable trade-off for this consistency.

2. Your marketing team sends out a push notification to 5 million users about a 90% off flash sale on a specific gaming console. The console's cache key expires exactly as the notification lands. Your database instantly crashes. What caused this, and how could you have architected the application to prevent it?

This was caused by a cache stampede. When the popular cache key expired, thousands of concurrent requests all missed the cache and simultaneously queried the database to rebuild it, overwhelming its connection limits. To prevent this, you could implement a distributed locking strategy. When the cache miss occurs, the first request acquires a Redis lock and queries the database, while all other requests either wait briefly or return slightly stale data. Alternatively, you could use probabilistic early expiration, where requests have an increasing chance of refreshing the cache before it actually expires, spreading the database load over time.

3. A junior engineer proposes saving money by running the application's user session data and its rendered HTML page cache on the exact same Redis cluster, as "they both just store key-value pairs." Why is this architectural decision dangerous for production reliability?

This decision is dangerous because caches and persistent data stores have fundamentally different lifecycles and memory requirements. A cache is designed to be ephemeral and can be safely flushed or evicted without data loss, as the database remains the source of truth. User sessions, however, are persistent data that cannot be easily regenerated; losing them logs out users. If placed on the same cluster, the heavy memory pressure from the HTML page cache would trigger Redis’s eviction policies (like allkeys-lru), potentially deleting active user sessions to make room for cached pages.

4. You are provisioning an ElastiCache Redis instance that supports up to 65,000 connections. Your application has 100 pods, each configured with a connection pool size of 50. During a rolling deployment, the database operations team alerts you that Redis connections are being refused. What went wrong with your connection budgeting?

Your connection budget failed to account for the overlapping pods during a rolling deployment. While 100 pods with 50 connections each require 5,000 connections (well below the 65,000 limit), a rolling update can temporarily double the number of pods to 200, requiring 10,000 connections. Furthermore, if applications leak connections or if timeouts are configured improperly, old pods may not release their connections promptly. You must always calculate the budget based on the maximum possible simultaneous pods (including surge pods during deployments) plus overhead for monitoring and sidecars.

5. Your company acquired a startup running a monolithic legacy API written in a proprietary language that no one knows how to safely modify. The API is crushing its backend database under read load. How can you implement caching for this API without touching a single line of its code?

You can implement caching by injecting an Envoy sidecar proxy into the legacy application’s Kubernetes pods. Envoy can be configured with an HTTP cache filter that intercepts incoming requests before they reach the application container. If a request matches a cached response, Envoy serves it directly, entirely bypassing the application and the database. This approach requires no code changes, relying instead on standard HTTP Cache-Control headers (if the app emits them) or custom routing rules defined in the Envoy configuration to cache the REST API responses at the network layer.

6. Your cache-aside implementation is throwing intermittent timeouts, and the database is seeing elevated load. You check the Redis cluster and see it is at 100% memory utilization with the `maxmemory-policy` set to `noeviction`. How is this policy directly causing your application's symptoms?

The noeviction policy tells Redis to return an Out of Memory (OOM) error for any write command when it is full, rather than making space. Because your application uses the cache-aside pattern, every cache miss results in a database query followed by an attempt to write the result to Redis. Since Redis rejects the write, the data is never cached. Subsequent requests for the same data result in more cache misses and more database queries, causing the elevated database load. For a cache workload, you must use a policy like allkeys-lru or volatile-lru so Redis automatically deletes old entries to make room for new ones.


Hands-On Exercise: Redis Caching with Stampede Prevention

```shell
# Create kind cluster
kind create cluster --name cache-lab
```

Task 1: Provision Managed Redis via CLI Simulation


Before deploying the application, let’s practice provisioning a managed Redis instance. While we use Helm for local testing, the command syntax mirrors cloud CLI tools.

Solution
```shell
# In an AWS environment, you would use:
#   aws elasticache create-replication-group \
#     --replication-group-id cache-lab-cluster \
#     --engine redis --cache-node-type cache.t4g.micro \
#     --num-cache-clusters 1

# For our local Kubernetes lab, we simulate the managed service via Helm:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace cache --create-namespace \
  --set architecture=standalone \
  --set auth.password=cache-lab-pass \
  --set master.persistence.enabled=false \
  --set master.resources.requests.memory=128Mi

k wait --for=condition=ready pod -l app.kubernetes.io/name=redis \
  --namespace cache --timeout=120s
```

Deploy a pod that demonstrates cache-aside with Redis.

Solution
```shell
cat <<'EOF' | k apply -n cache -f -
apiVersion: v1
kind: Pod
metadata:
  name: cache-aside-demo
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: python:3.12-slim
      command:
        - /bin/sh
        - -c
        - |
          pip install redis -q
          python3 << 'PYEOF'
          import redis
          import json
          import time

          r = redis.Redis(
              host='redis-master.cache.svc.cluster.local',
              port=6379,
              password='cache-lab-pass',
              decode_responses=True,
              socket_timeout=2.0,
              max_connections=10,
          )

          # Simulated database
          DATABASE = {
              "prod-101": {"name": "Widget Pro", "price": 29.99, "stock": 150},
              "prod-102": {"name": "Gadget Max", "price": 49.99, "stock": 75},
              "prod-103": {"name": "Tool Kit", "price": 89.99, "stock": 200},
          }

          def get_product(product_id):
              """Cache-aside pattern."""
              cache_key = f"product:{product_id}"
              # Step 1: Check cache
              cached = r.get(cache_key)
              if cached:
                  print(f"  CACHE HIT: {product_id}")
                  return json.loads(cached)
              # Step 2: Cache miss -- "query database"
              print(f"  CACHE MISS: {product_id} (querying DB)")
              time.sleep(0.05)  # Simulate DB latency
              product = DATABASE.get(product_id)
              if product:
                  # Step 3: Populate cache (TTL = 60 seconds)
                  r.setex(cache_key, 60, json.dumps(product))
              return product

          # Demo: First round misses, later rounds hit
          print("=== Cache-Aside Demo ===")
          for round_num in range(1, 4):
              print(f"\nRound {round_num}:")
              for pid in ["prod-101", "prod-102", "prod-103"]:
                  result = get_product(pid)

          # Show cache stats
          info = r.info("stats")
          print(f"\nHits: {info['keyspace_hits']}, Misses: {info['keyspace_misses']}")
          hit_rate = info['keyspace_hits'] / (info['keyspace_hits'] + info['keyspace_misses']) * 100
          print(f"Hit rate: {hit_rate:.1f}%")
          PYEOF
EOF

k wait --for=condition=ready pod/cache-aside-demo -n cache --timeout=60s
sleep 5
k logs cache-aside-demo -n cache
```

Simulate a stampede by launching many concurrent requests after a cache key expires.

Solution
```shell
cat <<'EOF' | k apply -n cache -f -
apiVersion: v1
kind: Pod
metadata:
  name: stampede-demo
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: python:3.12-slim
      command:
        - /bin/sh
        - -c
        - |
          pip install redis -q
          python3 << 'PYEOF'
          import redis
          import json
          import time
          import threading

          r = redis.Redis(
              host='redis-master.cache.svc.cluster.local',
              port=6379,
              password='cache-lab-pass',
              decode_responses=True,
          )

          db_queries = {"count": 0}
          lock = threading.Lock()

          def simulate_db_query():
              """Simulate an expensive database query."""
              with lock:
                  db_queries["count"] += 1
              time.sleep(0.1)  # 100ms DB latency
              return {"name": "Popular Product", "price": 99.99}

          def get_without_protection(product_id):
              """No stampede protection -- every miss hits DB."""
              cached = r.get(f"product:{product_id}")
              if cached:
                  return json.loads(cached)
              data = simulate_db_query()
              r.setex(f"product:{product_id}", 5, json.dumps(data))
              return data

          def get_with_lock_protection(product_id):
              """Distributed lock prevents stampede."""
              cache_key = f"product:{product_id}"
              cached = r.get(cache_key)
              if cached:
                  return json.loads(cached)
              lock_key = f"lock:{cache_key}"
              acquired = r.set(lock_key, "1", nx=True, ex=5)
              if acquired:
                  try:
                      data = simulate_db_query()
                      r.setex(cache_key, 5, json.dumps(data))
                      return data
                  finally:
                      r.delete(lock_key)
              else:
                  time.sleep(0.15)  # Wait for rebuilder
                  cached = r.get(cache_key)
                  return json.loads(cached) if cached else simulate_db_query()

          # Test 1: Without protection
          r.flushall()
          db_queries["count"] = 0
          threads = []
          for i in range(50):
              t = threading.Thread(target=get_without_protection, args=("hot-product",))
              threads.append(t)
              t.start()
          for t in threads:
              t.join()
          print(f"WITHOUT protection: {db_queries['count']} DB queries from 50 requests")

          # Test 2: With lock protection
          r.flushall()
          db_queries["count"] = 0
          threads = []
          for i in range(50):
              t = threading.Thread(target=get_with_lock_protection, args=("hot-product",))
              threads.append(t)
              t.start()
          for t in threads:
              t.join()
          print(f"WITH lock protection: {db_queries['count']} DB queries from 50 requests")
          PYEOF
EOF

k wait --for=condition=ready pod/stampede-demo -n cache --timeout=60s
sleep 10
k logs stampede-demo -n cache
```

Create a pod that reports Redis statistics.

Solution
```shell
cat <<'EOF' | k apply -n cache -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis-monitor
spec:
  restartPolicy: Never
  containers:
    - name: monitor
      image: redis:7
      command:
        - /bin/sh
        - -c
        - |
          echo "=== Redis Health Report ==="
          echo ""
          echo "--- Memory ---"
          redis-cli -h redis-master -a cache-lab-pass INFO memory 2>/dev/null | grep -E "used_memory_human|maxmemory_human|mem_fragmentation"
          echo ""
          echo "--- Clients ---"
          redis-cli -h redis-master -a cache-lab-pass INFO clients 2>/dev/null | grep -E "connected_clients|blocked_clients|maxclients"
          echo ""
          echo "--- Stats ---"
          redis-cli -h redis-master -a cache-lab-pass INFO stats 2>/dev/null | grep -E "keyspace_hits|keyspace_misses|evicted_keys|total_commands"
          echo ""
          echo "--- Keyspace ---"
          redis-cli -h redis-master -a cache-lab-pass INFO keyspace 2>/dev/null
          echo ""
          echo "--- Eviction Policy ---"
          redis-cli -h redis-master -a cache-lab-pass CONFIG GET maxmemory-policy 2>/dev/null
          echo ""
          echo "=== Report Complete ==="
EOF

k wait --for=condition=ready pod/redis-monitor -n cache --timeout=30s
sleep 3
k logs redis-monitor -n cache
```

Change the Redis eviction policy and demonstrate eviction behavior.

Solution
```shell
cat <<'EOF' | k apply -n cache -f -
apiVersion: v1
kind: Pod
metadata:
  name: eviction-demo
spec:
  restartPolicy: Never
  containers:
    - name: demo
      image: python:3.12-slim
      command:
        - /bin/sh
        - -c
        - |
          pip install redis -q
          python3 << 'PYEOF'
          import redis

          r = redis.Redis(
              host='redis-master.cache.svc.cluster.local',
              port=6379,
              password='cache-lab-pass',
              decode_responses=True,
          )

          # Set a small maxmemory for demonstration
          r.config_set('maxmemory', '1mb')
          r.config_set('maxmemory-policy', 'allkeys-lru')
          print("Set maxmemory=1MB, policy=allkeys-lru")

          # Fill cache until evictions happen
          evicted_before = int(r.info('stats')['evicted_keys'])
          for i in range(5000):
              r.set(f"item:{i}", "x" * 200)  # ~200 bytes each
          evicted_after = int(r.info('stats')['evicted_keys'])
          total_keys = r.dbsize()
          print("Attempted to write 5000 keys")
          print(f"Keys in Redis: {total_keys}")
          print(f"Keys evicted: {evicted_after - evicted_before}")
          print(f"Eviction policy working correctly: {evicted_after > evicted_before}")

          # Reset maxmemory
          r.config_set('maxmemory', '0')
          r.flushall()
          PYEOF
EOF

k wait --for=condition=ready pod/eviction-demo -n cache --timeout=60s
sleep 8
k logs eviction-demo -n cache
```
  • Redis cluster is successfully provisioned via CLI simulation
  • Cache-aside demo shows cache hits on second and third rounds
  • Stampede demo shows fewer DB queries with lock protection
  • Redis monitor reports memory, clients, and keyspace stats
  • Eviction demo shows allkeys-lru evicting keys when memory is full
```shell
kind delete cluster --name cache-lab
```

Next Module: Module 9.6: Search & Analytics Engines (OpenSearch / Elasticsearch) — Learn how to ingest Kubernetes logs into managed search engines, configure index lifecycle management, and optimize queries for operational analytics.