Module 9.4: Object Storage Patterns (S3 / GCS / Blob)
Complexity: [MEDIUM] | Time to Complete: 2h | Prerequisites: Module 9.1 (Databases), Kubernetes PersistentVolumes and CSI concepts
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Implement CSI-based object storage mounting (Mountpoint for S3, GCS FUSE, Azure Blob CSI) for Kubernetes workloads
- Configure lifecycle policies and intelligent tiering across S3, GCS, and Azure Blob for cost-optimized data pipelines
- Deploy object storage access patterns using presigned URLs, workload identity, and IRSA/Workload Identity integration
- Design object storage replication, versioning, and lifecycle strategies for robust data protection and disaster recovery
Why This Module Matters
Section titled “Why This Module Matters”In January 2024, a media streaming company stored 4.2 petabytes of video content in Amazon S3. Their transcoding pipeline ran on EKS — 60 pods processing uploaded videos into multiple formats. The architecture worked, but their S3 costs were $127,000 per month. A junior engineer noticed that 78% of the data had not been accessed in over 90 days. The team implemented S3 Lifecycle policies, moving cold content to S3 Glacier Instant Retrieval. Monthly costs dropped to $41,000 — a $86,000/month saving from a 15-line configuration change.
In the same cluster, the application team was generating pre-signed URLs for video playback. A misconfiguration set the URL expiration to 30 days instead of 4 hours. A security audit discovered that shared URLs were being forwarded and reused across the internet, effectively giving unauthenticated users perpetual access to premium content. The fix took five minutes; the brand damage took months to recover from.
Object storage is deceptively simple — “just upload a file.” But from Kubernetes, the integration patterns are rich and the pitfalls are expensive. This module teaches you how to access S3, GCS, and Azure Blob from pods using workload identity, CSI drivers for filesystem-style access, pre-signed URLs for secure client-side access, lifecycle policies for cost optimization, cross-region replication for disaster recovery, and bucket security hardening.
Pod-to-Storage Access Patterns
Section titled “Pod-to-Storage Access Patterns”There are three primary ways Kubernetes pods interact with object storage:
graph TD subgraph Pattern 1: SDK/API Access P1[Pod] -->|AWS SDK / gcloud SDK| API1[S3/GCS/Blob API] end
subgraph Pattern 2: CSI Driver Mount P2[Pod] -->|/mnt/data/| CSI[CSI Driver] CSI -->|Filesystem Illusion| API2[S3/GCS/Blob] end
subgraph Pattern 3: Pre-signed URL P3[Pod] -->|Generates URL| C[Client] C -->|Direct Upload/Download| API3[S3/GCS/Blob] endPattern 1: SDK Access with Workload Identity
Section titled “Pattern 1: SDK Access with Workload Identity”The most common and flexible pattern. Your application uses the cloud SDK to interact with the storage API directly.
# AWS: Pod with IRSA for S3 accessapiVersion: v1kind: ServiceAccountmetadata: name: storage-writer namespace: production annotations: eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/S3WriterRole---apiVersion: apps/v1kind: Deploymentmetadata: name: video-processor namespace: productionspec: replicas: 5 selector: matchLabels: app: video-processor template: metadata: labels: app: video-processor spec: serviceAccountName: storage-writer containers: - name: processor image: mycompany/video-processor:3.1.0 env: - name: S3_BUCKET value: video-content-prod - name: S3_REGION value: us-east-1 resources: requests: cpu: "2" memory: 4GiThe IAM policy for the role:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::video-content-prod", "arn:aws:s3:::video-content-prod/*" ] } ]}GCP Workload Identity for GCS
Section titled “GCP Workload Identity for GCS”apiVersion: v1kind: ServiceAccountmetadata: name: gcs-writer namespace: production annotations: iam.gke.io/gcp-service-account: gcs-writer@my-project.iam.gserviceaccount.com# Bind the Kubernetes SA to the GCP SAgcloud iam service-accounts add-iam-policy-binding \ gcs-writer@my-project.iam.gserviceaccount.com \ --role roles/iam.workloadIdentityUser \ --member "serviceAccount:my-project.svc.id.goog[production/gcs-writer]"
# Grant GCS access to the GCP SAgcloud storage buckets add-iam-policy-binding gs://video-content-prod \ --member="serviceAccount:gcs-writer@my-project.iam.gserviceaccount.com" \ --role="roles/storage.objectUser"Azure Workload Identity for Blob
Section titled “Azure Workload Identity for Blob”apiVersion: v1kind: ServiceAccountmetadata: name: blob-writer namespace: production annotations: azure.workload.identity/client-id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890" labels: azure.workload.identity/use: "true"# Create federated credentialaz identity federated-credential create \ --identity-name blob-writer-identity \ --resource-group myRG \ --issuer "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E" \ --subject system:serviceaccount:production:blob-writer
# Assign Storage Blob Data Contributor roleaz role assignment create \ --assignee-object-id $(az identity show -n blob-writer-identity -g myRG --query principalId -o tsv) \ --role "Storage Blob Data Contributor" \ --scope "/subscriptions/SUB_ID/resourceGroups/myRG/providers/Microsoft.Storage/storageAccounts/videostorage"CSI Drivers: Mounting Object Storage as a Filesystem
Section titled “CSI Drivers: Mounting Object Storage as a Filesystem”Sometimes your application expects a filesystem path, not an SDK. CSI drivers bridge this gap by presenting object storage as a POSIX-like mount.
Mountpoint for Amazon S3 CSI Driver
Section titled “Mountpoint for Amazon S3 CSI Driver”# Install the driver as an EKS add-onaws eks create-addon \ --cluster-name my-cluster \ --addon-name aws-mountpoint-s3-csi-driver \ --service-account-role-arn arn:aws:iam::123456789:role/S3CSIDriverRole# StorageClass for S3apiVersion: storage.k8s.io/v1kind: StorageClassmetadata: name: s3-storageprovisioner: s3.csi.aws.comparameters: bucketName: data-pipeline-prodmountOptions: - allow-delete - region us-east-1 - prefix data/---# PersistentVolumeClaimapiVersion: v1kind: PersistentVolumeClaimmetadata: name: s3-data namespace: productionspec: accessModes: - ReadWriteMany storageClassName: s3-storage resources: requests: storage: 1Ti # Not enforced -- S3 is unlimited, but required by K8s API---# Pod using the mountapiVersion: v1kind: Podmetadata: name: data-processor namespace: productionspec: serviceAccountName: storage-writer containers: - name: processor image: mycompany/data-processor:1.0.0 volumeMounts: - name: s3-data mountPath: /mnt/data command: - /bin/sh - -c - | # Read files from S3 as if they were local ls /mnt/data/ cat /mnt/data/config.json
# Write files -- they appear in S3 echo '{"processed": true}' > /mnt/data/output/result.json volumes: - name: s3-data persistentVolumeClaim: claimName: s3-dataCSI Driver Limitations
Section titled “CSI Driver Limitations”| Feature | Mountpoint for S3 | GCS FUSE | Azure Blob CSI |
|---|---|---|---|
| Read performance | Good (sequential) | Good | Good |
| Write performance | Good (new files) | Moderate | Good |
| Random I/O | Poor (not a block device) | Poor | Poor |
| Rename/move | Not atomic | Not atomic | Not atomic |
| Hard links | Not supported | Not supported | Not supported |
| File locking | Not supported | Not supported | Not supported |
| Best for | Data pipelines, ML training data | Data analytics | Batch processing |
Critical warning: Object storage CSI mounts are NOT suitable for databases, caches, or any workload requiring random I/O, atomic operations, or POSIX compliance. Use them for read-heavy data pipelines and write-once-read-many workloads.
Stop and think: Your team is deploying a new PostgreSQL database to Kubernetes. A junior engineer suggests using the S3 CSI driver to store the data files “so we never run out of disk space.” What is the technical reason you must reject this proposal, and what should you use instead?
GCS FUSE CSI Driver
Section titled “GCS FUSE CSI Driver”# GKE: Enable GCS FUSE on the cluster# gcloud container clusters update my-cluster \# --update-addons GcsFuseCsiDriver=ENABLED
apiVersion: v1kind: Podmetadata: name: ml-trainer namespace: ml annotations: gke-gcsfuse/volumes: "true" gke-gcsfuse/cpu-limit: "500m" gke-gcsfuse/memory-limit: "256Mi"spec: serviceAccountName: gcs-writer containers: - name: trainer image: mycompany/ml-trainer:2.0.0 volumeMounts: - name: training-data mountPath: /data readOnly: true volumes: - name: training-data csi: driver: gcsfuse.csi.storage.gke.io readOnly: true volumeAttributes: bucketName: ml-training-data mountOptions: "implicit-dirs"Pre-Signed URLs: Secure Direct Access
Section titled “Pre-Signed URLs: Secure Direct Access”Pre-signed URLs allow clients to upload or download directly from object storage without passing through your Kubernetes pods. This offloads bandwidth from your cluster and reduces latency.
Architecture
Section titled “Architecture”sequenceDiagram participant C as Client participant P as K8s Pod participant S as S3/GCS/Blob
C->>P: POST /uploads Note over P: Generates pre-signed PUT URL P-->>C: Returns pre-signed URL C->>S: PUT file (direct upload) S-->>C: 200 OK Note over S: Triggers event notification Note over S: Processing pipeline triggersPause and predict: If a user uploads a 5GB video file directly through your Kubernetes API pod instead of using a pre-signed URL, what specific resource bottlenecks might occur in your cluster?
Generating Pre-Signed URLs
Section titled “Generating Pre-Signed URLs”# AWS S3 pre-signed URL generationimport boto3from datetime import timedelta
s3 = boto3.client('s3')
def generate_upload_url(filename, content_type): """Generate a pre-signed URL for direct client upload.""" key = f"uploads/{filename}"
url = s3.generate_presigned_url( 'put_object', Params={ 'Bucket': 'user-uploads-prod', 'Key': key, 'ContentType': content_type, 'ServerSideEncryption': 'aws:kms', }, ExpiresIn=3600, # 1 hour -- NOT 30 days! HttpMethod='PUT' ) return url
def generate_download_url(key): """Generate a pre-signed URL for client download.""" url = s3.generate_presigned_url( 'get_object', Params={ 'Bucket': 'user-uploads-prod', 'Key': key, }, ExpiresIn=14400, # 4 hours ) return url# GCS pre-signed URL generationfrom google.cloud import storagefrom datetime import timedelta
client = storage.Client()bucket = client.bucket('user-uploads-prod')
def generate_upload_url(filename, content_type): blob = bucket.blob(f"uploads/{filename}") url = blob.generate_signed_url( version="v4", expiration=timedelta(hours=1), method="PUT", content_type=content_type, ) return urlPre-Signed URL Security Best Practices
Section titled “Pre-Signed URL Security Best Practices”| Practice | Why |
|---|---|
| Set short expiration (1-4 hours for downloads, 15-60 min for uploads) | Limits exposure window if URL is leaked |
| Restrict Content-Type in upload URLs | Prevents uploading unexpected file types |
| Use separate buckets for uploads vs processed content | Isolates raw uploads from verified content |
| Require server-side encryption in the URL parameters | Ensures all uploads are encrypted at rest |
| Log all pre-signed URL generations | Audit trail for access tracking |
| Never expose bucket credentials; only expose URLs | Pre-signed URLs are scoped and temporary |
Lifecycle Policies for Cost Optimization
Section titled “Lifecycle Policies for Cost Optimization”Object storage costs are dominated by storage volume, not access. Moving infrequently accessed data to cheaper tiers can save 60-90%.
Storage Tier Comparison
Section titled “Storage Tier Comparison”| Tier | AWS | GCP | Azure | Cost (per GB/month) | Use Case |
|---|---|---|---|---|---|
| Hot | S3 Standard | Standard | Hot | $0.023 | Frequently accessed |
| Infrequent | S3 Standard-IA | Nearline | Cool | $0.0125 | Monthly access |
| Archive | S3 Glacier IR | Coldline | Cold | $0.004 | Quarterly access |
| Deep archive | S3 Glacier Deep | Archive | Archive | $0.00099 | Yearly/compliance |
AWS S3 Lifecycle Configuration
Section titled “AWS S3 Lifecycle Configuration”{ "Rules": [ { "ID": "optimize-video-storage", "Status": "Enabled", "Filter": { "Prefix": "videos/" }, "Transitions": [ { "Days": 30, "StorageClass": "STANDARD_IA" }, { "Days": 90, "StorageClass": "GLACIER_IR" }, { "Days": 365, "StorageClass": "DEEP_ARCHIVE" } ] }, { "ID": "cleanup-temp-uploads", "Status": "Enabled", "Filter": { "Prefix": "tmp-uploads/" }, "Expiration": { "Days": 7 }, "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 } } ]}aws s3api put-bucket-lifecycle-configuration \ --bucket video-content-prod \ --lifecycle-configuration file://lifecycle.jsonGCS Lifecycle
Section titled “GCS Lifecycle”cat > /tmp/gcs-lifecycle.json << 'EOF'{ "lifecycle": { "rule": [ { "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30, "matchesPrefix": ["videos/"]} }, { "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 90, "matchesPrefix": ["videos/"]} }, { "action": {"type": "Delete"}, "condition": {"age": 7, "matchesPrefix": ["tmp-uploads/"]} } ] }}EOF
gcloud storage buckets update gs://video-content-prod \ --lifecycle-file=/tmp/gcs-lifecycle.jsonIncomplete Multipart Upload Cleanup
Section titled “Incomplete Multipart Upload Cleanup”One of the most overlooked cost leaks: incomplete multipart uploads. When a large upload fails midway, the partial parts sit in S3 forever, incurring storage charges. The lifecycle rule AbortIncompleteMultipartUpload cleans these up automatically.
# Check for incomplete multipart uploadsaws s3api list-multipart-uploads --bucket video-content-prod
# You may be shocked at how many orphaned parts existCross-Region Replication
Section titled “Cross-Region Replication”For disaster recovery or serving content from multiple regions, cross-region replication copies objects automatically.
AWS S3 Cross-Region Replication
Section titled “AWS S3 Cross-Region Replication”# Enable versioning (required for replication)aws s3api put-bucket-versioning \ --bucket video-content-prod \ --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning \ --bucket video-content-dr \ --versioning-configuration Status=Enabled
# Create replication configurationcat > /tmp/replication.json << 'EOF'{ "Role": "arn:aws:iam::123456789:role/S3ReplicationRole", "Rules": [ { "ID": "dr-replication", "Status": "Enabled", "Filter": { "Prefix": "" }, "Destination": { "Bucket": "arn:aws:s3:::video-content-dr", "StorageClass": "STANDARD_IA", "ReplicationTime": { "Status": "Enabled", "Time": {"Minutes": 15} }, "Metrics": { "Status": "Enabled", "EventThreshold": {"Minutes": 15} } }, "DeleteMarkerReplication": { "Status": "Enabled" } } ]}EOF
aws s3api put-bucket-replication \ --bucket video-content-prod \ --replication-configuration file:///tmp/replication.jsonMulti-Region Access from Kubernetes
Section titled “Multi-Region Access from Kubernetes”When pods in different regions need the closest bucket:
# Region-specific ConfigMapapiVersion: v1kind: ConfigMapmetadata: name: storage-config namespace: productiondata: BUCKET_NAME: "video-content-prod" # US region # In EU cluster, this would be: "video-content-eu" BUCKET_REGION: "us-east-1"For AWS, S3 Multi-Region Access Points provide a single endpoint that automatically routes to the nearest bucket:
aws s3control create-multi-region-access-point \ --account-id 123456789 \ --details '{ "Name": "video-global", "Regions": [ {"Bucket": "video-content-prod"}, {"Bucket": "video-content-eu"}, {"Bucket": "video-content-ap"} ] }'Bucket Security Hardening
Section titled “Bucket Security Hardening”Defense-in-Depth Configuration
Section titled “Defense-in-Depth Configuration”# 1. Block all public access (do this first, always)aws s3api put-public-access-block \ --bucket video-content-prod \ --public-access-block-configuration \ BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# 2. Enable default encryption with KMSaws s3api put-bucket-encryption \ --bucket video-content-prod \ --server-side-encryption-configuration '{ "Rules": [{"ApplyServerSideEncryptionByDefault": { "SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "alias/s3-encryption-key" }, "BucketKeyEnabled": true}] }'
# 3. Enable access loggingaws s3api put-bucket-logging \ --bucket video-content-prod \ --bucket-logging-status '{ "LoggingEnabled": { "TargetBucket": "access-logs-prod", "TargetPrefix": "s3/video-content-prod/" } }'
# 4. Enable versioning (protects against accidental deletion)aws s3api put-bucket-versioning \ --bucket video-content-prod \ --versioning-configuration Status=Enabled
# 5. Require TLS (deny non-HTTPS requests)aws s3api put-bucket-policy --bucket video-content-prod \ --policy '{ "Version": "2012-10-17", "Statement": [{ "Sid": "DenyNonHTTPS", "Effect": "Deny", "Principal": "*", "Action": "s3:*", "Resource": [ "arn:aws:s3:::video-content-prod", "arn:aws:s3:::video-content-prod/*" ], "Condition": {"Bool": {"aws:SecureTransport": "false"}} }] }'Bucket Security Checklist
Section titled “Bucket Security Checklist”| Control | AWS | GCP | Azure |
|---|---|---|---|
| Block public access | Public Access Block | Uniform bucket-level access | Disable anonymous access |
| Encryption at rest | SSE-S3/SSE-KMS | Google-managed/CMEK | Microsoft-managed/CMK |
| Encryption in transit | Enforce HTTPS via bucket policy | HTTPS by default | Require secure transfer |
| Access logging | Server access logging | Cloud Audit Logs | Diagnostic logs |
| Versioning | Bucket versioning | Object versioning | Blob versioning |
| Immutability | Object Lock | Retention policies | Immutable storage |
Did You Know?
Section titled “Did You Know?”-
Amazon S3 stores over 350 trillion objects as of 2025 and handles tens of millions of requests per second. S3 was designed to provide 99.999999999% (11 nines) durability, meaning you would statistically lose one object per 10 million years if you stored 10 million objects.
-
Incomplete multipart uploads are a hidden cost bomb. A 2023 study by Vantage found that 15% of companies surveyed had over $10,000/month in charges from orphaned multipart upload parts. Most had no idea these partial uploads existed until they added lifecycle rules to clean them up.
-
GCS FUSE can cache frequently-read files on local SSD, reducing read latency from ~50ms (network) to ~1ms (local). This makes it practical for ML training workloads that read the same dataset files thousands of times per epoch. The cache is configured via annotations on the pod.
-
Azure Blob Storage supports “immutable storage” with legal hold and time-based retention that even a subscription owner cannot override. This is used by financial institutions for SEC 17a-4 compliance, where records must be stored in a non-erasable, non-rewritable format for specified retention periods.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Using CSI mount for database files | ”It mounts like a disk, right?” | CSI object storage mounts lack POSIX semantics; use EBS/PD for databases |
| Setting pre-signed URL expiration to 30 days | Copy-pasted from example code | Use 1-4 hours for downloads, 15-60 minutes for uploads |
| Not blocking public access on new buckets | Default is private, but one wrong policy makes it public | Enable account-level public access block as a guardrail |
| Ignoring incomplete multipart uploads | Not visible in normal S3 listings | Add AbortIncompleteMultipartUpload lifecycle rule to every bucket |
| Using IAM user access keys instead of workload identity | ”Quickest way to get it working” | Use IRSA (EKS), Workload Identity (GKE), or Workload Identity Federation (AKS) |
| Not enabling versioning before replication | Replication requires versioning, easy to forget | Script bucket creation to always enable versioning |
| Downloading large files through the pod when pre-signed URLs exist | Simpler code path | Generate pre-signed URLs to offload bandwidth; your pod should not proxy large files |
| No lifecycle policy on any bucket | ”We will clean up later” | Define lifecycle rules at bucket creation time; “later” never comes |
1. Your team is migrating three workloads: a legacy log analyzer that requires local file paths, a new Go microservice, and a heavy video-upload portal. Which object storage access pattern should you choose for each, and why?
For the legacy log analyzer, use a CSI driver mount because the application expects a POSIX-like filesystem interface and rewriting it to use an SDK might not be feasible. For the new Go microservice, use SDK/API access with Workload Identity, as this is the most flexible, secure, and native way to interact with object storage APIs. For the video-upload portal, use pre-signed URLs to allow clients to upload directly to the bucket. This offloads massive bandwidth requirements from your Kubernetes cluster, preventing node network saturation and reducing latency.
2. A developer proposes using the Mountpoint for S3 CSI driver to host a MySQL database's `/var/lib/mysql` directory to save on EBS costs. Why will this deployment immediately fail or cause data corruption?
Object storage CSI drivers present a filesystem interface, but they fundamentally lack critical POSIX semantics required by database engines. They do not support random I/O (seeking and modifying within files), atomic rename operations, or file locking, which are all mandatory for write-ahead logs and concurrency control. When MySQL attempts to perform an atomic write or lock a row file, the operation will either fail outright or silently complete without actual atomicity, leading to instantaneous data corruption. You must use block storage like EBS or Persistent Disk for databases.
3. After six months in production, your cloud bill shows S3 storage costs are double what the actual total size of your active objects should dictate. What silent mechanism likely caused this, and how do you permanently fix it?
The hidden cost is almost certainly caused by incomplete multipart uploads. When large file uploads fail or are interrupted mid-transfer, the partial chunks remain stored in the bucket indefinitely but are completely invisible to standard list-objects API calls. Because they take up physical space, the cloud provider continues to charge you for them month over month. To fix this permanently, you must configure a bucket lifecycle rule such as AbortIncompleteMultipartUpload set to 1-7 days, which automatically purges any orphaned upload fragments.
4. Your mobile app needs to download user-specific avatars from a private GCS bucket. A junior developer suggests embedding a read-only service account key in the app code. Why is this a severe security risk, and why are pre-signed URLs the correct architectural choice?
Embedding service account keys in client code is a critical vulnerability because malicious actors can extract the key, granting them permanent, unrestricted read access to potentially the entire bucket or project. Pre-signed URLs eliminate this risk by delegating access dynamically without exposing credentials. The URL encodes a cryptographic signature valid for only a specific object and a strict, limited time window (e.g., 15 minutes). Even if a pre-signed URL is intercepted, the blast radius is contained to a single file, and the access automatically expires.
5. You operate active-active Kubernetes clusters in `us-east-1` and `eu-central-1`. Applications in both clusters need to read from the same globally replicated dataset. How does an S3 Multi-Region Access Point prevent you from having to maintain region-specific ConfigMaps?
Without a Multi-Region Access Point (MRAP), your deployment manifests would need region-specific ConfigMaps injected to tell the US cluster to use the US bucket and the EU cluster to use the EU bucket. An MRAP solves this by providing a single, global endpoint ARN that you can hardcode into your application’s configuration. When a pod makes a request to the MRAP, AWS’s global network automatically routes the request to the lowest-latency replica bucket behind the scenes. This decouples your Kubernetes configuration from your cloud storage topology, vastly simplifying multi-region deployments.
6. Your application code is explicitly configured to use `https://` for all S3 API calls. Why do security auditors still require you to implement a `DenyNonHTTPS` bucket policy statement?
Relying solely on application configuration violates the principle of defense-in-depth, as a simple configuration drift, typo, or new tool (like an admin running a local script) could accidentally use HTTP. By enforcing TLS at the bucket policy level, you create an infrastructure-enforced guardrail that actively denies any unencrypted request regardless of the client’s configuration. This guarantees data in transit is protected and satisfies strict compliance frameworks (like HIPAA or PCI-DSS) that require systemic, rather than application-level, enforcement of encryption.
Hands-On Exercise: Object Storage Access Patterns with MinIO
Section titled “Hands-On Exercise: Object Storage Access Patterns with MinIO”This exercise uses MinIO (S3-compatible) running locally in a kind cluster to practice all three access patterns.
# Create kind clusterkind create cluster --name storage-lab
# Install MinIOhelm repo add minio https://charts.min.io/helm install minio minio/minio \ --namespace storage --create-namespace \ --set replicas=1 \ --set persistence.enabled=false \ --set rootUser=minioadmin \ --set rootPassword=minioadmin123 \ --set resources.requests.memory=256Mi \ --set mode=standalone
k wait --for=condition=ready pod -l app=minio -n storage --timeout=120sTask 1: SDK-Style Access from a Pod
Section titled “Task 1: SDK-Style Access from a Pod”Create a pod that uses the AWS CLI (configured for MinIO) to create a bucket and upload files.
Solution
# Create a Secret with MinIO credentialsk create secret generic minio-creds -n storage \ --from-literal=AWS_ACCESS_KEY_ID=minioadmin \ --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin123
# Run a pod with AWS CLIcat <<'EOF' | k apply -f -apiVersion: v1kind: Podmetadata: name: s3-client namespace: storagespec: restartPolicy: Never containers: - name: aws-cli image: amazon/aws-cli:2.22.0 command: - /bin/sh - -c - | # Configure endpoint export AWS_DEFAULT_REGION=us-east-1
# Create bucket aws --endpoint-url http://minio:9000 s3 mb s3://test-bucket
# Upload files echo "Hello from Kubernetes" > /tmp/hello.txt aws --endpoint-url http://minio:9000 s3 cp /tmp/hello.txt s3://test-bucket/hello.txt
# Create multiple files for i in $(seq 1 10); do echo "File content $i - $(date)" > /tmp/file-$i.txt aws --endpoint-url http://minio:9000 s3 cp /tmp/file-$i.txt s3://test-bucket/data/file-$i.txt done
# List bucket contents aws --endpoint-url http://minio:9000 s3 ls s3://test-bucket/ --recursive
echo "Upload complete!" sleep 300 envFrom: - secretRef: name: minio-credsEOF
k wait --for=condition=ready pod/s3-client -n storage --timeout=60sk logs s3-client -n storageTask 2: Generate Pre-Signed URLs
Section titled “Task 2: Generate Pre-Signed URLs”Create a pod that generates pre-signed download URLs for the uploaded files.
Solution
cat <<'EOF' | k apply -f -apiVersion: v1kind: Podmetadata: name: url-generator namespace: storagespec: restartPolicy: Never containers: - name: python image: python:3.12-slim command: - /bin/sh - -c - | pip install boto3 -q
python3 << 'PYEOF' import boto3
s3 = boto3.client( 's3', endpoint_url='http://minio:9000', aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin123', region_name='us-east-1' )
# Generate pre-signed download URL url = s3.generate_presigned_url( 'get_object', Params={'Bucket': 'test-bucket', 'Key': 'hello.txt'}, ExpiresIn=3600 ) print(f"Download URL (1h expiry): {url}")
# Generate pre-signed upload URL upload_url = s3.generate_presigned_url( 'put_object', Params={ 'Bucket': 'test-bucket', 'Key': 'uploads/new-file.txt', 'ContentType': 'text/plain' }, ExpiresIn=900 ) print(f"Upload URL (15m expiry): {upload_url}")
# List all objects and generate URLs response = s3.list_objects_v2(Bucket='test-bucket', Prefix='data/') for obj in response.get('Contents', []): url = s3.generate_presigned_url( 'get_object', Params={'Bucket': 'test-bucket', 'Key': obj['Key']}, ExpiresIn=3600 ) print(f"{obj['Key']}: {url[:80]}...") PYEOF envFrom: - secretRef: name: minio-credsEOF
k wait --for=condition=ready pod/url-generator -n storage --timeout=120sk logs url-generator -n storageTask 3: Implement Lifecycle-Like Cleanup
Section titled “Task 3: Implement Lifecycle-Like Cleanup”Create a CronJob that cleans up files older than a specified age (simulating lifecycle policies).
Solution
apiVersion: batch/v1kind: CronJobmetadata: name: storage-cleanup namespace: storagespec: schedule: "*/5 * * * *" jobTemplate: spec: template: spec: restartPolicy: OnFailure containers: - name: cleanup image: python:3.12-slim command: - /bin/sh - -c - | pip install boto3 -q python3 << 'PYEOF' import boto3 from datetime import datetime, timezone, timedelta
s3 = boto3.client( 's3', endpoint_url='http://minio:9000', aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin123', region_name='us-east-1' )
MAX_AGE = timedelta(minutes=10) now = datetime.now(timezone.utc)
response = s3.list_objects_v2(Bucket='test-bucket', Prefix='data/') deleted = 0 for obj in response.get('Contents', []): age = now - obj['LastModified'] if age > MAX_AGE: s3.delete_object(Bucket='test-bucket', Key=obj['Key']) print(f"Deleted: {obj['Key']} (age: {age})") deleted += 1
print(f"Cleanup complete: {deleted} objects deleted") PYEOFk apply -f /tmp/cleanup-cronjob.yamlTask 4: Verify Bucket Security
Section titled “Task 4: Verify Bucket Security”Write a script that checks bucket security settings.
Solution
cat <<'EOF' | k apply -f -apiVersion: v1kind: Podmetadata: name: security-audit namespace: storagespec: restartPolicy: Never containers: - name: auditor image: python:3.12-slim command: - /bin/sh - -c - | pip install boto3 -q python3 << 'PYEOF' import boto3
s3 = boto3.client( 's3', endpoint_url='http://minio:9000', aws_access_key_id='minioadmin', aws_secret_access_key='minioadmin123', region_name='us-east-1' )
bucket = 'test-bucket' print(f"=== Security Audit: {bucket} ===")
# Check versioning try: v = s3.get_bucket_versioning(Bucket=bucket) status = v.get('Status', 'Disabled') print(f"Versioning: {status}") if status != 'Enabled': print(" WARNING: Versioning is not enabled!") except Exception as e: print(f" Versioning check failed: {e}")
# Check encryption try: enc = s3.get_bucket_encryption(Bucket=bucket) print(f"Encryption: Enabled") except Exception: print("Encryption: Not configured") print(" WARNING: Default encryption not set!")
# Check bucket policy try: policy = s3.get_bucket_policy(Bucket=bucket) print(f"Bucket policy: Present") except Exception: print("Bucket policy: None") print(" INFO: No bucket policy (relying on IAM only)")
# List objects to verify access objects = s3.list_objects_v2(Bucket=bucket) count = objects.get('KeyCount', 0) print(f"Object count: {count}")
print("=== Audit Complete ===") PYEOFEOF
k wait --for=condition=ready pod/security-audit -n storage --timeout=120sk logs security-audit -n storageSuccess Criteria
Section titled “Success Criteria”- S3 client pod creates bucket and uploads 11 files
- Pre-signed URL generator produces valid download and upload URLs
- CronJob runs and reports cleanup activity
- Security audit pod reports versioning and encryption status
Cleanup
Section titled “Cleanup”kind delete cluster --name storage-labNext Module: Module 9.5: Advanced Caching Services (ElastiCache / Memorystore) — Learn Redis and Memcached architectures for Kubernetes workloads, caching strategies, cache stampede prevention, and using Envoy as a sidecar cache.