Module 1.4: Amazon S3 & Object Storage
Complexity: [MEDIUM] | Time to Complete: 2.5h | Prerequisites: Module 1.1
What You’ll Be Able to Do
After completing this module, you will be able to:
- Configure S3 bucket policies and access control lists to enforce least-privilege access on object storage
- Implement lifecycle policies to automate data tiering across S3 storage classes and reduce storage costs
- Deploy server-side encryption with KMS customer-managed keys and enforce encryption in transit
- Design presigned URL strategies and cross-account access patterns for secure, time-limited object sharing
Why This Module Matters
In 2017, a defense contractor accidentally exposed the personal information, security clearance details, and passwords of thousands of government employees. The data was not exfiltrated through a complex network breach. It was sitting in an Amazon S3 bucket named defense-contractor-internal-data. The bucket’s permissions had been casually modified by a developer trying to fix a deployment script, unintentionally granting READ access to the AllUsers group, which effectively made it public to the entire internet. Anyone who could guess the URL could download the files. The data remained exposed for months before a security researcher found it.
Amazon Simple Storage Service (S3) is the foundational storage layer of the cloud. It scales to virtually unlimited capacity, is highly durable, and handles trillions of objects globally. Because it is so accessible and easy to use, it is the standard destination for application assets, database backups, massive data lakes, and static website hosting.
However, this accessibility is a double-edged sword. S3 sits squarely on the public internet by default (in terms of network routing, not permissions). A single misconfigured bucket policy can turn a private data repository into a public data breach instantly. In this module, you will learn the mechanics of object storage versus traditional file storage. You will master the security layers that protect S3 data, implement lifecycle rules to automate cost-saving archiving strategies, and learn how to generate secure, time-limited access mechanisms to share objects without exposing your buckets.
Object Storage vs. File Storage
If you have used a traditional operating system or a network attached storage (NAS) drive, you are familiar with File Storage. Data is organized in a hierarchical tree of nested directories and folders. Modifying a large file usually involves updating just the changed blocks on the disk.
S3 is Object Storage. It operates fundamentally differently:
- Flat Structure: There are no real directories or folders in S3. Everything is stored in a massive, flat container called a Bucket.
- Keys and Objects: Data is stored as an Object, consisting of the file data and its metadata. Every object is identified by a unique Key (the file path/name). When you see a path like `images/2023/photo.jpg` in S3, `images/2023/` is not a folder; the entire string `images/2023/photo.jpg` is just a long key name. The console visually simulates folders for your convenience.
- Immutability: Objects in S3 are immutable. You cannot open a 10GB video file in S3, edit the metadata, and save just the changes. If you modify an object, S3 completely overwrites the existing object with the new version.
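To make the flat-namespace idea concrete, here is a minimal Python sketch of how a listing with a prefix and a `/` delimiter can present plain key strings as if they were folders. It is not AWS code: the `list_level` helper and the sample keys are made up for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing would.

    Returns (objects directly under the prefix, simulated "folder" prefixes).
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is reported as a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: just strings, no directories anywhere.
keys = [
    "images/2023/photo.jpg",
    "images/2024/cat.png",
    "images/readme.txt",
    "index.html",
]

print(list_level(keys))                     # top level of the bucket
print(list_level(keys, prefix="images/"))   # "inside" the images/ folder
```

The "folders" exist only in the grouping step; the stored data is still a flat set of key strings.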
Quick Comparison
| Feature | File Storage (EFS/NFS) | Block Storage (EBS) | Object Storage (S3) |
|---|---|---|---|
| Structure | Hierarchical (dirs/files) | Raw blocks on a disk | Flat namespace (keys) |
| Access | NFS/SMB protocol | Mounted to one EC2 | HTTP REST API |
| Modify in place | Yes | Yes | No (full overwrite) |
| Max object size | Limited by disk | Limited by volume | 5 TB per object |
| Metadata | Basic (permissions, timestamps) | None (raw blocks) | Rich, custom key-value pairs |
| Typical use | Shared home dirs, CMS | Database volumes, OS disks | Backups, data lakes, static assets |
| Durability | Depends on config | 99.8-99.999% depending on volume type (within one AZ) | 99.999999999% (11 nines) |
Think of it this way: EBS is a hard drive bolted to one server, EFS is a network file share everyone mounts, and S3 is a massive warehouse where you hand parcels to a clerk and get a receipt (the key) to retrieve them later.
S3 Security: Layers of Defense
Because S3 buckets exist in a global namespace and are addressable via HTTP endpoints, securing them requires overlapping layers of authorization. S3 evaluates permissions using a combination of IAM policies and resource-based policies.
Here is how the full access evaluation flow works when a request hits S3:

     ┌──────────────────────┐
     │   Incoming Request   │
     │ (GET /my-bucket/obj) │
     └──────────┬───────────┘
                │
                ▼
┌────────────────────────────────┐
│  S3 Block Public Access (BPA)  │
│  Is the request public?        │
│  Is BPA enabled?               │
└───────────┬───────┬────────────┘
            │       │
       BPA blocks   BPA allows
         (DENY)     (not public or BPA off)
            │       │
            ▼       ▼
         DENIED   ┌──────────────────────┐
                  │    Bucket Policy     │
                  │    Explicit Deny?    │
                  └──┬─────────┬─────────┘
                     │         │
                 Explicit   No explicit
                   DENY       deny
                     │         │
                     ▼         ▼
                  DENIED   ┌──────────────────────┐
                           │    Bucket Policy     │
                           │    Explicit Allow?   │
                           └──┬─────────┬─────────┘
                              │         │
                          Explicit   No bucket
                           ALLOW     policy match
                              │         │
                              ▼         ▼
                           ALLOWED   ┌──────────────────┐
                           (if same  │   IAM Policy     │
                            acct)    │   on the caller  │
                                     └──┬──────┬────────┘
                                        │      │
                                   IAM Allow   No IAM Allow
                                        │      │
                                        ▼      ▼
                                     ALLOWED  DENIED

Key rules:
- Explicit DENY always wins, anywhere in the chain
- Cross-account: BOTH bucket policy AND caller IAM must Allow
- Same account: Either bucket policy OR IAM Allow is sufficient
- BPA is the master override for public access attempts

1. S3 Block Public Access (BPA)
This is your master switch. BPA operates at the account or bucket level to override any policy that attempts to make data public. If BPA is turned on (and it is by default for all new buckets), even if an administrator writes a bucket policy explicitly granting s3:GetObject to * (everyone), S3 will block the request. Never disable Block Public Access on a bucket unless you are intentionally hosting public web assets.
BPA has four independent settings—you can toggle each one:
| Setting | What It Blocks |
|---|---|
| `BlockPublicAcls` | Rejects PUT requests that include a public ACL |
| `IgnorePublicAcls` | Ignores any existing public ACLs on the bucket/objects |
| `BlockPublicPolicy` | Rejects bucket policies that grant public access |
| `RestrictPublicBuckets` | Restricts access to buckets with public policies to only AWS services and authorized users |
Best practice: enable all four at the account level so no bucket in the entire account can ever go public accidentally.
# Enable BPA at the ACCOUNT level (recommended)
aws s3control put-public-access-block \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

Stop and think: If an S3 bucket has Block Public Access enabled at the account level, but a developer explicitly writes a Bucket Policy granting `s3:GetObject` to `*` (everyone), which rule wins when an anonymous user tries to download a file?
2. IAM Policies
As covered in Module 1.1, IAM policies are attached to the identity making the request (a User or a Role). If an EC2 instance has an IAM Role that allows s3:PutObject to a specific bucket, the instance can upload files.
Example: allow a role to read only from a specific prefix:
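{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListReportsPrefix",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-data-bucket",
      "Condition": {
        "StringLike": { "s3:prefix": "reports/*" }
      }
    },
    {
      "Sid": "ReadReportsObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-data-bucket/reports/*"
    }
  ]
}

Notice the two ARNs: one for the bucket itself (needed for ListBucket) and one for the objects inside it (needed for GetObject). The `s3:prefix` condition sits only on the ListBucket statement because that key is present only in list requests; if it were also applied to GetObject, the condition would never match and reads would be denied. Forgetting the bucket-level ARN is one of the most common IAM debugging headaches.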
3. Bucket Policies
A Bucket Policy is attached directly to the resource (the bucket itself). It is a JSON document that acts like a bouncer at the door of the bucket.
- Cross-Account Access: Bucket policies are essential for allowing users from other AWS accounts to read or write to your bucket.
- Enforcing Encryption: You can write a bucket policy that denies all `s3:PutObject` requests unless the request includes a header requesting server-side encryption (`AES256` for SSE-S3 or `aws:kms` for SSE-KMS).
- IP Restriction: You can deny access to the bucket unless the request originates from your corporate VPN’s IP address range.
Example: enforce encryption on all uploads:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencryptedUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-secure-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }
  ]
}

4. Access Control Lists (ACLs)
ACLs are a legacy access control mechanism from before IAM existed. They apply to individual objects or the bucket. AWS strongly recommends disabling ACLs entirely (setting the bucket to “Bucket Owner Enforced”) and relying exclusively on IAM and Bucket Policies.
Bucket Policy vs. ACL vs. IAM — When to Use What
| Aspect | IAM Policy | Bucket Policy | ACL |
|---|---|---|---|
| Attached to | Identity (user/role) | Resource (bucket) | Bucket or object |
| Cross-account | Only the caller side | Can grant to external principals | Can grant to other accounts |
| Max size | 6,144 chars (managed policy) | 20 KB | Fixed grantee list |
| Granularity | Any AWS action | S3 actions only | Read/Write/Full Control only |
| Conditional logic | Full Condition block | Full Condition block | None |
| AWS recommendation | Primary mechanism | Use for cross-account + resource constraints | Disable (legacy) |
| When to use | Controlling what your users can do | Controlling who can access your bucket | Almost never—only for S3 access logs |
Rule of thumb: Use IAM policies for same-account access control. Use bucket policies for cross-account access, IP restrictions, and encryption enforcement. Disable ACLs.
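The key rules from the evaluation flow earlier in this section can be sketched as a small decision function. This is a deliberately simplified model, not the real authorization engine: the `evaluate` function, the three caller categories, and the `"allow"` / `"deny"` / `None` policy outcomes are assumptions made for illustration, and it ignores ACLs, SCPs, permission boundaries, and the individual BPA settings.

```python
def evaluate(caller, bpa_enabled, bucket_policy=None, iam_policy=None):
    """Return "ALLOWED" or "DENIED" for a simplified S3 request.

    caller:        "anonymous", "same_account", or "cross_account"
    bucket_policy: "allow", "deny", or None (no matching statement)
    iam_policy:    "allow", "deny", or None (no matching statement)
    """
    # BPA is the master override for public (anonymous) access attempts.
    if caller == "anonymous" and bpa_enabled:
        return "DENIED"

    # An explicit Deny wins anywhere in the chain.
    if "deny" in (bucket_policy, iam_policy):
        return "DENIED"

    bucket_allows = bucket_policy == "allow"
    iam_allows = iam_policy == "allow"

    if caller == "anonymous":
        allowed = bucket_allows                  # no identity, so only the bucket policy counts
    elif caller == "same_account":
        allowed = bucket_allows or iam_allows    # either side is sufficient
    else:
        allowed = bucket_allows and iam_allows   # cross-account needs both sides
    return "ALLOWED" if allowed else "DENIED"


# Public-read bucket policy, but BPA is on: the anonymous request is blocked.
print(evaluate("anonymous", bpa_enabled=True, bucket_policy="allow"))
# Same-account role with an IAM Allow and no bucket policy statement.
print(evaluate("same_account", bpa_enabled=True, iam_policy="allow"))
# Cross-account caller with only the bucket policy side granted.
print(evaluate("cross_account", bpa_enabled=True, bucket_policy="allow"))
```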
Pre-Signed URLs: Secure Temporary Access
Imagine you are building a photo-sharing application. Users upload private photos, and the app displays them.
The Bad Way: The web server downloads the image from S3 and streams it to the client. This bottlenecks the web server’s network and memory.

The Insecure Way: You make the S3 bucket public so the client browser can load the image directly via the S3 URL. Now anyone can steal the photos.
The S3 Way: Pre-Signed URLs. Your backend application (which has an IAM Role with access to S3) uses the AWS SDK to generate a temporary, cryptographically signed URL. This URL grants access to download a specific object for a specific period (e.g., 5 minutes). The backend sends this URL to the frontend. The user’s browser uses the signed URL to download the image directly from S3. Once the 5 minutes expire, the URL becomes invalid.
Generating Pre-Signed URLs
# Generate a pre-signed URL valid for 300 seconds (5 minutes)
aws s3 presign s3://my-bucket/private/report.pdf --expires-in 300

# Output (example):
# https://my-bucket.s3.amazonaws.com/private/report.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Expires=300&X-Amz-Signature=abc123...

# Note: `aws s3 presign` only signs GET (download) requests.
# A pre-signed URL for UPLOADING (PUT) has to be generated with an AWS SDK,
# for example boto3's generate_presigned_url("put_object", ...).
# Anyone holding such a URL can upload a file to that exact key until it expires.

Important details about pre-signed URLs:
- The URL inherits the permissions of the IAM identity that generated it. If that identity loses access, existing pre-signed URLs stop working immediately.
- Maximum expiration: 7 days when signed by an IAM user. A URL signed with STS temporary credentials (roles) stops working as soon as those credentials expire, which is at most 36 hours and often much sooner.
- Pre-signed URLs work for both GET (download) and PUT (upload) operations.
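To show what "cryptographically signed" means in practice, here is a minimal standard-library sketch of Signature Version 4 query-string signing. It is illustrative only: the `presign` helper, the fake credentials, and the regional virtual-hosted endpoint are assumptions, it omits the session token that temporary credentials require, and production code should rely on the AWS SDK or CLI instead.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign(method, bucket, key, access_key, secret_key, region, expires, now):
    """Build a SigV4 pre-signed URL for one HTTP method on one object key."""
    if not 1 <= expires <= 604800:
        raise ValueError("expiry must be between 1 second and 7 days (604800 s)")

    host = f"{bucket}.s3.{region}.amazonaws.com"
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: parameters sorted by name and URI-encoded.
    query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/-_.~")
    canonical_request = "\n".join(
        [method, canonical_uri, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )
    # Derive the signing key from the secret, date, region, and service.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, "s3")
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"


# Fake credentials and a fixed timestamp, purely for demonstration.
when = datetime.datetime(2025, 1, 1, tzinfo=datetime.timezone.utc)
print(presign("GET", "my-bucket", "private/report.pdf",
              "AKIAEXAMPLE", "not-a-real-secret", "us-east-1", 300, when))
```

Because the HTTP method, key, and expiry are all part of the signed string, a URL signed for GET cannot be reused for PUT or for a different object, and S3 rejects it once `X-Amz-Expires` has elapsed.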
Stop and think: You generate a pre-signed URL valid for 7 days using your IAM User credentials. Two days later, your IAM User is deleted by an administrator. What happens when someone tries to use the URL on day 3?
Storage Classes and Lifecycle Rules
S3 offers different storage classes designed for different data access patterns. Why pay premium rates for data you rarely access?
Storage Class Comparison
| Storage Class | Availability | Min Storage Duration | Min Object Size | Retrieval Time | Storage Cost (us-east-1) | Retrieval Cost | Best For |
|---|---|---|---|---|---|---|---|
| S3 Standard | 99.99% | None | None | Instant | ~$0.023/GB/mo | None | Active data, websites |
| S3 Intelligent-Tiering | 99.9% | None | None | Instant* | ~$0.023/GB/mo + monitoring fee | None | Unknown access patterns |
| S3 Standard-IA | 99.9% | 30 days | 128 KB | Instant | ~$0.0125/GB/mo | $0.01/GB | Backups, DR copies |
| S3 One Zone-IA | 99.5% | 30 days | 128 KB | Instant | ~$0.01/GB/mo | $0.01/GB | Reproducible infrequent data |
| S3 Glacier Instant | 99.9% | 90 days | 128 KB | Instant | ~$0.004/GB/mo | $0.03/GB | Archive with instant access |
| S3 Glacier Flexible | 99.99% | 90 days | None | 1-5 min to 12 hrs | ~$0.0036/GB/mo | $0.01-0.03/GB | Long-term archives |
| S3 Glacier Deep Archive | 99.99% | 180 days | None | 12-48 hours | ~$0.00099/GB/mo | $0.02/GB | Compliance, 7-10yr retention |
Note: Prices are approximate and vary by region. Check the AWS S3 Pricing page for current rates.
S3 Intelligent-Tiering deserves special attention. It automatically moves objects between an infrequent-access tier and a frequent-access tier based on usage patterns. It charges a small monthly monitoring fee per object (~$0.0025 per 1,000 objects) but can save significantly on large datasets with unpredictable access patterns. There is no retrieval fee.
Cost Example
Suppose you store 10 TB of application logs:
| Strategy | Monthly Cost (approx) |
|---|---|
| All in S3 Standard | $235 |
| All in S3 Standard-IA | $128 |
| All in Glacier Deep Archive | $10 |
| Smart tiering with lifecycle (30/90/365 day transitions) | ~$40-80 depending on access |
Compared with keeping everything in S3 Standard, lifecycle tiering cuts the bill by roughly 65-85%, and storing everything in Glacier Deep Archive cuts it by about 95%.
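The single-class figures in the table can be reproduced with a few lines of arithmetic. This is a rough sketch using the approximate us-east-1 per-GB prices from the storage class table above; it counts 1 TB as 1,024 GB and ignores request, retrieval, and minimum-duration charges.

```python
GB_PER_TB = 1024

# Approximate storage prices in USD per GB-month, taken from the table above.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_cost(terabytes, storage_class):
    """Storage-only monthly cost in USD for data held in one storage class."""
    return terabytes * GB_PER_TB * PRICE_PER_GB_MONTH[storage_class]


baseline = monthly_cost(10, "STANDARD")
for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_cost(10, storage_class)
    saving = 1 - cost / baseline
    print(f"{storage_class:<13} ${cost:8.2f}  ({saving:.0%} cheaper than Standard)")
```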
Lifecycle Rules
You don’t want to manually move data between these tiers. S3 Lifecycle Rules automate the process.
You can configure a rule that says:
- When log files are created, store them in S3 Standard.
- After 30 days, transition them to S3 Standard-IA.
- After 90 days, transition them to S3 Glacier Flexible Retrieval.
- After 365 days, permanently Delete the objects.
This automated tiering drastically reduces storage costs for historical data.
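The four-step rule above could be written as a lifecycle configuration document along these lines. This is a sketch under stated assumptions: the rule ID and the `logs/` prefix are hypothetical, `GLACIER` is the storage class name used for Glacier Flexible Retrieval, and the `check_rule` helper is only a local sanity check, not something S3 provides.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-then-expire-logs",      # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},      # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # Glacier Flexible Retrieval
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def check_rule(rule):
    """Local sanity check: transitions get later over time, expiry comes last."""
    days = [t["Days"] for t in rule["Transitions"]]
    if days != sorted(days):
        raise ValueError("transitions must be ordered by increasing age")
    if rule["Expiration"]["Days"] <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return True


for rule in lifecycle_configuration["Rules"]:
    check_rule(rule)

print(json.dumps(lifecycle_configuration, indent=2))
```

Saving the printed JSON to a file such as `lifecycle.json` and passing it to `aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json` is one way to apply it.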
# View existing lifecycle rules on a bucket
aws s3api get-bucket-lifecycle-configuration --bucket my-bucket

# Delete all lifecycle rules (careful!)
aws s3api delete-bucket-lifecycle --bucket my-bucket

Lifecycle Rule Constraints
There are ordering rules you must follow when transitioning between storage classes. S3 enforces a “waterfall”: you can only transition downward.

S3 Standard
 ├──► S3 Intelligent-Tiering
 ├──► S3 Standard-IA (min 30 days after creation)
 ├──► S3 One Zone-IA (min 30 days after creation)
 ├──► S3 Glacier Instant Retrieval (min 90 days after creation)
 ├──► S3 Glacier Flexible Retrieval
 └──► S3 Glacier Deep Archive

You cannot transition from Glacier back to Standard-IA via a lifecycle rule. To move data “upward,” you must restore and copy it manually.
Pause and predict: Look at the minimum object size for S3 Standard-IA (128 KB). If you configure a lifecycle rule to transition a bucket containing 10 million tiny 5 KB log files from Standard to Standard-IA, what do you expect will happen to your monthly storage bill?
Essential S3 CLI Commands
The AWS CLI provides two command families for S3:
- `aws s3`: High-level commands (cp, sync, ls, mv, rm). These handle multipart uploads, retries, and parallelism automatically.
- `aws s3api`: Low-level API calls (put-object, get-object, put-bucket-policy). Full control, JSON input/output.
Copying Files
# Upload a single file
aws s3 cp backup.tar.gz s3://my-bucket/backups/backup.tar.gz

# Download a file
aws s3 cp s3://my-bucket/backups/backup.tar.gz ./backup.tar.gz

# Copy between buckets
aws s3 cp s3://source-bucket/data.csv s3://dest-bucket/archive/data.csv

# Upload with a specific storage class
aws s3 cp large-archive.tar.gz s3://my-bucket/archives/ \
  --storage-class GLACIER

# Upload with server-side encryption (KMS)
aws s3 cp secret-report.pdf s3://my-bucket/confidential/ \
  --sse aws:kms \
  --sse-kms-key-id alias/my-key

# Copy an entire directory (recursive)
aws s3 cp ./logs/ s3://my-bucket/logs/ --recursive

# Copy with a filter: only .log files
aws s3 cp ./logs/ s3://my-bucket/logs/ --recursive \
  --exclude "*" --include "*.log"

Syncing Directories
`aws s3 sync` is the workhorse for backups. It only copies files that are new or modified (based on size and timestamp), similar to rsync.
# Sync a local directory to S3
aws s3 sync ./website/ s3://my-website-bucket/

# Sync from S3 to local
aws s3 sync s3://my-bucket/data/ ./local-data/

# Sync and DELETE files in the destination that don't exist in source
# (makes destination an exact mirror; use with caution!)
aws s3 sync ./website/ s3://my-website-bucket/ --delete

# Dry run: see what WOULD happen without actually doing it
aws s3 sync ./website/ s3://my-website-bucket/ --dryrun

# Sync only certain file types
aws s3 sync ./assets/ s3://my-bucket/assets/ \
  --exclude "*" --include "*.jpg" --include "*.png"

Listing and Inspecting
# List all buckets in the account
aws s3 ls

# List objects in a bucket (top-level "folders")
aws s3 ls s3://my-bucket/

# List objects recursively with sizes
aws s3 ls s3://my-bucket/ --recursive --human-readable --summarize

# Get detailed metadata for a specific object
aws s3api head-object --bucket my-bucket --key reports/q4-summary.pdf

Managing Buckets
# Create a bucket
aws s3 mb s3://my-new-bucket --region us-west-2

# Remove an EMPTY bucket
aws s3 rb s3://my-empty-bucket

# Remove a bucket AND all its contents (destructive!)
aws s3 rb s3://my-bucket --force

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# Check versioning status
aws s3api get-bucket-versioning --bucket my-bucket

# Enable default encryption (SSE-S3)
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
  }'

Restoring from Glacier
Objects in the Glacier Flexible Retrieval and Glacier Deep Archive classes are not immediately downloadable (Glacier Instant Retrieval is the exception). You must initiate a restore first.
# Initiate a restore (Expedited = 1-5 min, Standard = 3-5 hrs, Bulk = 5-12 hrs)
aws s3api restore-object \
  --bucket my-archive-bucket \
  --key old-logs/app-2023.tar.gz \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'

# Check restore status
aws s3api head-object \
  --bucket my-archive-bucket \
  --key old-logs/app-2023.tar.gz

# The "Restore" header will show:
#   ongoing-request="true"                      → still restoring
#   ongoing-request="false", expiry-date="..."  → ready to download

S3 Encryption
S3 offers multiple encryption options. Since January 2023, all new objects are encrypted by default with SSE-S3 (AES-256), even if you do not specify encryption settings.
| Encryption Type | Key Managed By | When to Use |
|---|---|---|
| SSE-S3 (AES-256) | AWS (fully managed) | Default, simplest option |
| SSE-KMS | AWS KMS (you control key policies) | Audit trail, key rotation, cross-account |
| SSE-C | You (provide key in every request) | Regulatory requirement to hold keys |
| Client-side | You (encrypt before upload) | Zero-trust, end-to-end encryption |
SSE-KMS is the most popular choice for enterprises because it integrates with CloudTrail (every key usage is logged) and supports automatic annual key rotation.
S3 Versioning Deep Dive
When versioning is enabled, every overwrite or delete creates a new version rather than destroying data.
Key: reports/q4.pdf

Version Stack (newest first):
┌─────────────────────────────────────────┐
│ Delete Marker (no data)                 │ ← current "state" = deleted
├─────────────────────────────────────────┤
│ Version: abc789 (Final draft)           │
├─────────────────────────────────────────┤
│ Version: def456 (Second draft)          │
├─────────────────────────────────────────┤
│ Version: ghi123 (First upload)          │
└─────────────────────────────────────────┘

A standard GET returns 404 (delete marker).
GET with ?versionId=abc789 returns the Final draft.
DELETE the delete marker → restores abc789 as current.

Important versioning behaviors:
- Versioning cannot be disabled once enabled. You can only suspend it (new objects get a null version ID, but existing versions remain).
- Suspended versioning still preserves previously created versions — it does not delete them.
- You pay for every stored version. A 1 GB file overwritten 100 times = 100 GB of storage.
- MFA Delete can require multi-factor authentication to delete versions or change versioning state.
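The version-stack behavior described above can be modeled with a toy in-memory class. This is not how S3 is implemented: `VersionedKey`, its sequential version IDs, and the use of `KeyError` to stand in for an HTTP 404 are all simplifications chosen for illustration.

```python
import itertools


class VersionedKey:
    """Toy model of one key in a versioning-enabled bucket."""

    def __init__(self):
        self._stack = []                 # oldest first; data None means delete marker
        self._ids = itertools.count(1)

    def put(self, data):
        version_id = f"v{next(self._ids)}"
        self._stack.append((version_id, data))
        return version_id

    def delete(self, version_id=None):
        if version_id is None:
            # A plain DELETE destroys nothing: it stacks a delete marker on top.
            return self.put(None)
        # DELETE with a version ID permanently removes that one entry.
        self._stack = [entry for entry in self._stack if entry[0] != version_id]
        return version_id

    def get(self, version_id=None):
        if version_id is None:
            entries = self._stack[-1:]   # the current version, if any
        else:
            entries = [entry for entry in self._stack if entry[0] == version_id]
        if not entries or entries[0][1] is None:
            raise KeyError("404 Not Found")
        return entries[0][1]


report = VersionedKey()
report.put("First upload")
report.put("Second draft")
final_id = report.put("Final draft")
marker_id = report.delete()              # adds a delete marker

try:
    report.get()
except KeyError as err:
    print("plain GET:", err)

print("GET by version:", report.get(final_id))
report.delete(marker_id)                 # remove the marker itself
print("plain GET after removing the marker:", report.get())
```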
Did You Know?
- S3 provides strong read-after-write consistency for all PUTs and DELETEs. If you write a new object and immediately attempt to read it, S3 will return the new data. (Prior to December 2020, overwrites and deletes were only eventually consistent, meaning immediate reads might return an older version or an object that had already been deleted.)
- S3 supports static website hosting. By placing an `index.html` file in a bucket, turning off Block Public Access, and adding a public-read bucket policy, S3 acts as a highly available web server without provisioning any EC2 instances.
- S3 can process data in place using S3 Select and S3 Object Lambda. S3 Select lets you run SQL queries directly against CSV or JSON files stored in S3: instead of downloading a 10 GB CSV and filtering locally, you send a `SELECT * FROM s3object WHERE status = 'error'` query and S3 returns only the matching rows. This can reduce data transfer by 80-90%.
- S3 is designed for 99.999999999% (11 nines) durability. To put that in perspective: if you store 10 million objects in S3, you can statistically expect to lose a single object once every 10,000 years. S3 achieves this by automatically replicating every object across a minimum of 3 Availability Zones within a region.
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Leaking data via public buckets | Turning off Block Public Access because “the application is throwing a 403 error and this makes it work.” | Never turn off BPA for private data. Fix the IAM policies or generate Pre-Signed URLs for the application. |
| Paying for millions of tiny files in IA | Moving massive amounts of 5KB log files to Standard-IA to save money. | Standard-IA has a minimum billable object size of 128KB. Moving a 5KB file charges you for 128KB, completely destroying the cost savings. Bundle tiny files before tiering. |
| Using S3 as a database | Because it has an API and stores JSON, developers try to use it for high-frequency transactional updates. | S3 is not optimized for rapid, sub-millisecond, concurrent transactional reads/writes. Use DynamoDB or RDS for transactional data. |
| Failing to manage versioning costs | Enabling versioning on a bucket that receives constant updates, keeping thousands of old versions forever. | S3 charges for every stored version. If you enable versioning, you MUST configure a Lifecycle Rule to expire noncurrent versions after a set period. |
| Bucket name collisions | Trying to create a bucket named test-bucket. | S3 bucket names must be globally unique across all AWS accounts in all regions. Use a naming convention involving your company name and account ID (e.g., acme-corp-123456789012-prod-backups). |
| Ignoring server-side encryption | Forgetting to check the encryption box. | Always enable default S3 Server-Side Encryption (SSE-S3 or SSE-KMS) to ensure data is encrypted at rest automatically. As of 2023, SSE-S3 is applied by default, but SSE-KMS is recommended for audit trails. |
| Forgetting both ARN forms in IAM policies | Writing an IAM policy with only the bucket ARN (arn:aws:s3:::my-bucket) or only the object ARN (arn:aws:s3:::my-bucket/*). | ListBucket requires the bucket ARN. GetObject/PutObject require the object ARN with /*. Always include both when granting read/write access. |
| Not setting lifecycle rules on incomplete multipart uploads | Large uploads that fail midway leave invisible fragments that cost money indefinitely. | Add a lifecycle rule to abort incomplete multipart uploads after 7 days: this is free storage savings that every bucket should have. |
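The two lifecycle fixes named in the table (expiring noncurrent versions and aborting incomplete multipart uploads) might look like the following when expressed as a lifecycle configuration. Treat it as a sketch: the `hygiene_rules` helper, the rule IDs, and the 30-day noncurrent window are assumptions, while the 7-day abort window comes from the table.

```python
import json


def hygiene_rules(noncurrent_days=30, abort_after_days=7):
    """Build bucket-wide cleanup rules; the empty prefix matches every key."""
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",          # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            },
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_after_days},
            },
        ]
    }


print(json.dumps(hygiene_rules(), indent=2))
```

Keeping these cleanup rules separate from any tiering rules makes it easy to apply the same baseline to every bucket regardless of how its data is tiered.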
Question 1: An auditor requires that your company keep application logs for exactly 7 years to meet compliance regulations. The logs are never accessed unless an audit occurs, at which point a 24-hour retrieval delay is perfectly acceptable. How should you store these logs most cost-effectively?
You should upload the logs directly to the S3 bucket and use a Lifecycle Rule to immediately transition them to the S3 Glacier Deep Archive storage class. Glacier Deep Archive offers the lowest possible storage cost, designed exactly for data that is rarely accessed but must be retained for compliance. Since the auditor accepts a 24-hour retrieval delay, the 12-48 hour retrieval time of Deep Archive perfectly fits the requirement. Finally, you would configure the lifecycle rule to permanently delete the objects after 2,555 days (7 years) to prevent paying for data you no longer legally need to keep.
Question 2: You attempt to attach a Bucket Policy to an S3 bucket granting public read access to a specific prefix (`images/`). The AWS Console throws an error and refuses to save the policy. What is the most likely cause?
The most likely cause is that S3 Block Public Access (BPA) is enabled on the bucket or at the account level. BPA acts as a master security override that actively prevents you from applying any bucket policy or ACL that grants public access. Even if you have full administrative privileges, S3 will reject the policy save attempt to protect you from accidental exposure. You would need to intentionally disable the specific BPA settings before the public policy could be successfully attached.
Question 3: A third-party data analytics company needs to upload a daily CSV file to a bucket in your AWS account. They provide you with their IAM Role ARN. How do you grant them access without creating an IAM user for them in your account?
You should create a Resource-Based Policy (a Bucket Policy) on your S3 bucket that grants them access. The policy will specify an Effect of Allow, the Action as s3:PutObject, and the Resource as the object ARN of your bucket (the bucket ARN followed by /*, because s3:PutObject acts on objects rather than on the bucket itself). Crucially, the Principal in the policy must be set to the AWS account ID or the specific IAM Role ARN provided by the third-party company. Keep in mind that for cross-account access to work, the permissions must be explicitly granted on both sides: your bucket policy must allow their role, and their account’s IAM policy must allow the role to upload to your bucket.
Question 4: You delete a 1GB video file from an S3 bucket that has Object Versioning enabled. You realize it was a mistake. Is the file gone permanently, and how do you recover it?
No, the file is not permanently deleted. Because Object Versioning is enabled on the bucket, a standard delete operation does not destroy the data; instead, it simply inserts a “Delete Marker” over the object. This marker becomes the current version, effectively hiding the object from standard list or download commands. To recover the file, you simply query the object’s version history and delete the “Delete Marker,” which immediately restores the previous version of the video file to the active state.
Question 5: Your team stores 50 million small JSON files (average 2 KB each) in S3 Standard. A cost optimization review suggests moving them to S3 Standard-IA. Will this save money?
No, implementing this change will likely increase your monthly storage costs significantly. S3 Standard-IA has a minimum billable object size constraint of 128 KB per object. Since your files average only 2 KB, AWS will bill each file as if it were 128 KB, effectively charging you for over 6 TB of storage instead of the actual 100 GB. Additionally, Standard-IA charges a per-GB retrieval fee, meaning you would pay every time a file is accessed. The correct cost-optimization approach would be to bundle these small JSON files into larger archives before moving them to an infrequent access tier.
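The arithmetic behind this answer can be checked with a short sketch. It uses the approximate per-GB prices from the storage class table, treats 1 GB as 1,000,000 KB to match the round numbers in the answer, and ignores retrieval and request fees; the `billed_gb` helper is illustrative only.

```python
STANDARD_PRICE = 0.023       # approx. USD per GB-month, from the storage class table
STANDARD_IA_PRICE = 0.0125   # approx. USD per GB-month, from the storage class table
IA_MIN_BILLABLE_KB = 128


def billed_gb(object_count, avg_kb, min_billable_kb=0):
    """Billable size in GB when each object is charged at least min_billable_kb."""
    return object_count * max(avg_kb, min_billable_kb) / 1_000_000


standard_gb = billed_gb(50_000_000, 2)
ia_gb = billed_gb(50_000_000, 2, IA_MIN_BILLABLE_KB)

print(f"Standard:    {standard_gb:,.0f} GB billed -> ${standard_gb * STANDARD_PRICE:,.2f}/month")
print(f"Standard-IA: {ia_gb:,.0f} GB billed -> ${ia_gb * STANDARD_IA_PRICE:,.2f}/month")
```

Even though the per-GB price is lower, the billable volume grows 64-fold, so the Standard-IA bill ends up far higher.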
Question 6: A user uploads an object to S3 via the console. At the exact same millisecond, a developer attempts to download that specific object via the AWS CLI. Will the developer get a 404 Not Found error, partial data, or the full object?
The developer will receive the full object without any corruption or partial data. S3 provides strong read-after-write consistency for all PUT operations, meaning that once S3 returns a success response for the upload, the data is immediately and fully available to all readers. If the developer’s request hits S3 the exact millisecond before the upload definitively completes, they will simply receive a 404 Not Found error. In neither case will S3 return partial data during an ongoing write operation, ensuring strict data integrity.
Question 7: Why is a Pre-Signed URL more secure than modifying a bucket policy to allow temporary access to a file?
A Pre-Signed URL is significantly more secure because it uses programmatic cryptography to generate a specific, time-bound signature for a single object operation. Modifying a bucket policy, on the other hand, affects the permissions of the bucket broadly and relies entirely on an administrator remembering to manually change the policy back later. Once the expiration time on a Pre-Signed URL passes, the signature becomes mathematically invalid, requiring no cleanup or state changes to the bucket itself. Furthermore, Pre-Signed URLs operate with the permissions of the identity that created them, meaning they never grant more access than the original user possessed.
Question 8: You enable versioning on a bucket and then later decide you no longer want it. You call `put-bucket-versioning` with `Status=Suspended`. What happens to the existing object versions?
All existing object versions will remain in the bucket exactly as they were, and they will continue to incur standard storage costs. Suspending versioning does not delete any previous versions; it only affects how new write operations are handled. Any new objects or overwrites will receive a null version ID instead of a unique version ID. If you actually want to free up storage space and stop paying for the old data, you must explicitly delete the old versions or configure a lifecycle rule to automatically expire noncurrent versions.
Hands-On Exercise: Production Backup Bucket with Lifecycle Management
In this exercise, you will build a production-grade backup bucket with security hardening, versioning, lifecycle automation, and practice essential CLI operations. This mirrors what a real platform team configures for application backups.
Task 1: Create and Secure the Bucket
    # Generate a unique bucket name
    export MY_BUCKET="dojo-backup-$(openssl rand -hex 6)"
    echo "Bucket name: $MY_BUCKET"

    # Create the bucket (default region)
    aws s3 mb s3://$MY_BUCKET

    # Create a customer-managed KMS key
    export KMS_KEY_ID=$(aws kms create-key \
      --description "KMS key for Dojo Backup Bucket" \
      --query 'KeyMetadata.KeyId' --output text)
    echo "Created KMS Key: $KMS_KEY_ID"

    # Enforce Block Public Access (all four settings)
    aws s3api put-public-access-block \
      --bucket $MY_BUCKET \
      --public-access-block-configuration \
      "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

    # Enable default encryption (SSE-KMS)
    aws s3api put-bucket-encryption \
      --bucket $MY_BUCKET \
      --server-side-encryption-configuration '{
        "Rules": [
          {
            "ApplyServerSideEncryptionByDefault": {
              "SSEAlgorithm": "aws:kms",
              "KMSMasterKeyID": "'$KMS_KEY_ID'"
            },
            "BucketKeyEnabled": true
          }
        ]
      }'

    # Enable Object Versioning
    aws s3api put-bucket-versioning \
      --bucket $MY_BUCKET \
      --versioning-configuration Status=Enabled

    # Verify all settings
    echo "=== Block Public Access ==="
    aws s3api get-public-access-block --bucket $MY_BUCKET
    echo "=== Encryption ==="
    aws s3api get-bucket-encryption --bucket $MY_BUCKET
    echo "=== Versioning ==="
    aws s3api get-bucket-versioning --bucket $MY_BUCKET

Task 2: Upload Backups and Observe Versioning
Simulate a real backup workflow: daily database dumps that overwrite the same key.

    # Create a simulated "daily backup" directory
    mkdir -p ./backup-exercise

    # Day 1 backup
    echo '{"date": "2025-01-01", "records": 1000, "status": "healthy"}' > ./backup-exercise/db-dump.json
    aws s3 cp ./backup-exercise/db-dump.json s3://$MY_BUCKET/backups/daily/db-dump.json
    echo "Uploaded Day 1 backup"

    # Day 2 backup (overwrites same key; versioning keeps Day 1)
    echo '{"date": "2025-01-02", "records": 1042, "status": "healthy"}' > ./backup-exercise/db-dump.json
    aws s3 cp ./backup-exercise/db-dump.json s3://$MY_BUCKET/backups/daily/db-dump.json
    echo "Uploaded Day 2 backup"

    # Day 3 backup (corrupted!)
    echo '{"date": "2025-01-03", "records": -1, "status": "CORRUPTED"}' > ./backup-exercise/db-dump.json
    aws s3 cp ./backup-exercise/db-dump.json s3://$MY_BUCKET/backups/daily/db-dump.json
    echo "Uploaded Day 3 backup (corrupted)"

    # View the current (corrupted) version
    echo "=== Current version ==="
    aws s3 cp s3://$MY_BUCKET/backups/daily/db-dump.json -

    # List ALL versions and note the VersionIds
    echo "=== All versions ==="
    aws s3api list-object-versions \
      --bucket $MY_BUCKET \
      --prefix backups/daily/db-dump.json \
      --query 'Versions[].{Key:Key,VersionId:VersionId,LastModified:LastModified,Size:Size}'

Task 3: Recover from the Corrupted Backup
Roll back to the healthy Day 2 backup by downloading a specific version.

    # Get the version ID of Day 2. Versions of a key are listed newest first,
    # so Day 3 is index [0] and Day 2 is the second entry, index [1].
    DAY2_VERSION=$(aws s3api list-object-versions \
      --bucket $MY_BUCKET \
      --prefix backups/daily/db-dump.json \
      --query 'Versions[1].VersionId' --output text)
    echo "Day 2 version ID: $DAY2_VERSION"

    # Download the Day 2 version specifically
    aws s3api get-object \
      --bucket $MY_BUCKET \
      --key backups/daily/db-dump.json \
      --version-id $DAY2_VERSION \
      ./backup-exercise/db-dump-day2-restored.json

    echo "=== Restored Day 2 backup ==="
    cat ./backup-exercise/db-dump-day2-restored.json

    # Re-upload the healthy version as the current version
    aws s3 cp ./backup-exercise/db-dump-day2-restored.json \
      s3://$MY_BUCKET/backups/daily/db-dump.json
    echo "Day 2 backup restored as current version"

    # Verify
    aws s3 cp s3://$MY_BUCKET/backups/daily/db-dump.json -

Task 4: Sync a Local Directory and Generate a Pre-Signed URL
    # Create some additional "application logs"
    for i in 1 2 3 4 5; do
      echo "Log entry $i: $(date -u +%Y-%m-%dT%H:%M:%SZ) - Application started" \
        > ./backup-exercise/app-log-day$i.txt
    done

    # Sync the entire directory to S3 (only new/changed files are uploaded)
    aws s3 sync ./backup-exercise/ s3://$MY_BUCKET/logs/ \
      --exclude "*.json"

    # Verify with a recursive listing
    aws s3 ls s3://$MY_BUCKET/ --recursive --human-readable

    # Generate a pre-signed URL to share one log file (valid 10 minutes)
    SIGNED_URL=$(aws s3 presign s3://$MY_BUCKET/logs/app-log-day1.txt --expires-in 600)
    echo "=== Pre-Signed URL (valid 10 min) ==="
    echo "$SIGNED_URL"

    # Test the URL (should return the log content)
    curl -s "$SIGNED_URL"

Task 5: Configure Production Lifecycle Rules
Set up a comprehensive lifecycle policy with multiple rules. S3 does not accept a transition to STANDARD_IA earlier than 30 days after object creation, so both rules below move objects to IA at day 30.

    cat << 'EOF' > lifecycle.json
    {
      "Rules": [
        {
          "ID": "TransitionDailyBackups",
          "Filter": { "Prefix": "backups/daily/" },
          "Status": "Enabled",
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" },
            { "Days": 90, "StorageClass": "GLACIER" }
          ],
          "NoncurrentVersionTransitions": [
            { "NoncurrentDays": 30, "StorageClass": "GLACIER" }
          ],
          "NoncurrentVersionExpiration": { "NoncurrentDays": 365 }
        },
        {
          "ID": "ExpireOldLogs",
          "Filter": { "Prefix": "logs/" },
          "Status": "Enabled",
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" },
            { "Days": 60, "StorageClass": "GLACIER" }
          ],
          "Expiration": { "Days": 180 }
        },
        {
          "ID": "CleanupIncompleteUploads",
          "Filter": { "Prefix": "" },
          "Status": "Enabled",
          "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
        }
      ]
    }
    EOF

    # Apply the lifecycle configuration
    aws s3api put-bucket-lifecycle-configuration \
      --bucket $MY_BUCKET \
      --lifecycle-configuration file://lifecycle.json

    # Verify: inspect each rule
    aws s3api get-bucket-lifecycle-configuration --bucket $MY_BUCKET

What these rules accomplish:

| Rule | What It Does |
|---|---|
| TransitionDailyBackups | Current backups: Standard -> IA at 30 days -> Glacier at 90 days. Old versions: Glacier at 30 days, deleted at 365 days. |
| ExpireOldLogs | Logs: Standard -> IA at 30 days -> Glacier at 60 days -> Deleted at 180 days. |
| CleanupIncompleteUploads | Aborts any multipart upload that has been in progress for more than 7 days (prevents hidden storage costs). |
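To see how such a rule maps object age to storage class, the sketch below walks the current-version transitions of the TransitionDailyBackups rule (30 and 90 days). The function `storage_class_at` is a hypothetical helper written for illustration, and the `EXPIRED` label is not a real storage class. Real lifecycle actions run asynchronously, so the actual transition can lag behind the day shown here.

```python
# Current-version transitions taken from the TransitionDailyBackups rule.
BACKUP_TRANSITIONS = [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},
]


def storage_class_at(age_days, transitions, expiration_days=None):
    """Return the storage class an object of the given age should be in."""
    if expiration_days is not None and age_days >= expiration_days:
        return "EXPIRED"  # illustrative label: the object would be deleted
    current = "STANDARD"
    # Apply transitions in order of age; the last one reached wins.
    for transition in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (10, 30, 89, 90, 400):
    print(age, storage_class_at(age, BACKUP_TRANSITIONS))
# 10 STANDARD
# 30 STANDARD_IA
# 89 STANDARD_IA
# 90 GLACIER
# 400 GLACIER

# With an expiration of 180 days (as in the ExpireOldLogs rule):
print(storage_class_at(200, BACKUP_TRANSITIONS, expiration_days=180))  # EXPIRED
```

Because the backup rule has no `Expiration` for current versions, the newest backup stays in Glacier indefinitely; only noncurrent versions are removed after 365 days.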
Task 6: Apply a Bucket Policy (Enforce Encryption)
    # Create a bucket policy that denies unencrypted uploads
    cat << EOF > bucket-policy.json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyUnencryptedObjectUploads",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::${MY_BUCKET}/*",
          "Condition": {
            "StringNotEquals": {
              "s3:x-amz-server-side-encryption": "aws:kms"
            }
          }
        },
        {
          "Sid": "DenyInsecureTransport",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": [
            "arn:aws:s3:::${MY_BUCKET}",
            "arn:aws:s3:::${MY_BUCKET}/*"
          ],
          "Condition": {
            "Bool": {
              "aws:SecureTransport": "false"
            }
          }
        }
      ]
    }
    EOF

    # Apply the bucket policy
    aws s3api put-bucket-policy \
      --bucket $MY_BUCKET \
      --policy file://bucket-policy.json

    # Verify
    aws s3api get-bucket-policy --bucket $MY_BUCKET --output text | python3 -m json.tool

Clean Up
A versioned bucket contains hidden data (noncurrent versions and delete markers), so a simple `aws s3 rm --recursive` is not enough: it only adds delete markers and leaves every version in place. To empty the bucket, we have to delete all versions and delete markers through the API.

    # Python script to delete all versions and markers
    # (the quoted 'EOF' keeps the shell from expanding anything inside the script)
    cat << 'EOF' > empty_bucket.py
    import boto3
    import sys

    bucket_name = sys.argv[1]
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    print(f"Deleting all versions in {bucket_name}...")
    bucket.object_versions.delete()
    print(f"Deleting bucket {bucket_name}...")
    bucket.delete()
    print("Done!")
    EOF

    # Run script to delete bucket and contents
    python3 empty_bucket.py $MY_BUCKET

    # Schedule KMS key deletion (minimum 7 days)
    aws kms schedule-key-deletion \
      --key-id $KMS_KEY_ID \
      --pending-window-in-days 7

    # Clean up local files
    rm -rf ./backup-exercise lifecycle.json bucket-policy.json empty_bucket.py

Success Criteria

- I created a bucket with Block Public Access, a customer-managed KMS key for default encryption, and versioning enabled.
- I uploaded multiple versions of the same file and listed the version history.
- I recovered a specific previous version after a simulated data corruption.
- I synced a local directory to S3 and generated a working pre-signed URL.
- I applied a lifecycle configuration with multiple rules (transition, expiration, incomplete upload cleanup).
- I applied a bucket policy that enforces encryption and denies insecure (non-HTTPS) transport.
- I cleaned up all resources (deleted the bucket and local files, and scheduled the KMS key for deletion).
Next Module
Now that you have mastery over compute and storage, it is time to route users to your applications globally. Head to Module 1.5: Route 53 & DNS.