Module 1.6: Elastic Container Registry (ECR)
Complexity: [MEDIUM]
Section titled “Complexity: [MEDIUM]”Time to Complete: 1 hour
Section titled “Time to Complete: 1 hour”Prerequisites
Section titled “Prerequisites”Before starting this module, you should have completed:
- Module 1.1: IAM & Security Foundations
- Docker fundamentals (building and tagging images)
- Docker installed locally and running
- AWS CLI configured with appropriate permissions
What You’ll Be Able to Do
Section titled “What You’ll Be Able to Do”After completing this module, you will be able to:
- Configure ECR repositories with immutable tagging and lifecycle policies to manage container image sprawl
- Implement cross-account and cross-region image replication for multi-region deployment pipelines
- Deploy ECR image scanning with vulnerability reporting and enforce scan-on-push policies
- Secure ECR access with IAM policies and VPC endpoints to keep image pulls off the public internet
Why This Module Matters
Section titled “Why This Module Matters”In January 2024, a mid-stage fintech startup pushed a routine update to their payment processing service. The deployment succeeded. Five minutes later, their monitoring exploded. The application was crashing on startup, throwing cryptic “exec format error” messages. The previous container image — the one that worked — had been overwritten because they were using the latest tag with mutable tagging enabled. Their CI pipeline had pushed an ARM64 image over the existing AMD64 image. No versioning. No immutability. No way to roll back except to rebuild from source, which took 22 minutes while their payment pipeline was down. Twenty-two minutes of lost transactions for a fintech company is the kind of thing that ends up in board meeting slides.
Container registries are one of those infrastructure components that seem boring until they break. They sit between your CI pipeline and your runtime environment, holding every version of every service your company runs. A misconfigured registry means you cannot deploy, cannot roll back, and cannot verify that what is running in production is what you think is running. AWS Elastic Container Registry (ECR) is Amazon’s managed container registry, deeply integrated with ECS, EKS, Lambda, and the rest of the AWS ecosystem.
In this module, you will learn how ECR works, how to configure it properly for production workloads, and how to avoid the mistakes that turn a routine deployment into a production incident. By the end, you will have built a complete image lifecycle — from building and pushing images with proper tagging, to configuring lifecycle policies that keep your registry clean and your costs under control.
ECR Architecture and Concepts
Section titled “ECR Architecture and Concepts”ECR is a fully managed Docker container registry. Unlike running your own registry (Docker Registry, Harbor, or Nexus), ECR handles storage, availability, encryption, and access control for you. Let us break down the key concepts.
Registries, Repositories, and Images
Section titled “Registries, Repositories, and Images”ECR Structure:
AWS Account (123456789012)|+-- ECR Registry (one per account per region) | Registry URL: 123456789012.dkr.ecr.us-east-1.amazonaws.com | +-- Repository: myapp/api | +-- Image: sha256:abc123... (tag: v1.2.0) | +-- Image: sha256:def456... (tag: v1.2.1) | +-- Image: sha256:ghi789... (tag: v1.3.0, tag: latest) | +-- Repository: myapp/worker | +-- Image: sha256:jkl012... (tag: v2.0.0) | +-- Image: sha256:mno345... (tag: v2.1.0) | +-- Repository: shared/nginx-base +-- Image: sha256:pqr678... (tag: 1.25-custom)Registry: One per AWS account per region. The URL format is always {account_id}.dkr.ecr.{region}.amazonaws.com. You cannot change this URL.
Repository: A collection of related container images, like a Git repository for code. Naming convention matters — use a slash-separated hierarchy like team/service or app/component.
Image: An individual container image, identified by its SHA256 digest and optionally by one or more tags. A single image can have multiple tags.
Public vs Private Repositories
Section titled “Public vs Private Repositories”ECR offers two flavors:
| Feature | ECR Private | ECR Public |
|---|---|---|
| URL format | {account_id}.dkr.ecr.{region}.amazonaws.com | public.ecr.aws/{alias} |
| Authentication | Required for pull and push | Required for push; pull is anonymous |
| Cost | $0.10/GB/month storage + data transfer | Free (up to limits) |
| Use case | Internal services, proprietary code | Open source projects, shared base images |
| Regions | All commercial regions | us-east-1 only (content delivered globally via CloudFront) |
| Vulnerability scanning | Basic + Enhanced (Inspector) | Not supported |
| Lifecycle policies | Yes | No |
For most DevOps workflows, you will use private repositories. Public ECR is excellent for distributing open-source tools or shared base images that external teams or customers need to pull.
# Create a private repositoryaws ecr create-repository \ --repository-name myapp/api \ --image-scanning-configuration scanOnPush=true \ --image-tag-mutability IMMUTABLE \ --encryption-configuration encryptionType=KMS
# Create a public repositoryaws ecr-public create-repository \ --repository-name my-oss-tool \ --catalog-data '{ "description": "My open source container tool", "architectures": ["x86-64", "ARM 64"], "operatingSystems": ["Linux"] }' \ --region us-east-1Authentication and Pushing Images
Section titled “Authentication and Pushing Images”ECR uses IAM for authentication, but Docker does not speak IAM natively. You need to exchange your IAM credentials for a Docker login token.
Getting Authenticated
Section titled “Getting Authenticated”# The standard way: pipe ECR token directly to docker loginaws ecr get-login-password --region us-east-1 | \ docker login --username AWS --password-stdin \ 123456789012.dkr.ecr.us-east-1.amazonaws.com
# The token is valid for 12 hours# For CI/CD pipelines, refresh it at the start of each pipeline runFor ECS and EKS workloads, you do not need to handle authentication manually. ECS automatically pulls images from ECR using the task execution role. EKS nodes use the instance profile or IRSA (IAM Roles for Service Accounts) to authenticate.
Building and Pushing Images
Section titled “Building and Pushing Images”Here is the complete workflow from Dockerfile to ECR:
# Step 1: Build your image locallydocker build -t myapp/api:v1.3.0 .
# Step 2: Tag it for ECRdocker tag myapp/api:v1.3.0 \ 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp/api:v1.3.0
# Step 3: Push to ECRdocker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp/api:v1.3.0
# Verify the pushaws ecr describe-images \ --repository-name myapp/api \ --image-ids imageTag=v1.3.0For production CI/CD pipelines, here is a more robust script:
#!/bin/bashset -euo pipefail
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)REGION="us-east-1"REPO_NAME="myapp/api"REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"IMAGE_TAG="${GIT_SHA:-$(git rev-parse --short HEAD)}"
# Authenticateaws ecr get-login-password --region ${REGION} | \ docker login --username AWS --password-stdin ${REGISTRY}
# Build with cache from previous image (speeds up CI builds significantly)docker build \ --cache-from ${REGISTRY}/${REPO_NAME}:latest \ --build-arg BUILDKIT_INLINE_CACHE=1 \ -t ${REGISTRY}/${REPO_NAME}:${IMAGE_TAG} \ -t ${REGISTRY}/${REPO_NAME}:latest \ .
# Push both tagsdocker push ${REGISTRY}/${REPO_NAME}:${IMAGE_TAG}docker push ${REGISTRY}/${REPO_NAME}:latest
echo "Pushed ${REGISTRY}/${REPO_NAME}:${IMAGE_TAG}"Image Tagging Strategies and Immutability
Section titled “Image Tagging Strategies and Immutability”Tagging is where most teams get into trouble. Let us get this right.
Tag Immutability
Section titled “Tag Immutability”When tag immutability is enabled, once you push an image with a specific tag, that tag cannot be overwritten. This is critical for production safety.
# Enable immutability on an existing repositoryaws ecr put-image-tag-mutability \ --repository-name myapp/api \ --image-tag-mutability IMMUTABLE
# Now this will FAIL if v1.3.0 already exists:docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp/api:v1.3.0# Error: tag invalid: The image tag 'v1.3.0' already existsWith immutability enabled, you are guaranteed that v1.3.0 always refers to the exact same image. This makes rollbacks reliable and audit trails meaningful.
Tagging Strategy Comparison
Section titled “Tagging Strategy Comparison”| Strategy | Example | Pros | Cons |
|---|---|---|---|
| Semantic version | v1.3.0 | Human readable, clear progression | Requires version discipline |
| Git SHA | abc1234 | Ties image to exact source code | Not human readable |
| Both (recommended) | v1.3.0 + abc1234 | Best of both worlds | Slightly more complex CI |
latest only | latest | Simple | No versioning, cannot roll back, dangerous |
| Date-based | 2026-03-24-1432 | Chronological ordering | No semantic meaning |
Stop and think: Your CI pipeline successfully builds and pushes
myapp:latestto a mutable ECR repository, overwriting the previous image. Five minutes later, the new code triggers a critical bug in production. You try to roll back by updating your ECS service to restart its tasks, hoping it pulls the old image. What will actually happen, and why is this an incident-response nightmare?
The recommended approach for production: tag every image with both the semantic version and the Git SHA. Use latest only as a convenience pointer that also gets applied alongside the versioned tag.
# Recommended: apply multiple tagsdocker tag myapp/api:local \ ${REGISTRY}/${REPO_NAME}:v1.3.0docker tag myapp/api:local \ ${REGISTRY}/${REPO_NAME}:${GIT_SHA}docker tag myapp/api:local \ ${REGISTRY}/${REPO_NAME}:latest
# With immutability ON:# - v1.3.0 and ${GIT_SHA} are permanent, cannot be overwritten# - latest is NOT allowed with IMMUTABLE (it needs to change each push)# Solution: use IMMUTABLE repos without the 'latest' tag,# or use a separate mutable repo for 'latest'A practical note on immutability and latest: they are fundamentally incompatible. If you enable immutability, you cannot push latest more than once. Most teams choose one of two patterns:
- Enable immutability, never use
latest— deployments always reference explicit versions - Keep mutability, enforce versioning through CI policy — lint your pipeline to reject pushes without a version tag
Pattern 1 is safer. Pattern 2 is more convenient. Choose based on your team’s discipline.
Vulnerability Scanning
Section titled “Vulnerability Scanning”ECR provides two levels of vulnerability scanning: Basic and Enhanced.
Basic Scanning
Section titled “Basic Scanning”Basic scanning uses the open-source Clair engine to check for known CVEs in OS packages. It is included in ECR at no extra cost.
# Enable scan-on-push for a repositoryaws ecr put-image-scanning-configuration \ --repository-name myapp/api \ --image-scanning-configuration scanOnPush=true
# Manually trigger a scan on an existing imageaws ecr start-image-scan \ --repository-name myapp/api \ --image-id imageTag=v1.3.0
# Get scan resultsaws ecr describe-image-scan-findings \ --repository-name myapp/api \ --image-id imageTag=v1.3.0Enhanced Scanning
Section titled “Enhanced Scanning”Enhanced scanning uses Amazon Inspector and provides deeper analysis including application dependency vulnerabilities (not just OS packages). It costs extra but catches significantly more issues.
# Enable enhanced scanning at the registry levelaws ecr put-registry-scanning-configuration \ --scan-type ENHANCED \ --rules '[ { "scanFrequency": "CONTINUOUS_SCAN", "repositoryFilters": [ {"filter": "myapp/*", "filterType": "WILDCARD"} ] }, { "scanFrequency": "SCAN_ON_PUSH", "repositoryFilters": [ {"filter": "*", "filterType": "WILDCARD"} ] } ]'Interpreting Scan Results
Section titled “Interpreting Scan Results”# Get findings summaryaws ecr describe-image-scan-findings \ --repository-name myapp/api \ --image-id imageTag=v1.3.0 \ --query 'imageScanFindings.findingSeverityCounts'
# Example output:# {# "CRITICAL": 0,# "HIGH": 2,# "MEDIUM": 8,# "LOW": 15,# "INFORMATIONAL": 3# }
# Get detailed findings for critical and high severityaws ecr describe-image-scan-findings \ --repository-name myapp/api \ --image-id imageTag=v1.3.0 \ --query 'imageScanFindings.findings[?severity==`HIGH` || severity==`CRITICAL`]'A practical CI/CD gate using scan results:
#!/bin/bash# Gate deployment based on scan findingsCRITICAL=$(aws ecr describe-image-scan-findings \ --repository-name myapp/api \ --image-id imageTag=${IMAGE_TAG} \ --query 'imageScanFindings.findingSeverityCounts.CRITICAL // `0`' \ --output text)
HIGH=$(aws ecr describe-image-scan-findings \ --repository-name myapp/api \ --image-id imageTag=${IMAGE_TAG} \ --query 'imageScanFindings.findingSeverityCounts.HIGH // `0`' \ --output text)
if [ "${CRITICAL}" -gt 0 ]; then echo "BLOCKED: ${CRITICAL} critical vulnerabilities found" exit 1fi
if [ "${HIGH}" -gt 5 ]; then echo "WARNING: ${HIGH} high-severity vulnerabilities found (threshold: 5)" exit 1fi
echo "Scan passed: ${CRITICAL} critical, ${HIGH} high"Lifecycle Policies
Section titled “Lifecycle Policies”Without lifecycle policies, your ECR storage grows indefinitely. Every CI build pushes a new image, and old images accumulate. A team pushing 10 builds per day generates 300+ images per month per repository. At $0.10/GB/month, this adds up.
Lifecycle policies let you automatically expire old images based on rules you define.
Understanding Lifecycle Policy Rules
Section titled “Understanding Lifecycle Policy Rules”# Set a lifecycle policy that retains the last 10 tagged imagesaws ecr put-lifecycle-policy \ --repository-name myapp/api \ --lifecycle-policy-text '{ "rules": [ { "rulePriority": 1, "description": "Keep last 10 tagged images", "selection": { "tagStatus": "tagged", "tagPrefixList": ["v"], "countType": "imageCountMoreThan", "countNumber": 10 }, "action": { "type": "expire" } }, { "rulePriority": 2, "description": "Remove untagged images older than 3 days", "selection": { "tagStatus": "untagged", "countType": "sinceImagePushed", "countUnit": "days", "countNumber": 3 }, "action": { "type": "expire" } } ] }'Production-Grade Lifecycle Policy
Section titled “Production-Grade Lifecycle Policy”Here is a comprehensive policy that covers the typical scenarios:
aws ecr put-lifecycle-policy \ --repository-name myapp/api \ --lifecycle-policy-text '{ "rules": [ { "rulePriority": 1, "description": "Keep release images (v-prefixed) for 180 days", "selection": { "tagStatus": "tagged", "tagPrefixList": ["v"], "countType": "sinceImagePushed", "countUnit": "days", "countNumber": 180 }, "action": {"type": "expire"} }, { "rulePriority": 2, "description": "Keep only last 5 feature branch images", "selection": { "tagStatus": "tagged", "tagPrefixList": ["feature-", "fix-", "dev-"], "countType": "imageCountMoreThan", "countNumber": 5 }, "action": {"type": "expire"} }, { "rulePriority": 10, "description": "Remove untagged images after 1 day", "selection": { "tagStatus": "untagged", "countType": "sinceImagePushed", "countUnit": "days", "countNumber": 1 }, "action": {"type": "expire"} } ] }'Pause and predict: You have a policy with two rules. Rule 1 (Priority 1) keeps 5 images with the prefix
prod-. Rule 2 (Priority 2) expires all untagged images older than 7 days. You push an image with the tagprod-v2.0and immediately remove the tag because it was a mistake. 10 days later, will this image be deleted? Consider how ECR evaluates rules against image digests and tags.
Preview Before You Apply
Section titled “Preview Before You Apply”Lifecycle policies can be destructive — they delete images. Always preview first:
# Preview what a lifecycle policy WOULD delete (dry run)aws ecr get-lifecycle-policy-preview \ --repository-name myapp/api
# Start a preview (if no preview exists)aws ecr start-lifecycle-policy-preview \ --repository-name myapp/apiLifecycle Policy Rule Evaluation Order
Section titled “Lifecycle Policy Rule Evaluation Order”ECR evaluates lifecycle rules in a specific order. Understanding this prevents surprises:
Rule Evaluation Flow:
1. Rules are sorted by priority (lowest number = highest priority)2. Each image is evaluated against rules in priority order3. Once a rule matches an image, that image is "claimed" by that rule4. Later rules cannot affect images already claimed5. The action (expire or keep) is applied based on the matching rule
Example with our policy:- Image tagged "v1.3.0" pushed 90 days ago -> Matches Rule 1 (v-prefix, < 180 days) -> KEPT- Image tagged "v1.0.0" pushed 200 days ago -> Matches Rule 1 (v-prefix, > 180 days) -> EXPIRED- Image tagged "feature-auth-fix" (6th feature image) -> Matches Rule 2 (feature- prefix, > 5 count) -> EXPIRED- Untagged image pushed 2 days ago -> Matches Rule 10 (untagged, > 1 day) -> EXPIREDCross-Account and Cross-Region Sharing
Section titled “Cross-Account and Cross-Region Sharing”In multi-account AWS environments (which is the standard for any serious organization), you often need to share images between accounts. The typical pattern: a CI/CD account builds and pushes images, and deployment accounts pull them.
Cross-Account Access via Repository Policy
Section titled “Cross-Account Access via Repository Policy”# Allow another AWS account to pull imagesaws ecr set-repository-policy \ --repository-name myapp/api \ --policy-text '{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowCrossAccountPull", "Effect": "Allow", "Principal": { "AWS": [ "arn:aws:iam::987654321098:root", "arn:aws:iam::111222333444:root" ] }, "Action": [ "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "ecr:BatchCheckLayerAvailability" ] } ] }'Cross-Region Replication
Section titled “Cross-Region Replication”ECR supports automatic replication of images to other regions. This is essential for multi-region deployments to avoid cross-region image pulls during container startup (which add latency and data transfer costs).
# Configure replication to eu-west-1 and ap-southeast-1aws ecr put-replication-configuration \ --replication-configuration '{ "rules": [ { "destinations": [ { "region": "eu-west-1", "registryId": "123456789012" }, { "region": "ap-southeast-1", "registryId": "123456789012" } ], "repositoryFilters": [ { "filter": "myapp/", "filterType": "PREFIX_MATCH" } ] } ] }'Cross-Account and Cross-Region Together
Section titled “Cross-Account and Cross-Region Together”For organizations with separate accounts per environment and multiple regions:
# Replicate from CI account (123456789012) to:# - Production account (987654321098) in us-east-1 and eu-west-1aws ecr put-replication-configuration \ --replication-configuration '{ "rules": [ { "destinations": [ { "region": "us-east-1", "registryId": "987654321098" }, { "region": "eu-west-1", "registryId": "987654321098" } ] } ] }'The destination account must grant permission for the source account to replicate:
# Run this in the DESTINATION accountaws ecr put-registry-policy \ --policy-text '{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowReplicationFrom", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::123456789012:root" }, "Action": [ "ecr:CreateRepository", "ecr:ReplicateImage" ], "Resource": "arn:aws:ecr:*:987654321098:repository/*" } ] }'Securing ECR with VPC Endpoints (AWS PrivateLink)
Section titled “Securing ECR with VPC Endpoints (AWS PrivateLink)”By default, when your ECS tasks, EKS worker nodes, or EC2 instances pull images from ECR, the traffic travels over the public internet. This requires your subnets to have a NAT Gateway (which incurs data processing charges) or an Internet Gateway (which requires public IP addresses).
For enhanced security and to reduce NAT Gateway costs, you can configure VPC Endpoints (AWS PrivateLink) for ECR. This keeps all container image pull traffic entirely within the AWS private network.
To use ECR privately, you must create two types of VPC endpoints:
- ECR API Endpoint:
com.amazonaws.region.ecr.api(Used for authentication and API calls likeDescribeRepositories) - ECR Docker Routing Layer Endpoint:
com.amazonaws.region.ecr.dkr(Used for the actual Dockerpullandpushoperations)
Because ECR stores image layers in S3, you must also create an S3 Gateway Endpoint (com.amazonaws.region.s3) in your VPC routing table. When the Docker daemon pulls an image layer, ECR provides a pre-signed S3 URL, and the actual layer data flows through the S3 Gateway Endpoint.
# Example: Creating the ECR Docker endpoint (requires a security group that allows inbound HTTPS from your compute nodes)aws ec2 create-vpc-endpoint \ --vpc-id vpc-12345678 \ --vpc-endpoint-type Interface \ --service-name com.amazonaws.us-east-1.ecr.dkr \ --subnet-ids subnet-11112222 subnet-33334444 \ --security-group-ids sg-55556666If you block public internet access in your private subnets and forget the S3 Gateway Endpoint, your ECS tasks will authenticate successfully but hang indefinitely in the “PENDING” state while trying to download the image layers.
Did You Know?
Section titled “Did You Know?”-
ECR stores images in S3 under the hood, but you cannot see or access the S3 buckets directly. Each image layer is stored as an individual S3 object, deduplicated across all repositories in the same account and region. If five of your repositories use the same base layer (like
ubuntu:22.04), that layer is stored only once. This deduplication can reduce your storage costs by 40-60% for organizations with many similar images. -
The ECR credential helper eliminates manual
docker logincalls. Installamazon-ecr-credential-helperand configure Docker to use it, and everydocker pullanddocker pushcommand against ECR automatically authenticates using your AWS credentials. GitHub Actions, GitLab CI, and Jenkins all have native ECR integration that uses this same mechanism under the hood. -
ECR pull-through cache lets your ECR registry act as a proxy for public registries like Docker Hub, Quay.io, and GitHub Container Registry. When your workloads pull
docker.io/library/nginx:1.25, ECR intercepts the request, caches the image locally, and serves subsequent pulls from the cache. This protects you from Docker Hub rate limits (100 pulls per 6 hours for anonymous users) and reduces external network dependencies. -
Amazon Inspector’s enhanced scanning for ECR can detect vulnerabilities in 15+ programming languages, not just OS packages. This includes npm, pip, Maven, NuGet, Go modules, Rust crates, and more. A single Node.js application image might have 3 OS-level vulnerabilities but 28 application-level ones. Basic scanning would only catch the 3.
Common Mistakes
Section titled “Common Mistakes”| Mistake | Why It Happens | How to Fix It |
|---|---|---|
Using latest tag as the only tag | It is the Docker default and seems simple | Always tag with a version or Git SHA. Treat latest as a convenience alias, never as the deployment target |
| Forgetting to authenticate before pushing | ECR tokens expire after 12 hours | Add aws ecr get-login-password to the start of every CI pipeline. Use the credential helper for local development |
| Not enabling scan-on-push | It is not the default when creating repositories | Create a script or Terraform module that always enables scanning. Gate deployments on scan results |
| No lifecycle policy | Teams do not think about storage costs until the bill arrives | Apply lifecycle policies at repository creation time. A default policy that keeps the last 20 tagged images and removes untagged images after 3 days works for most teams |
| Mutable tags in production | It is the ECR default, and immutability seems restrictive | Enable IMMUTABLE tag mutability for all production repositories. Mutable tags make rollbacks unreliable |
| Pulling images cross-region in production | The image exists in us-east-1 but the ECS cluster is in eu-west-1 | Configure ECR replication to all regions where you deploy containers. Cross-region pulls add 200-500ms to container startup and incur data transfer charges |
| Repository names that do not match service names | Ad hoc naming without convention | Establish a naming convention (e.g., {team}/{service}) and enforce it in CI. Consistent naming prevents confusion when you have 50+ repositories |
| Not setting repository policies for cross-account access | Developers use root-account credentials or overly broad IAM policies | Use ECR repository policies for cross-account pull access. Keep IAM policies for push access controlled by the CI/CD account |
1. Your security team mandates that all application dependencies (like npm packages and Python wheels) must be scanned for vulnerabilities before deployment. You enable ECR Basic scanning, but the security team reports that it is missing known vulnerabilities in your Node.js application. Why is this happening, and what must you change?
ECR Basic scanning uses the open-source Clair engine, which only checks for known CVEs in operating system packages (like those installed via apt or yum). It cannot look inside application-level dependency files like package.json or requirements.txt. To satisfy the security team’s mandate, you must upgrade to Amazon Inspector Enhanced scanning. Enhanced scanning analyzes both OS packages and application dependencies across over 15 programming languages, catching vulnerabilities that Basic scanning completely ignores.
2. Your CI pipeline is configured to tag every build with both the Git SHA and the `latest` tag. After enabling ECR image tag immutability for your production repositories to improve security, your pipeline suddenly starts failing on the push step. Why is the pipeline failing, and what must you do to fix it?
Image tag immutability means that once a tag is applied to a specific image digest, it cannot be reassigned to a different image. The latest tag is designed to be a moving pointer that gets overwritten with every new build. Because these concepts are fundamentally incompatible, your pipeline fails when it tries to overwrite the latest tag from the previous build. To fix this, you must either remove the latest tag from your CI pipeline and deploy using the immutable Git SHA tags, or maintain a separate mutable repository specifically for the latest pointer.
3. You have an ECR lifecycle policy that keeps the last 10 images tagged with "v" prefix and removes untagged images after 3 days. You push an image tagged v1.5.0 and also tag it as "latest". Later, the v1.5.0 tag is removed by the lifecycle policy (it becomes the 11th oldest). What happens to the "latest" tag?
This is a subtle but important behavior in ECR. Lifecycle policies operate on the underlying image digest, not on individual tags attached to that image. If an image has multiple tags and the lifecycle policy matches one of those tags for expiration, the entire image and all of its associated tags are deleted. Consequently, when v1.5.0 is expired, the underlying image is deleted, which also strips away the “latest” tag that pointed to it. This demonstrates why relying on latest as a deployment reference is highly dangerous in a repository with lifecycle policies.
4. During a major traffic spike, your EKS cluster scales up rapidly, launching 50 new pods at once. The pods fail to start, and the Kubernetes events show 'Too Many Requests' errors from Docker Hub while trying to pull a public Nginx base image. How would implementing an ECR pull-through cache prevent this outage?
Docker Hub imposes strict rate limits on image pulls based on the IP address or authenticated user (e.g., 100 pulls per 6 hours for anonymous users). When 50 pods attempt to pull the Nginx image simultaneously from Docker Hub, you instantly exhaust your limit, causing the pulls to be throttled and the pods to fail. An ECR pull-through cache acts as a local proxy; the first pull goes to Docker Hub and caches the image in your ECR registry. All subsequent pod scaling events pull from the local ECR cache, which is not subject to Docker Hub rate limits, thereby ensuring reliable and fast container startups.
5. Your organization has 30 different microservices, and you enforce a standard where every service uses the exact same `ubuntu:22.04` base image. Your finance department is concerned that storing 30 copies of a heavy OS image in ECR will cause storage costs to skyrocket. Why is their concern unfounded, and how does ECR handle this under the hood?
Container images are composed of multiple filesystem layers, and ECR stores each of these layers individually based on their SHA256 digest. When multiple repositories within the same account and region push images that share identical base layers (like the standard ubuntu:22.04 image), ECR recognizes the duplicate hashes. Instead of storing 30 redundant copies, ECR stores the base layer only once in its underlying S3 bucket and creates references to it for each repository. You are only billed for the unique layer storage, meaning standardizing on a single base image actually drastically reduces your overall storage costs.
6. Your primary infrastructure is in `us-east-1`, but you recently deployed a disaster recovery ECS cluster in `eu-west-1`. The disaster recovery tasks are configured to pull their container images from the existing ECR registry in `us-east-1`. During a simulated failover, you notice that tasks take significantly longer to start and you incur unexpected AWS data transfer charges. Why did this happen and what is the proper architectural fix?
Pulling container images across AWS regions introduces substantial network latency, which directly increases the time it takes for your ECS tasks to download the image and start. Additionally, AWS charges for inter-region data transfer, meaning every cross-region image pull increases your monthly bill. To resolve both the performance degradation and the cost issue, you must configure ECR cross-region replication. By setting your us-east-1 registry to automatically replicate images to a registry in eu-west-1, the disaster recovery tasks can perform local image pulls, eliminating the cross-region network delay and data transfer fees.
Hands-On Exercise: Build, Push, Scan, and Lifecycle
Section titled “Hands-On Exercise: Build, Push, Scan, and Lifecycle”In this exercise, you will create an ECR repository, build and push images with proper tagging, run vulnerability scans, configure lifecycle policies, and set up cross-account access.
# Set your variablesexport ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)export REGION="us-east-1"export REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"export REPO_NAME="kubedojo/ecr-exercise"Task 1: Create a Repository with Best-Practice Settings
Section titled “Task 1: Create a Repository with Best-Practice Settings”Create an ECR repository with scanning enabled and immutable tags.
Solution
aws ecr create-repository \ --repository-name ${REPO_NAME} \ --image-scanning-configuration scanOnPush=true \ --image-tag-mutability IMMUTABLE \ --encryption-configuration encryptionType=AES256 \ --region ${REGION}
# Verify the repository settingsaws ecr describe-repositories \ --repository-names ${REPO_NAME} \ --query 'repositories[0].{Name:repositoryName,URI:repositoryUri,ScanOnPush:imageScanningConfiguration.scanOnPush,TagMutability:imageTagMutability}'Task 2: Build and Push a Test Image
Section titled “Task 2: Build and Push a Test Image”Create a simple Dockerfile, build it, and push to ECR with proper tagging.
Solution
# Create a temporary directory for our test imagemkdir -p /tmp/ecr-exercise && cd /tmp/ecr-exercise
# Create a simple Dockerfilecat > Dockerfile <<'DOCKERFILE'FROM python:3.12-slimLABEL maintainer="kubedojo"LABEL version="1.0.0"
RUN pip install flask==3.0.0COPY app.py /app/app.pyWORKDIR /appEXPOSE 8080CMD ["python", "app.py"]DOCKERFILE
# Create a simple appcat > app.py <<'PYTHON'from flask import Flask, jsonifyapp = Flask(__name__)
@app.route("/health")def health(): return jsonify({"status": "healthy"})
@app.route("/")def index(): return jsonify({"message": "ECR Exercise App", "version": "1.0.0"})
if __name__ == "__main__": app.run(host="0.0.0.0", port=8080)PYTHON
# Authenticateaws ecr get-login-password --region ${REGION} | \ docker login --username AWS --password-stdin ${REGISTRY}
# Build and tag with versiondocker build -t ${REGISTRY}/${REPO_NAME}:v1.0.0 .
# Pushdocker push ${REGISTRY}/${REPO_NAME}:v1.0.0
# Verifyaws ecr describe-images \ --repository-name ${REPO_NAME} \ --query 'imageDetails[*].{Tags:imageTags,Pushed:imagePushedAt,Size:imageSizeInBytes,Digest:imageDigest}'Task 3: Push Multiple Versions (Simulate CI/CD History)
Section titled “Task 3: Push Multiple Versions (Simulate CI/CD History)”Build and push several versions to test lifecycle policies later.
Solution
# Push versions v1.1.0 through v1.12.0 (we will use lifecycle to prune)for i in $(seq 1 12); do # Modify the app version to create different layers sed -i "s/version\": \"[0-9.]*\"/version\": \"1.${i}.0\"/" app.py
docker build -t ${REGISTRY}/${REPO_NAME}:v1.${i}.0 . docker push ${REGISTRY}/${REPO_NAME}:v1.${i}.0
echo "Pushed v1.${i}.0"done
# List all images in the repositoryaws ecr describe-images \ --repository-name ${REPO_NAME} \ --query 'sort_by(imageDetails, &imagePushedAt)[*].{Tag:imageTags[0],Pushed:imagePushedAt}' \ --output tableTask 4: Check Vulnerability Scan Results
Section titled “Task 4: Check Vulnerability Scan Results”Review the scan results for the latest image.
Solution
# Wait for scan to complete (scan-on-push was enabled)echo "Waiting for scan to complete..."aws ecr wait image-scan-complete \ --repository-name ${REPO_NAME} \ --image-id imageTag=v1.12.0
# Get scan findings summaryaws ecr describe-image-scan-findings \ --repository-name ${REPO_NAME} \ --image-id imageTag=v1.12.0 \ --query '{ Status: imageScanStatus.status, Counts: imageScanFindings.findingSeverityCounts, CompletedAt: imageScanFindings.imageScanCompletedAt }'
# Get detailed high/critical findingsaws ecr describe-image-scan-findings \ --repository-name ${REPO_NAME} \ --image-id imageTag=v1.12.0 \ --query 'imageScanFindings.findings[?severity==`HIGH` || severity==`CRITICAL`].{Name:name,Severity:severity,Description:description,URI:uri}' \ --output tableTask 5: Apply a Lifecycle Policy to Retain Only the Last 10 Images
Section titled “Task 5: Apply a Lifecycle Policy to Retain Only the Last 10 Images”Configure a lifecycle policy that keeps the 10 most recent tagged images and removes the rest.
Solution
# First, preview what the policy would deleteaws ecr put-lifecycle-policy \ --repository-name ${REPO_NAME} \ --lifecycle-policy-text '{ "rules": [ { "rulePriority": 1, "description": "Keep only the last 10 versioned images", "selection": { "tagStatus": "tagged", "tagPrefixList": ["v"], "countType": "imageCountMoreThan", "countNumber": 10 }, "action": { "type": "expire" } }, { "rulePriority": 2, "description": "Remove untagged images after 1 day", "selection": { "tagStatus": "untagged", "countType": "sinceImagePushed", "countUnit": "days", "countNumber": 1 }, "action": { "type": "expire" } } ] }'
# Verify the policy was appliedaws ecr get-lifecycle-policy \ --repository-name ${REPO_NAME} \ --query 'lifecyclePolicyText' --output text | python3 -m json.tool
# Note: Lifecycle policies run asynchronously (within 24 hours)# To see what WOULD be deleted immediately, use:aws ecr start-lifecycle-policy-preview \ --repository-name ${REPO_NAME}
# Check preview results (may take a few minutes)aws ecr get-lifecycle-policy-preview \ --repository-name ${REPO_NAME} \ --query 'previewResults[*].{Tag:imageTags[0],Action:action.type,AppliedRule:appliedAction.rulePriority}'Task 6: Configure Cross-Account Repository Access
Section titled “Task 6: Configure Cross-Account Repository Access”Apply a repository policy that allows a simulated deployment account (e.g., AWS Account ID 999988887777) to pull images from your repository.
Solution
# Create a policy JSON filecat > repo-policy.json <<'POLICY'{ "Version": "2012-10-17", "Statement": [ { "Sid": "AllowCrossAccountPull", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::999988887777:root" }, "Action": [ "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "ecr:BatchCheckLayerAvailability" ] } ]}POLICY
# Apply the policyaws ecr set-repository-policy \ --repository-name ${REPO_NAME} \ --policy-text file://repo-policy.json
# Verify the policy is appliedaws ecr get-repository-policy \ --repository-name ${REPO_NAME} \ --query 'policyText' --output text | python3 -m json.toolTask 7: Clean Up
Section titled “Task 7: Clean Up”Remove all resources created during this exercise.
Solution
# Delete all images in the repository first (required before repo deletion)IMAGE_IDS=$(aws ecr list-images \ --repository-name ${REPO_NAME} \ --query 'imageIds[*]' --output json)
aws ecr batch-delete-image \ --repository-name ${REPO_NAME} \ --image-ids "${IMAGE_IDS}"
# Delete the repositoryaws ecr delete-repository \ --repository-name ${REPO_NAME} \ --force
# Clean up local Docker imagesdocker images --filter "reference=${REGISTRY}/${REPO_NAME}" -q | \ xargs -r docker rmi
# Clean up temporary filesrm -rf /tmp/ecr-exerciserm -f repo-policy.json
echo "Cleanup complete"Success Criteria
Section titled “Success Criteria”- ECR repository created with scan-on-push and immutable tags
- Successfully authenticated and pushed a versioned image
- Multiple image versions pushed (simulating CI/CD history)
- Vulnerability scan results reviewed and interpreted
- Lifecycle policy applied that retains only the last 10 images
- Configured cross-account repository access
- All resources cleaned up
Next Module
Section titled “Next Module”Next up: Module 1.7: Elastic Container Service (ECS) & Fargate — Now that you can store container images, it is time to run them. You will learn to deploy containers on AWS using ECS with both EC2 and Fargate launch types, integrate with load balancers, and debug running containers with ECS Exec.