Module 2.4: Union Filesystems
Linux Foundations | Complexity: [MEDIUM] | Time: 25-30 min
Prerequisites
Before starting this module:
- Required: Module 1.3: Filesystem Hierarchy
- Required: Module 2.1: Linux Namespaces (mount namespace concept)
- Helpful: Understanding of container images
What You’ll Be Able to Do
After this module, you will be able to:
- Explain how overlay filesystems enable container image layers
- Trace a file read/write through the overlay stack (lowerdir, upperdir, merged)
- Debug storage issues in containers by inspecting the overlay mount
- Compare OverlayFS with other union filesystem implementations and explain why OverlayFS won
Why This Module Matters
Every time you pull a container image, run docker build, or start a Kubernetes pod, union filesystems are at work. They make containers efficient by:
- Sharing common layers — 100 containers from the same image don’t need 100 copies
- Copy-on-write — Only changed files use additional storage
- Fast startup — No need to copy entire image for each container
Understanding union filesystems helps you:
- Optimize images — Know why layer order matters
- Debug storage issues — Why is my container using so much space?
- Understand image caching — Why did Docker rebuild this layer?
- Troubleshoot container filesystem problems — Why can’t I see my file?
Did You Know?
- OverlayFS was merged into the mainline Linux kernel in 2014 (kernel 3.18). Before that, Docker commonly used AUFS, which never made it into the mainline kernel, so it only worked on kernels patched to include it.
- A single layer can be shared by thousands of containers. If you run 1000 containers from the same base image, there is ONE copy of the base layer on disk, not 1000. This is a big part of why container density is so high.
- Only Dockerfile instructions that modify the filesystem (RUN, COPY, ADD) create filesystem layers. Instructions such as ENV and LABEL only change image metadata; they show up in docker history as 0B entries.
- The container’s writable layer is ephemeral. When the container is removed, the layer is gone. This is why volumes exist: to persist data beyond the container lifecycle.
What Is a Union Filesystem?
A union filesystem merges multiple directories (layers) into a single unified view.
Stop and think: If you delete a file in a container that originated from the base image, how does the filesystem remember it’s deleted without actually modifying the read-only base image?
```
UNION FILESYSTEM VIEW

What the container sees:
             ┌─────────────────────┐
             │ /                   │
             │ ├── bin/            │
             │ ├── etc/nginx/      │
             │ ├── var/log/        │
             │ └── app/myapp       │
             └─────────────────────┘
                        ▲
                        │
┌────────────────────────────────────────────────┐
│             Union/Merge Operation              │
└────────────────────────────────────────────────┘
        ▲                ▲                ▲
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Container    │ │ Image Layer  │ │ Base Layer   │
│ Layer (RW)   │ │ (RO)         │ │ (RO)         │
│              │ │              │ │              │
│ /var/log/    │ │ /etc/nginx/  │ │ /bin/        │
│   app.log    │ │   nginx.conf │ │   bash       │
│ /app/myapp   │ │              │ │   ls         │
│  (modified)  │ │              │ │              │
└──────────────┘ └──────────────┘ └──────────────┘
    upperdir         lowerdir         lowerdir
```
Key Concepts
| Concept | Description |
|---|---|
| Layer | A directory containing filesystem changes |
| Lower layers | Read-only base layers (image) |
| Upper layer | Read-write container layer |
| Merged view | What the container sees |
| Copy-on-write | Copies file to upper layer when modified |
| Whiteout | Marks a deleted file without removing it from the lower layers (in OverlayFS, a character device with device number 0/0 in the upper layer) |
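To make the merged view concrete, here is a minimal sketch of how a directory listing could be merged from layers, assuming the layer directories are passed top (upper) first. The helper name `union_ls` is hypothetical; it relies on bash 4 associative arrays and GNU `stat`, and it ignores details the real kernel handles, such as dotfiles and opaque directories:

```bash
# union_ls REL_DIR LAYER...
# Hypothetical helper: prints the names a union mount would show for
# REL_DIR, given layer directories ordered top (upper) first.
# Simplifications: dotfiles and opaque directories are ignored, and the
# output is sorted for readability (real readdir order is not sorted).
union_ls() {
  local rel=$1 layer entry name
  shift
  local -A decided=()   # names already shown or hidden by a higher layer
  for layer in "$@"; do
    [ -d "$layer/$rel" ] || continue
    for entry in "$layer/$rel"/*; do
      [ -e "$entry" ] || [ -L "$entry" ] || continue   # empty dir: glob unmatched
      name=${entry##*/}
      [ -n "${decided[$name]:-}" ] && continue        # a higher layer wins
      decided[$name]=1
      # A whiteout (char device 0:0) hides this name in all lower layers.
      if [ -c "$entry" ] && [ "$(stat -c '%t:%T' "$entry")" = "0:0" ]; then
        continue
      fi
      printf '%s\n' "$name"
    done
  done | sort
}
```

Pointed at the upper and lower directories from the Try This exercise after its delete step, `union_ls . /tmp/overlay/upper /tmp/overlay/lower` lists the same names as `ls /tmp/overlay/merged/`: the whiteout for delete.txt hides the lower copy.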
OverlayFS
OverlayFS is the kernel filesystem behind the default storage drivers in Docker (overlay2) and containerd (the overlayfs snapshotter).
How OverlayFS Works
```
OVERLAYFS

Mount command:
  mount -t overlay overlay \
    -o lowerdir=/lower1:/lower2,upperdir=/upper,workdir=/work \
    /merged

┌─────────────────────────────────────────────────┐
│             /merged (unified view)              │
└─────────────────────────────────────────────────┘
                         ▲
     ┌────────────┬──────┴─────┬────────────┐
     │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ /upper   │ │ /work    │ │ /lower1  │ │ /lower2  │
│ (RW)     │ │ (scratch │ │ (RO)     │ │ (RO)     │
│          │ │  space)  │ │          │ │          │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
 Changes go  Temporary    Image layers (read-only,
 here        operations   stacked with lower1 on top)
```
OverlayFS Operations
| Operation | What Happens |
|---|---|
| Read | Return file from highest layer that has it |
| Write (new file) | Create in upper layer |
| Write (existing) | Copy from lower to upper, then modify (COW) |
| Delete | Create “whiteout” file in upper layer |
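| Rename dir | Renaming a directory that comes from a lower layer fails with EXDEV unless the redirect_dir feature is enabled; tools like mv then fall back to copying the whole tree |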
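The Read and Delete rows can also be sketched for a single path. Below is a hypothetical `overlay_resolve` helper (not a real tool) that walks the layers top first and reports which layer would serve a path, stopping at a whiteout. It assumes GNU `stat` and skips real-kernel details such as opaque directories and redirects:

```bash
# overlay_resolve REL_PATH LAYER...
# Hypothetical helper: prints the layer directory that would serve
# REL_PATH in the merged view (layers listed top/upper first), or
# returns 1 if no layer has it or a whiteout hides it.
overlay_resolve() {
  local rel=$1 layer candidate
  shift
  for layer in "$@"; do
    candidate="$layer/$rel"
    # Delete row: a whiteout (char device 0:0) hides all lower copies.
    if [ -c "$candidate" ] && [ "$(stat -c '%t:%T' "$candidate")" = "0:0" ]; then
      return 1
    fi
    # Read row: the first (highest) layer that has the path wins.
    if [ -e "$candidate" ] || [ -L "$candidate" ]; then
      printf '%s\n' "$layer"
      return 0
    fi
  done
  return 1
}
```

In the Try This exercise, once modify.txt has been copied up, `overlay_resolve modify.txt /tmp/overlay/upper /tmp/overlay/lower` prints `/tmp/overlay/upper`, while base.txt still resolves to the lower directory.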
Try This: Create an Overlay Mount
```bash
# Create directories
mkdir -p /tmp/overlay/{lower,upper,work,merged}

# Add some files to lower
echo "base file" > /tmp/overlay/lower/base.txt
echo "will be modified" > /tmp/overlay/lower/modify.txt
echo "will be deleted" > /tmp/overlay/lower/delete.txt

# Mount overlay
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/overlay/lower,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
  /tmp/overlay/merged

# View merged filesystem
ls /tmp/overlay/merged/
# Shows: base.txt delete.txt modify.txt

# Read from lower layer
cat /tmp/overlay/merged/base.txt
# Output: base file

# Create new file (goes to upper)
echo "new file" > /tmp/overlay/merged/new.txt
ls /tmp/overlay/upper/
# Shows: new.txt

# Modify existing (copy-on-write)
echo "modified content" > /tmp/overlay/merged/modify.txt
ls /tmp/overlay/upper/
# Shows: modify.txt new.txt

# Delete file (creates whiteout)
rm /tmp/overlay/merged/delete.txt
ls -la /tmp/overlay/upper/
# Shows: delete.txt (whiteout character device)

# Original still exists in lower
ls /tmp/overlay/lower/
# Shows: base.txt delete.txt modify.txt
```
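Before cleaning up, you can confirm which entries in `upper/` are whiteouts rather than real files. A minimal sketch, assuming GNU `find` and `stat` (`list_whiteouts` is a hypothetical helper name): it keeps only character devices whose device number is 0:0, the marker OverlayFS uses for deleted files.

```bash
# list_whiteouts DIR
# Hypothetical helper: prints every OverlayFS whiteout under DIR, i.e.
# every character device whose major:minor number is 0:0.
list_whiteouts() {
  # stat prints "major:minor path" (hex numbers) for each char device found.
  find "$1" -type c -exec stat -c '%t:%T %n' {} + |
    awk '$1 == "0:0" { sub(/^0:0 /, ""); print }'
}
```

At this point in the exercise, `list_whiteouts /tmp/overlay/upper` should print `/tmp/overlay/upper/delete.txt`; the new and modified files are regular files and are not listed.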
```bash
# Cleanup (overlayfs leaves root-owned entries such as work/work, so remove with sudo)
sudo umount /tmp/overlay/merged
sudo rm -rf /tmp/overlay
```
Container Image Layers
Pause and predict: If you change a single line of code in your application, which layers of the Docker image will need to be rebuilt?
Anatomy of an Image
```
DOCKER IMAGE STRUCTURE

docker pull nginx:alpine

┌───────────────────────────────────────────────────────┐
│ Layer 5: COPY nginx.conf /etc/nginx/ (2KB)            │
├───────────────────────────────────────────────────────┤
│ Layer 4: RUN apk add nginx (10MB)                     │
├───────────────────────────────────────────────────────┤
│ Layer 3: RUN apk update (5MB)                         │
├───────────────────────────────────────────────────────┤
│ Layer 2: ENV PATH=/usr/local/sbin:... (metadata only) │
├───────────────────────────────────────────────────────┤
│ Layer 1: Alpine base image (5MB)                      │
└───────────────────────────────────────────────────────┘

Total: ~20MB (the 5MB Alpine base layer is shared with other alpine-based images)
```
Viewing Image Layers
```bash
# See layers
docker history nginx:alpine

# Detailed layer info
docker inspect nginx:alpine | jq '.[0].RootFS.Layers'

# Layer storage location (root-only directory)
sudo ls /var/lib/docker/overlay2/
```
Layer Sharing
```
Container 1 (nginx:alpine)     Container 2 (nginx:alpine)
┌─────────────────────────┐    ┌─────────────────────────┐
│ Container Layer (RW)    │    │ Container Layer (RW)    │
│ /var/log/nginx/access.. │    │ /var/cache/nginx/...    │
└────────────┬────────────┘    └────────────┬────────────┘
             │                              │
             └──────────────┬───────────────┘
                            ▼
                   SHARED (one copy!)
               ┌─────────────────────────┐
               │ nginx:alpine layers     │
               │ (read-only)             │
               └─────────────────────────┘
```
100 containers from nginx:alpine = 1 copy of the image + 100 thin container layers
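On a Docker host using the overlay2 driver, this sharing is done by reference. Per Docker's overlay2 documentation, each layer directory under `/var/lib/docker/overlay2/` has a `diff/` directory, and every layer above the base has a `lower` file listing its parents as colon-separated `l/<short-id>` entries, where each `l/<short-id>` is a symlink to `../<layer-id>/diff`. The hypothetical `layer_chain` helper below is a sketch under those assumptions: it follows the links to print a layer and every layer it sits on, top first.

```bash
# layer_chain OVERLAY2_ROOT LAYER_ID
# Hypothetical helper: prints LAYER_ID, then the IDs of its lower layers
# (top first), using overlay2's on-disk metadata:
#   ROOT/<id>/lower   "l/<short>:l/<short>..." (absent on the base layer)
#   ROOT/l/<short>    symlink to "../<id>/diff"
layer_chain() {
  local root=$1 id=$2 short target
  printf '%s\n' "$id"
  [ -f "$root/$id/lower" ] || return 0   # base layer has no parents
  # One "l/<short>" entry per line; the file may lack a trailing newline.
  tr ':' '\n' < "$root/$id/lower" | while IFS= read -r short || [ -n "$short" ]; do
    [ -n "$short" ] || continue
    target=$(readlink "$root/$short")    # e.g. ../<id>/diff
    target=${target%/diff}
    printf '%s\n' "${target##*/}"
  done
}
```

The overlay2 directory is root-only, so run the function from a root shell (for example after `sudo -i`). Pass it the directory name just above `diff` in the path printed by `docker image inspect --format '{{.GraphDriver.Data.UpperDir}}' nginx:alpine`. Chains from different images and containers that end in the same base directory share that single on-disk copy.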
Copy-on-Write (COW)
How COW Works
```
COPY-ON-WRITE

BEFORE MODIFICATION:
┌───────────────┐
│ Upper (empty) │
├───────────────┤
│ Lower         │
│ nginx.conf    │  ← Container reads from here
└───────────────┘

DURING MODIFICATION:
1. Copy nginx.conf from lower to upper
2. Modify the copy in upper

AFTER MODIFICATION:
┌───────────────┐
│ Upper         │
│ nginx.conf    │  ← Container now reads modified version
├───────────────┤
│ Lower         │
│ nginx.conf    │  ← Still exists, unchanged
└───────────────┘

The lower layer is NEVER modified (other containers use it!)
```
COW Performance Implications
| Operation | Performance |
|---|---|
| Reading small file | Fast (direct read) |
| Reading large file | Fast (direct read) |
| Writing new small file | Fast (write to upper) |
| Modifying small file | Medium (copy + write) |
| Modifying large file | SLOW (full copy + write) |
| Modifying file frequently | Can be slow (consider volume) |
Best Practice: For frequently modified files, use volumes instead of the container layer.
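The copy-up step from How COW Works can be mimicked with ordinary directories. This hypothetical `cow_append` helper is an illustration, not what the kernel literally runs. The first time a file is written, it copies the whole file from the highest lower layer that has it into the upper directory, then appends to that copy and leaves every lower layer untouched:

```bash
# cow_append REL_PATH TEXT UPPER LOWER...
# Hypothetical helper: mimics an OverlayFS write to REL_PATH. If the file
# only exists in a lower layer, the WHOLE file is copied up first; the
# write then lands on the upper copy. A path found in no layer is simply
# created in UPPER (the "new file" case).
cow_append() {
  local rel=$1 text=$2 upper=$3 layer
  shift 3
  mkdir -p "$(dirname "$upper/$rel")"         # parent dirs appear in upper too
  if [ ! -e "$upper/$rel" ]; then
    for layer in "$@"; do                     # lower layers, highest first
      if [ -e "$layer/$rel" ]; then
        cp -p "$layer/$rel" "$upper/$rel"     # copy-up: the WHOLE file
        break
      fi
    done
  fi
  printf '%s\n' "$text" >> "$upper/$rel"      # the write hits only the copy
}
```

Because the copy is of the entire file, appending one line to a 2GB file in a lower layer grows the upper layer by about 2GB, which is why the table rates modifying large files as slow.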
Dockerfile Layer Optimization
Bad: Creates Many Large Layers
```dockerfile
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
# Too late! Previous layers already contain the package lists
RUN rm -rf /var/lib/apt/lists/*
```
Each RUN creates a layer. The rm in the last layer doesn’t reduce image size: the files still exist in earlier layers, and the new layer only adds whiteouts that hide them.
Good: Single Optimized Layer
```dockerfile
FROM ubuntu:22.04
# Same layer, so the package lists never end up in a committed layer
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
```
Layer Ordering Matters
Section titled “Layer Ordering Matters”# BAD: Copy code before installing dependencies# Every code change invalidates pip install layerFROM python:3.11COPY . /app # Changes frequentlyRUN pip install -r /app/requirements.txt # Reinstalled every time!
# GOOD: Install dependencies firstFROM python:3.11COPY requirements.txt /app/ # Changes rarelyRUN pip install -r /app/requirements.txt # Cached!COPY . /app # Only this layer rebuilds.dockerignore
```
.git
node_modules
**/__pycache__
**/*.pyc
.env
**/*.log
```
Storage Drivers
Available Drivers
| Driver | Used By | Backing Filesystem |
|---|---|---|
| overlay2 | Default | xfs (with ftype=1), ext4 |
| btrfs | Some systems | btrfs |
| zfs | Some systems | zfs |
| devicemapper | Legacy RHEL (deprecated) | direct-lvm |
| vfs | Testing only | Any |
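Before choosing or debugging a driver, it can help to confirm that the kernel offers OverlayFS and to see which filesystem backs Docker's data directory. A minimal sketch, assuming the default `/var/lib/docker` location and GNU coreutils `stat` (`storage_check` is a hypothetical helper name):

```bash
# storage_check [DIR]
# Hypothetical helper: reports whether the running kernel lists OverlayFS
# and which filesystem backs DIR (default: /var/lib/docker, else /).
storage_check() {
  local dir=${1:-/var/lib/docker}
  [ -d "$dir" ] || dir=/
  if grep -qw overlay /proc/filesystems 2>/dev/null; then
    echo "overlay: available"
  else
    # Not listed can simply mean the module is not loaded yet.
    echo "overlay: not listed (try: sudo modprobe overlay)"
  fi
  # GNU stat -f prints the filesystem type name for the path.
  echo "backing fs for $dir: $(stat -f -c %T "$dir")"
}
```

GNU `stat` reports ext4 as `ext2/ext3`, because the ext family shares one type name there. On xfs, overlay2 additionally needs `ftype=1`, which `xfs_info` shows.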
Check Your Driver
```bash
# Docker
docker info | grep "Storage Driver"

# containerd (prints the effective config, including defaults)
containerd config dump | grep snapshotter

# Podman
podman info | grep graphDriverName
```
Storage Location
```bash
# Docker layer storage (root-only directory)
sudo ls /var/lib/docker/overlay2/

# Each directory is a layer:
#   diff/   contains the actual layer contents
#   merged/ is the union view (for running containers)
#   work/   is the overlay work directory
# l/ at the top level holds shortened symlinks to each layer's diff/,
# which keeps the overlay mount options short
```
Troubleshooting Storage
Container Using Too Much Space
```bash
# Check container sizes
docker ps -s

# SIZE column: the first number is the writable layer;
# "virtual" is the image plus the writable layer

# Find large files in the container
# (quote the glob so the container's shell expands it, not the host's)
docker exec container-id sh -c 'du -sh /* 2>/dev/null' | sort -h | tail -10

# Check what's in the writable layer
docker diff container-id
# A = Added
# C = Changed
# D = Deleted
```
Image Layer Analysis
```bash
# See layer sizes
docker history --no-trunc nginx:alpine

# Use the dive tool for detailed analysis
# https://github.com/wagoodman/dive
dive nginx:alpine
```
Disk Full Issues
```bash
# Docker disk usage
docker system df

# Detailed breakdown
docker system df -v

# Clean up
docker system prune      # Remove unused data
docker system prune -a   # Also remove unused images
docker builder prune     # Clear build cache
```
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Multiple RUN commands | Bloated image | Combine into single RUN |
| rm in separate layer | Files still in earlier layer | Delete in same layer |
| Wrong COPY order | Cache invalidation | Copy dependencies first |
| Writing to container layer | Slow, data lost when the container is removed | Use volumes |
| Not using .dockerignore | Large context, slow builds | Exclude unnecessary files |
| Forgetting layer caching | Slow rebuilds | Order Dockerfile by change frequency |
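The two caching rows follow from how the build cache chains layers: a step can only be reused if every step before it was reused. The sketch below is a deliberately simplified model, not BuildKit's real cache algorithm. Each step's key is a SHA-256 of the previous key, the instruction text, and (for COPY) the checksum of the copied file. The function name `cache_keys` and its `INSTRUCTION|FILE` input format are made up for illustration:

```bash
# cache_keys: reads one build step per line as "INSTRUCTION|FILE" (FILE is
# empty for steps that copy nothing) and prints "KEY INSTRUCTION" per step.
# Hypothetical, simplified model: a step's key hashes its parent's key, its
# instruction text and the checksum of any copied file, so changing one
# step changes the key of every step after it.
cache_keys() {
  local parent="" instr file content key
  while IFS='|' read -r instr file; do
    content=""
    if [ -n "$file" ]; then
      content=$(sha256sum < "$file")   # COPY: contents matter, not just the name
    fi
    key=$(printf '%s\n%s\n%s\n' "$parent" "$instr" "$content" | sha256sum | cut -c1-12)
    printf '%s %s\n' "$key" "$instr"
    parent=$key
  done
}
```

Feeding it the GOOD order (`FROM`, `COPY requirements.txt`, `RUN pip install`, `COPY . /app`) and then editing only the application file changes just the last key. In the BAD order, the `RUN pip install` key changes on every code edit, because its parent is now the `COPY . /app` step.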
Question 1
Scenario: You are deploying a microservices architecture and you spin up 100 replica pods of your Node.js application using the exact same container image. Your infrastructure team is concerned about storage capacity, assuming that 100 containers of 500MB each will consume 50GB in total. Why is their assumption incorrect and what actually happens at the storage level?
Answer:
Their assumption is incorrect because container runtimes utilize union filesystems to share read-only layers across all instances. When you start 100 containers from the same image, the runtime only keeps a single 500MB copy of the base image on disk. Each of the 100 containers simply gets a thin, empty read-write layer placed on top of those shared read-only layers. Therefore, the total storage consumed initially will be just slightly over 500MB, saving massive amounts of disk space.
Question 2
Scenario: A developer execs into a running container to troubleshoot an issue and uses vim to append a single line to a 2GB log file located in a lower image layer. Suddenly, the monitoring system alerts that the container’s disk usage has spiked by 2GB. What specific mechanism caused this storage spike, and what exactly happened under the hood?
Answer:
The storage spike was caused by the copy-on-write (COW) mechanism inherent to union filesystems. Because the lower layers of a container image are strictly read-only, the runtime cannot modify the 2GB log file in place. Instead, the moment the developer saves the file, the entire 2GB file is copied up from the read-only layer into the container’s ephemeral read-write layer. The modification is then applied to this new copy, resulting in an additional 2GB of storage being consumed on the host disk.
Question 3
Scenario: A junior engineer submits a pull request with the following Dockerfile snippet, claiming they have optimized the image size by cleaning up the apt cache. However, the CI/CD pipeline shows the image size hasn’t decreased at all. Why did this optimization fail, and how must the syntax change to actually reduce the image size?
```dockerfile
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
```
Answer:
The optimization failed because each RUN instruction in a Dockerfile creates and commits a brand-new, immutable filesystem layer. By the time the third RUN instruction executes the rm command, the package cache has already been permanently baked into the layers created by the first two instructions. The rm command simply creates a “whiteout” file in the third layer to hide the cache, but the data still exists in the underlying layers and consumes space. To fix this, all three commands must be chained together using && within a single RUN instruction so the cache is deleted before the layer is committed.
Question 4
Scenario: Your team has deployed a stateful database inside a container without configuring any external volume mounts. After a routine node reboot, the container restarts, but the database is completely empty and all customer records are gone. Based on how union filesystems manage the container lifecycle, why did this data loss occur?
Answer:
The data loss occurred because the container’s read-write layer is ephemeral and tied to that specific container instance. When the container is removed, the runtime discards the writable upper layer where all the database changes were stored. After a node reboot, orchestrators such as Kubernetes do not revive the old container; they create a brand-new container instance with a fresh, empty read-write layer on top of the original image. (A plain docker restart of the same container keeps its writable layer, but that layer still disappears the moment the container is removed.) To persist data beyond a container’s lifecycle, mount a volume, which lives outside the union filesystem.
Question 5
Scenario: You are designing a high-throughput application that constantly updates millions of small temporary files per second. When running this app locally on your laptop, it performs fine, but inside a container without volumes, the disk I/O latency becomes unacceptably high. Why does the union filesystem cause a performance bottleneck in this specific write-heavy scenario?
Answer:
The performance bottleneck comes from the extra work the overlay driver does on every file operation. Each lookup may have to check several layers before the file is found, every create or delete in the merged view becomes metadata work in the upper layer (including whiteouts for deleted lower-layer files), and the first write to any file that lives in a lower layer triggers a copy-up of the whole file. Repeated millions of times per second, this metadata and copy overhead overwhelms the storage driver compared to native filesystem speeds. For extremely high-throughput or write-heavy workloads, you must use volume mounts, which write directly to the host filesystem and bypass the overlay driver entirely.
Hands-On Exercise
Exploring Union Filesystems
Objective: Understand layers, COW, and container storage.
Environment: Linux with Docker installed
Part 1: Create a Manual Overlay
Section titled “Part 1: Create a Manual Overlay”# 1. Create directoriesmkdir -p /tmp/overlay-test/{lower,upper,work,merged}
# 2. Add content to lowerecho "original file" > /tmp/overlay-test/lower/readme.txtmkdir /tmp/overlay-test/lower/subdirecho "nested file" > /tmp/overlay-test/lower/subdir/nested.txt
# 3. Mount overlaysudo mount -t overlay overlay \ -o lowerdir=/tmp/overlay-test/lower,upperdir=/tmp/overlay-test/upper,workdir=/tmp/overlay-test/work \ /tmp/overlay-test/merged
# 4. Explorels -la /tmp/overlay-test/merged/
# 5. Create new fileecho "new content" > /tmp/overlay-test/merged/newfile.txt
# 6. Check upper layerls /tmp/overlay-test/upper/# newfile.txt is here!
# 7. Modify existing fileecho "modified" > /tmp/overlay-test/merged/readme.txtls /tmp/overlay-test/upper/# readme.txt copied here (COW)
# 8. Delete a filerm /tmp/overlay-test/merged/subdir/nested.txtls -la /tmp/overlay-test/upper/subdir/# Whiteout file created
# 9. Cleanupsudo umount /tmp/overlay-test/mergedrm -rf /tmp/overlay-testPart 2: Examine Docker Layers
```bash
# 1. Pull an image
docker pull alpine:3.18

# 2. View layers
docker history alpine:3.18

# 3. Inspect layer IDs
docker inspect alpine:3.18 | jq '.[0].RootFS.Layers'

# 4. Find storage location
docker info | grep "Docker Root Dir"

# 5. List overlay directories
sudo ls /var/lib/docker/overlay2/ | head -10
```
Part 3: Container Layer in Action
```bash
# 1. Start container
docker run -d --name test-overlay alpine sleep 3600

# 2. Check initial size
docker ps -s --filter name=test-overlay

# 3. Write to container
docker exec test-overlay sh -c 'dd if=/dev/zero of=/bigfile bs=1M count=50'

# 4. Check size again
docker ps -s --filter name=test-overlay
# SIZE should show ~50MB now

# 5. See what changed
docker diff test-overlay
# Shows: A /bigfile

# 6. Find the container layer (overlay2 directory names are not container IDs,
#    so ask Docker for the upperdir directly)
UPPER_DIR=$(docker inspect test-overlay --format '{{.GraphDriver.Data.UpperDir}}')
echo "Layer is at: $UPPER_DIR"
sudo ls -lh "$UPPER_DIR"

# 7. Cleanup
docker rm -f test-overlay
```
Part 4: Dockerfile Layer Optimization
```bash
# 1. Create bad Dockerfile
mkdir /tmp/dockerfile-test && cd /tmp/dockerfile-test
cat > Dockerfile.bad << 'EOF'
FROM alpine:3.18
RUN apk update
RUN apk add curl
RUN rm -rf /var/cache/apk/*
EOF

# 2. Build and check size
docker build -f Dockerfile.bad -t bad-layers .
docker images bad-layers

# 3. Create good Dockerfile
cat > Dockerfile.good << 'EOF'
FROM alpine:3.18
RUN apk update && apk add curl && rm -rf /var/cache/apk/*
EOF

# 4. Build and compare
docker build -f Dockerfile.good -t good-layers .
docker images | grep layers
# good-layers should be smaller

# 5. Compare layers
docker history bad-layers
docker history good-layers

# 6. Cleanup
docker rmi bad-layers good-layers
rm -rf /tmp/dockerfile-test
```
Success Criteria
- Created manual overlay mount and understood COW
- Examined Docker image layers
- Observed container layer growth
- Compared optimized vs unoptimized Dockerfiles
Key Takeaways
- Union filesystems merge layers: multiple read-only layers plus one read-write layer
- Layer sharing is the magic: thousands of containers can share the same base layers
- Copy-on-write for efficiency: files are only copied up when modified
- Dockerfile order matters: put frequently changing content last for cache efficiency
- Container layer is ephemeral: use volumes for persistent data
What’s Next?
Congratulations! You’ve completed Container Primitives. You now understand that containers are:
- Namespaces (isolation)
- Cgroups (limits)
- Capabilities/LSMs (security)
- Union filesystems (efficient storage)
Next, move to Section 3: Networking to learn how Linux networking underpins container and Kubernetes networking.