Module 1.2: Docker Fundamentals
Complexity: [MEDIUM] - Hands-on practice required
Time to Complete: 45-50 minutes
Prerequisites: Module 1 (What Are Containers?)
What You’ll Be Able to Do
After this module, you will be able to:
- Build a Docker image from a Dockerfile and explain what each instruction does
- Run containers with port mapping, environment variables, and volume mounts
- Debug a failing container by reading logs and exec-ing into it
- Explain the image layer system and why layer ordering matters for build speed
Why This Module Matters
Docker is the most common tool for building container images. Even though Kubernetes no longer uses Docker as its runtime, Docker remains the standard for:
- Building container images
- Local development
- Testing and debugging
You need “just enough” Docker to understand Kubernetes—not Docker mastery.
Installing Docker
```
# Install Docker Desktop
# Download from https://docker.com/products/docker-desktop
# Or use Homebrew:
brew install --cask docker
```

Linux (Ubuntu/Debian)

```
# Official installation
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Log out and back in for group changes
```

Verify Installation

```
docker --version
# Docker version 29.x.x, build xxxxx

docker run hello-world
# Should show "Hello from Docker!" message
```

Your First Container
Pause and predict: When you run the command below, what happens if port 8080 on your host machine is already in use by another application? The command will fail because Docker cannot bind to an already occupied host port. You would need to choose a different host port, like `-p 8081:80`.
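This failure mode is ordinary socket behavior, not anything Docker-specific. A minimal Python sketch (no Docker involved) reproduces the same "address already in use" error that blocks `docker run -p 8080:80` when the host port is taken:

```python
import socket

def try_bind(port):
    """Attempt to listen on a host port, the way `docker run -p PORT:80` must."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return s          # success: we own the port
    except OSError:
        s.close()
        return None       # port already in use -- choose another (e.g. 8081)

# Grab a free port first (port 0 asks the OS to pick one)...
holder = try_bind(0)
busy_port = holder.getsockname()[1]

# ...then a second bind to that same port fails, just like Docker
# refusing to map -p 8080:80 when 8080 is occupied.
print(try_bind(busy_port) is None)   # True
holder.close()
```

Switching the host side of the mapping (`-p 8081:80`) works because only the host port must be free; the container port 80 lives in the container's own network namespace.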
```
# Run nginx (a web server)
docker run -d --name my-nginx -p 8080:80 nginx

# What happened:
# - Pulled nginx image from Docker Hub
# - Created a container from the image
# - Started the container in detached mode (-d)
# - Named the container "my-nginx" (--name)
# - Mapped port 8080 (host) to port 80 (container)

# Test it
curl http://localhost:8080
# Returns nginx welcome page HTML

# View running containers
docker ps
# CONTAINER ID   IMAGE   COMMAND                  STATUS          PORTS                  NAMES
# a1b2c3d4e5f6   nginx   "/docker-entrypoint.…"   Up 10 seconds   0.0.0.0:8080->80/tcp   my-nginx

# Stop the container
docker stop my-nginx

# Remove the container
docker rm my-nginx
```

Essential Docker Commands
Container Lifecycle
```
# Run a container
docker run [OPTIONS] IMAGE [COMMAND]

# Common options:
docker run -d nginx                              # Detached (background)
docker run -it ubuntu bash                       # Interactive terminal
docker run -p 8080:80 nginx                      # Port mapping
docker run -v /host/path:/container/path nginx   # Volume mount
docker run --name myapp nginx                    # Named container
docker run -e MY_VAR=value nginx                 # Environment variable
docker run --rm nginx                            # Remove when stopped

# Container management
docker ps                # List running containers
docker ps -a             # List all containers
docker stop CONTAINER    # Stop gracefully
docker kill CONTAINER    # Force stop
docker rm CONTAINER      # Remove stopped container
docker rm -f CONTAINER   # Force remove (stop + rm)
```

Inspecting Containers
Debugging mindset: When a container isn't behaving right, these three commands are your debugging toolkit, in this order: `docker logs` (what happened?), `docker exec -it ... bash` (let me look inside), `docker inspect` (show me the full configuration). This same pattern applies in Kubernetes later: `kubectl logs`, `kubectl exec`, `kubectl describe`.
```
# View logs
docker logs CONTAINER
docker logs -f CONTAINER           # Follow (tail)
docker logs --tail 100 CONTAINER   # Last 100 lines

# Execute command in running container
docker exec -it CONTAINER bash     # Interactive shell
docker exec CONTAINER ls /app      # Run command

# Inspect container details
docker inspect CONTAINER           # Full JSON details
docker stats                       # Resource usage
docker top CONTAINER               # Running processes
```

Image Management
Section titled “Image Management”# Pull imagesdocker pull nginxdocker pull nginx:1.25docker pull gcr.io/project/image:tag
# Note: Docker images are content-addressable. The SHA256 hash of an image is its true identifier. Tags are just human-readable aliases.
# List imagesdocker images
# Remove imagesdocker rmi nginxdocker image prune # Remove unused images
# Build images (we'll cover this next)docker build -t myapp:v1 .Building Container Images
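The content-addressable idea can be made concrete with a toy registry sketch (an illustration, not Docker's actual data structures): a digest is simply a SHA-256 hash over the image's bytes, so identical content always yields the same identifier, while tags are mutable pointers that can be re-aimed at new content.

```python
import hashlib

def digest(content: bytes) -> str:
    """Content-addressed ID: identical bytes always hash to the same digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

# A toy registry: digests are the true identifiers, tags are mutable aliases.
registry = {}   # digest -> content
tags = {}       # tag -> digest

blob_v1 = b"layer data for v1"
tags["myapp:latest"] = digest(blob_v1)
registry[tags["myapp:latest"]] = blob_v1

# Pushing new content and re-pointing the tag changes what `latest` means,
# but the old digest still identifies the old bytes exactly.
blob_v2 = b"layer data for v2"
old = tags["myapp:latest"]
tags["myapp:latest"] = digest(blob_v2)
registry[tags["myapp:latest"]] = blob_v2

print(old != tags["myapp:latest"])   # True: the tag moved
print(digest(blob_v1) == old)        # True: same bytes, same digest
```

This is why pinning a deployment to a digest (`image@sha256:...`) is stricter than pinning to a tag: the digest can never silently change meaning.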
The Dockerfile
A Dockerfile is a text file with instructions to build an image:
```
# Base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy dependency file
COPY requirements.txt .

# Install dependencies (use --no-cache-dir to save space by omitting downloaded wheels)
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port (documentation)
EXPOSE 8000

# Default command
CMD ["python", "app.py"]
```

Build and Run
```
# Build image
docker build -t myapp:v1 .

# Run container from image
docker run -d -p 8000:8000 myapp:v1
```

Dockerfile Instructions
| Instruction | Purpose |
|---|---|
| `FROM` | Base image to build upon |
| `WORKDIR` | Set working directory |
| `COPY` | Copy files from host to image |
| `ADD` | Like COPY but can extract archives and fetch URLs |
| `RUN` | Execute command during build |
| `ENV` | Set environment variable |
| `EXPOSE` | Document which port the app uses |
| `CMD` | Default command when container starts |
| `ENTRYPOINT` | Command that always runs (CMD becomes arguments) |
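The interplay between `ENTRYPOINT` and `CMD` in the last two rows can be sketched as a simple merge rule (a simplified model covering exec-form only; shell-form behaves differently): the container's final command line is ENTRYPOINT plus CMD, and arguments passed to `docker run` replace CMD while leaving ENTRYPOINT alone.

```python
def final_argv(entrypoint, cmd, run_args=None):
    """Merge ENTRYPOINT and CMD the way Docker does for exec-form
    instructions: runtime arguments override CMD, never ENTRYPOINT."""
    effective_cmd = run_args if run_args else cmd
    return (entrypoint or []) + (effective_cmd or [])

# ENTRYPOINT ["python"] + CMD ["app.py"]:
print(final_argv(["python"], ["app.py"]))                 # ['python', 'app.py']

# `docker run image other.py` replaces CMD, keeps ENTRYPOINT:
print(final_argv(["python"], ["app.py"], ["other.py"]))   # ['python', 'other.py']

# No ENTRYPOINT: CMD alone is the command.
print(final_argv([], ["nginx"]))                          # ['nginx']
```

This is why images meant to behave like a fixed executable use `ENTRYPOINT`, while images meant to offer a default-but-overridable command use `CMD`.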
Practical Example: Python Web App
Create a simple Python application:
app.py

```
from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def hello():
    name = os.getenv('NAME', 'World')
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```

requirements.txt

```
flask==3.0.0
```

War Story: The 5GB Docker Context
A developer once typed `docker build .` in their home directory instead of the project directory. Because they lacked a `.dockerignore` file, Docker dutifully attempted to copy their entire `Documents`, `Downloads`, and `Pictures` folders into the Docker build context. The build hung for 20 minutes before exhausting the machine's memory. Always use a `.dockerignore` file to exclude `.git`, `node_modules`, and local environment files, keeping your build context small and fast.
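The filtering a `.dockerignore` file performs can be approximated in a few lines (a rough sketch using glob matching; real `.dockerignore` syntax also supports `!` exceptions and `**` patterns): only files matching no ignore pattern are sent to the daemon as build context.

```python
from fnmatch import fnmatch

# Typical .dockerignore entries (illustrative patterns).
IGNORE = [".git/*", "node_modules/*", "*.env"]

def in_context(path):
    """True if the file would be shipped to the Docker daemon as build context."""
    return not any(fnmatch(path, pattern) for pattern in IGNORE)

files = ["app.py", "requirements.txt", ".git/HEAD",
         "node_modules/flask/__init__.py", "secrets.env"]
print([f for f in files if in_context(f)])   # ['app.py', 'requirements.txt']
```

A small context also means `COPY . .` sees fewer files, so fewer irrelevant changes bust the build cache.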
Dockerfile
```
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (better caching)
COPY requirements.txt .
# Use --no-cache-dir to prevent pip from storing downloaded packages, keeping the image small
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

EXPOSE 8000

CMD ["python", "app.py"]
```

Build and Run
```
# Build
docker build -t hello-flask:v1 .

# Run
docker run -d -p 8000:8000 -e NAME=Docker hello-flask:v1

# Test
curl http://localhost:8000
# Hello, Docker!

# Cleanup
docker rm -f $(docker ps -q --filter ancestor=hello-flask:v1)
```

Layer Caching
Stop and think: If you change one line of your Python code, do you want Docker to reinstall all your dependencies (which might take 2 minutes), or just copy the changed file (which takes 1 second)? The answer depends entirely on the ORDER of instructions in your Dockerfile. Read the BAD vs GOOD example below and see if you can spot why ordering matters.
Docker caches layers for faster builds. Order matters:
```
# BAD: Code changes invalidate dependency cache
FROM python:3.11-slim
WORKDIR /app
COPY . .                              # Any change busts cache
RUN pip install -r requirements.txt   # Reinstalls every time!
CMD ["python", "app.py"]
```

```
# GOOD: Dependencies cached separately
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .               # Only changes when deps change
RUN pip install -r requirements.txt   # Cached unless deps change
COPY . .                              # App changes don't bust pip cache
CMD ["python", "app.py"]
```

Docker Compose (Local Multi-Container)
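To see why ordering matters, here is a toy model of layer caching (an assumed simplification, not BuildKit's real cache): each layer's key hashes its instruction, the content it consumes, and its parent layer, so the first changed layer invalidates every layer after it.

```python
import hashlib

def build(instructions, cache):
    """Rebuild layers in order; a layer is reused only if its own key and
    every earlier layer's key are unchanged (keys chain via the parent)."""
    rebuilt, parent = [], ""
    for instr, content in instructions:
        key = hashlib.sha256((parent + instr + content).encode()).hexdigest()
        if key not in cache:
            cache.add(key)
            rebuilt.append(instr)
        parent = key
    return rebuilt

cache = set()

# GOOD ordering: requirements copied before the app code.
good = [("COPY requirements.txt .", "flask==3.0.0"),
        ("RUN pip install", "flask==3.0.0"),
        ("COPY . .", "print('v1')")]
build(good, cache)            # cold build: every layer runs

# Change only app code: just the final layer rebuilds.
good[2] = ("COPY . .", "print('v2')")
print(build(good, cache))     # ['COPY . .']

# BAD ordering: app code copied before pip install, so the same
# one-line edit also re-runs the dependency install.
bad = [("COPY . .", "print('v3')"),
       ("RUN pip install", "flask==3.0.0")]
print(build(bad, set()))      # both layers rebuild
```

The chained parent hash is the key detail: it encodes "everything above me", which is exactly why putting frequently changing files last keeps expensive layers cached.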
Stop and think: If you have a web application and a database, why is it a bad idea to put them both in the same Dockerfile and container? How would you scale the web application independently of the database if they were coupled together?
For local development with multiple services:
compose.yaml
```
# version: '3.8'  # Obsolete in modern Compose, left for backward compatibility
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://db:5432/mydb
    depends_on:
      - db

  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=mydb
      - POSTGRES_PASSWORD=secret
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:
```

```
# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Stop all services
docker compose down

# Stop and remove volumes
docker compose down -v
```

Note: Docker Compose is for local development. Kubernetes replaces it for production.
Visualization: Docker Workflow
```mermaid
flowchart TD
    A[Dockerfile] -->|1. WRITE| B[docker build]
    B -->|2. BUILD| C[(Image: myapp:v1)]
    C -->|3. PUSH optional| D[docker push]
    D --> E[(Registry: Docker Hub, ECR)]
    C -->|4. RUN| F[docker run]
    F --> G{{Container: running instance}}
```

Best Practices
Section titled “Best Practices”Base Image Comparison
Choosing the right base image is critical for security and size:
| Base Image Type | Example | Size | Security Surface | Best For |
|---|---|---|---|---|
| Full OS | ubuntu:22.04 | ~70MB | Large (contains full utilities) | Complex legacy apps needing many system dependencies |
| Slim | python:3.11-slim | ~40MB | Medium (stripped down Debian) | Most standard applications, good balance of size and compatibility |
| Alpine | python:3.11-alpine | ~15MB | Small (uses musl instead of glibc) | Ultra-small images, but can cause compilation issues with C-extensions |
| Distroless | gcr.io/distroless/static | ~2MB | Minimal (no shell or package manager) | Production deployments of compiled languages (Go, Rust) |
Image Size
```
# BAD: Full OS, huge image
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . .
RUN pip3 install -r requirements.txt
```

```
# GOOD: Slim base, smaller image
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```

```
# BETTER: Alpine (tiny base)
FROM python:3.11-alpine
# Note: Alpine uses apk, not apt, and musl instead of glibc
```

Security
```
# BAD: Running as root
FROM python:3.11-slim
COPY . .
CMD ["python", "app.py"]   # Runs as root!
```

```
# GOOD: Non-root user
FROM python:3.11-slim
RUN useradd -m appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
USER appuser
CMD ["python", "app.py"]
```

One Process Per Container
```
# BAD: Multiple services in one container
Container: nginx + python app + redis

# GOOD: Separate containers
Container 1: nginx
Container 2: python app
Container 3: redis

# (Use Docker Compose or Kubernetes to orchestrate)
```

Did You Know?
- Multi-stage builds reduce image size dramatically. Build in one stage, copy only artifacts to the final stage.
- The `latest` tag is not special. Docker doesn't automatically update it. It's just a convention that means "whatever was last pushed."
- Docker Desktop is not Docker. Docker (the tool) is free. Docker Desktop (the GUI/VM for Mac/Windows) has licensing requirements for businesses.
- BuildKit is the default builder. Since Docker 23.0, the BuildKit engine is the default, offering parallel execution, better caching, and improved performance over the legacy builder.
- Docker uses containerd under the hood. Docker Engine acts as a client that delegates container execution to `containerd` and `runc` (the industry-standard OCI runtimes).
Common Mistakes
| Mistake | Why It Hurts | Solution |
|---|---|---|
| Using `latest` tag | Unpredictable versions | Use specific tags (`:1.25.3`) |
| Running as root | Security risk | Add USER instruction |
| Ignoring layer order | Slow rebuilds | Put changing things last |
| Copying everything | Large images, secrets leaked | Use .dockerignore |
| Not cleaning up | Disk fills up | docker system prune regularly |
| Hardcoding secrets | Passwords leaked in image | Use build secrets or inject at runtime |
| Using ADD instead of COPY | Unexpected archive extraction | Default to COPY unless extracting a tarball |
| Ignoring multi-stage builds | Compilers left in production image | Separate build tools from runtime environment |
Exercise: Spot the Bugs
Stop and think: Look at the following Dockerfile. There are at least three major anti-patterns or bugs. Can you identify them before reading the answers?
```
FROM node:18
COPY . .
RUN npm install
CMD npm start
```

View the answers

- Missing `.dockerignore` or selective COPY: Copying `.` to `.` before running `npm install` means the local `node_modules` folder might be copied over, which is bloated and platform-specific. Also, any code change invalidates the `npm install` cache.
- Ignoring layer caching: It should `COPY package*.json ./` first, then `RUN npm install`, and then `COPY . .`. This ensures dependencies are only reinstalled when the package.json changes.
- Using a full OS base image: `node:18` is nearly 1GB. It should use `node:18-slim` or `node:18-alpine` to reduce the attack surface and image pull time.
- Running as root: There is no `USER` specified, meaning the Node process runs as the root user inside the container, which is a security risk.
- Scenario: You are building a Node.js application. Every time you change a single line of CSS, Docker takes 3 minutes to rebuild the image because it reinstalls all NPM packages. How can you rewrite your Dockerfile to fix this build time issue?

  Answer: Separate the copying of your dependency files from your source code. First, `COPY package*.json ./` and `RUN npm install`. Then use a separate `COPY . .` command for the rest of your source code. This leverages Docker's layer cache: Docker will only rerun the `npm install` layer if the package files change, reducing your CSS code changes to a near-instantaneous build.

- Scenario: You have a database container running locally, but when you restart it, all your saved users disappear. Which Docker run flag are you missing, and how does it solve the problem?

  Answer: You are missing a volume mount, specifically the `-v` or `--mount` flag (e.g., `-v db_data:/var/lib/postgresql/data`). By default, containers are ephemeral, meaning any data written to the container's writable layer is destroyed when the container is removed. A volume persists data outside the container's lifecycle, directly on the host machine, so your database records survive container restarts or replacements without being permanently lost.

- Scenario: Your web application container starts successfully, but when you visit `localhost:8080`, it returns a generic 500 Internal Server Error instead of your application content. What are the first two Docker commands you should run to diagnose this issue, and what are you looking for?

  Answer: The very first command should be `docker logs <container_id>` to check the application's standard output and standard error streams for Python/Node stack traces or crash reports. If the logs are inconclusive, the next step is `docker exec -it <container_id> sh` to get an interactive shell inside the running container. Once inside, you can manually inspect configuration files, verify environment variables, or run local curl commands to see if the internal application process is actually listening on the expected port. This two-step process isolates whether the issue is a crash during startup or a runtime misconfiguration.

- Scenario: A developer attempts to reduce an image size by writing `RUN rm -rf /tmp/large-data` in a step immediately following the `RUN curl -o /tmp/large-data https://example.com/file` step. However, the final image size does not decrease. Why did the image size remain exactly the same despite deleting the file?

  Answer: Docker images are built using a union file system where each instruction in the Dockerfile (like `RUN`, `COPY`, `ADD`) creates a new, immutable layer. When the developer downloaded the file, it was permanently baked into that specific layer. The subsequent `RUN rm` command creates a *new* layer that marks the file as deleted, hiding it from the final container view, but the underlying data still exists in the previous layer, consuming disk space. To actually save space, the download and deletion must happen in the exact same `RUN` instruction, connected by `&&`.

- Scenario: You are deploying a compiled Go binary to production. Your security team mandates that the container image must contain absolutely zero shell utilities (like `bash` or `curl`) to minimize the attack surface in case of a breach. Which base image type (Full OS, Slim, Alpine, or Distroless) is the correct choice, and why?

  Answer: The correct choice is a Distroless base image (such as `gcr.io/distroless/static`). Distroless images contain only your application and its runtime dependencies, stripping away package managers, shells, and standard Unix utilities entirely. Alpine is small but still includes a package manager (`apk`) and a shell (`sh`), which a malicious actor could use if they gained remote code execution. By using Distroless, you guarantee that even if the application is compromised, the attacker has no built-in tools to pivot or download malware.

- Scenario: You have a Docker Compose file defining a `frontend` service and a `backend` service. Your frontend code is trying to fetch data using `http://localhost:5000/api`, but the connection is refused, even though both containers are running. Why is `localhost` failing here, and what should the frontend use instead?

  Answer: In the context of a container, `localhost` refers exclusively to the container's own internal network interface, not the host machine or other containers. The frontend container is looking for a backend process running inside itself, which doesn't exist. Instead, the frontend should use `http://backend:5000/api` because Docker Compose automatically sets up an internal DNS network where service names resolve to the correct container IP addresses. This internal DNS makes service-to-service communication seamless without needing to hardcode IP addresses.

- Scenario: A container running a batch processing script crashes with an Out of Memory error and is now in an `Exited` state. You want to run `docker exec -it <container_id> bash` to inspect the temporary files it left behind. What happens when you run this command, and what is the proper way to access those files?

  Answer: The `docker exec` command will fail because it can only spawn new processes inside a container that is currently in a `Running` state. You cannot execute a shell inside a stopped container. To access the files, use `docker cp <container_id>:/path/to/files ./local-dir` to copy them out to your host machine. Alternatively, you could commit the stopped container to a new image using `docker commit` and then `docker run` a new interactive container from that image to explore its state.

- Scenario: You have a local `archive.tar.gz` file containing static assets that need to be placed directly into the `/var/www/html` directory of your container image. Should you use `ADD archive.tar.gz /var/www/html/` or `COPY archive.tar.gz /var/www/html/`, and what is the functional difference between the two?

  Answer: You should use `ADD` in this specific scenario because it has a built-in auto-extraction feature for local tar archives. `ADD archive.tar.gz /var/www/html/` will automatically unpack the contents of the archive directly into the target directory during the build. If you used `COPY`, it would simply place the compressed `.tar.gz` file as-is into the directory, forcing you to write an additional `RUN tar -xzf` command (and require the `tar` utility in the image) to extract it. However, for standard file copying, `COPY` is generally preferred because its behavior is more transparent and predictable.
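The union-filesystem scenario above (`RUN rm` not shrinking the image) can be modeled directly: layers are immutable, a delete only records a "whiteout" entry in a new layer, and the total image size is the sum of every layer. A toy sketch under those assumptions:

```python
def image_size(layers):
    """Total image size sums the data in every layer; whiteouts hide
    files from the merged view but never reclaim earlier layers' bytes."""
    return sum(sum(layer["files"].values()) for layer in layers)

def merged_view(layers):
    """What the running container sees after stacking all layers:
    later files override earlier ones, whiteouts remove them from view."""
    view = {}
    for layer in layers:
        view.update(layer["files"])
        for path in layer["whiteouts"]:
            view.pop(path, None)
    return view

layers = [
    {"files": {"/bin/sh": 100}, "whiteouts": []},
    {"files": {"/tmp/large-data": 5000}, "whiteouts": []},   # RUN curl -o /tmp/large-data ...
    {"files": {}, "whiteouts": ["/tmp/large-data"]},         # RUN rm -rf /tmp/large-data
]

print(sorted(merged_view(layers)))   # ['/bin/sh']  -- the file is hidden from view
print(image_size(layers))            # 5100        -- but its bytes remain on disk
```

This is why the fix is `RUN curl ... && rm ...` in a single instruction: the bytes then never survive into any committed layer.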
Hands-On Challenge
Level 1: The Basics (Build and Run)
Task: Build and run a custom web server image.
- Create a directory and inside it, create an `index.html` file with the text "Hello KubeDojo!".
- Create a `Dockerfile` that uses `nginx:alpine` as the base image.
- Copy your `index.html` to `/usr/share/nginx/html/index.html` inside the image.
- Build the image as `dojo-web:v1`.
- Run the container in detached mode, mapping port 8080 on your host to port 80 in the container.
- Verify by running `curl http://localhost:8080`.
View Solution
```
mkdir dojo-web && cd dojo-web
echo "Hello KubeDojo!" > index.html

cat <<EOF > Dockerfile
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
EOF

docker build -t dojo-web:v1 .
docker run -d --name dojo-web-container -p 8080:80 dojo-web:v1

# Checkpoint verification
curl http://localhost:8080
```

Level 2: Intermediate (Environment and Logs)
Task: Debug a failing container using logs and environment variables.
- Run `docker run -d --name db postgres:15`.
- Check its status with `docker ps -a`. Notice it exited immediately.
- Check why it failed using `docker logs db`. (Hint: it complains about a missing password.)
- Remove the failed container.
- Run it again, this time passing the required environment variable `POSTGRES_PASSWORD=secret`.
- Verify it stays running.
View Solution
```
docker run -d --name db postgres:15

# Checkpoint: verify it exited
docker ps -a

# View logs to find the error
docker logs db

# Cleanup
docker rm db

# Run with correct environment variable
docker run -d --name db -e POSTGRES_PASSWORD=secret postgres:15

# Checkpoint: verify it stays running
docker ps
```

Level 3: Advanced (Optimization and Exec)
Task: Optimize a build and explore the running container.
- Write a Dockerfile that installs `curl` in an `ubuntu` base image.
- Ensure you use `apt-get update && apt-get install -y curl` in a single `RUN` instruction. Explain why this is important for layer caching.
- Build and run the container interactively (`-it`) with the `bash` command.
- Inside the container, prove you are running as root by typing `whoami`.
- Type `exit` to leave. Notice the container stops. How would you keep it running in the background and execute into it later?
View Solution
```
cat <<EOF > Dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl
EOF

docker build -t my-ubuntu-curl .

# Run interactively
docker run -it --name test-curl my-ubuntu-curl bash
# Inside container:
# whoami
# exit

# To run in background and exec later:
docker run -d --name bg-curl my-ubuntu-curl sleep infinity
docker exec -it bg-curl bash
```

Explanation for Layer Caching: Combining `apt-get update` and `apt-get install` ensures that the package index is never cached independently of the packages being installed. If they were separate layers, adding a new package to the install list later would reuse a stale, cached update layer, potentially leading to "package not found" errors.
Summary
Essential Docker for Kubernetes:
Commands:
- `docker run` - Start containers
- `docker ps` - List containers
- `docker logs` - View output
- `docker exec` - Run commands in containers
- `docker build` - Create images
- `docker push` / `docker pull` - Share images
Dockerfile basics:
- `FROM` - Base image
- `COPY` - Add files
- `RUN` - Execute during build
- `CMD` - Default runtime command
Best practices:
- Use specific image tags
- Optimize layer caching
- Run as non-root
- One process per container
Next Module
Module 1.3: What Is Kubernetes? - High-level overview of container orchestration.