
Module 1.2: Docker Fundamentals

Complexity: [MEDIUM] - Hands-on practice required

Time to Complete: 45-50 minutes

Prerequisites: Module 1 (What Are Containers?)


After this module, you will be able to:

  • Build a Docker image from a Dockerfile and explain what each instruction does
  • Run containers with port mapping, environment variables, and volume mounts
  • Debug a failing container by reading logs and exec-ing into it
  • Explain the image layer system and why layer ordering matters for build speed

Docker is the most common tool for building container images. Even though Kubernetes doesn’t use Docker as its runtime anymore, Docker remains the standard for:

  • Building container images
  • Local development
  • Testing and debugging

You need “just enough” Docker to understand Kubernetes—not Docker mastery.


Terminal window
# Install Docker Desktop
# Download from https://docker.com/products/docker-desktop
# Or use Homebrew:
brew install --cask docker
Terminal window
# Official installation
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Log out and back in for group changes
Terminal window
docker --version
# Docker version 2x.x.x, build xxxxx (exact version varies)
docker run hello-world
# Should show "Hello from Docker!" message

Pause and predict: When you run the command below, what happens if port 8080 on your host machine is already in use by another application? The command will fail because Docker cannot bind to an already occupied host port. You would need to choose a different host port, like -p 8081:80.

Terminal window
# Run nginx (a web server)
docker run -d --name my-nginx -p 8080:80 nginx
# What happened:
# - Pulled nginx image from Docker Hub
# - Created a container from the image
# - Started the container in detached mode (-d)
# - Named the container "my-nginx" (--name)
# - Mapped port 8080 (host) to port 80 (container)
Terminal window
# Test it
curl http://localhost:8080
# Returns nginx welcome page HTML
# View running containers
docker ps
# CONTAINER ID IMAGE COMMAND STATUS PORTS NAMES
# a1b2c3d4e5f6 nginx "/docker-entrypoint.…" Up 10 seconds 0.0.0.0:8080->80/tcp my-nginx
# Stop the container
docker stop my-nginx
# Remove the container
docker rm my-nginx

Terminal window
# Run a container
docker run [OPTIONS] IMAGE [COMMAND]
# Common options:
docker run -d nginx # Detached (background)
docker run -it ubuntu bash # Interactive terminal
docker run -p 8080:80 nginx # Port mapping
docker run -v /host/path:/container/path nginx # Volume mount
docker run --name myapp nginx # Named container
docker run -e MY_VAR=value nginx # Environment variable
docker run --rm nginx # Remove when stopped
# Container management
docker ps # List running containers
docker ps -a # List all containers
docker stop CONTAINER # Stop gracefully
docker kill CONTAINER # Force stop
docker rm CONTAINER # Remove stopped container
docker rm -f CONTAINER # Force remove (stop + rm)

Debugging mindset: When a container isn’t behaving right, these three commands are your debugging toolkit — in this order: docker logs (what happened?), docker exec -it ... bash (let me look inside), docker inspect (show me the full configuration). This same pattern applies in Kubernetes later: kubectl logs, kubectl exec, kubectl describe.

Terminal window
# View logs
docker logs CONTAINER
docker logs -f CONTAINER # Follow (tail)
docker logs --tail 100 CONTAINER # Last 100 lines
# Execute command in running container
docker exec -it CONTAINER bash # Interactive shell
docker exec CONTAINER ls /app # Run command
# Inspect container details
docker inspect CONTAINER # Full JSON details
docker stats # Resource usage
docker top CONTAINER # Running processes
Terminal window
# Pull images
docker pull nginx
docker pull nginx:1.25
docker pull gcr.io/project/image:tag
# Note: Docker images are content-addressable. The SHA256 hash of an image is its true identifier. Tags are just human-readable aliases.
# List images
docker images
# Remove images
docker rmi nginx
docker image prune # Remove unused images
# Build images (we'll cover this next)
docker build -t myapp:v1 .

A Dockerfile is a text file with instructions to build an image:

# Base image
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Copy dependency file
COPY requirements.txt .
# Install dependencies (use --no-cache-dir to save space by omitting downloaded wheels)
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port (documentation)
EXPOSE 8000
# Default command
CMD ["python", "app.py"]
Terminal window
# Build image
docker build -t myapp:v1 .
# Run container from image
docker run -d -p 8000:8000 myapp:v1
| Instruction | Purpose |
| --- | --- |
| FROM | Base image to build upon |
| WORKDIR | Set working directory |
| COPY | Copy files from host to image |
| ADD | Like COPY but can extract archives and fetch URLs |
| RUN | Execute command during build |
| ENV | Set environment variable |
| EXPOSE | Document which port the app uses |
| CMD | Default command when container starts |
| ENTRYPOINT | Command that always runs (CMD becomes arguments) |
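The difference between CMD and ENTRYPOINT in the table above is easiest to see in a tiny sketch (a standalone example, not part of this module's app):

```dockerfile
FROM alpine:3.19
# ENTRYPOINT always runs; CMD supplies its default arguments
ENTRYPOINT ["echo"]
CMD ["hello from the image"]
```

Running `docker run myimage` prints "hello from the image", while `docker run myimage goodbye` prints "goodbye": arguments on the command line replace CMD but are still passed to ENTRYPOINT.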

Create a simple Python application:

app.py

from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def hello():
    name = os.getenv('NAME', 'World')
    return f'Hello, {name}!'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)

requirements.txt

flask==3.0.0

War Story: The 5GB Docker Context A developer once typed docker build . in their home directory instead of the project directory. Because they lacked a .dockerignore file, Docker dutifully attempted to copy their entire Documents, Downloads, and Pictures folders into the Docker build context. The build hung for 20 minutes before exhausting the machine’s memory. Always use a .dockerignore file to exclude .git, node_modules, and local environment files to keep your build context small and fast.
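The war story's fix is worth spelling out. A minimal .dockerignore for a project like this might look as follows (entries are illustrative; tailor them to your repo):

```
.git
node_modules
__pycache__
*.pyc
.env
.venv
```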

Dockerfile

FROM python:3.11-slim
WORKDIR /app
# Install dependencies first (better caching)
COPY requirements.txt .
# Use --no-cache-dir to prevent pip from storing downloaded packages, keeping the image small
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY app.py .
EXPOSE 8000
CMD ["python", "app.py"]

Build and Run

Terminal window
# Build
docker build -t hello-flask:v1 .
# Run
docker run -d -p 8000:8000 -e NAME=Docker hello-flask:v1
# Test
curl http://localhost:8000
# Hello, Docker!
# Cleanup
docker rm -f $(docker ps -q --filter ancestor=hello-flask:v1)

Stop and think: If you change one line of your Python code, do you want Docker to reinstall all your dependencies (which might take 2 minutes)? Or just copy the changed file (which takes 1 second)? The answer depends entirely on the ORDER of instructions in your Dockerfile. Read the BAD vs GOOD example below and see if you can spot why ordering matters.

Docker caches layers for faster builds. Order matters:

# BAD: Code changes invalidate dependency cache
FROM python:3.11-slim
WORKDIR /app
# Any change to any file busts the cache here...
COPY . .
# ...so dependencies reinstall on every build!
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

# GOOD: Dependencies cached separately
FROM python:3.11-slim
WORKDIR /app
# This layer only changes when requirements.txt changes
COPY requirements.txt .
# Cached unless dependencies change
RUN pip install -r requirements.txt
# App code changes don't bust the pip cache
COPY . .
CMD ["python", "app.py"]

Stop and think: If you have a web application and a database, why is it a bad idea to put them both in the same Dockerfile and container? How would you scale the web application independently of the database if they were coupled together?

For local development with multiple services:

compose.yaml

# version: '3.8' # Obsolete in modern Compose, left for backward compatibility
services:
  web:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgres://db:5432/mydb
    depends_on:
      - db
  db:
    image: postgres:15
    environment:
      - POSTGRES_DB=mydb
      - POSTGRES_PASSWORD=secret
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:
Terminal window
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# Stop all services
docker compose down
# Stop and remove volumes
docker compose down -v

Note: Docker Compose is for local development. Kubernetes replaces it for production.
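One caveat: depends_on as written only waits for the db container to start, not for Postgres to accept connections. If your web service races the database at startup, Compose can gate startup on a health check. A sketch, assuming the db service is Postgres:

```yaml
services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:15
    environment:
      - POSTGRES_PASSWORD=secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
```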


flowchart TD
A[Dockerfile] -->|1. WRITE| B[docker build]
B -->|2. BUILD| C[(Image: myapp:v1)]
C -->|3. PUSH optional| D[docker push]
D --> E[(Registry: Docker Hub, ECR)]
C -->|4. RUN| F[docker run]
F --> G{{Container: running instance}}

Choosing the right base image is critical for security and size:

| Base Image Type | Example | Size | Security Surface | Best For |
| --- | --- | --- | --- | --- |
| Full OS | ubuntu:22.04 | ~70MB | Large (contains full utilities) | Complex legacy apps needing many system dependencies |
| Slim | python:3.11-slim | ~40MB | Medium (stripped-down Debian) | Most standard applications, good balance of size and compatibility |
| Alpine | python:3.11-alpine | ~15MB | Small (uses musl instead of glibc) | Ultra-small images, but can cause compilation issues with C-extensions |
| Distroless | gcr.io/distroless/static | ~2MB | Minimal (no shell or package manager) | Production deployments of compiled languages (Go, Rust) |
# BAD: Full OS, huge image
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . .
RUN pip3 install -r requirements.txt
# GOOD: Slim base, smaller image
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# BETTER: Alpine (tiny base)
FROM python:3.11-alpine
# Note: Alpine uses apk, not apt, and musl instead of glibc
# BAD: Running as root
FROM python:3.11-slim
COPY . .
# This runs as root!
CMD ["python", "app.py"]

# GOOD: Non-root user
FROM python:3.11-slim
RUN useradd -m appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
USER appuser
CMD ["python", "app.py"]
# BAD: Multiple services in one container
Container: nginx + python app + redis
# GOOD: Separate containers
Container 1: nginx
Container 2: python app
Container 3: redis
(Use Docker Compose or Kubernetes to orchestrate)

  • Multi-stage builds reduce image size dramatically. Build in one stage, copy only artifacts to final stage.
  • The latest tag is not special. Docker doesn’t automatically update it. It’s just a convention that means “whatever was last pushed.”
  • Docker Desktop is not Docker. Docker (the tool) is free. Docker Desktop (the GUI/VM for Mac/Windows) has licensing requirements for businesses.
  • BuildKit is the default builder. Since Docker 23.0, the BuildKit engine is the default, offering parallel execution, better caching, and improved performance over the legacy builder.
  • Docker uses containerd under the hood. Docker Engine acts as a client that delegates container execution to containerd and runc (the industry-standard OCI runtimes).
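The multi-stage build from the first bullet can be sketched for a hypothetical Go service (the paths and package layout are assumptions):

```dockerfile
# Stage 1: build with the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# Build a static binary so it can run on a distroless/static base
RUN CGO_ENABLED=0 go build -o /server .

# Stage 2: ship only the compiled binary
FROM gcr.io/distroless/static
COPY --from=build /server /server
ENTRYPOINT ["/server"]
```

The Go compiler, source code, and intermediate artifacts stay in the first stage; the final image contains little more than the binary.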

| Mistake | Why It Hurts | Solution |
| --- | --- | --- |
| Using latest tag | Unpredictable versions | Use specific tags (:1.25.3) |
| Running as root | Security risk | Add USER instruction |
| Ignoring layer order | Slow rebuilds | Put changing things last |
| Copying everything | Large images, secrets leaked | Use .dockerignore |
| Not cleaning up | Disk fills up | docker system prune regularly |
| Hardcoding secrets | Passwords leaked in image | Use build secrets or inject at runtime |
| Using ADD instead of COPY | Unexpected archive extraction | Default to COPY unless extracting a tarball |
| Ignoring multi-stage builds | Compilers left in production image | Separate build tools from runtime environment |
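For the "hardcoding secrets" row, BuildKit's secret mounts expose a secret to a single RUN step without writing it into any image layer. A sketch (the secret id and its use are hypothetical):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim
# The secret is available only at /run/secrets/pip_token during this RUN
RUN --mount=type=secret,id=pip_token \
    PIP_TOKEN=$(cat /run/secrets/pip_token) && \
    echo "use PIP_TOKEN here to authenticate to a private index"
```

Build with `docker build --secret id=pip_token,src=./token.txt .`; the token never appears in `docker history` or the final image.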

Stop and think: Look at the following Dockerfile. There are at least three major anti-patterns or bugs. Can you identify them before reading the answers?

FROM node:18
COPY . .
RUN npm install
CMD npm start
View the answers
  1. Missing .dockerignore or selective COPY: Copying . to . before running npm install means the local node_modules folder might be copied over, which is bloated and platform-specific. Also, any code change invalidates the npm install cache.
  2. Ignoring layer caching: It should COPY package*.json ./ first, then RUN npm install, and then COPY . .. This ensures dependencies are only reinstalled when the package.json changes.
  3. Using a full OS base image: node:18 is nearly 1GB. It should use node:18-slim or node:18-alpine to reduce the attack surface and image pull time.
  4. Running as root: There is no USER specified, meaning the Node process runs as the root user inside the container, which is a security risk.
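Applying those four fixes yields something like the following (a sketch; the official Node images ship a non-root `node` user):

```dockerfile
FROM node:18-slim
WORKDIR /app
# Copy dependency manifests first so npm install stays cached
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the source (with a .dockerignore excluding node_modules)
COPY . .
USER node
CMD ["npm", "start"]
```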

  1. Scenario: You are building a Node.js application. Every time you change a single line of CSS, Docker takes 3 minutes to rebuild the image because it reinstalls all NPM packages. How can you rewrite your Dockerfile to fix this build time issue?

    Answer You should separate the copying of your dependency files from your source code. First, `COPY package*.json ./` and `RUN npm install`. Then, use a separate `COPY . .` command for the rest of your source code. This leverages Docker's layer cache: Docker will only rerun the `npm install` layer if the dependency manifests change, so a CSS-only change rebuilds almost instantly.
  2. Scenario: You have a database container running locally, but when you restart it, all your saved users disappear. Which Docker run flag are you missing, and how does it solve the problem?

    Answer You are missing a volume mount, specifically using the `-v` or `--mount` flag (e.g., `-v db_data:/var/lib/postgresql/data`). By default, containers are ephemeral, meaning any data written to the container's writable layer is destroyed when the container is removed. A volume persists data outside the container's lifecycle directly on the host machine. This ensures your database records survive container restarts or replacements without being permanently lost.
  3. Scenario: Your web application container starts successfully, but when you visit localhost:8080, it returns a generic 500 Internal Server Error instead of your application content. What are the first two Docker commands you should run to diagnose this issue, and what are you looking for?

    Answer The very first command should be `docker logs <container>` to check the application's standard output and standard error streams for Python/Node stack traces or crash reports. If the logs are inconclusive, the next step is `docker exec -it <container> sh` to get an interactive shell inside the running container. Once inside, you can manually inspect configuration files, verify environment variables, or run local curl commands to see if the internal application process is actually listening on the expected port. This two-step process isolates whether the issue is a crash during startup or a runtime misconfiguration.
  4. Scenario: A developer attempts to reduce an image size by writing RUN rm -rf /tmp/large-data in a step immediately following the RUN curl -o /tmp/large-data https://example.com/file step. However, the final image size does not decrease. Why did the image size remain exactly the same despite deleting the file?

    Answer Docker images are built using a union file system where each instruction in the Dockerfile (like `RUN`, `COPY`, `ADD`) creates a new, immutable layer. When the developer downloaded the file, it was permanently baked into that specific layer. The subsequent `RUN rm` command creates a *new* layer that marks the file as deleted, hiding it from the final container view, but the underlying data still exists in the previous layer, consuming disk space. To actually save space, the download and deletion must happen in the exact same `RUN` instruction connected by `&&`.
  5. Scenario: You are deploying a compiled Go binary to production. Your security team mandates that the container image must contain absolutely zero shell utilities (like bash or curl) to minimize the attack surface in case of a breach. Which base image type (Full OS, Slim, Alpine, or Distroless) is the correct choice, and why?

    Answer The correct choice is a Distroless base image (such as `gcr.io/distroless/static`). Distroless images contain only your application and its runtime dependencies, stripping away package managers, shells, and standard Unix utilities entirely. Alpine is small but still includes a package manager (`apk`) and a shell (`sh`), which a malicious actor could use if they gained remote code execution. By using Distroless, you guarantee that even if the application is compromised, the attacker has no built-in tools to pivot or download malware.
  6. Scenario: You have a Docker Compose file defining a frontend service and a backend service. Your frontend code is trying to fetch data using http://localhost:5000/api, but the connection is refused, even though both containers are running. Why is localhost failing here, and what should the frontend use instead?

    Answer In the context of a container, `localhost` refers exclusively to the container's own internal network interface, not the host machine or other containers. The frontend container is looking for a backend process running inside itself, which doesn't exist. Instead, the frontend should use `http://backend:5000/api` because Docker Compose automatically sets up an internal DNS network where service names resolve to the correct container IP addresses. This internal DNS makes service-to-service communication seamless without needing to hardcode IP addresses.
  7. Scenario: A container running a batch processing script crashes with an Out of Memory error and is now in an Exited state. You want to run docker exec -it <container_id> bash to inspect the temporary files it left behind. What happens when you run this command, and what is the proper way to access those files?

    Answer The `docker exec` command will fail because it can only spawn new processes inside a container that is currently in a `Running` state. You cannot execute a shell inside a stopped container. To access the files, you should use `docker cp <container_id>:/path/to/files ./local-dir` to copy the files out to your host machine. Alternatively, you could commit the stopped container to a new image using `docker commit` and then `docker run` a new interactive container from that image to explore its state.
  8. Scenario: You have a local archive.tar.gz file containing static assets that need to be placed directly into the /var/www/html directory of your container image. Should you use ADD archive.tar.gz /var/www/html/ or COPY archive.tar.gz /var/www/html/, and what is the functional difference between the two?

    Answer You should use `ADD` in this specific scenario because it has a built-in auto-extraction feature for local tar archives. `ADD archive.tar.gz /var/www/html/` will automatically unpack the contents of the archive directly into the target directory during the build. If you used `COPY`, it would simply move the compressed `.tar.gz` file exactly as it is into the directory, forcing you to write an additional `RUN tar -xzf` command (and require the `tar` utility in the image) to extract it. However, for standard file copying, `COPY` is generally preferred as its behavior is more transparent and predictable.

Task: Build and run a custom web server image.

  1. Create a directory and inside it, create an index.html file with the text “Hello KubeDojo!”.
  2. Create a Dockerfile that uses nginx:alpine as the base image.
  3. Copy your index.html to /usr/share/nginx/html/index.html inside the image.
  4. Build the image as dojo-web:v1.
  5. Run the container in detached mode, mapping port 8080 on your host to port 80 in the container.
  6. Verify by running curl http://localhost:8080.
View Solution
Terminal window
mkdir dojo-web && cd dojo-web
echo "Hello KubeDojo!" > index.html
cat <<EOF > Dockerfile
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
EOF
docker build -t dojo-web:v1 .
docker run -d --name dojo-web-container -p 8080:80 dojo-web:v1
# Checkpoint verification
curl http://localhost:8080

Level 2: Intermediate (Environment and Logs)


Task: Debug a failing container using logs and environment variables.

  1. Run docker run -d --name db postgres:15.
  2. Check its status with docker ps -a. Notice it exited immediately.
  3. Check why it failed using docker logs db. (Hint: it complains about a missing password).
  4. Remove the failed container.
  5. Run it again, this time passing the required environment variable POSTGRES_PASSWORD=secret.
  6. Verify it stays running.
View Solution
Terminal window
docker run -d --name db postgres:15
# Checkpoint: verify it exited
docker ps -a
# View logs to find the error
docker logs db
# Cleanup
docker rm db
# Run with correct environment variable
docker run -d --name db -e POSTGRES_PASSWORD=secret postgres:15
# Checkpoint: verify it stays running
docker ps

Task: Optimize a build and explore the running container.

  1. Write a Dockerfile that installs curl in an ubuntu base image.
  2. Ensure you use apt-get update && apt-get install -y curl in a single RUN instruction. Explain why this is important for layer caching.
  3. Build and run the container interactively (-it) with the bash command.
  4. Inside the container, prove you are running as root by typing whoami.
  5. Type exit to leave. Notice the container stops. How would you keep it running in the background and execute into it later?
View Solution
Terminal window
cat <<EOF > Dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl
EOF
docker build -t my-ubuntu-curl .
# Run interactively
docker run -it --name test-curl my-ubuntu-curl bash
# Inside container:
# whoami
# exit
# To run in background and exec later:
docker run -d --name bg-curl my-ubuntu-curl sleep infinity
docker exec -it bg-curl bash

Explanation for Layer Caching: Combining apt-get update and apt-get install ensures that the package index is never cached independently of the packages being installed. If they were separate layers, adding a new package to the install list later would use a stale, cached update layer, potentially leading to “package not found” errors.
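The failure mode described above looks like this side by side:

```dockerfile
# BAD: the package index is cached as its own layer
RUN apt-get update
RUN apt-get install -y curl
# Adding "git" to the install line later reuses the stale index layer
# and can fail with "package not found"

# GOOD: refresh and install in one layer (and trim the index afterward)
RUN apt-get update && apt-get install -y curl git \
    && rm -rf /var/lib/apt/lists/*
```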


Essential Docker for Kubernetes:

Commands:

  • docker run - Start containers
  • docker ps - List containers
  • docker logs - View output
  • docker exec - Run commands in containers
  • docker build - Create images
  • docker push/pull - Share images

Dockerfile basics:

  • FROM - Base image
  • COPY - Add files
  • RUN - Execute during build
  • CMD - Default runtime command

Best practices:

  • Use specific image tags
  • Optimize layer caching
  • Run as non-root
  • One process per container

Module 1.3: What Is Kubernetes? - High-level overview of container orchestration.