
Module 1.2: Jobs and CronJobs

Hands-On Lab Available
K8s Cluster intermediate 30 min

Complexity: [MEDIUM] - Essential CKAD skill with specific patterns

Time to Complete: 45-50 minutes

Prerequisites: Module 1.1 (Container Images), understanding of Pods


After completing this module, you will be able to:

  • Create Jobs and CronJobs with correct completion counts, parallelism, and backoff limits
  • Configure CronJob schedules, concurrency policies, and history limits
  • Debug failed Jobs by inspecting pod logs, events, and restart behavior
  • Compare Jobs vs CronJobs and choose the right resource for one-off vs recurring batch workloads

Not every workload runs forever. Backups run once. Reports generate hourly. Data migrations complete and exit. These are batch workloads, and Kubernetes handles them with Jobs and CronJobs.

The CKAD heavily tests Jobs because they’re a core developer task. You’ll see questions like:

  • “Create a Job that runs to completion”
  • “Create a CronJob that runs every 5 minutes”
  • “Fix a failing Job”
  • “Configure parallel Jobs”

The Factory Shift Analogy

Deployments are like permanent factory staff—they clock in and stay until fired. Jobs are like contractors hired for specific tasks—they come in, complete the work, and leave. CronJobs are like scheduled maintenance crews—they arrive at specific times (every night, every Monday), do their job, and depart.


A Job creates one or more Pods and ensures they run to successful completion.

Terminal window
# Simple job
k create job backup --image=busybox -- echo "Backup complete"
# Job with a shell command
k create job report --image=busybox -- /bin/sh -c "date; echo 'Report generated'"
# Generate YAML
k create job backup --image=busybox --dry-run=client -o yaml -- echo "done" > job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job
spec:
  template:
    spec:
      containers:
      - name: backup
        image: busybox
        command: ["sh", "-c", "echo 'Backing up data' && sleep 10"]
      restartPolicy: Never # or OnFailure
  backoffLimit: 4 # Retry attempts
  ttlSecondsAfterFinished: 100 # Auto-cleanup
| Property | Purpose | Default |
| --- | --- | --- |
| `restartPolicy` | What to do on failure | Must be `Never` or `OnFailure` |
| `backoffLimit` | Max retry attempts | 6 |
| `activeDeadlineSeconds` | Max job runtime | None (runs forever) |
| `ttlSecondsAfterFinished` | Auto-delete after completion | None (keep forever) |
| `completions` | Required successful completions | 1 |
| `parallelism` | Max parallel pods | 1 |
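Every property above except activeDeadlineSeconds shows up in an example somewhere in this module, so here is a sketch of that one (the name deadline-job and the 30-second value are illustrative): once the deadline passes, Kubernetes terminates the Job's pods and marks the Job failed, even if retries remain under backoffLimit.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deadline-job            # illustrative name
spec:
  activeDeadlineSeconds: 30     # the whole Job must finish within 30s
  backoffLimit: 4               # ignored once the deadline is hit
  template:
    spec:
      containers:
      - name: slow
        image: busybox
        command: ["sh", "-c", "sleep 60"] # outlives the deadline, so the Job fails
      restartPolicy: Never
```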

Pause and predict: A Job requires restartPolicy to be set to either Never or OnFailure. Why can’t you use Always — the default for Deployments? Think about what a Job is supposed to do, then read the explanation.

# Never: Don't restart failed containers (create new pod)
restartPolicy: Never
# Pod fails → New pod created (up to backoffLimit)
# OnFailure: Restart failed container in same pod
restartPolicy: OnFailure
# Container fails → Same pod restarts container

Pattern 1: Single Completion

Run one pod, succeed once:

apiVersion: batch/v1
kind: Job
metadata:
  name: single-job
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["echo", "Single task done"]
      restartPolicy: Never

Pattern 2: Multiple Completions (Sequential)


Run task N times, one at a time:

apiVersion: batch/v1
kind: Job
metadata:
  name: sequential-job
spec:
  completions: 5 # Run 5 times
  parallelism: 1 # One at a time
  completionMode: Indexed # Required for JOB_COMPLETION_INDEX to be set
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Task $JOB_COMPLETION_INDEX"]
      restartPolicy: Never

Pattern 3: Multiple Completions (Parallel)

Run multiple pods simultaneously:

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 10 # 10 total completions
  parallelism: 3 # 3 pods at a time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing batch && sleep 5"]
      restartPolicy: Never

Pattern 4: Work Queue (Parallelism Without Completions)


Process items until queue is empty:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-job
spec:
  parallelism: 3 # 3 workers
  # No completions: workers process until they exit 0
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        # process-queue is a placeholder for your queue-consuming script
        command: ["sh", "-c", "process-queue && exit 0"]
      restartPolicy: Never

CronJobs run Jobs on a schedule.

Terminal window
# Every minute
k create cronjob minute-task --image=busybox --schedule="* * * * *" -- echo "Every minute"
# Every hour at minute 30
k create cronjob hourly-task --image=busybox --schedule="30 * * * *" -- date
# Daily at midnight
k create cronjob daily-cleanup --image=busybox --schedule="0 0 * * *" -- echo "Daily cleanup"
# Generate YAML
k create cronjob backup --image=busybox --schedule="0 2 * * *" --dry-run=client -o yaml -- /backup.sh > cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *" # 2 AM daily
  concurrencyPolicy: Forbid # Don't overlap
  successfulJobsHistoryLimit: 3 # Keep last 3 successful
  failedJobsHistoryLimit: 1 # Keep last 1 failed
  startingDeadlineSeconds: 200 # Max delay to start
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: busybox
            command: ['sh', '-c', 'echo "Backup at $(date)"'] # double quotes so $(date) expands at runtime
          restartPolicy: OnFailure
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday = 0)
│ │ │ │ │
* * * * *
| Schedule | Meaning |
| --- | --- |
| `* * * * *` | Every minute |
| `*/5 * * * *` | Every 5 minutes |
| `0 * * * *` | Every hour (at minute 0) |
| `0 */2 * * *` | Every 2 hours |
| `0 0 * * *` | Daily at midnight |
| `0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 1 * *` | Monthly on the 1st at midnight |
| `30 4 * * 1-5` | 4:30 AM on weekdays |

Stop and think: You have a CronJob that runs a database backup every hour, but sometimes the backup takes 75 minutes. What happens when the next scheduled run triggers while the previous one is still running? What policy would you choose: Allow, Forbid, or Replace?

What happens if a new schedule triggers while a Job is still running?

spec:
  concurrencyPolicy: Allow   # Run concurrently (default)
  # or
  concurrencyPolicy: Forbid  # Skip if previous still running
  # or
  concurrencyPolicy: Replace # Kill previous, start new
| Policy | Behavior | Use Case |
| --- | --- | --- |
| `Allow` | Run concurrent jobs | Independent tasks |
| `Forbid` | Skip if previous still running | Avoid resource contention |
| `Replace` | Stop previous, start new | Latest data matters |
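As a sketch of the Replace policy (the name metrics-snapshot and the 10-minute schedule are illustrative), a CronJob where only the freshest run matters might look like:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrics-snapshot       # illustrative name
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Replace   # a stuck run is killed when the next one fires
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: snapshot
            image: busybox
            command: ["sh", "-c", "date; echo snapshot taken"]
          restartPolicy: OnFailure
```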

How long a Job can be delayed before it’s considered missed:

spec:
  startingDeadlineSeconds: 100 # Must start within 100s of schedule

If a Job can’t start within this window (cluster issues, resource constraints), it’s skipped.


Terminal window
# List jobs
k get jobs
# List cronjobs
k get cronjobs
# Get job pods
k get pods -l job-name=my-job
# Check job status
k describe job my-job
# Watch job completion
k get job my-job -w
Terminal window
# Get logs from job's pod
k logs job/my-job
# Get logs from specific pod
k logs my-job-abc12
# Follow logs
k logs -f job/my-job
Terminal window
# Create job from cronjob immediately
k create job manual-backup --from=cronjob/daily-backup
Terminal window
# Delete job
k delete job my-job
# Delete cronjob (also deletes jobs it created)
k delete cronjob my-cronjob
# Delete completed jobs older than TTL
# (Automatic if ttlSecondsAfterFinished is set)

Terminal window
# Check status
k describe job my-job
# Common issues:
# - Container command exits non-zero
# - Image pull fails
# - Resource limits too low
# - restartPolicy not set correctly
# Check pod logs
k logs $(k get pods -l job-name=my-job -o jsonpath='{.items[0].metadata.name}')

What would happen if: You create a Job with backoffLimit: 6 (the default) and restartPolicy: Never. The container’s script has a bug that always exits with code 1. How many pods will Kubernetes create before giving up?

Terminal window
# Check backoffLimit
k get job my-job -o jsonpath='{.spec.backoffLimit}'
# If hitting limit, check why pods fail
k describe pods -l job-name=my-job
Terminal window
# Check cronjob status
k describe cronjob my-cronjob
# Check last schedule time
k get cronjob my-cronjob -o jsonpath='{.status.lastScheduleTime}'
# Check if suspended
k get cronjob my-cronjob -o jsonpath='{.spec.suspend}'
# Resume if suspended
k patch cronjob my-cronjob -p '{"spec":{"suspend":false}}'

  • Jobs track completions with a completion index. In indexed completion mode, each pod knows its index via the JOB_COMPLETION_INDEX environment variable. This is useful for processing sharded data.

  • CronJobs use UTC by default. If you set schedule: "0 9 * * *", it runs at 9 AM UTC, not your local time. Kubernetes 1.27 and later support a spec.timeZone field that accepts IANA time zone names.

  • The activeDeadlineSeconds applies to the entire Job runtime. If a Job takes longer than this, Kubernetes terminates it—even if tasks are still running successfully.
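Building on the UTC note above, a sketch using the spec.timeZone field available on Kubernetes 1.27+ (the name local-report is illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: local-report             # illustrative name
spec:
  schedule: "0 9 * * *"          # 9 AM...
  timeZone: "America/New_York"   # ...in this IANA time zone (Kubernetes 1.27+)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: busybox
            command: ["date"]
          restartPolicy: OnFailure
```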


| Mistake | Why It Hurts | Solution |
| --- | --- | --- |
| `restartPolicy: Always` | Invalid for Jobs | Use `Never` or `OnFailure` |
| Relying on the default `backoffLimit` | Six retries may be more (or fewer) than the task warrants | Set a limit that fits the task |
| Wrong cron syntax | Job runs at the wrong times or never | Validate with crontab.guru |
| No `ttlSecondsAfterFinished` | Completed Jobs accumulate | Set auto-cleanup |
| Overlapping CronJobs | Resource contention | Use `concurrencyPolicy: Forbid` |

  1. A developer writes a Job YAML with restartPolicy: Always and runs kubectl apply. What happens, and what should they use instead?

    Answer: The API server rejects the Job with a validation error. Jobs require `restartPolicy` set to either `Never` or `OnFailure` -- never `Always`. The reason is that Jobs are designed to run to completion and exit. `Always` would restart the container forever, defeating the purpose of a Job. Use `Never` if you want a new pod on each failure (easier to debug via separate pod logs), or `OnFailure` if you want the same pod to retry (uses fewer resources and preserves pod identity).
  2. Your operations team needs a log cleanup script to run at 4:30 AM on weekdays only. Write the CronJob schedule expression and explain what concurrency policy you’d choose if the cleanup sometimes takes over 24 hours.

    Answer: The schedule is `"30 4 * * 1-5"` -- minute 30, hour 4, any day of month, any month, Monday through Friday (1-5). If cleanup can exceed 24 hours, use `concurrencyPolicy: Forbid` to skip the next scheduled run while the current one is still going. `Replace` would kill the long-running cleanup mid-operation, potentially leaving data in an inconsistent state. `Allow` would stack up concurrent cleanups competing for the same resources.
  3. You need to process 100 images through a thumbnail generator. Each image takes about 10 seconds. You want to finish as fast as possible but your cluster can only handle 5 extra pods at a time. How do you configure the Job?

    Answer: Set `completions: 100` and `parallelism: 5`. Kubernetes will run 5 pods simultaneously, and as each completes, it launches another to maintain 5 active pods until all 100 completions are reached. Total time is roughly 100/5 * 10 seconds = ~200 seconds (about 3.3 minutes), compared to ~1000 seconds (16.7 minutes) if run sequentially. With `completionMode: Indexed`, each pod can use the `JOB_COMPLETION_INDEX` environment variable to know which image to process.
  4. Your CronJob runs every 5 minutes, but you notice completed Job pods are piling up — there are now 200+ finished pods cluttering your namespace. What two settings should you add to prevent this?

    Answer: Add `successfulJobsHistoryLimit: 3` and `failedJobsHistoryLimit: 1` to the CronJob spec to retain only recent Job history. Additionally, add `ttlSecondsAfterFinished: 100` to the Job template spec so each completed Job (and its pods) is automatically garbage-collected 100 seconds after finishing. The history limits control how many CronJob-created Jobs are kept, while the TTL controls when each finished Job is cleaned up.
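The thumbnail scenario from question 3 could be sketched as an Indexed Job (the name thumbnails is illustrative, and busybox stands in for a real thumbnail-generator image):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: thumbnails             # illustrative name
spec:
  completions: 100             # 100 images to process
  parallelism: 5               # at most 5 pods at a time
  completionMode: Indexed      # sets JOB_COMPLETION_INDEX in each pod
  template:
    spec:
      containers:
      - name: worker
        image: busybox         # stand-in for a real thumbnailer image
        command: ["sh", "-c", "echo processing image $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```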

Task: Create a backup system with Jobs and CronJobs.

Part 1: One-time Job

Terminal window
# Create a job that simulates a database backup
k create job db-backup --image=busybox -- sh -c "echo 'Backing up database' && sleep 5 && echo 'Backup complete'"
# Watch completion
k get job db-backup -w
# Check logs
k logs job/db-backup

Part 2: Scheduled CronJob

Terminal window
# Create cronjob for hourly cleanup
k create cronjob hourly-cleanup \
--image=busybox \
--schedule="0 * * * *" \
-- sh -c 'echo "Cleanup at $(date)"'
# Manually trigger for testing
k create job manual-cleanup --from=cronjob/hourly-cleanup
# Check results
k get jobs
k logs job/manual-cleanup

Part 3: Parallel Job

# Create parallel-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-process
spec:
  completions: 6
  parallelism: 2
  completionMode: Indexed # Required for JOB_COMPLETION_INDEX to be set
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing item $JOB_COMPLETION_INDEX && sleep 3"]
      restartPolicy: Never
Terminal window
k apply -f parallel-job.yaml
k get pods -l job-name=parallel-process -w

Cleanup:

Terminal window
k delete job db-backup parallel-process
k delete job manual-cleanup
k delete cronjob hourly-cleanup

Drill 1: Basic Job Creation (Target: 2 minutes)

Terminal window
# Create a job that:
# - Named: hello-job
# - Runs busybox
# - Echoes "Hello from job"
k create job hello-job --image=busybox -- echo "Hello from job"
# Verify completion
k get job hello-job
# Check logs
k logs job/hello-job
# Cleanup
k delete job hello-job

Drill 2: CronJob with Schedule (Target: 2 minutes)

Terminal window
# Create a cronjob that:
# - Named: every-minute
# - Runs every minute
# - Prints current date
k create cronjob every-minute --image=busybox --schedule="* * * * *" -- date
# Wait 1 minute and check
sleep 65
k get jobs
# Check logs of triggered job
k logs job/$(k get jobs -o jsonpath='{.items[0].metadata.name}')
# Cleanup
k delete cronjob every-minute

Drill 3: Job with Retry (Target: 3 minutes)

Terminal window
# Create a job that fails and retries
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-job
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: fail
        image: busybox
        command: ["sh", "-c", "echo 'Trying...' && exit 1"]
      restartPolicy: Never
EOF
# Watch retries
k get pods -l job-name=retry-job -w
# Check job status
k describe job retry-job | grep -A5 Conditions
# Cleanup
k delete job retry-job
Drill 4: Parallel Job

Terminal window
# Create a parallel job
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Worker done && sleep 2"]
      restartPolicy: Never
EOF
# Watch parallel execution
k get pods -l job-name=parallel -w
# Verify all completed
k get job parallel
# Cleanup
k delete job parallel

Drill 5: CronJob with Concurrency (Target: 3 minutes)

Terminal window
# Create cronjob that forbids overlap
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: no-overlap
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo 'Start' && sleep 90 && echo 'Done'"]
          restartPolicy: Never
EOF
# Check policy
k get cronjob no-overlap -o jsonpath='{.spec.concurrencyPolicy}'
# Wait 2 minutes and verify only 1 job runs
sleep 120
k get jobs # jobs created by the CronJob are named no-overlap-<timestamp>
# Cleanup
k delete cronjob no-overlap

Drill 6: Complete Backup Solution (Target: 8 minutes)


Build a full backup system:

Terminal window
# 1. Create configmap with backup script
k create configmap backup-script --from-literal=script.sh='#!/bin/sh
echo "Starting backup at $(date)"
echo "Compressing data..."
sleep 3
echo "Uploading to storage..."
sleep 2
echo "Backup complete at $(date)"
'
# 2. Create CronJob using the script
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-system
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      template:
        spec:
          containers:
          - name: backup
            image: busybox
            command: ["sh", "/scripts/script.sh"]
            volumeMounts:
            - name: scripts
              mountPath: /scripts
          restartPolicy: OnFailure
          volumes:
          - name: scripts
            configMap:
              name: backup-script
EOF
# 3. Test with manual trigger
k create job test-backup --from=cronjob/backup-system
# 4. Check logs
k logs job/test-backup
# 5. Verify history limits
k get cronjob backup-system -o jsonpath='{.spec.successfulJobsHistoryLimit}'
# Cleanup
k delete cronjob backup-system
k delete job test-backup
k delete configmap backup-script

Module 1.3: Multi-Container Pods - Sidecar, init, and ambassador patterns.