Module 1.2: Jobs and CronJobs
Complexity: [MEDIUM] - Essential CKAD skill with specific patterns
Time to Complete: 45-50 minutes
Prerequisites: Module 1.1 (Container Images), understanding of Pods
Learning Outcomes
After completing this module, you will be able to:
- Create Jobs and CronJobs with correct completion counts, parallelism, and backoff limits
- Configure CronJob schedules, concurrency policies, and history limits
- Debug failed Jobs by inspecting pod logs, events, and restart behavior
- Compare Jobs vs CronJobs and choose the right resource for one-off vs recurring batch workloads
Why This Module Matters
Not every workload runs forever. Backups run once. Reports generate hourly. Data migrations complete and exit. These are batch workloads, and Kubernetes handles them with Jobs and CronJobs.
The CKAD heavily tests Jobs because they’re a core developer task. You’ll see questions like:
- “Create a Job that runs to completion”
- “Create a CronJob that runs every 5 minutes”
- “Fix a failing Job”
- “Configure parallel Jobs”
The Factory Shift Analogy
Deployments are like permanent factory staff—they clock in and stay until fired. Jobs are like contractors hired for specific tasks—they come in, complete the work, and leave. CronJobs are like scheduled maintenance crews—they arrive at specific times (every night, every Monday), do their job, and depart.
Jobs: One-Time Tasks
A Job creates one or more Pods and ensures they run to successful completion.
Creating Jobs Imperatively
```bash
# Simple job
k create job backup --image=busybox -- echo "Backup complete"

# Job with a shell command
k create job report --image=busybox -- /bin/sh -c "date; echo 'Report generated'"

# Generate YAML
k create job backup --image=busybox --dry-run=client -o yaml -- echo "done" > job.yaml
```

Job YAML Structure

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job
spec:
  backoffLimit: 4                # Retry attempts
  ttlSecondsAfterFinished: 100   # Auto-cleanup
  template:
    spec:
      containers:
      - name: backup
        image: busybox
        command: ["sh", "-c", "echo 'Backing up data' && sleep 10"]
      restartPolicy: Never       # or OnFailure
```

Key Job Properties
| Property | Purpose | Default |
|---|---|---|
| `restartPolicy` | What to do on failure | Must be `Never` or `OnFailure` |
| `backoffLimit` | Max retry attempts | 6 |
| `activeDeadlineSeconds` | Max Job runtime | None (runs until done) |
| `ttlSecondsAfterFinished` | Auto-delete after completion | None (kept forever) |
| `completions` | Required successful completions | 1 |
| `parallelism` | Max parallel pods | 1 |
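To see how these fit together, here is a sketch of one hypothetical Job that sets every property in the table (the name `tuned-job` and all values are illustrative, not a recommendation):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: tuned-job               # hypothetical name
spec:
  completions: 3                # need 3 successful pods in total
  parallelism: 2                # at most 2 pods running at once
  backoffLimit: 4               # give up after 4 retries
  activeDeadlineSeconds: 300    # hard cap: terminate everything after 5 minutes
  ttlSecondsAfterFinished: 60   # delete the finished Job after 60 seconds
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo working && sleep 2"]
      restartPolicy: Never      # pod-level: new pod on each failure
```

Note that `backoffLimit`, `activeDeadlineSeconds`, and `ttlSecondsAfterFinished` live at the Job spec level, while `restartPolicy` belongs to the pod template spec.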
Pause and predict: A Job requires `restartPolicy` to be set to either `Never` or `OnFailure`. Why can't you use `Always`, the default for Deployments? Think about what a Job is supposed to do, then read the explanation.
restartPolicy Explained
```yaml
# Never: don't restart failed containers (a new pod is created)
restartPolicy: Never
# Pod fails → new pod created (up to backoffLimit)

# OnFailure: restart the failed container in the same pod
restartPolicy: OnFailure
# Container fails → same pod restarts the container
```

Job Patterns
Pattern 1: Single Completion (Default)

Run one pod, succeed once:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: single-job
spec:
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["echo", "Single task done"]
      restartPolicy: Never
```

Pattern 2: Multiple Completions (Sequential)

Run a task N times, one at a time:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sequential-job
spec:
  completions: 5            # Run 5 times
  parallelism: 1            # One at a time
  completionMode: Indexed   # Required for JOB_COMPLETION_INDEX
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Task $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```

Pattern 3: Parallel Processing

Run multiple pods simultaneously:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 10   # 10 total completions
  parallelism: 3    # 3 pods at a time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing batch && sleep 5"]
      restartPolicy: Never
```

Pattern 4: Work Queue (Parallelism Without Completions)

Process items until the queue is empty:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-job
spec:
  parallelism: 3   # 3 workers
  # No completions: workers process until they exit 0
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "process-queue && exit 0"]
      restartPolicy: Never
```

CronJobs: Scheduled Tasks

CronJobs run Jobs on a schedule.
Creating CronJobs Imperatively
```bash
# Every minute
k create cronjob minute-task --image=busybox --schedule="* * * * *" -- echo "Every minute"

# Every hour at minute 30
k create cronjob hourly-task --image=busybox --schedule="30 * * * *" -- date

# Daily at midnight
k create cronjob daily-cleanup --image=busybox --schedule="0 0 * * *" -- echo "Daily cleanup"

# Generate YAML
k create cronjob backup --image=busybox --schedule="0 2 * * *" --dry-run=client -o yaml -- /backup.sh > cronjob.yaml
```

CronJob YAML Structure

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"           # 2 AM daily (UTC)
  concurrencyPolicy: Forbid       # Don't overlap
  successfulJobsHistoryLimit: 3   # Keep last 3 successful Jobs
  failedJobsHistoryLimit: 1       # Keep last 1 failed Job
  startingDeadlineSeconds: 200    # Max delay to start
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: busybox
            command: ["sh", "-c", "echo \"Backup at $(date)\""]
          restartPolicy: OnFailure
```

Cron Schedule Format

```
┌───────────── minute (0 - 59)
│ ┌───────────── hour (0 - 23)
│ │ ┌───────────── day of month (1 - 31)
│ │ │ ┌───────────── month (1 - 12)
│ │ │ │ ┌───────────── day of week (0 - 6) (Sunday = 0)
│ │ │ │ │
* * * * *
```

Common Schedules
| Schedule | Meaning |
|---|---|
| `* * * * *` | Every minute |
| `*/5 * * * *` | Every 5 minutes |
| `0 * * * *` | Every hour (at minute 0) |
| `0 */2 * * *` | Every 2 hours |
| `0 0 * * *` | Daily at midnight |
| `0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 1 * *` | Monthly on the 1st at midnight |
| `30 4 * * 1-5` | 4:30 AM on weekdays |
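Reading a cron expression comes down to field order. As a quick sanity check before pasting one into a CronJob spec, you can split it into its five fields with plain shell (no cluster needed; `set -f` stops the shell from glob-expanding the `*` fields):

```shell
set -f                 # disable globbing so the * fields survive word splitting
expr="30 4 * * 1-5"    # the weekday schedule from the table above
set -- $expr           # split into the five positional fields
echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
# → minute=30 hour=4 day-of-month=* month=* day-of-week=1-5
```

For anything non-trivial, validating the expression with a tool like crontab.guru is still the safer habit.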
CronJob Policies
Stop and think: You have a CronJob that runs a database backup every hour, but sometimes the backup takes 75 minutes. What happens when the next scheduled run triggers while the previous one is still running? What policy would you choose: `Allow`, `Forbid`, or `Replace`?
concurrencyPolicy
What happens if a new schedule triggers while a Job is still running?

```yaml
spec:
  concurrencyPolicy: Allow     # Run concurrently (default)
  # or
  concurrencyPolicy: Forbid    # Skip if previous still running
  # or
  concurrencyPolicy: Replace   # Kill previous, start new
```

| Policy | Behavior | Use Case |
|---|---|---|
| `Allow` | Run concurrent Jobs | Independent tasks |
| `Forbid` | Skip if previous still running | Avoid resource contention |
| `Replace` | Stop previous, start new | Latest data matters |
startingDeadlineSeconds
How long a Job can be delayed before it's considered missed:

```yaml
spec:
  startingDeadlineSeconds: 100   # Must start within 100s of the scheduled time
```

If a Job can't start within this window (cluster issues, resource constraints), it's skipped.
Managing Jobs and CronJobs
Checking Status
```bash
# List jobs
k get jobs

# List cronjobs
k get cronjobs

# Get a job's pods
k get pods -l job-name=my-job

# Check job status
k describe job my-job

# Watch job completion
k get job my-job -w
```

Viewing Logs
```bash
# Get logs from the job's pod
k logs job/my-job

# Get logs from a specific pod
k logs my-job-abc12

# Follow logs
k logs -f job/my-job
```

Manual Trigger
```bash
# Create a job from a cronjob immediately
k create job manual-backup --from=cronjob/daily-backup
```

Cleanup
```bash
# Delete a job
k delete job my-job

# Delete a cronjob (also deletes the jobs it created)
k delete cronjob my-cronjob

# Completed jobs older than the TTL are deleted automatically
# if ttlSecondsAfterFinished is set
```

Troubleshooting Jobs
Job Won’t Complete
```bash
# Check status
k describe job my-job

# Common issues:
# - Container command exits non-zero
# - Image pull fails
# - Resource limits too low
# - restartPolicy not set correctly

# Check pod logs
k logs $(k get pods -l job-name=my-job -o jsonpath='{.items[0].metadata.name}')
```

What would happen if: You create a Job with `backoffLimit: 6` (the default) and `restartPolicy: Never`. The container's script has a bug that always exits with code 1. How many pods will Kubernetes create before giving up?
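One way to reason about it, sketched in plain shell. The exact counts are an approximation: Kubernetes counts failed pods against `backoffLimit` and in some situations creates slightly more pods than the limit, but with `restartPolicy: Never` you typically see about `backoffLimit + 1` failed pods:

```shell
# With restartPolicy: Never, each retry is a fresh pod, so backoffLimit: 6
# typically means about 7 failed pods (the first attempt plus 6 retries).
# Retry delays roughly double each time (capped at 6 minutes in real clusters):
backoff=10
for retry in 1 2 3 4 5 6; do
  echo "retry $retry after ${backoff}s"
  backoff=$((backoff * 2))
done
```

The doubling delay is why a stuck Job can sit idle for minutes between its final attempts.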
Job Keeps Retrying
```bash
# Check backoffLimit
k get job my-job -o jsonpath='{.spec.backoffLimit}'

# If hitting the limit, check why the pods fail
k describe pods -l job-name=my-job
```

CronJob Not Running
```bash
# Check cronjob status
k describe cronjob my-cronjob

# Check last schedule time
k get cronjob my-cronjob -o jsonpath='{.status.lastScheduleTime}'

# Check if suspended
k get cronjob my-cronjob -o jsonpath='{.spec.suspend}'

# Resume if suspended
k patch cronjob my-cronjob -p '{"spec":{"suspend":false}}'
```

Did You Know?
- Jobs track completions with a completion index. In indexed completion mode (`completionMode: Indexed`), each pod knows its index via the `JOB_COMPLETION_INDEX` environment variable. This is useful for processing sharded data.
- CronJobs use UTC by default. If you set `schedule: "0 9 * * *"`, it runs at 9 AM UTC, not your local time. Kubernetes 1.27+ supports a native `spec.timeZone` field for local-time schedules.
- `activeDeadlineSeconds` applies to the entire Job runtime. If a Job takes longer than this, Kubernetes terminates it, even if tasks are still running successfully.
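The first and last points combine naturally. Here is a sketch of a hypothetical indexed Job with a hard runtime cap (the name `sharded-worker` and the values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sharded-worker          # hypothetical name
spec:
  completions: 4
  parallelism: 4
  completionMode: Indexed       # each pod gets JOB_COMPLETION_INDEX (0-3)
  activeDeadlineSeconds: 600    # terminate the whole Job after 10 minutes,
                                # even if some shards are still running
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing shard $JOB_COMPLETION_INDEX"]
      restartPolicy: Never
```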
Common Mistakes
| Mistake | Why It Hurts | Solution |
|---|---|---|
| `restartPolicy: Always` | Invalid for Jobs | Use `Never` or `OnFailure` |
| Relying on the default `backoffLimit` | Job retries 6 times before failing | Set an explicit limit |
| Wrong cron syntax | Job never runs | Validate with crontab.guru |
| No `ttlSecondsAfterFinished` | Completed Jobs accumulate | Set auto-cleanup |
| Overlapping CronJobs | Resource contention | Use `concurrencyPolicy: Forbid` |
- A developer writes a Job YAML with `restartPolicy: Always` and runs `kubectl apply`. What happens, and what should they use instead?

  Answer: The API server rejects the Job with a validation error. Jobs require `restartPolicy` set to either `Never` or `OnFailure`, never `Always`. Jobs are designed to run to completion and exit; `Always` would restart the container forever, defeating the purpose of a Job. Use `Never` if you want a new pod on each failure (easier to debug via separate pod logs), or `OnFailure` if you want the same pod to retry (uses fewer resources and preserves pod identity).

- Your operations team needs a log cleanup script to run at 4:30 AM on weekdays only. Write the CronJob schedule expression and explain what concurrency policy you'd choose if the cleanup sometimes takes over 24 hours.

  Answer: The schedule is `"30 4 * * 1-5"`: minute 30, hour 4, any day of month, any month, Monday through Friday (1-5). If cleanup can exceed 24 hours, use `concurrencyPolicy: Forbid` to skip the next scheduled run while the current one is still going. `Replace` would kill the long-running cleanup mid-operation, potentially leaving data in an inconsistent state. `Allow` would stack up concurrent cleanups competing for the same resources.

- You need to process 100 images through a thumbnail generator. Each image takes about 10 seconds. You want to finish as fast as possible but your cluster can only handle 5 extra pods at a time. How do you configure the Job?

  Answer: Set `completions: 100` and `parallelism: 5`. Kubernetes runs 5 pods simultaneously, and as each completes it launches another to maintain 5 active pods until all 100 completions are reached. Total time is roughly 100/5 * 10 seconds = ~200 seconds (about 3.3 minutes), compared to ~1000 seconds (16.7 minutes) if run sequentially. With `completionMode: Indexed`, each pod can read the `JOB_COMPLETION_INDEX` environment variable to know which image to process.

- Your CronJob runs every 5 minutes, but you notice completed Job pods are piling up: there are now 200+ finished pods cluttering your namespace. What two settings should you add to prevent this?

  Answer: Add `successfulJobsHistoryLimit: 3` and `failedJobsHistoryLimit: 1` to the CronJob spec to retain only recent Job history (the defaults are 3 and 1, so this matters when the limits have been raised or unset). Additionally, add `ttlSecondsAfterFinished: 100` to the Job template spec so completed Jobs and their pods are automatically garbage-collected after 100 seconds. The history limits control how many CronJob-created Jobs are kept, while the TTL controls when individual Jobs are cleaned up.
Hands-On Exercise
Task: Create a backup system with Jobs and CronJobs.
Part 1: One-time Job
```bash
# Create a job that simulates a database backup
k create job db-backup --image=busybox -- sh -c "echo 'Backing up database' && sleep 5 && echo 'Backup complete'"

# Watch completion
k get job db-backup -w

# Check logs
k logs job/db-backup
```

Part 2: Scheduled CronJob
```bash
# Create a cronjob for hourly cleanup (single-quote the command so $(date)
# expands when the job runs, not when you type it)
k create cronjob hourly-cleanup \
  --image=busybox \
  --schedule="0 * * * *" \
  -- sh -c 'echo "Cleanup at $(date)"'

# Manually trigger for testing
k create job manual-cleanup --from=cronjob/hourly-cleanup

# Check results
k get jobs
k logs job/manual-cleanup
```

Part 3: Parallel Job
Create `parallel-job.yaml`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-process
spec:
  completions: 6
  parallelism: 2
  completionMode: Indexed   # Required for JOB_COMPLETION_INDEX
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Processing item $JOB_COMPLETION_INDEX && sleep 3"]
      restartPolicy: Never
```

```bash
k apply -f parallel-job.yaml
k get pods -l job-name=parallel-process -w
```

Cleanup:
```bash
k delete job db-backup parallel-process
k delete job manual-cleanup
k delete cronjob hourly-cleanup
```

Practice Drills
Drill 1: Basic Job Creation (Target: 2 minutes)
```bash
# Create a job that:
# - Is named: hello-job
# - Runs busybox
# - Echoes "Hello from job"

k create job hello-job --image=busybox -- echo "Hello from job"

# Verify completion
k get job hello-job

# Check logs
k logs job/hello-job

# Cleanup
k delete job hello-job
```

Drill 2: CronJob with Schedule (Target: 2 minutes)
```bash
# Create a cronjob that:
# - Is named: every-minute
# - Runs every minute
# - Prints the current date

k create cronjob every-minute --image=busybox --schedule="* * * * *" -- date

# Wait 1 minute and check
sleep 65
k get jobs

# Check logs of the triggered job
k logs job/$(k get jobs -o jsonpath='{.items[0].metadata.name}')

# Cleanup
k delete cronjob every-minute
```

Drill 3: Job with Retry (Target: 3 minutes)
```bash
# Create a job that fails and retries
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-job
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: fail
        image: busybox
        command: ["sh", "-c", "echo 'Trying...' && exit 1"]
      restartPolicy: Never
EOF

# Watch retries
k get pods -l job-name=retry-job -w

# Check job status
k describe job retry-job | grep -A5 Conditions

# Cleanup
k delete job retry-job
```

Drill 4: Parallel Job (Target: 4 minutes)
```bash
# Create a parallel job
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Worker done && sleep 2"]
      restartPolicy: Never
EOF

# Watch parallel execution
k get pods -l job-name=parallel -w

# Verify all completed
k get job parallel

# Cleanup
k delete job parallel
```

Drill 5: CronJob with Concurrency (Target: 3 minutes)
```bash
# Create a cronjob that forbids overlap
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: no-overlap
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo 'Start' && sleep 90 && echo 'Done'"]
          restartPolicy: Never
EOF

# Check the policy
k get cronjob no-overlap -o jsonpath='{.spec.concurrencyPolicy}'

# Wait 2 minutes and verify runs don't overlap
# (jobs created by the cronjob are named no-overlap-<timestamp>)
sleep 120
k get jobs

# Cleanup
k delete cronjob no-overlap
```

Drill 6: Complete Backup Solution (Target: 8 minutes)
Build a full backup system:
```bash
# 1. Create a configmap with the backup script
k create configmap backup-script --from-literal=script.sh='#!/bin/sh
echo "Starting backup at $(date)"
echo "Compressing data..."
sleep 3
echo "Uploading to storage..."
sleep 2
echo "Backup complete at $(date)"'

# 2. Create a CronJob using the script
cat << 'EOF' | k apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-system
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300
      template:
        spec:
          containers:
          - name: backup
            image: busybox
            command: ["sh", "/scripts/script.sh"]
            volumeMounts:
            - name: scripts
              mountPath: /scripts
          restartPolicy: OnFailure
          volumes:
          - name: scripts
            configMap:
              name: backup-script
EOF

# 3. Test with a manual trigger
k create job test-backup --from=cronjob/backup-system

# 4. Check logs
k logs job/test-backup

# 5. Verify history limits
k get cronjob backup-system -o jsonpath='{.spec.successfulJobsHistoryLimit}'

# Cleanup
k delete cronjob backup-system
k delete job test-backup
k delete configmap backup-script
```

Next Module
Module 1.3: Multi-Container Pods - Sidecar, init, and ambassador patterns.