Module 7.3: Practical Scripts
Shell Scripting | Complexity: [MEDIUM] | Time: 25-30 min
Prerequisites
Before starting this module:
- Required: Module 7.1: Bash Fundamentals
- Required: Module 7.2: Text Processing
- Helpful: Experience with operational tasks
What You’ll Be Able to Do
After this module, you will be able to:
- Write production-ready scripts with logging, error handling, and configuration
- Automate common sysadmin tasks (log rotation, health checks, deployment scripts)
- Design scripts that are idempotent (safe to run multiple times)
- Test scripts systematically with edge cases and validate output
Why This Module Matters
Writing a script that works once is easy. Writing a script that works reliably in production is harder. This module covers patterns that make scripts maintainable, debuggable, and safe.
Understanding practical scripting helps you:
- Write reliable automation — Scripts that don’t break at 3 AM
- Debug issues faster — Proper logging and error messages
- Maintain scripts — Code others (and future you) can understand
- Handle edge cases — Empty inputs, missing files, network failures
The difference between a hack and automation is error handling.
Did You Know?
- Most production scripts are under 100 lines — Long scripts should be refactored into multiple scripts or a proper programming language.
- ShellCheck catches most common bugs — A static analysis tool that flags common Bash mistakes before you run the script.
- Exit codes are contracts — Returning the right exit code lets other scripts and tools (like systemd) know what happened.
- Temporary files are dangerous — Race conditions, leftover files, and security issues. Use mktemp and cleanup traps.
Script Template
Production-Ready Starter

```bash
#!/bin/bash
#
# Description: Brief description of what this script does
# Usage: ./script-name.sh [options] <arguments>
#

set -euo pipefail

# === Configuration ===
readonly SCRIPT_NAME=$(basename "$0")
readonly SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
readonly LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"

# === Logging ===
log() {
    local level=$1
    shift
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    echo "[$timestamp] [$level] $*" | tee -a "$LOG_FILE"
}

log_info()  { log "INFO" "$@"; }
log_warn()  { log "WARN" "$@"; }
log_error() { log "ERROR" "$@" >&2; }

# === Error Handling ===
die() {
    log_error "$@"
    exit 1
}

# === Cleanup ===
cleanup() {
    local exit_code=$?
    # Add cleanup tasks here
    rm -f "${TEMP_FILE:-}"
    exit $exit_code
}
trap cleanup EXIT

# === Argument Parsing ===
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [options] <argument>

Description: Brief description of what this script does.

Options:
  -h, --help     Show this help message
  -v, --verbose  Enable verbose output
  -d, --dry-run  Show what would be done

Arguments:
  argument       Description of required argument

Examples:
  $SCRIPT_NAME -v input.txt
  $SCRIPT_NAME --dry-run /path/to/file
EOF
    exit 0
}

# === Main Logic ===
main() {
    local verbose=false
    local dry_run=false

    # Parse arguments
    while [[ $# -gt 0 ]]; do
        case $1 in
            -h|--help) usage ;;
            -v|--verbose) verbose=true; shift ;;
            -d|--dry-run) dry_run=true; shift ;;
            -*) die "Unknown option: $1" ;;
            *) break ;;
        esac
    done

    # Validate arguments
    [[ $# -lt 1 ]] && die "Missing required argument. Use -h for help."

    local input=$1

    # Validate input
    [[ -f "$input" ]] || die "File not found: $input"

    # Do the work
    log_info "Processing: $input"
    if [[ "$dry_run" == true ]]; then
        log_info "Dry run - would process $input"
    else
        # Actual processing here
        log_info "Done"
    fi
}

main "$@"
```

Error Handling Patterns
Safe Mode

```bash
#!/bin/bash
set -euo pipefail

# -e: Exit on any error
# -u: Exit on undefined variable
# -o pipefail: Exit on pipe failure

# Sometimes you want to handle errors yourself
set +e                    # Temporarily disable
command_that_might_fail
exit_code=$?
set -e                    # Re-enable

if [[ $exit_code -ne 0 ]]; then
    echo "Command failed with $exit_code"
fi
```

Pause and predict: What happens if you run grep "error" log.txt | wc -l and log.txt doesn’t exist? How does set -o pipefail change the outcome?
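You can check your prediction in a throwaway shell; the pipeline below has the same shape as the question’s:

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit code is that of its LAST command,
# so the failing grep is masked by the successful wc.
grep "error" /nonexistent.log 2>/dev/null | wc -l
echo "without pipefail: exit=$?"    # exit=0 even though grep failed

# With pipefail, the pipeline reports the rightmost non-zero exit code
# (grep exits 2 when it cannot open a file).
set -o pipefail
grep "error" /nonexistent.log 2>/dev/null | wc -l
echo "with pipefail: exit=$?"       # exit=2
```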
Knowledge Check
What does set -euo pipefail do?

Show Answer

Three separate options:
- -e (errexit): Exit immediately if a command returns non-zero
- -u (nounset): Exit if an undefined variable is used
- -o pipefail: The pipeline returns the exit code of the rightmost failing command

Without pipefail:

```bash
false | true   # Exit code 0
```

With pipefail:

```bash
false | true   # Exit code 1
```

This is the recommended start for reliable scripts.
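Each flag can also be triggered deliberately; the subshells below isolate the failures so they don’t terminate your session:

```shell
#!/bin/bash
# Each subshell turns on one flag and trips it; the || branch reports it.

( set -e; false; echo "never reached" ) || echo "errexit stopped the subshell"
( set -u; : "$NOT_DEFINED_ANYWHERE" ) 2>/dev/null || echo "nounset rejected an unset variable"
( set -o pipefail; false | true ) || echo "pipefail surfaced the hidden failure"
```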
Trap for Cleanup
```bash
# Cleanup on exit, error, or interrupt
cleanup() {
    local exit_code=$?
    log "Cleaning up..."
    rm -f "$TEMP_FILE"
    [[ -d "$TEMP_DIR" ]] && rm -rf "$TEMP_DIR"
    exit $exit_code
}

trap cleanup EXIT       # Normal exit
trap cleanup ERR        # On error
trap cleanup INT TERM   # Ctrl+C, kill
```

Knowledge Check
Why is rm -rf "$TEMP_DIR" in a trap better than at the end of the script?

Show Answer

The trap runs on ANY exit, including:
- Normal script completion
- set -e triggering on error
- Ctrl+C (SIGINT)
- kill (SIGTERM)

Without a trap, if the script errors out early, the temp directory remains.

```bash
trap 'rm -rf "$TEMP_DIR"' EXIT
TEMP_DIR=$(mktemp -d)
```

The trap is registered before creating the temp dir, ensuring cleanup even if mktemp somehow fails later in a more complex script.
Retry Logic
```bash
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local cmd="$*"   # "$*" joins the remaining args into one string for eval

    local attempt=1
    while [[ $attempt -le $max_attempts ]]; do
        log_info "Attempt $attempt/$max_attempts: $cmd"
        if eval "$cmd"; then
            return 0
        fi
        log_warn "Failed, waiting ${delay}s..."
        sleep "$delay"
        ((attempt++))
    done

    log_error "All $max_attempts attempts failed"
    return 1
}
```
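To watch a retry helper like the one above recover from transient failures, you can feed it a command that only succeeds on its third attempt. The counter file and function names below are purely for the demo:

```shell
#!/bin/bash
# Demo harness: succeed_on_third fails on attempts 1 and 2, then succeeds,
# so retry recovers on attempt 3.
set -uo pipefail

log_info()  { echo "[INFO] $*"; }
log_warn()  { echo "[WARN] $*"; }
log_error() { echo "[ERROR] $*" >&2; }

retry() {
    local max_attempts=$1 delay=$2
    shift 2
    local cmd="$*" attempt=1
    while [[ $attempt -le $max_attempts ]]; do
        log_info "Attempt $attempt/$max_attempts: $cmd"
        if eval "$cmd"; then
            return 0
        fi
        log_warn "Failed, waiting ${delay}s..."
        sleep "$delay"
        ((attempt++))
    done
    log_error "All $max_attempts attempts failed"
    return 1
}

COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"

succeed_on_third() {
    local n
    n=$(( $(cat "$COUNTER_FILE") + 1 ))
    echo "$n" > "$COUNTER_FILE"
    [[ $n -ge 3 ]]
}

retry 5 0 succeed_on_third && echo "recovered after $(cat "$COUNTER_FILE") attempts"
```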
```bash
# Usage
retry 3 5 curl -s http://example.com/api
```

Timeout
```bash
# Using the timeout command
timeout 30 long_running_command

# Check result
if timeout 10 curl -s http://example.com > /dev/null; then
    echo "Success"
else
    echo "Timeout or failure"
fi

# Custom timeout with a background process
run_with_timeout() {
    local timeout=$1
    shift
    "$@" &
    local pid=$!

    ( sleep "$timeout"; kill -9 $pid 2>/dev/null ) &
    local killer=$!

    wait $pid 2>/dev/null
    local result=$?

    kill $killer 2>/dev/null
    return $result
}
```

Logging Patterns
Structured Logging

```bash
# Log levels
LOG_LEVEL=${LOG_LEVEL:-INFO}

declare -A LOG_LEVELS=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)

log() {
    local level=$1
    shift
    local level_num=${LOG_LEVELS[$level]:-1}
    local threshold=${LOG_LEVELS[$LOG_LEVEL]:-1}

    if [[ $level_num -ge $threshold ]]; then
        local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
        printf '[%s] [%s] %s\n' "$timestamp" "$level" "$*"
    fi
}

# Usage
LOG_LEVEL=DEBUG
log DEBUG "Detailed info"
log INFO "Normal message"
log WARN "Warning!"
log ERROR "Error!"
```

Log to File and Console
```bash
# Redirect all output to log file while keeping console
exec > >(tee -a "$LOG_FILE") 2>&1

# Or for specific commands
echo "This goes to console and log" | tee -a "$LOG_FILE"

# Errors to stderr and log
log_error() {
    echo "[ERROR] $*" | tee -a "$LOG_FILE" >&2
}
```

Progress Indication
```bash
# Simple progress
for i in {1..100}; do
    printf "\rProgress: %d%%" "$i"
    sleep 0.1
done
echo

# Spinner
spin() {
    local pid=$1
    local chars="⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"
    local i=0
    while kill -0 "$pid" 2>/dev/null; do
        printf "\r${chars:i++%${#chars}:1} Working..."
        sleep 0.1
    done
    printf "\r"
}

long_command &
spin $!
wait
echo "Done!"
```

Input Validation
Argument Checking

```bash
# Required arguments
[[ $# -lt 2 ]] && die "Usage: $0 <source> <dest>"

# Validate file exists
validate_file() {
    local file=$1
    [[ -f "$file" ]] || die "Not a file: $file"
    [[ -r "$file" ]] || die "Cannot read: $file"
}

# Validate directory
validate_dir() {
    local dir=$1
    [[ -d "$dir" ]] || die "Not a directory: $dir"
    [[ -w "$dir" ]] || die "Cannot write to: $dir"
}

# Validate command exists
require_command() {
    local cmd=$1
    command -v "$cmd" &>/dev/null || die "Required command not found: $cmd"
}

require_command kubectl
require_command jq
```

Input Sanitization
```bash
# Remove dangerous characters
sanitize() {
    local input=$1
    # Remove everything except alphanumeric, dash, underscore, dot
    echo "${input//[^a-zA-Z0-9._-]/}"
}

# Validate is number
is_number() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Validate IP address (format only; does not reject octets above 255)
is_ip() {
    [[ $1 =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]
}

# Safe default
port=${1:-8080}
is_number "$port" || die "Invalid port: $port"
```

File Handling
Safe Temporary Files

```bash
# Create temp file
TEMP_FILE=$(mktemp)
trap 'rm -f "$TEMP_FILE"' EXIT

# Create temp directory
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

# With prefix
TEMP_FILE=$(mktemp /tmp/myscript.XXXXXX)

# Never do this (race condition, predictable)
# TEMP_FILE=/tmp/myscript.tmp   # BAD!
```

Knowledge Check
What’s wrong with TEMP=/tmp/myscript.tmp?

Show Answer

Several problems:
- Predictable path — Security risk (symlink attacks)
- Race condition — Two instances overwrite each other
- Not cleaned up — If script crashes, file remains

Correct approach:

```bash
TEMP=$(mktemp)
trap 'rm -f "$TEMP"' EXIT
```

- mktemp creates a unique filename
- trap ensures cleanup
- Permissions are secure by default
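The “secure by default” point is easy to verify on a GNU/Linux system (stat -c is GNU coreutils syntax):

```shell
#!/bin/bash
set -euo pipefail

TEMP=$(mktemp)
trap 'rm -f "$TEMP"' EXIT

stat -c '%a' "$TEMP"   # prints 600: owner read/write only, nothing for group/other
```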
Atomic File Operations
Writing directly to a configuration file can cause partial reads if another service loads the file before the write finishes. We prevent this using atomic writes.

```mermaid
sequenceDiagram
    participant Script
    participant TempFile as /tmp/config.tmp
    participant ProdFile as /etc/config.yaml
    participant App as Target Application

    Script->>TempFile: 1. Write data (takes time)
    App--xProdFile: 2. Reads existing intact config
    Script->>ProdFile: 3. Atomic rename (mv)
    App->>ProdFile: 4. Reads fully updated config
```

War Story: In 2018, a major SaaS provider had a cronjob that rebuilt their HAProxy configuration every minute using cat new_config > /etc/haproxy/haproxy.cfg. Once, the script ran out of memory halfway through the cat command. HAProxy automatically reloaded the half-empty configuration file, causing a global load balancer outage that took 45 minutes to resolve. If they had written to a temporary file and used mv, the partial file would never have been loaded.

```bash
# Atomic write (write to temp, then move)
atomic_write() {
    local dest=$1
    local temp=$(mktemp "${dest}.XXXXXX")

    cat > "$temp"   # Write stdin to temp

    chmod --reference="$dest" "$temp" 2>/dev/null || chmod 644 "$temp"
    mv "$temp" "$dest"   # Atomic rename
}

# Usage
generate_config | atomic_write /etc/app/config.yaml
```

Knowledge Check
How do you safely write to a config file that other processes might be reading?

Show Answer

Atomic write — write to temp file, then rename:

```bash
generate_config() {
    echo "key=value"
    # ...
}

DEST=/etc/app/config.yaml
TEMP=$(mktemp "${DEST}.XXXXXX")

generate_config > "$TEMP"
chmod 644 "$TEMP"
mv "$TEMP" "$DEST"   # Atomic on same filesystem
```

Why it works:
- mv on the same filesystem is atomic (rename syscall)
- Other processes never see a partial file
- If generation fails, the original is untouched
File Locking
Think of file locking like the key to a single-occupancy restroom at a gas station. Only one process can hold the lock at a time. If another process tries to enter the locked section, it must either wait (blocking) or walk away entirely (non-blocking).

```mermaid
stateDiagram-v2
    [*] --> CheckLock
    CheckLock --> AcquireLock: File is unlocked
    AcquireLock --> ExecuteTask: Lock acquired
    ExecuteTask --> ReleaseLock: Task finishes/fails
    ReleaseLock --> [*]: Lock removed

    CheckLock --> Fail: File is locked (non-blocking)
    Fail --> [*]: Exit immediately
```

```bash
# Lock file for single instance
LOCK_FILE="/var/run/${SCRIPT_NAME}.lock"

acquire_lock() {
    exec 9>"$LOCK_FILE"
    if ! flock -n 9; then
        die "Another instance is running"
    fi
}

release_lock() {
    flock -u 9
    rm -f "$LOCK_FILE"
}

trap release_lock EXIT
acquire_lock
```

Knowledge Check
How do you ensure a script only runs one instance at a time?

Show Answer

File locking with flock:

```bash
LOCK_FILE="/var/run/myscript.lock"

exec 9>"$LOCK_FILE"
if ! flock -n 9; then
    echo "Another instance is running" >&2
    exit 1
fi
```

- Opens file descriptor 9 to the lock file
- flock -n 9 tries to acquire an exclusive lock
- -n makes it non-blocking (fail immediately if locked)
- The lock is released when the script exits

Alternative: Check for a PID file, but that has race conditions.
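For contrast, here is a sketch of the PID-file approach and the race it carries; the path is illustrative (a real script would use a fixed path, which is exactly where the race bites):

```shell
#!/bin/bash
# PID-file "locking": check for a live holder, then write our own PID.
# The gap between the check and the write is the race: two instances can
# both pass the check before either writes the file.
PID_FILE="/tmp/myscript.$$.pid"   # illustrative; real scripts use a fixed path

if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "Another instance is running" >&2
    exit 1
fi
# <-- race window: a second instance can reach this point too
echo $$ > "$PID_FILE"
trap 'rm -f "$PID_FILE"' EXIT

echo "holding pid file for pid $$"
```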
Designing for Idempotency
Idempotency is the property of an operation that can be applied multiple times without changing the result beyond the initial application. In scripting, this means your script should be safe to run again if it fails halfway through.
Unsafe vs. Idempotent Operations
Not Idempotent (Fails or duplicates on second run):

```bash
mkdir /app/config
useradd nginx
echo "export ENV=prod" >> /etc/environment
```

Idempotent (Safe to run repeatedly):

```bash
mkdir -p /app/config

if ! id -u nginx >/dev/null 2>&1; then
    useradd nginx
fi

if ! grep -q "^export ENV=prod" /etc/environment; then
    echo "export ENV=prod" >> /etc/environment
fi
```

Stop and think: If your deployment script crashes on step 4 of 10, and you run it again, what happens if steps 1-3 were not written idempotently?
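One common way to get idempotency for multi-step scripts is a marker-file step runner; this sketch (the `step` helper and all names are illustrative) skips any step that already completed:

```shell
#!/bin/bash
# Idempotent step runner (sketch): each step records completion with a marker
# file, so re-running after a crash skips steps that already finished.
set -euo pipefail

STATE_DIR=$(mktemp -d)   # in a real script: a fixed path like /var/lib/deploy

step() {
    local name=$1
    shift
    if [[ -f "$STATE_DIR/$name.done" ]]; then
        echo "skip: $name (already done)"
        return 0
    fi
    "$@"
    touch "$STATE_DIR/$name.done"
    echo "done: $name"
}

step make-config mkdir -p "$STATE_DIR/app-config"
step make-config mkdir -p "$STATE_DIR/app-config"   # second run: skipped
```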
Automating Health Checks
Health checks are automated scripts that verify system state. They should be binary (pass/fail) and output clear diagnostic information when they fail.
Endpoint Health Check
Section titled “Endpoint Health Check”check_endpoint() { local url=$1 local expected_status=${2:-200}
local status status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
if [[ "$status" != "$expected_status" ]]; then log_error "Endpoint $url returned $status (expected $expected_status)" return 1 fi log_info "Endpoint $url is healthy" return 0}Disk Space Health Check
```bash
check_disk_space() {
    local threshold_percent=$1
    local mount_point=$2

    local usage
    usage=$(df -h "$mount_point" | awk 'NR==2 {print $5}' | sed 's/%//')

    if [[ "$usage" -gt "$threshold_percent" ]]; then
        log_error "Disk usage on $mount_point is at ${usage}% (threshold: ${threshold_percent}%)"
        return 1
    fi
    return 0
}
```

Common Patterns
Confirm Before Action

```bash
confirm() {
    local prompt=${1:-"Continue?"}
    read -rp "$prompt [y/N] " response
    [[ "$response" =~ ^[yY]$ ]]
}

# Usage
if confirm "Delete all files?"; then
    rm -rf /path/to/files
fi

# With default yes
confirm_yes() {
    local prompt=${1:-"Continue?"}
    read -rp "$prompt [Y/n] " response
    [[ ! "$response" =~ ^[nN]$ ]]
}
```

Dry Run Mode
```bash
DRY_RUN=${DRY_RUN:-false}

run() {
    if [[ "$DRY_RUN" == true ]]; then
        echo "[DRY RUN] $*"
    else
        "$@"
    fi
}

# Usage
run rm -f /tmp/file
run kubectl delete pod nginx
```

Parallel Execution
```bash
# Process files in parallel
process_parallel() {
    local max_jobs=$1
    shift

    local pids=()
    for item in "$@"; do
        process_item "$item" &
        pids+=($!)

        if [[ ${#pids[@]} -ge $max_jobs ]]; then
            wait -n                # Wait for any job to finish
            pids=($(jobs -rp))     # Update running pids
        fi
    done
    wait   # Wait for remaining jobs
}

# Usage
process_parallel 4 file1 file2 file3 file4 file5
```

Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| No shebang | Script might run with wrong shell | Always #!/bin/bash |
| Unquoted variables | Breaks on spaces | Always "$var" |
| No set -e | Errors ignored | Use set -euo pipefail |
| Hardcoded paths | Not portable | Use variables, find paths |
| No cleanup | Temp files left behind | Use trap EXIT |
| Parsing ls output | Breaks on special filenames | Use globs or find |
Question 1
You are building a deployment script that needs to append an application database URL to /etc/environment and then restart a background service. During the first run, the service restart fails due to a syntax error in your systemd unit, but the database URL is successfully appended. You fix the unit file and run the script again. What will happen if the script is not idempotent?
Show Answer
The database URL will be appended a second time to /etc/environment. If your script uses a standard echo "DB_URL=..." >> /etc/environment, every subsequent run will add duplicate lines. This can lead to configuration file bloat, unexpected behavior if values conflict, or outright parsing errors.
Why this matters: Scripts must be designed expecting to fail halfway through. To make this idempotent, you must check for the existence of the line before appending:
```bash
if ! grep -q "^DB_URL=" /etc/environment; then
    echo "DB_URL=..." >> /etc/environment
fi
```

Alternatively, use a tool like sed to replace the value if the key already exists.
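A replace-or-append helper built on that idea might look like this sketch. The file, key, and value are illustrative; it assumes GNU sed -i, and the value must not contain characters special to sed:

```shell
#!/bin/bash
# Replace-or-append (sketch): update DB_URL if the key exists, append it
# otherwise. Running it twice yields the same file.
set -euo pipefail

FILE=$(mktemp)                      # stands in for /etc/environment
KEY=DB_URL
VALUE="postgres://db.internal/app"

printf 'PATH=/usr/bin\n%s=old\n' "$KEY" > "$FILE"

set_kv() {
    if grep -q "^${KEY}=" "$FILE"; then
        # The | delimiter avoids clashing with the / characters in the URL
        sed -i "s|^${KEY}=.*|${KEY}=${VALUE}|" "$FILE"
    else
        echo "${KEY}=${VALUE}" >> "$FILE"
    fi
}

set_kv
set_kv   # second run changes nothing
grep -c "^${KEY}=" "$FILE"   # prints 1: exactly one DB_URL line
```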
Question 2
You are writing a critical automated backup script that compresses /var/www and copies the archive to a mounted NFS drive at /mnt/backups. What specific edge cases must you systematically test to ensure this script won’t fail silently or cause damage in production?
Show Answer
You must systematically test the following edge cases:
- Missing source: What happens if /var/www doesn’t exist? (The script should fail fast and alert.)
- Unmounted destination: What happens if the NFS drive drops and /mnt/backups is just an empty local directory? (The script might fill up the local root partition.)
- Full destination disk: What happens if there is no space left on the NFS drive? (The script must trap the failure and clean up the partially written archive.)
- Permission denial: Does the script run as a user with read access to all files inside /var/www?
Why this matters: A silent failure in a backup script is catastrophic because you only discover the bug months later when you need to restore data during an emergency. Validating inputs, testing bounds, and handling external system failures (like unmounted drives) ensures your automation reports issues proactively.
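The unmounted-destination case above can be guarded explicitly. mountpoint(1) ships with util-linux on most Linux systems; the helper name here is illustrative:

```shell
#!/bin/bash
# Guard against the "unmounted destination" edge case: refuse to back up into
# a local directory that merely sits at the mount path.

require_mounted() {
    local dir=$1
    if ! mountpoint -q "$dir"; then
        echo "ERROR: $dir is not a mounted filesystem - aborting backup" >&2
        return 1
    fi
}

require_mounted / && echo "/ is a real mount"
require_mounted "$(mktemp -d)" || echo "caught a fake backup dir"
```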
Hands-On Exercise
Building a Practical Script
Objective: Create a production-quality script using patterns from this module.
Environment: Any Linux system with Bash
Build: Log Analyzer Script
```bash
cat > /tmp/log-analyzer.sh << 'SCRIPT'
#!/bin/bash
#
# Description: Analyze log files and report statistics
# Usage: ./log-analyzer.sh [-v] [-n TOP] <logfile>
#

set -euo pipefail

# === Configuration ===
readonly SCRIPT_NAME=$(basename "$0")
readonly VERSION="1.0.0"

# === Defaults ===
VERBOSE=false
TOP_COUNT=10

# === Logging ===
log_info()  { echo "[INFO] $*"; }
log_debug() { [[ "$VERBOSE" == true ]] && echo "[DEBUG] $*" || true; }
log_error() { echo "[ERROR] $*" >&2; }

# === Error Handling ===
die() {
    log_error "$@"
    exit 1
}

# === Usage ===
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [options] <logfile>

Analyze log files and report statistics.

Options:
  -h, --help     Show this help message
  -v, --verbose  Enable verbose output
  -n, --top NUM  Show top N results (default: 10)
  --version      Show version

Examples:
  $SCRIPT_NAME /var/log/syslog
  $SCRIPT_NAME -v -n 5 app.log
EOF
    exit 0
}

# === Functions ===
count_by_field() {
    local file=$1
    local field=$2
    log_debug "Counting by field $field"
    awk "{print \$$field}" "$file" | sort | uniq -c | sort -rn | head -n "$TOP_COUNT"
}

analyze_log() {
    local file=$1

    log_info "Analyzing: $file"
    echo

    # Basic stats
    local total_lines=$(wc -l < "$file")
    echo "Total lines: $total_lines"
    echo

    # If it looks like a syslog/access log
    if head -1 "$file" | grep -qE '^[A-Z][a-z]{2} [0-9]|^[0-9]{4}-[0-9]{2}'; then
        echo "=== Log Level Distribution ==="
        grep -oE '(INFO|DEBUG|WARN|WARNING|ERROR|FATAL)' "$file" 2>/dev/null | \
            sort | uniq -c | sort -rn || echo "No log levels found"
        echo
    fi

    # Word frequency
    echo "=== Most Common Words ==="
    tr -cs 'A-Za-z' '\n' < "$file" | \
        tr '[:upper:]' '[:lower:]' | \
        sort | uniq -c | sort -rn | head -n "$TOP_COUNT"
    echo

    log_info "Analysis complete"
}

# === Main ===
main() {
    # Parse arguments
    while [[ $# -gt 0 ]]; do
        case $1 in
            -h|--help) usage ;;
            --version) echo "$SCRIPT_NAME $VERSION"; exit 0 ;;
            -v|--verbose) VERBOSE=true; shift ;;
            -n|--top)
                [[ -n "${2:-}" ]] || die "Missing value for $1"
                TOP_COUNT=$2
                shift 2
                ;;
            -*) die "Unknown option: $1. Use -h for help." ;;
            *) break ;;
        esac
    done

    # Validate arguments
    [[ $# -lt 1 ]] && die "Missing log file. Use -h for help."

    local logfile=$1

    # Validate input
    [[ -f "$logfile" ]] || die "File not found: $logfile"
    [[ -r "$logfile" ]] || die "Cannot read: $logfile"
    [[ -s "$logfile" ]] || die "File is empty: $logfile"

    log_debug "TOP_COUNT=$TOP_COUNT"
    log_debug "VERBOSE=$VERBOSE"

    analyze_log "$logfile"
}

main "$@"
SCRIPT

chmod +x /tmp/log-analyzer.sh
```

Test the Script
```bash
# Create test log
cat > /tmp/test.log << 'EOF'
2024-01-15 10:00:00 INFO Application started
2024-01-15 10:00:01 DEBUG Loading configuration
2024-01-15 10:00:02 INFO Connected to database
2024-01-15 10:00:03 WARNING Slow query detected
2024-01-15 10:00:04 ERROR Connection timeout
2024-01-15 10:00:05 INFO Retrying connection
2024-01-15 10:00:06 DEBUG Cache miss
2024-01-15 10:00:07 INFO Connection established
2024-01-15 10:00:08 ERROR Authentication failed
2024-01-15 10:00:09 WARN Rate limit exceeded
2024-01-15 10:00:10 INFO Request processed successfully
EOF

# Test runs
/tmp/log-analyzer.sh --help
/tmp/log-analyzer.sh --version
/tmp/log-analyzer.sh /tmp/test.log
/tmp/log-analyzer.sh -v -n 5 /tmp/test.log

# Test error handling
/tmp/log-analyzer.sh /nonexistent 2>&1 || true
/tmp/log-analyzer.sh --invalid 2>&1 || true
```

Extend the Script
```bash
# Add these features:
# 1. Output format option (text/json)
# 2. Date range filtering
# 3. Error-only mode

# Example addition for error-only:
# Add to argument parsing:
#   -e|--errors) ERRORS_ONLY=true; shift ;;

# Add to analyze_log:
#   if [[ "$ERRORS_ONLY" == true ]]; then
#       grep -E "ERROR|FATAL" "$file"
#       return
#   fi
```

Success Criteria
- Script uses set -euo pipefail
- Has proper argument parsing with help
- Validates input files
- Has logging functions
- Handles errors gracefully
- Runs without errors on valid input
Key Takeaways
- Start with set -euo pipefail — Catch errors early
- Use traps for cleanup — Always clean up temp files
- Validate all inputs — Don’t trust arguments or files
- Log meaningfully — Future debugging depends on it
- Dry-run mode is essential — Test safely before executing
What’s Next?
In Module 7.4: DevOps Automation, you’ll apply these patterns to real operational tasks—kubectl wrappers, deployment scripts, and CI/CD automation.
Further Reading
- ShellCheck — Lint your scripts
- Bash Strict Mode
- Google Shell Style Guide
- Pure Bash Bible