Module 0.1: The CLI Power User (Search & Streams)
Complexity: [QUICK] · Time to Complete: 45 minutes · Prerequisites: Zero to Terminal (Module 0.8)
What You’ll Be Able to Do
After this module, you will be able to:
- Chain commands with pipes, redirects, and subshells to build powerful one-liners (Assessed in Quiz Q6, Exercise Task 6)
- Search files and content efficiently using find, grep, and xargs (Assessed in Quiz Q1, Q4, Q7, Exercise Task 5)
- Process text streams with cut, sort, uniq, awk, and sed for log analysis (Assessed in Quiz Q5, Exercise Tasks 3 & 4)
- Redirect stdout, stderr, and combine streams for scripting and debugging (Assessed in Quiz Q2, Q3, Q8, Exercise Task 2)
Why This Module Matters
Picture this: it is 2 AM and your pager goes off. The web application is returning 500 errors. Somewhere in 47 log files spread across /var/log, there is a clue — a stack trace, a timeout, a connection refused. You could open each file one by one, scroll through thousands of lines with your tired eyes, and hope you spot the word “error” before sunrise. Or you could type one line:
```shell
grep -ri "error" /var/log/app/ | grep -i "database" | tail -20
```

Three seconds later, you have the 20 most recent database-related errors across every log file. Problem identified. Fix deployed. Back to sleep by 2:15 AM.
That is the difference this module makes. You already know how to navigate the filesystem and run basic commands from Module 0.8. Now you are going to learn how to combine those commands into powerful workflows. You will learn to search for files, search inside files, control where output goes, and chain commands together like building blocks.
These are not advanced tricks. These are everyday tools that Linux professionals use hundreds of times a day. By the end of this module, you will feel like you have a superpower — because you will.
1. Wildcards: Pattern Matching for the Impatient
Typing out full file names is slow and error-prone. Wildcards (also called “globs”) let you match multiple files using patterns. The shell expands them before the command even runs.
The Asterisk * — Match Anything
The `*` matches zero or more characters of any kind.
```shell
# List every .txt file in the current directory
ls *.txt
# Matches: notes.txt, readme.txt, a.txt
# Does NOT match: notes.txt.bak (doesn't end with .txt)

# List every file that starts with "report"
ls report*
# Matches: report.pdf, report_2024.csv, report-final.docx, reports/

# Remove all JPEG images
rm *.jpg
# Matches: photo.jpg, screenshot.jpg, a.jpg

# Copy all Python files to a backup directory
cp *.py ~/backup/

# List files with "config" anywhere in the name
ls *config*
# Matches: config.yaml, app-config.json, myconfig, config_backup.tar.gz
```

The Question Mark ? — Match Exactly One Character
The `?` matches exactly one character — no more, no less.
```shell
# List files like file1.txt, file2.txt, fileA.txt
ls file?.txt
# Matches: file1.txt, fileA.txt, filex.txt
# Does NOT match: file12.txt (two characters where ? expects one)
# Does NOT match: file.txt (zero characters where ? expects one)

# Match two-character extensions
ls *.??
# Matches: archive.gz, image.py, data.js
# Does NOT match: readme.txt (three-character extension)

# Match log files with single-digit rotation number
ls app.log.?
# Matches: app.log.1, app.log.2, app.log.9
# Does NOT match: app.log.10
```

Square Brackets [] — Match From a Set
Square brackets match exactly one character, but only from the characters you specify.
```shell
# Match specific characters
ls file[abc].txt
# Matches: filea.txt, fileb.txt, filec.txt
# Does NOT match: filed.txt

# Match a range of characters
ls data[0-9].csv
# Matches: data0.csv, data1.csv, ... data9.csv

# Match uppercase letters
ls report[A-Z].pdf
# Matches: reportA.pdf, reportB.pdf, ... reportZ.pdf

# Combine ranges
ls log[0-9a-f].txt
# Matches: log0.txt, loga.txt, logf.txt
# Does NOT match: logg.txt (g is outside the range)

# Negate with ! or ^ -- match anything EXCEPT these
ls file[!0-9].txt
# Matches: filea.txt, fileZ.txt
# Does NOT match: file1.txt, file9.txt
```

Curly Braces {} — Generate Multiple Strings
Curly braces are not technically wildcards (they are a Bash expansion feature called “brace expansion”), but they are used constantly alongside wildcards.
```shell
# Create multiple files at once
touch report_{jan,feb,mar}.txt
# Creates: report_jan.txt, report_feb.txt, report_mar.txt

# Backup a file with a new extension
cp config.yaml{,.bak}
# Expands to: cp config.yaml config.yaml.bak

# Create a directory structure
mkdir -p project/{src,tests,docs}
# Creates three subdirectories inside project/
```

Key insight: The shell expands wildcards before the command sees them. When you type `ls *.txt`, the shell turns it into something like `ls notes.txt readme.txt todo.txt` and then runs `ls` with those three arguments. The `ls` command never sees the `*` at all.
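You can watch this expansion happen with `echo`, which simply prints whatever arguments it receives. A quick sketch you can try in a throwaway directory (the directory name and files here are just examples):

```shell
# Set up a scratch directory with known files
mkdir -p /tmp/glob-demo && cd /tmp/glob-demo
touch notes.txt readme.txt todo.txt

# echo receives the already-expanded list -- the shell did the work
echo *.txt
# prints: notes.txt readme.txt todo.txt

# If a pattern matches nothing, Bash (by default) passes it through literally
echo *.xyz
# prints: *.xyz
```

This also explains why `rm *.txt` in an empty directory complains about a file literally named `*.txt`: the pattern matched nothing, so the shell handed it over unchanged.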
2. Streams: How Data Flows Through Commands
Every single command in Linux works with three invisible channels of data. Understanding these channels is the foundation for everything else in this module.
```
                ┌─────────────────────┐
Keyboard ------>│                     │------> Screen
 (stdin)        │    Your Command     │       (stdout)
 Stream 0       │    (e.g., grep)     │       Stream 1
                │                     │
                │                     │------> Screen
                └─────────────────────┘       (stderr)
                                               Stream 2
```

Pause and predict: If you run a command that produces both normal output and an error message, and you redirect stream 1 (`>`) to a file, what will you see on your terminal screen?
- stdin (stream 0): Where the command reads its input. By default, this is your keyboard. But it can be a file or the output of another command.
- stdout (stream 1): Where the command sends its normal results. By default, this is your terminal screen.
- stderr (stream 2): Where the command sends error messages. Also goes to your screen by default — but it is a separate stream from stdout.
Why are stdout and stderr separate? Because you often want to handle them differently. You might want to save normal output to a file while still seeing errors on screen. Or you might want to hide errors while keeping the output. The separation gives you that control.
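You can see the two streams with your own eyes. A small sketch; the missing directory name is just a placeholder that should not exist on your machine:

```shell
# "." exists, so its listing goes to stdout; the missing directory
# produces an error on stderr -- two different destinations
ls . /no_such_dir > out.txt 2> err.txt

cat out.txt   # the normal directory listing
cat err.txt   # the "No such file or directory" error
```

Run the same `ls` without any redirection and both streams land on your screen, interleaved, which is why they look like a single output until you split them.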
3. Redirection: Controlling Where Output Goes
Redirection lets you reroute those three streams — sending output to files instead of the screen, or reading input from files instead of the keyboard.
> Redirect Output (Overwrite)
Sends stdout to a file. Warning: this destroys the existing contents of the file.
```shell
# Save a directory listing to a file
ls -la /etc > etc_contents.txt

# Save the current date and time
date > timestamp.txt

# Save your system's hostname
hostname > server_info.txt
```

>> Redirect Output (Append)
Adds stdout to the end of a file. Existing contents are preserved.
```shell
# Build a log file over time
echo "=== Deploy started ===" >> deploy.log
echo "Version: 2.4.1" >> deploy.log
echo "=== Deploy finished ===" >> deploy.log

# Append today's disk usage to a daily report
df -h >> /var/log/daily_disk_report.txt
```

Real-world use case — simple logging:

```shell
# Every time this script runs, it appends a timestamped entry
echo "$(date): Backup completed successfully" >> /var/log/backup.log
```

2> Redirect Errors
Sends only stderr (stream 2) to a file. Normal output still goes to the screen.
```shell
# Try listing a directory you can't access -- send errors to a file
ls /root 2> errors.log
# The "Permission denied" error goes to errors.log, not your screen

# Run find without being spammed by permission errors
find / -name "nginx.conf" 2> /dev/null
# /dev/null is Linux's black hole -- errors vanish, results appear on screen
```

&> Redirect Everything
Sends both stdout and stderr to the same file.
```shell
# Capture all output from a build process
make build &> build_output.log

# Run a script and capture everything for debugging
./deploy.sh &> deploy_full_log.txt
```

< Redirect Input
Sends the contents of a file into a command’s stdin (instead of typing from the keyboard).
```shell
# Count lines, words, and characters in a file
wc < report.txt

# Sort the contents of a file
sort < unsorted_names.txt

# Send an email with file contents as the body (if mail is configured)
mail -s "Report" admin@example.com < report.txt
```

Combining Redirections
You can redirect stdout and stderr to different files:
```shell
# Save output to one file and errors to another
find / -name "*.conf" > results.txt 2> errors.txt

# Save output to a file, throw away errors
grep -r "TODO" /project > todos.txt 2> /dev/null
```

The /dev/null trick: `/dev/null` is a special file that discards everything written to it. Think of it as a black hole. It is incredibly useful when you want to suppress output you do not care about.
```shell
# Run a command silently -- no output, no errors
command_that_is_noisy > /dev/null 2>&1
```
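The `2>&1` part means “point stderr at wherever stdout is pointing right now.” Redirections are processed left to right, so the order matters. A quick sketch, using a directory name assumed not to exist:

```shell
# Correct: stdout is sent to the file first, THEN stderr follows it --
# both streams land in all_output.txt
ls /no_such_dir > all_output.txt 2>&1

# Wrong order: stderr copies stdout's ORIGINAL target (your terminal)
# before stdout is redirected -- the error still hits the screen,
# and only (empty) stdout lands in the file
ls /no_such_dir 2>&1 > stdout_only.txt
```

This is why the idiom is always written `> file 2>&1`, never the other way around.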
```shell
# Check if a command succeeds without caring about output
if grep -q "ready" status.txt 2> /dev/null; then
  echo "System is ready"
fi
```

4. Pipes: The Assembly Line
Pipes (`|`) are the single most powerful concept in this module. A pipe takes the stdout of one command and feeds it directly into the stdin of the next command. No temporary files needed.
Think of it like a factory assembly line: each machine (command) does one small job and passes the result to the next machine.
```
┌──────┐     ┌──────┐     ┌──────┐     ┌──────┐
│ cmd1 │---->│ cmd2 │---->│ cmd3 │---->│ cmd4 │----> Final output
└──────┘     └──────┘     └──────┘     └──────┘
     stdout       stdout       stdout       stdout
    becomes      becomes      becomes
     stdin        stdin        stdin
```

Stop and think: If `cmd2` in the assembly line fails and produces an error message (stderr), does that error flow into `cmd3`? (Hint: look at which stream the pipe connects!)
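The answer is worth proving to yourself: a pipe connects only stdout, so stderr skips the pipeline and goes straight to your terminal. Bash also offers `2>&1 |` (or the shorthand `|&`) for the cases where you do want errors traveling through the pipe. A sketch, with a placeholder directory assumed not to exist:

```shell
# The error is on stderr, so grep never sees it -- this pipeline
# produces no matches (the error prints to the terminal instead)
ls /no_such_dir | grep "no_such_dir"

# Merge stderr into stdout first, and now grep catches the error text
ls /no_such_dir 2>&1 | grep "no_such_dir"
```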
Starting Simple
```shell
# List files and scroll through the output page by page
ls -la /etc | less
```
```shell
# Count how many files are in a directory
ls /etc | wc -l

# Show only the 5 largest files
du -sh /var/log/* | sort -rh | head -5
```

Building Up: Two-Command Pipes
```shell
# Find all running processes containing "nginx"
ps aux | grep nginx
```
```shell
# Show disk usage sorted by size (largest first)
du -sh /home/* | sort -rh

# List only directories in the current location
ls -la | grep "^d"

# Show unique shells used on the system
cat /etc/passwd | cut -d: -f7 | sort -u
```

The Real Power: Multi-Command Chains
Here is where pipes really shine. Each step does one simple thing, but together they solve complex problems.
```shell
# Find the 5 IP addresses that hit your web server most often
cat /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -5
# Step by step:
# 1. cat: read the log file
# 2. awk '{print $1}': extract the first field (IP address) from each line
# 3. sort: sort the IPs alphabetically (required for uniq)
# 4. uniq -c: count consecutive duplicate lines
# 5. sort -rn: sort by count, numerically, in reverse (highest first)
# 6. head -5: show only the top 5

# Find which processes are using the most memory
ps aux | sort -k4 -rn | head -10 | awk '{print $4"% "$11}'
# Shows: memory percentage and command name for top 10 processes

# Count how many times each HTTP status code appears in a log
cat access.log | awk '{print $9}' | sort | uniq -c | sort -rn
# Output might look like:
#   15234 200
#    2341 304
#     187 404
#      23 500
```

The “Before and After” — Why Pipes Change Everything
Without pipes (the hard way):
```shell
# Step 1: Save all process info to a temp file
ps aux > /tmp/all_processes.txt
# Step 2: Search that file for python
grep python /tmp/all_processes.txt > /tmp/python_processes.txt
# Step 3: Count the lines
wc -l /tmp/python_processes.txt
# Step 4: Clean up temp files
rm /tmp/all_processes.txt /tmp/python_processes.txt
```

With pipes (the power user way):

```shell
ps aux | grep python | wc -l
```

One line. No temp files. No cleanup. Same result.
5. find: Locating Files Like a Detective
The `find` command searches the filesystem for files and directories that match criteria you specify. Unlike `ls`, it searches recursively through subdirectories by default.
Basic syntax: `find [where to look] [what to match]`
Find by Name
```shell
# Find a file named exactly "config.yaml" starting from current directory
find . -name "config.yaml"
```
```shell
# Find all YAML files (case-sensitive)
find . -name "*.yaml"

# Find all YAML files (case-insensitive -- also matches .YAML, .Yaml)
find . -iname "*.yaml"

# Find in a specific directory
find /etc -name "*.conf"
```

Find by Type
```shell
# Find only directories
find . -type d
```
```shell
# Find only regular files (not directories, not symlinks)
find . -type f

# Find only symbolic links
find . -type l

# Find all directories named "test"
find . -type d -name "test"
```

Find by Size — Real-World Scenarios
```shell
# Find files larger than 100 MB (hunting disk space hogs)
find /var -type f -size +100M
```
```shell
# Find files larger than 1 GB
find / -type f -size +1G 2> /dev/null

# Find small files (less than 1 KB) -- possible empty or junk files
find . -type f -size -1k

# Find files exactly 0 bytes (empty files)
find . -type f -empty
```

Find by Time — What Changed Recently?
```shell
# Find files modified in the last 24 hours
find /var/log -type f -mtime -1
```
```shell
# Find files modified in the last 30 minutes
find . -type f -mmin -30

# Find files NOT modified in the last 90 days (stale files)
find /home -type f -mtime +90

# Find files accessed in the last hour
find . -type f -amin -60
```

Find with Actions
`find` can do things with the files it discovers, not just list them.
```shell
# Delete all .tmp files (BE CAREFUL -- no undo!)
find /tmp -name "*.tmp" -delete

# Run a command on each file found (-exec)
find . -name "*.log" -exec wc -l {} \;
# {} is replaced with each filename found
# \; marks the end of the -exec command

# Change permissions on all shell scripts
find . -name "*.sh" -exec chmod +x {} \;

# Print results with details (like ls -l)
find . -name "*.conf" -ls
```

Tip: Always quote your patterns in `find`. If you write `find . -name *.txt` without quotes and there are .txt files in your current directory, the shell will expand `*.txt` before `find` sees it, and you will get unexpected results. Always use `find . -name "*.txt"`.
6. grep: Searching Inside Files
While `find` locates files by name, size, or date, `grep` looks inside files for text that matches a pattern. It prints every line that contains a match.
Basic syntax: `grep [options] "pattern" file`
Basic Searches
```shell
# Find lines containing "error" in a log file
grep "error" /var/log/syslog
```
```shell
# Case-insensitive search (matches Error, ERROR, error, eRRoR)
grep -i "error" /var/log/syslog

# Search multiple files at once
grep "timeout" server1.log server2.log server3.log

# Search all files in a directory recursively
grep -r "password" /etc/
# CAUTION: this can reveal sensitive information -- use responsibly
```

Context — Show Surrounding Lines
When you find a match, you often need to see what is happening around it. That is what `-A`, `-B`, and `-C` are for.
```shell
# Show 3 lines AFTER each match (A = After)
grep -A 3 "Exception" app.log
# Great for seeing stack traces that follow an error

# Show 2 lines BEFORE each match (B = Before)
grep -B 2 "failed" deploy.log
# Great for seeing what command caused a failure

# Show 3 lines before AND after each match (C = Context)
grep -C 3 "timeout" app.log
# The full picture around each match

# Combine with case-insensitive
grep -i -C 5 "critical" /var/log/syslog
```

Counting and Listing
```shell
# Count matching lines (instead of showing them)
grep -c "404" access.log
# Output: 187 (just the number)
```
```shell
# Show only filenames that contain a match (not the lines themselves)
grep -rl "TODO" /project/src/
# Output: list of files, one per line

# Show line numbers alongside matches
grep -n "def main" *.py
# Output: main.py:42:def main():
```

Inverting and Exact Matching
```shell
# Show lines that do NOT match (invert)
grep -v "DEBUG" app.log
# Shows everything except debug lines -- great for reducing noise

# Match whole words only (not partial matches)
grep -w "error" app.log
# Matches "error" but NOT "errors" or "error_handler"

# Match at the start or end of a line
grep "^#" config.conf    # Lines starting with # (comments)
grep "\.conf$" filelist  # Lines ending with .conf
```

grep with Pipes — The Most Common Pattern
You will use `grep` with pipes more than any other combination in Linux.
```shell
# Find running processes by name
ps aux | grep nginx

# Filter command history
history | grep "docker"

# Find listening network ports for a specific service
ss -tlnp | grep 8080

# Check if a package is installed (Debian/Ubuntu)
dpkg -l | grep "nginx"

# Find environment variables related to Java
env | grep -i java
```

7. The Power Combo: find + grep + Pipes
The real magic happens when you combine these tools. This is how experienced Linux users solve real problems.
Scenario 1: Find All Config Files Containing a Specific Database Host
```shell
find /etc -name "*.conf" -exec grep -l "db.example.com" {} \;
```

This finds every .conf file under /etc, then checks each one for the text “db.example.com”, and prints only the filenames that contain it.
Scenario 2: Search for a Function Definition Across a Project
```shell
find . -name "*.py" | xargs grep -n "def process_order"
```

Here, `find` lists all Python files, and `xargs` passes that list to `grep`, which searches each file for the function definition and shows the line number.
Trade-Off: find -exec vs xargs
You might notice two ways to run commands on found files: `find . -name "*.log" -exec grep "ERROR" {} \;` and `find . -name "*.log" | xargs grep "ERROR"`. Which is better? It depends entirely on scale.
The -exec ... \; method runs the command once for every single file. If you find 10,000 files, Linux starts the grep process 10,000 times. This is completely safe regarding weird filenames, but it is extremely slow due to process overhead.
The xargs method batches the files. It runs grep "ERROR" file1 file2 file3 ... up to the system’s character limit, drastically reducing the number of processes started. It is blazing fast for massive directories. However, xargs can stumble if filenames contain spaces (unless you use the safer find -print0 | xargs -0 variant). Rule of thumb: Use -exec for a few files or complex command strings, but switch to xargs when performance matters and you are parsing thousands of files.
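A minimal sketch of the safe, fast variant mentioned above, plus `find`'s own batching terminator `+`, which gets you most of the `xargs` speed without a pipe:

```shell
# Safe AND fast: NUL-separated names survive spaces and newlines
find . -name "*.log" -print0 | xargs -0 grep -l "ERROR"

# find's own batching: "+" (instead of "\;") packs many files onto
# one grep command line, much like xargs does
find . -name "*.log" -exec grep -l "ERROR" {} +
```

Both forms start far fewer processes than `-exec ... \;`, and neither breaks on a file named `app log.log`.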
Scenario 3: Find Large Log Files Modified Today
```shell
find /var/log -name "*.log" -size +10M -mtime -1 -exec ls -lh {} \;
```

Finds log files bigger than 10 MB that were modified in the last 24 hours, and shows their details.
Scenario 4: Count TODO Comments Across a Codebase
```shell
find . -name "*.js" -o -name "*.ts" | xargs grep -c "TODO" | grep -v ":0$"
```

Finds all JavaScript and TypeScript files, counts TODO occurrences in each, and filters out files with zero TODOs.
Scenario 5: Find and Kill a Runaway Process
```shell
ps aux | grep "[r]unaway_script" | awk '{print $2}' | xargs kill
```

The `[r]` trick prevents grep from matching its own process. `awk` extracts the PID (second column), and `xargs` passes it to `kill`.
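As an aside, most modern distributions also ship `pgrep` and `pkill` (from the procps tools), which combine the matching and the signaling in one step and never match their own process, so the `[r]` trick becomes unnecessary. A sketch, reusing the hypothetical script name from above:

```shell
# List the PIDs of processes whose command line matches the pattern
pgrep -f runaway_script

# Match and signal in one step -- roughly equivalent to the
# grep/awk/kill pipeline above
pkill -f runaway_script
```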
War Story: The 3 AM Log Hunt
A few years ago, a junior engineer got paged because the company’s payment processing was failing. They SSH’d into the server, opened /var/log/app.log in nano, and started scrolling. The file was 2 GB. Nano froze. They tried vi. It loaded, slowly, and they started searching manually. Forty-five minutes later, they still had not found the issue.
A senior engineer joined the call and typed:
```shell
grep -i "payment" /var/log/app.log | grep -i "error" | tail -20
```

Eight seconds later they had the answer: a third-party payment gateway had changed their API endpoint. The fix was a one-line config change.
The junior engineer could have found it in under a minute with the tools you just learned. The lesson is not that scrolling through files is bad — it is that knowing your tools transforms you from someone who reacts slowly to someone who solves problems fast. That is what makes you valuable in an on-call rotation, in a job interview, and in your daily work.
Did You Know?
- The name `grep` is an acronym. It stands for “Global Regular Expression Print.” It comes from a command in the ancient `ed` text editor: `g/re/p`. The tool was written by Ken Thompson in 1973 and has barely changed since — because it did not need to.
- Pipes were invented over a lunch break. Doug McIlroy proposed the idea of connecting programs together at Bell Labs in 1964. Ken Thompson liked it so much that he added pipe support to Unix overnight. The `|` symbol was chosen because it was rarely used and looked like a physical pipe.
- `/dev/null` has been called “the most useful file in Unix.” Everything written to it vanishes instantly. It has been a part of Unix since 1971. In internet culture, “sending something to /dev/null” means ignoring it completely.
- The `find` command can replace dozens of GUI clicks. A task like “find all PDF files modified in the last week that are larger than 5 MB” would take minutes of clicking through folder after folder in a graphical file manager. With `find`, it is one line: `find . -name "*.pdf" -mtime -7 -size +5M`.
Common Mistakes
| Mistake | Why It Happens | How to Fix It |
|---|---|---|
| Using `>` when you meant `>>` | You wanted to append to a log file but accidentally overwrote it, destroying all previous entries. | Pause and think: “Do I want to replace or add to?” When in doubt, use `>>`. You can always clean up a file later, but you cannot un-overwrite one. |
| Forgetting quotes around wildcards in `find` | Writing `find . -name *.txt` without quotes. If .txt files exist in the current directory, the shell expands `*.txt` before `find` runs, causing confusing errors. | Always quote: `find . -name "*.txt"`. This ensures `find` sees the literal pattern. |
| Piping into commands that ignore stdin | Commands like `ls`, `cd`, and `rm` do not read from stdin. Writing `echo "/etc" \| ls` does nothing useful. | Use `xargs` to convert stdin into arguments: `echo "/etc" \| xargs ls`. Or use command substitution: `ls $(echo "/etc")`. |
| Searching from `/` without suppressing errors | Running `find / -name "myfile"` floods your screen with “Permission denied” errors from directories you cannot access. | Add `2> /dev/null` to hide the errors: `find / -name "myfile" 2> /dev/null`. |
| Using `grep` without `-r` and wondering why nothing matches | You wrote `grep "error" /var/log/` expecting it to search all files in the directory, but `grep` only searches files you explicitly name. | Add `-r` for recursive search: `grep -r "error" /var/log/`. |
| Confusing `find -name` with `find -path` | You expected `find . -name "src/config.yaml"` to match a path, but `-name` only matches the filename portion. | Use `-path` for matching against the full path: `find . -path "*/src/config.yaml"`. |
| Forgetting that `grep "pattern"` with no file waits for keyboard input | You typed `grep "hello"` and the terminal seems frozen. It is actually waiting for you to type stdin input. | Press Ctrl+C to cancel. Then either specify a file (`grep "hello" file.txt`) or pipe input into it (`echo "hello world" \| grep "hello"`). |
Test your understanding. Try to answer before revealing the solution.
1. Scenario: You are cleaning up a directory filled with backup files. You have backup_1.tar, backup_12.tar, and backup_final.tar. You run rm backup_?.tar. Which files are deleted and why?
Only backup_1.tar is deleted in this scenario. This happens because the ? wildcard strictly matches exactly one character, no more and no less. Since 1 is a single character, it perfectly matches the pattern. However, 12 is two characters and final is five characters, meaning backup_12.tar and backup_final.tar are safely ignored. If you had used the * wildcard instead, which matches zero or more characters of any kind, all three files would have been deleted. (Maps to Outcome: Search files)
2. Scenario: You have a script that checks system health every hour. You want to record the output of uptime into /var/log/health.log so you can review the history at the end of the week. Which redirection operator should you use and why?
You should use the append operator >> for this task. When you use >>, the shell opens the target file and adds the new output to the very end of it, preserving all the existing historical data. If you were to use the standard > operator instead, the shell would truncate (empty) the file before writing the new output, meaning you would only ever have the most recent hour’s data. For logging purposes where data must be retained over time, appending is always the correct and safe choice. (Maps to Outcome: Redirect streams)
3. Scenario: You are running find / -name "secret.key" as a regular user. Your screen is instantly flooded with hundreds of "Permission denied" lines, making it impossible to see if the file was actually found. How do you fix this command and why does the fix work?
You fix it by appending 2> /dev/null to the end of your command. This works because it separates the standard error stream (stream 2) from the standard output stream (stream 1). By redirecting stream 2 into /dev/null (the system’s black hole), all the “Permission denied” errors are immediately discarded by the operating system. Meanwhile, stream 1 is completely untouched, meaning if find does successfully locate “secret.key”, that valid result will still print cleanly to your terminal screen. (Maps to Outcome: Redirect streams)
4. Scenario: Your web server's primary disk is at 99% capacity. You suspect some rotated log files in /var/log have grown out of control, specifically files over 50 Megabytes. How do you locate just these massive files?
You would use the command find /var/log -type f -name "*.log" -size +50M. This approach works because find allows you to combine multiple search criteria that must all be evaluated as true. The -type f flag ensures you are only looking at actual files rather than directories. The -name "*.log" flag filters for the correct extension, and crucially, -size +50M restricts the results to files strictly larger than 50 megabytes. This surgical filtering prevents you from wasting time manually checking the size of hundreds of smaller files. (Maps to Outcome: Search files)
5. Scenario: A customer reported a timeout at exactly 14:32. You search app.log for "timeout" and find the error, but it doesn't tell you which user triggered it. You need to see the log lines immediately before the error to find the user ID. What command do you use?
You would use the command grep -B 3 -i "timeout" app.log (or -C 3 for context on both sides). This works because grep does not just match isolated lines; it can retain a buffer of surrounding text. By specifying -B 3 (Before), grep prints the matching line plus the three lines that immediately preceded it in the file. This context is absolutely critical for debugging, as the events leading up to an error (like a user logging in or initiating a transaction) are almost always printed on the lines just above the actual failure message. (Maps to Outcome: Process text streams)
6. Scenario: You inherit a server and see this command in the history: cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10. The previous admin used it during a DDoS attack. What exactly was this command helping them discover?
This command was helping them discover the top 10 IP addresses making the most requests, which is crucial for identifying the primary sources of a DDoS attack. It works by chaining small, focused utilities together through pipes. First, awk extracts just the IP address column, then the first sort groups identical IPs together so that uniq -c can accurately count consecutive duplicates. The second sort -rn orders those newly calculated counts numerically from highest to lowest. Finally, head -10 truncates the massive list to just the top 10 offenders, giving the admin exactly the target list they need to block the malicious traffic. (Maps to Outcome: Chain commands)
7. Scenario: You are in a directory containing script.py, utils.py, and data.csv. You want to find all Python files in the subdirectories. You run find . -name *.py. It returns script.py and utils.py from the current directory, but completely misses src/main.py. Why did this fail?
It failed because you forgot to put quotes around the *.py wildcard pattern. Because there were already .py files in your current working directory, your shell expanded the wildcard before passing it to the find command. The shell actually executed find . -name script.py utils.py, which causes unexpected behavior because find was told to look specifically for those two exact filenames, not a broader pattern. By adding quotes ("*.py"), you protect the asterisk from the shell, forcing find itself to do the expansion dynamically across all subdirectories. (Maps to Outcome: Search files)
8. Scenario: You are running a massive database migration script (./migrate.sh) that takes hours. You want to save all successful progress messages to progress.txt, but you want any critical failures to be saved separately in alerts.txt so you can review them quickly. How do you run the script?
You would run it using the command ./migrate.sh > progress.txt 2> alerts.txt. This solution works because Linux strictly separates standard output (stream 1) from standard error (stream 2) at the kernel level. The > operator is shorthand for 1>, meaning it grabs the normal output and funnels it exclusively into progress.txt. The 2> operator explicitly grabs only the error stream and routes it into alerts.txt. This strict separation ensures that your critical error alerts are not buried in the middle of a massive file full of routine success messages. (Maps to Outcome: Redirect streams)
Hands-On Exercise: The Server Investigation
Scenario: You are a junior SRE investigating a reported outage. You have been given a server with multiple log files, and your job is to find the relevant errors, extract them, and create a summary report. You will use everything from this module: wildcards, redirection, pipes, find, and grep.
First, create the practice environment:
```shell
# Create the investigation directory
mkdir -p ~/investigation/logs
```
# Create simulated log files with realistic contentcat > ~/investigation/logs/web.log << 'EOF'2024-03-15 08:01:12 INFO Request received: GET /api/users2024-03-15 08:01:13 INFO Response sent: 200 OK2024-03-15 08:02:44 ERROR Connection to database timed out2024-03-15 08:02:45 ERROR Failed to process request: GET /api/orders2024-03-15 08:03:01 WARN Retry attempt 1 for database connection2024-03-15 08:03:02 INFO Database connection restored2024-03-15 08:05:30 INFO Request received: GET /api/products2024-03-15 08:07:15 ERROR OutOfMemoryError: heap space exhausted2024-03-15 08:07:16 ERROR Service crashed - restartingEOF
cat > ~/investigation/logs/db.log << 'EOF'2024-03-15 08:00:00 INFO Database server started2024-03-15 08:02:40 WARN Connection pool exhausted (max: 100)2024-03-15 08:02:42 ERROR Too many connections - rejecting new requests2024-03-15 08:02:43 ERROR Query timeout: SELECT * FROM orders WHERE status='pending'2024-03-15 08:03:00 INFO Connection pool recovered2024-03-15 08:06:00 INFO Routine backup started2024-03-15 08:07:10 ERROR Disk space critical: /var/lib/mysql at 98%EOF
cat > ~/investigation/logs/app.log << 'EOF'2024-03-15 08:00:05 INFO Application started on port 80802024-03-15 08:01:10 INFO Health check passed2024-03-15 08:02:44 ERROR Upstream service unavailable: payment-gateway2024-03-15 08:02:50 WARN Circuit breaker opened for payment-gateway2024-03-15 08:04:00 INFO Circuit breaker closed for payment-gateway2024-03-15 08:07:14 ERROR Memory usage exceeded threshold: 95%2024-03-15 08:07:15 ERROR Initiating graceful shutdown2024-03-15 08:07:16 INFO Shutdown completeEOF
# Create a text file to make find exercises interestingecho "This is a config file" > ~/investigation/config.txtecho "Important notes here" > ~/investigation/notes.txtComplete each task using the tools you learned. Try each one yourself before checking the answer.
Task 1: Use a wildcard to list only the .log files in ~/investigation/logs/.
Solution
```shell
ls ~/investigation/logs/*.log
```

Task 2: Use grep to find all lines containing “ERROR” across all log files. Save the results to `~/investigation/all_errors.txt`.
Solution
```shell
grep "ERROR" ~/investigation/logs/*.log > ~/investigation/all_errors.txt
```

Task 3: Count how many ERROR lines exist in each log file (without showing the lines themselves).
Solution
```shell
grep -c "ERROR" ~/investigation/logs/*.log
```

Expected output:

```
/home/user/investigation/logs/app.log:3
/home/user/investigation/logs/db.log:3
/home/user/investigation/logs/web.log:4
```

Task 4: Find all errors related to “database” or “memory” (case-insensitive), showing 1 line of context before each match.
Solution
```shell
grep -i -B 1 "database\|memory" ~/investigation/logs/*.log
```

Or, using `cat` and extended regex (`-E`) with a pipe:

```shell
cat ~/investigation/logs/*.log | grep -i -B 1 -E "database|memory"
```

Task 5: Use find to locate all .txt files in the `~/investigation` directory.
Solution
```shell
find ~/investigation -name "*.txt"
```

Task 6 (The Boss Level): Write a single pipeline that:
- Reads all log files
- Filters only ERROR lines
- Sorts them by timestamp
- Takes the last 5 (most recent errors)
- Saves them to `~/investigation/recent_critical.txt`
Solution
```shell
cat ~/investigation/logs/*.log | grep "ERROR" | sort | tail -5 > ~/investigation/recent_critical.txt
```

Verify with:

```shell
cat ~/investigation/recent_critical.txt
```

You should see the 5 most recent ERROR entries sorted by timestamp.
Success Criteria
You have completed this exercise when:
- `~/investigation/all_errors.txt` exists and contains only ERROR lines from all three log files
- You can tell how many errors each log file has (Task 3)
- `~/investigation/recent_critical.txt` contains exactly 5 sorted error lines
- You completed each task using commands from this module (no GUI tools)
Next Up: Module 0.2: Environment & Permissions — Learn how Linux knows who you are, how to customize your shell, and how the permission system keeps everything secure.