Module 12.2: Semgrep - Security Rules in Minutes
Complexity: [MEDIUM]
Time to Complete: 45-50 minutes
Prerequisites
Before starting this module, you should have completed:
- Module 12.1: SonarQube - Code quality fundamentals
- DevSecOps Discipline - Security integration concepts
- Basic regex understanding
- Programming experience in at least one language
What You’ll Be Able to Do
After completing this module, you will be able to:
- Deploy Semgrep and configure custom rules for security-focused static analysis in CI/CD pipelines
- Implement Semgrep patterns for detecting OWASP vulnerabilities and insecure coding practices
- Configure Semgrep’s autofix capabilities to automatically remediate detected code issues
- Compare Semgrep’s pattern-based approach against CodeQL for security scanning speed and coverage trade-offs
Why This Module Matters
The Custom Rule Problem
You’ve just discovered a security issue in your codebase: developers are calling dangerouslySetInnerHTML in React without sanitization. You want to prevent this from happening again. You have two options:
Option 1: Traditional SAST Rule
- Learn the SAST tool’s proprietary rule language
- Spend days understanding AST representations
- Write and debug complex rule definitions
- Wait for vendor to ship the update
- Hope it doesn’t break existing rules
Option 2: Semgrep

    rules:
      - id: dangerous-html-without-sanitize
        patterns:
          # Quoted so that the ": " inside the pattern stays valid YAML
          - pattern: 'dangerouslySetInnerHTML={{__html: $X}}'
          - pattern-not: 'dangerouslySetInnerHTML={{__html: sanitize($X)}}'
        message: "Use sanitize() before dangerouslySetInnerHTML"
        severity: ERROR
        languages: [javascript, typescript]

Five minutes. That’s how long it takes to write a Semgrep rule. The pattern syntax looks like code, not regex hell. You can test it locally before committing. And it runs fast enough to block PRs without developers revolting.
Semgrep isn’t trying to be the most sophisticated SAST tool—it’s trying to be the most practical one. Low false positives, fast execution, and rules that humans can actually write and understand.
Semgrep Architecture

    ┌─────────────────────────────────────────────────────────────────┐
    │                      SEMGREP ARCHITECTURE                       │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  YOUR CODEBASE                                                  │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │  source.py  │  app.js  │  main.go  │  Server.java         │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │                                ▼                                │
    │  SEMGREP ENGINE                                                 │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │                                                           │  │
    │  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────────┐  │  │
    │  │  │   Parser    │   │   Pattern   │   │    Matching     │  │  │
    │  │  │ (per lang)  │──▶│  Compiler   │──▶│     Engine      │  │  │
    │  │  │             │   │             │   │ (semgrep-core)  │  │  │
    │  │  └─────────────┘   └─────────────┘   └─────────────────┘  │  │
    │  │         │                                     │           │  │
    │  │         ▼                                     ▼           │  │
    │  │  ┌─────────────┐                     ┌─────────────────┐  │  │
    │  │  │     AST     │                     │    Findings     │  │  │
    │  │  │  (generic)  │                     │  (JSON/SARIF)   │  │  │
    │  │  └─────────────┘                     └─────────────────┘  │  │
    │  │                                                           │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                │                                │
    │  RULES                         │                                │
    │  ┌───────────────────────────────────────────────────────────┐  │
    │  │                                                           │  │
    │  │  Community    │  Your Custom    │  Semgrep Registry       │  │
    │  │  Rulesets     │  Rules          │  (pro rules)            │  │
    │  │  (p/owasp)    │  (.semgrep/)    │  (additional coverage)  │  │
    │  │                                                           │  │
    │  └───────────────────────────────────────────────────────────┘  │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

How Semgrep Works
- Parse: Convert source code to AST (Abstract Syntax Tree)
- Pattern Match: Match your pattern against the AST
- Filter: Apply pattern-not, pattern-inside, etc.
- Report: Output findings in JSON, SARIF, or text
The key insight: Semgrep patterns look like code, but match against the AST. This means foo(1, 2) matches foo(1,2) and foo( 1 , 2 ) because whitespace doesn’t matter in the AST.
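The whitespace point can be checked with any parser. The sketch below uses Python's built-in ast module as a stand-in; Semgrep has its own per-language parsers and generic AST, so this only illustrates the idea, and the helper name same_structure is made up for this example.

```python
import ast


def same_structure(a: str, b: str) -> bool:
    """Return True when two snippets parse to identical syntax trees."""
    # ast.dump() omits line/column attributes by default, so formatting
    # differences between the two snippets disappear.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_structure("foo(1, 2)", "foo(1,2)"))      # True
print(same_structure("foo(1, 2)", "foo( 1 , 2 )"))  # True
print(same_structure("foo(1, 2)", "foo(2, 1)"))     # False
```

Only the last pair differs, because swapping the arguments changes the tree while spacing does not.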
Pattern Syntax Deep Dive
Basic Patterns

    # Literal match
    pattern: print("debug")

    # Metavariable (captures any expression)
    pattern: print($X)
    # Matches: print("hello"), print(user.name), print(1+2)

    # Ellipsis (matches zero or more arguments)
    pattern: print(...)
    # Matches: print(), print("a"), print("a", "b", "c")

    # Return type annotation (Python 3 type hints)
    pattern: |
      def $FUNC(...) -> str:
          ...
    # Matches functions annotated as returning str

Metavariables in Action

    rules:
      - id: hardcoded-password
        patterns:
          - pattern: $VAR = "..."
          - metavariable-regex:
              metavariable: $VAR
              # metavariable-regex is left-anchored; the leading .* lets
              # prefixed names such as db_passwd match
              regex: (?i).*(password|passwd|pwd|secret|token|api_key)
        message: "Hardcoded credential in $VAR"
        severity: ERROR
        languages: [python]

    # Would match:
    password = "hunter2"
    API_KEY = "sk-1234567890"
    db_passwd = "supersecret"

    # Would NOT match (the regex applies to the variable name, not the value):
    username = "admin"
    description = "password reset flow"

Combining Patterns

    rules:
      - id: sql-injection
        patterns:
          # Must match this
          - pattern-either:
              - pattern: cursor.execute($QUERY)
              - pattern: db.query($QUERY)
          # AND must have string concat in query
          - pattern-inside: |
              $QUERY = ... + $USER_INPUT + ...
              ...
          # But NOT if parameterized
          - pattern-not-inside: |
              $QUERY = "... %s ..."
              ...
              cursor.execute($QUERY, ...)
        message: "SQL injection via string concatenation"
        severity: ERROR
        languages: [python]

Pattern Operators Reference
| Operator | Purpose | Example |
|---|---|---|
| pattern | Match this pattern | pattern: eval($X) |
| pattern-not | Don’t match this | pattern-not: eval("safe") |
| pattern-either | Match any of these | Multiple patterns, OR logic |
| pattern-inside | Match inside this context | Function or class scope |
| pattern-not-inside | Don’t match in this context | Exclude test files |
| pattern-regex | Regex on source | For when AST isn’t enough |
| metavariable-regex | Regex on captured var | Filter by variable name |
| metavariable-pattern | Pattern on captured var | Nested matching |
| metavariable-comparison | Compare captured values | $X < 10 |
| focus-metavariable | Report only this part | Precise error location |
Real-World Rules
Preventing JWT Without Verification

    rules:
      - id: jwt-decode-without-verify
        patterns:
          - pattern-either:
              - pattern: jwt.decode($TOKEN, ...)
              - pattern: jwt.decode($TOKEN)
          - pattern-not: jwt.decode($TOKEN, $KEY, ...)
          # "verify_signature" is the PyJWT 2.x option key
          - pattern-not: 'jwt.decode($TOKEN, options={..., "verify_signature": True, ...})'
        message: |
          JWT decoded without signature verification.
          Use jwt.decode(token, key, algorithms=['HS256']) instead.
        severity: ERROR
        languages: [python]
        metadata:
          cwe: "CWE-347: Improper Verification of Cryptographic Signature"
          owasp: "A02:2021 - Cryptographic Failures"

Detecting Insecure Deserialization

    rules:
      - id: pickle-load-untrusted
        patterns:
          - pattern-either:
              - pattern: pickle.load($X)
              - pattern: pickle.loads($X)
          # Safe: loading from trusted source
          - pattern-not-inside: |
              with open("$TRUSTED_FILE", ...) as $F:
                  ...
                  pickle.load($F)
        message: |
          pickle.load() can execute arbitrary code.
          Never unpickle untrusted data. Use JSON or a safe
          serialization format instead.
        severity: ERROR
        languages: [python]
        metadata:
          cwe: "CWE-502: Deserialization of Untrusted Data"
          references:
            - https://docs.python.org/3/library/pickle.html#module-pickle

React XSS Prevention

    rules:
      - id: react-dangerously-set-html
        patterns:
          - pattern: 'dangerouslySetInnerHTML={{__html: $CONTENT}}'
          - pattern-not: 'dangerouslySetInnerHTML={{__html: DOMPurify.sanitize($CONTENT)}}'
          - pattern-not: 'dangerouslySetInnerHTML={{__html: sanitizeHtml($CONTENT, ...)}}'
        message: |
          dangerouslySetInnerHTML without sanitization may cause XSS.
          Use DOMPurify.sanitize() or similar.
        severity: ERROR
        # JSX and TSX files are covered by the javascript and typescript languages
        languages: [javascript, typescript]
        metadata:
          cwe: "CWE-79: Cross-site Scripting (XSS)"

Go Error Handling

    rules:
      - id: go-error-not-handled
        patterns:
          - pattern: $RET, $ERR := $FUNC(...)
          - pattern-not-inside: |
              $RET, $ERR := $FUNC(...)
              ...
              if $ERR != nil { ... }
        message: "Error returned by $FUNC is not checked"
        severity: WARNING
        languages: [go]

Kubernetes Security

    rules:
      - id: k8s-privileged-container
        patterns:
          - pattern-inside: |
              spec:
                ...
                containers:
                  ...
          - pattern: |
              securityContext:
                ...
                privileged: true
        message: "Container running in privileged mode"
        severity: ERROR
        languages: [yaml]
        paths:
          include:
            - "*.yaml"
            - "*.yml"

Using Semgrep CLI
Installation

    # macOS
    brew install semgrep

    # Linux/Windows (pip)
    pip install semgrep

    # Docker (mounts the current directory at /src and runs a scan)
    docker run --rm -v "$(pwd):/src" returntocorp/semgrep semgrep --config=auto

    # Verify installation
    semgrep --version

Running Scans

    # Run community rules
    semgrep --config=auto .

    # Run specific rulesets
    semgrep --config=p/owasp-top-ten .
    semgrep --config=p/security-audit .
    semgrep --config=p/secrets .

    # Run your custom rules
    semgrep --config=.semgrep/ .

    # Combine rulesets
    semgrep --config=p/python --config=.semgrep/ .

    # Output formats
    semgrep --config=auto --json -o results.json .
    semgrep --config=auto --sarif -o results.sarif .

    # Verbose for debugging
    semgrep --config=rule.yaml --verbose .

    # Test rules
    semgrep --test --config=.semgrep/

Filtering Results

    # By severity
    semgrep --config=auto --severity ERROR .

    # Exclude paths
    semgrep --config=auto --exclude tests/ --exclude vendor/ .

    # Include only certain paths
    semgrep --config=auto --include "*.py" .

    # Baseline (only new findings)
    semgrep --config=auto --baseline-commit HEAD~1 .

CI/CD Integration
GitHub Actions

    name: Semgrep
    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

    jobs:
      semgrep:
        runs-on: ubuntu-latest
        container:
          image: returntocorp/semgrep
        steps:
          - uses: actions/checkout@v4

          - name: Run Semgrep
            run: |
              semgrep ci \
                --config=p/security-audit \
                --config=p/secrets \
                --config=.semgrep/ \
                --sarif --output=semgrep.sarif

          - name: Upload SARIF
            uses: github/codeql-action/upload-sarif@v3
            with:
              sarif_file: semgrep.sarif
            if: always()

GitLab CI

    semgrep:
      stage: test
      image: returntocorp/semgrep
      script:
        - semgrep ci --config=p/security-audit --config=.semgrep/ --gitlab-sast > gl-sast-report.json
      artifacts:
        reports:
          sast: gl-sast-report.json
      rules:
        - if: $CI_MERGE_REQUEST_IID
        - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

Pre-commit Hook

    repos:
      - repo: https://github.com/returntocorp/semgrep
        rev: v1.45.0
        hooks:
          - id: semgrep
            args: ['--config', 'p/secrets', '--config', '.semgrep/', '--error']

Jenkins Pipeline

    pipeline {
        agent {
            docker { image 'returntocorp/semgrep' }
        }
        stages {
            stage('Security Scan') {
                steps {
                    sh '''
                        semgrep ci \
                          --config=p/security-audit \
                          --config=.semgrep/ \
                          --junit-xml --output=semgrep-results.xml
                    '''
                    junit 'semgrep-results.xml'
                }
            }
        }
    }

Writing Custom Rules
Rule Structure

    rules:
      - id: unique-rule-identifier
        # What to match: use a single pattern...
        pattern: dangerous_function($X)
        # ...or, for more complex matching, a patterns list instead (not both)
        patterns:
          - pattern: ...
          - pattern-not: ...

        # Metadata
        message: "Human readable message with $X interpolation"
        severity: ERROR  # ERROR, WARNING, INFO
        languages: [python, javascript]  # Target languages

        # Optional metadata
        metadata:
          cwe: "CWE-XXX"
          owasp: "A0X:2021"
          category: security
          technology:
            - flask
          references:
            - https://example.com/security-advisory

        # File filtering
        paths:
          include:
            - "src/**/*.py"
          exclude:
            - "**/test/**"
            - "**/*_test.py"

        # Fix suggestion (autofix)
        fix: safe_function($X)

Testing Rules

    # test_rule.py - Semgrep test file format

    # ruleid: dangerous-function
    dangerous_function(user_input)

    # ok: dangerous-function
    safe_function(user_input)

    # This should match but doesn't yet
    # todoruleid: dangerous-function
    edge_case(user_input)

    # Run tests
    semgrep --test --config=rules/

Example: Building a Complete Ruleset

    rules:
      # Flask Debug Mode
      - id: flask-debug-mode
        pattern: app.run(..., debug=True, ...)
        message: "Flask debug mode enabled. Disable in production."
        severity: ERROR
        languages: [python]
        metadata:
          category: security
          cwe: "CWE-489: Active Debug Code"

      # Flask Secret Key Hardcoded
      - id: flask-hardcoded-secret
        # pattern-either: either form of assignment should be flagged
        pattern-either:
          - pattern: app.secret_key = "..."
          - pattern: app.config["SECRET_KEY"] = "..."
        message: "Hardcoded Flask secret key. Use environment variable."
        severity: ERROR
        languages: [python]
        fix: app.secret_key = os.environ.get("SECRET_KEY")

      # Flask SQL Injection
      - id: flask-sql-injection
        patterns:
          - pattern-either:
              - pattern: db.execute($QUERY.format(...))
              - pattern: db.execute(f"...{$X}...")
              - pattern: db.execute("..." + $X + "...")
        message: "SQL injection via string formatting. Use parameterized queries."
        severity: ERROR
        languages: [python]
        metadata:
          cwe: "CWE-89: SQL Injection"

      # Missing CSRF Protection
      - id: flask-missing-csrf
        patterns:
          - pattern: |
              @app.route(..., methods=[..., "POST", ...])
              def $FUNC(...):
                  ...
          - pattern-not-inside: |
              csrf = CSRFProtect(...)
              ...
        message: "POST route without CSRF protection"
        severity: WARNING
        languages: [python]

Semgrep vs Other Tools
COMPARISON MATRIX
| | Semgrep | CodeQL | SonarQube |
|---|---|---|---|
| Rule Language | Pattern | QL (SQL-like) | Java/Custom |
| Learning Curve | Low | High | Medium |
| Custom Rules | Easy | Complex | Difficult |
| Speed | Fast | Slow | Medium |
| False Positives | Low | Low | Medium |
| Languages | 30+ | 15+ | 30+ |
| Self-hosted | Free | Free | Limited |
| CI Integration | Excellent | Good | Good |
| Dataflow Analysis | Within a function (OSS); cross-function in Pro tier | Excellent | Basic |
| Interfile Analysis | Pro tier | Excellent | Yes |

WHEN TO USE EACH:

Semgrep:
- Custom rules for your specific patterns
- Fast feedback in CI/CD
- Security team writing rules
- Low false positive tolerance

CodeQL:
- Deep vulnerability research
- Complex dataflow analysis
- GitHub-centric workflows
- Time/complexity not a concern

SonarQube:
- Code quality + security together
- Technical debt tracking needed
- Quality gates with coverage
- Enterprise compliance reporting

War Story: The Framework Migration
Company: Payment processor, 80 engineers
Challenge: Migrating from deprecated crypto library to new standard
The Situation:
The security team discovered that the old-crypto library had known vulnerabilities. 500+ files used it across 12 services. Manual migration would take months and be error-prone.
The Semgrep Approach:

    # Phase 1: Find all usages
    rules:
      - id: old-crypto-usage
        pattern-either:
          - pattern: import old_crypto
          - pattern: from old_crypto import ...
          - pattern: old_crypto.$FUNC(...)
        message: "Legacy crypto library usage. Migrate to new_crypto."
        severity: WARNING
        languages: [python]

    # Baseline scan
    $ semgrep --config=migration.yaml --json -o baseline.json .
    Found: 847 usages across 523 files

    # Phase 2: Add autofix rules
    rules:
      - id: migrate-encrypt
        pattern: old_crypto.encrypt($DATA, $KEY)
        fix: new_crypto.encrypt($DATA, key=$KEY, algorithm="AES-256-GCM")
        message: "Migrate to new_crypto.encrypt()"
        severity: WARNING
        languages: [python]

      - id: migrate-hash
        pattern: old_crypto.hash($DATA)
        fix: new_crypto.hash($DATA, algorithm="SHA-256")
        message: "Migrate to new_crypto.hash()"
        severity: WARNING
        languages: [python]

    # Apply autofixes
    $ semgrep --config=migration.yaml --autofix .
    Applied 623 fixes automatically

    # Phase 3: Block new usages
    rules:
      - id: block-old-crypto
        pattern-either:
          - pattern: import old_crypto
          - pattern: from old_crypto import ...
        message: |
          BLOCKED: old_crypto is deprecated. Use new_crypto instead.
          See migration guide: wiki/crypto-migration
        severity: ERROR
        languages: [python]

Results:
| Phase | Timeline | Effort |
|---|---|---|
| Find all usages | 30 minutes | Automated |
| Auto-fix 75% of usages | 2 hours | Review only |
| Manual fix remaining 25% | 2 weeks | Edge cases |
| Block new usages | Permanent | Zero effort |
Total Time: 2 weeks instead of estimated 3 months
Key Lessons:
- Start with WARNING, graduate to ERROR: Let developers fix before blocking
- Autofix where possible: Review is faster than rewrite
- Leave blocking rules permanently: Prevent regression
- Track progress with baseline: Show weekly improvement to management
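One way to track that weekly progress is to count findings per rule in the JSON reports. The sketch below assumes the top-level results list and per-finding check_id field of semgrep --json output; the two reports are hand-written stand-ins rather than real scan output, and in real reports check_id may carry a prefix derived from the rule file path.

```python
import json
from collections import Counter


def count_by_rule(report_json: str) -> Counter:
    """Count findings per rule id in a `semgrep --json` style report."""
    report = json.loads(report_json)
    return Counter(r["check_id"] for r in report.get("results", []))


# Hand-written stand-ins for two weekly reports (not real scan output).
week1 = json.dumps({"results": [
    {"check_id": "old-crypto-usage", "path": "billing/sign.py"},
    {"check_id": "old-crypto-usage", "path": "auth/tokens.py"},
    {"check_id": "old-crypto-usage", "path": "auth/session.py"},
]})
week2 = json.dumps({"results": [
    {"check_id": "old-crypto-usage", "path": "auth/session.py"},
]})

before, after = count_by_rule(week1), count_by_rule(week2)
for rule in before:
    print(f"{rule}: {before[rule]} -> {after[rule]}")
```

In practice the two strings would be read from report files saved at each checkpoint, such as the baseline.json produced in Phase 1.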
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Too specific patterns | Misses variations | Use metavariables, ellipsis |
| Too broad patterns | False positives | Add pattern-not exclusions |
| No testing | Rules break silently | Use semgrep --test |
| Ignoring metadata | Poor findings context | Add CWE, OWASP, references |
| Running all rules | Slow CI, noise | Curate rulesets for your stack |
| No baseline | Old findings block PRs | Use --baseline-commit |
| No autofix | Developers ignore findings | Add fix when pattern is clear |
| Single pattern only | Can’t handle context | Combine with pattern-inside |
1. What is the difference between $X and ... in Semgrep patterns?
Answer:
- $X (metavariable): Captures a single expression and can be referenced elsewhere in the rule. Use to match specific values and reference them in the message or fix.
- ... (ellipsis): Matches zero or more arguments, statements, or fields. Use when you don’t care about the specific content.

Examples:

    # $X captures the argument
    pattern: eval($X)
    message: "eval called with $X"  # $X is interpolated

    # ... matches any arguments
    pattern: print(...)  # Matches print(), print(a), print(a, b, c)

2. How do pattern-inside and pattern-not-inside work?
Answer: They provide context for where patterns should (or shouldn’t) match:

    patterns:
      # Match eval()
      - pattern: eval($X)
      # Only inside request handlers
      - pattern-inside: |
          @app.route(...)
          def $FUNC(...):
              ...
      # But not inside admin routes
      - pattern-not-inside: |
          @app.route("/admin/...")
          def $FUNC(...):
              ...

The primary pattern must match, AND it must be inside pattern-inside, AND it must NOT be inside pattern-not-inside.
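That AND / AND NOT combination can be modelled as containment checks on source ranges. The following is a toy model only, not how Semgrep is implemented; the Range type, the keep_match helper, and the line numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Inclusive line range of a match (hypothetical helper type)."""
    start: int
    end: int

    def contains(self, other: "Range") -> bool:
        return self.start <= other.start and other.end <= self.end


def keep_match(match: Range, inside: list, not_inside: list) -> bool:
    """Keep a match that lies within some `inside` range and no `not_inside` range."""
    in_required = any(r.contains(match) for r in inside)
    in_excluded = any(r.contains(match) for r in not_inside)
    return in_required and not in_excluded


# Invented example: two route handlers, the second being an admin route.
handlers = [Range(10, 20), Range(30, 40)]  # where pattern-inside matched
admin_handlers = [Range(30, 40)]           # where pattern-not-inside matched
eval_calls = [Range(15, 15), Range(35, 35), Range(50, 50)]

reported = [m for m in eval_calls if keep_match(m, handlers, admin_handlers)]
print(reported)
```

Only the call on line 15 survives: line 35 sits inside an excluded handler, and line 50 is not inside any handler at all.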
3. What is metavariable-regex used for?
Answer: metavariable-regex applies a regex filter to a captured metavariable:

    patterns:
      - pattern: $VAR = "..."
      - metavariable-regex:
          metavariable: $VAR
          # Left-anchored match; the leading .* allows prefixes such as db_
          regex: (?i).*(password|secret|token|api_key)

This matches variable assignments where the variable name looks like a credential. Without metavariable-regex, you’d match every string assignment.
Use cases:
- Variable naming conventions
- String content patterns
- Filtering by function names
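Because metavariable-regex matching is left-anchored, a name filter behaves differently with and without a leading .* in the regex. The sketch below approximates that with Python's re.match; Semgrep evaluates the regex with a PCRE engine rather than Python's re module, so treat this as an illustration of the anchoring behaviour only. The sample variable names are made up.

```python
import re

# re.match anchors at the start of the string, which is used here to
# approximate left-anchored matching.
STRICT = re.compile(r"(?i)(password|passwd|pwd|secret|token|api_key)")
WITH_PREFIX = re.compile(r"(?i).*(password|passwd|pwd|secret|token|api_key)")

names = ["password", "API_KEY", "db_passwd", "username", "description"]

for name in names:
    strict = bool(STRICT.match(name))
    prefixed = bool(WITH_PREFIX.match(name))
    print(f"{name:12} strict={strict!s:5} with_prefix={prefixed}")

flagged = [n for n in names if WITH_PREFIX.match(n)]
print(flagged)
```

The strict form misses db_passwd because the name does not start with one of the keywords, while the .* form catches it.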
4. How does Semgrep's autofix feature work?
Answer: The fix field provides a replacement pattern:

    rules:
      - id: use-pathlib
        pattern: os.path.join($A, $B)
        fix: Path($A) / $B
        message: "Use pathlib instead of os.path.join"

Running semgrep --autofix:
- Finds all matches
- Applies the fix pattern
- Interpolates captured metavariables
- Rewrites the file
Limitations:
- Must preserve captured metavariables exactly
- Complex fixes may need manual intervention
- Always review changes before committing
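The interpolation step can be pictured as plain template substitution. This is a simplified model, not Semgrep's implementation: the apply_fix helper and the bindings dictionary are hypothetical, standing in for whatever the engine captured when os.path.join($A, $B) matched.

```python
import re


def apply_fix(fix_template: str, bindings: dict[str, str]) -> str:
    """Substitute $NAME placeholders in a fix template with captured source text."""
    def replace(m: re.Match) -> str:
        name = m.group(0)
        # Leave unknown placeholders untouched rather than guessing.
        return bindings.get(name, name)

    return re.sub(r"\$[A-Z_][A-Z0-9_]*", replace, fix_template)


# Hypothetical bindings, as if the pattern had matched the source text
# os.path.join(base_dir, "config.yaml")
bindings = {"$A": "base_dir", "$B": '"config.yaml"'}
fixed = apply_fix("Path($A) / $B", bindings)
print(fixed)  # Path(base_dir) / "config.yaml"
```

The example also shows why review matters: the rewritten line only works if Path is already imported in the target file, which the fix itself does not guarantee.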
5. What are Semgrep's community rulesets and how do you use them?
Answer: Community rulesets are curated rule collections:

    # Popular rulesets
    semgrep --config=p/owasp-top-ten .   # OWASP vulnerabilities
    semgrep --config=p/security-audit .  # General security
    semgrep --config=p/secrets .         # Hardcoded secrets
    semgrep --config=p/python .          # Python-specific
    semgrep --config=p/javascript .      # JavaScript/TypeScript
    semgrep --config=p/ci .              # High-signal rules intended for CI gating

    # Auto (detects languages, picks relevant rules)
    semgrep --config=auto .

    # Combine multiple
    semgrep --config=p/security-audit --config=p/secrets --config=.semgrep/ .

Registry at: https://semgrep.dev/explore
6. How do you test Semgrep rules?
Answer: Semgrep has built-in test support:

    query = "SELECT * FROM users WHERE id = " + user_id
    # ruleid: sql-injection
    cursor.execute(query)

    # ok: sql-injection
    cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

    # todoruleid: sql-injection
    # Known false negative - tracking for future fix

    # Run tests (looks for test files matching rules)
    semgrep --test --config=rules/

    # Test specific rule
    semgrep --test --config=rules/sql-injection.yaml

Annotations (each one applies to the line that follows it):
- # ruleid: <id> - Should match
- # ok: <id> - Should NOT match
- # todoruleid: <id> - Known missing match (future work)
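To make the annotation convention concrete, the sketch below derives the expected finding lines from a test file. It is a toy reimplementation for illustration, not the logic semgrep --test runs; the expected_matches helper is hypothetical and handles only ruleid annotations on Python-style comments.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok|todoruleid):\s*([\w-]+)")


def expected_matches(source: str) -> dict[str, list[int]]:
    """Map rule id -> 1-based line numbers where a finding is expected.

    Toy model: a `# ruleid: <id>` comment marks the line that follows it.
    """
    expected: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = ANNOTATION.search(line)
        if m and m.group(1) == "ruleid":
            expected.setdefault(m.group(2), []).append(lineno + 1)
    return expected


sample = '''\
query = "SELECT * FROM users WHERE id = " + user_id
# ruleid: sql-injection
cursor.execute(query)
# ok: sql-injection
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''
print(expected_matches(sample))
```

A test passes when the rule's actual findings land on exactly these lines and on none of the lines marked ok.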
7. What is the difference between Semgrep OSS and Semgrep Pro?
Answer:
| Feature | OSS (Free) | Pro (Paid) |
|---|---|---|
| Pattern matching | ✓ | ✓ |
| 2000+ community rules | ✓ | ✓ |
| Custom rules | ✓ | ✓ |
| CLI & CI | ✓ | ✓ |
| Dataflow (taint) analysis within a function | ✓ | ✓ |
| Cross-function dataflow analysis | ✗ | ✓ |
| Cross-file analysis | ✗ | ✓ |
| Pro rules (deeper coverage) | ✗ | ✓ |
| Dashboard & SBOM | ✗ | ✓ |
| SSO & Teams | ✗ | ✓ |
For most teams, OSS is sufficient. Pro is valuable for:
- Finding vulnerabilities that span multiple files
- Taint tracking across functions and files (source → sink analysis)
- Enterprise compliance requirements
8. How do you reduce false positives in Semgrep rules?
Answer: Several techniques:
- pattern-not exclusions:

      patterns:
        - pattern: eval($X)
        - pattern-not: eval("literal")  # Exclude known safe

- pattern-not-inside context:

      patterns:
        - pattern: $FUNC(...)
        - pattern-not-inside: |
            def test_$NAME(...):  # Exclude test files
                ...

- Metavariable filtering:

      patterns:
        - pattern: $VAR = "..."
        - metavariable-regex:
            metavariable: $VAR
            regex: ^(password|secret)$  # Only specific names

- Path exclusions:

      paths:
        exclude:
          - "**/test/**"
          - "**/vendor/**"

- Start with WARNING: Tune before making ERROR
Hands-On Exercise
Objective: Write custom Semgrep rules to secure a Python Flask application.
Part 1: Setup

    # Create project directory
    mkdir semgrep-lab && cd semgrep-lab

    # Create vulnerable Flask app
    cat > app.py << 'EOF'
    from flask import Flask, request, render_template_string
    import subprocess
    import pickle
    import hashlib

    app = Flask(__name__)
    app.secret_key = "supersecret123"  # Hardcoded secret

    @app.route("/search")
    def search():
        query = request.args.get("q")
        # SQL Injection
        results = db.execute(f"SELECT * FROM items WHERE name LIKE '%{query}%'")
        return results

    @app.route("/run")
    def run_command():
        cmd = request.args.get("cmd")
        # Command Injection
        output = subprocess.check_output(cmd, shell=True)
        return output

    @app.route("/template")
    def template():
        name = request.args.get("name")
        # SSTI vulnerability
        return render_template_string(f"Hello {name}!")

    @app.route("/load")
    def load_data():
        data = request.get_data()
        # Insecure deserialization
        return pickle.loads(data)

    @app.route("/hash")
    def hash_password():
        password = request.args.get("pw")
        # Weak hash
        return hashlib.md5(password.encode()).hexdigest()

    if __name__ == "__main__":
        app.run(debug=True)  # Debug in production
    EOF

Part 2: Run Community Rules

    # Install Semgrep
    pip install semgrep

    # Run security audit
    semgrep --config=p/security-audit --config=p/python app.py

    # How many findings?
    semgrep --config=p/security-audit --config=p/python app.py --json | jq '.results | length'

Part 3: Write Custom Rules

    # Create rules directory
    mkdir -p .semgrep

    # Create custom ruleset
    cat > .semgrep/flask-security.yaml << 'EOF'
    rules:
      # Rule 1: Hardcoded Flask secret key
      - id: flask-hardcoded-secret
        patterns:
          - pattern-either:
              - pattern: app.secret_key = "..."
              - pattern: app.config["SECRET_KEY"] = "..."
        message: |
          Hardcoded Flask secret key detected.
          Use environment variable: app.secret_key = os.environ.get("SECRET_KEY")
        severity: ERROR
        languages: [python]
        fix: app.secret_key = os.environ.get("SECRET_KEY")
        metadata:
          cwe: "CWE-798: Use of Hard-coded Credentials"

      # Rule 2: Flask debug mode
      - id: flask-debug-enabled
        pattern: app.run(..., debug=True, ...)
        message: "Flask debug mode enabled. Disable in production."
        severity: ERROR
        languages: [python]
        fix: app.run(debug=False)
        metadata:
          cwe: "CWE-489: Active Debug Code"

      # Rule 3: render_template_string with user input
      - id: flask-ssti
        patterns:
          - pattern: render_template_string($TEMPLATE)
          - pattern-inside: |
              $VAR = request.$METHOD.get(...)
              ...
        message: |
          Server-Side Template Injection (SSTI) via render_template_string.
          Use render_template() with a file instead.
        severity: ERROR
        languages: [python]
        metadata:
          cwe: "CWE-94: Code Injection"

      # Rule 4: Weak hashing (MD5/SHA1)
      - id: weak-hash-algorithm
        patterns:
          - pattern-either:
              - pattern: hashlib.md5(...)
              - pattern: hashlib.sha1(...)
        # The message names no metavariables because the patterns capture none
        message: |
          Weak hash algorithm (MD5 or SHA-1). Use SHA-256 or better,
          for example hashlib.sha256(data).hexdigest()
        severity: WARNING
        languages: [python]
        metadata:
          cwe: "CWE-328: Use of Weak Hash"
    EOF

Part 4: Test Your Rules

    # Create test file (named after the rule file so semgrep --test pairs
    # flask-security.py with flask-security.yaml)
    cat > .semgrep/flask-security.py << 'EOF'
    from flask import Flask, render_template_string, request
    import hashlib
    import os

    app = Flask(__name__)

    # ruleid: flask-hardcoded-secret
    app.secret_key = "hardcoded"

    # ok: flask-hardcoded-secret
    app.secret_key = os.environ.get("SECRET_KEY")

    # ruleid: flask-debug-enabled
    app.run(debug=True)

    # ok: flask-debug-enabled
    app.run(debug=False)

    # ruleid: weak-hash-algorithm
    hashlib.md5(b"test")

    # ok: weak-hash-algorithm
    hashlib.sha256(b"test")
    EOF

    # Run tests
    semgrep --test --config=.semgrep/

Part 5: Full Scan with Custom + Community Rules

    # Combined scan
    semgrep --config=p/python --config=.semgrep/ app.py

    # Generate report
    semgrep --config=p/python --config=.semgrep/ app.py --sarif -o report.sarif

    # Apply autofixes
    semgrep --config=.semgrep/ app.py --autofix --dryrun  # Preview
    semgrep --config=.semgrep/ app.py --autofix           # Apply

Success Criteria
- Community rules find SQL injection, command injection
- Custom rule catches hardcoded secret_key
- Custom rule catches debug=True
- Custom rule catches SSTI vulnerability
- Custom rule catches MD5 usage
- All tests pass with semgrep --test
- Autofix correctly replaces secret_key
Key Takeaways
- Patterns look like code: not regex, not a query language, but actual code
- Metavariables capture: $X matches any expression and can be referenced
- Combine patterns: pattern-inside, pattern-not reduce false positives
- Test your rules: built-in testing with # ruleid: annotations
- Autofix accelerates adoption: developers fix faster when a fix is provided
- Start with community rules: 2000+ rules ready to use
- Layer custom on top: your specific patterns + community baseline
- Run in CI but also pre-commit: catch before push, block before merge
- Baseline for legacy code: don’t block PRs on old findings
- Iterate quickly: 5-minute rule writing means rapid security coverage
Did You Know?
Name Origin: Semgrep stands for “semantic grep”, that is, grep that understands code structure, not just text. The first version was called “sgrep” before the rename.
Speed Secret: Semgrep compiles patterns into a specialized matching engine written in OCaml (semgrep-core). It’s designed to be fast enough for pre-commit hooks.
Return Path: Semgrep is developed by r2c (now Semgrep, Inc.). It grew out of sgrep, an open-source tool built at Facebook as part of the pfff project, whose author later joined r2c.
Community Growth: The Semgrep Registry has grown from 500 rules in 2020 to more than 3,000 rules in 2024, covering security, correctness, and best practices.
Next Module
Continue to Module 12.3: CodeQL to learn about semantic code analysis with GitHub’s powerful query language for finding complex vulnerabilities.