Why Semgrep?

Semgrep is a fast, open-source static analysis tool. It finds bugs and security issues using pattern matching. Think "grep for code structure."

Quick Start

Installation

bash

# macOS
brew install semgrep

# pip (any platform)
pip install semgrep

# Docker
docker pull returntocorp/semgrep

First Scan

bash

# Scan with default security rules
semgrep scan --config auto

# Scan specific directory
semgrep scan --config auto ./src

What "auto" Config Includes

Security audit rules (OWASP Top 10)
Secret detection
Best practices for your languages

Configuration Options

Using Registry Rules

bash

# Security-focused rules
semgrep --config p/security-audit

# Secret detection
semgrep --config p/secrets

# Language-specific
semgrep --config p/javascript
semgrep --config p/typescript
semgrep --config p/react

# Multiple configs
semgrep --config p/security-audit --config p/secrets

Popular Rule Packs

Pack	Focus
p/security-audit	General security
p/secrets	Hardcoded secrets
p/owasp-top-ten	OWASP vulnerabilities
p/xss	Cross-site scripting
p/sql-injection	SQL injection
p/jwt	JWT security issues

GitHub Actions Integration

Basic Workflow

yaml

# .github/workflows/semgrep.yml
name: Semgrep Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/security-audit
            p/secrets

Block PRs with Findings

yaml

name: Semgrep Security Gate

on:
  pull_request:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/security-audit p/secrets
          generateSarif: true

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
        if: always()
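
If you would rather block only on the most serious findings, one option is a small gate script that reads Semgrep's JSON output and decides the exit code itself. The sketch below is an illustration, not part of Semgrep or the action: `blocking_findings`, `gate_exit_code`, and the sample report are hypothetical, and the severity ordering assumes the three labels used by the rules in this guide.

```python
from typing import Any, Dict, List

# Severity labels used by the rules in this guide, lowest to highest.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2}


def blocking_findings(report: Dict[str, Any], threshold: str = "ERROR") -> List[Dict[str, Any]]:
    """Return findings whose severity is at or above the threshold."""
    minimum = SEVERITY_RANK[threshold]
    highest = max(SEVERITY_RANK.values())
    blocking = []
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity")
        # Unrecognized labels are treated as blocking so nothing slips through.
        if SEVERITY_RANK.get(severity, highest) >= minimum:
            blocking.append(finding)
    return blocking


def gate_exit_code(report: Dict[str, Any], threshold: str = "ERROR") -> int:
    """Return 1 to fail the CI step, 0 to let the PR through."""
    return 1 if blocking_findings(report, threshold) else 0


# Made-up report with the same shape as Semgrep's JSON output.
example_report = {
    "results": [
        {"check_id": "no-console-log", "extra": {"severity": "WARNING"}},
        {"check_id": "hardcoded-stripe-key", "extra": {"severity": "ERROR"}},
    ]
}
print(gate_exit_code(example_report))   # 1
print(gate_exit_code({"results": []}))  # 0
```

In a workflow step you would load the real report with `json.load` and pass the result to `sys.exit`, so the job fails only when the threshold is met.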

Configuring Required Status Check

Go to repository Settings
Branches → Add rule for main
Enable "Require status checks"
Select "semgrep" job

Custom Rules

Basic Rule Structure

yaml

# .semgrep/custom-rules.yml
rules:
  - id: no-console-log
    pattern: console.log(...)
    message: "Remove console.log before production"
    severity: WARNING
    languages: [javascript, typescript]

  - id: no-any-type
    pattern: |
      function $FUNC(...): any { ... }
    message: "Avoid using 'any' return type"
    severity: INFO
    languages: [typescript]
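
Both rules share the same skeleton: an `id`, a `message`, a `severity`, a `languages` list, and a pattern operator. As a rough pre-flight check, the sketch below lists what a rule is missing once the YAML has been parsed into a dictionary. `rule_problems` is a hypothetical helper, and the key lists cover only the simple search rules shown in this guide. Semgrep's own `semgrep --validate --config <file>` remains the authoritative check.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("id", "message", "severity", "languages")
# Pattern operators that appear in this guide; Semgrep supports others too.
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def rule_problems(rule: Dict[str, Any]) -> List[str]:
    """List what a search rule (already parsed from YAML) is missing."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in rule]
    if not any(key in rule for key in PATTERN_KEYS):
        problems.append("missing a pattern operator")
    return problems


# The 'languages' key is left out on purpose to show a reported problem.
incomplete_rule = {
    "id": "no-console-log",
    "pattern": "console.log(...)",
    "message": "Remove console.log before production",
    "severity": "WARNING",
}
print(rule_problems(incomplete_rule))
```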

Pattern Syntax

yaml

# Exact match
pattern: console.log(...)
# Match any function call
pattern: $FUNC(...)
# Match with metavariable
pattern: fetch($URL)
# Match any statement
pattern: ...
# Match specific structure
pattern: |
  if ($COND) {
    $BODY
  }

AI Code Security Rules

yaml

# .semgrep/ai-security.yml
rules:
  - id: sql-template-literal
    patterns:
      - pattern-either:
          - pattern: $DB.query(`...${$VAR}...`)
          - pattern: $DB.execute(`...${$VAR}...`)
    message: |
      SQL injection risk: Template literal with variable interpolation.
      Use parameterized queries instead.
    severity: ERROR
    languages: [javascript, typescript]
  - id: dangerous-innerhtml
    pattern: |
      dangerouslySetInnerHTML={{__html: $VAR}}
    message: |
      XSS risk: dangerouslySetInnerHTML with variable.
      Sanitize content with DOMPurify or avoid innerHTML.
    severity: WARNING
    languages: [javascript, typescript]
  - id: hardcoded-stripe-key
    pattern-regex: sk_(live|test)_[A-Za-z0-9]{24,}
    message: "Hardcoded Stripe key detected. Use environment variables."
    severity: ERROR
    languages: [generic]

  - id: missing-auth-check
    patterns:
      - pattern: |
          export async function $METHOD(request: Request) {
            ...
            $DB.$QUERY(...)
            ...
          }
      - pattern-not: |
          export async function $METHOD(request: Request) {
            ...
            getServerSession(...)
            ...
          }
    message: "API route accesses database without session check"
    severity: WARNING
    languages: [typescript]
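
Regex-based rules such as `hardcoded-stripe-key` are easy to sanity-check before they go into CI. The sketch below runs the same expression through Python's `re` module against a few made-up strings. It assumes Python's regex engine treats this simple expression the same way Semgrep's does; `contains_stripe_key` and the sample values are hypothetical, and none of them is a real key.

```python
import re

# Same expression as the pattern-regex in the hardcoded-stripe-key rule.
STRIPE_KEY = re.compile(r"sk_(live|test)_[A-Za-z0-9]{24,}")


def contains_stripe_key(line: str) -> bool:
    """True if the line contains something shaped like a Stripe secret key."""
    return STRIPE_KEY.search(line) is not None


fake_key = "sk_test_" + "a" * 24  # placeholder with the right shape, not a real key
samples = [
    'const stripe = new Stripe("' + fake_key + '")',  # should be flagged
    "const stripe = new Stripe(process.env.STRIPE_SECRET_KEY)",  # should pass
    'const id = "sk_test_short"',  # too short to match
]
for sample in samples:
    print(contains_stripe_key(sample), sample)
```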

Using Custom Rules

bash

# Local file
semgrep --config .semgrep/custom-rules.yml

# Combined with registry
semgrep --config p/security-audit --config .semgrep/custom-rules.yml

Ignoring False Positives

Inline Comments

javascript

// This is safe because 'level' is an enum validated elsewhere
// nosemgrep: sql-injection
const query = `SELECT * FROM logs WHERE level = '${level}'`;

Semgrep Ignore File

yaml

# .semgrepignore

# Ignore test files
tests/
**/*.test.ts
**/*.spec.ts

# Ignore specific files
src/legacy/old-code.js

# Ignore generated files
generated/
*.generated.ts

Rule-Specific Ignores

yaml

# In custom rule
rules:
  - id: my-rule
    pattern: ...
    paths:
      exclude:
        - tests/*
        - "*.test.ts"

Output Formats

bash

# Default (human-readable)
semgrep --config auto
# JSON (for processing)
semgrep --config auto --json > results.json
# SARIF (for GitHub)
semgrep --config auto --sarif > results.sarif
# JUnit (for CI systems)
semgrep --config auto --junit-xml > results.xml
semgrep --config auto --junit-xml > results.xml
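
The JSON format is the easiest one to post-process. As an illustration, the sketch below turns a report into one line per finding. The embedded sample is made up and trimmed to the fields the code reads (`check_id`, `path`, `start.line`, `extra.severity`), and `summarize` is a hypothetical helper. In practice you would load `results.json` instead of the inline string.

```python
import json
from typing import Any, Dict, List

# Made-up sample shaped like `semgrep --json` output, trimmed to the fields used here.
SAMPLE = """
{
  "results": [
    {
      "check_id": "sql-template-literal",
      "path": "src/db.ts",
      "start": {"line": 12, "col": 3},
      "extra": {"severity": "ERROR", "message": "SQL injection risk"}
    },
    {
      "check_id": "no-console-log",
      "path": "src/app.ts",
      "start": {"line": 40, "col": 1},
      "extra": {"severity": "WARNING", "message": "Remove console.log before production"}
    }
  ],
  "errors": []
}
"""


def summarize(report: Dict[str, Any]) -> List[str]:
    """Build one 'path:line SEVERITY rule-id' line per finding."""
    lines = []
    for finding in report.get("results", []):
        lines.append(
            "{}:{} {} {}".format(
                finding["path"],
                finding["start"]["line"],
                finding["extra"]["severity"],
                finding["check_id"],
            )
        )
    return lines


for line in summarize(json.loads(SAMPLE)):
    print(line)
```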

Performance Optimization

Scan Specific Files

bash

# Only changed files (in CI)
# --diff-filter=d skips deleted files, which no longer exist to scan
git diff --name-only --diff-filter=d HEAD~1 | xargs semgrep --config auto
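
If the list of changed paths needs more filtering than a one-liner allows, a short script can build the command instead. The sketch below keeps only paths that still exist and have a source extension, and returns an empty command when nothing is left, because running Semgrep with no target scans the whole current directory. `scannable_files`, `scan_command`, and the extension list are assumptions for a JS/TS project, not Semgrep features.

```python
import os
from typing import Callable, Iterable, List

# Assumption: a JavaScript/TypeScript project; adjust for your languages.
SCANNABLE = (".js", ".jsx", ".ts", ".tsx")


def scannable_files(
    changed: Iterable[str], exists: Callable[[str], bool] = os.path.isfile
) -> List[str]:
    """Keep changed paths that still exist and have a scannable extension."""
    return [path for path in changed if path.endswith(SCANNABLE) and exists(path)]


def scan_command(
    changed: Iterable[str], exists: Callable[[str], bool] = os.path.isfile
) -> List[str]:
    """Build the semgrep argv, or [] when there is nothing to scan."""
    files = scannable_files(changed, exists)
    return ["semgrep", "--config", "auto", *files] if files else []


# Pretend src/old.js was deleted in the last commit.
changed_paths = ["src/app.ts", "README.md", "src/old.js"]
print(scan_command(changed_paths, exists=lambda path: path != "src/old.js"))
```

The `exists` parameter is there only so the filter can be exercised without touching the filesystem. In CI you would feed in the output of `git diff --name-only` and pass a non-empty result to `subprocess.run`.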

Parallel Scanning

bash

# Use multiple cores
semgrep --config auto --jobs 4

Caching

yaml

# GitHub Actions with caching
- name: Cache Semgrep
  uses: actions/cache@v3
  with:
    path: ~/.semgrep
    key: semgrep-${{ runner.os }}

Complete Setup Example

Project Structure

my-project/
├── .github/
│   └── workflows/
│       └── semgrep.yml
├── .semgrep/
│   ├── custom-rules.yml
│   └── ai-security.yml
├── .semgrepignore
├── src/
│   └── ...
└── package.json

Workflow File

yaml

# .github/workflows/semgrep.yml
name: Security Scan

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/security-audit
            p/secrets
            .semgrep/custom-rules.yml
            .semgrep/ai-security.yml
          # Needed so the upload step below has a semgrep.sarif file to send
          generateSarif: true

      - name: Upload Results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
        if: always()

Troubleshooting

"No rules found"

Check config path:

bash

semgrep --config ./correct/path/to/rules.yml

"Pattern parse error"

YAML indentation matters:

yaml

# Wrong
pattern: function $F(...) { ... }
# Right
pattern: |
  function $F(...) {
    ...
  }

"Too many findings"

Start with specific rules:

bash

# Instead of auto
semgrep --config p/sql-injection

# Then gradually add
semgrep --config p/sql-injection --config p/secrets
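
When a scan is already noisy, it can also help to see which rules produce most of the findings before deciding what to drop or tune. The sketch below counts findings per rule ID in a parsed JSON report. `noisiest_rules` and the sample data are hypothetical, and the report shape is assumed to match Semgrep's JSON output.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def noisiest_rules(report: Dict[str, Any], top: int = 5) -> List[Tuple[str, int]]:
    """Return the rule IDs with the most findings, highest count first."""
    counts = Counter(finding["check_id"] for finding in report.get("results", []))
    return counts.most_common(top)


# Made-up report: three console.log findings and one auth finding.
example = {
    "results": [
        {"check_id": "no-console-log"},
        {"check_id": "no-console-log"},
        {"check_id": "missing-auth-check"},
        {"check_id": "no-console-log"},
    ]
}
print(noisiest_rules(example))
```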

The Bottom Line

Semgrep gives you control over security scanning. Start with registry rules, add custom rules for your patterns, and integrate into CI/CD.

Automated scanning catches what manual review misses. Every push, every PR.

Setting Up Semgrep for Continuous Security Scanning: A Developer's Guide
