semhound
Motivation
Security teams often need the same answer across many repositories: “Where else does this pattern show up?” That might be a bug-bounty SQLi variant, a zero-day in a dependency, or a custom policy you encode as Semgrep rules. Running Semgrep repo-by-repo means scripting discovery, cloning, execution, and reporting yourself.
semhound automates that loop at GitHub org (or user) scale: you supply the rules, it handles discovery (gh repo list), parallel shallow clones over SSH, scanning, and a single report per target with GitHub permalinks. If you want help separating noise from signal, optional AI triage adds a confidence score and a true-positive verdict per finding.
The mental model is simple: tools like TruffleHog or Gitleaks are built for secrets; semhound is for any Semgrep pattern you define, swept across every repo you can access—like a hound for Semgrep findings.
What it does
- Discover: Lists repositories for each org or username you pass (inline or via --orgs-file).
- Clone: Shallow clone (--depth 1) with a blob size cap aligned to Semgrep’s default so large binaries are skipped.
- Scan: Runs your rules from a local --rules-dir, remote --rules-url, or both.
- Report: Writes <target>_scan.csv and optional SARIF (--sarif).
semhound is aimed at targeted, on-demand investigations (tight rule sets, specific events), not continuous full-org scanning with huge rule packs.
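The discover, clone, scan, and report steps can be sketched in a few lines of Python. This is an illustrative outline of the loop, not semhound's actual implementation: the function names, the row keys, and the worker count are all hypothetical, and the blob size cap is left out because this document does not say how semhound applies it. The only external calls are gh repo list with --json, git clone --depth 1, and semgrep scan --json.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def discover_cmd(owner, limit=1000):
    """Build the `gh repo list` call that lists an owner's repos as JSON."""
    return ["gh", "repo", "list", owner, "--limit", str(limit),
            "--json", "nameWithOwner,sshUrl"]


def clone_cmd(ssh_url, dest):
    """Build a shallow (single-commit) clone over SSH."""
    return ["git", "clone", "--depth", "1", "--quiet", ssh_url, str(dest)]


def scan_cmd(rules_dir):
    """Build a Semgrep scan of the current directory with JSON output."""
    return ["semgrep", "scan", "--config", str(rules_dir),
            "--json", "--quiet", "."]


def permalink(name_with_owner, commit, path, start, end):
    """Build a GitHub permalink pinned to a commit and a line range."""
    anchor = f"L{start}" if start == end else f"L{start}-L{end}"
    return f"https://github.com/{name_with_owner}/blob/{commit}/{path}#{anchor}"


def findings_to_rows(results, name_with_owner, commit):
    """Flatten Semgrep JSON results into report rows (hypothetical columns)."""
    rows = []
    for r in results:
        start, end = r["start"]["line"], r["end"]["line"]
        rows.append({
            "repo": name_with_owner,
            "rule_id": r["check_id"],
            "path": r["path"],
            "line": start,
            "permalink": permalink(name_with_owner, commit, r["path"], start, end),
        })
    return rows


def scan_repo(repo, rules_dir, workdir):
    """Clone one repo, scan it, and return its report rows."""
    dest = Path(workdir) / repo["nameWithOwner"].replace("/", "__")
    subprocess.run(clone_cmd(repo["sshUrl"], dest), check=True)
    commit = subprocess.run(
        ["git", "-C", str(dest), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # Run Semgrep from inside the clone so reported paths are relative
    # to the repository root, which is what the permalink needs.
    scan = subprocess.run(
        scan_cmd(Path(rules_dir).resolve()),
        cwd=dest, check=True, capture_output=True, text=True,
    )
    results = json.loads(scan.stdout).get("results", [])
    return findings_to_rows(results, repo["nameWithOwner"], commit)


def sweep(owner, rules_dir, workdir, workers=4):
    """Discover an owner's repos, then clone and scan them in parallel."""
    listing = subprocess.run(
        discover_cmd(owner), check=True, capture_output=True, text=True
    )
    repos = json.loads(listing.stdout)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_repo = pool.map(lambda r: scan_repo(r, rules_dir, workdir), repos)
        return [row for rows in per_repo for row in rows]
```

Calling sweep("my-org", "./rules", "/tmp/sweep") would return one row per finding across the org. Pinning each permalink to the cloned commit SHA rather than a branch name keeps the link pointing at the scanned code even after the branch moves.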
Install and docs
- PyPI: https://pypi.org/project/semhound/ (pipx install semhound is the recommended install path).
- Source and full README: https://github.com/salecharohit/semhound, which covers prerequisites (gh, git, semgrep, SSH to GitHub), usage examples, AI provider configuration (ai.config.example), output column reference, and FAQ.
If you use private repositories, you need gh auth login plus an SSH key registered with GitHub for cloning.
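Before a sweep, it can help to confirm these prerequisites are in place. The following is a minimal preflight sketch, separate from semhound itself: the helper names are hypothetical, and it assumes only that gh auth status exits with 0 when you are logged in and that ssh -T git@github.com prints GitHub's "successfully authenticated" greeting when your key is accepted.

```python
import shutil
import subprocess

# CLI tools listed as prerequisites in the README.
REQUIRED_TOOLS = ("gh", "git", "semgrep")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def gh_logged_in():
    """Return True if `gh auth status` reports an authenticated session."""
    try:
        proc = subprocess.run(
            ["gh", "auth", "status"], capture_output=True, timeout=30
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def ssh_key_accepted():
    """Return True if GitHub accepts one of this machine's SSH keys."""
    try:
        proc = subprocess.run(
            ["ssh", "-T", "-o", "BatchMode=yes", "git@github.com"],
            capture_output=True, text=True, timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    # GitHub offers no shell, so ssh exits non-zero even on success;
    # check the greeting text instead of the exit status.
    return "successfully authenticated" in (proc.stdout + proc.stderr)
```

A typical check would call missing_tools() first, then gh_logged_in() and ssh_key_accepted(), and stop with a clear message if any of them fails.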
Licence
Open source under the MIT licence; see the repository for details.