Semgrep

Static Application Security Testing (SAST) is a testing methodology that analyses application source code to identify security vulnerabilities.

Many compliance frameworks require organizations to perform security testing prior to pushing code into production environments.

Semgrep is a free, open-source static code analysis tool developed by r2c and the open-source community. It has stable support for Go, Java, JavaScript, JSON, Python, and Ruby, and experimental support for many other languages.

To complement the native semgrep tool, r2c also provides a continuous integration service (called Semgrep CI) and maintains a rule library (called Semgrep Registry). Basic individual use of these services is offered for free, while paid tiers cover team and commercial use cases.

It can be a bit confusing at first, but there are three ways of running semgrep:

Semgrep CLI is the command line interface for semgrep. It can be installed via brew or pip and executed in a source code directory with semgrep --config=auto
Semgrep App is essentially Semgrep CLI integrated with GitHub or GitLab, where scans are configured per repository via semgrep.yml. Once everything is set up, Semgrep can run daily, weekly, or on every PR (pull request) or MR (merge request).
Semgrep CI is the approach where Semgrep is integrated into your CI (continuous integration) environment. When it is connected to Semgrep App, you can do things like bulk triage of false positives, notifications via Slack and email, and PR/MR review comments.
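
For scripted use of the CLI variant, a minimal Python sketch might look like the following. It assumes semgrep is already installed and on the PATH, and it adds the --json flag to get machine-readable output. The helper names build_semgrep_command and run_semgrep are hypothetical and only serve this illustration.

```python
import json
import shutil
import subprocess


def build_semgrep_command(target_dir, config="auto"):
    """Assemble the CLI invocation as an argument list."""
    # --config=auto is the invocation from the text; --json requests
    # machine-readable output instead of the human-readable report.
    return ["semgrep", f"--config={config}", "--json", target_dir]


def run_semgrep(target_dir, config="auto"):
    """Run Semgrep CLI on target_dir and return the parsed JSON report."""
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    completed = subprocess.run(
        build_semgrep_command(target_dir, config),
        capture_output=True,
        text=True,
        check=False,  # inspect the output ourselves instead of raising
    )
    if not completed.stdout.strip():
        raise RuntimeError(f"semgrep produced no output: {completed.stderr}")
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Only prints the command; call run_semgrep(".") to actually scan.
    print(" ".join(build_semgrep_command(".")))
```

Keeping the command construction separate from the execution makes it possible to check the invocation without having semgrep installed.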

We suggest starting with Semgrep CLI or a plugin for your preferred IDE.

Rules

Rules are at the core of a tool like semgrep. Each rule describes a pattern in source code. Semgrep matches the code against its rules, and every match is reported as a finding.

Semgrep supports developers by catching security, performance, and correctness issues and by enforcing best practices. Rules are either created by the community or custom-written.
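
To make the shape of a rule concrete, here is a hedged sketch of a custom rule expressed as a Python dictionary. Semgrep rules are normally written in YAML; the dictionary only mirrors that layout. The rule ID avoid-eval and its message are hypothetical, and the set of required keys checked here is an assumption for the illustration, not an authoritative schema.

```python
import json

# Hypothetical rule: flag any call to eval() in Python code.
rule_file = {
    "rules": [
        {
            "id": "avoid-eval",       # hypothetical rule ID
            "pattern": "eval(...)",   # "..." stands for any arguments
            "message": "Avoid eval(); it can execute untrusted input.",
            "languages": ["python"],
            "severity": "WARNING",
        }
    ]
}

# Assumed minimum metadata for a rule (illustration only).
REQUIRED_KEYS = {"id", "message", "languages", "severity"}


def missing_keys(rule):
    """Return the required keys that a rule dictionary lacks, sorted."""
    return sorted(REQUIRED_KEYS - set(rule))


for rule in rule_file["rules"]:
    problems = missing_keys(rule)
    if problems:
        raise ValueError(f"rule {rule.get('id')!r} lacks {problems}")

# Print the rule file in a form that can be compared with a YAML rule.
print(json.dumps(rule_file, indent=2))
```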

Finding Noise

Static analysis has a reputation for being “noisy”, reporting hundreds or even thousands of rule violations just when you thought you were ready to release. Fortunately, semgrep provides strategies for dealing with those false positives.

States

Semgrep CI can track the lifetime of an individual finding using a 4-tuple: (rule ID, file path, syntactic context, index). These are hashed and returned as the unique identifier of a finding, the syntactic_id.
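
The idea of hashing the 4-tuple into a single identifier can be sketched as follows. This is an illustration only: SHA-256 is a stand-in chosen here, not necessarily the hash Semgrep uses, and the example assumes that the index distinguishes several otherwise identical matches within the same file.

```python
import hashlib


def syntactic_id(rule_id, file_path, syntactic_context, index):
    """Hash the 4-tuple into one stable identifier (sketch only)."""
    # Join with a NUL separator so that field boundaries stay unambiguous.
    payload = "\x00".join([rule_id, file_path, syntactic_context, str(index)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two identical matches in the same file differ only by their index.
first = syntactic_id("avoid-eval", "app/views.py", "eval(user_input)", 0)
second = syntactic_id("avoid-eval", "app/views.py", "eval(user_input)", 1)

print(first)
print(second)
```

Because the identifier depends only on these four values, a finding keeps the same ID across scans as long as the rule, the file path, and the matched code stay the same.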

Semgrep App provides the most flexibility in managing findings. Here, the finding can move between states according to the unique syntactic_id from Semgrep CI. The four states are:

OPEN: the finding exists in the code
FIXED: the finding existed in the code but is no longer found
MUTED: the finding has been ignored
REMOVED: the finding’s rule isn’t enabled anymore
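
The four states can be modelled as a small enumeration. The sketch below derives a state from three facts about a finding that has been seen at least once. The order in which the checks are applied is an assumption made for this example, not documented Semgrep App behaviour, and the names FindingState and derive_state are hypothetical.

```python
from enum import Enum


class FindingState(Enum):
    OPEN = "open"
    FIXED = "fixed"
    MUTED = "muted"
    REMOVED = "removed"


def derive_state(rule_enabled, muted, found_in_latest_scan):
    """Map facts about a known finding to one of the four states.

    The precedence (REMOVED, then MUTED, then OPEN/FIXED) is assumed.
    """
    if not rule_enabled:
        return FindingState.REMOVED
    if muted:
        return FindingState.MUTED
    if found_in_latest_scan:
        return FindingState.OPEN
    return FindingState.FIXED


print(derive_state(rule_enabled=True, muted=False, found_in_latest_scan=True))
```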

Shift Left

Traditionally, application testing is executed as a late step, before code is compiled into an artifact and pushed through testing/staging into production. If the code doesn’t meet the quality gates, the test fails and the deployment is stopped. This gatekeeper approach causes significant bottlenecks in the development lifecycle and isn’t compatible with agile methodologies, which emphasize early feedback and development velocity.
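
A quality gate of this kind can be sketched in a few lines of Python. The report layout used here (a results list whose entries carry a severity under extra) is an assumption about the scanner's JSON output, and the policy of blocking only on ERROR is a hypothetical choice for the example.

```python
# Hypothetical gate policy: only ERROR findings stop the deployment.
BLOCKING_SEVERITIES = frozenset({"ERROR"})


def blocking_findings(report, blocking=BLOCKING_SEVERITIES):
    """Return the findings whose severity is covered by the gate policy."""
    return [
        result
        for result in report.get("results", [])
        if result.get("extra", {}).get("severity") in blocking
    ]


def gate_passes(report, blocking=BLOCKING_SEVERITIES):
    """The gate passes when no blocking finding is present."""
    return not blocking_findings(report, blocking)


# Sample report with an assumed structure and made-up values.
sample_report = {
    "results": [
        {
            "check_id": "avoid-eval",
            "path": "app/views.py",
            "extra": {"severity": "WARNING", "message": "Avoid eval()."},
        }
    ]
}

print("gate passed" if gate_passes(sample_report) else "gate failed")
```

In a pipeline, the result of gate_passes would typically be turned into the exit code of the job, so that a failed gate stops the deployment.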

Shifting static application security testing to the left enables developers to identify and fix issues much earlier in the development lifecycle. This makes software development more efficient, improves quality, and enables faster deployments.

Semgrep can be integrated into most editors, such as Visual Studio Code or IntelliJ. This gives developers direct feedback from semgrep while they write code.