SAST: Joern 101
Introduction
This blog post will explore the workings of Joern, a static analysis tool. We will first understand the basics of program analysis before we get into how we can leverage Joern for static analysis.
What are AST, CFG, PDG and CPG?
Abstract Syntax Tree (AST):
An AST is a hierarchical tree representation of the source code. Each node represents a construct in the code, such as a statement, declaration or expression, and each edge connects a parent node to one of its children.
Control Flow Graph (CFG):
The control flow graph represents the order in which the code statements are executed and the conditions that must be met for a particular execution path to be taken. Each node in the graph is a statement, and the edges are the paths the program can traverse.
Program Dependence Graph (PDG):
A PDG has nodes and edges, where each node is a code statement. There are two types of edges: data dependence edges and control dependence edges. With a data dependence edge, one statement uses a value that another statement produces. With a control dependence edge, whether one statement executes at all depends on the outcome of another, such as the condition of an if statement.
Code Property Graph (CPG):
When we combine these three graphs, the AST, CFG and PDG, into a single graph, we get the CPG.
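To make the combination concrete, here is a minimal sketch of a three-statement snippet stored as one graph with labelled edges. The class name MiniCpg, the statement ids and the edge labels are hypothetical and chosen for illustration; Joern's real CPG schema has many more node and edge types.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: one edge list that holds AST, CFG and PDG edges side by side.
final class MiniCpg {
    // A labelled, directed edge between two statement ids.
    static final class Edge {
        final String from;
        final String to;
        final String label;

        Edge(String from, String to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    // Statements of a hypothetical method body:
    //   s1: int x = read();
    //   s2: if (x > 0)
    //   s3:     print(x);
    static List<Edge> build() {
        List<Edge> edges = new ArrayList<>();
        // AST: syntactic nesting (the body contains s1 and s2; s2 contains s3).
        edges.add(new Edge("body", "s1", "AST"));
        edges.add(new Edge("body", "s2", "AST"));
        edges.add(new Edge("s2", "s3", "AST"));
        // CFG: possible execution order (the false branch to the method exit is omitted).
        edges.add(new Edge("s1", "s2", "CFG"));
        edges.add(new Edge("s2", "s3", "CFG"));
        // PDG, data: s2 and s3 both use the x defined in s1.
        edges.add(new Edge("s1", "s2", "PDG_DATA"));
        edges.add(new Edge("s1", "s3", "PDG_DATA"));
        // PDG, control: s3 runs only if the condition in s2 holds.
        edges.add(new Edge("s2", "s3", "PDG_CONTROL"));
        return edges;
    }

    // Counts the edges that carry the given label.
    static long count(List<Edge> edges, String label) {
        return edges.stream().filter(e -> e.label.equals(label)).count();
    }

    public static void main(String[] args) {
        List<Edge> cpg = build();
        for (String label : List.of("AST", "CFG", "PDG_DATA", "PDG_CONTROL")) {
            System.out.println(label + " edges: " + count(cpg, label));
        }
    }
}
```

Because all edges share the same nodes, a single query can mix views, for example following syntax edges to find a call and then data dependence edges to see where its arguments come from.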
Installing Joern and Setup
To get started with Joern, follow these steps for installation:
mkdir joern && cd joern
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive
For detailed installation instructions, refer to the official Joern documentation.
Analysing a vulnerable application with Joern
Once Joern is set up, we can analyse a vulnerable Java application. We will clone this application and start the Joern shell. Once we have the Joern shell, we first need to import the source code into Joern to create its CPG. We can do this using the following command.
importCode("path/to/vulnerable-java-application")
This command takes the path to the source code as input and generates a binary representation of the CPG. We will run all further analysis on this generated CPG.
Identifying Vulnerabilities
The vulnerable Java application we are analysing contains a known remote code execution (RCE) vulnerability. Our objective is to find the code path that an attacker can exploit to trigger it. In this case, the sink is Runtime.getRuntime().exec(), a method that can lead to RCE when it receives attacker-controlled input. We must also identify the source, which in a Spring-based application is typically a method annotated with @RequestMapping, @GetMapping, etc.
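As an illustration of the source and sink described above, here is a minimal sketch of what such a handler might look like. It is not code from the application we are analysing: CommandController, run, the /run route and isEntryPoint are hypothetical names, and GetMapping is a local stand-in for Spring's annotation so that the example compiles without Spring on the classpath.

```java
import java.io.IOException;
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Hypothetical controller, for illustration only.
final class CommandController {
    // Source: in a real Spring handler, 'cmd' would be filled from the HTTP request.
    @GetMapping("/run")
    public Process run(String cmd) throws IOException {
        // Sink: the untrusted string is handed to exec() without any validation.
        return Runtime.getRuntime().exec(cmd);
    }

    // Returns true if a method with this name carries an annotation ending in "Mapping".
    static boolean isEntryPoint(String methodName) {
        for (Method m : CommandController.class.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            for (Annotation a : m.getAnnotations()) {
                if (a.annotationType().getSimpleName().endsWith("Mapping")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Inspects the annotations only; no command is executed here.
        System.out.println("run: " + isEntryPoint("run"));
        System.out.println("isEntryPoint: " + isEntryPoint("isEntryPoint"));
    }
}

// Local stand-in for Spring's @GetMapping (assumption: kept at runtime for reflection).
@Retention(RetentionPolicy.RUNTIME)
@interface GetMapping {
    String value();
}
```

The annotated method's parameter is where untrusted data enters, and the exec() call is where it becomes dangerous. The analysis in the following steps looks for exactly this kind of connection.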
Step 1: Identifying the Source
To begin the analysis, we first need to locate the sources through which user input enters the application. In Spring applications, methods that handle HTTP requests usually carry annotations like @RequestMapping, @GetMapping, etc., and these methods are the entry points for user input.
To find these methods, we can use the following Joern query:
def source = cpg.method.where(_.annotation.name(".*Mapping")).parameter
This query identifies all parameters of methods carrying an annotation whose name ends in Mapping, such as @RequestMapping or @GetMapping. These parameters are considered potential sources of untrusted input.
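To see which annotation names the pattern in the query would select, here is a minimal sketch in plain Java. It assumes the filter has to match the whole name, which is our understanding of how Joern's string filters behave. The class name MappingFilter and the sample annotation list are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration of the ".*Mapping" pattern, independent of Joern itself.
final class MappingFilter {
    // Keeps only the names that match the regex as a whole (String.matches is a full match).
    static List<String> matching(List<String> names, String regex) {
        return names.stream()
                .filter(name -> name.matches(regex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> annotations = List.of(
                "RequestMapping", "GetMapping", "PostMapping", "Override", "Autowired");
        // Prints [RequestMapping, GetMapping, PostMapping]
        System.out.println(matching(annotations, ".*Mapping"));
    }
}
```

A broad pattern like this keeps the query short, but it also selects any custom annotation whose name happens to end in Mapping, so the resulting source list is worth reviewing.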
Step 2: Identifying the Sink
Next, we identify the sink, which, in our case, is the Runtime.getRuntime().exec() method. This is a known dangerous function that can execute arbitrary system commands.
def sink = cpg.call.name("exec")
This query locates every call to a method named exec. Because it filters on the method name alone, it can also match exec methods that are unrelated to Runtime.
Step 3: Tracing the Data Flow
After identifying the source and sink, the final step is to trace the data flow between them. This allows us to determine if a path from the source to the sink could lead to an RCE.
sink.reachableByFlows(source).p
This command traces the flow of data from the source to the sink. If a path exists, it indicates a potential vulnerability where untrusted input can reach the dangerous exec() method.
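Conceptually, this kind of query asks whether the sink can be reached from the source by following data dependence edges. The sketch below shows that idea as a breadth-first search over a tiny hand-written graph. It is a simplified illustration under our own assumptions, not how Joern's data-flow engine is implemented, and ToyFlow, sampleGraph and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Toy source-to-sink reachability over data dependence edges.
final class ToyFlow {
    // Hypothetical graph: 'cmd' flows into 'full', which flows into the exec call;
    // 'id' only flows into a log call.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
                "cmd", List.of("full"),
                "full", List.of("exec(full)"),
                "id", List.of("log(id)"));
    }

    // Breadth-first search; returns the first path found, or an empty list if none exists.
    static List<String> findFlow(Map<String, List<String>> dataEdges, String source, String sink) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(sink)) {
                // Walk the parent links back to the source to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = sink; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : dataEdges.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = sampleGraph();
        // Prints [cmd, full, exec(full)]
        System.out.println(findFlow(graph, "cmd", "exec(full)"));
        // Prints []
        System.out.println(findFlow(graph, "id", "exec(full)"));
    }
}
```

A non-empty path corresponds to a flow that Joern would print. An empty result means no data dependence chain links that source to the sink in this toy graph.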