Research Topics in Software Quality – Session 6: Intra-procedural analysis with Soot

Soot is a Java optimization framework that allows to manipulate, analyze and optimize Java code, through intermediate representations. We already used it indirectly when using LCT, and we will use it in this exercise session to build data-flow analyses.

Slides on intra-procedural data-flow analysis: http://soft.vub.ac.be/~cderoove/software_quality/05_intradataflow.pdf

Soot API: http://www.sable.mcgill.ca/soot/doc/

1 Setup

Get soot-trunk.jar from here: https://ssebuild.cased.de/nightly/soot/lib/soot-trunk.jar
Create a new Java project in Eclipse that will contain your analyses, and add soot-trunk.jar to this project. Define the following class to use Soot from your program:
```
public class Main {
    public static void main(String[] args) {
        soot.Main.main(args);
    }
}
```
Create a new Java project in Eclipse that will be analyzed, and fill in with some example code.

2 Running Soot from Eclipse

Launch your main class a first time by pressing the run button. This will create a new run configuration. You then need to modify this run configuration to pass arguments to Soot:
- Right click on the project > Run as > Run configurations…
- Go to the Arguments tab
- Add the following in the Program arguments box, where SootTest is the project to analyze and Foo is a class in this project:
```
-pp -cp ${workspace_loc:SootTest/bin} -p jb use-original-names:true Foo
```
You can then use the run button to launch Soot with the given arguments.

This method will be repeated to run your own analyses later in this exercise session.

3 Introduction to Soot

3.1 Intermediate representations

Soot has 4 intermediate representations for Java code, called Baf, Grimp, Jimple, Shimple. Jimple is the one that is most used and the only one we will look at.

The advantage of using Jimple to analyze Java code is its simplicity: it only has 15 statement types, compared to more than 200 for Java. Therefore, when writing analyses, we can restrict ourselves to a small set of statements.

Soot can transform Java bytecode to Jimple code using the -f J parameter.

Exercise: Convert the following Java code to Jimple using Soot, and inspect the resulting Jimple file.

public class Foo {
    public static void main(String[] args) {
        Foo f = new Foo();
        int a = 7;
        int b = 14;
        int x = (f.bar(21) + a) * b;
    }

    public int bar(int n) {
        return n + 42;
    }
}

3.2 Type of analysis

Soot can perform multiple type of analyses, on the different intermediate representations. Each of these analyses is called a phase and take its name from the representation on which it operates and what operation it performs. We will only look at one phase: jtp, for Jimple Transformation Phase. It performs an intra-procedural analysis on Jimple code with a possible transformation of the code. It is the analysis performed by default.

3.3 Soot constructs

Soot's philosophy is that access to syntactic elements should be abstract whenever possible. It therefore uses the concept of units and boxes.

3.3.1 Units and boxes

A unit represent some part of the source code. To access data from a unit:

getDefBoxes(),
getUseBoxes(),
getUseAndDefBoxes().

For example, in x = y + z, we have the following:

def boxes: x (instance of ValueBox with value set to an instance of Local),
use boxes: y (Local), z (Local), y + z (Expr).

To retrieve the value of a ValueBox, use the getValue() method.

3.3.2 Flow sets

Flow sets are computed during the analysis and represent the data-flow you are interested in. The following operations are available on them:

a.intersection(b, c) means $c = a \cap b$
a.union(b, c) means $c = a \cup b$
a.difference(b, c) means $c = a \setminus b$
c.complement(d) means $d = \bar{c}$
d.add(v) means $d = d \cup \{v\}$
a.copy(b) copies a in b
a.clear() removes everything from a

There are multiple implementations of flow sets available, but we will only use the ArraySparseSet, as it is the most natural one: it implements a set of Java objects.

4 Performing intra-procedural data-flow analysis

4.1 Steps

Soot allows us to implement our own data-flow analysis by only writing code related to our analysis, and let us avoid having to write code that manipulates the CFG or implementation of the abstract domains.

To implement a data-flow analysis in Soot, we need to perform the following steps:

Implement a subclass of either ForwardFlowAnalysis or BackwardFlowAnalysis, depending on which way we want to perform the analysis. A forward analysis starts from the beginning of the program and goes over the statement in execution order, computing the value of the flow set after a statement from the value of the flow set before the statement (computes out from in) . A backward analysis starts from the end of the program and goes over the statements in reverse order, and computes in from out. The constructor of this class should call the doAnalysis() method to actually perform the analysis.
Implement the abstraction used by the analysis. To do so, decide what will be contained in the flow set of each node of the CFG. Then, define merge() and copy(). If the analysis is a may analysis, merge will likely be defined as an union. If the analysis is a must analysis, merge will likely be defined as an intersection.
Write equations for the analysis and implement them in the flowThrough() method. This is where most of the work is required.
Define initial flow set values by implementing newInitialFlow() and entryInitialFlow().

4.2 Skeleton

Here is a skeleton for a forward analysis where the flow set is composed of Foo objects:

import soot.G;
import soot.Unit;
import soot.toolkits.graph.DirectedGraph;
import soot.toolkits.scalar.FlowSet;
import soot.toolkits.scalar.ForwardFlowAnalysis;

public class FooAnalysis extends ForwardFlowAnalysis<Unit,FlowSet<Foo>> {
    public FooAnalysis(DirectedGraph<Unit> g) {
        super(g);
        // You can perform some initialization code here
        // Then we launch the analysis
        doAnalysis();
    }

    protected void merge(FlowSet<Foo> src1, FlowSet<Foo> src2, FlowSet<Foo> dest) {
        // Merges two source flow sets into a destination flow set
    }

    protected void copy(FlowSet<Foo> src, FlowSet<Foo> dest) {
        // Copies src to dest
    }

    protected FlowSet<Foo> entryInitialFlow() {
        // Returns the initial flow set (the flow set used a the beginning
        // of the analysis)
        return null;
    }

    protected FlowSet<Foo> newInitialFlow() {
        // Returns the flow set used when a new flow set is necessary
        // Not always the same as entryInitialFlow
        return null;
    }


    protected void flowThrough(FlowSet<Foo> src, Unit node, FlowSet<Foo> dest) {
        // The core of the analysis: analyses a node, reading the previous flow
        // set from src, writing the new flow set into dest
        // You can print the unit analyzed and the resulting flow set
        G.v().out.println(node.toString() + " // " + dest.toString());
    }
}

For a backward analysis, extend instead the soot.toolkits.scalar.BackwardFlowAnalysis class.

To use an analysis, you have to define a main class that wraps Soot's main function and contains the following code:

import java.util.Map;

import soot.Body;
import soot.BodyTransformer;
import soot.G;
import soot.PackManager;
import soot.Transform;
import soot.Unit;
import soot.toolkits.graph.ExceptionalUnitGraph;

public class FooMain {
    public static void main(String[] args) {
        PackManager.v().getPack("jtp").add(
            new Transform("jtp.myTransform", new BodyTransformer() {
                @SuppressWarnings("rawtypes")
                protected void internalTransform(Body body, String phase, Map options) {
                    G.v().out.println("Analyzing " + body.getMethod().getName());
                    // Create an instance of our analysis on the method's body
                    // This also performs the analysis.
                    FooAnalysis analysis = new FooAnalysis(new ExceptionalUnitGraph(body));
                    // Here you can do something like reporting results of
                    // the analysis. The results can be communicated from
                    // the analysis to here through a method in FooAnalysis.
                    // To print a value, use G.v().out.println() instead of System.out.println()
                }
            }));
        soot.Main.main(args);
    }
}

4.3 Live variables analysis

A variable is live if it will be used later in the program without being redefined. The analysis should detect variables that may be live. The following examples show the set of live variables between each statement:

int a = 0;
// {a}
int x = 1;
// {a}
int y = 2;
// {y,a}
x = 3;
// {x,y,a}
int z = a + x + y;
// {z}
System.out.println(z);

int a = 0;
// {a}
int x = 1;
// {a}
int y = 2;
// {y, a}
x = 3;
// {x,y,a}
if (rand())
  // {a,x}
  y = 1;
else
  // {a,y}
  x = 1;
// {a,x,y}
int z = a + x + y;
// {z}
System.out.println(z);

Exercise: answer the following questions:

Is this a backward or forward analysis? Do we compute values from start to end or from end to start?
What will the flow set contain? Is this analysis a may or must analysis? How would you then define the merge operation?
How would you compute the value of the out set given the in, or of the in given the out, depending on your answer to 1. We can often define the analysis in two operations: a kill which removes values from the flow set, depending on the operation, and a gen which adds element to the flow set. What would those operations be in this case?
What is the initial value of the flow set?

To run the analysis, fill-in the skeleton and paste in the main function, adapting the class names. Then, add the following argument to Soot, to ensure that only your analysis is executed (and no other optimization takes place):

-p jtp.myTransform on

4.4 Guaranteed definition analysis

Exercise: write an analysis that detects all variables guaranteed to be defined at a program point. A variable is guaranteed to be defined at a program point if it has been initialized.

For example:

int a = 0;
int b;                    // {a}
b = 0;                    // {a}
int c;                    // {a, b}
a = a + b;                // {a, b}
c = 5;                    // {a, b}
System.out.println(c);    // {a, b, c}

4.5 Available expressions analysis

Exercise: write an analysis that find available expressions at every program point. From the lecture's slides: an expression e is available at program point p if

e is computed on every path to p, and
the value of e has not changed since the last time e was computed on the paths to p

For example:

int a = 0;
int b = a+2;   // {a+2}
int c = b*b;   // {a+2, b*b}
int d = c+1;   // {a+2, b*b, c+1}
c = 4;         // {a+2, b*b}
if (b < c)     // {a+2, b*b, b < c}
  b = 2;       // {a+2}
else           // {a+2, b*b, b < c}
  c = a+1;     // {a+2, b*b}
System.out.println(a + ", " + b + ", " + c + ", " + d); // avoid having the branches removed
return;        // {a+2}

4.6 Very busy expressions analysis

Exercise: write a very busy expressions analysis. An expression is very busy at a point if it is guaranteed that the expression will be computed in the future, before its value change. For example, The following code is annotated on each line with the set of busy expressions after the corresponding line is evaluated:

int a = 1;
int b = 2;                 // {a+b}
int x = 100;               // {a+b,x-1}
int y = a+b;               // {x-1}
x = x-1;                   // {x > 0}
if (x > 0) {               // {}
    System.out.println(y); // {}
}

4.7 Nullness analysis

4.7.1 Lattice

In this analysis, the flowset will need to inform us whether a variable is certainly null, certainly not null, or if we don't know. For that, we will use a lattice composed of four values:

null,
not null,
maybe null,
a bottom value.

Exercise: define the ordering of the lattice using a Hasse diagram. Then, define the join operation of this lattice in Java, using the following skeleton:

class NullnessLattice {
    public interface Element {
        public Element join(Element x);
    }
    public static final Element BOTTOM = new Element() {
        public Element join(Element x) {
            // To fill...
        }
        public String toString() {
            return "Bottom";
        }
    };
    public static final Element MAYBE_NULL = new Element() {
        public Element join(Element x) {
            // To fill...
        }
        public String toString() {
            return "MaybeNull";
        }
    };
    public static final Element NOT_NULL = new Element() {
        public Element join(Element x) {
            // To fill...
        }
        public String toString() {
            return "NotNull";
        }
    };
    public static final Element NULL = new Element() {
        public Element join(Element x) {
            // To fill...
        }
        public String toString() {
            return "Null";
        }
    };
}

4.7.2 Analysis

Exercise: we will write an analysis that records the nullness of local variables. Answer the following questions:

Is it a forward or backward flow analysis?
What is the abstraction that will be used? What will the merge operation consist of?
For this analysis, we will not define gen and kill operations, but we will modify the flow value according to some cases:
- if statement with equality or non-equality check with null (e.g. if (x == null))
- field reference (e.g. z.bar)
- method invocation (e.g. x.foo())
- variable definition (e.g. x = null, x = this, x = y, x = foo())
How should the flow value be changed in those cases?
What is the initial flow value?

Annotate the following code by hand with the expected flow values between each statement (start with the bottom element for the flow set)O:

if (y == null) {
  x = new Foo();
} else {
  x.foo();
  y = null;
}
y = z.bar;

4.8 Sign analysis

Exercise:

Design a lattice that will be used to know whether a number is positive or negative. This lattice should have:
- a method to create a lattice value from an integer, and
- the four basic mathematical operations (+, *, /, -) between elements of this lattice defined.
Define a data-flow analysis that analyzes arithmetic operations in a program to find the sign of each variable at a certain point of a program. Do this incrementally:
- First, analyze only simple cases: assignment from a constant value to a variable, equality comparison between a local variable and an integer,
- Then, analyze more complex assignments doing arithmetic operations,
- Finally, add support for more comparison operators (>, etc.).
At every step, test your analysis on the following methods and verify the validity of the result:

int foo(int a, int b) {
    int x = 0;
    if (a != 0) {
        x = a;
    } else {
        x = 1;
    }
    int z = b / x;
    return z;
}

int bar(int a, int b) {
    int x = b - b;
    int i = x + 1;
    int z = a / x;
    return z;
}

5 Further material

This exercise session is mostly based on:

A Survivor's Guide to Java Program Analysis with Soot, by Árni Einarsson and Janus Dam Nielsen¹.
Analyzing Java Programs with Soot by Bruno Dufour².

Some exercises are also inspired from analyses defined in Soot's implementation.

Footnotes:

: http://www.brics.dk/SootGuide/

: http://www.iro.umontreal.ca/~dufour/cours/ift6315/docs/soot-tutorial.pdf