Research Topics in Software Quality – Session 6: Intra-procedural analysis with Soot
Soot is a Java optimization framework that allows to manipulate, analyze and optimize Java code, through intermediate representations. We already used it indirectly when using LCT, and we will use it in this exercise session to build data-flow analyses.
Slides on intra-procedural data-flow analysis: http://soft.vub.ac.be/~cderoove/software_quality/05_intradataflow.pdf
Soot API: http://www.sable.mcgill.ca/soot/doc/
1 Setup
- Get
soot-trunk.jar
from here: https://ssebuild.cased.de/nightly/soot/lib/soot-trunk.jar Create a new Java project in Eclipse that will contain your analyses, and add
soot-trunk.jar
to this project. Define the following class to use Soot from your program:public class Main { public static void main(String[] args) { soot.Main.main(args); } }
- Create a new Java project in Eclipse that will be analyzed, and fill in with some example code.
2 Running Soot from Eclipse
- Launch your main class a first time by pressing the run button. This will
create a new run configuration. You then need to modify this run
configuration to pass arguments to Soot:
- Right click on the project > Run as > Run configurations…
- Go to the Arguments tab
Add the following in the Program arguments box, where
SootTest
is the project to analyze andFoo
is a class in this project:-pp -cp ${workspace_loc:SootTest/bin} -p jb use-original-names:true Foo
- You can then use the run button to launch Soot with the given arguments.
This method will be repeated to run your own analyses later in this exercise session.
3 Introduction to Soot
3.1 Intermediate representations
Soot has 4 intermediate representations for Java code, called Baf, Grimp, Jimple, Shimple. Jimple is the one that is most used and the only one we will look at.
The advantage of using Jimple to analyze Java code is its simplicity: it only has 15 statement types, compared to more than 200 for Java. Therefore, when writing analyses, we can restrict ourselves to a small set of statements.
Soot can transform Java bytecode to Jimple code using the -f J
parameter.
Exercise: Convert the following Java code to Jimple using Soot, and inspect the resulting Jimple file.
public class Foo { public static void main(String[] args) { Foo f = new Foo(); int a = 7; int b = 14; int x = (f.bar(21) + a) * b; } public int bar(int n) { return n + 42; } }
3.2 Type of analysis
Soot can perform multiple type of analyses, on the different intermediate
representations. Each of these analyses is called a phase and take its name
from the representation on which it operates and what operation it performs. We
will only look at one phase: jtp
, for Jimple Transformation Phase. It
performs an intra-procedural analysis on Jimple code with a possible
transformation of the code. It is the analysis performed by default.
3.3 Soot constructs
Soot's philosophy is that access to syntactic elements should be abstract whenever possible. It therefore uses the concept of units and boxes.
3.3.1 Units and boxes
A unit represent some part of the source code. To access data from a unit:
getDefBoxes()
,getUseBoxes()
,getUseAndDefBoxes()
.
For example, in x = y + z
, we have the following:
- def boxes:
x
(instance ofValueBox
with value set to an instance ofLocal
), - use boxes:
y
(Local
),z
(Local
),y + z
(Expr
).
To retrieve the value of a ValueBox
, use the getValue()
method.
3.3.2 Flow sets
Flow sets are computed during the analysis and represent the data-flow you are interested in. The following operations are available on them:
a.intersection(b, c)
means \(c = a \cap b\)a.union(b, c)
means \(c = a \cup b\)a.difference(b, c)
means \(c = a \setminus b\)c.complement(d)
means \(d = \bar{c}\)d.add(v)
means \(d = d \cup \{v\}\)a.copy(b)
copies a in ba.clear()
removes everything froma
There are multiple implementations of flow sets available, but we will only use
the ArraySparseSet
, as it is the most natural one: it implements a set of Java
objects.
4 Performing intra-procedural data-flow analysis
4.1 Steps
Soot allows us to implement our own data-flow analysis by only writing code related to our analysis, and let us avoid having to write code that manipulates the CFG or implementation of the abstract domains.
To implement a data-flow analysis in Soot, we need to perform the following steps:
- Implement a subclass of either
ForwardFlowAnalysis
orBackwardFlowAnalysis
, depending on which way we want to perform the analysis. A forward analysis starts from the beginning of the program and goes over the statement in execution order, computing the value of the flow set after a statement from the value of the flow set before the statement (computes out from in) . A backward analysis starts from the end of the program and goes over the statements in reverse order, and computes in from out. The constructor of this class should call thedoAnalysis()
method to actually perform the analysis. - Implement the abstraction used by the analysis. To do so, decide what will
be contained in the flow set of each node of the CFG. Then, define
merge()
andcopy()
. If the analysis is a may analysis,merge
will likely be defined as an union. If the analysis is a must analysis,merge
will likely be defined as an intersection. - Write equations for the analysis and implement them in the
flowThrough()
method. This is where most of the work is required. - Define initial flow set values by implementing
newInitialFlow()
andentryInitialFlow()
.
4.2 Skeleton
Here is a skeleton for a forward analysis where the flow set is composed of
Foo
objects:
import soot.G; import soot.Unit; import soot.toolkits.graph.DirectedGraph; import soot.toolkits.scalar.FlowSet; import soot.toolkits.scalar.ForwardFlowAnalysis; public class FooAnalysis extends ForwardFlowAnalysis<Unit,FlowSet<Foo>> { public FooAnalysis(DirectedGraph<Unit> g) { super(g); // You can perform some initialization code here // Then we launch the analysis doAnalysis(); } protected void merge(FlowSet<Foo> src1, FlowSet<Foo> src2, FlowSet<Foo> dest) { // Merges two source flow sets into a destination flow set } protected void copy(FlowSet<Foo> src, FlowSet<Foo> dest) { // Copies src to dest } protected FlowSet<Foo> entryInitialFlow() { // Returns the initial flow set (the flow set used a the beginning // of the analysis) return null; } protected FlowSet<Foo> newInitialFlow() { // Returns the flow set used when a new flow set is necessary // Not always the same as entryInitialFlow return null; } protected void flowThrough(FlowSet<Foo> src, Unit node, FlowSet<Foo> dest) { // The core of the analysis: analyses a node, reading the previous flow // set from src, writing the new flow set into dest // You can print the unit analyzed and the resulting flow set G.v().out.println(node.toString() + " // " + dest.toString()); } }
For a backward analysis, extend instead the
soot.toolkits.scalar.BackwardFlowAnalysis
class.
To use an analysis, you have to define a main class that wraps Soot's main function and contains the following code:
import java.util.Map; import soot.Body; import soot.BodyTransformer; import soot.G; import soot.PackManager; import soot.Transform; import soot.Unit; import soot.toolkits.graph.ExceptionalUnitGraph; public class FooMain { public static void main(String[] args) { PackManager.v().getPack("jtp").add( new Transform("jtp.myTransform", new BodyTransformer() { @SuppressWarnings("rawtypes") protected void internalTransform(Body body, String phase, Map options) { G.v().out.println("Analyzing " + body.getMethod().getName()); // Create an instance of our analysis on the method's body // This also performs the analysis. FooAnalysis analysis = new FooAnalysis(new ExceptionalUnitGraph(body)); // Here you can do something like reporting results of // the analysis. The results can be communicated from // the analysis to here through a method in FooAnalysis. // To print a value, use G.v().out.println() instead of System.out.println() } })); soot.Main.main(args); } }
4.3 Live variables analysis
A variable is live if it will be used later in the program without being redefined. The analysis should detect variables that may be live. The following examples show the set of live variables between each statement:
int a = 0; // {a} int x = 1; // {a} int y = 2; // {y,a} x = 3; // {x,y,a} int z = a + x + y; // {z} System.out.println(z);
int a = 0; // {a} int x = 1; // {a} int y = 2; // {y, a} x = 3; // {x,y,a} if (rand()) // {a,x} y = 1; else // {a,y} x = 1; // {a,x,y} int z = a + x + y; // {z} System.out.println(z);
Exercise: answer the following questions:
- Is this a backward or forward analysis? Do we compute values from start to end or from end to start?
- What will the flow set contain? Is this analysis a may or must
analysis? How would you then define the
merge
operation? - How would you compute the value of the out set given the in, or of the in
given the out, depending on your answer to 1.
We can often define the analysis in two operations: a
kill
which removes values from the flow set, depending on the operation, and agen
which adds element to the flow set. What would those operations be in this case? - What is the initial value of the flow set?
To run the analysis, fill-in the skeleton and paste in the main function, adapting the class names. Then, add the following argument to Soot, to ensure that only your analysis is executed (and no other optimization takes place):
-p jtp.myTransform on
4.4 Guaranteed definition analysis
Exercise: write an analysis that detects all variables guaranteed to be defined at a program point. A variable is guaranteed to be defined at a program point if it has been initialized.
For example:
int a = 0; int b; // {a} b = 0; // {a} int c; // {a, b} a = a + b; // {a, b} c = 5; // {a, b} System.out.println(c); // {a, b, c}
4.5 Available expressions analysis
Exercise: write an analysis that find available expressions at every program point. From the lecture's slides: an expression e is available at program point p if
- e is computed on every path to p, and
- the value of e has not changed since the last time e was computed on the paths to p
For example:
int a = 0; int b = a+2; // {a+2} int c = b*b; // {a+2, b*b} int d = c+1; // {a+2, b*b, c+1} c = 4; // {a+2, b*b} if (b < c) // {a+2, b*b, b < c} b = 2; // {a+2} else // {a+2, b*b, b < c} c = a+1; // {a+2, b*b} System.out.println(a + ", " + b + ", " + c + ", " + d); // avoid having the branches removed return; // {a+2}
4.6 Very busy expressions analysis
Exercise: write a very busy expressions analysis. An expression is very busy at a point if it is guaranteed that the expression will be computed in the future, before its value change. For example, The following code is annotated on each line with the set of busy expressions after the corresponding line is evaluated:
int a = 1; int b = 2; // {a+b} int x = 100; // {a+b,x-1} int y = a+b; // {x-1} x = x-1; // {x > 0} if (x > 0) { // {} System.out.println(y); // {} }
4.7 Nullness analysis
4.7.1 Lattice
In this analysis, the flowset will need to inform us whether a variable is certainly null, certainly not null, or if we don't know. For that, we will use a lattice composed of four values:
- null,
- not null,
- maybe null,
- a bottom value.
Exercise: define the ordering of the lattice using a Hasse diagram. Then, define the join operation of this lattice in Java, using the following skeleton:
class NullnessLattice { public interface Element { public Element join(Element x); } public static final Element BOTTOM = new Element() { public Element join(Element x) { // To fill... } public String toString() { return "Bottom"; } }; public static final Element MAYBE_NULL = new Element() { public Element join(Element x) { // To fill... } public String toString() { return "MaybeNull"; } }; public static final Element NOT_NULL = new Element() { public Element join(Element x) { // To fill... } public String toString() { return "NotNull"; } }; public static final Element NULL = new Element() { public Element join(Element x) { // To fill... } public String toString() { return "Null"; } }; }
4.7.2 Analysis
Exercise: we will write an analysis that records the nullness of local variables. Answer the following questions:
- Is it a forward or backward flow analysis?
- What is the abstraction that will be used? What will the merge operation consist of?
For this analysis, we will not define
gen
andkill
operations, but we will modify the flow value according to some cases:- if statement with equality or non-equality check with
null
(e.g.if (x
==
null)
) - field reference (e.g.
z.bar
) - method invocation (e.g.
x.foo()
) - variable definition (e.g.
x = null
,x = this
,x = y
,x = foo()
)
How should the flow value be changed in those cases?
- if statement with equality or non-equality check with
- What is the initial flow value?
Annotate the following code by hand with the expected flow values between each statement (start with the bottom element for the flow set)O:
if (y == null) { x = new Foo(); } else { x.foo(); y = null; } y = z.bar;
4.8 Sign analysis
Exercise:
- Design a lattice that will be used to know whether a number is positive or
negative. This lattice should have:
- a method to create a lattice value from an integer, and
- the four basic mathematical operations (
+
,*
,/
,-
) between elements of this lattice defined.
- Define a data-flow analysis that analyzes arithmetic operations in a
program to find the sign of each variable at a certain point of a
program. Do this incrementally:
- First, analyze only simple cases: assignment from a constant value to a variable, equality comparison between a local variable and an integer,
- Then, analyze more complex assignments doing arithmetic operations,
- Finally, add support for more comparison operators (
>
, etc.).
- At every step, test your analysis on the following methods and verify the validity of the result:
int foo(int a, int b) { int x = 0; if (a != 0) { x = a; } else { x = 1; } int z = b / x; return z; } int bar(int a, int b) { int x = b - b; int i = x + 1; int z = a / x; return z; }