# Support for Libraries in Abstract Interpreters

Abstract interpretation is a static program analysis technique that can be used to reason over programs and their behavior. It consists of building an interpreter that works with abstract values: concrete values are approximated by sets of values (e.g., 5 can be approximated by the interval [0, 10], or by its type: Int). A frequently used approach to perform abstract interpretation is to describe the interpreter as an abstract state machine, and to abstract this machine. This is the AAM (Abstracting Abstract Machines) approach [1].

This approach performs whole-program analysis: it executes a program from top to bottom. It is able to handle programs of small to moderate size. Programs that depend on libraries are however more difficult to handle. Suppose we want to analyze the following program:

(load "math.scm") ;; 5000 LOC long library, that defines pi as well as many other values and functions

(define (circle-perimeter radius)

(* 2 pi radius))

(circle-perimeter 5)

A simple solution to analyze this program is just to replace the load operation by the full library file itself. However, evaluating the full library is far from necessary since only the value "pi" is being used in the program. Moreover, abstract interpretation works well on small programs, but evaluating thousands of lines of code of Scheme is still problematic. The library can however consist of any Scheme code, and just scanning for definitions in the file might not be enough. For example, consider the following library code:

(define pi 0.)

(define (approximate-pi n)

;; approximate pi: pi = 4 * Sum(((-1)^n)/(2n + 1))

(set! pi (+ pi (/ (if (= (modulo n 2) 0) -1 1)

(+ (* 2 n) 1))))

(if (= n 0)

(set! pi (* 4 pi))

(approximate-pi (- n 1))))

(approximate-pi 100)

The value of pi is not apparent from its definition. However, after the library is loaded, pi has a specific value that does not change anymore, and can be for example replaced by its value.

This example raises another problem: with abstract interpretation, the evaluation works on abstract values, where precision is deliberately lost in order to gain decidability. In library code however, it is known in advance that the evaluation will terminate, and that, in this case, (approximate-pi 100) can safely be evaluated with concrete values in order to have a precise value for pi.

Not many related work exist on this problem. Andreassen and Møller investigated support for jQuery in a static analyzer for Javascript [1] and is a good starting point for this work.

The goal of this thesis is to add support for the load construct in the abstract interpretation framework developped at the Software Languages Lab [2]. The student would have to investigate, implement and compare the following techniques:

- Baseline: performing abstract interpretation of the libraries when they are loaded
- Baseline concrete: performing concrete interpretation of the libraries when they are loaded, followed by abstracting the results.
- Determinacy: adapt the analysis described in [2] to Scheme programs
- One or more techniques that the student will come up with when developping the other techniques. Some ideas include:
- Performing abstract interpretation of the library in order to detect which parts have to be evaluated when a specific function or value is used. This information could be added as annotation, and the full library won't have to be loaded anymore.
- Loading the library in the concrete and marshalling the final concrete state. When the library is required, only this state has to be abstracted.

[1]: Abstacting Abstract Machines, M. Might, D. Van Horn, 2010. http://matt.might.net/papers/vanhorn2010abstract.pdf

[2]: Determinacy in static analysis for jQuery, E. Andreassen, A. Møller, 2014. http://cs.au.dk/~amoeller/papers/jquery/paper.pdf

[3]: scala-am: http://github.com/acieroid/scala-am