1.12 First-class Procedures: Code is Data
1.12.1 Preface: What’s wrong with Exprs-Lang v8?
Actually, not much. With structured data types, Exprs-lang v8 is a pretty good language now.
Exprs-bits-lang v8 is sufficiently expressive to act as a reasonable compiler backend for many languages. It’s roughly at the abstraction level of C. It provides raw access to memory, pointers, labels (function pointers) and bitwise operations, but also non-tail calls and some nice syntactic properties like algebraic expressions.
Exprs-lang v8 adds safety on top of that language, although this
safety does come at a cost.
One major limitation in Exprs-lang v8 is the lack of abstractions
over computation.
We have lots of abstraction over data, but it’s common to want to abstract
over computation—
In this chapter, we add the ability to easily abstract over computations at any point
via first-class procedures.
Many languages provide some version of this—
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Now, lambda can appear in any expression. We can still define procedures at the top-level using define, although the semantics will change.
We add a new data structure to the language as well: procedures. These are implicitly constructed by lambda. They support two operations, in addition to call: the tag-checking predicate procedure? and procedure-arity, for inspecting how many parameters the procedure expects take.
This is a syntactically small change, but it has massive implications.
1.12.2 Procedures, Closures and Closure Conversion
So far, procedures in our language have been compiled directly to labeled
code—
1.12.2.1 The First-class Procedure
To support first-class procedures, we need to compile procedures to a data structure. This data structure allows us to construct a procedure value, pass it around and return it, call it (safety), and captures both the procedure’s code, but also any information we need about the procedure.
To add a new data structure, we need a new primary tag and a new collection of primitive operations. We use the tag #b010 for procedures.
#b001, pairs
#b010, (first-class) procedures
#b011, vectors
#b100, unused
#b101, unused
#b110, non-fixnum immediates (booleans, etc)
#b111, unused
In the source language, we expose the primitive operations procedure? and procedure-arity. However, the compiler intermediate languages expose a few more operations that the compiler needs to make use of to implement procedure calls.
Every instance of lambda compiles to a procedure. The procedure now has three pieces of information: its arity for dynamic checking; the label to its code, the computation it executes when invoked; and its environment, the values of the free variables used in the definition of the procedure. We compile each application of a procedure to dereference and call the label of the procedure, but also to pass a reference to the procedure itself as a parameter. Essentially, the procedure is an object, and receives itself as an argument. Each "free variable" x is a field of that object, and are compiled to references to self.x.
(make-procedure value_label value_arity value_size)
Creates a procedure whose label is value_label, which expects value_arity number of arguments, and has an environment of size value_size.
make-procedure does not perform any error checking; it must be applied to a label and two fixnum ptrs. This is safe because no user can access make-procedure directly. Only the compiler generates uses of this operator, and surely our compiler uses it correctly.
In the source language, make-procedure is not exposed directly; instead, lambda is compiled down to this primitive.
(unsafe-procedure-ref value_proc value_index)
Return the value at index value_index in the environment of the procedure value_proc.
As with all unsafe operators, this does not perform any checking.
In the source language, unsafe-procedure-ref is not exposed directly. This is used to access variables outside of the procedure’s scope, but in scope at the time the procedure is created. We use this to implement closures, which we describe shortly.
(unsafe-procedure-set! e_proc e_index e_val)
Set the value at index value_index in the environment of the procedure value_proc to be value_val.
In the source language, unsafe-procedure-set! is not exposed directly.
(unsafe-procedure-label value_proc)
Returns the label to the code for the procedure value_proc.
(call value_label value ...)
Call the code whose label is value_label with the arguments value.
This is essentially the same as the call primitive in previous chapters, although we now allow labels to be computed and passed as values. It is unsafe and with no dynamic checks, so some earlier pass must insert dynamic checks to ensure safety.
Our procedure data structure is essentially a vector containing a label to the code and the values of each free variable in its environment.
The challenge in implementing procedures is primarily in compiling lambda down to the procedure primitives, then specifying the representation of these procedure primitives in terms of calls to labelled code. All compiler passes below specify-representation remain unchanged.
Until now, all procedures were bound at the top-level in a set of mutually-recursive definitions. To work with first-class procedures in intermediate languages, we need to be able to represent sets of mutually recursive definitions that appear as expressions. We introduce the letrec construct to aid with this. (letrec ([aloc e] ...) e_2) binds each aloc in each e, including its own right-hand-side, as well as binding all alocs in e_2. For now, we only consider a restricted form of letrec that only binds procedures.
1.12.2.2 The Closure
Our procedure data structure implements a closure, a procedure’s code paired with the values of free variables from the environment in which the procedure was created. This allows us to create procedures that refer to variables outside of their own scope, but still retain references to those variables even when the procedure is passed to a different scope.
As an intermediate step in compiling first-class procedures, we introduce explicit closure primitives which compile to the procedure primitives. There is no primary tag for this data structure, since it will be implemented by the lower-level procedure data type.
Closures support two operations. First, you can call a closure with (closure-call e es ...), which essentially extracts the label from the closure e and calls the procedure at that label with the argument (es ...). Second, you can dereference an environment variable from the closure with (closure-ref e value_i), extracting the value at index value_i from the environment of the closure e.
Because we want to implement safe procedure application, we add a third field to the closure: its arity, the number of arguments expected by the code of the closure.
(make-closure value_label value_arity value_i ...)
Creates a closure whose code is at label value_label, which expects value_arity number of arguments, and has the values value_i in its environment.
(closure-call value_c value ...)
Safely call the closure value_c, invoking its code, with the arguments (value ...).
(closure-ref value_c value_i)
Deference the value at index value_i in the environment of the closure value_c. Since this dereference is only generated by the compiler, it always succeeds and performs no dynamic checks. The environment is 0-indexed.
1.12.2.3 Closure Conversion
The main problem with compiling first-class procedure is that we need to lift their code to the top-level, but they have references to free variables which go out of scope if we move the procedure definition. We deal with this by converting all procedures to closures, rebinding the free variables in the code as explicit dereferences from the closure’s environment, then lifting the now-closed code definitions to the top-level. This process is called closure conversion.
Closure conversion is not the only way to implement first-class procedures. An alternative that can avoid some of the allocation cost of closures is defunctionalization, but this does not work well with separate compilation.
`(lambda (,alocs ...) ,value) => `(lambda ,(info-set '() 'free (set-subtract (free-var value) alocs)) (,alocs ...) ,value)
A variable is considered free in a scope if it is not in the set of variables bound by that scope, if it is referenced in any expression in which the scope binds variables, and if the reference is not bound. A variable is bound if it is referenced inside a scope for which it is declared in the set of variables bound by that scope.
Note that all variables are bound, which is enforced by check-exprs-lang, but they can be free relative to a particular scope.
Transform each lambda. Each lambda is transformed to take a new formal parameter, which is its closure, and to be bound to a label in its enclosing letrec. We can think of this as adding a this or self argument to each procedure.
The abstract location to which the the lambda was previously bound must now be bound to a closure. The closure has n + 2 fields, where n is the number of free variables in the lambda. The first field is the label to which the closure’s code is bound. The final n fields are references to the lexical variables in the environment of the closure.
In essence, we transform- Transform each call. Every procedure now takes an extra argument, its closure, so we have to expand each call. The essence of the translation is:
1.12.3 Administrative Passes
Allowing procedures to be bound in two different ways is great for programmer convenience, but annoying for a compiler writer. Before we get to implementing procedures, we simplify and regularize how procedures appear in our language.
The first big benefit to the programmer comes in check-exprs-lang. Since we finally have a procedure data type, and procedure primitives to enable dynamic checking, we can finally stop type checking programs. Now, it is valid and does not cause undefined behaviour to pass procedures as arguments, return procedures, or call an arbitrary variables with an arbitrary number of arguments. The language will dynamically check whether any of those expressions is safe before attempting to execute them
procedure
p : any Validates that input is a well-bound Exprs-lang v9 program. There are no other static restrictions.
As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc.
Unlike in previous versions, there are no labels after uniquify. All of our procedures are data, not merely code, and cannot easily be lifted to the top level yet, so it is now the job of a later pass to introduce labels.
Below we define Exprs-unique-lang v9. We typeset the changes with respect to Exprs-lang v9.
p | ::= | (module (define aloc label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= |
p | ::= | (module (define aloc x (lambda (aloc x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | x | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc x ...) value) | |||
x | ::= | name? | ||
| | prim-f | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= |
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
procedure
p : exprs-lang-v9? Resolves all lexical identifiers into unique abstract locations.
Not much changes in implement-safe-primops. The target language of the pass, Exprs-unsafe-data-lang v9, is defined below.
p | ::= | (module (define aloc label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
primop | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
primop unsafe-fx* | ::= | unsafe-fx+ | ||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | unop | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity |
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Note that this pass does not implement safe call,
but can be safely applied to arbitrary data—
procedure
p : exprs-unique-lang-v9? Implement safe primitive procedures by inserting procedure definitions for each primitive operation which perform dynamic tag checking, to ensure type and memory safety.
Now we implement call in terms of unsafe-procedure-call and unsafe-procedure-arity. Note that we cannot simply define call as a procedure, like we did with other safe wrappers, since it must count its arguments, and we must support a variable number of arguments to the procedure.
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) |
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
`(call ,e ,es ...) => `(if (procedure? ,e) (if (eq? (unsafe-procedure-arity ,e) ,(length es)) (unsafe-procedure-call ,e ,es ...) ,bad-arity-error) ,bad-proc-error)
If we statically track the arity of procedures, we can optimize this transformation in some cases.
procedure
p : exprs-unsafe-data-lang-v9? Implement call as an unsafe procedure call with dynamic checks.
Some procedures now appear in local expressions, and some appear defined at the
top-level.
This presents two problems.
First, our compiler later assumes that all data (as opposed to code) is
locally defined—
To deal with these, we introduce two small administrative passes: define->letrec and dox-lambdas.
First, in define->letrec, we elaborate define into a local binding form letrec, which will be used to bind all procedures.
letrec, unlike let, supports multiple bindings in a single form, and each bound expression can refer to any variable in the set of bindings for the letrec. This is important to capture mutually-recursive functions, and has the same binding structure as our top-level defines.
Design digression:In general, a language might impose additional semantics on define, such as allowing defined data to be exported and imported at module boundaries. This would require additional handling of define, and the ability to generate labelled data in the back-end of the compiler. We continue to ignore separate compilation and linking, so we treat define as syntactic sugar for letrec.
Below we define Just-Exprs-lang v9.
p | ::= | (module (define aloc (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) |
p | ::= | (module value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
procedure
p : exprs-unsafe-lang-v9? Transform all top-level bindings into local bindings.
Before we start compiling lambdas, we should try to get rid of
them.
Direct calls to lambdas, such as (call (lambda (x) x) 1), are simple to rewrite to a let binding, such as
(let ([x 1]) x).
A human programmer may not write this kind of code much, but most programs are
not written by humans—
Design digression:
procedure
p : just-exprs-lang-v9? Inline all direct calls to first-class procedures.
(call (lambda (x f) (call f x x)) 1 (lambda (x y) (call + x y)))
This is great for functional programmers, who value freedom, but bad for compilers who want to keep track of everything.
We bind all procedures to names to simplify lifting code to the top-level and assigning labels later.
We transform each `(lambda (,alocs ...) ,e) into `(letrec ([,tmp (lambda (,alocs ...) ,e)]) ,tmp), where tmp is a fresh aloc.
We define Lam-opticon-lang v9, in which we know the name of every procedure.
p | ::= | (module value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) value) |
p | ::= | (module value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
procedure
p : just-exprs-lang-v9? Explicitly binds all procedures to abstract locations.
1.12.4 Implementing Closure Conversion
The rest of our compiler expects procedures to be little more than labeled blocks of code. Unfortunately, now our first-class procedures can contain references to free variables in their lexical scope. This means we cannot simply lift first-class procedure definitions to the top-level, stick on a label, and generate a labeled procedure.
First, we uncover the free variables in each lambda. We add these as an annotation on the lambda, which the next pass uses to generate closures.
p | ::= | (module value) | ||
info | ::= | ((free (aloc ...)) any ...) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda info (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal |
p | ::= | (module value) | ||
info | ::= | ((free (aloc ...)) any ...) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (unsafe-procedure-call value value ...) | |||
| | (letrec ([aloc (lambda info (aloc ...) value)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
To find the free abstract locations, we traverse the body of each lambda remembering any abstract locations that have been bound (by let, lambda, or letrec), and return the set of abstract locations that have been used but were not in the defined set. On entry to the (lambda (aloc ...) e), only the formal parameters (aloc ...) are considered bound.
procedure
p : lam-opticon-lang-v9? Explicitly annotate procedures with their free variable sets.
> (uncover-free `(module (letrec ([x.1 (lambda () (unsafe-procedure-call x.1))]) x.1)))
'(module
(letrec ((x.1 (lambda ((free (x.1))) () (unsafe-procedure-call x.1)))) x.1))
> (uncover-free `(module (letrec ([f.1 (lambda () (letrec ([x.1 (lambda () (unsafe-procedure-call x.1))]) x.1))]) f.1)))
'(module
(letrec ((f.1
(lambda ((free ()))
()
(letrec ((x.1
(lambda ((free (x.1)))
()
(unsafe-procedure-call x.1))))
x.1))))
f.1))
After we know the free variables, we make closures explicit.
Strictly speaking, all the previous languages had
closures—
Below, we define Closure-lang v9.
p | ::= | (module value) | ||
info | ::= | ((free (aloc ...)) any ...) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (closure-ref value value) | |||
| | (closure-call value value ...) | |||
| | (call unsafe-procedure-call value value ...) | |||
| | (letrec ([label aloc (lambda info (aloc ...) value)] ...) value) | |||
| | (cletrec ([aloc (make-closure label value ...)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
label | ::= | label? |
p | ::= | (module value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (closure-ref value value) | |||
| | (closure-call value value ...) | |||
| | (call value value ...) | |||
| | (letrec ([label (lambda (aloc ...) value)] ...) value) | |||
| | (cletrec ([aloc (make-closure label value ...)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Closure conversion changes letrec to bind labels to procedure
code.
After this pass, the body of lambda does not contain any
free variables, and will not be a procedure data type—
To encode closures, we temporarily add a new data type for closures, which we compile to a lower-level data type. We add a new form, cletrec, which only binds closures. Closures can, in general, have mutually recursive references, so this is a variant of the letrec form. We also add a new form for dereferencing the value of free variables from the closure (closure-ref e e).
We assume that the cletrec form only ever appears as the body of a letrec form, but we do not make this explicit in the syntax for readability. This assumption is not necessary for correctness, but simplifies an optimization presented later.
We add call, the primitive operation for calling a label directly, to enable optimizing closures, an important optimization.
procedure
p : lam-free-lang-v9? Performs closure conversion, converting all procedures into explicit closures.
If the operator is already an aloc, we can avoid introducing an extra let:
`(unsafe-procedure-call ,aloc ,values ...) => `(closure-call ,aloc ,aloc ,values ...) This also simplifies the optimization optimize-known-calls.
Closures can cause a lot of indirection, and thus performance penalty. We essentially transform all calls into indirect calls. This causes an extra memory dereference and indirect jump, both of which can have performance penalties.
`(closure-call ,value ,values ...) => `(call ,label ,values ...)
`(letrec ([,label_c ,lam]) (cletrec ([,aloc_c (make-closure ,label_c ,values ...)]) ,value))
procedure
p : closure-lang-v9? Optimizes calls to known closures.
Now that all lambdas are closed and labelled, we can lift them to top-level defines.
We define Hoisted-lang v9 below.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (closure-ref value value) | |||
| | (closure-call value value ...) | |||
| | (call value value ...) | |||
| | (letrec ([label (lambda (aloc ...) value)] ...) value) | |||
| | (cletrec ([aloc (make-closure label value ...)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (closure-ref value value) | |||
| | (closure-call value value ...) | |||
| | (call value value ...) | |||
| | (cletrec ([aloc (make-closure label value ...)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | unsafe-procedure-arity | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
The only difference is the letrec is remove and define blocks are re-added.
procedure
p : closure-lang-v9? Hoists code to the top-level definitions.
Now we implement closures as the procedure data structure described earlier. Our procedure data structure represents a value that can be called, and has some associated information useful for dynamic type checking.
A procedure looks like an extension of a vector. It has at least three fields: the label, the arity, and a size. The size indicates how large the environment of the procedure is. The environment will be uninitialized after make-procedure, and instead the environment will be initialized manually using unsafe-procedure-set!, similar to vector initialization. As with closures, unsafe-procedure-label dereferences the label and unsafe-procedure-ref dereferences a value from the procedure’s environment, given an index into the environment. However, we also have unsafe-procedure-arity to dereference the arity of a procedure.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (closure-ref value value) | |||
| | (closure-call value value ...) | |||
| | (call value value ...) | |||
| | (cletrec ([aloc (make-closure label value ...)] ...) value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | make-procedure | |||
| | unsafe-procedure-arity | |||
| | unsafe-procedure-label | |||
| | unsafe-procedure-ref | |||
| | unsafe-procedure-set! | |||
label | ::= | label? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | make-procedure | |||
| | unsafe-procedure-arity | |||
| | unsafe-procedure-label | |||
| | unsafe-procedure-ref | |||
| | unsafe-procedure-set! | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
- Transform make-closure
- Transform closure-ref.
`(closure-ref ,c ,i) => `(unsafe-procedure-ref ,c_ ,i) We can use unsafe-procedure-ref since we generate all uses of closure-ref. - Transform closure-call.
`(closure-call ,c ,values ...) => `(call (unsafe-procedure-label ,c) ,values ...) Recall that closure-call is generated from an unsafe-procedure-call, so it is equally safe to call unsafe-procedure-label.
procedure
p : hoisted-lang-v9 Implement closures in terms of the procedure data structure.
Finally, we need to implement procedure data type. It is intentionally designed to be similar to the vector data type.
The target language is Exprs-bits-lang v8, which is unchanged from the previous chapter.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (mref value value) | |||
| | (alloc primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! value value value) | ||
| | (primop value ...) | |||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | procedure? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
| | make-procedure | |||
| | unsafe-procedure-arity | |||
| | unsafe-procedure-label | |||
| | unsafe-procedure-ref | |||
| | unsafe-procedure-set! | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (mref value value) | |||
| | (alloc value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! value value value) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
When implementing make-procedure, we assume the size of the environment is a fixnum constant, since this is guaranteed by how our compiler generates make-procedure
procedure
p : proc-exposed-lang-v9?