On this page:
1.12.1 Preface:   What’s wrong with Exprs-Lang v8?
1.12.2 Procedures, Closures and Closure Conversion
1.12.2.1 The First-class Procedure
1.12.2.2 The Closure
1.12.2.3 Closure Conversion
1.12.3 Administrative Passes
check-exprs-lang
uniquify
implement-safe-primops
implement-safe-call
define->letrec
optimize-direct-calls
dox-lambdas
1.12.4 Implementing Closure Conversion
uncover-free
convert-closures
optimize-known-calls
hoist-lambdas
implement-closures
specify-representation
1.12.5 Appendix:   Overview

1.12 First-class Procedures: Code is Data

1.12.1 Preface: What’s wrong with Exprs-Lang v8?

Actually, not much. With structured data types, Exprs-lang v8 is a pretty good language now.

Exprs-bits-lang v8 is sufficiently expressive to act as a reasonable compiler backend for many languages. It’s roughly at the abstraction level of C. It provides raw access to memory, pointers, labels (function pointers) and bitwise operations, but also non-tail calls and some nice syntactic properties like algebraic expressions.

Exprs-lang v8 adds safety on top of that language, although this safety does come at a cost. One major limitation in Exprs-lang v8 is the lack of abstractions over computation. We have lots of abstraction over data, but it’s common to want to abstract over computation—first-class procedures, objects, function pointers, etc. Exprs-lang v8 prevents even passing function pointers to ensure safety.

In this chapter, we add the ability to easily abstract over computations at any point via first-class procedures. Many languages provide some version of this—Python, JavaScript, Ruby, Racket, Scheme, Java, and many more. They enable the programmer to create a suspended computation, and pass it around as a value. The procedure closes over the environment in which it was created, capturing any free variables and essentially creating an object with private fields. They can be used as the foundations for object systems, to represent call backs or continuations, and provide a safe, lexically scoped alternative to function pointers.

In Exprs-lang v9, we add first-class procedures as values:

  p ::= (module (define x (lambda (x ...) value)) ... value)
     
  value ::= triv
  | (let ([x value] ...) value)
  | (if value value value)
  | (call value value ...)
     
  triv ::= x
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (x ...) value)
     
  x ::= name?
  | prim-f
     
  prim-f ::= binop
  | unop
     
  binop ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
     
  unop ::= fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
  p ::= (module (define x (lambda (x ...) value)) ... value)
     
  value ::= triv
  | (let ([x value] ...) value)
  | (if value value value)
  | (call value value ...)
     
  triv ::= x
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (x ...) value)
     
  x ::= name?
  | prim-f
     
  prim-f ::= binop
  | unop
     
  binop ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
     
  unop ::= fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

Now, lambda can appear in any expression. We can still define procedures at the top-level using define, although the semantics will change.

We add a new data structure to the language as well: procedures. These are implicitly constructed by lambda. They support two operations, in addition to call: the tag-checking predicate procedure? and procedure-arity, for inspecting how many parameters the procedure expects take.

This is a syntactically small change, but it has massive implications.

1.12.2 Procedures, Closures and Closure Conversion

So far, procedures in our language have been compiled directly to labeled codea suspended computation that is closed except for its declared parameters. We have not treated procedures as values, nor considered what happens if a procedure appears in value context. The closest representation of the value of a procedure we had was the label to its code. In earlier source languages, we disallowed passing procedures as values, to ensure safety.

1.12.2.1 The First-class Procedure

To support first-class procedures, we need to compile procedures to a data structure. This data structure allows us to construct a procedure value, pass it around and return it, call it (safety), and captures both the procedure’s code, but also any information we need about the procedure.

To add a new data structure, we need a new primary tag and a new collection of primitive operations. We use the tag #b010 for procedures.

Here is our updated list of tags:
  • #b000, fixnums, fixed-sized integers

  • #b001, pairs

  • #b010, (first-class) procedures

  • #b011, vectors

  • #b100, unused

  • #b101, unused

  • #b110, non-fixnum immediates (booleans, etc)

  • #b111, unused

In the source language, we expose the primitive operations procedure? and procedure-arity. However, the compiler intermediate languages expose a few more operations that the compiler needs to make use of to implement procedure calls.

Every instance of lambda compiles to a procedure. The procedure now has three pieces of information: its arity for dynamic checking; the label to its code, the computation it executes when invoked; and its environment, the values of the free variables used in the definition of the procedure. We compile each application of a procedure to dereference and call the label of the procedure, but also to pass a reference to the procedure itself as a parameter. Essentially, the procedure is an object, and receives itself as an argument. Each "free variable" x is a field of that object, and are compiled to references to self.x.

The low-level first-class procedure interface is described below:
  • (make-procedure value_label value_arity value_size)

    Creates a procedure whose label is value_label, which expects value_arity number of arguments, and has an environment of size value_size.

    make-procedure does not perform any error checking; it must be applied to a label and two fixnum ptrs. This is safe because no user can access make-procedure directly. Only the compiler generates uses of this operator, and surely our compiler uses it correctly.

    In the source language, make-procedure is not exposed directly; instead, lambda is compiled down to this primitive.

  • (unsafe-procedure-ref value_proc value_index)

    Return the value at index value_index in the environment of the procedure value_proc.

    As with all unsafe operators, this does not perform any checking.

    In the source language, unsafe-procedure-ref is not exposed directly. This is used to access variables outside of the procedure’s scope, but in scope at the time the procedure is created. We use this to implement closures, which we describe shortly.

  • (unsafe-procedure-set! e_proc e_index e_val)

    Set the value at index value_index in the environment of the procedure value_proc to be value_val.

    In the source language, unsafe-procedure-set! is not exposed directly.

  • (unsafe-procedure-label value_proc)

    Returns the label to the code for the procedure value_proc.

  • (call value_label value ...)

    Call the code whose label is value_label with the arguments value.

    This is essentially the same as the call primitive in previous chapters, although we now allow labels to be computed and passed as values. It is unsafe and with no dynamic checks, so some earlier pass must insert dynamic checks to ensure safety.

Our procedure data structure is essentially a vector containing a label to the code and the values of each free variable in its environment.

The challenge in implementing procedures is primarily in compiling lambda down to the procedure primitives, then specifying the representation of these procedure primitives in terms of calls to labelled code. All compiler passes below specify-representation remain unchanged.

Until now, all procedures were bound at the top-level in a set of mutually-recursive definitions. To work with first-class procedures in intermediate languages, we need to be able to represent sets of mutually recursive definitions that appear as expressions. We introduce the letrec construct to aid with this. (letrec ([aloc e] ...) e_2) binds each aloc in each e, including its own right-hand-side, as well as binding all alocs in e_2. For now, we only consider a restricted form of letrec that only binds procedures.

1.12.2.2 The Closure

Our procedure data structure implements a closure, a procedure’s code paired with the values of free variables from the environment in which the procedure was created. This allows us to create procedures that refer to variables outside of their own scope, but still retain references to those variables even when the procedure is passed to a different scope.

As an intermediate step in compiling first-class procedures, we introduce explicit closure primitives which compile to the procedure primitives. There is no primary tag for this data structure, since it will be implemented by the lower-level procedure data type.

Closures support two operations. First, you can call a closure with (closure-call e es ...), which essentially extracts the label from the closure e and calls the procedure at that label with the argument (es ...). Second, you can dereference an environment variable from the closure with (closure-ref e value_i), extracting the value at index value_i from the environment of the closure e.

Because we want to implement safe procedure application, we add a third field to the closure: its arity, the number of arguments expected by the code of the closure.

The closure interface is described below:
  • (make-closure value_label value_arity value_i ...)

    Creates a closure whose code is at label value_label, which expects value_arity number of arguments, and has the values value_i in its environment.

  • (closure-call value_c value ...)

    Safely call the closure value_c, invoking its code, with the arguments (value ...).

  • (closure-ref value_c value_i)

    Deference the value at index value_i in the environment of the closure value_c. Since this dereference is only generated by the compiler, it always succeeds and performs no dynamic checks. The environment is 0-indexed.

Each of these primitives compile down to the analogous procedure primitives.

1.12.2.3 Closure Conversion

The main problem with compiling first-class procedure is that we need to lift their code to the top-level, but they have references to free variables which go out of scope if we move the procedure definition. We deal with this by converting all procedures to closures, rebinding the free variables in the code as explicit dereferences from the closure’s environment, then lifting the now-closed code definitions to the top-level. This process is called closure conversion.

Closure conversion is not the only way to implement first-class procedures. An alternative that can avoid some of the allocation cost of closures is defunctionalization, but this does not work well with separate compilation.

Before we can perform closure conversion, we must discover which variables in a lambda are free with respect to the procedure’s scope. We first annotation all lambda with their free variable sets.
`(lambda (,alocs ...) ,value)
=>
`(lambda ,(info-set '() 'free (set-subtract (free-var value) alocs))
         (,alocs ...) ,value)
We add a pass to perform this just prior to closure conversion.

A variable is considered free in a scope if it is not in the set of variables bound by that scope, if it is referenced in any expression in which the scope binds variables, and if the reference is not bound. A variable is bound if it is referenced inside a scope for which it is declared in the set of variables bound by that scope.

In our languages, lambda, let, and letrec introduce new scopes. Calculating the free variables of an expression is relatively straightforward, but we have to be careful with the binding structures of letrec and let.

Note that all variables are bound, which is enforced by check-exprs-lang, but they can be free relative to a particular scope.

There are two parts to closure conversion:
  • Transform each lambda. Each lambda is transformed to take a new formal parameter, which is its closure, and to be bound to a label in its enclosing letrec. We can think of this as adding a this or self argument to each procedure.

    The abstract location to which the the lambda was previously bound must now be bound to a closure. The closure has n + 2 fields, where n is the number of free variables in the lambda. The first field is the label to which the closure’s code is bound. The final n fields are references to the lexical variables in the environment of the closure.

    In essence, we transform
    `(letrec ([,x (lambda ((free (,ys ...))) (,xs ...)  ,values)] ...)
       ,value)
    =>
    `(letrec ([,l (lambda (,c ,xs ...)
                    (let ([,ys (closure-ref ,c ,i)] ...)
                      ,value))] ...)
        (cletrec ([,x (make-closure ,l ,(length xs) ,ys ...)] ...)
          ,value))
    where l is a fresh label and c is a fresh abstract location. The cletrec form is like letrec but restricted to bind closures. We add the number of arguments as a field in the closure to implement safe application later.

  • Transform each call. Every procedure now takes an extra argument, its closure, so we have to expand each call. The essence of the translation is:
    `(call ,value ,values ...)
    =>
    `(let ([,x ,value])
       (closure-call ,x ,x ,values ...))
    We use closure-call to call the (label of the) closure to the closure itself and its usual arguments. We need to bind the operator to avoid duplicating code.

1.12.3 Administrative Passes

Allowing procedures to be bound in two different ways is great for programmer convenience, but annoying for a compiler writer. Before we get to implementing procedures, we simplify and regularize how procedures appear in our language.

The first big benefit to the programmer comes in check-exprs-lang. Since we finally have a procedure data type, and procedure primitives to enable dynamic checking, we can finally stop type checking programs. Now, it is valid and does not cause undefined behaviour to pass procedures as arguments, return procedures, or call an arbitrary variables with an arbitrary number of arguments. The language will dynamically check whether any of those expressions is safe before attempting to execute them

procedure

(check-exprs-lang p)  exprs-lang-v9

  p : any
Validates that input is a well-bound Exprs-lang v9 program. There are no other static restrictions.

As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc.

Unlike in previous versions, there are no labels after uniquify. All of our procedures are data, not merely code, and cannot easily be lifted to the top level yet, so it is now the job of a later pass to introduce labels.

Below we define Exprs-unique-lang v9. We typeset the changes with respect to Exprs-lang v9.

  p ::= (module (define aloc label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
     
  triv ::= aloc
  | label
  | prim-f
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  binop ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  p ::= (module (define aloc x (lambda (aloc x ...) value)) ... value)
     
  value ::= triv
  | (call value value ...)
  | (let ([aloc x value] ...) value)
  | (if value value value)
  | (call value value ...)
     
  triv ::= aloc
  | prim-f
  | x
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc x ...) value)
     
  x ::= name?
  | prim-f
     
  binop ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
     
  triv ::= aloc
  | prim-f
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  prim-f ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

procedure

(uniquify p)  exprs-unique-lang-v9?

  p : exprs-lang-v9?
Resolves all lexical identifiers into unique abstract locations.

Not much changes in implement-safe-primops. The target language of the pass, Exprs-unsafe-data-lang v9, is defined below.

  p ::= (module (define aloc label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  primop ::= binop
  | unop
     
  binop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
     
  primop unsafe-fx* ::= unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | unop
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | prim-f
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  prim-f ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

Note that this pass does not implement safe call, but can be safely applied to arbitrary data—a later pass will implement dynamic checking for application.

Implement safe primitive procedures by inserting procedure definitions for each primitive operation which perform dynamic tag checking, to ensure type and memory safety.

Now we implement call in terms of unsafe-procedure-call and unsafe-procedure-arity. Note that we cannot simply define call as a procedure, like we did with other safe wrappers, since it must count its arguments, and we must support a variable number of arguments to the procedure.

Next, we implement safe call. Below we define Exprs-unsafe-lang v9.

  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

We implement call in terms of procedure?, unsafe-procedure-call and unsafe-procedure-arity. The essence of the transformation is:
`(call ,e ,es ...)
=>
`(if (procedure? ,e)
     (if (eq? (unsafe-procedure-arity ,e) ,(length es))
         (unsafe-procedure-call ,e ,es ...)
         ,bad-arity-error)
     ,bad-proc-error)

If we statically track the arity of procedures, we can optimize this transformation in some cases.

Implement call as an unsafe procedure call with dynamic checks.

Some procedures now appear in local expressions, and some appear defined at the top-level. This presents two problems. First, our compiler later assumes that all data (as opposed to code) is locally defined—we have no way to define top-level, labelled data. Since procedures are data, we need to transform top-level bindings of procedures into local bindings, so the rest of the compiler will "just work". Second, our compiler later assumes that all code (as opposed to data) is globally defined—we expect code to appear only at the top-level. That’s not true when lambda is used in a local expression.

To deal with these, we introduce two small administrative passes: define->letrec and dox-lambdas.

First, in define->letrec, we elaborate define into a local binding form letrec, which will be used to bind all procedures.

letrec, unlike let, supports multiple bindings in a single form, and each bound expression can refer to any variable in the set of bindings for the letrec. This is important to capture mutually-recursive functions, and has the same binding structure as our top-level defines.

Design digression:
In general, a language might impose additional semantics on define, such as allowing defined data to be exported and imported at module boundaries. This would require additional handling of define, and the ability to generate labelled data in the back-end of the compiler. We continue to ignore separate compilation and linking, so we treat define as syntactic sugar for letrec.

Below we define Just-Exprs-lang v9.

  p ::= (module (define aloc (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
  p ::= (module value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

Transform all top-level bindings into local bindings.

Before we start compiling lambdas, we should try to get rid of them. Direct calls to lambdas, such as (call (lambda (x) x) 1), are simple to rewrite to a let binding, such as (let ([x 1]) x). A human programmer may not write this kind of code much, but most programs are not written by humans—compilers write far more programs. This optimization will speed-up compile time and run time for such simple programs.

Design digression:
This optimization is a special case of procedure inlining. A direct call to a procedure guarantees the procedure occurs exactly once, and is therefore completely safe to inline. This pass could be replaced by a more general inlining optimization.

Inline all direct calls to first-class procedures.

Next, we explicitly name all lambdas in dox-lambdas. The source language supports anonymous procedures, that is, first-class procedure values that are not necessarily bound to names. For example, we can write the following in Racket, creating and using procedures without ever binding them to names in a letrec or let form.

Example:
> ((lambda (x f) (f x x)) 1 (lambda (x y) (+ x y)))

2

The equivalent in Exprs-lang v9 is:

(call (lambda (x f) (call f x x)) 1 (lambda (x y) (call + x y)))

This is great for functional programmers, who value freedom, but bad for compilers who want to keep track of everything.

We bind all procedures to names to simplify lifting code to the top-level and assigning labels later.

We transform each `(lambda (,alocs ...) ,e) into `(letrec ([,tmp (lambda (,alocs ...) ,e)]) ,tmp), where tmp is a fresh aloc.

We define Lam-opticon-lang v9, in which we know the name of every procedure.

  p ::= (module value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (aloc ...) value)
  p ::= (module value)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

Explicitly binds all procedures to abstract locations.

1.12.4 Implementing Closure Conversion

The rest of our compiler expects procedures to be little more than labeled blocks of code. Unfortunately, now our first-class procedures can contain references to free variables in their lexical scope. This means we cannot simply lift first-class procedure definitions to the top-level, stick on a label, and generate a labeled procedure.

First, we uncover the free variables in each lambda. We add these as an annotation on the lambda, which the next pass uses to generate closures.

Below we define Lambda-free-lang v9.

  p ::= (module value)
     
  info ::= ((free (aloc ...)) any ...)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda info (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  p ::= (module value)
     
  info ::= ((free (aloc ...)) any ...)
     
  value ::= triv
  | (primop value ...)
  | (unsafe-procedure-call value value ...)
  | (letrec ([aloc (lambda info (aloc ...) value)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

To find the free abstract locations, we traverse the body of each lambda remembering any abstract locations that have been bound (by let, lambda, or letrec), and return the set of abstract locations that have been used but were not in the defined set. On entry to the (lambda (aloc ...) e), only the formal parameters (aloc ...) are considered bound.

Explicitly annotate procedures with their free variable sets.

The only complicated case is for letrec. Even a variable bound in a letrec is considered free in the body of a lambda.

Example:
> (uncover-free
   `(module
      (letrec ([x.1 (lambda () (unsafe-procedure-call x.1))])
        x.1)))

'(module

   (letrec ((x.1 (lambda ((free (x.1))) () (unsafe-procedure-call x.1)))) x.1))

However, the letrec does bind those variables, so they do not contribute to the free variable set for the context surrounding the letrec.

Example:
> (uncover-free
   `(module
      (letrec ([f.1 (lambda ()
                      (letrec ([x.1 (lambda () (unsafe-procedure-call x.1))])
                        x.1))])
        f.1)))

'(module

   (letrec ((f.1

             (lambda ((free ()))

               ()

               (letrec ((x.1

                         (lambda ((free (x.1)))

                           ()

                           (unsafe-procedure-call x.1))))

                 x.1))))

     f.1))

After we know the free variables, we make closures explicit.

Strictly speaking, all the previous languages had closuresprocedures that (implicitly) close over their lexical environment. However, our earlier languages forbid us from ever creating procedures that had a non-empty environment, so all our closures were trivial to compile to labelled code. Closure conversion is the process of compiling first-class procedures into an explicit closure data type.

Below, we define Closure-lang v9.

  p ::= (module value)
     
  info ::= ((free (aloc ...)) any ...)
     
  value ::= triv
  | (primop value ...)
  | (closure-ref value value)
  | (closure-call value value ...)
  | (call unsafe-procedure-call value value ...)
  | (letrec ([label aloc (lambda info (aloc ...) value)] ...) value)
  | (cletrec ([aloc (make-closure label value ...)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  label ::= label?
  p ::= (module value)
     
  value ::= triv
  | (primop value ...)
  | (closure-ref value value)
  | (closure-call value value ...)
  | (call value value ...)
  | (letrec ([label (lambda (aloc ...) value)] ...) value)
  | (cletrec ([aloc (make-closure label value ...)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  label ::= label?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

Closure conversion changes letrec to bind labels to procedure code. After this pass, the body of lambda does not contain any free variables, and will not be a procedure data type—it is just like a procedure from Values-lang v6.

To encode closures, we temporarily add a new data type for closures, which we compile to a lower-level data type. We add a new form, cletrec, which only binds closures. Closures can, in general, have mutually recursive references, so this is a variant of the letrec form. We also add a new form for dereferencing the value of free variables from the closure (closure-ref e e).

We assume that the cletrec form only ever appears as the body of a letrec form, but we do not make this explicit in the syntax for readability. This assumption is not necessary for correctness, but simplifies an optimization presented later.

We add call, the primitive operation for calling a label directly, to enable optimizing closures, an important optimization.

Performs closure conversion, converting all procedures into explicit closures.

If the operator is already an aloc, we can avoid introducing an extra let:
`(unsafe-procedure-call ,aloc ,values ...)
=>
`(closure-call ,aloc ,aloc ,values ...)
This also simplifies the optimization optimize-known-calls.

Closures can cause a lot of indirection, and thus performance penalty. We essentially transform all calls into indirect calls. This causes an extra memory dereference and indirect jump, both of which can have performance penalties.

Many calls, particularly to named functions, can be optimized to direct calls. We essentially perform the following transformation on all calls where we can determine the label of the operator:
`(closure-call ,value ,values ...)
=>
`(call ,label ,values ...)
where label is known to be the label of the closure value. Because value is already an aloc, we can safely discard it; we do not need to force evaluation to preserve any side effects.

We perform this optimization by recognizing letrec and cletrec as a single composite form:
`(letrec ([,label_c ,lam])
   (cletrec ([,aloc_c (make-closure ,label_c ,values ...)])
     ,value))
All uses of (closure-call aloc_c values ...) in value and lam can be transformed into (call label_c values ...). We have to recognize these as a single composite form to optimize recursive calls inside lam, which will benefit the most from the optimization. This relies on the name aloc_c binding in two places: once to define the closure, and once when dereferenced in a recursive closure.

Optimizes calls to known closures.

Now that all lambdas are closed and labelled, we can lift them to top-level defines.

We define Hoisted-lang v9 below.

  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (closure-ref value value)
  | (closure-call value value ...)
  | (call value value ...)
  | (letrec ([label (lambda (aloc ...) value)] ...) value)
  | (cletrec ([aloc (make-closure label value ...)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (closure-ref value value)
  | (closure-call value value ...)
  | (call value value ...)
  | (cletrec ([aloc (make-closure label value ...)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | unsafe-procedure-arity
     
  aloc ::= aloc?
     
  label ::= label?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

The only difference is the letrec is remove and define blocks are re-added.

procedure

(hoist-lambdas p)  hoisted-lang-v9?

  p : closure-lang-v9?
Hoists code to the top-level definitions.

Now we implement closures as the procedure data structure described earlier. Our procedure data structure represents a value that can be called, and has some associated information useful for dynamic type checking.

A procedure looks like an extension of a vector. It has at least three fields: the label, the arity, and a size. The size indicates how large the environment of the procedure is. The environment will be uninitialized after make-procedure, and instead the environment will be initialized manually using unsafe-procedure-set!, similar to vector initialization. As with closures, unsafe-procedure-label dereferences the label and unsafe-procedure-ref dereferences a value from the procedure’s environment, given an index into the environment. However, we also have unsafe-procedure-arity to dereference the arity of a procedure.

The language Proc-exposed-lang v9 is defined below.

  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (closure-ref value value)
  | (closure-call value value ...)
  | (call value value ...)
  | (cletrec ([aloc (make-closure label value ...)] ...) value)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | make-procedure
  | unsafe-procedure-arity
  | unsafe-procedure-label
  | unsafe-procedure-ref
  | unsafe-procedure-set!
     
  label ::= label?
  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  value ::= triv
  | (primop value ...)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if value value value)
  | (begin effect ... value)
     
  effect ::= (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | make-procedure
  | unsafe-procedure-arity
  | unsafe-procedure-label
  | unsafe-procedure-ref
  | unsafe-procedure-set!
     
  aloc ::= aloc?
     
  label ::= label?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?

To transform closures into procedures, we do a three simple translations:
  • Transform make-closure
    `(cletrec ([,aloc (make-closure ,label ,arity ,values ...)] ...)
       ,value)
    =>
    `(let ([,aloc (make-procedure ,label ,arity ,n)] ...)
       (begin
         (unsafe-procedure-set! ,aloc 0 ,(list-ref values 0))
         ...
         (unsafe-procedure-set! ,aloc ,n ,(list-ref values n))
         ,value))
    where n is (length values), the number of values in the environment.

  • Transform closure-ref.
    `(closure-ref ,c ,i)
    =>
    `(unsafe-procedure-ref ,c_ ,i)
    We can use unsafe-procedure-ref since we generate all uses of closure-ref.

  • Transform closure-call.
    `(closure-call ,c ,values ...)
    =>
    `(call (unsafe-procedure-label ,c) ,values ...)
    Recall that closure-call is generated from an unsafe-procedure-call, so it is equally safe to call unsafe-procedure-label.

Implement closures in terms of the procedure data structure.

Finally, we need to implement procedure data type. It is intentionally designed to be similar to the vector data type.

The target language is Exprs-bits-lang v8, which is unchanged from the previous chapter.

  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  pred ::= (relop value value)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
  | (begin effect ... pred)
     
  value ::= triv
  | (binop value value)
  | (mref value value)
  | (alloc primop value ...)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if pred value value value)
  | (begin effect ... value)
     
  effect ::= (mset! value value value)
  | (primop value ...)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | int64
     
  binop ::= *
  | +
  | -
  | bitwise-and
  | bitwise-ior
  | bitwise-xor
  | arithmetic-shift-right
     
  triv ::= aloc
  | label
  | fixnum
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
     
  primop ::= unsafe-fx*
  | unsafe-fx+
  | unsafe-fx-
  | eq?
  | unsafe-fx<
  | unsafe-fx<=
  | unsafe-fx>
  | unsafe-fx>=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | vector?
  | procedure?
  | cons
  | unsafe-car
  | unsafe-cdr
  | unsafe-make-vector
  | unsafe-vector-length
  | unsafe-vector-set!
  | unsafe-vector-ref
  | make-procedure
  | unsafe-procedure-arity
  | unsafe-procedure-label
  | unsafe-procedure-ref
  | unsafe-procedure-set!
     
  aloc ::= aloc?
     
  label ::= label?
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?
  p ::= (module (define label (lambda (aloc ...) value)) ... value)
     
  pred ::= (relop value value)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
  | (begin effect ... pred)
     
  value ::= triv
  | (binop value value)
  | (mref value value)
  | (alloc value)
  | (call value value ...)
  | (let ([aloc value] ...) value)
  | (if pred value value)
  | (begin effect ... value)
     
  effect ::= (mset! value value value)
  | (begin effect ... effect)
     
  triv ::= aloc
  | label
  | int64
     
  binop ::= *
  | +
  | -
  | bitwise-and
  | bitwise-ior
  | bitwise-xor
  | arithmetic-shift-right
     
  aloc ::= aloc?
     
  label ::= label?
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?

When implementing make-procedure, we assume the size of the environment is a fixnum constant, since this is guaranteed by how our compiler generates make-procedure

Compiles data types and primitive operations into their implementations as ptrs and primitive bitwise operations on ptrs.

1.12.5 Appendix: Overview

%3LxExprs-lang v9Lx->Lx check-exprs-langLyExprs-unique-lang v9Lx->Ly uniquifyLzExprs-unsafe-data-lang v9Ly->Lz implement-safe-primopsLz1Exprs-unsafe-lang v9Lz->Lz1 implement-safe-callLz2Just-exprs-lang v9Lz1->Lz2 define->letrecLz2->Lz2 optimize-direct-callsLz3Lam-opticon-lang v9Lz2->Lz3 dox-lambdasLz4Lam-free-lang v9Lz3->Lz4 uncover-freeLz5Closure-lang v9Lz4->Lz5 convert-closuresLz5->Lz5 optimize-known-callsLz6Hoisted-lang v9Lz5->Lz6 hoist-lambdasLz7Proc-exposed-lang v9Lz6->Lz7 implement-closuresL0Exprs-bits-lang v8Lz7->L0 specify-representationL1Values-bits-lang v8L0->L1 remove-complex-opera*L3Imp-mf-lang v8L1->L3 sequentialize-letL2Proc-imp-cmf-lang v8L3->L2 normalize-bindL4Imp-cmf-lang v8L2->L4 impose-calling-conventionsL5_1Asm-alloc-lang v8L4->L5_1 select-instructionsL5Asm-pred-lang v8L5_1->L5 expose-allocation-pointerL6Asm-pred-lang v8/localsL5->L6 uncover-localsL7Asm-pred-lang v8/undeadL6->L7 undead-analysisL8Asm-pred-lang v8/conflictsL7->L8 conflict-analysisL81Asm-pred-lang v8/pre-framedL8->L81 assign-call-undead-variablesL82Asm-pred-lang v8/framedL81->L82 allocate-framesL83Asm-pred-lang v8/spilledL82->L83 assign-registersL9Asm-pred-lang v8/assignmentsL83->L9 assign-frame-variablesL10Nested-asm-lang-fvars v8L9->L10 replace-locationsL10_1Nested-asm-lang v8L10->L10_1 implement-fvarsL11Block-pred-lang v8L10_1->L11 expose-basic-blocksL12Block-asm-lang v8L11->L12 resolve-predicatesL12_1Para-asm-lang v8L12->L12_1 flatten-programL15_1Paren-x64-mops v8L12_1->L15_1 patch-instructionsL14x64L15integerL14->L15 executeL16Paren-x64 v8L15_1->L16 implement-mopsL16->L14 generate-x64L16->L15 interp-paren-x64L17Paren-x64-rt v8L16->L17 link-paren-x64L17->L15 interp-loop

Figure 10: Overview of Compiler Version 0