1.13 Recursive Data
1.13.1 Preface: What’s wrong with Exprs-Lang v10
This Chapter still needs too much work. Contains too many hacks.
Exprs-lang v9 gave us the ability to write first-class lexically scoped local procedures, a powerful tool for abstracting over computation. However, without the ability to write recursive data, we could only write simple non-recursive local procedures. This makes implementing local loops annoying; we had to go through top-level definitions instead. By adding recursive data, we can implement first-class lexically-scoped recursive procedures (generalized loops), but also enacode other cyclic data such as stream.
Many languages support cyclic data through imperative mutation, but as we’ll see in this assignment, that’s a low-level implementation technique that the user need not be required to use. Instead, we can express cyclic data more abstractly through recursion and suspended computation. Unfortunately, the only suspended computation primitive we have is lambda, and our surface language is call-by-value, so this still requires a little encoding.
Most of Racket is implemented using its macro system, thereby extending Racket in Racket, which all elaborates to a simple core language.
p | ::= | (module (define x value (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([x value] ...) value) | |||
| | (letrec ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | () | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
prim-f * | ::= | + | ||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | unop | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
We add letrec, which we saw in the previous chapter, to the surface language. Unlike the previous version, this letrec is not restricted to bind only procedures. This apparently simple change has deep consequences.
In the surface language, letrec is restricted to only bind each name once in a single letrec. A letrec binds all names in a single mutually recursive scope, so later bindings are accessible to earlier bindings, and thus it wouldn’t make sense to have the same name bound multiple times in the same scope.
1.13.2 Compiling Recursion to Back-Patching Mutation
The rest of our compiler assumes that letrec only binds procedures. Compiling first-class procedures is complicated for reasons others than recursion, and keeping them separated is useful. To maintain this separation and avoid extending the closure conversion passes unnnecessarily, we can purify letrec, by compiling all non-procedure recursive bindings into a lower-level implementation of recursion.
This implements recursion through back-patching; we create an uninitialized variable, x.1, and reference this in a suspended computation before it is initialized. Then we initialize x.1 to the value that contains a pointer x.1. After that, later uses of x.1 can call that suspended computation, which will get the right value.
We don’t want to do this transformation for every letrec, because set! will be more expensive to compile than what we can do for letrec-bound procedures.
We’ve actually already seen this transformation; we did it when implementing
closures, although its details were somewhat obscured.
Closures are a form of recursive data—
Design digression:This is a hack. I just haven’t gotten around to writing a chapter on how to extend the surface language with set!.But, you’ll sometimes find you’re forced to use such hacks when working with abstraction boundaries you have no control over.
1.13.3 Preliminary Passes
As usual, we don’t want to manipulate any bindings before we’ve uniquified, so first we modify uniquify.
Below we define Racketish-Unique.
p | ::= | (module (define aloc x value) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc x value] ...) value) | |||
| | (letrec ([alocx x value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | x | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc. But we need to pay attention to the mutually recursive scope of letrec.
procedure
(uniquify p) → racketish-unique?
p : racketish-core? Resolves all lexical identifiers into unique abstract locations.
Next, we need to move define->letrec up, as the pass just after uniquify. This enables define to bind arbitrary expressions, but also simplifies our implementation of recursive binding.
procedure
(define->letrec p) → just-exprs-lang-v10?
p : racketish-unique-v10? Transform all top-level bindings into local bindings.
This complicates implement-safe-primop since they introduce defines...
Want implement-safe* to be just after uniquify. They should get to take advantage of as many surface features as possible.
1.13.4 Resolving Recursive Binding
Our first goal is ensure that letrec only binds procedures.
p | ::= | (module (define aloc value) ... value) | ||
effect | ::= | (set! aloc value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (letrec let ([aloc value] ...) value) | |||
| | (let (info ((assigned (aloc ...)))) letrec ([aloc alocx value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | fixnum | ||
| | prim-f | |||
| | aloc | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc x ...) e value) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module value) | ||
effect | ::= | (set! aloc value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (letrec ([aloc value] ...) value) | |||
| | (let (info ((assigned (aloc ...)))) ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | fixnum | ||
| | prim-f | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
In this language letrec only binds procedures. We also allow n-ary binding in let, to simplify the compiler pass. We add a single computation, (set! aloc e). Each let form is annotated with an info structure containing the list of alocs that are assigned in its body, and is used to keep track of mutable alocs.
`(let (info ((assigned ()))) ([,xs_s ,es_s] ...) (let (info ((assigned (,xs_c ...)))) ([,xs_c (void)] ...) (letrec ([,xs_l ,es_l] ...) (let (info ((assigned ()))) ([,xs_t ,es_c] ...) (begin (set! ,xs_c ,xs_t) ... ,e)))))
The restriction on applications is to avoid problems if we ever add control-flow constructs, such as call/cc, to our language.
A binding `[,x ,e] is a lambda binding if e is literally a lambda value.
Otherwise, the binding is complex.
Complex bindings `[,x ,e] are compiled as follows. First, we set up an initial let-binding to x to (void). Then, we introduce an auxiliary binding `[,xt ,e]. In it, e now has the correct binding, referring to x. But x does not have the right value. So we finally mutate x to have the value of xt, so now both xt and x have the same value, and the right recursive binding structure. This implements recursion using back-patching mutation, a technique called Landin’s Knot.
procedure
(purify-letrec p) → assigned-lang-v10?
p : racketish-unique? Compiles Racketish-Unique to Assigned-lang v10, transforming all letrec expressions to bind only procedures, and implementing recursive data structures through back-patching.
Previously, we did not expose set! beyond the backend of the compiler, so we must implement set!. We cannot simply expose set! from the backend of the compiler, because this version of set! behaves differently. It acts much more like Racket’s set-box!.
(let ([x.1 (void)]) (let ([x.2 (cons 1 (lambda () x.1))]) (begin (set! x.1 x.2) (+ (car x.1) (car ((cdr x.1)))))))
(let ([x.1 (box (void))]) (let ([x.2 (cons 1 (lambda () (unbox x.1)))]) (begin (set-box! x.1 x.2) (+ (car (unbox x.1)) (car ((cdr (unbox x.1))))))))
So now, we need to implement set! and the assigned locations to give them a box-like semantics, rather than a move semantics.
Vectors use more memory than we need. We could reduce memory usage by adding a primitive box data type. However, this requires adding a new tag, modifying complex passes, and has little pedagogical value.
Below we define Exprs-safe-lang v10. We no longer have set! or impure computation; everything must be let-bound once more.
p | ::= | (module (define aloc value) ... value) | ||
effect | ::= | (set! aloc value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let letrec ([aloc value] ...) value) | |||
| | (letrec let (info ((assigned (aloc ...)))) ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | fixnum | |||
| | prim-f | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x aloc ...) value e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define aloc value) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (letrec ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
aloc | ::= | aloc? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Remember that this pass happens before implement-safe-primops, and the safe version of vector-set! must be let-bound since it might return an error value.
`(let (info ((assigned (x.1)))) ([x.1 (void)]) (let (info ((assigned ()))) ([tmp.2 (call cons 1 (lambda () x.1))]) (begin (set! x.1 tmp.2) x.1))) => `(let ([x.1 (call make-vector 1)]) (let ([tmp.2 (call cons 1 (lambda () (call vector-ref x.1 0)))]) (let ([tmp.3 (call vector-set! x.1 0 tmp.2)]) (call vector-ref x.1 0))))
procedure
(convert-assigned p) → exprs-safe-lang-v10?
p : assigned-lang-v10? Implement assigned variables using lower-level heap-allocated mutable data.