1.13 Recursive Data

1.13.1 Preface: What’s wrong with Exprs-Lang v10

This Chapter still needs too much work. Contains too many hacks.

Exprs-lang v9 gave us the ability to write first-class lexically scoped local procedures, a powerful tool for abstracting over computation. However, without the ability to write recursive data, we could only write simple non-recursive local procedures. This makes implementing local loops annoying; we had to go through top-level definitions instead. By adding recursive data, we can implement first-class lexically-scoped recursive procedures (generalized loops), but also encode other cyclic data such as streams.

Many languages support cyclic data through imperative mutation, but as we’ll see in this assignment, that’s a low-level implementation technique that the user need not be required to use. Instead, we can express cyclic data more abstractly through recursion and suspended computation. Unfortunately, the only suspended computation primitive we have is lambda, and our surface language is call-by-value, so this still requires a little encoding.

This week, we’ll add letrec to the surface language, defining Racketish Core v10, a tiny Racket-like core language. It’s missing a few key features from Racket, but it’s a substantial start.

Most of Racket is implemented using its macro system, thereby extending Racket in Racket, which all elaborates to a simple core language.

p	::=	(module (define x value (lambda (x ...) value)) ... value)

value	::=	triv
	\|	(call value value ...)
	\|	(let ([x value] ...) value)
	\|	(letrec ([x value] ...) value)
	\|	(if value value value)
	\|	(call value value ...)

triv	::=	x
	\|	fixnum
	\|	#t
	\|	#f
	\|	()
	\|	empty
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) value)

x	::=	name?
	\|	prim-f

prim-f	::=	binop
	\|	unop

binop	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

prim-f *	::=	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	unop
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

We add letrec, which we saw in the previous chapter, to the surface language. Unlike the previous version, this letrec is not restricted to bind only procedures. This apparently simple change has deep consequences.

In the surface language, letrec is restricted to only bind each name once in a single letrec. A letrec binds all names in a single mutually recursive scope, so later bindings are accessible to earlier bindings, and thus it wouldn’t make sense to have the same name bound multiple times in the same scope.

1.13.2 Compiling Recursion to Back-Patching Mutation

The rest of our compiler assumes that letrec only binds procedures. Compiling first-class procedures is complicated for reasons others than recursion, and keeping them separated is useful. To maintain this separation and avoid extending the closure conversion passes unnecessarily, we can purify letrec, by compiling all non-procedure recursive bindings into a lower-level implementation of recursion.

Consider following Racket example:

Example:

> (letrec ([x.1 (cons 1 (lambda () x.1))])
(+ (car x.1) (car ((cdr x.1)))))
2

x.1 is an encoding of an infinite stream of 1s. The encoding binds x.1 not to a procedure, but to a pair that has a delayed self-reference to x.1, where suspended computation is implemented using a first-class procedure.

To translate a recursive data structure into a let-bound data structure, we need mutable variables. We add mutable variables to the target language, and transform the above example into:

Example:

> (let ([x.1 (void)])
    (let ([x.2 (cons 1 (lambda () x.1))])
      (begin
        (set! x.1 x.2)
        (+ (car x.1) (car ((cdr x.1)))))))
2

This implements recursion through back-patching; we create an uninitialized variable, x.1, and reference this in a suspended computation before it is initialized. Then we initialize x.1 to the value that contains a pointer x.1. After that, later uses of x.1 can call that suspended computation, which will get the right value.

We don’t want to do this transformation for every letrec, because set! will be more expensive to compile than what we can do for letrec-bound procedures.

We’ve actually already seen this transformation; we did it when implementing closures, although its details were somewhat obscured. Closures are a form of recursive data—they have a binding to themselves in their environment. When we implemented closures as procedures, we translated the cletrec form into a let followed by several unsafe-procedure-set! calls to tie the knot.

Unfortunately, since we haven’t exposed many imperative commands high in the compiler yet, we’ll have to use a bad abstraction to implement back-patching. We’ll compile our uninitialized values to vectors, and compile set! to vector-set!.

Design digression:
This is a hack. I just haven’t gotten around to writing a chapter on how to extend the surface language with set!.
But, you’ll sometimes find you’re forced to use such hacks when working with abstraction boundaries you have no control over.

1.13.3 Preliminary Passes

As usual, we don’t want to manipulate any bindings before we’ve uniquified, so first we modify uniquify.

Below we define Racketish-Unique.

p	::=	(module (define aloc x value) ... value)

value	::=	triv
	\|	(call value value ...)
	\|	(let ([aloc x value] ...) value)
	\|	(letrec ([alocx x value] ...) value)
	\|	(if value value value)

triv	::=	aloc
	\|	prim-f
	\|	x
	\|	fixnum
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) value)

x	::=	name?
	\|	prim-f

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

aloc	::=	aloc?

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc. But we need to pay attention to the mutually recursive scope of letrec.

procedure
(uniquify p) → racketish-unique?
p : racketish-core?
Resolves all lexical identifiers into unique abstract locations.

Next, we need to move define->letrec up, as the pass just after uniquify. This enables define to bind arbitrary expressions, but also simplifies our implementation of recursive binding.

procedure
(define->letrec p) → just-exprs-lang-v10?
p : racketish-unique-v10?
Transform all top-level bindings into local bindings.

This complicates implement-safe-primop since they introduce defines...
Want implement-safe* to be just after uniquify. They should get to take advantage of as many surface features as possible.

1.13.4 Resolving Recursive Binding

Our first goal is ensure that letrec only binds procedures.

We define Assigned-lang v10.

Diff vs SourceAssigned-lang v10

p	::=	(module (define aloc value) ... value)

effect	::=	(set! aloc value)

value	::=	triv
	\|	(call value value ...)
	\|	(letrec let ([aloc value] ...) value)
	\|	(let (info ((assigned (aloc ...)))) letrec ([aloc alocx value] ...) value)
	\|	(if value value value)
	\|	(begin effect ... value)

triv	::=	fixnum
	\|	prim-f
	\|	aloc
	\|	prim-f
	\|	fixnum
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (aloc x ...) e value)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

aloc	::=	aloc?

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

p	::=	(module value)

effect	::=	(set! aloc value)

value	::=	triv
	\|	(call value value ...)
	\|	(letrec ([aloc value] ...) value)
	\|	(let (info ((assigned (aloc ...)))) ([aloc value] ...) value)
	\|	(if value value value)
	\|	(begin effect ... value)

triv	::=	fixnum
	\|	prim-f
	\|	aloc
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (aloc ...) e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

In this language letrec only binds procedures. We also allow n-ary binding in let, to simplify the compiler pass. We add a single computation, (set! aloc e). Each let form is annotated with an info structure containing the list of alocs that are assigned in its body, and is used to keep track of mutable alocs.

To purify letrec to only bind procedures, we transform `(letrec ([,xs ,es] ...) ,e) into the following, partitioning the original bindings `([,xs ,es] ...).

`(let (info ((assigned ())))
   ([,xs_s ,es_s] ...)
   (let (info ((assigned (,xs_c ...))))
     ([,xs_c (void)] ...)
       (letrec ([,xs_l ,es_l] ...)
          (let (info ((assigned ())))
            ([,xs_t ,es_c] ...)
              (begin
                (set! ,xs_c ,xs_t) ...
                 ,e)))))

where

`([,xs_s ,es_s] ...) are all simple bindings
`([,xs_l ,es_l] ...) are all lambda bindings
`([,xs_c ,es_c] ...) are all complex bindings
`(,xs_t ...) are all fresh auxiliary alocs.

A binding `[,x ,e] is simple binding if the bound expression e contains no occurrences of the variables xs bound by the letrec expression and no applications unless the application is a prim-f with simple operands. We can also reduce compile time by considering letrec non-simple; otherwise, the simple check may be non-linear in the number of let-bindings. We may get better run-time performance by trying to check if a nested letrec is really simple, though.

The restriction on applications is to avoid problems if we ever add control-flow constructs, such as call/cc, to our language.

A binding `[,x ,e] is a lambda binding if e is literally a lambda value.

Otherwise, the binding is complex.

Complex bindings `[,x ,e] are compiled as follows. First, we set up an initial let-binding to x to (void). Then, we introduce an auxiliary binding `[,xt ,e]. In it, e now has the correct binding, referring to x. But x does not have the right value. So we finally mutate x to have the value of xt, so now both xt and x have the same value, and the right recursive binding structure. This implements recursion using back-patching mutation, a technique called Landin’s Knot.

procedure
(purify-letrec p) → assigned-lang-v10?
p : racketish-unique?
Compiles Racketish-Unique to Assigned-lang v10, transforming all letrec expressions to bind only procedures, and implementing recursive data structures through back-patching.

Previously, we did not expose set! beyond the backend of the compiler, so we must implement set!. We cannot simply expose set! from the backend of the compiler, because this version of set! behaves differently. It acts much more like Racket’s set-box!.

This set! is forcing the variable to act like a heap-allocated location that is implicitly dereferenced. In the backend of our compiler, set! simply moved a value from one location to another. That can’t be what happens here. Consider the Racket example from earlier:

(let ([x.1 (void)])
  (let ([x.2 (cons 1 (lambda () x.1))])
    (begin
      (set! x.1 x.2)
      (+ (car x.1) (car ((cdr x.1)))))))

If this set! had a move semantics, i.e., it simply moved values around, then first the value (void) would be moved into x.1, and next the value (cons 1 (lambda () (void))) would be moved into x.2, so we would never get a recursive data structure.

Instead, this set! is forcing the variable x.1 to act as a box (essentially, as a pointer), and any reference to x.1 acts as an implicit dereference of the box. The example would make a great deal more sense if written as:

(let ([x.1 (box (void))])
  (let ([x.2 (cons 1 (lambda () (unbox x.1)))])
    (begin
      (set-box! x.1 x.2)
      (+ (car (unbox x.1)) (car ((cdr (unbox x.1))))))))

So now, we need to implement set! and the assigned locations to give them a box-like semantics, rather than a move semantics.

We do this using vectors.

Vectors use more memory than we need. We could reduce memory usage by adding a primitive box data type. However, this requires adding a new tag, modifying complex passes, and has little pedagogical value.

Below we define Exprs-safe-lang v10. We no longer have set! or impure computation; everything must be let-bound once more.

Diff vs SourceExprs-safe-lang v10

p	::=	(module (define aloc value) ... value)

effect	::=	(set! aloc value)

value	::=	triv
	\|	(call value value ...)
	\|	(let letrec ([aloc value] ...) value)
	\|	(letrec let (info ((assigned (aloc ...)))) ([aloc value] ...) value)
	\|	(if value value value)
	\|	(begin effect ... value)

triv	::=	aloc
	\|	prim-f
	\|	fixnum
	\|	prim-f
	\|	aloc
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x aloc ...) value e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

aloc	::=	aloc?

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

p	::=	(module (define aloc value) ... value)

value	::=	triv
	\|	(call value value ...)
	\|	(let ([aloc value] ...) value)
	\|	(letrec ([aloc value] ...) value)
	\|	(if value value value)

triv	::=	aloc
	\|	prim-f
	\|	fixnum
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) value)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

aloc	::=	aloc?

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

Remember that this pass happens before implement-safe-primops, and the safe version of vector-set! must be let-bound since it might return an error value.

We translate each assigned abstract location into a vector, and each reference to an assigned location to a dereference from the vector, and each set! to a vector-set!. For example:

`(let (info ((assigned (x.1))))
      ([x.1 (void)])
   (let (info ((assigned ())))
     ([tmp.2 (call cons 1 (lambda () x.1))])
     (begin
       (set! x.1 tmp.2)
       x.1)))
=>
`(let ([x.1 (call make-vector 1)])
   (let ([tmp.2 (call cons 1 (lambda () (call vector-ref x.1 0)))])
     (let ([tmp.3 (call vector-set! x.1 0 tmp.2)])
       (call vector-ref x.1 0))))

procedure
(convert-assigned p) → exprs-safe-lang-v10?
p : assigned-lang-v10?
Implement assigned variables using lower-level heap-allocated mutable data.

contents ← prev up next →

1	Book
2	Project Logistics
3	Project Iterations
4	CPSC 411 Infrastructure Package
5	Credits

1.1	A Compiler Begins with a Language
1.2	Abstracting Boilerplate (v1)
1.3	Abstract Locations
1.4	Value Orientation
1.5	Register Allocation
1.6	Structured Control Flow
1.7	Procedural Abstraction: Call
1.8	Procedural Abstraction: Return
1.9	Algebraic Expressions
1.10	Data types: Immediates
1.11	Data types: Structured Data and Heap Allocation
1.12	First-class Procedures: Code is Data
1.13	Recursive Data
1.14	Syntactic Sugar
1.15	Appendix: Racket Preliminaries
1.16	Appendix: Compiler Design Recipe

1.13.1	Preface: What’s wrong with Exprs-Lang v10
1.13.2	Compiling Recursion to Back-Patching Mutation
1.13.3	Preliminary Passes
1.13.4	Resolving Recursive Binding