1.9 Algebraic Expressions

1.9.1 Preface: What’s wrong with Values-lang v6

Values-lang v6 gained the ability to express non-tail calls, an important step forward in writing real programs. However, when writing programs, we have to manually sequence computation in an annoying and unnatural way. We cannot merely write the natural definition of factorial:

(module
  (define fact
    (lambda (n)
      (if (= n 0)
          1
          (* n (fact (- n 1))))))
  (fact 5))

In Values-lang v6, we’re forced to, manually, systematically, rewrite this to introduce a bunch of temporary names and bind all the intermediate computations:

(module
  (define fact
    (lambda (n)
      (if (= n 0)
          1
          (let ([y (- n 1)])
            (let ([z (fact y)])
              (* n z))))))
  (fact 5))

As compiler designers, any tedious, systematic transformation should scream to us "this is a job for a compiler!".

In this chapter, we add algebraic expressions—the ability to nest expressions arbitrarily, rather than manually sequence and name their values. We exclude predicate position, since predicates are still their own sub-language until we have a boolean data type.

Algebraic here refers to the fact that these expressions can be modeled mathematically as an algebraic structure—they have a set of elements (the 64-bit integers), and some operations on those elements (*, +, -, etc.) that satisfy various algebraic laws, including commutativity, associativity, etc. This allows the user of such expressions to reason algebraically, manipulating expressions according to those laws, instead of reasoning about execution steps of the underlying machine.

1.9.2 Algebraic Expressions

We design Exprs-lang v6.5, that allows algebraic expressions in most positions. Now, there is one top-level context: the value context. Nearly every position, aside from the operator in a call, and the predicate position of if, expects a value-producing expression. tail is eliminated, as the user of this language no longer needs to know about the distinction between value and tail context. The predicate position of if expressions is still restricted, since we cannot introduce algebraic expressions in predicate position without a run-time representation of their value, i.e., without booleans. Similarly, call must still be statically restricted until we can dynamically distinguish procedures from machine integers.

Diff vs v6Exprs-lang v6.5

p	::=	(module (define x (lambda (x ...) value tail)) ... value tail)

pred	::=	(relop triv triv)
	\|	(true)
	\|	(false)
	\|	(not pred)
	\|	(let ([x value] ...) pred)
	\|	(if pred pred pred)

tail	::=	value
	\|	(let ([x value] ...) tail)
	\|	(if pred tail tail)
	\|	(call x triv ...)

value	::=	triv
	\|	(binop value value triv triv)
	\|	(let ([x value] ...) value)
	\|	(if pred value value)
	\|	(call x value triv ...)

triv	::=	x
	\|	int64

x	::=	name?

binop	::=	*
	\|	+
	\|	-

relop	::=	<
	\|	<=
	\|	=
	\|	>=
	\|	>
	\|	!=

int64	::=	int64?

p	::=	(module (define x (lambda (x ...) value)) ... value)

pred	::=	(relop triv triv)
	\|	(true)
	\|	(false)
	\|	(not pred)
	\|	(let ([x value] ...) pred)
	\|	(if pred pred pred)

value	::=	triv
	\|	(binop value value)
	\|	(let ([x value] ...) value)
	\|	(if pred value value)
	\|	(call x value ...)

triv	::=	x
	\|	int64

x	::=	name?

binop	::=	*
	\|	+
	\|	-

relop	::=	<
	\|	<=
	\|	=
	\|	>=
	\|	>
	\|	!=

int64	::=	int64?

We can see a significant change in the syntax. In essence, previously, value had to be explicitly bound by a let, and operands to our primitive operations had to be trivial. Now, operands can be arbitrarily nested expressions that appeared in value context.

Ignoring uniquify for a moment, consider how to implement this in terms of Values-unique-lang v6. Looking at this grammar, it may not be obvious how to transform this. We can see the structure of the pass more clearly if we expand the grammar slightly. Below, we define the expanded grammar Exprs-unique-lang v6.5/context. We typeset the differences compared to Values-unique-lang v6

Diff vs v6Exprs-unique-lang v6.5/context

p	::=	(module (define label (lambda (aloc ...) tail)) ... tail)

pred	::=	(relop opand opand)
	\|	(true)
	\|	(false)
	\|	(not pred)
	\|	(let ([aloc value] ...) pred)
	\|	(if pred pred pred)

tail	::=	value
	\|	(let ([aloc value] ...) tail)
	\|	(if pred tail tail)
	\|	(call triv value opand ...)

value	::=	triv
	\|	(binop value value opand opand)
	\|	(let ([aloc value] ...) value)
	\|	(if pred value value)
	\|	(call triv value opand ...)

opand	::=	aloc
	\|	int64

triv	::=	opand
	\|	label

binop	::=	*
	\|	+
	\|	-

relop	::=	<
	\|	<=
	\|	=
	\|	>=
	\|	>
	\|	!=

aloc	::=	aloc?

label	::=	label?

int64	::=	int64?

p	::=	(module (define label (lambda (aloc ...) tail)) ... tail)

pred	::=	(relop opand opand)
	\|	(true)
	\|	(false)
	\|	(not pred)
	\|	(let ([aloc value] ...) pred)
	\|	(if pred pred pred)

tail	::=	value
	\|	(let ([aloc value] ...) tail)
	\|	(if pred tail tail)
	\|	(call triv value ...)

value	::=	triv
	\|	(binop value value)
	\|	(let ([aloc value] ...) value)
	\|	(if pred value value)
	\|	(call triv value ...)

opand	::=	aloc
	\|	int64

triv	::=	opand
	\|	label

binop	::=	*
	\|	+
	\|	-

relop	::=	<
	\|	<=
	\|	=
	\|	>=
	\|	>
	\|	!=

aloc	::=	aloc?

label	::=	label?

int64	::=	int64?

Now we can see the the main difference, semantically, is replacing all trivial opands. Calls and binary operations can now take nested value expressions rather than opands. While this means we can collapse the syntax, separating the syntax semantically helps us see the true difference and see that the essence of the transformation is let-binding intermediate results to make all operands trivial.

To transform this into Values-unique-lang v6, we need to perform the monadic-form translation. In essence, we recursively translate any nested expression that is not a value into one by let-binding all intermediate computations. For example, we would transform a call `(call ,e_1 ,e_2) into something like the following:

`(let ([,x_1 ,(normalize e_1)])
(let ([,x_2 ,(normalize e_2)])
(call ,x_1 ,x_2)))

If you’ve written any examples or tests in Values-lang v6, you’ve probably done this transformation by hand many times.

Design digression:
The above example is intuitively all we need to do, and would produce valid output. However, we can do a better job if we design each function around the template for the output language while processing the input language. In this way, instead of designing a function merely to process the input value, we design a function to output aloc, from some input. These functions describe in what context the current term is being processed.
For example, consider transforming (lambda (aloc ...) value). We must produce a tail (from the input value). While we could do this by producing (let ([x value]) x), by designing a function that expects to produce something in tail context, we can recognize that any input is valid in tail, so no intermediate binding is required.
On the other hand, when processing (binop value_1 value_2), we want to process value_1 in opand context, since that’s the restriction in the target language. Since the function is designed around the output, it knows it can avoid binding certain inputs, but not others.
If we were following the template for the source, both of these would merely call a helper for values; instead, each of them should call a helper for the target context: tail or opand, respectively.

1.9.3 Implementing Algebraic Expressions

Implementing algebraic expressions via the above translation is quite easy, and we really could have done it any time before now. The primary reason we haven’t is because, so far, we haven’t needed them. We have been designing and writing compiler passes rather than programs, and most of our compiler passes necessarily needed to target low-level languages with low-level instructions, and not languages with algebraic expressions.

We’ll add a new pass, which we’ll call remove-complex-opera*, because it makes complex operands (and later, operators) trivial. This pass will be just after uniquify, so only new high-level abstractions will be able to take advantage of it.

We must still rule out many undefined uses of procedures, so we still forbid algebraic expressions in operator position, and require a validator to check procedure calls. The heuristics remain the same as in v5:check-values-lang.

procedure
(check-values-lang p) → exprs-lang-v6.5?
p : exprs-lang-v6.5?
Validates that the Exprs-lang v6.5 is well bound and well typed: all procedure calls pass the correct number of arguments, and all binop and relop are never used on procedures.

As always, we implement uniquify. Collapsing the syntax and following the template for Exprs-lang v6.5 simplifies the design and implementation, without any cost.

procedure
(uniquify p) → exprs-unique-lang-v6.5?
p : exprs-lang-v6.5?
Resolves top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.

Finally, we implement remove-complex-opera*, which transforms each complex operand by explicitly unnesting and let-binding all intermediate computation. We have to pick an evaluation order to do this, which is arbitrary. While this is not observable yet, it can quickly become observable if we ever add mutation, printing, or other kinds of statements to the source language. To ensure consistency, we decide that all operands should be evaluated in order from left to right.

This pass is simpler to design following the template for the source, but produces fewer unnecessary expressions if we follow the template for the target.

procedure
(remove-complex-opera* p) → values-unique-lang-v6?
p : exprs-unique-lang-v6.5?
Performs the monadic form transformation, unnesting all non-trivial operators and operands to binops, and calls making data flow explicit and simple to implement imperatively.

All the other passes remain completely unchanged from Procedural Abstraction: Return.

1.9.4 Appendix: Overview

Figure 7: Overview of Compiler Version 6.5

contents ← prev up next →

1	Book
2	Project Logistics
3	Project Iterations
4	CPSC 411 Infrastructure Package
5	Credits

1.1	A Compiler Begins with a Language
1.2	Abstracting Boilerplate (v1)
1.3	Abstract Locations
1.4	Value Orientation
1.5	Register Allocation
1.6	Structured Control Flow
1.7	Procedural Abstraction: Call
1.8	Procedural Abstraction: Return
1.9	Algebraic Expressions
1.10	Data types: Immediates
1.11	Data types: Structured Data and Heap Allocation
1.12	First-class Procedures: Code is Data
1.13	Recursive Data
1.14	Syntactic Sugar
1.15	Appendix: Racket Preliminaries
1.16	Appendix: Compiler Design Recipe

1.9.1	Preface: What’s wrong with Values-lang v6
1.9.2	Algebraic Expressions
1.9.3	Implementing Algebraic Expressions
1.9.4	Appendix: Overview