1.9 Algebraic Expressions
1.9.1 Preface: What’s wrong with Values-lang v6
(module (define fact (lambda (n) (if (= n 0) 1 (* n (fact (- n 1)))))) (fact 5))
(module (define fact (lambda (n) (if (= n 0) 1 (let ([y (- n 1)]) (let ([z (fact y)]) (* n z)))))) (fact 5))
As compiler designers, any tedious, systematic transformation should scream to us "this is a job for a compiler!".
Algebraic here refers to the fact that these expressions can be modeled mathematically as an algebraic structure—
they have a set of elements (the 64-bit intergers), and some operations on those elements (*, +, -, etc) that satsify various algebraic laws, including commutativity, associativity, etc. This allows the user of such expressions to reason algebraically, manipulating expressions according to those laws, instead of reasoning about execution steps of the underlying machine.
1.9.2 Algebraic Expressions
We design Exprs-lang v6.5, that allows algebraic expressions in most positions. Now, there is one top-level context: the value context. Nearly every position, aside from the operator in a call, and the predicate position of if, expects a value-producing expression. tail is eliminated, as the user of this language no longer needs to know about the distinction between value and tail context. The predicate position of if expressions is still restricted, since we cannot introduce algebraic expressions in predicate position without a run-time representation of their value, i.e., without booleans. Similarly, call must still be statically restricted until we can dynamically distinguish procedures from machine integers.
p | ::= | (module (define x (lambda (x ...) value tail)) ... value tail) | ||
pred | ::= | (relop triv triv) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([x value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([x value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call x triv ...) | |||
value | ::= | triv | ||
| | (binop value value triv triv) | |||
| | (let ([x value] ...) value) | |||
| | (if pred value value) | |||
| | (call x value triv ...) | |||
triv | ::= | x | ||
| | int64 | |||
x | ::= | name? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
pred | ::= | (relop triv triv) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([x value] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (let ([x value] ...) value) | |||
| | (if pred value value) | |||
| | (call x value ...) | |||
triv | ::= | x | ||
| | int64 | |||
x | ::= | name? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
we can see a significant change in the syntax. In essence, previously, value had to be explicitly bound by a let, and operands to our primitive operations had to be trivial. Now, operands can be arbitrarily nested expressions that appeared in value context.
Ignoring uniquify for a moment, consider how to implement this in terms of Values-unique-lang v6. Looking at this grammar, it may not be obvious how to transform this. We can see the structure of the pass more clearly if we expand the grammar slightly. Below, we define the expanded grammar Exprs-unique-lang v6.5/context. We typeset the differences compared to Values-unique-lang v6
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv value opand ...) | |||
value | ::= | triv | ||
| | (binop value value opand opand) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv value opand ...) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv value ...) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv value ...) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
Now we can see the the main difference, semantically, is replacing all trivial opands. Calls and binary operations can now take nested value expressions rather than opands. While this means we can collapse the syntax, separating the syntax semantically helps us see the true difference and see that the essence of the transformation is let-binding intermediate results to make all operands trivial.
`(let ([,x_1 ,(normalize e_1)]) (let ([,x_2 ,(normalize e_2)]) (call ,x_1 ,x_2)))
Design digression:The above examples is intuitively all we need to do, and would produce valid output. However, we can do a better job if we design each function around the template for the output language while processing the input language. In this way, instead of designing a function merely to process the input value, we design a function to output aloc, from some input. These functions describe in what context the current term is being processed.For example, consider transforming (lambda (aloc ...) value). We must produce a tail (from the input value). While we could do this by producing (let ([x value]) x), by designing a function that expects to produce something in tail context, we can recognize that any input is valid in tail, so no intermediate binding is required.
On the other hand, when processing (binop value_1 value_2), we want to process value_1 in opand context, since that’s the restriction in the target language. Since the function is designed around the output, it knows it can avoid binding certain inputs, but not others.
If we were following the template for the source, both of these would merely call a helper for values; instead, each of them should call a helper for the target context: tail or opand, respectively.
1.9.3 Implementing Algebraic Expressions
Implementing algebraic expressions via the above translation is quite easy, and we really could have done it any time before now. The primary reason we haven’t is because, so far, we haven’t needed them. We have been designing and writing compiler passes rather than programs, and most of our compiler passes necessarily needed to target low-level languages with low-level instructions, and not languages with algebraic expressions.
We’ll add a new pass, which we call this pass remove-complex-opera*, because it makes complex operands (and later, operators) trivial. This pass will be just after uniquify, so only new high-level abstractions will be able to take advantage of it.
We must still rule out many undefined uses of procedures, so we still forbid algebraic expressions in operator position, and require a validator to check procedure calls. The heuristics remain the same as in v5:check-values-lang.
procedure
(check-values-lang p) → exprs-lang-v6.5?
p : exprs-lang-v6.5? Validates that the Exprs-lang v6.5 is well bound and well typed: all procedure calls pass the correct number of arguments, and all binop and relop are never used on procedures.
As always, we implement uniquify. Collapsing the syntax and following the template for Exprs-lang v6.5 simplifies the design and implementation, without any cost.
procedure
p : exprs-lang-v6.5? Resolves top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.
Finally, we implement remove-complex-opera*, which transforms each complex operand by explicitly unnesting and let-binding all intermediate computation. We have to pick an evaluation order to do this, which is arbitrary. While this is not observable yet, it can quickly become observable if we ever add mutation, printing, or other kinds of statements to the source language. To ensure consistency, we decide that all operands should be evaluated in order from left to right.
This pass is simpler to design following the template for the source, but produces fewer unnecessary expressions if we follow the template for the target.
procedure
p : exprs-unique-lang-v6.5? Performs the monadic form transformation, unnesting all non-trivial operators and operands to binops, and calls making data flow explicit and simple to implement imperatively.
All the other passes remain completely unchanged from Procedural Abstraction: Return.