1.10 Data types: Immediates
1.10.1 Preface: What’s wrong with Exprs-lang v6.5
Exprs-lang v6.5 gained the ability to nest expressions pretty much arbitrarily. However, our functions are still limited to work on machine integers. A realistic language would allow us to express programs over more interesting data types than mere machine integers.
Unfortunately, once we add data types, we have a problem distinguishing between any two data types. Everything is eventually represented as 64-bit integers in x64. We need some way to distinguish a collection of 64 bits as an integer, from a boolean, from the empty list, from a pointer to a procedure. Otherwise, we will have serious problems: undefined behaviours, unsafe casts, and/or memory safety errors.
This is not only a problem for ruling out unsafe behaviour in the source language. A static type system could take care to prevent the user from calling an integer as a procedure, for example. However, the language itself may need to distinguish different kinds of data at run time. For example, a garbage collector may need to traverse data structures, but not immediate data like integers; a pretty printer may want to print different data differently.
To enable the language to distinguish different kinds of data, we can steal a few of our 64 bits to represent a data type tag. This limits the range of machine integers, but allows us to us to distinguish data dynamically, enabling safety and abstractions that must differentiate data dynamically.
Our goal in this assignment is to implement the following language, Exprs-lang v7.
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
pred | ::= | (relop triv triv) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([x value] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (let ([x value] ...) value) | |||
| | (if value pred value value) | |||
| | (call value x value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
triv | ::= | x | ||
| | int64 | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
We restrict ASCII characters to the printable ones, so we don’t have to figure out how to print non-printable characters. The run-time system will work with some non-printable characters, but the results will not be converted to Racket properly.
We want to allow Exprs-lang v7 programs to return any of these values, which requires additional support from the run-time system. The run-time system must be able to distinguish these different values in order to pretty print or otherwise return them to the user (such as via an exit code in the case of the error value). This means our run-time representation of values cannot just be 64-bit machine integers; we need some way for the run-time system to differentiate values dynamically.
The support libraries include an implementation of this new run-time system in cpsc411/ptr-run-time, which supports pretty printing all the new values. cpsc411/compiler-lib also contains parsers for the values. execute takes a new optional second parameter, which can be used to change your view of the result. By default, it parsers the printed value as a Racket value. You can pass nasm-run/print-string to get back the string representation of the result, which will be handy to view the result of (void) correctly. You can pass nasm-run/exit-code to get the exit code, which is helpful for viewing the result of (error uint8), which sets the exit code to uint8.
We also add new primitive operations, primarily predicates on our new data types. The interpretation of several operations will also change to add dynamic type checking. This will prevent those operations from being a source of undefined behaviour.
With proper booleans, we can finally allow an arbitrary value in the predicate position of an if expression in the surface language.
1.10.2 Introduction to Tagged Object Representation
The key challenge we need to solve in adding data types to our language is how to implement data-type checking. That is, given some value triv, branch on whether it is a particular type data type such as a boolean or a number: (if (fixnum? triv) value_1 value_2) and (if (boolean? triv) value_1 value_2) should be supported. Ideally, each data type is unique: (boolean? triv) and (fixnum? triv) should never both be true.
These don’t necessarily need to be dynamic checks, as long as they are somehow expressible by the compiler, whether as expressions that dynamically check or meta-data that is statically checked. However, even in a statically typed language, we may need some ability for the run-time system to distinguish different kinds of data dynamically.
Depending on how these type checks are implement, this alone is enough to let us implement different data types safely. We could use procedural abstraction to wrap existing binary operations to ensure, e.g., that + is only operating on numbers, and not booleans. We could expand if to check that the value in predicate position is both a boolean (that is, that (boolean? triv) is equal to some value using a relop), and that the value is equal to some representation of true.
Tagging, i.e., given some machine integer, tag it to indicate what data type it represents, producing a tagged representation of the underlying data. This tagged representation will happen to correspond to some machine integer, since all sequences of bits do, but maybe not in any meaningful way.
Untagging, i.e., given some tagged representation, remove the tag returning the underlying data that can be used with primitive x64 instructions.
Tag checking, i.e., given some tagged representation, get the tag or compare the tag to something.
Some of our clever tag choices, and some terminology, is borrowed from R. Kent Dybvig. You can learn more about it’s use in Chez Scheme from this talk by his former PhD student Andy Keep: https://www.youtube.com/watch?v=BcC3KScZ-yA.
Each data type in our language will now be represented by a ptr (pronounced like footer). A ptr is a machine word whose n least-significant bits represent the primary tag, and whose upper (- (* 8 (current-word-size-bytes)) n) bits represent the data.
In our setting, we have 64-bit words and will use the 3 least-significant bits
for primary tags.
With 3 bits, we can represent 8 primary data types in a ptr.
We want to reserve these for the most common and most performance critical data
that can fit in a ptr, but we have additional constraints.
This will limit the size of data we can represent, but gives us additional data
types—
If we want more than 8 data types, which most languages support, we must reserve one tag for "everything else" (a pointer to memory). The structure in memory has infinite space in which to store additional tags. This seems to leave us only 7 possible data types without going to memory.
Some data types require fewer than 64 bits, and we can exploit this fact to
gain a few extra data types.
For example, booleans really only require 1 bit.
The empty list and the void object only require a tag, since they do not contain
any data.
ASCII characters only require 8 bits.
We can make all of these share a single primary tag, and steal some of
the unnecessary high-order bits for a secondary tag.
This gives us 4 data types for the price of only 1 primary tag—
This leaves us 6 primary tags. One obvious choice is to use one for fixed sized integers; these will now be 61-bit integers, which we call fixnums. Integer operations are fast, and we don’t want to slow them down by making them go through heap allocation and pointer indirection. If we wanted to support graphics and scientific operations, we would reserve one for floating-point numbers too.
We reserve the remaining primary tags for tagged pointers, pointers to common heap-allocated structures. Storing the tag on the pointer instead of in memory avoids the expense of an additional memory access to check a tag and the additional memory overhead of storing the tag in memory. We’ll address the implementation of tagged pointers and heap-allocated structured data in the next chapter.
Racket supports binary literals, and automatically interprets them as two’s complement integers. If you type #b111, you get back 7, and the two values are equal?. This will be helpful when writing this assignment and writing tests.
#b00000110, for #f
#b00001110, for #t
#b00010110, for empty
#b00011110, for (void)
#b00101110, for an ASCII character
#b00111110, for the error exit-code value
For annoying technical reasons, our representation of the empty list differs slightly from Racket. We use empty, while Racket uses '().
The integer 7 is #b111 in base 2, and would be represented, in base 2, as the ptr #b111000. The three low-order bits are the tag #b000, and the high-order bits are the data #b111 (implicitly padded with 0 to 64-bits).
> (arithmetic-shift 56 -3) 7
All these notes are typeset using base-2 literals, but Racket cannot distinguish them from integers, so they may get accidentally rendered in base 10.
A handy fact about the choice of tag for fixnums is that any number n is represented as (* 8 n). This fact allows for implementing fast fixnum operations on the ptr representation directly. That is, we can implement binary operations on fixnums without detagging followed by tagging, by relying on some algebraic facts about operations on integers.
See https://en.wikipedia.org/wiki/ASCII#Printable_characters for the representation of ASCII characters.
> (integer->char (arithmetic-shift 14126 -8)) #\7
> (eq? (bitwise-and 56 7) 0) #t
> (eq? (bitwise-and 14126 7) 0) #f
> (eq? (bitwise-and 14126 255) 46) #t
The representation of booleans is a bit tricky. We can think of booleans either as the 61-bit values 0 and 1, with tag #b110, or as two separate secondary tags (with no payload) #b00001110 and #b00000110. I typically think in the the latter interpretation, where we represent #t and #f as separate data types, whose tags are #b110 (6) and #b1110 (14).
> (eq? (bitwise-and 14 247) 6) #t
> (eq? (bitwise-and 6 247) 6) #t
> (eq? (arithmetic-shift 6 -3) 0) #t
> (eq? (arithmetic-shift 14 -3) 0) #f
> (eq? 6 6) #t
> (eq? 14 6) #f
The representation of error, which represents an exit code, is inefficient. We only require 8 bits of data for the error code, so 24 bits are wasted. But we’re not adding more data types with secondary tags, so it is not worth over-engineering. Besides, a better error data type would use at least a string payload, which requires allocating space on the heap anyway.
The particular choice of tags is not important for correctness, although clever choices like above can help us implement common operations more efficiently. We therefore should introduce abstractions to keep the compiler abstract with respect to particular tag choices, so we can change our minds later if a better representation occurs to us.
To implement ptrs, we need bitwise operations such as arithmetic-shift so that we can implement tag checking, tagging, and untagging. This means we need to expose bitwise operations from x64, all the way through the compiler pipeline. This is not very interesting, so we begin by implementing tagged objects in terms of these, and then extending the rest of the compiler to expose bitwise operations.
1.10.3 Extending the source and front-end passes
We start by discussing the design of Exprs-lang v7.
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
pred | ::= | (relop triv triv) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([x value] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (let ([x value] ...) value) | |||
| | (if value pred value value) | |||
| | (call value x value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
triv | ::= | x | ||
| | int64 | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
> (interp-exprs-lang-v7 '(module (let ([* 1] [- 2]) (call + * -)))) 3
> (interp-exprs-lang-v7 '(module (call * #f #\y)))
Dynamic type error in *
*: contract violation
expected: number?
given: #f
'(error 255)
We could distinguish procedure calls from primitive operations, and implement primitive operations by expanding to expressions, but this risks duplicating code (although, not much code). But, at some point, primitive operations will need to be wrapped as procedures if we want to use them with functional abstractions, so we might as well do it now, and leave some other pass to implement procedure inlining.
> (interp-exprs-lang-v7 '(module (call fixnum? #t))) #f
> (interp-exprs-lang-v7 '(module (call fixnum? 5))) #t
These choices mean we can change our validator, check-exprs-lang. We no longer need to type check the arguments (previously, operands) to primitive operations, except to call, which still must call a statically known procedure (since we don’t have a tagged representation of procedures, yet). Instead, we statically allow expressions like (call * #f #\y), but expect this to raise a dynamic error.
procedure
p : any/c Checks that a Exprs-lang v7 program is well typed (only that procedures are called with the right number of arguments), and well scoped.
We extend uniquify as usual. Below, we design Exprs-unique-lang v7.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc e] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (call value triv value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value pred value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
prim-f | ::= | binop | ||
| | unop | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | < | |||
| | eq? | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? | ||
int64 | ::= | int64? |
p | ::= | (module (define label x (lambda (aloc x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | x | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | < | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | < | |||
| | eq? | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
As usual, we transform names into either abstract locations or labels. Data types do not complicate this.
procedure
p : exprs-lang-v7? Resolves top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.
And now, we figure out where to place our new passes.
As we saw in Introduction to Tagged Object Representation, many of the operations we want to perform on ptrs are easily expressed as algebraic expressions. For example, the expression (call fixnum? 7) is compiled to as (= (bitwise-and 7 7) 0).
Thankfully, we just added algebraic expressions. If we position our new passes above remove-complex-opera*, we don’t need to manually introduce auxiliary variables, or generate code with additional let-bindings. The compiler will do this for us, and allow us to write the code we want to write.
1.10.4 Specifying Data Type Representation
First we design Exprs-unsafe-data-lang v7. In this language, we compile our dynamically typed safe primitive procedures into lower level primitive operations on data types.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | < | |||
| | eq? | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
To make clear the distinction between the source and target implementations of these operations, we prefix the unsafe operators that require a dynamic check with unsafe-. call is still unsafe, since we do not know how to tag procedures yet.
(module (call + 1 2))
(module (define L.+.1 (lambda (tmp.3 tmp.4) (if (fixnum? tmp.4) (if (fixnum? tmp.3) (unsafe-fx+ tmp.3 tmp.4) (error 2)) (error 2)))) (call L.+.1 1 2))
Each safe procedure should produce some error code indicating that kind of error that occurred. Be sure to document your error codes. For now, we have enough error codes that we can encode which operation and which argument caused the error.
procedure
p : exprs-unique-lang-v7?
Design digression:To avoid producing unnecessary definitions, you may consider how to design this pass so that a definition for a primitive procedure is only added to the module if that procedure is actually used.You may also consider an alternative implementation of this pass that avoids inserting dynamic checks at all. Note that the the language definitions allow this behaviour for all well-typed programs.
Next we design Exprs-bits-lang v7, a language with all values represented as 64 bits and only primitive bitwise operations, but that allows algebraic expressions in most positions. The predicate position of if is again restricted to the predicate sublanguage.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
This is the hard part of adding data types—
First, we translate each value literal to a ptr.
You can find some parameters for ptrs defined in cpsc411/compiler-lib.
(bitwise-ior (arithmetic-shift (char->integer #\x) 8) #b00101110)
(bitwise-ior (arithmetic-shift 7 3) #b000)
> (bitwise-ior (arithmetic-shift (char->integer #\x) 8) 46) 30766
> (integer->char (arithmetic-shift (bitwise-ior (arithmetic-shift (char->integer #\x) 8) 46) -8)) #\x
> (= (bitwise-ior (arithmetic-shift 7 3) 0) (arithmetic-shift 7 3)) #t
For operations on fixnums, we can use some handy algebraic facts to avoid some tagging and untagging. Recall that every fixnum n is represented by a ptr whose value is (* 8 n). For + and -, this means we don’t need to do anything at all, since 8x + 8y = 8(x + y), and similarly 8x - 8y = 8(x - y). Similarly, <, <=, >, >=, and = can all work on ptrs directly without untagging. However, these are boolean operations in Exprs-unsafe-data-lang v7, so their implementation must return a boolean ptr.
Only * poses a problem, since 8x * 8y = 64(x * y). However, we do not need to adjust both arguments: we observe that 8x * y = 8(x * y), and similarly x * 8y = 8(x * y). We only need to shift one operand before performing * to get the correct result as a ptr. If either argument is constant, we can perform the shift at compile time, completely eliminating the additional overhead. Otherwise, we translate (* e_1 e_2) to (roughly) (* e_1 (arithmetic-shift-right e_2 3)).
(= value #,(current-false-ptr))
When translating the booleans unops and binops on ptrs, we need to produce something that the translation of if can consume. if is expecting a boolean value, so each unop should be translated to an expression that returns a boolean. As we saw earlier, type predicates are implemented by masking the ptr using bitwise-and, and comparing the result to the tag using =. But the target language = is a relop, not a boolean operation, so we translate (fixnum? value) to (if (= (bitwise-and value 7) 0) #t #f) (although, of course, #t and #f must be compiled to ptrs too).
procedure
p : exprs-unsafe-data-lang-v7?
1.10.5 Exposing Bitwise Operations
Specifying the tagged representation happens very close to our source language. We need to expose these bitwise operations all the way up to what was previously Values-unique-lang-v6. Below, we design the new Values-bits-lang v7, typeset with differences compared to Values-unique-lang-v6.
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
p | ::= | (module (define label (lambda (aloc ...) tail value)) ... tail value) | ||
pred | ::= | (relop opand opand value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
value | ::= | triv | ||
| | (binop opand opand value value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | aloc | |||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
The new operations do not have large effects on the language designs or compiler passes between remove-complex-opera* and generate-x64, so these details are left unspecified in this chapter. Redesigning the intermediate languages with the new operations is left as an exercise for the reader.
A specification of these languages is given in the support library, if you wish to use it.
procedure
p : exprs-bits-lang-v7? Performs the monadic form transformation, unnesting all non-trivial operators and operands to binops, calls, and relopss, making data flow explicit and and simple to implement imperatively.
1.10.6 Exposing Bitwise Operations in Paren-x64
sar loc, int
Perform arithmetic right-shift by int on loc.
This instruction requires that its second operand be an integer literal between 0 and 63, inclusive, i.e., 0 <= int <= 63. We will assume this constraint in the intermediate languages, and never expose this operation in Exprs-lang v7.
and loc, op
Compute the bitwise "and" of loc and op, storing the result in loc. Like with other binary operations, when op is an integer, it must be an int32, and when op is an addr, loc cannot also be an addr.
or loc, op
Compute the bitwise "inclusive or" of loc and op, storing the result in loc. Like with other binary operations, when op is an integer, it must be an int32, and when op is an addr, loc cannot also be an addr.
xor loc, op
Compute the bitwise "exclusive or" of loc and op, storing the result in loc. Like with other binary operations, when op is an integer, it must be an int32, and when op is an addr, loc cannot also be an addr.
We add each of these operations as a binop to Paren-x64 v7 below.
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
triv | ::= | trg | ||
| | int64 | |||
opand | ::= | reg | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
trg | ::= | reg | ||
| | label | |||
triv | ::= | trg | ||
| | int64 | |||
opand | ::= | reg | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
reg | ::= | rsp | ||
| | rbp | |||
| | rax | |||
| | rbx | |||
| | rcx | |||
| | rdx | |||
| | rsi | |||
| | rdi | |||
| | r8 | |||
| | r9 | |||
| | r10 | |||
| | r11 | |||
| | r12 | |||
| | r13 | |||
| | r14 | |||
| | r15 | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
dispoffset | ::= | dispoffset? | ||
label | ::= | label? |
procedure
(generate-x64 p) → (and/c string? x64-instructions?)
p : paren-x64-v7? Compile the Paren-x64 v7 program into a valid sequence of x64 instructions, represented as a string.