1.11 Data types: Structured Data and Heap Allocation
1.11.1 Preface: What’s wrong with Exprs-Lang v7
Exprs-lang v7 gained proper data types, which is a huge step forward
in expressivity and high-level reasoning.
However, it still does not allow us to express structured data.
Real languages require structured heap-allocated data—
To express data larger than a single word, we need support from the low-level
languages to get access to locations larger than a single word in size.
All our locations so far, registers and frame locations, are only a single word
in size.
We need access to the heap—
Once we have the ability to allocate arbitrary space in heap memory, and pointers to that memory, we can add structured data types. We’ll use the same tagging approach introduced in the last chapter to tag pointers, so we can distinguish pointers to vectors vs pointers to lists, perform dynamic checking, etc.
Since our low-level languages don’t have any operations on memory address except frame variables, we’ll need to introduce new low-level abstraction. Our run-time system will also need to provide access to the heap memory in some way.
1.11.2 Implementing Structured Data
Pairs are constructed using (cons e_1 e_2). The predicate pair? should return #t when passed any value constructed this way, and #f for any other value—
(eq? (pair? (cons e_1 e_2)) #t). (car e) returns the value of the first element of the pair, and (cdr e) returns the value of the second element. As usual, we want to ensure safety in our source language. That is, (eq? (car (cons e_1 e_2)) e_1) and (eq? (cdr (cons e_1 e_2)) e_2), while (if (not (pair? e)) (eq? (cdr e) (error uint8)) #t). We do not need to worry about uninitialized values, since the only constructors ensures all parts of the pair are initialized.
Vectors are arrays that know their length. They are constructed using (make-vector e); the constructor takes the length of the vector as the argument. The predicate vector? should return #t for any value constructed this way, and #f for any other value—
(eq? (vector? (make-vector e)) #t). (vector-ref e_1 e_2) returns the value at index e_2 in the vector e_1. (vector-set! e_1 e_2 e_3) mutates the index e_2 in the vector e_1, setting its value to the value of e_3, and returning (void) To ensure safety, the appropriate arguments must be ensured to be vectors, the index arguments must be positive integers, and make-vector or vector-ref must work to ensure that we never get an uninitialized value. We’ll implement these with dynamic checks and by making make-vector initialize all values in the vector to 0.
These two are interesting, since one has a statically known size, while one has a dynamically determined size. Each of these properties requires attention in our compiler. We could add more, but these two are enough to demonstrate the general approach to compiling structured data.
To add immediate data, we needed three operations: tagging, untagging, and tag checking. We need all of these to implement structured data types and follow the same pattern as Data types: Immediates for them, but we also need three additional operations: memory allocation, dynamically computed memory assignment, and dynamically computed memory dereference.
First, to implement constructors, we need the ability to allocate memory. For now, we’ll assume the existence of some abstraction alloc that can do this (which we become responsible for implementing later). cons creates a pair, so it should allocate two words of space by producing `(alloc ,(current-pair-size)). make-vector allocates one word for the length, and then one word for every element of the vector. That is, it should allocate n+1 words for a vector of length n.
Second, to initialize and mutate structured data, we need the ability to assign to a memory location that is dynamically computed. With frame variables, we added displacement mode operands, which could statically offset from the frame base pointer (whose location is statically known), and we could use this as an operand to a set! instruction. However, for structured data, we need something more general. We want the ability to (1) use an arbitrary location storing the base address, since we are dynamically allocating structured data, and passing that pointer around as a value, so the address itself could end up in any register or frame variable and (2) a dynamically determined offset from that base address, since we may want pass as an argument the index into vector. This is quite different from the abstractions we used to add frame variables.
We’ll assume a new abstraction, (mset! value value value), where the first operand is an arbitrary expression that evaluates to a base pointer (and is assumed to be produced from alloc), the second operand is an expression that evaluates to an offset in bytes from that base pointer, and the third operand evaluates to the value to be stored in the memory location computed by adding the offset to the base. As usual, we become responsible for implementing this abstraction at some pointer further down the compiler pipeline.
(let ([x.1 (alloc 16)]) (begin (mset! x.1 0 value_1) (mset! x.1 8 value_2) x.1))
However, we must also tag the pointer as a pair, to create a ptr.
Previously, we used a combination of shifting and bitwise-and to tag data.
This limited the range of our immediate data.
To tag pointers, we need to similar limit the range of addresses that we get
pointers to, to ensure some bits are available as tags.
This is relatively easy to achieve—
(let ([x.1 (bitwise-ior (alloc 16) 1)]) (begin (mset! (bitwise-xor x.1 1) 0 value_1) (mset! (bitwise-xor x.1 1) 8 value_2) x.1))
(let ([x.1 (+ (alloc 16) 1)]) (begin (mset! x.1 -1 value_1) (mset! x.1 7 value_2) x.1))
Finally, we need some way to actually access memory locations to implement the
destructors for our data types.
This requires similar functionality to mset!—
In summary, our strategy is to add three intermediate abstractions that can be used to easily create new data structures for its surface language.
(alloc value) allocates a number of bytes specified by value and returns a pointer to the base address of those bytes.
(mref value_base value_index) dereferences the pointer at value_base with the offset specified by value_index. Thinking in terms of pointer arithmetic, this dereferences (+ value_base value_index). The value of (+ value_base value_index) should always be word-aligned, i.e., a multiple of 8, and point to a uninitialized heap allocated value.
(mset! value_base value_index value) stores the value of value in the address (+ value_base value_index), i.e., in the address given by pointer at value_base with the offset specified by value_index. The value of (+ value_base value_index) should always be word-aligned, i.e., a multiple of 8.
To implement these new memory operations, or mops (pronounced em ops), we need to expose additional features from x64 and from the run-time system. Namely, we need low-level pointer operations that are different from those for frame variables, and we need run-time support for getting aligned addresses from the operating system to enable tagging pointers efficiently.
We then reuse our object tagging approach from the previous chapter to tag, untag, and tag check pointers to our new structured data types.
1.11.3 Memory Allocation and Access in x64
To implement the memory allocation and access abstractions we’ve just introduced, we need additional support from x64 and from the run-time system.
To access memory, we expose the index-mode operand from x64. It is written QWORD [reg_base + reg_offset], and refers to the memory location whose base pointer is stored in the register reg_base and whose index is stored in the register reg_offset. Unlike displacement-mode operands, both components of this operand are registers, so both the base and offset can be computed dynamically. We also enable a generalization of the displacement-mode operand, so we can access statically known offsets from base pointers other than the frame base pointer. This generalized operand, QWORD [reg_base + int32], adds some static offset to a base pointer to compute an address. Note that unlike the version used to implement frame variables, we add to (rather than subtract from) the base pointer, and the index is not restricted to a multiple of 8.
mov r10, 0 |
mov QWORD [r12 + r10], 0 |
add QWORD [r12 + r10], 1 |
mov rax, QWORD [r12 + r10] |
To implement allocation, we need some strategy for managing memory. Our run-time system or compiler needs to know what memory is in use, how to allocate (mark free memory as in use), and how to deallocate memory (return in-use memory to the pool of free memory and ultimately to the system). There are many strategies for this, such as "let the user deal with it", "add a process to the run-time system that dynamically manages memory", or "make the type system so complex that the compiler can statically manage memory". Each of these has merits.
Design digression:In general, a language implementation might abstract access to the mmap system call for allocation, and implement a strategy (such as garbage collection) to deallocate memory that is no longer used. Garbage collection is tricky to implement so we avoid it for now to focus on implement structured data.For a quick introduction to garbage collection, see this short video https://twitter.com/TartanLlama/status/1296413612907663361?s=20.
To implement our allocation-only strategy, we need the run-time system to provide an initial base pointer from which we can start allocating. We reserve another register, the current-heap-base-pointer-register (abbreviated hbp). The run-time system initializes this register to point to the base of the heap, with all positive-integer indexes after this base as free memory. Allocation is implemented by copying the current value of this pointer, and incrementing it by the number of bytes we wish to allocate. The pointer must only be incremented by word-size multiples of bytes, to ensure the 3 low-order bits are 0 and the pointer can be tagged. Any other access to this register is now undefined behaviour, similar to accesses to fbp that do not obey the stack of frames discipline.
1.11.4 Implementing Structured Data
We design Exprs-lang v8 below. The language is large, as we include several new structured data types and their primitives.
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Since the number of primitive operations is growing, we simplify the syntax to only give primops, rather than distinguishing unops, binops, and so on, so we can easily group like primops with like.
As usual, we first task is to uniquify. Below we define the target language, Exprs-unique-lang v8.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | eq? | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref |
p | ::= | (module (define label x (lambda (aloc x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | x | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
triv | ::= | aloc | ||
| | label | |||
| | prim-f | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
procedure
(uniquify p) → exprs-unique-lang-v8
p : exprs-lang-v8 Resolves top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.
Next, we implement new safe primitive operations. All the accessors for the new data types can result in undefined behaviour if used on the wrong ptr. Similarly, vector reference can access undefined values if the vector is constructed but never initialized. We fix this by wrapping each primitive to perform dynamic checks and remove undefined behaviour.
We design the target language, Exprs-unsafe-data-lang v8, below.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
value | ::= | triv | ||
| | (primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (primop value ...) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | fixnum | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
primop | ::= | binop | ||
| | unop | |||
binop | ::= | unsafe-fx* | ||
| | unsafe-fx+ | |||
| | unsafe-fx- | |||
| | eq? | |||
| | unsafe-fx< | |||
| | unsafe-fx<= | |||
| | unsafe-fx> | |||
| | unsafe-fx>= | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
Note that in this language, we add expose effect context and begin. In the source, we have an effect procedure, namely vector-set!. The user must manually call impure functions and bind the result. The result of an effectful function is either void, or an error. The unsafe variant, unsafe-vector-set!, does not need to produce a value, so we expose effect context.
Introducing this contextual distinction, we requrie primops to respect contexts. Only primops that produce values can appear in value context, while only imperative primops (only unsafe-vector-set! so far) can be directly called in effect context. Otherwise, we’d need to supprot values in effect context sometime later, which we haven’t considered how to do yet.
To implement the safe language, we wrap all accessors to perform dynamic tag checking before using the unsafe operations. We also wrap unsafe-make-vector to initialize all elements to 0.
; Symbol x Symbol x (List-of Parameter-Types) ; The first symbol is the name of a function in the source language. ; The second is either the name of a primop or a label in the target language implementing the ; behaviour safely, assuming well-typed parameters. ; The third is list of predicates, one for each argument to the source ; function, to check the parameters with. `any?` is specially recognized to ; not be checked. (define prim-f-specs `((* unsafe-fx* (fixnum? fixnum?)) (+ unsafe-fx+ (fixnum? fixnum?)) (- unsafe-fx- (fixnum? fixnum?)) (< unsafe-fx< (fixnum? fixnum?)) (<= unsafe-fx<= (fixnum? fixnum?)) (> unsafe-fx> (fixnum? fixnum?)) (>= unsafe-fx>= (fixnum? fixnum?)) (make-vector ,make-init-vector-label (fixnum?)) (vector-length unsafe-vector-length (vector?)) (vector-set! ,unsafe-vector-set!-label (vector? fixnum? any?)) (vector-ref ,unsafe-vector-ref-label (vector? fixnum?)) (car unsafe-car (pair?)) (cdr unsafe-cdr (pair?)) ,@(map (lambda (x) `(,x ,x (any?))) '(fixnum? boolean? empty? void? ascii-char? error? pair? vector? not)) ,@(map (lambda (x) `(,x ,x (any? any?))) '(cons eq?))))
procedure
p : exprs-unique-lang-v8?
Finally, we must extend specify-representation to implement the new data structures and primitives. We design the target Exprs-bits-lang v8 below.
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (mref value value) | |||
| | (alloc value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! value value value) | ||
| | (begin effect ... effect) |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (mref value value) | |||
| | (alloc primop value ...) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! value value value) | ||
| | (primop value ...) | |||
| | (begin effect ... effect) | |||
primop | ::= | binop | ||
| | unop | |||
unop | ::= | fixnum? | ||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | vector? | |||
| | cons | |||
| | unsafe-car | |||
| | unsafe-cdr | |||
| | unsafe-make-vector | |||
| | unsafe-vector-length | |||
| | unsafe-vector-set! | |||
| | unsafe-vector-ref | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? |
p | ::= | (module (define label (lambda (aloc ...) value)) ... value) | ||
pred | ::= | (relop value value) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
value | ::= | triv | ||
| | (binop value value) | |||
| | (mref value value) | |||
| | (alloc value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! value value value) | ||
| | (begin effect ... effect) | |||
triv | ::= | aloc | ||
| | label | |||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? |
We implement the new data structures using the approach described earlier, combining tagging with the new allocation and memory abstractions. We compile each constructor, namely cons and unsafe-make-vector, to alloc plus tagging, producing the tagged pointer as the value. We compile each destructor to mref, detagging or statically adjusting the index to the pointer first. Initialization, done by cons, and mutation, done by unsafe-vector-set!, are both compiled to mset!.
procedure
p : exprs-unsafe-data-lang-v8?
> (specify-representation '(module (cons 5 6)))
'(module
(let ((tmp.1 (+ (alloc 16) 1)))
(begin (mset! tmp.1 -1 40) (mset! tmp.1 7 48) tmp.1)))
> (specify-representation '(module (unsafe-car (cons 5 6))))
'(module
(mref
(let ((tmp.2 (+ (alloc 16) 1)))
(begin (mset! tmp.2 -1 40) (mset! tmp.2 7 48) tmp.2))
-1))
> (specify-representation '(module (unsafe-vector-ref (unsafe-make-vector 3) 6)))
'(module
(mref
(let ((tmp.3 (+ (alloc 32) 3))) (begin (mset! tmp.3 -3 24) tmp.3))
53))
1.11.5 Front-end Extensions
We’ve added effect context and a new effect higher in the compiler pipeline than previously. This requires some attention to a few of our front-end passes.
Our next pass, remove-complex-opera*, is responsible for imposing left-to right evaluation for operands and arguments, and explicitly binding all algebraic expressions. We design Values-bits-lang v8 below.
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! aloc opand value) | ||
| | (let ([aloc value] ...) effect) | |||
| | (begin effect ... effect) | |||
opand | ::= | aloc | ||
| | int64 |
p | ::= | (module (define label (lambda (aloc ...) value tail)) ... value tail) | ||
pred | ::= | (relop value value opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
value | ::= | triv | ||
| | (binop value value opand opand) | |||
| | (mref value value) | |||
| | (alloc value) | |||
| | (call value value ...) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (begin effect ... value) | |||
| | (call triv opand ...) | |||
effect | ::= | (mset! value value value) | ||
| | (begin effect ... effect) |
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
| | (begin effect ... value) | |||
effect | ::= | (mset! aloc opand value) | ||
| | (let ([aloc value] ...) effect) | |||
| | (begin effect ... effect) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
aloc | ::= | aloc? | ||
label | ::= | label? |
We add an effect context to support
mset!, and a begin expression for
convenience.
Previously, all expressions in the Values-lang languages were pure—
To deal with this, we introduce a contextual distinction in the language. We add the non-terminal effect to represent an impure computation. A effect represents an expression that does not have a value, and is executed only for its effect. We can use effect in certain expression contexts using begin. If we’re already in an impure context, that is, in a effect, then we can freely nest other effects.
This contextual distinction is similar to the one we introduce to distinguish tail calls from non-tail calls.
Supporting effect context requires paying attention to order when designing remove-complex-opera*, but does not significantly complicate anything.
procedure
p : exprs-bits-lang-v8? Performs the monadic form transformation, unnesting all non-trivial operators and operands, making data flow explicit and and simple to implement imperatively.
Next, we transform into imperative instructions. Now that effects can appear on the right-hand side of a let expression, it MAY not longer be safe to reorder them. This is a design choice: we could make it clear to the programmer that let does not guarantee a particular order of evaluation for its bindings, but then effects on the right-hand side lead to undefined behaviour. Or, we could impose a particular order, such as left-to-right, forbidding possible optimizations. A middle ground is to impose such an order only if any effects are detected in the right-hand side of a let (or rather, if we can guarantee no effects are present, because Rice still does not let us know for sure).
As usual, we choose to forbid exposing undefined behaviour to the source language, so design sequentialize-let to impose left-to-right (unless we know its safe to reorder).
We design the target language, Imp-mf-lang v8, below.
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (call triv opand ...) | |||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (begin effect ... value) | |||
| | (if pred value value) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand value) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) |
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (let ([aloc value] ...) pred) | |||
| | (if pred pred pred) | |||
| | (begin effect ... pred) | |||
tail | ::= | value | ||
| | (let ([aloc value] ...) tail) | |||
| | (if pred tail tail) | |||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (call triv opand ...) | |||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (begin effect ... value) | |||
| | (let ([aloc value] ...) value) | |||
| | (if pred value value) | |||
| | (call triv opand ...) | |||
| | (begin effect ... value) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand value) | |||
| | (let ([aloc value] ...) effect) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) |
p | ::= | (module (define label (lambda (aloc ...) tail)) ... tail) | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (call triv opand ...) | |||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (begin effect ... value) | |||
| | (if pred value value) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand value) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
procedure
p : values-bits-lang-v8? Picks a particular order to implement let expressions using set!.
With the addition of an mset!, normalize-bind must be updated. This operator enables the same kind of nesting as set!, and like set!, its operand needs to be normalized.
Next we design Proc-imp-cmf-lang v8
p | ::= | (module (define label (lambda (aloc ...) entry)) ... entry) | ||
entry | ::= | tail | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (call triv opand ...) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
p | ::= | (module (define label (lambda (aloc ...) entry tail)) ... entry tail) | ||
entry | ::= | tail | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (call triv opand ...) | |||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (call triv opand ...) | |||
| | (begin effect ... value) | |||
| | (if pred value value) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand triv value) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
int64 | ::= | int64? |
p | ::= | (module (define label (lambda (aloc ...) entry)) ... entry) | ||
entry | ::= | tail | ||
pred | ::= | (relop opand opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | value | ||
| | (call triv opand ...) | |||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
value | ::= | triv | ||
| | (binop opand opand) | |||
| | (mref aloc opand) | |||
| | (alloc opand) | |||
| | (call triv opand ...) | |||
effect | ::= | (set! aloc value) | ||
| | (mset! aloc opand triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
opand | ::= | aloc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
aloc | ::= | aloc? | ||
label | ::= | label? |
Question: Why can’t (or shouldn’t) we allow the index position to also be a value?
procedure
p : imp-mf-lang-v8?
|
| = |
|
| |||||||
|
| = |
|
| |||||||
|
| = |
|
| |||||||
|
| = |
|
|
impose-calling-conventions and select-instructions require only minor changes, so we leave those changes as an exercise for the reader.
procedure
p : proc-imp-cmf-lang-v8? Compiles Proc-imp-cmf-lang v8 to Imp-cmf-lang v8 by imposing calling conventions on all calls (both tail and non-tail calls), and entry points.
procedure
p : imp-cmf-lang-v8? Selects appropriate sequences of abstract assembly instructions to implement the operations of the source language.
1.11.6 Implementing Allocation
As previously discussed, our allocation strategy is to simply grab the current heap base pointer, and increment beyond the bytes we want to use for our data structure. The question is where to implement this operation. We want to do this after we have access to physical locations, so after impose-calling-conventions, but before the register allocator passes, so we do not have to update those passes to know that alloc introduces a reference to a register. This puts the pass right between select-instructions and uncover-locals.
Below, we design Asm-alloc-lang v8, the source language for this pass. We typeset the differences compared to asm-pred-lang-v7.
p | ::= | (module info (define label info tail) ... tail) | ||
info | ::= | (#:from-contract (info/c (new-frames (frame ...)))) | ||
frame | ::= | (aloc ...) | ||
pred | ::= | (relop loc opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | (jump trg loc ...) | ||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
effect | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (set! loc (alloc index)) | |||
| | (mset! loc index triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
| | (return-point label tail) | |||
index | ::= | loc | ||
| | int64 | |||
int32 | ::= | int32? |
p | ::= | (module info (define label info tail) ... tail) | ||
info | ::= | (#:from-contract (info/c (new-frames (frame ...)))) | ||
frame | ::= | (aloc ...) | ||
pred | ::= | (relop loc opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | (jump trg loc ...) | ||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
effect | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (set! loc (alloc index)) | |||
| | (mset! loc index triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
| | (return-point label tail) | |||
opand | ::= | loc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
loc | ::= | rloc | ||
| | aloc | |||
trg | ::= | loc | ||
| | label | |||
index | ::= | loc | ||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
rloc | ::= | register? | ||
| | fvar? |
This language contains our intermediate mops, including (set! loc (alloc index)). This is the low-level form of our allocation operation.
We design the target language, Asm-pred-lang v8, below. This language removes the (alloc index) form and is the highest-level language parameterized by current-heap-base-pointer-register. This language, and all languages between it and x64, assumes all accesses to hbp obey the restrictions described in Paren-x64 v8.
p | ::= | (module info (define label info tail) ... tail) | ||
info | ::= | (#:from-contract (info/c (new-frames (frame ...)))) | ||
frame | ::= | (aloc ...) | ||
pred | ::= | (relop loc opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | (jump trg loc ...) | ||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
effect | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (set! loc (alloc index)) | |||
| | (mset! loc index triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
| | (return-point label tail) | |||
index | ::= | loc | ||
| | int64 |
p | ::= | (module info (define label info tail) ... tail) | ||
info | ::= | (#:from-contract (info/c (new-frames (frame ...)))) | ||
frame | ::= | (aloc ...) | ||
pred | ::= | (relop loc opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | (jump trg loc ...) | ||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
effect | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (mset! loc index triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
| | (return-point label tail) | |||
index | ::= | loc | ||
| | int64 |
p | ::= | (module info (define label info tail) ... tail) | ||
info | ::= | (#:from-contract (info/c (new-frames (frame ...)))) | ||
frame | ::= | (aloc ...) | ||
pred | ::= | (relop loc opand) | ||
| | (true) | |||
| | (false) | |||
| | (not pred) | |||
| | (begin effect ... pred) | |||
| | (if pred pred pred) | |||
tail | ::= | (jump trg loc ...) | ||
| | (begin effect ... tail) | |||
| | (if pred tail tail) | |||
effect | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (mset! loc index triv) | |||
| | (begin effect ... effect) | |||
| | (if pred effect effect) | |||
| | (return-point label tail) | |||
opand | ::= | loc | ||
| | int64 | |||
triv | ::= | opand | ||
| | label | |||
loc | ::= | rloc | ||
| | aloc | |||
trg | ::= | loc | ||
| | label | |||
index | ::= | loc | ||
| | int64 | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
aloc | ::= | aloc? | ||
label | ::= | label? | ||
rloc | ::= | register? | ||
| | fvar? |
`(begin (set! ,loc ,hbp) (set! ,hbp (+ ,hbp ,index)))
procedure
p : asm-alloc-lang-v8? Implements the allocation primitive in terms of pointer arithmetic on the current-heap-base-pointer-register.
1.11.7 Implementing mops
The new mops require minor changes to most of the pipeline between Imp-cmf-lang-v8 to Para-asm-lang v8, where we will start implementing these abstractions in the low-level languages.
Like when we implemented fvars to support working with frame locations, we added mops to simplify working with heap addresses. We want to keep that simplification around as long as possible, to avoid complicating the already complex logic for rewriting set! instructions.
The last pass that needs to do non-trivial rewriting of set! is patch-instructions, and it would benefit from that simplification. So we decide to leave mops in place and implement them after patch-instruction.
Below, we design Para-asm-lang v8, the new source language for patch-instructions.
p | ::= | (begin s ...) | ||
s | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (mset! loc_1 index triv) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare loc opand) | |||
| | (jump-if relop trg) | |||
triv | ::= | opand | ||
| | label | |||
loc | ::= | reg | ||
| | addr | |||
index | ::= | loc | ||
| | int64 | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
int32 | ::= | int32? | ||
dispoffset | ::= | dispoffset? |
p | ::= | (begin s ...) | ||
s | ::= | (set! loc triv) | ||
| | (set! loc_1 (binop loc_1 opand)) | |||
| | (set! loc_1 (mref loc_2 index)) | |||
| | (mset! loc_1 index triv) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare loc opand) | |||
| | (jump-if relop trg) | |||
trg | ::= | loc | ||
| | label | |||
triv | ::= | opand | ||
| | label | |||
opand | ::= | loc | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
index | ::= | loc | ||
| | int64 | |||
reg | ::= | rsp | ||
| | rbp | |||
| | rax | |||
| | rbx | |||
| | rcx | |||
| | rdx | |||
| | rsi | |||
| | rdi | |||
| | r8 | |||
| | r9 | |||
| | r12 | |||
| | r13 | |||
| | r14 | |||
| | r15 | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
label | ::= | label? | ||
dispoffset | ::= | dispoffset? |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg loc triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 loc_1 (binop reg_1 loc loc_1 opand)) | |||
| | (set! reg_1 (mref reg_2 index)) | |||
| | (mset! reg_1 index int32) | |||
| | (mset! reg_1 index trg) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg loc opand) | |||
| | (jump-if relop label trg) | |||
triv | ::= | trg | ||
| | int64 | |||
| | opand | |||
| | label | |||
loc | ::= | reg | ||
| | addr | |||
index | ::= | int32 | ||
| | reg | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
int32 | ::= | int32? | ||
dispoffset | ::= | dispoffset? |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (set! reg_1 (mref reg_2 index)) | |||
| | (mset! reg_1 index int32) | |||
| | (mset! reg_1 index trg) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
triv | ::= | trg | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
index | ::= | int32 | ||
| | reg | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
int32 | ::= | int32? | ||
dispoffset | ::= | dispoffset? |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (set! reg_1 (mref reg_2 index)) | |||
| | (mset! reg_1 index int32) | |||
| | (mset! reg_1 index trg) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
trg | ::= | reg | ||
| | label | |||
triv | ::= | trg | ||
| | int64 | |||
opand | ::= | reg | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
index | ::= | int32 | ||
| | reg | |||
reg | ::= | rsp | ||
| | rbp | |||
| | rax | |||
| | rbx | |||
| | rcx | |||
| | rdx | |||
| | rsi | |||
| | rdi | |||
| | r8 | |||
| | r9 | |||
| | r10 | |||
| | r11 | |||
| | r12 | |||
| | r13 | |||
| | r14 | |||
| | r15 | |||
addr | ::= | (fbp - dispoffset) | ||
fbp | ::= | frame-base-pointer-register? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
label | ::= | label? | ||
dispoffset | ::= | dispoffset? |
As usual, patch-instructions makes a lot of changes to the operands of each instructions.
By leaving mops separate forms, we only need to patch the new instructions, and leave patching of old instructions untouched. We do not need to pay attention to and patch a new kind of operand.
The main restrictions on mops are that the operands must be registers, and that the index must be a 32-bit integer. The operands for mops have to be registers since they are compiled to low-level instructions where one operand is implicitly an index-mode operand. Similar to patching displacement-mode operands, both operands cannot be reference to memory. The index must be patches to 32-bit literals only for the same reason that binary operations must only contain 32-bit literals. This makes index and opand coincide, syntactically, but they are conceptually different so we maintain separate non-terminal definitions.
procedure
p : para-asm-lang-v8 Patches instructions that have no x64 analogue into to a sequence of instructions and an auxiliary register from current-patch-instructions-registers.
Finally, we can translate mops into index-mode operands and displacement-mode operands. Note that the compiler pass doesn’t need to know which it is generating; their syntax is the same except for type of the second operand. The target language is Paren-x64 v8, which we define below.
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (set! reg_1 (mref reg_2 index)) | |||
| | (mset! reg_1 index int32) | |||
| | (mset! reg_1 index trg) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
addr | ::= | (fbp - dispoffset) | ||
| | (reg + int32) | |||
| | (reg + reg) |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
addr | ::= | (fbp - dispoffset) | ||
| | (reg + int32) | |||
| | (reg + reg) |
p | ::= | (begin s ...) | ||
s | ::= | (set! addr int32) | ||
| | (set! addr trg) | |||
| | (set! reg loc) | |||
| | (set! reg triv) | |||
| | (set! reg_1 (binop reg_1 int32)) | |||
| | (set! reg_1 (binop reg_1 loc)) | |||
| | (with-label label s) | |||
| | (jump trg) | |||
| | (compare reg opand) | |||
| | (jump-if relop label) | |||
trg | ::= | reg | ||
| | label | |||
triv | ::= | trg | ||
| | int64 | |||
opand | ::= | reg | ||
| | int64 | |||
loc | ::= | reg | ||
| | addr | |||
reg | ::= | rsp | ||
| | rbp | |||
| | rax | |||
| | rbx | |||
| | rcx | |||
| | rdx | |||
| | rsi | |||
| | rdi | |||
| | r8 | |||
| | r9 | |||
| | r10 | |||
| | r11 | |||
| | r12 | |||
| | r13 | |||
| | r14 | |||
| | r15 | |||
addr | ::= | (fbp - dispoffset) | ||
| | (reg + int32) | |||
| | (reg + reg) | |||
fbp | ::= | frame-base-pointer-register? | ||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | bitwise-and | |||
| | bitwise-ior | |||
| | bitwise-xor | |||
| | arithmetic-shift-right | |||
relop | ::= | < | ||
| | <= | |||
| | = | |||
| | >= | |||
| | > | |||
| | != | |||
int64 | ::= | int64? | ||
int32 | ::= | int32? | ||
dispoffset | ::= | dispoffset? | ||
label | ::= | label? |
The language contains a new addr representing the x64 index-mode operand (reg + reg). This supports accessing a memory location by the index stored in another register. For example, in x64, we represent loading the nth element of an array into r10 using mov r10 [r11 + r12], where the base of the array is stored at r11 and the value of n is stored in r12.
The index-mode operand is not restricted to use a particular register, unlike the displacement-mode operand from Paren-x64 v7.
procedure
(implement-mops p) → paren-x64-v8?
p : paren-x64-mops-v8? Compiles mops to instructions on pointers with index- and displacement-mode operands.
Finally, we update generate-x64 to emit the string representation of the index-mode operand.
procedure
(generate-x64 p) → string?
p : paren-x64-v8? Compile the Paren-x64 v8 program into a valid sequence of x64 instructions, represented as a string.