On this page:
1.8.1 Preface:   What’s wrong with our language?
1.8.2 Designing a source language with return
1.8.3 The Stack (of Frames)
1.8.4 Extending our Calling Convention
1.8.5 Extending front-end with support for non-tail calls
check-values-lang
uniquify
sequentialize-let
normalize-bind
1.8.6 Extending Calling Convention
impose-calling-conventions
1.8.7 Implementing A Stack of Frames
1.8.7.1 Updating intermediate passes
select-instructions
1.8.7.2 Analyzing Return Points and Non-tail Calls
uncover-locals
undead-analysis
conflict-analysis
1.8.7.3 Frame Allocation
assign-call-undead-variables
allocate-frames
1.8.7.4 Adjusting the Register Allocator
assign-registers
assign-frame-variables
replace-locations
1.8.8 Adjusting Frame Variables
implement-fvars
1.8.9 Implementing Return Points
expose-basic-blocks
1.8.10 Final Passes
resolve-predicates
flatten-program
patch-instructions
generate-x64
1.8.11 Appendix:   Overview

1.8 Procedural Abstraction: Return

1.8.1 Preface: What’s wrong with our language?

In Values-lang v5, we added a limited form of procedure which supported only the call abstraction. This necessarily limited in which context procedures could be called, restricting them to tail context. This is unfortunate, since we may want to use procedural abstraction to abstract over side effects, and use such procedures in effect context, or to compute values as part of some computation to be used in value context. We can emulate these behaviours by manually writing our code in continuation-passing style (CPS) (sometimes known as "callback hell"), but we don’t want to be sent to The Hague, so we will not design a language that forces programmers use it.

To allow calls in arbitrary context, we need the return abstraction, which allows the language to essentially jump back from a procedure call into the middle of a computation. Thankfully, we already have the low-level abstractions required to implement this: labels, jumps, and a calling convention. All we need to do is slightly generalize the calling convention to introduce a new label at the point to which a procedure call must return after performing its effect or computing a value, store that label somewhere, and arrange for the caller to jump back to that label.

Design digression:
We might think we could pre-process Values-lang v6 to lift all non-tail calls into new procedures and rewrite the program to have only tail calls. This is possible, but would correspond to a CPS transformation. It would also be difficult to implement without the closure abstraction, which we do not have yet, and difficult to optimize, particularly at this level of abstraction. Each non-tail call would introduce an entire procedure call setup to the "rest" of the body, as a continuation. Any parameters live across the non-tail call would need to be packaged and explicitly passed as arguments to the continuation. By creating a return point abstraction, later in the compiler when we have access to lower-level abstraction (basic blocks), we’ll instead be able to generate a single jump back to this return point, instead of a whole procedure call.

1.8.2 Designing a source language with return

Below, we define Values-lang v6 by extending Values-lang-v5 with a return, and with calls in arbitrary contexts.

  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
  | (call x triv ...)
     
  binop ::= *
  | +
  | -
  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  pred ::= (relop triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
  | (call x triv ...)
     
  triv ::= x
  | int64
     
  x ::= name?
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?

The major syntactic change is the addition of (call triv triv ...) in value context. The implementation of this requires a semantic change to call, to ensure it returns. Otherwise, an expression such as (let ([x (call f 5)]) (+ 1 x)) would return the value x, and not (+ 1 x) as intended.

Note that tail and value context now coincide. We do not collapse them yet, as imposing a distinction will let us systematically derive a compiler design that transforms tail calls (which need not return) separately from non-tail calls, i.e., calls in any context other than tail context. This will let us maintain the performance characteristics of tail calls, namely that they use a constant amount of stack space, and do not need to jump after their computation is complete.

We also add subtraction as a binop. This week, we start to need subtraction a lot more and it’s tiring to encode it when x64 supports it. This requires almost no changes to the compiler if it parameterized by the set of binops.

1.8.3 The Stack (of Frames)

The key challenge in implementing return has little to do with implementing control flow—tweaking the the calling convention add labels and jumps and a convention on how use them is pretty easy. The hard part is how to keep track of all the values of variables that are live after a call.

For example, consider the following Values-lang v6 program:
(module
  ....
  (define f (lambda (x) ....))
  (let ([z 6]
        [y (call f 5)])
    (+ z y)))
The value of z is needed after the non-tail call to f. To which physical location can we assign z to ensure its value is not overwritten after the call?

Locally, we have no way of knowing. f could overwrite any register and any frame variable. Recall that our allocator assumes by the end of a block, nothing is live. It may overwrite any register not otherwise reserved. It also starts counting from frame variable fv0 when spilling locations, and could, in principle, use any number of frame variables, so the caller has no way of picking a frame variable with a high enough index that is safe. To make matters worse, the callee itself could execute many non-tail calls, and have its own values it needs to retain until they return, and so on recursively.

We do have one such location where values are safe, although we have not used or even consider it: negatively-indexed locations on the stack, relative to the callee. The allocator always starts indexing new frame variables from 0 at (rbp - 0); if we hide the values that are live across a non-tail calls at, say, (rbp + 8), (rbp + 16), and so on, they would be safe. (Recall the stack grows downwards, decrementing from the base pointer, so accessing prior values is implementing by incrementing from the base pointer.)

(rbp - n*8)

...

(rbp - 8)

(rbp - 0)

Might be overwritten

(rbp + 8)

Allocator cannot overwrite

(rbp + 16)

...

(rbp + m*8)

We cannot directly express these backwards offsets on the stack since our frame variables use natural indexes, and even if we could, this alone does not handle the general case, where the callee has further non-tail calls. But it gives us the core of the idea: we want to hide our values on the stack, but at a location before the 0-index from which the callee will start counting.

We do this by pushing the callers’s frame onto the stack prior to a non-tail call. A frame is a procedure’s set of frame variables needed after a non-tail call. We arrange that all values live after a non-tail call are stored in frame variables, so they are automatically saved by pushing the frame onto the stack. We push the frame by decrementing the frame base pointer past the last frame variable. This resets the 0-index for the callee, hiding the caller’s frame variables from the callee, so from the callee’s perspective, all frame variables starting at fv0 are unused. After returning from a call, we pop the caller’s frame from the stack by incrementing the frame base pointer back to its original value. This handles the general case: every non-tail call pushes a frame, create a new one for the callee and saving the caller’s values on the stack, and each procedure accesses its own frame starting from frame variable fv0, the same as before. This means our compiler users the stack not to push and pop values, but to push and pop frames: it is a stack of frames.

It’s only when we push a frame that stack space is allocated. Until then, the stack space is essentially treated as temporary, reclaimed at the end of a procedurea tail call is free to overwrite everything starting from the frame base pointer.

Implementing frame allocation is our core challenge in adding non-tail calls. The caller in a non-tail call will need to access both its own and the callee’s frame, since it may need to pass some arguments on the stack as part of the callee’s frame. We will also need to determine how large the caller’s frame is at a non-tail call, so we know how many words to increment the frame base pointer in order to implement pushing and popping a frame to and from the stack.

1.8.4 Extending our Calling Convention

In Procedural Abstraction: Call, we designed a calling convention, but only aimed to support calls without return. We need to modify it in three ways to support return, and non-tail calls.

First, we modify how we transform each procedure. When generating code for a procedure, we cannot know in general (Rice’s Theorem) whether it will need to return or not, i.e., whether it will be called via a tail call or non-tail call. We therefore design our calling convention to enable any procedure to return.

We modify our calling convention, designating a register current-return-address-register in which to pass the return address, the label for the instruction following a non-tail call. By default, we use r15 as the current-return-address-register. Every procedure will jump to this return address after it is finish executing. On entry to any procedure, we load the current-return-address-register into a fresh abstract location, which we call tmp-ra. This is necessary since the current-return-address-register needs to be available immediately for any new calls, but we will not need the return address until the end of the procedure.

To clarify this idea of on entry, we introduce a contextual distinction between the first tail, and other tails in the syntax of the source language. An entry point is the top-level tail expression that begins execution of code, and each entry point must load the current-return-address-register, preserve its value, and return to it when finished. This happens either as the body of a procedure, or as the initial tail in a module (module tail).

  p ::= (module (define label (lambda (aloc ...) entry)) ... entry)
     
  entry ::= tail
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)

Design digression:
This generalization lets us treat all returns, including returning the final value to the run-time system, uniformly, and allows us to eliminate the halt instruction. This isn’t necessary; we could continue to support halt and treat the module-level entry point separate from procedures. However, there is no benefit to doing so, and it clutters the compiler with special cases. The only benefit of keeping halt would be saving a single jump instruction, but this has almost no cost, will probably be predicted by a CPU’s branch predictor, and could easily be optimized away by a small pass nearly anywhere in the compiler pipeline.

Second, we need to a way to introduce a return address at a non-tail call, so we can actually return to the middle of a computation. Since we do not have access to labels Imp-mf-lang v5, we need to introduce an abstraction for creating a return address in our new intermediate language. We introduce a return point, which creates a label for a tail in effect context, so the effect position for this language is:

  effect ::= (set! loc value)
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)

A return point contains a tail, since that computation "ends" locally, jumping elsewhere, but is an effect, since it updates the state of the machine to return to control this location. After returning, the code at the return address will read from a designated register and continue the rest of the computation. We reuse the current-return-value-register as this designated register.

Third, we need to explicitly return a value in tail position. Previously, the final value in tail position was also the final value of the program. This value was implicitly returned to the run-time system. However, now, a value in tail position may either be returned to the run-time system, or may be returned to some other computation from a non-tail call. We transform a value in tail position by moving it into the current-return-value-register, and jumping to the return address stored in tmp-ra.

Finally, when setting up a non-tail call, we must ensure that arguments are placed on the callee’s frame instead of the caller’s frame. Recall that our calling convention passes parameters in frame variables when they don’t fit in registers.

Making this all concrete, we implement our new calling convention with the following transformations:

1.8.5 Extending front-end with support for non-tail calls

As we don’t require exposing any new low-level primitives, we modify our compiler proceeding top-down instead of bottom-up.

The main required changes are in the calling convention, so we begin by extending all the passes between the source language and impose-calling-conventions to support non-tail calls.

We first update check-values-lang to allow non-tail calls. The heuristics remain the same as in v5:check-values-lang.

Validates that the Values-lang v6 is well bound and well typed: all procedure calls pass the correct number of arguments, and all binop and relop are never used on procedures.

Next, we extend uniquify. First, of course, we design the updated Values-unique-lang v6. We typeset the differences with respect to Values-unique-lang v5.

  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  value ::= triv
  | (binop opand opand)
  | (let ([aloc value] ...) value)
  | (if pred value value)
  | (call triv opand ...)
     
  binop ::= *
  | +
  | -
  p ::= (module (define label x (lambda (aloc x ...) tail)) ... tail)
     
  pred ::= (relop opand opand triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([aloc x value] ...) tail)
  | (if pred tail tail)
  | (call x triv opand ...)
     
  value ::= triv
  | (binop opand opand triv triv)
  | (let ([aloc x value] ...) value)
  | (if pred value value)
  | (call x triv opand ...)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
  | x
  | int64
     
  x ::= name?
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([aloc value] ...) tail)
  | (if pred tail tail)
  | (call triv opand ...)
     
  value ::= triv
  | (binop opand opand)
  | (let ([aloc value] ...) value)
  | (if pred value value)
  | (call triv opand ...)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

This requires no changes specific to non-tail calls, so the changes compared to v5:uniquify are trivial.

procedure

(uniquify p)  values-unique-lang-v6?

  p : values-lang-v6?
Compiles Values-lang v6 to Values-unique-lang v6 by resolving top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.

Finally, we expose non-tail calls through sequentialize-let. Below we define the target language, Imp-mf-lang v6, where we transform lexical binding into sequential imperative assignments.

  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (call triv opand ...)
  | (binop opand opand)
  | (begin effect ... value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  binop ::= *
  | +
  | -
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (let ([aloc value] ...) tail)
  | (if pred tail tail)
  | (call triv opand ...)
     
  value ::= triv
  | (call triv opand ...)
  | (binop opand opand)
  | (begin effect ... value)
  | (let ([aloc value] ...) value)
  | (if pred value value)
  | (call triv opand ...)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (call triv opand ...)
  | (binop opand opand)
  | (begin effect ... value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

Note that this language contains a definition entry designating the top-level tail used as the entry point for each procedure and for the module as a whole. There is no syntactic distinction, but making a semantic distinction will simplify our implementation of the calling convention to support return.

Compiles Values-unique-lang v6 to Imp-mf-lang v6 by picking a particular order to implement let expressions using set!.

Finally, we update normalize-bind. Below we design the target language Proc-imp-cmf-lang v6, typeset with differences compared to Proc-imp-cmf-lang v5.

  p ::= (module (define label (lambda (aloc ...) entry tail)) ... entry tail)
     
  entry ::= tail
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (call triv opand ...)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) entry tail)) ... entry tail)
     
  entry ::= tail
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (call triv opand ...)
  | (begin effect ... value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) entry)) ... entry)
     
  entry ::= tail
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (call triv opand ...)
     
  effect ::= (set! aloc value)
  | (begin effect ...)
  | (if pred effect effect)
     
  opand ::= aloc
  | int64
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

We simply extend Proc-imp-cmf-lang v5 with our new abstractions, including non-tail calls.

Compiles Imp-mf-lang v6 to Proc-imp-cmf-lang v6, pushing set! under begin so that the right-hand-side of each set! is base value-producing operation.

This normalizes Imp-mf-lang v6 with respect to the equations:
(set! aloc
      (begin effect_1 ...
             value))

   

=

   

(begin effect_1 ...
       (set! aloc value))
(set! aloc
      (if pred
          value_1
          value_2))

   

=

   

(if pred
    (set! aloc value_1)
    (set! aloc value_2))

1.8.6 Extending Calling Convention

Now we design Imp-cmf-lang-v6, the target language of our calling convention translation.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc value)
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?
  p ::= (module info (define label info tail (lambda (aloc ...) entry)) ... tail entry)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
     
  frame ::= (aloc ...)
     
  entry ::= tail
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (call triv opand ...)
     
  effect ::= (set! loc aloc value)
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | aloc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc value)
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

Compared to Imp-cmf-lang v5, the main difference is the return-point form in effect context. This instruction introduces a new, non-top-level label in the middle of an instruction sequence, which is expected to be exclusively used by the calling convention to implement return. We further assume that a return-point cannot appear inside another return-point, i.e., there are no nested return-points. Our compiler can never generate this code, and there is no reason to support it.

The implicit return value, value in tail position, is no longer valid. Instead, the run-time system will set the first return address, and the final result of any computation is explicitly returned to the run-time system using the calling conventions. The run-time system initializes the current-return-address-register to be the address of the exit procedure.

To implement return, we modify every entry point to store the current-return-address-register as described by our calling convention. Then we explicitly return the base expressions in value context by jumping to that return address.

The new-frames info field records all the new frames created in the block, and will be used later to push new frames on to the stack, and assign new frame variables to actual frame variables. There should be one frame for each non-tail call, even if that frame is empty. The new frame variables should be in order. Each new frame variable must only appear in one frame in the new-frames field. Recall that alocs are unique to a scope, and the new-frames field represents newly defined alocs. It would not make sense for the same aloc to appear in two frames.

Compiles Proc-imp-cmf-lang v6 to Imp-cmf-lang v6 by imposing calling conventions on all calls (both tail and non-tail calls), and entry points. The registers used to passing parameters are defined by current-parameter-registers, and the registers used for returning are defined by current-return-address-register and current-return-value-register.

After implementing the calling conventions, we have two abstractions that we need to implement.

First, we must allocate frames. We’ve already collected the new frame variables, so we need to modify the allocator to determine the size of each frame and allocate them on the stack. This requires explicitly manipulating the frame base pointer, which also changes how frame variables are implemented.

Second, we need to implement return-points. These will be compiled to raw labels, so we essentially preserve them until a low-level language with access to raw labels.

1.8.7 Implementing A Stack of Frames

The first new abstraction we must implement is frame allocation. We have two needs to allocate frames. First, we need to know how large the frame is, so we can actually allocate that much space. To compute the size of each caller’s frame, we need to know how many variables might be live across a non-tail call, so this must wait until after undead-analysis. Second, to minimize the space required, we want to make sure we wait to allocate until conflict-analysis. Otherwise, we might have to assume that all live abstract locations are unique frame variables, increasing the size of the frame. By waiting until after conflict-analysis, so we can try to assign non-conflicting variables to the same frame variable.

1.8.7.1 Updating intermediate passes

We begin by updating the passes through conflict-analysis.

The next pass in the sequence is select-instructions. We define Asm-pred-lang v6 below, with changes typeset with respect to Asm-pred-lang v5.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
  | info?
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc triv value)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  loc ::= rloc
  | aloc
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

There are no new restrictions for return-point, so we just add support for return-point, and drop halt.

Compiles Imp-cmf-lang v6 to Asm-pred-lang v6, selecting appropriate sequences of abstract assembly instructions to implement the operations of the source language.

1.8.7.2 Analyzing Return Points and Non-tail Calls

We first extend uncover-locals to find locals in non-tail calls and return points. Below we define Asm-pred-lang-v6/locals. We typeset changes compared to Asm-pred-lang v5/locals.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...))))
     
  frame ::= (aloc ...)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
     
  int64 ::= int64?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...))))
     
  frame ::= (aloc ...)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
     
  int64 ::= int64?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

Updating this analysis is not complicated. We simply add a case to handle return-points. Remember that the "arguments" to jump are only used for later analyses and not considered locals for this analysis.

The new frame variables are considered locals, so will be part of the analysis. The locals set is the set of unassigned variables, so we add variables to it that are unassigned, and remove variables from the set as they are assigned. So new frame variables are listed in the locals set for the enclosing block, until they are assigend frame variables.

Compiles Asm-pred-lang v6 to Asm-pred-lang v6/locals, analysing which abstract locations are used in each block, and each block and the module with the set of variables in an info? fields.

Next we extend undead-analysis. We design Asm-pred-lang-v6/undead below, typeset with respect to Asm-pred-lang v5/undead.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (undead-out undead-set-tree/rloc?)))
     
  frame ::= (aloc ...)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (undead-out undead-set-tree/rloc?)))
     
  frame ::= (aloc ...)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (undead-out undead-set-tree/rloc?)))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

We add two new info fields: undead-out and call-undead. The undead-out field continues to store the undead-set tree, which we must update to follow the structure of return-points. We also record information about what is undead across a non-tail call, the call-undead set. This is the set of all variables that are undead after any non-tail call in a block; there is a single set for the entire block, even when there are multiple non-tail calls.

The call-undead field stores every abstract location or frame variable that is in the undead-out set of a return-point. The abstract locations must be allocated to the frame, as discussed above, and both these abstract locations and the frame variables are requried to compute the size of the frame.

First, we update the definition of undead-set-tree? to handle the return-point instruction, which includes a nested tail.

Our new undead-set tree is one of:
  • an undead-out set (aloc ...), corresponding to the undead-out set for a single instruction such as (set! aloc triv).

  • a list of undead-set trees, (undead-set-tree?_1 ... undead-set-tree?_2), corresponding to a begin statement (begin effect_1 ... effect_2) or (begin effect_1 ... tail). The first element of the list represents undead-set tree for the first effect, the second element represents the undead-set tree for the second effect, and so on.

  • a list of exactly three undead-set trees, (undead-set-tree?_p undead-set-tree?_1 undead-set-tree?_2), corresponding to a if statement (if pred tail_1 tail_2). undead-set-tree?_p corresponds to the undead-set tree of pred, while undead-set-tree?_1 corresponds to tail_1 and undead-set-tree?_2 corresponds to tail_2.

  • a list containing an undead-out set and one undead-set tree, (undead-set undead-set-tree?), corresponding to a return-point statement (return-point label tail). The undead-set is the undead-out set for the return-point, and the undead-set-tree? in the list is the undead-set tree for the tail of the return-point.

Analyzing return-point requires making explicit a fact from our calling convention. After returning, we expect current-return-value-register to be live. We model this as treating a return-point as defining the current-return-value-register.

Performs undead analysis, compiling Asm-pred-lang v6/locals to Asm-pred-lang v6/undead by decorating programs with their undead-set trees.

Below are examples of compiling a program that swaps its inputs, one using the normal allocator and one using an empty set of assignable registers.

Examples:
> ((compose
    undead-analysis
    uncover-locals
    select-instructions
    impose-calling-conventions
    normalize-bind
    sequentialize-let)
   '(module
      (define L.swap.1
        (lambda (x.1 y.2)
          (if (< y.2 x.1)
              x.1
              (let ([z.3 (call L.swap.1 y.2 x.1)])
                z.3))))
      (call L.swap.1 1 2)))

'(module

   ((new-frames ())

    (locals (tmp-ra.2))

    (call-undead ())

    (undead-out

     ((tmp-ra.2 rbp)

      (tmp-ra.2 rsi rbp)

      (tmp-ra.2 rsi rdi rbp)

      (rsi rdi r15 rbp)

      (rsi rdi r15 rbp))))

   (define L.swap.1

     ((new-frames (()))

      (locals (z.3 tmp-ra.1 x.1 y.2))

      (undead-out

       ((rdi rsi tmp-ra.1 rbp)

        (rsi x.1 tmp-ra.1 rbp)

        (y.2 x.1 tmp-ra.1 rbp)

        ((y.2 x.1 tmp-ra.1 rbp)

         ((tmp-ra.1 rax rbp) (rax rbp))

         (((rax tmp-ra.1 rbp)

           ((y.2 rsi rbp) (rsi rdi rbp) (rsi rdi r15 rbp) (rsi rdi r15 rbp)))

          (z.3 tmp-ra.1 rbp)

          (tmp-ra.1 rax rbp)

          (rax rbp)))))

      (call-undead (tmp-ra.1)))

     (begin

       (set! tmp-ra.1 r15)

       (set! x.1 rdi)

       (set! y.2 rsi)

       (if (< y.2 x.1)

         (begin (set! rax x.1) (jump tmp-ra.1 rbp rax))

         (begin

           (return-point L.rp.1

             (begin

               (set! rsi x.1)

               (set! rdi y.2)

               (set! r15 L.rp.1)

               (jump L.swap.1 rbp r15 rdi rsi)))

           (set! z.3 rax)

           (set! rax z.3)

           (jump tmp-ra.1 rbp rax)))))

   (begin

     (set! tmp-ra.2 r15)

     (set! rsi 2)

     (set! rdi 1)

     (set! r15 tmp-ra.2)

     (jump L.swap.1 rbp r15 rdi rsi)))

> (parameterize ([current-parameter-registers '()])
    ((compose
      undead-analysis
      uncover-locals
      select-instructions
      impose-calling-conventions
      normalize-bind
      sequentialize-let)
     '(module
        (define L.swap.1
          (lambda (x.1 y.2)
            (if (< y.2 x.1)
                x.1
                (let ([z.3 (call L.swap.1 y.2 x.1)])
                  z.3))))
        (call L.swap.1 1 2))))

'(module

   ((new-frames ())

    (locals (tmp-ra.6))

    (call-undead ())

    (undead-out

     ((tmp-ra.6 rbp)

      (tmp-ra.6 fv1 rbp)

      (tmp-ra.6 fv1 fv0 rbp)

      (fv1 fv0 r15 rbp)

      (fv1 fv0 r15 rbp))))

   (define L.swap.1

     ((new-frames ((nfv.4 nfv.5)))

      (locals (nfv.4 nfv.5 z.3 tmp-ra.3 x.1 y.2))

      (undead-out

       ((fv0 fv1 tmp-ra.3 rbp)

        (fv1 x.1 tmp-ra.3 rbp)

        (y.2 x.1 tmp-ra.3 rbp)

        ((y.2 x.1 tmp-ra.3 rbp)

         ((tmp-ra.3 rax rbp) (rax rbp))

         (((rax tmp-ra.3 rbp)

           ((y.2 nfv.5 rbp)

            (nfv.5 nfv.4 rbp)

            (nfv.5 nfv.4 r15 rbp)

            (nfv.5 nfv.4 r15 rbp)))

          (z.3 tmp-ra.3 rbp)

          (tmp-ra.3 rax rbp)

          (rax rbp)))))

      (call-undead (tmp-ra.3)))

     (begin

       (set! tmp-ra.3 r15)

       (set! x.1 fv0)

       (set! y.2 fv1)

       (if (< y.2 x.1)

         (begin (set! rax x.1) (jump tmp-ra.3 rbp rax))

         (begin

           (return-point L.rp.2

             (begin

               (set! nfv.5 x.1)

               (set! nfv.4 y.2)

               (set! r15 L.rp.2)

               (jump L.swap.1 rbp r15 nfv.4 nfv.5)))

           (set! z.3 rax)

           (set! rax z.3)

           (jump tmp-ra.3 rbp rax)))))

   (begin

     (set! tmp-ra.6 r15)

     (set! fv1 2)

     (set! fv0 1)

     (set! r15 tmp-ra.6)

     (jump L.swap.1 rbp r15 fv0 fv1)))

The following example shows the output on an intermediate representation of non-tail recursive factorial, compiled with current-parameter-registers set to '() to force the compiler to generate frame variables and test edge cases.

Example:
> (pretty-display
   (undead-analysis
    '(module
       ((new-frames ()) (locals (ra.12)))
       (define L.fact.4
         ((new-frames ((nfv.16)))
          (locals (ra.13 x.9 tmp.14 tmp.15 new-n.10 nfv.16 factn-1.11 tmp.17)))
         (begin
           (set! x.9 fv0)
           (set! ra.13 r15)
           (if (= x.9 0)
               (begin (set! rax 1) (jump ra.13 rbp rax))
               (begin
                 (set! tmp.14 -1)
                 (set! tmp.15 x.9)
                 (set! tmp.15 (+ tmp.15 tmp.14))
                 (set! new-n.10 tmp.15)
                 (return-point
                     L.rp.6
                   (begin
                     (set! nfv.16 new-n.10)
                     (set! r15 L.rp.6)
                     (jump L.fact.4 rbp r15 nfv.16)))
                 (set! factn-1.11 rax)
                 (set! tmp.17 x.9)
                 (set! tmp.17 (* tmp.17 factn-1.11))
                 (set! rax tmp.17)
                 (jump ra.13 rbp rax)))))
       (begin
         (set! ra.12 r15)
         (set! fv0 5)
         (set! r15 ra.12)
         (jump L.fact.4 rbp r15 fv0)))))

(module

  ((new-frames ())

   (locals (ra.12))

   (call-undead ())

   (undead-out ((ra.12 rbp) (ra.12 fv0 rbp) (fv0 r15 rbp) (fv0 r15 rbp))))

  (define L.fact.4

    ((new-frames ((nfv.16)))

     (locals (ra.13 x.9 tmp.14 tmp.15 new-n.10 nfv.16 factn-1.11 tmp.17))

     (undead-out

      ((r15 x.9 rbp)

       (x.9 ra.13 rbp)

       ((x.9 ra.13 rbp)

        ((ra.13 rax rbp) (rax rbp))

        ((tmp.14 x.9 ra.13 rbp)

         (tmp.14 tmp.15 x.9 ra.13 rbp)

         (tmp.15 x.9 ra.13 rbp)

         (new-n.10 x.9 ra.13 rbp)

         ((rax x.9 ra.13 rbp) ((nfv.16 rbp) (nfv.16 r15 rbp) (nfv.16 r15 rbp)))

         (x.9 factn-1.11 ra.13 rbp)

         (factn-1.11 tmp.17 ra.13 rbp)

         (tmp.17 ra.13 rbp)

         (ra.13 rax rbp)

         (rax rbp)))))

     (call-undead (x.9 ra.13)))

    (begin

      (set! x.9 fv0)

      (set! ra.13 r15)

      (if (= x.9 0)

        (begin (set! rax 1) (jump ra.13 rbp rax))

        (begin

          (set! tmp.14 -1)

          (set! tmp.15 x.9)

          (set! tmp.15 (+ tmp.15 tmp.14))

          (set! new-n.10 tmp.15)

          (return-point L.rp.6

            (begin

              (set! nfv.16 new-n.10)

              (set! r15 L.rp.6)

              (jump L.fact.4 rbp r15 nfv.16)))

          (set! factn-1.11 rax)

          (set! tmp.17 x.9)

          (set! tmp.17 (* tmp.17 factn-1.11))

          (set! rax tmp.17)

          (jump ra.13 rbp rax)))))

  (begin

    (set! ra.12 r15)

    (set! fv0 5)

    (set! r15 ra.12)

    (jump L.fact.4 rbp r15 fv0)))

Next we update the conflict-analysis. Below, we define Asm-pred-lang v6/conflicts, typeset with differences compared to Asm-pred-lang v5/conflicts.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...))))
     
  frame ::= (aloc ...)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...)) (undead-out undead-set-tree/rloc?)))
     
  frame ::= (aloc ...)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

Recall that current-return-value-register is assigned by a non-tail call. Also note that current-frame-base-pointer-register and current-return-value-register are likely to end up in conflict with everything, even though we have removed them from the current-assignable-registers set.

If our frame allocation was more clever, we would need to adjust the conflict analysis to make all caller saved registers in conflict with a non-tail call. However, we instead assign all call-undead variables to the frame, so we don’t need to do very much for non-tail calls.

Performs conflict analysis, compiling Asm-pred-lang v6/undead to Asm-pred-lang v6/conflicts by decorating programs with their conflict graph.

1.8.7.3 Frame Allocation

The size of a frame n (in slots) for a given non-tail call is the maximum of:
  • the number of locations in the call-undead, or

  • one more than the index of the largest frame location in the call-undead set.

So far, it’s unlikely that the current compiler implementation will ever create a program with a frame variable live after a non-tail call, since register allocation hasn’t happened yet. However, we will see shortly an example of a call-undead variables assigned a frame variable with an index greater than the size of the call-undead set.

In practice, calling conventions distinguish between two sets of registers: callee-saved and caller-saved, to allow some registers to be live across a call. We ignore this for simplicity and assume all registers are caller-saved.

To allocate a frame, intuitively, we want to transform `(return-point ,rp ,tail) into:
`(begin
   (set! ,fv_0 ,x_0)
   ...
   (set! ,fv_n-1 ,x_n-1)
 
   (set! ,fbp (- ,fbp ,nb))
   (return-point ,rp ,tail)
   (set! ,fbp (+ ,fbp ,nb))
 
   (set! ,x_0 ,fv_0)
   ...
   (set! ,x_n-1 ,fv_n-1))
where:

This pushes all call-live variables onto the frame, pushes the frame, performs the call, pops the frame, and reads back the variables.

Unfortunately, we can’t implement this transformation as is. We want to avoid producing new set!s. First, they will invalidate the undead analysis we’ve just performed. Second, some or all the moves might be unnecessary. We don’t know whether those variables x_0 ... x_n-1 need to be in registers, so it would be potentially more efficient to assign them to frame variables in the first place and leave some other pass to move them into registers when necessary, rather than definitely generate a ton of memory accesses.

Instead, we perform this transformation in two steps. First, a new pass assign-call-undead-variables assigns each variable that is live across a call to a frame variable, instead of producing new moves. This produces a partial assignment of abstract locations to frame variables, which the register allocator will work from. Second, a new pass, allocate-frames does the work of updating the frame base pointer, effectively allocating the frame for each non-tail call.

We define Asm-pred-lang-v6/pre-framed below, typeset with changes with respect to Asm-pred-lang-v6/conflicts.

  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc fvar) ...))))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc fvar) ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  fvar ::= fvar?
     
  int64 ::= int64?

The core of assign-call-undead-variables is similar to assign-registers. The core algorithm is a straight-forward recursion over the call-undead set, and produces an assignment.
  • If the input set is empty, return the default assignment.

  • Choose a variable, x, from the input set of variables.

  • Recur with x removed from the input set and the conflict graph. The recursive call should return an assignment for all the remaining variables.

  • Select a compatible frame variable for x.

    A variable x is compatible with a frame location fvar_i if it is not directly in conflict with fvar_i, and it is not in conflict with a variable y that has been assigned to fvar_i.

    An easy way to find a compatible frame variable is to find the set of frame variables to which x cannot be assigned. Then, starting from fv0, assign the first frame variable that is not in the incompatible set.

    Finally, add the assignment for the x to the result of the recursive call.

The default assignment for this pass is the empty assignment, since nothing has been assigned yet.

Since many frame variables will be assigned prior to register allocation, you will need to modify assign-registers to use a similar algorithm for spilling, instead of a naive algorithm that starts spilling at frame variable fv0.

Compiles Asm-pred-lang-v6/conflicts to Asm-pred-lang-v6/pre-framed by pre-assigning all variables in the call-undead sets to frame variables.

Example:
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       assign-call-undead-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions
       impose-calling-conventions
       normalize-bind
       sequentialize-let)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (call L.swap.1 y.2 x.1)])
                   z.3))))
         (call L.swap.1 1 2)))))

(module

  ((new-frames ())

   (locals (tmp-ra.10))

   (call-undead ())

   (undead-out

    ((tmp-ra.10 rbp)

     (tmp-ra.10 fv1 rbp)

     (tmp-ra.10 fv1 fv0 rbp)

     (fv1 fv0 r15 rbp)

     (fv1 fv0 r15 rbp)))

   (conflicts

    ((tmp-ra.10 (fv0 fv1 rbp))

     (rbp (r15 fv0 fv1 tmp-ra.10))

     (fv1 (r15 fv0 rbp tmp-ra.10))

     (fv0 (r15 rbp fv1 tmp-ra.10))

     (r15 (rbp fv0 fv1))))

   (assignment ()))

  (define L.swap.1

    ((new-frames ((nfv.8 nfv.9)))

     (locals (y.2 x.1 z.3 nfv.9 nfv.8))

     (undead-out

      ((fv0 fv1 tmp-ra.7 rbp)

       (fv1 x.1 tmp-ra.7 rbp)

       (y.2 x.1 tmp-ra.7 rbp)

       ((y.2 x.1 tmp-ra.7 rbp)

        ((tmp-ra.7 rax rbp) (rax rbp))

        (((rax tmp-ra.7 rbp)

          ((y.2 nfv.9 rbp)

           (nfv.9 nfv.8 rbp)

           (nfv.9 nfv.8 r15 rbp)

           (nfv.9 nfv.8 r15 rbp)))

         (z.3 tmp-ra.7 rbp)

         (tmp-ra.7 rax rbp)

         (rax rbp)))))

     (call-undead (tmp-ra.7))

     (conflicts

      ((y.2 (rbp tmp-ra.7 x.1 nfv.9))

       (x.1 (y.2 rbp tmp-ra.7 fv1))

       (tmp-ra.7 (y.2 x.1 rbp fv1 fv0 rax z.3))

       (z.3 (rbp tmp-ra.7))

       (nfv.9 (r15 nfv.8 rbp y.2))

       (nfv.8 (r15 rbp nfv.9))

       (rbp (y.2 x.1 tmp-ra.7 rax z.3 r15 nfv.8 nfv.9))

       (r15 (rbp nfv.8 nfv.9))

       (rax (rbp tmp-ra.7))

       (fv0 (tmp-ra.7))

       (fv1 (x.1 tmp-ra.7))))

     (assignment ((tmp-ra.7 fv2))))

    (begin

      (set! tmp-ra.7 r15)

      (set! x.1 fv0)

      (set! y.2 fv1)

      (if (< y.2 x.1)

        (begin (set! rax x.1) (jump tmp-ra.7 rbp rax))

        (begin

          (return-point L.rp.3

            (begin

              (set! nfv.9 x.1)

              (set! nfv.8 y.2)

              (set! r15 L.rp.3)

              (jump L.swap.1 rbp r15 nfv.8 nfv.9)))

          (set! z.3 rax)

          (set! rax z.3)

          (jump tmp-ra.7 rbp rax)))))

  (begin

    (set! tmp-ra.10 r15)

    (set! fv1 2)

    (set! fv0 1)

    (set! r15 tmp-ra.10)

    (jump L.swap.1 rbp r15 fv0 fv1)))

Notice in this example that tmp-ra.7 gets assigned the frame variable fv2, since it is live across the call, but in conflict with the fv0 and fv1 due to the calling convention. This means the frame will have size 3, to preserve fv2, despite only 1 variable being live across the call.

Note that our language is flexible enough to allow us to inline fv2 in the call-undead set, or not and allow a allocate-frames to resolve call-undead variables using the assignment.

Now we can allocate frames for each non-tail call. For each block, we compute the size of the frames for all non-tail call, as described earlier.

To allocate the caller’s frame, we transform `(return-point ,rp ,tail) into
`(begin
   (set! ,fbp (- ,fbp ,nb))
   (return-point ,rp ,tail)
   (set! ,fbp (+ ,fbp ,nb)))
where:
Recall that the stack grows downward, so we subtract nb bytes from the current frame base pointer to allocate a frame of with n slots.

We could allocate a different sized frame for each call, but this would require associating call-undead sets with each return point, and complicate the assignment of new-frame variables.

With the caller’s frame allocated, we also know where the callee’s frame begins, so we also assign each of the new-frame variables from the new-frame lists. In order, each new-frame variable is assigned to a frame variable starting with (make-fvar n). These assignments are added to the assignment field the enclosing block.

The output is Asm-pred-lang-v6/framed, which only changes in its info fields compared to Asm-pred-lang-v6/pre-framed.

  info ::= (#:from-contract (info/c (new-frames (frame ...)) (locals (aloc ...)) (call-undead (loc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc fvar) ...))))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc fvar) ...))))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  aloc ::= aloc?
     
  label ::= label?
     
  fvar ::= fvar?
     
  rloc ::= register?
  | fvar?

The only differences are in the info field. The call-undead sets and new-frame fields are removed.

Note that if the undead-outs were being preserved for debugging, they must be removed after allocate-frames, since this pass invalidates the undead-out sets.

We remove the assigned variables from the locals set, to allow later passes to assume the locals set are all unassigned.

Compiles Asm-pred-lang-v6/pre-framed to Asm-pred-lang-v6/framed by allocating frames for each non-tail call, and assigning all new-frame variables to frame variables in the new frame.

Example:
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       allocate-frames
       assign-call-undead-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions
       impose-calling-conventions
       normalize-bind
       sequentialize-let)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (call L.swap.1 y.2 x.1)])
                   z.3))))
         (call L.swap.1 1 2)))))

(module

  ((locals (tmp-ra.14))

   (conflicts

    ((tmp-ra.14 (fv0 fv1 rbp))

     (rbp (r15 fv0 fv1 tmp-ra.14))

     (fv1 (r15 fv0 rbp tmp-ra.14))

     (fv0 (r15 rbp fv1 tmp-ra.14))

     (r15 (rbp fv0 fv1))))

   (assignment ()))

  (define L.swap.1

    ((locals (z.3 x.1 y.2))

     (conflicts

      ((y.2 (rbp tmp-ra.11 x.1 nfv.13))

       (x.1 (y.2 rbp tmp-ra.11 fv1))

       (tmp-ra.11 (y.2 x.1 rbp fv1 fv0 rax z.3))

       (z.3 (rbp tmp-ra.11))

       (nfv.13 (r15 nfv.12 rbp y.2))

       (nfv.12 (r15 rbp nfv.13))

       (rbp (y.2 x.1 tmp-ra.11 rax z.3 r15 nfv.12 nfv.13))

       (r15 (rbp nfv.12 nfv.13))

       (rax (rbp tmp-ra.11))

       (fv0 (tmp-ra.11))

       (fv1 (x.1 tmp-ra.11))))

     (assignment ((tmp-ra.11 fv2) (nfv.12 fv3) (nfv.13 fv4))))

    (begin

      (set! tmp-ra.11 r15)

      (set! x.1 fv0)

      (set! y.2 fv1)

      (if (< y.2 x.1)

        (begin (set! rax x.1) (jump tmp-ra.11 rbp rax))

        (begin

          (begin

            (set! rbp (- rbp 24))

            (return-point L.rp.4

              (begin

                (set! nfv.13 x.1)

                (set! nfv.12 y.2)

                (set! r15 L.rp.4)

                (jump L.swap.1 rbp r15 nfv.12 nfv.13)))

            (set! rbp (+ rbp 24)))

          (set! z.3 rax)

          (set! rax z.3)

          (jump tmp-ra.11 rbp rax)))))

  (begin

    (set! tmp-ra.14 r15)

    (set! fv1 2)

    (set! fv0 1)

    (set! r15 tmp-ra.14)

    (jump L.swap.1 rbp r15 fv0 fv1)))

Example:
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions
       impose-calling-conventions
       normalize-bind
       sequentialize-let)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (call L.swap.1 y.2 x.1)])
                   z.3))))
         (call L.swap.1 1 2)))))

(module

  ((new-frames ())

   (locals (tmp-ra.18))

   (call-undead ())

   (undead-out

    ((tmp-ra.18 rbp)

     (tmp-ra.18 fv1 rbp)

     (tmp-ra.18 fv1 fv0 rbp)

     (fv1 fv0 r15 rbp)

     (fv1 fv0 r15 rbp)))

   (conflicts

    ((tmp-ra.18 (fv0 fv1 rbp))

     (rbp (r15 fv0 fv1 tmp-ra.18))

     (fv1 (r15 fv0 rbp tmp-ra.18))

     (fv0 (r15 rbp fv1 tmp-ra.18))

     (r15 (rbp fv0 fv1)))))

  (define L.swap.1

    ((new-frames ((nfv.16 nfv.17)))

     (locals (nfv.16 nfv.17 z.3 tmp-ra.15 x.1 y.2))

     (undead-out

      ((fv0 fv1 tmp-ra.15 rbp)

       (fv1 x.1 tmp-ra.15 rbp)

       (y.2 x.1 tmp-ra.15 rbp)

       ((y.2 x.1 tmp-ra.15 rbp)

        ((tmp-ra.15 rax rbp) (rax rbp))

        (((rax tmp-ra.15 rbp)

          ((y.2 nfv.17 rbp)

           (nfv.17 nfv.16 rbp)

           (nfv.17 nfv.16 r15 rbp)

           (nfv.17 nfv.16 r15 rbp)))

         (z.3 tmp-ra.15 rbp)

         (tmp-ra.15 rax rbp)

         (rax rbp)))))

     (call-undead (tmp-ra.15))

     (conflicts

      ((y.2 (rbp tmp-ra.15 x.1 nfv.17))

       (x.1 (y.2 rbp tmp-ra.15 fv1))

       (tmp-ra.15 (y.2 x.1 rbp fv1 fv0 rax z.3))

       (z.3 (rbp tmp-ra.15))

       (nfv.17 (r15 nfv.16 rbp y.2))

       (nfv.16 (r15 rbp nfv.17))

       (rbp (y.2 x.1 tmp-ra.15 rax z.3 r15 nfv.16 nfv.17))

       (r15 (rbp nfv.16 nfv.17))

       (rax (rbp tmp-ra.15))

       (fv0 (tmp-ra.15))

       (fv1 (x.1 tmp-ra.15)))))

    (begin

      (set! tmp-ra.15 r15)

      (set! x.1 fv0)

      (set! y.2 fv1)

      (if (< y.2 x.1)

        (begin (set! rax x.1) (jump tmp-ra.15 rbp rax))

        (begin

          (return-point L.rp.5

            (begin

              (set! nfv.17 x.1)

              (set! nfv.16 y.2)

              (set! r15 L.rp.5)

              (jump L.swap.1 rbp r15 nfv.16 nfv.17)))

          (set! z.3 rax)

          (set! rax z.3)

          (jump tmp-ra.15 rbp rax)))))

  (begin

    (set! tmp-ra.18 r15)

    (set! fv1 2)

    (set! fv0 1)

    (set! r15 tmp-ra.18)

    (jump L.swap.1 rbp r15 fv0 fv1)))

Because the frame allocator sits in the middle of our register allocation pipeline, the optimized allocator assign-homes-opt is no longer a drop-in replacement for the naive assign-homes. We therefore remove assign-homes-opt and in-line the passes in the current-pass-list.

1.8.7.4 Adjusting the Register Allocator

Frames are now implemented and all new-frame variables and variables live across a call are assigned to frame variables. We need to adjust our register allocator so that it does not try to spill variables into frame variables that are already taken.

To do this, we essentially remove spilling from assign-registers. Instead, in the output, the locals set should include only spilled locations. A separate pass (which looks suspiciously like assign-call-undead-variables) handles spilling.

Asm-pred-lang-v6/spilled is defined below, only changed by allowing assignments to arbitrary rlocs. We typeset the differences with respect to Asm-pred-lang-v6/framed.

  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc rloc fvar) ...))))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc rloc) ...))))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?

Performs graph-colouring register allocation, compiling Asm-pred-lang v6/framed to Asm-pred-lang v6/spilled by decorating programs with their register assignments.

The final change to the register allocator is to assign spilled variables to frame variables.

The new language, Asm-pred-lang-v6/assignments, is the familiar output of our register allocation process, which has all abstract locations assigned to physical locations.

  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...)) (assignment ((aloc rloc) ...))))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (assignment ((aloc rloc) ...))))
     
  frame ::= (aloc ...)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= loc
  | label
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?

After assigning frame variables, we can discard all the assorted info fields and keep only the assignment.

Compiles Asm-pred-lang-v6/spilled to Asm-pred-lang-v6/assignments by allocating all abstract locations in the locals set to free frame variables.

Finally, we actually replace abstract locations with physical locations. Below we define Nested-asm-lang-fvars v6, typeset with differences compared to Nested-asm-lang v5.

  p ::= (module (define label tail) ... tail)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (assignment ((aloc rloc) ...))))
     
  frame ::= (aloc ...)
     
  tail ::= (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
     
  rloc ::= register?
  | fvar?
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= reg
  | fvar
     
  trg ::= loc
  | label
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  fvar ::= fvar?
     
  label ::= label?

We need to update the pass to handle return-points.

Compiles Asm-pred-lang v6/assignments to Nested-asm-lang-fvars v6 by replacing all abstract location with physical locations using the assignment described in the assignment info field.

Example:
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       replace-locations
       assign-frame-variables
       assign-registers
       allocate-frames
       assign-call-undead-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions
       impose-calling-conventions
       normalize-bind
       sequentialize-let)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (call L.swap.1 y.2 x.1)])
                   z.3))))
         (call L.swap.1 1 2)))))

(module

  (define L.swap.1

    (begin

      (set! fv2 r15)

      (set! r14 fv0)

      (set! r15 fv1)

      (if (< r15 r14)

        (begin (set! rax r14) (jump fv2))

        (begin

          (begin

            (set! rbp (- rbp 24))

            (return-point L.rp.6

              (begin

                (set! fv4 r14)

                (set! fv3 r15)

                (set! r15 L.rp.6)

                (jump L.swap.1)))

            (set! rbp (+ rbp 24)))

          (set! r15 rax)

          (set! rax r15)

          (jump fv2)))))

  (begin

    (set! r15 r15)

    (set! fv1 2)

    (set! fv0 1)

    (set! r15 r15)

    (jump L.swap.1)))

1.8.8 Adjusting Frame Variables

We changed the invariants on fbp, the current-frame-base-pointer-register, when added the stack of frames. We now allow it to be incremented and decremented by an integer literal. This affects how we implement frame variables.

Previously, the frame variables fv1 represented the address (fbp - 8) in all contexts. However, now the compilation is non-trivial, as it must be aware of increments and decrements to the fbp.

Consider the example snippet
(begin
  (set! rbp (- rbp 8))
  (return-point L.rp.8
    (begin
      (set! rdi fv3)
      (jump L.f.1)))
  (set! rbp (+ rbp 8)))

In this example, the frame variable fv3 is being passed to the procedure L.f.1 in a non-tail call. fv3 does not refer to 3rd frame variable on callee’s, but the 3rd frame variable on the caller’s frame. Since the frame is allocated prior to the return point, we need to fix-up this index by translating frame variables relative to frame allocations introduced around return points.

To do this, we change implement-fvars to be aware of the current fbp offset. We can do this more easily by relocating implement-fvars in the compiler pipeline, to before expose-basic-blocks. This allows the compiler to make use of the nesting structure of the program while tracking changes to fbp.

To update implement-fvars, we need to keep an accumulator of the current offset from the base of the frame. On entry to a block, frame variables start indexing from the base of the frame, so the offset is 0. So, fv3 corresponds to (fbp - 24) ((- (* 3 (current-word-size-bytes)) 0)). After pushing/allocating a frame, such as (set! fbp (- fbp 8)), fv3 corresponds to (fbp - 16) ((+ (* 3 (current-word-size-bytes)) -8)). After popping/deallocating a frame, such as (set! fbp (+ fbp 8)) fv3 corresponds to (fbp - 24) ((+ (* 3 (current-word-size-bytes)) 0)) again.

Recall that fbp is only incremented or decremented by integer literal values, like those generated by allocate-frames. Other assignments to fbp are invalid programs. This means we don’t have to consider complicated data flows into fbp.

The source language for implement-fvars, Nested-asm-lang-fvars v6, is defined below typeset with respect to Nested-asm-lang v5.

  p ::= (module (define label tail) ... tail)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  binop ::= *
  | +
  | -
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  opand ::= loc
  | int64
     
  triv ::= opand
  | label
     
  loc ::= reg
  | fvar
     
  trg ::= loc
  | label
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  fvar ::= fvar?
     
  label ::= label?

The language does not change much, only adding a new binop.

The target language simply changes fvars to addrs. We define Nested-asm-lang-v6 below.

  p ::= (module (define label tail) ... tail)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  fvar ::= fvar?
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  triv ::= opand
  | label
     
  opand ::= loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

All languages following this pass need to be updated to use addrs instead of fvars. This should not affect most passes.

Reifies fvars into displacement mode operands.

1.8.9 Implementing Return Points

Finally, to accommodate non-tail calls, we introduced a new abstraction: return points. We must now implement this abstraction.

To implement return points, we need to compile all the instructions following the return points into labelled blocks, since that is our low-level implementation of labels. We lift all the instructions following the return point into a new block, and merge the tail implementing the call into the begin of the caller. Essentially, we transform:
`(begin
   ,ss1 ...
   ,(return-point ,rp ,tail)
   ,ss2 ...)
into:
`(define ,rp (begin ,ss2 ...))
`(begin
   ,ss1 ...
   ,tail)

This transformation is part of the expose-basic-blocks pass, which lifts many inline blocks into top-level explicitly labelled blocks, and should now do the same for return points.

The target language of the transformation is Block-pred-lang v6, defined below as a change over Block-pred-lang v5.

  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  loc ::= reg
  | addr
  | fvar
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  aloc ::= aloc?
     
  fvar ::= fvar?
     
  label ::= label?
  p ::= (module b (define label tail) ... b tail)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg) tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ...)
  | (if pred effect effect)
  | (return-point label tail)
     
  loc ::= reg
  | addr
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?
  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  triv ::= opand
  | label
     
  opand ::= loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

There are no major differences, since we are compiling a new abstraction into an old one. Note that only - has been added to the binops.

Compile the Nested-asm-lang v6 to Block-pred-lang v6, eliminating all nested expressions by generate fresh basic blocks and jumps.

Example:
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       expose-basic-blocks
       implement-fvars
       replace-locations
       assign-frame-variables
       assign-registers
       allocate-frames
       assign-call-undead-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions
       impose-calling-conventions
       normalize-bind
       sequentialize-let)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (call L.swap.1 y.2 x.1)])
                   z.3))))
         (call L.swap.1 1 2)))))

(module

  (define L.__main.8

    (begin

      (set! r15 r15)

      (set! (rbp - 8) 2)

      (set! (rbp - 0) 1)

      (set! r15 r15)

      (jump L.swap.1)))

  (define L.swap.1

    (begin

      (set! (rbp - 16) r15)

      (set! r14 (rbp - 0))

      (set! r15 (rbp - 8))

      (if (< r15 r14) (jump L.__nested.9) (jump L.__nested.10))))

  (define L.rp.7

    (begin

      (set! rbp (+ rbp 24))

      (set! r15 rax)

      (set! rax r15)

      (jump (rbp - 16))))

  (define L.__nested.9 (begin (set! rax r14) (jump (rbp - 16))))

  (define L.__nested.10

    (begin

      (set! rbp (- rbp 24))

      (set! (rbp - 8) r14)

      (set! (rbp - 0) r15)

      (set! r15 L.rp.7)

      (jump L.swap.1))))

1.8.10 Final Passes

resolve-predicates and flatten-program should not need any real changes, since they don’t inspect binary operations and didn’t inspect at frame variables

We define Block-asm-lang v6 below.

  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if (relop loc opand) (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  loc ::= reg
  | addr
  | fvar
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  fvar ::= fvar?
  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if (relop loc opand) pred (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  loc ::= reg
  | addr
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if (relop loc opand) (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  triv ::= opand
  | label
     
  opand ::= loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

Compile the Block-pred-lang v6 to Block-asm-lang v6 by manipulating the branches of if statements to resolve branches.

We define Para-asm-lang v6 below.

  p ::= (begin s ...)
     
  s ::= (halt opand)
  | (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  loc ::= reg
  | addr
  | fvar
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  fvar ::= fvar?
     
  label ::= label?
  p ::= (begin s ...)
  | (module b ... b)
     
  b ::= (define label tail)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if (relop loc opand) (jump trg) (jump trg))
     
  s effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?
  p ::= (begin s ...)
     
  s ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  triv ::= opand
  | label
     
  opand ::= loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  int64 ::= int64?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

Compile Block-asm-lang v6 to Para-asm-lang v6 by flattening basic blocks into labeled instructions.

The only two passes that should require changes are patch-instructions and generate-x64. The languages change only in minor ways.

Para-asm-lang v6 is defined below.

  p ::= (begin s ...)
     
  s ::= (halt opand)
  | (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  loc ::= reg
  | addr
  | fvar
     
  binop ::= *
  | +
  | -
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  fvar ::= fvar?
     
  label ::= label?
  p ::= (begin s ...)
  | (module b ... b)
     
  b ::= (define label tail)
     
  tail ::= (jump trg)
  | (begin effect ... tail)
  | (if (relop loc opand) (jump trg) (jump trg))
     
  s effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  dispoffset ::= dispoffset?
     
  label ::= label?
  p ::= (begin s ...)
     
  s ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (compare loc opand)
  | (jump-if relop trg)
     
  triv ::= opand
  | label
     
  opand ::= loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  int64 ::= int64?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

patch-instructions should be updated to work over addrs instead of fvarss. This can be done by changing a few predicates, depending on the design.

Paren-x64 v6 is defined below.

  p ::= (begin s ...)
     
  s ::= (set! addr int32)
  | (set! addr trg)
  | (set! reg loc)
  | (set! reg triv)
  | (set! reg_1 (binop reg_1 int32))
  | (set! reg_1 (binop reg_1 loc))
  | (with-label label s)
  | (jump trg)
  | (compare reg opand)
  | (jump-if relop label)
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  binop ::= *
  | +
  | -
     
  int64 ::= int64?
     
  int32 ::= int32?
     
  dispoffset ::= dispoffset?
     
  label ::= label?
  p ::= (begin s ...)
     
  s ::= (set! addr int32)
  | (set! addr trg)
  | (set! reg loc)
  | (set! reg loc triv)
  | (set! reg_1 (binop reg_1 int32))
  | (set! reg_1 loc_1 (binop reg_1 loc loc_1 opand))
  | (jump trg)
  | (with-label label s)
  | (jump trg)
  | (compare reg loc opand)
  | (jump-if relop label trg)
     
  trg ::= reg
  | label
     
  triv ::= trg
  | int64
  | opand
  | label
     
  opand ::= reg
  | loc
  | int64
     
  trg ::= loc
  | label
     
  loc ::= reg
  | addr
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  int32 ::= int32?
     
  dispoffset ::= dispoffset?
     
  label ::= label?
  p ::= (begin s ...)
     
  s ::= (set! addr int32)
  | (set! addr trg)
  | (set! reg loc)
  | (set! reg triv)
  | (set! reg_1 (binop reg_1 int32))
  | (set! reg_1 (binop reg_1 loc))
  | (with-label label s)
  | (jump trg)
  | (compare reg opand)
  | (jump-if relop label)
     
  trg ::= reg
  | label
     
  triv ::= trg
  | int64
     
  opand ::= reg
  | int64
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (fbp - dispoffset)
     
  fbp ::= frame-base-pointer-register?
     
  binop ::= *
  | +
  | -
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
     
  int32 ::= int32?
     
  dispoffset ::= dispoffset?
     
  label ::= label?

Compile the Para-asm-lang v6 to Paren-x64 v6 by patching instructions that have no x64 analogue into to a sequence of instructions and an auxiliary register from current-patch-instructions-registers.

generate-x64 needs to be updated to generate the new binop. Ideally, there is a separate helper for generating binops, so this is only a minimal change.

The new binary operation, -, corresponds to the x64 instruction sub, and has the same restrictions as the add and mul instructions.

procedure

(generate-x64 p)  (and/c string? x64-instructions?)

  p : paren-x64-v6?
Compile the Paren-x64 v6 program into a valid sequence of x64 instructions, represented as a string.

1.8.11 Appendix: Overview

%3L0Values-lang v6L0->L0 check-values-langL1Values-unique-lang v6L0->L1 uniquifyL3Imp-mf-lang v6L1->L3 sequentialize-letL2Proc-imp-cmf-lang v6L3->L2 normalize-bindL4Imp-cmf-lang v6L5Asm-pred-lang v6L4->L5 select-instructionsL2->L4 impose-calling-conventionsL6Asm-pred-lang v6/localsL5->L6 uncover-localsL7Asm-pred-lang v6/undeadL6->L7 undead-analysisL8Asm-pred-lang v6/conflictsL7->L8 conflict-analysisL81Asm-pred-lang v6/pre-framedL8->L81 assign-call-undead-variablesL82Asm-pred-lang v6/framedL81->L82 allocate-framesL83Asm-pred-lang v6/spilledL82->L83 assign-registersL9Asm-pred-lang v6/assignmentsL83->L9 assign-frame-variablesL10Nested-asm-lang-fvars v6L9->L10 replace-locationsL10->L10 optimize-predicatesL10_1Nested-asm-lang v6L10->L10_1 implement-fvarsL11Block-pred-lang v6L10_1->L11 expose-basic-blocksL12Block-asm-lang v6L11->L12 resolve-predicatesL12_1Para-asm-lang v6L12->L12_1 flatten-programL16Paren-x64 v6L12_1->L16 patch-instructionsL14x64L15integerL14->L15 executeL16->L14 generate-x64L16->L15 interp-paren-x64L17Paren-x64-rt v6L16->L17 link-paren-x64L17->L15 interp-loop

Figure 6: Overview of Compiler Version 6