1.5 Register Allocation

1.5.1 Preface: What’s wrong with our language?

With Asm-lang v2, we introduced abstract locations to free the programmer from thinking about physical locations. Unfortunately, our implementation strategy has a severe limitation. While it’s simple and works, it generates extremely slow code! Each and every variable assignment or reference accesses memory. While memory accesses have improved a lot compared to old computers due to caching, accessing memory is still orders of magnitude slower than accessing a register when our variable is not in the cache (and, in general, it won’t be in the cache). Our compiler will have better performance if we help the machine out by using registers as much as possible.

Assigning abstract locations to registers automatically is a non-trivial task. We started out by compiling to frame variables because there are infinitely many frame variables, but only 16 registers. Actually, fewer than 16, since the compiler and run-time system reserve some of those for various purposes. To assign an abstract location a new frame variable is trivial—just pick a new one, there’s always a new one, just like abstract locations. This one-to-one mapping between abstractions means assignment is a simple, systematic algorithm. To assign an abstract location to a register, however, we need to pick a register that isn’t currently in use, and figuring out which registers aren’t in use requires a clever program analysis.

In general, being clever should be a last resort.

1.5.2 Optimizing Location Assignment

Conceptually, register allocation is a simple idea.

Undead analysis: figure out which abstract locations might still be needed after each instruction.
Conflict analysis: figure out which abstract locations cannot be assigned to the same physical location because they both contain values that are needed at the same time.
Register allocation: assign each abstract locations to a register that is different from any conflicting abstract locations.
Spilling: if we fail to find a register for an abstract location, put it in a frame variable instead.

Our Asm-lang v2 compiler is already a trivial implementation of this algorithm. It assumes every abstract location might be needed forever, and is in conflict with every other abstract location. It doesn’t bother trying, and thus fails, to find a register for every abstract location, and so spills everything to the frame. But we want to optimize this process.

In general, we will never do a perfect job, due to Rice’s Theorem. Rice’s Theorem is a huge buzz kill. It tells us, formally, that we cannot decide anything "interesting" about any Turing-complete program. Formally, "interesting" is defined as any property that is not either true for every program, or false for every program. "Decide" means write a program that returns either true or false on all programs. The halting program is an instance of Rice’s Theorem. Rice’s Theorem tells us we will never be able to do a perfect job—not when optimizing, not when analyzing, not when asking whether two programs are equivalent, not when deciding which program is "better". We will only be able to make trade-offs. We want to try to do a better job. Well, not better—we can never decide what is "better" due to Rice’s Theorem. But we’d like to produce code that runs faster, and are willing to trade a more complex and slower compiler for it.

Since register allocation is a modification of our existing abstract location assignment strategy, we already have a source and target language in mind. We start from an existing language Asm-lang v2/locals, which is reproduced below:

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

This language, the output of uncover-locals, records which abstract locations are referenced in the program and thus need to be assigned physical locations. The register allocator takes over from here to perform that assignment.

And we use Asm-lang v2/assignments, reproduced below.

DiffAsm-lang v2/assignmentsAsm-lang v2/locals

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc loc) ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

loc	::=	reg
	\|	fvar

reg	::=	rsp
	\|	rbp
	\|	rax
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r12
	\|	r13
	\|	r14
	\|	r15

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

fvar	::=	fvar?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc loc) ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

loc	::=	reg
	\|	fvar

reg	::=	rsp
	\|	rbp
	\|	rax
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r12
	\|	r13
	\|	r14
	\|	r15

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

fvar	::=	fvar?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

After this, our existing pass replace-locations should handle replacing abstract locations by the assigned physical locations.

Any compiler optimization should be seen as a transformation between programs in the same language, i.e., an intra-language transformation. The point is not to introduce a new layer of abstraction, but to improve some performance metric of a program in some language. In this case, the optimization happens in Asm-lang v2/assignments; all the programs we generate are equivalent to a program that only uses frame variables in its assigments field. In this case, performing the optimization requires additional steps of analysis, so we introduce several administrative languages prior to Asm-lang v2/assignments.

To emphasize that this is an optimization, we design assign-homes-opt as a drop-in replacement for assign-homes.

procedure
(assign-homes-opt p) → nested-asm-lang-v2?
p : asm-lang-v2?
Compiles Asm-lang v2 to Nested-asm-lang v2, replacing each abstract location with a physical location. This version performs graph-colouring register allocation.

This pass is still a composition of several passes, which we design next.

1.5.3 Undeadness Analysis

We begin the register allocation by figuring out which locations might still be needed after each instruction. This process is often called liveness analysis, but we refer to it as undeadness analysis for reasons we explain shortly.

A variable (either abstract location or physical location) with a particular value that definitely will be used is considered live, and any variable (with a particular value) that definitely won’t be used is considered dead.

Recall that due to Rice’s Theorem, we know it’s generally impossible to decide whether a variable is dead or alive. This means that when writing an analysis, we must assume partial knowledge. The result is that we ignore liveness entirely, which is hard to prove, and focus on deadness, which is easy to prove.

Asm-lang v2/locals is a simple enough language that we can tell whether a variable is dead or alive. Later, when we add new instructions, we will modify the undead analysis and find variables that aren’t necessarily live or dead, and must be assumed to be undead. This is also necessary if we want to handle linking, or separate compilation.

We cannot in general decide the value of a variable at a given instruction. Instead, we focus on analyzing each assignment to a variable, which might change the value of the variable.

We assume that any variable that gets used, or might get used, might be not dead, i.e., we assume it is undead, and consider a variable dead only when we have conclusive proof—like witnessing an instruction driving a new value through its heart, er, storing a new value in that variable.

We collect the undead variables into sets. The undead-out set of an instruction is a set of variables that are undead after executing that instruction.

Most compilers call these live-out or live-after sets. This suggests that variables are definitely alive, and that the analysis is computing sets of variables that are definitely alive, neither of which is true. Undead is not exactly the same as not-definitely-dead, but it’s more suggestive of reality for compilers.

We design an algorithm for computing the undead-out sets by taking a single linear pass over the program. Our algorithm is fast, but slightly counter-intuitive to implement.

To compute the undead-out sets for each instruction, we analyze the program by looping over the instruction sequence backwards, starting with the last statement. The loop takes an instruction and its undead-out set as input. We analyze the instruction with its undead-out set and compute an undead-in set for the instruction, which is the same as the undead-out set of the preceding instruction in the program. That is, the undead-in set for an instruction s_i is the undead-out set for the instruction s_{i-1} in the program.

To start the loop, this algorithm requires a default undead-out set for the last instruction in the scope of our analysis. The default undead-out set for Asm-lang v2/locals is the empty set—no variables are alive after the end of the program, as we are analyzing the entire program at once. In general, the default set may not be empty, depending on the language and the scope of the analysis. For example, in Paren-x64 v1, we assume rax after the program finishes. When we introduce functions, we will assume that return value locations are live at the end of functions. If we had global variables, we might need to assume they are live at the end of a program.

Each iteration of the loop performs the following analysis on a particular instruction. We start by assuming the undead-in set is the same as the undead-out set, then update it depending on what happens in the instruction. If a variable is defined, i.e., its value is overwritten in the instruction, it is definitely dead upon entry to this intruction, so we remove it from the undead-in set. If a variable is referenced in the instruction, it ought to be live—we don’t actually know, since that instruction’s result itself might not be used, or the instruction itself might never be executed, but the variable is at least acting like it’s live—and is added to the undead-in set.

This algorithm creates the undead-out sets for each instruction so that later passes can associate each instruction with its undead-out set. There are many ways to associate the undead-out sets with instructions. We choose a representation that implicitly associates each set with an instruction. Since our programs are trees of instructions, we collect the undead-out sets for each instruction into a tree of undead-out sets, whose tree structure mirrors the program’s tree structure. We define the data undead-set tree to mirror the structure of Asm-lang v2 programs.

An undead-set tree is one of:

an undead-out set (aloc ...), corresponding to the undead-out set for a single instruction such as (halt triv) or (set! aloc triv).
a list of undead-set trees, (undead-set-tree?_1 ... undead-set-tree?_2), corresponding to a begin statement (begin effect_1 ... effect_2) or (begin effect_1 ... tail). The first element of the list represents undead-set tree for the first effect, the second element represents the undead-set tree for the second effect, and so on.

We always traverse an undead-set tree together with a corresponding program; the undead-set tree cannot be interpreted in isolation. For example, given a list of undead-set trees undead-outs and a list of effects ss, we would traverse with the following template:

(define (fn-for-s-and-undead-outs ss undead-outs)
  (match (cons ss undead-outs)
    [(cons '() '())
     (... case-for-empty ...)]
    [(cons `(,s ,rest-ss ...) `(,undead-out ,rest-undead-outs ...))
     (... (fn-for-s-and-undead-out s undead-out)
          (fn-for-ss-and-undead-outs rest-ss rest-undead-outs))]))

To describe the output of the analysis, we define a new administrative language. We collect the undead-set tree a new info field. Below, we define Asm-lang v2/undead. The only change compared to Asm-lang v2/locals is in the info field, so we typeset the difference in the info field.

DiffAsm-lang v2/undeadAsm-lang v2/locals

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree?)))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree?)))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

procedure
(undead-analysis p) → asm-lang-v2/undead?
  p : asm-lang-v2/locals?
Performs undeadness analysis, decorating the program with undead-set tree. Only the info field of the program is modified.
Examples:
> (undead-analysis
   '(module ((locals (x.1)))
      (begin
        (set! x.1 42)
        (halt x.1))))
'(module
   ((locals (x.1)) (undead-out ((x.1) ())))
   (begin (set! x.1 42) (halt x.1)))
> (undead-analysis
   '(module ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1)))
      (begin
        (set! v.1 1)
        (set! w.2 46)
        (set! x.3 v.1)
        (set! p.1 7)
        (set! x.3 (+ x.3 p.1))
        (set! y.4 x.3)
        (set! p.1 4)
        (set! y.4 (+ y.4 p.1))
        (set! z.5 x.3)
        (set! z.5 (+ z.5 w.2))
        (set! t.6 y.4)
        (set! p.1 -1)
        (set! t.6 (* t.6 p.1))
        (set! z.5 (+ z.5 t.6))
        (halt z.5))))
'(module
   ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1))
    (undead-out
     ((v.1)
      (v.1 w.2)
      (x.3 w.2)
      (p.1 x.3 w.2)
      (x.3 w.2)
      (y.4 x.3 w.2)
      (p.1 y.4 x.3 w.2)
      (x.3 w.2 y.4)
      (w.2 z.5 y.4)
      (y.4 z.5)
      (t.6 z.5)
      (p.1 t.6 z.5)
      (t.6 z.5)
      (z.5)
      ())))
   (begin
     (set! v.1 1)
     (set! w.2 46)
     (set! x.3 v.1)
     (set! p.1 7)
     (set! x.3 (+ x.3 p.1))
     (set! y.4 x.3)
     (set! p.1 4)
     (set! y.4 (+ y.4 p.1))
     (set! z.5 x.3)
     (set! z.5 (+ z.5 w.2))
     (set! t.6 y.4)
     (set! p.1 -1)
     (set! t.6 (* t.6 p.1))
     (set! z.5 (+ z.5 t.6))
     (halt z.5)))

An important corner case to consider is what happens when unused variables appear in a program.

Examples:

> (undead-analysis
   '(module ((locals (x.1 y.1)))
      (begin
        (set! y.1 42)
        (set! x.1 5)
        (halt x.1))))
'(module
   ((locals (x.1 y.1)) (undead-out (() (x.1) ())))
   (begin (set! y.1 42) (set! x.1 5) (halt x.1)))
> (undead-analysis
   '(module ((locals (x.1 y.1)))
      (begin
        (set! x.1 5)
        (set! y.1 42)
        (halt x.1))))
'(module
   ((locals (x.1 y.1)) (undead-out ((x.1) (x.1) ())))
   (begin (set! x.1 5) (set! y.1 42) (halt x.1)))

We could use the undead information to design an optimization to remove dead variables. However, to separate concerns, we should not do this during register allocation. Instead, later, we’ll design a more general optimization pass that happens before register allocation.

1.5.4 Conflict Analysis

To assign abstract locations to physical locations efficiently, we need to know when any two variables are in conflict, i.e., contain different values might be in use at the same time.

We start by defining what a conflict is precisely.

True Definition of Conflict: Any variable that gets a new value during an instruction is in conflict with every variable that (1) has a different value at the same time and (2) will still be used after that instruction.

Unfortunately, due to Rice’s Theorem, we cannot decide either property. We cannot figure out the value of every variable before run time; if we could, compiling would not be necessary. We also do not know which variables are live, only which are undead. Therefore, we can only approximate conflicts.

To approximate conflicts, we ignore values, and once more focus on assignments to variables. An assignment to the variable means it might take on a new value that might be different from the value of any variable which might be live at that point. We have already approximated liveness via undead-out sets, so what remains is to approximate when a variable takes on a new value. Below, we describe how to approximate conflicts and slightly refine these criteria.

We choose to represent conflicts in a data structure called a conflict graph. Interpreted as an undirected graph, the variables are represented as nodes (also known as vertexes), and conflicts between variables are represented as an edge from the variable to each of the variables in the associated set of conflicts. If there is an edge between any two nodes, then they are in conflict. Interpreted as a dictionary, the conflict graph maps each variable to a set of variables with which it is in conflict.

We create a conflict graph from the undead-out sets as follows. Any variable that is defined during an instruction is in conflict with every variable (except itself) in the undead-out set associated with that instruction. For example, the variable x.1 is defined in (set! x.1 (+ x.1 x.2)), while the variables x.2 and x.1 are referenced. No variable can conflict with itself, since it always has the same value as itself. We approximate the values of variables by assuming each definition changes the variable’s value, and by assuming each variable in the undead-out set also has a unique value. This approximation tells us that x.1 cannot be assigned the same physical location as any other variable in the undead-out set. If x.2 is undead-out at this instruction, and we try to put x.1 in the same physical location as x.2, then the value of x.2 would be overwritten by the value of (+ x.1 x.2).

We can reduce the number of conflicts, and thus possibly fit more variables into registers, by observing that one instruction does tell us that two values will be the same. A move instruction, such as (set! x.1 x.2), is an instruction that simply defines the value of one variable to be that of another.

Approximation of Conflict

Any variable defined during a non-move instruction is in conflict with every variable (except itself) in the undead-out set associated with the instruction.
Any variable defined during a move instruction is in conflict with every variable in the undead-out set associated with the instruction, except itself and the variable referenced in the move.

To implement conflict analysis, we design the language Asm-lang v2/conflicts to capture the conflict graph. We typeset the difference compared to Asm-lang v2/undead.

DiffAsm-lang v2/conflictsAsm-lang v2/undead

info

::=

(#:from-contract (info/c (locals (aloc ...)) (conflicts ((aloc (aloc ...)) ...)) (undead-out undead-set-tree?)))

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (conflicts ((aloc (aloc ...)) ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree?)))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

The info field is extended with a conflict graph, represented as an association list from a variable to sets of abstract locations.

As in Asm-lang v2/undead, the info field also contains a declaration of the abstract locations that may be used in the program, and (as usual) possibly other non-required but useful information.

To implement conflict analysis, we simultaneously traverse a program with its undead-set tree, and analyze each instruction according to the approximate conflict definition above. We start with a graph that initially contains a node for every abstract location in the locals set, and extend the graph with conflicts as we discover them.

procedure
(conflict-analysis p) → asm-lang-v2/conflicts?
  p : asm-lang-v2/undead?
Decorates a program with its conflict graph.
Examples:
> (conflict-analysis
   '(module ((locals (x.1))
             (undead-out ((x.1) ())))
      (begin
        (set! x.1 42)
        (halt x.1))))
'(module
   ((locals (x.1)) (conflicts ((x.1 ()))))
   (begin (set! x.1 42) (halt x.1)))
> (conflict-analysis
   '(module ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1))
             (undead-out
              ((v.1)
               (v.1 w.2)
               (w.2 x.3)
               (p.1 w.2 x.3)
               (w.2 x.3)
               (y.4 w.2 x.3)
               (p.1 y.4 w.2 x.3)
               (y.4 w.2 x.3)
               (z.5 y.4 w.2)
               (z.5 y.4)
               (t.6 z.5)
               (t.6 z.5 p.1)
               (t.6 z.5)
               (z.5)
               ())))
      (begin
        (set! v.1 1)
        (set! w.2 46)
        (set! x.3 v.1)
        (set! p.1 7)
        (set! x.3 (+ x.3 p.1))
        (set! y.4 x.3)
        (set! p.1 4)
        (set! y.4 (+ y.4 p.1))
        (set! z.5 x.3)
        (set! z.5 (+ z.5 w.2))
        (set! t.6 y.4)
        (set! p.1 -1)
        (set! t.6 (* t.6 p.1))
        (set! z.5 (+ z.5 t.6))
        (halt z.5))))
'(module
   ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1))
    (conflicts
     ((p.1 (z.5 t.6 y.4 x.3 w.2))
      (t.6 (p.1 z.5))
      (z.5 (p.1 t.6 w.2 y.4))
      (y.4 (z.5 x.3 p.1 w.2))
      (x.3 (y.4 p.1 w.2))
      (w.2 (z.5 y.4 p.1 x.3 v.1))
      (v.1 (w.2)))))
   (begin
     (set! v.1 1)
     (set! w.2 46)
     (set! x.3 v.1)
     (set! p.1 7)
     (set! x.3 (+ x.3 p.1))
     (set! y.4 x.3)
     (set! p.1 4)
     (set! y.4 (+ y.4 p.1))
     (set! z.5 x.3)
     (set! z.5 (+ z.5 w.2))
     (set! t.6 y.4)
     (set! p.1 -1)
     (set! t.6 (* t.6 p.1))
     (set! z.5 (+ z.5 t.6))
     (halt z.5)))

1.5.5 Register Allocation

Register allocation, as in the step that actually assigns abstract locations to physical locations, takes the set of abstract locations to assign homes, the conflict graph, and some set of assignable registers, and tries to assign each abstract location to a register. As usual, Rice’s Theorem tells us we’ll never be able to decide, in general, the maximal number of abstract locations we can fit in registers. We’ll have to approximate.

We’ll use graph-colouring register allocation, an algorithm that is quadratic (and slows compile time down quite a lot), but usually assigns more abstract locations to registers than other faster algorithms.

It’s worth noting that since this core pass is quadratic, compile time is dominated by this single pass. One might be tempted to try to fuse many of our small compiler passes to save compile time, but most of our compile passes are linear, and have essentially no effect on compile time compared to graph-colouring register allocation.

The core algorithm has a straight-forward recursive description. We recur over the set of locals and produce an assignment, i.e., a dictionary mapping abstract locations to physical locations.

If the set of abstract locations is empty, return the empty assignment.
Otherwise, choose a low-degree abstract location from the input set of abstract locations, if one exists. Otherwise, pick an arbitrary abstract location from the set.
A low-degree abstract location is one with fewer than k conflicts, for some for pre-defined k. We pick k to be the number of registers in the set of assignable registers.
You can simply sort the abstract locations in degree-order and pick the first one, on each iteration of the loop, although this may not be the fastest way to pick a low-degree node. Note that abstract locations are removed from the conflict graph, so we must re-sort or otherwise calculate low-degree on each iteration.
Recur with the chosen abstract location removed from the input set and the conflict graph. The recursive call should return an assignment for all the remaining abstract locations.
Attempt to select a register for the chosen abstract location. You cannot select registers to which conflicting abstract locations were assigned by the recursive call. This attempt succeeds if a low-degree abstract location was chosen, and might fail otherwise (but it depends on which registers got allocated in the recursive call).
- If you succeed in selecting a register, then add the assignment for the chosen abstract location to the result of the recursive call.
- Otherwise, we cannot assign the chosen abstract location to a register. Instead, we spill it, i.e., we assign it a frame variable. We can assign a fresh variable, but we can reduce memory usage by trying to assign a non-conflicting frame variable.

This algorithm is due to R. Kent Dybvig, itself an adaptation of the optimistic register allocation described in "Improvements to graph coloring register allocation" (ACM TOPLAS 6:3, 1994) by Preston Briggs, et al.

To describe the output of the register allocator, we reuse Asm-lang v2/assignments. Below, we typeset the changes compared to Asm-lang v2/conflicts. Note only the info field changes.

DiffAsm-lang v2/assignmentsAsm-lang v2/conflicts

info

::=

(#:from-contract (info/c (locals (aloc ...)) (assignment conflicts ((aloc loc (aloc ...)) ...))))

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc loc) ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

loc	::=	reg
	\|	fvar

reg	::=	rsp
	\|	rbp
	\|	rax
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r12
	\|	r13
	\|	r14
	\|	r15

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

fvar	::=	fvar?

p	::=	(module info tail)

info	::=	(#:from-contract (info/c (locals (aloc ...)) (conflicts ((aloc (aloc ...)) ...))))

tail	::=	(halt triv)
	\|	(begin effect ... tail)

effect	::=	(set! aloc triv)
	\|	(set! aloc_1 (binop aloc_1 triv))
	\|	(begin effect ... effect)

triv	::=	aloc
	\|	int64

binop	::=	*
	\|	+

aloc	::=	aloc?

int64	::=	int64?

procedure
(assign-registers p) → asm-lang-v2/assignments?
  p : asm-lang-v2/conflicts?
Performs graph-colouring register allocation. The pass attempts to fit each of the abstract location declared in the locals set into a register, and if one cannot be found, assigns it a frame variable instead.
Examples:
> (assign-registers
   '(module ((locals (x.1))
             (conflicts ((x.1 ()))))
     (begin
       (set! x.1 42)
       (halt x.1))))
'(module
   ((locals (x.1)) (conflicts ((x.1 ()))) (assignment ((x.1 r15))))
   (begin (set! x.1 42) (halt x.1)))
> (parameterize ([current-assignable-registers '(r9)])
    (assign-registers
     '(module ((locals (x.1))
               (conflicts ((x.1 ()))))
       (begin
         (set! x.1 42)
         (halt x.1)))))
'(module
   ((locals (x.1)) (conflicts ((x.1 ()))) (assignment ((x.1 r9))))
   (begin (set! x.1 42) (halt x.1)))
> (parameterize ([current-assignable-registers '()])
    (assign-registers
     '(module ((locals (x.1))
               (conflicts ((x.1 ()))))
       (begin
         (set! x.1 42)
         (halt x.1)))))
'(module
   ((locals (x.1)) (conflicts ((x.1 ()))) (assignment ((x.1 fv0))))
   (begin (set! x.1 42) (halt x.1)))
> (assign-registers
   '(module ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1))
             (conflicts
              ((x.3 (z.5 p.1 y.4 v.1 w.2))
               (w.2 (z.5 p.1 y.4 v.1 x.3))
               (v.1 (w.2 x.3))
               (y.4 (t.6 z.5 p.1 w.2 x.3))
               (p.1 (t.6 z.5 y.4 w.2 x.3))
               (z.5 (t.6 p.1 y.4 w.2 x.3))
               (t.6 (z.5 p.1 y.4)))))
      (begin
        (set! v.1 1)
        (set! w.2 46)
        (set! x.3 v.1)
        (set! p.1 7)
        (set! x.3 (+ x.3 p.1))
        (set! y.4 x.3)
        (set! p.1 4)
        (set! y.4 (+ y.4 p.1))
        (set! z.5 x.3)
        (set! z.5 (+ z.5 w.2))
        (set! t.6 y.4)
        (set! p.1 -1)
        (set! t.6 (* t.6 p.1))
        (set! z.5 (+ z.5 t.6))
        (halt z.5))))
'(module
   ((locals (v.1 w.2 x.3 y.4 z.5 t.6 p.1))
    (conflicts
     ((x.3 (z.5 p.1 y.4 v.1 w.2))
      (w.2 (z.5 p.1 y.4 v.1 x.3))
      (v.1 (w.2 x.3))
      (y.4 (t.6 z.5 p.1 w.2 x.3))
      (p.1 (t.6 z.5 y.4 w.2 x.3))
      (z.5 (t.6 p.1 y.4 w.2 x.3))
      (t.6 (z.5 p.1 y.4))))
    (assignment
     ((p.1 r15) (z.5 r14) (y.4 r13) (x.3 r9) (w.2 r8) (t.6 r9) (v.1 r15))))
   (begin
     (set! v.1 1)
     (set! w.2 46)
     (set! x.3 v.1)
     (set! p.1 7)
     (set! x.3 (+ x.3 p.1))
     (set! y.4 x.3)
     (set! p.1 4)
     (set! y.4 (+ y.4 p.1))
     (set! z.5 x.3)
     (set! z.5 (+ z.5 w.2))
     (set! t.6 y.4)
     (set! p.1 -1)
     (set! t.6 (* t.6 p.1))
     (set! z.5 (+ z.5 t.6))
     (halt z.5)))

1.5.6 Appendix: Compiler Overview

Figure 3: Overview of Register Allocation Optimization

contents ← prev up next →

1	Book
2	Project Logistics
3	Project Iterations
4	CPSC 411 Infrastructure Package
5	Credits

1.1	A Compiler Begins with a Language
1.2	Abstracting Boilerplate (v1)
1.3	Abstract Locations
1.4	Value Orientation
1.5	Register Allocation
1.6	Structured Control Flow
1.7	Procedural Abstraction: Call
1.8	Procedural Abstraction: Return
1.9	Algebraic Expressions
1.10	Data types: Immediates
1.11	Data types: Structured Data and Heap Allocation
1.12	First-class Procedures: Code is Data
1.13	Recursive Data
1.14	Syntactic Sugar
1.15	Appendix: Racket Preliminaries
1.16	Appendix: Compiler Design Recipe

1.5.1	Preface: What’s wrong with our language?
1.5.2	Optimizing Location Assignment
1.5.3	Undeadness Analysis
1.5.4	Conflict Analysis
1.5.5	Register Allocation
1.5.6	Appendix: Compiler Overview