On this page:
1.7.1 Preface:   What’s wrong with our language?
1.7.2 Designing a source language with call
1.7.3 Calling Conventions Introduction
1.7.3.1 Designing the Calling Convention Translation
1.7.4 Extending front-end with support for call
check-values-lang
uniquify
sequentialize-let
normalize-bind
1.7.5 Implementing Calling Conventions
impose-calling-conventions
1.7.6 Exposing Jumps
select-instructions
1.7.6.1 Extending Register Allocation
uncover-locals
undead-analysis
conflict-analysis
assign-registers
replace-locations
1.7.6.2 Exposing Basic Blocks
optimize-predicates
expose-basic-blocks
1.7.7 Appendix:   Overview

1.7 Procedural Abstraction: Call

1.7.1 Preface: What’s wrong with our language?

In the last chapter, we designed the language Values-lang v4 with support for structured control-flow operations. This enables our programs to make run-time decisions that influence what code is executed. However, the language is missing a key feature necessary for practical programming: the ability to reuse code.

In this chapter, we introduce a common method of reusing code: procedural abstraction. We introduce a limited form of procedure that is essentially similar to the basic blocks from the last chapter. This feature is a thin layer of abstraction over the jmp instruction that we can safely expose in the source language. Our low-level language already exposes jmp, so we have all the machinery we need from our low-level languages. However, we need to solve a critical problem: designing a calling convention.

Procedures are sometimes called functions. This is often a misnomer, since "function" evokes a purely mathematical construct specifying input and output pairs that does not necessarily give rise to an algorithm, while a procedure is code that executes algorithmically, and may rely on machine state in addition to its declared inputs and outputs.

Normally, procedures are thought about as composed of two features: call and return. However, these are two separate abstractions. In this chapter, we introduce only the call abstraction. Procedures can be called, but never return. We limit where such a call can happen so we do not have to answer inconvenient questions.

1.7.2 Designing a source language with call

Below, we define Values-lang v5 by extending Values-lang-v4 with a tail callsa procedure call in tail position.

  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  pred ::= (relop triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
     
  triv ::= int64
  | x
     
  x ::= name?
  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  pred ::= (relop triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
     
  triv ::= int64
  | x
     
  x ::= name?
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?

A program now begins with a set of declared procedures, named blocks of code that are parameterized over some data indicated by the list of names (x ...), called the parameters. Each of these procedures can be used by the call abstraction (call name value ...), which takes the name of a procedure and a list of values, called the arguments, with which to instantiate the parameters. At run-time, the call unconditionally transfers control to the code declared in the definition of the procedure name.

Notice this call is restricted to tail context. If we allowed a call in value context, for example, then we could have a call on the right-hand side of a let. This would transfer control with a tail still remaining to be executed. Defining what this means in a sensible way is difficult, and requires introducing some abstraction to transfer control back from a procedure to the middle of some existing computation. Recall that a value in tail context is implicitly the final answer, and is compiled to a halt instruction. In terms of the lower-level languages, we can understand a tail call as jumping until the program reaches a halt instruction.

We already have the machinery to compile procedure definitions and calls. Procedure definitions are transformed into basic blocks, and calls into, essentially, jmp instructions.

The only question is how to pass arguments. The call instruction needs to know in which locations to store the arguments, and the called procedure needs to know from which locations to read its parameters. The problem is deciding how to ensure the locations end up the same. To solve this, we introduce a calling convention.

1.7.3 Calling Conventions Introduction

The calling convention gives us a pattern to set up a call to any procedure. We fix a set of physical locations. Both the caller and the callee agree that those locations are the only thing they need to know about. Every call will first set those locations and pass control to the procedure. Every procedure will read from those locations on entry, and move the arguments into its own abstract locations. This way, no procedure needs to know about another’s abstract locations. This allows our register allocator to continue functioning as is, with only a small change in scope from the entire program to procedure definition.

Design digression:
Strictly speaking, we don’t need to use physical locations for our calling convention. What we need is a set of global, shared locations, whose names are unique across the entire program, and any program that we might link with. The register allocator needs to assign all uses of these shared locations to the same physical locations.

A simple implementation of these global shared location is to use physical locations. Physical locations are automatically globally consistent and unique, and automatically known by all programs we might link with. If we allow those physical locations to be registers, we can generate very fast procedure calls by keeping memory out of the picture as much as possible. Unfortunately, using them requires that we expose physical locations through every layer of abstraction up to the point where we implement the calling convention. This makes all our abstractions only partial abstractions, and injects undefined behaviour back into our intermediate languages.

We could also use the stack to implement the calling convention. This is simpler, as we can keep registers abstract and need to expose memory high in the compiler pipeline anyway, but slower since every procedure call must now access memory.

We could try to create global abstract locations. However, then our register allocator will only work over whole programs; we won’t be able to support separate compilation, without implementing a separate, lower-level calling convention. This would create unnecessary indirection and be less efficient.

We could try to introduce abstract call-setup and return-from-call instructions, and leave it to the register allocator to implement these in terms of physical locations. This complicates the register allocator, an already complicated pass. It also mixes concerns: assigning abstract locations to physical locations, and generating the instructions to implement call and return. At some level, the register allocator will have to perform the transformation we’re already suggesting: first setup physical locations for the call, create conflicts between those physical locations and abstract locations, then assign registers. At very least, we need to do this before conflict analysis, so we might as well do it before all the analyses. This leads us right back to the original design: expose physical locations through the register allocator, high in the compiler pipeline.

Our calling convention is based on the x64 System V ABI. We deviate from it slightly and develop a similar, but simplified, calling convention. The calling convention is defined by parameters, which will let us be abstract with respect to particular choices in the calling convention.

These parameters are all defined in cpsc411/compiler-lib.

Our calling convention passes the first n arguments as registers, using the set of registers defined in the parameter current-parameter-registers. The default value is '(rdi rsi rdx rcx r8 r9), which is defined by the x64 System V ABI to be where the first 6 arguments of any procedure are stored. To deal with an arbitrary number of arguments, we may need more than the n registers we have available for parameters. For the rest, we use fresh frame variables.

Since tail calls never return, we do not need to worry about what is on the frame before a call. We can simply overwrite all existing frame variables. This means recursive tail calls have the same performance characteristic as, and in fact compile to, loops: they use a constant amount of stack space, compile directly to jumps, and only need registers as long as they use fewer than 6 arguments.

1.7.3.1 Designing the Calling Convention Translation

To design our calling convention translation, we start by looking at how we want to translate terms into abstractions we already have. We then redesign our intermediate languages to support our translation.

Design digression:
We must be careful to design a translation that eliminates some abstraction layer, or we risk developing a translation that simply kicks the real problem down to the next language. A good heuristic here is to avoid designing a translation that introduces any new abstractions. If we need new abstractions, we should look at the problem bottom up—designing the proposed abstraction as an abstraction of some features expressible in the target language.

Intuitively, we know how to compile calls. We want to instantiate the parameters of the procedure, moving arguments to the shared locations, then jump to the label of the procedure. Concretely, we want to perform the following transformations.

Now that we know the translation we would like to perform, we need to figure out where to fit the new translation in our compiler pipeline. To do this, we need to ask a few questions.

First: which target languages support the features needed in the target of our translation? If we look at the pipeline from the last chapter, Appendix: Overview, we know the translation must come at least after sequentialize-let, since the target language will need to use imperative features introduced in that pass. Introducing calling conventions also generates begin statement in tail position, and generates set! expressions with a combination of values and locations for operands. This last feature tells us we should place the new pass before select-instructions, since we want the unrestricted set! feature select-instructions provides.

Second: which translations might be disrupted by introducing new features in their source language? If we add the new translation before normalize-bind, then we won’t need to extend normalize-bind to support the call construct, but we will need to make sure normalize-bind handles the jump construct. This might be a problem, if jump appears in effect context, but note that since call appears in tail context, so must the jump that call compiles to. This suggests we could place the new pass either right before or right after normalize-bind.

Third, we try to future proof our design decisions: will the answer to any of the above questions change if we add new features in the future? This is hard to predict; the future is vast so the search space is large. We can limit our search by focusing on limitations in features we’re now adding. We just added call in tail context. Are we likely to lift this restriction and allow calls in other contexts later? Yes, that seems likely; most languages allow calls in non-tail context. If we allowed calls in effect context, e.g., then our translation would need to generate a begin in effect context, and not just tail context. This tells us we want to support nested begin in the target language of our new pass. Since we changed the target of normalize-bind to allow nested begin, we can still place our new pass before or after normalize-bind. We’re also likely to add call in value context. Notice that normalize-bind is sensitive to forms in value position, in particular, on the right-hand side of a set!. This suggests we should avoid transforming call until after performing normalize-bind.

We conclude the new pass should go just after normalize-bind. We start by extending the first few passes down to normalize-bind.

1.7.4 Extending front-end with support for call

We start by formally defining Values-lang v5, the new source language. We typeset the difference compared to Values-lang v4.

  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  pred ::= (relop triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
     
  triv ::= int64
  | x
     
  x ::= name?
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?
  p ::= (module (define x (lambda (x ...) tail)) ... tail)
     
  pred ::= (relop triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([x value] ...) tail)
  | (if pred tail tail)
  | (call x triv ...)
     
  value ::= triv
  | (binop triv triv)
  | (let ([x value] ...) value)
  | (if pred value value)
     
  triv ::= int64
  | x
     
  x ::= name?
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  int64 ::= int64?

Modules now define a set of procedures at the top-level before the initial tail. The program begins executing the tail following the set of definitions.

The defined procedures can be used anywhere in the module. We support mutually recursive calls, so procedures defined later in the module can be referenced by definitions earlier in the module. For example, the following is a valid Values-lang v5 program:
(module
  (define odd?
    (lambda (x)
      (if (= x 0)
          0
          (let ([y (+ x -1)])
            (call even? y)))))
  (define even?
    (lambda (x)
      (if (= x 0)
          1
          (let ([y (+ x -1)])
            (call odd? y)))))
  (call even? 5))

We continue to require that the source program is well bound: all lexical identifiers are defined before they are used. But we have new binding rules. We restrict procedures to not bind the same identifier as a parameter twice; for example, (lambda (x x) x) is invalid. We could allow this and define a shadowing order for parameters, but this would always introduce a dead variable and is probably a mistake in the source language. We also forbid the same identifier being defined twice at the top level. For example:
(module
  (define f (lambda (x) x))
  (define f (lambda (x) 5))
  5)
is invalid. Since we support mutually recursive procedures, there isn’t a sensible way to define this binding.

We have not introduced a method for dynamically checking that a procedure is used correctly yet. To avoid exposing undefined behaviour, we make an additional restriction to calls. Calls must call a statically known procedure with exactly the right number of arguments. Otherwise, our desired transformation will leave some physical locations uninitialized, resulting in undefined behaviour.

We now have two data types exposed in the source language: integers, and procedures. This introduces more opportunities for undefined behaviour. If we try to compare a procedure to an integer, we will generate code with undefined behaviour. We therefore require that source program does not use labels in this way.

To validate these assumptions, we can implement check-values-lang.

check-values-lang must necessarily rule out many well-defined programs, since without modifying the source language, we cannot tell the types of all procedure parameters. We design a conservative approximation that is sound, i.e., it never accepts a program with undefined behaviour, but incomplete, i.e., some well-defined programs are rejected. This is a normal trade off in compilers, and yet another consequence of Rice’s Theorem.

We use the following heuristics to implement check-values-lang:
  • A procedure’s name can only appear in application position of a call, or bound in the right-hand side of a let.

  • The parameters to a procedure are assumed to be integers.

  • A call (call x triv ...) is only well typed if x is bound to a procedure with n parameters and there are exactly n arguments in call.

  • A binary operation (binop triv_1 triv_2) is only well typed, and has type integer, if both triv_1 and triv_2 have type integer.

  • A relational operation (relop triv_1 triv_2) is only well typed if both triv_1 and triv_2 have type integer.

  • An if expression (if pred tail_1 tail_2) is only well typed if pred is a well-typed predicate.

  • Every procedure must return an int64.

Finally, we have one restriction imposed by the run-time system: the final result of the program must be an int64.

procedure

(check-values-lang p)  values-lang-v5?

  p : any/c
Validates that the Values-lang v5 is syntactically well-formed, well bound and well typed: all procedure calls pass the correct number of arguments, and all binop and relop are never used with labels. You may want to separate this into two problems: first checking syntax, then type checking.

Next we need to resolve lexical identifiers. This is slightly complicated by introducing procedures. We want to compile procedures to labeled blocks and jumps, so we need to compile their names to labels rather than abstract locations.

First, we design Values-unique-lang v5. We typeset the differences compared to Values-lang v5.

  p ::= (module (define label x (lambda (aloc x ...) tail)) ... tail)
     
  pred ::= (relop opand opand triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc x value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([aloc x value] ...) tail)
  | (if pred tail tail)
  | (call x triv opand ...)
     
  value ::= triv
  | (binop opand opand triv triv)
  | (let ([aloc x value] ...) value)
  | (if pred value value)
     
  x ::= name?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand triv triv)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([aloc value] ...) tail)
  | (if pred tail tail)
  | (call triv opand ...)
     
  value ::= triv
  | (binop opand opand triv triv)
  | (let ([aloc value] ...) value)
  | (if pred value value)
     
  opand triv ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (let ([aloc value] ...) tail)
  | (if pred tail tail)
  | (call triv opand ...)
     
  value ::= triv
  | (binop opand opand)
  | (let ([aloc value] ...) value)
  | (if pred value value)
     
  opand ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

As usual, we change local lexical identifiers to abstract locations. However, we also change top-level lexical identifiers into labels. labels are trivs, although we assume they are only used according to the typing rules imposed by check-values-lang.

procedure

(uniquify p)  values-unique-lang-v5?

  p : values-lang-v5?
Compiles Values-lang v5 to Values-unique-lang v5 by resolving top-level lexical identifiers into unique labels, and all other lexical identifiers into unique abstract locations.

Next we design Imp-mf-lang v5, an imperative language in monadic form with procedures. We typeset the differences compared to Values-unique-lang v5.

  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (let ([aloc value] ...) pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (let ([aloc value] ...) tail)
  | (if pred tail tail)
  | (call triv opand ...)
     
  value ::= triv
  | (binop opand opand)
  | (begin effect ... value)
  | (let ([aloc value] ...) value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand triv triv)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand triv triv)
  | (begin effect ... value)
  | (if pred value value)
  | (begin effect ... value)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
  | (begin effect ... effect)
     
  opand triv ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (begin effect ... value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

There are no interesting changes. We simply propagate the new procedure forms form down one more level of abstraction.

Compiles Values-unique-lang v5 to Imp-mf-lang v5 by picking a particular order to implement let expressions using set!.

Finally, we need to normalize assignment statements so the right-hand side is a "trivial" value. Below, we design Proc-imp-cmf-lang-v5 the target language of normalize-bind.

  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
  | (begin effect ... value)
  | (if pred value value)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand triv triv)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand triv triv)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand triv ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module (define label (lambda (aloc ...) tail)) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | aloc
     
  triv ::= opand
  | label
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  int64 ::= int64?

There is no major change, since we’ve only added call to tail context, and normalize-bind is only concerned with value context. Viewed in comparison to the source for normalize-bind, it merely removes the nesting in value context. The previous iteration of this intermediate language is Imp-cmf-lang v4; compared with that, we simply add the procedure abstractions, namely tail calls and procedure definitions.

Compiles Imp-mf-lang v5 to Proc-imp-cmf-lang v5, pushing set! under begin so that the right-hand-side of each set! is simple value-producing operation.

This normalizes Imp-mf-lang v5 with respect to the equations:
(set! aloc
      (begin effect_1 ...
             value))

   

=

   

(begin effect_1 ...
       (set! aloc value))
(set! aloc
      (if pred
          value_1
          value_2))

   

=

   

(if pred
    (set! aloc value_1)
    (set! aloc value_2))

1.7.5 Implementing Calling Conventions

Now we can design the language for our calling convention translation.

We start with the design of Imp-cmf-lang v5, the target language of impose-calling-conventions.

  p ::= (module (define label tail (lambda (aloc ...) tail)) ... tail)
     
  tail ::= value
  | (jump trg loc ...)
  | (call triv opand ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc aloc value)
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
  | aloc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  rloc ::= register?
  | fvar?
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= value
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc value)
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

Compared to the source, we remove the call form and replace it by the jump form. As described in Designing the Calling Convention Translation, all calls are compiled to a sequence of set!s moving the arguments followed by a jump, and all procedure definitions are compiled to a block that assigns the parameters, as directed by the calling convention.

Note that we now require physical locations in the target language, so we must gradually expose physical locations up to this language from the rest of the compiler.

Compiles Proc-imp-cmf-lang v5 to Imp-cmf-lang v5 by imposing calling conventions on all calls and procedure definitions. The parameter registers are defined by the list current-parameter-registers.

1.7.6 Exposing Jumps

Implementing our calling convention requires exposing jumps quite high in the compiler pipeline, in Imp-mf-lang v5, while in the previous chapter we hid jumps behind an abstraction boundary in Block-pred-lang v4. Thankfully, it is not very difficult to propagate jumps up the compiler pipeline. The main challenge is in adjusting the register allocator, but we have already done the design work to simplify that by annotating jumps with their live-out sets (which can be directly used as undead-out sets).

First, we design Asm-pred-lang v5, the target of our select-instructions pass. To see how to extend select-instructions, we should view the difference, we typeset the differences compared to Asm-pred-lang v4.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= info?
     
  pred ::= (relop loc opand aloc triv)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand triv)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc aloc triv)
  | (set! loc_1 aloc_1 (binop loc_1 opand aloc_1 triv))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc rloc ::= triv
  | int64
  | aloc
     
  trg ::= label
  | loc
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= info?
     
  pred ::= (relop loc opand opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | value
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  value ::= triv
  | (binop opand opand)
     
  effect ::= (set! loc triv value)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= info?
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

The main difference is in the addition of the jump instruction. Note that the "arguments" to the jump are not part of the meaning of the instruction; they are just metadata used later by the compiler. We also must handle physical locations in the source language, but this does not cause many changes.

Compiles Imp-cmf-lang v5 to Asm-pred-lang v5, selecting appropriate sequences of abstract assembly instructions to implement the operations of the source language.

1.7.6.1 Extending Register Allocation

Now that we have multiple blocks and jumps, we have a design choice to make in our compiler. When analyzing the program to determine how variables are used, we can either:
  • interpret the program as a tree, analyzing and allocating registers to each block separately and essentially ignoring jumps, or

  • interpret the program as a graph, trying to follow control and data flow to determine the destination of a jump, in order to analyze conflicts and allocate registers across jumps.

The first option is intraprocedural; it is simpler, and can simply map our existing algorithm essentially unchanged over each block. The second option is interprocedural. It would be more complex, but could do a better job allocating registers in some cases by allowing a caller and callee to share some registers, thus eliminating some moves introduced during a procedure call.

We design our compiler using the first option: extending our existing algorithm slightly to an intraprocedural analysis and allocator. This requires very few extensions over the existing design, except to handle jump, which we have conveniently annotated with its live-out set. Our calling convention is already quite cheap since we can use registers most of the time, and the benefit of interprocedural analysis can be recovered somewhat by procedure inlining, which has additional benefits besides. This small benefit of interprocedural analysis, in our setting, is not worth additional complexity in register allocation, which is already quite complex.

First, we extend uncover-locals to analyze jumps. We design the administrative language Asm-pred-lang v5/locals (Asm-pred-lang v5 with locals) below. Note that the only difference is in the specification of the info field.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...))))
     
  tail ::= (halt opand triv)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...))))
  | info?
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...))))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

Note that because the source language now includes blocks, we need to perform the local analysis over each block. Each block gets its own info field, with its own locals set. The locals set for the initial tail of the module is stored in the module’s info field.

We also extend the analysis to support jump. Note that (jump trg loc ...) only references trg; the rest of the loc are only metadata.

Compiles Asm-pred-lang v5 to Asm-pred-lang v5/locals, analysing which abstract locations are used in each block, and updating each block and the module with the set of variables in an info? fields.

Now our undead analysis must change to analyze jumps. Asm-pred-lang v5/undead defines the output of undead-analysis.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree/rloc? undead-set-tree?)))
     
  tail ::= (halt opand triv)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
  info ::= (#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree/rloc?)))
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (undead-out undead-set-tree/rloc?)))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

When analyzing a jump statement, we need to compute its undead-out set.

In general, this is difficult. In general, we may not know the destination of the jump, so we would either have to conservatively approximate and say "anything could be live, i.e., everything is undead", or analyze the control flow of the program, following jumps and analyzing the destination.

Thankfully, none of that is necessary. Because jumps in our language only come from procedure calls, and our calling convention translation decorated the jump with the locations used by the procedure call, i.e.,, those locations (expected to be) live after the jump, our undead analysis is trivial. The undead-out set of a jump statement (jump trg loc ...) is the set (loc ...).

This requires no changes to the undead-set tree.

Again, we also need to modify the analysis slightly to perform local analysis on each block, and store undead-set trees in the info field for the corresponding block.

Performs undead analysis, compiling Asm-pred-lang v5/locals to Asm-pred-lang v5/undead by decorating programs with their undead-set trees.

Next we need to compute the conflict graph.

Below, we design Asm-pred-lang v5/conflicts.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc aloc (loc aloc ...)) ...))))
     
  tail ::= (halt opand triv)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...)) (undead-out undead-set-tree/rloc?)))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (conflicts ((loc (loc ...)) ...))))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

conflict-analysis does not change significantly. We simply extend the algorithm to support jump statements. Note that (jump trg loc ...) only references trg, and never defines any abstract location.

Again, the analysis must perform local analysis on each block separately.

Performs conflict analysis, compiling Asm-pred-lang v5/undead to Asm-pred-lang v5/conflicts by decorating programs with their conflict graph.

The register allocator does not need major changes. Notably, since the allocator is defined over the conflict graph, instead of over the program, it needs even fewer changes to support the new instructions in the language.

Below we define Asm-pred-lang v5/assignments, which only changes in the info field as usual.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc rloc) ...))))
     
  tail ::= (halt opand triv)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
  info ::= (#:from-contract (info/c (locals (aloc ...)) (assignment conflicts ((aloc rloc) (loc (loc ...)) ...))))
  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc rloc) ...))))
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  opand ::= int64
  | loc
     
  triv ::= opand
  | label
     
  loc ::= rloc
  | aloc
     
  trg ::= label
  | loc
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  label ::= label?
     
  rloc ::= register?
  | fvar?
     
  int64 ::= int64?

The allocator simply runs the same algorithm as before, but this time, on each block’s conflict graph, separately.

Performs graph-colouring register allocation, compiling Asm-pred-lang v5/conflicts to Asm-pred-lang v5/assignments by decorating programs with their register assignments.

Finally, we actually replace abstract locations with physical locations.

We design the source, Nested-asm-lang v5 below, although we discuss its design later.

  p ::= (module info (define label info tail) ... tail)
     
  info ::= (#:from-contract (info/c (locals (aloc ...)) (assignment ((aloc rloc) ...))))
     
  tail ::= (halt opand)
  | (jump trg loc ...)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  loc ::= reg
  | fvar
  | rloc
  | aloc
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  loc ::= reg
  | fvar
     
  trg ::= label
  | loc
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  fvar ::= fvar?
     
  label ::= label?
     
  int64 ::= int64?

We need to extend the implementation to traverse each block, and support jump statements. In the process, we also discard the undead annotations on the jump instruction.

Replaces all abstract location with physical locations using the assignment described in the assignment info field, and dropping any register-allocation-related metadata from the program.

1.7.6.2 Exposing Basic Blocks

The last updates we need to make are to optimize-predicates and to expose-basic-blocks, both of use Nested-asm-lang v5 as the source language.

We design the source, Nested-asm-lang v5 below, typeset compared to Nested-asm-lang v4

  p ::= (module (define label tail) ... tail)
     
  tail ::= (halt opand triv)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  triv ::= opand
  | label
     
  opand triv ::= int64
  | loc
     
  trg ::= label
  | loc
  p ::= (module (define label tail) ... tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  loc ::= reg
  | fvar
     
  trg ::= label
  | loc
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  fvar ::= fvar?
     
  label ::= label?
     
  int64 ::= int64?

The main difference is the inclusion of jump expressions and block definitions. These do not complicate the process of exposing basic blocks much. We have only added jump in tail position, which matches the requirement for a basic block already. To extend expose-basic-blocks, we simply need to traverse each block, transforming it into a basic block, and exposing new basic blocks in the process.

Note that we again need to impose the convention that execution begins with the first basic block, and move the initial tail into an explicit basic block.

Few changes are needed to optimize-predicates, since jumps do not affect predicate position, so we just need to optimize each block and recognized jumps.

Optimize Nested-asm-lang v5 programs by analyzing and simplifying predicates.

The target language of expose-basic-blocks is Block-pred-lang v5, has no changes compared to Block-pred-lang v4.

  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  p ::= (module b (define label tail) ... b tail)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
  | (begin effect ... pred)
  | (if pred pred pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg) tail tail)
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
  | (begin effect ... effect)
  | (if pred effect effect)
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  trg ::= label
  | loc
     
  loc ::= reg
  | fvar
     
  trg ::= label
  | loc
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  fvar ::= fvar?
     
  label ::= label?
     
  int64 ::= int64?
  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  pred ::= (relop loc opand)
  | (true)
  | (false)
  | (not pred)
     
  tail ::= (halt opand)
  | (jump trg)
  | (begin effect ... tail)
  | (if pred (jump trg) (jump trg))
     
  effect ::= (set! loc triv)
  | (set! loc_1 (binop loc_1 opand))
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  trg ::= label
  | loc
     
  loc ::= reg
  | fvar
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r12
  | r13
  | r14
  | r15
     
  binop ::= *
  | +
     
  relop ::= <
  | <=
  | =
  | >=
  | >
  | !=
     
  aloc ::= aloc?
     
  fvar ::= fvar?
     
  label ::= label?
     
  int64 ::= int64?

Compile the Nested-asm-lang v5 to Block-pred-lang v5, eliminating all nested expressions by generate fresh basic blocks and jumps.

Tail calls are implemented completely as an abstraction over basic blocks, with a convention about shared physical locations. Nothing else in the compiler needs to change.

1.7.7 Appendix: Overview

%3L0Values-lang v5L0->L0 check-values-langL1Values-unique-lang v5L0->L1 uniquifyL3Imp-mf-lang v5L1->L3 sequentialize-letL2Proc-imp-cmf-lang v5L4Imp-cmf-lang v5L2->L4 impose-calling-conventionsL3->L2 normalize-bindL5Asm-pred-lang v5L4->L5 select-instructionsL10Nested-asm-lang v5L5->L10 assign-homes-optL6Asm-pred-lang v5/localsL5->L6 uncover-localsL10->L10 optimize-predicatesL11Block-pred-lang v5L10->L11 expose-basic-blocksL12Block-asm-lang v4L11->L12 resolve-predicatesL12aPara-asm-lang v4L12->L12a flatten-programL13Paren-x64-fvars v4L12a->L13 patch-instructionsL16Paren-x64 v4L13->L16 implement-fvarsL14x64L15integerL14->L15 executeL7Asm-pred-lang v5/undeadL6->L7 undead-analysisL8Asm-pred-lang v5/conflictsL7->L8 conflict-analysisL9Asm-pred-lang v5/assignmentsL8->L9 assign-registersL9->L10 replace-locationsL16->L14 generate-x64L16->L15 interp-paren-x64L17Paren-x64-rt v4L16->L17 link-paren-x64L17->L15 interp-loop

Figure 5: Overview of Compiler Version 5