On this page:
1.14.1 Preface:   What’s wrong with Racketish?
1.14.2 Implementing Syntactic Sugar
expand-macros
1.14.3 Appendix:   Overview

1.14 Syntactic Sugar

1.14.1 Preface: What’s wrong with Racketish?

Exprs-lang v9 is a good start at a core language. You could start building standard libraries and get to work. But it’s missing a handful of nice features. Most languages support various complex data literals, such as literals for lists, array, or other structured data. They also support features you cannot encode as functions, like short-circuiting boolean operations and and or.

These get elaborated very early. In Racket, these are mostly implemented as macros, a user-defined function that transformer source syntax. We’ll design a compiler pass that isn’t quite a macro system, but should give you an idea of how a macro system works, and give you the ability to add new syntactic sugar to the language.

Below, we define Racketish-Surface.

  p ::= (module (define x (lambda (x ...) value)) ... value)
     
  value ::= triv
  | (quote s-expr)
  | (value value ...)
  | (macro-id value ...)
  | (let ([x value] ...) value)
  | (if value value value)
  | (call value value ...)
     
  triv ::= fixnum
  | prim-f
  | x
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (x ...) value)
  | vec-literal
     
  x ::= name?
  | prim-f
     
  s-expr ::= #t
  | #f
  | fixnum
  | ascii-char-literal
  | (s-expr ...)
     
  macro-id ::= vector
  | and
  | or
  | begin
     
  prim-f ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?
     
  vec-literal ::= host:vector?

We add two special syntaxes, (quote s-expr) for quoted lists, and (vector value ...) for vector literals. Note that our language does not have symbols, so our s-expr differs from Racket’s. The only thing to know is that this takes an arbitrary number of arguments.

We also add two reader notations to the surface language. ’s-expr is a reader macro in Racket and is implicitly elaborated to (quote s-expr). #(value ...) is the notation for vector literals, and is equivalent to (vector (quote value) ...), but not syntactically identical in the same way ’s-expr and (quote s-expr) are.

Because Racket’s reader supports them, we do not have to do much to add them! We avoid implementing a parser once more (huzzah!). This is one interesting feature of bootstrappingdefining a language in itself. When the meta and object languages share features, merely using a compiler that already implements the feature implements the feature.

Parsing is often described as a solved problem, but it is not, and is easy to get wrong, and doing a bad job can summon the dark gods to our realm. We do not want to summon dark gods, so we worked directly with data in the host language, and it was fine.

You can see the two are equivalent in Racket by trying typing the following in to the REPL or in the interactions frame in DrRacket:
'(1 2 3)
 
(quote (1 2 3))
 
#(1 2 3)
 
(vector 1 2 3)

I can’t typeset these examples using my normal REPL typesetting feature since Racket implicitly renders them identically.

Using match on this notation requires a little care, since our program is itself a quoted datum in Racket, and match treats them slightly differently.
(match '(quote (1 2 3))
  [`(quote ,es) es])
 
(match ''(1 2 3)
  [`(quote ,es) es])
 
(match '#(1 2 3)
  [`#(,es ...) es])
 
(match '#(1 2 3)
  [(vector es ...) es])
 
;; Fails to match
(match '#(1 2 3)
  [`(vector ,es ...) es])

We also introduce a new form for applying macros, and add a few macros. We also make procedure calls implicit; essentially, anything that looks like an application form is either a macro application, or, is implicitly a procedure call.

1.14.2 Implementing Syntactic Sugar

Macro expansion is the first step. Ideally, this is designed to be extensible, so we can add new macros at will, or even allow the user to define macros.

Our target language is Exprs-lang v9

  p ::= (module (define x (lambda (x ...) value)) ... value)
     
  value ::= triv
  | (quote s-expr)
  | (value value ...)
  | (macro-id value ...)
  | (let ([x value] ...) value)
  | (if value value value)
  | (call value value ...)
     
  triv ::= x
  | fixnum
  | prim-f
  | x
  | #t
  | #f
  | empty
  | (void)
  | (error uint8)
  | ascii-char-literal
  | (lambda (x ...) value)
  | vec-literal
     
  x ::= name?
  | prim-f
     
  s-expr ::= #t
  | #f
  | fixnum
  | ascii-char-literal
  | (s-expr ...)
     
  macro-id ::= vector
  | and
  | or
  | begin
     
  prim-f ::= *
  | +
  | -
  | eq?
  | <
  | <=
  | >
  | >=
  | eq?
  | fixnum?
  | boolean?
  | empty?
  | void?
  | ascii-char?
  | error?
  | not
  | pair?
  | procedure?
  | vector?
  | cons
  | car
  | cdr
  | make-vector
  | vector-length
  | vector-set!
  | vector-ref
  | procedure-arity
     
  fixnum ::= int61?
     
  uint8 ::= uint8?
     
  ascii-char-literal ::= ascii-char-literal?
     
  vec-literal ::= host:vector?

All macros are elaborated to existing features. We also elaborate implicit procedure call into explicit call.

Notice that the vector literal notation embeds an actual Racket vector in our AST.

Example:
> (vector-ref (second '(module #(1 2 3))) 0)

1

This vector is required to only contain values from Racketish-Surface.

And the the quoted literal notation actually expands to the symbol quote, although it always gets printed as a tick:

Examples:
> '(module '(#t #f))

'(module '(#t #f))

> (first (second '(module '(#t #f))))

'quote

The transformations we want to define are given below:
  • (and v ...)
    `(and) => #t
    `(and ,e) => e
    `(and ,e ,es ...) => `(if ,e (and ,es ...) #f)

    When and is given no arguments, we expand to #t, since #t is the multiplicative identity element on booleans (like 1 is for integers and multiplication). When given a single argument, we return it, since Racket is falsey. Otherwise, we elaborate to branch on the first argument, and recursively expand and applied to the rest of its arguments.

  • (or v ...)
    `(or) => #f
    `(or ,e) => e
    `(or ,e ,es ...) => `(let ([,tmp ,e]) (if ,tmp ,tmp (or ,es ...)))

    When or is given no arguments, we expand to #f, since #f is the additive identity element on booleans (like 0 is for integers and addition). When given a single argument, we return it, since Racket is falsey. Otherwise, we bind the first argument to a fresh auxiliary variable, return it if its true, and otherwise return the or of the rest of the arguments.

    Notice that since we introduce an auxiliary variable we could accidentally capture any use of a free variable in es ..., just like any compiler pass that introduces auxiliary variables. We use fresh as normal, since our representation of abstract locations are valid names.

  • (vector e ...) and #(e ...)
    `(vector ,es ...)
    =>
    `(let ([,tmp (make-vector ,(length es))])
       (begin
         (vector-set! ,tmp 0 ,(list-ref es 0))
         ...
         (vector-set! ,tmp ,(sub1 (length es)) ,(list-ref es (sub1 (length es))))
         tmp))

  • (quote s-expr)
    `',v  => v
    `'(,s-expr ,s-exprs ...)
    =>
    `(cons ',s-expr  '(,s-exprs ...))

  • (begin e ...)
    `(begin) => '(void)
    `(begin ,e) => e
    `(begin ,e ,es ...)
    =>
    `(let ([,tmp ,e])
       (if (error? ,tmp)
           ,tmp
           `(begin ,es ...)))

    We add a surface-level begin form that handles propagating errors correctly. This is essentially an implementation of do for the monad.

These rewrites are essentially syntax-case macros, recursive pattern-based macros that can perform computation at compile time.

Notice that each macro can generate more uses of macros. As a result, the recursive structure of this elaborate pass is going to be slightly different from other passes.

procedure

(expand-macros p)  exprs-lang-v9?

  p : racketish-surface?
Expand all Racketish-Surface macros.

1.14.3 Appendix: Overview

This graph does not integrate the Recursive Data chapter.

%3LuRacketish-SurfaceLu->Lu check-exprs-langLwExprs-lang-v9Lu->Lw expand-macrosLyExprs-unique-lang v9Lw->Ly uniquifyLzExprs-unsafe-data-lang v9Ly->Lz implement-safe-primopsLz1Exprs-unsafe-lang v9Lz->Lz1 implement-safe-callLz2Just-exprs-lang v9Lz1->Lz2 define->letrecLz2->Lz2 optimize-direct-callsLz3Lam-opticon-lang v9Lz2->Lz3 dox-lambdasLz4Lam-free-lang v9Lz3->Lz4 uncover-freeLz5Closure-lang v9Lz4->Lz5 convert-closuresLz5->Lz5 optimize-known-callsLz6Hoisted-lang v9Lz5->Lz6 hoist-lambdasLz7Proc-exposed-lang v9Lz6->Lz7 implement-closuresL0Exprs-bits-lang v8Lz7->L0 specify-representationL1Values-bits-lang v8L0->L1 remove-complex-opera*L3Imp-mf-lang v8L1->L3 sequentialize-letL2Proc-imp-cmf-lang v8L3->L2 normalize-bindL4Imp-cmf-lang v8L2->L4 impose-calling-conventionsL5_1Asm-alloc-lang v8L4->L5_1 select-instructionsL5Asm-pred-lang v8L5_1->L5 expose-allocation-pointerL6Asm-pred-lang v8/localsL5->L6 uncover-localsL7Asm-pred-lang v8/undeadL6->L7 undead-analysisL8Asm-pred-lang v8/conflictsL7->L8 conflict-analysisL81Asm-pred-lang v8/pre-framedL8->L81 assign-call-undead-variablesL82Asm-pred-lang v8/framedL81->L82 allocate-framesL83Asm-pred-lang v8/spilledL82->L83 assign-registersL9Asm-pred-lang v8/assignmentsL83->L9 assign-frame-variablesL10Nested-asm-lang-fvars v8L9->L10 replace-locationsL10_1Nested-asm-lang v8L10->L10_1 implement-fvarsL11Block-pred-lang v8L10_1->L11 expose-basic-blocksL12Block-asm-lang v8L11->L12 resolve-predicatesL12_1Para-asm-lang v8L12->L12_1 flatten-programL15_1Paren-x64-mops v8L12_1->L15_1 patch-instructionsL14x64L15integerL14->L15 executeL16Paren-x64 v8L15_1->L16 implement-mopsL16->L14 generate-x64L16->L15 interp-paren-x64L17Paren-x64-rt v8L16->L17 link-paren-x64L17->L15 interp-loop

Figure 11: Overview of Compiler Version 10