1.14 Syntactic Sugar
1.14.1 Preface: What’s wrong with Racketish?
Exprs-lang v9 is a good start at a core language. You could start building standard libraries and get to work. But it’s missing a handful of nice features. Most languages support various complex data literals, such as literals for lists, array, or other structured data. They also support features you cannot encode as functions, like short-circuiting boolean operations and and or.
These get elaborated very early. In Racket, these are mostly implemented as macros, a user-defined function that transformer source syntax. We’ll design a compiler pass that isn’t quite a macro system, but should give you an idea of how a macro system works, and give you the ability to add new syntactic sugar to the language.
Below, we define Racketish-Surface.
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (quote s-expr) | |||
| | (value value ...) | |||
| | (macro-id value ...) | |||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | fixnum | ||
| | prim-f | |||
| | x | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
| | vec-literal | |||
x | ::= | name? | ||
| | prim-f | |||
s-expr | ::= | #t | ||
| | #f | |||
| | fixnum | |||
| | ascii-char-literal | |||
| | (s-expr ...) | |||
macro-id | ::= | vector | ||
| | and | |||
| | or | |||
| | begin | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? | ||
vec-literal | ::= | host:vector? |
We add two special syntaxes, (quote s-expr) for quoted lists, and (vector value ...) for vector literals. Note that our language does not have symbols, so our s-expr differs from Racket’s. The only thing to know is that this takes an arbitrary number of arguments.
We also add two reader notations to the surface language. ’s-expr is a reader macro in Racket and is implicitly elaborated to (quote s-expr). #(value ...) is the notation for vector literals, and is equivalent to (vector (quote value) ...), but not syntactically identical in the same way ’s-expr and (quote s-expr) are.
Parsing is often described as a solved problem, but it is not, and is easy to get wrong, and doing a bad job can summon the dark gods to our realm. We do not want to summon dark gods, so we worked directly with data in the host language, and it was fine.
I can’t typeset these examples using my normal REPL typesetting feature since Racket implicitly renders them identically.
(match '(quote (1 2 3)) [`(quote ,es) es]) (match ''(1 2 3) [`(quote ,es) es]) (match '#(1 2 3) [`#(,es ...) es]) (match '#(1 2 3) [(vector es ...) es]) ;; Fails to match (match '#(1 2 3) [`(vector ,es ...) es])
We also introduce a new form for applying macros, and add a few macros. We also make procedure calls implicit; essentially, anything that looks like an application form is either a macro application, or, is implicitly a procedure call.
1.14.2 Implementing Syntactic Sugar
Macro expansion is the first step. Ideally, this is designed to be extensible, so we can add new macros at will, or even allow the user to define macros.
Our target language is Exprs-lang v9
p | ::= | (module (define x (lambda (x ...) value)) ... value) | ||
value | ::= | triv | ||
| | (quote s-expr) | |||
| | (value value ...) | |||
| | (macro-id value ...) | |||
| | (let ([x value] ...) value) | |||
| | (if value value value) | |||
| | (call value value ...) | |||
triv | ::= | x | ||
| | fixnum | |||
| | prim-f | |||
| | x | |||
| | #t | |||
| | #f | |||
| | empty | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) value) | |||
| | vec-literal | |||
x | ::= | name? | ||
| | prim-f | |||
prim-f | ::= | binop | ||
| | unop | |||
binop | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
s-expr | ::= | #t | ||
| | #f | |||
| | fixnum | |||
| | ascii-char-literal | |||
| | (s-expr ...) | |||
macro-id | ::= | vector | ||
| | and | |||
| | or | |||
| | begin | |||
unop prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity | |||
fixnum | ::= | int61? | ||
uint8 | ::= | uint8? | ||
ascii-char-literal | ::= | ascii-char-literal? | ||
vec-literal | ::= | host:vector? |
All macros are elaborated to existing features. We also elaborate implicit procedure call into explicit call.
> (vector-ref (second '(module #(1 2 3))) 0) 1
- (and v ...)
`(and) => #t `(and ,e) => e `(and ,e ,es ...) => `(if ,e (and ,es ...) #f) When and is given no arguments, we expand to #t, since #t is the multiplicative identity element on booleans (like 1 is for integers and multiplication). When given a single argument, we return it, since Racket is falsey. Otherwise, we elaborate to branch on the first argument, and recursively expand and applied to the rest of its arguments.
- (or v ...)
`(or) => #f `(or ,e) => e `(or ,e ,es ...) => `(let ([,tmp ,e]) (if ,tmp ,tmp (or ,es ...))) When or is given no arguments, we expand to #f, since #f is the additive identity element on booleans (like 0 is for integers and addition). When given a single argument, we return it, since Racket is falsey. Otherwise, we bind the first argument to a fresh auxiliary variable, return it if its true, and otherwise return the or of the rest of the arguments.
Notice that since we introduce an auxiliary variable we could accidentally capture any use of a free variable in es ..., just like any compiler pass that introduces auxiliary variables. We use fresh as normal, since our representation of abstract locations are valid names.
- (vector e ...) and #(e ...)
- (quote s-expr)
`',v => v `'(,s-expr ,s-exprs ...) => `(cons ',s-expr '(,s-exprs ...)) - (begin e ...)
`(begin) => '(void) `(begin ,e) => e `(begin ,e ,es ...) => `(let ([,tmp ,e]) (if (error? ,tmp) ,tmp `(begin ,es ...))) We add a surface-level begin form that handles propagating errors correctly. This is essentially an implementation of do for the monad.
These rewrites are essentially syntax-case macros, recursive pattern-based macros that can perform computation at compile time.
Notice that each macro can generate more uses of macros. As a result, the recursive structure of this elaborate pass is going to be slightly different from other passes.
procedure
(expand-macros p) → exprs-lang-v9?
p : racketish-surface? Expand all Racketish-Surface macros.
1.14.3 Appendix: Overview
This graph does not integrate the Recursive Data chapter.