1.14 Syntactic Sugar

1.14.1 Preface: What’s wrong with Racketish?

Exprs-lang v9 is a good start at a core language. You could start building standard libraries and get to work. But it’s missing a handful of nice features. Most languages support various complex data literals, such as literals for lists, array, or other structured data. They also support features you cannot encode as functions, like short-circuiting boolean operations and and or.

These get elaborated very early. In Racket, these are mostly implemented as macros, a user-defined function that transformer source syntax. We’ll design a compiler pass that isn’t quite a macro system, but should give you an idea of how a macro system works, and give you the ability to add new syntactic sugar to the language.

Below, we define Racketish-Surface.

p	::=	(module (define x (lambda (x ...) value)) ... value)

value	::=	triv
	\|	(quote s-expr)
	\|	(value value ...)
	\|	(macro-id value ...)
	\|	(let ([x value] ...) value)
	\|	(if value value value)
	\|	(call value value ...)

triv	::=	fixnum
	\|	prim-f
	\|	x
	\|	#t
	\|	#f
	\|	empty
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) value)
	\|	vec-literal

x	::=	name?
	\|	prim-f

s-expr	::=	#t
	\|	#f
	\|	fixnum
	\|	ascii-char-literal
	\|	(s-expr ...)

macro-id	::=	vector
	\|	and
	\|	or
	\|	begin

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

vec-literal	::=	host:vector?

We add two special syntaxes, (quote s-expr) for quoted lists, and (vector value ...) for vector literals. Note that our language does not have symbols, so our s-expr differs from Racket’s. The only thing to know is that this takes an arbitrary number of arguments.

We also add two reader notations to the surface language. ’s-expr is a reader macro in Racket and is implicitly elaborated to (quote s-expr). #(value ...) is the notation for vector literals, and is equivalent to (vector (quote value) ...), but not syntactically identical in the same way ’s-expr and (quote s-expr) are.

Because Racket’s reader supports them, we do not have to do much to add them! We avoid implementing a parser once more (huzzah!). This is one interesting feature of bootstrapping—defining a language in itself. When the meta and object languages share features, merely using a compiler that already implements the feature implements the feature.

Parsing is often described as a solved problem, but it is not, and is easy to get wrong, and doing a bad job can summon the dark gods to our realm. We do not want to summon dark gods, so we worked directly with data in the host language, and it was fine.

You can see the two are equivalent in Racket by typing the following in to the REPL or in the interactions frame in DrRacket:

'(1 2 3)

(quote (1 2 3))

#(1 2 3)

(vector 1 2 3)

I can’t typeset these examples using my normal REPL typesetting feature since Racket implicitly renders them identically.

Using match on this notation requires a little care, since our program is itself a quoted datum in Racket, and match treats them slightly differently.

(match '(quote (1 2 3))
  [`(quote ,es) es])

(match ''(1 2 3)
  [`(quote ,es) es])

(match '#(1 2 3)
  [`#(,es ...) es])

(match '#(1 2 3)
  [(vector es ...) es])

;; Fails to match
(match '#(1 2 3)
  [`(vector ,es ...) es])

We also introduce a new form for applying macros, and add a few macros. We also make procedure calls implicit; essentially, anything that looks like an application form is either a macro application, or, is implicitly a procedure call.

1.14.2 Implementing Syntactic Sugar

Macro expansion is the first step. Ideally, this is designed to be extensible, so we can add new macros at will, or even allow the user to define macros.

Our target language is Exprs-lang v9

Diff vs Source

p	::=	(module (define x (lambda (x ...) value)) ... value)

value	::=	triv
	\|	(quote s-expr)
	\|	(value value ...)
	\|	(macro-id value ...)
	\|	(let ([x value] ...) value)
	\|	(if value value value)
	\|	(call value value ...)

triv	::=	x
	\|	fixnum
	\|	prim-f
	\|	x
	\|	#t
	\|	#f
	\|	empty
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) value)
	\|	vec-literal

x	::=	name?
	\|	prim-f

prim-f	::=	binop
	\|	unop

binop	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

s-expr	::=	#t
	\|	#f
	\|	fixnum
	\|	ascii-char-literal
	\|	(s-expr ...)

macro-id	::=	vector
	\|	and
	\|	or
	\|	begin

unop prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

fixnum	::=	int61?

uint8	::=	uint8?

ascii-char-literal	::=	ascii-char-literal?

vec-literal	::=	host:vector?

All macros are elaborated to existing features. We also elaborate implicit procedure call into explicit call.

Notice that the vector literal notation embeds an actual Racket vector in our AST.

Example:

> (vector-ref (second '(module #(1 2 3))) 0)
1

This vector is required to only contain values from Racketish-Surface.

And the quoted literal notation actually expands to the symbol quote, although it always gets printed as a tick:

Examples:

> '(module '(#t #f))
'(module '(#t #f))
> (first (second '(module '(#t #f))))
'quote

The transformations we want to define are given below:

(and v ...)
`(and) => #t
`(and ,e) => e
`(and ,e ,es ...) => `(if ,e (and ,es ...) #f)
When and is given no arguments, we expand to #t, since #t is the multiplicative identity element on booleans (like 1 is for integers and multiplication). When given a single argument, we return it, since Racket is falsey. Otherwise, we elaborate to branch on the first argument, and recursively expand and applied to the rest of its arguments.
(or v ...)
`(or) => #f
`(or ,e) => e
`(or ,e ,es ...) => `(let ([,tmp ,e]) (if ,tmp ,tmp (or ,es ...)))
When or is given no arguments, we expand to #f, since #f is the additive identity element on booleans (like 0 is for integers and addition). When given a single argument, we return it, since Racket is falsey. Otherwise, we bind the first argument to a fresh auxiliary variable, return it if it’s true, and otherwise return the or of the rest of the arguments.
Notice that since we introduce an auxiliary variable we could accidentally capture any use of a free variable in es ..., just like any compiler pass that introduces auxiliary variables. We use fresh as normal, since our representation of abstract locations are valid names.

(vector e ...) and #(e ...)

`(vector ,es ...)
=>
`(let ([,tmp (make-vector ,(length es))])
   (begin
     (vector-set! ,tmp 0 ,(list-ref es 0))
     ...
     (vector-set! ,tmp ,(sub1 (length es)) ,(list-ref es (sub1 (length es))))
     tmp))

(quote s-expr)
`',v => v
`'(,s-expr ,s-exprs ...)
=>
`(cons ',s-expr '(,s-exprs ...))
(begin e ...)
`(begin) => '(void)
`(begin ,e) => e
`(begin ,e ,es ...)
=>
`(let ([,tmp ,e])
   (if (error? ,tmp)
       ,tmp
       `(begin ,es ...)))
We add a surface-level begin form that handles propagating errors correctly. This is essentially an implementation of do for the monad.

These rewrites are essentially syntax-case macros, recursive pattern-based macros that can perform computation at compile time.

Notice that each macro can generate more uses of macros. As a result, the recursive structure of this elaborate pass is going to be slightly different from other passes.

procedure
(expand-macros p) → exprs-lang-v9?
p : racketish-surface?
Expand all Racketish-Surface macros.

1.14.3 Appendix: Overview

This graph does not integrate the Recursive Data chapter.

Figure 11: Overview of Compiler Version 10

contents ← prev up next →

1	Book
2	Project Logistics
3	Project Iterations
4	CPSC 411 Infrastructure Package
5	Credits

1.1	A Compiler Begins with a Language
1.2	Abstracting Boilerplate (v1)
1.3	Abstract Locations
1.4	Value Orientation
1.5	Register Allocation
1.6	Structured Control Flow
1.7	Procedural Abstraction: Call
1.8	Procedural Abstraction: Return
1.9	Algebraic Expressions
1.10	Data types: Immediates
1.11	Data types: Structured Data and Heap Allocation
1.12	First-class Procedures: Code is Data
1.13	Recursive Data
1.14	Syntactic Sugar
1.15	Appendix: Racket Preliminaries
1.16	Appendix: Compiler Design Recipe

1.14.1	Preface: What’s wrong with Racketish?
1.14.2	Implementing Syntactic Sugar
1.14.3	Appendix: Overview