Introduction to Code Generation

What about the back half of the compiler?

What is Code Generation?

The first part of a compiler analyzes the source code into a structure that carries the meaning of the program; this structure is generally the abstract syntax tree that’s been checked and decorated. (Remember decorated means all identifier references have been resolved.)

From this structure we can generate the corresponding code in some other language, the target language. This is what a code generator does.

Some compilers generate twice: they first generate code in some “intermediate language” like SIL, LLVM IR, HIR, MIR, CIL, etc. Then they do the “real” code generation into a target language that is directly runnable (or really close to it), like virtual machine code, assembly language, or machine language.

Goals of Code Generation

Here are the three goals of any good code generator:

Exercise: The last two goals are seemingly in conflict. Discuss.

What Kind of Code Can We Generate?

There is no one way to do code generation. There is no one target of a code generator. A code generator can produce:

Are there other kinds of code?

Transpiling

JavaScript makes a great target language? Why? It’s got first-class functions and async support, and that Node.js thing runs on that awesome V8. And the ecosystem! How can we target JS?

Start by browsing code of simple examples. Each start with a decorated AST (there’s pretty much no need to use an IR if you are targeting a high-level language):

What did you notice from browsing these examples? A few tips and tricks, right? Things like:

Virtual Machines and Backend Language Libraries

LLVM

LLVM is, well, it’s huge, it’s impressive, and lots of compilers for modern languages are using it. It is a library with a ton of stuff, including an intermediate language and backends. It’s official name is The LLVM Compiler Infrastructure Project.

A take on modern compiler construction from Andi McClure:

llvm-humor.png

Exercise: Read the whole blog post from whence came that awesome picture.

JVM

Read about the JVM at Wikipedia and about its formal specification.

Web Assembly

Read about Web Assembly

Exercise: Explain WASM, briefly, in your own words.

Write your own VM!

SERIOUSLY. DO IT.

Targeting a Typical Processor

Okay let’s say you’re going to bypass LLVM or anything else and you are going straight to assembly. Or, let’s say you are joining the LLVM team and want to enhance the project. Or you want to write a new backend for an intermediate representation. Lots of good stuff to know. Here are some notes that just scratch the surface of generating assembly language.

Address Assignment

TODO - Fortran no recursion local vars, even return val on stack

TODO - C, show assembly for recursive function

TODO - C, show space for local vars

TODO - motivate stack frames

TODO - motivate dynamic links

TODO - Three approaches to static scope

TODO - Static link

TODO - Single display

TODO - Display in every frame

TODO - No need for display or static link in leaf subroutine

TODO - Who saves the registers

TODO - Inlining?

TODO - Summary of questions and concerns

Instruction Selection

TODO