An Iki Compiler

Hey let’s look at a real compiler.

Getting Started

I wrote a compiler for the language Iki, which you can use as a starting point to write your own compiler. Here is how to get it.

  1. Make sure you have a recent version of Node.js installed.
  2. Clone or fork the project at https://github.com/rtoal/iki-compiler.
  3. Go to the project directory and do: npm install -d.
  4. Run the unit tests: npm test.

Trying it Out

Let’s run it!

$ ./iki.js
iki.js [-a] [-o] [-i] [--target [x86|c|js]] filename

Options:
  --help     Show help                                                 [boolean]
  --version  Show version number                                       [boolean]
  -a         show abstract syntax tree after parsing then stop         [boolean]
  -o         do optimizations                                          [boolean]
  -i         generate and show the intermediate code then stop         [boolean]
  --target   generate code for x86, C, or JavaScript             [default: "js"]

Ah, so the compiler gives you some insight into what it is doing! It looks like your standard compiler: first it geenrates the AST, then intermediate code (optionally optimized), and then finally target code. Let’s do them one at a time.

First, note the project you got from GitHub contains a few example programs. Look at the folder docs/examples. There’s a familiar program in there:

-- This doesn’t write hello world
    -- My first program
   var x :int  ;       var y   : int;
           -- uggh: lousy indentation   and     SPACing

         while y - 5 == 3 loop
       var y: int;
                      read x ,y;
      x = 2 * (3+y);
        end;
          write 5;    -- Not sure why five, but okay.

First, let’s get the AST:

$ ./iki.js -a docs/examples/ugly.iki
Program {
  block:
   Block {
     statements:
      [ VarDec { id: 'x', type: 'int' },
        VarDec { id: 'y', type: 'int' },
        WhileStatement {
          condition:
           BinaryExp {
             op: '==',
             left: BinaryExp { op: '-', left: VarExp { name: 'y' }, right: IntLit { value: '5' } },
             right: IntLit { value: '3' } },
          body:
           Block {
             statements:
              [ VarDec { id: 'y', type: 'int' },
                ReadStatement { varexps: [ VarExp { name: 'x' }, VarExp { name: 'y' } ] },
                AssignmentStatement {
                  target: VarExp { name: 'x' },
                  source:
                   BinaryExp {
                     op: '*',
                     left: IntLit { value: '2' },
                     right: BinaryExp { op: '+', left: IntLit { value: '3' }, right: VarExp { name: 'y' } } } } ] } },
        WriteStatement { expressions: [ IntLit { value: '5' } ] } ] } }

The compiler gives you a complete dump of its internal abstract syntax tree. Pictorially, that’s:

ikiinternalrep-syntaxonly.png

Normally, we draw ASTs without all the property names...something like this:

ikiast.png

The next phase of the compiler is the semantic analyzer walks the tree, making all kinds of checks, and tying things together, like mapping variable references to their corresponding variable declarations. You can actually get the compiler to dump the structure by executing:

$ ./iki.js -i docs/examples/ugly.iki
0 (Program) block=#1
1 (Block) statements=[#2,#4,#5,#24]
2 (VarDec) id='x' type=#3
3 (Type) name='int'
4 (VarDec) id='y' type=#3
5 (WhileStatement) condition=#6 body=#12
6 (BinaryExp) op='==' left=#7 right=#10 type=#11
7 (BinaryExp) op='-' left=#8 right=#9 type=#3
8 (VarExp) name='y' referent=#4 type=#3
9 (IntLit) value='5' type=#3
10 (IntLit) value='3' type=#3
11 (Type) name='Bool'
12 (Block) statements=[#13,#14,#17]
13 (VarDec) id='y' type=#3
14 (ReadStatement) varexps=[#15,#16]
15 (VarExp) name='x' referent=#2 type=#3
16 (VarExp) name='y' referent=#13 type=#3
17 (AssignmentStatement) target=#18 source=#19
18 (VarExp) name='x' referent=#2 type=#3
19 (BinaryExp) op='*' left=#20 right=#21 type=#3
20 (IntLit) value='2' type=#3
21 (BinaryExp) op='+' left=#22 right=#23 type=#3
22 (IntLit) value='3' type=#3
23 (VarExp) name='y' referent=#13 type=#3
24 (WriteStatement) expressions=[#25]
25 (IntLit) value='5' type=#3

If you were to draw out a picture, you’d see (where we’ve shown the additions and modifications made by the semantic analyzer in red):

ikiinternalrep.png

Next, we’ll look at code generation. The compiler currently understands three targets: JavaScript, C, and x86 assembly language. If you want to get extra credit, generate LLVM or JVM. Send me a pull request.

Here is the JavaScript generation:

$ ./iki.js docs/examples/ugly.iki
(function () {
  var x_1 = 0;
  var y_2 = 0;
  while (((y_2 - 5) === 3)) {
    var y_3 = 0;
    x_1 = prompt();
    y_3 = prompt();
    x_1 = (2 * (3 + y_3));
  }
  alert(5);
}());

You can also have it generate C:

$ ./iki.js --target=C docs/examples/ugly.iki
#include <stdio.h>
#include <stdbool.h>
int main() {
    int x_1 = 0;
    int y_2 = 0;
    while (((y_2 - 5) == 3)) {
        int y_3 = 0;
        scanf("%d\n", &x_1);
        scanf("%d\n", &y_3);
        x_1 = (2 * (3 + y_3));
    }
    printf("%d\n", 5);
    return 0;
}

or even assembly language (GAS for the Mac):

$ ./iki.js --target=x86 docs/examples/ugly.iki
        .globl  _main
        .text
_main:
        push    %rbp
L1:
        movq    _y_1(%rip), %rax
        subq    $5, %rax
        cmp     $3, %rax
        sete    %al
        movsbq  %al, %rax
        cmpq    $0, %rax
        je      L2
        mov     _x_2(%rip), %rsi
        lea     READ(%rip), %rdi
        xor     %rax, %rax
        call    _scanf
        mov     _y_3(%rip), %rsi
        lea     READ(%rip), %rdi
        xor     %rax, %rax
        call    _scanf
        movq    $2, %rax
        movq    $3, %rcx
        addq    _y_3(%rip), %rcx
        imulq   %rcx, %rax
        mov     %rax, _x_2(%rip)
        jmp     L1
L2:
        mov     $5, %rsi
        lea     WRITE(%rip), %rdi
        xor     %rax, %rax
        call    _printf
        pop     %rbp
        ret
        .data
READ:
        .ascii  "%d\0\0"
WRITE:
        .ascii  "%d\n\0"
_y_1:
        .quad   0
_x_2:
        .quad   0
_y_3:
        .quad   0

Architecture

Okay, so how do we write something like this? First, here is an overview of some of the files in the project:

.
├── README.md
├── ast
│   ├── __tests__
│   │   └── parser.test.js
│   ├── index.js
│   └── parser.js
├── backend
│   ├── cgenerator.js
│   ├── jsgenerator.js
│   └── x86generator.js
├── docs
│   └── examples
│       ├── everything.iki
│       ├── hello.iki
│       ├── optimizeme.iki
│       ├── simple.iki
│       └── ugly.iki
├── grammar
│   ├── __tests__
│   │   ├── grammar.test.js
│   │   └── grammar_errors.test.js
│   ├── iki.ohm
│   └── syntax-checker.js
├── iki.js
├── package-lock.json
├── package.json
└── semantics
    ├── __tests__
    │   ├── analyzer.test.js
    │   └── analyzer_errors.test.js
    ├── analyzer.js
    ├── builtins.js
    ├── viewer.js
    ├── context.js
    └── optimizer.js

How it Works

At this point, we just tour the source code during class, learning about JavaScript, Ohm, and perhaps other things along the way.

Your Turn

Now you should be ready to design your own language and write a little compiler for it.

Next Steps

Here are some things we’ll be doing next:

  1. Scale up to more complex languages.
  2. Learn about optimization.
  3. Learn about VMs.
  4. Learn us some theory along the way.