I wrote a compiler for the language Iki, which you can use as a starting point to write your own compiler. Here is how to get it.
npm install -d
.
npm test
.
Let’s run it!
$ ./iki.js iki.js [-a] [-o] [-i] [--target [x86|c|js]] filename Options: --help Show help [boolean] --version Show version number [boolean] -a show abstract syntax tree after parsing then stop [boolean] -o do optimizations [boolean] -i generate and show the intermediate code then stop [boolean] --target generate code for x86, C, or JavaScript [default: "js"]
Ah, so the compiler gives you some insight into what it is doing! It looks like your standard compiler: first it geenrates the AST, then intermediate code (optionally optimized), and then finally target code. Let’s do them one at a time.
First, note the project you got from GitHub contains a few example programs. Look at the folder docs/examples. There’s a familiar program in there:
-- This doesn’t write hello world -- My first program var x :int ; var y : int; -- uggh: lousy indentation and SPACing while y - 5 == 3 loop var y: int; read x ,y; x = 2 * (3+y); end; write 5; -- Not sure why five, but okay.
First, let’s get the AST:
$ ./iki.js -a docs/examples/ugly.iki Program { block: Block { statements: [ VarDec { id: 'x', type: 'int' }, VarDec { id: 'y', type: 'int' }, WhileStatement { condition: BinaryExp { op: '==', left: BinaryExp { op: '-', left: VarExp { name: 'y' }, right: IntLit { value: '5' } }, right: IntLit { value: '3' } }, body: Block { statements: [ VarDec { id: 'y', type: 'int' }, ReadStatement { varexps: [ VarExp { name: 'x' }, VarExp { name: 'y' } ] }, AssignmentStatement { target: VarExp { name: 'x' }, source: BinaryExp { op: '*', left: IntLit { value: '2' }, right: BinaryExp { op: '+', left: IntLit { value: '3' }, right: VarExp { name: 'y' } } } } ] } }, WriteStatement { expressions: [ IntLit { value: '5' } ] } ] } }
The compiler gives you a complete dump of its internal abstract syntax tree. Pictorially, that’s:
Normally, we draw ASTs without all the property names...something like this:
The next phase of the compiler is the semantic analyzer walks the tree, making all kinds of checks, and tying things together, like mapping variable references to their corresponding variable declarations. You can actually get the compiler to dump the structure by executing:
$ ./iki.js -i docs/examples/ugly.iki 0 (Program) block=#1 1 (Block) statements=[#2,#4,#5,#24] 2 (VarDec) id='x' type=#3 3 (Type) name='int' 4 (VarDec) id='y' type=#3 5 (WhileStatement) condition=#6 body=#12 6 (BinaryExp) op='==' left=#7 right=#10 type=#11 7 (BinaryExp) op='-' left=#8 right=#9 type=#3 8 (VarExp) name='y' referent=#4 type=#3 9 (IntLit) value='5' type=#3 10 (IntLit) value='3' type=#3 11 (Type) name='Bool' 12 (Block) statements=[#13,#14,#17] 13 (VarDec) id='y' type=#3 14 (ReadStatement) varexps=[#15,#16] 15 (VarExp) name='x' referent=#2 type=#3 16 (VarExp) name='y' referent=#13 type=#3 17 (AssignmentStatement) target=#18 source=#19 18 (VarExp) name='x' referent=#2 type=#3 19 (BinaryExp) op='*' left=#20 right=#21 type=#3 20 (IntLit) value='2' type=#3 21 (BinaryExp) op='+' left=#22 right=#23 type=#3 22 (IntLit) value='3' type=#3 23 (VarExp) name='y' referent=#13 type=#3 24 (WriteStatement) expressions=[#25] 25 (IntLit) value='5' type=#3
If you were to draw out a picture, you’d see (where we’ve shown the additions and modifications made by the semantic analyzer in red):
Next, we’ll look at code generation. The compiler currently understands three targets: JavaScript, C, and x86 assembly language. If you want to get extra credit, generate LLVM or JVM. Send me a pull request.
Here is the JavaScript generation:
$ ./iki.js docs/examples/ugly.iki (function () { var x_1 = 0; var y_2 = 0; while (((y_2 - 5) === 3)) { var y_3 = 0; x_1 = prompt(); y_3 = prompt(); x_1 = (2 * (3 + y_3)); } alert(5); }());
You can also have it generate C:
$ ./iki.js --target=C docs/examples/ugly.iki #include <stdio.h> #include <stdbool.h> int main() { int x_1 = 0; int y_2 = 0; while (((y_2 - 5) == 3)) { int y_3 = 0; scanf("%d\n", &x_1); scanf("%d\n", &y_3); x_1 = (2 * (3 + y_3)); } printf("%d\n", 5); return 0; }
or even assembly language (GAS for the Mac):
$ ./iki.js --target=x86 docs/examples/ugly.iki .globl _main .text _main: push %rbp L1: movq _y_1(%rip), %rax subq $5, %rax cmp $3, %rax sete %al movsbq %al, %rax cmpq $0, %rax je L2 mov _x_2(%rip), %rsi lea READ(%rip), %rdi xor %rax, %rax call _scanf mov _y_3(%rip), %rsi lea READ(%rip), %rdi xor %rax, %rax call _scanf movq $2, %rax movq $3, %rcx addq _y_3(%rip), %rcx imulq %rcx, %rax mov %rax, _x_2(%rip) jmp L1 L2: mov $5, %rsi lea WRITE(%rip), %rdi xor %rax, %rax call _printf pop %rbp ret .data READ: .ascii "%d\0\0" WRITE: .ascii "%d\n\0" _y_1: .quad 0 _x_2: .quad 0 _y_3: .quad 0
Okay, so how do we write something like this? First, here is an overview of some of the files in the project:
. ├── README.md ├── ast │ ├── __tests__ │ │ └── parser.test.js │ ├── index.js │ └── parser.js ├── backend │ ├── cgenerator.js │ ├── jsgenerator.js │ └── x86generator.js ├── docs │ └── examples │ ├── everything.iki │ ├── hello.iki │ ├── optimizeme.iki │ ├── simple.iki │ └── ugly.iki ├── grammar │ ├── __tests__ │ │ ├── grammar.test.js │ │ └── grammar_errors.test.js │ ├── iki.ohm │ └── syntax-checker.js ├── iki.js ├── package-lock.json ├── package.json └── semantics ├── __tests__ │ ├── analyzer.test.js │ └── analyzer_errors.test.js ├── analyzer.js ├── builtins.js ├── viewer.js ├── context.js └── optimizer.js
At this point, we just tour the source code during class, learning about JavaScript, Ohm, and perhaps other things along the way.
Now you should be ready to design your own language and write a little compiler for it.
Here are some things we’ll be doing next: