An Interpreter for Ael

Time to write an interpreter.

Organizing an Interpreter

An interpreter runs code.

If you have machine code, your CPU is your interpreter. Sometimes people design abstract machines with very low-level code with simple, precise, “instructions” that, while more powerful than processor instructions, are super simple to write interpreters for.

But what about writing interpreters for higher-level languages? One way is to:

write a front-end to produce the AST, then
write a back-end to produce (abstract or real) machine code from the AST, then
interpret the low-level code.

For very simple languages, you can also just interpret the AST directly. in other words:

write a front-end to produce the AST, then
interpret the AST.

If your language is really, really simple, you can:

Run the code while you parse.

Now we could write a front-end entirely by hand, and there are good pedagogical reasons for doing so! But we don’t have unlimited time, so we are going to use Ohm, which provides a way to generate the trees that we can interpret directly. I know right...that’s pretty amazing.

Writing an Interpreter with Ohm

Here is an complete, quick-and-dirty interpreter for Ael, written with Ohm:

ael-interpreter.js

// An interpreter for Ael.
//
// Example usage:
//
//   $ node ael-interpreter.js 'print 2; x = 22; print 1 + x / 2;'
//   2
//   12


const ohm = require('ohm-js');

const aelGrammar = ohm.grammar(`Ael {
  Program = (Statement ";")+
  Statement = id "=" Exp       --assign
            | print Exp        --print
  Exp       = Exp "+" Term     --plus
            | Exp "-" Term     --minus
            | Term
  Term      = Term "*" Factor  --times
            | Term "/" Factor  --divide
            | Factor
  Factor    = "-" Primary      --negate
            | Primary
  Primary   = "(" Exp ")"      --parens
            | number
            | id
  number    = digit+
  print     = "print" ~alnum
  id        = ~print letter alnum*
}`);

const memory = new Map();

// This language is so simple, we don't need an AST.
const interpreter = aelGrammar.createSemantics().addOperation('exec', {
  Program(ss, _semicolons) { ss.exec(); },
  Statement_assign(i, _eq, e) { memory.set(i.sourceString, e.eval()); },
  Statement_print(_, e) { console.log(e.eval()); },
}).addOperation('eval', {
  Exp_plus(e, _op, t) { return e.eval() + t.eval(); },
  Exp_minus(e, _op, t) { return e.eval() - t.eval(); },
  Term_times(t, _op, f) { return t.eval() * f.eval(); },
  Term_divide(t, _op, f) { return t.eval() / f.eval(); },
  Factor_negate(_op, p) { return -p.eval(); },
  Primary_parens(_open, e, _close) { return e.eval(); },
  number(_chars) { return +this.sourceString; },
  id(_firstChar, _restChars) { return memory.get(this.sourceString); },
});

const match = aelGrammar.match(process.argv[2]);
if (match.succeeded()) {
  interpreter(match).exec();
} else {
  console.error(match.message);
  process.exitCode = 1;
}

And here are some sample runs:

$ node ael-interpreter.js '8 *    (22   - 3) / 4'
38
$ node ael-interpreter.js '8 * (22 ^  - 3) / 4'
Line 1, col 9:
> 1 | 8 * (22 ^  - 3) / 4
              ^
Expected ")"
$ node ael-interpreter.js '8 ('
Line 1, col 3:
> 1 | 8 (
        ^
Expected end of input

We’ll go over this interpreter in great detail in class, and in doing so, learn a lot about Ohm. A few notes on Ohm follow to give you a quick introduction, but please note they are not intended to be a substitute for you reading all about Ohm on your own, because we’re going to use it to build big languages later on.

A Few Notes on Ohm

Ohm is a library for building parsers, interpreters, compilers, and more. With it you can quickly generate parsers for languages specified with PEGs. There are a few basics that are good to know:

Install from npm, of course: npm install ohm-js
In your program, obtain the ohm object via const ohm = require('ohm-js');

Write grammars as you would a regular PEG, but with a few differences (like | for / and = for ←). Unlike regular PEGs, Ohm rules can be left-recursive. In addition, Ohm allows some fancy stuff, like parameterized rules and grammar inheritance. Details are in the syntax reference, which you need to eventually read. But as you may be impatient, here are the Ohm parsing expressions:

Expression	Description
`"`...`"`	A literal string. Allows the usual escape chars `\'`, `\"`, `\b`, `\n`, `\r`, `\t`, `\f`, `\\`, `\x`$hh$, `\u`$hhhh$.
$rulename$	Just a rule.
$rulename$`<`$params$`>`	A rule with parameters.
$e$`*`	$e$, zero or more times.
$e$`+`	$e$, one or more times.
$e$`?`	$e$, zero or one times.
$e_1\;e_2$	Match $e_1$ then match $e_2$.
$e_1\;\|\;e_2$	If $e_1$ matches, great. Otherwise try to match $e_2$.
`&`$e$	Match $e$ without consuming it (lookahead).
`~`$e$	Succeed if $e$ doesn’t match, without consuming anything (negative lookahead).
`#`$e$	Lexify a syntactic rule.

Note how lexical rules always start with lower case letters and syntax rules start with uppercase letters. Syntax rules allow tokens from the class space to be implicitly interspersed with other tokens.
There are several built-in rules:
- any = any char at all
- end = the end of the input
- space = " " | "\x09".."\x0d" | "\ufeff" | any char from Unicode category Zs
- digit = "0".."9"
- lower = any char from Unicode category Ll
- upper = any char from Unicode category Lu
- unicodeLtmo = any char from Unicode category Lt, Lm, or Lo
- letter = lower | upper | unicodeLtmo
- alnum = letter | digit
- hexDigit = digit | "A".."F" | "a".."f"
- caseInsensitive<t> = any case insensitve version of t
- emptyListOf<elem, sep> = /* nothing */
- nonemptyListOf<elem, sep> = elem (sep elem)*
- listOf<elem, sep> = nonemptyListOf<elem, sep> | emptyListOf<elem, sep>
- EmptyListOf<elem, sep> = /* nothing */
- NonemptyListOf<elem, sep> = elem (sep elem)*
- ListOf<elem, sep> = NonemptyListOf<elem, sep> | EmptyListOf<elem, sep>
All alternatives for a rule must have the same arity. But there is a cool short-hand to make it look like alternatives with different arities exist. For example:
```
Exp = Exp "+" Term  --plus
    | Exp "-" Term  --minus
    | Term
```
is an abbreviation for:
```
Exp       = Exp_plus | Exp_minus | Term
Exp_plus  = Exp "+" Term
Exp_minus = Exp "-" Term
```
Grammars can inherit from other grammars. When inheriting, use:
- = to add a new rule
- := to replace the rule from the supergrammar
- += to add an alternative to the rule from the supergrammar
Any grammar you write yourself automatically inherits from the base Ohm grammar, where the built-in rules are defined.
Given a grammar $g$ and source code $c$, $g$.match($c$) returns a MatchResult object, $m$. If $m$.succeeded() is true, $m$ wraps a parse tree, and you can apply a semantics object to it. If $m$.failed() is true, a detailed error message will be in $m$.message and a smaller error message will be in $m$.shortMessage.
You can add multiple semantics objects to grammar objects. A semantics is a collection of operations and attributes. Operations and attributes are applied to each node in the parse tree. Operations and attributes are similar: operations are functions (called every time), and attributes are memoized and called with property syntax. More on these when we get to more advanced uses.
There are default things to do when a semantics object doesn’t have operations or attributes for all the rules. We’ll get to these cases as needed.

These notes are not exhaustive. There is much more to Ohm. Check out the documentation.