An interpreter runs code.
If you have machine code, your CPU is your interpreter. Sometimes people design abstract machines with very low-level code with simple, precise, “instructions” that, while more powerful than processor instructions, are super simple to write interpreters for.
But what about writing interpreters for higher-level languages? One way is to:
For very simple languages, you can also just interpret the AST directly. in other words:
If your language is really, really simple, you can:
Now we could write a front-end entirely by hand, and there are good pedagogical reasons for doing so! But we don’t have unlimited time, so we are going to use Ohm, which provides a way to generate the trees that we can interpret directly. I know right...that’s pretty amazing.
Here is an complete, quick-and-dirty interpreter for Ael, written with Ohm:
// An interpreter for Ael. // // Example usage: // // $ node ael-interpreter.js 'print 2; x = 22; print 1 + x / 2;' // 2 // 12 const ohm = require('ohm-js'); const aelGrammar = ohm.grammar(`Ael { Program = (Statement ";")+ Statement = id "=" Exp --assign | print Exp --print Exp = Exp "+" Term --plus | Exp "-" Term --minus | Term Term = Term "*" Factor --times | Term "/" Factor --divide | Factor Factor = "-" Primary --negate | Primary Primary = "(" Exp ")" --parens | number | id number = digit+ print = "print" ~alnum id = ~print letter alnum* }`); const memory = new Map(); // This language is so simple, we don't need an AST. const interpreter = aelGrammar.createSemantics().addOperation('exec', { Program(ss, _semicolons) { ss.exec(); }, Statement_assign(i, _eq, e) { memory.set(i.sourceString, e.eval()); }, Statement_print(_, e) { console.log(e.eval()); }, }).addOperation('eval', { Exp_plus(e, _op, t) { return e.eval() + t.eval(); }, Exp_minus(e, _op, t) { return e.eval() - t.eval(); }, Term_times(t, _op, f) { return t.eval() * f.eval(); }, Term_divide(t, _op, f) { return t.eval() / f.eval(); }, Factor_negate(_op, p) { return -p.eval(); }, Primary_parens(_open, e, _close) { return e.eval(); }, number(_chars) { return +this.sourceString; }, id(_firstChar, _restChars) { return memory.get(this.sourceString); }, }); const match = aelGrammar.match(process.argv[2]); if (match.succeeded()) { interpreter(match).exec(); } else { console.error(match.message); process.exitCode = 1; }
And here are some sample runs:
$ node ael-interpreter.js '8 * (22 - 3) / 4' 38 $ node ael-interpreter.js '8 * (22 ^ - 3) / 4' Line 1, col 9: > 1 | 8 * (22 ^ - 3) / 4 ^ Expected ")" $ node ael-interpreter.js '8 (' Line 1, col 3: > 1 | 8 ( ^ Expected end of input
We’ll go over this interpreter in great detail in class, and in doing so, learn a lot about Ohm. A few notes on Ohm follow to give you a quick introduction, but please note they are not intended to be a substitute for you reading all about Ohm on your own, because we’re going to use it to build big languages later on.
Ohm is a library for building parsers, interpreters, compilers, and more. With it you can quickly generate parsers for languages specified with PEGs. There are a few basics that are good to know:
npm install ohm-js
ohm
object via const ohm = require('ohm-js');
|
for /
and =
for ←
). Unlike regular PEGs, Ohm rules can be left-recursive. In addition, Ohm allows some fancy stuff, like parameterized rules and grammar inheritance. Details are in the syntax reference, which you need to eventually read. But as you may be impatient, here are the Ohm parsing expressions:
Expression | Description |
---|---|
" ..." | A literal string. Allows the usual escape chars \' , \" , \b , \n , \r , \t , \f , \\ , \x $hh$, \u $hhhh$.
|
$rulename$ | Just a rule. |
$rulename$< $params$> | A rule with parameters. |
$e$* | $e$, zero or more times. |
$e$+ | $e$, one or more times. |
$e$? | $e$, zero or one times. |
$e_1\;e_2$ | Match $e_1$ then match $e_2$. |
$e_1\;|\;e_2$ | If $e_1$ matches, great. Otherwise try to match $e_2$. |
& $e$ | Match $e$ without consuming it (lookahead). |
~ $e$ | Succeed if $e$ doesn’t match, without consuming anything (negative lookahead). |
# $e$ | Lexify a syntactic rule. |
space
to be implicitly interspersed with other tokens.
any =
any char at all
end =
the end of the input
space = " " | "\x09".."\x0d" | "\ufeff" |
any char from Unicode category Zs
digit = "0".."9"
lower =
any char from Unicode category Ll
upper =
any char from Unicode category Lu
unicodeLtmo =
any char from Unicode category Lt, Lm, or Lo
letter = lower | upper | unicodeLtmo
alnum = letter | digit
hexDigit = digit | "A".."F" | "a".."f"
caseInsensitive<t> =
any case insensitve version of t
emptyListOf<elem, sep> = /* nothing */
nonemptyListOf<elem, sep> = elem (sep elem)*
listOf<elem, sep> = nonemptyListOf<elem, sep> | emptyListOf<elem, sep>
EmptyListOf<elem, sep> = /* nothing */
NonemptyListOf<elem, sep> = elem (sep elem)*
ListOf<elem, sep> = NonemptyListOf<elem, sep> | EmptyListOf<elem, sep>
Exp = Exp "+" Term --plus | Exp "-" Term --minus | Term
is an abbreviation for:
Exp = Exp_plus | Exp_minus | Term Exp_plus = Exp "+" Term Exp_minus = Exp "-" Term
=
to add a new rule
:=
to replace the rule from the supergrammar
+=
to add an alternative to the rule from the supergrammar
.match(
$c$)
returns a MatchResult
object, $m$. If $m$.succeeded()
is true, $m$ wraps a parse tree, and you can apply a semantics object to it. If $m$.failed()
is true, a detailed error message will be in $m$.message
and a smaller error message will be in $m$.shortMessage
.
These notes are not exhaustive. There is much more to Ohm. Check out the documentation.