The Language Bella

Bella is the second of five languages designed for a compiler course.

1  Introduction

Bella is a simple programming language with interesting features that make it a great fit for learning (1) compiler and interpreter writing, and (2) formal language semantics.

The language is limited to numbers, assignments, conditionals, loops, and simple functions. Variables can only hold numbers, not functions. Function “bodies” are limited to simple expressions (like Python lambdas), so we don’t have to deal with any scoping complications like temporal dead zones. Scoping is as in Python: local variables and parameters are function-scoped, and shadowing is allowed. Bella is a friendly, simple language, relatively easy to implement.

This document defines the language Bella.

Photo of Bella

2  Language Description

2.1  Programs

A programis a sequence of one or more statements.

let dozen = 12;                                   // variable declaration
print dozen % 3 ** 1;                             // print statement
function gcd(x, y) = y == 0 ? x : gcd(y, x % y);  // function declaration
while dozen >= 3 || (gcd(1, 10) != 5) {           // while statement
  dozen = dozen - 2.75E+19 ** 1 ** 3;             // assignment statement
}

Apologies for the semicolons, but they do make the language somewhat easier to parse.

2.2  Values and Types

All values in Bella are instances of a type. The language has the following types:

Numbers are first-class values, meaning they can be stored in variables, passed to functions, and returned from functions. Functions cannot be: the only thing one can do with a function is call it.

Numbers are written as in JavaScript:

2
2.0
55.9
819.999e-15
2E+10
5.89999e2

2.3  Declarations

A declaration binds an identifier to an entity. There are three kinds of declarations:

Here is a short example showing all three kinds:

let sister = 5 + 1;          // variable declaration of sister (Ⅴ + Ⅰ = Ⅵ)
function triple(x) = x * 3;  // function declaration of triple
                             // ...and parameter declaration of x

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. In Bella, shadowing of top-level variables by parameters of local functions is permitted, so the visible region of a declaration is the scope minus any "inner" scopes of declarations of identifiers with the same name and may therefore be discontinuous.

The identifiers appearing as defining occurrences of variable and function declarations must be mutually exclusive. A function’s parameters must be mutually exclusive as well:

let x = 3;
// let x = 5;               // ERROR: x already declared
// function x() = 0;        // ERROR: x already declared
// function g(a, a) = 0;    // ERROR: a already declared
function h(x) = 0;          // OK! parameter x shadows global x
// let h = 3;               // ERROR: h already declared (prev line)
function j(k) = 100 - x;    // OK
let k = 2;                  // OK! Parameter k on prev line not in scope

2.4  Functions

As a simple language, Bella does not have any anonymous functions; all functions must be named. The identifier used in the function declaration is read-only variable: you cannot reassign it:

function successor(n) = n + 1;
// let successor = 5;            // ERROR: read only

Functions can only be called. They cannot be used in a context where a number is expected:

function triple (x) = x * 3;
print triple(8);                  // Function call of triple, perfectly fine
// print triple                   // ERROR
// let t = triple;                // ERROR
// let times_nine = triple * 3    // ERROR

Functions declared with $n$ parameters must be passed exactly $n$ arguments when called.

2.5  Variables

A variable is something that stores a value. Variables come into existence either (1) as regular variables declared in a variable declaration statement (using let), or (2) as parameters in a parameter declaration. There is one pre-declared variable, $\pi$. Variables can only be used if previously declared.

The variable π is read-only and all other variables are writable. Interestingly, because parameters are scoped only to the function body and function bodies are simple expressions, all parameters are effectively read-only.

Variables can store numeric values only, not functions.

2.6  Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

2.7  Expressions

An expression produces a numeric. The Bella expressions are:

3  Standard Library

The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program.

4  Formal Syntax

The source of a Bella program is a Unicode string. Here is the syntax given as an Ohm grammar:

bella.ohm
Bella {
  Program   = Statement+
  Statement = let id "=" Exp ";"                        -- vardec
            | function id Params "=" Exp ";"            -- fundec
            | Exp7_id "=" Exp ";"                       -- assign
            | print Exp ";"                             -- print
            | while Exp Block                           -- while
  Params    = "(" ListOf<id, ","> ")"
  Block     = "{" Statement* "}"

  Exp       = ("-" | "!") Exp7                          -- unary
            | Exp1 "?" Exp1 ":" Exp                     -- ternary
            | Exp1
  Exp1      = Exp1 "||" Exp2                            -- binary
            | Exp2
  Exp2      = Exp2 "&&" Exp3                            -- binary
            | Exp3
  Exp3      = Exp4 ("<="|"<"|"=="|"!="|">="|">") Exp4   -- binary
            | Exp4
  Exp4      = Exp4 ("+" | "-") Exp5                     -- binary
            | Exp5
  Exp5      = Exp5 ("*" | "/" | "%") Exp6               -- binary
            | Exp6
  Exp6      = Exp7 "**" Exp6                            -- binary
            | Exp7
  Exp7      = num
            | true
            | false
            | id "(" ListOf<Exp, ","> ")"               -- call
            | id                                        -- id
            | "(" Exp ")"                               -- parens

  let       = "let" ~idchar
  function  = "function" ~idchar
  while     = "while" ~idchar
  true      = "true" ~idchar
  false     = "false" ~idchar
  print     = "print" ~idchar
  keyword   = let | function | while | true | false
  num       = digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)?
  id        = ~keyword letter idchar*
  idchar    = letter | digit | "_"
  space    += "//" (~"\n" any)*                         -- comment
}

5  Formal Semantics

Here is an operational semantics of Bella. It is based off of the following abstract syntax:

$ \begin{array}{l} n\!: \mathsf{Numeral} \\ i\!: \mathsf{Identifier} \\ e\!: \mathsf{Expression} = n \;|\; i \;|\; \mathtt{true} \;|\; \mathtt{false} \;|\; \mathit{unaryop} \; e \;|\; e_1 \; \mathit{binop} \; e_2 \;|\; i \; e^* \;|\; e \; \mathtt{?} \; e_1 \; \mathtt{:} \; e_2 \\ s\!: \mathsf{Statement} = \mathtt{let}\;i = e \;|\; \mathtt{func}\;i\;i^*=e \;|\; i = e \;|\; \mathtt{print}\;e \;|\; \mathtt{while}\;e\;b \\ b\!: \mathsf{Block} = \mathtt{block}\; s^* \\ p\!: \mathsf{Program} = \mathtt{program}\; b \end{array} $

The unary operators are - and !. The binary operators are +, -, *, /, %, **, <, <=, ==, !=, >=, >, &&, and ||.

The meaning of a Bella program at runtime is the list of values it prints. To formally specify this behavior, we also have to define the meanings of statements and expressions. We do this with the help of a memory, which maps identifiers to their runtime values, and the output, which is the list of values output so far. Each identifier in the memory is mapped to either (1) a regular variables, (2) a user-defined function, (3) a built-in, or “standard” function, or (4) the distinguished value UNDEF, which is how we indicate that a variable has not yet been declared or defined. Variables have an mutability status (RO or RW), and functions store the number of defined parameters, so the number of arguments can be checked at call time. Each statement is executed in the context of a state, which is the current memory together with the output so far. Expressions need only be evaluated in the context of the current memory, as they do not read nor modify the output.

$\begin{array}{l} \mathsf{Mem}_{\mathit{def}}\;=\;\mathsf{Identifier} \rightarrow \\ \;\;\;\;\;\;\;\; \mathsf{UNDEF} \; \cup \\ \;\;\;\;\;\;\;\; (\{\mathsf{NUM}\} \times \mathbb{R} \times \{\mathsf{RO},\mathsf{RW}\}) \; \cup \\ \;\;\;\;\;\;\;\; (\{\mathsf{FUNC}\} \times \mathsf{Identifier}^* \times \mathsf{Expression}) \; \cup \\ \;\;\;\;\;\;\;\; (\{\mathsf{FUNC}\} \times (\mathbb{R}^* \rightarrow \mathbb{R}) \times \mathbb{N}) \\ \mathsf{Output}_{\mathit{def}}\;=\;\mathbb{R}^* \\ \mathsf{State}_{\mathit{def}}\;=\;\mathsf{Mem \times Output} \\ \\ \Longrightarrow_E \; \subseteq \; (\mathsf{Exp} \times \mathsf{Mem}) \times \mathsf{Num} \\ \Longrightarrow_S \; \subseteq \; (\mathsf{Statement} \times \mathsf{State}) \times \mathsf{State} \\ \Longrightarrow_B \; \subseteq \; (\mathsf{Block} \times \mathsf{State}) \times \mathsf{State} \\ \Longrightarrow_P \; \subseteq \; \mathsf{Program} \times \mathsf{Output} \end{array}$

As is typical in formal semantic specifications, we will drop the subscripts on the $\Longrightarrow$ relations as we can infer them from context.

Here are the definition of the operational semantic relations.

$$\frac{}{\langle [\![n]\!],m \rangle \Longrightarrow n}$$
$$\frac{}{\langle [\![\mathtt{true}]\!],m\rangle \Longrightarrow 1}$$
$$\frac{}{\langle [\![\mathtt{false}]\!],m\rangle \Longrightarrow 0}$$
$$\frac{m(i) = (\mathsf{NUM}, x, \_)} {\langle[\![i]\!],m\rangle \Longrightarrow x}$$
$$\frac{\langle e,m\rangle \Longrightarrow x} {\langle[\![\mathsf{-}\;e]\!],m\rangle \Longrightarrow -x}$$
$$\frac{\langle e,m\rangle \Longrightarrow x\;\;\; x \neq 0} {\langle[\![\mathsf{!}\;e]\!],m\rangle \Longrightarrow 0}$$
$$\frac{\langle e,m \rangle \Longrightarrow 0} {\langle[\![\mathsf{!}\;e]\!],m\rangle \Longrightarrow 1}$$
$$\frac{\begin{gathered} op \in \{ \mathsf{+}, \mathsf{-}, \mathsf{*}, \mathsf{/}, \mathsf{\%}, \mathtt{**}\} \;\;\; op \neq \mathtt{/} \vee y \neq 0 \\ \langle e_1,m\rangle \Longrightarrow x \;\;\; \langle e_2,m\rangle \Longrightarrow y \end{gathered}} {\langle[\![e_1\;op\;e_2]\!],m\rangle \Longrightarrow op(x,y)}$$
$$\frac{\begin{gathered} op \in \{\mathtt{<}, \mathtt{<=}, \mathtt{==}, \mathtt{!=}, \mathtt{>=}, \mathtt{>} \} \\ \langle e_1,m\rangle \Longrightarrow x \;\;\; \langle e_2,m\rangle \Longrightarrow y \;\;\; op(x,y) \end{gathered}} {\langle[\![e_1\;op\;e_2]\!],m\rangle \Longrightarrow 1}$$
$$\frac{\begin{gathered} op \in \{\mathtt{<}, \mathtt{<=}, \mathtt{==}, \mathtt{!=}, \mathtt{>=}, \mathtt{>} \} \\ \langle e_1,m\rangle \Longrightarrow x \;\;\; \langle e_2,m\rangle \Longrightarrow y \;\;\; \neg op(x,y) \end{gathered}} {\langle[\![e_1\;op\;e_2]\!],m\rangle \Longrightarrow 0}$$
$$\frac{\langle e_1,m\rangle \Longrightarrow 0} {\langle[\![e_1\;\mathtt{\&\&}\;e_2]\!],m\rangle \Longrightarrow 0}$$
$$\frac{\langle e_1,m\rangle \Longrightarrow x \;\;\;\; x \neq 0 \;\;\;\; \langle e_2,m\rangle \Longrightarrow y} {\langle[\![e_1\;\mathtt{\&\&}\;e_2]\!],m\rangle \Longrightarrow y}$$
$$\frac{\langle e_1,m\rangle \Longrightarrow x \;\;\;\; x \neq 0} {\langle[\![e_1\;\mathtt{|\,|}\;e_2]\!],m\rangle \Longrightarrow x}$$
$$\frac{\langle e_1,m\rangle \Longrightarrow 0 \;\;\;\; \langle e_2,m\rangle \Longrightarrow y} {\langle[\![e_1\;\mathtt{|\,|}\;e_2]\!],m\rangle \Longrightarrow y}$$
$$\frac{\langle e,m\rangle \Longrightarrow x \;\;\; x \neq 0 \;\;\; \langle e_1,m\rangle \Longrightarrow y} {\langle[\![e \; \mathtt{?} \; e_1 \; \mathtt{:} \; e_2]\!],m\rangle \Longrightarrow y}$$
$$\frac{\langle e,m \rangle \Longrightarrow 0 \;\;\; \langle e_2,m\rangle \Longrightarrow z} {\langle[\![e \; \mathtt{?} \; e_1 \; \mathtt{:} \; e_2]\!],m\rangle \Longrightarrow z}$$
$$\frac{(\langle e_i,m\rangle \Longrightarrow a_i)_{i=1}^n \;\;\; m(i) = (\mathsf{FUNC},(p_1,\ldots, p_n),e') \;\;\; \langle e', m[a_i/p_i]_{i=1}^n \rangle \Longrightarrow x} {\langle [\![i\;e_1,\ldots,e_n]\!],m\rangle \Longrightarrow x}$$
$$\frac{(\langle e_i,m\rangle \Longrightarrow a_i)_{i=1}^n \;\;\; m(i) = (\mathsf{FUNC}, f, n)} {\langle [\![i\;e_1,\ldots,e_n]\!],m\rangle \Longrightarrow f(a_1,\ldots, a_n)}$$
$$\frac{\langle e,m\rangle \Longrightarrow x\;\;\;\; m(i) = \mathsf{UNDEF}} {\langle [\![\mathtt{let}\;i=e]\!],m,o\rangle \Longrightarrow (m[(\mathsf{NUM},x,\mathsf{RW})\,/\,i], o)}$$
$$\frac{m(i) = \mathsf{UNDEF}} {\langle [\![\mathtt{fun}\;i\;(p_1,\ldots,p_n)=e]\!],m,o\rangle \Longrightarrow (m[(\mathsf{FUNC},(p_1,\ldots,p_n),e)\,/i], o)}$$
$$\frac{\langle e,m\rangle \Longrightarrow x \;\;\; m(i) = (\mathsf{NUM},\_,\mathsf{RW})} {\langle [\![i=e]\!],m,o\rangle \Longrightarrow (m[(\mathsf{NUM},x,\mathsf{RW})\,/\,i], o)}$$
$$\frac{\langle e,m\rangle \Longrightarrow x} {\langle [\![\mathtt{print}\;e]\!],m,o\rangle \Longrightarrow (m, o \cdot x)}$$
$$\frac{\langle e,m \rangle \Longrightarrow 0} {\langle [\![\mathtt{while}\;e\;b]\!],m,o\rangle \Longrightarrow (m, o)}$$
$$\frac{\begin{gathered} \langle e,m\rangle \Longrightarrow x \;\;\; x \neq 0 \;\;\; \langle b,m,o \rangle \Longrightarrow (m',o') \\ \langle [\![\mathtt{while}\;e\;b]\!],m',o' \rangle \Longrightarrow (m'',o'')\end{gathered}} {\langle [\![\mathtt{while}\;e\;b]\!],m,o \rangle \Longrightarrow (m'',o'')}$$
$$\frac{(\langle s_i, m_i, o_i \rangle \Longrightarrow (m_{i+1},o_{i+1}))_{i=1}^n} {\langle [\![\mathtt{block}\;s_1,\ldots,s_n]\!],m_1,o_1 \rangle \Longrightarrow o_{n+1}}$$
$$\frac{\langle b, m_0, [] \rangle \Longrightarrow (m,o)} {[\![\mathtt{program}\;b]\!] \Longrightarrow o}$$

where $o_0$, the initial output, is defined to be the empty sequence, and the initial memory $m_0$ is our “standard library”:

$ \begin{array}{l} m_0 = \lambda\,i. \mathsf{UNDEF} \:[\\ \;\;\;\; (\mathsf{NUM}, \pi, \mathsf{true})\,/\,\mathtt{π}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda x.\sqrt{x}, 1)\,/\,\mathtt{sqrt}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda x.\sin{x}, 1)\,/\,\mathtt{sin}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda x.\cos{x}, 1)\,/\,\mathtt{cos}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda x.e^x, 1)\,/\,\mathtt{exp}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda x.\ln{x}, 1)\,/\,\mathtt{ln}] \:[\\ \;\;\;\; (\mathsf{FUNC}, \lambda (x,y).\sqrt{x^2+y^2}, 2)\,/\,\mathtt{hypot}]\\ \end{array} $