Bella is a simple programming language with interesting features that make it a great fit for learning (1) compiler and interpreter writing, and (2) formal language semantics.
The language is limited to numbers, assignments, conditionals, loops, and simple functions. Variables can only hold numbers, not functions. Function “bodies” are limited to simple expressions (like Python lambdas), so we don’t have to deal with any scoping complications like temporal dead zones. Scoping is as in Python: local variables and parameters are function-scoped, and shadowing is allowed. Bella is a friendly, simple language, relatively easy to implement.
This document defines the language Bella.
A programis a sequence of one or more statements.
let dozen = 12; // variable declaration print dozen % 3 ** 1; // print statement function gcd(x, y) = y == 0 ? x : gcd(y, x % y); // function declaration while dozen >= 3 || (gcd(1, 10) != 5) { // while statement dozen = dozen - 2.75E+19 ** 1 ** 3; // assignment statement }
Apologies for the semicolons, but they do make the language somewhat easier to parse.
All values in Bella are instances of a type. The language has the following types:
number
of IEEE-754 binary64 values.
function<
$n$>
, where $n \geq 0$, representing functions from $n$ numeric inputs to a single numeric output.Numbers are first-class values, meaning they can be stored in variables, passed to functions, and returned from functions. Functions cannot be: the only thing one can do with a function is call it.
Numbers are written as in JavaScript:
2 2.0 55.9 819.999e-15 2E+10 5.89999e2
A declaration binds an identifier to an entity. There are three kinds of declarations:
Here is a short example showing all three kinds:
let sister = 5 + 1; // variable declaration of sister (Ⅴ + Ⅰ = Ⅵ) function triple(x) = x * 3; // function declaration of triple // ...and parameter declaration of x
Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. In Bella, shadowing of top-level variables by parameters of local functions is permitted, so the visible region of a declaration is the scope minus any "inner" scopes of declarations of identifiers with the same name and may therefore be discontinuous.
The identifiers appearing as defining occurrences of variable and function declarations must be mutually exclusive. A function’s parameters must be mutually exclusive as well:
let x = 3; // let x = 5; // ERROR: x already declared // function x() = 0; // ERROR: x already declared // function g(a, a) = 0; // ERROR: a already declared function h(x) = 0; // OK! parameter x shadows global x // let h = 3; // ERROR: h already declared (prev line) function j(k) = 100 - x; // OK let k = 2; // OK! Parameter k on prev line not in scope
As a simple language, Bella does not have any anonymous functions; all functions must be named. The identifier used in the function declaration is read-only variable: you cannot reassign it:
function successor(n) = n + 1; // let successor = 5; // ERROR: read only
Functions can only be called. They cannot be used in a context where a number is expected:
function triple (x) = x * 3; print triple(8); // Function call of triple, perfectly fine // print triple // ERROR // let t = triple; // ERROR // let times_nine = triple * 3 // ERROR
Functions declared with $n$ parameters must be passed exactly $n$ arguments when called.
A variable is something that stores a value. Variables come into existence either (1) as regular variables declared in a variable declaration statement (using let
), or (2) as parameters in a parameter declaration. There is one pre-declared variable, $\pi$. Variables can only be used if previously declared.
The variable π
is read-only and all other variables are writable. Interestingly, because parameters are scoped only to the function body and function bodies are simple expressions, all parameters are effectively read-only.
Variables can store numeric values only, not functions.
A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:
let
$i$ =
$e$ ;
(Variable declaration statement) Declares a new variable with name $i$. This declaration must be unique within its scope. Evaluates $e$ and assigns this value to the new variable. The new variable is writable.
function
$f$ (
$x_1, \ldots, x_n$ ) =
$e$ ;
(Function declaration)
Defines a new function named $f$ with $n$ parameters, that is, its type is function<
$n$>
. The identifier $f$ must be unique within its scope. The parameters must be unique within their scope.
=
$e$ ;
(Assignment statement) $e$ is evaluated, then the value of $e$ is copied into $i$. $i$ must have been previously declared and visible in this scope.
print
$e$ ;
(Print statement) Evaluates $e$ then prints its value to standard output.
while
$e$ $b$
(While statement) First, $e$ is evaluated. If $e$ produces 0, the execution of the while statement terminates. Otherwise, body $b$ is executed then the entire while statement is executed again.
An expression produces a numeric. The Bella expressions are:
true
Produces 1.
false
Produces 0.
-
$e$
Evaluates $e$ and produces the negation of $e$.
!
$e$
If $e$ evaluates to 0, produces 1, otherwise produces 0.
*
$e_2$
The subexpressions are evaluated in any order and their product is produced.
/
$e_2$
The subexpressions are evaluated in any order and their quotient is produced.
%
$e_2$
The subexpressions are evaluated in any order and the remainder of $e_1$ divided by $e_2$ is produced.
+
$e_2$
The subexpressions are evaluated in any order and their sum is produced.
-
$e_2$
The subexpressions are evaluated in any order and their difference is produced.
?
$e_1$ :
$e_2$
Evaluates $e$ and if non-zero, evaluates and produces $e_1$. Otherwise evaluates and produces $e_2$.
<
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is less than the value of $e_2$.
<=
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is less or equal to the value of $e_2$.
==
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is equal to the value of $e_2$.
!=
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is not equal to the value of $e_2$.
>=
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is greater or equal to the value of $e_2$.
>
$e_2$
Evaluates the subexpressions in any order, then produces 1 or 0, respectively, as to whether the value of $e_1$ is greater than the value of $e_2$.
&&
$e_2$
First $e_1$ is evaluated. If it evaluates to 0, the entire expression immediately produces 0 (without evaluating $e_2$). Otherwise $e_2$ is evaluated and the entire expression produces the value of $e_2$.
||
$e_2$
First $e_1$ is evaluated. If it evaluates to a non-zero value, the entire expression immediately produces this value (without evaluating $e_2$). Otherwise $e_2$ is evaluated and the entire expression produces the value of $e_2$.
The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program.
π
Read-only variable whose value is the best approximate value of $\pi$.
function sqrt(x)
Returns the square root of $x$.
function sin(x)
Returns the sine of $x$ radians.
function cos(x)
Returns the cosine of $x$ radians.
function exp(x)
Returns $e^x$.
function ln(x)
Returns the natural log of $x$.
function hypot(x, y)
Returns the hypotenuse of a right triangle with sides $|x|$ and $|y|$.
The source of a Bella program is a Unicode string. Here is the syntax given as an Ohm grammar:
Bella { Program = Statement+ Statement = let id "=" Exp ";" -- vardec | function id Params "=" Exp ";" -- fundec | Exp7_id "=" Exp ";" -- assign | print Exp ";" -- print | while Exp Block -- while Params = "(" ListOf<id, ","> ")" Block = "{" Statement* "}" Exp = ("-" | "!") Exp7 -- unary | Exp1 "?" Exp1 ":" Exp -- ternary | Exp1 Exp1 = Exp1 "||" Exp2 -- binary | Exp2 Exp2 = Exp2 "&&" Exp3 -- binary | Exp3 Exp3 = Exp4 ("<="|"<"|"=="|"!="|">="|">") Exp4 -- binary | Exp4 Exp4 = Exp4 ("+" | "-") Exp5 -- binary | Exp5 Exp5 = Exp5 ("*" | "/" | "%") Exp6 -- binary | Exp6 Exp6 = Exp7 "**" Exp6 -- binary | Exp7 Exp7 = num | true | false | id "(" ListOf<Exp, ","> ")" -- call | id -- id | "(" Exp ")" -- parens let = "let" ~idchar function = "function" ~idchar while = "while" ~idchar true = "true" ~idchar false = "false" ~idchar print = "print" ~idchar keyword = let | function | while | true | false num = digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)? id = ~keyword letter idchar* idchar = letter | digit | "_" space += "//" (~"\n" any)* -- comment }
The meaning of a Bella program is defined in this section via transition rules in the style of Natural Semantics. It is defined from the following abstract syntax:
The unary operators are
-
and
!
.
The binary operators are
+
,
-
,
*
,
/
,
%
,
**
,
<
,
<=
,
==
,
!=
,
>=
,
>
,
&&
,
and ||
.
The meaning of a Bella program at runtime is the list of values it prints. To formally specify this behavior, we also have to define the meanings of statements and expressions. We do this with the help of a memory, which maps identifiers to their runtime values, and the output, which is the list of values output so far. Each identifier in the memory is mapped to either (1) a regular variables, (2) a user-defined function, (3) a built-in, or “standard” function, or (4) $\bot$, which is how we indicate that a variable has not yet been declared or defined. Variables have an mutability status (RO or RW), and functions store the number of defined parameters, so the number of arguments can be checked at call time. Each statement is executed in the context of a state, which is the current memory together with the output so far. Expressions need only be evaluated in the context of the current memory, as they do not read nor modify the output.
The following types are used in the definition:
The semantic relations are:
The semantic rules are:
where $o_0$, the initial output, is defined to be the empty sequence, and the initial memory $m_0$ is our “standard library”: