Astro is a fairly trivial programming language with interesting features that make it a great fit for introducing (1) compiler and interpreter writing, and (2) formal language semantics.
This document defines the language Astro.
A program is a sequence of one or more statements. There are only two kinds of statements: assignments and print statements. Comments begin with // and extend to the end of the line.
// A simple program in Astro
radius = 55.2 * (-cos(2.8E-20) + 89) % 21; // assignment statement
the_area = π * radius ** 2; // another assignment
print(hypot(2.28, 3 - radius) / the_area); // print statement
Apologies for the semicolons, but they do make the language somewhat easier to parse.
All values in Astro are instances of a type. The language has the following types:
number
The type of IEEE-754 binary64 values.
function<$n$>
For all $n \geq 0$, the type of functions from $n$ numeric inputs to a single numeric output.
Numbers are first-class values, meaning they can be stored in variables, passed to functions, and returned from functions. Functions are not: the only thing one can do with a function is call it.
Numbers are written as in JavaScript:
2
2.0
55.9
819.999e-15
2E+10
5.89999e2
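These forms can be checked mechanically. The following Python sketch uses a regular expression modeled on the numeral rule of the grammar given later in this document (digits, an optional fraction, an optional signed exponent). The names `NUMERAL` and `is_numeral` are illustrative only and are not part of Astro.

```python
import re

# Hypothetical helper: digits, optional fraction, optional signed exponent.
NUMERAL = re.compile(r"[0-9]+(\.[0-9]+)?([Ee][+-]?[0-9]+)?")

def is_numeral(text: str) -> bool:
    """Return True if the whole string is a well-formed numeral under this pattern."""
    return NUMERAL.fullmatch(text) is not None

# Every example listed above is accepted.
for example in ["2", "2.0", "55.9", "819.999e-15", "2E+10", "5.89999e2"]:
    assert is_numeral(example)

# Some forms that JavaScript accepts are rejected by this stricter pattern.
assert not is_numeral(".5")
assert not is_numeral("5.")
```

Under this reading, a numeral always starts and ends its fractional part with a digit, which is a little narrower than JavaScript's own literal syntax.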
An assignment binds an entity to an identifier. Any identifier other than the built-in ones must be assigned to before it can subsequently be used.
sister = 5 + 1; // variable declaration of sister (Ⅴ + Ⅰ = Ⅵ)
print(sister); // okay, sister has been assigned
// print(cousin); // ERROR: cousin has not been assigned
print(π); // Turns out to be okay because π is built-in
As Astro is a trivial language, you cannot define your own functions. You are limited to the built-in functions sqrt, sin, cos, and hypot. If a function is bound to an identifier, you cannot rebind the identifier:
print(sqrt(100)); // ok, sqrt is built-in
// sqrt = 5; // ERROR: function already bound to sqrt
Functions can only be called. They cannot be used in a context where a number is expected:
// print(sin); // ERROR
// t = sin; // ERROR
// strange = sin * 3; // ERROR
Functions declared with $n$ parameters must be passed exactly $n$ arguments when called.
A variable is something that stores a value. Variables come into existence when assigned to for the first time. There is, however, one pre-declared variable, $\pi$. Variables can only be used if previously assigned or if built-in.
The variable π is read-only and all other variables are writable.
Variables can store numeric values only, not functions.
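The binding rules described so far (assign before use, no rebinding of functions, a read-only π, exact argument counts) can be sketched as a small table of bindings. The Python fragment below is only an illustration: the names `bindings`, `assign`, `lookup`, and `call`, the tuple layout, and the choice of exception types are all assumptions, not part of Astro. Python's `math` functions are used as stand-ins for the built-ins and may raise exceptions (for example on `sqrt` of a negative number) where Astro's behavior is not specified here.

```python
import math

# Hypothetical binding table:
#   name -> ("num", value, writable)  or  ("fun", callable, arity)
bindings = {
    "π": ("num", math.pi, False),
    "sqrt": ("fun", math.sqrt, 1),
    "sin": ("fun", math.sin, 1),
    "cos": ("fun", math.cos, 1),
    "hypot": ("fun", math.hypot, 2),
}

def assign(name, value):
    """Bind a number to name, rejecting functions and read-only variables."""
    kind, _, flag = bindings.get(name, ("num", None, True))
    if kind == "fun":
        raise TypeError(f"function already bound to {name}")
    if not flag:
        raise TypeError(f"{name} is read-only")
    bindings[name] = ("num", float(value), True)

def lookup(name):
    """Read a variable; a function may not be used where a number is expected."""
    if name not in bindings:
        raise NameError(f"{name} has not been assigned")
    kind, value, _ = bindings[name]
    if kind == "fun":
        raise TypeError(f"{name} is a function, not a number")
    return value

def call(name, *args):
    """Call a built-in with exactly as many arguments as it declares."""
    if name not in bindings or bindings[name][0] != "fun":
        raise TypeError(f"{name} is not a function")
    _, function, arity = bindings[name]
    if len(args) != arity:
        raise TypeError(f"{name} expects {arity} argument(s), got {len(args)}")
    return function(*args)

assign("sister", 5 + 1)
print(lookup("sister"))     # sister has been assigned
print(call("sqrt", 100))    # sqrt is built-in
```

Calling `lookup("cousin")`, `assign("sqrt", 5)`, `assign("π", 3)`, or `lookup("sin")` raises an error in this sketch, matching the commented-out ERROR lines in the examples above.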
A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:
$i$ = $e$ ;
(Assignment statement) $e$ is evaluated, then the value of $e$ is copied into $i$. $i$ must not already have a function bound to it, nor be the read-only variable π.
print $e$ ;
(Print statement) $e$ is evaluated, then its value is written to standard output.
An expression produces a numeric value. The Astro expressions are:
- $e$
Evaluates $e$ and produces the negation of its value.
$e_1$ ** $e_2$
The subexpressions are evaluated in any order and ${e_1}^{e_2}$ is produced.
$e_1$ * $e_2$
The subexpressions are evaluated in any order and their product is produced.
$e_1$ / $e_2$
The subexpressions are evaluated in any order and their quotient is produced.
$e_1$ % $e_2$
The subexpressions are evaluated in any order and the remainder of $e_1$ divided by $e_2$ is produced.
$e_1$ + $e_2$
The subexpressions are evaluated in any order and their sum is produced.
$e_1$ - $e_2$
The subexpressions are evaluated in any order and their difference is produced.
$f$($e_1$, $\ldots$, $e_n$)
(Function call) Evaluates each $e_i$, then calls the function bound to $f$ with these evaluated arguments, in order, and produces the value returned by the function. $f$ must have a function of $n$ parameters bound to it.
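The arithmetic operators above can be sketched as a table from operator symbols to functions on binary64 values. The Python fragment below is an approximation under stated assumptions: the remainder is taken to be the truncating remainder computed by `math.fmod` (the text does not say which remainder is meant), and Python raises exceptions in some cases, such as division by zero, where an IEEE-754 implementation would produce an infinity or NaN. The names `BINARY_OPS`, `apply_binary`, and `negate` are illustrative.

```python
import math
import operator

# Hypothetical operator table; each entry maps two floats to a float.
BINARY_OPS = {
    "**": math.pow,
    "*": operator.mul,
    "/": operator.truediv,
    "%": math.fmod,        # assumption: the result takes the sign of the dividend
    "+": operator.add,
    "-": operator.sub,
}

def apply_binary(op, left, right):
    """Apply a binary operator to two already-evaluated operands."""
    return BINARY_OPS[op](float(left), float(right))

def negate(value):
    """Unary minus."""
    return -float(value)

# The first statement of the sample program, evaluated step by step:
#   radius = 55.2 * (-cos(2.8E-20) + 89) % 21;
# * and % share a precedence level and associate to the left.
inner = apply_binary("+", negate(math.cos(2.8e-20)), 89)
radius = apply_binary("%", apply_binary("*", 55.2, inner), 21)
print(radius)
```

Because the operands are evaluated before `apply_binary` is called, the order in which they are evaluated is left to the caller, in keeping with the "any order" wording above.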
The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program.
π
Read-only variable whose value is the binary64 value closest to $\pi$.
function sqrt(x)
Returns the square root of $x$.
function sin(x)
Returns the sine of $x$ radians.
function cos(x)
Returns the cosine of $x$ radians.
function hypot(x, y)
Returns the length of the hypotenuse of a right triangle whose legs have lengths $|x|$ and $|y|$, that is, $\sqrt{x^2 + y^2}$.
The source of an Astro program is a Unicode string. Here is the syntax given as an Ohm grammar:
Astro {
  Program   = Statement+
  Statement = id "=" Exp ";"                        --assignment
            | print Exp ";"                         --print
  Exp       = Exp ("+" | "-") Term                  --binary
            | Term
  Term      = Term ("*" | "/" | "%") Factor         --binary
            | Factor
  Factor    = Primary "**" Factor                   --binary
            | "-" Primary                           --negation
            | Primary
  Primary   = id "(" ListOf<Exp, ","> ")"           --call
            | numeral                               --num
            | id                                    --id
            | "(" Exp ")"                           --parens
  numeral   = digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)?
  print     = "print" ~idchar
  idchar    = letter | digit | "_"
  id        = ~print letter idchar*
  space    += "//" (~"\n" any)*                     --comment
}
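To see how the grammar's layering encodes precedence and associativity, here is a hand-written recursive-descent parser in Python that follows the rules above and builds nested tuples. It is a sketch, not the Ohm implementation: the `Parser` class, the `tokenize` helper, and the tuple shapes are all invented for illustration, and Python's `\w` is assumed to be a close enough (slightly broader) stand-in for Ohm's `letter` and `digit`.

```python
import re

TOKEN = re.compile(r"""
    (?P<skip>\s+|//[^\n]*)                               # whitespace and comments
  | (?P<num>[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?)     # numeral
  | (?P<id>[^\W\d_]\w*)                                  # identifier or print
  | (?P<op>\*\*|[-+*/%=();,])                            # operators, punctuation
""", re.VERBOSE)

def tokenize(source):
    """Split source into (kind, text) pairs, dropping spaces and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        if kind != "skip":
            tokens.append(("print" if text == "print" else kind, text))
        pos = match.end()
    return tokens + [("end", "")]

class Parser:
    def __init__(self, source):
        self.tokens, self.pos = tokenize(source), 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, text=None, kind=None):
        tok_kind, tok_text = self.tokens[self.pos]
        if text not in (None, tok_text) or kind not in (None, tok_kind):
            raise SyntaxError(f"unexpected {tok_text or 'end of input'!r}")
        self.pos += 1
        return tok_text

    def program(self):                       # Program = Statement+
        statements = [self.statement()]
        while self.peek()[0] != "end":
            statements.append(self.statement())
        return statements

    def statement(self):
        if self.peek()[0] == "print":
            self.take()
            node = ("print", self.exp())
        else:
            target = self.take(kind="id")
            self.take("=")
            node = ("assign", target, self.exp())
        self.take(";")
        return node

    def exp(self):                           # left-associative + and -
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):                          # left-associative *, /, %
        node = self.factor()
        while self.peek()[1] in ("*", "/", "%"):
            node = (self.take(), node, self.factor())
        return node

    def factor(self):                        # right-associative **
        if self.peek()[1] == "-":
            self.take()
            return ("neg", self.primary())
        node = self.primary()
        if self.peek()[1] == "**":
            self.take()
            return ("**", node, self.factor())
        return node

    def primary(self):
        kind, text = self.peek()
        if kind == "num":
            return ("num", float(self.take()))
        if kind == "id":
            self.take()
            if self.peek()[1] != "(":
                return ("id", text)
            self.take("(")
            args = []
            if self.peek()[1] != ")":
                args.append(self.exp())
                while self.peek()[1] == ",":
                    self.take()
                    args.append(self.exp())
            self.take(")")
            return ("call", text, args)
        self.take("(")
        node = self.exp()
        self.take(")")
        return node

print(Parser("x = 2 ** 3 ** 2 - 1;").program())
```

In this sketch `2 ** 3 ** 2` groups to the right, while `-2 ** 2` is rejected, because the Factor rule allows a negation only directly in front of a Primary; `2 ** -3` and `(-2) ** 2` are accepted.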
The meaning of an Astro program is defined in this section via transition rules in the style of Natural Semantics. It is defined from the following abstract syntax:
The meaning of an Astro program at runtime is the list of values it prints. To formally specify this behavior, we also have to define the meanings of statements and expressions. We do this with the help of a memory, which maps identifiers to their runtime values, and the output, which is the list of values output so far. The type $\mathsf{Value}$ is defined as:
$$ \frac{}{\mathsf{Undef}\!: \mathsf{Value}} \quad \frac{x\!:\mathsf{Real}\;\;\;b\!:\mathsf{Bool}}{\mathsf{Num}\;x\;b\!:\mathsf{Value}} \quad \frac{f\!:\mathsf{Real^* \rightarrow Real}\;\;\;n\!:\mathsf{Nat}}{\mathsf{Fun}\;f\;n\!:\mathsf{Value}} $$
allowing identifiers to be (1) bound to a variable with a mutability flag ($\mathsf{Num}$), (2) bound to a function together with its parameter count ($\mathsf{Fun}$), so that the number of arguments can be checked at call time, or (3) not yet defined ($\mathsf{Undef}$). The predefined types $\mathsf{Bool}$, $\mathsf{Nat}$, and $\mathsf{Real}$ refer to booleans, natural numbers, and IEEE-754 binary64 values, respectively.
Each statement is executed in the context of a state, which is the current memory together with the output so far. Expressions need only be evaluated in the context of the current memory, as they neither read nor modify the output. The semantic rules are:
where $o_0$, the initial output, is defined to be the empty sequence, and the initial memory $m_0$ is our “standard library” defined as follows:
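One way to picture the $\mathsf{Value}$ type and the initial state is as tagged records in a host language. The Python sketch below is an illustration under assumptions, not the formal definition: the class names mirror the constructors $\mathsf{Undef}$, $\mathsf{Num}$, and $\mathsf{Fun}$; the boolean in `Num` is assumed to mean "writable"; the contents of `m0` are inferred from the list of predefined identifiers; and Python's `math` functions stand in for the built-ins.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

@dataclass(frozen=True)
class Undef:
    """An identifier that has not been defined."""

@dataclass(frozen=True)
class Num:
    value: float
    writable: bool     # assumption: True means the variable may be reassigned

@dataclass(frozen=True)
class Fun:
    function: Callable[..., float]
    arity: int         # parameter count, checked at call time

Value = Union[Undef, Num, Fun]

# Assumed initial memory, inferred from the predefined identifiers.
m0: Dict[str, Value] = {
    "π": Num(math.pi, False),
    "sqrt": Fun(math.sqrt, 1),
    "sin": Fun(math.sin, 1),
    "cos": Fun(math.cos, 1),
    "hypot": Fun(math.hypot, 2),
}

# The initial output is the empty sequence.
o0: List[float] = []

def lookup(memory: Dict[str, Value], name: str) -> Value:
    """Identifiers absent from the memory are treated as Undef."""
    return memory.get(name, Undef())

print(lookup(m0, "π"))
print(lookup(m0, "cousin"))
```

Modeling the memory as a total mapping that returns `Undef` for unknown names keeps the "assigned before use" check a simple case distinction on the looked-up value.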