The Language Astro

Astro is the first of five languages designed for a compiler course.

1 Introduction

Astro is a nearly trivial programming language: it has only numbers, variables, and a handful of built-in functions. That small size makes it a good fit for introducing (1) compiler and interpreter writing and (2) formal language semantics.

This document defines the language Astro.

2 Language Description

2.1 Programs

A program is a sequence of one or more statements. There are only two kinds of statements: assignments and print statements. Comments begin with // and extend to the end of the line.

// A simple program in Astro

radius = 55.2 * (-cos(2.8E-20) + 89) % 21;    // assignment statement
the_area = π * radius ** 2;                   // another assignment
print(hypot(2.28, 3 - radius) / the_area);    // print statement

Apologies for the semicolons, but they do make the language somewhat easier to parse.

2.2 Values and Types

All values in Astro are instances of a type. The language has the following types:

The type number, whose values are the IEEE-754 binary64 values.
The types function<$n$> for all $n \geq 0$, representing functions from $n$ numeric inputs to a single numeric output.

Numbers are first-class values, meaning they can be stored in variables, passed to functions, and returned from functions. Functions are not first-class: the only thing one can do with a function is call it.

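Numbers are written much like JavaScript decimal literals: one or more digits, with an optional fractional part and an optional exponent: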

2
2.0
55.9
819.999e-15
2E+10
5.89999e2
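The grammar's `numeral` rule (Section 4) pins down exactly which spellings are allowed. Here is a small Python sketch of that lexical rule as a regular expression; the names `NUMERAL` and `parse_numeral` are hypothetical, and the conversion relies on Python's `float`, which is an IEEE-754 binary64 value on typical platforms.

```python
import re

# Mirrors the Ohm rule: digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)?
NUMERAL = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?")


def parse_numeral(text: str) -> float:
    """Return the binary64 value of an Astro numeral, or raise ValueError."""
    if NUMERAL.fullmatch(text) is None:
        raise ValueError(f"not an Astro numeral: {text!r}")
    # Python floats are IEEE-754 binary64 on typical platforms.
    return float(text)


for literal in ["2", "2.0", "55.9", "819.999e-15", "2E+10", "5.89999e2"]:
    print(literal, "->", parse_numeral(literal))

# Unlike JavaScript, the grammar has no leading-point, trailing-point, or hex forms.
for literal in [".5", "5.", "0x1F"]:
    print(literal, "accepted:", NUMERAL.fullmatch(literal) is not None)
```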

2.3 Assignments

An assignment binds a number to an identifier. Any identifier other than the built-in ones must be assigned before it can be used.

sister = 5 + 1;              // variable declaration of sister (Ⅴ + Ⅰ = Ⅵ)
print(sister);               // okay, sister has been assigned
// print(cousin);            // ERROR: cousin has not been assigned
print(π);                    // Turns out to be okay because π is built-in

2.4 Functions

As Astro is a trivial language, you cannot define your own functions. You are limited to the built-in functions sqrt, sin, cos, and hypot. If a function is bound to an identifier, you cannot rebind the identifier:

print(sqrt(100));            // ok, sqrt is built-in
// sqrt = 5;                 // ERROR: function already bound to sqrt

Functions can only be called. They cannot be used in a context where a number is expected:

// print(sin);               // ERROR
// t = sin;                  // ERROR
// strange = sin * 3;        // ERROR

A function with $n$ parameters must be called with exactly $n$ arguments.
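One way an implementation might enforce the rules of this section is to keep each built-in together with its parameter count. The Python sketch below does that; the table name `BUILTIN_FUNCTIONS` and the helper `call_builtin` are hypothetical, and `math.sqrt`, `math.sin`, `math.cos`, and `math.hypot` are Python's standard-library counterparts of the Astro built-ins.

```python
import math
from typing import Callable

# Hypothetical table: each built-in name maps to (implementation, parameter count).
BUILTIN_FUNCTIONS: dict[str, tuple[Callable[..., float], int]] = {
    "sqrt": (math.sqrt, 1),
    "sin": (math.sin, 1),
    "cos": (math.cos, 1),
    "hypot": (math.hypot, 2),
}


def call_builtin(name: str, args: list[float]) -> float:
    """Call a built-in, rejecting unknown names and wrong argument counts."""
    if name not in BUILTIN_FUNCTIONS:
        raise NameError(f"{name} is not a function")
    impl, arity = BUILTIN_FUNCTIONS[name]
    if len(args) != arity:
        raise TypeError(f"{name} expects {arity} argument(s), got {len(args)}")
    return impl(*args)


print(call_builtin("sqrt", [100.0]))      # allowed: one argument
try:
    call_builtin("hypot", [3.0])          # hypot needs two arguments
except TypeError as err:
    print("error:", err)
```

Because functions can never be stored in variables or rebound, the callee of every call is known before the program runs, so a compiler could perform the same arity check statically instead of at call time.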

2.5 Variables

A variable is something that stores a value. Variables come into existence when assigned to for the first time. There is, however, one pre-declared variable, $\pi$. Variables can only be used if previously assigned or if built-in.

The variable π is read-only and all other variables are writable.

Variables can store numeric values only, not functions.

2.6 Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

$i$ = $e$ ;
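(Assignment statement) $e$ is evaluated, then its value is copied into $i$. $i$ must either be unassigned or be a writable variable; in particular, it must not have a function bound to it, and it must not be the read-only variable π.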
$\mathtt{print}\,e$ ;
(Print statement) $e$ is evaluated, then its value is written to standard output.
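Here is a minimal Python sketch of the two statement kinds, under some hypothetical representation choices: the memory is a dict from identifiers to tagged tuples, either `("num", value, writable)` or `("fun", impl, arity)`, and an expression is stood in for by a Python callable from memory to `float`, so that expression evaluation stays out of the picture. Besides writing to standard output, `print_statement` also records each value in a list, which is how the formal semantics models output.

```python
import math
from typing import Callable

Memory = dict[str, tuple]
Expr = Callable[[Memory], float]    # hypothetical stand-in for a parsed expression


def assign(name: str, expr: Expr, memory: Memory) -> None:
    """i = e; evaluate e, then store it in a fresh or writable variable."""
    value = expr(memory)
    current = memory.get(name)
    if current is not None and current[0] == "fun":
        raise TypeError(f"cannot assign to {name}: a function is bound to it")
    if current is not None and not current[2]:
        raise TypeError(f"cannot assign to {name}: it is read-only")
    memory[name] = ("num", value, True)


def print_statement(expr: Expr, memory: Memory, output: list[float]) -> None:
    """print e; evaluate e, write it to standard output, and record it."""
    value = expr(memory)
    print(value)
    output.append(value)


memory: Memory = {"π": ("num", math.pi, False), "sqrt": ("fun", math.sqrt, 1)}
output: list[float] = []
assign("sister", lambda m: 5.0 + 1.0, memory)              # sister = 5 + 1;
print_statement(lambda m: m["sister"][1], memory, output)  # print(sister);
for target in ["π", "sqrt"]:                               # both are errors
    try:
        assign(target, lambda m: 5.0, memory)
    except TypeError as err:
        print("error:", err)
```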

2.7 Expressions

An expression produces a number. The Astro expressions are:

A numeric literal.
A variable occurrence, which must have been previously defined.
- $e$
Evaluates $e$ and produces the negation of $e$.
$e_1$ ** $e_2$
The subexpressions are evaluated in any order and ${e_1}^{e_2}$ is produced.
$e_1$ * $e_2$
The subexpressions are evaluated in any order and their product is produced.
$e_1$ / $e_2$
The subexpressions are evaluated in any order and their quotient is produced.
$e_1$ % $e_2$
The subexpressions are evaluated in any order and the remainder of $e_1$ divided by $e_2$ is produced.
$e_1$ + $e_2$
The subexpressions are evaluated in any order and their sum is produced.
$e_1$ - $e_2$
The subexpressions are evaluated in any order and their difference is produced.
$f\,(e_1, \ldots, e_n)$
(Function call) Evaluates each $e_i$ then calls the function bound to $f$ with these evaluated arguments, in order, and produces the returned value of the function. $f$ must have a function of $n$ parameters bound to it.
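The expression forms above map directly onto a recursive evaluator. This Python sketch assumes a hypothetical tuple-based AST, e.g. `("num", 2.0)`, `("id", "π")`, `("neg", e)`, `("+", e1, e2)`, and `("call", "hypot", [e1, e2])`, and a memory that maps variables to floats and built-ins to `(impl, arity)` pairs. It evaluates operands left to right, which is one of the orders the text permits. Two choices here are assumptions rather than part of the text: `%` is taken to be the truncated remainder of C's `fmod` and JavaScript's `%` (the result has the sign of $e_1$), and division by zero returns an IEEE-754 infinity or NaN instead of raising Python's `ZeroDivisionError`.

```python
import math
import operator
from typing import Any, Callable


def ieee_div(x: float, y: float) -> float:
    """x / y, returning ±inf or NaN where Python would raise ZeroDivisionError."""
    if y != 0.0:
        return x / y
    if x == 0.0 or math.isnan(x):
        return math.nan
    # The infinity's sign combines the sign of x with the sign of the zero y.
    return math.copysign(math.inf, x) * math.copysign(1.0, y)


def ieee_rem(x: float, y: float) -> float:
    """Truncated remainder (sign of x); NaN in the cases where math.fmod raises."""
    if y == 0.0 or math.isinf(x) or math.isnan(x) or math.isnan(y):
        return math.nan
    return math.fmod(x, y)


BINARY: dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": ieee_div,
    "%": ieee_rem,        # assumption: truncated remainder, like JavaScript's %
    "**": math.pow,
}


def evaluate(e: tuple, memory: dict[str, Any]) -> float:
    """Evaluate a tuple AST; variables map to floats, built-ins to (impl, arity)."""
    kind = e[0]
    if kind == "num":
        return e[1]
    if kind == "id":
        if e[1] not in memory:
            raise NameError(f"{e[1]} has not been assigned")
        value = memory[e[1]]
        if isinstance(value, tuple):
            raise TypeError(f"{e[1]} is a function, not a number")
        return value
    if kind == "neg":
        return -evaluate(e[1], memory)
    if kind in BINARY:
        # Python evaluates call arguments left to right.
        return BINARY[kind](evaluate(e[1], memory), evaluate(e[2], memory))
    if kind == "call":
        callee = memory.get(e[1])
        if not isinstance(callee, tuple):
            raise TypeError(f"{e[1]} is not a function")
        impl, arity = callee
        if len(e[2]) != arity:
            raise TypeError(f"{e[1]} expects {arity} argument(s), got {len(e[2])}")
        return impl(*[evaluate(arg, memory) for arg in e[2]])
    raise ValueError(f"unknown expression form {kind!r}")


memory = {"π": math.pi, "radius": 2.0, "hypot": (math.hypot, 2)}
area = evaluate(("*", ("id", "π"), ("**", ("id", "radius"), ("num", 2.0))), memory)
print(area)
```

`math.pow` can still raise `ValueError` or `OverflowError` in cases where IEEE-754 arithmetic would produce NaN or an infinity; a faithful implementation would need to handle those as well.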

3 Standard Library

The following identifiers are predefined in a scope that surrounds the program. Since π is read-only and the others are bound to functions, none of these identifiers may be assigned to anywhere in a program.

π
Read-only variable whose value is the binary64 number closest to $\pi$.
function sqrt(x)
Returns the square root of $x$.
function sin(x)
Returns the sine of $x$ radians.
function cos(x)
Returns the cosine of $x$ radians.
function hypot(x, y)
Returns the length of the hypotenuse of a right triangle whose legs have lengths $|x|$ and $|y|$, that is, $\sqrt{x^2+y^2}$.
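Mathematically $\mathtt{hypot}(x, y) = \sqrt{x^2 + y^2}$, but in binary64 arithmetic a direct transcription can overflow even when the result itself is representable. Below is a small Python comparison with the standard library's `math.hypot`, which in CPython scales its inputs to avoid that intermediate overflow. The helper `naive_hypot` is hypothetical, and this section does not specify how an implementation must compute the result.

```python
import math


def naive_hypot(x: float, y: float) -> float:
    """Direct transcription of sqrt(x^2 + y^2); x * x overflows to inf for large x."""
    return math.sqrt(x * x + y * y)


for x, y in [(3.0, 4.0), (1e200, 1e200)]:
    print(f"x={x}, y={y}: naive={naive_hypot(x, y)}, math.hypot={math.hypot(x, y)}")
```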

4 Formal Syntax

The source of an Astro program is a Unicode string. Here is the syntax given as an Ohm grammar:

astro.ohm

Astro {
  Program     = Statement+
  Statement   = id "=" Exp ";"                         --assignment
              | print Exp ";"                          --print
  Exp         = Exp ("+" | "-") Term                   --binary
              | Term
  Term        = Term ("*" | "/" | "%") Factor          --binary
              | Factor
  Factor      = Primary "**" Factor                    --binary
              | "-" Primary                            --negation
              | Primary
  Primary     = id "(" ListOf<Exp, ","> ")"            --call
              | numeral                                --num
              | id                                     --id
              | "(" Exp ")"                            --parens

  numeral     = digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)?
  print       = "print" ~idchar
  idchar      = letter | digit | "_"
  id          = ~print letter idchar*
  space      += "//" (~"\n" any)*                      --comment
}
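The grammar's precedence and associativity show up directly in a hand-written recursive-descent parser. The Python sketch below is an independent translation of the grammar, not a use of the Ohm library. The token classes, the `ParseError` class, and the tuple AST shape are assumptions of the sketch, and the Unicode-aware class `[^\W\d_]` only approximates Ohm's `letter` rule. Left-recursive rules such as `Exp` and `Term` become loops, which gives left associativity, while `Factor` recurses on its right operand, which makes `**` right-associative.

```python
import re

TOKEN = re.compile(r"""
    (?P<skip>\s+|//[^\n]*)                            # whitespace and // comments
  | (?P<num>[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?)  # the numeral rule
  | (?P<id>[^\W\d_]\w*)                               # letter idchar*
  | (?P<sym>\*\*|[-+*/%=;(),])                        # ** before * so it is one token
""", re.VERBOSE)


class ParseError(Exception):
    """Raised for input the grammar rejects."""


def tokenize(source: str) -> list[tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ParseError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos, kind, text = match.end(), match.lastgroup, match.group()
        if kind != "skip":
            # Maximal munch keeps "printer" an id; only a bare "print" is the keyword.
            tokens.append(("print" if kind == "id" and text == "print" else kind, text))
    return tokens + [("eof", "")]


class Parser:
    def __init__(self, source: str) -> None:
        self.tokens, self.i = tokenize(source), 0

    def peek(self) -> str:
        kind, text = self.tokens[self.i]
        return text if kind == "sym" else kind  # symbols are matched by their text

    def take(self, expected: str) -> str:
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.tokens[self.i][1]!r}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def program(self) -> list[tuple]:  # Statement+
        statements = [self.statement()]
        while self.peek() != "eof":
            statements.append(self.statement())
        return statements

    def statement(self) -> tuple:
        if self.peek() == "print":
            self.take("print")
            node = ("print", self.exp())
        else:
            name = self.take("id")
            self.take("=")
            node = ("assign", name, self.exp())
        self.take(";")
        return node

    def exp(self) -> tuple:  # left-recursive rule becomes a loop: left-associative
        left = self.term()
        while self.peek() in ("+", "-"):
            left = (self.take(self.peek()), left, self.term())
        return left

    def term(self) -> tuple:  # same shape as exp, one precedence level higher
        left = self.factor()
        while self.peek() in ("*", "/", "%"):
            left = (self.take(self.peek()), left, self.factor())
        return left

    def factor(self) -> tuple:
        if self.peek() == "-":  # negation applies to a Primary only
            self.take("-")
            return ("neg", self.primary())
        base = self.primary()
        if self.peek() == "**":  # right-associative: recurse on the right operand
            self.take("**")
            return ("**", base, self.factor())
        return base

    def primary(self) -> tuple:
        if self.peek() == "num":
            return ("num", float(self.take("num")))
        if self.peek() == "id":
            name = self.take("id")
            if self.peek() != "(":
                return ("id", name)
            self.take("(")
            args = [] if self.peek() == ")" else [self.exp()]
            while args and self.peek() == ",":
                self.take(",")
                args.append(self.exp())
            self.take(")")
            return ("call", name, args)
        self.take("(")
        inner = self.exp()
        self.take(")")
        return inner
```

For example, `Parser("x = 2 ** 3 ** 2;").program()` groups the exponentiation as `2 ** (3 ** 2)`, and `print -x ** 2;` is rejected, because the grammar only allows a `Primary` (not a negation) as the base of `**`.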

5 Formal Semantics

The meaning of an Astro program is defined in this section by transition rules in the style of Natural Semantics, over the following abstract syntax:

$ \begin{array}{lcl} n: \mathsf{Nml} & & \\ i: \mathsf{Ide} & & \\ e: \mathsf{Exp} & = & n \mid i \mid -e \mid e+e \mid e-e \mid e\,\mathtt{*}\,e \mid e\,/\,e \mid e\,\%\,e \mid e\,\mathtt{**}\,e \mid \mathtt{call}\;i\;e^*\\ s: \mathsf{Stm} & = & i = e \mid \mathtt{print}\;e\\ p: \mathsf{Pro} & = & \mathtt{program}\;s^+\\ \end{array}$

The meaning of an Astro program at runtime is the list of values it prints. To specify this behavior formally, we also have to define the meanings of statements and expressions. We do this with the help of a memory, which maps identifiers to their runtime values, and an output, which is the list of values printed so far. The type $\textsf{Value}$ is defined as: $$ \frac{}{\mathsf{Undef}\!: \mathsf{Value}} \quad \frac{x\!:\mathsf{Real}\;\;\;b\!:\mathsf{Bool}}{\mathsf{Num}\;x\;b\!:\mathsf{Value}} \quad \frac{f\!:\mathsf{Real^* \rightarrow Real}\;\;\;n\!:\mathsf{Nat}}{\mathsf{Fun}\;f\;n\!:\mathsf{Value}} $$ allowing an identifier to be (1) bound to a number together with a flag $b$ recording whether the variable is writable, (2) bound to a function together with its parameter count (so that the number of arguments can be checked at call time), or (3) not yet defined. The predefined types $\mathsf{Bool}$, $\mathsf{Nat}$, and $\mathsf{Real}$ refer to booleans, natural numbers, and IEEE-754 binary64 values, respectively. Each statement is executed in the context of a state, which is the current memory together with the output so far. Expressions need only be evaluated in the context of the current memory, as they neither read nor modify the output. The semantic rules are:

$$\frac{} {m \vdash [\![n]\!] \Downarrow n}$$

$$\frac{m(i) = \mathsf{Num}\;x\;b} {m \vdash [\![i]\!] \Downarrow x}$$

$$\frac{m \vdash e \Downarrow x} {m \vdash [\![\mathsf{-}\;e]\!] \Downarrow -x}$$

$$\frac{\begin{gathered} op \in \{ \mathsf{+}, \mathsf{-}, \mathsf{*}, \mathsf{/}, \mathsf{\%}, \mathtt{**}\} \\ m \vdash e_1 \Downarrow x \;\;\; m \vdash e_2 \Downarrow y \end{gathered}} {m \vdash [\![e_1\;op\;e_2]\!] \Downarrow op(x,y)}$$

$$\frac{( m \vdash e_i \Downarrow a_i)_{i=1}^n \;\;\; m(i) = \mathsf{Fun}\;f\;n} {m \vdash [\![\texttt{call}\;i\;e_1,\ldots,e_n]\!] \Downarrow f(a_1,\ldots,a_n)}$$

$$\frac{m \vdash e \Downarrow x \;\;\; m(i) = \mathsf{Undef} \vee m(i) = \mathsf{Num}\;y\;\mathsf{true}} {(m,o) \vdash [\![i=e]\!] \Downarrow (m[i \mapsto \mathsf{Num}\;x\;\mathsf{true}], o)}$$

$$\frac{m \vdash e \Downarrow x} {(m,o) \vdash [\![\mathtt{print}\;e]\!] \Downarrow (m, o \cdot x)}$$

$$\frac{((m_{i-1}, o_{i-1}) \vdash s_i \Downarrow (m_i,o_i))_{i=1}^n} {\vdash [\![\mathtt{program}\;s_1,\ldots,s_n]\!] \Downarrow o_n}$$

where $o_0$, the initial output, is defined to be the empty sequence, and the initial memory $m_0$ is our “standard library” defined as follows:

$\begin{array}{l} m_0 = (\lambda\,i.\;\mathsf{Undef}) [ \\ \quad \mathtt{π} \mapsto \mathsf{Num}\;\pi\;\mathsf{false}][ \\ \quad \mathtt{sqrt} \mapsto \mathsf{Fun}\;(\lambda x.\sqrt{x})\;1][ \\ \quad \mathtt{sin} \mapsto \mathsf{Fun}\;(\lambda x.\sin{x})\;1][ \\ \quad \mathtt{cos} \mapsto \mathsf{Fun}\;(\lambda x.\cos{x})\;1][ \\ \quad \mathtt{hypot} \mapsto \mathsf{Fun}\;(\lambda (x,y).\sqrt{x^2+y^2})\;2] \\ \end{array}$
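The initial memory and the program rule can also be written down as Python data and a fold. In this sketch the dataclasses `Undef`, `Num`, and `Fun` (hypothetical names that follow the constructors of $\mathsf{Value}$) stand in for the semantic values, and a missing dictionary key plays the role of $\lambda\,i.\;\mathsf{Undef}$. Statements are assumed to be already translated into functions from state to state, so the program rule becomes a `functools.reduce` starting from $(m_0, o_0)$. Python's `math.sqrt` raises `ValueError` on negative input rather than returning NaN, so this is not bit-exact IEEE-754 behavior at domain edges.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Union


@dataclass(frozen=True)
class Undef:
    """The identifier is not (yet) bound."""


@dataclass(frozen=True)
class Num:
    x: float
    writable: bool  # the Bool flag: False only for π


@dataclass(frozen=True)
class Fun:
    f: Callable[..., float]
    n: int  # parameter count, so argument counts can be checked at call time


Value = Union[Undef, Num, Fun]
Memory = dict[str, Value]              # identifiers missing from the dict are Undef
State = tuple[Memory, tuple[float, ...]]
Statement = Callable[[State], State]   # hypothetical: a statement as a state transformer

M0: Memory = {
    "π": Num(math.pi, False),
    "sqrt": Fun(math.sqrt, 1),
    "sin": Fun(math.sin, 1),
    "cos": Fun(math.cos, 1),
    "hypot": Fun(lambda x, y: math.sqrt(x * x + y * y), 2),
}


def lookup(m: Memory, i: str) -> Value:
    """m(i), modeling the base memory λi. Undef as a default."""
    return m.get(i, Undef())


def run_program(statements: list[Statement]) -> tuple[float, ...]:
    """Program rule: thread (m, o) through s1..sn from (m0, empty) and keep o_n."""
    _, output = reduce(lambda state, s: s(state), statements, (M0, ()))
    return output


# Hand-built transformers for:  r = 2;  print(hypot(r, 0));
out = run_program([
    lambda st: ({**st[0], "r": Num(2.0, True)}, st[1]),
    lambda st: (st[0], st[1] + (lookup(st[0], "hypot").f(lookup(st[0], "r").x, 0.0),)),
])
print(out)
```

Building a new dictionary with `{**m, ...}` rather than mutating it mirrors the update notation $m[i \mapsto \ldots]$, so `M0` itself is left unchanged after the run.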