The Language Astro

Astro is the first of five languages designed for a compiler course.

1  Introduction

Astro is a fairly trivial programming language with interesting features that make it a great fit for introducing (1) compiler and interpreter writing, and (2) formal language semantics.

This document defines the language Astro.

2  Language Description

2.1  Programs

A programis a sequence of one or more statements. There are only two kinds of statements, assignments and print statements. Comments begin with // and extend to the end of the line.

// A simple program in Astro

radius = 55.2 * (-cos(2.8E-20) + 89) % 21;    // assignment statement
the_area = π * radius ** 2;                   // another assignment
print(hypot(2.28, 3 - radius) / the_area);    // print statement

Apologies for the semicolons, but they do make the language somewhat easier to parse.

2.2  Values and Types

All values in Astro are instances of a type. The language has the following types:

Numbers are first-class values, meaning they can be stored in variables, passed to functions, and returned from functions. Functions cannot: the only thing one can do with a function is call it.

Numbers are written as in JavaScript:

2
2.0
55.9
819.999e-15
2E+10
5.89999e2

2.3  Assignments

An assignment binds an entity to an identifier. Any identifier other than the built-in ones must be assigned to before it can subsequently be used.

sister = 5 + 1;              // variable declaration of sister (Ⅴ + Ⅰ = Ⅵ)
print(sister);               // okay, sister has been assigned
// print(cousin);            // ERROR: cousin has not been assigned
print(π);                    // Turns out to be okay because π is built-in

2.4  Functions

As Astro is a trivial language, you cannot define your own functions. You are limited to the built-in functions sqrt, sin, cos, and hypot. If a function is bound to an identifier, you cannot rebind the identifier:

print(sqrt(100));            // ok, sqrt are built-in
// sqrt = 5;                 // ERROR: function already bound to sqrt

Functions can only be called. They cannot be used in a context where a number is expected:

// print(sin)                // ERROR
// t = sin;                  // ERROR
// strange = sin * 3         // ERROR

Functions declared with $n$ parameters must be passed exactly $n$ arguments when called.

2.5  Variables

A variable is something that stores a value. Variables come into existence when assigned to for the first time. There is, however, one pre-declared variable, $\pi$. Variables can only be used if previously assigned or if built-in.

The variable π is read-only and all other variables are writable.

Variables can store numeric values only, not functions.

2.6  Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

2.7  Expressions

An expression produces a numeric. The Astro expressions are:

3  Standard Library

The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program.

4  Formal Syntax

The source of a Astro program is a Unicode string. Here is the syntax given as an Ohm grammar:

astro.ohm
Astro {
  Program     = Statement+
  Statement   = id "=" Exp ";"                         --assignment
              | print Exp ";"                          --print
  Exp         = Exp ("+" | "-") Term                   --binary
              | Term
  Term        = Term ("*" | "/" | "%") Factor          --binary
              | Factor
  Factor      = Primary "**" Factor                    --binary
              | "-" Primary                            --negation
              | Primary
  Primary     = id "(" ListOf<Exp, ","> ")"            --call
              | numeral                                --num
              | id                                     --id
              | "(" Exp ")"                            --parens

  numeral     = digit+ ("." digit+)? (("E" | "e") ("+" | "-")? digit+)?
  print       = "print" ~idchar
  idchar      = letter | digit | "_"
  id          = ~print letter idchar*
  space      += "//" (~"\n" any)*                      --comment
}

5  Formal Semantics

The meaning of an Astro program is defined in this section via transition rules in the style of Natural Semantics. It is defined from the following abstract syntax:

$ \begin{array}{lcl} n: \mathsf{Nml} & & \\ i: \mathsf{Ide} & & \\ e: \mathsf{Exp} & = & n \mid i \mid -e \mid e+e \mid e-e \mid e\,\mathtt{*}\,e \mid e\,/\,e \mid e\,\%\,e \mid e\,\mathtt{**}\,e \mid i\;e^*\\ s: \mathsf{Stm} & = & i = e \mid \mathtt{print}\;e\\ p: \mathsf{Pro} & = & s^+\\ \end{array}$

The meaning of a Astro program at runtime is the list of values it prints. To formally specify this behavior, we also have to define the meanings of statements and expressions. We do this with the help of a memory, which maps identifiers to their runtime values, and the output, which is the list of values output so far. Each identifier in the memory is mapped to either (1) a variable, (2) a function, or (3) $\bot$, which is how we indicate that a variable has not yet been defined. Variables have an mutability status (RO or RW), and functions store the number of defined parameters, so the number of arguments can be checked at call time. Each statement is executed in the context of a state, which is the current memory together with the output so far. Expressions need only be evaluated in the context of the current memory, as they do not read nor modify the output.

The following types are used in the definition:

$\begin{array}{l} \mathsf{Real}\;=\;\textrm{the type of IEEE-754 binary64 values} \\ \mathsf{Value}\;= (\{\mathsf{NUM}\} \times \mathsf{Real} \times \{\mathsf{RO},\mathsf{RW}\}) \;\cup\; (\{\mathsf{FUN}\} \times (\mathsf{Real}^* \rightarrow \mathsf{Real}) \times \mathbb{N}) \\ \mathsf{Mem}\;=\;\mathsf{Ide} \rightharpoonup \mathsf{Value} \\ \mathsf{Output}\;=\;\mathsf{Real}^* \\ \mathsf{State}\;=\;\mathsf{Mem \times Output} \end{array}$

The semantic relations are:

$\begin{array}{l} \Longrightarrow_E\;\subseteq (\mathsf{Exp} \times \mathsf{Mem}) \times \mathsf{Real} \\ \Longrightarrow_S\;\subseteq (\mathsf{Stm} \times \mathsf{Mem} \times \mathsf{Output}) \times (\mathsf{Mem} \times \mathsf{Output}) \\ \Longrightarrow_P\;\subseteq \mathsf{Pro} \times \mathsf{Output} \end{array}$

The semantic rules are:

$$\frac{} {[\![n]\!],m \Longrightarrow n}$$
$$\frac{m(i) = (\mathsf{NUM}, x, \_)} {[\![i]\!],m \Longrightarrow x}$$
$$\frac{ e,m \Longrightarrow x} {[\![\mathsf{-}\;e]\!],m \Longrightarrow -x}$$
$$\frac{\begin{gathered} op \in \{ \mathsf{+}, \mathsf{-}, \mathsf{*}, \mathsf{/}, \mathsf{\%}, \mathtt{**}\} \\ e_1,m \Longrightarrow x \;\;\; e_2,m \Longrightarrow y \end{gathered}} {[\![e_1\;op\;e_2]\!],m \Longrightarrow op(x,y)}$$
$$\frac{( e_i,m \Longrightarrow a_i)_{i=1}^n \;\;\; m(i) = (\mathsf{FUN}, f, n)} {[\![i\;e_1,\ldots,e_n]\!],m \Longrightarrow f(a_1,\ldots,a_n)}$$
$$\frac{ e,m \Longrightarrow x \;\;\; m(i) = \bot \vee m(i) = (\mathsf{NUM},\_,\mathsf{RW})} {[\![i=e]\!],m,o \Longrightarrow (m[(\mathsf{NUM},x,\mathsf{RW})\,/\,i], o)}$$
$$\frac{e, m \Longrightarrow x} {[\![\mathtt{print}\;e]\!],m,o \Longrightarrow (m, o \cdot x)}$$
$$\frac{( s_i, m_{i-1}, o_{i-1} \Longrightarrow (m_i,o_i))_{i=1}^n} {[\![\mathtt{program}\;s_1,\ldots,s_n]\!] \Longrightarrow o_n}$$

where $o_0$, the initial output, is defined to be the empty sequence, and the initial memory $m_0$ is our “standard library,” in which all numeric values are understood to have type $\textsf{Real}$:

$\begin{array}{l} m_0 = \{ \\ \;\;\;\; \mathtt{π}\!: (\mathsf{NUM}, \pi, \mathsf{RO}), \\ \;\;\;\; \mathtt{sqrt}\!: (\mathsf{FUN}, \lambda x.\sqrt{x}, 1), \\ \;\;\;\; \mathtt{sin}\!: (\mathsf{FUN}, \lambda x.\sin{x}, 1), \\ \;\;\;\; \mathtt{cos}\!: (\mathsf{FUN}, \lambda x.\cos{x}, 1), \\ \;\;\;\; \mathtt{hypot}\!: (\mathsf{FUN}, \lambda (x,y).\sqrt{x^2+y^2}, 2) \\ \} \end{array}$