The Language Hana

The Hana language is an extension of the Carlos language, more or less.

0 Introduction

Hana is an imperative, block-structured, statically-typed programming language. It has much in common with C, such as its lack of true modularity, but differs from C in being oriented toward application-level, not system-level, programming. Arrays and structures are heap-allocated, and there are no explicit pointers. Functions are allowed to be nested, overloaded, and variadic. There are built-in string, thread and stream types, and convenient operators such as # for length and $ for conversion to strings.

This document defines the language Hana.

1 Microsyntax

The source of a Hana program is a Unicode string. A lexically valid Hana program is a string that can be tokenized according to the common "longest substring" rule, with comments and whitespace allowed to separate tokens. Comments start with "//" and end with the earliest #xA or #xD. Whitespace is any string consisting entirely of the characters #x9, #xA, #xD and #x20.

The Hana tokens are (1) identifiers, (2) reserved words, (3) integer literals, (4) real literals, (5) string literals, (6) character literals, and (7) symbols. Identifiers are nonempty strings of letters (Unicode properties Lu, Ll, Lt, Lm, Lo), decimal digits (Unicode property Nd) and underscores (#x5F) beginning with a letter, except for the following reserved words:

    boolean     stream     const        if         break
                thread     volatile     else       continue
    int                                 for        return
    char        struct     null         while      start
    real        enum       true         unless     die
    string      void       false        until      new

Identifiers and reserved words are case sensitive. The symbols are:

    .       ,       ...     ;       ::
    (       )       [       ]       {       }
    =
    ==      !=      <       <=      >       >=
    ?       :       ||      &&      !
    +       -       *       /       %
    |       ^       &       ~       <<      >>
    ++      --
    #       $
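
The fragment below is a small sketch of how the longest-substring rule and case sensitivity play out in practice. The variable names are invented for the illustration and carry no special meaning.

```hana
// "iffy" is a single identifier: the longest-substring rule keeps the
// tokenizer from splitting it into the reserved word "if" followed by "fy".
int iffy = 1;

// "While" differs in case from the reserved word "while", so it is an
// ordinary identifier.
int While = 2;

// "iffy<=While" is three tokens (iffy, <=, While), never four.
boolean ordered = iffy<=While;

// "iffy+++While" is tokenized as iffy, ++, +, While.
int sum = iffy+++While;
```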

Integer literals are nonempty strings of decimal digits. Real literals are described with the following regular expression:

\d+\.\d+([Ee][+-]?\d+)?
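
A few declarations may make the pattern concrete. The variable names are illustrative only.

```hana
real a = 3.0;        // digits on both sides of the point
real b = 0.5e-3;     // an exponent with an explicit sign
real c = 6.02E23;    // the exponent marker may be E or e, and the sign is optional
int n = 42;          // no point, so this is an integer literal

// None of the following is a single real literal, because the pattern
// requires at least one digit on each side of the point:
//     3.      .5      1e10
```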

String literals are sequences of zero or more non-control characters or escape sequences, delimited by double quotes (#x22). The escape sequences are:

`\n`	newline
`\t`	tab
`\xxxxxxxx;`	where xxxxxxxx is a sequence of one to eight hexadecimal digits; this escape sequence stands for the character with the given codepoint.
`\"`	the double quote character
`\'`	the single quote character
`\\`	the backslash character

A character literal is a non-control character or escape sequence surrounded by single quotes (#x27).
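
The following declarations sketch the literal forms described above. They assume the codepoint escape is written as a backslash, the hexadecimal digits, and a closing semicolon, as the `\ffffffff;` value mentioned in Section 4 suggests. The variable names are placeholders.

```hana
string s = "name:\tHana\n";       // contains a tab and ends with a newline
string q = "She said \"hi\"";     // escaped double quotes inside a string
string p = "C:\\temp";            // one backslash character, then "temp"
string empty = "";                // zero characters is allowed

char a = 'A';
char b = '\41;';                  // codepoint hex 41, the same character as 'A'
char e = '\e9;';                  // codepoint hex E9
char quote = '\'';                // the single quote character
```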

2 Macrosyntax

A syntactically valid Hana program is one that is lexically valid and whose tokenization is derivable from the following grammar, presented in EBNF. Here the vertical bar denotes alternatives, a postfix question mark marks an optional item, Kleene star denotes zero or more, Kleene plus denotes one or more, and parentheses are used for grouping. Symbols appear in single quotes. Reserved words are shown here in all lowercase and not quoted. The start symbol is PROGRAM.

  PROGRAM       →  STMT+
  DEC           →  TYPEDEC | VARDEC | FUNDEC
  TYPEDEC       →  struct ID '{' (TYPE ID ';')* '}'
                |  enum ID '{' ID (',' ID)* '}'
  TYPE          →  (boolean | char | int | real | string | stream | thread | ID) ('[' ']')*
  VARDEC        →  volatile? TYPE ID ('=' EXP)? ';'
                |  const TYPE ID '=' EXP ';'
  FUNDEC        →  (TYPE | void) ID '(' PARAMS ')' BLOCK
  PARAMS        →  (TYPE ID (',' TYPE ID)* (',' '...')? | '...')?
  BLOCK         →  '{' STMT* '}'
  STMT          →  DEC
                |  SIMPLESTMT ((if | unless | while | until) EXP)? ';'
                |  if '(' EXP ')' BLOCK (else if '(' EXP ')' BLOCK)* (else BLOCK)?
                |  while '(' EXP ')' BLOCK
                |  for '(' (TYPE ID '=' EXP)? ';' EXP? ';' ASSIGNMENT? ')' BLOCK
                |  for ID '(' EXP ')' BLOCK
  SIMPLESTMT    →  break
                |  continue
                |  return EXP?
                |  die EXP
                |  ASSIGNMENT
                |  CALL
  ASSIGNMENT    →  INCREMENT | VAR '=' EXP
  INCREMENT     →  INCOP VAR | VAR INCOP
  EXP           →  EXP0 ('?' EXP0 ':' EXP0)*
  EXP0          →  EXP1 ('||' EXP1)*
  EXP1          →  EXP2 ('&&' EXP2)*
  EXP2          →  EXP3 ('|' EXP3)*
  EXP3          →  EXP4 ('^' EXP4)*
  EXP4          →  EXP5 ('&' EXP5)*
  EXP5          →  EXP6 (RELOP EXP6)?
  EXP6          →  EXP7 (SHIFTOP EXP7)*
  EXP7          →  EXP8 (ADDOP EXP8)*
  EXP8          →  EXP9 (MULOP EXP9)*
  EXP9          →  PREFIXOP? EXP10
  EXP10         →  LITERAL
                |  VAR
                |  INCREMENT
                |  NEWOBJECT
                |  start CALL
                |  '(' EXP ')'
  LITERAL       →  null
                |  true
                |  false
                |  INTLIT
                |  FLOATLIT
                |  CHARLIT
                |  STRINGLIT
                |  ID '::' ID
  VAR           →  ID | CALL | VAR '[' EXP ('...' EXP)? ']' | VAR '.' ID
  NEWOBJECT     →  new TYPE ('{' ARGS '}' | ('[' EXP ']')+)
  CALL          →  ID '(' ARGS ')'
  ARGS          →  (EXP (',' EXP)*)?
  RELOP         →  '<' | '<=' | '==' | '!=' | '>=' | '>'
  SHIFTOP       →  '<<' | '>>'
  ADDOP         →  '+' | '-'
  MULOP         →  '*' | '/' | '%'
  PREFIXOP      →  '-' | '!' | '~' | '#' | '$'
  INCOP         →  '++' | '--'
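
As a rough illustration of how the expression levels EXP through EXP10 determine grouping, consider the sketch below. The variables a, b and c are placeholders, and the comments describe only what the grammar derives, not any evaluation order.

```hana
int a = 2;
int b = 3;
int c = 4;

int r1 = a + b * c;            // derived as a + (b * c): MULOP sits below ADDOP
int r2 = a << b + c;           // derived as a << (b + c): ADDOP sits below SHIFTOP
boolean t1 = a < b && b < c;   // derived as (a < b) && (b < c)

// The bitwise operators sit above the relational ones, so a & b == 2
// would be derived as a & (b == 2). Parentheses give the other grouping:
boolean t2 = (a & b) == 2;

// EXP5 allows at most one RELOP, so a < b < c is not derivable.
// EXP9 allows at most one PREFIXOP, so - - a is not derivable either;
// a parenthesized operand works instead:
int r3 = -(-a);
int r4 = -(#"abc");
```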

3 Semantics

We describe the semantics of Hana informally but somewhat precisely.

3.1 Programs

A program is a sequence of one or more statements. Some statements, called declaration statements, declare entities; others simply execute.

// This is a complete Hana program.  When executed, it writes
// "hello, world" to standard output.

string greeting = "hello";
print(greeting + ", " + place());
string place() {return "world";}

3.2 Blocks

Blocks exist to control the scope of declarations. A block is a sequence of zero or more statements.

3.3 Declarations

A declaration binds an identifier to an entity. There are seven kinds of declarations:

type declarations,
function declarations,
variable declarations,
parameter declarations,
iterator declarations,
enumeration literal declarations, and
field declarations.

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any holes in the scope created by declarations of identifiers with the same name within (inner) blocks contained within the scope. (This means the visible region may be discontinuous.)

The scope of an identifier declared in a type or function declaration is the entire innermost enclosing block containing the declaration, or the entire program if the declaration does not appear within a block. (This allows types and functions to be mutually recursive.)

Color c = Color::GREEN;           // line 1
enum Color {RED, GREEN, BLUE}     // line 2
print($c);                        // line 3
void f(string s) {                // line 4
    Point q() {return null;}      // line 5
    struct Point {int x; int y;}  // line 6
    c = Color::RED;               // line 7
}                                 // line 8
f("Can't see a point here");      // line 9

// Scope of Color and f is lines 1-9
// Scope of Point and q is lines 5-7.

The scope of an identifier declared in a variable declaration begins with the declaration itself and extends to the end of the innermost block containing the declaration, or to the end of the program if the declaration does not appear in a block.
The scope of an identifier declared in a field declaration is the same as the type in which the declaration appears.
The scope of an identifier declared as a literal of an enumeration type is the same as that of the type itself.
The scope of an identifier declared in a parameter declaration is the block (body) of the innermost function in which the parameter declaration appears.
The scope of an identifier declared as an iterator is the for-statement in which it appears.
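
A short sketch contrasting these scope rules follows. The function and variable names are made up for the example, and the comments mark which uses fall inside or outside a scope.

```hana
void report() {
    print($total);                     // error - total is not in scope before its declaration
    int total = 10;
    print($total);                     // okay
    print(label());                    // okay - label is in scope throughout this block
    string label() {return "done";}

    for n (new int[]{1, 2, 3}) {
        print($n);                     // okay - n is the iterator of this for-statement
    }
    print($n);                         // error - n is not in scope outside the for-statement
}
```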

The declared identifiers at the top-level of a block or program (types, variables, and functions) must be mutually distinct, except that multiple functions can share the same name.
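
For instance, under this rule the following top-level declarations would be treated as marked. The names are hypothetical.

```hana
struct Pair {int a; int b;}
int Pair;                                  // error - Pair already names a type at this level

int twice(int n) {return n * 2;}
string twice(string s) {return s * 2;}     // okay - functions may share a name
int twice;                                 // error - twice already names a function at this level
```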

Declarations within inner blocks hide declarations with the same name in an enclosing outer block.

The identifiers declared as parameters of a function must be unique among themselves and the identifiers declared at the top level of the function's body.

void test(int x, int y) {
    real z;
    string x;         // error - x is already a parameter
    if (z > 1.0) {
        int x = 5;    // this x is fine, however
    }
}

The identifiers within a particular enumeration type declaration must be unique among themselves.

enum Light {RED, AMBER, GREEN}                     // okay
enum StickyStuff {GLUE, AMBER, CEMENT, PASTE}      // okay
enum Call {WOO, HOO, HOO}                          // error!

3.4 Types

Hana features the following types:

the type boolean consisting entirely of the values true and false.
the type int of 32-bit two's-complement integers.
the type char of characters from the Universal Character Set.
the type real of IEEE-754 double precision values.
the type string of character strings.
the type stream of data streams.
the type thread of threads.
structure types, which are defined in a program. Structures have an ordered sequence of named fields.
enumeration types, which are defined in a program. The only values of an enumeration type are the ones declared for it.
array types: for every type T, there exists a type T[] whose values are zero-based integer-indexed sequences of values of type T.
the null type, whose sole value is the literal null.

The types int and real are arithmetic types. The arithmetic types, enumeration types, and the types boolean and char comprise the primitive types. Types that are not primitive types are called reference types.

3.5 Functions

A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions marked void in their declarations are called "void functions" or "procedures" and have no return type; functions with "..." at the end of their parameter list are called variadic functions.

The signature of a function is a triple (t₀, [t₁, ..., t_n], b) where t₀ is either the return type of the function or the string "void"; t₁ through t_n are, in order, the types of the parameters, and b is a boolean value stating whether the function is variadic.

An expression list (e₁, ..., e_n) is said to match a signature (t₀, [t₁, ..., t_k], b) whenever it holds that

when b is false, n = k;
when b is true, k <= n; and
for each i from 1 to k, e_i is type-compatible (see Section 3.8) with t_i.

Note that the return type has no effect on the definition of matching.
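
The matching rules can be sketched with two hypothetical functions, log and half, neither of which is part of the language or its library. The comments follow the definition above, which places no constraint on arguments beyond the k-th.

```hana
void log(string tag, ...) {}           // signature (void, [string], true)
real half(real x) {return x / 2;}      // signature (real, [real], false)

log("start");                          // matches - variadic, and k = 1 <= n = 1
log("start", 7, 2.5, "x");             // matches - only the first argument is checked against a type
log();                                 // no match - n = 0 is less than k = 1
log(7);                                // no match - int is not type-compatible with string

real h = half(3);                      // matches - int is type-compatible with real
h = half(1.5, 2.5);                    // no match - not variadic, and n = 2 differs from k = 1
```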

3.6 Variables

A variable is something that stores a value. Hana is a statically-typed language so all variables also have a type. Variables are either writable or not writable, and are either volatile or not volatile. The kinds of variables are:

i
(Simple variable) Here i is a simple identifier with the same name as an identifier declared in a visible variable declaration, parameter declaration, or iterator declaration. The type of this variable is the type given the identifier in the innermost visible declaration. It is writable unless it refers to an iterator or is marked const in its declaration. If a variable is marked volatile then changes to the variable in one thread must be visible to other threads that may access the variable.
v[e]
(Subscripted variable) Here v is a variable of an array type or the string type and e is an expression of type int. The type of this variable is v's base type if v is an array, or char if a string. This variable is the array component or character at (zero-based) index e, and is writable unless v is a string. If during execution v is null, or e evaluates to a value less than zero or greater than or equal to the length of v, the thread dies.
v[e1 ... e2]
(Slice variable) Here v is a variable of an array type or the string type and e1 and e2 are expressions of type int. The type of this variable is the same as v's type. This variable is a new array or string whose first element is at the (zero-based) index max(e1,0), and whose last is at the index min(e2,length(v)-1). If e1 is beyond the last index of v, or is greater than e2, then this variable yields the empty array (or string). It is not writable. If during execution v is null, the thread dies.
v.f
(Selected variable) Here v is a variable of a struct type and f must be an identifier declared as a field of v's type. This variable refers to the f-field of the object referred to by v. The type of this variable is the type associated with the field f, and the variable is writable. If during execution v is null, the thread dies.
f(e1, ..., en)
(Function call result) Here f must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and the function f is called with the arguments copied to the parameters. The function must not have been declared as void. This variable refers to the result of calling the function, has the type of the function, and is not writable.
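
The following sketch walks through the subscripted, slice, and selected forms. The names are invented, and the comments apply the rules above as written, including the clamping of slice bounds.

```hana
struct Point {int x; int y;}

string s = "hello";
char c = s[1];                  // the character 'e'
string mid = s[1 ... 3];        // "ell"
string tail = s[3 ... 100];     // "lo" - the upper bound is clamped to the last index
string none = s[7 ... 9];       // "" - the lower bound is beyond the last index
s[0] = 'j';                     // error - a character of a string is not writable

int[] a = new int[]{10, 20, 30, 40};
a[0] = 99;                      // okay - array components are writable
int[] head = a[-2 ... 1];       // a new array holding 99 and 20
a[1 ... 2] = head;              // error - a slice is not writable
int oops = a[4];                // the thread dies - 4 equals the length of a

Point p = new Point {3, 4};
p.x = p.y + 1;                  // okay - p.x is now 5
Point q = null;
int bad = q.x;                  // the thread dies - q is null
```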

3.7 Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

v++
++v
(Increment statement) Here v must have type int. Increments v.
v--
--v
(Decrement statement) Here v must have type int. Decrements v.
f(e1, ..., en)
(Call statement) Here f must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and the function f is called with the arguments copied to the parameters. The function must have been declared void.
v = e;
(Assignment statement) Here e must be type-compatible with the type of v. v is determined and e is evaluated, then the value of e is copied into v.
break;
(Break statement) This statement may only appear within a while or for statement that is properly within the same function as the break statement. The break statement terminates the execution of the innermost enclosing while or for statement.
continue;
(Continue statement) This statement may only appear within a while or for statement that is properly within the same function as the continue statement. The continue statement terminates the current iteration of the innermost enclosing while or for statement.
return;
(Return statement) Causes an immediate return from the innermost enclosing function, which must have been marked void.
return e;
(Return statement) Evaluates e then causes the innermost enclosing function to immediately return the value of e. The function must not have been marked void, and e must be type-compatible with its return type.
die e;
(Die statement) Evaluates e, which must have string type, writes this string to standard error, then causes the currently executing thread to die.
while (e) b
(While statement) Here e must have type boolean. First e is evaluated. If e produces false the execution of the while statement terminates. If e produces true, b is executed then the while statement is executed again.
if (e1) b1 else if (e2) b2 else if (e3) b3 ... else bn
(If statement) Each ei must have type boolean. Each ei is evaluated in order from left to right until one of them is true or they have all been evaluated. If any of the ei's evaluate to true the corresponding bi is executed, completing the execution of the if-statement. If none of the ei's evaluate to true, bn is executed (if it exists).
for (t i = e1; e2; s) b
(For statement) This is equivalent to {t i = e1; while (e2) {b; s;}}. If e2 is missing it is assumed to be true.
for i (a) b
(For statement) Here a must be an expression with an array type. This statement declares a new variable i whose scope is b. It executes b once for each value of i in the range of a. The expression a is evaluated only once, at the beginning of the execution of the for statement. The variable i is not writable.
s if e;
Evaluates e, then, only if it evaluated to true, executes s.
s unless e;
Evaluates e, then, only if it evaluated to false, executes s.
s while e;
Evaluates e, then, only if it evaluated to true, executes s then executes the entire statement again.
s until e;
Executes s, then evaluates e. If the evaluation produces true, the entire statement completes. Otherwise it is executed again.
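
A brief sketch of several statement forms follows, with values traced by hand from the rules above. The variable names are placeholders.

```hana
int i = 0;
i++ while i < 3;                   // tests first, then increments; i ends at 3
print("big\n") if i > 2;           // executed, since i > 2 is true
print("small\n") unless i > 2;     // skipped, since i > 2 is true
i-- until i == 0;                  // decrements first, then tests; i ends at 0

for (int k = 0; k < 3; k++) {
    print("%d\n", k);
}

int total = 0;
for n (new int[]{4, 5, 8}) {
    total = total + n;             // n itself is not writable
}
// total is now 17

if (total > 20) {
    print("large\n");
} else if (total > 10) {
    print("medium\n");             // this branch runs
} else {
    print("small\n");
}

die "total must not be negative" if total < 0;   // not executed here
```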

3.8 Expressions

Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. String, stream, thread, array and structure values are therefore never manipulated directly, but only through references.

An expression e is type-compatible with a type t if and only if

e has type t, or
e has type int, and t is real, or
e is null and t is a reference type.

An expression of type int can appear anywhere an expression of type real is expected; in this case the integer value is implicitly converted to one of type real. The conversion must maintain the expression's value; this is always possible since the type real has 53 bits of precision.
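
The three cases of type compatibility can be illustrated with assignment statements. This is a sketch only, and the names are hypothetical.

```hana
struct Point {int x; int y;}

real r;
int n;
string s;
Point p;

r = 3;          // okay - int is type-compatible with real; the value is converted
n = 3.0;        // error - real is not type-compatible with int
s = null;       // okay - string is a reference type
p = null;       // okay - structure types are reference types
n = null;       // error - int is a primitive type
r = null;       // error - real is a primitive type
```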

The Hana expressions are:

An integer literal, which has type int.
A character literal, which has type char.
A floating point literal, which has type real.
A string literal, which has type string.
true
The literal of type boolean denoting truth.
false
The literal of type boolean denoting falsity.
T :: x
An enumeration literal, where T is the name of an enumeration type and x is a literal of that type.
null
A literal representing a reference to no object, and whose value is the sole value of the null type.
v
A variable: The type of this expression is the type of the variable v, and the value of this expression is the current value stored in v.
v++
v must have type int. Produces the value of v, but increments v immediately after producing the value.
v--
v must have type int. Produces the value of v, but decrements v immediately after producing the value.
++v
v must have type int. Increments v, then produces this value.
--v
v must have type int. Decrements v, then produces this value.
new t {e1, e2, ..., en}
(Object construction) Here t must be the name of an array type or a structure type. If t is an array type whose base type is t0, each ei must have type t0. This expression refers to a newly constructed array of n items consisting of the values of each subexpression, respectively. If t is a structure type, this expression refers to a newly constructed object of type t whose field values are, in order, e1 through en.
new t [e1] [e2] ... [en]
(Object construction) Here each of the ei's must have type int. This expression refers to a newly constructed array of en items, each of which is a newly constructed array of e_n-1 items, and so on down to the innermost arrays, each of which holds e1 elements of type t. All array elements are initialized with the proper initial values for their type. If during execution any of the ei's evaluates to a non-positive integer, the thread dies.
start f (e1, ..., en);
(Thread construction) Spawns a new thread that runs f, which must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and copied to the parameters of f, which is then run on a new thread. The newly created thread is the value of this expression.
(e)
Evaluates e and produces this value.
-e
e must have an arithmetic type. Evaluates e and produces the negation of e.
~e
e must have type int. Evaluates e and produces the bitwise complement of e.
!e
e must have type boolean. If e evaluates to true, the entire expression produces false, otherwise it produces true.

#e
e must be an array or a string. Produces the number of items if an array, or the number of characters if a string.

int x = #"dog";                    // x gets 3
int[] a = new int[]{4,5,8,-3,2};
x = #a;                            // x gets 5
string s = "";
x = #s;                            // x gets 0

$e
e can be any expression whatsoever. Produces a string representation of its argument. For ints, chars, strings and reals the produced string is identical to the output of printf with the %d, %c, %s and %f format specifiers, respectively. For booleans, the produced string is either "true" or "false". Arrays produce strings of the form [e1, e2, ..., en] where each ei is the result of applying $ to the corresponding element of the array. (This implies that self-referential arrays will generate infinite strings, thus causing a stack overflow error on most machines, but that will be considered the programmer's fault.) Values of an enumerated type are represented with their declared name. The null value produces "null". Structures, streams and threads produce implementation-defined strings.
e1 * e2
Either both subexpressions must have arithmetic type, or e1 must have string type and e2 have type int. In the former case, the subexpressions are evaluated in any order and their product is produced. In the latter, the subexpressions are evaluated in any order and the entire expression produces the string which is e2 copies of e1 concatenated together.
e1 / e2
Each subexpression must have an arithmetic type. Both expressions are evaluated, in any order, and the entire expression produces the quotient of e1 divided by e2. The type of the quotient is real if either operand is real; otherwise the type is int.
e1 % e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the modulo of e1 and e2.
e1 + e2
Either both subexpressions must have arithmetic type, or both must have string type. In the former case, the subexpressions are evaluated in any order and their sum is produced. In the latter, the subexpressions are evaluated in any order and the (left-to-right) concatenation of the strings is produced.
e1 - e2
Each subexpression must have an arithmetic type. Evaluates the subexpressions in any order, then produces the difference of e1 and e2.
e1 << e2
Each subexpression must have type int. Produces the value of e1 shifted left e2 positions.
e1 >> e2
Each subexpression must have type int. Produces the value of e1 arithmetically shifted right e2 positions.
e1 <= e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than or equal to the value of e2.
e1 < e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than the value of e2.
e1 == e2
e1 must be type-compatible with the type of e2 or e2 must be type-compatible with the type of e1; however, neither expression can be a stream or a thread. The subexpressions are evaluated in any order, and the entire expression produces whether these values are the same, promoting int values to real values where necessary.
e1 != e2
Equivalent to !(e1==e2).
e1 > e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than the value of e2.
e1 >= e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than or equal to the value of e2.
e1 & e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise and of e1 and e2.
e1 ^ e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise exclusive or of e1 and e2.
e1 | e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise inclusive or of e1 and e2.
e1 && e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to false, the entire expression immediately produces false (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 || e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to true, the entire expression immediately produces true (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 ? e2 : e3
Here e1 must have type boolean, and e2 and e3 must be of the same type or one of them may be null and the other have a reference type. First e1 is evaluated. If it evaluates to true, the entire expression evaluates and produces e2, otherwise it evaluates and produces e3.
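
The sketch below exercises a handful of the expression forms above. The type and variable names are invented for the illustration, and values appear in comments only where the rules above determine them.

```hana
struct Point {int x; int y;}
enum Color {RED, GREEN, BLUE}

Point p = new Point {3, 4};              // fields x = 3 and y = 4, in order
int[] a = new int[]{1, 2, 3};            // an array of the three given values
int[] z = new int[5];                    // an array of five ints

string bar = "ab" * 3;                   // "ababab"
real half = 7 / 2.0;                     // a real quotient, since one operand is real
int whole = 7 / 2;                       // an int quotient, since both operands are int

string t = $(1 < 2);                     // "true"
string u = $a;                           // "[1, 2, 3]"
string w = $Color::GREEN;                // "GREEN"
string size = #a > 2 ? "big" : "small";  // "big", since #a is 3

boolean safe = p != null && p.x > 0;     // p.x is evaluated only because p != null is true

int i = 5;
int j = i++;                             // j is 5, i is 6
int k = ++i;                             // k is 7, i is 7
```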

4 Standard Library

The following are assumed to be declared in a scope outside of the main program's declaration sequence. (This means they can be freely redefined in a program, but the new declarations will hide the "standard" ones.)

int codepoint(char c)
Returns the codepoint of c.
char character(int i)
Returns the character at codepoint i.
string format(string format, ...)
Similar to sprintf in C, except that the formatted string is returned from the function rather than updated through a pointer.
enum StreamMode {READ, CREATE, APPEND}
A stream opened in READ mode can only be read from. A stream opened in CREATE mode can only be written to; if the corresponding resource exists at the time it is opened the resource will be emptied, otherwise it will be created. A stream opened in APPEND mode behaves like one opened in CREATE mode, except that an existing resource will not be emptied.
stream open(string filename, StreamMode mode)
Opens the file with the given name and mode, returning a stream object.
void close(stream s)
Closes the stream, releasing system resources. Dies if the stream hasn't been opened.
void print(stream f, string format, ...)
Equivalent to fprintf in C. Dies if the stream has not been opened in CREATE or APPEND mode or has already been closed.
void print(string format, ...)
Equivalent to printf in C, that is, to calling the stream form of print with standard output.
string getString(stream s)
Reads from the stream up to and including the first newline character, or until the end of the stream is reached. Returns a string consisting of all consumed characters not including the newline character. Bytes are converted to characters according to the default character encoding. Returns null if the end of stream had previously been reached. Dies if the stream has not been opened in READ mode or has already been closed. This is a blocking call.
string getString()
Equivalent to calling the one argument form of getString with standard input.
char getChar(stream s)
Returns the next character to be read from the stream, using the default character encoding to transform bytes into characters, or \ffffffff; if there are no characters remaining. Dies if the stream has not been opened in READ mode or has already been closed. This is a blocking call.
char getChar()
Equivalent to calling the one argument form of getChar with standard input.
real sqrt(real x)
Returns the square root of x.
const real pi = acos(-1.0)
A convenient constant for π.
real sin(real x)
Returns the sine of x.
real cos(real x)
Returns the cosine of x.
real atan(real x, real y)
Returns the angle between the positive x-axis and the line from the origin to (x,y).
real ln(real x)
Returns the natural log of x.
void sleep(int millis)
Puts the current thread to sleep for (at least) millis milliseconds.
void join(thread t)
Blocks until t has finished.
real now()
Returns the number of milliseconds since the epoch.
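
As a closing sketch, the fragment below combines several of the library functions. The file name notes.txt and the function work are hypothetical, and the elapsed time printed at the end will vary from run to run.

```hana
// Write a line to a file, then read it back.
stream out = open("notes.txt", StreamMode::CREATE);
print(out, "%d items\n", 3);
close(out);

stream src = open("notes.txt", StreamMode::READ);
string line = getString(src);
while (line != null) {
    print("%s\n", line);           // getString drops the newline, so add it back
    line = getString(src);
}
close(src);

// Run a function on its own thread and wait for it to finish.
void work(int millis) {
    sleep(millis);
    print("worker finished\n");
}

real begin = now();
thread t = start work(100);
join(t);
print("elapsed: %f ms\n", now() - begin);
```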