Hana is an imperative, block-structured, statically-typed programming
language. It has much in common with C, such as its
lack of true modularity, but differs from C in being oriented toward
application-level, not system-level programming. Arrays and
structures are heap-allocated, and there are no explicit pointers.
Functions are allowed to be nested, overloaded, and variadic. There
are built-in string, thread and stream types, and convenient operators
such as # for length and $ for conversion to strings.
This document defines the language Hana.
The source of a Hana program is a Unicode string. A lexically
valid Hana program is a string that can be tokenized according
to the common "longest substring" rule, with comments and whitespace
allowed to separate tokens.
Comments start with "//" and end with the earliest #xA or #xD.
Whitespace is any string consisting entirely of the
characters #x9, #xA, #xD and #x20.
The Hana tokens are (1) identifiers, (2) reserved words, (3) integer literals, (4) real literals, (5) string literals, (6) character literals, and (7) symbols. Identifiers are nonempty strings of letters (Unicode properties Lu, Ll, Lt, Lm, Lo), decimal digits (Unicode property Nd) and underscores (#x5F) beginning with a letter, except for the following reserved words:
boolean stream const if break thread volatile else continue int for return char struct null while start real enum true unless die string void false until new
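The identifier rule can be checked mechanically. The following is a minimal sketch in Python, not part of Hana itself; the function name is_identifier is hypothetical, and the reserved word set is copied from the list above.

```python
import unicodedata

# Reserved words copied from the list above.
RESERVED = set("""
boolean stream const if break thread volatile else continue int for return
char struct null while start real enum true unless die string void false
until new
""".split())

# Unicode general categories that count as letters in the rule above.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}

def is_identifier(text):
    """Return True if text is an identifier under the rule above."""
    # Must be nonempty and begin with a letter.
    if not text or unicodedata.category(text[0]) not in LETTER_CATEGORIES:
        return False
    # Every character must be a letter, a decimal digit (Nd) or '_'.
    for ch in text:
        category = unicodedata.category(ch)
        if not (category in LETTER_CATEGORIES or category == "Nd" or ch == "_"):
            return False
    # Reserved words are excluded; the comparison is case sensitive.
    return text not in RESERVED
```

Under this sketch "greeting" and "While" are identifiers, while "while", "_x" and "1x" are not.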
Identifiers and reserved words are case sensitive. The symbols are:
. , ... ; :: ( ) [ ] { } = == != < <= > >= ? : || && ! + - * / % | ^ & ~ << >> ++ -- # $
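To illustrate the "longest substring" rule for the symbols above, here is a small Python sketch that splits a run of symbol characters into tokens. It handles symbols and whitespace only, and the name split_symbols is hypothetical; a full tokenizer would also recognize identifiers, literals and comments.

```python
# Symbols copied from the list above.
SYMBOLS = """. , ... ; :: ( ) [ ] { } = == != < <= > >= ? : || && ! + - * / %
| ^ & ~ << >> ++ -- # $""".split()

# Try longer symbols first so that the longest match always wins.
BY_LENGTH = sorted(SYMBOLS, key=len, reverse=True)

def split_symbols(text):
    """Split text made of symbols and whitespace into symbol tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t\n\r":      # whitespace only separates tokens
            pos += 1
            continue
        for symbol in BY_LENGTH:
            if text.startswith(symbol, pos):
                tokens.append(symbol)
                pos += len(symbol)
                break
        else:
            raise ValueError("no symbol starts at offset %d" % pos)
    return tokens
```

With this sketch, split_symbols("<<=") yields ['<<', '='] and split_symbols(":::") yields ['::', ':'], because the longer symbol is always preferred at each position.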
Integer literals are nonempty strings of decimal digits. Real literals are described with the following regular expression:
\d+\.\d+([Ee][+-]?\d+)?
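The regular expression can be tried out directly. This Python sketch wraps it in a hypothetical helper, is_real_literal, and requires the whole text to match. Note that Python's \d accepts any Unicode decimal digit, which is an assumption about what the expression above intends.

```python
import re

# The regular expression for real literals, exactly as given above.
REAL_LITERAL = re.compile(r"\d+\.\d+([Ee][+-]?\d+)?")

def is_real_literal(text):
    """Return True if the whole of text is a real literal."""
    return REAL_LITERAL.fullmatch(text) is not None
```

Under this reading, "3.14" and "1.0e-10" are real literals, but "1.", ".5" and "1e10" are not, since digits are required on both sides of the decimal point.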
String literals are sequences of zero or more non-control characters or escape sequences, delimited by double quotes (#x22). The escape sequences are:
\n | newline |
\t | tab |
\xxxxxxxx; | the character with the given codepoint, where xxxxxxxx is a sequence of one to eight hexadecimal digits |
\" | the double quote character |
\' | the single quote character |
\\ | the backslash character |
A character literal is a non-control character or escape sequence surrounded by single quotes (#x27).
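As an illustration of the escape sequences, the following Python sketch decodes the text between the quotes of a string literal. The name decode_escapes is hypothetical, and rejecting unknown or malformed escapes with an error is an assumption; the rules above do not say how they are handled.

```python
# Single-character escapes from the table above.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}
HEX_DIGITS = set("0123456789abcdefABCDEF")

def decode_escapes(body):
    """Decode the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":                  # ordinary character
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        following = body[i + 1]
        if following in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[following])
            i += 2
            continue
        # Codepoint escape: one to eight hexadecimal digits, then ';'.
        end = body.find(";", i + 1)
        digits = body[i + 1:end] if end != -1 else ""
        if not (1 <= len(digits) <= 8) or not set(digits) <= HEX_DIGITS:
            raise ValueError("malformed escape at offset %d" % i)
        # chr() itself rejects values beyond the Unicode range.
        out.append(chr(int(digits, 16)))
        i = end + 1
    return "".join(out)
```

For example, decoding the six characters a, \, 4, 1, ;, b gives "aAb", since 41 is the hexadecimal codepoint of "A".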
A syntactically valid Hana program is one that is lexically valid and whose tokenization is derivable from the following grammar, presented in EBNF. Here the vertical bar denotes alternatives, a postfix question mark marks an optional item, Kleene star denotes zero or more, Kleene plus denotes one or more, and parentheses are used for grouping. Symbols appear in single quotes. Reserved words are shown here in all lowercase and not quoted. The start symbol is PROGRAM.
PROGRAM    → STMT+
DEC        → TYPEDEC | VARDEC | FUNDEC
TYPEDEC    → struct ID '{' (TYPE ID ';')* '}'
           | enum ID '{' ID (',' ID)* '}'
TYPE       → (boolean | char | int | real | string | stream | thread | ID) ('[' ']')*
VARDEC     → volatile? TYPE ID ('=' EXP)? ';'
           | const TYPE ID '=' EXP ';'
FUNDEC     → (TYPE | void) ID '(' PARAMS ')' BLOCK
PARAMS     → (TYPE ID (',' TYPE ID)* (',' '...')? | '...')?
BLOCK      → '{' STMT* '}'
STMT       → DEC
           | SIMPLESTMT ((if | unless | while | until) EXP)? ';'
           | if '(' EXP ')' BLOCK (else if '(' EXP ')' BLOCK)* (else BLOCK)?
           | while '(' EXP ')' BLOCK
           | for '(' (TYPE ID '=' EXP)? ';' EXP? ';' ASSIGNMENT? ')' BLOCK
           | for ID '(' EXP ')' BLOCK
SIMPLESTMT → break | continue | return EXP? | die EXP | ASSIGNMENT | CALL
ASSIGNMENT → INCREMENT | VAR '=' EXP
INCREMENT  → INCOP VAR | VAR INCOP
EXP        → EXP0 ('?' EXP0 ':' EXP0)*
EXP0       → EXP1 ('||' EXP1)*
EXP1       → EXP2 ('&&' EXP2)*
EXP2       → EXP3 ('|' EXP3)*
EXP3       → EXP4 ('^' EXP4)*
EXP4       → EXP5 ('&' EXP5)*
EXP5       → EXP6 (RELOP EXP6)?
EXP6       → EXP7 (SHIFTOP EXP7)*
EXP7       → EXP8 (ADDOP EXP8)*
EXP8       → EXP9 (MULOP EXP9)*
EXP9       → PREFIXOP? EXP10
EXP10      → LITERAL | VAR | INCREMENT | NEWOBJECT | start CALL | '(' EXP ')'
LITERAL    → null | true | false | INTLIT | FLOATLIT | CHARLIT | STRINGLIT | ID '::' ID
VAR        → ID | CALL | VAR '[' EXP ('...' EXP)? ']' | VAR '.' ID
NEWOBJECT  → new TYPE ('{' ARGS '}' | ('[' EXP ']')+)
CALL       → ID '(' ARGS ')'
ARGS       → (EXP (',' EXP)*)?
RELOP      → '<' | '<=' | '==' | '!=' | '>=' | '>'
SHIFTOP    → '<<' | '>>'
ADDOP      → '+' | '-'
MULOP      → '*' | '/' | '%'
PREFIXOP   → '-' | '!' | '~' | '#' | '$'
INCOP      → '++' | '--'
We describe the semantics of Hana informally but somewhat precisely.
A program is a sequence of one or more statements. Some statements, called declaration statements, declare entities; others simply execute.
// This is a complete Hana program. When executed, it writes
// "hello, world" to standard output.
string greeting = "hello";
print(greeting + ", " + place());
string place() {return "world";}
Blocks exist to control the scope of declarations. A block is a sequence of zero or more statements.
A declaration binds an identifier to an entity. There are seven kinds of declarations:
Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any holes in the scope created by declarations of identifiers with the same name within (inner) blocks contained within the scope. (This means the visible region may be discontinuous).
Color c = Color::GREEN;              // line 1
enum Color {RED, GREEN, BLUE}        // line 2
print($c);                           // line 3
char f(string s) {                   // line 4
    Point q() {return null;}         // line 5
    struct Point {int x; int y;}     // line 6
    c = Color::RED;                  // line 7
}                                    // line 8
f("Can't see a point here");         // line 9
// Scope of Color and f is lines 1-9
// Scope of Point and q is lines 5-7.
The declared identifiers at the top-level of a block or program (types, variables, and functions) must be mutually distinct, except that multiple functions can share the same name.
Declarations within inner blocks hide declarations with the same name in an enclosing outer block.
The identifiers declared as parameters of a function must be unique among themselves and the identifiers declared at the top level of the function's body.
void test(int x, int y) {
    real z;
    string x;         // error - x is already a parameter
    if (z > 1.0) {
        int x = 5;    // this x is fine, however
    }
}
The identifiers within a particular enumeration type declaration must be unique among themselves.
enum Light {RED, AMBER, GREEN}                     // okay
enum StickyStuff {GLUE, AMBER, CEMENT, PASTE}      // okay
enum Call {WOO, HOO, HOO}                          // error!
The types int and real are arithmetic types. The arithmetic types, enumeration types, and the type char comprise the primitive types. Types that are not primitive types are called reference types.
A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions marked void in their declarations are called "void functions" or "procedures" and have no return type; functions with "..." at the end of their parameter list are called variadic functions.
The signature of a function is a triple (t0, [t1, ..., tn], b) where t0 is either the return type of the function or the string "void"; t1 through tn are, in order, the types of the parameters, and b is a boolean value stating whether the function is variadic.
An expression list (e1, ..., en) is said to match a signature (t0, [t1, ..., tk], b) whenever it holds that
Note that the return type has no effect on the definition of matching.
A variable is something that stores a value. Hana is a statically-typed language so all variables also have a type. Variables are either writable or not writable, and are either volatile or not volatile. The kinds of variables are:
i
(Simple variable) Here i is a simple identifier with the same name as an identifier declared in a visible variable declaration, parameter declaration, or iterator declaration. The type of this variable is the type given the identifier in the innermost visible declaration. It is writable unless it refers to an iterator or is marked const in its declaration. If a variable is marked volatile then changes to the variable in one thread must be visible to other threads that may access the variable.
v[e]
(Subscripted variable) Here v is a variable of an array type or the string type and e is an expression of type int. The type of this variable is v's base type if v is an array, or char if a string. This variable is the array component or character at (zero-based) index e, and is writable unless v is a string. If during execution v is null, or e evaluates to a value less than zero or greater than or equal to the length of v, the thread dies.
v[e1 ... e2]
(Slice variable) Here v is a variable of an array type or the string type and e1 and e2 are expressions of type int. The type of this variable is the same as v's type. This variable is a new array or string whose first element is at the (zero-based) index max(e1,0), and whose last is at the index min(e2,length(v)-1). If e1 is beyond the last index of v, or is greater than e2, then this variable yields the empty array (or string). It is not writable. If during execution v is null, the thread dies.
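The clamping rules for slices can be modelled in a few lines. This is a Python sketch only, with a hypothetical name hana_slice; None stands in for null, an exception stands in for the thread dying, and returning an empty value when the clamped last index falls before the first is an assumption, since the text above does not cover that case.

```python
def hana_slice(v, e1, e2):
    """Model of v[e1 ... e2] for a Python str or list v."""
    if v is None:
        raise RuntimeError("thread dies: slice of null")
    # e1 beyond the last index, or greater than e2, yields the empty value.
    if e1 > len(v) - 1 or e1 > e2:
        return v[:0]
    first = max(e1, 0)
    last = min(e2, len(v) - 1)
    if last < first:              # assumption: also treated as empty
        return v[:0]
    return v[first:last + 1]      # the upper bound is inclusive
```

For instance, hana_slice("hello", 1, 3) gives "ell", and hana_slice("hello", -2, 10) gives "hello" because both bounds are clamped to the valid index range.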
v.f
(Selected variable) Here v is a variable of a struct type and f must be an identifier declared as a field of v's type. This variable refers to the f-field of the object referred to by v. The type of this variable is the type associated with the field f, and is writable. If during execution v is null, the thread dies.
f(e1, ..., en)
(Function call result) Here f must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and the function f is called with the arguments copied to the parameters. The function must not have been declared as void. This variable refers to the result of calling the function, has the type of the function, and is not writable.
A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:
v++
++v
(Increment statement) Here v must have type int. Increments v.
v--
--v
(Decrement statement) Here v must have type int. Decrements v.
f(e1, ..., en)
(Call statement) Here f must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and the function f is called with the arguments copied to the parameters. The function must have been declared void.
v = e;
(Assignment statement) Here e must be type compatible with the type of v. v is determined and e is evaluated, then the value of e is copied into v.
break;
(Break statement) This statement may only appear within a while or for statement that is properly within the same function as the break statement. The break statement terminates the execution of the innermost enclosing while or for statement.
continue;
(Continue statement) This statement may only appear within a while or for statement that is properly within the same function as the continue statement. The continue statement terminates the current iteration of the innermost enclosing while or for statement.
return;
(Return statement) Causes an immediate return from the innermost enclosing function, which must have been marked void.
return e;
(Return statement) Evaluates e then causes the innermost enclosing function to immediately return the value of e. The function must not have been marked void, and e must be type compatible with its return type.
die e;
(Die statement) Evaluates e, which must have string type, writes this string to standard error, then causes the currently executing thread to die.
while (e) b
(While statement) Here e must have type boolean. First e is evaluated. If e produces false the execution of the while statement terminates. If e produces true, b is executed then the while statement is executed again.
if (e1) b1 else if (e2) b2 else if (e3) b3 ... else bn
(If statement) Each ei must have type boolean. Each ei is evaluated in order from left to right until one of them is true or they have all been evaluated. If any of the ei's evaluate to true the corresponding bi is executed, completing the execution of the if-statement. If none of the ei's evaluate to true, bn is executed (if it exists).
for (t i = e1; e2; s) b
(For statement) This is equivalent to {t i = e1; while (e2) {b; s;}}. If e2 is missing it is assumed to be true.
for i (a) b
(For statement) Here a must be an expression with an array type. This statement declares a new variable i whose scope is b. It executes b once for each value of i in the range of a. The expression a is evaluated only once, at the beginning of the execution of the for statement. The variable i is not writable.
s if e;
Evaluates e, then, only if it evaluated to true, executes s.
s unless e;
Evaluates e, then, only if it evaluated to false, executes s.
s while e;
Evaluates e, then, only if it evaluated to true, executes s then executes the entire statement again.
s until e;
Executes s, then evaluates e. If the evaluation produces true, the entire statement completes. Otherwise it is executed again.
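The difference between the while and until modifiers is when the condition is tested. The following Python sketch models the two forms with callables; run_while and run_until are hypothetical names, s stands for the simple statement and e for the condition.

```python
def run_while(s, e):
    """Model of 's while e;': e is tested first, so s may run zero times."""
    while e():
        s()

def run_until(s, e):
    """Model of 's until e;': s runs first, so it runs at least once."""
    while True:
        s()
        if e():
            break
```

With a condition that is false from the start, run_while never executes s. With a condition that is true from the start, run_until still executes s exactly once.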
Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. String, stream, thread, array and structure values are therefore never manipulated directly, but only through references.
An expression e is type-compatible with a type t if and only if
An expression of type int can appear anywhere an expression of type real is expected; in this case the integer value is implicitly converted to one of type real. The conversion must maintain the expression's value; this is always possible since the type real has 53 bits of precision.
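The role of the 53 bits of precision can be seen with a short experiment. This Python sketch assumes that real behaves like an IEEE 754 binary64 value, which is what Python's float is on mainstream platforms; the document does not state the width of int, so the helper converts_exactly (a hypothetical name) simply checks individual values.

```python
def converts_exactly(n):
    """True if the integer n survives conversion to a binary64 float."""
    return int(float(n)) == n
```

Every integer of magnitude up to 2**53 converts exactly, so a 32-bit int would always be safe. The first value that does not survive is 2**53 + 1, which rounds to 2**53.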
The Hana expressions are:
true
The literal of type boolean denoting truth.
false
The literal of type boolean denoting falsity.
T :: x
An enumeration literal, where T is the name of an enumeration type and x is a literal of that type.
null
A literal representing a reference to no object, and whose value is the sole value of the null type.
v
A variable: The type of this expression is the type of the variable v, and the value of this expression is the current value stored in v.
v++
v must have type int. Produces the value of v, but increments v immediately after producing the value.
v--
v must have type int. Produces the value of v, but decrements v immediately after producing the value.
++v
v must have type int. Increments v, then produces this value.
--v
v must have type int. Decrements v, then produces this value.
new t {e1, e2, ..., en}
(Object construction) Here t must be the name of an array type or a structure type. If t is an array type whose base type is t0, each ei must have type t0. This expression refers to a newly constructed array of n items consisting of the values of each subexpression, respectively. If t is a structure type, this expression refers to a newly constructed object of type t whose field values are, in order, e1 through en.
new t [e1] [e2] ... [en]
(Object construction) Here each of the ei's must have type int. This expression refers to a newly constructed array of en items, each of which is a newly constructed array of en-1 items, and so on, down to the innermost arrays, each of which holds e1 objects of type t. All array elements are initialized with the proper initial values for their type. If during execution any of the ei's evaluates to a non-positive integer, the thread dies.
start f (e1, ..., en);
(Thread construction) Spawns a new thread that runs f, which must name a function whose signature is matched by the argument list e1 through en, and be the only visible function that is so matched. Each expression is evaluated in any order and copied to the parameters of f, which is then run on a new thread. The newly created thread is the value of this expression.
(e)
Evaluates e and produces this value.
-e
e must have an arithmetic type. Evaluates e and produces the negation of e.
~e
e must have type int. Evaluates e and produces the bitwise complement of e.
!e
e must have type boolean. If e evaluates to true, the entire expression produces false, otherwise it produces true.
#e
Produces the length of e. For example:
int x = #"dog";                      // x gets 3
int[] a = new int[]{4,5,8,-3,2};
x = #a;                              // x gets 5
string s = "";
x = #s;                              // x gets 0
$e
e can be any expression whatsoever. Produces a string representation of its argument. For ints, chars, strings and reals the produced string is identical to the output of printf with the %d, %c, %s and %f format specifiers, respectively. For booleans, the produced string is either "true" or "false". Arrays produce strings of the form [e1, e2, ..., en] where each ei is the result of applying $ to the corresponding element of the array. (This implies that self-referential arrays will generate infinite strings, thus causing a stack overflow error on most machines, but that will be considered the programmer's fault.) Values of an enumerated type are represented with their declared name. The null value produces "null". Structures, streams and threads produce implementation-defined strings.
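The recursive treatment of arrays can be sketched as follows. This is a Python model, not Hana: to_string is a hypothetical name, None stands in for null, Python lists stand in for arrays, and Python's % formatting is assumed to agree with printf for %d and %f. Enumerations, structures, streams and threads are left out.

```python
def to_string(value):
    """Model of the $ operator for a few kinds of value."""
    if value is None:
        return "null"
    if isinstance(value, bool):        # test bool before int: bool is an int
        return "true" if value else "false"
    if isinstance(value, int):
        return "%d" % value
    if isinstance(value, float):
        return "%f" % value
    if isinstance(value, str):         # chars and strings are both str here
        return value
    if isinstance(value, list):
        # Apply $ to every element, then join in the [e1, e2, ..., en] form.
        return "[" + ", ".join(to_string(item) for item in value) + "]"
    raise TypeError("no model for values of type %s" % type(value).__name__)
```

For example, to_string([1, 2, 3]) gives "[1, 2, 3]" and to_string(1.5) gives "1.500000". A list that contains itself makes the recursion unbounded, which mirrors the remark about self-referential arrays.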
e1 * e2
Either both subexpressions must have arithmetic type, or e1 must have string type and e2 have type int. In the former case, the subexpressions are evaluated in any order and their product is produced. In the latter, the subexpressions are evaluated in any order and the entire expression produces the string which is e2 copies of e1 concatenated together.
e1 / e2
Each subexpression must have an arithmetic type. Both expressions are evaluated, in any order, and the entire expression produces the quotient of e1 divided by e2. The type of the quotient is real if either operand is real; otherwise the type is int.
e1 % e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the modulo of e1 and e2.
e1 + e2
Either both subexpressions must have arithmetic type, or both must have string type. In the former case, the subexpressions are evaluated in any order and their sum is produced. In the latter, the subexpressions are evaluated in any order and the (left-to-right) concatenation of the strings is produced.
e1 - e2
Each subexpression must have an arithmetic type. Evaluates the subexpressions in any order, then produces the difference of e1 and e2.
e1 << e2
Each subexpression must have type int. Produces the value of e1 shifted left e2 positions.
e1 >> e2
Each subexpression must have type int. Produces the value of e1 arithmetically shifted right e2 positions.
e1 <= e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than or equal to the value of e2.
e1 < e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than the value of e2.
e1 == e2
e1 must be type-compatible with the type of e2 or e2 must be type-compatible with the type of e1; however, neither expression can be a stream or a thread. The subexpressions are evaluated in any order, and the entire expression produces whether these values are the same, taking into account any necessary promotions of int values to real values.
e1 != e2
Equivalent to !(e1==e2).
e1 > e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than the value of e2.
e1 >= e2
Each subexpression must have arithmetic type, or must both be chars, or both must be strings. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than or equal to the value of e2.
e1 & e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise and of e1 and e2.
e1 ^ e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise exclusive or of e1 and e2.
e1 | e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise inclusive or of e1 and e2.
e1 && e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to false, the entire expression immediately produces false (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 || e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to true, the entire expression immediately produces true (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 ? e2 : e3
Here e1 must have type boolean, and e2 and e3 must be of the same type or one of them may be null and the other have a reference type. First e1 is evaluated. If it evaluates to true, the entire expression evaluates and produces e2, otherwise it evaluates and produces e3.
The following are assumed to be declared in a scope outside of the main program's declaration sequence. (This means they can be freely redefined in a program, but the new declarations will hide the "standard" ones.)
int codepoint(char c)
Returns the codepoint of c.
char character(int i)
Returns the character at codepoint i.
string format(string format, ...)
Similar to sprintf in C, except that the formatted string is returned from the function rather than updated through a pointer.
enum StreamMode {READ, CREATE, APPEND}
A stream opened in READ mode can only be read from. A stream opened in CREATE mode can only be written to; if the corresponding resource exists at the time it is opened the resource will be emptied, otherwise it will be created. A stream opened in APPEND mode behaves like one opened in CREATE mode, except that an existing resource will not be emptied.
stream open(string filename, StreamMode mode)
Opens the file with the given name and mode, returning a stream object.
void close(stream s)
Closes the stream, releasing system resources. Dies if the stream hasn't been opened.
void print(stream f, string format, ...)
Equivalent to fprintf in C. Dies if the stream has not been opened in CREATE or APPEND mode or has already been closed.
void print(string format, ...)
Equivalent to printf in C. Dies if the stream has not been opened in CREATE or APPEND mode or has already been closed.
string getString(stream s)
Reads from the stream up to and including the first newline character, or until the end of the stream is reached. Returns a string consisting of all consumed characters not including the newline character. Bytes are converted to characters according to the default character encoding. Returns null if the end of stream had previously been reached. Dies if the stream has not been opened in READ mode or has already been closed. This is a blocking call.
string getString()
Equivalent to calling the one-argument form of getString with standard input.
char getChar(stream s)
Returns the next character to be read from the stream, using the default character encoding to transform bytes into characters, or \ffffffff; if there are no characters remaining. Dies if the stream has not been opened in READ mode or has already been closed. This is a blocking call.
char getChar()
Equivalent to calling the one-argument form of getChar with standard input.
real sqrt(real x)
Returns the square root of x.
const real pi = acos(-1.0)
Convenient constant for π
real sin(real x)
Returns the sine of x.
real cos(real x)
Returns the cosine of x.
real atan(real x, real y)
Returns the angle between the positive x-axis and the line from the origin to (x,y).
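The two-argument atan takes its arguments in the order (x, y), which is the reverse of atan2 in C and in Python's math module. The following Python sketch shows the correspondence; hana_atan and PI are hypothetical names, and the use of radians and of atan2's result range are assumptions, since neither is stated above.

```python
import math

def hana_atan(x, y):
    """Angle of the line from the origin to (x, y), in radians (assumed)."""
    return math.atan2(y, x)    # note the swapped argument order

# The pi constant, computed the same way as in the declaration above.
PI = math.acos(-1.0)
```

Under these assumptions, hana_atan(0.0, 1.0) is a quarter turn (PI / 2) and hana_atan(-1.0, 0.0) is a half turn (PI).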
real ln(real x)
Returns the natural log of x.
void sleep(int millis)
Puts the current thread to sleep for (at least) millis milliseconds.
void join(thread t)
Blocks until t has finished.
real now()
The number of milliseconds since the epoch.