The Language Hana

The Hana language is an extension of the Carlos language, more or less.

0 Introduction

Hana is an imperative, block structured, statically-typed programming language. It has much in common with C, such as its lack of true modularity, but differs from C in being oriented toward application-level, not system-level programming. Arrays and structures are heap-allocated, and there are no explicit pointers. Functions are allowed to be nested, overloaded, and variadic. There are built-in string, thread and stream types, and convenient operators such as # for length and $ for conversion to strings.

This document defines the language Hana.

1  Microsyntax

The source of a Hana program is a Unicode string. A lexically valid Hana program is a string that can be tokenized according to the usual "longest match" rule (at each point, the next token is the longest prefix of the remaining input that forms a token), with comments and whitespace allowed to separate tokens. Comments start with "//" and end at the earliest following #xA or #xD. Whitespace is any string consisting entirely of the characters #x9, #xA, #xD and #x20.
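The separator rule just described can be sketched in Python as a function that advances past whitespace and comments before the next token is read. The function name skip_separators is hypothetical, and the sketch assumes the source has already been decoded into a Python str.

```python
# The four whitespace characters: #x9, #xA, #xD and #x20.
WHITESPACE = {"\t", "\n", "\r", " "}


def skip_separators(src: str, pos: int) -> int:
    """Return the first index at or after pos that does not lie in
    whitespace or a comment."""
    while pos < len(src):
        if src[pos] in WHITESPACE:
            pos += 1
        elif src.startswith("//", pos):
            # A comment runs up to the earliest #xA or #xD (or the end of
            # the source); that line break is then skipped as whitespace.
            while pos < len(src) and src[pos] not in "\n\r":
                pos += 1
        else:
            break
    return pos
```

A single "/" is left alone, since only the two-character sequence "//" starts a comment.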

The Hana tokens are (1) identifiers, (2) reserved words, (3) integer literals, (4) real literals, (5) string literals, (6) character literals, and (7) symbols. Identifiers are nonempty strings of letters (Unicode properties Lu, Ll, Lt, Lm, Lo), decimal digits (Unicode property Nd) and underscores (#x5F) beginning with a letter, except for the following reserved words:

    boolean     stream     const        if         break
                thread     volatile     else       continue
    int                                 for        return
    char        struct     null         while      start
    real        enum       true         unless     die
    string      void       false        until      new
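A Python sketch of the identifier rule, using the standard unicodedata module to look up general categories. The set RESERVED transcribes the table above; the helper names are hypothetical.

```python
import unicodedata

# The 27 reserved words from the table above.
RESERVED = {
    "boolean", "char", "int", "real", "string", "stream", "thread",
    "struct", "enum", "void", "const", "volatile", "null", "true", "false",
    "if", "else", "for", "while", "unless", "until",
    "break", "continue", "return", "start", "die", "new",
}

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch: str) -> bool:
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_identifier_char(ch: str) -> bool:
    # Letters, decimal digits (Nd) and the underscore #x5F.
    return is_letter(ch) or unicodedata.category(ch) == "Nd" or ch == "_"


def classify_word(word: str) -> str:
    """Return "reserved", "identifier", or "invalid"."""
    if not word or not is_letter(word[0]):
        return "invalid"
    if not all(is_identifier_char(ch) for ch in word):
        return "invalid"
    # Comparison is case sensitive, so "While" is an ordinary identifier.
    return "reserved" if word in RESERVED else "identifier"
```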

Identifiers and reserved words are case sensitive. The symbols are:

    .       ,       ...     ;       ::
    (       )       [       ]       {       }
    =
    ==      !=      <       <=      >       >=
    ?       :       ||      &&      !
    +       -       *       /       %
    |       ^       &       ~       <<      >>
    ++      --
    #       $
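The longest-match rule matters most for symbols that are prefixes of one another, such as < and <<, + and ++, or . and .... The Python sketch below, with hypothetical helper names, splits a run of symbol characters by always taking the longest symbol available. A full lexer would also try identifiers and literals at each position and keep the longest candidate overall.

```python
SYMBOLS = [
    ".", ",", "...", ";", "::",
    "(", ")", "[", "]", "{", "}",
    "=",
    "==", "!=", "<", "<=", ">", ">=",
    "?", ":", "||", "&&", "!",
    "+", "-", "*", "/", "%",
    "|", "^", "&", "~", "<<", ">>",
    "++", "--",
    "#", "$",
]


def match_symbol(src: str, pos: int):
    """Return the longest symbol starting at pos, or None if there is none."""
    best = None
    for sym in SYMBOLS:
        if src.startswith(sym, pos) and (best is None or len(sym) > len(best)):
            best = sym
    return best


def split_symbols(src: str) -> list:
    """Tokenize a string consisting only of symbols, by longest match."""
    tokens, pos = [], 0
    while pos < len(src):
        sym = match_symbol(src, pos)
        if sym is None:
            raise ValueError(f"no symbol at position {pos}")
        tokens.append(sym)
        pos += len(sym)
    return tokens
```

For example, +++ splits as ++ followed by +, never as + followed by ++.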

Integer literals are nonempty strings of decimal digits. Real literals are described with the following regular expression:

    \d+\.\d+([Ee][+-]?\d+)?
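The two numeric literal forms can be checked directly with Python's re module. The patterns are copied from the text; the function literal_kind is hypothetical.

```python
import re

# fullmatch anchors each pattern to the whole candidate token.
# Note: in Python, \d in a str pattern matches any Unicode decimal digit
# (category Nd); compile with re.ASCII if only 0-9 is intended.
INT_LIT = re.compile(r"\d+")
REAL_LIT = re.compile(r"\d+\.\d+([Ee][+-]?\d+)?")


def literal_kind(text: str):
    """Return "real", "int", or None for a candidate numeric token."""
    if REAL_LIT.fullmatch(text):
        return "real"
    if INT_LIT.fullmatch(text):
        return "int"
    return None
```

A real literal needs digits on both sides of the point, so neither 5. nor .5 is a real literal, and 1e10 (which has no point) is neither kind. Under the longest-match rule, 5. is read as the integer literal 5 followed by the symbol ".".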

String literals are sequences of zero or more non-control characters or escape sequences, delimited by double quotes (#x22). The escape sequences are:

    \n            newline
    \t            tab
    \xxxxxxxx;    the character whose codepoint is xxxxxxxx, a sequence of one to eight hexadecimal digits
    \"            the double quote character
    \'            the single quote character
    \\            the backslash character

A character literal is a non-control character or escape sequence surrounded by single quotes (#x27).
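A Python sketch of decoding the text between the quotes of a string or character literal. Two assumptions are built in: "control character" is taken to mean Unicode general category Cc, and a hexadecimal escape must end with ";" exactly as written in the table. The function names are hypothetical.

```python
import re
import unicodedata

SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}
# One to eight hex digits followed by a semicolon, e.g. \41; or \1F600;
HEX_ESCAPE = re.compile(r"([0-9A-Fa-f]{1,8});")


def decode_literal_body(body: str) -> str:
    """Decode the characters between the delimiting quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Assumption: "control character" means general category Cc.
            if unicodedata.category(ch) == "Cc":
                raise ValueError(f"control character at index {i}")
            out.append(ch)
            i += 1
        elif i + 1 < len(body) and body[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            m = HEX_ESCAPE.match(body, i + 1)
            if m is None:
                raise ValueError(f"malformed escape sequence at index {i}")
            # chr raises ValueError for codepoints above 0x10FFFF.
            out.append(chr(int(m.group(1), 16)))
            i = m.end()
    return "".join(out)


def decode_char_literal(body: str) -> str:
    """A character literal must denote exactly one character."""
    text = decode_literal_body(body)
    if len(text) != 1:
        raise ValueError("character literal must contain exactly one character")
    return text
```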

2  Macrosyntax

A syntactically valid Hana program is one that is lexically valid and whose tokenization is derivable from the following grammar, presented in EBNF. Here the vertical bar denotes alternatives, a postfix question mark marks an optional item, Kleene star denotes zero or more, Kleene plus denotes one or more, and parentheses are used for grouping. Symbols appear in single quotes. Reserved words are shown here in all lowercase and not quoted. The start symbol is PROGRAM.

  PROGRAM       →  STMT+
  DEC           →  TYPEDEC | VARDEC | FUNDEC
  TYPEDEC       →  struct ID '{' (TYPE ID ';')* '}'
                |  enum ID '{' ID (',' ID)* '}'
  TYPE          →  (boolean | char | int | real | string | stream | thread | ID) ('[' ']')*
  VARDEC        →  volatile? TYPE ID ('=' EXP)? ';'
                |  const TYPE ID '=' EXP ';'
  FUNDEC        →  (TYPE | void) ID '(' PARAMS ')' BLOCK
  PARAMS        →  (TYPE ID (',' TYPE ID)* (',' '...')? | '...')?
  BLOCK         →  '{' STMT* '}'
  STMT          →  DEC
                |  SIMPLESTMT ((if | unless | while | until) EXP)? ';'
                |  if '(' EXP ')' BLOCK (else if '(' EXP ')' BLOCK)* (else BLOCK)?
                |  while '(' EXP ')' BLOCK
                |  for '(' (TYPE ID '=' EXP)? ';' EXP? ';' ASSIGNMENT? ')' BLOCK
                |  for ID '(' EXP ')' BLOCK
  SIMPLESTMT    →  break
                |  continue
                |  return EXP?
                |  die EXP
                |  ASSIGNMENT
                |  CALL
  ASSIGNMENT    →  INCREMENT | VAR '=' EXP
  INCREMENT     →  INCOP VAR | VAR INCOP
  EXP           →  EXP0 ('?' EXP0 ':' EXP0)*
  EXP0          →  EXP1 ('||' EXP1)*
  EXP1          →  EXP2 ('&&' EXP2)*
  EXP2          →  EXP3 ('|' EXP3)*
  EXP3          →  EXP4 ('^' EXP4)*
  EXP4          →  EXP5 ('&' EXP5)*
  EXP5          →  EXP6 (RELOP EXP6)?
  EXP6          →  EXP7 (SHIFTOP EXP7)*
  EXP7          →  EXP8 (ADDOP EXP8)*
  EXP8          →  EXP9 (MULOP EXP9)*
  EXP9          →  PREFIXOP? EXP10
  EXP10         →  LITERAL
                |  VAR
                |  INCREMENT
                |  NEWOBJECT
                |  start CALL
                |  '(' EXP ')'
  LITERAL       →  null
                |  true
                |  false
                |  INTLIT
                |  REALLIT
                |  CHARLIT
                |  STRINGLIT
                |  ID '::' ID
  VAR           →  ID | CALL | VAR '[' EXP ('...' EXP)? ']' | VAR '.' ID
  NEWOBJECT     →  new TYPE ('{' ARGS '}' | ('[' EXP ']')+)
  CALL          →  ID '(' ARGS ')'
  ARGS          →  (EXP (',' EXP)*)?
  RELOP         →  '<' | '<=' | '==' | '!=' | '>=' | '>'
  SHIFTOP       →  '<<' | '>>'
  ADDOP         →  '+' | '-'
  MULOP         →  '*' | '/' | '%'
  PREFIXOP      →  '-' | '!' | '~' | '#' | '$'
  INCOP         →  '++' | '--'
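The nonterminals EXP0 through EXP10 encode operator precedence, loosest first. The Python sketch below is a recursive-descent parser for just the binary and prefix operator levels. It assumes the input is already tokenized, treats only identifiers and integer literals as atoms, omits the ?: level, and assumes repetitions group to the left (the EBNF itself does not fix grouping). It returns a fully parenthesized string; the class and function names are hypothetical.

```python
# Binary operators for EXP0 (loosest) through EXP8 (tightest).
LEVELS = [
    {"||"},                                 # EXP0
    {"&&"},                                 # EXP1
    {"|"},                                  # EXP2
    {"^"},                                  # EXP3
    {"&"},                                  # EXP4
    {"<", "<=", "==", "!=", ">=", ">"},     # EXP5: at most one RELOP
    {"<<", ">>"},                           # EXP6
    {"+", "-"},                             # EXP7
    {"*", "/", "%"},                        # EXP8
]
RELOP_LEVEL = 5
PREFIX_OPS = {"-", "!", "~", "#", "$"}


class ExpParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        self.pos += 1
        return tok

    def level(self, n):
        if n == len(LEVELS):
            return self.exp9()
        left = self.level(n + 1)
        # Assumption: a - b - c groups as (a - b) - c.
        while self.peek() in LEVELS[n]:
            op = self.advance()
            right = self.level(n + 1)
            left = f"({left} {op} {right})"
            if n == RELOP_LEVEL:
                break  # EXP6 (RELOP EXP6)? allows only one comparison
        return left

    def exp9(self):
        # EXP9 -> PREFIXOP? EXP10: at most one prefix operator.
        if self.peek() in PREFIX_OPS:
            op = self.advance()
            return f"({op}{self.exp10()})"
        return self.exp10()

    def exp10(self):
        tok = self.advance()
        if tok == "(":
            inner = self.level(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        # Simplification: only identifiers and integer literals are atoms.
        if tok.isidentifier() or tok.isdigit():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse_exp(tokens):
    parser = ExpParser(tokens)
    tree = parser.level(0)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

With these levels, x & y == z parses as x & (y == z), so & binds more loosely than ==, as in C; a < b < c is rejected because EXP5 admits at most one relational operator, and - - a is rejected because EXP9 admits at most one prefix operator.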

3  Semantics

We describe the semantics of Hana informally but somewhat precisely.

3.1  Programs

A program is a sequence of one or more statements. Some statements, called declaration statements, declare entities; others simply execute.

// This is a complete Hana program.  When executed, it writes
// "hello, world" to standard output.

string greeting = "hello";
print(greeting + ", " + place());
string place() {return "world";}

3.2  Blocks

Blocks exist to control the scope of declarations. A block is a sequence of zero or more statements.

3.3  Declarations

A declaration binds an identifier to an entity. There are seven kinds of declarations:

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any holes created by declarations of the same identifier in inner blocks contained within that scope. (This means the visible region may be discontinuous.)

The declared identifiers at the top-level of a block or program (types, variables, and functions) must be mutually distinct, except that multiple functions can share the same name.

Declarations within inner blocks hide declarations with the same name in an enclosing outer block.

The identifiers declared as parameters of a function must be distinct from one another and from the identifiers declared at the top level of the function's body.

void test(int x, int y) {
    real z;
    string x;         // error - x is already a parameter
    if (z > 1.0) {
        int x = 5;    // this x is fine, however
    }
}
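The scope rules above can be sketched in Python as a chain of block scopes. The hypothetical Scope class enforces distinct top-level names (allowing several functions to share a name), resolves a using occurrence to the innermost visible declaration, and places parameters in the same scope as the body's top level, mirroring the test example. It records only the kind of each entity and ignores where within a block a declaration appears.

```python
class Scope:
    """One block, or the program's top level."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entities = {}  # name -> list of entity kinds declared here

    def declare(self, name, kind):
        existing = self.entities.get(name, [])
        # Names at the top level of a block must be distinct, except that
        # several functions may share one name (overloading).
        overload = kind == "function" and all(k == "function" for k in existing)
        if existing and not overload:
            raise NameError(f"{name!r} is already declared in this block")
        self.entities[name] = existing + [kind]

    def lookup(self, name):
        # Search outward from the innermost block, so an inner declaration
        # hides any outer declaration of the same name.
        scope = self
        while scope is not None:
            if name in scope.entities:
                return scope.entities[name]
            scope = scope.parent
        raise NameError(f"{name!r} is not visible here")


# Parameters x, y and the local z share one scope, as in test().
program = Scope()
body = Scope(parent=program)
for name in ("x", "y", "z"):
    body.declare(name, "variable")
try:
    body.declare("x", "variable")   # like "string x;" in test()
except NameError as err:
    print(err)
inner = Scope(parent=body)
inner.declare("x", "variable")      # allowed: hides the parameter x
```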

The identifiers within a particular enumeration type declaration must be unique among themselves.

enum Light {RED, AMBER, GREEN}                     // okay
enum StickyStuff {GLUE, AMBER, CEMENT, PASTE}      // okay
enum Call {WOO, HOO, HOO}                          // error!

3.4  Types

Hana features the following types:

The types int and real are arithmetic types. The arithmetic types, enumeration types, and the type char comprise the primitive types. Types that are not primitive types are called reference types.

3.5  Functions

A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions marked void in their declarations are called "void functions" or "procedures" and have no return type; functions with "..." at the end of their parameter list are called variadic functions.

The signature of a function is a triple (t0, [t1, ..., tn], b) where t0 is either the return type of the function or the string "void"; t1 through tn are, in order, the types of the parameters, and b is a boolean value stating whether the function is variadic.

An expression list (e1, ..., en) is said to match a signature (t0, [t1, ..., tk], b) whenever it holds that

Note that the return type has no effect on the definition of matching.
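The conditions for matching are not listed at this point in the text. Purely as an illustration, the Python sketch below encodes one common reading: a non-variadic signature needs exactly k arguments, a variadic one at least k, and each of the first k arguments must be type-compatible with the corresponding parameter type. Type compatibility is reduced here to type equality plus the int-to-real conversion described in section 3.8; Hana's actual rules may differ. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    return_type: str               # t0: a type name, or "void"
    param_types: Tuple[str, ...]   # t1 .. tk
    variadic: bool                 # b


def compatible(arg_type: str, param_type: str) -> bool:
    # ASSUMPTION: identical types, or an int argument where real is expected.
    return arg_type == param_type or (arg_type == "int" and param_type == "real")


def matches(arg_types: Sequence[str], sig: Signature) -> bool:
    """ASSUMED matching rule; sig.return_type is never consulted."""
    k = len(sig.param_types)
    n = len(arg_types)
    if n < k or (not sig.variadic and n != k):
        return False
    # Only the first k arguments are checked; extra variadic ones are not.
    return all(compatible(a, t) for a, t in zip(arg_types, sig.param_types))
```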

3.6  Variables

A variable is something that stores a value. Because Hana is statically typed, every variable also has a type. Variables are either writable or not writable, and are either volatile or not volatile. The kinds of variables are:

3.7  Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

3.8  Expressions

Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. String, stream, thread, array and structure values are therefore never manipulated directly, but only through references.

An expression e is type-compatible with a type t if and only if

An expression of type int can appear anywhere an expression of type real is expected; in this case the integer value is implicitly converted to one of type real. The conversion must preserve the expression's value; this is always possible because the type real has 53 bits of precision, which suffices provided no int value exceeds 2^53 in magnitude.
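The 53-bit figure is the significand precision of IEEE 754 double precision, which Python's float also uses. Assuming Hana's real is that format (the text states only the precision), this Python check shows where exact int-to-real conversion stops: every integer of magnitude up to 2^53 converts exactly, so the guarantee holds for 32-bit int values but not for all 64-bit ones. The function name is hypothetical.

```python
def converts_exactly(n: int) -> bool:
    """True if n survives the round trip int -> binary64 -> int unchanged."""
    return int(float(n)) == n


assert converts_exactly(2**31 - 1)       # every 32-bit int converts exactly
assert converts_exactly(2**53)           # still representable
assert not converts_exactly(2**53 + 1)   # smallest positive int that is not
assert not converts_exactly(2**63 - 1)   # the largest 64-bit int rounds to 2**63
```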

The Hana expressions are:

4  Standard Library

The following are assumed to be declared in a scope outside of the main program's declaration sequence. (This means they can be freely redefined in a program, but the new declarations will hide the "standard" ones.)