The Language Jax

The Jax language was made up by students in a compiler class in 2003.

0 Introduction

Jax (a Java almost-subset) is an imperative, object-oriented programming language. Its syntax is almost a pure subset of Java's; the differences arise because Jax leaves out many features of Java while adding a couple of useful features of its own. Notable features of Jax include:

Classes and interfaces
Purposeful lack of implementation inheritance
Overloading of constructors and methods
Language support for threading and synchronization
A built-in string type
C's printf in its standard library
Convenient length ('#') and toString ('$') operators

This document defines the language Jax.

1 Microsyntax

The source of a Jax program is a sequence of Unicode characters. Comments start with "//" and extend to the end of the line. Whitespace is any sequence of one or more characters with codepoints in the set {9, 10, 13, 32}. Tokens are formed by successively taking the longest substring that makes a valid token. Whitespace and comments always separate tokens.

Identifiers are nonempty strings of letters, decimal digits and underscores beginning with a letter, except for the following reserved words:

    interface   class     extends        implements   public   static
    final       boolean   char           int          double   string
    void        break     return         if           else     while
    for         in        synchronized   true         false    null
    this        new

Identifiers and reserved words are case sensitive. Integer literals are nonempty strings of digits. Floating point literals are described with the following regular expression:

    digit+ '.' digit* [('e'|'E') ['+'|'-'] digit+]
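
The longest-substring rule, together with the identifier, reserved-word, and numeric-literal definitions above, can be illustrated by a small maximal-munch tokenizer. This is a minimal Python sketch under stated assumptions: it accepts only ASCII letters (Jax identifiers may use any Unicode letter), it omits string and character literals, the operator list is collected from the macrosyntax in Section 2 plus '...', and the names `tokenize`, `PATTERNS`, and `SYMBOLS` are hypothetical.

```python
import re

# Reserved words from Section 1; they are never identifiers.
RESERVED = {
    "interface", "class", "extends", "implements", "public", "static",
    "final", "boolean", "char", "int", "double", "string",
    "void", "break", "return", "if", "else", "while",
    "for", "in", "synchronized", "true", "false", "null",
    "this", "new",
}

# Regex-defined token classes (ASCII-only simplification).
PATTERNS = [
    ("SKIP", re.compile(r"[\t\n\r ]+|//[^\n]*")),  # whitespace and comments
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("FLOATLIT", re.compile(r"[0-9]+\.[0-9]*(?:[eE][+-]?[0-9]+)?")),
    ("INTLIT", re.compile(r"[0-9]+")),
]

# Operators and punctuation (assumed list, gathered from the grammar).
SYMBOLS = [
    "...", "::", "||", "&&", "<=", "==", "!=", ">=", "<<", ">>", "++", "--",
    "{", "}", "(", ")", "[", "]", ";", ",", ".", "=", "?", ":", "|", "^",
    "&", "<", ">", "+", "-", "*", "/", "%", "!", "~", "#", "$",
]


def tokenize(source):
    """Split source into (kind, text) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(source):
        best_kind, best_len = None, 0
        for kind, pattern in PATTERNS:
            m = pattern.match(source, pos)
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        for sym in SYMBOLS:
            if source.startswith(sym, pos) and len(sym) > best_len:
                best_kind, best_len = sym, len(sym)
        if best_len == 0:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        text = source[pos:pos + best_len]
        if best_kind == "ID" and text in RESERVED:
            best_kind = text  # each reserved word is its own token kind
        if best_kind != "SKIP":  # whitespace and comments only separate tokens
            tokens.append((best_kind, text))
        pos += best_len
    return tokens
```

For example, `a+++b` splits as `a`, `++`, `+`, `b`: at each position the longest candidate wins, even if a shorter choice would have produced a different parse.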

String literals are sequences of zero or more printable characters, spaces or escape sequences, delimited by double quotes. The escape sequences are:

\n	newline
\t	tab
\xxxxxxxx;	the character whose codepoint is given by xxxxxxxx, a sequence of one to eight hexadecimal digits
\"	the double quote character
\'	the single quote character
\\	the backslash character

A character literal is a character or escape sequence surrounded by single quotes. Note that the escape sequences must be resolved during lexical analysis. Neither string literals nor character literals may extend across a line break. Integer literals, float literals, character literals, string literals, and identifiers are all tokens.
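
Because escape sequences are resolved during lexical analysis, the lexer must turn the raw text between the quotes into the characters it denotes. The following Python sketch handles the escapes listed above; the function name `decode_escapes` is hypothetical. It rejects codepoints above 0x10FFFF because Python strings cannot hold them, although a Jax implementation may need to represent such values (the standard library below uses \ffffffff; as an end-of-input marker).

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}


def decode_escapes(body):
    """Resolve Jax escape sequences in the text between a literal's quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 < len(body) and body[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
            continue
        # Codepoint escape: backslash, one to eight hex digits, semicolon.
        end = body.find(";", i + 1)
        digits = body[i + 1:end] if end != -1 else ""
        if not 1 <= len(digits) <= 8 or not set(digits) <= HEX_DIGITS:
            raise ValueError(f"malformed escape sequence at index {i}")
        code = int(digits, 16)
        if code > 0x10FFFF:
            # Python strings cannot hold codepoints beyond Unicode's range.
            raise ValueError(f"codepoint {code:#x} not representable here")
        out.append(chr(code))
        i = end + 1
    return "".join(out)
```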

2 Macrosyntax

We give the macrosyntax for Jax in EBNF. Brackets denote optional items, curly braces denote items that appear zero or more times, a parenthesized group followed by + appears one or more times, and vertical bars separate alternatives. Quoted symbols stand for themselves; reserved words appear in lower case.

  PROGRAM      →  {UNIT}
  UNIT         →  INTERFACE | CLASS
  INTERFACE    →  interface ID [extends ID {',' ID}] '{' {METHODSIG ';'} '}'
  CLASS        →  class ID [implements ID {',' ID}] '{' {MEMBER} '}'
  MEMBER       →  FIELD | METHOD | CONSTRUCTOR
  FIELD        →  [public] [static] [final] TYPE ID ['=' EXP] ';'
  METHOD       →  [public] [static] METHODSIG BLOCK
  METHODSIG    →  (TYPE | void) ID '(' PARAMS ')'
  CONSTRUCTOR  →  [public] ID '(' PARAMS ')' BLOCK
  PARAMS       →  [TYPE ID {',' TYPE ID}]
  TYPE         →  (boolean | char | int | double | string | ID) {'[' ']'}
  BLOCK        →  '{' {STMT} '}'
  STMT         →  TYPE ID '=' EXP ';'
               |  EXP ';'
               |  VAR '=' EXP ';'
               |  break ';'
               |  return [EXP] ';'
               |  if '(' EXP ')' BLOCK {else if '(' EXP ')' BLOCK} [else BLOCK]
               |  while '(' EXP ')' BLOCK
               |  for '(' [TYPE ID '=' EXP] ';' [EXP] ';' [EXP] ')' BLOCK
               |  for '(' ID in EXP ')' BLOCK
               |  synchronized '(' EXP ')' BLOCK
  EXP          →  EXP0 {'?' EXP0 ':' EXP0}
  EXP0         →  EXP1 {'||' EXP1}
  EXP1         →  EXP2 {'&&' EXP2}
  EXP2         →  EXP3 {'|' EXP3}
  EXP3         →  EXP4 {'^' EXP4}
  EXP4         →  EXP5 {'&' EXP5}
  EXP5         →  EXP6 [RELOP EXP6]
  EXP6         →  EXP7 {SHIFTOP EXP7}
  EXP7         →  EXP8 {ADDOP EXP8}
  EXP8         →  EXP9 {MULOP EXP9}
  EXP9         →  [PREFIXOP] EXP10
  EXP10        →  LITERAL
               |  VAR
               |  INCDECOP VAR
               |  VAR INCDECOP
               |  new ID '(' ARGS ')'
               |  new TYPE ('[' EXP ']')+
               |  new TYPE ['{' ARGS '}']
               |  '(' EXP ')'
  LITERAL      →  null
               |  true
               |  false
               |  INTLIT
               |  FLOATLIT
               |  CHARLIT
               |  STRINGLIT
  VAR          →  [VARPREFIX] ID ['(' ARGS ')'] {VARSUFFIX}
  VARPREFIX    →  this '.' |  ID '::'
  VARSUFFIX    →  '[' EXP ']' | '.' ID ['(' ARGS ')']
  ARGS         →  [EXP {',' EXP}]
  RELOP        →  '<' | '<=' | '==' | '!=' | '>=' | '>'
  SHIFTOP      →  '<<' | '>>'
  ADDOP        →  '+' | '-'
  MULOP        →  '*' | '/' | '%'
  PREFIXOP     →  '-' | '!' | '~' | '#' | '$'
  INCDECOP     →  '++' | '--'

3 Semantics

We give the semantics of Jax informally.

3.1 Programs and Units

A Jax program is a collection of units that includes the units from the standard library. A unit is a class or an interface. Classes and interfaces are similar to their Java counterparts, with some notable limitations.

Interfaces contain method signatures only. Each signature is implicitly public and non-static.

Fields marked final cannot be modified at all; unlike Java, Jax has no blank finals (final fields that are left uninitialized in their declaration and assigned later).

Classes cannot extend other classes; they can only implement interfaces. Classes may not be abstract: a class that is declared to implement interfaces must implement every method in those interfaces, and it is an error if it does not.
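
A compiler can check this rule by gathering every signature required by the implemented interfaces, following extends links transitively, and subtracting the methods the class defines. The Python sketch below assumes a hypothetical representation: an interface maps to a pair (extended interface names, set of signatures), each signature is a tuple (method name, parameter types, return type), and a class method satisfies a requirement only on an exact match.

```python
def required_methods(interfaces, name, seen=None):
    """All signatures of interface `name`, including those inherited via extends."""
    if seen is None:
        seen = set()
    if name in seen:  # skip interfaces already visited (e.g. diamond-shaped extends)
        return set()
    seen.add(name)
    parents, methods = interfaces[name]
    result = set(methods)
    for parent in parents:
        result |= required_methods(interfaces, parent, seen)
    return result


def missing_methods(interfaces, implemented, class_methods):
    """Signatures the class still lacks; an empty set means the rule is satisfied."""
    required = set()
    for name in implemented:
        required |= required_methods(interfaces, name)
    return required - set(class_methods)
```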

3.2 Declarations

A declaration binds an identifier to an entity. There are five kinds of declarations:

class and interface declarations,
member (field, method, and constructor) declarations,
local variable declarations,
parameter declarations, and
for-index declarations.

3.2.1 Scope

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the identifier's scope. The scope is determined as follows:

The scope of an identifier declared in a class or interface declaration is the entire program (the maximum possible scope).
The scope of an identifier declared as a member is the body of the class in which it is declared if it is not marked public, or the same as the scope of the class if it is marked public.
The scope of an identifier declared in a local variable declaration begins immediately after the declaration and extends to the end of its innermost enclosing block.
The scope of an identifier declared in a parameter declaration is the block of the method or constructor in which the parameter declaration appears.
The scope of an identifier declared as a for-statement index is (1) for the for statement with the three-part header, the second and third expressions of the header and the block of the for statement, or (2) for the "for-in" statement, only its block.

3.2.2 Uniqueness

The following rules restrict the choices for identifiers:

The identifier in a constructor must be exactly the same as the name of the class in which it appears.
No field or method may be declared with the name of the class or interface in which it appears.
No two fields in a class may be declared with the same identifier.
It is illegal for two constructors in a class to have the same number of parameters with the same types in corresponding positions.
It is illegal for two methods in a class to have the same name, the same number of parameters, and the same types in corresponding positions.
No two local variables immediately within a block may have the same name; however, "inner blocks" may have local variables that hide the declaration of a variable in an outer block (or member of a class or signature of an interface).
The preceding rule is to be applied with the understanding that constructor and method parameters are to be considered local variables in the constructor's or method's block, and that for-statement-indices are considered local variables of the statement's block.
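
The scope and uniqueness rules above are commonly enforced with a stack of block scopes: a declaration is checked only against the innermost block, while lookup searches outward, so inner declarations hide outer ones. This Python sketch is one possible design, not part of the language definition; `SymbolTable` and `ScopeError` are hypothetical names.

```python
class ScopeError(Exception):
    """Raised for a duplicate declaration or a use outside any scope."""


class SymbolTable:
    """A stack of scopes; the last dictionary is the innermost block."""

    def __init__(self):
        self.scopes = [{}]  # outermost scope, e.g. the fields of the current class

    def open_block(self):
        self.scopes.append({})

    def close_block(self):
        self.scopes.pop()

    def declare(self, name, entity):
        innermost = self.scopes[-1]
        if name in innermost:
            # Two declarations immediately within the same block are illegal.
            raise ScopeError(f"{name!r} is already declared in this block")
        innermost[name] = entity  # may hide a declaration in an outer scope

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise ScopeError(f"{name!r} is not in scope")
```

To respect the rule that a local variable's scope begins immediately after its declaration, a checker would analyze the initializer of t i = e; before calling `declare`. Parameters and for-indices would be declared in the scope opened for the method's or statement's block, as the last rule above requires.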

3.3 Types

Jax features the following types:

the type boolean consisting entirely of the values true and false.
the type int of 32-bit two's-complement integers.
the type char of characters from the Universal Character Set.
the type double of IEEE-754 double precision values.
the type string of character strings.
array types, which are defined in a program. An array type is written t[], where t is its base type; array values are zero-based, integer-indexed sequences of values of the base type.
class and interface types, which are defined in a program.

The types int and double are called the arithmetic types; the type string, array types, class types, and interface types are called the reference types.

3.4 Blocks

Blocks are used to control the scope of variable declarations. A block consists of zero or more statements.

3.5 Variables

A variable is something that stores a value. All variables have a type. The kinds of variables are:

i
Here i is a simple identifier, which must denote a local variable, parameter, for-index, or field of the class in which this variable reference appears. The type of this variable is the type given to the field, parameter, local variable, or for-index in its declaration. Local variables and parameters are always writable, for-index variables are always read-only, and fields are read-only if and only if they are marked final in their declaration.
this
The variable expression this may only appear within a non-static method; it denotes a read-only variable whose value is a reference to the object on which the method was called.
m(a1, ..., an)
A read-only variable denoting the returned value from a method call. Here m must denote a method of the class in which the reference appears.
v[e]
Here v is a variable of an array type and e is an expression of type int. The type of this variable is v's base type. This variable is the array component at (zero-based) index e. The variable is not read-only.
v.f
Here v is a variable of a class type and f must be an identifier declared as a non-static field of v's type. This variable refers to the f-field of the object referred to by v. The type of this variable is the type associated with the field f. The variable is read-only if and only if the field is declared final.
v.m(a1, ..., an)
Here v is a variable of a class type and m must be a non-static method declared in the type of v. This is a read-only variable denoting the returned value from calling m with arguments a1 through an.
C::f
Here C is a class and f must be an identifier declared as a static field of C. This variable refers to the f-field of the class C. The type of this variable is the type associated with the field f. The variable is read-only if and only if the field is declared final.
C::m(a1, ..., an)
Here C is a class and m must be a static method declared in C. This is a read-only variable denoting the returned value from calling m with arguments a1 through an.

3.6 Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

t i = e;
The variable declaration. e is evaluated and then a new local variable i of type t is declared and initialized with the value of e.
e;
The expression statement. e is evaluated and its value is ignored.
v = e;
The assignment statement. e must be type compatible with the type of v and v must be writable (in other words, not read-only). v is determined and e is evaluated, then the value of e is copied into v.
break;
The break statement. This statement may only appear within a while or for statement. The break terminates the execution of the innermost enclosing while or for statement.
return;
The return statement. Causes an immediate return from the enclosing method or constructor. If the enclosing body is a method, that method must be declared void.
return e;
The return statement. Evaluates e, then causes the innermost enclosing method to return the value of e immediately. The method must have a non-void return type, and e must be type-compatible with it.
if (e1) b1 else if (e2) b2 else if (e3) b3 ... else bn
The if statement. Each ei must have type boolean. The ei are evaluated in order from left to right until one of them evaluates to true or all of them have been evaluated. If some ei evaluates to true, the corresponding bi is executed, completing the execution of the if statement. If none of the ei evaluates to true, bn is executed (if it exists).
while (e) b
The while-statement. e must have type boolean. First e is evaluated. If e is false the execution of the while statement terminates. If e is true, b is executed then the while-statement is executed again.
for (t i = e1; e2; e3) b
The for-statement. This is equivalent to {t i = e1; while (e2) {b e3;}}; that is, e3 is evaluated as an expression statement after each execution of b.
for (i in a) b
The array iteration statement. Here a must be an expression with an array type. This statement declares a new variable i whose scope is b. It executes b first with i set to a[0], then a[1], and so on for each value in a. The expression a is evaluated only once, at the beginning of the execution of the array iteration statement. The variable i may not be modified within b.
synchronized (e) b
Here e must evaluate to an object. If e is unlocked, locks e, executes b then unlocks e. If e is already locked, blocks until e is unlocked. Threads queue on blocked objects in FIFO fashion.
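
Python's `threading.Lock` makes no FIFO promise, so the queueing rule above can be sketched as a ticket lock built on `threading.Condition`: each arriving thread takes the next ticket and waits until that ticket is being served. This is an illustration under assumptions, not a prescribed runtime design: `FifoLock` is a hypothetical name, a runtime would keep one such lock per object, and the lock is deliberately non-reentrant because the text above does not say whether a thread that already holds e may re-enter synchronized (e).

```python
import threading


class FifoLock:
    """Non-reentrant lock that grants ownership in ticket (arrival) order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1   # hand the lock to the next queued ticket
            self._cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()
        return False
```

With one lock per object, synchronized (e) b corresponds to `with lock_for(e): b`, where `lock_for` is likewise a hypothetical per-object lookup.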

3.7 Expressions

Each expression has a type and a value. The value of an expression with a reference type is either null or a reference to an object. Arrays and objects are therefore never manipulated directly, but only through references.

An expression e is type-compatible with a type t if and only if

e has type t, or
e has type c, where c is a class that implements interface t or any descendant of t, or
e has type i, where i is an interface that is a descendant of interface t, or
e has type int, and t is double, or
e is null and t is a reference type.

An expression of type int can appear anywhere an expression of type double is expected; in this case the integer value is implicitly converted to one of type double. The conversion must maintain the expression's value; this is always possible since the type double has 53 bits of precision.
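
The compatibility rules and the int-to-double widening can be sketched in Python over type names. The representation is hypothetical: types are strings, array types are written with a trailing "[]", `classes` maps a class to the interfaces it implements, `interfaces` maps an interface to the interfaces it extends (assumed acyclic), and the pseudo-name "null" stands for the type of the null literal.

```python
PRIMITIVE_TYPES = {"boolean", "char", "int", "double"}


def is_reference_type(t):
    """string, array types, class types and interface types are reference types."""
    return t not in PRIMITIVE_TYPES


def is_descendant_or_self(interfaces, i, t):
    """True if interface i is t or (transitively) extends t."""
    return i == t or any(
        is_descendant_or_self(interfaces, p, t) for p in interfaces.get(i, []))


def compatible(e_type, t, classes, interfaces):
    """Is an expression of type e_type type-compatible with type t?"""
    if e_type == t:
        return True
    if e_type == "null":
        return is_reference_type(t)
    if e_type == "int" and t == "double":
        return True
    if e_type in classes:  # a class implementing t or some descendant of t
        return any(is_descendant_or_self(interfaces, i, t) for i in classes[e_type])
    if e_type in interfaces:  # an interface that descends from t
        return is_descendant_or_self(interfaces, e_type, t)
    return False


def widen(value):
    """The implicit int -> double conversion; exact for every 32-bit int."""
    return float(value)
```

Note that compatibility is not lifted to arrays: under these rules int[] is not compatible with double[].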

The signature of a method or constructor refers to the number, types, and order of its parameters. For example, if a method or constructor f is declared

f(t1 p1, t2 p2, t3 p3)

then its signature is the type list (t1, t2, t3). Constructors and methods can be declared with the ellipsis '...'; for example,

f(t1 p1, t2 p2, t3 p3, ...)

has signature (t1, t2, t3, MORE). An expression list (e1, ..., en) is said to match a signature (t1, ..., tk) if (1) n = k and each ei is type-compatible (as defined above) with ti, or (2) tk = MORE, k-1 <= n, and each of e1 through e(k-1) is type-compatible with the corresponding t1 through t(k-1).
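
The matching rule translates directly into a short Python function. In this sketch `MORE` is a marker string, `matches` is a hypothetical name, and the compatibility test is passed in as a two-argument callable so the function does not depend on any particular type representation. Arguments beyond t(k-1) in the MORE case are left unconstrained, as in the rule.

```python
MORE = "MORE"  # marker for a trailing '...' in a parameter list


def matches(arg_types, signature, compatible):
    """Does an argument-type list match a signature? compatible(e, t) decides each pair."""
    n, k = len(arg_types), len(signature)
    if k > 0 and signature[-1] == MORE:
        fixed = signature[:-1]  # t1 .. t(k-1)
        return len(fixed) <= n and all(
            compatible(e, t) for e, t in zip(arg_types, fixed))
    return n == k and all(compatible(e, t) for e, t in zip(arg_types, signature))
```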

The Jax expressions are as follows. Note that the semantics given to operators here refer only to the built-in (non-overloaded) behavior of each operator.

An integer literal, which has type int.
A character literal, which has type char.
A floating point literal, which has type double.
A string literal, which has type string.
true
The literal of type boolean denoting truth.
false
The literal of type boolean denoting falsity.
null
A literal representing a reference to no object, and whose actual type depends on its context. Technically, every reference type t contains a value null_t.
v
Where v is a variable. The type of this expression is the type of the variable v, and the value of this expression is the current value stored in v.
v++
v must have type int. Produces the value of v, but increments v immediately after producing the value.
v--
v must have type int. Produces the value of v, but decrements v immediately after producing the value.
++v
v must have type int. Increments v, then produces the new value of v.
--v
v must have type int. Decrements v, then produces the new value of v.
new t (e1, e2, ..., en)
Here t names a class that has a constructor with a signature matched by the argument list e1 through en. Calls the constructor with the arguments copied to the parameters and produces a reference to the newly constructed object.
new t [e1][e2]...[en]
Produces a reference to a new array object of type t[][]...[] (an "n-dimensional array") with the specified number of components in each dimension. The values of the newly constructed object are undefined.
new t {e1, e2, ..., en}
Here t must be an array type. Produces a reference to a new array object with values e1 through en.
f(e1, ..., en)
f must name a method whose signature is matched by the argument list e1 through en. The argument expressions are evaluated in any order, and f is called with the arguments copied to the parameters. If the method is declared with a return type (rather than void), this expression produces the value returned by the method. Otherwise this expression produces no value; it is possible to think of such an expression as producing a value of the pseudo-type "void", which is not type-compatible with any type, not even itself.
(e)
Evaluates e and produces this value.
-e
e must have an arithmetic type. Evaluates e and produces its negation.
~e
e must have type int. Evaluates e and produces the bitwise complement of e.
!e
e must have type boolean. If e evaluates to true, the entire expression produces false, otherwise it produces true.
#e
e must be an array or a string. Produces the number of elements if e is an array, or the number of characters if it is a string.
$e
e can be any expression whatsoever. Produces a string representation of its argument. For ints, chars, doubles and strings the produced string is identical to the output of printf with the %d, %c, %f and %s format specifiers, respectively. For booleans, the produced string is either "true" or "false". Arrays produce strings of the form [e1, e2, ..., en], where each ei is the result of applying $ to the corresponding element of the array. For objects of classes, the produced string is the result of calling the method public string toString() if such a method exists; if it does not, the produced string is the class name followed by "@" followed by some hexadecimal digits.
e1 * e2
Both subexpressions must have arithmetic type. The subexpressions are evaluated in any order and their product is produced.
e1 / e2
Each subexpression must have an arithmetic type. Both expressions are evaluated, in any order, and the entire expression produces the quotient of e1 divided by e2. The type of the quotient is double only if either operand is double, otherwise the type is int.
e1 % e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the modulo of e1 and e2.
e1 + e2
Either both subexpressions must have arithmetic type, or both must have string type. In the former case, the subexpressions are evaluated in any order and their sum is produced. In the latter, the subexpressions are evaluated in any order and the (left-to-right) concatenation of the strings is produced.
e1 - e2
Each ei must have an arithmetic type. Evaluates the subexpressions in any order, then produces the difference of e1 and e2.
e1 << e2
Each ei must have type int. Produces the value of e1 shifted left e2 positions.
e1 >> e2
Each ei must have type int. Produces the value of e1 arithmetically shifted right e2 positions.
e1 <= e2
Each subexpression must have arithmetic type, or must both be chars. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than or equal to the value of e2.
e1 < e2
Each subexpression must have arithmetic type, or must both be chars. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is less than the value of e2.
e1 == e2
e1 must be type-compatible with the type of e2 or e2 must be type compatible with the type of e1. The subexpressions are evaluated in any order, and the entire expression produces whether these values are the same, taking into account any automatic conversions.
e1 != e2
Equivalent to !(e1==e2).
e1 > e2
Each subexpression must have arithmetic type, or must both be chars. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than the value of e2.
e1 >= e2
Each subexpression must have arithmetic type, or must both be chars. Both expressions are evaluated, in any order, and the entire expression produces whether the value of e1 is greater than or equal to the value of e2.
e1 & e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise and of e1 and e2.
e1 ^ e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise exclusive or of e1 and e2.
e1 | e2
Each subexpression must have type int. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise inclusive or of e1 and e2.
e1 && e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to false, the entire expression immediately produces false (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 || e2
Each subexpression must have type boolean. First e1 is evaluated. If it evaluates to true, the entire expression immediately produces true (without evaluating e2). Otherwise e2 is evaluated and the entire expression produces the value of e2.
e1 ? e2 : e3
Here e1 must have type boolean, and e2 and e3 must be of the same type. First e1 is evaluated. If it evaluates to true, the entire expression evaluates and produces e2, otherwise it evaluates and produces e3.
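
The rules for the $ operator can be sketched in Python over stand-in values: Python bool, int, float, str, and list play the roles of Jax booleans, ints, doubles, strings, and arrays, and a hypothetical `Char` wrapper represents chars. Python's %-formatting closely mirrors C's printf for these specifiers, so $1.5 yields "1.500000". The text above does not say what $ produces for null; the sketch's "null" is an assumption.

```python
class Char:
    """Hypothetical wrapper that distinguishes a Jax char from a one-character string."""

    def __init__(self, value):
        self.value = value


def to_jax_string(v):
    """Sketch of the $ operator over Python stand-ins for Jax values."""
    if v is None:
        return "null"  # assumption: not specified by the text above
    if isinstance(v, bool):  # test bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return "%d" % v
    if isinstance(v, float):
        return "%f" % v
    if isinstance(v, Char):
        return "%c" % v.value
    if isinstance(v, str):
        return "%s" % v
    if isinstance(v, list):  # arrays: apply $ to each element
        return "[" + ", ".join(to_jax_string(x) for x in v) + "]"
    to_string = getattr(v, "toString", None)
    if callable(to_string):
        return to_string()
    # No toString(): class name, "@", then some hexadecimal digits.
    return "%s@%x" % (type(v).__name__, id(v))
```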

4 Standard Library

The following classes are assumed to exist in the runtime environment of every Jax program.

4.1 The Text class

class Text {
    public static int codepoint(char c) {...}
    public static char character(int i) {...}
    public static int indexOf(string s, char c) {...}
    public static char charAt(string s, int i) {...}
    public static string substring(string s, int start, int length) {...}
    public static int parseInt(string s) {...}
    public static double parseDouble(string s) {...}
    public static boolean useLocalizationsFrom(Stream s) {...}
    public static string format(string s, ...) {...}
}

codepoint(c): Returns the codepoint of character c.
character(i): Returns the character whose codepoint is i.
indexOf(s, c): Returns the index of the first occurrence of c within s, or -1 if c does not occur within s.
charAt(s, i): Returns the character at position i within s.
substring(s, start, length): Returns the string consisting of the length characters of s beginning at index start. If start is beyond the end of s, returns the empty string. If length is too large, the returned string consists only of the characters from start to the end of s.
parseInt(s): Returns the integer that s represents.
parseDouble(s): Returns the double that s represents.
useLocalizationsFrom(s): Reads text from stream s, which must be a sequence of lines of the form k=v, and makes these pairs the current localization dictionary used by Text.format().
format(s, ...): Similar to sprintf() in C, except that the format string is a localization key, and the created string is returned from the method rather than written into a buffer passed as an argument. If s is not a key in the current localization dictionary, it is used directly as the format string.
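
The clamping behavior of substring and the key-or-literal lookup of format can be sketched in Python. The names `substring`, `load_localizations`, and `text_format` are hypothetical stand-ins for the Text methods; the sketch assumes negative arguments to substring are errors, blank localization lines are ignored, each line splits at its first '=', and Python's %-formatting stands in for C's sprintf.

```python
def substring(s, start, length):
    """Text.substring: up to `length` characters of s beginning at index `start`."""
    # Assumption: negative arguments are errors; the description above is silent.
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    if start >= len(s):
        return ""  # start is beyond the end of s
    return s[start:start + length]  # slicing stops at the end of s if length is too large


def load_localizations(lines):
    """Text.useLocalizationsFrom: build the dictionary from lines of the form k=v."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        key, sep, value = line.partition("=")  # assumption: split at the first '='
        if not sep:
            raise ValueError(f"expected k=v, got {line!r}")
        table[key] = value
    return table


def text_format(table, s, *args):
    """Text.format: s is a localization key; an unknown key is used as the format itself."""
    fmt = table.get(s, s)
    return fmt % args
```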

4.2 The Math class

class Math {
    public static final double PI = ...;
    public static double sqrt(double x) {...}
    public static double sin(double x) {...}
    public static double cos(double x) {...}
    public static double atan(double x, double y) {...}
    public static double ln(double x) {...}
}

PI: The value of π
sqrt(x): The square root of x.
sin(x): The sine of x.
cos(x): The cosine of x.
atan(x, y)
ln(x): The natural logarithm of x.

4.3 The Io class

class Io {
    public static int printf(string format, ...) {...}
    public static final Stream STDIN = ...;
    public static final Stream STDOUT = ...;
}

printf(format, ...): Convenient shorthand for Io.STDOUT.printf().
STDIN: The standard input stream.
STDOUT: The standard output stream.

4.4 The Stream class

class Stream {
    public static Stream forFile(string filename, string mode) {...}
    public static Stream forSocket(Socket socket, string mode) {...}
    public int read() {...}
    public char readChar() {...}
    public ByteArray read(int count) {...}
    public string readLine() {...}
    public int write(ByteArray bytes) {...}
    public void write(string s) {...}
    public int printf(string format, ...) {...}
    public boolean close() {...}
}

forFile(filename, mode): Returns a stream associated with the file with the given name. Mode is "w" for write-only or "r" for read-only.
forSocket(socket, mode): Returns a stream associated with this socket. Mode is "w" for write-only or "r" for read-only. Returns null if the socket has not established a connection, or is a listening socket.
read(): Reads the next octet from this stream. Blocks until an octet is available. Returns the octet in the lower eight bits of its result (with the upper 24 bits clear), or returns -1 if the stream is not open.
readChar(): Returns the next character read from this stream, or \ffffffff; if there are no characters remaining. This is a blocking call.
read(count): Reads at most the requested number of octets from the available octets on this stream. Returns the number of octets actually read, which may be less than the amount requested.
readLine(): Reads characters from the stream up to and including the first newline character, or until the end of the stream is reached. Returns a string consisting of all consumed characters, not including the newline character. Octets are converted to characters according to the default character encoding. Returns null if the end of the stream had previously been reached. This is a blocking call.
write(bytes): Writes the bytes from the specified array to this stream.
write(s): Writes the given string to this stream.
printf(format, ...): Same as fprintf in C, writing to this stream.
close(): Closes this stream.

4.5 The Runnable interface

interface Runnable {
    void run();
}

4.6 The Thread class

class Thread {
    public static Thread start(Runnable r) {...}
    public static Thread currentThread() {...}
    public static void sleep(int millis) {...}
    public void interrupt() {...}
    public boolean isInterrupted() {...}
}

start(r): Starts a new thread on which to run r's run() method. Returns a reference to this new thread.
currentThread(): Returns a reference to the currently executing thread.
sleep(millis): Causes the current thread to sleep for the specified number of milliseconds.
interrupt(): Sets the interrupted status of this thread to true.
isInterrupted(): Returns the interrupted status of this thread.

4.7 The Socket class

class Socket {
    public static Socket createListener(int port) {...}
    public Socket accept() {...}
    public static Socket createClientFor(int inetAddress, int port) {...}
    public void close() {...}
}

createListener(port): Creates a listening TCP socket on the specified port. Returns null if another socket is already bound to that port.
accept(): A blocking call that returns, for this listening socket, a new socket to communicate with the client that has just connected.
createClientFor(inetAddress, port): Returns a new socket connected to the socket at the specified remote IP address and port. Returns null if a connection cannot be established.
close(): Closes this socket.

5 Programs

Unlike Java, Jax does not require a dynamic runtime system with class loaders and the potential for a NoSuchMethodError to be thrown. Instead, Jax code is intended to be compiled and linked into a standard executable file, which a host operating system can run in the usual way.

At link time, a class containing a public static void method called main, taking a single argument of type string[], must be specified; this method is the entry point of the executable program. The operating system passes the command line arguments to this method.

The Jax language definition does not specify the manner in which source code (a sequence of characters) is presented to a compiler. All that is required is that the compiler see some set of classes and interfaces. An implementation may require all units to appear in a single source file; other implementations may allow separate compilation of classes and interfaces. Compilers that accept separately compiled units must specify some mechanism to handle units not defined in the current file being compiled; for example, if a compiler encounters a reference to a class C not in the current source file, it might look for a file called C.jax and compile that, returning to the original file when C has been compiled. Some safeguard must be built into this mechanism to prevent circular references from causing the compiler to enter an infinite loop. A compiler may employ a more sophisticated approach, looking not only for C.jax but also for C.o (or C.obj). If the object file is newer than the source file, a compiler might assume the foreign class is already compiled and extract interface information from the object file. Mechanisms for doing this are compiler-specific and not part of the Jax language specification.
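
One way to realize the lookup-and-guard mechanism described above is sketched in Python under assumptions: file timestamps and each unit's references are supplied as plain dictionaries rather than read from disk, and `UnitLoader` and `require` are hypothetical names. The `in_progress` set is the safeguard against circular references, since a unit whose compilation has already started is never started again. A real compiler would still need the declarations of such a unit (for example, from a first pass that collects signatures), which this sketch does not model.

```python
class UnitLoader:
    """Hypothetical on-demand compilation driver with a cycle guard."""

    def __init__(self, sources, objects):
        self.sources = sources      # unit name -> (source mtime, units it references)
        self.objects = objects      # unit name -> object-file mtime, if C.o exists
        self.done = set()           # units compiled or loaded during this run
        self.in_progress = set()    # units whose compilation has started
        self.log = []

    def require(self, name):
        if name in self.done or name in self.in_progress:
            return  # already handled, or a circular reference: stop here
        source_time, references = self.sources[name]
        object_time = self.objects.get(name)
        if object_time is not None and object_time > source_time:
            self.log.append(f"use {name}.o")  # take interface info from the object file
            self.done.add(name)
            return
        self.in_progress.add(name)
        for other in references:
            self.require(other)
        self.in_progress.remove(name)
        self.done.add(name)
        self.log.append(f"compile {name}.jax")
```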