Manatee is an imperative, block-structured, statically-typed programming language. It is designed to be nice and non-threatening (like manatees!) to novice programmers.
The language is designed to be implementable by small teams of undergraduate students in a single semester, while still being a generally usable language. It features variables, booleans, characters, whole numbers, floating point numbers, strings, arrays, objects, procedures, functions, assignments, loops, exceptions, primitive built-in I/O, and a source-level module import facility. Functions are not first-class in this version of the language.
This document defines the language Manatee.
The source of a Manatee program is a Unicode string. A lexically valid Manatee program is a string that can be tokenized according to the usual longest match rule, with comments and whitespace allowed to separate tokens. Whitespace is text matching the regex:
[ \t]+
and a comment is text matching:
--[^\r\n]*
In other words, whitespace is a sequence of one or more tab characters and spaces. A comment begins with two hyphen characters (U+002D) and extends to the end of the current line, namely, the next carriage return (U+000D) or line feed (U+000A).
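As a rough illustration of the two patterns above, here is a minimal Python sketch that strips a comment and surrounding whitespace from one source line. The helper name strip_trivia is hypothetical, and the sketch deliberately ignores the case of two hyphens appearing inside a string literal.

```python
import re

# Patterns copied from the specification text above.
WHITESPACE = re.compile(r"[ \t]+")
COMMENT = re.compile(r"--[^\r\n]*")


def strip_trivia(line: str) -> str:
    """Drop the first comment on the line, then trim spaces and tabs.

    Simplification: a "--" inside a string literal is treated as a comment.
    """
    without_comment = COMMENT.sub("", line, count=1)
    return without_comment.strip(" \t")
```

A real tokenizer would apply these patterns between tokens rather than per line; the sketch only shows what each pattern matches.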
The Manatee tokens are identifiers (ID), reserved words, symbols, integer literals (INTLIT), number literals (NUMLIT), string literals (STRLIT), character literals (CHARLIT), and breaks (BR).
Identifiers are nonempty strings of letters (Unicode categories Lu, Ll, Lt, Lm, Lo) and decimal digits (Unicode category Nd) beginning with a letter, except for the reserved words. The reserved words and symbols can be inferred from the macrosyntax in the following section; the reserved words are precisely those in lowercase. Identifiers and reserved words are case sensitive.
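The identifier rule can be checked mechanically with Unicode general categories. The following Python sketch assumes a small, hypothetical reserved-word set (SAMPLE_RESERVED); the real set is whatever appears in lowercase in the grammar.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}
DIGIT_CATEGORY = "Nd"

# Hypothetical, partial reserved-word set used only for illustration.
SAMPLE_RESERVED = {"my", "is", "write", "end", "if", "to"}


def is_identifier(text: str, reserved=SAMPLE_RESERVED) -> bool:
    """Return True if text is a letter followed by letters or decimal digits
    and is not a reserved word (comparison is case sensitive)."""
    if not text or text in reserved:
        return False
    categories = [unicodedata.category(ch) for ch in text]
    if categories[0] not in LETTER_CATEGORIES:
        return False
    return all(c in LETTER_CATEGORIES or c == DIGIT_CATEGORY for c in categories)
```

Note that the underscore is in category Pc, so under this rule it cannot appear in an identifier.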
Integer literals are described with the regular expression:
\d+
and number literals with:
\d+\.\d+([×x]10\^\-?\d+)?
In other words, number literals consist of a nonempty sequence of plain decimal digits (0-9), followed by a decimal point (a period), followed by another nonempty sequence of digits, followed optionally by an exponent part. The exponent part begins with either a multiplication sign (U+00D7) or a Latin small letter X (U+0078); this is followed by the digit 1 (U+0031), then the digit 0 (U+0030), the circumflex accent (U+005E), an optional hyphen (U+002D), and then the exponent value, itself a nonempty sequence of digits. No spaces or underscores are allowed inside the literal.
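To make the number-literal syntax concrete, here is a small Python sketch that recognizes a literal with the regular expression above and converts it to a floating point value. The function name numlit_value is hypothetical, and the re.ASCII flag is an assumption that restricts \d to the plain digits 0-9, as the prose describes.

```python
import re

# Regex from the specification; re.ASCII limits \d to 0-9.
NUMLIT = re.compile(r"\d+\.\d+([×x]10\^-?\d+)?", re.ASCII)


def numlit_value(text: str) -> float:
    """Convert a Manatee number literal to a Python float."""
    if NUMLIT.fullmatch(text) is None:
        raise ValueError(f"not a number literal: {text!r}")
    # Normalize the multiplication sign, then split off the exponent part.
    mantissa, _, exponent = text.replace("×", "x").partition("x10^")
    if exponent:
        return float(f"{mantissa}e{exponent}")
    return float(mantissa)
```

Strings such as "5." or ".5" are rejected because digits are required on both sides of the decimal point.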
A string literal is a sequence of zero or more non-control characters or escape sequences, delimited by double quotes (U+0022). A character literal is a non-control character or escape sequence surrounded by single quotes (U+0027). The escape sequences are:
Sequence | Meaning |
---|---|
\n | the line feed character (U+000A) |
\t | the tab character (U+0009) |
\" | the quotation character (U+0022) |
\' | the apostrophe character (U+0027) |
\\ | the backslash character (U+005C) |
\(hex) | where hex is a one to six character string of hexadecimal digits; this escape sequence stands for the character whose codepoint is the value of the hex string. |
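The escape table can be read as a small decoding function. The Python sketch below is one possible reading, not a reference implementation: decode_escapes is a hypothetical name, it receives the text between the quotes, and it leaves anything that is not a listed escape untouched.

```python
import re

SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "'": "'", "\\": "\\"}

# Either a backslash plus one of n t " ' \  or  \( one to six hex digits ).
ESCAPE = re.compile(r"\\(?:([nt\"'\\])|\(([0-9A-Fa-f]{1,6})\))")


def decode_escapes(body: str) -> str:
    """Replace each escape sequence in body with the character it denotes."""
    def replace(match):
        simple, hex_digits = match.groups()
        if simple is not None:
            return SIMPLE_ESCAPES[simple]
        # chr raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))

    return ESCAPE.sub(replace, body)
```

For example, the two-character sequence backslash-t becomes a single tab, and \(41) becomes the letter A.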
Finally, a break is text matching the regex
\n|\r\n?
A syntactically valid Manatee program is one that is lexically
valid and whose tokenization is derivable from the following grammar,
presented in an EBNF variant. Here the vertical bar
denotes alternatives, the question mark means optional, Kleene
star denotes zero or more, Kleene plus denotes one or more, and parentheses
are used for grouping. Symbols appear in single quotes.
Reserved words are shown here in all lowercase and are not quoted.
The start symbol is SCRIPT. The tokens ID, INTLIT, NUMLIT, STRLIT, CHARLIT, and BR are defined above in the section on microsyntax.
SCRIPT      → BR* IMPORT* STMT+
IMPORT      → use module ID BR+
STMT        → (DEC | SIMPLESTMT MODIFIER? | COMPLEXSTMT) BR+
DEC         → VARDEC | TYPEDEC | PROCDEC | FUNCDEC
VARDEC      → my ID is always? EXP | my ID is (a | an) TYPE
TYPE        → truth value | character | whole? number | string | ID | TYPE list
TYPEDEC     → (a | an) ID has ':' BR+ ((a | an) TYPE ID BR+)+ end
PROCDEC     → to ID PARAMS? BLOCK end
FUNCDEC     → to get (a | an | some | the)? TYPE ID (of PARAMS)? BLOCK end
PARAMS      → TYPE ID (',' TYPE ID)* (','? and TYPE ID)?
BLOCK       → ':' BR+ STMT+
SIMPLESTMT  → write EXP
            | read EXP
            | increment EXP (by EXP)?
            | decrement EXP (by EXP)?
            | set EXP (',' EXP)* to EXP (',' EXP)*
            | exit the loop
            | return EXP?
            | fail (with EXP)?
            | do nothing
            | do ID EXPLIST? (after EXP (second | seconds))?
MODIFIER    → (if | unless | while | until) EXP
COMPLEXSTMT → CONDITIONAL | LOOP | TRY
CONDITIONAL → if EXP BLOCK (else if EXP BLOCK)* (else BLOCK)? end
LOOP        → LOOPCONTROL BLOCK end
LOOPCONTROL → loop (EXP times)?
            | (while | until) EXP
            | for each ID in EXP (down? to EXP (by EXP)?)?
TRY         → try BLOCK recover BLOCK end
EXP         → EXP1 (or EXP1)*
EXP1        → EXP2 (and EXP2)*
EXP2        → EXP3 (bit or EXP3)*
EXP3        → EXP4 (bit xor EXP4)*
EXP4        → EXP5 (bit and EXP5)*
EXP5        → EXP6 (RELOP EXP6)?
EXP6        → EXP7 (SHIFTOP EXP7)*
EXP7        → EXP8 (ADDOP EXP8)*
EXP8        → EXP9 (MULOP EXP9)*
EXP9        → PREFIX* EXP10
EXP10       → EXP11 SUFFIX*
EXP11       → LITERAL
            | ID
            | ID '{' ID ':' EXP (',' ID ':' EXP)* '}'
            | '[' EXPLIST? ']'
            | '(' EXP ')'
LITERAL     → nothing | yes | no | INTLIT | NUMLIT | CHARLIT | STRLIT
EXPLIST     → EXP (',' EXP)*
RELOP       → '<' | '<=' | '=' | '≠' | '>=' | '>' | divides | is not?
SHIFTOP     → '<<' | '>>' | left shifted | right shifted
ADDOP       → '+' | '-' | in
MULOP       → '*' | '/' | modulo
PREFIX      → '-' | not | length of | complement of
SUFFIX      → '(' EXPLIST? ')' | '[' EXP ']' | '.' ID
A script is a sequence of zero or more imports followed by one or more statements. Some statements, called declaration statements, declare entities (types, variables, procedures, and functions).
Here is the simplest Hello World program:
write "Hello, world"
And here is a convoluted Hello World program that screams its greeting:
-- This is a pretty interesting Manatee program.
-- It writes "HELLO, WORLD" to standard output.
-- It is more confusing than it needs to be.

use module Text

my place is "world"
do greet place

to greet string s:
    write uppercase("hello, " + s)
end
Finally, a very slow prime number printing example:
to get the truth value prime of whole number n:
    return no if n < 2
    for each d in 2 to n - 1:
        return no if d divides n
    end
    return yes
end

for each k in 1 to 100:
    write k if prime(k)
end
An import is replaced with the contents of the named module at compile time.
In other words, the import acts like the #include directive in C — it brings in source code, not an intermediate compiled form.
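Textual inclusion of this kind can be pictured as a simple source-to-source pass. The Python sketch below is only an illustration under stated assumptions: the function expand_imports and the modules dictionary (module name to source text) are hypothetical, since this document does not say how module sources are located, and nested imports inside modules are not handled.

```python
import re

# Matches a whole line of the form: use module NAME, with an optional comment.
# \w is only an approximation of the Manatee identifier rule.
USE_LINE = re.compile(r"[ \t]*use[ \t]+module[ \t]+(\w+)[ \t]*(?:--.*)?")


def expand_imports(source: str, modules: dict) -> str:
    """Replace each import line with the text of the named module."""
    expanded = []
    for line in source.splitlines():
        match = USE_LINE.fullmatch(line)
        if match:
            # A KeyError here means the module is unknown.
            expanded.append(modules[match.group(1)])
        else:
            expanded.append(line)
    return "\n".join(expanded)
```

The result is ordinary source text, which is then tokenized and parsed as if the module had been typed in place.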
A declaration binds an identifier to an entity. There are seven kinds of declarations:
Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any holes in the scope created by declarations of identifiers with the same name within (inner) scopes nested within.
This allows types and functions to be mutually recursive.
All identifiers declared in variable, procedure, function, type, and iterator declarations of the same statement sequence must be mutually distinct. Here the phrase "of the same statement sequence" is meant to exclude any statements that are part of nested sequences introduced by compound statements and procedure and function declarations.
my x is a number              -- ok, variable declaration
my x is a number              -- error, x already declared
my x is a string              -- also error (type does NOT matter)
my x is always 5              -- also error (neither does readonly status)
loop:
    my x is a number          -- ok, inside an inner statement sequence
end
while not 1 = 2:
    my x is a string          -- ok, inside a different loop
end
for each x in [1, 10, 100]:   -- error, x already declared
    write x
end
to echo string x:             -- ok, parameter declaration (new scope)
    write x
    write x
end
to get number x of string y:  -- error, x already declared
    do nothing
end
to x string y:                -- error, x already declared
    do nothing
end
an x has:                     -- error, x already declared
    a string name
end
a y has:
    a number x                -- ok, property declarations not considered
end
Declarations within inner statement sequences hide declarations with the same name in enclosing outer ones.
The identifiers declared as parameters of a procedure or function must be unique among themselves and the identifiers declared at the top level of the procedure's or function's body.
to split number list x:
    my z is a number    -- ok
    my x is a string    -- error, x is already a parameter
end
truth value: the type consisting entirely of the values yes and no.
whole number: the type of 32-bit two's-complement integers.
character: the type of characters from the Universal Character Set.
number: the type of IEEE-754 double precision values.
string: the type of character strings.
t list: the type whose values are zero-based integer-indexed sequences of values of type t.
nothing: the sole value of the null type.
The types whole number and number are arithmetic types. The arithmetic types together with the types truth value and character comprise the primitive types. Types that are not primitive types (namely the array types, the object types, and the null type) are called reference types.
Object types must be declared. For example:
a point has:        -- declares type point
    a number x      -- declares property x of type point
    a number y      -- declares property y of type point
end

a dress has:
    a string brand
    a whole number size
    a string list colors
    a truth value used
end
The names of the properties of a type must be mutually distinct, though they can of course overlap with all other identifiers in a script.
A variable is something that stores a value. Manatee is a statically-typed language so all variables also have a permanent type. Variables are either writable or not writable.
Variables are declared (1) in a variable declaration statement, (2) as a parameter in a procedure or function declaration, or (3) as an iterator in a for-statement. A variable declaration gives the variable name and either a type or an initializing expression.
my height is 193.0            -- ok, new variable with type number
my weight is a number         -- ok, weight initially undefined
my limit is always 140        -- ok, limit has type whole number
my p is point {x: 3, y: 9}    -- ok, p has type point
my v is [3,1.0,8,2]           -- ok, v has type number list
my x is x                     -- error, cannot infer the type!
Parameter declarations must always have a type so there is no need to worry about inference in those cases.
When declared in a for-statement the type of the variable is inferred.
In addition, variables are writable or non-writable. Variables declared in a variable declaration are writable unless marked always in their declarations. Parameters are always writable. Iterators are always non-writable.
my pet is "dog"            -- declares a writable variable
my limit is always 1024    -- declares a read-only variable
write pet + "house"        -- writes "doghouse"
write limit / 2            -- writes 512
set pet to "rat"           -- ok
set limit to 2048          -- error (variable given fixed value in declaration)
for each i in 1 to 10:
    increment i by 2       -- error (i is an iterator)
end
A procedure has a name, a parameter list, and a body. A function has a name, a parameter list, a body and a return type. Procedure and function declarations also declare their parameters. All parameters must be declared with an explicit type; parameter types are never inferred. The return type of a function must also be explicit.
to get number average of number x and number y:
    return (x + y) / 2
end

to get number max of number x, number y, and number z:
    my t is x
    set t to y if y > t
    set t to z if z > t
    return t
end

use module Text
to scream string message:
    write uppercase(message)
end
do scream "wowie, wow, wow!"

to expand circle c and number factor:
    set c.radius to c.radius * factor
end
The parameter signature of a procedure or function is [t_{1}, ..., t_{n}] where t_{1} through t_{n} are, in order, the types of the parameters.
An expression list (e_{1}, ..., e_{n}) is said to match a parameter signature [t_{1}, ..., t_{k}] whenever it holds that
Note that the return type has no effect on the definition of matching.
Each expression has a type and a value. The value of an expression with a reference type is either nothing or a reference to an array or object. Arrays and objects are therefore never manipulated directly, but only through references.
An expression e is type-compatible with a type t if and only if
whole number, and t is number, or
nothing and t is a reference type.
list, and all values in e are type-compatible with t'.
An expression of type whole number can appear anywhere an expression of type number is expected; in this case the integer value is implicitly converted to one of type number. The conversion must maintain the expression's value; this is always possible since the type number has 53 bits of precision, while the type whole number has only 32.
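The claim that the conversion never loses information can be checked directly, since Python floats are IEEE-754 doubles. The sketch below uses a hypothetical helper, to_number, and simply round-trips the extreme 32-bit values.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_number(whole: int) -> float:
    """Model the implicit whole number to number conversion."""
    if not INT32_MIN <= whole <= INT32_MAX:
        raise OverflowError("not a 32-bit whole number")
    # Every 32-bit integer fits in the 53-bit significand of a double,
    # so this conversion is exact.
    return float(whole)
```

Converting back with int() recovers the original value for every whole number, including the two extremes.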
A literal is an expression that directly denotes a value. The Manatee expressions include the following literals:
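Integer literals (INTLIT), of type whole number.
Number literals (NUMLIT), of type number.
Character literals (CHARLIT), of type character.
String literals (STRLIT), of type string.
yes, the literal of type truth value denoting truth.
no, the literal of type truth value denoting falsity.
nothing, a literal representing a reference to no object, and whose value is the sole value of the null type.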
Certain Manatee expressions are called variable expressions, or L-Values. Unless marked non-writable, these expressions can be assigned to, incremented, decremented, or read into. These expressions are as follows:
id is a simple identifier with the same name as an identifier declared in a visible variable declaration, parameter declaration, or iterator declaration. (Notice that types, procedures, functions, and properties on the other hand are not variables!) The type of this variable expression is the type given the identifier in the innermost visible declaration.
whole number. The type of the entire expression is v's base type if v is an array, or character if a string. The expression produces the array component or character at (zero-based) index e, and is writable unless v is a string. (Strings are immutable.) If during execution v is nothing, a fail event is generated with the string "nonexistent array". If e evaluates to a value less than zero or greater than or equal to the length of v, a fail event is generated with the string "out of bounds".
my scores is [3, 6.2, 9]
write scores[0]         -- prints 3
write scores[1]         -- prints 6.2
write scores[2]         -- prints 9
write "Bangarang"[7]    -- prints 'n'
write scores[-3]        -- fails with "out of bounds"
my x is a number list
write x[4]              -- fails with "nonexistent array"
nothing, a fail event is generated with the string "nonexistent object".
a point has:
    a number x
    a number y
end
my p is point {x: 2, y: 4}
write p.x    -- prints 2
write p.y    -- prints 4
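The run-time checks for indexing described above can be modeled in a few lines. In this Python sketch, ManateeFail and index are hypothetical names, None stands in for nothing, and a Python exception stands in for a fail event.

```python
class ManateeFail(Exception):
    """Stands in for a Manatee fail event carrying a string."""


def index(v, e: int):
    """Model v[e] for an array (list) or a string."""
    if v is None:
        raise ManateeFail("nonexistent array")
    if e < 0 or e >= len(v):
        raise ManateeFail("out of bounds")
    return v[e]
```

Unlike Python's own indexing, a negative index is rejected rather than counted from the end, which matches the scores[-3] example.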
nothing, or (3) be a mix of number and whole number expressions. The type of the entire expression is that of the most general type over all of the e_{i}'s.
"missing_properties" is generated.
Extending the language to allow properties to take on defaults so as to be omitted from constructor expressions can be part of a homework assignment or exam.
my message is "howareyoutoday?"
write slice(message,3,6)            -- prints "areyou"
write slice(message,3,8)[2]         -- prints 'e'
set slice(message,3,8)[2] to 'f'    -- ok
set slice(message,3,8) to 'x'       -- error: function calls are not variables
truth value. Evaluates e, then produces yes if e evaluated to no, and produces no if e evaluated to yes.
my x is length of "dog"       -- x is now 3
my primes is [2,3,5,7,11]
set x to length of primes     -- x is now 5
my s is ""
set x to length of s          -- x is now 0
whole number. Evaluates e, then produces the bitwise complement of e.
string and e_{2} have type whole number. In the former case, the subexpressions are evaluated in any order and their numeric product is produced. The product will be of type whole number only if both subexpressions have type whole number; otherwise the type will be number. In the latter case, the subexpressions are evaluated in any order and the entire expression produces the string which is e_{2} copies of e_{1} concatenated together.
whole number only if both subexpressions have type whole number; otherwise the type will be number.
whole number. Both expressions are evaluated, in any order, and the entire expression produces a whole number which is the modulo of e_{1} and e_{2}.
whole number only if both subexpressions are whole numbers, otherwise the sum has type number.
list and e_{2} is compatible with type t, then the sum is a new list consisting of the elements of e_{1} followed by e_{2}.
list and e_{1} is compatible with type t, then the sum is a new list consisting of e_{1} followed by the elements of e_{2}.
list, then the sum is a new list consisting of the elements of e_{1} followed by the elements of e_{2}.
whole number only if both expressions have type whole number, otherwise the difference shall have type number.
character and e_{2} type string, or e_{2} must have type t list and the type of e_{1} must be compatible with t. Returns whether or not e_{1} is a member of e_{2}.
whole number. Both expressions are evaluated, in any order. Produces the value of e_{1} shifted left e_{2} positions.
whole number. Both expressions are evaluated, in any order. Produces the value of e_{1} arithmetically shifted right e_{2} positions.
4 = 4.0), characters by codepoint, and strings lexicographically.
(not ((e_{1}) = (e_{2}))).
(((e_{1}) < (e_{2})) or ((e_{1}) = (e_{2}))).
(not ((e_{1}) <= (e_{2}))).
(not ((e_{1}) < (e_{2}))).
(not ((e_{1}) is (e_{2}))).
whole number. Both expressions are evaluated in any order, and the entire expression produces the boolean value of whether e_{1} evenly divides e_{2}.
7 divides 14    -- yes
7 divides 12    -- no
7 divides 3     -- no
7 divides 0     -- yes
whole number. Both expressions are evaluated in any order, and the entire expression produces the whole number value of the bitwise AND of e_{1} and e_{2}.
whole number. Both expressions are evaluated in any order, and the entire expression produces the whole number value of the bitwise OR of e_{1} and e_{2}.
whole number. Both expressions are evaluated in any order, and the entire expression produces the whole number value of the bitwise XOR of e_{1} and e_{2}.
truth value. First e_{1} is evaluated. If e_{1} is no, the entire expression immediately produces no (without evaluating e_{2}). Otherwise e_{2} is evaluated and the entire expression produces the value of e_{2}.
truth value. First e_{1} is evaluated. If e_{1} is yes, the entire expression immediately produces yes (without evaluating e_{2}). Otherwise e_{2} is evaluated and the entire expression produces the value of e_{2}.
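The short-circuit behavior of the last two operators above can be modeled by passing the operands as unevaluated functions. This Python sketch is only an illustration; manatee_and and manatee_or are hypothetical names, and True and False stand in for yes and no.

```python
from typing import Callable


def manatee_and(e1: Callable[[], bool], e2: Callable[[], bool]) -> bool:
    """Evaluate e1; only if it is yes, evaluate e2 and produce its value."""
    if not e1():
        return False
    return e2()


def manatee_or(e1: Callable[[], bool], e2: Callable[[], bool]) -> bool:
    """Evaluate e1; only if it is no, evaluate e2 and produce its value."""
    if e1():
        return True
    return e2()
```

Because the second operand is a function, it is never called when the first operand already decides the result, so any side effect it carries does not happen.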
A statement is code that is executed solely for its side effect; it produces no value.
"nothing_to_read". If v has type string, then reads as much of standard input as it can up to and including the next line ending character (or characters). If v has type character then reads the next character from standard input. If v has a numeric type then reads as much as possible that parses as a value of the actual type; if however no input can be parsed as a number, fails with the string "cannot_read_number". If v has type truth value then reads the next character and produces yes if the character matches the regex [1WwTtYy] or no if the character matches [0FfNn]; otherwise fails with the string "cannot_read_truth_value".
whole number. Increments v by e, or by 1 if no e is present.
whole number. Decrements v by e, or by 1 if no e is present.
string. Evaluates e, then generates a fail event with it. If e is not present it is as if it were "unspecified_error".
after-clause is present, then e must have type number. The call will be scheduled to run e seconds in the future. Negative values of e are treated as if they were 0.
truth value. Evaluates e, then, only if the value produced is yes, executes s.
truth value. Evaluates e, then, only if the value produced is no, executes s.
truth value. Evaluates e, then, only if the value produced is yes, executes s and then the entire while-statement again.
truth value. Executes s, then evaluates e. If the value produced is yes, the entire statement completes. Otherwise the entire until-statement is executed again.
truth value. Evaluates the e_{i}'s in order until one of them produces yes. If one does, the corresponding body is executed, thus completing the execution of the entire statement. If no expression produces yes, then b_{n} is executed (if it is present).
The loop can only terminate upon execution of a return or exit statement, or upon a failure event that is not handled internally.
whole number. Evaluates n, then executes body n times (unless terminated early via return, exit, or failure event).
truth value. Evaluates e, then, only if the value produced is yes, executes body and then the entire while-statement again.
truth value. Executes body, then evaluates e. If the value produced is yes, the entire statement completes. Otherwise the entire until-statement is executed again.
character if c is a string, or the base type of c if c is an array. Executes body for each element or character of c assigned to x.
whole number or both have type character. Evaluates x and y, then declares a new iterator i with the type of x. Executes body for each value in [x, x+1, ..., y] assigned to i.
whole number or both have type character. Evaluates x and y, then declares a new iterator i with the type of x. Executes body for each value in [x, x-1, ..., y] assigned to i.
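The two ranged for-each forms above visit an inclusive sequence of values. The Python sketch below lists the values the iterator would take; manatee_range is a hypothetical name, and the step parameter models the optional by clause from the grammar under the assumption that it must be a positive whole number, which this section does not state.

```python
def manatee_range(x, y, step=1, down=False):
    """List the iterator values for 'x to y' or 'x down to y' (both inclusive).

    x and y are either both ints or both single-character strings.
    """
    if step <= 0:
        raise ValueError("step is assumed to be positive")
    is_char = isinstance(x, str)
    first, last = (ord(x), ord(y)) if is_char else (x, y)
    if down:
        values = range(first, last - 1, -step)
    else:
        values = range(first, last + 1, step)
    return [chr(v) if is_char else v for v in values]
```

An ascending range whose start exceeds its end produces no values, so the body would not run at all under this reading.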
The following modules are officially a part of the Manatee language, and must therefore be included in all implementations.
This section is under construction.
The Time library contains various operations for working with datetimes. The datetime type will represent days within 100 million days of the epoch, 1970-01-01T00:00Z. Days contain 86,400,000 milliseconds, and leap seconds are not taken into account.
a datetime has:
    a whole number year
    a whole number month
    a whole number day
    a whole number hour
    a whole number minute
    a whole number second
    a whole number millisecond
    a number time_zone_offset    -- in hours
end

whole number now
    -- The number of milliseconds since the epoch.

whole number epoch_time of datetime d
    -- Returns number of milliseconds since the epoch for the given datetime

string datestring of whole number epochtime and whole number tzoffset
    -- Returns a (proleptic) ISO 8601 date string for the timestamp
    -- epochtime using the time zone offset tzoffset
The Math library contains various mathematical constants and functions.
-- This is the standard Manatee Math library.

number sqrt of number x
    -- Returns the square root of x.

pi is always acos(-1.0)
    -- Convenient constant for π.

number sin of number x
    -- Returns the sine of x.

number cos of number x
    -- Returns the cosine of x.

number acos of number x
    -- Returns the arc cosine of x (might be NaN).

number atan of number x and number y
    -- Returns the arctangent of the angle between the positive x-axis
    -- and the line from the origin to (x,y)

number ln of number x
    -- Returns the natural log of x.

whole number random from number x
    -- Returns a random int between 0 (inclusive) and x (exclusive).
use module Math
write sqrt(400)         -- 20
write pi                -- 3.141592653589793
write sin(-0.3)         -- -0.29552020666133955
write cos(2)            -- -0.4161468365471424
write acos(0.5)         -- 1.0471975511965976
write atan(5,-12)       -- 0.3947911196997615
write atan(4,0)         -- 1.5707963267948966
write ln(142341394)     -- 18.77373891323974
write random(6)         -- could be 0, 1, 2, 3, 4, or 5
The Text library contains functions that operate on text strings.
string lowercase of string s
    -- Returns the lower case equivalent of s (or s itself if no equivalent exists).

string uppercase of string s
    -- Returns the upper case equivalent of s (or s itself if no equivalent exists).

whole number position of string s and character c
    -- Returns the leftmost position of c in s, or -1 if c does not appear in s.

string slice of string s, whole number start and whole number length
    -- Returns the slice (substring) of s from position start that
    -- contains (at most) length characters.
use module Text
write lowercase("CheezBurger")      -- "cheezburger"
write uppercase("cheezBuRGER")      -- "CHEEZBURGER"
write position("kthxbye", 'x')      -- 3
write position("random", 'w')       -- -1
write slice("ROTFLMAO", 2, 5)       -- "TFLMA"
write slice("ROTFLMAO", 4, 20)      -- "LMAO"
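For readers who want to check the position and slice examples, here is a small Python model of the behavior described in the comments above. The names position and slice_text are stand-ins chosen for this sketch, and it assumes start and length are non-negative, since the text does not say what happens otherwise.

```python
def position(s: str, c: str) -> int:
    """Leftmost zero-based index of character c in s, or -1 if absent."""
    return s.find(c)


def slice_text(s: str, start: int, length: int) -> str:
    """Substring of s beginning at start with at most length characters."""
    return s[start:start + length]
```

The "at most" wording is what makes the last example work: asking for 20 characters from position 4 of an 8-character string simply returns the remaining 4.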