Carlos is a small statically-typed, block-structured programming language with arrays, simple structs, optionals, and first-class functions. It has no explicit pointers; instead, several types have reference semantics. However, there is no billion-dollar mistake, since the null value is not a member of all reference types, but only of optional types. It assumes the existence of a garbage collector.
const languageName = "Carlos";
function greeting(): string {
return random(["Welcome", "こんにちは", "Bienvenido"]);
}
print("👋👋👋");
repeat 5 {
print(greeting() + " " + languageName);
}
This document defines the language Carlos.
A programis a sequence of one or more statements.
// This program prints "hello, world" to standard output.
let greeting = "hello"; // variable declaration
function place(): string { // function declaration
return "world";
}
print(greeting + ", " + place()); // function call statement
Apologies for the semicolons, but they do make the language somewhat easier to parse.
All values in Carlos are instances of a type. The language has the following types:
boolean
consisting entirely of the values true
and false
.
int
of (unbounded) integers.
float
of IEEE-754 binary64 values.
string
of character strings.
void
which has no instances at all.
any
which is the union of all types.[T]
, whose instances are zero-based integer-indexed sequences of values of type T
.
T?
, representing “boxes” that either contain a value of type T or don’t contain anything.
(T1,T2,...,Tn)->T0
, consisting of all functions mapping (ordered) parameters of types T1
through Tn
to a value of type T0
.
Booleans, integers, floats, and strings have literal forms:
true // has type boolean
false // has type boolean
2 // has type int
89123131257125891258707192 // has type int
2.0 // has type float
55.9 // has type float
819.999E-15 // has type float
"¡hola!" // has type string
"π™\t\n \" \' \u{1f4a9}" // has type string
"I'm 💀" // has type string
The values in an array expression must all have exactly the same type; it is a compile-time error if they do not. Empty arrays must mention the intended type of the elements:
[true, true, false] // has type [boolean]
[3, 0, 5, 99 ** 250] // has type [int]
[[1, 4], [2, 3]] // has type [[int]]
// [3, 0, 2.0] // ERROR Cannot mix ints and floats
// [3, "3"] // ERROR: Cannot mix ints and strings
[string]() // empty array of type [string]
// [] // Syntax error, cannot infer a type
For optionals, use the some
operator to create an optional with a present value, and the no
operator to make an empty optional over the given type:
some 5 // has type int?
some ["red", "green", "blue"] // has type [string]?
no boolean // has type boolean?
There are no literals for functions; they have to be named:
function sameMagnitude(x: int, y: int): boolean {
return x == y || x == -y;
}
function printSecond(messages: [string]) {
print(messages[1]);
}
function twice(f: (float)->float, x: float): float {
return f(f(x));
}
sameMagnitude // has type (int, int)->boolean
printSecond // has type ([string])->void
twice // has type ((float)->float, float)->float
There are no type expressions for struct types; structure types have to be declared and referred to only by name. Instances of struct types are created using the struct name as a constructor:
struct Vector {i: float j: float} // Declares new type Vector
let v = Vector(2.0, -5.0 * π); // Structs created by initializing with type name
// Only way to refer to this type is via its name
function magnitude(v: Vector): float {
return hypot(v.i, v.y);
} // type of magnitude is (Vector)->float
In Carlos, types are not first-class values. They can be named by identifiers, and you will see both type expressions and type names appear in the source code, but only in parameter declarations, the return type portion of a function declaration, in empty array expressions, with the no
operator, and in casts. But you cannot assign them to variables or pass them to functions or return them from functions.
// let x = int // ERROR, int is not a legal expression value
// print(string) // ERROR, string is not a legal expression value
let supervisor = no string // OK, empty optional of type string?
let scores = [float]() // OK, just no scores yet
A declaration binds an identifier to an entity. There are six kinds of declarations:
Here is a short example showing all six kinds:
let friends = ["EY", "MA", "AA"]; // variable decl: friends
struct Vector { // type decl: Vector
i: float // field decl: i
j: float
}
for friend in friends { // iterator decl: friend
print(friend);
}
function triple(x: int): int { // function decl: triple, parameter decl: x
return x * 3;
}
Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. In Carlos, shadowing is never permitted, so the visible region of a declaration coincides exactly with the declaration’s scope. (This is different from most other static languages in which the visible region is the scope minus any "inner" scopes of declarations of identifiers with the same name and may therefore be discontinuous).
Example. In the following script:
function g(a: [int]) {} // line 1
let c = g([1, 2, 5]); // line 2
let y = c + 2; // line 3
print(c); // line 4
function f(string s) { // line 5
// Note: let y = 0 would be illegal here // line 6
struct Point { x: int y: int } // line 7
function q(): Point? { return no Point } // line 8
c = 50; // line 9
} // line 10
f("Can't see point or s here") // line 11
we have the following scopes:
Identifier | declared on line | has scope |
---|---|---|
g | 1 | 1-11 |
a | 1 | body of g |
c | 2 | 2-11 |
y | 3 | 3-11 |
f | 5 | 5-11 |
s | 5 | 5-10 |
Point | 7 | 7-10 |
q | 8 | 8-10 |
All declarations in a scope must declare distinct identifiers. Also notice that no shadowing rule only applies to a single path of nested scopes from the top-level scope. Identifiers in “sibling scopes” do not shadow each other. In addition, struct fields are not part of any path, so there are no restrictions on their names:
function test(x: int, y: boolean) {
const z = 0;
// let x = "hello"; // ERROR: x is already a parameter
if z > 1.0 {
// let x = 2; // ERROR: shadowing not allowed
const a = 5; // OK: new var, local to the if-block
}
struct Point {
x: float // OK: fields not part of scope chain
y: float // ...so they don't actually "shadow"
}
for i in 0 ..< 10 {
// let i = 3; // ERROR: in same scope as iterator i
while false {
// let i = 1; // ERROR: no shadowing
let a = false; // OK: does not conflict with the other a
}
} // No var a is in scope out here though
}
Each variable, parameter, or field is constrained to hold values from a certain type. The type constraint for a variable is given in its initializing expression:
let gameOver = false; // gameOver now constrained to booleans
gameOver = true; // OK: true is boolean
// gameOver = 5.5; // ERROR: gameOver can only take booleans.
Parameters and return types do not have initializers, so Carlos requires their types must be specified explicitly (while in principle, some complex type inference could be done, Carlos wants to be a smaller language that is easier to implement and restricts its type inference to very local situations):
function repeated(s: string; n: int): string {
let result = "";
repeat n {
result = result + s;
}
return result;
}
As a simple language, Carlos does not have any anonymous functions; all functions must be named. The identifier used in the function declaration is essentially a read-only variable: you can assign it to other variables but you cannot reassign it:
function next(n: int): int {
return n + 1;
}
let successor = next; // OK
print(successor(2)); // prints 3
// let next = successor; // ERROR: next is read only
Functions are first-class objects, too. They can be assigned and passed as parameters and returned from functions:
function triple (x: int): int { return x * 3 }
function twice(f: (int)->int, x: int): int { return f(f(x)) }
print(twice(triple, 8)); // prints "72"
It is possible to write any array, optional, or function type as a type expression from its constituent types; however there is no expression for a structure type. Structure types are defined in a struct
declaration and bound to an identifier. The identifier is the only way to refer the type.
// We can use type expressions for function and array types
function applyToSecondInt(f: (int)->int, a: [int]): int {
return f(a[1]);
}
struct Vector {i: float j: float}
// But for struct types we have to use a type name!
function magnitude(v: Vector): float {
return hypot(v.i, v.y);
}
The fields within a struct type can be any identifier must be mutually distinct:
// struct S {x: int x: int} // ERROR: duplicate field x
A variable is something that stores a value. All variables are type-constrained. Variables introduced with let
are mutable; those introduced with const
are immutable.
let turnsRemaining = 2;
const adjustment = 0.00137;
turnsRemaining = 3; // OK to assign, variable is mutable
turnsRemaining--; // OK, ++ and -- are updates
// adjustment = 2.2E-7; // ERROR: variable is immutable
Variables directly store values of booleans, ints, floats, and strings; however, they can only contain references to arrays, structs, functions, and optionals. This means that aliasing of an object can occur:
let a = [1, 2, 3]
let b = a; // same object pointed to by a and b
a[2] = 1000; // updates the object
print(b); // prints [1, 2, 1000]
Variables also come into existence as iterators in a for
statement or as parameters. Iterators and parameters are always immutable. Note that identifiers can also be bound to types and functions, but types and functions are not variables. The type constraint on any simple variable or iterator is determined by its initializer. The type constraint on a parameter is, per the syntax definition of the language, always given explicitly.
A variable is one kind of assignable expression. Variables are given values (1) in a variable declaration, (2) in an assignment statement, or (3) if a parameter, during a function call where arguments are assigned to parameters. In all cases, the type $S$ of the value being assigned (the source) is checked to see if it is assignable to the type constraint $T$ on the variable (the target). The assignment is valid if and only if $S$ is assignable to $T$, which is the case if and only if:
any
.That is, Carlos covariant in return types, contravariant in parameters, and invariant elsewhere.
A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:
let
$i$ =
$e$ ;
const
$i$ =
$e$ ;
Variable declaration statement.
Declares a new variable with name $i$. This declaration must be unique within its scope. Evaluates $e$ and assigns this value to the new variable. The new variable will be constrained to be assigned only variables of $e$’s type. The variable is mutable iff declared with let
.
struct
$S$ {
$x_1: T_1 \ldots x_n: T_n$ }
Struct type declaration. Defines a new struct type named $S$ with the given fields. The identifier $S$ must be unique within its scope, and the fields $x_1 \ldots x_n$ must be mutually distinct.
function
$f$ (
$x_1: T_1 \ldots x_n: T_n$ )
:
$T_0\;b$
Function declaration. Defines a new function named $f$ with type $(T_1, \ldots, T_n) \rightarrow T_0$. The identifier $f$ must be unique within its scope. The parameters must be unique within their scope.
=
$e$ ;
Assignment statement. $v$ must be an assignable expression the type of $e$ must be assignable to the type of $v$. $v$ is determined and then $e$ is evaluated, then the value of $e$ is copied into $v$.
++;
Increment statement.
$v$ must be an assignable expression and must have type int
. Increments $v$.
--;
Decrement statement.
$v$ must be an assignable expression and must have type int
. Decrements $v$.
(
$e_1, \ldots, e_n$ );
Call statement. $f$ must have type $(T_1,...T_n)\rightarrow \mathtt{void}$ where each $e_i$ is assignable to type $T_i$. The arguments are evaluated in any order and the function $f$ is called with the arguments copied to the parameters.
break;
Break statement.
This statement may only appear within a loop (for
, while
, or repeat
) statement that is properly within the same function as the break statement. The break statement terminates the execution of the innermost enclosing loop.
return;
Short Return statement. Causes an immediate return from the innermost enclosing function, whose type must have a return type of void
.
return
$e$ ;
Return statement. Evaluates $e$ then causes the innermost enclosing function to immediately return the value of $e$. $e$ must be assignable to the function’s return type.
if
$e$ $b$
Short If statement.
$e$ must have type boolean
. If $e$ evaluates to true, $b$ is executed, otherwise nothing else happens.
if
$e$ $b$ else
$s$
If statement.
$e$ must have type boolean
. If $e$ evaluates to true, $b$ is executed, otherwise $s$ is executed.
while
$e$ $b$
While statement.
$e$ must have type boolean
. First $e$ is evaluated. If $e$ produces false the execution of the while statement terminates. If $e$ produces true, b is executed then the entire while-statement is executed again.
repeat
$e$ $b$
Repeat statement.
$e$ is evaluated and must have type int
. Body $b$ is executed the number of times equal to the value of $e$. If this value is negative the loop executes zero times.
for
$i$ in
$r$ $b$
For-each statement. $r$ must have an array type or string type. $r$ is evaluated and then $b$ is executed repeatedly with $i$ bound to successive elements from $r$ if $r$ is an array, and for successive code points from $r$ if $r$ is a string.
for
$i$ in
$e_1$ ...
$e_2$ $b$
Through-range iteration statement.
$e_1$ and $e_2$ must have type int
. They are first evaluated in any order and then $b$ is executed repeatedly with $i$ bound to successive elements in the range, which goes from $e_1$ to $e_2$ inclusive.
for
$i$ in
$e_1$ ..<
$e_2$ $b$
Upto-range iteration statement.
$e_1$ and $e_2$ must have type int
. They are first evaluated in any order and then $b$ is executed repeatedly with $i$ bound to successive elements in the range, which goes from $e_1$ to $e_2$ exclusive.
An expression evaluates to a value or crashes the program. Carlos is a statically-typed language, meaning that the type of every expression can be inferred by looking at the program text; there is no need to run the program to determine an expression’s type.
The Carlos expressions are:
int
.float
.string
.
String literals are made from a sequence of character specifications enclosed in quotation marks (U+0022). A character specification is either a non-control character, an extended grapheme cluster, or an escape sequence. An escape sequence is one of:
Escape | Description |
---|---|
\n | newline (U+000A) |
\t | tab (U+0009) |
\u{ xxxxxx} | where xxxxxx is a one to six character hexadecimal digit sequence standing for a character with a given code point. |
\" | the quotation mark character (U+0022) |
\' | the apostrophe character (U+0027) |
\\ | the backslash character (U+005C) |
true
The literal of type boolean
denoting truth.
false
The literal of type boolean
denoting falsity.
Simple variable. An identifier is allowed to be an expression if and only if it is bound to a visible variable, parameter, or iterator.
.
$f$
Member expression. $s$ must have a struct type. $f$ must be an identifier declared as a field of $s$’s type. The member expression is an assignable expression, refers to the $f$-field of $s$, is mutable iff $s$ is mutable, and its type is the type associated with the field $f$.
[
$e$]
Subscript expression. $a$ must have an array type. $e$ must have type int
. The subscript expression is an assignable expression, refers to the array component at (zero-based) index $e$, is mutable iff $a$ is mutable, and its type is the base type of the array type. If during execution $e$ evaluates to a value less than zero or greater than or equal to the length of $a$, the program dies.
(
$e_1 \ldots e_n$)
Call expression. $f$ must either (1) have type $(T_1,...T_n)\rightarrow T_0$ where each $e_i$ has type $T_i$, or (2) be an identifier bound to a structure type. In the former case, the arguments are evaluated in any order and the function is called with the arguments copied to the parameters, with the result being the return value of the function and its type is the return type of the function type. In the latter case the variable is a newly created struct instance of the given type; the arguments are evaluated and assigned to the fields of the new object. The call expression is not an assignable expression.
?.
$f$
Optional member variable. $s$ must be a variable of a optional type wrapping a struct type. $f$ must be an identifier declared as a field of $s$’s wrapped type. The optional member expression is an assignable expression, refers to the $f$-field of the object referred to by $s$ if something is wrapped or an empty optional otherwise, is mutable iff $s$ is mutable, and its type is the optional type wrapping the type associated with the field $f$.
struct Point { x: float y: float } let p = some Point(3, 5) let q = no Point print(p?.x) // type is int?, value is (some 5) print(q?.x) // type is int?, value is (no int)
[
$e_1, \ldots, e_n$ ]
Array object construction. Each $e_i$ must have exactly the same type $T$ (there is no allowance for subtyping or anything like that). This expression refers to a newly constructed array of $n$ items consisting of the values of each subexpression, respectively, and has type $[T]$.
[
$T$]()
Empty array. A newly constructed empty array of type $[T]$.
some
$e$
An optional that wraps $e$.
no
$T$
The empty optional of type $T?$.
#
$e$
Here $e$ must be an expression of some array type. Produces the length of the array. The type of this expression is int
.
random
$e$
Here $e$ must be an expression of some array type. Produces a random element from the array.
(
$e$ )
Evaluates $e$ and produces its value.
-
$e$
$e$ must have type int
or float
. Evaluates $e$ and produces the negation of $e$.
~
$e$
$e$ must have type int
. Evaluates $e$ and produces the bitwise complement of $e$.
!
$e$
$e$ must have type boolean. If $e$ evaluates to true, the entire expression produces false, otherwise it produces true.
*
$e_2$
Both subexpressions must have type int
or both must have type float
. The subexpressions are evaluated in any order and their product is produced.
/
$e_2$
Both subexpressions must have type int
or both must have type float
. The subexpressions are evaluated in any order and their quotient is produced. The quotient will have the same type as the operands, so division of integers produces the floor division.
%
$e_2$
Each subexpression must have type int
. Both expressions are evaluated, in any order, and the entire expression produces an integer which is the remainder of $e_1$ divided by $e_2$.
+
$e_2$
Both subexpressions must have type int
or both must have type float
or both must have type string
. The subexpressions are evaluated in any order and their sum (for numbers) or concatenation (for strings) is produced.
-
$e_2$
Both subexpressions must have type int
or both must have type float
. The subexpressions are evaluated in any order and their difference is produced.
<<
$e_2$
Each $e_i$ must have type int
. Produces the value of $e_1$ shifted left $e_2$ positions.
>>
$e_2$
Each $e_i$ must have type int
. Produces the value of $e_1$ arithmetically shifted right $e_2$ positions.
<=
$e_2$
Each subexpression must have type int
, or must both have type float
, or must both have type string
. Both expressions are evaluated, in any order, and the entire expression produces whether the value of $e_1$ is less than or equal to the value of $e_2$.
<
$e_2$
Each subexpression must have type int
, or must both have type float
, or must both have type string
. Both expressions are evaluated, in any order, and the entire expression produces whether the value of $e_1$ is less than the value of $e_2$.
==
$e_2$
$e_1$ and $e_2$ must have the same type. The subexpressions are evaluated in any order, and the entire expression produces whether these values are equal, defined as follows. Ints, floats, and booleans are equal if they have the same value. Strings are the same if their byte sequences are equal (they are not normalized prior to comparison. Arrays, structs, and functions are equal iff the two expressions reference the exact same object. Two optionals are equal iff they are both empty optionals or their wrapped values are equal.
!=
$e_2$
Equivalent to !(e1==e2).
>
$e_2$
Each subexpression must have type int
, or must both have type float
, or must both have type string
. Both expressions are evaluated, in any order, and the entire expression produces whether the value of $e_1$ is greater than the value of $e_2$.
>=
$e_2$
Each subexpression must have type int
, or must both have type float
, or must both have type string
. Both expressions are evaluated, in any order, and the entire expression produces whether the value of $e_1$ is greater than or equal to the value of $e_2$.
&
$e_2$
Each subexpression must have type int
. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise AND of $e_1$ and $e_2$.
^
$e_2$
Each subexpression must have type int
. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise XOR of $e_1$ and $e_2$.
|
$e_2$
Each subexpression must have type int
. Both expressions are evaluated, in any order, and the entire expression produces an int which is the bitwise (inclusive) OR of $e_1$ and $e_2$.
&&
$e_2$
Each subexpression must have type boolean
. First $e_1$ is evaluated. If it evaluates to false, the entire expression immediately produces false (without evaluating $e_2$). Otherwise $e_2$ is evaluated and the entire expression produces the value of $e_2$.
||
$e_2$
Each subexpression must have type boolean
. First $e_1$ is evaluated. If it evaluates to true, the entire expression immediately produces true (without evaluating $e_2$). Otherwise $e_2$ is evaluated and the entire expression produces the value of $e_2$.
??
$e_2$
$e_1$ must have type $T?$ for some type $T$ and $e_2$ must have type $T$. First $e_1$ is evaluated. If it wraps a value, the entire expression immediately produces the wrapped value (without evaluating $e_2$). Otherwise $e_2$ is evaluated and the entire expression produces the value of $e_2$.
The source of a Carlos program is a Unicode string. Here is the grammar in Ohm notation:
Carlos {
Program = Statement+
Statement = VarDecl
| TypeDecl
| FunDecl
| Exp9 ("++" | "--") ";" --bump
| Exp9 "=" Exp ";" --assign
| Exp9_call ";" --call
| break ";" --break
| return Exp ";" --return
| return ";" --shortreturn
| IfStmt
| LoopStmt
VarDecl = (let | const) id "=" Exp ";"
TypeDecl = struct id "{" Field* "}"
Field = id ":" Type
FunDecl = function id Params (":" Type)? Block
Params = "(" ListOf<Param, ","> ")"
Param = id ":" Type
Type = Type "?" --optional
| "[" Type "]" --array
| "(" ListOf<Type, ","> ")" "->" Type --function
| id --id
IfStmt = if Exp Block else Block --long
| if Exp Block else IfStmt --elsif
| if Exp Block --short
LoopStmt = while Exp Block --while
| repeat Exp Block --repeat
| for id in Exp ("..." | "..<") Exp Block --range
| for id in Exp Block --collection
Block = "{" Statement* "}"
Exp = Exp1 "?" Exp1 ":" Exp --conditional
| Exp1
Exp1 = Exp1 "??" Exp2 --unwrapelse
| Exp2
Exp2 = Exp3 ("||" Exp3)+ --or
| Exp3 ("&&" Exp3)+ --and
| Exp3
Exp3 = Exp4 ("|" Exp4)+ --bitor
| Exp4 ("^" Exp4)+ --bitxor
| Exp4 ("&" Exp4)+ --bitand
| Exp4
Exp4 = Exp5 ("<="|"<"|"=="|"!="|">="|">") Exp5 --compare
| Exp5
Exp5 = Exp5 ("<<" | ">>") Exp6 --shift
| Exp6
Exp6 = Exp6 ("+" | "-") Exp7 --add
| Exp7
Exp7 = Exp7 ("*"| "/" | "%") Exp8 --multiply
| Exp8
Exp8 = Exp9 "**" Exp8 --power
| Exp9
| ("#" | "-" | "!" | some | random) Exp9 --unary
Exp9 = true ~mut
| false ~mut
| floatlit ~mut
| intlit ~mut
| no Type ~mut --emptyopt
| Exp9 "(" ListOf<Exp, ","> ")" ~mut --call
| Exp9 "[" Exp "]" --subscript
| Exp9 ("." | "?.") id --member
| stringlit ~mut
| id --id
| Type_array "(" ")" ~mut --emptyarray
| "[" NonemptyListOf<Exp, ","> "]" ~mut --arrayexp
| "(" Exp ")" ~mut --parens
intlit = digit+
floatlit = digit+ "." digit+ (("E" | "e") ("+" | "-")? digit+)?
stringlit = "\"" char* "\""
char = ~control ~"\\" ~"\"" any
| "\\" ("n" | "t" | "\"" | "\\") --escape
| "\\u{" hex hex? hex? hex? hex? hex? "}" --codepoint
control = "\x00".."\x1f" | "\x80".."\x9f"
hex = hexDigit
mut = ~"==" "=" | "++" | "--"
let = "let" ~alnum
const = "const" ~alnum
struct = "struct" ~alnum
function = "function" ~alnum
if = "if" ~alnum
else = "else" ~alnum
while = "while" ~alnum
repeat = "repeat" ~alnum
for = "for" ~alnum
in = "in" ~alnum
random = "random" ~alnum
break = "break" ~alnum
return = "return" ~alnum
some = "some" ~alnum
no = "no" ~alnum
true = "true" ~alnum
false = "false" ~alnum
keyword = let | const | struct | function | if | else | while | repeat
| for | in | break | return | some | no | random | true | false
id = ~keyword letter alnum*
space += "//" (~"\n" any)* --comment
}
The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program (but they may be used as fields, though doing so is not recommended).
function print(s: any): void
Writes a representation of $s$ to standard output.
function codepoints(s: string): [int]
Returns the code points of the $s$.
function bytes(s: string): [int]
Returns the bytes of the UTF-8 encoding of $s$.
const π
Returns the best approximate value of $\pi$ in the type float
.
function sqrt(x: float): float
Returns the square root of $x$.
function sin(x: float): float
Returns the sine of $x$ radians.
function cos(x: float): float
Returns the cosine of $x$ radians.
function exp(x: float): float
Returns $e^x$.
function ln(x: float): float
Returns the natural log of $x$.
function hypot(x: float, y: float): float
Returns the hypotenuse of a right triangle with sides $|x|$ and $|y|$.