The Language Carlos

The Carlos language was created sometime in the 1990s for a compiler construction class and has evolved a bit since. It is meant to be small, but not trivial, with features interesting enough to make writing a compiler an educational experience, but extremely clean so writing a compiler does not require handling difficult-to-implement tricky cases.

1  Introduction

Carlos is a statically-typed, block-structured programming language with first-class functions. It has no explicit pointers, so assumes the existence of a garbage collector. While several types do have reference semantics, there is no billion-dollar mistake, since the null value is not a member of all reference types, but only of optional types. The language is rather small, with arrays, simple structs, and functions.

const languageName = "Carlos";

function greeting(): string {
  return random(["Welcome", "こんにちは", "Bienvenido"]);
}

print("👋👋👋");
repeat 5 {
  print(greeting() + " " + languageName);
}

This document defines the language Carlos.

2  Semantics

2.1  Programs

A programis a sequence of one or more statements.

// This program prints "hello, world" to standard output.

let greeting = "hello";             // variable declaration
function place(): string {          // function declaration
  return "world";
}
print(greeting + ", " + place());   // function call statement

Apologies for the semicolons, but they do make the language somewhat easier to parse.

2.2  Values and Types

All values in Carlos are instances of a type. The language has the following types:

Booleans, integers, floats, and strings have literal forms:

true                                // has type boolean 
false                               // has type boolean 
2                                   // has type int
89123131257125891258707192          // has type int
2.0                                 // has type float
55.9                                // has type float
819.999E-15                         // has type float
"¡hola!"                            // has type string
"π™\t\n \" \' \u{1f4a9}"            // has type string
"I'm 💀"                            // has type string

The values in an array expression must all have exactly the same type; it is a compile-time error if they do not. Empty arrays must mention the intended type of the elements:

[true, true, false]                 // has type [boolean]
[3, 0, 5, 99 ** 250]                // has type [int]
[[1, 4], [2, 3]]                    // has type [[int]]
// [3, 0, 2.0]                      // ERROR Cannot mix ints and floats
// [3, "3"]                         // ERROR: Cannot mix ints and strings
[string]()                          // empty array of type [string]
// []                               // Syntax error, cannot infer a type

For optionals, use the some operator to create an optional with a present value. The operator no makes an empty optional on the given type.

some 5                              // has type int?
some ["red", "green", "blue"]       // has type [string]?
no boolean                          // has type boolean?

There are no literals for functions; they have to be named:

function sameMagnitude(x: int, y: int): boolean {
  return x == y || x == -y;
}

function printSecond(messages: [string]) {
  print(messages[1]);
}

function twice(f: (float)->float, x: float): float {
  return f(f(x));
}

sameMagnitude                       // has type (int, int)->boolean
printSecond                         // has type ([string])->void
twice                               // has type ((float)->float, float)->float

There are no type expressions for struct types; structure types have to be declared and referred to only by name. Instances of struct types are created using the struct name as a constructor:

struct Vector {i: float j: float}   // Declares new type Vector

let v = Vector(2.0, -5.0);          // Structs created by initializing with type name

// Only way to refer to this type is via its name
function magnitude(v: Vector): float {
  return hypot(v.i, v.y);
}                                   // type of magnitude is (Vector)->float

In Carlos, types are not first-class values. They can be named by identifiers, and you will see both type expressions and type names appear in the source code, but only in parameter declarations, the return type portion of a function declaration, in empty array expressions, with the no operator, and in casts. But you cannot assign them to variables or pass them to functions or return them from functions.

// let x = int                      // ERROR, int is not a legal expression value
// print(string)                    // ERROR, string is not a legal expression value
let supervisor = no string          // OK, empty optional of type string?
let scores = [float]()              // OK, just no scores yet

2.3  Declarations

A declaration binds an identifier to an entity. There are six kinds of declarations:

Here is a short example showing all six kinds:

let friends = ["EY", "MA", "AA"];   // variable decl of friends
struct Vector {                     // type decl of Vector
  i: float                          // field decl of i
  j: float
}
for friend in friends {             // iterator decl of friend
  print(friend);
}
function triple(x: int): int {      // function decl of triple, parameter decl of x
  return x * 3;
}

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. In Carlos, shadowing is never permitted, so the visible region of a declaration coincides exactly with the declaration's scope. (This is different from most other static languages in which the visible region is the scope minus any "inner" scopes of declarations of identifiers with the same name and may therefore be discontinuous).

Example. In the following script:

function g(a: [int]) {}                       // line 1
let c = g([1, 2, 5]);                         // line 2
let y = c + 2;                                // line 3
print(c);                                     // line 4
function f(string s) {                        // line 5
  // Note: let y = 0 would be illegal here    // line 6
  struct Point { x: int y: int }              // line 7
  function q(): Point? { return no Point }    // line 8
  c = 50;                                     // line 9
}                                             // line 10
f("Can't see point or s here")                // line 11

we have the following scopes:

Identifierdeclared on linehas scope
g11-11
a1body of g
c22-11
y33-11
f55-11
s55-10
Point77-10
q88-10

All declarations in a scope must declare distinct identifiers. Also notice that no shadowing rule only applies to a single path of nested scopes from the top-level scope. Identifiers in “sibling scopes” do not shadow each other. In addition, struct fields are not part of any path, so there are no restrictions on their names:

function test(x: int, y: boolean) {
  const z = 0;
  // let x = "hello";         // ERROR: x is already a parameter
  if z > 1.0 {
    // let x = 2;             // ERROR: shadowing not allowed
    const a = 5;              // OK: new var, local to the if-block
  }
  struct Point {
    x: float                  // OK: fields not part of scope chain
    y: float                  // ...so they don't actually "shadow"
  }
  for i in 0 ..< 10 {
    // let i = 3;             // ERROR: in same scope as iterator i
    while false {
      // let i = 1;           // ERROR: no shadowing
      let a = false;          // OK: does not conflict with the other a
    }
  }                           // No var a is in scope out here though
}

Each variable, parameter, or field is constrained to hold values from a certain type. The type constraint for a variable is given in its initializing expression:

let gameOver = false;         // gameOver now constrained to booleans
gameOver = true;              // OK: true is boolean
// gameOver = 5.5;            // ERROR: gameOver can only take booleans.

Parameters and return types do not have initializers, so Carlos requires their types must be specified explicitly (while in principle, some complex type inference could be done, Carlos wants to be a smaller language that is easier to implement and restricts its type inference to very local situations):

function repeated(s: string; n: int): string {
  let result = "";
  repeat n {
    result = result + s;
  }
  return result;
}

2.4  Functions

As a simple language, Carlos does not have any anonymous functions; all functions must be named. The identifier used in the function declaration is essentially a read-only variable: you can assign it to other variables but you cannot reassign it:

function next(n: int): int {
  return n + 1;
}

let successor = next;               // OK
print(successor(2));                // prints 3

// let next = successor;            // ERROR: next is read only

Functions are first-class objects, too. They can be assigned and passed as parameters and returned from functions:

function triple (x: int): int { return x * 3 }

function twice(f: (int)->int, x: int) { return f(f(x)) }

print(twice(triple, 8));            // prints "72"

2.5  Structs

It is possible to write any array, optional, or function type as a type expression from its constituent types; however there is no expression for a structure type. Structure types are defined in a struct declaration and bound to an identifier. The identifier is the only way to refer the type.

// We can use type expressions for function and array types
function applyToSecondInt(f: (int)->int, a: [int]): int {
  return f(a[1]);
}

struct Vector {i: float j: float}

// But for struct types we have to use a type name!
function magnitude(v: Vector): float {
  return hypot(v.i, v.y);
}

The fields within a struct type can be any identifier must be mutually distinct:

// struct S {x: int x: int}         // ERROR: duplicate field x

No field of a struct is allowed to have the type of the struct itself, as that would lead to an infinitely-sized structure. However, a struct may contain optionals, arrays, or functions that use the type itself:

struct S {
  a: S?          // OK: might be empty
  b: [S]         // OK: might be empty
  c: (S) -> S    // OK: function objects are not stored inline
  // d: S        // ERROR: struct would be infinite in size
}

2.7  Variables

A variable is something that stores a value. All variables are type-constrained. Variables are either writable (if declared with let) or not writable (if declared with const).

let turnsRemaining = 2;
const adjustment = 0.00137;

turnsRemaining = 3;                 // OK to assign, variable is writable
turnsRemaining--;                   // OK, ++ and -- are updates
// adjustment = 2.2E-7;             // ERROR: variable is read-only (const)

Variables directly store values of booleans, numbers, chars, and strings; however, they can only contain references to arrays, structs, functions, and optionals. This means that aliasing of an object can occur:

let a = [1, 2, 3]
let b = a;                          // same object pointed to by a and b
a[2] = 1000;                        // updates the object
print(b);                           // prints [1, 2, 1000]

The kinds of variables are:

Variables are given values (1) in a variable declaration, (2) in an assignment statement, or (3) if a parameter, during a function call where arguments are assigned to parameters. In all cases, the type $S$ of the value being assigned (the source) is checked to see if it is assignable to the type constraint $T$ on the variable (the target). The assignment is valid if and only if $S$ is assignable to $T$, which is the case if and only if:

That is, Carlos covariant in return types, contravariant in parameters, and invariant elsewhere.

2.7  Statements

A statement is code that is executed solely for its side effect; it produces no value. The kinds of statements are:

2.8  Expressions

An expression evaluates to a value or crashes the program. Carlos is a statically-typed language, meaning that the type of every expression can be inferred by looking at the program text; there is no need to run the program to determining an expression's type.

The Carlos expressions are:

3  Syntax

The source of a Carlos program is a Unicode string. Here is the grammar in Ohm notation:

carlos.ohm
Carlos {
  Program     = Statement+

  Statement   = VarDecl
              | TypeDecl
              | FunDecl
              | Exp9 ("++" | "--") ";"                        --bump
              | Exp9 "=" Exp ";"                              --assign
              | Exp9_call ";"                                 --call
              | break ";"                                     --break
              | return Exp ";"                                --return
              | return ";"                                    --shortreturn
              | IfStmt
              | LoopStmt

  VarDecl     = (let | const) id "=" Exp ";"
  TypeDecl    = struct id "{" Field* "}"
  Field       = id ":" Type
  FunDecl     = function id Params (":" Type)? Block
  Params      = "(" ListOf<Param, ","> ")"
  Param       = id ":" Type

  Type        = Type "?"                                      --optional
              | "[" Type "]"                                  --array
              | "(" ListOf<Type, ","> ")" "->" Type           --function
              | id                                            --id

  IfStmt      = if Exp Block else Block                       --long
              | if Exp Block else IfStmt                      --elsif
              | if Exp Block                                  --short
  LoopStmt    = while Exp Block                               --while
              | repeat Exp Block                              --repeat
              | for id in Exp ("..." | "..<") Exp Block       --range
              | for id in Exp Block                           --collection
  Block       = "{" Statement* "}"

  Exp         = Exp1 "?" Exp1 ":" Exp                         --conditional
              | Exp1
  Exp1        = Exp1 "??" Exp2                                --unwrapelse
              | Exp2
  Exp2        = Exp3 ("||" Exp3)+                             --or
              | Exp3 ("&&" Exp3)+                             --and
              | Exp3
  Exp3        = Exp4 ("|" Exp4)+                              --bitor
              | Exp4 ("^" Exp4)+                              --bitxor
              | Exp4 ("&" Exp4)+                              --bitand
              | Exp4
  Exp4        = Exp5 ("<="|"<"|"=="|"!="|">="|">") Exp5       --compare
              | Exp5
  Exp5        = Exp5 ("<<" | ">>") Exp6                       --shift
              | Exp6
  Exp6        = Exp6 ("+" | "-") Exp7                         --add
              | Exp7
  Exp7        = Exp7 ("*"| "/" | "%") Exp8                    --multiply
              | Exp8
  Exp8        = Exp9 "**" Exp8                                --power
              | Exp9
              | ("#" | "-" | "!" | some | random) Exp9        --unary
  Exp9        = true ~mut
              | false ~mut
              | floatlit ~mut
              | intlit ~mut
              | no Type ~mut                                  --emptyopt
              | Exp9 ("(" | "?(") ListOf<Exp, ","> ")" ~mut   --call
              | Exp9 ("[" | "?[") Exp "]"                     --subscript
              | Exp9 ("." | "?.") id                          --member
              | stringlit ~mut
              | id                                            --id
              | Type_array "(" ")" ~mut                       --emptyarray
              | "[" NonemptyListOf<Exp, ","> "]" ~mut         --arrayexp
              | "(" Exp ")" ~mut                              --parens

  intlit      = digit+
  floatlit    = digit+ "." digit+ (("E" | "e") ("+" | "-")? digit+)?
  stringlit   = "\"" char* "\""
  char        = ~control ~"\\" ~"\"" any
              | "\\" ("n" | "t" | "\"" | "\\")                --escape
              | "\\u{" hex hex? hex? hex? hex? hex? "}"       --codepoint
  control     = "\x00".."\x1f" | "\x80".."\x9f"
  hex         = hexDigit
  mut         = ~"==" "=" | "++" | "--"

  let         = "let" ~alnum
  const       = "const" ~alnum
  struct      = "struct" ~alnum
  function    = "function" ~alnum
  if          = "if" ~alnum
  else        = "else" ~alnum
  while       = "while" ~alnum
  repeat      = "repeat" ~alnum
  for         = "for" ~alnum
  in          = "in" ~alnum
  random      = "random" ~alnum
  break       = "break" ~alnum
  return      = "return" ~alnum
  some        = "some" ~alnum
  no          = "no" ~alnum
  true        = "true" ~alnum
  false       = "false" ~alnum
  keyword     = let | const | struct | function | if | else | while | repeat
              | for | in | break | return | some | no | random | true | false
  id          = ~keyword letter alnum*

  space      += "//" (~"\n" any)*                             --comment
}

4  Standard Library

The following identifiers are pre-defined in a scope that surrounds the program. This means that none of these identifiers may be declared anywhere in a program (but they may be used as fields, though doing so is not recommended).