The Language Roflkode

The language Roflkode was designed for a compiler construction class. It borrows heavily from Adam Lindsay's LOLCODE language.

Roflkode is an imperative, block-structured, statically typed programming language.

The language is designed to be implementable by small teams of undergraduate students in a single semester. Therefore it subsets LOLCODE in places; for example, it does not do interpolation within strings. In addition, in order to address some "interesting" aspects of compiler construction, it deviates from LOLCODE by adopting static typing, infix operators, operator precedence, and bracketing for array, bucket, and call expressions (!!!! 0_0 SIAS 0_0 !!!!).

This document defines the language Roflkode.

1  Microsyntax

The source of a Roflkode program is a Unicode string. A lexically valid Roflkode program is a string that can be tokenized according to the usual "longest substring" rule, with comments and whitespace allowed to separate tokens. Whitespace is text matching the regex

[ \t]+

and a comment is text matching

BTW[^\r\n\x85\u2028\u2029]*

This precludes any identifier from starting with BTW.

The Roflkode tokens are identifiers (id), reserved words, symbols, integer literals (intlit), number literals (numlit), string literals (strlit), character literals (charlit), and breaks (br). Identifiers are nonempty strings of letters (Unicode properties Lu, Ll, Lt, Lm, Lo) and decimal digits (Unicode property Nd) beginning with a letter, except for the reserved words. The reserved words and symbols can be inferred from the macrosyntax in the following section; the reserved words are the ones in ALL CAPS.

Identifiers and reserved words are case sensitive.

Integer literals are described with the regular expression

-?\d+

and number literals with:

-?\d+\.\d+([Ee][+-]?\d+)?
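
To illustrate how these two patterns interact under the longest-substring rule, here is a minimal Python sketch. The function name match_number and the kind strings are illustrative assumptions, not part of the language definition. It also assumes that Python's reading of \d (any Unicode decimal digit) is the intended one.

```python
import re

# The two literal patterns from the text above.
NUMLIT = re.compile(r"-?\d+\.\d+([Ee][+-]?\d+)?")
INTLIT = re.compile(r"-?\d+")

def match_number(source, pos=0):
    """Return (kind, lexeme) for the longest numeric literal at pos, or None."""
    best = None
    for kind, pattern in (("numlit", NUMLIT), ("intlit", INTLIT)):
        m = pattern.match(source, pos)
        # Keep whichever pattern consumes more characters.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best
```

Under this sketch, "1.0" is a numlit, while "7." yields the intlit "7" and leaves the dot for the next token, because a number literal requires at least one digit after the point.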

String literals are sequences of zero or more non-control characters or escape sequences, delimited by double quotes (U+0022). The escape sequences are:

Sequence   Meaning
:)         the newline character (U+000A)
:>         the tab character (U+0009)
:"         the double quote character (U+0022)
:'         the single quote character (U+0027)
::         the colon character (U+003A)
:(h)       the character whose codepoint is the value of h, where h is a string of one to six hexadecimal digits

A character literal is a non-control character or escape sequence surrounded by single quotes (U+0027).
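
The escape sequences can be decoded with a single regular-expression substitution. The following Python sketch is one possible approach. The name decode_escapes is hypothetical, and the sketch assumes the delimiting quotes have already been stripped. It leaves an unrecognized colon sequence untouched instead of reporting an error, which a real compiler would probably not do.

```python
import re

# Simple escapes from the table; the ":(h)" form is handled separately.
SIMPLE_ESCAPES = {")": "\n", ">": "\t", '"': '"', "'": "'", ":": ":"}
ESCAPE = re.compile(r""":(?:([)>"':])|\(([0-9A-Fa-f]{1,6})\))""")

def decode_escapes(body):
    """Decode the escape sequences in the text between a literal's quotes."""
    def replace(match):
        simple, hex_digits = match.groups()
        if simple is not None:
            return SIMPLE_ESCAPES[simple]
        # chr raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))
    return ESCAPE.sub(replace, body)
```

For example, decode_escapes('hello:)world') gives the two words separated by a newline, and decode_escapes(':(41)') gives "A".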

The br token is defined by the regex

,|[\r\n\x85\u2028\u2029]

In other words, a br is generally a line end, but it can also be a comma for those cases when the programmer would like to pack extra code on a single line.
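
As a rough sketch of how whitespace, comments, and br tokens fit together, the Python function below splits a source string into br tokens and undifferentiated "other" chunks. The names rough_tokens and OTHER are assumptions made for illustration. OTHER is only a stand-in for the real token patterns, so this sketch would mishandle a string literal that contains a space or comma.

```python
import re

WHITESPACE = re.compile(r"[ \t]+")
COMMENT = re.compile(r"BTW[^\r\n\x85\u2028\u2029]*")
BR = re.compile(r",|[\r\n\x85\u2028\u2029]")
# Stand-in for every other token: a run of characters that are not
# whitespace and cannot begin a br.
OTHER = re.compile(r"[^ \t,\r\n\x85\u2028\u2029]+")

def rough_tokens(source):
    """Split source into ("br", text) and ("other", text) pairs."""
    tokens, pos = [], 0
    while pos < len(source):
        # Whitespace and comments separate tokens but are not tokens.
        m = WHITESPACE.match(source, pos) or COMMENT.match(source, pos)
        if m:
            pos = m.end()
            continue
        m = BR.match(source, pos)
        if m:
            tokens.append(("br", m.group()))
        else:
            m = OTHER.match(source, pos)
            tokens.append(("other", m.group()))
        pos = m.end()
    return tokens
```

Note that a comment stops before the line end, so the line end that follows it still produces a br token.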

2  Macrosyntax

A syntactically valid Roflkode program is one that is lexically valid and whose tokenization is derivable from the following grammar, presented in an EBNF variant. Here the vertical bar denotes alternatives, the question mark means optional, Kleene star denotes zero or more, Kleene plus denotes one or more, and parentheses are used for grouping. Symbols appear in single quotes. Reserved words are shown here in all uppercase and are not quoted. The start symbol is script. The tokens intlit, numlit, strlit, charlit, and br are defined above in the section on microsyntax.

  script       →  br* HAI br+ import* stmt+ KTHXBYE br*
  import       →  CAN HAS id '?' br+
  stmt         →  (dec | simplestmt modifier? | complexstmt) br+
  dec          →  vardec | typedec | fundec
  vardec       →  I HAS A type? id (ITZ 4EVER? exp)?
  type         →  B00L | KAR | INT | NUMBR | YARN | id | type LIST
  typedec      →  TEH BUKKIT UV br* (type id br*)* AKA id
  fundec       →  I CAN (MAEK type)? id params? br+ stmt+ SRSLY
               |  THEM CAN (MAEK type)? id params?
  params       →  WIF? UR type id (AN type id)*
  simplestmt   →  YO exp+
               |  FACEPALM exp+
               |  UPZORZ var
               |  NERFZORZ var
               |  var R exp
               |  GTFO id
               |  HWGA id?
               |  HEREZ UR exp
               |  DIAF exp?
               |  GIMMEH var
               |  BRB exp
               |  id args
  modifier     →  (IF | CEPT IF | WHIEL | TIL) exp
  complexstmt  →  conditional
               |  switch
               |  loop
               |  try
  conditional  →  exp '?' br+ WERD br+ stmt+ (MEBBE exp br* stmt+)* (NO WAI br* stmt+)? OIC
  switch       →  exp WTF '?' br+ (OMG literal br+ stmt+)+ OMGWTF br+ stmt+ OIC
  loop         →  IM IN UR id loopcontrol? br+ stmt+ LOL
  loopcontrol  →  (WHIEL | TIL) exp
               |  (UPPIN | NERFIN) id (FROM exp TO exp | THRU exp)
  try          →  PLZ simplestmt br+ AWSUM THX br+ stmt+ O NOES br+ stmt+ MKAY
  exp          →  exp1 (ORELSE exp1)*
  exp1         →  exp2 (ANALSO exp2)*
  exp2         →  exp3 (BITOR exp3)*
  exp3         →  exp4 (BITXOR exp4)*
  exp4         →  exp5 (BITAND exp5)*
  exp5         →  exp6 (relop exp6)?
  exp6         →  exp7 (shiftop exp7)*
  exp7         →  exp8 (addop exp8)*
  exp8         →  exp9 (mulop exp9)*
  exp9         →  prefixop? exp10
  exp10        →  literal
               |  var
               |  id '<:' exp* ':>'
               |  '[:' exp* ':]'
               |  '(' exp* ')'
  literal      →  N00B
               |  WIN
               |  FAIL
               |  intlit
               |  numlit
               |  charlit
               |  strlit
  var          →  id args? ('!?' exp '?!' | '!!!' id)*
  args         →  '(:' exp* ':)'
  relop        →  PWNS | PWNED BY | SAEM AS | PWNS OR SAEM AS | PWNED BY OR SAEM AS | DIVIDZ
  shiftop      →  BITZLEFT | BITZRIGHT
  addop        →  UP | NERF | '~~'
  mulop        →  TIEMZ | OVR | LEFTOVR
  prefixop     →  NAA | BITZFLIP | SIEZ UV | B00LZOR | INTZOR | NUMZOR | KARZOR | YARNZOR
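
The layered exp rules encode operator precedence: each level parses operands from the next tighter level. The Python sketch below covers only the exp7 and exp8 rules, working on a list of token strings that has already been split. It simplifies exp9 to a bare operand, and it assumes that the repetition in each rule groups to the left. The function names are illustrative.

```python
ADDOPS = {"UP", "NERF", "~~"}
MULOPS = {"TIEMZ", "OVR", "LEFTOVR"}

def parse_exp7(tokens, pos=0):
    """exp7 -> exp8 (addop exp8)*; returns (tree, next position)."""
    left, pos = parse_exp8(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ADDOPS:
        op = tokens[pos]
        right, pos = parse_exp8(tokens, pos + 1)
        left = (op, left, right)  # assumed left-associative grouping
    return left, pos

def parse_exp8(tokens, pos):
    """exp8 -> exp9 (mulop exp9)*, with exp9 simplified to one operand."""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in MULOPS:
        op = tokens[pos]
        right, pos = parse_operand(tokens, pos + 1)
        left = (op, left, right)
    return left, pos

def parse_operand(tokens, pos):
    """Stand-in for exp9: accept any single non-operator token."""
    if pos >= len(tokens) or tokens[pos] in ADDOPS | MULOPS:
        raise SyntaxError("operand expected")
    return tokens[pos], pos + 1
```

With this sketch, the tokens of 1 UP 2 TIEMZ 3 parse to ("UP", "1", ("TIEMZ", "2", "3")), so the multiplication binds tighter than the addition.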

3  Semantics

3.1  Scripts

A script is a sequence of zero or more imports followed by one or more statements, all bracketed by HAI and KTHXBYE. Some statements, called declaration statements, declare entities (types, variables, and functions); others simply execute.

BTW This is a pretty interesting Roflkode program.
BTW It writes "hello, world" to standard output.

HAI
I HAS A YARN place ITZ "world"
YO greet (: place :)
I CAN MAEK YARN greet WIF UR YARN s
    HEREZ UR "hello, " ~~ s
SRSLY
KTHXBYE

3.2  Imports

An import is replaced with the contents of the named module at compile time.

In other words, the import acts like the #include directive in C — it brings in source code, not an intermediate compiled form.
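
A textual import of this kind can be sketched as a preprocessing pass over the source lines. In the Python sketch below, expand_imports and the modules dictionary (module name to source text) are hypothetical; how an implementation actually locates a module is not specified here. The identifier pattern only approximates the Unicode rules from the microsyntax, and neither nested imports nor trailing comments on import lines are handled.

```python
import re

# An import line: CAN HAS <id> ?   (identifier pattern is approximate)
IMPORT_LINE = re.compile(r"^[ \t]*CAN[ \t]+HAS[ \t]+([^\W\d_][^\W_]*)[ \t]*\?[ \t]*$")

def expand_imports(source, modules):
    """Replace each import line with the source text of the named module."""
    out = []
    for line in source.split("\n"):
        m = IMPORT_LINE.match(line)
        # A missing module surfaces as a KeyError in this sketch.
        out.append(modules[m.group(1)] if m else line)
    return "\n".join(out)
```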

3.3  Declarations

A declaration binds an identifier to an entity. There are seven kinds of declarations:

Each occurrence of an identifier is either a defining occurrence or a using occurrence. Using occurrences are legal only in the visible region of the declaration that declares the identifier. The visible region of a declaration is the declaration's scope minus any holes created by declarations of the same identifier in scopes nested within it.

This means the visible region may be discontinuous.

All of the declared identifiers that are part of the same statement sequence must be mutually distinct. Here "part of the same statement sequence" covers only the declarations made directly in that sequence, not those made in sequences nested inside it.

HAI
I HAS A NUMBR x
I HAS A NUMBR x        BTW error
IM IN UR house
    I HAS A NUMBR x    BTW this is okay, diff from "outer" x
    GTFO house
LOL
KTHXBYE

Declarations within inner statement sequences hide declarations with the same name in enclosing outer ones.
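
One common way to implement these rules is a chain of symbol tables, one per statement sequence. The Python sketch below is an assumption about how a compiler might do it, not something this definition prescribes; the class name Scope and its methods are illustrative. It ignores where in a sequence a declaration's scope begins.

```python
class Scope:
    """Declarations of one statement sequence, linked to the enclosing one."""

    def __init__(self, parent=None):
        self.names = {}
        self.parent = parent

    def declare(self, name, entity):
        # Only this sequence is checked, so hiding an outer name is allowed.
        if name in self.names:
            raise NameError(f"{name} is already declared in this statement sequence")
        self.names[name] = entity

    def lookup(self, name):
        # The innermost declaration wins, which creates the hole in the
        # visible region of any outer declaration with the same name.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"{name} is not declared")
```

Declaring x twice in the same Scope raises an error, while declaring x in a child Scope succeeds and hides the outer x for lookups that start in the child.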

The identifiers declared as parameters of a function must be unique among themselves and the identifiers declared at the top level of the function's body.

I CAN bat UR INT x AN INT y
    I HAS A NUMBR z
    I HAS A YARN x                BTW error - x is already a parameter
    z PWNS 1.0?
        WERD
            I HAS A INT x ITZ 5   BTW this x is fine, however
    OIC
SRSLY

3.4  Types

Roflkode features the following types:

The types INT and NUMBR are arithmetic types. The arithmetic types and the type KAR comprise the primitive types. Types that are not primitive types are called reference types.

Bukkit types must be declared. Examples of bukkit type declarations:

TEH BUKKIT UV NUMBR x NUMBR y AKA point
TEH BUKKIT UV
  YARN brand
  INT size
  NUMBR price
  YARN LIST colors
  B00L used
AKA dress

3.5  Functions

A function has a name, an optional return type, a parameter list, and a body. The identifiers declared as parameters must all be unique. Functions without a return type in their declarations are called "void functions" or "procedures."

The signature of a function is a pair (t0, [t1, ..., tn]) where t0 is either the return type of the function or the string "void", and t1 through tn are, in order, the types of the parameters.

An expression list (e1, ..., en) is said to match a signature (t0, [t1, ..., tk]) whenever it holds that

Note that the return type has no effect on the definition of matching.

3.6  Variables

A variable is something that stores a value. Roflkode is a statically typed language, so every variable also has a type. Variables are either writable or not writable.

Variables are declared either in a variable declaration or as parameters in a function declaration. A variable declaration gives the variable name and optionally a type and an initializing expression. The initializing expression is always optional; the type may be omitted only if it can be unambiguously inferred from the initializing expression.

I HAS A INT count ITZ 508
I HAS A NUMBR height            BTW ok, height initially undefined
I HAS A weight ITZ 140          BTW ok, weight has type INT
I HAS A p ITZ point <: 3 9 :>   BTW ok, p has type point
I HAS A v ITZ [: 3 1.0 8 2 :]   BTW ok, v has type NUMBR LIST
I HAS A cheezburger             BTW error - no can infer the type
I HAS A x ITZ x                 BTW error - no can infer the type here either

Parameter declarations must always have a type so there is no need to worry about inference in those cases.
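
The inference shown in the examples above can be approximated as follows. This Python sketch uses Python values as stand-ins for Roflkode initializers, so it cannot distinguish KAR from YARN and does not cover bukkits or function calls. The rule that a list mixing INT and NUMBR elements becomes a NUMBR LIST is read off the v example above; treating every other mix, and the empty list, as an error is an assumption. The name infer_type is illustrative.

```python
def infer_type(value):
    """Infer a Roflkode type name from a Python stand-in for an initializer."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "B00L"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "NUMBR"
    if isinstance(value, str):
        return "YARN"
    if isinstance(value, list):
        element_types = {infer_type(v) for v in value}
        # INT elements widen to NUMBR when mixed with NUMBR elements.
        if element_types == {"INT", "NUMBR"}:
            element_types = {"NUMBR"}
        if len(element_types) != 1:
            raise TypeError("no can infer the type")
        return element_types.pop() + " LIST"
    raise TypeError("no can infer the type")
```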

The kinds of variable expressions used in referencing occurrences are:

3.7  Statements

A statement is code that is executed solely for its side effect; it produces no value.

3.8  Expressions

Each expression has a type and a value. The value of an expression with a reference type is either N00B or a reference to an object. Arrays and bukkits are therefore never manipulated directly, but only through references.

An expression e is type-compatible with a type t if and only if

An expression of type INT can appear anywhere an expression of type NUMBR is expected; in this case the integer value is implicitly converted to one of type NUMBR. The conversion must maintain the expression's value; this is always possible since the type NUMBR has 53 bits of precision.
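
The precision claim can be checked with a short Python experiment. Python's float is an IEEE 754 double with a 53-bit significand; the sketch assumes NUMBR has the same representation and that INT is no wider than 53 bits (32 bits is used as an example). Neither assumption is stated in this section. The name converts_exactly is illustrative.

```python
def converts_exactly(n):
    """True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n

# Every 32-bit integer fits within 53 bits of precision.
assert converts_exactly(2**31 - 1)
assert converts_exactly(-2**31)

# The guarantee ends just past 2**53.
assert converts_exactly(2**53)
assert not converts_exactly(2**53 + 1)
```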

The Roflkode expressions are:

4  Standard Libraries

The following modules are officially a part of the Roflkode language, and must therefore be included in all implementations.

tiem

The tiem library contains various operations for working with dates and times.

THEM CAN MAEK INT nao
    BTW The number of milliseconds since the epoch.

THEM CAN MAEK INT tmrw
    BTW The number of milliseconds since the epoch for midnight tomorrow
    BTW (in whatever timezone the agent is set to).

THEM CAN MAEK YARN date WIF UR INT epochtime AN INT tzoffset
    BTW Returns a (proleptic) ISO 8601 date string for the timestamp
    BTW epochtime using the time zone offset tzoffset

maf

The maf library contains various mathematical constants and functions.

BTW This is the standard Roflkode maf library.

THEM CAN MAEK NUMBR sqrt WIF UR NUMBR x
    BTW Returns the square root of x.

I HAS A NUMBR pi ITZ 4EVER acos(:-1.0:)
    BTW Convenient constant for π.

THEM CAN MAEK NUMBR sin WIF UR NUMBR x
    BTW Returns the sine of x.

THEM CAN MAEK NUMBR cos WIF UR NUMBR x
    BTW Returns the cosine of x.

THEM CAN MAEK NUMBR acos WIF UR NUMBR x
    BTW Returns the arc cosine of x (might be NaN).

THEM CAN MAEK NUMBR atan WIF UR NUMBR x AN NUMBR y
    BTW Returns the arctangent of the angle between the positive x-axis
    BTW and the line from the origin to (x,y)

THEM CAN MAEK NUMBR ln WIF UR NUMBR x
    BTW Returns the natural log of x.

THEM CAN MAEK INT confuzzle WIF UR INT x
    BTW Returns a random int between 0 (inclusive) and x (exclusive).

Usage examples:

HAI
CAN HAS maf?
YO sqrt(:400:)          BTW 20
YO pi                   BTW 3.141592653589793
YO sin(:-0.3:)          BTW -0.29552020666133955
YO cos(:2:)             BTW -0.4161468365471424
YO acos(:0.5:)          BTW 1.0471975511965976
YO atan(:5 -12:)        BTW 0.3947911196997615
YO atan(:4 0:)          BTW 1.5707963267948966
YO ln(:142341394:)      BTW 18.77373891323974
YO confuzzle(:6:)       BTW could be 0, 1, 2, 3, 4, or 5
KTHXBYE

txt

The txt library contains functions that operate on text strings.

THEM CAN MAEK YARN lc WIF UR YARN s
    BTW Returns the lower case version of s.

THEM CAN MAEK YARN uc WIF UR YARN s
    BTW Returns the upper case version of s.

THEM CAN MAEK INT pos WIF UR YARN s AN KAR c
    BTW Returns the leftmost position of c in s, or -1 if c does not appear in s.

THEM CAN MAEK YARN slice WIF UR YARN s AN INT start AN INT length
    BTW Returns the slice (substring) of s from position start that
    BTW contains (at most) length characters.

Usage examples:

HAI
CAN HAS txt?
YO lc (: "CheezBurger" :)       BTW "cheezburger"
YO uc (: "cheezBuRGER" :)       BTW "CHEEZBURGER"
YO pos (: "kthxbye" 'x' :)      BTW 3
YO pos (: "random" 'w' :)       BTW -1
YO slice (: "ROTFLMAO" 2 5 :)   BTW "TFLMA"
YO slice (: "ROTFLMAO" 4 20 :)  BTW "LMAO"
KTHXBYE
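
For implementers, the documented behavior of the txt functions can be modeled with a few lines of Python. This is only a reference sketch: it assumes positions are zero-based (as the usage examples suggest) and counted in code points, and it says nothing about negative arguments, which the descriptions above leave open. The name slice_ has a trailing underscore only to avoid shadowing Python's built-in slice.

```python
def lc(s):
    """Lower case version of s."""
    return s.lower()

def uc(s):
    """Upper case version of s."""
    return s.upper()

def pos(s, c):
    """Leftmost position of c in s, or -1 if c does not appear."""
    return s.find(c)

def slice_(s, start, length):
    """Substring of s from start containing at most length characters."""
    return s[start:start + length]
```

These definitions reproduce the results listed in the usage examples, such as pos returning 3 for 'x' in "kthxbye" and slice_ returning "LMAO" when asked for 20 characters starting at position 4 of "ROTFLMAO".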

5  Future Work

The following are planned for Roflkode 2: