Language Design

So, you want to design your own language? Of course you do. Or perhaps you are taking a class and are being forced to create a programming language under penalty of a bad grade. What kinds of things do you need to know?

Designing a Language

Of course you want to design (and implement!) your own programming language! It’s fun. It’s creative. It’s empowering.

How do we do it? In a nutshell, the process involves:

  1. Answering some very important preliminary questions, including one which has you sketching out example programs in your language
  2. Defining, precisely, the syntax (structure) and semantics (meaning) of your language
  3. Creating a prototype implementation

In practice, steps 2 and 3 often happen together, because, while writing the compiler, you may be like “woah this is impossible” and then realize “oh shoot this part of the language wasn’t designed right!”

Okay let’s talk about the process.

Prerequisites

It helps to be experienced. But it’s okay if you’re not—you can get lucky, too!

You should have a good sense of:

Existing Languages

Study existing languages! It’s nice to have a feel for a variety of languages. Here are a few that are good to be familiar with (and the reasons why they are good to study):

Also check out this mini-encyclopedia of 70 languages.

Here are some excellent cross-language comparisons that help you to hone your understanding of how different syntaxes can express the same idea:

These are really good too:

Getting Ready

Remember, many people have designed languages before you. They made mistakes. They came up with brilliant ideas. Many were wildly successful. Some never made it big. Some people have brought in years of research on how people think and learn to come up with principles for language (and environment) design.

Don’t Forget The Literature!

You should learn from their experiences. Study classic papers. Read web essays. Here is a small sampling:

Think about the future:

Getting Started

Ready to strike out on your own? Here are some things to think about, in the form of, you guessed it, a checklist:

Choosing a Starter Set of Features

Come up with a list of capabilities, or features. Make sure they enable programmers to express their creations by following the suggestions and principles in the Learnable Programming essay, including:

Exercise: Skim the Learnable Programming essay. Or, better, read the whole thing when you have time.

What kind of questions might you have here? Here are some totally random ideas:

What did I just read?

Feeling like only about 20% of the questions above made any sense? Feeling like that vocabulary came out of nowhere? That’s fine for now. Learning about programming languages can be a never-ending lifelong journey, but you can use the questions you don’t quite understand now as a place to start some research.

Oh, I have a glossary you might find helpful.

From Features To Abstract Syntax

When you have a good idea of your language features, you’ll want to figure out a good way to organize them, structurally. This is known as your language’s abstract syntax. In an abstract syntax we don’t worry much about punctuation and parentheses and such microscopic details. We are interested in the overall structure. Here’s a look at abstract syntax trees in JavaScript:

When implementing a language, the abstract syntax trees are actual objects. And for a real programming language like JavaScript, there are a ton of possible AST nodes. The AST node types in JavaScript come from a specification called ESTree. You may wish to read the whole spec, but I’ve summarized the main interfaces here. See if it helps you in any way get a “big picture” of JavaScript.

Program sourceType:["script"|"module"] body:[Statement|ModuleDeclaration]
Statement
    Declaration
        FunctionDeclaration(Function) id:Identifier
        VariableDeclaration declarations:[VariableDeclarator] kind:("var"|"let"|"const")
        ClassDeclaration(Class) id:Identifier
    EmptyStatement
    DebuggerStatement
    ExpressionStatement expression:Expression
    BlockStatement body:[Statement]
    ReturnStatement argument:Expression?
    LabeledStatement label:Identifier body:Statement
    BreakStatement label:Identifier?
    ContinueStatement label:Identifier?
    IfStatement test:Expression consequent:Statement alternate:Statement?
    SwitchStatement discriminant:Expression cases:[SwitchCase]
    WhileStatement test:Expression body:Statement
    DoWhileStatement body:Statement test:Expression
    ForStatement init:(VariableDeclaration|Expression)? test:Expression? update:Expression? body:Statement
    ForInStatement left:(VariableDeclaration|Pattern) right:Expression body:Statement
        ForOfStatement await:boolean
    ThrowStatement argument:Expression
    TryStatement block:BlockStatement handler:CatchClause? finalizer:BlockStatement?
    WithStatement object:Expression body:Statement
Function id:Identifier? params:[Pattern] body:BlockStatement generator:bool async:bool
VariableDeclarator id:Pattern init:Expression?
SwitchCase test:Expression? consequent:[Statement]
CatchClause param:(Pattern?) body:BlockStatement
Expression
    ThisExpression
    Identifier(Pattern) name:string
    Literal value:(string|bool|number|Regexp|bigint)?
        RegExpLiteral regex:{pattern:string flags:string}
        BigIntLiteral bigint:string
    ArrayExpression elements:[(Expression|SpreadElement)?]
    ObjectExpression properties:[Property|SpreadElement]
    FunctionExpression(Function)
    ArrowFunctionExpression(Function) body:(BlockStatement|Expression) expression:bool
    UnaryExpression operator:UnaryOperator prefix:bool argument:Expression
    UpdateExpression operator:UpdateOperator argument:Expression prefix:bool
    BinaryExpression operator:BinaryOperator left:Expression right:Expression
    AssignmentExpression operator:AssignmentOperator left:Pattern right:Expression
    LogicalExpression operator:LogicalOperator left:Expression right:Expression
    MemberExpression(ChainElement) object:(Expression|Super) property:Expression computed:bool
    ChainExpression expression:ChainElement 
    ConditionalExpression test:Expression consequent:Expression alternate:Expression
    CallExpression(ChainElement) callee:(Expression|Super) arguments:[(Expression|SpreadElement)]
    YieldExpression argument:Expression? delegate:bool
    TemplateLiteral quasis:[TemplateElement] expressions:[Expression]
    TaggedTemplateExpression tag:Expression quasi:TemplateLiteral
    NewExpression callee:Expression arguments:[(Expression|SpreadElement)]
    SequenceExpression expressions:[Expression]
    ClassExpression(Class)
    AwaitExpression argument:Expression
    ImportExpression source:Expression
    MetaProperty meta:Identifier property:Identifier
Class id:Identifier? superClass:Expression? body:ClassBody
ClassBody body:[MethodDefinition]
MethodDefinition key:Expression value:FunctionExpression kind:("constructor"|"method"|"get"|"set") computed:bool static:bool
SpreadElement argument:Expression
Property key:Expression value:Expression kind:("init"|"get"|"set") method:bool shorthand:bool computed:bool
    AssignmentProperty value:Pattern kind:"init" method:false
Pattern
    ObjectPattern properties:[AssignmentProperty|RestElement]
    ArrayPattern elements:[Pattern?]
    RestElement argument:Pattern
    AssignmentPattern left:Pattern right:Expression
Super
TemplateElement tail:boolean value:{cooked:string? raw:string}
ChainElement optional:boolean
enum UnaryOperator {"-"|"+"|"!"|"~"|"typeof"|"void"|"delete"}
enum UpdateOperator {"++"|"--"}
enum BinaryOperator {"=="|"!="|"==="|"!=="|"<"|"<="|">"|">="|"<<"|">>"|">>>"|"+"|"-"|"*"|"/"|"%"|"**"|"|"|"^"|"&"|"in"|"instanceof"}
enum AssignmentOperator {"="|"+="|"-="|"*="|"/="|"%="|"**="|"<<="|">>="|">>>="|"|="|"^="|"&="}
enum LogicalOperator {"||"|"&&"|"??"}
ModuleDeclaration
    ImportDeclaration specifiers:[ImportSpecifier|ImportDefaultSpecifier|ImportNamespaceSpecifier] source:Literal
    ExportNamedDeclaration declaration:Declaration? specifiers:[ExportSpecifier] source:Literal?
    ExportDefaultDeclaration declaration:(Declaration|Expression)
    ExportAllDeclaration source:Literal exported:(Identifier?)
ModuleSpecifier local:Identifier
    ImportSpecifier imported:Identifier
    ImportDefaultSpecifier
    ImportNamespaceSpecifier
    ExportSpecifier exported:Identifier

I’ve built an interactive application for you to explore AST generation. Please try it out! Hopefully you can get a sense of the connection between concrete and abstract syntax by doing so.
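To see that abstract syntax trees really are ordinary objects, here is a minimal sketch using Python’s standard ast module as a stand-in for ESTree (the node names differ from JavaScript’s, and the helper name outline is made up for this illustration). It prints one node type per line, indented by depth.

```python
import ast

def outline(node, depth=0):
    """Return the node type names of an AST, one per line, indented by depth."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth + 1))
    return lines

# Parentheses and spacing are concrete-syntax details, so these two
# sources should produce the same outline.
with_parens = outline(ast.parse("x = (3 + y)"))
without_parens = outline(ast.parse("x=3+y"))

print("\n".join(with_parens))
print(with_parens == without_parens)
```

Notice that the tree records an assignment whose source is a binary operation, but keeps no trace of the parentheses or the whitespace.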

Sketching

Now the real fun begins:

Do a lot of experimentation here! You will probably want to put creative effort into designing languages people like to use! What kinds of syntax issues will you have to deal with? Dozens, actually, and we can’t cover them all. But how about a taste? We’ll peek at just a few issues that sometimes generate strong opinions.

Overall Phrase Structure

You will need to adopt a scheme for showing structure. The popular approaches are: Curly-brace (JavaScript, Java, C++, C#), Terminal-end (Ruby, Ada), Nested parentheses (Lisp, Clojure, Racket), Indentation (Python), Blocks (EToys, Scratch, Snap!), Pictures (Piet), Other (Haskell, Erlang, Prolog).

The important idea here is that a single abstract syntax can be realized with many different concrete syntaxes. For example, the AST:

little-ast.png

represents each of the following (and more!):

while y - 5 == 3:
    print(x * (3 + y))

while y - 5 == 3 {
  print(x * (3 + y))
}

while y - 5 == 3 loop
  print(x * (3 + y))
end

(while (= (- y 5) 3)
    (print (* x (+ 3 y))))

Exercise: The last syntax is used in Clojure, Lisp, Scheme and others. Note that in some ways, this is the abstract syntax. How so?
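The idea of one abstract syntax with many concrete syntaxes can be sketched in a few lines of Python. The nested-tuple tree encoding and the function names below are assumptions made for this illustration, not part of any real tool. To avoid needing a precedence table, the infix printer parenthesizes every nested operation, so its output differs cosmetically from the samples above.

```python
# A tiny AST as nested tuples: (operator, operand, ...). Leaves are names or numbers.
AST = ("while",
       ("==", ("-", "y", 5), 3),
       ("print", ("*", "x", ("+", 3, "y"))))

def infix(e, top=True):
    """Render an expression with infix operators, parenthesizing nested ones."""
    if not isinstance(e, tuple):
        return str(e)
    op, *args = e
    if op == "print":
        return f"print({infix(args[0])})"
    text = f" {op} ".join(infix(a, top=False) for a in args)
    return text if top else f"({text})"

def python_style(tree):
    _, test, body = tree
    return f"while {infix(test)}:\n    {infix(body)}"

def curly_style(tree):
    _, test, body = tree
    return f"while {infix(test)} {{\n  {infix(body)}\n}}"

def end_style(tree):
    _, test, body = tree
    return f"while {infix(test)} loop\n  {infix(body)}\nend"

def sexpr(e):
    """Render any node as a parenthesized prefix form."""
    if not isinstance(e, tuple):
        return str(e)
    return "(" + " ".join(sexpr(part) for part in e) + ")"

for render in (python_style, curly_style, end_style, sexpr):
    print(render(AST), end="\n\n")
```

All four renderers consume the same tree; only the punctuation and layout they emit differ.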

Delimiters

How to separate one construct from another is a really big issue in syntax design, believe it or not. We can identify two main classes of languages: those in which newlines are significant and those in which they are not.

“Insignificant” Newlines

In many languages, newlines are just like any other whitespace character (except for minor exceptions such as single-line comments and single-line string literals). Then, unless you have an S-expression-based syntax as in Lisp, Scheme, and Clojure, you’ll need semicolons to terminate (or separate) statements. This means you can (but shouldn’t) write code like:

#define ZERO 0
    unsigned  gcd(   unsigned   int  // Euclid's algorithm
      x,unsigned   y) {   while ( /* hello */  x>   ZERO
   ){unsigned temp=x;x=y   %x;y  = temp ;}return

   y ;}

“Significant” Newlines

Where you place your newlines matters greatly in, let’s see, Assembly languages, Python, Ruby, JavaScript, Elm, Haskell, Go, Swift, and yes, many others. The rules can get pretty technical.

Python scripts are defined as sequences of logical lines, delimited by the token NEWLINE. A statement may not cross logical lines, except in the case of compound statements in which each constituent simple statement ends with a NEWLINE. Logical lines are made up of one or more physical lines according to line joining rules. Lines are implicitly joined within parentheses, brackets, or braces; lines can be explicitly joined by ending with a backslash. These rules are somewhat exclusive of comments and string literals.
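You can watch these rules at work with Python’s standard tokenize module, which emits a NEWLINE token at the end of each logical line and a separate NL token for line breaks that do not end one. The helper name count_logical_lines is made up for this sketch.

```python
import io
import tokenize

def count_logical_lines(source):
    """Count logical lines by counting NEWLINE tokens (NL tokens are ignored)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)

# Three physical lines, but the open parenthesis joins the first two.
implicit = "total = (1 +\n         2)\nprint(total)\n"

# Two physical lines joined explicitly with a backslash.
explicit = "a = 1 + \\\n    2\n"

print(count_logical_lines(implicit))
print(count_logical_lines(explicit))
```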

Ruby looks at the end of each line and says “well, if up to here it looks like we’ve completed a statement, then we have.” This means you have to be careful where you break lines:

puts 5
  + 3
puts 5 +
  3

prints 5 then 8.

Exercise: Why?

“Possibly Significant” Newlines

JavaScript requires most statements to be terminated by semicolons, but the compiler will put one in for you if it looks like you might have missed one. The rules by which this automatic semicolon insertion (ASI) is done have to be learned and they might be hard to remember.

If you are going to be a serious JavaScript programmer, you need to learn the rules of ASI whether you choose to use semicolons or not.

Exercise: Research the famous Rules of Automatic Semicolon Insertion. Which statements are supposed to be terminated by a semicolon? When is a semicolon inserted? Give four examples of how writing JavaScript in a "free-form" manner is impossible because of semicolon insertion.
Exercise: Get your ASI Certification

Some people feel very strongly about whether or not to use semicolons:

zzzzing.png

Function Calls

Most programming languages have functions. Seriously. But there are a lot of ways to work them into your design. Basic questions include: Must functions have exactly one argument, or zero or more arguments? Parens or no parens? Positional or keyword arguments? Argument labels? If no arguments, can we omit parentheses?

You can play around and see what you can come up with:

push(myStack, 55)
push myStack 55
[push myStack 55]
(push myStack 55)
push(on: myStack, theValue: 55)
push(theValue: 55, on: myStack)
push on:myStack theValue:55
push({ on: myStack, theValue: 55 })
push { on: myStack, theValue: 55 }
push({ theValue: 55, on: myStack })

You might want to consider an ultra-low precedence function application, like they have in Haskell and F#:

sum (filter even (map square a))
sum $ filter even $ map square $ a
sum <| filter even <| map square <| a
a |> map square |> filter even |> sum
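Python has no pipe operator, but the left-to-right flow can be imitated with a small helper. This is only a sketch: pipe, square, and even are names invented here, not library functions.

```python
from functools import reduce

def pipe(value, *functions):
    """Feed value through each function in turn, left to right."""
    return reduce(lambda acc, f: f(acc), functions, value)

def square(n):
    return n * n

def even(n):
    return n % 2 == 0

a = [1, 2, 3, 4]

# Nested calls read inside-out...
nested = sum(filter(even, map(square, a)))

# ...while the pipeline reads in the order the data flows.
piped = pipe(a,
             lambda xs: map(square, xs),
             lambda xs: filter(even, xs),
             sum)

print(nested, piped)
```

Both forms compute the same value; the pipeline simply lists the stages in the order they are applied, which is the appeal of a low-precedence application operator.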

The flip side of function calls is function definitions. You’re likely familiar with default parameters, and rest parameters. Python has cool mechanisms for requiring arguments to be positional or keyword, based on the definition. Examples:

def sqrt(x, /): ...
def line(*, x1, x2, y1, y2, width, style, color): ...
def f(a, b, /, c, d, *, e, f): ...
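Here is a small runnable sketch of what those markers enforce. The function bodies and the accepts helper are invented for illustration, and line is trimmed to four parameters. Parameters before / must be passed positionally, and parameters after * must be passed by keyword.

```python
def sqrt(x, /):
    # x is positional-only, so sqrt(x=9) is rejected.
    return x ** 0.5

def line(*, x1, x2, y1, y2):
    # Every parameter is keyword-only, so line(0, 1, 0, 1) is rejected.
    return (x2 - x1, y2 - y1)

def accepts(call):
    """Report whether a zero-argument callable runs without a TypeError."""
    try:
        call()
        return True
    except TypeError:
        return False

print(accepts(lambda: sqrt(9)))
print(accepts(lambda: sqrt(x=9)))
print(accepts(lambda: line(x1=0, x2=1, y1=0, y2=1)))
print(accepts(lambda: line(0, 1, 0, 1)))
```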

Syntactic Sugar

Syntactic sugar refers to forms in a language that make certain things easier to express, but can be considered surface translations of more basic forms.

This is best understood by example. There are zillions of examples out there. Here are a few. (Disclaimer: Some of these are just examples I made up and are not part of any real language.)

Construct | Desugared Form | Description
--------- | -------------- | -----------
x += n | x = x + n | Compound assignment
a + b | operator+(a, b) or "+"(a, b) or __add__(a, b) | Common in languages that allow overloading
a[i] | *(a + i) | (C, C++ pointer arithmetic) And i[a] works too!
p -> x | (*p).x | (C, C++) Field of struct being pointed to
f | f() | Some languages let you leave off parentheses in calls with no arguments
f x | f(x) or x.f() | Some languages let you leave off parentheses in calls with one argument
x op y | op(x, y) or x.op(y) | Some languages let you leave off parentheses in calls with two arguments
let x=E1 in E2 | (x => E2)(E1) | Let-expression (in functional languages)
(E1 ; E2) | (() => E2)(E1) | Expression sequencing (in eager functional languages)
r = [s for x in a if e] | r = []; for x in a: if e: r.add(s) | List comprehension
x orelse y | if x then x else y | (Standard ML) short-circuit disjunction
x andalso y | if x then y else x | (Standard ML) short-circuit conjunction
[x, y, z] | x :: y :: z :: nil | Lists in Standard ML
"a${x}b" | "a" + x + "b" | String interpolation

Exercise: Find some more examples.
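Because sugar is a surface translation, an implementation can remove it with a simple tree rewrite before doing anything else. Below is a minimal sketch of such a desugaring pass covering two rows of the table. The nested-tuple AST encoding and the tag names ("let", "+=", "call", "lambda", "assign") are assumptions made for this example.

```python
def desugar(node):
    """Rewrite sugared forms into core forms, recursively."""
    if not isinstance(node, tuple):
        return node  # identifiers and literals are already core forms
    tag, *rest = node
    rest = [desugar(child) for child in rest]
    if tag == "let":
        # let x = e1 in e2  ==>  (x => e2)(e1)
        name, e1, e2 = rest
        return ("call", ("lambda", name, e2), e1)
    if tag == "+=":
        # x += n  ==>  x = x + n
        target, n = rest
        return ("assign", target, ("+", target, n))
    return (tag, *rest)

print(desugar(("let", "x", 5, ("*", "x", "y"))))
print(desugar(("+=", "total", ("let", "y", 1, "y"))))
```

After this pass, later stages of a compiler only need to understand calls, lambdas, and plain assignment.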

When the sugared form is completely gratuitous or actually makes the code less readable, you sometimes hear the term syntactic syrup or syntactic saccharin.

Syntactic Salt

Here’s the definition from The New Hacker’s Dictionary:

The opposite of syntactic sugar, a feature designed to make it harder to write bad code. Specifically, syntactic salt is a hoop the programmer must jump through just to prove that he knows what’s going on, rather than to express a program action. Some programmers consider required type declarations to be syntactic salt. A requirement to write “end if”, “end while”, “end do”, etc. to terminate the last block controlled by a control construct (as opposed to just “end”) would definitely be syntactic salt. Syntactic salt is like the real thing in that it tends to raise hackers’ blood pressures in an unhealthy way.

Candygrammars

Some people love verbose code, because explicit is better than implicit. But if you are a language designer, be pragmatic: there is such a thing as code that is too verbose. What about trying to make the code like human language? Here’s an example in HyperTalk (taken from Wikipedia):

on mouseDown
  answer file "Please select a text file to open."
  if it is empty then exit mouseDown
  put it into filePath
  if there is a file filePath then
    open file filePath
    read from file filePath until return
    put it into cd fld "some field"
    close file filePath
    set the textStyle of character 1 to 10 of card field "some field" to bold
  end if
end mouseDown

An example from Manatee:

to get the truth value prime of whole number n:
    return no if n < 2
    for each d in 3 to n - 1 by 2:
        return no if d divides n
    end
    return yes
end
for each k in 1 to 100:
    write k if prime(k)
end

In practice this kind of verbosity is worse than it sounds. Here’s what the New Hacker’s Dictionary has to say about this:

candygrammar /n./ A programming-language grammar that is mostly syntactic sugar; the term is also a play on “candygram.” COBOL, Apple’s Hypertalk language, and a lot of the so-called “4GL” database languages share this property. The usual intent of such designs is that they be as English-like as possible, on the theory that they will then be easier for unskilled people to program. This intention comes to grief on the reality that syntax isn’t what makes programming hard; it’s the mental effort and organization required to specify an algorithm precisely that costs. Thus the invariable result is that candygrammar languages are just as difficult to program in as terser ones, and far more painful for the experienced hacker.

[The overtones from the old Chevy Chase skit on Saturday Night Live should not be overlooked. This was a "Jaws" parody. Someone lurking outside an apartment door tries all kinds of bogus ways to get the occupant to open up, while ominous music plays in the background. The last attempt is a half-hearted "Candygram!" When the door is opened, a shark bursts in and chomps the poor occupant. There is a moral here for those attracted to candygrammars.]

Terseness

Some languages pride themselves on doing a whole lot with few characters:

An example from Ruby (do you see what this does?):

c = Hash.new 0
ARGF.each {|l| l.scan(/[A-Z']+/i).map {|w| c[w.downcase] += 1}}
c.keys.sort.each {|w| puts "#{w}, #{c[w]}"}

An example from APL (The 99 bottles of beer program taken from Rosetta Code):

bob  ←  { (⍕⍵), ' bottle', (1=⍵)↓'s of beer'}
bobw ←  {(bob ⍵) , ' on the wall'}
beer ←  { (bobw ⍵) , ', ', (bob ⍵) , '; take one down and pass it around, ', bobw ⍵-1}
↑beer¨ ⌽(1-⎕IO)+⍳99

Here’s APL again, with an expression to find all the prime numbers up to R:

(~R∊R∘.×R)/R←1↓⍳R
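For comparison, here is a rough Python rendering of the same idea: build the candidates 2 through R, form every pairwise product, and keep the candidates that are not products. The function name primes_up_to is made up, and the comments mapping each line to a piece of the APL expression are my reading of it.

```python
def primes_up_to(r):
    """Return the primes in 2..r by discarding every product of two candidates."""
    candidates = range(2, r + 1)                                # R←1↓⍳R
    products = {a * b for a in candidates for b in candidates}  # R∘.×R
    return [n for n in candidates if n not in products]         # (~R∊ ...)/R

print(primes_up_to(20))
```

The Python version takes several times as many characters to say the same thing, which is exactly the trade-off this section is about.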

Some people love terse, concise code, because it says only what it needs and reduces the cognitive load, leaving you with less useless noisy syntax to learn. But if you are a language designer, be pragmatic: there is such a thing as code that is too terse. Unless...that’s your goal....

Golfing Languages

Golfing languages take terseness to the next level. A golfing language is a kind of esoteric programming language (a non-practical language created to experiment with weird ideas, be hard to program in, or be humorous) that allows programs to be written in an insanely small number of characters (or bytes).

Here are some CJam programs:

Here are some Pyth programs (taken from the documentation):

Exercise: Find a bunch more examples of CJam and Pyth programs. Try them out. You can run them both at TIO. For Pyth, there’s also a hosted interpreter with a Cheat Sheet.
Exercise: Try out Stax.

Prototyping

Unless your language has a trivial syntax (as have most golfing languages), you should use the Ohm Editor. This is an amazing tool for experimenting with programming languages.

Ohm Editor Screenshot

In the upper left panel, design your grammar. You can load/save from your browser’s local storage, and even publish gists to GitHub. In the upper right panel, enter test cases: both tests you want to succeed (thumbs up) and those you want to fail (thumbs down). The bottom panel is an interactive concrete syntax tree for the currently selected test case.

This tool will save you a lot of time.

It is an essential component of your language design toolbox.

How essential is it?

Unless your language is trivial, tools like the Ohm Editor are very important! Design is an iterative process, and creativity is enabled and enhanced with immediate feedback. So you should design with tools that allow you to experiment and test your ideas.

That said, it is true that in practice, many production-level compilers do not use Ohm or related tools like ANTLR, Bison, etc.—they do everything by hand.

CLASSWORK
Let’s do a code-along for some language we’ll all make up on the fly together.

During the code-along, bits of Ohm will be introduced as needed. Later, we’ll learn all the details of Ohm.

The language we will design together will be completely unplanned. Let’s see how it all emerges.

The Ohm Editor only gets you part way

The Ohm Editor will help you design your language’s syntax. To go further, and write up an interpreter or compiler, you will need to use more of Ohm.

CLASSWORK
We’ll continue our code-along to produce an interpreter. Again, the code-along will not be a formal introduction to Ohm; we will introduce features as needed. While we will have a lecture on Ohm later in the course, the Ohm documentation is quite good, and there is a helpful Discord community, too.

Defining Your Language

Looking at existing language definitions, you may find different artifacts:

A typical definition will have three parts:

  • Context-Free Syntax: What are the structural entities (e.g., declarations, expressions, statements, modules) and how do they fit together, perhaps with punctuation?
  • Context-Sensitive Syntax (a.k.a. Static Semantics): What are some of the non-structural rules that define a legal program (e.g., type checks, argument-parameter matching rules, visibility rules, etc.)?
  • Dynamic Semantics: What does a program do? What effect does each of the forms of a well-structured, legal program have on the run-time environment?

Why are there generally three parts instead of two (i.e., just syntax and semantics)? It basically comes down to this. While everyone might agree that the following is structurally malformed:

#<include > stdio.h
main() int }
    printf["Hello, world!\n");]
{

the following program looks good in terms of “structure” but it’s actually meaningless since it violates a contextual rule that says identifiers must be declared before use:

int main() {
    printf("%d\n", x);
}

The latter program is generally considered to have semantic errors, but they are static semantic errors because they can be detected by a compiler before the program is ever run. This is in contrast to a dynamic semantic error, which can only be detected at run time.
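The two kinds of static error can be told apart in code. In the sketch below, Python’s standard ast module stands in for a parser: malformed structure is rejected with a SyntaxError during parsing, while a separate pass over the tree looks for names that are read before being assigned. The undeclared_names helper is invented for this illustration and handles only straight-line, top-level code (no functions, loops, or nested scopes). Python itself leaves this particular check to run time.

```python
import ast
import builtins

def undeclared_names(source):
    """Return names that are read before any top-level assignment to them."""
    declared = set(dir(builtins))
    problems = []
    for stmt in ast.parse(source).body:  # a structural error raises SyntaxError here
        # Check reads first, so that `x = x + 1` flags x when x is new.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in declared:
                    problems.append(node.id)
        # Then record the names this statement assigns.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                declared.add(node.id)
    return problems

print(undeclared_names("print(x)"))          # well-formed, but x is never declared
print(undeclared_names("x = 1\nprint(x)"))   # well-formed and legal

try:
    undeclared_names("print(")               # not even well-formed
except SyntaxError:
    print("structural error")
```

Neither check runs the program, which is what makes both of them static.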

Example: You’ve probably heard the distinction between “static” and “dynamic” before. Perhaps you know that “static typing” means type checking is done prior to program execution and “dynamic typing” means checking is done during program execution. Most languages do a little of both, but one or the other usually predominates. Sometimes you get a good deal of both: in TypeScript, for example, you have a set of static types which are different from the eight dynamic types. Fun.

Not everyone likes the term “static semantics”

Some people prefer to use the term “syntax” for everything that is checkable before execution, leaving the term “semantics” to deal exclusively with run time meaning. This is fine. If you go this route, you would then split “syntax” into “context-free syntax” (from the grammar) and “context-sensitive” syntax (that comes from the contextual rules).

Typically, languages are defined with precise, formal notation for the context-free syntax, such as Ohm grammars, Syntax diagrams, CFGs, BNF, EBNF, ABNF, PEGs (all things we will see later in the course), while the semantics are normally given in prose, though carefully written. That said, there do exist formal notations for semantics, among them Denotational, Operational, and Axiomatic Semantics.

We should mention there’s one other way to define semantics, called “Compile it and run it.” Here the language designer writes their own compiler or interpreter, and the meaning of a program is whatever is produced by this implementation. Thus the compiler or interpreter itself defines the language, and the compiler correctness problem is trivial: the compiler is, by definition, always correct.

Details coming soon

We’ll study formal syntax and formal semantics later in the course. We’ll also spend a lot of time on the border between what is easy to specify in a “context-free” syntax and what requires context.

Examples

Many well-known programming languages have published, formal definitions. You can find them by searching the web.

For this class, we will be studying five little languages crafted especially to help you in your study of language design and implementation. We will be studying them in order, building upon previous languages and learning new things as we progress. This will allow us to introduce the huge topic of language processing in a practical setting, writing real compilers for real languages. The languages are Astro, Bella, Carlos, Dax, and Ekko.

Astro

We all begin as a white belt in every new endeavor. We will start, then, with a very simple, almost trivial, language. All it has are numbers, arithmetic operators, variables, and a few pre-defined constants and functions. Here’s an example program:

// A simple program in Astro
rAd1uS = 55.2 * (-cos(2.8E-20) + 89) % 21;
the_area = π * rAd1uS ** 2;
print(hypot(2.28, 3 - rAd1uS) / the_area);    // woohoo 👻

There are only two kinds of statements: assignments and print statements. Expressions include numbers, variables, function calls, arithmetic expressions with +, -, *, /, %, and **, and can be parenthesized. We will cover the official definition of the language, and use the language to motivate a formal study of syntax.

When studying this language, we’ll learn about the separation of context-free syntax from contextual rules. Contextual rules include such things as: having to match the number of arguments in a call with the number of defined parameters, rudimentary type checking, and not allowing assignments to read-only variables.
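To make the idea of contextual rules concrete, here is a minimal sketch of a checker for an Astro-like language. It is not the course’s implementation (which uses Ohm). The nested-tuple program encoding, the error messages, and the tables of predefined names and arities are all assumptions for this illustration, based only on the names that appear in the sample program.

```python
import math

# Assumed for illustration: the predefined constants and function arities.
CONSTANTS = {"π": math.pi}
FUNCTIONS = {"cos": 1, "hypot": 2}

def check(program):
    """Return a list of contextual-rule violations in a list of statements."""
    errors = []
    variables = set(CONSTANTS)

    def check_expr(e):
        if isinstance(e, (int, float)):
            return
        if isinstance(e, str):  # a variable reference
            if e not in variables:
                errors.append(f"{e} not defined")
            return
        op, *args = e
        if op == "call":
            name, *actuals = args
            if name not in FUNCTIONS:
                errors.append(f"{name} is not a function")
            elif FUNCTIONS[name] != len(actuals):
                errors.append(f"{name} expects {FUNCTIONS[name]} argument(s)")
            args = actuals
        for a in args:
            check_expr(a)

    for kind, *parts in program:
        if kind == "assign":
            target, source = parts
            check_expr(source)
            if target in CONSTANTS:
                errors.append(f"{target} is read-only")
            else:
                variables.add(target)
        elif kind == "print":
            check_expr(parts[0])
    return errors

program = [
    ("assign", "r", ("*", 55.2, ("call", "cos", 2.8e-20))),
    ("assign", "π", 3),                 # assignment to a read-only name
    ("print", ("call", "hypot", "r")),  # wrong number of arguments
]
print(check(program))
```

Every statement in this program is structurally fine; the two problems only show up once the checker consults what it knows about π and hypot.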

As Astro will be our first language, we will use it as a case study to learn the amazing Ohm Language Library to build an interpreter. The details of how the interpreter is constructed are covered in the course notes on Ohm.

Bella

Our second language has a few things Astro does not: a richer set of operators, variable declarations, and user-defined functions. The contextual rules for Bella are much richer than those of Astro, since we now have actual declarations, and scope! Here’s an example program:

let dozen = 12;
print dozen % 3 ** 1;
function gcd(x, y) = y == 0 ? x : gcd(y, x % y);
while dozen >= 3 || (gcd(1, 10) != 5) {
  dozen = dozen - 2.75E+19 ** 1 ** 3;
}

We will first look at the official specification, introducing all sorts of interesting concepts. Then we’ll study a real, actual Bella compiler. Here we learn about designing and architecting a compiler, building the components (analyzer, optimizer, and generator), and getting 100% test coverage. The compiler source code is on GitHub.

Carlos

In our third language, we encounter arrays, structs, and optionals: our first language that is basically useful. If you are taking the compiler course for which these notes were written, Carlos is a good example of the minimal language complexity you will need for your term project.

const languageName = "Carlos";

function greeting() {
  return random(["Welcome", "こんにちは", "Bienvenido"]);
}

print("👋👋👋");
repeat 5 {
  print(greeting() + " " + languageName);
}

We’ll be visiting the language’s official specification and a compiler on GitHub. The compiler (of course!) uses the Ohm Language Library.

There’s no separate page of notes describing the compiler. After studying the Astro and Bella compilers, you’ll be able to find your way around the code on GitHub (there’s documentation). And don’t worry, its development and usage will be covered in class, and the teaching staff can help you with any questions you might have.

Dax

Language number four is a functional language, that is, a language with no assignments! The only binding of names to entities happens when passing arguments to parameters, though there is that famous let declaration which nicely sugars a function call: it’s nicer to say let x = 5 in x * y end than {x => x * y}(5).

Here’s a sample program to get the feel for the language:

let
  gcd = {x => {y => y == 0 ? x : gcd y (x % y)}};
  z = 5
in
  [1, 3, z] |> filter {x => x > 2} |> map {x => x ** 2} |> print
  then
  "hello" |> substring 2 5 |> print
  then
  print (gcd 33 99)   // This is fine, you don't HAVE to use |>
end

If you have not yet seen languages with the awesome |> operator, here’s your chance to be wowed.

We will discuss the language design and compiler later in the course.

Ekko

Our fifth language, Ekko (starting with E like Erlang and Elixir, which greatly influence it), is a kind of experimental language that deals quite a lot with time.

Ekko mixes styles of asynchronous programming from JavaScript and the distributed process-orientation of Erlang and Elixir: Ekko’s future objects are based on JS promises, and its processes communicate via messages as in Erlang. There’s also quite a bit more temporal goodness, including timeout calls, value histories with time travel (influenced by older versions of Elm), and even explicit parallelism.

As of 2024-01-01, the language is not even designed. The TAs and I will probably work on it throughout the course.

More Examples

Students in previous iterations of the course have designed and implemented their own languages. Here’s a sampling of languages from the past decade or so. (Please note that there is a very wide variety of quality in these examples. They are presented here without any evaluative commentary as to whether they are suitable building blocks for one’s own project.)

Recall Practice

Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.

Summary

We’ve covered:

  • What to know before undertaking language design
  • Pointers to excellent articles and essays about language design
  • Two videos (one by Alan Kay, one by Bret Victor) on languages and language design
  • How to begin the language design process
  • Questions to ask while designing your language
  • Things to think about when sketching during design
  • The use of the Ohm Editor in language prototyping
  • Language Definition