Language Design

So, you want to design your own language? Of course you do. Or perhaps you are taking a class and are being forced to create a programming language under penalty of a bad grade. What kinds of things do you need to know?

Prerequisites

It helps to be experienced. But it’s okay if you’re not—you can get lucky, too!

You should have a good sense of:

Study Existing Languages

It’s nice to have a feel for a variety of languages. Here are a few that are good to be familiar with (and the reasons why they are good to study):

Also check out this mini-encyclopedia of 70 languages.

Here are some excellent cross-language comparisons that help you to hone your understanding of how different syntaxes can express the same idea:

These are really good too:

Don’t Forget The Literature

Remember, many people have designed languages before you. They made mistakes. They came up with brilliant ideas. Many were wildly successful. Some never made it big. Some people have brought in years of research on how people think and learn to come up with principles for language (and environment) design.

You should learn from their experiences. Study classic papers. Read web essays. Here is a small sampling:

Think about the future:

Sketch and Prototype

Ready to strike out on your own? Do the following, in no particular order:

Then:

Remember that design is an iterative process, and creativity is enabled and enhanced with immediate feedback. So you should design with tools that allow you to experiment and test your ideas.

You should use the Ohm Editor.

Use the Ohm Editor

A great tool for experimenting with programming languages is the Ohm Editor.

[Figure: screenshot of the Ohm Editor (ohm-editor.png)]

In the upper left panel, design your grammar. You can load/save from your browser’s local storage, and even publish gists to GitHub. In the upper right panel, enter test cases: both tests you want to succeed (thumbs up) and those you want to fail (thumbs down). The bottom panel is an interactive concrete syntax tree for the currently selected test case.

This tool will save you a lot of time.

Things To Think About

Okay, I’m just going to rattle off a bunch of things that come into my head that you may or may not want to think about when you design your language:

Theoretical Concerns

Moving on from our particular example, let’s look at things from a more academic perspective and consider real-life programming language definitions. There seem to be three ways to produce a language definition:

Some language definitions are sanctioned by an official standards organization (like ISO, IEC, ECMA, ANSI, etc.) while some don’t even care about standardization.

Exercise: Create a bibliography of the official standards for the major programming languages.

Usually a language is defined by considering its:

Syntax: structure

Semantics: meaning

Pragmatics: usage

Syntax

We have a lot of options for defining a syntax:

The first five forms are all equivalent. They describe exactly the class of context-free languages. PEGs capture a different set of languages, including some context-sensitive languages like $a^nb^nc^n$.
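To make the PEG claim concrete, here is a minimal Python sketch of a recognizer for $a^nb^nc^n$ (assuming $n \ge 1$). It hand-codes a commonly cited PEG for this language: S ← &(A 'c') 'a'+ B !. with A ← 'a' A? 'b' and B ← 'b' B? 'c'. The function names `nested` and `matches_anbncn` are made up for this illustration; this is not how any particular PEG tool is implemented.

```python
from typing import Optional


def nested(s: str, i: int, first: str, second: str) -> Optional[int]:
    """PEG rule  X <- first X? second.  Returns the end position or None."""
    if i < len(s) and s[i] == first:
        j = nested(s, i + 1, first, second)
        if j is None:
            # The optional inner X did not match, so it consumes nothing.
            j = i + 1
        if j < len(s) and s[j] == second:
            return j + 1
    return None


def matches_anbncn(s: str) -> bool:
    """S <- &(A 'c') 'a'+ B !.   with A = a^k b^k and B = b^k c^k."""
    # &(A 'c'): syntactic lookahead, checks a^k b^k followed by 'c'
    # without consuming any input.
    j = nested(s, 0, "a", "b")
    if j is None or j >= len(s) or s[j] != "c":
        return False
    # 'a'+: consume the run of a's (at least one).
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    if i == 0:
        return False
    # B: match b^k c^k, then !. requires that we are at end of input.
    k = nested(s, i, "b", "c")
    return k is not None and k == len(s)
```

The lookahead predicate does the work a context-free grammar cannot: it checks that the a's and b's balance, and then the B rule independently checks that the b's and c's balance.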

It turns out general parsing of CSGs is hard or inefficient. So we normally give a grammar for the context-free parts and leave the context-sensitive parts, like “variables must be declared before being used,” to the semantics.
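As a rough illustration of leaving a context-sensitive rule to a later semantic pass, the Python sketch below checks "declared before use" over an already-parsed program. The program representation (tuples such as `("let", name, expr)` and `("print", expr)`) and the function name `undeclared_uses` are hypothetical, chosen only for this example.

```python
def undeclared_uses(program):
    """Return the variable names that are used before being declared.

    Hypothetical program shape:
      statement  := ("let", name, expr) | ("print", expr)
      expression := int | name (a str) | ("+", expr, expr)
    """
    declared = set()
    errors = []

    def check(expr):
        if isinstance(expr, str):
            # A variable reference: it must already be declared.
            if expr not in declared:
                errors.append(expr)
        elif isinstance(expr, tuple):
            _, left, right = expr
            check(left)
            check(right)
        # Integer literals need no checking.

    for stmt in program:
        if stmt[0] == "let":
            _, name, expr = stmt
            check(expr)          # the initializer is checked first
            declared.add(name)   # only then does the name become visible
        elif stmt[0] == "print":
            check(stmt[1])
    return errors


program = [
    ("let", "x", 1),
    ("print", ("+", "x", "y")),  # y is used here...
    ("let", "y", 2),             # ...but only declared here
]
print(undeclared_uses(program))  # ['y']
```

The grammar for such a language would happily accept this program; the rule is enforced by walking the tree with a set of names seen so far.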

Exercise: Find out if there are any known complexity bounds (upper or lower) for parsing a context-sensitive language.

Syntax is usually (but not always) divided into:

Semantics

A language’s semantics is specified by mapping its syntactic forms (often abstract syntax tree fragments) into their meaning. Common approaches include:

A hugely important distinction is that between:

Example: You’ve probably heard the distinction between “static” and “dynamic” before. Recall that a statically typed language is one in which type checking is done prior to program execution, and a dynamically typed language is one in which type checking is done during program execution. Most languages do a little of both, but one or the other usually predominates.
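The difference can be sketched in Python with a toy expression language. Everything here is an assumption made for illustration: the tuple encoding of expressions and the names `type_of` (a checker that runs before execution) and `evaluate` (an interpreter that checks types as it runs).

```python
def type_of(expr):
    """Static check: compute a type without running the expression."""
    if isinstance(expr, bool):   # test bool first: in Python, bool is an int
        return "bool"
    if isinstance(expr, int):
        return "int"
    op = expr[0]
    if op == "+":
        if type_of(expr[1]) == "int" and type_of(expr[2]) == "int":
            return "int"
        raise TypeError("+ requires two ints")
    if op == "if":
        if type_of(expr[1]) != "bool":
            raise TypeError("condition must be a bool")
        then_type, else_type = type_of(expr[2]), type_of(expr[3])
        if then_type != else_type:
            raise TypeError("branches must have the same type")
        return then_type
    raise ValueError(f"unknown expression: {expr!r}")


def evaluate(expr):
    """Dynamic check: types are inspected only when an operation runs."""
    if isinstance(expr, (bool, int)):
        return expr
    op = expr[0]
    if op == "+":
        left, right = evaluate(expr[1]), evaluate(expr[2])
        for value in (left, right):
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError("+ requires two ints")
        return left + right
    if op == "if":
        condition = evaluate(expr[1])
        if not isinstance(condition, bool):
            raise TypeError("condition must be a bool")
        return evaluate(expr[2] if condition else expr[3])
    raise ValueError(f"unknown expression: {expr!r}")


# The else branch is ill-typed, but it is never executed.
suspicious = ("if", True, 1, ("+", 1, True))
```

Calling `evaluate(suspicious)` returns 1, because the faulty branch never runs, while `type_of(suspicious)` raises a `TypeError` before anything is executed. That is the trade-off in miniature: the static check rejects a program that would have run without trouble, and the dynamic check lets a latent error sit undetected until the branch is taken.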

Pragmatics

Pragmatics does not affect the formal specification of programming languages. However, pragmatic concerns must guide your design of a programming language, if you want it to be easy to read, easy to write, and able to be implemented efficiently. Pragmatics encompasses: