Standard ML

The ML languages (CAML, Standard ML, OCaml, F#) might surprise you. Lots of great stuff here.

Logo credit

Overview

ML was, long ago, a small programming language. Today, ML is the name for a family of languages that include Standard ML (a.k.a SML), Objective CAML (a.k.a OCaml), F#, LazyML, Alice, and Elm.

The original ML and its immediate descendants were never really widely used, but they have been enormously influential. Many popular languages have adopted several of ML’s amazing innovations.

You’ll find some good stuff here:

Standard ML:

More reasons: Read this overview of features.

Getting Started

The traditional “Hello, world” program:

hello.sml
print "Hello, World\n";

Running Standard ML Programs

There are many implementations of Standard ML, including Moscow ML, Standard ML of New Jersey, MLton, Poly/ML, SML.NET, MLJ and ML Kit. These notes only cover Moscow ML.

You can run the code by dropping it into an interactive shell, or by using a compiler to produce executables.

The Interactive Shell

Just type your declarations and expressions into the interactive shell and the system will respond. Here is an example session (assuming the file hello.sml contains the code above):

$ mosml
Moscow ML version 2.10
Enter `quit();' to quit.
- 4;
> val it = 4 : int
- "hello " ^ " there";
> val it = "hello  there" : string
- fun fact n = if n = 0 then 1 else n * fact(n - 1);
> val fact = fn : int -> int
- fact 5;
> val it = 120 : int
- val x = 12 and y = ~5;
> val x = 12 : int
  val y = ~5 : int
- x - y * 2;
> val it = 22 : int
- use "hello.sml";
[opening file "hello.sml"]
Hello, World
> val it = () : unit
[closing file "hello.sml"]
> val it = () : unit
- length [4, 3, 8, 7, 3];
> val it = 5 : int
- quit();

Note how nice the system is.... If you don’t specify types, it figures them out for you (as long as what you wrote makes sense). If you type in an expression directly, it turns it into a declaration binding the result to the special name it.

Compiling

The Moscow ML compiler is called mosmlc. It won’t compile our “Hello, World” program because it doesn’t like standalone expressions: it requires declarations:

helloworld.sml
val _ = print "Hello, World\n";

Then on Windows you can enter:

> mosmlc helloworld.sml -o helloworld.exe

and on Unix you can enter:

$ mosmlc helloworld.sml -o helloworld
and you have your executable. To compile and run, of course, that would be:
> mosmlc helloworld.sml -o helloworld.exe && helloworld
or:
$ mosmlc helloworld.sml && ./a.out

Here’s a slightly more interesting program, illustrating a bit more of the language:

reverseargs.sml
(*
 * A little program that writes the command line arguments from last
 * to first, to standard output. It is longer than necessary because
 * it is a teaching example.
 *)

(* Returns the reversal of a list. Warning: wheel reinvention. *)

fun reverse [] = []
  | reverse (h :: t) = reverse t @ [h];

fun concat_space s = s ^ " ";

(* Prints each command line arg, suffixed with a space. *)

val _ =
  let
    val args = CommandLine.arguments()
  in
    map (print o concat_space) (reverse args);
    print "\n"
  end;

Program Structure

A program is a sequence of declarations. Declarations begin with:

Most program entities (types, exceptions, values, etc.) are defined in structures. For example, the structure Timer contains the type cpu_timer and the functions startCPUTimer and checkCPUTimer, among other things, so you need to write Timer.cpu_timer and Timer.checkCPUTimer, etc. But entities in the package General (which contains types like int, string, and real and functions like <=, not, and mod do not have to be qualified with the package name.

timerexample.sml
(*
 * Trivial example of timing.
 *)

fun count(n) =
  if (n > 0) then count(n - 1) else ();

val _ =
   let
     val t = Timer.startCPUTimer()
   in
     count(10000000);
     print (Time.toString(#usr(Timer.checkCPUTimer(t))) ^ "\n")
   end;

Lexical Matters

Reserved Words

abstype and       andalso   as     case  do       datatype   else
end     eqtype    exception fn     fun   functor  handle     if
in      include   infix     infixr let   local    nonfix     of
op      open      orelse    raise  rec   sharing  sig        signature
struct  structure then      type   val   where    with       withtype
while   (         )         [      ]     {        }          ,
:       :>        ;         ...    _     |        =          =>
->      #

Identifiers

Identifiers are either:

But reserved words are excluded. And notice no mixing of alphanumeric and symbolic characters! You can’t have the identifier |-o-|, though you can have |-*-|.

Comments

Between (* and *) and they can nest!

Types

Here are some of the types:

Name Examples Notes
unit () This is the only value of this type
bool true
false
These are the only two values of the type
int 0
~4
999999
0xfa33bc
~0x1fcf5
Unary negation is a tilde, so as not to be confused with -, which denotes binary subtraction
word 0w0
0w4
0wx4ccab
real 0.7
~0.7
3.32E5
3E~7
~3E~7
Note the tilde for negation
char #"s"
string "hello"
α -> β fn x => x + 1
fn x => fn y => 1 - x / y
The type of functions taking an argument of type α and returning a value of type β
record types {name="Masha", age=25}
{re=6.2, im=~3.0}
{1=4, 2="d", 3=false}
A built-in type former
α * β (4, "d", false) Abbreviation for records whose fields are numbers starting with 1
α list nil
4::3::12::5::nil
[4, 3, 12, 5]
[ ]
α option SOME 2
NONE
exn Div
Fail "dunno"
This is a very special datatype because new constructors can be added at runtime
α ref ref 2 The gateway into mutable state
α frag QUOTE "hello"
ANTIQUOTE 9
Powerful stuff

Type Variables

Standard ML, like all the MLs takes type inference to a whole new level, compared to languages like C#, Go, Rust, and Swift. Let’s learn by example. Watch how SML infers these types, and pay attention to the introduction of type variables (you’ll know them when you see them):

[true, true, false]                (*  bool list                       *)
[]                                 (*  'a list                         *)
fn s => length s * 2               (*  'a list -> int                  *)
fn x => (15.2, x)                  (*  'a -> real * 'a                 *)
{x = 3, y = 5}                     (*  {x : int, y : int}              *)
fn x => fn y => fn z => (z, x, y)  (*  'a -> 'b -> 'c -> 'c * 'a * 'b  *)
fn (x, y) => x = y;                (*  ''a * ''a -> bool               *)
(fn s => SOME s)[]                 (*  'a list option                  *)

Types with a double apostrophe prefix are equality types. Equality doesn’t work on all types (what does it mean for two functions to be equal, right?). For example, the type int->int will match (unify) with 'a, but it will not unify with ''a:

$ mosml
Moscow ML version 2.10
Enter `quit();' to quit.
- fun addZero x = (x, 0);
> val 'a addZero = fn : 'a -> 'a * int
- addZero(fn n => n * 3);
> val it = (fn, 0) : (int -> int) * int
- fun equal(x, y) = x = y; 
> val ''a equal = fn : ''a * ''a -> bool
- equal(fn n => n * 3, fn n => n * 3);
! Toplevel input:
! equal(fn n => n * 3, fn n => n * 3)
!          ^^^^^^^^^^
! Type clash: match rule of type
!   'a -> 'b
! cannot have equality type ''c

Interested in the details of the SML type system? Read about the Hindley-Milner Type System. It’s pretty cool.

Defining Your Own Types

The type declaration does not make a new type, just a type abbreviation:

type intpair = int * int;
type person = {name: string, birthday: date, weight: real};
type simplefun = real -> real;
type text = string;
type intset = bool list;
type coin_flipping_results = bool list;

Here coin_flipping_results and intset are exactly the same type.

You need to use datatype or abstype to create a new type, distinct from any other. Datatypes are defined with constructors:

datatype weekday = Monday | Tuesday | Wednesday | Thursday | Friday;

datatype file_descriptor = Stdin | Stdout | Stderr | Other of int;

datatype 'a tree = Empty
                 | Node of 'a * 'a tree * 'a tree;

datatype Shape = Circle of {radius: real}
               | Rectangle of {height: real, width: real}
               | Polygon of (real * real) list;

abstype 'a stack = Data of 'a list
with
    exception Underflow;
    val newStack = Data []
    fun push (Data s) x = Data (x::s)
    fun pop (Data []) = raise Underflow
      | pop (Data (_::t)) = Data t
    fun top (Data []) = raise Underflow
      | top (Data (x::t)) = x
    fun listToStack s = Data s
    fun stackToList (Data s) = s
end;

The abstract type is just a datatype that hides its constructors. But its probably better to use structures anyway.

Functions

Functions are just regular values who happen to have a type like 'a -> 'b. Functions must have exactly one parameter and exactly one return type. The syntax of a function is:

fn pattern => exp ('|' pattern => exp)*

Function calls are straighforward:

(fn x => x + 6) 20;
(fn [] => false | (h::t) => true) [4, 5, ~3];
(fn _ => 0) (4, "dog", [4.5, 1.1], (1,1));
(fn 5 => 0 | x => x) 5;
(fn 5 => 0 | x => x) 12;

Note that each match is tried in order. Order matters, so be careful.

Exercise: Why is it not a problem that functions can have one and only one parameter, and one and only one return value? How do we specify zero parameters? Ten parameters? No return values? Three return values?

Defining Functions

Because functions are values, you declare them just like any other value. You can take advantage of the fact that each declaration can use declarations that precede it:

val two = 2;
val addSix = fn x => x + two + 4;
val square = fn x:real => x * x;
val magnitude = fn (x,y) => Math.sqrt(square x + square y);
val rec fact = fn n => if n = 0 then 1 else n * fact (n-1);

You can use fun for convenience (it’s basically val rec):

fun addSix x = x + 6;
fun square (x:real) =  x * x;
fun magnitude (x,y) = Math.sqrt(square x + square y);
fun fact n = if n = 0 then 1 else n * fact (n-1);

You can use multiple patterns with fun too; it sure makes writing recursive functions easy. And pattern-based definitions are much preferred over if-expressions.

fun fact 0 = n
  | fact n = n * fact(n - 1);

fun pow (0, _) = 1.0
  | pow (n, x) = x * pow(n - 1, x);

fun size Empty = 0
  | size (Node (x, left, right)) = 1 + size left + size right;

fun preorder Empty = []
  | preorder (Node (x, left, right)) = [x] @ preorder left @ preorder right;

Currying and Uncurrying

The functions fn (x,y) => x + y and fn x => fn y => x + y are similar. The former is the uncurried version; the latter is curried. Curried functions allow “partial application”:

fun add x y = x + y;
fun addSix = add 6;
add 6 5;                    (* returns 11 *)
addSix 5;                   (* also returns 11 *)

Higher Order Functions

A higher order function is one that takes a function as a parameter and/or returns a function as its result. The curried plus function above is a higher order function, because it takes in an integer and returns a function.

Operators

Operators are really just functions that are given precedence and fixity in a infix, infixr or nonfix directive. These are some of the infix operators defined in the standard library. (Note that many of the built-in operators are overloaded, so the specification has “fake types” — wordint means either int, word or word8; num means wordint or real; numtxt means num, char or string.)

Precedence Id Type Description Associativity
7 * num * num -> num product L
/ real * real -> real floating-point quotient L
div wordint * wordint -> wordint quotient (rounded toward -infinity) L
mod wordint * wordint -> wordint remainder (of div) L
6 + num * num -> num sum L
- num * num -> num difference L
^ string * string -> string concatenation L
5 :: 'a * 'a list -> 'a list push to head of list R
@ 'a list * 'a list -> 'a list append lists R
4 = 'a * 'a -> bool equal to L
<> 'a * 'a -> bool not equal to L
< numtxt * numtxt -> bool less than L
<= numtxt * numtxt -> bool less than or equal to L
> numtxt * numtxt -> bool greater than L
>= numtxt * numtxt -> bool greater than or equal to L
3 := 'a ref * 'a -> unit assignment L
o ('b*'c) * ('a*'b) -> ('a*'c) composition L
0 before 'a * 'b -> 'a return first argument L

And you can define your own:

fun |-*-| (x, y) = 2 * x + 3 * y;
infix 5 |-*-|;
10 |-*-| 4;                            (* returns 32 *)
1 |-*-| 2 |-*-| 3;                     (* returns 25 *)
1 |-*-| (2 |-*-| 3);                   (* returns 41 *)
5 + 3 |-*-| 6 before 8;                (* returns 34 *)

You can abuse the built-in operators:

- infix 2 *;
> infix 2 *
- 4 + 8 * 1 + 4;
> val it = 60 : int
- val op+ = op-;
> val + = fn : int * int -> int
- 5 + 2;
> val it = 3 : int

Modules

Remember, useful programming languages provide modules for:

Structures

ML’s module system is pretty sophisticated. The basic element is the structure:

structure Set =
struct
  datatype 'a set = Items of 'a list;
  val empty = Items [];
  fun add (Items xs) x =
    if List.exists (fn y => y = x) xs then Items xs else Items(x::xs);
  fun remove (Items xs) x =
    Items (List.filter (fn y => y <> x) xs);
  fun cardinality (Items xs) = length xs;
  fun contains (Items xs, x) = List.exists (fn y => y = x) xs;
end;

The contents are accessible with dot-notation (no surprise, probably):

val s = Set.add Set.empty 7;
val s = Set.add s 8;
val s = Set.remove s 2;
Set.cardinality s;
Set.contains (s, 7);
Set.Items [8,8,4];    (* ICK!! That should probably have been hidden *)

Signatures

The previous structure exposes too much information. We can use a signature to restrict the view. We should write:

signature SET =
sig
  type 'a set;
  val empty: 'a set;
  val add: ''a set -> ''a -> ''a set;
  val remove: ''a set -> ''a -> ''a set;
  val cardinality: 'a set -> int;
  val contains: ''a set * ''a -> bool;
end;

Then we say:

structure Set :> SET =
struct
  datatype 'a set = Items of 'a list;
  val empty = Items [];
  fun add (Items xs) x =
    if List.exists (fn y => y = x) xs then Items xs else Items(x::xs);
  fun remove (Items xs) x =
    Items (List.filter (fn y => y <> x) xs);
  fun cardinality (Items xs) = length xs;
  fun contains (Items xs, x) = List.exists (fn y => y = x) xs;
end;

Now Set.Items doesn’t exist! We can now even say:

signature READ_ONLY_SET =
sig
  type 'a set;
  val empty: 'a set;
  val cardinality: 'a set -> int;
  val contains: ''a set * ''a -> bool;
end;

structure ReadOnlySet :> READ_ONLY_SET = Set;

now ReadOnlySet.empty exists, but ReadOnlySet.remove does not.

Functors

If we change the set representation from a list to a binary search tree, we would have to specify how the set elements were to be ordered. This less-than relation is an input to the structure. The way to parameterize structures in SML is with functors:

functor MakeSet (eqtype t; val lt: t * t -> bool):
  sig
    type ''a set
    val empty: t set
    val add: t set -> t -> t set
    val remove: t set -> t -> t set
    val cardinality: t set -> int
    val contains: t set * t -> bool
  end =
struct
  datatype ''a set = Empty | Node of ''a * ''a set * ''a set
  val empty = Empty;
  fun add Empty x = Node (x, Empty, Empty)
    | add (s as Node(y, left, right)) x =
        if (x = y) then s
        else if lt(x, y) then Node(y, add left x, right)
        else Node(y, left, add right x);
  fun remove s x = raise (Fail "remove not implemented");
  fun cardinality Empty = 0
    | cardinality (Node(_, left, right)) =
      1 + cardinality left + cardinality right;
  fun contains (Empty, x) = false
    | contains (Node(y, left, right), x) =
        if lt(x, y) then contains(left, x)
        else if lt(y,x) then contains(right, x)
        else true;
end;

Then:

structure IntSet = MakeSet(type t = int; val lt = op<);

val s = IntSet.add IntSet.empty 8;
IntSet.contains(s, 7);
...

The Standard Basis Library

The following structures comprise the Standard Basis Library

StructureDescription
Arraymutable constant-time-access arrays
Array2two-dimensional arrays
Binarysetbinary tree implementation of finite sets
BoolBooleans
Bytecharacter-byte conversion
Charcharacters
CharArrayarrays of characters
CharVectorvectors of characters (= strings)
CommandLineprogram name and arguments
Datemanipulation of calendar dates
FileSysinteraction with the file system
Generalvarious top-level primitives
Intoperations on integers
Listclassic list manipulation functions
ListPairoperations on pairs of lists
Mathtrigonometric functions etc.
OSoperating system information
Optionpartial functions
Pathfile-system independent path manipulation
Processmanipulating processes
Realarithmetics on floating-point numbers
SignalUnix signals
SML90top-level compatibility with SML’90
Stringstring manipulation
StringCvtconversion to and from strings
Substringmanipulation of constant-time substrings
TextIOtext input-output streams (imperative)
Timetime points and durations
Timermeasuring real time and cpu time
Unixstarting concurrent subprocesses under Unix
Vectorimmutable constant-time-access vectors
Wordwords (31-bit unsigned integers)
Word8bytes (8-bit unsigned integers)
Word8Arrayarrays of bytes
Word8Vectorvectors of bytes

Different installations will add many more structures. For example Moscow ML adds Arraysort, Binarymap, BinIO, Dynarray, Dynlib, Intset, Meta, Mosmlcgi, Mysql, Regex, Signal, Splaymap, Gdimage, among others.

The Moscow ML Library Reference is online.