Types

Programs manipulate values, and manipulate them according to their type. There is more to the study of types than you might think.

Why Types?

We have an intuition that integers are different from strings which are different from people which are different from books which are different from sockets and so on. These differences arise from behavior: You can open a socket, but not an integer; you can request the parents of a person, but not a dictionary; you can push an object onto a stack, but not onto character.

This idea gives rise to a few different ways to define the term type:

A value’s type constrains the way it may be used in a program.

More philosophically:

Types impose constraints on what we can and cannot say.

We can also get algebraic:

A type consists of a set of values and a set of allowable operations.

Since humans naturally classify things (give them types) we can use these types in our languages, making the source more closely resemble our understanding of the domain. We even classify types themselves:

typehierarchy.png

This sketch has little to do with any real language, since the set of types and their organization differs radically among languages. More on this later, after we do a little theory.

Type Systems

In programming language theory, a language’s type system answers questions such as:

  1. Is the set of types fixed, or can you create new ones?
  2. Are types extra-lingual, or are they themselves values? If values, are they first-class? Also if values, do they themselves belong to one of possibly many types or is there just one type? If not, can they be categorized into different, um, say, typeclasses?
  3. What exactly are the types of the language, and how are new types built?
  4. How do we infer the type of an expression (gets tricky with variables)
  5. How do we know when two types are exactly the same?
  6. How do we know when two types are compatible?
  7. What do we do when expressions are used in a way inconsistent with their type?

Fixed Type Systems

Some languages predefine the set of types and do not let you define new ones. These include:

JavaScript really does have only eight types, but it kind of feels like it has more. For example:

class Dog { bark() { return 'woof'; } }
class Rat { squeak() { return 'peep'; } }
const d = new Dog();
const r = new Rat();
d.constructor              // Dog
r.constructor              // Rat
typeof d                   // 'object'
typeof r                   // 'object'

So in JavaScript you can make lots of different kinds of objects (even arrays, functions, dates, and regular expressions are objects), but all of these objects have the same type. They have different constructors, but constructors aren’t types.

Are Types Values?

In JavaScript, the value 100 has the type “number”, but this “number type” itself is not a JavaScript value. Types in JavaScript are extra-lingual. You can ask for an expression’s type, but you’ll just get back a string:

typeof 3 === 'number'
typeof 'hello' === 'string'
typeof { x: 1, y: 2 } === 'object'
typeof typeof 3 === 'string'

Ditto for Lua:

type(3) == 'number'
type('hello') == 'string'
type({ x=1, y=2 }) == 'table'
type(type(3)) == 'string'
In Erlang, you can’t even ask for the type, but there are built-in functions to tell you whether or not an expression has a type:
is_atom(ten).
is_integer($a).
is_float(-3.55e-8).
is_function(fun (X) -> X*X end).
is_reference(make_ref()).
is_tuple({dog, "Nika", 5, 'G-SHEP'}).
is_list("a string").
This is probably a good thing, because as we’ll see, values can have multiple types at the same time. So asking for the type of an expression might be an indication of a type system that is less sophisticated than it could be.

In C, you can't even ask or check for a type, but you can see them! The types exist in the code, but they are not values that can be assigned to variables or passed as arguments or returned from functions.

Exercise: Try writing C code that uses int as a value. What problems do you run into?

In Python, on the other hand, the value 100 has the type int, and int itself is a value in Python. In fact it has the type type. And, you guessed it, type is a value whose type is... yes, type!

type(3) == int
type('hello') == str
type({ 'x':1, 'y':2 }) == dict
type(int) == type
type(type) == type
type(type(type(type(3)))) == type

Same with Ruby, though in Ruby you tend to see classes more than types:

3.class == Integer
'hello'.class == String
{x: 1, y: 2}.class == Hash
Integer.class == Class
Class.class == Class

Java, too, has objects representing classes:

System.out.println(new Integer(3).getClass());        // class java.lang.Integer
System.out.println(int.class);                        // int
System.out.println("hello".getClass());               // class java.lang.String
System.out.println(new int[]{1, 2, 3}.getClass());    // class [I
System.out.println(String.class);                     // class java.lang.String
System.out.println("hi".getClass().getClass());       // class java.lang.Class

Augh, wait, WTF...we have types and now classes?

Types vs. Classes

So what is this thing we call a class? A little hard to define, maybe.... Some languages conflate the notion of type and class, but most people agree there is a difference.

TYPECLASS
Is all about behaviorIs about both structure and behavior
An object can have many typesAn object belongs to exactly one class
Is about interfacing and usageIs about construction, and things like fields/properties and methods
Example: A string object in Java has multiple types, including String, Comparable, and Object. But only one class: String.
String s = "hello";                          // "hello" ∈ type String
Comparable c = s;                            // "hello" ∈ type Comparable
Object o = s;                                // "hello" ∈ type Object
System.out.println(o.getClass().getName());  // THE class of "hello" is String

In most languages, declaring a class gives rise to a type. The exceptions to this rule are JavaScript and CoffeeScript, where the word class does not make a type: instead it is a funny way of declaring a function and a prototype.

Other languages are similar to Java in having a class construct that makes a type, but they also have ways to make types that “mixin” behaviors, for example Ruby has modules and Swift has protocols. Go has structs for the class-concept and interfaces for the pure behavior-only types.

Type Expressions

In many languages, you get a set of basic types and mechanisms for making new types. Some examples:

Language Basic Types Type formers
C char, signed char, unsigned char, short, unsigned short, int, unsigned, long, unsigned long, long long, unsigned long long, float, double, long double, _Bool, float _Complex, double _Complex, long double _Complex, float _Imaginary, double _Imaginary, long double _Imaginary enum, *, [], (), union, struct
Java boolean, byte, char, short, int, long, float, double interface, class, enum, []
Standard ML unit, bool, int, word, real, char, string ->, *, list, option, exn, ref, frag, datatype, {}
Python NoneType, NotImplementedType, ellipsis, int, bool, float, complex, str, bytes, tuple, list, bytearray, set, frozenset, dict, function, generator, method, classmethod, staticmethod, module, slice, range, type class
Go bool, int8 (byte), int16, int32 (rune), int64, int, uint8, uint16, uint32, uint64, uintptr, float32, float64, complex64, complex128, string, error [], map, *, func, struct, interface, chan
Rust bool, i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize, f32, f64, char, str, ! (T1, T2), [T ; N], [T], struct, enum, union, ->, &, &mut, *const, *mut, trait, impl
Swift Bool, Int8, Int16, Int32, Int64, Int, UInt8, UInt16, UInt32 UInt64, UInt, Float, Double, Character, String, (T1, T2), T?, [T], Set<T>, [K:V], T->U, struct, class
Haskell Bool, Int, Integer, Float, Double, Char [a], (a,b), a->b, data

So it looks like one of the most common ways to introduce a new type is to use that class notion we saw above. As in Python, for example:

>>> class C(object): pass
...
>>> class D(object): pass
...
>>> c, d = C(), D()
>>> type(c), type(d), type(c) == type(d)
(<class '__main__.C'>, <class '__main__.D'>, False)

But something cooler and more computer sciencey are the type operators of the ML-family of languages, like list (for lists), * (for tuples), and -> (for functions):

7                                     (*   int                       *)
(4, "dog")                            (*   int * string              *)
[5, 3, 2, 7]                          (*   int list                  *)
[ [], [3,3,3], [], [1] ]              (*   int list list             *)
[(3, 4.0),(1, 5.5)]                   (*   (int * real) list         *)
(5, [2.2, 1.0E~4])                    (*   int * real list           *)
fn x => x + 8                         (*   int -> int                *)
fn x => fn y => x ^ Int.toString(y)   (*   string -> int -> string   *)

Mmmm, so here’s something interesting. SML has type called int list. Does Python? No, in Python, the type is just called list. So [1, "hello"] is legal in Python but illegal in SML. SML has parameterized types.

Parameterized Types

In Standard ML, the types int list and string list and ((int * string) -> bool) list are all different, but they are all instances of the parameterized type 'a list. Lots of languages are like this. Here are some simple examples:

LanguageExample Parameterized TypeExample Instance TypeNotes
Standard ML'a liststring listType variables are 'a, 'b, 'c and so on. If the instantiating type must admit equality, then we write ''a, ''b, ''c, and so on.
'a * 'bstring * bool
'a -> 'b -> 'aint->(bool*int)->int
Haskell[a]
a->b
List Int
Int->String
Types begin with a capital letter and type variables with a lowercase letter.
JavaList<T>List<String>
SwiftSet<T>Set<String>

Here are languages that do not have parameterized types, though you think they might: Go, C. Well Go has some types that look parameterized but....

By the way, type parameters do not have to be just a simple type variable. They can be much fancier. We’ll see examples in future sections below.

Classifying Types

Just like values are classified into types, types can be classified into typeclasses. Examples:

A type is equatable, or is an eqtype, if values of the type can be tested for equality (==).

A type is comparable (a.k.a. ordered) if values of the type can be compared with < (and often also with <=, >, and >=, and often <=>).

A type is bounded if there is a minimum value of the type and a maximum value of the type.

A type is showable, or is a showtype if values of the type can be converted to strings.

A type is callable if values of its type can be called on, or applied to, other values.

A type is enumerable if it has successor and predecessor operations on its values.

A type is monoidal, or is a monoid, if it has an associative composition operator, and an identity operator for the composition.

A type is monadic, or is a monad, if values of the type “wrap” underlying values in some way (e.g. optional, list), and there is an operator to wrap a value, and an associative composition operator on functions that unwrap a value and do something to it to produce a new wrapped value, such that the wrapper is an identity for the composition.

Type Equivalence

When are two types the same?

typedef struct {
   int a;
   int b;
} Point;

typedef struct {
   int a;
   int b;
} Pair;

Point x;
Pair y;

Do x and y have the same type? Should we say “yes” (because they have the same structure), or “no” (because their types have different names, and furthermore appear in different declarations)? These two approaches to determining whether types are the same are structural and named equivalence:

Structural Equivalence Name Equivalence
Check equivalence by expanding structures all the way down to basic types Strict: Every type declaration defines a new type
Loose: Factor out aliases

Here is an example from Michael L. Scott:

type alink = pointer to cell;
subtype blink = alink;
p, q : pointer to cell;
r    : alink;
s    : blink;
t    : pointer to cell;
u    : alink;

-- If structural: [p q r s t u]

-- If strict name: [p q] [r u] [s] [t]

-- If loose name: [p q] [r s u] [t]

Loose name equivalence is pretty common: we like to distinguish things like Point and Pair above but want the flexibility of aliasing. In Ada we can do both explicitly:

type Height is new Integer;
type Weight is new Integer;
-- Can’t add heights and weights

subtype Height is Integer;
subtype Weight is Integer;
-- Now you can freely mix them

ML seems to use structural equivalence:

type pair = int * int;
type point = int * int;
val x: point = (1, 4);
val y: pair = x;
type person = {name: string, age: int};
type school = {name: string, age: int};
val p: person = {name="Alice", age=22};
val s: school = p;

But that’s because type does not define a new type! Only a datatype or abstype declaration creates a new type.

abstype person = P of string * int with
  fun new_person (s, i) = P(s, i)
  fun name (P(s, i)) = s
  fun age (P(s, i)) = i
end;

Type Compatibility

When can a value of type A be used in a context that expects type B?

Possible answers:

So.... In:

int a;
float b;
float c;
c = a + b;

Is this

A language that allows this is said to coerce ints to floats, and we say "int is compatible with float".

Important definitions:

Type Conversion
Explicit operation that takes in an object of one type and returns an object of a different type that has the "same value" (not necessarily the same bit pattern).
Type Coercion
Implicit
Non-converting Type Cast
"Reinterpret" an object as an object of another type by preserving its bit pattern, regardless of value.

Type Inference

How do we determine the type of an expression?

ML seems to do it the most general way possible. It looks at all the types of the primitive expressions, then at the types of arguments that the subroutines and operators expect, and work your way out to the whole expression.

1   fun fib (n) =
2       let fun fib_helper (f1, f2, i) =
3           if i = n then f2
4           else fib_helper (f2, f1+f2, i+1)
5       in
6           fib_helper (0, 1, 0)
7       end;

If there’s not enough information to resolve to a known type, ML’s inferencer will bring in type variables. Note how clever it is:

fun I x = x;                              'a -> 'a
fun p x y = y                             'a -> 'b -> 'b
fun first (x, y) = x;                     'a * 'b -> 'a
fun q x = (hd o hd) x;                    'a list list -> 'a
fun c x y = if x = y then "y" else "n";   ''a -> ''a -> string

In ML, a type variable beginning with two primes can only be instantiated with a type that admits equality.

ML’s inferencing is more powerful than that of other languages, Go, Rust, and Scala require that you put types on parameters.

scala> var x = 3;
x: Int = 3
scala> var y = x + 3;
y: Int = 6
scala> def f(x) = x + 3;
<console>:1: error: ':' expected but ')' found.
       def f(x) = x + 3;
              ^
scala> def f(x:Int) = x + 3;
f: (x: Int)Int

Go example.

Rust example.

Exercise: Two things complicate type inference are overloading and coercion. Give concrete examples of complications that arise for each.

Type Checking

What if you use an expression in a manner inconsistent with its type, like trying to compute the age of a string (instead of a person). Should the evaluation result in an error, or return a "best guess"?

Strong Typing vs. Weak Typing

Strong TypingWeak Typing
Type clashes among unrelated types result in errors. they may be compile time, or even run time (e.g. NoSuchMethodError, or ClassCastException) but they are errors. Type clashes don’t really exist... the language implementation will try to cast the argument into some reasonable (!) type and carry out the operation.
Alternate definition: you can’t really circumvent the type system. Example: In Java, even if you explicitly add a cast to cast a string to an int, you get an error. Alternate definition: you can easily circumvent the type system. Example: In C++, you can cast a pointer to an int to a void*, then to a string*, and you get away with it.

Some people would argue strong vs. weak typing isn’t a terribly useful thing to worry about. the static/dynamic dimension is a bigger deal.

Static Typing vs. Dynamic Typing

Static TypingDynamic Typing
Type checking done at compile time

Once a variable’s type is known, it can only be assigned expressions of that type (including related types that can be coerced to it, of course).

Type checking done at run time

Because checking is deferred until run-time a variable can be assigned an expression of one type and then later an expression of another type.

var x;
x = 2;
print x + 5;
x = "dog";
print concat(x, "house");

It is common for languages to have some static typing and some dynamic typing.

Manifest Typing vs. Implicit Typing

Manifest TypingImplicit Typing
Types of variables must be given in their declarations.
void f(int x, list<int> z) {
  int y = x * x;
  z.push_back(y);
}
the types of variables will be inferred from context
fun f x z =
  let y = x * x in z @ [y] end
x must be an int because of the "*" operator, so y must be an int as well, and then z must be an int list, and finally f must be int -> int list -> int list.

What about some popular languages?

Exercise: Classify Classic Lisp, Smalltalk, Go, and Rust

Declarations and Static Typing

In order to do static typing, you must sometimes specify types when declaring variables, functions, parameters, etc. How is this done?

Exercise: Rewrite the examples in the C and Go comparision article in Rust.

Type Checking with Type Variables

Suppose f has type 'a * int -> 'b * 'b -> string. then the following expressions are legal

    f(4, 5)("sd","de")
    f(1,1)(2,2)
    f([],5)([],[4,4])

but these are not

    f(3,2)(4,"p")
    f((3,2),5)([5.5,2.2],[1,6])

Further Reading

the literature on strong/weak, static/dynamic, and manifest/implicit typing is enormous, as are the debates and flame wars. Some good reading:

Dependent Types

A dependent type is a type whose definition depends on a value. Examples:

Does C++ have dependent types? I mean, you can do:

template <int n>
class BoundedList {
public:
  int contents[n];
}

but all types that instantiate this template are created at compile time.

Languages that fully support dynamic dependent types can eliminate many logic errors at compile time! (The type system essentially lets you formulate compile-time proofs that a program will have a certain run time behavior, in terms of the values it will produce.)

Start your study of dependent types at this Stack Overflow question then go to Wikipedia.

Overview of Common Types

Numeric Types

Lots of variety here:

Enumerations

In type theory, an enumeration is a type with a fixed number of distinct values, ordered from smallest to largest. Typical examples are:

The Boolean type is an enum of the values False and True.

Many languages have a character type which is essentially an enumeration. Each character’s "value" is its code point.

Ranges

Some languages allow you to make types such as:

Sum Types and Product Types

Suppose we have two types, Boolean and Byte, where:

Now we form new types:

Note Boolean has 2 values, Byte has 256 values. Boolean + Byte has 2+256 = 258 values, and Boolean × Byte has 2 * 256 = 512 values. Boolean + Byte is a sum type and Boolean × Byte is a product type.

Product types are very common; Python calls them tuples. Haskell and ML just let you write the product type name directly, e.g. Int * String.

Often sums and products can be tagged.

Option Types

Option types are one solution to the billion dollar mistake. Basically an object of type T? is pretty much the sum type of T and the type containing a single value representing null.

Records, a.k.a. Structs

Tagged product types go by many names: record, struct, object, hash, etc. Values of such types are thought of as key-value pairs.

Unions

  datatype primary_color = Red | Blue | Green

  Red                                   primary_color
  (Red, 4)                              primary_color * int

Arrays

Usually by array we mean a collection of elements indexed by integers or some other enumerated type, that has constant time access of any element given its index. Contrast this with a list, which needn’t have constant time access, and with a map (a.k.a dictionary or associative array), which can be indexed by anything, not just integers or related enumerated types.

Array Bounds

Lifetime and Shape

  Static Bounds Bounds Set at Elaboration Bounds Can be Changed
(Dynamic Shape)
Global
Lifetime
C globals
Local
Lifetime
C locals Ada locals
Arbitrary
Lifetime
Java Perl, APL

Implementation

Slices

Several languages (Perl, APL, Fortran 90) have nice syntax for slices. Fortran 90 allows (these examples are from Scott):

matrix(3:6, 4:7)        columns 3-6, rows 4-7
matrix(6:, 5)           columns 6-end, row 5
matrix(:4, 2:8:2)       columns 1-4, every other row from 2-8
matrix(:, /2, 5, 9/)    all columns, rows 2, 5, and 9
Exercise: Give the equivalent expressions in Julia.
Exercise: How are these represented in Python’s numpy library?

Good to know: Go uses the term “slice“ to mean something else.

Strings

Most (all?) languages have them, but care is needed:

Sets

Mathematically, a set is an unordered collection of unique elements.

Pointers

A pointer is, more or less, an object through which you reference some other object. They appear in system languages and low level languages.

int x = 5;
int* p = &x;
int* q = NULL;
int* r = new int;
int* s = new int[100];

cout << *p << *r << s[20];
cout << *q; // crash

Basics

Reference and Dereference Syntax

Can be always explicit, always implicit, or sometimes-implicit (like in Ada).

pointer.png

If the pointer is called p (or $p in Perl), then

Inthe referent is calledand the field is called
C*p(*p).x or p->x
Adap.allp.all.x or p.x
Modula-2p^p^.x
Javapp.x
Perl%$p$$p{x} or $p->{x}

If the referent is called p (or %p in Perl), then

Inthe pointer is calledand the field is called
C&pp.x
Adap'accessp.x
Perl\%p$p{x}

Memory Leaks and Dangling Pointers

In languages like C requiring explicit allocation and explicit deallocation, it is possible to:

How can we prevent such things?

Garbage Collection

Covered separately. For now, you can read Wikipedia’s article on garbage collection, and an interesting developerworks article on Java performance and garbage collection.

Pointers and Arrays in C

Sometimes, these ideas get conflated in C. After all:

e1[e2] is the same as *(e1 + e2)

For definitions these are completely different things:

  int *x;           /* is totally different from: */
  int x[100];

  int *a[n];        /* is totally different from: */
  int a[n][100];

But for declarations, at least in parameter declarations, you can blur the distinction:

  void f(int* a) { ... }
  void g(int b[]) { ... }

An array of ints, or pointer to an int, can be passed to either.

Streams

The term stream is a bit overloaded. Abstractly, a stream:

Sometimes we use the term stream to refer to files.

In Java, streams are their own thing. They work like the streams described above (they are processed sequentially), but are in a pretty large subsystem of their own.

Regular Expressions

Covered separately.

Subroutines

Subroutines are often objects with their own types. We’ll see these later when we discuss subroutines.

Processes and threads

Will be covered later when we discuss concurrency.

Orthogonality

Can an object of any type...

Are types themselves objects?

Issues in Orthogonality

Case Studies

Rust

Rust’s type system is pretty sophisticated. Read about it here.

Julia

Swift

Elm