Types

Programs manipulate values, and manipulate them according to their type. There is more to the study of types than you might think.

Why Types?

We have an intuition that integers are different from strings which are different from people which are different from books which are different from sockets and so on. These differences arise from behavior: You can open a socket, but not an integer; you can request the parents of a person, but not a dictionary; you can push an object onto a stack, but not onto character.

This idea gives rise to a few different ways to define the term type:

A value’s type constrains the way it may be used in a program.

More philosophically:

Types impose constraints on what we can and cannot say.

More algebraically:

A type consists of a set of values and a set of allowable operations.

Since humans naturally classify things (give them types) we can use these types in our languages, making the source more closely resemble our understanding of the domain. We even classify types themselves:

typehierarchy.png

This sketch has little to do with any real language, since the set of types and their orgranization differs radically among languages. More on this later, after we do a little theory.

Type Systems

In programming language theory, a language’s type system answers questions such as:

  1. Is the set of types fixed, or can you create new ones?
  2. Are types extra-lingual, or are they themselves values? If values, are they first-class?
  3. What exactly are the types of the language, and how are new types built?
  4. How do we infer the type of an expression (gets tricky with variables)
  5. How do we know when two types are exactly the same?
  6. How do we know when two types are compatible?
  7. What do we do when expressions are used in a way inconsistent with their type?

Fixed Type Systems

Some languages predefine the set of types and do not let you define new ones. These include:

JavaScript really does have only those 7 types, but it kind of feels like it has more. For example:

class Dog { bark() { return 'woof'; } }
class Rat { squeak() { return 'peep'; } }
const d = new Dog();
const r = new Rat();
d.constructor              // Dog
r.constructor              // Rat
typeof d                   // 'object'
typeof r                   // 'object'

So in JavaScript you can make lots of different kinds of objects (even arrays, functions, dates, and regular expressions are objects), but all of these objects have the same type. They have different constructors, but constructors aren’t types.

Are Types Values?

In JavaScript, the value 100 has the type “number”, but this “number type” itself is not a JavaScript value. Types in JavaScript are extra-lingual. You can ask for an expression’s type, but you’ll just get back a string:

typeof 3 === 'number'
typeof 'hello' === 'string'
typeof { x: 1, y: 2 } === 'object'
typeof typeof 3 === 'string'

Ditto for Lua:

type(3) == 'number'
type('hello') == 'string'
type({ x=1, y=2 }) == 'table'
type(type(3)) == 'string'
In Erlang, you can’t even ask for the type, but there are built-in functions to tell you whether or not an expression has a type:
is_atom(ten).
is_integer($a).
is_float(-3.55e-8).
is_function(fun (X) -> X*X end).
is_reference(make_ref()).
is_tuple({dog, "Nika", 5, 'G-SHEP'}).
is_list("a string").
This is probably a good thing, because as we’ll see, values can have multiple types at the same time. So asking for the type of an expression might be an indication of a type system that is less sophisticated than it could be.

In C, you can't even ask or check for a type, but you can see them! The types exist in the code, but they are not values that can be assigned to variables or passed as arguments or returned from functions.

Exercise: Try writing C code that uses int as a value. What problems do you run into?

In Python, on the other hand, the value 100 has the type int, and int itself is a value in Python. In fact it has the type type. And, you guessed it, type is a value whose type is... yes, type!

type(3) == int
type('hello') == str
type({ 'x':1, 'y':2 }) == dict
type(int) == type
type(type) == type
type(type(type(type(3)))) == type

Same with Ruby, though in Ruby you tend to see classes more than types:

3.class == Integer
'hello'.class == String
{x: 1, y: 2}.class == Hash
Integer.class == Class
Class.class == Class

Java, too, has objects representing classes:

System.out.println(new Integer(3).getClass());        // class java.lang.Integer
System.out.println(int.class);                        // int
System.out.println("hello".getClass());               // class java.lang.String
System.out.println(new int[]{1, 2, 3}.getClass());    // class [I
System.out.println(String.class);                     // class java.lang.String
System.out.println("hi".getClass().getClass());       // class java.lang.Class

Augh, wait, WTF...we have types and now classes?

Types vs. Classes

So what is this thing we call a class? A little hard to define, maybe.... Some languages conflate the notion of type and class, but most people agree there is a difference.

TYPECLASS
Is all about behaviorIs about both structure and behavior
An object can have many typesAn object belongs to exactly one class
Is about interfacing and usageIs about construction, and things like fields/properties and methods
Example: A string object in Java has multiple types, including String, Comparable, and Object. But only one class: String.
String s = "hello";
Comparable c = s;
Object o = s;
System.out.println(o.getClass().getName());

In most languages, declaring a class gives rise to a type. The exceptions to this rule are JavaScript and CoffeeScript, where the word class does not make a type: instead it is a funny way of declaring a function and a prototype.

Other languages are similar to Java in having a class construct that makes a type, but they also have ways to make types that “mixin” behaviors, for example Ruby has modules and Swift has protocols. Go has structs for the class-concept and interfaces for the pure behavior-only types.

Type Expressions

In many languages, you get a set of basic types and mechanisms for making new types. Some examples:

Language Basic Types Type formers
C char, signed char, unsigned char, short, unsigned short, int, unsigned, long, unsigned long, long long, unsigned long long, float, double, long double, _Bool, float _Complex, double _Complex, long double _Complex, float _Imaginary, double _Imaginary, long double _Imaginary enum, *, [], (), union, struct
Java boolean, byte, char, short, int, long, float, double interface, class, enum, []
Standard ML unit, bool, int, word, real, char, string ->, *, list, option, exn, ref, frag, datatype, {}
Python NoneType, NotImplementedType, ellipsis, int, bool, float, complex, str, bytes, tuple, list, bytearray, set, frozenset, dict, function, generator, method, classmethod, staticmethod, module, slice, range, type class
Go bool, int8 (byte), int16, int32 (rune), int64, int, uint8, uint16, uint32, uint64, uintptr, float32, float64, complex64, complex128, string, error [], map, *, func, struct, interface, chan
Rust bool, i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize, f32, f64, char, str, ! (T1, T2), [T ; N], [T], struct, enum, union, ->, &, &mut, *const, *mut, trait, impl
Swift Bool, Int8, Int16, Int32, Int64, Int, UInt8, UInt16, UInt32 UInt64, UInt, Float, Double, Character, String, (T1, T2), T?, [T], Set<T>, [K:V], T->U, struct, class
Haskell Bool, Int, Integer, Float, Double, Char [a], (a,b), a->b, data

So it looks like one of the most common ways to introduce a new type is to use that class notion we saw above. As in Python, for example:

>>> class C(object): pass
...
>>> class D(object): pass
...
>>> c, d = C(), D()
>>> type(c), type(d), type(c) == type(d)
(<class '__main__.C'>, <class '__main__.D'>, False)

But something cooler and more computer sciency are the type operators of the ML-family of languages, like list (for lists), * (for tuples), and -> (for functions):

7                                     (*   int                       *)
(4, "dog")                            (*   int * string              *)
[5, 3, 2, 7]                          (*   int list                  *)
[ [], [3,3,3], [], [1] ]              (*   int list list             *)
[(3, 4.0),(1, 5.5)]                   (*   (int * real) list         *)
(5, [2.2, 1.0E~4])                    (*   int * real list           *)
fn x => x + 8                         (*   int -> int                *)
fn x => fn y => x ^ Int.toString(y)   (*   string -> int -> string   *)

Mmmm, so here’s something interesting. SML has type called int list. Does Python? No, in Python, the type is just called list. So [1, "hello"] is legal in Python but illegal in SML. SML has parameterized types.

Parameterized Types

In Standard ML, the types int list and string list and ((int * string) -> bool) list are all different, but they are all instances of the parameterized type 'a list. Lots of languages are like this. Here are some simple examples:

LanguageExample Parameterized TypeExample Instance TypeNotes
Standard ML'a liststring listType variables are 'a, 'b, 'c and so on. If the instantiating type must admit equality, then we write ''a, ''b, ''c, and so on.
'a * 'bstring * bool
'a -> 'b -> 'aint->(bool*int)->int
Haskell[a]
a->b
List Int
Int->String
Types begin with a capital letter and type variables with a lowercase letter.
JavaList<T>List<String>
SwiftSet<T>Set<String>

Here are languages that do not have parameterized types, though you think they might: Go, C.

By the way, type parameters do not have to be just a simple type variable. They can be much fancier. We’ll see examples in future sections below.

Type Equivalence

When are two types the same?

typedef struct {
   int a;
   int b;
} Point;

typedef struct {
   int a;
   int b;
} Pair;

Point x;
Pair y;

Do x and y have the same type? Should we say “yes” (because they have the same structure), or “no” (because their types have different names, and furthermore appear in different declarations)? These two approaches to determining whether types are the same are structural and named equivalence:

Structural Equivalence Name Equivalence
Check equivalence by expanding structures all the way down to basic types Strict: Every type declaration defines a new type
Loose: Factor out aliases

Here is an example from Michael L. Scott:

type alink = pointer to cell;
subtype blink = alink;
p, q : pointer to cell;
r    : alink;
s    : blink;
t    : pointer to cell;
u    : alink;

-- If structural: [p q r s t u]

-- If strict name: [p q] [r u] [s] [t]

-- If loose name: [p q] [r s u] [t]

Loose name equivalence is pretty common: we like to distinguish things like Point and Pair above but want the flexibility of aliasing. In Ada we can do both explicitly:

type Height is new Integer;
type Weight is new Integer;
-- Can’t add heights and weights

subtype Height is Integer;
subtype Weight is Integer;
-- Now you can freely mix them

ML seems to use structural equivalence:

type pair = int * int;
type point = int * int;
val x: point = (1, 4);
val y: pair = x;
type person = {name: string, age: int};
type school = {name: string, age: int};
val p: person = {name="Alice", age=22};
val s: school = p;

But that’s because type does not define a new type! Only a datatype or abstype declaration creates a new type.

abstype person = P of string * int with
  fun new_person (s, i) = P(s, i)
  fun name (P(s, i)) = s
  fun age (P(s, i)) = i
end;

Type Compatibility

When can a value of type A be used in a context that expects type B?

Possible answers:

So.... In:

int a;
float b;
float c;
c = a + b;

Is this

A language that allows this is said to coerce ints to floats, and we say "int is compatible with float".

Important definitions:

Type Conversion
Explicit operation that takes in an object of one type and returns an object of a different type that has the "same value" (not necessarily the same bit pattern).
Type Coercion
Implicit
Non-converting Type Cast
"Reinterpret" an object as an object of another type by preserving its bit pattern, regardless of value.

Type Inference

How do we determine the type of an expression?

ML seems to do it the most general way possible. It looks at all the types of the primitive expressions, then at the types of arguments that the subroutines and operators expect, and work your way out to the whole expression.

1   fun fib (n) =
2       let fun fib_helper (f1, f2, i) =
3           if i = n then f2
4           else fib_helper (f2, f1+f2, i+1)
5       in
6           fib_helper (0, 1, 0)
7       end;

If there’s not enough information to resolve to a known type, ML’s inferencer will bring in type variables. Note how clever it is:

fun I x = x;                              'a -> 'a
fun p x y = y                             'a -> 'b -> 'b
fun first (x, y) = x;                     'a * 'b -> 'a
fun q x = (hd o hd) x;                    'a list list -> 'a
fun c x y = if x = y then "y" else "n";   ''a -> ''a -> string

In ML, a type variable beginning with two primes can only be instantiated with a type that admits equality.

ML’s inferencing is more powerful than that of other languages, Go, Rust, and Scala require that you put types on parameters.

scala> var x = 3;
x: Int = 3
scala> var y = x + 3;
y: Int = 6
scala> def f(x) = x + 3;
<console>:1: error: ':' expected but ')' found.
       def f(x) = x + 3;
              ^
scala> def f(x:Int) = x + 3;
f: (x: Int)Int

Go example.

Rust example.

Exercise: Two things complicate type inference are overloading and coercion. Give concrete examples of complications that arise for each.

Type Checking

What if you use an expression in a manner inconsistent with its type, like trying to compute the age of a string (instead of a person). Should the evaluation result in an error, or return a "best guess"?

Strong Typing vs. Weak Typing

Strong TypingWeak Typing
Type clashes among unrelated types result in errors. they may be compile time, or even run time (e.g. NoSuchMethodError, or ClassCastException) but they are errors. Type clashes don’t really exist... the language implementation will try to cast the argument into some reasonable (!) type and carry out the operation.
Alternate definition: you can’t really circumvent the type system. Example: In Java, even if you explicitly add a cast to cast a string to an int, you get an error. Alternate definition: you can easily circumvent the type system. Example: In C++, you can cast a pointer to an int to a void*, then to a string*, and you get away with it.

Some people would argue strong vs. weak typing isn’t a terribly useful thing to worry about. the static/dynamic dimension is a bigger deal.

Static Typing vs. Dynamic Typing

Static TypingDynamic Typing
Type checking done at compile time

Once a variable’s type is known, it can only be assigned expressions of that type (including related types that can be coerced to it, of course).

Type checking done at run time

Because checking is deferred until run-time a variable can be assigned an expression of one type and then later an expression of another type.

var x;
x = 2;
print x + 5;
x = "dog";
print concat(x, "house");

It is common for languages to have some static typing and some dynamic typing.

Manifest Typing vs. Implicit Typing

Manifest TypingImplicit Typing
Types of variables must be given in their declarations.
void f(int x, list<int> z) {
  int y = x * x;
  z.push_back(y);
}
the types of variables will be inferred from context
fun f x z =
  let y = x * x in z @ [y] end
x must be an int because of the "*" operator, so y must be an int as well, and then z must be an int list, and finally f must be int -> int list -> int list.

What about some popular languages?

Exercise: Classify Classic Lisp, Smalltalk, Go, and Rust

Declarations and Static Typing

In order to do static typing, you must sometimes specify types when declaring variables, functions, parameters, etc. How is this done?

Exercise: Rewrite the examples in the C and Go comparision article in Rust.

Type Checking with Type Variables

Suppose f has type 'a * int -> 'b * 'b -> string. then the following expressions are legal

    f(4, 5)("sd","de")
    f(1,1)(2,2)
    f([],5)([],[4,4])

but these are not

    f(3,2)(4,"p")
    f((3,2),5)([5.5,2.2],[1,6])

Further Reading

the literature on strong/weak, static/dynamic, and manifest/implicit typing is enormous, as are the debates and flame wars. Some good reading:

Dependent Types

Overview of Common Types

Numeric Types

Enumerations

Ranges and Subranges

Option Types

Records, a.k.a. Structs

Unions

  datatype primary_color = Red | Blue | Green

  Red                                   primary_color
  (Red, 4)                              primary_color * int

Arrays

Usually by array we mean a collection of elements indexed by integers or some other enumerated type, that has constant time access of any element given its index. Contrast this with a list, which needn’t have constant time access, and with a map (a.k.a dictionary or associative array), which can be indexed by anything, not just integers or related enumerated types.

Array Bounds

Lifetime and Shape

  Static Bounds Bounds Set at Elaboration Bounds Can be Changed
(Dynamic Shape)
Global
Lifetime
C globals
Local
Lifetime
C locals Ada locals
Arbitrary
Lifetime
Java Perl, APL

Implementation

Slices

Several languages (Perl, APL, Fortran 90) have nice syntax for slices. Fortran 90 allows (these examples are from Scott):

matrix(3:6, 4:7)        columns 3-6, rows 4-7
matrix(6:, 5)           columns 6-end, row 5
matrix(:4, 2:8:2)       columns 1-4, every other row from 2-8
matrix(:, /2, 5, 9/)    all columns, rows 2, 5, and 9

Strings

Sets

Pointers

A pointer is, more or less, an object through which you reference some other object.

int x = 5;
int* p = &x;
int* q = NULL;
int* r = new int;
int* s = new int[100];

cout << *p << *r << s[20];
cout << *q; // crash

Basics

Reference and Dereference Syntax

Can be always explicit, always implicit, or sometimes-implicit (like in Ada).

pointer.png

If the pointer is called p (or $p in Perl), then

Inthe referent is calledand the field is called
C*p(*p).x or p->x
Adap.allp.all.x or p.x
Modula-2p^p^.x
Javapp.x
Perl%$p$$p{x} or $p->{x}

If the referent is called p (or %p in Perl), then

Inthe pointer is calledand the field is called
C&pp.x
Adap'accessp.x
Perl\%p$p{x}

Memory Leaks and Dangling Pointers

In languages like C requiring explicit allocation and explicit deallocation, it is possible to:

How can we prevent such things?

Garbage Collection

Covered separately. For now, you can read Wikipedia’s article on garbage collection, and an interesting developerworks article on Java performance and garbage collection.

Pointers and Arrays in C

this can get confusing. After all,

e1[e2] is the same as *(e1 + e2)

For definitions these are completely different things:

int *x;           /* is totally different from: */
int x[100];

int *a[n];        /* is totally different from: */
int a[n][100];

But for declarations, at least in parameter declarations, you can blur the distinction:

void f(int* a) { ... }
void g(int b[]) { ... }

An array of ints, or pointer to an int, can be passed to either.

Streams

A stream

Regular Expressions

Covered separately.

Subroutines

Subroutines are often objects with their own types. We’ll see these later when we discuss subroutines.

Processes and threads

Will be covered later when we discuss concurrency.

Orthogonality

Can an object of any type...

Are types themselves objects?

Issues in Orthogonality

Case Studies

Rust

Rust’s type system is pretty sophisticated. Read about it here.

Julia

Swift

Elm