Our very intuition makes the concept of types inescapable.
We have an intuition that integers are different from strings which are different from people which are different from books which are different from planets which are, you get the idea. These differences arise from behavior: You can open a socket, but not an integer; you can request the parents of a person, but not a dictionary; you can push an object onto a stack, but not onto character. Therefore:
A value’s type constrains the way it may be used in a program.
More philosophically:
Types impose constraints on what we can and cannot say.
More formally:
A type consists of a set of values and a set of allowable operations.
What are some types you know about from previous programming practice? Here are some you might have seen, Keep in mind, as you go through this list, that types are characterized by their operations (more so than their “values”):
true
and false
, with operations such as and
, or
, xor
, and not
.It is tempting to try to put types into a hierarchy, or at least a directed acyclic graph:
But it’s actually difficult to this because there are so many other ways to classify type that defy hierarchy. In fact, there’s this whole idea of classifying types into something called typeclasses, which are things like this:
A type is equatable, or is an eqtype, if values of the type can be tested for equality (==
).
A type is comparable (a.k.a. ordered) if values of the type can be compared with <
(and often also with <=
, >
, and >=
, and often <=>
).
A type is bounded if there is a minimum value of the type and a maximum value of the type.
A type is showable, or is a showtype if values of the type can be converted to strings.
A type is callable if values of its type can be called on, or applied to, other values.
A type is enumerable if it has successor and predecessor operations on its values.
enum
things, like Swift. Those concepts are completely different.A type is monoidal, or is a monoid, if it has an associative composition operator, and an identity operator for the composition.
x => x
(JS notation).A type is monadic, or is a monad, if values of the type “wrap” underlying values in some way (e.g. optional, list), and there is an operator to wrap a value, and an associative composition operator on functions that unwrap a value and do something to it to produce a new wrapped value, such that the wrapper is an identity for the composition.
Typeclasses in Practice
Languages vary greatly in their treatment of typeclasses. Haskell has a distinguished typeclass concept (called aclass
. Swift implements them via protocols, and other languages just use their own interface or mixin construct.
We can arrange types by building them up algebraically, similar to how sets are built up, for example:
Kind of Type | Notation | Notes | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Empty | $\perp$ | Type with no members at all. A bottom type (meaning it is a subtype of all types). Sometimes called Void . | ||||||||||
Singleton | $x$ | Type with only a single member. Examples:
| ||||||||||
Sum | $T_1\;|\;T_2\;|\,...|\;T_n$ or $x_1: T_1\;|\,...|\;x_n: T_n$ | Also written $T_1 \cup T_2$ or $T_1 + T_2$. If unlabeled, generally called a Union. If labeled, generally called a Tagged Union. Examples:
| ||||||||||
Product | $(T_1, T_2, ... T_n)$ or $(x_1: T_1,..., x_n: T_n)$ | Also written $T_1 \times T_2$. If unlabled, generally called a Tuple. If labeled, generally called a Record or Struct. Examples:
| ||||||||||
Sequence | $T*$ | a.k.a. Array or List. Examples:
| ||||||||||
Function | $T_1 \rightarrow T_2$ | A function from $T_1$ to $T_2$. Use $\perp$ or a product type for $T_1$ to simulate zero or multiple arguments, and $\perp$ or a product type for $T_2$ to simulate zero or multiple return values. Examples:
| ||||||||||
Top | $\top$ | All values are a member of this type. The “type of everything.” A top type (meaning all types are a subtype of it). Examples:
|
This arrangement actually works pretty well, as it covers most everything we need:
true | false
for singleton types true
and false
.T | null
for any type T
and singleton type null
. This is often abbreviated T?
or Optional<T>
.left
and right
, or a similarly structured tagged union called Result with labels success
and failure
. The latter is commonly used for returning, rather than throwing errors.Never
can be used for the return type of a function that does not return, either because it enters an infinite loop or throws an error.In case you haven’t seen singleton or union types before, you might not have seen TypeScript.
In programming language theory, a language’s type system answers questions such as:
Some languages predefine the set of types and do not let you define new ones. These include:
JavaScript really does have only eight types, but it kind of feels like it has more. For example:
class Dog { bark() { return 'woof'; } } class Rat { squeak() { return 'peep'; } } const d = new Dog(); const r = new Rat(); d.constructor // Dog r.constructor // Rat typeof d // 'object' typeof r // 'object'
So in JavaScript you can make lots of different kinds of objects (even arrays, functions, dates, and regular expressions are objects), but all of these objects have the same type. They have different constructors, but constructors aren’t types.
In JavaScript, the value 100 has the type “number”, but this “number type” itself is not a JavaScript value. Types in JavaScript are extra-lingual. You can ask for an expression’s type, but you’ll just get back a string:
typeof 3 === 'number' typeof 'hello' === 'string' typeof { x: 1, y: 2 } === 'object' typeof typeof 3 === 'string'
Ditto for Lua:
type(3) == 'number' type('hello') == 'string' type({ x=1, y=2 }) == 'table' type(type(3)) == 'string'In Erlang, you can’t even ask for the type, but there are built-in functions to tell you whether or not an expression has a type:
is_atom(ten). is_integer($a). is_float(-3.55e-8). is_function(fun (X) -> X*X end). is_reference(make_ref()). is_tuple({dog, "Nika", 5, 'G-SHEP'}). is_list("a string").
This is probably a good thing, because as we’ll see, values can have multiple types at the same time. So asking for the type of an expression might be an indication of a type system that is less sophisticated than it could be.
In C, you can't even ask or check for a type, but you can see them! The types exist in the code, but they are not values that can be assigned to variables or passed as arguments or returned from functions.
int
as a value. What problems do you run into?
In Python, on the other hand, the value 100 has the type int
, and int
itself is a value in Python. In fact it has the type type
. And, you guessed it, type
is a value whose type is... yes, type
!
type(3) == int type('hello') == str type({ 'x':1, 'y':2 }) == dict type(int) == type type(type) == type type(type(type(type(3)))) == type
Same with Ruby, though in Ruby you tend to see classes more than types:
3.class == Integer 'hello'.class == String {x: 1, y: 2}.class == Hash Integer.class == Class Class.class == Class
Java, too, has objects representing classes:
System.out.println(new Integer(3).getClass()); // class java.lang.Integer System.out.println(int.class); // int System.out.println("hello".getClass()); // class java.lang.String System.out.println(new int[]{1, 2, 3}.getClass()); // class [I System.out.println(String.class); // class java.lang.String System.out.println("hi".getClass().getClass()); // class java.lang.Class
Augh, wait, WTF...we have types and now classes?
So what is this thing we call a class? A little hard to define, maybe.... Some languages conflate the notion of type and class, but most people agree there is a difference.
TYPE | CLASS |
---|---|
Is all about behavior | Is about both structure and behavior |
An object can have many types | An object belongs to exactly one class |
Is about interfacing and usage | Is about construction, and things like fields/properties and methods |
String
, Comparable
, and Object
. But only one class: String
.
String s = "hello"; // "hello" ∈ type String Comparable c = s; // "hello" ∈ type Comparable Object o = s; // "hello" ∈ type Object System.out.println(o.getClass().getName()); // THE class of "hello" is String
In most languages, declaring a class gives rise to a type. The exceptions to this rule are JavaScript and CoffeeScript, where the word class
does not make a type: instead it is a funny way of declaring a function and a prototype.
Other languages are similar to Java in having a class
construct that makes a type, but they also have ways to make types that “mixin” behaviors, for example Ruby has module
s and Swift has protocol
s. Go has struct
s for the class-concept and interface
s for the pure behavior-only types.
Oversimplying just a bit...
A class is an object factory; a type is a behavioral specification.
In many languages, you get a set of basic types and mechanisms for making new types. Some examples:
Language | Basic Types | Type formers |
---|---|---|
C | char, signed char, unsigned char, short, unsigned short, int, unsigned, long, unsigned long, long long, unsigned long long, float, double, long double, _Bool, float _Complex, double _Complex, long double _Complex, float _Imaginary, double _Imaginary, long double _Imaginary | enum, *, [], (), union, struct |
Java | boolean, byte, char, short, int, long, float, double | interface, class, record, enum, [] |
Standard ML | unit, bool, int, word, real, char, string | ->, *, list, option, exn, ref, frag, datatype, {} |
Python | NoneType, NotImplementedType, ellipsis, int, bool, float, complex, str, bytes, tuple, list, bytearray, set, frozenset, dict, function, generator, method, classmethod, staticmethod, module, slice, range, type | class |
Go | bool, int8 (byte), int16, int32 (rune), int64, int, uint8, uint16, uint32, uint64, uintptr, float32, float64, complex64, complex128, string, error | [], map, *, func, struct, interface, chan |
Rust | bool, i8, i16, i32, i64, i128, isize, u8, u16, u32, u64, u128, usize, f32, f64, char, str, ! | (T1, T2), [T ; N], [T], struct, enum, union, -> , &, &mut, *const, *mut, trait, impl
|
Swift | Bool, Int8, Int16, Int32, Int64, Int, UInt8, UInt16, UInt32 UInt64, UInt, Float, Double, Character, String, | (T1, T2), T?, [T], Set<T>, [K:V], T->U, struct, class |
Haskell | Bool, Int, Integer, Float, Double, Char | [a], (a,b), a->b, data |
So it looks like one of the most common ways to introduce a new type is to use that class
notion we saw above. As in Python, for example:
>>> class C: pass ... >>> class D: pass ... >>> c, d = C(), D() >>> type(c), type(d), type(c) == type(d) (<class '__main__.C'>, <class '__main__.D'>, False)
But something cooler and more computer sciencey are the type operators of the ML-family of languages, like list
(for lists), *
(for tuples), and ->
(for functions):
7 (* int *) (4, "dog") (* int * string *) [5, 3, 2, 7] (* int list *) [ [], [3,3,3], [], [1] ] (* int list list *) [(3, 4.0),(1, 5.5)] (* (int * real) list *) (5, [2.2, 1.0E~4]) (* int * real list *) fn x => x + 8 (* int -> int *) fn x => fn y => x ^ Int.toString(y) (* string -> int -> string *)
Mmmm, so here’s something interesting. SML has type called int list
. Does Python? No, in Python, the type is just called list
. So [1, "hello"]
is legal in Python but illegal in SML. SML has parameterized types.
In Standard ML, the types int list
and string list
and ((int * string) -> bool) list
are all different, but they are all instances of the parameterized type 'a list
. Lots of languages are like this. Here are some simple examples:
Language | Example Parameterized Type | Example Instantiated Type | Notes |
---|---|---|---|
Standard ML | 'a list | string list | Type variables are 'a , 'b , 'c and so on. If the instantiating type must admit equality, then we write ''a , ''b , ''c , and so on.
|
'a * 'b | string * bool
| ||
'a -> 'b -> 'a | int->(bool*int)->int | ||
Haskell | [a] a->b | List Int Int->String | Types begin with a capital letter and type variables with a lowercase letter. |
Java | List<T> | List<String> | |
Swift | Set<T> | Set<String> |
By the way, type parameters do not have to be just a simple type variable.
When are two types the same?
typedef struct { int a; int b; } Point; typedef struct { int a; int b; } Pair; Point x; Pair y;
Do x
and y
have the same type? Should we say “yes” (because they have the same structure), or “no” (because their types have different names, and furthermore appear in different declarations)? These two approaches to determining whether types are the same are structural and named equivalence:
Structural Equivalence | Name Equivalence |
---|---|
Check equivalence by expanding structures all the way down to basic types | Strict: Every type declaration defines a new type |
Loose: Factor out aliases |
Here is an example from Michael L. Scott:
type alink = pointer to cell; subtype blink = alink; p, q : pointer to cell; r : alink; s : blink; t : pointer to cell; u : alink; -- If structural: [p q r s t u] -- If strict name: [p q] [r u] [s] [t] -- If loose name: [p q] [r s u] [t]
Loose name equivalence is pretty common: we like to distinguish things like Point
and Pair
above but want the flexibility of aliasing. In Ada we can do both explicitly:
type Height is new Integer; type Weight is new Integer; -- Can’t add heights and weights subtype Height is Integer; subtype Weight is Integer; -- Now you can freely mix them
ML seems to use structural equivalence:
type pair = int * int; type point = int * int; val x: point = (1, 4); val y: pair = x; type person = {name: string, age: int}; type school = {name: string, age: int}; val p: person = {name="Alice", age=22}; val s: school = p;
But that’s because type
does not define a new
type! Only a datatype or abstype declaration creates a new type.
abstype person = P of string * int with fun new_person (s, i) = P(s, i) fun name (P(s, i)) = s fun age (P(s, i)) = i end;
When can a value of type $S$ be used in a context that expects type $T$?
Possible answers:
If an $S$ can be used where a $T$ is expected, we say $S$ is compatible with $T$.
By the way, what is coercion? In:
int a; float b; float c; c = a + b;
We may ask whether this is:
c = int_to_float(a) + b
?
A language that allows this is said to coerce ints to floats, and we say "int is compatible with float".
Important definitions:
class Animal { ... } class Dog extends Animal { ... } class Canine = Dog | Wolf class Famous = Notable & Popular
Then we should be able to assign a:
Dog
to an Animal
Dog
to a Canine
Notable
to a Famous
(Dog)=>Dog
to a (Dog)=>Animal
(Animal)=>Dog
to a (Dog)=>Dog
(Animal)=>Animal
to a (Dog)=>Animal
(Dog)=>Animal
to a (Dog)=>Animal only
but none of these work the other way around.
How do we determine the type of an expression?
ML seems to do it the most general way possible. It looks at all the types of the primitive expressions, then at the types of arguments that the subroutines and operators expect, and work your way out to the whole expression.
1 fun fib (n) = 2 let fun fib_helper (f1, f2, i) = 3 if i = n then f2 4 else fib_helper (f2, f1+f2, i+1) 5 in 6 fib_helper (0, 1, 0) 7 end;
If there’s not enough information to resolve to a known type, ML’s inferencer will bring in type variables. Note how clever it is:
fun I x = x; 'a -> 'a fun p x y = y 'a -> 'b -> 'b fun first (x, y) = x; 'a * 'b -> 'a fun q x = (hd o hd) x; 'a list list -> 'a fun c x y = if x = y then "y" else "n"; ''a -> ''a -> string
In ML, a type variable beginning with two primes can only be instantiated with a type that admits equality.
ML’s inferencing is more powerful than that of other languages, Go, Rust, and Scala require that you put types on parameters.
scala> var x = 3;
x: Int = 3
scala> var y = x + 3;
y: Int = 6
scala> def f(x) = x + 3;
<console>:1: error: ':' expected but ')' found.
def f(x) = x + 3;
^
scala> def f(x:Int) = x + 3;
f: (x: Int)Int
What if you use an expression in a manner inconsistent with its type, like trying to compute the age of a string (instead of a person). Should the evaluation result in an error, or return a "best guess"?
Strong Typing | Weak Typing |
---|---|
Type clashes among unrelated types result in errors. they may be compile time, or even run time (e.g. NoSuchMethodError, or ClassCastException) but they are errors. | Type clashes don’t really exist... the language implementation will try to cast the argument into some reasonable (!) type and carry out the operation. |
Alternate definition: you can’t really circumvent the type system. Example: In Java, even if you explicitly add a cast to cast a string to an int, you get an error. | Alternate definition: you can easily circumvent the type system. Example: In C++, you can cast a pointer to an int to a void*, then to a string*, and you get away with it. |
Some people would argue strong vs. weak typing isn’t a terribly useful thing to worry about. the static/dynamic dimension is a bigger deal.
Static Typing | Dynamic Typing |
---|---|
Type checking done at compile time
Once a variable’s type is known, it can only be assigned expressions of that type (including related types that can be coerced to it, of course). | Type checking done at run time
Because checking is deferred until run-time a variable can be assigned an expression of one type and then later an expression of another type. var x; x = 2; print x + 5; x = "dog"; print concat(x, "house"); |
It is common for languages to have some static typing and some dynamic typing.
Manifest Typing | Implicit Typing |
---|---|
Types of variables must be given in their
declarations.
void f(int x, list<int> z) { int y = x * x; z.push_back(y); } | the types of variables will be inferred from context
fun f x z = let y = x * x in z @ [y] endx must be an int because of the "*" operator, so y must be an int as well, and then z must be an int list, and finally f must be int -> int list -> int list. |
In order to do static typing, you must sometimes specify types when declaring variables, functions, parameters, etc. How is this done?
Suppose f has type 'a * int -> 'b * 'b -> string
.
then the following expressions are legal
f(4, 5)("sd","de") f(1,1)(2,2) f([],5)([],[4,4])
but these are not
f(3,2)(4,"p") f((3,2),5)([5.5,2.2],[1,6])
the literature on strong/weak, static/dynamic, and manifest/implicit typing is enormous, as are the debates and flame wars. Some good reading:
Does C++ have dependent types? I mean, you can do:
template <int n> class BoundedList { public: int contents[n]; }
but all types that instantiate this template are created at compile time.
Languages that fully support dynamic dependent types can eliminate many logic errors at compile time! (The type system essentially lets you formulate compile-time proofs that a program will have a certain run time behavior, in terms of the values it will produce.)
Start your study of dependent types at this Stack Overflow question then go to Wikipedia.
Lots of variety here:
Decimal
In type theory, an enumeration is a type with a fixed number of distinct values, ordered from smallest to largest. Typical examples are:
type TrafficLight is (RED, AMBER, GREEN)
type Direction is (NORTH, EAST, SOUTH, WEST)
The Boolean type is an enum of the values False and True.
Many languages have a character type which is essentially an enumeration. Each character’s "value" is its code point.
Some languages allow you to make types such as:
Suppose we have two types, Boolean and Byte, where:
Now we form new types:
Note Boolean has 2 values, Byte has 256 values. Boolean + Byte has 2+256 = 258 values, and Boolean × Byte has 2 * 256 = 512 values. Boolean + Byte is a sum type and Boolean × Byte is a product type.
Product types are very common; Python calls them tuples. Haskell and ML just let you write the product type name directly, e.g. Int * String
.
Often sums and products can be tagged.
Option types are one solution to the billion dollar mistake. Basically an object of type T?
is pretty much the sum type of T
and the type containing a single value representing null.
Tagged product types go by many names: record, struct, object, hash, etc. Values of such types are thought of as key-value pairs.
EQUIVALENCE
)
datatype primary_color = Red | Blue | Green Red primary_color (Red, 4) primary_color * int
Usually by array we mean a collection of elements indexed by integers or some other enumerated type, that has constant time access of any element given its index. Contrast this with a list, which needn’t have constant time access, and with a map (a.k.a dictionary or associative array), which can be indexed by anything, not just integers or related enumerated types.
Static Bounds | Bounds Set at Elaboration | Bounds Can be Changed (Dynamic Shape) |
|
---|---|---|---|
Global Lifetime |
C globals | ||
Local Lifetime |
C locals | Ada locals | |
Arbitrary Lifetime |
Java | Perl, APL |
alloca
syscall to implement
these, giving you automatic dealloaction on block exit.
Several languages (Perl, APL, Fortran 90) have nice syntax for slices. Fortran 90 allows (these examples are from Scott):
matrix(3:6, 4:7) columns 3-6, rows 4-7 matrix(6:, 5) columns 6-end, row 5 matrix(:4, 2:8:2) columns 1-4, every other row from 2-8 matrix(:, /2, 5, 9/) all columns, rows 2, 5, and 9
Good to know: Go uses the term “slice“ to mean something else.
Most (all?) languages have them, but care is needed:
Mathematically, a set is an unordered collection of unique elements.
set
and frozenset
A pointer is, more or less, an object through which you reference some other object. They appear in system languages and low level languages.
int x = 5; int* p = &x; int* q = NULL; int* r = new int; int* s = new int[100]; cout << *p << *r << s[20]; cout << *q; // crash
Here are the crucial things to know:
(R (X () ()) (Y (Z ()()) (W ()())))And from ML:
datatype tree = empty | node of 'a * 'a tree * 'a tree; val x_zw = node ('R', node ('X', empty, empty), node ('Y', node ('Z', empty, empty), node ('W', empty, empty)));
Can be always explicit, always implicit, or sometimes-implicit (like in Ada).
If the pointer is called p (or $p in Perl), then
In | the referent is called | and the field is called |
---|---|---|
C | *p | (*p).x or p->x
|
Ada | p.all | p.all.x or p.x
|
Modula-2 | p^ | p^.x
|
Java | p | p.x
|
Perl | %$p | $$p{x} or $p->{x}
|
If the referent is called p (or %p in Perl), then
In | the pointer is called | and the field is called |
---|---|---|
C | &p | p.x
|
Ada | p'access | p.x
|
Perl | \%p | $p{x}
|
In languages like C requiring explicit allocation and explicit deallocation, it is possible to:
How can we prevent such things?
Covered separately. For now, you can read Wikipedia’s article on garbage collection, and an interesting developerworks article on Java performance and garbage collection.
Sometimes, these ideas get conflated in C. After all:
e1[e2]
is the same as *(e1 + e2)
For definitions these are completely different things:
int *x; /* is totally different from: */ int x[100]; int *a[n]; /* is totally different from: */ int a[n][100];
But for declarations, at least in parameter declarations, you can blur the distinction:
void f(int* a) { ... } void g(int b[]) { ... }
An array of ints, or pointer to an int, can be passed to either.
The term stream is a bit overloaded. Abstractly, a stream:
Sometimes we use the term stream to refer to files.
In Java, streams are their own thing. They work like the streams described above (they are processed sequentially), but are in a pretty large subsystem of their own.
These will be covered later in separate notes.
Subroutines are often objects with their own types. We’ll see these later in separate notes on subroutines.
These will be covered later in separate notes on concurrency.
If a type system is orthogonal, then no type is more special and capable than others. The questions you want to ask to see whether a system is orthogonal or not are: Can an object of any type:
You might be surprised at how un-orthogonal some language’s type systems are.
A related question is: Are types themselves objects?
type
, so yes there.Class
, so yes there too.It’s sometimes undesirable, or maybe even impossible, to get true orthogonality, since:
There is a great deal of theory surrounding our notion of types. Entire books have been written about this theory.
A good place to start on Type Theory is at Wikipedia. The notes you are reading now have focused almost entirely on the more pragmatic issues of types.
You can also read the Type Theory article in the Stanford Encyclopedia of Philosophy.
More on Mathematics and Philosophy
Many philosophers and mathematicians are interested in a field known as the Foundations of Mathematics. There are three theories that try to be THE foundation: Set Theory (generally the ZFC flavor), Type Theory, and Category Theory. They all work okay and are in some sense equally powerful. See this discussion on Philosophy Stack Exchange and the Wikipedia articles on Intuitionistic Type Theory, Homotopy Type Theory, Curry-Howard Isomorphism, and Foundations of Mathematics.
We’ve covered: