Our very intuition makes the concept of types inescapable.
We have an intuition that integers are different from strings, which are different from people, which are different from books, which are different from planets, which are, you get the idea. These differences arise from behavior: You can open a socket, but not an integer; you can request the parents of a person, but not of a dictionary; you can push an object onto a stack, but not onto a character. Intuitively:
A value’s type constrains the way it may be used in a program.
More philosophically:
Types impose constraints on what we can and cannot say.
But types are not only useful; they are one of the most fundamental concepts of all, able to serve as the foundation for all of mathematics. It makes sense to begin the study of all things with types. Ask yourself: where do things even come from?
Here is a way to bring things into existence. We start with the concept of a type, then define (exhaustive) rules that create things to inhabit the types. Here is our first type:
$$ \dfrac{}{\textsf{true}\!: \textsf{Bool}} \quad\quad\quad \dfrac{}{\textsf{false}\!: \textsf{Bool}} $$That is, the type $\textsf{Bool}$ is inhabited by exactly two values, $\textsf{true}$ and $\textsf{false}$. No more, no less. We have created two things. And a type.
Now here is the type of natural numbers:
$$ \dfrac{}{0\!: \textsf{Nat}} \quad\quad\quad \dfrac{n\!: \textsf{Nat}}{\textsf{s}\,n\!: \textsf{Nat}} $$This says “(1) $0$ is a natural number, (2) For any natural number $n$, $\mathsf{s\,}n$ is a natural number, and (3) nothing else is a natural number.” So the type of natural numbers is inhabited by $0$, $\mathsf{s\,}0$, $\mathsf{s\,s\,}0$, $\mathsf{s\,s\,s\,}0$, $\mathsf{s\,s\,s\,s\,}0$, and so on. We’ll write these as 0, 1, 2, 3, 4, and so on. These abbreviations should be familiar to you.
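These formation rules can be transcribed directly into a programming language as a recursive datatype. Here is a sketch in Python (the names `Zero`, `Succ`, and `to_int` are our own, not part of the notes):

```python
from dataclasses import dataclass

class Nat:
    """A natural number: either Zero, or Succ applied to a Nat."""

@dataclass
class Zero(Nat):     # the rule  0 : Nat
    pass

@dataclass
class Succ(Nat):     # the rule  n : Nat  gives  s n : Nat
    pred: Nat

def to_int(n: Nat) -> int:
    """Recover the familiar abbreviation: s s s 0 becomes 3."""
    count = 0
    while isinstance(n, Succ):
        count += 1
        n = n.pred
    return count

three = Succ(Succ(Succ(Zero())))   # s s s 0
```

Nothing other than `Zero` and repeated applications of `Succ` produces a `Nat`, mirroring the exhaustiveness of the rules.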
Now for the integers:
$$ \dfrac{n\!: \textsf{Nat}}{\textsf{pos}\,n\!: \textsf{Int}} \quad\quad\quad \dfrac{n\!: \textsf{Nat}}{\textsf{neg}\,n\!: \textsf{Int}} $$So the type of integers is inhabited by $\mathsf{pos\,}0$, $\mathsf{neg\,}0$, $\mathsf{pos\,s\,}0$, $\mathsf{neg\,s\,}0$, $\mathsf{pos\,s\,s\,}0$, $\mathsf{neg\,s\,s\,}0$, and so on. We’ll abbreviate these as 0, -0, 1, -1, 2, -2, and so on. But hmmmm, what, then, is the type of 3? We need to figure it out from context, or we can write $3_{\textsf{Nat}}$ or $3_{\textsf{Int}}$ to be explicit.
Here are the rationals:
$$ \dfrac{n\!: \textsf{Int} \quad d\!: \textsf{Nat}}{\textsf{rat}\,n\,d\!: \textsf{Rat}} $$So one-third is $\textsf{rat}\,1\,3$, which we can abbreviate as $\frac{1}{3}$.
We can build lots of numeric types that are useful in programming languages, for example $\textsf{Int8}$, $\textsf{Int16}$, $\textsf{Int32}$, $\textsf{Int64}$, $\textsf{Int128}$, $\textsf{UInt8}$, $\textsf{UInt16}$, $\textsf{UInt32}$, $\textsf{UInt64}$, $\textsf{UInt128}$, $\textsf{Float16}$, $\textsf{Float32}$, $\textsf{Float64}$, $\textsf{Float128}$, $\textsf{Complex64}$, $\textsf{Complex128}$, and so on. Some might require a lot of rules, but they can be defined in principle.
We can make up more types for ourselves. Here is one for primary colors:
$$ \dfrac{}{\textsf{red}\!: \textsf{PrimaryColor}} \quad\quad\quad \dfrac{}{\textsf{green}\!: \textsf{PrimaryColor}} \quad\quad\quad \dfrac{}{\textsf{blue}\!: \textsf{PrimaryColor}} $$You can define the type $\textsf{Unicode}$ whose inhabitants are, you guessed it, the characters of Unicode. That’d be a lot of rules! But it is indeed definable, as there are a finite number of such characters.
Now here is a type for some simple two-dimensional shapes:
$$ \dfrac{r\!: \textsf{Float64}}{\textsf{circle}\,r\!: \textsf{Shape}} \quad\quad\quad \dfrac{w\!: \textsf{Float64}\quad h\!: \textsf{Float64}}{\textsf{rectangle}\,w\,h\!: \textsf{Shape}} \quad\quad\quad \dfrac{a\!: \textsf{Float64}\quad b\!: \textsf{Float64} \quad c\!: \textsf{Float64}}{\textsf{triangle}\,a\,b\,c\!: \textsf{Shape}} $$The idea is that $\mathsf{circle}\,5.0$ is a circle with radius 5.
Now let’s build types from types. If $t_1$ and $t_2$ are types, then $t_1 \times t_2$ is a type, defined like so:
$$ \dfrac{v_1\!: t_1 \quad v_2\!: t_2}{(v_1, v_2)\!: t_1 \times t_2} $$The type $t_1 \times t_2$ is a pair type, which is a special case of a product type, also known as a tuple type. Tuples are not limited to two component types:
$$ \dfrac{v_1\!: t_1 \quad \cdots \quad v_n\!: t_n}{(v_1, \ldots, v_n)\!: t_1 \times \cdots \times t_n} $$Examples: $(2, \mathsf{false}, (\mathsf{true}, \frac{3}{5}))\!: \mathsf{Int \times Bool \times (Bool \times Rat)}$
Can we have a product type with zero components? We can. In fact there is only one such type. With only one value. The type is called $\textsf{Unit}$:
$$ \dfrac{}{()\!: \textsf{Unit}} $$Next is the type “list of elements of type $t$” which we will write $t^*$:
$$ \dfrac{}{[\,]\!: t^*} \quad\quad\quad \dfrac{x\!: t \quad y\!: t^*}{(x :: y)\!: t^*} $$We will write ${[x]}$ for $(x\,\textbf{::}\,[\,])$, ${[x,y,z]}$ for $(x\,\textbf{::}\,(y\,\textbf{::}\,(z\,\textbf{::}\,[\,])))$, and so on, again relying on context to know the actual type of the term $[\,]$.
Imagine what the type $\textsf{Unicode}^*$ is. Got it? That’s right! We’ll use the abbreviation $\textsf{String}$ for this type.
Now let’s do “optional of type $t$” which we will write $t\texttt{?}$:
$$ \dfrac{}{\textsf{none}\!: t\texttt{?}} \quad\quad \dfrac{x\!:t}{\textsf{some}\;x\!: t\texttt{?}} $$We thus have $\textsf{some}\:3\!:\mathsf{Nat}\texttt{?}$ and $\textsf{some}\:(-3,\textsf{true})\!:(\mathsf{Int \times Bool})\texttt{?}$. When writing $\textsf{none}$, we must rely on context to know which type is being referred to, or you can use subscripts to be explicit, e.g., $\mathsf{none_{Int}}$.
Finally, we'll define the type $\textsf{Void}$ to be the type with no inhabitants. As there are no inhabitants, there are no formation rules.
**Don’t confuse Void and Unit.** The type $\textsf{Void}$ has no inhabitants at all. The type $\textsf{Unit}$ has a single inhabitant, namely $()$. Although this shouldn’t be confusing, some programming languages, like Swift, actually call the unit type `Void`. Oh well.
There’s more to types than just which items inhabit the types. Types have behavior. We capture behavior by defining functions on the types. To define a function you show how to map each element of one type (the domain) to an element of another type (the codomain). A mapping must exist for each rule that defines the domain type. Here are some examples:
$\begin{array}{l} \textsf{not}\!: \textsf{Bool} \rightarrow \textsf{Bool} \\ \textsf{not}\;\textsf{true} = \textsf{false} \\ \textsf{not}\;\textsf{false} = \textsf{true} \end{array}$
$\begin{array}{l} \textsf{isZero}\!: \textsf{Nat} \rightarrow \textsf{Bool} \\ \textsf{isZero}\;0 = \textsf{true} \\ \textsf{isZero}\;(\textsf{s}\,n) = \textsf{false} \end{array}$
$\begin{array}{l} \textsf{plusTwo}\!: \textsf{Nat} \rightarrow \textsf{Nat} \\ \textsf{plusTwo}\;0 = \textsf{s}\,\textsf{s}\,0 \\ \textsf{plusTwo}\;(\textsf{s}\,n) = \textsf{s}\,\textsf{s}\,\textsf{s}\,n \end{array}$
While definition by cases is the proper way to define functions, we can sometimes use an alternative notation:
$$ \textsf{plusTwo} = \lambda n_{\textsf{Nat}}.\,\textsf{s}\,\textsf{s}\,n $$Did you notice something? Functions are actually values that inhabit function types. Woah. Maybe you noticed this by the way we wrote $\textsf{not}\!: \textsf{Bool} \rightarrow \textsf{Bool}$. Not an accident! $\textsf{Bool} \rightarrow \textsf{Bool}$ is indeed a type. In fact, for any types $t_1$ and $t_2$,
$t_1 \rightarrow t_2$ is a type.
This is good, because now we can define the functions $\textsf{and}$ and $\textsf{or}$. So what are $\mathsf{and\,true}$ and $\mathsf{and\,false}$? Well, we know we want:
$\begin{array}{l} (\textsf{and}\;\textsf{true})\;\textsf{true} = \textsf{true} \\ (\textsf{and}\;\textsf{true})\;\textsf{false} = \textsf{false} \\ (\textsf{and}\;\textsf{false})\;\textsf{true} = \textsf{false} \\ (\textsf{and}\;\textsf{false})\;\textsf{false} = \textsf{false} \end{array}$
So $\textsf{and true}$ is the function that just produces its argument, and $\textsf{and false}$ is the function that always produces $\textsf{false}$. So we can write:
$\begin{array}{l} \textsf{and}\!: \textsf{Bool} \rightarrow (\textsf{Bool} \rightarrow \textsf{Bool}) \\ \textsf{and}\;\textsf{true} = \lambda x_{\textsf{Bool}}.\,x \\ \textsf{and}\;\textsf{false} = \lambda x_{\textsf{Bool}}.\,\textsf{false} \end{array}$
Here is the function $\textsf{or}$:
$\begin{array}{l} \textsf{or}\!: \textsf{Bool} \rightarrow (\textsf{Bool} \rightarrow \textsf{Bool}) \\ \textsf{or}\;\textsf{true} = \lambda x_{\textsf{Bool}}.\,\textsf{true} \\ \textsf{or}\;\textsf{false} = \lambda x_{\textsf{Bool}}.\,x \end{array}$
Here is an amazing function, known as the conditional. Notice its type. It has a type variable $t$ meaning (at least in this case) that it behaves the same way for all types (meaning you can substitute any real, concrete type for $t$):
$\begin{array}{l} \textsf{cond}\!: \textsf{Bool} \rightarrow (t \rightarrow (t \rightarrow t)) \\ \textsf{cond}\;\textsf{true} = \lambda x_{t}.\,\lambda y_{t}.\,x \\ \textsf{cond}\;\textsf{false} = \lambda x_{t}.\,\lambda y_{t}.\,y \end{array}$
In case that’s hard to read, we can simplify our definition a bit, using fewer $\lambda$ symbols and fewer parentheses:
$\begin{array}{l} \textsf{cond}\!: \textsf{Bool} \rightarrow t \rightarrow t \rightarrow t \\ \textsf{cond}\;\textsf{true}\,x\,y = x \\ \textsf{cond}\;\textsf{false}\,x\,y = y \end{array}$
The convention is that when writing function types, $t_1 \rightarrow t_2 \rightarrow t_3$ is the same as $t_1 \rightarrow (t_2 \rightarrow t_3)$. And when writing function calls, $f\,x\,y$ is the same as $(f\,x)\,y$.
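Here is a sketch of these curried definitions in Python, where each function takes one argument and returns another function (the names `and_`, `or_`, and `cond` are ours; `and` is a reserved word in Python):

```python
# Definition by cases on the Bool argument, returning a function.
and_ = lambda a: (lambda x: x)    if a else (lambda x: False)
or_  = lambda a: (lambda x: True) if a else (lambda x: x)

# cond b x y: the conditional, curried one argument at a time.
cond = lambda b: lambda x: lambda y: x if b else y
```

So `and_(True)` is the identity function and `and_(False)` ignores its argument, exactly as in the equations above.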
Here is how we add natural numbers:
$\begin{array}{l} \textsf{plus}\!: \textsf{Nat} \rightarrow \textsf{Nat} \rightarrow \textsf{Nat} \\ \textsf{plus}\;0\;m = m \\ \textsf{plus}\;(\textsf{s}\,n)\;m = \textsf{s}\,(\textsf{plus}\;n\;m) \end{array}$
And here is how we multiply natural numbers:
$\begin{array}{l} \textsf{times}\!: \textsf{Nat} \rightarrow \textsf{Nat} \rightarrow \textsf{Nat} \\ \textsf{times}\;0\;m = 0 \\ \textsf{times}\;(\textsf{s}\,n)\;m = \textsf{plus}\;m\;(\textsf{times}\;n\;m) \end{array}$
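A sketch of these two definitions in Python, using the machine integers `0` and `n + 1` to play the roles of $0$ and $\textsf{s}\,n$ (an encoding we chose for brevity):

```python
def plus(n: int, m: int) -> int:
    # plus 0 m = m;  plus (s n) m = s (plus n m)
    return m if n == 0 else 1 + plus(n - 1, m)

def times(n: int, m: int) -> int:
    # times 0 m = 0;  times (s n) m = plus m (times n m)
    return 0 if n == 0 else plus(m, times(n - 1, m))
```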
Ok hold up! It’s time to introduce some notation to make things a bit more familiar. From now on, we’re going to write:
$\begin{array}{lll} \neg b & \text{for} & \textsf{not}\;b \\ a \land b & \text{for} & \textsf{and}\;a\;b \\ a \lor b & \text{for} & \textsf{or}\;a\;b \\ \textsf{if}\;b\;\textsf{then}\;x\;\textsf{else}\;y & \text{for} & \textsf{cond}\;b\;x\;y \\ n + 1 & \text{for} & \textsf{s}\;n \\ n + m & \text{for} & \textsf{plus}\;n\;m \\ n \times m & \text{for} & \textsf{times}\;n\;m \\ \textsf{let}\;x = e\;\textsf{in}\;e' & \text{for} & (\lambda x.\,e')\;e \end{array}$
Relax, these are just abbreviations. The basic low-level notations of our theory still exist.
Back to example function definitions. The length of a list:
$\begin{array}{l} \textsf{length}\!: t^* \rightarrow \textsf{Nat} \\ \textsf{length}\;[\,] = 0 \\ \textsf{length}\;(x\,\textbf{::}\,y) = (\textsf{length}\;y) + 1 \end{array}$
Appending two lists:
$\begin{array}{l} \textsf{append}\!: t^* \rightarrow t^* \rightarrow t^* \\ \textsf{append}\;[\,]\;z = z \\ \textsf{append}\;(x\,\textbf{::}\,y)\;z = x\,\textbf{::}\,(\textsf{append}\;y\;z) \end{array}$
Reversing a list:
$\begin{array}{l} \textsf{reverse}\!: t^* \rightarrow t^* \\ \textsf{reverse}\;[\,] = [\,] \\ \textsf{reverse}\;(x\,\textbf{::}\,y) = \textsf{append}\;(\textsf{reverse}\;y)\;[x] \end{array}$
Mapping a function over a list:
$\begin{array}{l} \textsf{map}\!: (t_1 \rightarrow t_2) \rightarrow t_1^* \rightarrow t_2^* \\ \textsf{map}\;f\;[\,] = [\,] \\ \textsf{map}\;f\;(x\,\textbf{::}\,y) = (f\;x)\,\textbf{::}\,(\textsf{map}\;f\;y) \end{array}$
Filtering a list:
$\begin{array}{l} \textsf{filter}\!: (t \rightarrow \textsf{Bool}) \rightarrow t^* \rightarrow t^* \\ \textsf{filter}\;p\;[\,] = [\,] \\ \textsf{filter}\;p\;(x\,\textbf{::}\,y) = \textsf{if}\;(p\;x)\;\textsf{then}\;(x\,\textbf{::}\,(\textsf{filter}\;p\;y))\;\textsf{else}\;\textsf{filter}\;p\;y \end{array}$
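Here is a sketch of these list functions in Python, encoding $[\,]$ as the empty tuple `()` and $x\,\textbf{::}\,y$ as the pair `(x, y)` (an encoding of our own choosing, so `[1, 2]` is `(1, (2, ()))`):

```python
NIL = ()   # the empty list []

def length(lst):
    # length [] = 0;  length (x :: y) = length y + 1
    return 0 if lst == NIL else 1 + length(lst[1])

def append(a, b):
    # append [] z = z;  append (x :: y) z = x :: append y z
    return b if a == NIL else (a[0], append(a[1], b))

def reverse(lst):
    # reverse (x :: y) = append (reverse y) [x]
    return NIL if lst == NIL else append(reverse(lst[1]), (lst[0], NIL))

def map_(f, lst):
    # map f (x :: y) = (f x) :: map f y
    return NIL if lst == NIL else (f(lst[0]), map_(f, lst[1]))

def filter_(p, lst):
    # keep x exactly when p x is true
    if lst == NIL:
        return NIL
    x, rest = lst
    return (x, filter_(p, rest)) if p(x) else filter_(p, rest)
```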
Sometimes it’s helpful to think of an optional as kind of like a sequence of 0 or 1 element. In this case, functions like $\textsf{map}$ make sense to apply to optionals:
$\begin{array}{l} \textsf{map}\!: (t_1 \rightarrow t_2) \rightarrow t_1\texttt{?} \rightarrow t_2\texttt{?} \\ \textsf{map}\;f\;\textsf{none} = \textsf{none} \\ \textsf{map}\;f\;(\textsf{some}\;x) = \textsf{some}\;(f\;x) \end{array}$
This gives us an elegant way to define a function for determining the index of the first occurrence of an element in a list:
$\begin{array}{l} \textsf{indexOf}\!: t \rightarrow t^* \rightarrow \textsf{Nat}\texttt{?} \\ \textsf{indexOf}\;x\;[\,] = \textsf{none} \\ \textsf{indexOf}\;x\;(y\,\textbf{::}\,ys) = \\ \quad \quad \textsf{if}\;x = y\;\textsf{then}\;\textsf{some}\;0\;\textsf{else}\;\textsf{map}\;(\lambda n.\,n+1)\;(\textsf{indexOf}\;x\;ys) \end{array}$
We’ll create example invocations for each of these.
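As a quick sketch, here are the optional-`map` and `indexOf` definitions in Python, encoding `none` as Python’s `None`, `some x` as the tagged pair `('some', x)`, and lists as nested pairs ending in `()` (encodings we picked for illustration):

```python
def map_opt(f, opt):
    # map f none = none;  map f (some x) = some (f x)
    return None if opt is None else ('some', f(opt[1]))

def index_of(x, lst):
    # indexOf x [] = none
    if lst == ():
        return None
    y, ys = lst
    if x == y:
        return ('some', 0)
    # bump the index found in the rest of the list, if any
    return map_opt(lambda n: n + 1, index_of(x, ys))

index_of(3, (1, (3, (5, ()))))   # ('some', 1)
index_of(9, (1, ()))             # None
```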
Philosophy question: should a thing inhabit just one type? In some mathematically-oriented theories they do. Pragmatically, though, it makes sense to allow things to be in more than one type. Like $3$. Why not let it inhabit $\textsf{Nat}$ and $\textsf{Int}$ and maybe even more numeric types? If we allow this, then we might start to notice that every inhabitant of $\textsf{Nat}$ also inhabits $\textsf{Int}$. That is, $\textsf{Nat}$ is a subtype of $\textsf{Int}$. We write:
$$ \textsf{Nat} \; \texttt{<:} \; \textsf{Int} $$Conversely, $\textsf{Int}$ is a supertype of $\textsf{Nat}$.
Let’s get super technical:
$t_1 \; \texttt{<:} \; t_2$ if and only if for every $v$ such that $v\!: t_1$, we have $v\!: t_2$.
If we allow individuals to inhabit multiple types, then it makes sense to define union types. The union $t_1 \mid t_2$ is inhabited by exactly all the inhabitants of $t_1$ and those of $t_2$. Yes, it’s possible for $t_1$ and $t_2$ to overlap. So the type $\textsf{Nat} \mid \textsf{Int}$ is just $\textsf{Int}$.
But the type $\textsf{Int} \mid \textsf{String}$ is perhaps interesting. Maybe you can find a use for it?
In case you were wondering: yes, if you have union types, you can have intersection types, too. We write them like this: $t_1\:\texttt{&}\:t_2$.
**Are you getting set theory vibes?** Please calm down. Types are not sets. TYPES ARE NOT SETS. We’ll talk about sets later. Please ffs do not let any preconceived notions of sets cloud your learning of the beauty of types. Keep. your. focus. on. types.
Here are two types, $C$ (for color, with 3 inhabitants $r$, $g$, and $b$) and $D$ (for direction, with 2 inhabitants $l$ and $r$):
$$ \dfrac{}{r\!: C} \quad\quad\quad \dfrac{}{g\!: C} \quad\quad\quad \dfrac{}{b\!: C} $$ $$ \dfrac{}{l\!: D} \quad\quad\quad \dfrac{}{r\!: D} $$The inhabitants of $C \times D$ are:
$$ (r,l),\;(r,r),\;(g,l),\;(g,r),\;(b,l),\;(b,r) $$Did you notice that $C$ has 3 inhabitants and $D$ has 2 inhabitants and $C\times D$ has $3 \times 2 = 6$ inhabitants? See why we call these product types? Right? RIGHT?
Since we have products, maybe we should have sums. We’d like $C + D$ to be a type with $3 + 2 = 5$ inhabitants. You might think a union would work, but wait, it doesn’t because the union ($C \mid D$) only has four elements: $r$, $g$, $b$, and $l$. A true sum type needs to have two distinct $r$'s. We have to tag the “source” of each element of the sum. So the type $C+D$ has these inhabitants:
$$ \mathsf{tag_1}r,\;\mathsf{tag_1}g,\;\mathsf{tag_1}b,\;\mathsf{tag_2}l,\;\mathsf{tag_2}r $$In general, we define a sum type like so:
$$ \dfrac{v_1\!: t_1}{\mathsf{tag_1}v_1\!: t_1 + \cdots + t_n} \quad\quad \cdots \quad\quad \dfrac{v_n\!: t_n}{\mathsf{tag_n}v_n\!: t_1 + \cdots + t_n} $$If $n = 0$ we have the sum type over no types. It’s the type $\textsf{Void}$ referred to above!
**Does this remind you of basic number theory?** Void is the empty sum type, the sum of no types, and has zero elements, just as the number 0 is the sum over a list of no elements, the additive identity. Unit is the empty product type, the product of no types, and has one element, just as the number 1 is the product over a list of no elements, the multiplicative identity. In some theories the void type is actually called $\mathbf{0}$ and the unit type is actually called $\mathbf{1}$.
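A sketch in Python of why the tags matter, representing an inhabitant of $C+D$ as a (tag, value) pair (an encoding of our own choosing):

```python
C = ['r', 'g', 'b']   # 3 inhabitants
D = ['l', 'r']        # 2 inhabitants

# The union merges the shared r, so it has only 4 elements.
union_CD = set(C) | set(D)

# The sum tags each value with its source, keeping both r's distinct.
sum_CD = [('tag1', v) for v in C] + [('tag2', v) for v in D]

len(union_CD)   # 4
len(sum_CD)     # 5, because ('tag1', 'r') != ('tag2', 'r')
```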
So, we have sums and products, but what about exponential types? Could there be types $C^D$ and $D^C$? There are, and it’s kind of interesting what they turn out to be. Let’s list all the inhabitants of $C \rightarrow D$:
$$ \begin{array}{llll} (000) & r \mapsto l, & g \mapsto l, & b \mapsto l \\ (001) & r \mapsto l, & g \mapsto l, & b \mapsto r \\ (010) & r \mapsto l, & g \mapsto r, & b \mapsto l \\ (011) & r \mapsto l, & g \mapsto r, & b \mapsto r \\ (100) & r \mapsto r, & g \mapsto l, & b \mapsto l \\ (101) & r \mapsto r, & g \mapsto l, & b \mapsto r \\ (110) & r \mapsto r, & g \mapsto r, & b \mapsto l \\ (111) & r \mapsto r, & g \mapsto r, & b \mapsto r \\ \end{array} $$There are 8 of these, which is not coincidentally $2^3$, exactly the number of inhabitants of $D$ raised to the power of the number of inhabitants of $C$. Therefore, $C \rightarrow D$ is sometimes written $D^C$.
It works the other way, too. $D \rightarrow C$ can be written $C^D$, as it has $3^2 = 9$ inhabitants:
$$ \begin{array}{lll} (00) & l \mapsto r, & r \mapsto r \\ (01) & l \mapsto r, & r \mapsto g \\ (02) & l \mapsto r, & r \mapsto b \\ (10) & l \mapsto g, & r \mapsto r \\ (11) & l \mapsto g, & r \mapsto g \\ (12) & l \mapsto g, & r \mapsto b \\ (20) & l \mapsto b, & r \mapsto r \\ (21) & l \mapsto b, & r \mapsto g \\ (22) & l \mapsto b, & r \mapsto b \\ \end{array} $$And there you have it: function types are the exponential types. 🤯
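We can verify the counting by brute force. This Python sketch enumerates every total function between the two finite types as a dictionary of mappings:

```python
from itertools import product

C = ['r', 'g', 'b']   # 3 inhabitants
D = ['l', 'r']        # 2 inhabitants

def all_functions(domain, codomain):
    """Every total function, as a dict sending each domain element to an image."""
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]

len(all_functions(C, D))   # 8, i.e. 2**3: the type D^C
len(all_functions(D, C))   # 9, i.e. 3**2: the type C^D
```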
If you have a function $A$ of type $t \rightarrow \mathsf{Bool}$, then you can imagine collecting together, in an object, all of the inhabitants $x$ of type $t$ for which $A\,x = \textsf{true}$. That’s a set. That’s what it is.
A lot of people write sets with curly braces.
| Function Notation | Set Notation |
|---|---|
| $\lambda x_{\textsf{Nat}}.\,x=0$ | $\{0\}$ |
| $\lambda x_{\textsf{Nat}}.\,x=2 \vee x=3 \vee x=5$ | $\{2, 3, 5\}$ |
| $\lambda x_{\textsf{Nat}}.\,\textsf{true}$ | $\mathbb{N}$ |
| $\lambda x_{\textsf{Nat}}.\,x > 5$ | $\{ x \in \mathbb{N} \mid x > 5 \}$ |
| $\lambda x_{\textsf{Int}}.\,\textsf{true}$ | $\mathbb{Z}$ |
| $\lambda x_t.\,\textsf{false}$ | $\varnothing$ |
| $(A\,x)_{\textsf{Bool}}$ | $x \in A$ |
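The table’s correspondence is directly executable: a set over $t$ is nothing more than a function of type $t \rightarrow \textsf{Bool}$, and membership is just application. A Python sketch:

```python
# Each "set" is a characteristic function; x ∈ A is just A(x).
singleton_zero = lambda x: x == 0            # {0}
some_primes    = lambda x: x in (2, 3, 5)    # {2, 3, 5}
above_five     = lambda x: x > 5             # { x ∈ N | x > 5 }
empty_set      = lambda x: False             # the empty set

above_five(7)   # True:  7 is a member
above_five(3)   # False: 3 is not
```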
There are some theories of mathematics that start with sets. In computer science, theories that begin with types tend to be more useful.
The way a language defines and uses its types comprises the language’s type system. In defining a type system, a language designer needs to answer quite a few questions.
Most programming languages have one or more of the following kinds of types:

- **An empty type**: In some languages it is called `void`; in others it is called `never`; and still other names may be used. Why the name `never`? In programming, we can have functions that never return because they enter an infinite loop or throw an error.
- **`boolean`**: The type containing the values true and false, with operations such as and, or, xor, and not.
- **`atom`**: The type of named symbols. Atoms just have a name, and that’s it. You don’t really operate on them (other than testing them for equality perhaps); you just pass them around. They just exist. Some languages call the type of atoms `symbol`.
- **Numeric types**: With names like `number`, `int8`, `int16`, `int32`, `int64`, `int128`, `int`, `uint8`, `uint16`, `uint32`, `uint64`, `uint128`, `uint`, `bigint`, `fixed`, `float32`, `float64`, `float128`, `complex64`, `complex128`, `ratio`, `decimal`, and `bigdecimal`.
- **`char`**: The type of “characters”, i.e., units of textual information. Exactly what a character is might differ from language to language.
- **`string`**: A string is a kind of sequence, but there’s not a lot of agreement about what exactly strings are sequences of. Are they sequences of bytes? character codes? code points? unicode scalars? graphemes? String types are generally broken. You don’t need them.
- **Product types**: The zero-component product is called `unit`. In general, product types are called tuple types. If the components are labeled (named), we get records.
- **Sum types**: Often defined with an `enum` keyword. Occasionally the word `union` may be employed, but this is not in agreement with the mainline theory: sums must be tagged and unions are not. If $n=0$ we get the empty type, `void`. Interestingly, the way types are naturally constructed gives rise to sum types. Booleans, numbers, options, and lists, seen at the top of these notes, are essentially sum types. Here are two more:

  $$ \dfrac{v\!: t_1}{\textsf{left}\;v\!: \textsf{Either}\;t_1\;t_2} \quad\quad\quad \dfrac{v\!: t_2}{\textsf{right}\;v\!: \textsf{Either}\;t_1\;t_2} $$

  and

  $$ \dfrac{v\!: t_1}{\textsf{success}\;v\!: \textsf{Result}\;t_1\;t_2} \quad\quad\quad \dfrac{v\!: t_2}{\textsf{failure}\;v\!: \textsf{Result}\;t_1\;t_2} $$

- **Union types**: Written with the `|` operator, or a construct that actually uses the keyword `union`.
- **A type of types**: Often called `type`, whose members are...types! If you venture into this world in the context of studying Type Theory and the Foundations of Mathematics, there’s tricky stuff happening around here. But in many programming languages, they just make the type `type` and all is well. But there are other languages that stick close to the theory and deal with universes or kinds as a way of avoiding Girard’s Paradox. You probably haven’t used them before, though. Or have you?
- **A top type**: In some languages called `unknown`. Other languages call it `any` (though “any” generally has a completely different meaning in gradually-typed languages, so be careful!).

Let’s attempt a simplification, and show some examples from various languages:
| Kind of Type | Notation | Notes |
|---|---|---|
| Empty | $\perp$ | A type with no members at all. A bottom type (meaning it is a subtype of all types). Sometimes called `Void`. |
| Singleton | $x$ | A type with only a single member. |
| Union | $T_1 \mid T_2 \mid \ldots \mid T_n$ | A type made up of all the elements of other types. |
| Sum | $x_1\!: T_1 + \ldots + x_n\!: T_n$ | Similar to a union, but the elements are tagged so we “know where they came from” and there are no overlaps. Also called a Tagged Union or a Discriminated Union. |
| Tuple | $T_1 \times \ldots \times T_n$ | The unlabeled product type. |
| Record | $(x_1\!: T_1, \ldots, x_n\!: T_n)$ | The labeled product type, also known as a Struct. |
| Sequence | $T^*$ | a.k.a. Array or List. |
| Function | $T_1 \rightarrow T_2$ | A function from $T_1$ to $T_2$. Use $\perp$ or a product type for $T_1$ to simulate zero or multiple arguments, and $\perp$ or a product type for $T_2$ to simulate zero or multiple return values. |
| Top | $\top$ | All values are a member of this type. The “type of everything.” A top type (meaning all types are a subtype of it). |
Singletons and unions give us another way to look at certain types:

- The boolean type can be seen as the union `true | false` of the singleton types `true` and `false`.
- In some languages, optionals (`T?` or `Optional<T>`) are pure unions, such as `T | null` for any type `T`; but in others, they are sum types with constructors `Some` and `None` (as in Rust) or `Just` and `Nothing` (as in Haskell).

In case you haven’t seen singleton or union types before, you might not have seen TypeScript.
Types have behaviors, too, so they can be classified. One approach is to introduce typeclasses as a means of describing the structure or capabilities of types. Some examples follow:
A type is equatable, or is an eqtype, if values of the type can be tested for equality (`==`).
A type is comparable (a.k.a. ordered) if values of the type can be compared with `<` (and often also with `<=`, `>`, and `>=`, and often `<=>`).
A type is bounded if there is a minimum value of the type and a maximum value of the type.
A type is showable, or is a showtype, if values of the type can be converted to strings.
A type is callable if values of the type can be called on, or applied to, other values.
A type is enumerable if it has successor and predecessor operations on its values. (Don’t confuse this with languages that make `enum` things, like Swift. Those concepts are completely different.)

A type is monoidal, or is a monoid, if it has an associative composition operator, and an identity for the composition. (For functions under composition, for instance, the identity is `x => x` in JS notation.)

A type is monadic, or is a monad, if values of the type “wrap” underlying values in some way (e.g. optional, list), and there is an operator to wrap a value, and an associative composition operator on functions that unwrap a value and do something to it to produce a new wrapped value, such that the wrapper is an identity for the composition.
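Here is a minimal sketch of that monadic structure for optionals in Python, using `None` as the none value and a `bind` operator to compose unwrapping functions (the names `wrap`, `bind`, and `safe_recip` are our own):

```python
def wrap(x):
    """Put a value into the optional wrapper (here, trivially, the value itself)."""
    return x

def bind(opt, f):
    """Unwrap opt and feed it to f, which returns a new optional."""
    return None if opt is None else f(opt)

# A function that can fail: reciprocal is undefined at zero.
safe_recip = lambda x: None if x == 0 else 1 / x

bind(bind(wrap(4), safe_recip), safe_recip)   # 4.0: no failure anywhere
bind(wrap(0), safe_recip)                     # None: failure propagates
```

This simplified encoding cannot distinguish `none` from `some none`, which is one reason real languages prefer tagged sums for their optionals.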
Typeclasses in Practice
Languages vary greatly in their treatment of typeclasses. Haskell has a distinguished typeclass concept. Swift implements them via protocols, and other languages just use their own interface or mixin construct.
In common, informal speech, a language may have a set of static types and a set of dynamic types. The static types are used at compile-time to look for type-clashes and if found, reject the program as statically illegal. The dynamic types are carried around with each value at run time and are constantly checked every time the value is used.
This leads some people to avoid the term “dynamic type” altogether and refer to these things as tags or classifiers or classes.
There are quite a lot of tradeoffs to discuss here. Let’s talk about them.
It’s best here to discuss how specific languages do things, before we can make any general theoretical claims.
Lua has eight basic types: `nil`, `boolean`, `number`, `string`, `function`, `thread`, `userdata`, and `table`. The type of any value is available to you at run time using the `type` function. All types are dynamic types; there are no static types. When you query for a type at run time, Lua gives you back a string:
```lua
type(3) == 'number'
type('hello') == 'string'
type({ x=1, y=2 }) == 'table'
type(type(3)) == 'string'
```
Erlang’s types are: `atom`, `integer`, `float`, `binary`, `ref`, `pid`, `port`, `function`, `list`, `tuple`, and `map`. You can’t ask for the type of an expression, but there are built-in functions to tell you whether or not an expression has a type:
```erlang
is_atom(ten).
is_integer($a).
is_float(-3.55e-8).
is_function(fun (X) -> X*X end).
is_reference(make_ref()).
is_tuple({dog, "Nika", 5, 'G-SHEP'}).
is_list("a string").
```
This is probably a good thing, because as we’ll see, values can have multiple types at the same time. So asking for the type of an expression might be an indication of a type system that is less sophisticated than it could be.
JavaScript has eight types: `Number`, `BigInt`, `Boolean`, `String`, `Symbol`, `Null`, `Undefined`, and `Object`. But it’s more complicated. The built-in `typeof` operator can distinguish functions and regexes, and `Array.isArray` can tell you if something is an array. When you define your own types via prototypes or the `class` syntax, the `constructor` property is doing typechecking, too. All types are dynamic types. There are no static types.
```javascript
typeof undefined                 // 'undefined'
typeof 3                         // 'number'
typeof 3n                        // 'bigint'
typeof true                      // 'boolean'
typeof 'dog'                     // 'string'
typeof(Symbol('dog'))            // 'symbol'
typeof { x: 1, y: 2 }            // 'object'
typeof (x => x + 1)              // 'function'
typeof null                      // 'object' 😱😱😱😱😱😱😱😱
typeof /a*b/                     // 'object'
/a*b/.constructor === RegExp     // true
typeof [1,2,3]                   // 'object'
[1,2,3].constructor === Array    // true
Array.isArray([1,2,3])           // true

class Dog { bark() { return 'woof'; } }
class Rat { squeak() { return 'peep'; } }
const d = new Dog();
const r = new Rat();
d.constructor                    // Dog
r.constructor                    // Rat
typeof d                         // 'object'
typeof r                         // 'object'
typeof Dog                       // 'function'
typeof Array                     // 'function'
typeof Function                  // 'function'
```
In Python, the type of any expression is available at run time via the built-in function `type`. And, in fact, `type` is a value whose type is... yes, `type`!
```python
type(3) == int
type('hello') == str
type({ 'x':1, 'y':2 }) == dict
type(int) == type
type(type) == type
type(type(type(type(3)))) == type
```
Python has a really rich set of dynamic types. But, interestingly, it has static types, too, though these are checked not by Python itself but only by an external type checker. More on this later.
By the way, languages with only static types may actually have a `typeof` operator, but don’t be fooled—this doesn’t do any run-time calcs: the call compiles down into returning the known static type.
Try using `int` as a value. What problems do you run into?
Okay what have we learned by looking into these specific languages? It appears that most languages distinguish their static and dynamic types, while just a few go all-in on one but not the other. We also learned that the two type systems of a language have really different purposes. And also that they are kind of different things.
**Are dynamic types even types?** Some folks say typing by its very nature is a static thing. All types are static. Dynamic types aren’t really types, but rather tags that you check at run time. A language that people call “dynamically typed” really just has one giant sum type at compile time: the dynamic “types” are just the tags of the single sum type. This camp would say such a language is unityped (it has just one type). The other camp says “c’mon, we’re all calling these run-time things types, so let’s just live with the overloading of the term.”
Make sure to read Bob Harper’s article on unityped languages and this discussion in which the authors of the Julia language defend using “type” even when used at runtime.
So what are we going to do? Let’s just try to be clear. But it will be hard. We might even make mistakes. Consider these notes under construction.
By the way, for a lot of languages, but not all of them, dynamic types aren’t called types at all. They use the word class. Ruby is one of these:
```ruby
3.class == Integer
'hello'.class == String
{x: 1, y: 2}.class == Hash
Integer.class == Class
Class.class == Class
```
Java, too, has objects representing classes:
```java
new Integer(3).getClass()      // class java.lang.Integer
int.class                      // int
"hello".getClass()             // class java.lang.String
new int[]{1, 2, 3}.getClass()  // class [I
String.class                   // class java.lang.String
"hi".getClass().getClass()     // class java.lang.Class
```
Here’s a table showing that yes, it’s best to think of classes as dynamic (even if a language can do compile time checks), since class information is usually carried around at run time.
| TYPE | CLASS |
|---|---|
| Is all about behavior | Is about both structure and behavior |
| Is concerned with what objects can do and how they behave | Is a factory for producing objects |
| An object can have many types | An object belongs to exactly one class (the class that created it) |
| Is about interfacing and usage | Is about construction, and things like fields and methods |
| Is considered by many to be exclusively a compile-time concept | Is essentially a run-time tag for run-time type checking |
In Java, for example, the string `"hello"` has many types, among them `String`, `Comparable`, and `Object`. But only one class: `String`.
```java
String s = "hello";    // "hello" inhabits String
Comparable c = s;      // "hello" inhabits Comparable
Object o = s;          // "hello" inhabits Object
System.out.println(o.getClass().getName());  // THE class of "hello" is String
```
In most languages, declaring a class gives rise to a type. The exceptions to this rule are JavaScript and CoffeeScript, where the word `class` does not make a type: instead it is a funny way of declaring a function and a prototype.
**We’re ready to define what a class is.** A class is (1) an object factory, that is, an entity used for constructing objects. All objects created by a class remember the class that created them at run time; therefore, a class is also (2) a run-time classifier tag.
A language may keep track of the class that created an object at compile-time and use it for traditional static type checking.
**Terminology may be inconsistent.** Language features like interfaces, sometimes called mixins or protocols, are definitely types rather than classes. You do not use those things for creating objects. But beware, you may hear Java people saying that an interface is a kind of “class.” Java lingo abuses the notion of class. They are probably talking about “class files.” But in the traditional sense, an interface is not a class, because a class is an object factory.
In many languages, you get a set of basic types and mechanisms for making new types. Some examples:
| Language | Basic Types | Type formers |
|---|---|---|
| C | | |
| Java | | |
| Standard ML | | |
| Python | | |
| Go | | |
| Rust | | |
| Swift | | |
| Haskell | | |
Here are some examples from Standard ML:
```sml
7                                    (* int *)
(4, "dog")                           (* int * string *)
[5, 3, 2, 7]                         (* int list *)
[ [], [3,3,3], [], [1] ]             (* int list list *)
[(3, 4.0),(1, 5.5)]                  (* (int * real) list *)
(5, [2.2, 1.0E~4])                   (* int * real list *)
fn x => x + 8                        (* int -> int *)
fn x => fn y => x ^ Int.toString(y)  (* string -> int -> string *)
```
But check out this Python script:
```python
print(type([1, 2, 3]))  # <class 'list'>
```

So the run-time type (class) of `[1, 2, 3]` is `list` and not “int list.” However, Python does allow a rich set of types for its “type hints” feature, which are annotations meant to aid a static type checker. Interestingly, these types are not part of the language itself, and are completely ignored if the program runs without being checked.
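Here is a quick sketch of just how completely the hints are ignored at run time (our own toy function):

```python
def double(x: int) -> int:
    return x * 2

# The annotations say int, but the interpreter never checks them:
print(double(3))       # 6
print(double("abc"))   # abcabc -- no complaint at run time!
# A static checker such as mypy would flag the second call.

# The hints are merely stored as metadata on the function object:
print(double.__annotations__)
```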
In Standard ML, the types `int list` and `string list` and `((int * string) -> bool) list` are all different. So how do we define functions like `length`, `head`, and `append` that can work on any list type?

The trick is to use type variables. All those types are instances of the parameterized type `'a list`. Lots of languages are like this. Here are some simple examples:
| Language | Example Parameterized Type | Example Instantiated Type | Notes |
|---|---|---|---|
| Standard ML | `'a list` | `string list` | Type variables are `'a`, `'b`, `'c`, and so on. If the instantiating type must admit equality, then we write `''a`, `''b`, `''c`, and so on. |
| Standard ML | `'a * 'b` | `string * bool` | |
| Standard ML | `'a -> 'b -> 'a` | `int -> (bool * int) -> int` | |
| Haskell | `[a]`, `a -> b` | `[Int]`, `Int -> String` | Types begin with a capital letter and type variables with a lowercase letter. |
| Java | `List<T>` | `List<String>` | |
| Swift | `Set<T>` | `Set<String>` | |
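Python’s type hints have type variables too. Here is a sketch (the function name `first` is ours) of one function that works on every list type:

```python
from typing import List, TypeVar

T = TypeVar("T")   # plays the role of ML's 'a

def first(xs: List[T]) -> T:
    """A checker instantiates T separately at each call site."""
    return xs[0]

print(first([5, 3, 2, 7]))   # T instantiated to int
print(first(["a", "b"]))     # T instantiated to str
```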
When are two types the same?
```c
typedef struct { int a; int b; } Point;
typedef struct { int a; int b; } Pair;
Point x;
Pair y;
```
Do `x` and `y` have the same type? Should we say “yes” (because they have the same structure), or “no” (because their types have different names, and furthermore appear in different declarations)? These two approaches to determining whether types are the same are structural and name equivalence:
| Structural Equivalence | Name Equivalence |
|---|---|
| Check equivalence by expanding structures all the way down to basic types | Strict: every type declaration defines a new type. Loose: factor out aliases |
Here is an example from Michael L. Scott:
```
type alink = pointer to cell;
subtype blink = alink;
p, q : pointer to cell;
r : alink;
s : blink;
t : pointer to cell;
u : alink;
-- If structural:   [p q r s t u]
-- If strict name:  [p q] [r u] [s] [t]
-- If loose name:   [p q] [r s u] [t]
```
Loose name equivalence is pretty common: we like to distinguish things like `Point` and `Pair` above but want the flexibility of aliasing. In Ada we can do both explicitly:
```ada
type Height is new Integer;
type Weight is new Integer;
-- Can’t add heights and weights

subtype Height is Integer;
subtype Weight is Integer;
-- Now you can freely mix them
```
ML uses structural equivalence:
```sml
type pair = int * int;
type point = int * int;
val x: point = (1, 4);
val y: pair = x;

type person = {name: string, age: int};
type school = {name: string, age: int};
val p: person = {name="Alice", age=22};
val s: school = p;
```
But that’s because `type` does not define a new type! Only a datatype or abstype declaration creates a new type.
```sml
abstype person = P of string * int
with
  fun new_person (s, i) = P(s, i)
  fun name (P(s, i)) = s
  fun age (P(s, i)) = i
end;
```
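Python is normally name-based for classes, but its protocols give you structural checks on demand. A sketch (the classes `Person`, `School`, and the protocol `Named` are our own illustrations):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Named(Protocol):   # a structural type: anything with a name attribute
    name: str

class Person:
    def __init__(self, name: str, age: int):
        self.name, self.age = name, age

class School:
    def __init__(self, name: str, age: int):
        self.name, self.age = name, age

# Neither class mentions Named, yet both match it by structure alone:
print(isinstance(Person("Alice", 22), Named))  # True
print(isinstance(School("LMU", 113), Named))   # True
```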
When can a value of type $S$ be used in a context that expects type $T$?
Possible answers vary from language to language. If an $S$ can be used where a $T$ is expected, we say $S$ is compatible with $T$.
By the way, what is coercion? In:
```c
int a;
float b;
float c;
c = a + b;
```
We may ask whether this is effectively `c = int_to_float(a) + b`?
A language that allows this is said to coerce ints to floats, and we say “int is compatible with float”.
Important definitions:
```
class Animal { ... }
class Dog extends Animal { ... }
class Canine = Dog | Wolf
class Famous = Notable & Popular
```
Then we should be able to assign a:

- `Dog` to an `Animal`
- `Dog` to a `Canine`
- `Notable` to a `Famous`
- `(Dog)=>Dog` to a `(Dog)=>Animal`
- `(Animal)=>Dog` to a `(Dog)=>Dog`
- `(Animal)=>Dog` to a `(Animal)=>Animal`
- `(Animal)=>Animal` to a `(Dog)=>Animal`
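The function cases follow from contravariant parameters and covariant results. Here is a Python sketch (our own classes and function names) checking one of them dynamically:

```python
class Animal:
    pass

class Dog(Animal):
    pass

def expects_dog_to_animal(f):
    """A context expecting a (Dog) => Animal function: it only ever
    passes in a Dog and only ever uses the result as an Animal."""
    result = f(Dog())
    assert isinstance(result, Animal)
    return result

# An (Animal) => Dog function is a safe substitute: its parameter is more
# permissive (contravariance) and its result more specific (covariance).
def shelter(a: Animal) -> Dog:
    return Dog()

expects_dog_to_animal(shelter)
print("substitution ok")
```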
How do we determine the type of an expression?
First things first: if you have a literal, type inference is obvious. It’s when expressions have variables that things get interesting.
There appear to be four strategies here. Sometimes you can mix and match them.
Some languages require that every variable, every parameter, every return type, every field in a record, indeed every named entity that can have a type, be declared with an explicit type. C is one such language. Ancient versions of Java, too.
This makes inference rather trivial. Easiest for implementers by far.
In Java, Kotlin, Scala, Go, Rust, Swift, and a few others, local variables can have their types inferred provided they have an initializer. But parameters and function return types, while inferable in some simple cases, often take more work to infer, so these languages require manifest typing for these things (except for some small function literals).
```
scala> var x = 3;
x: Int = 3

scala> var y = x + 3;
y: Int = 6

scala> def f(x) = x + 3;
<console>:1: error: ':' expected but ')' found.
def f(x) = x + 3;
        ^
scala> def f(x:Int) = x + 3;
f: (x: Int)Int
```
This is also pretty easy in most cases. Languages that allow this will need to prohibit things like `var x = x;` of course.
HM is an algorithm for type inference in which you never have to manifest any types (unless you want to). It looks at the types of the primitive expressions, then at the types of arguments that the functions and operators expect, and works its way out to the whole expression.
Here’s an example in Standard ML:
```sml
(* line 1 *) fun fib (n) =
(* line 2 *)   let fun fib_helper (f1, f2, i) =
(* line 3 *)     if i = n then f2
(* line 4 *)     else fib_helper (f2, f1+f2, i+1)
(* line 5 *)   in
(* line 6 *)     fib_helper (0, 1, 0)
(* line 7 *)   end;
```
The algorithm will go something like this:

- `i` has type `int`, because it is added to 1 at line 4.
- `n` has type `int`, because it is compared to `i` at line 3.
- At line 6, `fib_helper` is called with `int` constants, and that’s the only call of `fib_helper` (given the scope of the let), so `f1` and `f2` have type `int`.
- The 0 passed as the last argument at line 6 is consistent with the `int` type of `i` (good!).
- Since `fib_helper` returns `f2` (known to be `int`) at line 3, the result of the call at line 6 will have type `int`.
- Since `fib` immediately returns this result as its own result, the return type of `fib` is `int`.
If there’s not enough information to resolve to a known type, ML’s inferencer will bring in type variables. Note how clever it is:
```sml
fun I x = x;                              (* 'a -> 'a *)
fun p x y = y                             (* 'a -> 'b -> 'b *)
fun first (x, y) = x;                     (* 'a * 'b -> 'a *)
fun q x = (hd o hd) x;                    (* 'a list list -> 'a *)
fun c x y = if x = y then "y" else "n";   (* ''a -> ''a -> string *)
```
In Standard ML, a type variable beginning with two primes can only be instantiated with a type that admits equality.
HM is used in Haskell, Standard ML, OCaml, F#, and other languages.
Read more at Wikipedia.
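To get a feel for the machinery, here is a tiny Python sketch of the unification step at the heart of HM. The representation (plain strings for base types, `('var', name)` for type variables, `('->', dom, rng)` for function types) is our own toy encoding, not ML’s actual implementation:

```python
def resolve(t, subst):
    """Follow substitution chains for type variables."""
    while isinstance(t, tuple) and t[0] == 'var' and t[1] in subst:
        t = subst[t[1]]
    return t

def unify(t1, t2, subst):
    """Return a substitution making t1 and t2 equal, or raise on a clash."""
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, tuple) and t1[0] == 'var':
        return {**subst, t1[1]: t2}
    if isinstance(t2, tuple) and t2[0] == 'var':
        return {**subst, t2[1]: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0] == '->':
        subst = unify(t1[1], t2[1], subst)
        return unify(t1[2], t2[2], subst)
    raise TypeError(f"cannot unify {t1} with {t2}")

# Unifying 'a -> 'a (the identity) with int -> 'b forces 'a = 'b = int:
s = unify(('->', ('var', 'a'), ('var', 'a')), ('->', 'int', ('var', 'b')), {})
print(resolve(('var', 'a'), s), resolve(('var', 'b'), s))  # int int
```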
Some languages use control flow to help with type inference. Here is an example from the Rust Book:
```rust
fn main() {
    // Because of the annotation, the compiler knows that `elem` has type u8.
    let elem = 5u8;

    // Create an empty vector (a growable array).
    let mut vec = Vec::new();
    // At this point the compiler doesn't know the exact type of `vec`, it
    // just knows that it's a vector of something (`Vec<_>`).

    // Insert `elem` in the vector.
    vec.push(elem);
    // Aha! Now the compiler knows that `vec` is a vector of `u8`s (`Vec<u8>`)
    // TODO ^ Try commenting out the `vec.push(elem)` line

    println!("{:?}", vec);
}
```
Here is a TypeScript example:
```typescript
function f(x: string | number) {
  // type of x is string | number here
  if (typeof x === 'number') {
    // type of x is number here.
  } else {
    // type of x is string here.
  }
}
```
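Python checkers such as mypy and pyright do the same flow-based narrowing with `isinstance`. A sketch (the function `f` is our own analogue of the TypeScript example):

```python
from typing import Union

def f(x: Union[str, int]) -> str:
    # type of x is str | int here
    if isinstance(x, int):
        # a checker narrows x to int here
        return str(x + 1)
    else:
        # ...and to str here
        return x.upper()

print(f(41))     # 42
print(f("dog"))  # DOG
```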
When is type checking done, and what happens if the checking indicates an expression is used in a manner inconsistent with its type, like trying to compute the age of a string (instead of a person)?
Let’s tackle the “when” question first:
| Static Typing | Dynamic Typing |
|---|---|
| Type checking is done at compile time. Once a variable’s type is known, it can only be assigned expressions of that type (including related types that can be coerced to it, of course). | Type checking is done at run time. Because checking is deferred until run time, a variable can be assigned an expression of one type and then later an expression of another type: `var x; x = 2; print x + 5; x = "dog"; print concat(x, "house");` |
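The pseudocode on the dynamic side of the table runs essentially verbatim in Python, a sketch:

```python
x = 2
print(x + 5)        # x currently refers to an int
x = "dog"
print(x + "house")  # now it refers to a str; the checks happen at run time
```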
It is common for languages to have some static typing and some dynamic typing. Stop thinking in binary terms.
Now, what happens if there is a type mismatch?
| Strong Typing | Weak Typing |
|---|---|
| Type clashes among unrelated types result in errors. They may be compile-time or even run-time errors (e.g., `NoSuchMethodError` or `ClassCastException`), but they are errors. | Type clashes don’t really exist: the language implementation will try to cast the argument into some reasonable (!) type and carry out the operation. |
| Alternate definition: you can’t really circumvent the type system. Example: in Java, even if you explicitly cast a string to an integer, you get an error. | Alternate definition: you can easily circumvent the type system. Example: in C++, you can cast an `int*` to a `void*`, then to a `char**`, and you get away with it. |
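Python illustrates strong (and dynamic) typing: a clash between unrelated types is an error at run time, not a silent coercion. A sketch (contrast JavaScript, which weakly coerces `"dog" + 5` to `"dog5"`):

```python
try:
    result = "dog" + 5          # unrelated types, no implicit coercion
except TypeError as e:
    print("type clash:", e)     # the clash is a real, catchable error
```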
**Maybe avoid the terms “strong” and “weak.”** There are people who get all in a tizzy when they hear the terms “strong typing” and “weak typing.” They don’t think the terms mean very much, and indeed the terms are so often muddled and misused that many people say “strong typing” without knowing what they are talking about. For this reason, a lot of people never use the terms at all. Most people would agree, though, that the static/dynamic dimension is a bigger deal.
Remember the term manifest from above?
| Manifest Typing | Implicit Typing |
|---|---|
| The types of variables must be given in their declarations. `void f(int x, list<int> z) { int y = x * x; z.push_back(y); }` | The types of variables may be inferred from context. `fun f x z = let val y = x * x in z @ [y] end` Here `x` must be an `int` because of the `*` operator, so `y` must be an `int` as well, and then `z` must be an `int list`, and finally `f` must have type `int -> int list -> int list`. |
Checking code with type variables is done by instantiation. Suppose `f` has type `'a * int -> 'b * 'b -> string`. Then the following expressions are legal:

```
f(4, 5)("sd","de")
f(1,1)(2,2)
f([],5)([],[4,4])
```

but these are not:

```
f(3,2)(4,"p")
f((3,2),5)([5.5,2.2],[1,6])
```
Languages can be marked according to how far along they are on the various axes:
There’s a real pragmatic concern for manifest typing. Like how do you specify complex types? Read A comparison of C and Go to see that, um, there are clear ways and not so clear ways.
The literature on strong/weak, static/dynamic, and manifest/implicit typing is enormous, as are the debates and flame wars. Some good reading:
Does C++ have dependent types? I mean, you can do:
```cpp
template <int n> class BoundedList {
  public:
    int contents[n];
};
```
but all types that instantiate this template are created at compile time.
Languages that fully support dynamic dependent types can eliminate many logic errors at compile time! (The type system essentially lets you formulate compile-time proofs that a program will have a certain run time behavior, in terms of the values it will produce.)
Start your study of dependent types at this Stack Overflow question then go to Wikipedia.
Real programming languages often deal with concerns that we don’t usually mention in theoretical contexts. For example, implementors need to know how to map their language structures to machine structures, and this can lead to some interesting design choices. You can often see machine ideas leaking into language designs.
To take advantage of hardware registers and arithmetic units, languages often have a variety of numeric types, for example integer, floating-point, and decimal types.
The basic “enumeration type” is a type with a fixed number of distinct values, ordered from smallest to largest. Typical examples are:

```
type TrafficLight is (RED, AMBER, GREEN)
type Direction is (NORTH, EAST, SOUTH, WEST)
```
The Boolean type is an enum of the values `false` and `true`.
Many languages have a character type which is essentially an enumeration. Each character’s value is its code point.
Sometimes a language will let you loosely interchange an enum value with its integer position in its declaration—or allow you to attach arbitrary (but still distinct) integers to each value in the enum.
Many languages use the term `enum` to refer to a sum type.
Some languages allow you to make types such as:
- `range(1, 100)`
- `range(3, 900, 5)`
- `1..100`
- `1...100`
- `1..<100`
- `1..=100`
These are just so useful.
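In Python, `range` is a real run-time type (though not a static subrange type as in Ada or Pascal). A sketch:

```python
r = range(3, 900, 5)   # 3, 8, 13, ..., 898
print(type(r))         # <class 'range'>
print(53 in r)         # True: membership is computed in constant time
print(len(r))          # 180
print(r[2])            # 13: ranges are even indexable
```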
In real life, the record type (a tagged product type) goes by many names: record, struct, object, hash, etc. Implementors have to think about how such things are mapped to memory, for example how fields are laid out, aligned, and padded.
Perhaps unfortunately, many real languages confuse these terms. We know from above what a union is and what a sum is. But a lot of languages use the word `union` for a sum. Sums are often called tagged unions, disjoint unions, or variant records (uggghh). Confusingly, the term `enum` has become popular for sums. 🤷‍♀️
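Python has no built-in sum type, but one can be sketched with dataclasses, where each variant’s class serves as the run-time tag (the `Shape` example is our own):

```python
import math
from dataclasses import dataclass
from typing import Union

@dataclass
class Circle:
    radius: float

@dataclass
class Rect:
    width: float
    height: float

Shape = Union[Circle, Rect]   # the "sum": a value is one variant or the other

def area(s: Shape) -> float:
    if isinstance(s, Circle):  # dispatch on the tag
        return math.pi * s.radius ** 2
    else:
        return s.width * s.height

print(area(Rect(3.0, 4.0)))   # 12.0
```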
Language implementors have to worry about certain things (checking type equivalence among them).
Usually in type theory we talk about the type $t^*$ for base type $t$. But in a real language, we are concerned with concrete collection types such as arrays, strings, and sets.
You learned about these in Data Structures.
An array is a collection of elements indexed by integers or some other enumerated type, that has constant time access of any element given its index. The elements are almost always packed tightly together in a contiguous memory space. Contrast the array with the list, which needn’t have constant time access, and can be made up of lots of links.
Language designers have a lot of choices when it comes to arrays.
First, there’s the whole idea of bounds:
Second, the ideas of lifetime and shape matter:
| | Static Bounds | Bounds Set at Elaboration | Bounds Can be Changed (Dynamic Shape) |
|---|---|---|---|
| Global Lifetime | C globals | | |
| Local Lifetime | C locals | Ada locals | |
| Arbitrary Lifetime | | Java | Perl, APL |
The local lifetime / bounds set at elaboration category is interesting: one can use `alloca` (a stack-allocation routine, not really a syscall) to implement these, giving you automatic deallocation on block exit.
When laying out an array in memory, implementors have choices:
Generally, arrays are fixed things. But slices are parts of a (backing) array.
Several languages (Perl, APL, Julia, Fortran 90) have really nice syntaxes for slices. Fortran 90 allows (these examples are from Michael L. Scott):
```
matrix(3:6, 4:7)        rows 3-6, columns 4-7
matrix(6:, 5)           rows 6 to end, column 5
matrix(:4, 2:8:2)       rows 1-4, every other column from 2-8
matrix(:, (/2, 5, 9/))  all rows, columns 2, 5, and 9
```
Go and Rust elevate slices to more of a first-class kind of entity. We’ll talk about these in the language-specific course notes for these languages.
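Python’s slice syntax is similarly pleasant, though note that list slices copy rather than alias the backing list (Go slices and NumPy slices alias instead). A sketch:

```python
row = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(row[2:6])    # [30, 40, 50, 60]
print(row[5:])     # [60, 70, 80, 90]
print(row[1:8:2])  # [20, 40, 60, 80]: every other element
print(row[::-1])   # a reversed copy
```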
Most (all?) languages have a built-in string type, but variations abound:
Believe it or not, strings are controversial. Here are some articles:
Mathematically, a set is an unordered collection of unique elements. In programming languages, sets may be mutable or immutable (Python has both: `set` and `frozenset`).

A reference, or ref, is a value through which you reference another value. References are crucial for modeling sharing of data and for dynamic, linked data structures. Sometimes you will see references called pointers, though technically speaking, a pointer is a reference that you can do arithmetic on; pointers appear in systems languages and low-level languages.
Here is a reference to a string value:
The syntax for creating and using references varies from language to language. And when a record is being referred to, a lot of languages let you automatically dereference. Suppose we have a pointer to a record with a field `x`. If the pointer is called `p` (or `$p` in Perl), then:
| In | the referent is called | and the field is called |
|---|---|---|
| C, C++ | `*p` | `(*p).x` or `p->x` |
| Ada | `p.all` | `p.all.x` or `p.x` |
| Pascal, Modula-2 | `p^` | `p^.x` |
| Java, Python, Ruby, Lua, JavaScript | `p` (do you know why?) | `p.x` |
| Standard ML, OCaml | `!p` | `!p.x` |
| Go, Rust | `*p` | `(*p).x` or `p.x` |
| Perl | `%$p` | `$$p{x}` or `$p->{x}` |
If the referent is called `v` (or `%v` in Perl), then:

| In | the pointer is called | and the field is called |
|---|---|---|
| C, C++ | `&v` | `v.x` |
| Ada | `v'access` | `v.x` |
| Pascal, Modula-2 | `@v` | `v.x` |
| Java, Python, Ruby, Lua, JavaScript | `v` (do you know why?) | `v.x` |
| Standard ML, OCaml | `ref v` | — |
| Go, Rust | `&v` | `v.x` |
| Perl | `\%v` | `$v{x}` |
Why are the entries for Java, JavaScript, Python, Lua, Ruby, etc. so weird? It’s because references are often implicit. In Python, for example, every variable (including parameters and attributes) holds a reference. Every. single. one. Even though the references and referents exist, and you know they exist, you can’t really even distinguish them. It’s so weird.
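A sketch showing that Python variables really do hold references:

```python
a = [1, 2, 3]
b = a                  # copies the reference, never the list
b.append(4)
print(a)               # [1, 2, 3, 4]: a and b share one referent
print(a is b)          # True: the two references are identical
print(id(a) == id(b))  # True: same object identity
```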
**It is even weirder.** Standard ML and OCaml have explicit references, but these are only for mutable referents. Most of the other values are implicitly referenced. Oh my.
While it is eventually easy to get used to implicit references, not every language has them. In C++, pointers are explicit:
```cpp
int x = 5;
int* p = &x;
int* q = NULL;
int* r = new int;
int* s = new int[100];
cout << *p << *r << s[20];
cout << *q;  // crash
```
Although references are crucial for modeling certain things, they can be really problematic and disastrous for security, so some languages put in a lot of safeguards (Rust, for example, confines raw-pointer dereferencing to an `unsafe` block).
We’ll cover garbage collection separately. For now, you can read Wikipedia’s article on garbage collection, and an interesting developerworks article on Java performance and garbage collection.
Ah, the billion dollar mistake. We’ve talked about this abomination earlier.
Sometimes, pointers and arrays get conflated in C. After all:
`e1[e2]` is the same as `*(e1 + e2)`
For definitions these are completely different things:
```c
int *x;       /* is totally different from: */
int x[100];

int *a[n];    /* is totally different from: */
int a[n][100];
```
But for declarations, at least in parameter declarations, you can blur the distinction:
```c
void f(int* a) { ... }
void g(int b[]) { ... }
```
An array of ints, or pointer to an int, can be passed to either.
More details in the notes on C.
The term stream is a bit overloaded. Abstractly, a stream is a possibly infinite sequence of elements, produced and consumed one element at a time.
Sometimes we use the term stream to refer to files.
In Java, streams are their own thing. They work like the streams described above (they are processed sequentially), but are in a pretty large subsystem of their own. See the notes on Java for details.
These will be covered later in separate notes.
Functions are often objects with their own types. We’ll see these later in separate notes on functions.
These will be covered later in separate notes on concurrency.
If a type system is orthogonal, then no type is more special and capable than others. The questions you want to ask to see whether a system is orthogonal or not are whether an object of any type can, for example, be named, assigned to a variable, passed to a function, returned from a function, and stored in a data structure.
You might be surprised at how un-orthogonal some languages’ type systems are.
A related question is: Are types themselves objects? Various answers to this are:

- In Python, every type is an object of class `type`, so yes there.
- In Ruby, every class is an object of class `Class`, so yes there too.
, so yes there too.It’s sometimes undesirable, or maybe even impossible, to get true orthogonality, since:
There is a great deal of theory surrounding our notion of types. Entire books have been written about this theory.
A good place to start on Type Theory is at Wikipedia. The notes you are reading now have focused almost entirely on the more pragmatic issues of types.
You can also read the Type Theory articles in the Stanford Encyclopedia of Philosophy and at nLab. Actually, nLab has a lot of good stuff on type theory and related topics; see for example their articles on Type, Judgment, Proposition, Dependent Type, Type Universe, and Hierarchy of Universes.
More on Mathematics and Philosophy
Many philosophers and mathematicians are interested in a field known as the Foundations of Mathematics. There are three theories that try to be THE foundation: Set Theory (generally the ZFC flavor), Type Theory, and Category Theory. They all work okay and are in some sense equally powerful. See this discussion on Philosophy Stack Exchange and the Wikipedia articles on Intuitionistic Type Theory, Homotopy Type Theory, Curry-Howard Isomorphism, and Foundations of Mathematics. And of course, nLab has its own Foundations of Mathematics article.
More good reads:
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
- What does `[2, 3, 5]` abbreviate? (Hint: `2::3::5::[]`.)
- Which languages have a unit type written `()`?
- Which languages’ top types are `interface{}`, `object`, and `BasicObject`, respectively?
- Explain how Rust’s `Option<T>` is an actual sum type with tags `Some` and `None`, while TypeScript uses the pure union `T | null`.
- Which languages let you query a type at run time with `typeof e`, `type(e)`, and `e.class`, respectively?
- Why does JavaScript have only `Array`, while in other languages like Java, Rust, and Swift there exist many parameterized array types?

TODO - MORE QUESTIONS TO COME
We’ve covered: