The Lambda Calculus

This little invention of Alonzo Church is really quite something.

The Background

So the Lambda Calculus is a theory of functions.

It was created by Alonzo Church (Turing’s PhD advisor) in the 1930s when he was looking into the foundations of mathematics. It turns out his invention, simple as it is, is actually a universal model of computation (equivalent to Turing machines).

A good detailed technical overview is David F. Martin’s Notes on the Lambda Calculus. A long textbook chapter overview is Chapter 5 of the Slonneger and Kurtz Book. If you prefer Wikipedia, there’s an article for you there of course. And a really good one at the Stanford Encyclopedia of Philosophy.

If you like videos, here’s a good 12 minute one:

Why It Matters

Are you wondering why you’d care about this funny-named theory of functions? (If you said no, you may have had that question answered in the video above, so great!)

The usual answers for “why it matters” and “why study this thing” are that theories are helpful in organizing useful knowledge, and in particular, the Lambda Calculus, even in its simplicity, turns out to capture the essentials of functions! And as programmers, you know how important functions are to computation. Right? Functions, in some sense, are what we think of as computation:

[figure: functionio.png]

And oh yeah, most modern programming languages have “lambdas” and they are powerful things.

The Basic Idea

So if you were making a theory of functions, how simple can you make it? Here’s how Church saw things:

Now it’s time to make a rigorous theory.

The Syntax

The Lambda Calculus is a formal calculus, which means it has a syntax that defines which formulas are legally well-formed (the formal part), and it has rules for transforming one formula into another (the calculus part). Church designed the calculus to mimic what we intuitively think functions should do. Now here’s the syntax of the Untyped Applied Lambda Calculus, given with a grammar:

Exp  → constant
     | variable
     | "(" Exp Exp ")"
     | "(" "λ" variable "." Exp ")"
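
This grammar is simple enough to parse in a few lines of code. Here’s a rough sketch in Python; the tokenizer, the tuple representation of trees, and all names here are our own invention for illustration, not anything standard:

```python
import re

# Tokens: parentheses, λ (or backslash), dot, names, and numeric constants.
TOKEN = re.compile(r'[()\\.λ]|[A-Za-z_][A-Za-z_0-9]*|[0-9]+')

def parse(source):
    """Parse a fully parenthesized λ-expression into nested tuples."""
    tokens = TOKEN.findall(source)
    pos = 0

    def next_token():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expect(expected):
        token = next_token()
        assert token == expected, f'expected {expected!r}, got {token!r}'

    def exp():
        token = next_token()
        if token == '(':
            if tokens[pos] in ('λ', '\\'):      # ( λ variable . Exp )
                next_token()
                variable = next_token()
                expect('.')
                body = exp()
                expect(')')
                return ('lambda', variable, body)
            left, right = exp(), exp()          # ( Exp Exp )
            expect(')')
            return ('apply', left, right)
        return ('const', int(token)) if token.isdigit() else ('var', token)

    tree = exp()
    assert pos == len(tokens), 'unexpected trailing input'
    return tree
```

With this representation, $(\mathtt{square}\;5)$ parses to `('apply', ('var', 'square'), ('const', 5))`.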

You may have guessed from that lead-in that there actually exist typed versions of the calculus (“The Typed $\lambda$ Calculus”) as well as non-applied versions (“The Pure $\lambda$ Calculus”). We’ll see those soon. But it’s best to start without types, and with constants. (The pure lambda calculus does not have constants.)

The syntax above is completely unambiguous, thanks to tons of parentheses. In practice, we can remove parentheses as long as we preserve the meaning under the assumptions that (1) the scope of the abstraction’s body goes as far to the right as possible, and (2) application associates to the left.

That might have sounded complicated so we’ll go over these in class:

| Official, fully parenthesized expression | Simplified version |
| --- | --- |
| $(\mathtt{square}\;5)$ | $\mathtt{square}\;5$ |
| $((\mathtt{plus}\;5)\;8)$ | $\mathtt{plus}\;5\;8$ |
| $((((x\;y)\;z)\;w)\;x)$ | $x\;y\;z\;w\;x$ |
| $(\mathtt{sin}\;(\mathtt{sqrt}\;2))$ | $\mathtt{sin}\;(\mathtt{sqrt}\;2)$ |
| $((((x\;y)\;z)\;(w\;y))\;x)$ | $x\;y\;z\;(w\;y)\;x$ |
| $(\lambda x. (x\;y))$ | $\lambda x. x\;y$ |
| $((\lambda x. x)\;y)$ | $(\lambda x. x)\;y$ |
| $(\lambda x. ((x\;y)\;(\lambda z. ((x\;y)\;z))))$ | $\lambda x. x\;y\;\lambda z. x\;y\;z$ |

How Does It All Work?

Before we lay out the rules, convince yourself that these three things make sense.

These operations have names:

| Example | Name | What it says |
| --- | --- | --- |
| $\begin{array}{l}\lambda x. \mathtt{hypot}\;x\;3 \Longrightarrow\\ \;\;\;\;\lambda y. \mathtt{hypot}\;y\;3\end{array}$ | α‑conversion | The name of the variable doesn’t really matter. (Usually) |
| $\begin{array}{l}(\lambda x.\mathtt{minus}\;x\;1)\;9 \Longrightarrow \\ \;\;\;\;\mathtt{minus}\;9\;1\end{array}$ | β‑conversion | We call functions by passing arguments to parameters, producing the body of the function with the parameter replaced by the argument. (Usually) |
| $\begin{array}{l}\lambda x. \mathtt{square}\;x \Longrightarrow\\ \;\;\;\;\mathtt{square}\end{array}$ | η‑conversion | You don’t need to wrap a lambda around a perfectly good function. Leave it alone. (Usually) |
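
These conversions have everyday counterparts in any language with lambdas. A quick illustration in Python (using the standard library’s `math` module):

```python
import math

# β-conversion is just calling the function on its argument:
assert (lambda x: x - 1)(9) == 8

# α-conversion: the parameter's name makes no observable difference.
f = lambda x: math.hypot(x, 3)
g = lambda y: math.hypot(y, 3)
assert f(4) == g(4) == 5.0

# η-conversion: lambda x: math.sqrt(x) is just math.sqrt in disguise.
assert list(map(lambda x: math.sqrt(x), [1, 4, 9])) == list(map(math.sqrt, [1, 4, 9]))
```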

But why do we say “usually”? This isn’t hard, right? It’s not...hard...is it?

We have to be careful

Any one of these three conversions can go terribly wrong if you don’t do them right. If you naively make changes, the new expression will have a completely different meaning than the original. And that’s bad. Examples:

In the first two cases, the variable $y$ got captured: it became bound when it should have stayed free. In the third case, we lost the binding of $x$: it was bound and became free. Wait. Bound? Free?

Understanding Bound and Free Variables

The calculus was designed so that:

$\;\; \lambda x. \mathtt{plus}\;x\;1$

and:

$\;\; \lambda y. \mathtt{plus}\;y\;1$

represent the same thing. In other words, the particular variable used in the abstraction doesn’t really matter too much. We should be able to substitute a different variable, but this does not always work. For example, we cannot rename $x$ to $y$ in:

$\;\; \lambda x. \mathtt{plus}\;x\;y$

Because doing so would change the function from one that “adds its argument to global variable y” to one that “multiplies its argument by two.” Completely different!

How do we make restrictions on this substitution formal? First we define the terms bound and free so that in, say:

$\;\; \lambda x. x\;y$

$x$ is bound and $y$ is free. The intuition is that a variable is bound if it is the variable of an abstraction (or appears in the scope of that abstraction), and free otherwise. To make this perfectly rigorous we define the sets $BV$ (bound variables) and $FV$ (free variables) as follows (table is from David F. Martin):

| Expression | $FV$ | $BV$ |
| --- | --- | --- |
| $k$ (a constant) | $\varnothing$ | $\varnothing$ |
| $x$ (a variable) | $\{x\}$ | $\varnothing$ |
| $(M\;N)$ | $FV(M) \cup FV(N)$ | $BV(M) \cup BV(N)$ |
| $(\lambda x. M)$ | $FV(M) - \{x\}$ | $BV(M) \cup \{x\}$ |
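
Those equations translate directly into code. Here’s a small Python sketch, representing terms as nested tuples of our own design — `('const', k)`, `('var', x)`, `('apply', M, N)`, `('lambda', x, M)`:

```python
# Terms as nested tuples: ('const', k), ('var', x),
# ('apply', M, N), ('lambda', x, M).
def free_vars(term):
    tag = term[0]
    if tag == 'const':
        return set()
    if tag == 'var':
        return {term[1]}
    if tag == 'apply':
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}          # ('lambda', x, M)

def bound_vars(term):
    tag = term[0]
    if tag in ('const', 'var'):
        return set()
    if tag == 'apply':
        return bound_vars(term[1]) | bound_vars(term[2])
    return bound_vars(term[2]) | {term[1]}         # ('lambda', x, M)

# λx. x y — x is bound, y is free:
term = ('lambda', 'x', ('apply', ('var', 'x'), ('var', 'y')))
assert free_vars(term) == {'y'}
assert bound_vars(term) == {'x'}
```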

Time for examples! Here are a handful of expressions, with the free variables underlined and the bound variables left alone:

Now we are ready to define the computation rules rigorously and formally so that nothing gets captured and everything works as expected.

Computing

Now here are the formal rules for conversions:

| Rule Name | How it works |
| --- | --- |
| $\alpha$ | $\lambda x. M \Longrightarrow \lambda y. M[y/x]$, provided that $y \not\in FV(M)$ |
| $\beta$ | $((\lambda x. M)\;N) \Longrightarrow M[N/x]$ |
| $\eta$ | $\lambda x. (M\;x) \Longrightarrow M$, provided that $x \not\in FV(M)$ |

where the notation $M[N/x]$ means the proper substitution of $N$ for all free occurrences of $x$ in $M$, and is defined like this:

| $E$ | $E[M/x]$ | Rationale |
| --- | --- | --- |
| $k$ | $k$ | Just a constant, nothing to sub |
| $x$ | $M$ | $x$ is free, good to sub |
| $y$ where $y \neq x$ | $y$ | Nothing to sub |
| $(N_1\;N_2)$ | $(N_1[M/x]\;N_2[M/x])$ | (Properly) sub into both |
| $\lambda x. N$ | $\lambda x. N$ | $x$ is bound, nothing to sub |
| $\lambda y. N$ where $y \neq x$ | if $y\not\in FV(M) \vee x\not\in FV(N)$ then $\lambda y. N[M/x]$, else $\lambda z.((N[z/y])[M/x])$ where $z$ is a new variable not free in $M$ nor $N$ | If $M$ has a free $y$, and $N$ has a free $x$ to substitute over, then $y$ would get captured! So in this case, we have to “rename” $y$ to a fresh $z$ before substituting for $x$. |
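
The substitution definition translates into code almost line for line. Here’s a sketch in Python, again with terms as nested tuples of our own design, where `substitute(e, m, x)` computes $E[M/x]$:

```python
import itertools

# Terms: ('const', k), ('var', x), ('apply', M, N), ('lambda', y, N).
def free_vars(t):
    if t[0] == 'const':
        return set()
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'apply':
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}                # ('lambda', y, N)

def fresh(avoid):
    # A new variable z0, z1, z2, ... not in the set we must avoid.
    return next(f'z{i}' for i in itertools.count() if f'z{i}' not in avoid)

def substitute(e, m, x):
    """Compute E[M/x], renaming to avoid capture when necessary."""
    tag = e[0]
    if tag == 'const':
        return e                                   # just a constant
    if tag == 'var':
        return m if e[1] == x else e               # free x: substitute
    if tag == 'apply':
        return ('apply', substitute(e[1], m, x), substitute(e[2], m, x))
    y, n = e[1], e[2]
    if y == x:
        return e                                   # x is bound here: nothing to sub
    if y not in free_vars(m) or x not in free_vars(n):
        return ('lambda', y, substitute(n, m, x))
    z = fresh(free_vars(m) | free_vars(n))         # y would be captured:
    return ('lambda', z, substitute(substitute(n, ('var', z), y), m, x))

# (λy. x)[y/x] must NOT produce λy. y — we rename y first:
assert substitute(('lambda', 'y', ('var', 'x')), ('var', 'y'), 'x') == \
    ('lambda', 'z0', ('var', 'y'))
```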

Different Reduction Orders

You know, you can reduce an expression different ways:

Applicative Order
Fully evaluate the arguments of a function before “calling” the function:

$\begin{array}{l} (\lambda x. x + x)((\lambda x. 3x + 1) 5) \\ \;\;\;\;\Longrightarrow (\lambda x. x + x)(3 \times 5 + 1) \\ \;\;\;\;\Longrightarrow (\lambda x. x + x)(16) \\ \;\;\;\;\Longrightarrow 16 + 16 \\ \;\;\;\;\Longrightarrow 32 \end{array}$

Normal Order
Evaluate from the “outside-in”:

$\begin{array}{l} (\lambda x. x + x)((\lambda x. 3x + 1) 5) \\ \;\;\;\;\Longrightarrow (\lambda x. 3x + 1)5 + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow (3 \times 5 + 1) + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow 16 + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow 16 + (3 \times 5 + 1) \\ \;\;\;\;\Longrightarrow 16 + 16 \\ \;\;\;\;\Longrightarrow 32 \end{array}$
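
Python, like most languages, uses applicative order for ordinary calls, but you can simulate normal order by passing arguments as thunks (zero-argument lambdas). A sketch of the example above, with a counter (our own instrumentation) to show how often the argument is evaluated:

```python
evaluations = []

def triple_plus_one(x):
    evaluations.append(x)          # record each time the argument is computed
    return 3 * x + 1

# Applicative order: the argument is evaluated once, before the call.
double = lambda x: x + x
assert double(triple_plus_one(5)) == 32
assert len(evaluations) == 1

# "Normal order" via thunks: the unevaluated argument gets duplicated,
# so it is evaluated twice — just like the normal-order reduction above.
evaluations.clear()
double_lazy = lambda x: x() + x()
assert double_lazy(lambda: triple_plus_one(5)) == 32
assert len(evaluations) == 2
```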

Church-Rosser Theorem

Wikipedia says: In lambda calculus, the Church–Rosser theorem states that, when applying reduction rules to terms, the ordering in which the reductions are chosen does not make a difference to the eventual result.

Lambdas IRL

Most major programming languages have “lambdas” a.k.a “anonymous functions.” Let’s start easy, with the square function:

| Language | The Square Function |
| --- | --- |
| JavaScript | `x => x ** 2` |
| Python | `lambda x: x ** 2` |
| Ruby | `->(x) {x ** 2}` |
| Lua | `function (x) return x ^ 2 end` |
| Kotlin | `{ x: Int -> x * x }` |
| Swift | `{$0 * $0}` |
| Rust | `\|x\| x * x` |
| Perl | `sub { $_[0] ** 2 }` |
| Java | `(x) -> x * x` |
| C++ | `[](auto x) { return x * x; }` |
| C# | `x => x * x` |
| Julia | `x -> x ^ 2` |
| Verse | |
| Haskell | `\x -> x ^ 2` |
| K | `{x*x}` |

Lambdas show up a lot when used with functions like map, filter, and reduce. Here are examples of how different languages specify a function to compute the sum of the squares of the even numbers of a list:

| Language | Sum of squares of evens |
| --- | --- |
| JavaScript | `a.filter(x => x % 2 === 0).map(x => x ** 2).reduce((x, y) => x + y, 0)` |
| Python | `sum(map(lambda x: x * x, filter(lambda x: x % 2 == 0, a)))` |
| Ruby | |
| Lua | |
| Kotlin | |
| Swift | |
| Rust | |
| Perl | |
| Java | |
| C++ | |
| C# | |
| Julia | |
| Verse | |
| Haskell | `sum (map (^ 2) (filter even a))` |
| K | |
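
To spell out one row of the table, here is a runnable Python version, written two ways (the list `a` is just our example input):

```python
from functools import reduce

a = [1, 2, 3, 4, 5, 6]

# With lambdas and map/filter/reduce, mirroring the other languages:
total = reduce(lambda acc, x: acc + x,
               map(lambda x: x ** 2,
                   filter(lambda x: x % 2 == 0, a)),
               0)
assert total == 56                      # 4 + 16 + 36

# The same thing, more idiomatically in Python:
assert sum(x ** 2 for x in a if x % 2 == 0) == 56
```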

The Pure λ-Calculus

Believe it or not, we can build up all the constants of mathematics from lambda expressions. You don’t need numbers or booleans or lists or anything, all you need are functions. Nothing but functions. You get the objects you need by defining them to be...functions! Here are a few:

$\begin{array}{l} \;\; 0 \equiv \lambda f. \lambda x. x \\ \;\; 1 \equiv \lambda f. \lambda x. f\;x \\ \;\; 2 \equiv \lambda f. \lambda x. f\;(f\;x) \\ \;\; 3 \equiv \lambda f. \lambda x. f\;(f\;(f\;x)) \\ \;\; 4 \equiv \lambda f. \lambda x. f\;(f\;(f\;(f\;x))) \\ \;\; \vdots \end{array}$

The number $n$ is a function that takes in a function $f$ and some $x$ and applies $f$ $n$ times to $x$. So this means:

$ \;\; \mathtt{successor} \equiv \lambda n. \lambda f. \lambda x. f\;(n\;f\;x) $

And then $\mathtt{plus}\;m\;n$ is applying the successor function $m$ times to $n$:

$ \;\; \mathtt{plus} \equiv \lambda m. \lambda n. m\;\mathtt{successor}\;n $

with multiplication following from that in a straightforward way:

$ \;\; \mathtt{times} \equiv \lambda m. \lambda n. m\;(\mathtt{plus}\;n)\;0 $

There are more “direct” ways to do addition and multiplication:

$\begin{array}{l} \;\; \mathtt{plus} \equiv \lambda m. \lambda n. \lambda f. \lambda x. m\;f\;(n\;f\;x)\\ \;\; \mathtt{times} \equiv \lambda m. \lambda n. \lambda f. m\;(n\;f) \end{array}$

And amazingly, exponentiation turns out to be just this:

$ \;\; \mathtt{pow} \equiv \lambda b. \lambda e. e\;b $
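
Since most languages have lambdas, the Church numerals run anywhere. Here’s a Python sketch of the definitions above (the decoder `to_int`, which applies “add one” $n$ times to 0, is our own helper for inspection):

```python
# Church numerals as Python lambdas, straight from the definitions above.
zero  = lambda f: lambda x: x
succ  = lambda n: lambda f: lambda x: f(n(f)(x))       # successor
plus  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
times = lambda m: lambda n: lambda f: m(n(f))
pow_  = lambda b: lambda e: e(b)                       # pow is a builtin name

def to_int(n):
    """Decode a Church numeral: apply "add one" n times to 0."""
    return n(lambda k: k + 1)(0)

two, three = succ(succ(zero)), succ(succ(succ(zero)))
assert to_int(two) == 2
assert to_int(plus(two)(three)) == 5
assert to_int(times(two)(three)) == 6
assert to_int(pow_(two)(three)) == 8                   # 2³ = 8
```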

How about Booleans?

$\begin{array}{l} \;\; \mathtt{true} \equiv \lambda x. \lambda y. x \\ \;\; \mathtt{false} \equiv \lambda x. \lambda y. y \\ \;\; \mathtt{not} \equiv \lambda p. p\;\mathtt{false}\;\mathtt{true} \\ \;\; \mathtt{and} \equiv \lambda p. \lambda q. p\;q\;p \\ \;\; \mathtt{or} \equiv \lambda p. \lambda q. p\;p\;q \\ \;\; \mathtt{ifthenelse} \equiv \lambda r. \lambda p. \lambda q. r\;p\;q \end{array}$
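
The booleans also translate directly. A Python sketch (the trailing underscores are only because `not`, `and`, and `or` are Python keywords):

```python
# Church booleans in Python, straight from the definitions above.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y
not_  = lambda p: p(false)(true)
and_  = lambda p: lambda q: p(q)(p)
or_   = lambda p: lambda q: p(p)(q)
ifthenelse = lambda r: lambda p: lambda q: r(p)(q)

assert true('yes')('no') == 'yes'
assert not_(true) is false
assert and_(true)(false) is false
assert or_(false)(true) is true
assert ifthenelse(true)('then')('else') == 'then'
```

One caveat: Python is call-by-value, so `ifthenelse` evaluates both branches before selecting; in the pure calculus under normal order, only the chosen branch gets reduced.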

We can do tuples:

$\begin{array}{l} \;\; \mathtt{pair} \equiv \lambda x. \lambda y. \lambda f. f\;x\;y \\ \;\; \mathtt{first} \equiv \lambda p. p\;\mathtt{true} \\ \;\; \mathtt{second} \equiv \lambda p. p\;\mathtt{false} \\ \end{array}$

A list is either empty or a pair consisting of an item (its head) and a list (its tail):

$ \;\; \mathtt{emptylist} \equiv \lambda x. \mathtt{true} $

Example: Here is how some lists are represented:

| Sugared Notation | λ-Expression |
| --- | --- |
| [] | emptylist |
| [3] | (pair 3 emptylist) |
| [8, 3] | (pair 8 (pair 3 emptylist)) |
| [5, 8, 3] | (pair 5 (pair 8 (pair 3 emptylist))) |

There is a function to test whether a list is empty:

$ \;\; \mathtt{isempty} \equiv \lambda p. p\;(\lambda x. \lambda y. \mathtt{false}) $
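
Pairs and lists work as ordinary Python lambdas too. A sketch, building the list [8, 3] from the table above:

```python
# Church pairs and lists, straight from the definitions above.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y
pair   = lambda x: lambda y: lambda f: f(x)(y)
first  = lambda p: p(true)
second = lambda p: p(false)
emptylist = lambda x: true
isempty   = lambda p: p(lambda x: lambda y: false)

assert first(pair(8)(3)) == 8 and second(pair(8)(3)) == 3

eight_three = pair(8)(pair(3)(emptylist))    # the list [8, 3]
assert isempty(emptylist) is true
assert isempty(eight_three) is false
assert first(second(eight_three)) == 3       # head of the tail
```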

Exercise: Show the reduction of the expressions (isempty emptylist) and (isempty [1, 5, 8]).

What’s the point?

It’s just to show you could, if you wanted to, build up math from just functions.

Functions all the way down.

Let, Where, and If

When you apply a function to an argument, you are LET-ting the parameter be the argument within the body of the function. So writing the application:

$\;\; (\lambda x. \mathtt{plus}\;5\;x)\;3$

could be written as:

$\;\; \mathsf{let}\;x=3\;\mathsf{in}\;(\mathtt{plus}\;5\;x)$

We sometimes see it written as this as well:

$\;\; (\mathtt{plus}\;5\;x)\;\mathsf{where}\;x=3$

All three of these are the same $\lambda$-expression. The symbols $\mathsf{let}$, $\mathsf{in}$, and $\mathsf{where}$ are symbols of the sugared $\lambda$ Calculus itself; they are not constants!
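
In code, this desugaring is literal. A Python illustration:

```python
# let x = 3 in (plus 5 x) is nothing but the application ((λx. plus 5 x) 3):
assert (lambda x: 5 + x)(3) == 8

# Nested lets are just nested applications:
# let x = 3 in (let y = 4 in (times x y))
assert (lambda x: (lambda y: x * y)(4))(3) == 12
```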

Another convenient form is:

$\;\; \mathsf{if}\;E_1\;\mathsf{then}\;E_2\;\mathsf{else}\;E_3$

You can use it in a computation as in this example:

$\begin{array}{l} \;\; (\lambda x. \mathsf{if}\;\mathtt{less}\;x\;3\;\mathsf{then}\;\mathtt{times}\;2\;x\;\mathsf{else}\;\mathtt{plus}\;x\;5)\;10 \\ \;\;\;\;\;\; \Longrightarrow \mathsf{if}\;\mathtt{less}\;10\;3\;\mathsf{then}\;\mathtt{times}\;2\;10\;\mathsf{else}\;\mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow \mathsf{if}\;\mathsf{false}\;\mathsf{then}\;\mathtt{times}\;2\;10\;\mathsf{else}\;\mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow \mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow 15 \end{array}$

Other Syntactic Sugar

It’s fine to use crazy mathematical notation like infix operators with funny symbols. Here are some examples. (You can use your own sugar too, if you can get folks to agree to use them.)

| Standard Form | Sugared Form |
| --- | --- |
| (plus $x$ $y$) | $x + y$ |
| (minus $x$ $y$) | $x - y$ |
| (times $x$ $y$) | $x \times y$ |
| (divide $x$ $y$) | $x \div y$ |
| (remainder $x$ $y$) | $x \;\mathtt{\%}\; y$ |
| (power $x$ $y$) | $x^y$ |
| (tetrate $x$ $y$) | $x \uparrow\uparrow y$ |
| (pentate $x$ $y$) | $x \uparrow\uparrow\uparrow y$ |
| (negate $x$) | $-x$ |
| (sqrt $x$) | $\sqrt{x}$ |
| (factorial $x$) | $x!$ |
| (exp $x$) | $e^x$ |
| (floor $x$) | $\lfloor x \rfloor$ |
| (ceil $x$) | $\lceil x \rceil$ |
| (less $x$ $y$) | $x < y$ |
| (lesseq $x$ $y$) | $x \leq y$ |
| (eq $x$ $y$) | $x = y$ |
| (noteq $x$ $y$) | $x \not= y$ |
| (greater $x$ $y$) | $x > y$ |
| (greatereq $x$ $y$) | $x \geq y$ |
| (union $x$ $y$) | $x \cup y$ |
| (intersection $x$ $y$) | $x \cap y$ |
| (pair $x$ $y$) | $(x, y)$ |
| (pair $x$ (pair $y$ $z$)) | $(x, y, z)$ |
| emptylist | $[]$ |
| (pair $x$ emptylist) | $[x]$ |
| (pair $x$ (pair $y$ emptylist)) | $[x,y]$ |
| (pair $x$ (pair $y$ (pair $z$ emptylist))) | $[x,y,z]$ |

Recursion

So, lambda calculus is supposedly this model of computation. We saw it can do booleans, logic, numbers, arithmetic, function invocation, and even conditional evaluation. But can it do iteration (i.e., loops and recursion)?

Why It Is Hard

How can we define the factorial function? First note this does NOT work:

$\;\; \mathtt{fact} \equiv_{\mathtt{def}} \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times \mathtt{fact}(n-1)$

We’re trying to define fact but assuming it already exists? That’s not right! We have to evaluate the right-hand side first, then bind it to fact.

Perhaps what was meant was:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times (\lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times \ldots)(n-1) \end{array}$

which is nonsense. We can’t have infinitely long expressions. The whole point of computation is to find finite representations (programs, functions, what have you) of things.

What we want is something like this:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \textrm{the function}\;f\;\textrm{such that}\;f = \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

But how do we possibly find such an $f$? What sorcery is required?

Fixpoints

This is wild indeed so hang on. How do we deal with these “such that” questions? How do we find values that “satisfy” statements? Sometimes we can just guess:

How do we make this more rigorous? We use the idea of a fixpoint. A fixpoint of a function $f$ is a value $x$ such that $f(x)=x$. That’s it. That’s all it is. Examples are helpful:

| Function | Fixpoints |
| --- | --- |
| $\lambda x. \cos x$ | $0.73908513321516064166\ldots$ |
| $\lambda x. x^0$ | $1$ |
| $\lambda x. 6-x^2$ | $2$ and $-3$ |
| $\lambda x. x!$ | $1$ and $2$ |
| $\lambda x. x^2-x$ | $0$ and $2$ |
| $\lambda x. x$ | all values are fixpoints |
| $\lambda x. x + 1$ | no fixpoints (well, JS has NaN) |

Now, if there existed a function, called, say, fix, that would find fixpoints of a function for us, then we could write:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \mathtt{fix}\;\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

See why? Because what is the fixpoint of $\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1)$? It’s the function $f$ such that.... You get the idea.

The Trick

Now how do we write fix? First, understand what it is supposed to do. $\mathtt{fix}\;f$ must produce a fixpoint of $f$, namely some value for which $x = f(x)$. So that means that fix must satisfy the equation:

$ \;\; \mathtt{fix}\;f = f(\mathtt{fix}\;f) $

How even can we find what fix is? Well someone did, and they called it $Y$:

$ \;\; Y \equiv_{\mathrm{def}} \lambda f. (\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g)) $

Wait, what? Seriously? Let’s see:

$\;\; \begin{array}{lcl} Y\;f & = & (\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g)) \\ & = & f\;((\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g))) \\ & = & f\;(Y\;f) \end{array}$

🤯

For completeness, the $Y$ combinator gives you fixed points only in the call-by-name calculus. You need a different one, $Z$, for the call-by-value calculus.

$\;\; Z \equiv_{\mathrm{def}} \lambda f. (\lambda g. f\;(\lambda h. g\;g\;h))\;(\lambda g. f\;(\lambda h. g\;g\;h)) $

Exercise: Prove $Z$ does compute fixpoints.

Why this Matters

This is awesome because now we have justified the use of recursion as a proper computational mechanism! Yes, that is a big deal.

Let’s see it in action with factorial. We are going to define:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; Y\;\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

This is a proper definition because we are not using fact to define fact. We are in the clear. But does it actually work? Let’s try it. Let’s compute $\mathtt{fact}\;3$. Because these derivations can get really long, we’ll use $F$ to stand for the expression $\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1)$. Here goes:

$\;\; \begin{array}{lcl} \mathtt{fact}\;3 & = & Y\;F\;3 \\ & = & F\;(Y\;F)\;3 \\ & = & ((\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1))\;(Y\;F))\;3 \\ & = & \mathsf{if}\;3=0\;\mathsf{then}\;1\;\mathsf{else}\;3 \times (Y\;F)(3-1) \\ & = & 3 \times (Y\;F)(2) \\ & = & 3 \times ((\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1))\;(Y\;F))(2) \\ & = & 3 \times (\mathsf{if}\;2=0\;\mathsf{then}\;1\;\mathsf{else}\;2 \times (Y\;F)(2-1)) \\ & = & 3 \times 2 \times (Y\;F)(1) \\ & = & 3 \times 2 \times 1 \times (Y\;F)(0) \\ & = & 3 \times 2 \times 1 \times 1 \\ & = & 6 \end{array}$
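
You can watch this work in a real language. Python is call-by-value, so we need $Z$ rather than $Y$ (with plain $Y$, Python would try to evaluate $g\;g$ eagerly and recurse forever):

```python
# Z = λf. (λg. f (λh. g g h)) (λg. f (λh. g g h))
Z = lambda f: (lambda g: f(lambda h: g(g)(h)))(lambda g: f(lambda h: g(g)(h)))

# F = λf. λn. if n=0 then 1 else n × f(n−1), and fact = Z F:
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = Z(F)
assert fact(3) == 6
assert fact(10) == 3628800
```

Note that `F` never calls itself by name; the recursion comes entirely from the fixpoint combinator.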

The Typed λ-Calculus

Things get interesting when you allow your constants (and even your formulas) in the calculus to be typed.

Wikipedia has articles on Typed lambda calculi (of which there are many) and the simply-typed lambda calculus, which is the simplest kind of typed lambda calculus.

Summary

We’ve covered:

  • Where it comes from and why it matters
  • Syntax
  • The Basic Rules
  • Examples
  • Lambdas in Major Programming Languages
  • Pure Lambda Calculus
  • Recursion and Fixed Points
  • Types