The Lambda Calculus

This little invention of Alonzo Church is really quite something.

The Background

So the Lambda Calculus is a theory of functions.

It was created by Alonzo Church (Turing’s PhD advisor) in the 1930s when he was looking into the foundations of mathematics. It turns out his invention, simple as it is, is actually a universal model of computation (equivalent to Turing machines).

A good detailed technical overview is David F. Martin’s Notes on the Lambda Calculus. A long textbook chapter overview is Chapter 5 of the Slonneger and Kurtz Book. If you prefer Wikipedia, there’s an article for you there of course. And a really good one at the Stanford Encyclopedia of Philosophy.

If you like videos, here’s a good 12 minute one:

Why It Matters

Are you wondering why you’d care about this funny-named theory of functions? (If you said no, you may have had that question answered in the video above, so great!)

The usual answers for “why it matters” and “why study this thing” are that theories are helpful in organizing useful knowledge, and in particular, the Lambda Calculus, even in its simplicity, turns out to capture the essentials of functions! And as programmers, you know how important functions are to computation. Right? Functions, in some sense, are what we think of as computation:

[figure: functionio.png]

And oh yeah, most modern programming languages have “lambdas” and they are powerful things.

The Basic Idea

So if you were making a theory of functions, how simple can you make it? Here’s how Church saw things:

Now it’s time to make a rigorous theory.

The Syntax

The Lambda Calculus is a formal calculus, which means it has a syntax that defines which formulas are legally well-formed (the formal part), and it has rules for transforming one formula into another (the calculus part). Church designed the calculus to mimic what we intuitively think functions should do. Now here’s the syntax of the Untyped Applied Lambda Calculus, given with a grammar:

Exp  → constant
     | variable
     | "(" Exp Exp ")"
     | "(" "λ" variable "." Exp ")"
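
This grammar is simple enough to parse in a few lines of code. Here’s a rough sketch in Python; the tokenizer, the tuple representation of trees, and all names here are our own invention for illustration, not anything standard:

```python
import re

# Tokens: parentheses, λ (or backslash), dot, names, and numeric constants.
TOKEN = re.compile(r'[()\\.λ]|[A-Za-z_][A-Za-z_0-9]*|[0-9]+')

def parse(source):
    """Parse a fully parenthesized λ-expression into nested tuples."""
    tokens = TOKEN.findall(source)
    pos = 0

    def next_token():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expect(expected):
        token = next_token()
        assert token == expected, f'expected {expected!r}, got {token!r}'

    def exp():
        token = next_token()
        if token == '(':
            if tokens[pos] in ('λ', '\\'):      # ( λ variable . Exp )
                next_token()
                variable = next_token()
                expect('.')
                body = exp()
                expect(')')
                return ('lambda', variable, body)
            left, right = exp(), exp()          # ( Exp Exp )
            expect(')')
            return ('apply', left, right)
        return ('const', int(token)) if token.isdigit() else ('var', token)

    tree = exp()
    assert pos == len(tokens), 'unexpected trailing input'
    return tree
```

With this representation, $(\mathtt{square}\;5)$ parses to `('apply', ('var', 'square'), ('const', 5))`.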

You may have guessed from that lead-in that there actually exist typed versions of the calculus (“The Typed $\lambda$ Calculus”) as well as non-applied versions (“The Pure $\lambda$ Calculus”). We’ll see those soon. But it’s best to start without types, and with constants. (The pure lambda calculus does not have constants.)

The syntax above is completely unambiguous, thanks to tons of parentheses. In practice, we can remove parentheses as long as we preserve the meaning under the assumptions that (1) the scope of the abstraction’s body goes as far to the right as possible, and (2) application associates to the left.

That might have sounded complicated so we’ll go over these in class:

| Official, fully parenthesized expression | Simplified version |
| --- | --- |
| $(\mathtt{square}\;5)$ | $\mathtt{square}\;5$ |
| $((\mathtt{plus}\;5)\;8)$ | $\mathtt{plus}\;5\;8$ |
| $((((x\;y)\;z)\;w)\;x)$ | $x\;y\;z\;w\;x$ |
| $(\mathtt{sin}\;(\mathtt{sqrt}\;2))$ | $\mathtt{sin}\;(\mathtt{sqrt}\;2)$ |
| $((((x\;y)\;z)\;(w\;y))\;x)$ | $x\;y\;z\;(w\;y)\;x$ |
| $(\lambda x. (x\;y))$ | $\lambda x. x\;y$ |
| $((\lambda x. x)\;y)$ | $(\lambda x. x)\;y$ |
| $(\lambda x. ((x\;y)\;(\lambda z. ((x\;y)\;z))))$ | $\lambda x. x\;y\;\lambda z. x\;y\;z$ |

How Does It All Work?

Before we lay out the rules, convince yourself that these three things make sense.

These operations have names:

| Example | Name | What it says |
| --- | --- | --- |
| $\begin{array}{l}\lambda x. \mathtt{hypot}\;x\;3 \Longrightarrow\\ \;\;\;\;\lambda y. \mathtt{hypot}\;y\;3\end{array}$ | α‑conversion | The name of the variable doesn’t really matter. (Usually) |
| $\begin{array}{l}(\lambda x.\mathtt{minus}\;x\;1)\;9 \Longrightarrow \\ \;\;\;\;\mathtt{minus}\;9\;1\end{array}$ | β‑conversion | We call functions by passing arguments to parameters, producing the body of the function with the parameter replaced by the argument. (Usually) |
| $\begin{array}{l}\lambda x. \mathtt{square}\;x \Longrightarrow\\ \;\;\;\;\mathtt{square}\end{array}$ | η‑conversion | You don’t need to wrap a lambda around a perfectly good function. Leave it alone. (Usually) |
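
These conversions have everyday counterparts in any language with lambdas. A quick illustration in Python (using the standard library’s `math` module):

```python
import math

# β-conversion is just calling the function on its argument:
assert (lambda x: x - 1)(9) == 8

# α-conversion: the parameter's name makes no observable difference.
f = lambda x: math.hypot(x, 3)
g = lambda y: math.hypot(y, 3)
assert f(4) == g(4) == 5.0

# η-conversion: lambda x: math.sqrt(x) is just math.sqrt in disguise.
assert list(map(lambda x: math.sqrt(x), [1, 4, 9])) == list(map(math.sqrt, [1, 4, 9]))
```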

But why do we say “usually”? This isn’t hard, right? It’s not...hard...is it?

We have to be careful

Any one of these three conversions can go terribly wrong if you don’t do them right. If you naively make changes, the new expression will have a completely different meaning than the original. And that’s bad. Examples:

In the first two cases, the variable $y$ got captured: it became bound when it should have stayed free. In the third case, we lost the binding of $x$: it was bound and became free. Wait. Bound? Free?

Understanding Bound and Free Variables

The calculus was designed so that:

$\;\; \lambda x. \mathtt{plus}\;x\;1$

and:

$\;\; \lambda y. \mathtt{plus}\;y\;1$

represent the same thing. In other words, the particular variable used in the abstraction doesn’t really matter too much. We should be able to substitute a different variable, but this does not always work. For example, we cannot rename $x$ to $y$ in:

$\;\; \lambda x. \mathtt{plus}\;x\;y$

Because doing so would change the function from one that “adds its argument to global variable y” to one that “multiplies its argument by two.” Completely different!

How do we make restrictions on this substitution formal? First we define the terms bound and free so that in, say:

$\;\; \lambda x. x\;y$

$x$ is bound and $y$ is free. The intuition is that a variable is bound if it is the variable of an abstraction (or appears in the scope of that abstraction), and free otherwise. To make this perfectly rigorous we define the sets $BV$ (bound variables) and $FV$ (free variables) as follows (table is from David F. Martin):

| Expression | $FV$ | $BV$ |
| --- | --- | --- |
| $k$ (a constant) | $\varnothing$ | $\varnothing$ |
| $x$ (a variable) | $\{x\}$ | $\varnothing$ |
| $(M\;N)$ | $FV(M) \cup FV(N)$ | $BV(M) \cup BV(N)$ |
| $(\lambda x. M)$ | $FV(M) - \{x\}$ | $BV(M) \cup \{x\}$ |
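
Those equations translate directly into code. Here’s a small Python sketch, representing terms as nested tuples of our own design — `('const', k)`, `('var', x)`, `('apply', M, N)`, `('lambda', x, M)`:

```python
# Terms as nested tuples: ('const', k), ('var', x),
# ('apply', M, N), ('lambda', x, M).
def free_vars(term):
    tag = term[0]
    if tag == 'const':
        return set()
    if tag == 'var':
        return {term[1]}
    if tag == 'apply':
        return free_vars(term[1]) | free_vars(term[2])
    return free_vars(term[2]) - {term[1]}          # ('lambda', x, M)

def bound_vars(term):
    tag = term[0]
    if tag in ('const', 'var'):
        return set()
    if tag == 'apply':
        return bound_vars(term[1]) | bound_vars(term[2])
    return bound_vars(term[2]) | {term[1]}         # ('lambda', x, M)

# λx. x y — x is bound, y is free:
term = ('lambda', 'x', ('apply', ('var', 'x'), ('var', 'y')))
assert free_vars(term) == {'y'}
assert bound_vars(term) == {'x'}
```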

Time for examples! Here are a handful of expressions, with the free variables underlined and the bound variables left alone:

Now we are ready to define the computation rules rigorously and formally so that nothing gets captured and everything works as expected.

Computing

Now here are the formal rules for conversions:

| Rule Name | How it works |
| --- | --- |
| $\alpha$ | $\lambda x. M \Longrightarrow \lambda y. M[y/x]$, provided that $y \not\in FV(M)$ |
| $\beta$ | $((\lambda x. M)\;N) \Longrightarrow M[N/x]$ |
| $\eta$ | $\lambda x. (M\;x) \Longrightarrow M$, provided that $x \not\in FV(M)$ |

where the notation $M[N/x]$ means the proper substitution of $N$ for all free occurrences of $x$ in $M$, and is defined like this:

| $E$ | $E[M/x]$ | Rationale |
| --- | --- | --- |
| $k$ | $k$ | Just a constant, nothing to sub |
| $x$ | $M$ | $x$ is free, good to sub |
| $y$ where $y \neq x$ | $y$ | Nothing to sub |
| $(N_1\;N_2)$ | $(N_1[M/x]\;N_2[M/x])$ | (Properly) sub into both |
| $\lambda x. N$ | $\lambda x. N$ | $x$ is bound, nothing to sub |
| $\lambda y. N$ where $y \neq x$ | if $y\not\in FV(M) \vee x\not\in FV(N)$ then $\lambda y. N[M/x]$, else $\lambda z.((N[z/y])[M/x])$ where $z$ is a new variable not free in $M$ nor $N$ | If $M$ has a free $y$, and $N$ has a free $x$ to substitute over, then $y$ would get captured! So in this case, we have to “rename” $y$ to a fresh $z$ before substituting for $x$. |
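
The substitution definition translates into code almost line for line. Here’s a sketch in Python, again with terms as nested tuples of our own design, where `substitute(e, m, x)` computes $E[M/x]$:

```python
import itertools

# Terms: ('const', k), ('var', x), ('apply', M, N), ('lambda', y, N).
def free_vars(t):
    if t[0] == 'const':
        return set()
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'apply':
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}                # ('lambda', y, N)

def fresh(avoid):
    # A new variable z0, z1, z2, ... not in the set we must avoid.
    return next(f'z{i}' for i in itertools.count() if f'z{i}' not in avoid)

def substitute(e, m, x):
    """Compute E[M/x], renaming to avoid capture when necessary."""
    tag = e[0]
    if tag == 'const':
        return e                                   # just a constant
    if tag == 'var':
        return m if e[1] == x else e               # free x: substitute
    if tag == 'apply':
        return ('apply', substitute(e[1], m, x), substitute(e[2], m, x))
    y, n = e[1], e[2]
    if y == x:
        return e                                   # x is bound here: nothing to sub
    if y not in free_vars(m) or x not in free_vars(n):
        return ('lambda', y, substitute(n, m, x))
    z = fresh(free_vars(m) | free_vars(n))         # y would be captured:
    return ('lambda', z, substitute(substitute(n, ('var', z), y), m, x))

# (λy. x)[y/x] must NOT produce λy. y — we rename y first:
assert substitute(('lambda', 'y', ('var', 'x')), ('var', 'y'), 'x') == \
    ('lambda', 'z0', ('var', 'y'))
```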

Different Reduction Orders

You know, you can reduce an expression different ways:

Applicative Order
Fully evaluate the arguments of a function before “calling” the function:

$\begin{array}{l} (\lambda x. x + x)((\lambda x. 3x + 1) 5) \\ \;\;\;\;\Longrightarrow (\lambda x. x + x)(3 \times 5 + 1) \\ \;\;\;\;\Longrightarrow (\lambda x. x + x)(16) \\ \;\;\;\;\Longrightarrow 16 + 16 \\ \;\;\;\;\Longrightarrow 32 \end{array}$

Normal Order
Evaluate from the “outside-in”:

$\begin{array}{l} (\lambda x. x + x)((\lambda x. 3x + 1) 5) \\ \;\;\;\;\Longrightarrow (\lambda x. 3x + 1)5 + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow (3 \times 5 + 1) + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow 16 + (\lambda x. 3x + 1)5 \\ \;\;\;\;\Longrightarrow 16 + (3 \times 5 + 1) \\ \;\;\;\;\Longrightarrow 16 + 16 \\ \;\;\;\;\Longrightarrow 32 \end{array}$
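
Python, like most languages, uses applicative order for ordinary calls, but you can simulate normal order by passing arguments as thunks (zero-argument lambdas). A sketch of the example above, with a counter (our own instrumentation) to show how often the argument is evaluated:

```python
evaluations = []

def triple_plus_one(x):
    evaluations.append(x)          # record each time the argument is computed
    return 3 * x + 1

# Applicative order: the argument is evaluated once, before the call.
double = lambda x: x + x
assert double(triple_plus_one(5)) == 32
assert len(evaluations) == 1

# "Normal order" via thunks: the unevaluated argument gets duplicated,
# so it is evaluated twice — just like the normal-order reduction above.
evaluations.clear()
double_lazy = lambda x: x() + x()
assert double_lazy(lambda: triple_plus_one(5)) == 32
assert len(evaluations) == 2
```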

Church-Rosser Theorem

Wikipedia says: In lambda calculus, the Church–Rosser theorem states that, when applying reduction rules to terms, the ordering in which the reductions are chosen does not make a difference to the eventual result.

Lambdas IRL

Most major programming languages have “lambdas” a.k.a “anonymous functions.” Let’s start easy, with the square function:

| Language | The Square Function |
| --- | --- |
| JavaScript | `x => x ** 2` |
| Python | `lambda x: x ** 2` |
| Ruby | `->(x) {x ** 2}` |
| Lua | `function (x) return x ^ 2 end` |
| Kotlin | `{ x: Int -> x * x }` |
| Swift | `{$0 * $0}` |
| Rust | `\|x\| x * x` |
| Perl | `sub { $_[0] ** 2 }` |
| Java | `(x) -> x * x` |
| C++ | `[](auto x) { return x * x; }` |
| C# | `x => x * x` |
| Julia | `x -> x ^ 2` |
| Verse | |
| Haskell | `\x -> x ^ 2` |
| K | `{x*x}` |

Lambdas show up a lot when used with functions like map, filter, and reduce. Here are examples of how different languages specify a function to compute the sum of the squares of the even numbers of a list:

| Language | Sum of squares of evens |
| --- | --- |
| JavaScript | `a.filter(x => x % 2 === 0).map(x => x ** 2).reduce((x, y) => x + y, 0)` |
| Python | `sum(map(lambda x: x * x, filter(lambda x: x % 2 == 0, a)))` |
| Ruby | |
| Lua | |
| Kotlin | |
| Swift | |
| Rust | |
| Perl | |
| Java | |
| C++ | |
| C# | |
| Julia | |
| Verse | |
| Haskell | `sum (map (^ 2) (filter even a))` |
| K | |
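
To spell out one row of the table, here is a runnable Python version, written two ways (the list `a` is just our example input):

```python
from functools import reduce

a = [1, 2, 3, 4, 5, 6]

# With lambdas and map/filter/reduce, mirroring the other languages:
total = reduce(lambda acc, x: acc + x,
               map(lambda x: x ** 2,
                   filter(lambda x: x % 2 == 0, a)),
               0)
assert total == 56                      # 4 + 16 + 36

# The same thing, more idiomatically in Python:
assert sum(x ** 2 for x in a if x % 2 == 0) == 56
```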

The Pure λ-Calculus

Believe it or not, we can build up all the constants of mathematics from lambda expressions. You don’t need numbers or booleans or lists or anything, all you need are functions. Nothing but functions. You get the objects you need by defining them to be...functions! Here are a few:

$\begin{array}{l} \;\; 0 \equiv \lambda f. \lambda x. x \\ \;\; 1 \equiv \lambda f. \lambda x. f\;x \\ \;\; 2 \equiv \lambda f. \lambda x. f\;(f\;x) \\ \;\; 3 \equiv \lambda f. \lambda x. f\;(f\;(f\;x)) \\ \;\; 4 \equiv \lambda f. \lambda x. f\;(f\;(f\;(f\;x))) \\ \;\; \vdots \end{array}$

The number $n$ is a function that takes in a function $f$ and some $x$ and applies $f$ $n$ times to $x$. So this means:

$ \;\; \mathtt{successor} \equiv \lambda n. \lambda f. \lambda x. f\;(n\;f\;x) $

And then $\mathtt{plus}\;m\;n$ is applying the successor function $m$ times to $n$:

$ \;\; \mathtt{plus} \equiv \lambda m. \lambda n. m\;\mathtt{successor}\;n $

with multiplication following from that in a straightforward way:

$ \;\; \mathtt{times} \equiv \lambda m. \lambda n. m\;(\mathtt{plus}\;n)\;0 $

There are more “direct” ways to do addition and multiplication:

$\begin{array}{l} \;\; \mathtt{plus} \equiv \lambda m. \lambda n. \lambda f. \lambda x. m\;f\;(n\;f\;x)\\ \;\; \mathtt{times} \equiv \lambda m. \lambda n. \lambda f. m\;(n\;f) \end{array}$

And amazingly, exponentiation turns out to be just this:

$ \;\; \mathtt{pow} \equiv \lambda b. \lambda e. e\;b $
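
Since most languages have lambdas, the Church numerals run anywhere. Here’s a Python sketch of the definitions above (the decoder `to_int`, which applies “add one” $n$ times to 0, is our own helper for inspection):

```python
# Church numerals as Python lambdas, straight from the definitions above.
zero  = lambda f: lambda x: x
succ  = lambda n: lambda f: lambda x: f(n(f)(x))       # successor
plus  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
times = lambda m: lambda n: lambda f: m(n(f))
pow_  = lambda b: lambda e: e(b)                       # pow is a builtin name

def to_int(n):
    """Decode a Church numeral: apply "add one" n times to 0."""
    return n(lambda k: k + 1)(0)

two, three = succ(succ(zero)), succ(succ(succ(zero)))
assert to_int(two) == 2
assert to_int(plus(two)(three)) == 5
assert to_int(times(two)(three)) == 6
assert to_int(pow_(two)(three)) == 8                   # 2³ = 8
```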

How about Booleans?

$\begin{array}{l} \;\; \mathtt{true} \equiv \lambda x. \lambda y. x \\ \;\; \mathtt{false} \equiv \lambda x. \lambda y. y \\ \;\; \mathtt{not} \equiv \lambda p. p\;\mathtt{false}\;\mathtt{true} \\ \;\; \mathtt{and} \equiv \lambda p. \lambda q. p\;q\;p \\ \;\; \mathtt{or} \equiv \lambda p. \lambda q. p\;p\;q \\ \;\; \mathtt{ifthenelse} \equiv \lambda r. \lambda p. \lambda q. r\;p\;q \end{array}$
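
The booleans also translate directly. A Python sketch (the trailing underscores are only because `not`, `and`, and `or` are Python keywords):

```python
# Church booleans in Python, straight from the definitions above.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y
not_  = lambda p: p(false)(true)
and_  = lambda p: lambda q: p(q)(p)
or_   = lambda p: lambda q: p(p)(q)
ifthenelse = lambda r: lambda p: lambda q: r(p)(q)

assert true('yes')('no') == 'yes'
assert not_(true) is false
assert and_(true)(false) is false
assert or_(false)(true) is true
assert ifthenelse(true)('then')('else') == 'then'
```

One caveat: Python is call-by-value, so `ifthenelse` evaluates both branches before selecting; in the pure calculus under normal order, only the chosen branch gets reduced.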

We can do tuples:

$\begin{array}{l} \;\; \mathtt{pair} \equiv \lambda x. \lambda y. \lambda f. f\;x\;y \\ \;\; \mathtt{first} \equiv \lambda p. p\;\mathtt{true} \\ \;\; \mathtt{second} \equiv \lambda p. p\;\mathtt{false} \\ \end{array}$

A list is either empty or a pair consisting of an item (its head) and a list (its tail):

$ \;\; \mathtt{emptylist} \equiv \lambda x. \mathtt{true} $

Example: Here is how some lists are represented:

| Sugared Notation | λ-Expression |
| --- | --- |
| [] | emptylist |
| [3] | (pair 3 emptylist) |
| [8, 3] | (pair 8 (pair 3 emptylist)) |
| [5, 8, 3] | (pair 5 (pair 8 (pair 3 emptylist))) |

There is a function to test whether a list is empty:

$ \;\; \mathtt{isempty} \equiv \lambda p. p\;(\lambda x. \lambda y. \mathtt{false}) $
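
Pairs and lists work as ordinary Python lambdas too. A sketch, building the list [8, 3] from the table above:

```python
# Church pairs and lists, straight from the definitions above.
true  = lambda x: lambda y: x
false = lambda x: lambda y: y
pair   = lambda x: lambda y: lambda f: f(x)(y)
first  = lambda p: p(true)
second = lambda p: p(false)
emptylist = lambda x: true
isempty   = lambda p: p(lambda x: lambda y: false)

assert first(pair(8)(3)) == 8 and second(pair(8)(3)) == 3

eight_three = pair(8)(pair(3)(emptylist))    # the list [8, 3]
assert isempty(emptylist) is true
assert isempty(eight_three) is false
assert first(second(eight_three)) == 3       # head of the tail
```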

Exercise: Show the reduction of the expressions (isempty emptylist) and (isempty [1, 5, 8]).

What’s the point?

It’s just to show you could, if you wanted to, build up math from just functions.

Functions all the way down.

Let, Where, and If

When you apply a function to an argument, you are LET-ting the parameter be the argument within the body of the function. So writing the application:

$\;\; (\lambda x. \mathtt{plus}\;5\;x)\;3$

could be written as:

$\;\; \mathsf{let}\;x=3\;\mathsf{in}\;(\mathtt{plus}\;5\;x)$

We sometimes see it written as this as well:

$\;\; (\mathtt{plus}\;5\;x)\;\mathsf{where}\;x=3$

All three of these are the same $\lambda$-expression. The symbols $\mathsf{let}$, $\mathsf{in}$, and $\mathsf{where}$ are symbols of the sugared $\lambda$ Calculus itself; they are not constants!
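
In code, this desugaring is literal. A Python illustration:

```python
# let x = 3 in (plus 5 x) is nothing but the application ((λx. plus 5 x) 3):
assert (lambda x: 5 + x)(3) == 8

# Nested lets are just nested applications:
# let x = 3 in (let y = 4 in (times x y))
assert (lambda x: (lambda y: x * y)(4))(3) == 12
```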

Another convenient form is:

$\;\; \mathsf{if}\;E_1\;\mathsf{then}\;E_2\;\mathsf{else}\;E_3$

You can use it in a computation as in this example:

$\begin{array}{l} \;\; (\lambda x. \mathsf{if}\;\mathtt{less}\;x\;3\;\mathsf{then}\;\mathtt{times}\;2\;x\;\mathsf{else}\;\mathtt{plus}\;x\;5)\;10 \\ \;\;\;\;\;\; \Longrightarrow \mathsf{if}\;\mathtt{less}\;10\;3\;\mathsf{then}\;\mathtt{times}\;2\;10\;\mathsf{else}\;\mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow \mathsf{if}\;\mathsf{false}\;\mathsf{then}\;\mathtt{times}\;2\;10\;\mathsf{else}\;\mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow \mathtt{plus}\;10\;5 \\ \;\;\;\;\;\; \Longrightarrow 15 \end{array}$

Other Syntactic Sugar

It’s fine to use crazy mathematical notation like infix operators with funny symbols. Here are some examples. (You can use your own sugar too, if you can get folks to agree to use them.)

| Standard Form | Sugared Form |
| --- | --- |
| (plus $x$ $y$) | $x + y$ |
| (minus $x$ $y$) | $x - y$ |
| (times $x$ $y$) | $x \times y$ |
| (divide $x$ $y$) | $x \div y$ |
| (remainder $x$ $y$) | $x \;\mathtt{\%}\; y$ |
| (power $x$ $y$) | $x^y$ |
| (tetrate $x$ $y$) | $x \uparrow\uparrow y$ |
| (pentate $x$ $y$) | $x \uparrow\uparrow\uparrow y$ |
| (negate $x$) | $-x$ |
| (sqrt $x$) | $\sqrt{x}$ |
| (factorial $x$) | $x!$ |
| (exp $x$) | $e^x$ |
| (floor $x$) | $\lfloor x \rfloor$ |
| (ceil $x$) | $\lceil x \rceil$ |
| (less $x$ $y$) | $x < y$ |
| (lesseq $x$ $y$) | $x \leq y$ |
| (eq $x$ $y$) | $x = y$ |
| (noteq $x$ $y$) | $x \not= y$ |
| (greater $x$ $y$) | $x > y$ |
| (greatereq $x$ $y$) | $x \geq y$ |
| (union $x$ $y$) | $x \cup y$ |
| (intersection $x$ $y$) | $x \cap y$ |
| (pair $x$ $y$) | $(x, y)$ |
| (pair $x$ (pair $y$ $z$)) | $(x, y, z)$ |
| emptylist | $[]$ |
| (pair $x$ emptylist) | $[x]$ |
| (pair $x$ (pair $y$ emptylist)) | $[x,y]$ |
| (pair $x$ (pair $y$ (pair $z$ emptylist))) | $[x,y,z]$ |

Recursion

So, lambda calculus is supposedly this model of computation. We saw it can do booleans, logic, numbers, arithmetic, function invocation, and even conditional evaluation. But can it do iteration (i.e., loops and recursion)?

Why It Is Hard

How can we define the factorial function? First note this does NOT work:

$\;\; \mathtt{fact} \equiv_{\mathtt{def}} \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times \mathtt{fact}(n-1)$

We’re trying to define fact but assuming it already exists? That’s not right! We have to evaluate the right-hand side first, then bind it to fact.

Perhaps what was meant was:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times (\lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times \ldots)(n-1) \end{array}$

which is nonsense. We can’t have infinitely long expressions. The whole point of computation is to find finite representations (programs, functions, what have you) of things.

What we want is something like this:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \textrm{the function}\;f\;\textrm{such that}\;f = \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

But how do we possibly find such an $f$? What sorcery is required?

Fixpoints

This is wild indeed so hang on. How do we deal with these “such that” questions? How do we find values that “satisfy” statements? Sometimes we can just guess:

How do we make this more rigorous? We use the idea of a fixpoint. A fixpoint of a function $f$ is a value $x$ such that $f(x)=x$. That’s it. That’s all it is. Examples are helpful:

| Function | Fixpoints |
| --- | --- |
| $\lambda x. \cos x$ | $0.73908513321516064166\ldots$ |
| $\lambda x. x^0$ | $1$ |
| $\lambda x. 6-x^2$ | $2$ and $-3$ |
| $\lambda x. x!$ | $1$ and $2$ |
| $\lambda x. x^2-x$ | $0$ and $2$ |
| $\lambda x. x$ | all values are fixpoints |
| $\lambda x. x + 1$ | no fixpoints (well, JS has NaN) |

Now, if there existed a function, called, say, fix, that would find fixpoints of a function for us, then we could write:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; \mathtt{fix}\;\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

See why? Because what is the fixpoint of $\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1)$? It’s the function $f$ such that.... You get the idea.

The Trick

Now how do we write fix? First, understand what it is supposed to do. $\mathtt{fix}\;f$ must produce a fixpoint of $f$, namely some value for which $x = f(x)$. So that means that fix must satisfy the equation:

$ \;\; \mathtt{fix}\;f = f(\mathtt{fix}\;f) $

How even can we find what fix is? Well someone did, and they called it $Y$:

$ \;\; Y \equiv_{\mathrm{def}} \lambda f. (\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g)) $

Wait, what? Seriously? Let’s see:

$\;\; \begin{array}{lcl} Y\;f & = & (\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g)) \\ & = & f\;((\lambda g. f\;(g\;g))\;(\lambda g. f\;(g\;g))) \\ & = & f\;(Y\;f) \end{array}$

🤯

For completeness, the $Y$ combinator gives you fixed points only in the call-by-name calculus. You need a different one, $Z$, for the call-by-value calculus.

$\;\; Z \equiv_{\mathrm{def}} \lambda f. (\lambda g. f\;(\lambda h. g\;g\;h))\;(\lambda g. f\;(\lambda h. g\;g\;h)) $

Exercise: Prove $Z$ does compute fixpoints.

Why this Matters

This is awesome because now we have justified the use of recursion as a proper computational mechanism! Yes, that is a big deal.

Let’s see it in action with factorial. We are going to define:

$\begin{array}{l} \;\; \mathtt{fact} \equiv_{\mathtt{def}} \\ \;\;\;\;\;\; Y\;\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1) \end{array}$

This is a proper definition because we are not using fact to define fact. We are in the clear. But does it actually work? Let’s try it. Let’s compute $\mathtt{fact}\;3$. Because these derivations can get really long, we’ll use $F$ to stand for the expression $\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1)$. Here goes:

$\;\; \begin{array}{lcl} \mathtt{fact}\;3 & = & Y\;F\;3 \\ & = & F\;(Y\;F)\;3 \\ & = & ((\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1))\;(Y\;F))\;3 \\ & = & \mathsf{if}\;3=0\;\mathsf{then}\;1\;\mathsf{else}\;3 \times (Y\;F)(3-1) \\ & = & 3 \times (Y\;F)(2) \\ & = & 3 \times ((\lambda f. \lambda n. \mathsf{if}\;n=0\;\mathsf{then}\;1\;\mathsf{else}\;n \times f(n-1))\;(Y\;F))(2) \\ & = & 3 \times (\mathsf{if}\;2=0\;\mathsf{then}\;1\;\mathsf{else}\;2 \times (Y\;F)(2-1)) \\ & = & 3 \times 2 \times (Y\;F)(1) \\ & = & 3 \times 2 \times 1 \times (Y\;F)(0) \\ & = & 3 \times 2 \times 1 \times 1 \\ & = & 6 \end{array}$
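
You can watch this work in a real language. Python is call-by-value, so we need $Z$ rather than $Y$ (with plain $Y$, Python would try to evaluate $g\;g$ eagerly and recurse forever):

```python
# Z = λf. (λg. f (λh. g g h)) (λg. f (λh. g g h))
Z = lambda f: (lambda g: f(lambda h: g(g)(h)))(lambda g: f(lambda h: g(g)(h)))

# F = λf. λn. if n=0 then 1 else n × f(n−1), and fact = Z F:
F = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
fact = Z(F)
assert fact(3) == 6
assert fact(10) == 3628800
```

Note that `F` never calls itself by name; the recursion comes entirely from the fixpoint combinator.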

The Typed λ-Calculus

Things get interesting when you allow your constants (and even your formulas) in the calculus to be typed.

Wikipedia has articles on Typed lambda calculi (of which there are many) and the simply-typed lambda calculus, which is the simplest kind of typed lambda calculus.

Summary

We’ve covered:

  • Where it comes from and why it matters
  • Syntax
  • The Basic Rules
  • Examples
  • Lambdas in Major Programming Languages
  • Pure Lambda Calculus
  • Recursion and Fixed Points
  • Types