We’ll begin with this definition from an actual respected dictionary:
theory (n.)
- A set of statements or principles devised to explain a group of facts or phenomena, especially one that has been repeatedly tested or is widely accepted and can be used to make predictions about natural phenomena.
- The branch of a science or art consisting of its explanatory statements, accepted principles, and methods of analysis.
Sure, sometimes folks use the word “theory” for a hunch or a supposition, but in the arts and sciences, a theory is an organized body of knowledge with explanatory and predictive powers.
We build a theory to organize our knowledge and develop a vocabulary that we can use to reason about things, communicate with others, make predictions, and generate new knowledge.
Without a theory, we can still do things, but not as well: there’s a lot of improvisation, difficulty in communicating ideas, guesswork, and flailing around. Your practice will not be as strong, and your insights may be limited. Let’s see how David Bennett expertly makes this point. The key ideas are expressed from 17:21-18:42 (though the whole video is interesting):
Music theory can be helpful in creating music, but can it help us enjoy music more? Certainly! After all, theoretical knowledge adds to our awe of that which we study, allowing us to appreciate the beauty of something at more than just one level:
The field we now call Computer Science has a long and fascinating history. Several theories emerged as the discipline took shape. A computer scientist has a working knowledge of these theories and employs them to not only create useful applications and systems, but to advance the state of the art. There are four major theories under the umbrella term “The Theory of Computation.” Two are rather fundamental:
How are computations expressed?
How are computations carried out?
The third and fourth get into certain specifics but have deep and fascinating philosophical roots:
What are the fundamental limits of computation?
What resources are required to perform certain computations?
To study and apply these theories, we need some context.
Computation augments and amplifies human thought. Modern computers speed up our work to such an extent that our reasoning powers are at a qualitatively different level than what we had before these machines. These computational theories help us qualify and quantify how exactly this augmentation occurs. They enable us to ask and answer questions such as:
In order to be accepted and useful, these theories must be grounded with practical axioms that match our intuition of what computations actually are. We will take as given that a computer is a finite device and a program is a finite set of instructions.
Also: the areas of computer science in which theory is most directly applied are often the most interesting, important, and lucrative for practitioners. These include cybersecurity (esp. cryptology), AI (esp. machine learning), advanced data structures such as those powering blockchains, distributed computing, and novel hardware architectures.
Language Theory is concerned with how information and computation are expressed. It begins with the question “What is information?” and approaches the question with the hypothesis that any piece of (human-usable) information can be represented as a string of symbols. To see why this hypothesis is useful, note these examples:
9177
2.3e-8
Stay behind the yellow line!
ᐊᐃᖓᐃ ᓄᓇᕐᔪᐊᖅ
/Pecs/Los Angeles/Berlin/Madrid/
/1 2/3 0/2 2/2 0/3 2//
(* (+ 4 3) (- 9 7))
int average(int x, int y) {return (x + y) / 2;}
<dog id="30024"><name>Lisichka</name><age>13</age></dog>
{"type": "dog", "id": "30024", "name": "Lisichka", "age": 13}
∀α β. α⊥β ⇔ (α•β = 0)
1. f3 e5 2. g4 ♛h4++

Symbols are chosen from some finite alphabet, such as $\{0,1,2,3,4,5,6,7,8,9\}$ for the language of natural numbers, or $\{0,1,2,3,4,5,6,7,8,9,.,\textrm{e},\textrm{E},+,-\}$ for the language of floating-point numbers. What is a language? A language (in Formal Language Theory) is a set of strings over an alphabet; that is, some strings are in-the-language and some are not. For example, in the language-of-floating-point-numbers, 3.55E+21 is in the language, but +-9-E...2 is not. Beyond formal language theory, we consider a language to also contain an assignment of “meaning” to each string in the language.
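We can make membership concrete with a small sketch in Python. The exact grammar of the floating-point language is an assumption here (an optional sign, digits with an optional fractional part, and an optional exponent), captured as a regular expression:

```python
import re

# A hypothetical grammar for the floating-point language: optional sign,
# digits with an optional fractional part, optional exponent. The exact
# grammar is an assumption for illustration, not a formal specification.
FLOAT_LANGUAGE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

def in_language(s):
    """Return True if the whole string s is in the floating-point language."""
    return FLOAT_LANGUAGE.fullmatch(s) is not None

print(in_language("3.55E+21"))   # → True
print(in_language("+-9-E...2"))  # → False
```

Deciding membership is exactly what a language recognizer does; for languages this simple, a regular expression (equivalently, a finite automaton) suffices.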
Here’s an obvious, but important, point worth stating: strings in a language represent not only numbers, text, and complex data structures, but also abstractions of computations known as functions, which produce output information from input information. Examples:
double, which takes a number and returns twice that number, e.g., double(3.5) $\Longrightarrow$ 7
plus, which takes a pair of numbers and returns their sum, e.g., plus(4, -3) $\Longrightarrow$ 1
nextMove, which takes a game state and returns the next (best?) move that should be made
upperCase, which takes a piece of text and returns a string like it in every way except all letters are in upper case, e.g., upperCase("Please don’t shout!") $\Longrightarrow$ "PLEASE DON’T SHOUT!"
sorted, which takes a list of things and returns a new list containing the same values in sorted order, e.g., sorted([4, 7, 1]) $\Longrightarrow$ [1, 4, 7]
even, which takes a number and returns true or false depending on whether the number is even, e.g., even(797) $\Longrightarrow$ false
twice, which takes a function and a value and applies the function to the result of applying the function to the value, e.g., twice(double, 5) $\Longrightarrow$ double(double(5)) $\Longrightarrow$ double(10) $\Longrightarrow$ 20
javac, which takes a string in the Java programming language and returns an executable class file
java, which takes a class file and returns (the effect of) its execution
python, which takes a string in the Python programming language and returns (the effect of) its execution, e.g., python("print(8 * 5 - 2 / 1)") $\Longrightarrow$ {stdout: "38.0"}

But how do we express computations? We can design languages to do this. We can make practical languages, or we can try to see how primitive we can make them. Alonzo Church invented the Lambda Calculus to show that function definition and application could serve as a foundation for mathematics. Here are example functions:
λx. not (equal (mod x 2) 0)
λp. λn. λr. λt. times p (power (plus 1 (divide r n)) (times n t))
Surface-level abbreviations (syntactic sugar) are okay:
To make things even more practical, real programming languages have been created. Here’s the “is even” computation expressed in a few:
(LAMBDA (x) (NOT (EQUAL (MOD x 2) 0))) (Lisp)
lambda x: x % 2 != 0 (Python)
x => x % 2 != 0 (JavaScript)
(x) -> x % 2 != 0 (Java)
->(x) {x % 2 != 0} (Ruby)
#(not= (mod % 2) 0) (Clojure)
{$0 % 2 != 0} (Swift)
|x| { x % 2 != 0 } (Rust)
In Language Theory, particularly in Programming Language Theory, we study the expression of computations, as well as the data upon which they operate, and the modules, objects, and control structures that house these functions. We study how syntax definitions can be written to define which strings belong to the language, and how semantic definitions can give meaning to these strings. We even study how these languages are used by human beings, something we call pragmatics. But how can we actually compute (or “carry out”, or “evaluate”) these functions? For that we turn to automata theory.
In Automata Theory, we build and study formal models of computational devices (i.e., computers). Historically, the theory evolved by first examining computing devices through history, then coming up with primitive actions that could capture any kind of computation.
People have designed or built calculating machines for centuries. Notable examples include:
“Babbage never foresaw the terrible consequences of his invention—a machine that would autocorrect his name to ‘cabbage’ every single time.” —Philomena Cunk
These were all amazing feats of engineering and human ingenuity in their own right, but Turing was one of the first to capture the essence of computation in a theoretical computing device.

When Turing was in his early 20s, he set out to prove that no effective procedure exists that can decide arbitrary formal statements of mathematics (the famous Entscheidungsproblem). To do so, he needed to define, formally, what was meant by computation, so he created a simple machine model, or automaton, to precisely capture the very notion of computation. He called machines derived from this model a-machines, but we know them today as Turing Machines. Turing Machines are by no means the only kind of automaton that can carry out computations, but they are special, for a variety of reasons.
Turing set out to formalize what (human) computers did: they followed a set of instructions in which they examined some portion of their paper worksheet, wrote something down or erased something, then moved somewhere else or finished with their result. The actual instructions were based on the human’s state of mind and what was already on the paper. He realized the contents of a two-dimensional sheet of paper could be just as easily represented as a one-dimensional sequence of symbols. So he then reduced the idea to a machine operating on an unbounded sequential paper tape with a read-write head initially pointing to the leftmost symbol of the input. At each step, the machine reads the current symbol, and depending on its current state, (1) optionally rewrites it with a different symbol, (2) optionally moves one square to the left or right, and (3) optionally enters a new state.
Turing’s Story

History is important in understanding just about anything. The story of what motivated Turing to work on the foundations of computing, how he arrived at these fascinating machines, and his brilliant insights into the nature of computation are told very well in the book Turing's Vision: The Birth of Computer Science.
That book is well worth a read.
Let’s see a concrete example: multiplying a binary numeral by four. How? We just append two zeros to the end of the input! But how can we describe these steps more rigorously with a Turing Machine? Like this:
Here is how our machine operates with the number five (binary 0101) initially on the tape. (Following convention, we use # for blank squares because it is easier to see.)

Nice! 5 × 4 is indeed 20 (binary 010100). Now let’s capture the multiply-by-four machine in all its glorious generality:
| When in state | Looking at | Perform these actions |
|---|---|---|
| Scanning | 0 | Move right |
| Scanning | 1 | Move right |
| Scanning | # | Print 0, Move right, Go to Doubled state |
| Doubled | # | Print 0, Go to Quadrupled state |
This says: starting in the Scanning state, while you see a 0 or 1, just move right, but if you see a blank, replace with a 0, move right, and go to the Doubled state. In this state, if you see a blank, replace with a 0 and go to the Quadrupled state. There is no place to go from here, so we’re done.
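The table above is directly executable. Here is a minimal Turing Machine simulator in Python (the rule representation and function name are our own, for illustration) running the multiply-by-four machine on binary 0101:

```python
def run_tm(rules, tape, state, blank="#"):
    """Simulate a Turing Machine. rules maps (state, symbol) to
    (symbol_to_write, move, next_state), where move is 'L', 'R', or None.
    The machine halts when no rule applies to the current configuration."""
    tape = list(tape)
    pos = 0
    while (state, tape[pos]) in rules:
        write, move, state = rules[(state, tape[pos])]
        tape[pos] = write
        if move == "R":
            pos += 1
            if pos == len(tape):        # extend the unbounded tape on demand
                tape.append(blank)
        elif move == "L":
            pos -= 1
    return "".join(tape).strip(blank)   # drop surrounding blanks

# The multiply-by-four machine, transcribed from the table above
rules = {
    ("Scanning", "0"): ("0", "R", "Scanning"),
    ("Scanning", "1"): ("1", "R", "Scanning"),
    ("Scanning", "#"): ("0", "R", "Doubled"),
    ("Doubled",  "#"): ("0", None, "Quadrupled"),
}

print(run_tm(rules, "0101", "Scanning"))  # → 010100
```

Running it on 0101 (five) yields 010100 (twenty), matching the trace above.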
We can give a pictorial representation of the TM program as a state diagram:

We will generate Turing Machines to (a) floor-divide by four, (b) increment a binary number by 1, (c) compute the two’s complement of a binary number, and (d) compute the value of a binary number modulo 2.
Turing Machines are way too primitive to use for programming efficiently, but their very primitive nature makes them useful to study the nature of computation. We’ll be looking at:
Turing Machines are not the only model of what we think of as computation (though they might be the most famous). Other models that have been created include:
Some have the same “computability and recognition power” as Turing Machines, while others do not. But none have greater computing power! Why not? That’s the domain of our next theory—computability theory.
Now for our next philosophical question. Can every function be represented as a string of symbols that we can process with a machine? In other words, is there a program for every function?
It turns out that the answer is no.
Why?
Programs are just strings, so we can list them in increasing alphabetic order. Programs can be thought of as representing functions in $\mathbb{N} \rightarrow \mathbb{N}$ (since all data can be encoded in binary). Call the enumeration $f_0, f_1, f_2, \ldots$. But $\lambda n. f_n(n)+1$ isn’t in the list: it differs from every $f_n$ at argument $n$. Bottom line:
There exist functions that cannot be computed by any computing machine.
Hey, this is philosophically interesting.
This explanation of why non-computable functions exist was given rather nonchalantly. Did it make sense? It might not have right away because it is an instance of a very powerful proof technique called diagonalization which was at first somewhat controversial. Carry out this proof by sketching the hypothetical list of all functions as a matrix: each row is a function and each column is the argument, and each cell is the result of applying the function (row) to the argument (column). Look at how $\lambda n. f_n(n)+1$ is generated. Do you see why the technique is called diagonalization? Do you see why it works? 🤯
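The mechanics can be sketched in Python with a finite stand-in for the enumeration (the real enumeration is infinite, so this is only an illustration of the diagonal step, not the proof itself):

```python
# A tiny finite stand-in for the (infinite) enumeration f_0, f_1, f_2, ...
fs = [
    lambda n: 0,        # f_0
    lambda n: n,        # f_1
    lambda n: n * n,    # f_2
    lambda n: n + 7,    # f_3
]

# The diagonal function: g(n) = f_n(n) + 1
g = lambda n: fs[n](n) + 1

# g disagrees with each f_n at argument n, so g appears nowhere in the list
for n, f in enumerate(fs):
    print(n, f(n), g(n))   # the last two columns always differ
```

The cells f(n) are exactly the diagonal of the function-by-argument matrix; adding 1 along the diagonal guarantees a function that no row can be.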
So which functions can be computed? People have, over the years, created many models of computation—the Lambda Calculus, Post Production Systems, Phrase Structure Grammars, Partial Recursive Functions, Random Access Machines, and more. But get this: all have turned out to be equal in power, in terms of what they can compute, to the Turing machine! Even enhancing the basic Turing Machine design with non-determinism or even multiple tapes won’t increase its computing power. These facts suggest the class of computable functions is fundamental. The Church-Turing Thesis claims that what we intuitively understand as “the computable functions” are precisely those for which a Turing Machine can be given, i.e.,
A function $f$ is computable if and only if there exists a Turing Machine that outputs $f(x)$ when started on input $x$.
Remember that Universal Turing Machine briefly mentioned above? It is “universal” because it can simulate any Turing Machine. If the Church-Turing Thesis is true, it’s really universal: it can compute any computable function.
Computability Theory is concerned with figuring out which functions are and are not computable.
Turing Complete

Any computational model that can simulate a Turing Machine is called Turing Complete. Being Turing Complete is a badge of honor—it means your model can compute anything that is computable.
Read more at Wikipedia. For a technical overview of Turing-Complete models, see Wolfgang Schreiner’s presentation.
How about an example? We’ll use JavaScript rather than TMs to simplify the explanation. Let’s write a function called halts that will determine if a function halts on a given input. So we would like:
halts(Math.sqrt, 100) ⟶ true
halts(x => {while (x) {}}, true) ⟶ false
The idea is simple. Just analyze the input and see if it will halt. But it turns out to be logically impossible to write this function. Why? Assume you can write it. Then consider:
function fox(f) { if (halts(f, f)) { while (true) {} } }
Now consider the expression halts(fox, fox). Really consider it. Work it out. See what it is saying. Got it? Woah. We have an antinomy. The assumption that you can write halts leads to a contradiction, so you cannot write this function to work for all inputs. Interestingly, you can write:
function halts(f, x) { f(x); return true; }
So if f(x) halts, you’ll eventually know it, but if it doesn’t, we may never know.
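We can sketch this one-sidedness in Python by modeling programs as generators that yield once per computational step (this modeling, and the function names, are our own illustration, not a real halts). A step-bounded runner can confirm halting whenever it occurs within the budget, but it can never confirm non-halting:

```python
def halts_within(program, budget):
    """program is a generator that yields once per computational step.
    Returns True if it halts within budget steps, or None (unknown)
    otherwise. It can never return False: non-halting only ever looks
    like 'hasn't halted yet'."""
    for _ in range(budget):
        try:
            next(program)
        except StopIteration:
            return True
    return None  # budget exhausted: halting status remains unknown

def countdown(n):          # halts after about n steps
    while n > 0:
        n -= 1
        yield

def loop_forever():        # never halts
    while True:
        yield

print(halts_within(countdown(5), 100))    # → True
print(halts_within(loop_forever(), 100))  # → None
```

This is the essence of semi-decidability: the halting problem is recognizable (say yes when f(x) halts) but not decidable (no procedure always says yes or no).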
Computability theory gives us theoretical limits on what can and cannot be computed, but doesn’t tell us what is practically computable. (A problem requiring $2^{100}$ years to complete is not worth running.) The question “What resources are required to perform certain computations?” or in other words “How much time and memory is needed to solve this problem?” gives rise to Complexity Theory.
Complexity Theory gives us results such as “This function requires ________________ to compute”:
Here time refers to the number of basic steps a computation takes on a Turing Machine as a function of the size of the input, while space refers to the amount of memory used during the computation not counting the input itself.
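As a rough illustration (counting Python comparisons as a stand-in for Turing Machine steps, which is an assumption of this sketch, not part of the formal definition), here is how the step counts of two search algorithms grow with input size:

```python
def linear_search_steps(n, target):
    """Count comparisons a linear scan makes searching 0..n-1 for target."""
    steps = 0
    for i in range(n):
        steps += 1
        if i == target:
            break
    return steps

def binary_search_steps(n, target):
    """Count comparisons binary search makes over the sorted range 0..n-1."""
    steps, lo, hi = 0, 0, n - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if mid == target:
            break
        elif mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

# Target near the end: linear time vs. logarithmic time
print(linear_search_steps(1_000_000, 999_999))   # → 1000000
print(binary_search_steps(1_000_000, 999_999))   # roughly log2(1000000) ≈ 20
```

The point is that complexity is a function of input size: one algorithm needs on the order of $n$ steps, the other on the order of $\log n$.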
It gets (massively) more interesting, because the theory deals with time and space performance on non-deterministic machines, and parallel machines, and interactive machines, and quantum computers, too. It even considers probabilistic algorithms. Interested? Check out the amazing Complexity Zoo.
Complexity Theory has philosophical as well as practical use. Make sure you are familiar with one of the most famous open questions in all of complexity theory, and perhaps even all of computer science, namely: Does $P = NP$? (Solve it and you will win a million dollars!)
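To get a feel for the asymmetry behind the question, consider subset-sum, a well-known NP-complete problem: checking a proposed certificate takes polynomial time, while the obvious way to find one examines up to $2^n$ subsets. A sketch in Python (the function names are ours, for illustration):

```python
from itertools import combinations

def verify(nums, target, indices):
    """Checking a certificate is fast: just add up the chosen elements."""
    return sum(nums[i] for i in indices) == target

def solve(nums, target):
    """Finding a certificate: the obvious search tries all 2^n subsets."""
    for r in range(len(nums) + 1):
        for indices in combinations(range(len(nums)), r):
            if verify(nums, target, indices):
                return indices
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve(nums, 9)          # finds indices of some subset summing to 9
print(verify(nums, 9, cert))   # → True
```

P = NP would mean that every problem whose solutions can be verified this quickly can also be solved this quickly; no one has found such an algorithm for subset-sum, and no one has proved it impossible.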
Here’s another great explainer:
There is so, so much information out there about the P vs. NP question, as it really seems to captivate the public.
I, Scott Aaronson, hereby formally disown my statement about how “everyone who could appreciate a symphony would be Mozart” if P=NP.
In 2014, he wrote a new post on the scientific case for P≠NP. Read the later post. What do you think?

We’ve only briefly introduced the major concerns of four computer science theories without any real depth (or breadth). We’ll have a chance to see each theory in more detail later, covering aspects such as:
In much the same way that Bennett pointed out that music theory can expand your abilities in music composition, and Feynman spoke of theoretical knowledge in science only expanding and enhancing the feelings of awe and beauty when contemplating nature, might computer science theory enhance one’s imagination for creating new worlds that Booch talks about here?

Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
We’ve covered: