Theories in Computer Science

To develop expertise in any endeavor, you need both practice and theory. Computer Science is a discipline with many practices and many theories. Let’s visit a few of the major theories—the science in Computer Science. Why? Don’t worry, we’ll explain the usefulness of studying theory, too.

What is a Theory?

We’ll begin with this definition from an actual respected dictionary:

theory (n.)
  1. A set of statements or principles devised to explain a group of facts or phenomena, especially one that has been repeatedly tested or is widely accepted and can be used to make predictions about natural phenomena.
  2. The branch of a science or art consisting of its explanatory statements, accepted principles, and methods of analysis.

Sure, sometimes folks use the word “theory” for a hunch or a supposition, but in the arts and sciences, a theory is an organized body of knowledge with explanatory and predictive powers.

Why Study Theory?

We build a theory to organize our knowledge and develop a vocabulary that we can use to reason about things, communicate with others, make predictions, and generate new knowledge.

Without a theory, we can still do things, but not as well: there’s a lot of improvisation, difficulty communicating ideas, guesswork, and flailing around. Your practice will not be as strong, and your insights may be limited. Let’s see how David Bennett expertly makes this point. The key ideas are expressed from 17:21-18:42 (though the whole video is interesting):

Music theory can be helpful in creating music, but can it help us enjoy music more? Certainly! After all, theoretical knowledge adds to our awe of that which we study, allowing us to appreciate the beauty of something at more than just one level:

Theories of Computation

The field we now call Computer Science has a long and fascinating history. Several theories emerged as the discipline took shape. A computer scientist has a working knowledge of these theories and employs them to not only create useful applications and systems, but to advance the state of the art. There are four major theories under the umbrella term “The Theory of Computation.” Two are rather fundamental:

Language Theory

How are computations expressed?

Automata Theory

How are computations carried out?

The third and fourth get into certain specifics but have deep and fascinating philosophical roots:

Computability Theory

What are the fundamental limits of computation?

Complexity Theory

What resources are required to perform certain computations?

To study and apply these theories, we need some context.

Computation augments and amplifies human thought. Modern computers speed up time to such an extent that our reasoning powers are at a qualitatively different level than what we had before these machines. These computational theories help us qualify and quantify how exactly this augmentation occurs. They enable us to ask and answer questions such as

In order to be accepted and useful, these theories must be grounded with practical axioms that match our intuition of what computations actually are. We will take as given that a computer is a finite device and a program is a finite set of instructions.

Exercise: Why are the assumptions of finiteness not only reasonable but essential?
We cannot describe, communicate, or comprehend an infinite set of instructions. We cannot build, let alone describe, an infinite computing device.

Language Theory

Language Theory is concerned with how information and computation are expressed. It begins with the question “What is information?” and approaches the question with the hypothesis that any piece of (human-usable) information can be represented as a string of symbols. To see why this hypothesis is useful, note these examples:

Symbols are chosen from some finite alphabet, such as $\{0,1,2,3,4,5,6,7,8,9\}$ for the language of natural numbers, or $\{0,1,2,3,4,5,6,7,8,9,.,\textrm{e},\textrm{E},+,-\}$ for the language of floating-point numbers. What is a language? A language (in Formal Language Theory) is a set of strings over an alphabet; that is, some strings are in-the-language and some are not. For example, in the language-of-floating-point-numbers, 3.55E+21 is in the language, but +-9-E...2 is not. Beyond formal language theory, we consider a language to also contain an assignment of “meaning” to each string in the language.

Here’s an obvious, but important, point worth stating: strings in a language represent not only numbers and text and complex data structures, but also abstractions of computations known as functions, which produce output information from input information. Examples:

Functions are applied to arguments to produce their results. (The application is sometimes called an invocation or a call.) For example:

Functions can be expressed in so many ways. Alonzo Church invented the Lambda Calculus to show that function definition and application could serve as a foundation for mathematics. Here are example functions:

Abbreviations are okay:

There are other notations, too:

In Language Theory, we study how to express these functions, as well as the data upon which they operate, and the modules, objects, and control structures which house these functions. We study how syntax definitions can be written to define which strings belong to the language, and how semantic definitions can give meaning to these strings. We even study how these languages are used by human beings, something we call pragmatics. But how can we actually compute (or “carry out”, or “evaluate”) these functions? For that we turn to automata theory.

Automata Theory

In Automata Theory, we build and study formal models of computational devices (i.e., computers). We can start to develop a theory by thinking about how it is we expect our functions to be formally defined, and in some sense, how they “work”.

Early Computing Machines

Of course, people have entertained the idea of building calculating machines for centuries. Notable examples include:

These were all amazing feats of engineering and human ingenuity in their own right, but it wasn’t until Turing that we finally had a solid understanding of what computation really is.

“Babbage never foresaw the terrible consequences of his invention—a machine that would autocorrect his name to ‘cabbage’ every single time.” —Philomena Cunk

Turing Machines

turing.jpeg

When Turing was in his early 20s, he set out to prove Hilbert wrong by solving the Entscheidungsproblem in the negative. To do so, he needed to define, formally, what was meant by computation, so he created a simple machine model, or automaton, to capture the essence of computation (or algorithm, or effective procedure). He called machines derived from this model a-machines, but we know them today as Turing Machines. Turing Machines are by no means the only kind of automaton that can carry out computations, but they are special, for a variety of reasons.

Turing set out to formalize what (human) computers did: they followed a set of instructions in which they examined some portion of their paper worksheet, wrote something down or erased something, then moved somewhere else or finished with their result. The actual instructions were based on the human’s state of mind and what was already on the paper. He realized the contents of a two-dimensional sheet of paper could be just as easily represented as a one-dimensional sequence of symbols. So he then reduced the idea to a machine operating on an unbounded sequential paper tape with a read-write head initially pointing to the leftmost symbol of the input. At each step, the machine reads the current symbol, and depending on its current state, possibly rewrites it with a different symbol, moves one square to the left or right, and possibly enters a new state.

Turing’s Story

History is important in understanding just about anything. The story of what motivated Turing to work on the foundations of computing, how he arrived at these fascinating machines, and his brilliant insights into the nature of computation are told very well in the book Turing's Vision: The Birth of Computer Science.

That book is well worth a read.

Let’s build a Turing Machine to multiply a binary number by four. How would we do this? Well, we just append two zeros to the end of the input! But how can we describe these steps more rigorously? What might the machine’s instructions look like? We might say:

Here is how our machine operates with the number five (binary 0101) initially on the tape (And oh, following convention, we use # for blank squares, just because it is easier to see!):

tmtimesfourexec.png

Nice! 5 × 4 is indeed 20 (binary 010100). Now let’s capture the multiply-by-four machine in all its glorious generality:

  When in State | Looking At | Perform these actions | And go to State
  --------------+------------+-----------------------+----------------
  Scanning      | 0          | Move right            | Scanning
  Scanning      | 1          | Move right            | Scanning
  Scanning      | #          | Print 0, Move right   | Doubled
  Doubled       | #          | Print 0               | Quadrupled

This says: starting in the Scanning state, while you see a 0 or 1, leave it alone and move right, but if you see a blank, replace it with a 0, move right, and go to the Doubled state. In the Doubled state, if you see a blank, replace it with a 0 and go to the Quadrupled state. There is no place to go from there, so we’re done.

We can give a pictorial representation of the TM program too:

tmtimesfour.png

CLASSWORK
Create a Turing Machine that floor-divides its input by four. This will be a little harder: you will sometimes have to move left, and you will have to handle the case in which the input is very small, making sure an answer of 0 is properly written out.
Exercise: Give Turing machines that (a) increment a binary number by 1, (b) compute the two’s complement of a binary number, and (c) compute the value of a binary number modulo 2.

TMs for Decision Problems

If a Turing Machine is created to answer a yes-no question (a decision problem), such as “Is this number even?” or “Is this string a palindrome?” it theoretically could erase the whole tape and write a YES or NO symbol as soon as it figures out the answer. But there is a shortcut we almost always take: we can mark certain states as accept states. In such a TM, we don’t care what remains on the tape—we only care whether the state we end up in when there are no moves to make is an accept state or not.

Example Time! Let’s give a TM to determine whether a binary number is even:

tmIsEven.png

Hey, there is a connection to language theory here! The TM just given not only answers the question “Is this number even?” but also recognizes the language $\{ w \in \{0,1\}^* \mid w \textrm{ is an even binary integer} \}$. That’s right, answering yes-no questions is isomorphic to determining membership in a language.

Exercise: Reflect on this. Woah, right?
Exercise: Give Turing machines that determine whether (a) a binary integer is odd, (b) a signed binary integer is negative, and (c) a string over $\{ a, b, c\}$ is a palindrome.

Universality

In principle, we can build Turing Machines for computing all kinds of functions, such as capitalizing text, finding words in text, computing interest payments, finding documents on the World Wide Web, etc. But functions themselves can be represented in symbols, whether in mathematical notation, plain text Python, or any formalism of your choice. In fact, the Turing machine we saw to multiply by four could be represented as:

  S00RS;S11RS;S#0RD;D#0NQ;

Given this, perhaps we can build a machine (call it $U$), that when you give it ANY function $f$ (as text), and some data $x$, then $U$ will produce $f(x)$? That is: $$U(f,x) = f(x)$$

As it turns out, for a certain class of functions, you can construct such a machine. You might take that for granted today, but it was not shown to be true until the 20th century! Really! (As far as anyone knows, that is.) This profound result is due to, you guessed it, Alan Turing, and it pretty much changed the world. This $U$ is called the Universal Turing Machine in his honor. Today we build variations of the original $U$. We call them computers.

Stop.

Think about this.

Woah.

Just woah.

This may be the most profound idea anyone has ever had about computing.

Are you starting to see why Alan Turing is revered as the founder of computer science?

Jade explains it better

Here’s Jade Tan-Holmes explaining all about Turing Machines and Universality. She explains what motivated Turing to come up with the idea for formalizing computation, how he came up with the model he did, and the impact of universality. This video is so spot on, you need to watch it. Maybe watch it twice:

Other Kinds of Automata

Turing Machines are not the only model of what we think of as computation (though they might be the most famous). Other models that have been created include:

Some have computability and recognition power equivalent to that of Turing Machines, while others do not. What do we mean by “computability and recognition power”? That’s the domain of our next theory, computability theory.

Computability Theory

Now for our next philosophical question. Can every function be represented as a string of symbols that we can process with a machine? In other words, is there a program for every function?

It turns out that the answer to this is no.

Here’s why. Programs are just strings, so we can list them in order of increasing length, alphabetically within each length. Programs can be thought of as representing functions in $\mathbb{N} \rightarrow \mathbb{N}$ (since all data can be encoded in binary). Call the enumeration $f_0, f_1, f_2, \ldots$. But $\lambda n. f_n(n)+1$ isn’t in the list, since it differs from every $f_n$ at the input $n$! Bottom line:

There exist functions that cannot be computed by a computing machine.

Hey, this is philosophically interesting.

Exercise: Discuss why the limits to computation might have something to do with the possibility of finding a “Theory of Everything”. (Hint: See what Stephen Hawking has to say about this.)
CLASSWORK
That explanation of why uncomputable functions exist was given rather nonchalantly. Did it make sense? It might not have right away because it is an instance of a very powerful proof technique called diagonalization which was at first somewhat controversial. Carry out this proof by sketching the hypothetical list of all functions as a matrix: each row is a function and each column is the argument, and each cell is the result of applying the function (row) to the argument (column). Look at how $\lambda n. f_n(n)+1$ is generated. Do you see why the technique is called diagonalization? Do you see why it works? 🤯

So which functions can be computed? People have, over the years, created many models of computation—the Lambda Calculus, Post Production Systems, Phrase Structure Grammars, Partial Recursive Functions, Random Access Machines, and more. But get this: all have turned out to be equal in power, in terms of what they can compute, to the Turing machine! Even enhancing the basic Turing Machine design with non-determinism or even multiple tapes won’t increase its computing power. These facts suggest the class of computable functions is fundamental. The Church-Turing Thesis claims that what we intuitively understand as the “computable functions” are precisely those for which a Turing Machine can be given, i.e.,

A function $f$ is computable if and only if there exists a Turing Machine that outputs $f(x)$ for any input $x$.

Exercise: Research why this is called a thesis and not a law or theorem.

Remember that function $U$ we saw before? The Universal Turing Machine? It is “universal” because it can simulate any Turing Machine. If the Church-Turing Thesis is true, it’s really universal: it can compute any computable function.

Exercise: Discuss why this is profound.

Computability Theory is concerned with figuring out which functions are and are not computable.

Turing Complete

Because Turing Machines capture that which is computable, any computational model that can simulate a Turing Machine is called Turing Complete. Being Turing Complete is a badge of honor, as it means your model can compute anything that is computable.

Read more at Wikipedia.

Fun fact: One of the more interesting things to come out of computability theory is that there are some languages that Turing Machines can recognize, but not decide. (A TM recognizes a language $L$ if it enters an accept state and halts whenever given a string in $L$. It decides a language $L$ if it both accepts $L$ and for all strings not in $L$, the machine will always stop in a non-accept state.) So what does it mean that some languages are recognizable but not decidable? It means there are some languages—and therefore some decision problems—for which we can definitely compute a YES answer...but when the answer is NO we might...never...know.

🤯

How about an example? We’ll use JavaScript rather than TMs to simplify the explanation. Let’s write a function called halts that will determine if a function halts on a given input. So we would like:

  halts(Math.sqrt, 100)                ⟶ true
  halts(x => {while (x) {}}, true)     ⟶ false

The idea is simple. Just analyze the input and see if it will halt. But it turns out to be logically impossible to write this function. Why? Assume you can write it. Then consider:

function fox(f) {
  if (halts(f, f)) {
    while (true) {}
  }
}

Now consider the expression halts(fox, fox). Woah. We have an antinomy: if it returns true, then fox(fox) loops forever, so it does not halt; if it returns false, then fox(fox) returns immediately, so it does halt. The assumption that you could write halts leads to a contradiction, so you cannot write this function to work for all inputs. Interestingly, you can write:

function halts(f, x) {
  f(x);
  return true;
}

So if f(x) halts, you’ll eventually know it, but if it doesn’t, we may never know.

Complexity Theory

Computability theory gives us theoretical limits on what can and cannot be computed, but doesn’t tell us what is practically computable. (A problem requiring $2^{100}$ years to complete is not worth running.) The question “What resources are required to perform certain computations?”, or in other words, “How much time and memory is needed to solve this problem?”, gives rise to Complexity Theory.

Complexity Theory gives us results such as “This function requires ________________ to compute”:

It gets (massively) more interesting, because the theory deals with time and space performance on non-deterministic machines, and parallel machines, and interactive machines, and quantum computers, too. It even considers probabilistic algorithms. Interested? Check out the amazing Complexity Zoo.

Complexity Theory has philosophical as well as practical use. Make sure you are familiar with one of the most famous open questions in all of complexity theory, and perhaps even all of computer science, namely: Does $P = NP$? (Solve it and you will win a million dollars!)

Oh, wait! Here’s Jade again with another great explainer:

There is so, so much information out there about the P vs. NP question, as it really seems to captivate the public. Not unlike Gödel’s Incompleteness Theorems.

Exercise: Scott Aaronson wrote a blog post in 2006 (worth a read) in which he waxed poetic about a world in which P=NP. He later disavowed some of his more colorful claims, writing:

I, Scott Aaronson, hereby formally disown my statement about how “everyone who could appreciate a symphony would be Mozart” if P=NP.

In 2014, he wrote a new post on the scientific case for P≠NP. Read the later post. What do you think?

Working With These Theories

We’ve only briefly introduced the major concerns of four computer science theories without any real depth (or breadth). We’ll have a chance to see each theory in more detail later, covering aspects such as:

Theories and Creativity, Again

In much the same way that Bennett pointed out that music theory can expand your abilities in music composition, and Feynman spoke of theoretical knowledge in science only expanding and enhancing the feelings of awe and beauty when contemplating nature, might computer science theory enhance one’s imagination for creating new worlds that Booch talks about here?

Summary

We’ve covered:

  • What a theory is
  • Why study theory?
  • Theories within computer science
  • What Language Theory is
  • What Automata Theory is
  • What Computability Theory is
  • What Complexity Theory is
  • Where we can go from here