Computability Theory is concerned with understanding the limits of what can be computed.
But just because something can be computed doesn’t mean it can be computed efficiently.
Complexity Theory is concerned with classifying computational problems according to the amount of resources, such as time and space, required to solve them.
To study complexity formally, we need a simple model of computation and a way to measure the resources it uses. Traditionally, the models used are Turing Machines, Boolean Circuits, and Interactive Proof Systems. Some models have variations such as being parallel, probabilistic, or quantum.
On a Turing Machine:
- Time is measured in state transitions (“steps”)
- Space is measured in tape cells used, not including the input
Other models will have slightly different notions of steps and cells, but these should be pretty obvious. They just can’t be unrealistically “large”—so, for example, register machines are not allowed to have registers of unbounded size.
There are two fundamental ideas that underpin complexity theory:
For a given machine, we define $T(n)$ as the maximum number of steps taken for an input of size $n$, and $S(n)$ as the maximum number of (non-input) cells used for an input of size $n$.
Complexity measures are model-dependent. Let’s take a look at some models.
Here’s the (single-tape) increment Turing Machine again:

How many steps and extra memory cells does it take for different input sizes?
| Input | Input Size | Time (steps) | Extra Space (tape cells) |
|---|---|---|---|
| 0 | 1 | 1+1+0+1=3 | 1 |
| 1 | 1 | 1+1+1+1=4 | 1 |
| 00 | 2 | 2+1+0+1=4 | 1 |
| 01 | 2 | 2+1+1+1=5 | 1 |
| 10 | 2 | 2+1+0+1=4 | 1 |
| 11 | 2 | 2+1+2+1=6 | 1 |
| 000 | 3 | 3+1+0+1=5 | 1 |
| 001 | 3 | 3+1+1+1=6 | 1 |
| 010 | 3 | 3+1+0+1=5 | 1 |
| 011 | 3 | 3+1+2+1=7 | 1 |
| 100 | 3 | 3+1+0+1=5 | 1 |
| 101 | 3 | 3+1+1+1=6 | 1 |
| 110 | 3 | 3+1+0+1=5 | 1 |
| 111 | 3 | 3+1+3+1=8 | 1 |
| 0000 | 4 | 4+1+0+1=6 | 1 |
| 0001 | 4 | 4+1+1+1=7 | 1 |
| ... | ... | ... | ... |
| 1111 | 4 | 4+1+4+1=10 | 1 |
| ... | ... | ... | ... |
The most useful metric is the worst case: the maximum number of steps or memory cells used over all inputs of a given size. In this case, for an input of size $n$, the most work is done for the all-ones case:
We write $T(n) = 2n + 2$.
The best-case number of steps would be for the all-zeros case, namely $T(n) = n + 2$.
For this machine, space complexity $S(n) = 1$ since we need to store the blank symbol at the end of the input.
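The step counts in the table can be reproduced with a tiny simulator. Here is a sketch in Python; the state names and the exact transition structure are assumptions (any single-tape increment machine that scans right and then carries left behaves this way):

```python
# A tiny simulator for a single-tape increment machine: scan right to the
# end of the input, then carry left, flipping trailing 1s. The state names
# are assumptions; the step counts match the table above.
def increment_steps(bits):
    """Run the machine on a binary string (most significant bit first);
    return the number of steps (transitions) taken."""
    tape = {i: b for i, b in enumerate(bits)}    # blank everywhere else
    pos, state, steps = 0, "scan_right", 0
    while True:
        symbol = tape.get(pos, "_")
        steps += 1
        if state == "scan_right":
            if symbol in "01":
                pos += 1                          # move past the input
            else:
                pos -= 1                          # hit the blank; turn around
                state = "carry"
        else:                                     # state == "carry"
            if symbol == "1":
                tape[pos] = "0"                   # flip a trailing 1
                pos -= 1
            else:                                 # a 0, or the blank at the left
                tape[pos] = "1"                   # write the carried 1; halt
                return steps

assert increment_steps("0") == 3                  # matches 1+1+0+1
assert increment_steps("11") == 6                 # matches 2+1+2+1
# Worst case over all inputs of size n is all ones: T(n) = 2n + 2.
assert max(increment_steps(f"{i:03b}") for i in range(8)) == 2 * 3 + 2
```
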
Here’s the single-tape Turing Machine for reversing a string (from the alphabet $\{a,b,c\}$):

Let’s count steps for the input $abbacb$:
| ⋅⋅abbacb⋅⋅⋅⋅⋅⋅⋅ | Initial configuration |
| ⋅⋅ABBACB⋅⋅⋅⋅⋅⋅⋅ | 6 steps to mark each symbol while moving right, 1 to go back to last symbol |
| ⋅⋅ABBACXb⋅⋅⋅⋅⋅⋅ | 0 to find the B, 1 to X it, 0 to find where to put the b, 1 to write it |
| ⋅⋅ABBAXXbc⋅⋅⋅⋅⋅ | 1 to find the C, 1 to X it, 2 to find where to put the c, 1 to write it |
| ⋅⋅ABBXXXbca⋅⋅⋅⋅ | 3 to find the A, 1 to X it, 4 to find where to put the a, 1 to write it |
| ⋅⋅ABXXXXbcab⋅⋅⋅ | 5 to find the B, 1 to X it, 6 to find where to put the b, 1 to write it |
| ⋅⋅AXXXXXbcabb⋅⋅ | 7 to find the B, 1 to X it, 8 to find where to put the b, 1 to write it |
| ⋅⋅XXXXXXbcabba⋅ | 9 to find the A, 1 to X it, 10 to find where to put the a, 1 to write it |
| ⋅⋅⋅⋅⋅⋅⋅⋅bcabba⋅ | 11 to find the blank, 1 to move to the first X, 6 erases |
This was a special case with $n=6$, but we can generalize to arbitrary $n$:
$$ (n+1) + \sum_{k=0}^{n-1}(4k+1) + 3n $$
which gives us $T(n) = 2n^2 + 3n + 1$. It’s quadratic.
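The algebra here is easy to get wrong, so a quick sanity check that the sum really collapses to that closed form:

```python
# Verify that the reversal machine's step-count sum matches the
# closed form T(n) = 2n^2 + 3n + 1 for many values of n.
for n in range(1, 200):
    total = (n + 1) + sum(4 * k + 1 for k in range(n)) + 3 * n
    assert total == 2 * n * n + 3 * n + 1
```
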
Now if we have a two-tape machine, reversal becomes much faster:

That’s $n$ steps to find the end, and $n$ steps to copy it to the second tape. $T(n) = 2n$.
Hey that went from quadratic to linear! The multitape model is often more realistic (think about real computers), so we can kind of assume it, though we really should always be explicit about the model we are using.
**But is that cheating?** Not really. You can always be clear about your model.
But can we be loose with our models? Well, not too loose: each model needs to be finite (in terms of states or instructions), operate on finite inputs, and be made up of instructions that each take a finite amount of time. But if we do stick to reasonable models, a problem whose time complexity is polynomial in one model will be polynomial in any other.
Being polynomially related turns out to matter. We’ll see why later.
In our case, linear and quadratic are both polynomial. We’re good.
Random access models help us with problems that don’t always have to read the entire input, such as binary search.
We can use Random Access Turing Machines, or even register machines, but—and this is a really important but—we still have to count input in terms of bits, not in terms of unbounded-size registers. The number of bits is, after all, the true size of the input.
However, if the registers are bounded (e.g., 64 bits, 128 bits, etc.), then you can count registers rather than bits, because the complexity measures will be within a constant factor of each other, which, in complexity theory, is just fine. It even lets you count bounded multiplication as a single instruction, which, as you might know, is technically quadratic time when the sizes of the numbers are unbounded.
(Think of the difference between Java’s `int`, which is bounded, and `BigInteger`, which is not.)
Remember that a nondeterministic Turing Machine accepts an input (answers YES) if there simply exists a computation path that leads to acceptance. How do we measure complexity on such a machine? We take the maximum number of steps or tape cells used on any accepting path. In real life, physical machines tend to be deterministic, and will simulate nondeterministic machines via backtracking, so we’d expect the time and space measurements to differ greatly between deterministic and nondeterministic machines. And they do.
**Visualizing Nondeterminism.** You can think of a nondeterministic machine as exploring all of its computation paths simultaneously, accepting if any path leads to a solution.
Real machines don’t work that way, but these models are theoretically very interesting: if you had a candidate solution, the nondeterministic machine would find it instantly. So nondeterministic machines model verification, while deterministic machines model the figuring out of a solution.
In a probabilistic machine, certain states are “coin flip” states, where the machine can transition to multiple possible next states with certain probabilities. The machine accepts an input if the probability of reaching an accepting state is above a certain threshold. Common thresholds are 1/2 or 2/3.
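The particular threshold turns out not to matter much: running the machine several times and taking a majority vote drives the error down quickly. A sketch of that calculation (exact arithmetic via fractions, so there is no floating-point doubt; the function name is just for this example):

```python
from fractions import Fraction
from math import comb

def majority_error(k, p_correct=Fraction(2, 3)):
    """Exact probability that a majority vote over k independent runs is
    wrong, when each run is correct with probability p_correct (k odd)."""
    p_wrong = 1 - p_correct
    # The vote is wrong when at most (k-1)/2 of the runs are correct.
    return sum(comb(k, i) * p_correct**i * p_wrong**(k - i)
               for i in range((k - 1) // 2 + 1))

assert majority_error(1) == Fraction(1, 3)        # one run: error 1/3
assert majority_error(3) == Fraction(7, 27)       # already better
assert majority_error(21) < Fraction(1, 10)       # shrinks fast
```
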
A parallel machine contains multiple sub-machines that run in parallel. At each “step,” each sub-machine performs a step of its own computation independently. Each sub-machine may or may not be running the same control program.
Start at this Wikipedia article to explore issues in analyzing parallel algorithms.
Rather than a Turing Machine, we can model a computation using Boolean circuits, that is, a directed acyclic graph where each node represents a gate (AND, OR, NOT). Complexity is based on the size of the circuit (number of gates), its depth (length of the longest path from an input to an output), and the fan-in (number of inputs to a gate).
Here’s an example of a circuit to add two 4-bit numbers, with size 17, depth 7, and fan-in 3:

Yes, this model is naturally parallel.
In practice, we are actually interested in families of Boolean circuits, where each input size has its own circuit.
Start your study at Wikipedia.
See Wikipedia’s articles on Quantum Turing Machines and Quantum circuits.
Another Turing Machine alternative is the interactive proof system, an abstract machine in which two parties, a prover and a verifier, exchange messages to decide a language. Start at Wikipedia.
Complexity measures can be sensitive to encoding.
To determine whether a signed binary number is negative, we need only one step to check the first bit, regardless of the input size: $T(n) = 1$. But to determine whether the number is even or odd, we have to scan the whole input to get to the end: $T(n) = n$.
But if we stored the number with the least significant bit first, then these complexities would be reversed. This is okay, since constant time and linear time are polynomially related.
What about encoding numbers in decimal rather than binary? This only speeds up computations by a factor of about $\log_2 10 \approx 3.32$. Constant factor speedups are not significant when studying inherent complexities, as you can imagine simply getting hardware which is $N$ times faster than your current machine. And again, we’re not measuring “absolute running time” but rather asymptotic behavior (growth rates as a function of input size).
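A toy illustration of this encoding sensitivity (plain Python step counters, not a real Turing Machine; the function names are just for this example):

```python
# Toy step counters for the two tests above, on a sign-bit-first
# binary string: the sign test is constant, the parity test linear.
def is_negative(bits):
    """The sign bit comes first: one step, regardless of input size."""
    return bits[0] == "1", 1

def is_odd(bits):
    """The least significant bit is at the far end: n steps to reach it."""
    steps = 0
    last = None
    for b in bits:                 # walk the whole input
        steps += 1
        last = b
    return last == "1", steps

assert is_negative("10110") == (True, 1)   # constant time
assert is_odd("10110") == (False, 5)       # linear time
```

With the least significant bit stored first, the two step counts would swap, matching the observation above.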
**Not all encodings are appropriate.** One encoding that is not appropriate is unary. Unary encodings make the input exponentially larger than binary encodings, and make complexities look much smaller than they really are. The measures that result from unary and binary encodings are not polynomially related.
Speaking of asymptotic behavior, it’s time to look at this topic formally.
Measuring complexity in terms of the number of steps or memory cells as a function of input size is perfect for characterizing behaviors of computations. Such measurements are explicitly not concerned with absolute running times, because these vary with hardware, so it makes no sense to look at an abstract algorithm and ask for its exact running time. Even so, getting the exact time and space functions can be terribly tedious. The good news is that it is actually not necessary to get this precise! After all, what matters is the functions’ growth rates.
In fact, for complexity functions, we typically ignore:
So for example:
The justification for doing this is that:
How can we formalize this? That is, how do we formalize things like “the set of all quadratic functions” or the “set of all linear functions” or “the set of all logarithmic functions”? Paul Bachmann introduced a notation in 1894 that is still with us today.
Definition Time! A function $f$ is in $O(g)$ whenever there exist constants $c$ and $N$ such that for every $n \ge N$, $f(n)$ is bounded above by a constant times $g(n)$. Formally:
$$ O(g) =_{\textrm{def}} \{ f \mid \exists c\!>\!0.\;\exists N\!\ge\!0.\;\forall n\!\ge\!N.\;0 \leq f(n) \leq c\cdot g(n) \} $$
This means that $O(g)$ is the set of all functions $f$ for which $g$ is an asymptotic upper bound.
Examples:
By convention, people usually drop the lambdas when writing expressions involving Big-O, so the above examples will actually appear in practice as:
So in practice, we just throw away all but the leading term, then throw away its constant multiplier. But why is this justified? We’ll just provide an argument for a concrete case. (You can do the formal proof on your own.) Let’s show that $7n^4 + 2n^2 + n + 20 \in O(n^4)$. We have to choose $c$ and $N$. Let’s choose $c=30$ and $N=1$. Then, since $n \geq 1$:
$$ \begin{eqnarray} 7n^4 + 2n^2 + n + 20 & \leq & 7n^4 + 2n^4 + n^4 + 20n^4 \\ & \leq & (7 + 2 + 1 + 20) n^4 \\ & \leq & 30n^4 \end{eqnarray} $$

**Too much technical detail?** Want to review what this is all about? Here’s a video:
Big-O notation feels like we’re being sloppy and taking shortcuts, but its goal is to broadly classify algorithms, which has its uses. Just don’t overuse it! Keep in mind that:
There is a Big-$\Omega$ notation for lower bounds.
A function $f$ is in $\Omega(g)$ whenever there exist constants $c$ and $N$ such that for every $n \ge N$, $f(n)$ is bounded below by a constant times $g(n)$.
$$ \Omega(g) =_{\textrm{def}} \{ f \mid \exists c\!>\!0.\;\exists N\!\ge\!0.\;\forall n\!\ge\!N.\;0 \leq c \cdot g(n) \leq f(n) \} $$

Lower bounds are useful because they say that an algorithm requires at least so much time. For example, you can say that printing an array is $O(2^n)$ because $2^n$ really is an upper bound; it’s just not a very tight one! But saying printing an array is $\Omega(n)$ means it requires at least linear time, which is more informative.
When the lower and upper bounds are the same, we can use Big-Theta notation.
$$ \Theta(g) =_{\textrm{def}} \{ f \mid \exists c_1\!>\!0.\;\exists c_2\!>\!0.\;\exists N\!\ge\!0.\;\forall n\!\ge\!N.\;0 \leq c_1 \cdot g(n) \leq f(n) \leq c_2 \cdot g(n) \} $$

Example: printing each element of an array is $\Theta(n)$. Now that’s useful!
Most complexity functions that arise in practice are in Big-Theta of something. An example of a function that isn’t in Big-$\Theta$ of anything is $\lambda n . n^2 \cos n$.
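We can gather numerical evidence for claims like these. This is a sanity check, not a proof: it tests the witnesses $c = 30$, $N = 1$ from the earlier Big-O example, and the oscillation of $n^2 \cos n$:

```python
import math

# Check the witnesses c = 30, N = 1 for 7n^4 + 2n^2 + n + 20 in O(n^4).
for n in range(1, 10_000):
    assert 7 * n**4 + 2 * n**2 + n + 20 <= 30 * n**4

# Numerical evidence (not a proof) that n^2 * cos(n) is not Theta of
# anything: it keeps swinging between large positive and large negative
# values, so it cannot be sandwiched between positive multiples of any
# single function g.
values = [n * n * math.cos(n) for n in range(1, 100)]
assert max(values) > 1000 and min(values) < -1000
```
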
There are also little-oh and little-omega.
$$ o(g) =_{\textrm{def}} \{ f \mid \lim_{x\to\infty} \frac{f(x)}{g(x)} = 0 \} $$

$$ \omega(g) =_{\textrm{def}} \{ f \mid \lim_{x\to\infty} \frac{f(x)}{g(x)} = \infty \} $$

There’s also the Soft-O:
$$ \tilde{O}(g) =_{\textrm{def}} O(\lambda n.\,g(n) (\log{g(n)})^k) $$

Since a log of something raised to a power is eventually bounded above by that something raised to any positive power, we have
$$ \tilde{O}(g) \subseteq O(\lambda n.\,g(n)^{1+\varepsilon}) $$

For problems that appear in practice, the following are interesting and very general ways to classify them.
| Kind | $O(\lambda\,n. \ldots)$ | Description |
|---|---|---|
| Constant | $1$ | Fixed number of steps or memory cells, regardless of input size |
| Logarithmic | $\log n$ | Number of steps or memory cells proportional to the log of the input size |
| Polynomial | $n^k$, $k \geq 1$ | Number of steps or memory cells polynomial in the input size |
| Exponential | $c^n$, $c > 1$ | Number of steps or memory cells exponential in the input size |
We can introduce more! Expressed as times, we have:
| Kind | $O(\lambda\,n. \ldots)$ | Examples |
|---|---|---|
| Constant | $1$ | Retrieving the first element in a list |
| Inverse Ackermann | $\alpha(n)$ | |
| Iterated Log | $\log^{*} n$ | |
| Loglogarithmic | $\log \log n$ | |
| Logarithmic | $\log n$ | Binary Search |
| Polylogarithmic | $(\log n)^c$, $c>1$ | |
| Fractional Power | $n^c$, $0<c<1$ | |
| Linear | $n$ | Max or min of an unsorted array |
| N-log-star-n | $n \log^{*} n$ | Seidel’s polygon triangulation algorithm |
| Linearithmic | $n \log n$ | Mergesort, Fast Fourier Transform |
| Quadratic | $n^2$ | Insertion Sort |
| Cubic | $n^3$ | |
| Polynomial | $n^c$, $c\geq 1$ | |
| Quasipolynomial | $n^{\log n}$ | |
| Subexponential | $2^{n^\varepsilon}$, $0<\varepsilon<1$ | |
| Exponential | $c^n$, $c>1$ | Brute-force Matrix Chain Multiplication |
| Factorial | $n!$ | Brute-force Traveling Salesperson |
| N-to-the-n | $n^n$ | |
| Double Exponential | $2^{2^{cn}}$ | Deciding Presburger Arithmetic formulas |
| Ackermann | $A(n)$ | |
| Unbounded Time | $\infty$ | Busy Beaver |

For fun, let’s connect these functions to clock time. If we had a machine that could do one million operations per second, here’s how long it would take to carry out computations of size $n$ for different time complexity functions.
| $T(n)$ | 10 | 20 | 50 | 100 | 1000 | 1000000 |
|---|---|---|---|---|---|---|
| 1 | 1 μs | 1 μs | 1 μs | 1 μs | 1 μs | 1 μs |
| $\log n$ | 3.322 μs | 4.322 μs | 5.644 μs | 6.644 μs | 9.966 μs | 19.93 μs |
| $n$ | 10 μs | 20 μs | 50 μs | 100 μs | 1 ms | 1 second |
| $n \log n$ | 33.22 μs | 86.44 μs | 282.2 μs | 664.4 μs | 9.966 ms | 19.93 seconds |
| $n^2$ | 100 μs | 400 μs | 2.5 ms | 10 ms | 1 second | 11.57 days |
| $1000 n^2$ | 100 ms | 400 ms | 2.5 seconds | 10 seconds | 16.67 minutes | 31.688 years |
| $n^3$ | 1 ms | 8 ms | 125 ms | 1 second | 16.67 minutes | 31,688 years |
| $n^{\log n}$ | 2.099 ms | 419.7 ms | 1.078 hours | 224.5 days | $2.502\times10^{16}$ years | $1.231\times10^{106}$ years |
| $1.01^n$ | 1.105 μs | 1.220 μs | 1.645 μs | 2.705 μs | 20.96 ms | $7.493\times10^{4307}$ years |
| $0.000001 \times 2^n$ | 1.024 ns | 1.049 μs | 18.76 minutes | $4.017\times10^{10}$ years | $3.395\times10^{281}$ years | $3.137\times10^{301010}$ years |
| $2^n$ | 1.024 ms | 1.049 seconds | 35.68 years | $4.017\times10^{16}$ years | $3.395\times10^{287}$ years | $3.137\times10^{301016}$ years |
| $n!$ | 3.629 seconds | 77,093 years | $9.638\times10^{50}$ years | $2.957\times10^{144}$ years | $1.275\times10^{2554}$ years | $2.619\times10^{5565695}$ years |
| $n^n$ | 2.778 hours | $3.323\times10^{12}$ years | $2.814\times10^{71}$ years | $3.169\times10^{186}$ years | $3.169\times10^{2986}$ years | $3.169\times10^{5999986}$ years |
| $2^{2^n}$ | $5.697\times10^{294}$ years | $2.136\times10^{315639}$ years | $10^{3.389\times10^{14}}$ years | $10^{3.382\times10^{29}}$ years | $10^{3.323\times10^{300}}$ years | $10^{2.980\times10^{301029}}$ years |
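A few of these entries are easy to reproduce yourself (the one-million-operations-per-second rate is the table’s premise):

```python
# Reproduce a few entries of the table: seconds needed for T(n)
# operations at one million operations per second.
OPS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25

def seconds(ops):
    return ops / OPS_PER_SECOND

assert seconds(50**2) == 0.0025            # n^2 at n = 50: 2.5 ms
assert seconds(2**20) == 1.048576          # 2^n at n = 20: ~1.049 seconds
years = seconds(2**50) / SECONDS_PER_YEAR  # 2^n at n = 50
assert 35 < years < 36                     # ~35.68 years
```
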
Exponents are hard to wrap your head around. The earth formed about $4.5 \times 10^9$ years ago and the big bang occurred about $13.8 \times 10^9$ years ago (so say our best estimates to date). So you’re not going to see $2^{100}$ operations performed on a single computer in your lifetime. Even if it could do a trillion operations per second, you’d still need 2.9 universe lifetimes. And 100 is a pretty small $n$. 🤯
If we use a log-scale on the vertical axis, we can start to see something important:

Anything polynomial, even with a high degree, starts to look flatter (grows slower) as $n$ gets really large. But exponentials, even with smallish bases (but strictly greater than 1), will eventually blow away every polynomial. Every. Single. One.
**More on these big numbers.** Maybe you are not too impressed with exponents in the thousands? Consider this: the volume of the observable universe, as far as we can tell, is about $3.57 \times 10^{80}$ cubic meters, which means the approximate number of Planck volumes it contains is around $10^{186}$. That’s nothing compared to some of the numbers in this chart.
The divide between the polynomial and the exponential is so noticeable that it just seems to be something fundamental to the universe. Polynomial time algorithms are called easy or tractable, exponential and beyond are hard or intractable. Because this is such a noticeable division, we (in theory-land) allow ourselves flexibility in computational models, as long as they don’t move complexities between the polynomial and the exponential.
We can vary models as long as they are polynomially related.
We can often simplify the kinds of statements we make about complexity by recasting the question “how many steps (or how much memory) does it take to compute an answer?” as “how many steps (or how much memory) does it take to decide membership in a language?” This means we can group languages into classes, much like we did in computability theory. For example, the problem of finding the shortest path between two vertices in a graph, phrased as an optimization problem, is:
$$\textrm{Given graph $G$ and vertices $s$ and $t$, what is the length of the shortest path in $G$ from $s$ to $t$?}$$

This can be recast as a decision problem:
$$\textrm{Is there a path in $G$ from $s$ to $t$ of length at most $k$?}$$
Then as a language acceptance problem:
$$\{ \langle G, s, t, k \rangle \mid \textrm{$\exists$ a path in $G$ from $s$ to $t$ of length at most $k$} \}$$

The four major generic language classes, where $f$ is a proper complexity function of type $\mathbb{N} \to \mathbb{N}$, are:
| Class | Description |
|---|---|
| $\textsf{DTIME}(f)$ | Decidable by a deterministic machine in time $O(f)$ |
| $\textsf{DSPACE}(f)$ | Decidable by a deterministic machine in space $O(f)$ |
| $\textsf{NTIME}(f)$ | Decidable by a non-deterministic machine in time $O(f)$ |
| $\textsf{NSPACE}(f)$ | Decidable by a non-deterministic machine in space $O(f)$ |
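The shortest-path example above can be made concrete. Here is a sketch for unweighted graphs (the adjacency-dictionary representation and the function names are choices made just for this example):

```python
from collections import deque

def shortest_path_length(graph, s, t):
    """Optimization form: length of the shortest path from s to t,
    or None if there is no path. Plain breadth-first search."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def decide(graph, s, t, k):
    """Decision form: is there a path from s to t of length at most k?
    Equivalently: is <G, s, t, k> a member of the language?"""
    d = shortest_path_length(graph, s, t)
    return d is not None and d <= k

G = {"s": ["a", "b"], "a": ["t"], "b": ["a"]}
assert shortest_path_length(G, "s", "t") == 2
assert decide(G, "s", "t", 2) and not decide(G, "s", "t", 1)
```
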
Now let’s get more specific. There are hundreds of complexity classes, but a handful are pretty important. Here are the most basic ones to know:
| Class | Description |
|---|---|
| $\textsf{P}$ | Languages decidable by a deterministic machine in polynomial time. Could also be called $\textsf{PTIME}$ but no one says that. Formally, it is $$\textsf{P} = \bigcup_{k \in \mathbb{N}} \textsf{DTIME}(n^k)$$ |
| $\textsf{NP}$ | Languages decidable by a non-deterministic machine in polynomial time. Formally, $$\textsf{NP} = \bigcup_{k \in \mathbb{N}} \textsf{NTIME}(n^k)$$ Although defined in terms of nondeterministic machines, a corollary is a more useful way to think of this class: $\textsf{NP}$ is the set of all decision problems for which a YES answer can be verified in polynomial time by a deterministic machine |
| $\textsf{PSPACE}$ | Deterministic polynomial space |
| $\textsf{NPSPACE}$ | Nondeterministic polynomial space |
| $\textsf{EXP}$ | Deterministic exponential time (sometimes called $\textsf{EXPTIME}$). Formally, $$\textsf{EXP}=\bigcup_{k \in \mathbb{N}} \textsf{DTIME}(2^{n^k})$$ |
| $\textsf{NEXP}$ | Nondeterministic exponential time (sometimes called $\textsf{NEXPTIME}$) |
| $\textsf{EXPSPACE}$ | Deterministic exponential space |
| $\textsf{NEXPSPACE}$ | Nondeterministic exponential space |
| $\textsf{L}$ | Deterministic logarithmic space (can be called $\textsf{LOGSPACE}$). Formal definition is simply $$\textsf{L} = \textsf{DSPACE}(\log n)$$ |
| $\textsf{NL}$ | Nondeterministic logarithmic space (can be called $\textsf{NLOGSPACE}$) |
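The “verification” view of $\textsf{NP}$ is worth internalizing. Subset Sum is a classic $\textsf{NP}$-complete problem: a certificate (the subset itself) can be checked quickly, while the obvious deterministic search tries exponentially many subsets. A sketch (the helper names are just for this example):

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Polynomial-time verifier: check a claimed subset (the certificate),
    given as a tuple of indices into numbers."""
    return all(i in range(len(numbers)) for i in certificate) \
        and sum(numbers[i] for i in set(certificate)) == target

def solve(numbers, target):
    """Brute-force solver: tries all 2^n subsets (exponential time)."""
    for r in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), r):
            if verify(numbers, target, subset):
                return subset
    return None

nums = [3, 34, 4, 12, 5, 2]
assert verify(nums, 9, (2, 4))          # 4 + 5 = 9: checked in linear time
assert solve(nums, 9) is not None       # found only after exponential search
assert solve(nums, 100) is None         # the numbers sum to only 60
```
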
Researchers have come up with various theorems, such as the time hierarchy theorem and the space hierarchy theorem, which lead us to some set containment hierarchies. We know some things:
But get this: we actually don’t know which of these containments are proper!
But we do know some of the big steps!
Those 10 complexity classes are the ones you absolutely need to know. Next, learn the ones in the following diagram.

Here’s a run-down on the classes in this diagram, with some of the descriptions horribly oversimplified.
| Class | Description |
|---|---|
| $\textsf{EMPTY}$ | The empty class (contains no languages) |
| $\textsf{REG}$ | Regular languages (decided by FAs) |
| $\textsf{DCFL}$ | Deterministic context-free languages (decided by DPDAs) |
| $\textsf{CFL}$ | Context-free languages (decided by PDAs) |
| $\textsf{CSL}$ | Context-sensitive languages (decided by LBAs) |
| $\textsf{R}$ | Recursive languages (decided by Turing machines that always halt) |
| $\textsf{RE}$ | Recursively enumerable languages (recognized by Turing machines) |
| $\textsf{ALL}$ | All languages |
| $\textsf{LIN}$ | Linear Time |
| $\textsf{NLIN}$ | Nondeterministic Linear Time |
| $\textsf{P}$ | Polynomial Time |
| $\textsf{NP}$ | Nondeterministic Polynomial Time |
| $\textsf{QP}$ | Quasipolynomial Time |
| $\textsf{EXP}$ | Exponential Time |
| $\textsf{NEXP}$ | Nondeterministic Exponential Time |
| $\textsf{L}$ | Logarithmic Space |
| $\textsf{NL}$ | Nondeterministic Logarithmic Space |
| $\textsf{QPSPACE}$ | Quasipolynomial Space |
| $\textsf{PSPACE}$ | Polynomial Space (determinism does not matter) |
| $\textsf{EXPSPACE}$ | Exponential Space (determinism does not matter) |
| $\textsf{AC}^0$ | Solvable by constant-depth, polynomial-size circuits with unbounded fan-in |
| $\textsf{NC}$ | Solvable by uniformly-generated, polynomial-size circuits with polylogarithmic depth and fan-in 2 |
| $\textsf{P/poly}$ | Solvable by nonuniform polynomial-size circuits |
| $\textsf{SZK}$ | Verifiable by a statistical zero-knowledge proof protocol |
| $\textsf{MA}$ | Verifiable by Merlin-Arthur protocol |
| $\textsf{AM}$ | Verifiable by Arthur-Merlin protocol |
| $\textsf{QMA}$ | Verifiable by Quantum Merlin-Arthur protocol |
| $\textsf{BPP}$ | Bounded-error Probabilistic Polynomial: Solvable by $\textsf{NP}$ machines for which, on YES instances, at least 2/3 of the computation paths accept and, on NO instances, at least 2/3 reject |
| $\textsf{PP}$ | Probabilistic Polynomial: Solvable by $\textsf{NP}$ machines for which, on YES instances, more than 1/2 of the computation paths accept and, on NO instances, at least 1/2 reject |
| $\textsf{RP}$ | Randomized Polynomial: Solvable by $\textsf{NP}$ machines for which, on YES instances, at least 1/2 of the computation paths accept and, on NO instances, all reject |
| $\textsf{ZPP}$ | Zero-error Probabilistic Polynomial: Solvable by probabilistic TMs in expected polynomial time with zero error. Formally defined as $\textsf{RP} \cap \textsf{co-RP}$ |
| $\textsf{BQP}$ | Bounded-error Quantum Polynomial: Solvable by a quantum TM in polynomial time with at most 1/3 probability of error |
We have barely begun.
But that’s okay—these notes are meant only to be an introduction.
The Complexity Zoo is an extensive online resource cataloging complexity classes. As of December 2025, there are 551 language classes documented.
Go take a tour. Begin with the Petting Zoo.
By all means do not miss Satoshi Hada's Complexity Zoo Browser.
Gabe Robins has done a nice little annotation on the big diagram from the Zoo. It’s very near the bottom of his notes on resource bounded computation. He’s highlighted the major complexity classes as well as the ones we saw when studying automata theory. Worth a look!
There’s a super cool interactive inclusion diagram. Check it out!
If at any time you just want to scan a brief list of classes, without diving into details at all, getting some big ideas and nothing more, Wikipedia has just such a list.
Remember that the harder a problem is, the more resources it requires to solve. In any class of problems, it makes sense to talk about the “hardest” problems in the class. We can define this notion formally!
A problem is hard for a class $\mathscr{C}$, denoted $\mathscr{C}$-hard, if every problem in $\mathscr{C}$ can be reduced to it. For example, a problem is $\textsf{NP}$-hard if every problem in $\textsf{NP}$ can be reduced to it.
A problem is complete for a class $\mathscr{C}$, denoted $\mathscr{C}$-complete, if it is hard for the class and also in the class. For example, a problem is $\textsf{NP}$-complete if it is $\textsf{NP}$-hard and also in $\textsf{NP}$.
The reduction used has to be in a class that is considered efficient for the context, typically polynomial time for anything in $\textsf{NP}$ or above. In general, the reduction has to be in a strictly smaller class than the class being considered.
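A concrete, classic example of a reduction: a set $S$ is an independent set of a graph exactly when its complement $V \setminus S$ is a vertex cover, so “$G$ has an independent set of size at least $k$” reduces to “$G$ has a vertex cover of size at most $|V|-k$,” and the mapping is trivially polynomial. A brute-force check of that equivalence on a small example graph:

```python
from itertools import combinations

def is_independent_set(edges, s):
    """No edge has both endpoints inside s."""
    return all(not (u in s and v in s) for u, v in edges)

def is_vertex_cover(edges, c):
    """Every edge has at least one endpoint inside c."""
    return all(u in c or v in c for u, v in edges)

# Exhaustively confirm the complement relationship on a small path graph.
V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4)]
for r in range(len(V) + 1):
    for subset in map(set, combinations(V, r)):
        assert is_independent_set(E, subset) == is_vertex_cover(E, V - subset)
```
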
We have to go deeper into these two classes. There is so, so much to say. Many nontechnical people have heard of the P vs NP Question. Here are some facts:
An example of an $\textsf{NP}$-hard problem is (generalized) Sudoku: easy to verify but hard to solve.
There are some good videos out there to give you a sense of this problem. This is a well-known video which is kind of a classic, and you should watch it first:
A newer one that introduces the idea that the P vs. NP question is really about algorithm inversion:
Jade’s P vs. NP video was seen previously in the notes on Theories of Computation. She has a sequel focused on NP-completeness:
The readings and videos above will introduce you to the history of NP-completeness and provide examples of NP-complete problems. The list of known and useful NP-complete problems is stunningly large. It is fascinating that they are all, in some sense, versions of the same problem. Here’s a tiny fraction of that list:
Here’s something interesting. A lot of $\textsf{NP}$ problems have special cases of their inputs for which a solution is known to be in $\textsf{P}$.
| NP-Complete or NP-Hard Problem | Restrictions Known to Be in P |
|---|---|
| SAT | All clauses have at most 2 literals, or all clauses are Horn clauses |
| Graph Coloring | 2-Coloring |
| Hamilton Cycle | Dense Graphs |
| Vertex Cover | Bipartite Graphs, Trees |
| Traveling Salesperson | Trees, Graphs with max degree 2 |
| Clique | Planar graphs, chordal graphs |
Sometimes there is a mirror problem that is in P.
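As an example of how a restriction lands in $\textsf{P}$: Horn-SAT (every clause has at most one positive literal) is decided by simple forced propagation, with no search at all. A sketch, using a `(head, body)` clause representation chosen just for this example:

```python
def horn_sat(clauses):
    """Polynomial-time satisfiability for Horn clauses.
    Each clause is (head, body): at most one positive literal `head`
    (None for a goal clause with no positive literal) and a list `body`
    of negated variables. The clause reads: body[0] and ... implies head."""
    true_vars = set()                         # start with everything False
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if set(body) <= true_vars:        # the clause's body is all true
                if head is None:
                    return None               # goal clause violated: UNSAT
                if head not in true_vars:
                    true_vars.add(head)       # forced: set the head true
                    changed = True
    return true_vars                          # the minimal satisfying model

# (a) and (a -> b) and (not b or not c): satisfiable with a, b true.
assert horn_sat([("a", []), ("b", ["a"]), (None, ["b", "c"])]) == {"a", "b"}
# (a) and (not a): unsatisfiable.
assert horn_sat([("a", []), (None, ["a"])]) is None
```

Each pass either sets a new variable true or stops, so the loop runs at most (number of variables + 1) times over the clause list: clearly polynomial.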
| NP-Complete or NP-Hard Problem | Variants Known to Be in P |
|---|---|
| Hamilton Cycle | Euler Tour |
| Vertex Cover | Edge Cover |
| Independent Set | Maximum Matching |
| Integer Linear Programming | Linear Programming |
| Longest Path | Shortest Path |
| Max Cut | Min Cut |
| Min Flow | Max Flow |
**Why is this interesting?** The difference between problems known to be in $\textsf{P}$ and those in $\textsf{NP}$ but not known to be in $\textsf{P}$ is sometimes very subtle. You should know how to tell the difference. You may have to design an algorithm for something, and knowing whether it is known to be in $\textsf{P}$ or not might matter a lot! So it’s a good idea to study a number of existing problems and know why they are in $\textsf{P}$, or prove them to be $\textsf{NP}$-complete, or maybe show that they are neither.
Do you want that million-dollar prize? Here’s how you can get it:
But many thousands of brilliant people have tried to solve this problem, and we still don’t have an answer. We do know that answering the question won’t be easy: proving it either way is a massive challenge! For example, to prove $\textsf{P} \neq \textsf{NP}$, the usual mathematical techniques just don’t work. There are “barriers” showing that these techniques cannot succeed: the relativization barrier, the natural proofs barrier, and the algebrization barrier.
Some people think we will have to use something like Geometric Complexity Theory (which very few people understand), or to invent entirely new branches of mathematics to make progress on this problem.
Don’t let that stop you from trying. But be careful:
In 1976 when I was a graduate student in Germany, I bumped into a new professor who had been a particularly bright student .... He had been given a full professorship ... pretty much right after receiving his Doctorate. He told me straight out that he was going to solve the P=NP problem. He exuded confidence in his abilities and fairly gloated over the fact that he had a leg up on his American counterparts, because, as a German professor, he could pretty much dictate his own schedule. That is, he could go into his room, lock his door, and focus on The Problem.
Many years later I asked about this person, and I got answers indicating that no one really knew anything about him. He was still a professor as far as anyone could tell, but he had turned into a recluse and a virtual hermit, which seemed to baffle everyone.
— Rocky Ross
Although nearly 90% of experts believe $\textsf{P} \neq \textsf{NP}$, we don’t really know for sure, and it’s fun to entertain the possibility that $\textsf{P} = \textsf{NP}$. If the classes are equal, then one of two things is the case:
Since we don’t know whether $\textsf{P} = \textsf{NP}$, we just don’t have any general efficient solutions to $\textsf{NP}$-complete problems. We have some options:
We’d use the same strategies after $\textsf{P} \neq \textsf{NP}$ is proven, or after $\textsf{P} = \textsf{NP}$ is proven but with a galactic lower bound.
There is no way to list all of the P vs NP resources out there, but here are a few notable ones:
Here’s a closing observation that shows why complexity theory is fun.
Some of the most interesting open problems—that is, problems for which the answer is “we don’t know”—in computer science are related to complexity theory.
See Wikipedia’s List of Unsolved Problems in Computer Science.
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
We’ve covered: