The field we now call automata theory arose from many different areas of research, including:
These folks all came up with the notion of an automaton as a kind of machine, real or abstract, that follows a fixed set of instructions to carry out a step-by-step procedure.
A brief history of the field can be found at Wikipedia.
We want a rigorous theory of automata. Therefore, we will make the following assumptions about the automata that we study:
The first assumption is reasonable because a machine that requires an infinite description could not be built, let alone understood. The second is motivated by the fact that it’s pretty difficult (impossible?) to conceive of any kind of information that we could not represent with strings of symbols. For example, the following capture quite a variety of information:
91772.3e-8
Stay behind the yellow line!
ᐊᐃᖓᐃ ᓄᓇᕐᔪᐊᖅ
/Pecs/Los Angeles/Berlin/Madrid//1 2/3 0/2 2/2 0/3 2///
(* (+ 4 3) (- 9 7))
int average(int x, int y) {return (x + y) / 2;}
<dog id="30024"><name>Lisichka</name><age>13</age></dog>
{"type": "dog", "id": "30024", "name": "Lisichka", "age": 13}
∀α β. α⊥β ⇔ (α•β = 0)
1. f3 e5 2. g4 ♛h4++
Why create and study a theory of automata? Two reasons:
An automaton is any device, physical or abstract, that runs by itself (hence the prefix “auto”) to process some information. There should be no limits to your imagination regarding how an automaton should or should not look, nor what kinds of internal operations it carries out (as long as it is finitely describable).
People have created many kinds of automata. Some strive to be as minimal as possible (theoretically valuable, but hard to work with in practice); some are much more complex internally (but easier for humans to work with). In terms of technical details, there are at least four dimensions on which we can characterize automata:
There are more dimensions: synchronous vs. asynchronous, deterministic vs. non-deterministic, definite vs. probabilistic, classical vs. quantum, etc. But as these notes are introductory, we will focus only on the first four dimensions.
Our first dimension is one of convenience only.
If a recognizer machine $M$ says YES to input $w$, we say that $M$ accepts $w$.
A recognizer that always finishes and answers YES or NO (and never loops) is called a decider.
There are variations on how the automata are structured, particularly how their control unit, or algorithm, is specified. The main options are (1) a state machine, (2) a set of instructions, or (3) a set of rewrite rules.
If you are in this state | and this event occurs | Then perform this action | and move to state |
---|---|---|---|
If there can be more than one action/transition to take from a given state/input configuration, the machine is said to be non-deterministic, otherwise it is deterministic.
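To make this concrete, here's a sketch of a table-driven control unit in Python, using a hypothetical coin-operated turnstile (the states, events, and actions are all made up for illustration). Because the table has exactly one entry per (state, event) pair, this machine is deterministic:

```python
# A hypothetical turnstile controller as a state/event table:
# (state, event) -> (action, next state).
TABLE = {
    ("locked",   "coin"): ("unlock arm", "unlocked"),
    ("locked",   "push"): ("buzz",       "locked"),
    ("unlocked", "coin"): ("refund",     "unlocked"),
    ("unlocked", "push"): ("lock arm",   "locked"),
}

def run(events, state="locked"):
    actions = []
    for e in events:
        action, state = TABLE[(state, e)]
        actions.append(action)
    return actions, state

print(run(["push", "coin", "push"]))
# → (['buzz', 'unlock arm', 'lock arm'], 'locked')
```

A non-deterministic machine would instead map some (state, event) pairs to a *set* of possible actions/next states.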
```
┌───────────────────┐
│   Instruction 0   │
├───────────────────┤
│   Instruction 1   │
├───────────────────┤
│   Instruction 2   │
├───────────────────┤
│         .         │
│         .         │
│         .         │
├───────────────────┤
│  Instruction n-1  │
└───────────────────┘
```
If you see this substring | Rewrite with this substring |
---|---|
Are these really all the same in some sense? Are these distinctions just for human convenience?
There’s a lot of variety in memory, but two primary kinds are:
```
       ─┬───┬───┬───┬───┬───┬───┬───┬───┬───┬─
 • • •  │   │   │   │   │   │   │   │   │   │  • • •
       ─┴───┴───┴───┴───┴───┴───┴───┴───┴───┴─
```
```
     ┌────────────────────┐
r0   │                    │
     ├────────────────────┤
r1   │                    │
     ├────────────────────┤
r2   │                    │
     ├────────────────────┤
r3   │                    │
     ├────────────────────┤
r4   │                    │
     ├────────────────────┤
```
What else? You can have fun, too, with things like multi-dimensional tapes. (A two-dimensional tape would allow for up, down, left, and right movements.) Use your imagination to build new computing machines.
One final variation of interest, applicable to instruction-based automata only, really, is: where are the instructions stored? There are two main architectures:
Turing Machines are so historically significant and influential that I have an entire separate page of notes on them.
But the basic idea is that the input sits on an infinite tape of cells surrounded by blanks. The machine has a head pointing to the first symbol of the input. The machine’s control is a finite set of rules that say “If in state $q$ looking at symbol $\sigma$, replace the current symbol with $\sigma'$, move either one cell right or left, and transition to state $q'$.”
Watch Jade’s video again:
Again, we’ll cover Turing Machines in detail later, but for now here’s an example Turing Machine (of the transducer variety) to increment a binary number:
Roleplay, with paper and pencil, the human computer of Turing’s day sitting at a desk carrying out the task of incrementing the binary number 11010011111. Put yourself in each “mental state” expressed in the above machine.
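If you'd like to check your roleplay, here's a little simulator sketch in Python. The rule set below is an *assumed* incrementing machine (scan right to the end, then carry leftward); the actual machine above may differ in its details:

```python
def run_tm(rules, tape_str, start="right", blank="_"):
    """Simulate a one-tape TM. rules maps (state, symbol) to
    (symbol to write, move, next state), where move is "L", "R", or "S".
    The machine halts when no rule applies."""
    tape = dict(enumerate(tape_str))
    pos, state = 0, start
    while (state, tape.get(pos, blank)) in rules:
        write, move, state = rules[(state, tape.get(pos, blank))]
        tape[pos] = write
        pos += {"L": -1, "R": 1, "S": 0}[move]
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

# Hypothetical rule set: scan right to the blank, then sweep left turning
# 1s into 0s until a 0 (or the blank past the left end) becomes 1.
INCREMENT = {
    ("right", "0"): ("0", "R", "right"),
    ("right", "1"): ("1", "R", "right"),
    ("right", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "S", "done"),
    ("carry", "_"): ("1", "S", "done"),
}

print(run_tm(INCREMENT, "11010011111"))  # → 11010100000
```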
And here is one (of the recognizer) variety to determine if a given unsigned binary number is divisible by 3:
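Since this recognizer only ever needs to scan rightward, its essence is a three-state mod-3 counter, which we can sketch directly (an illustration, not necessarily the pictured machine): reading bit $b$ in state $r$ leads to state $(2r + b) \bmod 3$, and we accept in state 0.

```python
def divisible_by_3(bits):
    """Recognizer sketch: track the value-so-far mod 3. Appending bit b
    doubles the value and adds b, so the state update is r -> (2r + b) % 3.
    Accept iff we finish in state 0."""
    r = 0
    for b in bits:
        r = (2 * r + int(b)) % 3
    return r == 0

print(divisible_by_3("1111"), divisible_by_3("111"))  # → True False
```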
Later, we’ll talk about how we came up with these beauties. Not too bad for us, right? But how the heck did Turing just figure all this stuff out to begin with? Pretty amazing, right? Yes, it is amazing even though his work was massively influenced by previous work of Gödel and Church and others, which Turing was happy to acknowledge.
Wait, only one tape?
Can a Turing Machine have multiple tapes? Sure. Turing was going for simplicity (if not outright minimalism) when he introduced this machine as a model for computation. You can show that any TM with multiple tapes can be simulated with a 1-tape TM, so various other tape-based models of computation can generally be considered kinds of Turing Machines. Some of these we’ll see right now.
An LBA is a Turing Machine whose tape is not infinite but rather contains only its input, with one cell to the left containing a non-replaceable “left-end-marker” symbol and one cell to the right containing a non-replaceable “right-end-marker” symbol.
It feels a little more realistic, closer to a real computer, because it has a finite “memory,” right?
A PDA is an automaton with a finite-state control, a read-only input, and a separate stack. At each step you read the next input symbol, and, depending on the current top of the stack and the current state, you pop from the stack, push something new, and move to a new state.
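As a sketch, here's the classic deterministic PDA for $\{a^nb^n : n \geq 0\}$ simulated in Python (the state names and bookkeeping are made up for illustration):

```python
def accepts_anbn(w):
    """Deterministic PDA sketch for {a^n b^n : n >= 0}: push a marker for
    each 'a', pop one for each 'b', accept iff the stack empties exactly."""
    stack, seen_b = [], False
    for c in w:
        if c == "a" and not seen_b:
            stack.append("A")      # push phase
        elif c == "b" and stack:
            stack.pop()            # pop phase
            seen_b = True
        else:
            return False           # out-of-order symbol or unmatched b
    return not stack               # every a must have found its b

print(accepts_anbn("aabb"), accepts_anbn("aab"))  # → True False
```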
A Type-3 Automaton, often called just a Finite Automaton or FA, is a Turing Machine whose input is read-only and that only moves to the right on each step. For transducing FAs, the output is written to a separate output tape that is write-only, initially blank, and only appended to.
Do not let this terrible name confuse you
It is unfortunate that these machines are called finite automata, because Turing Machines have a finite number of states, as do LBAs and PDAs. Maybe it is because of finite memory? No: LBAs have finite memory, too. No one has a good name for this class of automata. Everyone calls them FAs.
Perhaps if you never heard of LBAs the name might make sense.
By the way, you may sometimes hear them called FSMs for “finite state machines” but that is much, much worse since all reasonable automata have a finite number of states and the acronym is easily confused with the deity of Pastafarianism.
When describing FAs, we don’t have to worry about blanks, or say whether we are moving left or right! All we need is the input symbol and the output symbol(s). For a first example, here is a transducer FA that computes the one’s complement of its input:
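In code, such a transducer is just a transition table plus an output table. One state suffices here (the names are made up for illustration):

```python
# One state ("go"); Mealy-style: the output is produced on the transition.
OUTPUT = {("go", "0"): "1", ("go", "1"): "0"}
NEXT   = {("go", "0"): "go", ("go", "1"): "go"}

def ones_complement(bits):
    state, out = "go", []
    for b in bits:
        out.append(OUTPUT[(state, b)])
        state = NEXT[(state, b)]
    return "".join(out)

print(ones_complement("1001011"))  # → 0110100
```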
For acceptors, we don’t have any output at all, so the machine transitions just have the input symbol. Here’s an FA to determine whether its input has three consecutive $a$’s somewhere:
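A sketch of that acceptor as a transition table, assuming the input alphabet is $\{a, b\}$ (state names are made up: states 0 through 2 count the $a$'s seen in a row, and state 3 is accepting and absorbing):

```python
# DFA sketch: count consecutive a's; once we hit 3, stay there forever.
DELTA = {(s, "a"): min(s + 1, 3) for s in range(4)}
DELTA.update({(s, "b"): 0 for s in range(3)})
DELTA[(3, "b")] = 3   # already accepted; stay accepted

def has_three_as(w):
    state = 0
    for c in w:
        state = DELTA[(state, c)]
    return state == 3

print(has_three_as("baaab"), has_three_as("aabaa"))  # → True False
```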
An FA that writes to its output on a transition is called a Mealy Machine. You can also make an FA transducer that outputs a symbol whenever it enters a state; this would be called a Moore Machine.
A queue automaton is a TM that, instead of moving left or right at each step, always dequeues the symbol at the left and enqueues some other string of symbols on the right.
An interpreter for the $\mathcal{P''}$ language can be considered an automaton. This is an exceedingly low-level and minimal notation. Check it out at Wikipedia.
Like $\mathcal{P''}$, Brainfuck (often written Brainf**k) is a programming language with an explicit machine execution model. It is much more widely known and much easier to understand than $\mathcal{P''}$, as it is less minimal.
The execution model features (1) a fixed program stored on a read-only tape where each cell holds one of eight instructions, (2) a read-only input stream of bytes (you can think of it as a tape), (3) a write-only output stream of bytes (also a tape), and (4) a data tape with a fixed left end and infinite to the right (although really the language specification says at least 30,000 cells). The eight instructions are:
Instruction | Description |
---|---|
+ | Increment the byte in the current data cell |
- | Decrement the byte in the current data cell |
. | Output the byte in the current data cell |
, | Read next byte from input and store in current data cell |
< | Move left on tape |
> | Move right on tape |
[ | If byte in current data cell is 0, jump to instruction right after the next matching ] |
] | If byte in current data cell is not 0, jump to instruction right after the previous matching [ |
Here’s a program in the language. Can you figure out what it does?
+++++++[>+++++[>++>+++<<-]<-]>>++.>.
See the Language overview at Esolang, the Wikipedia article, and this collection of example programs.
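If you'd rather not trace it by hand, here's a minimal interpreter sketch you can run on the program above. (Byte cells with wraparound and a tape grown on demand are common but assumed conventions; there's also no guard against moving left of cell 0, which this program never does.)

```python
def brainfuck(program, input_bytes=b""):
    """Minimal Brainfuck interpreter sketch with precomputed bracket matching."""
    match, stack = {}, []
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            match[i], match[j] = j, i
    tape, dp, ip, inp, out = [0], 0, 0, 0, bytearray()
    while ip < len(program):
        c = program[ip]
        if c == "+":
            tape[dp] = (tape[dp] + 1) % 256
        elif c == "-":
            tape[dp] = (tape[dp] - 1) % 256
        elif c == ">":
            dp += 1
            if dp == len(tape):
                tape.append(0)    # grow the data tape on demand
        elif c == "<":
            dp -= 1
        elif c == ".":
            out.append(tape[dp])
        elif c == ",":
            tape[dp] = input_bytes[inp] if inp < len(input_bytes) else 0
            inp += 1
        elif c == "[" and tape[dp] == 0:
            ip = match[ip]        # jump past the matching ]
        elif c == "]" and tape[dp] != 0:
            ip = match[ip]        # jump back to the matching [
        ip += 1
    return bytes(out)

print(brainfuck("+++++++[>+++++[>++>+++<<-]<-]>>++.>.").decode())
```

Run it yourself to see the answer to the puzzle.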
We can mechanically compute by applying rules (from a finite set of rules) that allow us to rewrite part of a string into another. We start with the input surrounded by begin and end markers `[` and `]`, and keep rewriting, non-deterministically, until there are no more rules to apply.
For example, here’s an SRS for incrementing binary numbers:
Some example runs:
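Because SRS rules are so easy to apply mechanically, you can run such a system with a few lines of Python. The rule set below is an assumed reconstruction (not necessarily the rules given above), together with a naive driver that applies rules until none matches:

```python
# Assumed rule set: the ] doubles as a carry moving left; # marks
# completion and slides left so both markers can be erased.
RULES = [
    ("1]", "]0"),    # carry over a 1: it becomes 0, carry keeps moving
    ("0]", "#1"),    # carry lands on a 0, which becomes 1; we're done
    ("[]", "[#1"),   # carry fell off the left end (input was all 1s)
    ("1#", "#1"),    # slide the done-marker left...
    ("0#", "#0"),
    ("[#", ""),      # ...and erase both markers at the left end
]

def srs_increment(w):
    s = "[" + w + "]"
    while True:
        for lhs, rhs in RULES:      # this particular rule set happens to
            if lhs in s:            # give the same answer in any order
                s = s.replace(lhs, rhs, 1)
                break
        else:
            return s                # no rule applies: computation finished

print(srs_increment("110"), srs_increment("111"))  # → 111 1000
```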
How about a formal definition?
An SRS is a tuple $(\Sigma, R)$ where:
The one-step derivation relation $\Rightarrow_R \;\subseteq \Sigma^* \times \Sigma^*$ is defined as: $xuy \Rightarrow_R xvy$ iff $(u,v) \in R$.
SRS Computation
An SRS $(\Sigma, R)$ computes output $w'$ from input $w$ iff $$[w] \Longrightarrow^*_R w' \wedge \forall\,x\,y\,z . w' = xyz \supset \not \exists u . (y,u) \in R$$
We can make an SRS be a recognizer, too. Here is an SRS that determines whether or not its input is in the form $a^nb^nc^n$, for $n \geq 0$:
SRS Acceptance
An SRS $(\Sigma, R)$ accepts $w$ iff $[w] \Longrightarrow^*_R \fbox{YES}$
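Here's a sketch of an $a^nb^nc^n$ recognizer in this style. The rules are an assumed reconstruction: a cursor repeatedly sweeps across the string, crossing off one $a$, one $b$, and one $c$ per pass, and any input that gets stuck before reaching `YES` is rejected:

```python
# Assumed rule set: { marks a pass in progress; cursors A/B/C have eaten
# one a, one b, one c respectively; R sweeps back to start the next pass.
RULES = [
    ("[]", "YES"),                              # nothing left: accept
    ("[a", "{A"),                               # start a pass, eat one a
    ("Aa", "aA"), ("Ab", "B"),                  # skip a's, then eat one b
    ("Bb", "bB"), ("Bc", "C"),                  # skip b's, then eat one c
    ("Cc", "cC"), ("C]", "R]"),                 # skip c's to the right end
    ("aR", "Ra"), ("bR", "Rb"), ("cR", "Rc"),   # sweep back to the left
    ("{R", "["),                                # ready for the next pass
]

def srs_accepts(w):
    s = "[" + w + "]"
    while True:
        for lhs, rhs in RULES:
            if lhs in s:
                s = s.replace(lhs, rhs, 1)
                break
        else:
            return s == "YES"    # getting stuck anywhere else = reject

print([srs_accepts(w) for w in ["", "abc", "aaabbbccc", "abcabc", "ba"]])
# → [True, True, True, False, False]
```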
Now let’s move from tapes to registers.
A register machine has a number of registers $r_0, r_1, r_2, \ldots$, each holding a single (unbounded) natural number. The machine’s control is a program, defined as a list of (labeled) instructions. The instructions are normally executed in sequence, but some instructions can cause a “jump” to an arbitrary instruction. The instruction set varies from model to model (of which there exist an infinite number of possibilities). Here’s an example of a register machine with a very small instruction set:
Instruction | Description |
---|---|
$\textsf{LOAD } n, r_k$ | $r_k := n$ |
$\textsf{COPY }r_i, r_j$ | $r_j := [r_i]$ |
$\textsf{ADD } r_i, r_j, r_k$ | $r_k := [r_i] + [r_j]$ |
$\textsf{SUB } r_i, r_j, r_k$ | $r_k := [r_i] - [r_j]$ |
... and so on for MUL, DIV, POW, AND, OR, LT, LE, EQ, NE, GE, GT, SHL, SHR | |
$\textsf{NEG } r_i, r_j$ | $r_j := - [r_i]$ |
$\textsf{NOT } r_i, r_j$ | $r_j := \neg [r_i]$ |
Can also include JUMP, JZ, JE, READ, WRITE, HALT from above | |
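Here's a sketch of an interpreter for a machine of this flavor, with a small assumed instruction subset (LOAD, ADD, SUB, JZ, JUMP, HALT, with jump targets given as instruction indices) and a toy multiplication program:

```python
def run_rm(program, registers=None):
    """Interpreter sketch for a toy register machine. Each instruction is
    a tuple; execution is sequential except for the jump instructions."""
    r = dict(registers or {})
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":
            n, k = args; r[k] = n
        elif op == "ADD":
            i, j, k = args; r[k] = r.get(i, 0) + r.get(j, 0)
        elif op == "SUB":
            i, j, k = args; r[k] = r.get(i, 0) - r.get(j, 0)
        elif op == "JZ":
            i, target = args
            if r.get(i, 0) == 0:
                pc = target
        elif op == "JUMP":
            pc = args[0]
        elif op == "HALT":
            break
    return r

# A made-up program: multiply r0 by r1 into r2 by repeated addition.
times = [
    ("LOAD", 0, 2),      # 0: r2 := 0
    ("LOAD", 1, 3),      # 1: r3 := 1 (constant for decrementing)
    ("JZ", 1, 6),        # 2: while r1 != 0:
    ("ADD", 2, 0, 2),    # 3:   r2 := r2 + r0
    ("SUB", 1, 3, 1),    # 4:   r1 := r1 - 1
    ("JUMP", 2),         # 5: loop
    ("HALT",),           # 6:
]
print(run_rm(times, {0: 6, 1: 7})[2])  # → 42
```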
Register machines are rather important, as they are pretty much the basis for most “real” machines today. We’ll discuss them in more detail later.
Any lambda calculus function is essentially a program. We tend to think of the $\lambda$-calculus as a formalism for modeling computation (and computable functions) in general, but one could, in theory, build hardware to carry out the operations of the calculus, so it’s fun to consider it along with more classical automata.
An early one was the SECD Machine.
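For a taste of the idea, Church numerals transliterate directly into Python lambdas (this is an illustration of the calculus itself, not of the SECD Machine):

```python
# Church numerals: the numeral n is the function that applies f to x
# exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting applications."""
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(to_int(add(three)(succ(three))))  # → 7
```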
The μ-Recursive Functions, also known as the General Recursive Functions, or the Partial Recursive Functions, are those functions defined by a small number of rules. They allow computations in “bigger steps” than the λ-Calculus.
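As a sketch, the two operators that give the class its name, primitive recursion and minimization ($\mu$), can be written out directly (the helper names here are made up):

```python
def primrec(base, step):
    """Primitive recursion: h(x, 0) = base(x); h(x, n+1) = step(x, n, h(x, n))."""
    def h(x, n):
        acc = base(x)
        for k in range(n):
            acc = step(x, k, acc)
        return acc
    return h

def mu(p):
    """Minimization: mu(p)(x) is the least n with p(x, n) == 0.
    May loop forever if no such n exists, hence 'partial' recursive."""
    def f(x):
        n = 0
        while p(x, n) != 0:
            n += 1
        return n
    return f

add = primrec(lambda x: x, lambda x, k, acc: acc + 1)
print(add(3, 4))            # → 7
sqrt_floor = mu(lambda x, n: 0 if (n + 1) ** 2 > x else 1)
print(sqrt_floor(10))       # → 3
```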
Yes, people have created probabilistic automata and quantum automata, too.
A recognizer automaton actually defines a formal language! That is, the language of machine $M$ is:
$$ L(M) = \{ w \mid M \textrm{ accepts } w \} $$

Nice! This means a formal language can be defined by a grammar or by a machine. This is philosophically interesting, if not profound. It feels like there is a kind of symmetry going on here.
How about an example, to understand the important concepts here, and to get the vocabulary right? Let $M$ be the Turing Machine given by the following state diagram:
When run on any non-blank input of binary digits, the machine will end in the accept state iff the input ends with a 0. Otherwise it will get stuck. Therefore we say:
Accepting vs. Deciding
In the example above, the machine $M$ did indeed both accept and decide the language of even binary integers. However, there do exist languages that are recognizable by a TM but not decidable by any TM. The difference between accepting and deciding is part of computation theory, and we’ll study this later.
We can characterize different automata by the classes of functions they can compute, or equivalently, the languages they can recognize. For example, Turing Machines can recognize a much larger class of languages than can Type-3 Machines.
So we can rank automata in increasing “power.” In the following table, the class of languages/problems in each row is a proper subset of the following row:
Automata | Problems that can be solved / Languages that can be recognized | Example |
---|---|---|
Loop-free FA | Finite Languages. Languages with a finite number of strings; you can basically just list them all out. Here the FA is simply a directed acyclic graph. | $\{a,b\}$ |
FA (Type-3 Machine) | Regular Languages. You can recognize these without any external memory whatsoever. And you can process them in linear time. | $a^*$ |
Deterministic PDA (DPDA) | Deterministic Context-Free Languages. An interesting subset of the context-free languages. | $a^nb^n$ |
Nondeterministic PDA (NPDA) | Context-Free Languages. It’s quite interesting, and not really too hard to prove, that the NPDAs recognize exactly the class of CFLs, which are the languages generated by a Context-Free (Type-2) Chomsky Grammar. This is a fairly natural class of languages. | $ww^R$ |
LBA | Type-1 Languages. Bounding the tape of a TM reduces the computing power significantly! | $a^nb^nc^n$ |
TM that always halts | Decidable Languages. Also known as the Recursive Languages, R. | $LBA_{diag}$ |
TM | Type-0 Languages. Recursively Enumerable Languages, RE. For languages that are in RE but not in R, a TM can be found that will accept (say YES) every string in the language, but will not be able to halt and say NO for all strings not in the language. | $H$ |
— | Finitely Describable Languages. Any language that can be finitely described, whether it can be computationally recognized or not. | $\overline{H}$ |
Can any automata compute a larger class of functions than a Turing Machine? We don’t think so. In fact, the following are all equal in computing power:
The fact that these all are essentially the same thing, and all attempts at making them more powerful don’t actually increase their power, led to the Church-Turing Thesis, namely, “The Turing Machine captures exactly what we intuitively think of as computable.”
One of the reasons we have theories is to apply them. (Remember, a theory is an organized body of knowledge with explanatory and predictive powers.) What are some applications of automata?
Type-3 machines are used in Lexical Analysis (extracting “words” from a string of characters), an important operation in text processing, and especially in compiler writing.
Type-2 machines are used in Parsing, uncovering the structure of a string of symbols, another hugely important and common operation.
We can see the relationship between automata and functions, through pictures like:
and this one:
These pictures help humans build a conceptual understanding of programming by combining functions.
Check out the following sources for more information:
We’ve covered: