Automata Theory

Part of understanding computation involves the study of formal models of computation, which have traditionally been called automata.

Motivation and History

The field we now call automata theory arose from many different areas of research, including mathematical logic, linguistics, neurophysiology, and early computer engineering.

Researchers in all of these areas came up with the notion of an automaton as a kind of machine, real or abstract, that follows a fixed set of instructions to carry out a step-by-step procedure.

A brief history of the field can be found at Wikipedia.

Exercise: Skim the Wikipedia article now.

We want a rigorous theory of automata. Therefore, we will make the following assumptions about the automata that we study:

  • Every automaton has a finite description.
  • All information an automaton processes (its inputs and outputs) is represented as strings of symbols.

The first assumption is reasonable because a machine that requires an infinite description could not be built, let alone understood. The second is motivated by the fact that it’s pretty difficult (impossible?) to conceive of any kind of information that we could not represent with strings of symbols: numbers, text, images, sounds, and even programs themselves can all be encoded as strings.

Exercise: Can you think of any information that cannot be represented as a string of symbols? If you can, how would you communicate it to someone else? What would “computing” with such information look like?

Why create and study a theory of automata? Two reasons:

  • To understand what can and cannot be computed, and at what cost (the subject of the sections on formal languages and computability below).
  • To apply the theory in practice: automata show up everywhere from lexical analysis and parsing to hardware design (see the Applications section below).

Kinds of Automata

An automaton is any device, physical or abstract, that runs by itself (hence the prefix “auto”) to process some information. There should be no limits to your imagination regarding how an automaton should or should not look, nor what kinds of internal operations it carries out (as long as it is finitely describable).

People have created many kinds of automata. Some strive to be as minimal as possible (theoretically valuable, but hard to work with in practice); some are much more complex internally (but easier for humans to work with). In terms of technical details, there are at least four dimensions on which we can characterize automata:

  • Behavior: does the machine transform an input into an output (a transducer), or answer a yes/no question about its input (a recognizer)?
  • Operation: how is the machine’s control specified (state machine, instruction list, or rewrite rules)?
  • Memory: what kind of storage does the machine compute with (tapes, stacks, queues, registers)?
  • Architecture: where is the machine’s program stored?

There are more dimensions: synchronous vs. asynchronous, deterministic vs. non-deterministic, definite vs. probabilistic, classical vs. quantum, etc. But as these notes are introductory, we will focus only on the first four dimensions.

Behavioral Variations

Our first dimension, behavior, is one of convenience only: a transducer computes an output from its input, while a recognizer (or acceptor) merely answers yes or no.

Example: An example problem for a transducer might be: Compute the product of this pair of numbers.
Example: A problem for a recognizer might be: Is this number prime?

Operational Variations

There are variations on how the automata are structured, particularly how their control unit, or algorithm, is specified. The main options are (1) a state machine, (2) a set of instructions, or (3) a set of rewrite rules.

Are these really all the same in some sense? Are these distinctions just for human convenience?

Exercise: Show how to represent a list-of-instructions machine as a state machine. Show how to represent a string rewriter as a state machine.

Memory Variations

There’s a lot of variety in memory, but two primary kinds are:

  • Tapes: sequences of cells, each holding one symbol, accessed by a movable head.
  • Registers: individually named cells, each holding an entire (unbounded) number.

What else? Stacks and queues, as we’ll see below. You can have fun, too, with things like multi-dimensional tapes. (A two-dimensional tape would allow for up, down, left, and right movements.) Use your imagination to build new computing machines.

Architectural Variations

One final variation of interest, applicable really only to instruction-based automata, is: where are the instructions stored? There are two main architectures:

  • Harvard: the program is stored separately from the data it operates on.
  • von Neumann (stored-program): the program lives in the same memory as the data, so programs can themselves be processed as data.

Classic Examples of Automata

Turing Machines (TMs)

Turing Machines are so historically significant and influential that I have an entire separate page of notes on them.

But the basic idea is that the input sits on an infinite tape of cells, surrounded by blanks. The machine has a head pointing to the first symbol of the input. The machine’s control is a finite set of rules, each saying “If in state $q$ looking at symbol $\sigma$, replace the current symbol with $\sigma'$, move one cell right or left, and transition to state $q'$.”

tm.png
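To make the rule format concrete, here is a minimal simulator sketch in Python (a hypothetical helper, not part of these notes); it treats the tape as a sparse mapping from cell indices to symbols:

```python
# Minimal Turing Machine simulator sketch (hypothetical helper, not from
# these notes). Rules map (state, symbol) to (symbol-to-write, L-or-R move,
# next state). The run ends in a halt state or when no rule applies.
def run_tm(rules, halt_states, tape, start="q0", blank="_"):
    cells = dict(enumerate(tape)) or {0: blank}   # sparse tape
    head, state = 0, start
    while state not in halt_states:
        key = (state, cells.get(head, blank))
        if key not in rules:
            break                                 # stuck: no applicable rule
        write, move, state = rules[key]
        cells[head] = write
        head += 1 if move == "R" else -1
    lo, hi = min(cells), max(cells)
    return state, "".join(cells.get(i, blank) for i in range(lo, hi + 1))
```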

Watch Jade’s video again.

Again, we’ll cover Turing Machines in detail later, but for now here’s an example Turing Machine (of the transducer variety) to increment a binary number:

tmIncrement.png
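Here is one plausible encoding of the increment machine for the simulator sketched above; the state names are my own assumptions, since the actual diagram may label things differently:

```python
# Assumed encoding of the binary-increment transducer: scan to the right
# end, then sweep left flipping 1s to 0s until a 0 (or a blank) absorbs
# the carry.
INC = {
    ("q0", "0"):    ("0", "R", "q0"),      # scan right over the input
    ("q0", "1"):    ("1", "R", "q0"),
    ("q0", "_"):    ("_", "L", "carry"),   # fell off the right end
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, carry continues
    ("carry", "0"): ("1", "L", "done"),    # 0 + carry = 1, finished
    ("carry", "_"): ("1", "L", "done"),    # all 1s: write a new leading 1
}

state, tape = run_tm(INC, {"done"}, "11010011111")
print(tape.strip("_"))                     # -> 11010100000
```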

CLASSWORK
Roleplay, with paper and pencil, the human computer of Turing’s day sitting at a desk carrying out the task of incrementing the binary number 11010011111. Put yourself in each “mental state” expressed in the above machine.

And here is one (of the recognizer variety) to determine whether a given unsigned binary number is divisible by 3:

tmDividesThree.png
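One plausible encoding of this recognizer, again with assumed state names: reading the bits left to right, the state tracks the value-so-far modulo 3, since appending bit $b$ to a number $v$ gives $2v + b$:

```python
# Assumed encoding of the divisibility-by-3 recognizer: state qs means
# "the bits read so far are congruent to s mod 3".
DIV3 = {(f"q{s}", b): (b, "R", f"q{(2 * s + int(b)) % 3}")
        for s in range(3) for b in "01"}
DIV3[("q0", "_")] = ("_", "R", "accept")      # remainder 0 at end of input

state, _ = run_tm(DIV3, {"accept"}, "1111")   # 1111 is 15, divisible by 3
print(state == "accept")                      # -> True
```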

Later, we’ll talk about how we came up with these beauties. Not too bad for us, right? But how the heck did Turing just figure all this stuff out to begin with? Pretty amazing, right? Yes, it is amazing, even though his work was massively influenced by the previous work of Gödel, Church, and others, which Turing was happy to acknowledge.

Wait, only one tape?

Can a Turing Machine have multiple tapes? Sure. Turing was going for simplicity (if not outright minimalism) when he introduced this machine as a model for computation. You can show that any TM with multiple tapes can be simulated with a 1-tape TM, so various other tape-based models of computation can generally be considered kinds of Turing Machines. Some of these we’ll see right now.

Linear Bounded Automata (LBAs)

An LBA is a Turing Machine whose tape is not infinite but rather contains only its input, with one cell to the left containing a non-replaceable “left-end-marker” symbol and one cell to the right containing a non-replaceable “right-end-marker” symbol.

lba.png

It feels a little more realistic, closer to a real computer, because it has a finite “memory,” right?

Pushdown Automata (PDAs)

A PDA is an automaton with a finite-state control, a read-only input, and a separate stack. At each step you read the next input symbol, and depending on it, the current top of the stack, and the current state, you pop the stack, push something new, and move to a new state.

pda.png
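To see the stack discipline in action, here is a sketch (with an assumed, simplified encoding) of a deterministic PDA recognizing $\{a^nb^n \mid n \geq 0\}$:

```python
# Sketch of a deterministic PDA for a^n b^n: push a marker per 'a',
# pop a marker per 'b', and accept iff the stack is back to just the
# bottom marker when the input runs out.
def accepts_anbn(w):
    stack, seen_b = ["$"], False        # "$" marks the stack bottom
    for c in w:
        if c == "a" and not seen_b:
            stack.append("A")           # push one marker per 'a'
        elif c == "b" and stack[-1] == "A":
            stack.pop()                 # pop one marker per 'b'
            seen_b = True
        else:
            return False                # out-of-order symbol or underflow
    return stack == ["$"]

print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))  # -> True False
```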

Exercise: Why would anyone care about a machine like this? (Hint: how efficient are stacks, in terms of the Big-Θ complexity measure you learned about back in the day?)

Type-3, or “Finite” Automata (FAs)

A Type-3 Automaton, often called just a Finite Automaton or FA, is a Turing Machine whose input is read-only and that only moves to the right on each step. For transducing FAs, the output is written to a separate output tape that is write-only, initially blank, and only appended to.

fa.png

Do not let this terrible name confuse you

It is unfortunate that these machines are called finite automata, because Turing Machines also have a finite number of states, as do LBAs and PDAs. Maybe it is because of finite memory? No; LBAs have finite memory, too. No one has a good name for this class of automata. Everyone calls them FAs.

Perhaps if you had never heard of LBAs, the name might make sense.

By the way, you may sometimes hear them called FSMs for “finite state machines” but that is much, much worse since all reasonable automata have a finite number of states and the acronym is easily confused with the deity of Pastafarianism.

When describing FAs, we don’t have to worry about blanks, or say whether we are moving left or right! All we need is the input symbol and the output symbol(s). For a first example, here is a transducer FA that computes the one’s complement of its input:

fa-ones-complement.png
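In code, the whole machine is a single state whose transitions each emit one output symbol; here is a sketch of that behavior:

```python
# Sketch of the one's-complement transducer: one state, and each input
# bit writes its flipped bit to the (append-only) output tape.
def ones_complement(bits):
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] for b in bits)

print(ones_complement("1101"))   # -> 0010
```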

For acceptors, we don’t have any output at all, so the machine transitions just have the input symbol. Here’s an FA to determine whether its input has three consecutive $a$’s somewhere:

fa-three-consecutive-as.png
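A sketch of the acceptor, where the state (an integer 0 through 3) simply records how many consecutive $a$’s we have just seen, saturating at the accepting state 3:

```python
# Sketch of the three-consecutive-a's acceptor as a four-state FA.
def has_three_consecutive_as(w):
    state = 0
    for c in w:
        if state == 3:
            break                          # accepting "trap" state
        state = state + 1 if c == "a" else 0
    return state == 3

print(has_three_consecutive_as("abaaab"))  # -> True
```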

An FA that writes to its output on a transition is called a Mealy Machine. You can also make an FA transducer that outputs a symbol whenever it enters a state; this would be called a Moore Machine.

Exercise: What “kind” of computations do Finite Automata represent?

Queue Automata

A queue automaton is a TM that, instead of moving left or right at each step, always dequeues the symbol at the left end of its tape and enqueues some string of symbols at the right end.
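A sketch of the step loop, under an assumed encoding where rules map a (state, dequeued symbol) pair to a string to enqueue and a next state:

```python
from collections import deque

# Sketch of a queue automaton run: each step dequeues from the left and
# enqueues a (possibly empty) string of symbols on the right.
def run_qa(rules, accept_states, w, start="q0"):
    queue, state = deque(w), start
    while queue and state not in accept_states:
        key = (state, queue.popleft())
        if key not in rules:
            return False               # stuck: no applicable rule
        out, state = rules[key]
        queue.extend(out)
    return state in accept_states
```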

$\mathcal{P''}$

An interpreter for the $\mathcal{P''}$ language can be considered an automaton. It is an exceedingly low-level and minimal notation. Check it out at Wikipedia.

Brainfuck

Like $\mathcal{P''}$, Brainfuck (or Brainf**k as it is commonly known) is a programming language with an explicit machine execution model. It is much more widely known and way easier to understand than $\mathcal{P''}$, as it is less minimal.

The execution model features (1) a fixed program stored on a read-only tape where each cell holds one of eight instructions, (2) a read-only input stream of bytes (you can think of it as a tape), (3) a write-only output stream of bytes (also a tape), and (4) a data tape with a fixed left end and infinite to the right (although really the language specification says at least 30,000 cells). The eight instructions are:

  • + : Increment the byte in the current data cell
  • - : Decrement the byte in the current data cell
  • . : Output the byte in the current data cell
  • , : Read the next byte from input and store it in the current data cell
  • < : Move the data pointer one cell left
  • > : Move the data pointer one cell right
  • [ : If the byte in the current data cell is 0, jump to the instruction just after the matching ]
  • ] : If the byte in the current data cell is not 0, jump back to the instruction just after the matching [

Here’s a program in the language. Can you figure out what it does?

+++++++[>+++++[>++>+++<<-]<-]>>++.>.
Need a hint? Here are the contents of the first four data cells at a few key points during the run:

7  5   2    3
7  0  10   15
0  0  70  105
      72  105
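If you would rather let a machine do the bookkeeping, here is a compact interpreter sketch (a hypothetical helper, not part of the language definition) that you can feed the program above:

```python
# Compact Brainfuck interpreter sketch: fixed program, 30,000 byte cells,
# read-only input stream, write-only output stream.
def bf(prog, inp=b""):
    jump, stack = {}, []
    for i, c in enumerate(prog):              # pre-match the brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            jump[i] = stack.pop()
            jump[jump[i]] = i
    tape, dp, ip, inp, out = [0] * 30_000, 0, 0, list(inp), bytearray()
    while ip < len(prog):
        c = prog[ip]
        if c == "+":   tape[dp] = (tape[dp] + 1) % 256
        elif c == "-": tape[dp] = (tape[dp] - 1) % 256
        elif c == ">": dp += 1
        elif c == "<": dp -= 1
        elif c == ".": out.append(tape[dp])
        elif c == ",": tape[dp] = inp.pop(0) if inp else 0
        elif c == "[" and tape[dp] == 0: ip = jump[ip]
        elif c == "]" and tape[dp] != 0: ip = jump[ip]
        ip += 1
    return bytes(out)

print(bf("+++++++[>+++++[>++>+++<<-]<-]>>++.>."))   # spoils the puzzle!
```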

See the Language overview at Esolang, the Wikipedia article, and this collection of example programs.

String Rewriting Systems

We can mechanically compute by applying rules (from a finite set of rules) that allow us to rewrite part of a string into another. We start with the input surrounded by the begin and end markers [ and ], and keep rewriting, non-deterministically, until there are no more rules to apply.

For example, here’s an SRS for incrementing binary numbers:

$\begin{array}{lcl} 0 ] & \rightarrow & Y 1 \\ 0 Y & \rightarrow & Y 0 \\ 1 Y & \rightarrow & Y 1 \\ [ Y & \rightarrow & \varepsilon \\ 1 ] & \rightarrow & X 0 \\ 0 X & \rightarrow & Y 1 \\ 1 X & \rightarrow & X 0 \\ [ X & \rightarrow & 1 \\ \end{array}$

Some example runs:

$[0] \Rightarrow [Y1 \Rightarrow 1$

$[1011] \Rightarrow [101X0 \Rightarrow [10X00 \Rightarrow [1Y100 \Rightarrow [Y1100 \Rightarrow 1100$
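Here is a sketch of a tiny rewriting engine in Python (deterministic for simplicity: it always applies the first rule that matches, at its leftmost occurrence, even though a true SRS may choose non-deterministically):

```python
# Sketch of an SRS runner: rewrite until no left-hand side occurs.
INCREMENT_RULES = [("0]", "Y1"), ("0Y", "Y0"), ("1Y", "Y1"), ("[Y", ""),
                   ("1]", "X0"), ("0X", "Y1"), ("1X", "X0"), ("[X", "1")]

def run_srs(rules, s):
    while True:
        for lhs, rhs in rules:
            if lhs in s:
                s = s.replace(lhs, rhs, 1)   # rewrite leftmost occurrence
                break
        else:
            return s                         # no rule applies: halt

print(run_srs(INCREMENT_RULES, "[1011]"))    # -> 1100
```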

How about a formal definition?

An SRS is a tuple $(\Sigma, R)$ where:

  • $\Sigma$ is the alphabet of symbols
  • $R \subseteq \Sigma^* \times \Sigma^*$ is the rewriting relation.

The one-step derivation relation $\Rightarrow_R \;\subseteq \Sigma^* \times \Sigma^*$ is defined as: $xuy \Rightarrow_R xvy$ iff $(u,v) \in R$.

SRS Computation

An SRS $(\Sigma, R)$ computes output $w'$ from input $w$ iff $$[w] \Longrightarrow^*_R w' \wedge \forall\,x\,y\,z . w' = xyz \supset \not \exists u . (y,u) \in R$$

We can make an SRS a recognizer, too. Here is an SRS that determines whether or not its input is of the form $a^nb^nc^n$, for $n \geq 0$:

$\begin{array}{lcl} a b & \rightarrow & X \\ X b & \rightarrow & b X \\ X c & \rightarrow & \varepsilon \\ [ ] & \rightarrow & \fbox{YES} \\ \end{array}$
SRS Acceptance

An SRS $(\Sigma, R)$ accepts $w$ iff $[w] \Longrightarrow^*_R \fbox{YES}$
Exercise: Show how $aaabbbccc$ is accepted. Show how $abbbca$ is rejected.
Exercise: Give a Turing Machine to decide this language.

Register Machines

Now let’s move from tapes to registers.

A register machine has a number of registers $r_0, r_1, r_2, \ldots$, each holding a single (unbounded) natural number. The machine’s control is a program, defined as a list of (labeled) instructions. The instructions are normally executed in sequence, but some instructions can cause a “jump” to an arbitrary instruction. The instruction set varies from model to model (there are infinitely many possibilities). Here’s an example of a register machine with a very small instruction set:

  • $\textsf{LOAD } n, r_k$ : $r_k := n$
  • $\textsf{COPY } r_i, r_j$ : $r_j := [r_i]$
  • $\textsf{ADD } r_i, r_j, r_k$ : $r_k := [r_i] + [r_j]$
  • $\textsf{SUB } r_i, r_j, r_k$ : $r_k := [r_i] - [r_j]$
  • ... and so on for MUL, DIV, POW, AND, OR, LT, LE, EQ, NE, GE, GT, SHL, SHR
  • $\textsf{NEG } r_i, r_j$ : $r_j := -[r_i]$
  • $\textsf{NOT } r_i, r_j$ : $r_j := \neg [r_i]$
  • The set can also include JUMP, JZ, JE, READ, WRITE, and HALT instructions.
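As a sketch of how such a machine executes (the tuple encoding and the little multiplication program below are my own assumptions, not from these notes):

```python
# Sketch of a register machine interpreter: a program is a list of
# instruction tuples executed in order; jumps name an instruction index.
def run_rm(program, regs):
    r, pc = dict(regs), 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":   n, k = args; r[k] = n
        elif op == "ADD":  i, j, k = args; r[k] = r[i] + r[j]
        elif op == "SUB":  i, j, k = args; r[k] = max(r[i] - r[j], 0)  # natural subtraction
        elif op == "JZ":   i, t = args; pc = t - 1 if r[i] == 0 else pc
        elif op == "JUMP": pc = args[0] - 1
        elif op == "HALT": break
        pc += 1
    return r

# Hypothetical program: multiply r0 by r1 into r2 by repeated addition.
prog = [("LOAD", 0, 2), ("LOAD", 1, 3),   # 0: r2 := 0   1: r3 := 1
        ("JZ", 0, 6),                     # 2: if r0 == 0, jump to HALT
        ("ADD", 2, 1, 2),                 # 3: r2 := r2 + r1
        ("SUB", 0, 3, 0),                 # 4: r0 := r0 - 1
        ("JUMP", 2),                      # 5: back to the test
        ("HALT",)]                        # 6
print(run_rm(prog, {0: 4, 1: 5})[2])      # -> 20
```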

Register machines are rather important, as they are pretty much the basis for most “real” machines today. We’ll discuss them in more detail later.

Lambda Calculus Interpreters

Any lambda calculus function is essentially a program. We tend to think of the $\lambda$-calculus as a formalism for modeling computation (and computable functions) in general, but one could, in theory, build hardware to carry out the operations of the calculus, so it’s fun to consider it along with more classical automata.

An early such machine was the SECD Machine.

Exercise: Study this machine.

μ-Recursive Functions

The μ-Recursive Functions, also known as the General Recursive Functions, or the Partial Recursive Functions, are those functions defined by a small number of rules: starting from the zero, successor, and projection functions, new functions are built by composition, primitive recursion, and minimization (the μ operator). They allow computations in “bigger steps” than the λ-Calculus.

And More!

Yes, people have created probabilistic automata and quantum automata, too.

Connections To Formal Languages

A recognizer automaton actually defines a formal language! That is, the language of machine $M$ is:

$$ L(M) = \{ w \mid M \textrm{ accepts } w \} $$

Nice! This means a formal language can be defined by a grammar or by a machine. This is philosophically interesting, if not profound. It feels like there is a kind of symmetry going on here.

How about an example, to understand the important concepts here and to get the vocabulary right? Let $M$ be the Turing Machine given by the following state diagram:

tmIsEven.png

When run on any non-blank input of binary digits, the machine will end in the accept state iff the input ends with a 0. Otherwise it will get stuck. Therefore we say:

  • $M$ accepts exactly the strings of binary digits ending in 0.
  • $L(M)$ is the language of even binary numbers.
  • Since $M$ halts on every input (by accepting or by getting stuck), $M$ also decides this language.

Accepting vs. Deciding

In the example above, the machine $M$ did indeed both accept and decide the language of even binary integers. However, there do exist languages that are recognizable by a TM but not decidable by any TM. The difference between accepting and deciding is part of computability theory, and we’ll study it later.

Connections To Computability Theory

We can characterize different automata by the classes of functions they can compute, or equivalently, the languages they can recognize. For example, Turing Machines can recognize a much larger class of languages than Type-3 Machines can.

So we can rank automata in increasing “power.” In the following list, the class of languages/problems at each level is a proper subset of the class at the next level:

  • Loop-free FA: Finite Languages, those with a finite number of strings; you can basically just list them all out. Here the FA is simply a directed acyclic graph. Example: $\{a,b\}$
  • FA (Type-3 Machine): Regular Languages. You can recognize these without any external memory whatsoever, and you can process them in linear time. Example: $a^*$
  • Deterministic PDA (DPDA): Deterministic Context-Free Languages, an interesting subset of the context-free languages. Example: $a^nb^n$
  • Nondeterministic PDA (NPDA): Context-Free Languages. It’s quite interesting, and not really too hard to prove, that the NPDAs recognize exactly the class of CFLs, which are the languages generated by a Context-Free (Type-2) Chomsky Grammar. This is a fairly natural class of languages. Example: $ww^R$
  • LBA: Type-1 Languages. Bounding the tape of a TM reduces the computing power significantly! Example: $a^nb^nc^n$
  • TM that always halts: Decidable Languages, also known as the Recursive Languages, R. Example: $LBA_{diag}$
  • TM: Type-0 Languages, the Recursively Enumerable Languages, RE. For languages that are in RE but not in R, a TM can be found that will accept (say YES to) every string in the language, but will not be able to halt and say NO for all strings not in the language. Example: $H$
  • No automaton at all: Finitely Describable Languages, any language that can be finitely described, whether it can be computationally recognized or not. Example: $\overline{H}$

Can any automaton compute a larger class of functions than a Turing Machine? We don’t think so. In fact, the following are all equal in computing power:

  • Turing Machines (with one tape or many)
  • Queue automata
  • Interpreters for $\mathcal{P''}$ and Brainfuck
  • String rewriting systems
  • Register machines
  • The λ-calculus
  • The μ-recursive functions

The fact that these are all essentially the same thing, and that all attempts at making them more powerful don’t actually increase their power, led to the Church-Turing Thesis, namely: “The Turing Machine captures exactly what we intuitively think of as computable.”

Exercise: Read about the Church-Turing Thesis at the Stanford Encyclopedia of Philosophy and at Wikipedia.

Applications

One of the reasons we have theories is to apply them. (Remember, a theory is an organized body of knowledge with explanatory and predictive powers.) What are some applications of automata?

Type-3 machines are used in Lexical Analysis (extracting “words” from a string of characters), an important operation in text processing, and especially in compiler writing.

Type-2 machines are used in Parsing, uncovering the structure of a string of symbols, another hugely important and common operation.

We can see the relationship between automata and functions through pictures like:

odd.png

and this one:

interest-block-diagram.png

These pictures help humans build a conceptual understanding of programming as the combining of functions.

Where to Learn More

Check out the following sources for more information:

Summary

We’ve covered:

  • What automata theory is
  • Some history of automata theory
  • Various types of automata
  • Relationship of automata theory with formal language theory
  • Relationship of automata theory to computability theory