Humans deal with complexity in several ways, most notably via abstraction (suppression of details), classification (grouping, naming, tagging), and hierarchy. Here are examples of hierarchical arrangements of objects, starting with an example from geography:
Here is an example showing that a book has a hierarchical structure:
The structure of a computer program is often tree-shaped, for example:
File systems are a classic example of hierarchy:
/ ├── Library │ ├── Fonts │ │ ├── Farisi.ttf │ │ ├── GillSans.ttf │ │ └── Sana.ttc │ └── Keychains │ └── System.keychain ├── Users │ ├── jessica │ │ ├── homework │ │ │ ├── hw1.py │ │ │ └── hw2.txt │ │ └── resume.pdf │ └── kimiko │ ├── IMG0001.jpg │ └── IMG0002.jpg └── bin ├── date ├── hostname ├── kill └── sleep
And if you’ve done any web development, you’ve worked with a DOM tree:
It’s important to know that this tree, known as the DOM, or Document Object Model is exactly the data structure used in web browsers. Knowledge of this tree is absolutely essential for web programmers as front-end web development involves the styling and manipulation of the DOM. The operations to manipulate the DOM all use the language of trees.
Spend some time making observations about these data arrangements. Here are a few to get you started: (1) There are no “cycles” among the nodes in a tree. (2) They are usually drawn upside down with the “root” at the top. (3) For some trees, the order of the “children” of a node seem significant (like in the book, expression, and HTML examples), but in other cases it does not (like in the geography and file system examples). What other observations can you make?
These hierarchies have an interesting “look” to them: the various objects are arranged with connectors between them in such a way that there are no cycles. This idea shows up in other places! When finding minimal connectivity in networks, we get this kind of structure:
We can also arrange items this way to make it possible to do extremely efficient search, spell checking and autocomplete.
Looks like we are on to something here. The science in computer science gets us wondering if there is anything deeper here to study. These connected, cycle-free arrangements, remind us of trees, so that’s what we will call them. Now let’s build up a theory to get us primed to do great things.
It is good to have a vocabulary surrounding trees so we can talk about them precisely and start using them in computer programming. Let’s start with the basics.
Here’s a tree:
And here is some vocabulary:
For the tree above:
We are going to store data in trees, and so trees are a kind of data structure. And what do we do with data structures? We navigate through them. For trees, the way we navigate is through paths.
For the tree above, the following are paths:
The paths have lengths 1, 1, 8, 12, 5, an 12, respectively. Only the first three are simple paths. The rest are non-simple paths.
The tree above did not look like the pictures of hierarchies that came before. All of the hierarchies had a special node at the top and all the edges seemed to grow outward from it. Ah, so there are at least two different kinds of trees:
Here is an oriented tree:
If drawn with the root at the top and all edges pointing downward (as is conventional) then the arrows are redundant and often omitted. Like so:
So yeah, sometimes the only way to tell if a particular picture of a tree is representing a free or an oriented tree is by context.
Oriented trees introduce a lot more vocabulary. Here are many of the important terms.
Rather than explicit definitions, we’re going to explain the terms relative to the example oriented tree we just saw.
Remember in the examples above that int the book and HTML trees, the order mattered, but that in the geography and file system examples, it did not? We actually have a special word for the former, and some variations of interest.
Earlier we mentioned three places where trees tended to show up. Let’s review them here and connect them to the kinds of trees we just defined:
Network applications
Hierarchical data
Searching, spell-checking, autocomplete
Circles and lines are commonly used to show trees, but there are other ways. Here are a few notations people have used in the past for oriented trees. Maybe you can think up sone on your own, too.
This notation for oriented trees is hopefully self-explanatory:
North America Canada Alberta Quebec Chibougamau Montreal Mexico Sonora Jalisco Guadalajara Colima USA California Los Angeles Sunland Hollywood Eagle Rock San Pedro Hawai`i Po`ipu Alaska Juneau Utqiaġvik
Since every node in an oriented tree has zero or one parent, we can store a tree very compactly by keeping only the index of the parent together with the data in the node, for example:
┌───────────────┬────┐ 0 │ │ │ ├───────────────┼────┤ 1 │ Mexico │ 5 │ ├───────────────┼────┤ 2 │ Los Angeles │ 17 │ ├───────────────┼────┤ 3 │ Po`ipu │ 15 │ ├───────────────┼────┤ 4 │ Hollywood │ 2 │ ├───────────────┼────┤ 5 │ North America │ 0 │ ├───────────────┼────┤ 6 │ Montreal │ 8 │ ├───────────────┼────┤ 7 │ San Pedro │ 2 │ ├───────────────┼────┤ 8 │ Quebec │ 11 │ ├───────────────┼────┤ 9 │ Alberta │ 11 │ ├───────────────┼────┤ 10 │ Chibougamau │ 8 │ ├───────────────┼────┤ 11 │ Canada │ 5 │ ├───────────────┼────┤ 12 │ Eagle Rock │ 2 │ ├───────────────┼────┤ 13 │ Colima │ 1 │ ├───────────────┼────┤ 14 │ Sonora │ 1 │ ├───────────────┼────┤ 15 │ Hawai`i │ 19 │ ├───────────────┼────┤ 16 │ Guadalajara │ 20 │ ├───────────────┼────┤ 17 │ California │ 19 │ ├───────────────┼────┤ 18 │ Alaska │ 19 │ ├───────────────┼────┤ 19 │ USA │ 5 │ ├───────────────┼────┤ 20 │ Jalisco │ 1 │ ├───────────────┼────┤ 21 │ Juneau │ 19 │ ├───────────────┼────┤ 22 │ Utqiaġvik │ 19 │ ├───────────────┼────┤ 23 │ Sunland │ 2 │ └───────────────┴────┘
Interestingly, nested structures exhibit a natural hierarchy, making it suitable for oriented trees:
The outline representation we saw earlier forces us to put each item on its own line. We can use parentheses to give us much more flexibility. Each node has the form (root subtree subtree ...)
(North America (Canada Alberta (Quebec Chibougamau Montreal)) (Mexico Sonora (Jalisco Guadalajara) Colima) (USA (California (Los Angeles Sunland Hollywood Eagle Rock San Pedro)) (Hawai`i Po`ipu) (Alaska Juneau Utqiaġvik)))
There are several variations on string representations. By way of example, the representation of the DOM tree we saw earlier can be expressed as a string in the Hypertext Markup Language, HTML, as follows:
<!doctype html> <html> <head> <meta charset="utf-8"> <title>Hello</title> </head> <body> <h1>Welcome</h1> <p>Hello, world, this is my <a href="fido.html">dog</a>. <img src="fido.jpg" alt="fido"> <!-- That wasn't too bad --> </p> </body> </html>
Free trees don’t have a root, so we pretty much treat them as graphs, and can use any one of the many known graph representations (adjacency list, adjacency matrix, incidence list), which we will study later when looking at graphs. For now, we’ll be focusing exclusively on oriented trees.
How many operations can we think of? Let’s see. For a tree $t$ and a node $n$, we can imagine:
Sometimes we will be traversing trees, performing operations on each node as we go. Two popular approaches to traversal are:
So we can add the operations:
When traversing, it is assumed you always process the children in the proper order in ordered trees.
Implementing trees is such a big topic, because there are so many kinds of trees. Let’s get a big picture before diving into programming.
null
s.In practice, $k$-ary trees are so prevalent, with so many applications and so many variations, that we have a whole separate page of notes on them.
For now, though, let’s take a quick peek at how a plain-old ordered tree (with no bounds on the number of children) would be represented in Java. Note that there’s no real distinction between the tree and its root: we just deal with nodes and just straight up treat a node as the root of “its” tree. We are only going to write a small number of methods; you can write the rest:
import java.util.List; import java.util.Queue; import java.util.ArrayList; import java.util.LinkedList; import java.util.function.Consumer; public class TreeNode<T> { private T data; private List<TreeNode<T>> children = new ArrayList<>(); public TreeNode(T data) { this.data = data; } public TreeNode<T> appendChild(T item) { var newNode = new TreeNode<>(item); children.add(newNode); return newNode; } public int size() { return 1 + children.stream().mapToInt(TreeNode::size).sum(); } public boolean isLeaf() { return children.isEmpty(); } public int height() { return isLeaf() ? 0 : 1 + children.stream().mapToInt(TreeNode::height).max().orElse(0); } public void depthFirst(Consumer<T> consumer) { consumer.accept(data); children.forEach(node -> node.depthFirst(consumer)); } public void breadthFirst(Consumer<T> consumer) { Queue<TreeNode<T>> queue = new LinkedList<>(); queue.offer(this); while (!queue.isEmpty()) { var node = queue.remove(); consumer.accept(node.data); for (var child : node.children) { queue.offer(child); } } } @Override public String toString() { var builder = new StringBuilder("("); builder.append(data); for (var child : children) { builder.append(child.toString()); } builder.append(")"); return builder.toString(); } }
How is the tree used? This gives you an idea:
public class TreeNodeExample { public static void main(String[] args) { var nodeA = new TreeNode<String>("A"); var nodeB = nodeA.appendChild("B"); var nodeC = nodeA.appendChild("C"); nodeB.appendChild("D"); nodeB.appendChild("E"); nodeB.appendChild("F"); var nodeG = nodeC.appendChild("G"); nodeC.appendChild("H"); System.out.printf("Here is a tree: %s%n", nodeA); System.out.printf("It has size %d%n", nodeA.size()); System.out.printf("It has height %d%n", nodeA.height()); System.out.printf("Is B a leaf? %b%n", nodeB.isLeaf()); System.out.printf("Is G a leaf? %b%n", nodeG.isLeaf()); System.out.println("Here are the nodes in depth first order:"); nodeA.depthFirst(System.out::println); System.out.println("Here are the nodes in breadth first order:"); nodeA.breadthFirst(System.out::println); } }
$ javac TreeNodeExample.java && java TreeNodeExample Here is a tree: (A(B(D)(E)(F))(C(G)(H))) It has size 8 It has height 2 Is B a leaf? false Is G a leaf? true Here are the nodes in depth first order: A B D E F C G H Here are the nodes in breadth first order: A B C D E F G H [/Users/ray/git-repos/personal-site/code/java/collections]
We’ve covered: