Lists

Lists. You’ve probably heard of them. You probably use them in your daily life. Did you know there is a mathematical theory of them? And that they are one of the most useful structures in programming? Okay sure you have, but they are still fun to study!

The Basics

A list is a positionally-ordered, completely unrestricted sequence. You can add, update, or remove items at any position. You can lookup an item by position, or find the index at which a given value appears.

There are tons of list operations. Here is a small sampling:

OperationDescription
add(x)add item $x$ at the end
addFirst(x)add item $x$ at the beginning
addLast(x)add item $x$ at the end
addAt(x, i)add item $x$ so it is at position $i$ after the add.
remove(x)remove item $x$
removeAt(i)remove the item at position $i$
updateAt(i, x)replace the item at position $i$ with $x$
getAt(i)return the element at position $i$
size()return the number of items
isEmpty()return whether this list has no items
contains(x)return whether $x$ appears
indexOf(x)return the index of the first occurrence of $x$
lastIndexOf(x)return the index of the last occurrence of $x$
indexOf(x, n)return the index of the $n$th occurrence of $x$
reverse()reverse this list
shuffle()permute the elements of this list
sort()sort the items according to their natural order
sort(c)sort the items using comparator $c$
subList(i,j)the sublist from index $i$ inclusive to $j$ exclusive
take(n)same as subList(0, n)
drop(n)same as subList(n, size())
head()same as subList(0, 1)
tail()same as subList(1, size())
map(f)apply function $f$ to each element
filter(p)the elements for which predicate $p$ holds
append(list)The list made by appending another list to this list

Many of these have mutable and immutable variations. If you are building an immutable list, some of these methods will return a new list which is just like the original except with the differences specified by the operation. For a mutable list, some of these methods will directly modify the list itself.

Exercise: Think up five more.

Lists can be used to model:

among other things.

Building Lists From Scratch

Yes, it’s true: there's already a List interface in the Java Core API and a few implementing classes (including ArrayList and LinkedList), but we're going to write lists from scratch because:

Interestingly, the difference between mutable lists and immutable lists is so great, there are separate interfaces for each.

Mutable vs. Immutable Lists

The flavor of immutable list we will study is known as a persistent data structure and is one of the powerful mechanisms underlying functional and logic programming.

Mutable Lists

Mutable lists are used for storing things in a collection that you intend to add items to and remove items from.

An Interface

We start with a very basic mutable list interface with just a few operations:

SimpleList.java
import java.util.function.Consumer;

public interface SimpleList {

    /**
     * Returns the number of elements in the list.
     */
    int size();

    /**
     * Inserts <code>item</code> as the new first item.
     */
    void addFirst(Object item);

    /**
     * Inserts <code>item</code> as the new last item.
     */
    void addLast(Object item);

    /**
     * Inserts <code>item</code> so that after insertion it is the item at the given
     * index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size].
     */
    void add(int index, Object item);

    /**
     * Removes the item at the given index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    void remove(int index);

    /**
     * Returns the item at the given index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    Object get(int index);

    /**
     * Replaces the item at the given index position with <code>item</code>.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    void set(int index, Object item);

    /**
     * Returns the index of the first item that is <code>".equals"</code> to the
     * given item.
     */
    int indexOf(Object item);

    /**
     * Applies <code>consumer</code> to each list element.
     */
    void forEach(Consumer<Object> consumer);
}

Representations

There are a few ways to represent mutable lists. We’ll look only at two: (1) A doubly-linked collection of nodes, and (2) a bounded array.

Linked Lists

Professional linked implementations are generally doubly linked and make use of a "header node" that contains no information. These choices greatly simply insertion and deletion since we don’t have to worry about nulls.

linked-list-pictures.png

There are no nulls in this implementation (except for one fleeting microsecond in the field declaration for the header, but you can probably figure out why that has to be):

SimpleLinkedList.java
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * A simple linked list class, implemented completely from scratch, using doubly
 * linked nodes and a cool "header" node to greatly simplify insertion and
 * deletions on the end of the list.
 */
public class SimpleLinkedList {

    /**
     * Doubly-linked node class, completely private to the List class, as clients
     * don't care about the implementation of the list. This is a regular class and
     * not a record because nodes are mutable.
     */
    private class Node {
        Object item;
        Node next;
        Node previous;

        Node(Object item, Node next, Node previous) {
            this.item = item;
            this.next = next;
            this.previous = previous;
        }
    }

    /**
     * The list itself maintains only a reference to its "header" node. The header
     * is a node that does not store any data. Its 'next' field points to the first
     * item in the list and its 'previous' field points to the last item. This makes
     * all insertions and deletions uniform, even at the beginning and the end of
     * the list!
     */
    private Node header = new Node(null, null, null);

    /**
     * The number of items in the list, stored to make size() O(1).
     */
    private int size = 0;

    public SimpleLinkedList() {
        // The empty list is made by pointing the header to itself.
        header.next = header.previous = header;
    }

    public int size() {
        // The size stored as a field for easy lookup
        return size;
    }

    public void addFirst(Object item) {
        // The first node is the one right after the header.
        addBefore(item, header.next);
    }

    public void addLast(Object item) {
        // The last node is the one right before the header.
        addBefore(item, header);
    }

    public void add(int index, Object item) {
        // This one is trickier, as there is a special case. If the index is equal
        // to the size, we are just adding the new element at the end. Otherwise
        // we use the special nodeAt() utility to point to the existing node at the
        // desired position. Adding before this node does the trick.
        addBefore(item, (index == size ? header : nodeAt(index)));
    }

    public void remove(int index) {
        // The utility methods do all the work here.
        remove(nodeAt(index));
    }

    public Object get(int index) {
        // Many thanks to the nodeAt() utility!
        return nodeAt(index).item;
    }

    public void set(int index, Object item) {
        // nodeAt() is sure cool, innit?
        nodeAt(index).item = item;
    }

    public int indexOf(Object item) {
        // Ah the old "linear search" pattern: a for-loop with a return in
        // the middle. Note that to iterate the actual nodes, we start at
        // header.next, since header does not hold a list item. The loop
        // terminates when we get back to the header.
        var index = 0;
        for (var node = header.next; node != header; node = node.next) {
            if (node.item.equals(item)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    public void forEach(Consumer<Object> consumer) {
        // Iterate from the first node (the one after the header).
        for (var node = header.next; node != header; node = node.next) {
            consumer.accept(node.item);
        }
    }

    /**
     * Get the node at a given index.
     */
    private Node nodeAt(int index) {
        // The only legal indexes are 0, 1, ... size-1.
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(index + " for size " + size);
        }
        // Only way to get by index in a simple linked list is to walk and
        // count. There are other kinds of lists that give quicker access by
        // position, but those structures are for another time.
        Node node = header;
        for (var i = 0; i <= index; i++) {
            node = node.next;
        }
        return node;
    }

    private void addBefore(Object item, Node node) {
        var newNode = new Node(item, node, node.previous);
        newNode.previous.next = newNode;
        newNode.next.previous = newNode;
        size++;
    }

    /**
     * Removes the referenced node by making its neighbors point around it.
     */
    private void remove(Node node) {
        // Just in case
        if (node == header) {
            throw new NoSuchElementException();
        }
        node.previous.next = node.next;
        node.next.previous = node.previous;
        size--;
    }
}
CLASSWORK
We will explore these implementations, in class, on a whiteboard.
Exercise: Make sure you understand how the circularity of the implementation, combined with the awesome header node, greatly simplifies the implementation.
Exercise: Fill in the blanks:
  • Adding at the end of the list is implemented by adding a new node before ________________.
  • Adding at the front of the list is implemented by adding a new node before ________________.
Also explain how you can quickly locate the first item of the list and the last item.

Array Lists

When using arrays, we can opt for fixed-size or “expandable” structures. Unlike stacks and queues, lists allow insertion “in the middle” which for arrays can have nasty performance implications as elements have to be “sifted.” Then again, with arrays, we can look up elements by position immediately, but with links we have to “search.”

TODO images

Here is a bounded array implementation:

TODO

The “expandable array” implementation is left to you.

Array Lists vs. Linked Lists

Pay attention people this shows up all the time in internship interviews.

OperationArrayListLinkedList
Lookup by PositionFast (always instant)Slow (unless small index)
Insert / DeleteSlow (unless at end)Fast (once at position)

Immutable Lists

If lists are immutable, there is a beautiful (elegant) singly-linked implementation. Before we get to it, we need to change our thinking a bit. Instead of saying a list is a sequence of elements, we’re going to say something pretty neat:

A list is either empty or is an item attached to a list.

Really? It’s true! Notice:

Exercise: Why does this definition work? Why is it not a “circular” definition (which is bad), but is instead of “recursive” definition (which is good)?

So empty lists and nodes at the two “flavors” of a list. In pictures, our lists look like this:

singly-linked-list.png

Time for an implementation! And what a great time it is, to introduce the sealed interface!

We’ll start small, with the only abstract methods being size, head, tail, at, and forEach. Two static methods will help with construction: of and from.

SimpleImmutableList.java
import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * A simple immutable linked list, implement completely from scratch as a
 * singly-linked structure. There aren't too many operations, as additional will
 * be left for homework.
 * 
 * The implementation uses a sum type because I have major null-phobia. Not
 * going to pay the billion-dollar mistake, not with this one.
 */
public sealed interface SimpleImmutableList permits EmptyList, ListNode {
    int size();

    Object head();

    SimpleImmutableList tail();

    static SimpleImmutableList of(Object... items) {
        SimpleImmutableList list = new EmptyList();
        for (var i = items.length - 1; i >= 0; i--) {
            list = new ListNode(items[i], list);
        }
        return list;
    }

    static SimpleImmutableList from(Object head, SimpleImmutableList tail) {
        return new ListNode(head, tail);
    }

    Object at(int index);

    void forEach(Consumer<Object> consumer);
}

final record EmptyList() implements SimpleImmutableList {
    public int size() {
        return 0;
    }

    public Object head() {
        throw new NoSuchElementException();
    }

    public SimpleImmutableList tail() {
        throw new NoSuchElementException();
    }

    public Object at(int index) {
        throw new NoSuchElementException();
    }

    public void forEach(Consumer<Object> consumer) {
        // Intentionally empty: nothing to iterate
    }
}

final record ListNode(
        Object head, SimpleImmutableList tail) implements SimpleImmutableList {
    public int size() {
        return 1 + tail.size();
    }

    public Object at(int index) {
        return index == 0 ? head : tail.at(index - 1);
    }

    public void forEach(Consumer<Object> consumer) {
        consumer.accept(head);
        tail.forEach(consumer);
    }
}
CLASSWORK
We will explore these implementations, in class, on a whiteboard.
Exercise: Add methods take, drop, reversed, shuffled, sorted, map, and filter. Do some research to see what other kinds of methods appear in a list data type, and implement as many as you can.

Unit Testing

This class is pretty large; we expect the test suite to be huge. Here is a partial list what we need to test:

Unit tests are shown in a homework assignment.

Summary

We’ve covered:

  • What lists are
  • List operations
  • Uses of lists
  • Mutable vs. immutable lists
  • Linked vs. array lists
  • Implementation techniques