Lists

Lists. You’ve probably heard of them. You probably use them in your daily life. Did you know there is a mathematical theory of them? And that they are one of the most useful structures in programming? Okay sure you have, but they are still fun to study!

The Basics

A list is a positionally-ordered, completely unrestricted sequence. You can add, update, or remove items at any position. You can lookup an item by position, or find the index at which a given value appears.

There are tons of list operations. Here is a small sampling:

Operation	Description
`add(x)`	add item $x$ at the end
`addFirst(x)`	add item $x$ at the beginning
`addLast(x)`	add item $x$ at the end
`addAt(x, i)`	add item $x$ so it is at position $i$ after the add.
`remove(x)`	remove item $x$
`removeAt(i)`	remove the item at position $i$
`updateAt(i, x)`	replace the item at position $i$ with $x$
`getAt(i)`	return the element at position $i$
`size()`	return the number of items
`isEmpty()`	return whether this list has no items
`contains(x)`	return whether $x$ appears
`indexOf(x)`	return the index of the first occurrence of $x$
`lastIndexOf(x)`	return the index of the last occurrence of $x$
`indexOf(x, n)`	return the index of the $n$th occurrence of $x$
`reverse()`	reverse this list
`shuffle()`	permute the elements of this list
`sort()`	sort the items according to their natural order
`sort(c)`	sort the items using comparator $c$
`subList(i,j)`	the sublist from index $i$ inclusive to $j$ exclusive
`take(n)`	same as `subList(0, n)`
`drop(n)`	same as `subList(n, size())`
`head()`	same as `subList(0, 1)`
`tail()`	same as `subList(1, size())`
`map(f)`	apply function $f$ to each element
`filter(p)`	the elements for which predicate $p$ holds
`append(list)`	The list made by appending another list to this list

Many of these have mutable and immutable variations. If you are building an immutable list, some of these methods will return a new list which is just like the original except with the differences specified by the operation. For a mutable list, some of these methods will directly modify the list itself.

Exercise: Think up five more.

Lists can be used to model:

The slides in a presentation
A playlist in a music playing application
The tasks you need to get done
Search results (there is always some order here)
Your top-ten favorite movies / songs / desserts / vacation spots

among other things.

Building Lists From Scratch

Yes, it’s true: there's already a List interface in the Java Core API and a few implementing classes (including ArrayList and LinkedList), but we're going to write lists from scratch because:

This is the best way for you to really learn how lists work behind the scenes
You need coding practice, especially with linked structures
Job interviews sometimes involve implementing lists on a whiteboard
You need to know how to code things up when you find yourself in a very restrictive environment that doesn't have a collections library.

Interestingly, the difference between mutable lists and immutable lists is so great, there are separate interfaces for each.

Mutable vs. Immutable Lists
The flavor of immutable list we will study is known as a persistent data structure and is one of the powerful mechanisms underlying functional and logic programming.

Mutable Lists

Mutable lists are used for storing things in a collection that you intend to add items to and remove items from.

An Interface

We start with a very basic mutable list interface with just a few operations:

Size
Add and Remove
Read and Write element at position
Find
Iterate

SimpleList.java

import java.util.function.Consumer;

public interface SimpleList {

    /**
     * Returns the number of elements in the list.
     */
    int size();

    /**
     * Inserts <code>item</code> as the new first item.
     */
    void addFirst(Object item);

    /**
     * Inserts <code>item</code> as the new last item.
     */
    void addLast(Object item);

    /**
     * Inserts <code>item</code> so that after insertion it is the item at the given
     * index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size].
     */
    void add(int index, Object item);

    /**
     * Removes the item at the given index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    void remove(int index);

    /**
     * Returns the item at the given index position.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    Object get(int index);

    /**
     * Replaces the item at the given index position with <code>item</code>.
     *
     * @throws IndexOutOfBoundsException if index not in [0,size).
     */
    void set(int index, Object item);

    /**
     * Returns the index of the first item that is <code>".equals"</code> to the
     * given item.
     */
    int indexOf(Object item);

    /**
     * Applies <code>consumer</code> to each list element.
     */
    void forEach(Consumer<Object> consumer);
}

Representations

There are a few ways to represent mutable lists. We’ll look only at two: (1) A doubly-linked collection of nodes, and (2) a bounded array.

Linked Lists

Professional linked implementations are generally doubly linked and make use of a "header node" that contains no information. These choices greatly simply insertion and deletion since we don’t have to worry about nulls.

There are no nulls in this implementation (except for one fleeting microsecond in the field declaration for the header, but you can probably figure out why that has to be):

SimpleLinkedList.java

import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * A simple linked list class, implemented completely from scratch, using doubly
 * linked nodes and a cool "header" node to greatly simplify insertion and
 * deletions on the end of the list.
 */
public class SimpleLinkedList {

    /**
     * Doubly-linked node class, completely private to the List class, as clients
     * don't care about the implementation of the list. This is a regular class and
     * not a record because nodes are mutable.
     */
    private class Node {
        Object item;
        Node next;
        Node previous;

        Node(Object item, Node next, Node previous) {
            this.item = item;
            this.next = next;
            this.previous = previous;
        }
    }

    /**
     * The list itself maintains only a reference to its "header" node. The header
     * is a node that does not store any data. Its 'next' field points to the first
     * item in the list and its 'previous' field points to the last item. This makes
     * all insertions and deletions uniform, even at the beginning and the end of
     * the list!
     */
    private Node header = new Node(null, null, null);

    /**
     * The number of items in the list, stored to make size() O(1).
     */
    private int size = 0;

    public SimpleLinkedList() {
        // The empty list is made by pointing the header to itself.
        header.next = header.previous = header;
    }

    public int size() {
        // The size stored as a field for easy lookup
        return size;
    }

    public void addFirst(Object item) {
        // The first node is the one right after the header.
        addBefore(item, header.next);
    }

    public void addLast(Object item) {
        // The last node is the one right before the header.
        addBefore(item, header);
    }

    public void add(int index, Object item) {
        // This one is trickier, as there is a special case. If the index is equal
        // to the size, we are just adding the new element at the end. Otherwise
        // we use the special nodeAt() utility to point to the existing node at the
        // desired position. Adding before this node does the trick.
        addBefore(item, (index == size ? header : nodeAt(index)));
    }

    public void remove(int index) {
        // The utility methods do all the work here.
        remove(nodeAt(index));
    }

    public Object get(int index) {
        // Many thanks to the nodeAt() utility!
        return nodeAt(index).item;
    }

    public void set(int index, Object item) {
        // nodeAt() is sure cool, innit?
        nodeAt(index).item = item;
    }

    public int indexOf(Object item) {
        // Ah the old "linear search" pattern: a for-loop with a return in
        // the middle. Note that to iterate the actual nodes, we start at
        // header.next, since header does not hold a list item. The loop
        // terminates when we get back to the header.
        var index = 0;
        for (var node = header.next; node != header; node = node.next) {
            if (node.item.equals(item)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    public void forEach(Consumer<Object> consumer) {
        // Iterate from the first node (the one after the header).
        for (var node = header.next; node != header; node = node.next) {
            consumer.accept(node.item);
        }
    }

    /**
     * Get the node at a given index.
     */
    private Node nodeAt(int index) {
        // The only legal indexes are 0, 1, ... size-1.
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(index + " for size " + size);
        }
        // Only way to get by index in a simple linked list is to walk and
        // count. There are other kinds of lists that give quicker access by
        // position, but those structures are for another time.
        Node node = header;
        for (var i = 0; i <= index; i++) {
            node = node.next;
        }
        return node;
    }

    private void addBefore(Object item, Node node) {
        var newNode = new Node(item, node, node.previous);
        newNode.previous.next = newNode;
        newNode.next.previous = newNode;
        size++;
    }

    /**
     * Removes the referenced node by making its neighbors point around it.
     */
    private void remove(Node node) {
        // Just in case
        if (node == header) {
            throw new NoSuchElementException();
        }
        node.previous.next = node.next;
        node.next.previous = node.previous;
        size--;
    }
}

CLASSWORK

We will explore these implementations, in class, on a whiteboard.

Exercise: Make sure you understand how the circularity of the implementation, combined with the awesome header node, greatly simplifies the implementation.

Exercise: Fill in the blanks:

Adding at the end of the list is implemented by adding a new node before ________________.
Adding at the front of the list is implemented by adding a new node before ________________.

Also explain how you can quickly locate the first item of the list and the last item.

Array Lists

When using arrays, we can opt for fixed-size or “expandable” structures. Unlike stacks and queues, lists allow insertion “in the middle” which for arrays can have nasty performance implications as elements have to be “sifted.” Then again, with arrays, we can look up elements by position immediately, but with links we have to “search.”

TODO images

Here is a bounded array implementation:

TODO

The “expandable array” implementation is left to you.

Array Lists vs. Linked Lists
Pay attention people this shows up all the time in internship interviews.

Operation ArrayList LinkedList

Lookup by Position Fast (always instant) Slow (unless small index)

Insert / Delete Slow (unless at end) Fast (once at position)

Operation	ArrayList	LinkedList
Lookup by Position	Fast (always instant)	Slow (unless small index)
Insert / Delete	Slow (unless at end)	Fast (once at position)

Immutable Lists

If lists are immutable, there is a beautiful (elegant) singly-linked implementation. Before we get to it, we need to change our thinking a bit. Instead of saying a list is a sequence of elements, we’re going to say something pretty neat:

A list is either empty or is an item attached to a list.

Really? It’s true! Notice:

The empty list, [], is a list.
Therefore, 8 :: [] is a list.
Therefore, 2 :: 8 :: [] is a list.
Therefore, 5 :: 2 :: 8 :: [] is a list.
Therefore, 1 :: 5 :: 2 :: 8 :: [] is a list.
and so on

Exercise: Why does this definition work? Why is it not a “circular” definition (which is bad), but is instead of “recursive” definition (which is good)?

So empty lists and nodes at the two “flavors” of a list. In pictures, our lists look like this:

Time for an implementation! And what a great time it is, to introduce the sealed interface!

We’ll start small, with the only abstract methods being size, head, tail, at, and forEach. Two static methods will help with construction: of and from.

SimpleImmutableList.java

import java.util.NoSuchElementException;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * A simple immutable linked list, implement completely from scratch as a
 * singly-linked structure. There aren't too many operations, as additional will
 * be left for homework.
 * 
 * The implementation uses a sum type because I have major null-phobia. Not
 * going to pay the billion-dollar mistake, not with this one.
 */
public sealed interface SimpleImmutableList permits EmptyList, ListNode {
    int size();

    Object head();

    SimpleImmutableList tail();

    static SimpleImmutableList of(Object... items) {
        SimpleImmutableList list = new EmptyList();
        for (var i = items.length - 1; i >= 0; i--) {
            list = new ListNode(items[i], list);
        }
        return list;
    }

    static SimpleImmutableList from(Object head, SimpleImmutableList tail) {
        return new ListNode(head, tail);
    }

    Object at(int index);

    void forEach(Consumer<Object> consumer);
}

final record EmptyList() implements SimpleImmutableList {
    public int size() {
        return 0;
    }

    public Object head() {
        throw new NoSuchElementException();
    }

    public SimpleImmutableList tail() {
        throw new NoSuchElementException();
    }

    public Object at(int index) {
        throw new NoSuchElementException();
    }

    public void forEach(Consumer<Object> consumer) {
        // Intentionally empty: nothing to iterate
    }
}

final record ListNode(
        Object head, SimpleImmutableList tail) implements SimpleImmutableList {
    public int size() {
        return 1 + tail.size();
    }

    public Object at(int index) {
        return index == 0 ? head : tail.at(index - 1);
    }

    public void forEach(Consumer<Object> consumer) {
        consumer.accept(head);
        tail.forEach(consumer);
    }
}

CLASSWORK

We will explore these implementations, in class, on a whiteboard.

Exercise: Add methods take, drop, reversed, shuffled, sorted, map, and filter. Do some research to see what other kinds of methods appear in a list data type, and implement as many as you can.

Unit Testing

This class is pretty large; we expect the test suite to be huge. Here is a partial list what we need to test:

A list is empty on construction.
A list has size 0 on construction.
After $n$ adds to an empty list ($n > 0$), the list is not empty and its size is $n$.
add(i, x) followed by get(i) returns $x$, where $$ is legal.
If one calls addLast() 100 times in succession, with the values 0 through 99, into an empty list, then calling get(i) for all $i \in 0..99$ returns $i$ each time. And size() will return 100.
Alternating addFirst and addLast calls starting at an empty list, for values 1..10, will give the list [9 7 5 3 1 2 4 6 8 10].
If the size is $n$, then after calling removeAt(0) n times, the list is empty and has size 0.
Removing with bad indices does indeed throw; these tests should be run for arbitrary negative indices, and where the index is equal to size, and size+1 and very large out of range positive indices too.
Adding and setting with bad indices needs to fail properly
The get method does not affect the size
set(i, x) followed by get(i) returns $x$, where $i$ is legal.
indexOf anything on an empty list is -1.
indexOf("a") on ["b", "a", "a"] returns 1.
lastIndexOf("a") on ["b", "a", "a"] returns 2.
...and many more ...

Unit tests are shown in a homework assignment.

Summary

We’ve covered:

What lists are
List operations
Uses of lists
Mutable vs. immutable lists
Linked vs. array lists
Implementation techniques