Java Data Structures

You can study data structures in a language-independent way, or you can pick a language to help make the ideas concrete, and enable you to practice. Let’s see how Java does things.

What is a Data Structure?

A data type a set of values together with operations that specify how these values behave. A data structure is a collection of data values, arranged in a certain way, generally for the express purpose of implementing a data type.

Why do we have them? Answer: Different arrangements enable efficient access in different situations.

Knowledge of data structures is indispensible for programmers and computer scientists. You cannot write efficient code “at scale” without this knowledge. Your choice of data structure often has the most impact on the quality and efficiency of your code. It is difficult to overstate the importance of data structures.

Data Types vs. Data Structures

Sometimes, it’s pretty obvious what the data type is and what the data structure is. Data types (behavioral specifications) are things like sequences, stacks, queues, priority queues, sets, dictionaries, graphs, and so on. These are implemented with data structures (physical program entities) like singly-linked chains, doubly-linked chains, arrays, adjacency matrices, and so on.

In practice, things can get kind of messy, with heaps, hashtables, splay trees, tries, and R-trees feeling like they can be either.

But you know what? Even when the distinction is obvious, many folks conflate the terms “data type,” “data structure,” and “data structure implementation.” Sometimes, it’s not worth getting too picky. Following convention, these notes will be a little loose with the terminology at first. Apologies in advance.

Four Basic Java “Data Structures”

There are hundreds of data structures out there! But there are four basic kinds you should know about.

/4datastructures.png

Some details about these four:

KindWhen to useExamples
Array When the size is known in advance and will never grow or shrink. Access by position (index) is most important.
  • A top-ten list
  • The squares in a game of Go or Chess (an array of arrays!)
  • A program’s command line arguments
List When order is important, but the position (index number) is less important than being able to add and remove elements, even in the middle.
  • Things you want to do
  • Possible next moves in a game (ordered by how good they look)
  • Steps in a recipe
  • Search results (ranked by relevance)
Set When order is unimportant. You just have a bunch of elements. You only care whether something is in the set, not where in the set it is.
  • People in a room
  • Cards in a hand
  • Racers in a race (assuming no order at the starting line)
  • During a search, the places you have seen so far
Map When you want to store a particular value for each of a set of things. The things are called keys, and the entries of the map are called key-value pairs.
  • Word frequencies
  • Employees of the Month
  • How far each racer has traveled so far
  • Election results (number of votes for each candidate)
Data Types vs. Data Structures, Again

Technically lists, sets, and maps are data types, not data structures. A data type tells us what objects can do, while data structures tell us how exactly things are arranged, or stored. As we’ll soon see, there are multiple ways to arrange the elements of lists, sets, and maps. A preview:
TypePossible Structures
ListLinkedList, ArrayList
SetHashSet, TreeSet
MapHashMap, TreeMap

Arrays

Java arrays must have their element type and their size specified when they are created. They cannot grow and shrink. The elements are packed together tightly in memory, so it is very efficient to find values by their position, and very slow to find the position given the value (unless the array is sorted, in which case there is a fast way, namely binary search).

// Creating with default values (always, 0, false, or null)
var picks = new int[6];         // picks is [0, 0, 0, 0, 0, 0]
var dice = new Die[5];          // dice is [null, null, null, null, null]

// Creating with supplied values
var friends = new String[]{"Ratty", "Mole", "Badger"};

// Accessing and Updating
friends.length                  // 3
friends[0]                      // "Ratty"
friends[2]                      // "Badger"
friends[2] = "Otter"            // friends is now ["Ratty", "Mole", "Otter"]

// Filling
var rolls = new int[5];         // rolls is [0, 0, 0, 0, 0]
Arrays.fill(rolls, 6);          // rolls is now [6, 6, 6, 6, 6]

// Iterating the usual way
for (var r: rolls) {
    System.out.println(r);
}

// Iterating where you care about the index
for (var i = 0; i < rolls.length; i++) {
  System.out.println("Value at index " + i + " is " + rolls[i]);
}

// Finding (slow)
var words = new String[]{"the", "rat", "ate", "the", "strawberry"};
words.indexOf("rat")                    // 1
words.indexOf("the")                    // 0
words.lastIndexOf("the")                // 3
words.indexOf("banana")                 // -1

// Sorting (always mutates)
Arrays.sort(words)
        // words is now ["ate", "rat", "strawberry", "the", "the"]
Arrays.sort(words, Comparator.reverseOrder())
        // words is now ["the", "the", "strawberry", "rat", "ate"]

// Comparing for equality
var a = new int[]{10, 300, 99, -987}
var b = new int[]{10, 300, 99, -987}
a == b                                  // false
Arrays.equals(a, b)                     // true

int[][] matrix1 = {{5, 13}, {8, 2}}
int[][] matrix2 = {{5, 13}, {8, 2}}
Arrays.equals(matrix1, matrix2)         // false
Arrays.deepEquals(matrix1, matrix2)     // true

// Printing
var a = new int[]{10, 300, 99, -987}
System.out.println(a)                   // prints [I@41cf53f9
System.out.println(Arrays.toString(a))  // prints [10, 300, 99, -987]

// Splitting and Joining
var s = "alice:3:root:f87bZ%s-iq2#"
s.split(":")                            // ["alice", "3", "root", "f87bZ%s-iq2#"]
var b = new String[]{"Ooh", "la", "la"}
String.join("-", b)                     // "Ooh-la-la"

// Mapping and Filtering
var a = new int[]{10, 300, 99, -987}
Arrays.stream(a).map(x -> x * 2).toArray()       // [20, 600, 198, -1974]
Arrays.stream(a).filter(x -> x < 100).toArray()  // [10, 99, -987]

// Summing
Arrays.stream(a).sum()                  // -578
AN IMPORTANT WEIRD THING ABOUT JAVA YOU HAVE TO KNOW

Arrays in Java can contain elements of any type at all, primitive or object. Arrays can contain bytes, shorts, ints, longs, floats, doubles, booleans, and chars. No problem. However, the other collections (Lists, Sets, Maps, etc.) CANNOT CONTAIN PRIMITIVES.

Wat. Right?

So if you need Lists or Sets or Maps, and you want to store primitives, you can’t. You can make use of special objects that “wrap” primitives: these classes are Byte, Short, Integer, Long, Float, Double, Boolean, and Character.

Lists

It turns out there is not just one kind of list in Java. There are many kinds of lists. They can be:

// Create fixed-size, read-only (immutable) lists
List.of(1, 13, 8)               // [1, 13, 8]
List.of()                       // []
Collections.nCopies(5, 3)       // [3, 3, 3, 3, 3]
Collections.singletonList(5)    // [5]

// Create editable lists that can grow and shrink
var a = new ArrayList<Integer>();
var b = new LinkedList<String>();

// Adding to end
a.add(2);                   // a is now [2]
a.addAll(List.of(89, 3));   // a is now [2, 89, 3]

// Inserting at a given position
a.add(1, 144);              // a is now [2, 144, 89, 3]

// Find by position (slow)
a.get(2);                   // 89

// Replace at position
a.set(2, 5);                // a is now [2, 144, 5, 3]

// Replace everything
a.replaceAll(x -> x + 1);   // a is now [3, 145, 6, 4]

// Removing
a.remove(2);                // a is now [3, 145, 4]
a.size();                   // 3
a.removeIf(x -> x > 9)      // a is now [3, 4]
a.clear();                  // a is now []
a.isEmpty();                // true
a.size();                   // 0

// Slicing
var a = List.of(10, 11, 12, 13, 14, 15)
a.subList(1, 5)             // [11, 12, 13, 14]

// Iterating
for (var e : a) {
  // Do something with e
}

// Printing
System.out.println(a);

// Comparing for equality
var a = List.of(2, 2, 5);
var b = Arrays.asList(2, 2, 5);
a == b                      // false
a.equals(b)                 // true

// Mapping and Filtering
a.stream().map(x -> x * 10).collect(Collectors.toList())
                            // [20, 20, 50]
a.stream().filter(x -> x % 2 == 0).collect(Collectors.toList())
                            // [2, 2]

Sets

There are two main types of sets:

If you use Set.of or Set.copyOf to create a set, you won’t know what kind of set you got, and the elements are not guaranteed to be iterated in any specific order.

// Create fixed-size, read-only (immutable) sets
Set.of(1, 13, 8)        // {1, 13, 8}
Set.of()                // {}

// Create editable sets that can grow and shrink
var s = new HashSet<Integer>();
var t = new TreeSet<Integer>();

// Adding
s.add(3)                // s is {3}
s.add(5)                // s is {3, 5}
t.add(5)                // t is {5}
t.add(8)                // t is {5, 8}
t.add(13)               // t is {5, 8, 13}
s.addAll(t)             // s is {3, 5, 8, 13} (set "union")
s.size()                // 4

// Removing
s.remove(5)             // s is {3, 8, 13}
s.remove(55)            // s is still {3, 8, 13} no harm no foul
s.removeAll(t)          // s is {3} (set "difference")
s.size()                // 1
s.isEmpty()             // false
s.clear()               // s is {}
s.isEmpty()             // true

// Membership test
t                       // {5, 8, 13}
t.contains(8)           // true
t.contains(21)          // false

// Iterating
for (var e: t) {
  // Do something with e
}

// Printing
System.out.println(t)   // prints [5, 8, 13]
Exercise: Java sets don’t seem to have non-mutating union, intersection, or different methods. How would you carry this out in a non-mutating fashion (e.g., produce the union of s and t as a brand new set, without modifying s and t)?

Maps

There are two main types of maps:

If you use Map.of, Map.copyOf, or Map.ofEntries to create a map, you won’t know what kind of map you got, but the keys are not guaranteed to be iterated in any specific order.

// Create fixed-size, read-only (immutable) maps
Map.of("Spades", "♤", "Hearts", "♡", "Diamonds", "♢", "Clubs", "♧")
Map.of()

// Create editable maps that can grow and shrink
var m = new HashMap<String, Integer>();
var m2 = new TreeMap<String, Integer>();

// Adding and/or updating
m.put("Alice", 83);          // If Alice is not already a key, add [Alice, 83]
                             // ...but if Alice is there, then update value to 83
m.putIfAbsent("Bob", 29);    // If Bob is not already a key, add [Bob, 29]
                             // ...but if Bob is there, do nothing
m2.put("Carolyne", 70);
m2.put("Bob", 55);
m.putAll(m2)                 // m is now {Carloyne=70, Bob=55, Alice=83}
m.merge("Alice", 1, Integer::sum)
                             // m is now {Carloyne=70, Bob=55, Alice=84}

// Removing
m.remove("Carolyne");        // m is now {Bob=55, Alice=83}

// Lookup by key
m.get("Alice")               // Get corresponding value for Alice if Alice is present,
                             // ...otherwise return null
m.getOrDefault("Bob", 0)     // Get corresponding value for Alice if Alice is present,
                             // ...otherwise return 0

// Iterating
for (var e : m.entrySet()) {
    // Do something with m.getKey() and m.getValue()
}

// Printing
System.out.println(m);       // Prints {Bob=55, Alice=83}

That merge is pretty crazy, and useful! Check out the documentation to see exactly what it does.

More Data Structures

When studying data structures, the differences between types and structures becomes a little more clear. You’ll see types such as:

And data structures such as:

In general terms, not specific to Java or any programming language at all, the major kinds of collections, from the arrangement and uniqueness perspectives, are more or less these:

TypeDescription
SetA group of unordered objects without duplicates
Bag (Multiset)A group of unordered objects allowing duplicates
List (Sequence)A sequence of objects, with a distinguished first object, and in which each object has a unique successor. Various kinds of restricted sequences can be defined, such as stacks, queues, priority queues and deques
Map (Dictionary)Essentially a set of (key,value) pairs with unique keys, supporting an efficient lookup operation to find the value associated with a given key
MultimapEssentially a bag of pairs
Hierarchy (Oriented Tree)A group of objects in which every object except a single distinguished root node has a single parent. You can distinguish between ordered and unordered hierarchies by considering the children of an object to be lists or sets, respectively
Graph (Network)A group of objects in which each object is “related to,” somehow or other, to zero or more members of the group
Exercise: In what sense are sets, lists, and trees actually all just graphs?

The Big Picture: Java Collections

Java has its own philosophy about how you program with collections. This philosophy, with lots of discussion and examples is well documented in the Java Collections trail of the Official Java Tutorial.

Please go through that tutorial trail.

Exercise: Seriously, go through that tutorial trail.