A data type a set of values together with operations that specify how these values behave. A data structure is a collection of data values, arranged in a certain way, generally for the express purpose of implementing a data type.
Why do we have them? Answer: Different arrangements enable efficient access in different situations.
Knowledge of data structures is indispensible for programmers and computer scientists. You cannot write efficient code “at scale” without this knowledge. Your choice of data structure often has the most impact on the quality and efficiency of your code. It is difficult to overstate the importance of data structures.
Data Types vs. Data StructuresSometimes, it’s pretty obvious what the data type is and what the data structure is. Data types (behavioral specifications) are things like sequences, stacks, queues, priority queues, sets, dictionaries, graphs, and so on. These are implemented with data structures (physical program entities) like singly-linked chains, doubly-linked chains, arrays, adjacency matrices, and so on.
In practice, things can get kind of messy, with heaps, hashtables, splay trees, tries, and R-trees feeling like they can be either.
But you know what? Even when the distinction is obvious, many folks conflate the terms “data type,” “data structure,” and “data structure implementation.” Sometimes, it’s not worth getting too picky. Following convention, these notes will be a little loose with the terminology at first. Apologies in advance.
There are hundreds of data structures out there! But there are four basic kinds you should know about.
Some details about these four:
Kind | When to use | Examples |
---|---|---|
Array | When the size is known in advance and will never grow or shrink. Access by position (index) is most important. |
|
List | When order is important, but the position (index number) is less important than being able to add and remove elements, even in the middle. |
|
Set | When order is unimportant. You just have a bunch of elements. You only care whether something is in the set, not where in the set it is. |
|
Map | When you want to store a particular value for each of a set of things. The things are called keys, and the entries of the map are called key-value pairs. |
|
Data Types vs. Data Structures, Again
Technically lists, sets, and maps are data types, not data structures. A data type tells us what objects can do, while data structures tell us how exactly things are arranged, or stored. As we’ll soon see, there are multiple ways to arrange the elements of lists, sets, and maps. A preview:
Type Possible Structures List LinkedList, ArrayList Set HashSet, TreeSet Map HashMap, TreeMap
Java arrays must have their element type and their size specified when they are created. They cannot grow and shrink. The elements are packed together tightly in memory, so it is very efficient to find values by their position, and very slow to find the position given the value (unless the array is sorted, in which case there is a fast way, namely binary search).
// Creating with default values (always, 0, false, or null) var picks = new int[6]; // picks is [0, 0, 0, 0, 0, 0] var dice = new Die[5]; // dice is [null, null, null, null, null] // Creating with supplied values var friends = new String[]{"Ratty", "Mole", "Badger"}; // Accessing and Updating friends.length // 3 friends[0] // "Ratty" friends[2] // "Badger" friends[2] = "Otter" // friends is now ["Ratty", "Mole", "Otter"] // Filling var rolls = new int[5]; // rolls is [0, 0, 0, 0, 0] Arrays.fill(rolls, 6); // rolls is now [6, 6, 6, 6, 6] // Iterating the usual way for (var r: rolls) { System.out.println(r); } // Iterating where you care about the index for (var i = 0; i < rolls.length; i++) { System.out.println("Value at index " + i + " is " + rolls[i]); } // Finding (slow) var words = new String[]{"the", "rat", "ate", "the", "strawberry"}; words.indexOf("rat") // 1 words.indexOf("the") // 0 words.lastIndexOf("the") // 3 words.indexOf("banana") // -1 // Sorting (always mutates) Arrays.sort(words) // words is now ["ate", "rat", "strawberry", "the", "the"] Arrays.sort(words, Comparator.reverseOrder()) // words is now ["the", "the", "strawberry", "rat", "ate"] // Comparing for equality var a = new int[]{10, 300, 99, -987} var b = new int[]{10, 300, 99, -987} a == b // false Arrays.equals(a, b) // true int[][] matrix1 = {{5, 13}, {8, 2}} int[][] matrix2 = {{5, 13}, {8, 2}} Arrays.equals(matrix1, matrix2) // false Arrays.deepEquals(matrix1, matrix2) // true // Printing var a = new int[]{10, 300, 99, -987} System.out.println(a) // prints [I@41cf53f9 System.out.println(Arrays.toString(a)) // prints [10, 300, 99, -987] // Splitting and Joining var s = "alice:3:root:f87bZ%s-iq2#" s.split(":") // ["alice", "3", "root", "f87bZ%s-iq2#"] var b = new String[]{"Ooh", "la", "la"} String.join("-", b) // "Ooh-la-la" // Mapping and Filtering var a = new int[]{10, 300, 99, -987} Arrays.stream(a).map(x -> x * 2).toArray() // [20, 600, 198, -1974] Arrays.stream(a).filter(x -> x < 100).toArray() // [10, 99, -987] // Summing Arrays.stream(a).sum() // -578
AN IMPORTANT WEIRD THING ABOUT JAVA YOU HAVE TO KNOW
Arrays in Java can contain elements of any type at all, primitive or object. Arrays can contain bytes, shorts, ints, longs, floats, doubles, booleans, and chars. No problem. However, the other collections (Lists, Sets, Maps, etc.) CANNOT CONTAIN PRIMITIVES.
Wat. Right?
So if you need Lists or Sets or Maps, and you want to store primitives, you can’t. You can make use of special objects that “wrap” primitives: these classes areByte
,Short
,Integer
,Long
,Float
,Double
,Boolean
, andCharacter
.
It turns out there is not just one kind of list in Java. There are many kinds of lists. They can be:
// Create fixed-size, read-only (immutable) lists List.of(1, 13, 8) // [1, 13, 8] List.of() // [] Collections.nCopies(5, 3) // [3, 3, 3, 3, 3] Collections.singletonList(5) // [5] // Create editable lists that can grow and shrink var a = new ArrayList<Integer>(); var b = new LinkedList<String>(); // Adding to end a.add(2); // a is now [2] a.addAll(List.of(89, 3)); // a is now [2, 89, 3] // Inserting at a given position a.add(1, 144); // a is now [2, 144, 89, 3] // Find by position (slow) a.get(2); // 89 // Replace at position a.set(2, 5); // a is now [2, 144, 5, 3] // Replace everything a.replaceAll(x -> x + 1); // a is now [3, 145, 6, 4] // Removing a.remove(2); // a is now [3, 145, 4] a.size(); // 3 a.removeIf(x -> x > 9) // a is now [3, 4] a.clear(); // a is now [] a.isEmpty(); // true a.size(); // 0 // Slicing var a = List.of(10, 11, 12, 13, 14, 15) a.subList(1, 5) // [11, 12, 13, 14] // Iterating for (var e : a) { // Do something with e } // Printing System.out.println(a); // Comparing for equality var a = List.of(2, 2, 5); var b = Arrays.asList(2, 2, 5); a == b // false a.equals(b) // true // Mapping and Filtering a.stream().map(x -> x * 10).collect(Collectors.toList()) // [20, 20, 50] a.stream().filter(x -> x % 2 == 0).collect(Collectors.toList()) // [2, 2]
There are two main types of sets:
If you use Set.of
or Set.copyOf
to create a set, you won’t know what kind of set you got, and the elements are not guaranteed to be iterated in any specific order.
// Create fixed-size, read-only (immutable) sets Set.of(1, 13, 8) // {1, 13, 8} Set.of() // {} // Create editable sets that can grow and shrink var s = new HashSet<Integer>(); var t = new TreeSet<Integer>(); // Adding s.add(3) // s is {3} s.add(5) // s is {3, 5} t.add(5) // t is {5} t.add(8) // t is {5, 8} t.add(13) // t is {5, 8, 13} s.addAll(t) // s is {3, 5, 8, 13} (set "union") s.size() // 4 // Removing s.remove(5) // s is {3, 8, 13} s.remove(55) // s is still {3, 8, 13} no harm no foul s.removeAll(t) // s is {3} (set "difference") s.size() // 1 s.isEmpty() // false s.clear() // s is {} s.isEmpty() // true // Membership test t // {5, 8, 13} t.contains(8) // true t.contains(21) // false // Iterating for (var e: t) { // Do something with e } // Printing System.out.println(t) // prints [5, 8, 13]
There are two main types of maps:
If you use Map.of
, Map.copyOf
, or Map.ofEntries
to create a map, you won’t know what kind of map you got, but the keys are not guaranteed to be iterated in any specific order.
// Create fixed-size, read-only (immutable) maps Map.of("Spades", "♤", "Hearts", "♡", "Diamonds", "♢", "Clubs", "♧") Map.of() // Create editable maps that can grow and shrink var m = new HashMap<String, Integer>(); var m2 = new TreeMap<String, Integer>(); // Adding and/or updating m.put("Alice", 83); // If Alice is not already a key, add [Alice, 83] // ...but if Alice is there, then update value to 83 m.putIfAbsent("Bob", 29); // If Bob is not already a key, add [Bob, 29] // ...but if Bob is there, do nothing m2.put("Carolyne", 70); m2.put("Bob", 55); m.putAll(m2) // m is now {Carloyne=70, Bob=55, Alice=83} m.merge("Alice", 1, Integer::sum) // m is now {Carloyne=70, Bob=55, Alice=84} // Removing m.remove("Carolyne"); // m is now {Bob=55, Alice=83} // Lookup by key m.get("Alice") // Get corresponding value for Alice if Alice is present, // ...otherwise return null m.getOrDefault("Bob", 0) // Get corresponding value for Alice if Alice is present, // ...otherwise return 0 // Iterating for (var e : m.entrySet()) { // Do something with m.getKey() and m.getValue() } // Printing System.out.println(m); // Prints {Bob=55, Alice=83}
That merge
is pretty crazy, and useful! Check out the documentation to see exactly what it does.
When studying data structures, the differences between types and structures becomes a little more clear. You’ll see types such as:
And data structures such as:
In general terms, not specific to Java or any programming language at all, the major kinds of collections, from the arrangement and uniqueness perspectives, are more or less these:
Type | Description |
---|---|
Set | A group of unordered objects without duplicates |
Bag (Multiset) | A group of unordered objects allowing duplicates |
List (Sequence) | A sequence of objects, with a distinguished first object, and in which each object has a unique successor. Various kinds of restricted sequences can be defined, such as stacks, queues, priority queues and deques |
Map (Dictionary) | Essentially a set of (key,value) pairs with unique keys, supporting an efficient lookup operation to find the value associated with a given key |
Multimap | Essentially a bag of pairs |
Hierarchy (Oriented Tree) | A group of objects in which every object except a single distinguished root node has a single parent. You can distinguish between ordered and unordered hierarchies by considering the children of an object to be lists or sets, respectively |
Graph (Network) | A group of objects in which each object is “related to,” somehow or other, to zero or more members of the group |
Java has its own philosophy about how you program with collections. This philosophy, with lots of discussion and examples is well documented in the Java Collections trail of the Official Java Tutorial.
Please go through that tutorial trail.