Sets and Dictionaries
These are important.
Sets
A set is an unordered collection of (unique) items. Examples:
Operations
Here’s a bunch, including mutators and non-mutators:
- s.add(e)
- Add element e to the set s
- s.remove(e)
- Remove element e from the set s
- s.contains(e)
- Return whether e is a member of s
- s.size()
- Return the number of elements in s
- s.isEmpty()
- Return whether there are no elements in s
- s.addAll(t)
- Add all the elements of set t to set s
- s.removeAll(t)
- Remove all the elements of t from s
- s.retainAll(t)
- Remove all the elements from s that are not in t
- s.union(t)
- Return a new set which contains all the elements of
set s and all the elements of set t, and no others
- s.intersect(t)
- Return a new set which contains all and only those
elements in both s and t
- s.minus(t)
- Return a new set which contains all and only those
elements in s but not in t
- s.symmetricMinus(t)
- Return a new set which contains all and only those
elements in either set s or set t but not both
Dictionaries
A dictionary (also called a map, associative array, hash, or lookup table) is a collection of key-value pairs where every key is unique. We don’t care as much about the pairs as we do about looking up the value associated with a given key.
Example:
| |
www.nato.int | 152.152.96.1 |
cr.yp.to | 131.193.178.175 |
losangeles.citysearch.com | 209.104.35.15 |
www.cl.cam.ac.uk | 128.232.0.20 |
mit.edu | 18.7.22.69 |
www.altan.ie | 212.108.64.74 |
securitysearch.net | 64.94.136.5 |
linuxassembly.org | 64.202.189.170 |
regular-expressions.info | 66.39.67.31 |
jpl.nasa.gov | 137.78.160.180 |
groups.google.com | 216.239.57.147 |
script.aculo.us | 86.59.5.67 |
Say: www.nato.int maps to 152.152.96.1, or
www.nato.int is bound to 152.152.96.1.
Operations
Here are some sample operations, both mutators and non-mutators:
- m.put(k,v)
- Associate k with v (will either add a new entry or replace the existing entry for k)
- m.putIfAbsent(k,v)
- If there is no entry with key k, add a new entry associating k with v
- m.replace(k,v)
- If there an entry with key k, replace its value with v
- m.merge(k,v,f)
- If there an entry with key k and value oldv, replace its value with f(oldv, v)
- m.get(k)
- Return the value associated with key k (failure can handled in many different ways)
- m.getOrDefault(k)
- Return the value associated with key k, or return d if key k is not present
- m.containsKey(k)
- Return whether there’s a key-value pair in map m with key k
- m.containsValue(v)
- Return whether there’s a key-value pair in map m with value v
- m.putAll(otherMap)
- Add all the key-value pairs of otherMap to m
- m.keySet()
- Return the set of keys of m
- m.values()
- Return the values of m in some collection (not a set, since they
don’t have to be unique)
- m.entrySet()
- Return the set of key-value pairs in m
- m.remove(k)
- Remove the key-value pair with key k from map m
- m.clear()
- Remove all the key-value pairs from m
- m.size()
- Return the number of key-value pairs in m
- m.isEmpty()
- Return whether there are zero pairs in map m
Sets vs. Dictionaries
Note that a set is just a kind of dictionary in which the value part is ignored.
Representations
There are dozens of ways to implement these.
- Association Lists
- Unsorted association lists
- Sorted association lists
- Search Trees
- General Search Trees
- (a,b) Trees
- B Trees and B+ Trees
- Binary Search Trees
- Unbalanced Binary Search Trees
- Self-Balancing Binary Search Trees
- AVL Trees
- Scapegoat Trees
- Red-Black Trees
- AA Trees
- Splay Trees
- Cartesian Trees
- Skip Lists
- Tries and Radix (Patricia) Trees
- Hash Tables
- Bitsets
- Judy Arrays
- Bloom Filters
No one representation is best for all situations. You need to take into account:
- Will the collection be loaded once and only read and
never written?
- Will insertions and deletions be frequent or uncommon,
compared to lookups?
- Will the collection contain only a very small number
of elements or can it be huge?
- Are there any restrictions on the keys? For example are
keys only strings? Or only integers?
- Will the keys come from an type that can be ordered?
- Is the collection subject to algorithmic complexity
attacks?
It is worth noting:
If your keys are integers in the range 0..N, use a PLAIN OLD ARRAY
As you go over each data structure, make a little fact sheet such as the following:
Unsorted Associative Array or List |
Basic idea: Key-value pairs are stored in a simple
unsorted array or linked list
Insertion: Θ(n) if you check for duplicate keys, Θ(1) otherwise
Deletion: Θ(n) since you have to find the item first
Lookup: Θ(n) since the whole list must be scanned
Key type restrictions: None at all!
Good for: Small collections only |
Sorted Associative Array |
Basic idea: Key-value pairs are stored in an array
sorted by key
Insertion: Θ(n) in general due to sifting; Θ(1) if adding at the end
Deletion: Θ(n) in general due to sifting; Θ(1) if from the end
Lookup: Θ(log n) using binary search
Key type restrictions: Must be ordered
Good for: Memory-constrained environments; read-only collections
or those filled only in ascending order |
Exercise: Complete fact sheets for as many of the other implementations as you can.
Summary
We’ve covered:
- What sets are
- Operations on sets
- What dictionaries are
- Operations on dictionaries
- How sets and dictionaries are alike
- Representations of sets and dictionaries, without details
- Ideas of when to use different representations