A majority element of an $n$-element list is an element that appears more than $\frac{n}{2}$ times. How can we write a function that returns a majority element if present?
Given an array $[a_0, a_1, \ldots, a_{n-1}]$, you can:
```python
def majority(a):
    for x in a:
        if a.count(x) > len(a) // 2:
            return x
    return None
```
Bah! That’s $\Theta(n^2)$ time. BUT it’s constant space (okay, okay, it’s technically logarithmic space, because it takes $\log n$ bits to represent the number $n$) and a good solution for very, very small lists.
Rather than counting occurrences for all the values, let’s just count occurrences for the majority elements in each half of the list. And as a bonus: if each half has the same majority element, then that’s our majority element for the whole list.
```python
def majority(a):
    if len(a) == 0:
        return None
    if len(a) == 1:
        return a[0]
    half = len(a) // 2
    left = majority(a[0:half])
    right = majority(a[half:])
    if left == right:
        return left
    if a.count(left) > half:
        return left
    if a.count(right) > half:
        return right
    return None
```
You can do the analysis and get $\Theta(n \log n)$. We pay for the asymptotic speedup with recursion overhead, so we expect it to be slower than the naïve approach for tiny lists but much faster for larger ones.
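Where does that bound come from? Each call makes two recursive calls on halves of the list, and the `count` calls at the end are linear, so the running time satisfies the classic divide-and-conquer recurrence, which the master theorem resolves:

$$T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + \Theta(n) = \Theta(n \log n)$$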
Got a really good constant-time dictionary (like a hashmap with few if any collisions)? You can get linear time (but with some overhead). As soon as a count exceeds $\frac{n}{2}$, you’ve got your majority element.
```python
from collections import Counter

def majority(a):
    counts = Counter()
    for x in a:
        counts[x] += 1
        if counts[x] > len(a) // 2:
            return x
    return None
```
Wow! Moore’s voting algorithm. Linear time and constant (okay, okay, logarithmic) space! Check this out:
```python
def majority(a):
    count = 0
    candidate = None
    for x in a:
        if count == 0:
            candidate = x
        if x == candidate:
            count += 1
        else:
            count -= 1
    if a and a.count(candidate) > len(a) // 2:
        return candidate
    return None
```
How did anyone figure that out?
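One way to see it: decrementing the count "pairs off" the candidate with a different element, and discarding a pair of two *distinct* elements can never destroy a majority. So whatever candidate survives the scan is the only element that could possibly be the majority, and the final `count` pass confirms or rejects it. Tracing the state makes this concrete (a sketch; `trace` is my name, not part of the algorithm):

```python
def trace(a):
    # Print (element, candidate, count) at each step of the voting scan
    count = 0
    candidate = None
    for x in a:
        if count == 0:
            candidate = x
        count += 1 if x == candidate else -1
        print(x, candidate, count)
    return candidate
```

Running `trace([3, 1, 3, 3, 2])` shows the count drop to zero when 3 and 1 pair off, then 3 re-establishing itself as the candidate.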
Let’s write a function to find the value of $n!$.
```python
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)
```
What is wrong with this? Linear space! (not taking into account the size of the numbers, which can get large...)
```
f(5) = 5 * f(4)
     = 5 * (4 * f(3))
     = 5 * (4 * (3 * f(2)))
     = 5 * (4 * (3 * (2 * f(1))))   -- note space for buffered results!
     = 5 * (4 * (3 * (2 * 1)))
     = 5 * (4 * (3 * 2))
     = 5 * (4 * 6)
     = 5 * 24
     = 120
```
```python
def factorial(n, a=1):
    return a if n <= 1 else factorial(n - 1, a * n)
```
```
f(5) = f(5, 1)
     = f(4, 5)
     = f(3, 20)
     = f(2, 60)
     = f(1, 120)
     = 120
```
No space problems now! (Provided, that is, your language implementation eliminates tail calls; CPython does not, so in Python this version still grows the call stack.)
You’re probably thinking “why not just use mutable variables?” Well, some people deeply despise mutability, so it’s good to have seen the recursive versions. Ready for the simple iterative approach? Here you go:
```python
def factorial(n):
    result = 1
    for x in range(1, n + 1):
        result *= x
    return result
```
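A quick sanity check against the standard library’s `math.factorial` never hurts (the function is repeated here so the sketch is self-contained):

```python
import math

def factorial(n):
    result = 1
    for x in range(1, n + 1):
        result *= x
    return result

# The iterative version should agree with the standard library everywhere
for n in range(20):
    assert factorial(n) == math.factorial(n)
```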
Oh that famous sequence... $$ F(n) = \begin{cases} n & n \leq 1\\ F(n-1) + F(n-2) & \mathrm{otherwise} \end{cases} $$
Thinking of implementing the definition directly? DON’T DO IT. What happens if you do?
```
F(5) = F(4) + F(3)
     = (F(3) + F(2)) + F(3)
     = ((F(2) + F(1)) + F(2)) + F(3)
     = (((F(1) + F(0)) + F(1)) + F(2)) + F(3)
     = (((1 + F(0)) + F(1)) + F(2)) + F(3)
     = (((1 + 0) + F(1)) + F(2)) + F(3)
     = ((1 + F(1)) + F(2)) + F(3)
     = ((1 + 1) + F(2)) + F(3)
     = (2 + F(2)) + F(3)
     = (2 + (F(1) + F(0))) + F(3)
     = (2 + (1 + F(0))) + F(3)
     = (2 + (1 + 0)) + F(3)
     = (2 + 1) + F(3)
     = 3 + F(3)
     = 3 + (F(2) + F(1))
     = 3 + ((F(1) + F(0)) + F(1))
     = 3 + ((1 + F(0)) + F(1))
     = 3 + ((1 + 0) + F(1))
     = 3 + (1 + F(1))
     = 3 + (1 + 1)
     = 3 + 2
     = 5
```
NOOOOOOO. Care to compute $F(40)$? That will make over 331 million recursive calls. And it takes over a sextillion calls (specifically 1,146,295,688,027,634,168,201 calls) to do $F(100)$. This algorithm sucks.
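Where do those numbers come from? The call count obeys its own recurrence, $\mathrm{calls}(n) = \mathrm{calls}(n-1) + \mathrm{calls}(n-2) + 1$ with $\mathrm{calls}(0) = \mathrm{calls}(1) = 1$, which solves to $2F(n+1) - 1$. We can check that closed form without making the calls themselves (helper names here are mine):

```python
def fib_iter(n):
    # Fast iterative Fibonacci, used only to evaluate the closed form
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def naive_call_count(n):
    # calls(n) = calls(n-1) + calls(n-2) + 1, calls(0) = calls(1) = 1,
    # which solves to 2 * F(n+1) - 1
    return 2 * fib_iter(n + 1) - 1
```

`naive_call_count(40)` gives 331,160,281 and `naive_call_count(100)` gives the sextillion-and-change figure quoted above.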
The previous algorithm was horrible because we repeated so much work! To avoid repeating the work, we remember, or memoize, values as we compute them, then just reuse these remembered (cached) values whenever we need them again:
```
F(5) = F(4) + F(3)
     = (F(3) + F(2)) + F(3)
     = ((F(2) + F(1)) + F(2)) + F(3)
     = (((F(1) + F(0)) + F(1)) + F(2)) + F(3)
     = (((1 + 0) + F(1)) + F(2)) + F(3)
     = ((1 + F(1)) + F(2)) + F(3)    -- here we record that F(2) = 1
     = ((1 + 1) + F(2)) + F(3)
     = (2 + F(2)) + F(3)             -- here we record that F(3) = 2
     = (2 + 1) + F(3)                -- here we just use the "cached" value of F(2)
     = 3 + F(3)                      -- here we record that F(4) = 3
     = 3 + 2                         -- use the cached value of F(3)
     = 5
```
A quick and dirty way to save up values as we compute them is to use a global variable:
```python
fib_cache = {}                       # Ick, ick, ick, ugly global variable

def fib(n):
    if n in fib_cache:               # Did we compute this already?
        return fib_cache[n]          # If so, use it again
    if n <= 1:
        fib_cache[n] = n             # Save (cache) values just before returning
        return n
    result = fib(n - 1) + fib(n - 2)
    fib_cache[n] = result
    return result
```
But you should avoid global variables whenever possible. Fortunately, Python has some amazing magical things called decorators. Don’t worry about the mechanism right now; it’s enough to know it’s possible. Here is how the pros do things:
```python
def memoized(f):
    cache = {}
    def wrapper(*args):
        if args in cache:
            return cache[args]
        cache[args] = f(*args)
        return cache[args]
    return wrapper

@memoized
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)
```
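In fact, the standard library already ships a memoizing decorator, `functools.lru_cache` (with the unbounded alias `functools.cache` in Python 3.9+), so in practice you would just write:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None keeps every computed value forever
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)
```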
Tail recursion works wonders too:
```python
def fib(n, a=0, b=1):
    return a if n == 0 else fib(n - 1, b, a + b)
```
```
F(5) = F(5, 0, 1)
     = F(4, 1, 1)
     = F(3, 1, 2)
     = F(2, 2, 3)
     = F(1, 3, 5)
     = F(0, 5, 8)
     = 5
```
Do you like mutable variables? Then you can keep track of the “last two values” and count your way up. I hope you have a language with parallel assignment, though:
```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```
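A quick usage check, with the function repeated so the sketch stands alone; the expected values are the first ten Fibonacci numbers:

```python
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert [fib(n) for n in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```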
How can we systematically generate all the permutations of a collection? (Refresher: the permutations of $\langle r, a, t \rangle$ are: $\{\langle r, a, t \rangle$, $\langle r, t, a \rangle$, $\langle a, r, t \rangle$, $\langle a, t, r \rangle$, $\langle t, a, r \rangle$, $\langle t, r, a \rangle\}$).
We’ll just let Robert Sedgewick teach us permutations.
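As a preview, here is a sketch of one classic swap-based method from the family Sedgewick surveys, Heap’s algorithm (this is my rendering, not necessarily the version Sedgewick presents):

```python
def permutations(a):
    # Heap's algorithm: yield every permutation of list a,
    # mutating a in place with one swap between permutations
    def generate(k):
        if k == 1:
            yield tuple(a)
            return
        for i in range(k - 1):
            yield from generate(k - 1)
            if k % 2 == 0:          # even k: rotate through positions
                a[i], a[k - 1] = a[k - 1], a[i]
            else:                   # odd k: always swap with position 0
                a[0], a[k - 1] = a[k - 1], a[0]
        yield from generate(k - 1)
    yield from generate(len(a))
```

For `list("rat")` this yields all $3! = 6$ permutations from the refresher above, each produced from the previous one by a single swap.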