Algorithms
Here it is, from thirty-thousand feet, as they say.
Why Study Algorithms
Studying algorithms helps you develop widely applicable analytical skills—you learn not just how to solve individual problems but to develop recipes for getting answers to many problems.
“A person well-trained in computer science knows
how to deal with algorithms: how to construct them, manipulate them,
understand them, analyze them. This knowledge is preparation for much
more than writing good computer programs; it is a general-purpose mental
tool that will be a definite aid to the understanding of other subjects,
whether they be chemistry, linguistics, or music, etc. The reason for
this may be understood in the following way: It has often been said
that a person does not really understand something until after teaching
it to someone else. Actually, a person does not really understand
something until after teaching it to a computer, i.e., expressing
it as an algorithm . . . An attempt to formalize things as algorithms
leads to a much deeper understanding than if we simply try to comprehend
things in the traditional way.” — Donald Knuth
And:
“Algorithmics is more than a branch of computer science. It is the core of computer science, and, in all fairness, can be said to be relevant to most of science, business, and technology.” — David Harel
Classifying Algorithms
Primary dimensions:
- By domain
- By complexity class
- By underlying data structure
- By problem solving strategy IMPORTANT
Secondary concerns include:
- Is the algorithm sequential or parallel?
- Is the algorithm deterministic or randomized?
- Are you looking for one solution or all solutions?
- Are you looking for any solution or the optimal solution?
- Are you looking for an exact or approximate solution?
- Are you computing
- Whether $f(x)$ is true or not (decision),
- The value of $f(x)$ (functional),
- Any $x$ such that $f(x)=y$ (constraint), or
- The $x$ that minimizes (or maximizes) $f(x)$ (optimization)?
Problem Domain
What kind of a problem are you trying to solve?
- Numeric (esp. super large numbers, floating-point numbers with errors, ...)
- Sorting (putting things in order)
- Searching (finding an element by search key)
- String (pattern matching, compression, sequencing, crypto, ...)
- Partitioning (fair division, ...)
- Network (routing, spanning trees, connectivity, flow, ...)
- Combinatoric (permutations, combinations, subsets, ...)
- Geometric (convex hull, closest pair, bounding sphere, ...)
- AI (computer vision, machine learning, training, ...)
Complexity Class
Let $n$ be the input size and $k$ be a fixed constant.
- Constant ($k$)
- Logarithmic ($\log{n}$)
- Linear ($n$)
- Linearithmic ($n \log{n}$)
- Polynomial ($n^{k} \;\mathrm{where}\:k > 1$)
- Exponential ($k^n$)
- Superexponential
Underlying Data Structure
There is a huge list of data structures at Wikipedia, but here are some of the more common ones:
- Arrays
- Sorted Arrays
- Linked Lists (singly-linked, doubly-linked, with sentinels, ...)
- Hashtables (with several collision resolution strategies)
- Unbalanced Trees
- Balanced Trees
- Skip Lists
- Heaps (min, max, binary, Fibonacci)
Strategies
This isn’t a complete list, but it covers some of the more famous ones:
- Brute Force. Enumerate all possible solutions, unintelligently, and try them all.
- Divide and Conquer. Break down a problem into multiple independent
subproblems, solve the subproblems (recursively), and combine those solutions into a solution
for the original problem.
- Decrease and Conquer. Like Divide and Conquer but break down in to one subproblem.
- Transform and Conquer. Reformulate problem into an equivalent problem in another domain.
- Input Enhancement. Precalculate results for certain inputs, or create
a cache of partial results, so that the actual run is faster.
- Prestructuring. Pre-arrange data for faster processing.
- Dynamic Programming. Solve an optimization problem by breaking it down into
multiple overlapping subproblems, solving the
subproblems (recursively), and combining those
solutions into a solution for the original problem.
- The Greedy Method. When searching for solution, take the most promising next
step and never backtrack.
- Backtracking. Systematically generate possible solutions to a problem
sometimes having to back up when realizing your partially generated candidate
can’t possibly be extended to a real solution.
- Branch and Bound. Backtrack in an optimization setting.
- Hill Climbing. Solve (or find an approximate solution to) an optimization
problem by generating candidate solutions that are (hopefully)
improvements over the previous candidate.
- Particle Swarm. Solve an optimization problem with a bunch of
decentralized particles all searching for a solution with something that looks
like its has a collective organization (e.g. ant colonies, bird flocks, animal
herds).
- Las Vegas. Generate possible solutions non-deterministically in a way
that always produces the correct answer but makes no guarantees on how long it
will run or how much space it will need (in the worst case).
- Monte Carlo. Generate possible solutions non-deterministically in
a way that has has time and space guarantees but has a small probability
of giving the wrong answer. The probability of error can be reduced by running
the algorithm longer.
Learning and Applying the Strategies
There are so many strategies that ultimately, you just have to gain a lot of experience seeing the various strategies applied in context.
But we can take advantage of those that came before us and provided nice little overviews of various strategies with specific problems and “when to apply it” guides. One very useful guide is this one by Himanshu Singour.
Exercise: Browse the article now. After a couple months of formal study of algorithms, reread the article in detail.
Recall Practice
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
- What are the four primary dimensions of algorithm classification?
Domain, complexity class, underlying data structure, and problem solving strategy.
- What are some secondary concerns when classifying algorithms?
Whether the algorithm is sequential or parallel, deterministic or randomized, whether it finds one solution or all solutions, whether it finds any solution or the optimal solution, whether it finds an exact or approximate solution.
- What is a decision problem?
One where the algorithm determines whether a certain condition is true or false.
- What is a functional problem?
One where the algorithm computes a value for a function.
- What is a constraint problem?
One where the algorithm finds any input that satisfies a given condition.
- What is an optimization problem?
One where the algorithm finds the input that minimizes or maximizes a function.
Summary
We’ve covered:
- Why study algorithms
- Dimensions of algorithm classification
- A big list of algorithm strategies