Algorithms

Here it is, from thirty-thousand feet, as they say.

Why Study Algorithms

Studying algorithms helps you develop widely applicable analytical skills—you learn not just how to solve individual problems but to develop recipes for getting answers to many problems.

“A person well-trained in computer science knows how to deal with algorithms: how to construct them, manipulate them, understand them, analyze them. This knowledge is preparation for much more than writing good computer programs; it is a general-purpose mental tool that will be a definite aid to the understanding of other subjects, whether they be chemistry, linguistics, or music, etc. The reason for this may be understood in the following way: It has often been said that a person does not really understand something until after teaching it to someone else. Actually, a person does not really understand something until after teaching it to a computer, i.e., expressing it as an algorithm . . . An attempt to formalize things as algorithms leads to a much deeper understanding than if we simply try to comprehend things in the traditional way.” — Donald Knuth

And:

“Algorithmics is more than a branch of computer science. It is the core of computer science, and, in all fairness, can be said to be relevant to most of science, business, and technology.” — David Harel

Classifying Algorithms

Primary dimensions:

By domain
By complexity class
By underlying data structure
By problem solving strategy IMPORTANT

Secondary concerns include:

Is the algorithm sequential or parallel?
Is the algorithm deterministic or randomized?
Are you looking for one solution or all solutions?
Are you looking for any solution or the optimal solution?
Are you looking for an exact or approximate solution?
Are you computing
- Whether $f(x)$ is true or not (decision),
- The value of $f(x)$ (functional),
- Any $x$ such that $f(x)=y$ (constraint), or
- The $x$ that minimizes (or maximizes) $f(x)$ (optimization)?

Problem Domain

What kind of a problem are you trying to solve?

Numeric (esp. super large numbers, floating-point numbers with errors, ...)
Sorting (putting things in order)
Searching (finding an element by search key)
String (pattern matching, compression, sequencing, crypto, ...)
Partitioning (fair division, ...)
Network (routing, spanning trees, connectivity, flow, ...)
Combinatoric (permutations, combinations, subsets, ...)
Geometric (convex hull, closest pair, bounding sphere, ...)
AI (computer vision, machine learning, training, ...)

Complexity Class

Let $n$ be the input size and $k$ be a fixed constant.

Constant ($k$)
Logarithmic ($\log{n}$)
Linear ($n$)
Linearithmic ($n \log{n}$)
Polynomial ($n^{k} \;\mathrm{where}\:k > 1$)
Exponential ($k^n$)
Superexponential

Underlying Data Structure

There is a huge list of data structures at Wikipedia, but here are some of the more common ones:

Arrays
Sorted Arrays
Linked Lists (singly-linked, doubly-linked, with sentinels, ...)
Hashtables (with several collision resolution strategies)
Unbalanced Trees
Balanced Trees
Skip Lists
Heaps (min, max, binary, Fibonacci)

Strategies

This isn’t a complete list, but it covers some of the more famous ones:

Brute Force. Enumerate all possible solutions, unintelligently, and try them all.
Divide and Conquer. Break down a problem into multiple independent subproblems, solve the subproblems (recursively), and combine those solutions into a solution for the original problem.
Decrease and Conquer. Like Divide and Conquer but break down in to one subproblem.
Transform and Conquer. Reformulate problem into an equivalent problem in another domain.
Input Enhancement. Precalculate results for certain inputs, or create a cache of partial results, so that the actual run is faster.
Prestructuring. Pre-arrange data for faster processing.
Dynamic Programming. Solve an optimization problem by breaking it down into multiple overlapping subproblems, solving the subproblems (recursively), and combining those solutions into a solution for the original problem.
The Greedy Method. When searching for solution, take the most promising next step and never backtrack.
Backtracking. Systematically generate possible solutions to a problem sometimes having to back up when realizing your partially generated candidate can’t possibly be extended to a real solution.
Branch and Bound. Backtrack in an optimization setting.
Hill Climbing. Solve (or find an approximate solution to) an optimization problem by generating candidate solutions that are (hopefully) improvements over the previous candidate.
Particle Swarm. Solve an optimization problem with a bunch of decentralized particles all searching for a solution with something that looks like its has a collective organization (e.g. ant colonies, bird flocks, animal herds).
Las Vegas. Generate possible solutions non-deterministically in a way that always produces the correct answer but makes no guarantees on how long it will run or how much space it will need (in the worst case).
Monte Carlo. Generate possible solutions non-deterministically in a way that has has time and space guarantees but has a small probability of giving the wrong answer. The probability of error can be reduced by running the algorithm longer.

Learning and Applying the Strategies

There are so many strategies that ultimately, you just have to gain a lot of experience seeing the various strategies applied in context.

But we can take advantage of those that came before us and provided nice little overviews of various strategies with specific problems and “when to apply it” guides. One very useful guide is this one by Himanshu Singour.

Exercise: Browse the article now. After a couple months of formal study of algorithms, reread the article in detail.

Recall Practice

Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.

What are the four primary dimensions of algorithm classification?
Domain, complexity class, underlying data structure, and problem solving strategy.
What are some secondary concerns when classifying algorithms?
Whether the algorithm is sequential or parallel, deterministic or randomized, whether it finds one solution or all solutions, whether it finds any solution or the optimal solution, whether it finds an exact or approximate solution.
What is a decision problem?
One where the algorithm determines whether a certain condition is true or false.
What is a functional problem?
One where the algorithm computes a value for a function.
What is a constraint problem?
One where the algorithm finds any input that satisfies a given condition.
What is an optimization problem?
One where the algorithm finds the input that minimizes or maximizes a function.

Summary

We’ve covered:

Why study algorithms
Dimensions of algorithm classification
A big list of algorithm strategies