Data Structures and Algorithms

Of the many subfields within computer science, algorithms and data structures may be the most fundamental—it seems to characterize computer science like perhaps no other. What is involved in the study of algorithms and data structures?

Unit Goals

We wish to get a sense of what is meant by the terms “algorithm” and “data structure,” why the associated topics are so important in computer science, and what sorts of things make up the study of these fields. Today will just be a preview of what is to come.

Importance

Welcome to the study of data structures and algorithms, a discipline within Computer Science.

A while ago, the ACM created a taxonomy of Computer Science, identifying eighteen knowledge areas. They are presented here for reference only—please don’t try to memorize this table! In the breakdown, Data Structures and Algorithms appears as a single, tiny topic:

Knowledge AreaCodeSome Topics
Algorithms and ComplexityALAlgorithmic Analysis • Algorithmic Strategies • Data Structures and Algorithms • Automata • Computability • Computational Complexity
Architecture and OrganizationARDigital Logic and Digital Systems • Machine-level Data Representation • Assembly-Level Machine Organization • Memory System Organization and Architecture • Interfacing and Communication • Functional Organization • Multiprocessing and Alternative Architectures • Performance Enhancements
Computational ScienceCNModeling and Simulation • Processing • Interactive Visualization • Data, Information and Knowledge • Numeric Analysis • Symbolic Computation • Mathematical Modeling • High-Performance Computing
Discrete StructuresDSLogic • Sets, Relations, and Functions • Proof Techniques • Basics of Counting • Graphs and Trees • Discrete Probability
Graphics and VisualizationGVMedia Applications • Digitization • Color Models • Rendering • Modeling • Animation • Visualization
Human-Computer InteractionHCIInteractive Systems • Testing • Collaboration and Communication • Statistical Methods • Human Factors • Mixed, Augmented, and Virtual Reality
Information Assurance and SecurityIASDefensive Programming • Threats and Attacks • Network Security • Cryptography • Web Security • Platform Security • Policy and Governance • Digital Forensics • Secure Software Engineering
Information ManagementIMDatabase Systems • Data Modeling • Indexing • Key-Value, Document, Relational, and Graph Databases • Query Languages • Transaction Processing • Distributed Databases • Physical Database Design • Data Mining • Information Storage and Retrieval • Multimedia Systems
Intelligent SystemsISKnowledge Representation and Reasoning • Search • Machine Learning • Reasoning Under Uncertainty • Agents • Natural Language Processing • Robotics • Speech Recognition and Synthesis • Perception and Computer Vision
Networking and CommunicationNCNetworked Applications • Reliable Data Delivery • Routing and Forwarding • Local and Wide Area Networks • Resource Allocation • Mobility • Social Networking
Operating SystemsOSOperating System Organization • Concurrency • Scheduling and Dispatch • Memory Management • Security and Protection • Virtual Machines • Device Management • File Systems • Realtime and Embedded Systems • Fault Tolerance • System Performance and Evaluation
Platform-Based DevelopmentPBDWeb Platforms • Mobile Platforms • Industrial Platforms • Game Platforms
Parallel and Distributed ComputingPDParallel Decomposition • Communication and Coordination • Parallel Algorithms, Analysis, and Programming • Parallel Architecture • Parallel Performance • Distributed Systems • Cloud Computing • Formal Models and Semantics
Programming LanguagesPLObject Oriented Programming • Functional Programming • Event-Driven and Reactive Programming • Type Systems • Program Representation • Language Translation and Execution • Syntax Analysis • Semantic Analysis • Code Generation • Runtime Systems • Static Analysis • Concurrency and Parallelism • Type Systems • Formal Semantics • Language Pragmatics • Logic Programming
Software Development FundamentalsSDFAlgorithms and Design • Fundamental Programming Concepts • Fundamental Data Structures • Development Methods
Software EngineeringSESoftware Processes • Project Management • Tools and Environments • Requirements Engineering • Software Design • Software Construction • Software Verification and Validation • Software Evolution • Software Reliability • Formal Methods
Systems FundamentalsSFComputational Paradigms • Cross-Layer Communications • State and State Machines • Parallelism • Evaluation • Resource Allocation and Scheduling • Proximity • Virtualization and Isolation • Reliability through Redundancy • Quantitative Evaluation
Social Issues and Professional PracticeSPSocial Context • Analytic Tools • Professional Ethics • Intellectual Property • Privacy and Civil Liberties • Professional Communication • Sustainability • History • Economics of Computing • Security Policies, Law, and Crime

But don’t be fooled. Data Structures and Algorithms is not “just another topic.” It is one of the most central and important topics in all of Computer Science. Knowledge of data structures and algorithms is essential to gaining any sort of proficiency in nearly every one of the other topics in this table!

Exercise: Wait, that table was kind of neat, right? Computer Science has something for everyone! Which of the topics above sounds like it might interest you?

Why Must They Go Together?

In the table above, data structures and algorithms were mentioned together. Why?

Expressed more powerfully:

Bob Barton, the main designer of the B5000 and a professor at Utah had said in one of his talks a few days earlier: "The basic principle of recursive design is to make the parts have the same power as the whole." For the first time I thought of the whole as the entire computer and wondered why anyone would want to divide it up into weaker things called data structures and procedures. Why not divide it up into little computers, as time sharing was starting to? But not in dozens. Why not thousands of them, each simulating a useful structure?

— Alan Kay, The Early History of Smalltalk
Exercise: What is Alan Kay actually saying here? Think about it.

How Do They Go Together?

It’s very helpful to think of the data type as the bridge between data structures and algorithms.

Data Structures

Specific organizations or arrangements of data, whether sequential, hierarchical, arbitrarily linked up, hashed, specially indexed, or just thrown into unorganized buckets.

Algorithms

Recipes for carrying out computations. Algorithms are made up of a finite number of instructions woven together with control structures.

 
 
Data Type

Data types combine data entities (units units of information) together with the algorithms that define the behaviors of these entities.

Data Types

So these data types are the key to everything right? What are they, exactly?

First of all, note that programs manipulate information, and information in a program is represented with entities. Entities are things like numbers, text, lists, and so on. To help us understand entities, it helps to classify them into types.

We have an intuition about what types are. As a first approximation, a type is just a set of values, for example:

That’s not really good enough, though! The reason why entities have certain types is that they behave in certain ways: you can multiply numbers, reverse lists, capitalize text, block users, revoke credentials, promote employees, and so on. Properly specifying a type requires defining how the entities of that type behave.

These behaviors are captured (algebraically 😮😮😮) as operations with zero or more inputs and a single output. For example:

types.png

The same thing can be said without pictures, for example:

CLUBS: Suit
HEARTS: Suit
DIAMONDS: Suit
SPADES: Suit
ACE: Rank
TWO: Rank
THREE: Rank
FOUR: Rank
FIVE: Rank
SIX: Rank
SEVEN: Rank
EIGHT: Rank
NINE: Rank
TEN: Rank
JACK: Rank
QUEEN: Rank
KING: Rank
makeCard : Suit × Rank → Card
getSuit : Card → Suit
getRank : Card → Rank
newDeck : Deck
positionOf : Deck × Card → int
cardAt : Deck × int → Card
topCard : Deck → Card
shuffle : Deck → Deck
split : Deck → Deck

Ah, now we have behaviors. A data type is a set of values together with operations that define their behavior.

Not every operation is applicable to every type. Types provide constraints on what we can and cannot say.

Exercise: Repeat that last sentence to yourself a few times. Meditate on it.
CLASSWORK
Draw a picture, and enumerate operations, for the type of integer lists. An integer list is a sequence of zero or more integers with operations such as obtaining the length, adding an item to a given position, removing the nth item, reversing, finding the position of a given item, etc.
Abstract Data Types

As an aside: you will sometimes hear the term abstract data type, abbreviated “ADT.” People use this term to emphasize the fact that the “internal structure and internal workings of the operations” are somehow unknown to the outside world. Only the effects of the operations matter. To many people, though, saying “data type” is enough—all datatypes are “abstract” in that sense.

A useful exercise is to identify things in your world and try to list as many behaviors as you can. Here is a starter set:

Data TypeSample Behaviors
Numbernegate • add • subtract • multiply • divide • remainder • modulo • to the power • mod_pow • to string • to string with a specific radix • negative infinity • positive infinity • $e$ • $\pi$ • floor • ceiling • round • truncate • $e$ to the power • natural log • log base 10 • log base 2 • hypot • sine • cosine • tangent • arc tangent • arc tangent given two values • square root • cube root
Stringcreate from code points • create from UTF-8 bytes • get list of code points • get list of UTF-8 bytes • get ith code point or byte • get number of code points • get number of bytes • normalize • starts with • ends with • is this a substring? • index of • lower case • upper case • capitalize • slice • substring • split • join • trim • trim at start • trim at end • pad at start • pad at end • repeat • replace • append • interpolate • match
Booleantrue • false • and • or • xor • not • and-not • nand • nor
Pointx • y • distance to origin • distance to another point • midpoint between this point and another • quadrant • origin
Polygonnumber of points • area • perimeter
Listcreate • destroy • add an element or elements • remove an element or elements • replace a single occurrence or multiple occurrence of an element • find (search for) an element or elements • get the size of (number of elements in) the collection • is it empty? • is it full? • enumerate the elements • filter (keep only certain) elements • merge lists together to form a new one • is this a sublist? • are these two lists equal? • make a copy of all or part of a list • serialize (convert to string or some other unique, recoverable representation) • deserialize (create from a serialized representation)
Username • email • avatar • is logged in • log in • logout • block • assign roles • get timestamp the user was created
Playername • id • score • update score • start turn • end turn
Timelinelabel • list of posts • add post • delete post • update post • archive
TimelineItemtitle • content • list of comments • add comment • remove comment • report comment
Exercise: Come up with examples of your own. Work with a friend.

Data Structures

A data structure is an arrangement of data for the purpose of being able to store and retrieve information. Usually a data structure exists in order to provide a physical representation of a data type.

For example, a list (a data type, which we saw above) can be represented by an array (a data structure) or by attaching a link from each element to its successor (another kind of data structure).

Wait, what are arrays and links?

Building Blocks

Data structures are built with three primary structuring mechanisms:

Objects

Groupings of properties into a single unit with named fields

Arrays

Collect multiple entities into positionally indexed, contiguous slots in memory

Links

Also known as references, allow arbitrary directional associations of entities

Let’s dive in. A object, also known as an record, struct, or named tuple, is a single entity that has many properties, where each property has a name and a value, which is another entity. Here is a simple object, can you guess what it represents?

latitude
29.9792
longitude
31.1344

An array is an entity that has a number of constituent elements, packed tightly together in memory and indexable by small natural numbers. In most programming languages, the first index is 0, in others (Julia, Lua, MATLAB), the first index is 1. Here are some cities in Alaska to visit (in order, because arrays are ordered):

0
"Juneau"
1
"Fairbanks"
2
"Utqiaġvik"
3
"Koyukuk"
4
"Anchorage"

Important: Note that were the properties of a object come together to make one conceptual thing; arrays are generally thought of as a collection of distinct things.

Exercise: That last remark was very important. Make sure you understand it.

Objects and arrays can be nested inside each other. Here is a dog at the shelter that you should adopt:

name
"Lisichka"
breed
"G-SHEP"
weightInKg
28
colors
0
"brown"
1
"black"
2
"white"
available
true
birthdate
year
2021
month
3
day
17

And here is an array of objects, representing a polygon:

0
x
3
y
5
color
"cyan"
1
x
5
y
1
color
"mauve"
2
x
3
y
-4
color
"green"
3
x
-3
y
-2
color
"aqua"
4
x
-3.5
y
2
color
"olive"
5
x
0
y
10
color
"pink"

Theoretically, arrays and objects can do any thing, sort of, but not too conveniently. For example, here are two different people, and their pets:

name
"Ani"
birthdate
"2019-03-03"
city
"Seattle"
pet
name
Sparky
breed
G-SHEP
weightInKg
33.5
primaryColor
"brown"
secondaryColor
"black"
name
"Aaron"
birthdate
"2020-09-05"
city
"Seattle"
pet
name
Sparky
breed
G-SHEP
weightInKg
33.5
primaryColor
"brown"
secondaryColor
"black"

Now here’s the question. Do these two people have the same pet, or are there two different dogs with the same name, the same birthday, the same breed, and the same everything else? It could be a coincidence, right? We would like a way to distinguish those two cases. That’s where links, also known as a references or pointers come in. The scenario in the previous picture shows two similar, but distinct, dogs, but the following picture shows the two people sharing a single pet:

shared-dogs.png

Links are helpful in so many other situations. Here’s an attempt to object information about a family using only objects. (For simplicity, we’re only showing some of the many fields we’d likely have irl.)

name
"Ani"
mom
name
Xiang
mom
name
Claire
mom
name
Farah
mom
name
Tanya
dad
name
Atuqtuaq
mom
name
Amelia
dad
name
Yafet
dad
name
Ka`eo
dad
name
Sipho
mom
name
`Aolani
dad
name
Tadashi
mom
name
Zobhule

There is one huge problem with this scenario. If we wanted add properties like “spouse” or “guardian” or “best friend” then...ugh...try it. Really, try it. What if two different people were each others’ best friends?

🤯

Exercise: So.... What actually happened when you tried adding a best-friend property to two people that were each other’s best friend?

We should have links here too, since after all, if two people have the same mom, the links are a great way to make clear they have the same mom rather than their separate moms have the same properties. Here’s the nested-object family tree above, using links to give each person their own identity:

family tree object diagram

Exercise: To the figure above, add properties to `Aolani and Claire to make them each other’s best friends. Notice how well it works now.

You’ve probably noticed the various person entities above all have different numbers of properties. While this really isn’t a big deal from a modeling perspective, programs can (believe it or not) often run much more efficiently when similar objects all have exactly the same size. But to make things the same size, we’ll need a way to indicate that certain properties don’t reference anything. Such a reference is called a null reference. Here’s the family tree again, this time with null references:

family tree object diagram

Null References

What do you think of nulls? They seem powerful. They seem convenient. They are used practically everywhere. Most of the popular languages have them. But did you know, the null reference is a disaster known as the Billion Dollar Mistake? Watch its inventor, Sir Tony Hoare, take the blame for this abomination.

In many programming languages, not only must all similar objects have the same properties, but you are also not allowed to add new elements to an array and you are not allowed to add new properties to a object once the program is running. Hard to believe, but IT’S TRUE! In these languages the only way you can grow and shrink structures once a program is running is to use links. That’s why you might see structures like the following when you need variable-sized structures:

step next "Ready" step next "Set" step next "Go" null

Arrangement of Elements

When building data structures, we must often think about the best ways to arrange the internal elements, as this impacts how we use the structure. Here are four classic arrangements that are good to know:

Classification Topology Examples
SET
(NON-POSITIONAL)
Unrelated

set.gif

  • People in a company
  • The cards in a hand
  • The toys in a box
  • The things on a table
SEQUENCE
(LINEAR)
One-to-one

linear.gif

  • The documents in a printer queue
  • People in line at the deli
  • Computers on a ring network
  • List to things to do (in order)
  • Items on a schedule
TREE
(HIERARCHICAL)
One-to-many

tree.gif

  • Family tree
  • Organization chart
  • Expression tree
  • Folders and files in a file system
GRAPH
(NETWORK)
Many-to-many

graph.gif

  • Road map
  • Schedule
  • Belief Network
  • Communication Network

These arrangements are actually rather conceptual, if you think about it, because one can still build them out of objects, arrays, and links in a variety of ways.

Choosing the Right Data Structure

Different data structures have different storage and performance characteristics. So when choosing a data structure, you should first figure out what kind of constraints you have. For example, in an accounting system, creation and deletion of a new account can be slow (they don’t happen too often), but lookup and update of balances happen all the time and must be extremely fast.

Exercise: Why do those make sense? (Think about your users!)

It turns out, in the study of data structures, people have found certain questions to pop up again and again and again and again. Being able to answer the following questions, and others like them, can make the difference between a system that satisfies its users and one that doesn’t.

Classic Data Structures

History is important. Over time, many data structures have become well-known (for various) reasons, and gaining knowledge of the classics is beneficial. Some of these are:

Algorithms

What is an Algorithm?

We design algorithms to carry out the desired behavior of a software system. Good programmers tend to use stepwise refinement when developing algorithms, and understand the tradeoffs between different algorithms for the same problem.

Classifying Algorithms

There are several classification schemes for algorithms:

Exercise: Although this is not the time to study each of these topics, you might wish, for fun, to look up the difference between Las Vegas and Monte Carlo algorithm types. Can you give a precise, one-sentence description of how these two strategies differ?

Kinds of Problems

This classification has proved useful:

Problem TypeDescription
DecisionGiven $f$ and $x$, is it true that $f(x)$?
FunctionalGiven $f$ and $x$, find $f(x)$
ConstraintGiven $f$ and $y$, find an $x$ such that $f(x)=y$
OptimizationGiven $f$, find the $x$ such that $f(x)$ is less than or equal to $f(y)$ for all $y$.

Choosing the Right Algorithm

Being a computer scientist or software engineer is more than just studying algorithms that people have already written; one should be able to know when to apply which kind of algorithm for a given task, and even be able to come up with novel algorithms.

Choosing the right algorithm is both:

Classic Algorithms

History is important. Over time, many algorithms have become well-known for various reasons. Some of these are:

Summary

We’ve covered:

  • Why data structures and algorithms are important
  • Why they go together
  • Data structures vs. algorithms vs. data types
  • Data types = values + behavior
  • The building blocks of data structures (the big three: objects, arrays, links)
  • Four data structure topologies
  • A super brief overview of the study of algorithms