# Algorithms and Data Structures

Of the many subfields within computer science, algorithms and data structures may be the most fundamental—it seems to characterize computer science like perhaps no other. What is involved in the study of algorithms and data structures?

## Importance

Computer Science encompasses theoretical and practical approaches to computation. It’s been said that the four essential subdisciplines are:

THEORY
OF
COMPUTATION
ALGORITHMS AND DATA STRUCTURES
PROGRAMMING METHODOLOGY AND LANGUAGES
COMPUTER ELEMENTS AND ARCHITECTURE

The ACM has identified eighteen knowledge areas of Computer Science as:

Knowledge AreaCodeSome Topics
Algorithms and ComplexityALAlgorithmic Analysis; Algorithmic Strategies; Data Structures and Algorithms; Automata; Computability; Computational Complexity.
Architecture and OrganizationARDigital Logic and Digital Systems; Machine-level Data Representation; Assembly-Level Machine Organization; Memory System Organization and Architecture; Interfacing and Communication; Functional Organization; Multiprocessing and Alternative Architectures; Performance Enhancements.
Computational ScienceCNModeling and Simulation; Processing; Interactive Visualization; Data, Information and Knowledge; Numeric Analysis; Symbolic Computation; Mathematical Modeling; High-Performance Computing.
Discrete StructuresDSLogic; Sets, Relations, and Functions; Proof Techniques; Basics of Counting; Graphs and Trees; Discrete Probability.
Graphics and VisualizationGVMedia Applications; Digitization; Color Models; Rendering; Modeling; Animation; Visualization.
Human-Computer InteractionHCIInteractive Systems; Testing; Collaboration and Communication; Statistical Methods; Human Factors; Mixed, Augmented, and Virtual Reality.
Information Assurance and SecurityIASDefensive Programming; Threats and Attacks; Network Security; Cryptography; Web Security; Platform Security; Policy and Governance; Digital Forensics; Secure Software Engineering.
Information ManagementIMDatabase Systems; Data Modeling; Indexing; Key-Value, Document, Relational, and Graph Databases; Query Languages; Transaction Processing; Distributed Databases; Physical Database Design; Data Mining; Information Storage and Retrieval; Multimedia Systems.
Intelligent SystemsISKnowledge Representation and Reasoning; Search; Machine Learning; Reasoning Under Uncertainty; Agents; Natural Langauge Processing; Robotics; Speech Recognition and Synthesis; Perception and Computer Vision.
Networking and CommunicationNCNetworked Applications; Reliable Data Delivery; Routing and Forwarding; Local and Wide Area Networks; Reource Allocation; Mobility; Social Networking.
Operating SystemsOSOperating System Organization; Concurrency; Scheduling and Dispatch; Memory Management; Security and Protection; Virtual Machines; Device Management; File Systems; Realtime and Embedded Systems; Fault Tolerance; System Performance and Evaluation.
Platform-Based DevelopmentPBDWeb Platforms; Mobile Platforms; Industrial Platforms; game Platforms.
Parallel and Distributed ComputingPDParallel Decomposition; Communication and Coordination; Parallel Algorithms, Analysis, and Programming; Parallel Architecture; Parallel Performance; Distributed Systems; Cloud Computing; Formal Models and Semantics.
Programming LanguagesPLObject Oriented Programming; Functional Programming; Event-Driven and Reactive Programming; Type Systems; Program Representation; Language Translation and Execution; Syntax Analysis; Semantic Analysis; Code Generation; Runtime Systems; Static Analysis; Concurrency and Parallelism; Type Systems; Formal Semantics; Language Pragmatics; Logic Programming.
Software Development FundamentalsSDFAlgorithms and Design; Fundamental Programming Concepts; Fundamental Data Structures; Development Methods.
Software EngineeringSESoftware Processes; Project Management; Tools and Environments; Requirements Engineering; Software Design; Software Construction; Software Verification and Validation; Software Evolution; Software Reliability; Formal Methods.
Systems FundamentalsSFComputational Paradigms; Cross-Layer Communications; State and State Machines; Parallelism; Evaluation; Resource Allocation and Scheduling; Proximity; Virtualization and Isolation; Reliability through Redundancy; Quantitative Evaluation.
Social Issues and Professional PracticeSPSocial Context; Analytic Tools; Professional Ethics; Intellectual Property; Privacy and Civil Liberties; Professional Communication; Sustainability; History; Economics of Computing; Security Policies, Law, and Crime.

## Why Do They Go Together?

We can’t really separate data structures and algorithms because:

• Computing systems are concerned with the storage and retrieval of information.
• For systems to be economical the data must be organized (into data structures) in such a way as to support efficient manipulation (by algorithms).
• Choosing the wrong algorithms and data structures makes a program slow at best and unmaintainable and insecure at worst.

## Topics

While it’s possible to study algorithms and data structures exclusively at a theoretical level, we often study them together with introductory software engineering concepts; the major topics, grouping them roughly as follows:

• Data: data types, abstract data types, data structures (arrays and links), classes, APIs, collections, OO.
• Algorithms: What is an algorithm?, algorithmic patterns and paradigms (recursion, backtracking, search, etc.), complexity analysis.
• Software development: abstraction, efficiency, robustness, classification, inheritance and composition, polymorphism, software life cycle, methodology, modeling languages, design patterns, and testing.

## Data Types

During the design of a system, before the algorithms are written and before the data structures are sketched out, we must first identify the kinds of objects we need and how they should behave.

Some terminology would be nice here. Surprisingly, not everyone agrees on official terminology. I’ll try my hand and it: A type is a collection of values, for example:

• $\mathbb{Z} = \{..., -2, -1, 0, 1, 2, ...\}$
• $\mathbb{N} = \{0, 1, 2, 3, 4, ...\}$
• $\mathbb{B} = \{\textsf{false}, \textsf{true}\}$
• $\mathbb{R} = \textrm{the set of all real numbers}$
• $\mathbb{C} = \textrm{the set of complex numbers}$
• $\textsf{PrimaryColor} = \{\textsf{RED}, \textsf{GREEN}, \textsf{BLUE}\}$
• $\textsf{Suit} = \{\textsf{SPADES}, \textsf{HEARTS}, \textsf{DIAMONDS}, \textsf{CLUBS}\}$
• $\textsf{TwoDimensionalPoint} = \mathbb{R} \times \mathbb{R}$
• $\textsf{Card} = \{1..13\} \times \textsf{Suit}$
• $\textsf{Deck} = \textrm{the set of all permutations of Card}$
• $\textsf{SmallReal} = \{x \mid x \in \mathbb{R} \wedge 0.0 \leq x \wedge x \leq 1.0\}$
• $\textsf{Color} = \textsf{PrimaryColor} \rightarrow \textsf{SmallReal}$
• $\textsf{IntegerList} = \mathbb{Z}^{*}$
• $\textrm{List of }\alpha = \alpha^{*}$

A data type is a type together with operations. Operations specify the things you can do to the values of the type. We can draw what the types and operations are, for example:

The same thing can be said without pictures, for example:

    CLUBS : Suit         TEN : Rank
HEARTS : Suit        JACK : Rank
DIAMONDS : Suit      QUEEN : Rank
SPADES : Suit        KING : Rank
ACE : Rank           makeCard : Suit × Rank → Card
TWO : Rank           getSuit : Card → Suit
THREE : Rank         getRank : Card → Rank
FOUR : Rank          newDeck : Deck
FIVE : Rank          positionOf : Deck × Card → int
SIX : Rank           cardAt : Deck × int → Card
SEVEN : Rank         topCard : Deck → Card
EIGHT : Rank         shuffle : Deck → Deck
NINE : Rank          split : Deck → Deck

Exercise: Draw a picture, and enumerate operations, for the type of integer lists. An integer list is a sequence of zero or more integers with operations such as obtaining the length, adding an item to a given position, removing the nth item, reversing, finding the position of a given item, etc.

The operations need to be specified in more detail than the pictures show, but it should give you the basic idea.

You will sometimes hear the term abstract data type (ADT). People use this term to emphasize the fact that the “internal structure and internal workings of the operations” are somehow unknown to the outside world. Only the effects of the operations matter. To me, anyway, saying data type is enough; all datatypes are “abstract” in that sense.

### Collections

A collection is a group of elements treated a single object. Collection types can be:

• Homogeneous or heterogeneous
• Unordered or ordered
• Linear, Hierarchical, Networked, or Non-Positional

The latter classification looks like this:

Classification Topology Examples
SET
(NON-POSITIONAL)

Unrelated items, such as people in a class or a hand of cards.
SEQUENCE
(LINEAR)

One-to-one relationship, as in a printer queue, ring LAN, or a line at the deli.
TREE
(HIERARCHICAL)

One-to-many relationship, as in a family tree, organization chart, expression tree, or folders and files in a file system.
GRAPH
(NETWORK)

Many-to-many relationship, as in a road map, schedule, belief network, or communication net.

Operations common to most collections include:

• Creation and destruction
• Adding, removing, replacing, and searching for elements
• Getting the size, or figuring out if it is empty or full
• Enumeration (traversal)
• Equality testing
• Cloning
• Serializing

## Data Structures

A data structure is an arrangement of data for the purpose of being able to store and retrieve information. Usually a data structure is some physical representation of an ADT.

Example: A list (data type) can be represented by an array (data structure) or by attaching a link from each element to its successor (another kind of data structure).

### Building Blocks

Data structures are built with

• Records (a.k.a. structs, classes, hashes), which group entities into a single unit with named fields
• Arrays, which store fixed size entities into indexed, contiguous slots in memory
• Links (a.k.a. references or pointers)

### Choosing the Right Data Structure

The data structure chosen for a data type must meet the resource constraints of the problem. Figure out the constraints before you start coding!

Example: In an accounting system,
• Creation and deletion of a new account can be slow
• Lookup and update of the balance of a given account must be extremely fast.

These are the extremely important questions you must ask to determine acceptable tradeoffs:

• Can the data structure be completely filled at the beginning, or will there be insertions intermingled with deletions, lookups, updates, and other operations?
• Can items be deleted from the structure?
• Will the items always be processed in a well-defined order, or will random-access have to be supported?

### The Classics

History is important. Over time, many data structures have become well-known (for various) reasons. Some of these are:

## Algorithms

We design algorithms to carry out the desired behavior of a software system. Good programmers tend to use stepwise refinement when developing algorithms, and understand the tradeoffs between different algorithms for the same problem.

### Classifying Algorithms

There are several classification schemes for algorithms:

• By problem domain: numeric, text processing (matching, compression, sequencing), cryptology, fair division, sorting, searching, computational geometry, networks (routing, connectivity, flow, span), computer vision, machine learning.
• By design strategy: divide and conquer, greedy, algebraic transformation, dynamic programming, linear programming, brute force (exhaustive search), backtracking, branch and bound, heuristic search, genetic algorithms, simulated annealing, particle swarm, Las Vegas, Monte Carlo, quantum algorithms.
• By complexity: constant, logarithmic, linear, linearithmic, quadratic, cubic, polynomial, exponential.
• Around the central data structures and their behavior: whether (1) organizational: set, linear, hierachical, or network; or (2) implementational: hashtables, skip lists, red-black trees, a-b trees, cartesian trees, neural networks, Bayesian networks, swarms.
• By one of many implementation dimensions: sequential vs. parallel, recursive vs. iterative, deterministic vs. non-deterministic, exact vs. approximate, ....

### Kinds of Problems

This classification has proved useful:

Problem TypeDescription
DecisionGiven $f$ and $x$, is it true that $f(x)$?
FunctionalGiven $f$ and $x$, find $f(x)$
ConstraintGiven $f$ and $y$, find an $x$ such that $f(x)=y$
OptimizationGiven $f$, find the $x$ such that $f(x)$ is less than or equal to $f(y)$ for all $y$.

### Choosing the Right Algorithm

Choosing the right algorithm is both

• an art, because cleverness, insight, ingenuity, and dumb luck are often required to find efficient algorithms for certain problems
• a science, because principles of algorithm analysis and widely applicable algorithmic patterns have been developed over time for you to use.

### The Classics

History is important. Over time, many algorithms have become well-known (for variaous) reasons. Some of these are:
• Euclid’s GCD
• The Fast Fourier Transform
• Miller-Rabin Primality Test
• Wait, I can’t make up a great list, but Wikipedia can. Check out Wikpedia’s List of Algorithms.