Algorithms and Data Structures

Of the many subfields within computer science, algorithms and data structures may be the most fundamental—it seems to characterize computer science like perhaps no other. What is involved in the study of algorithms and data structures?

Importance

Computer Science encompasses theoretical and practical approaches to computation. It’s been said that the four essential subdisciplines are:

THEORY
OF
COMPUTATION
ALGORITHMS AND DATA STRUCTURES
PROGRAMMING METHODOLOGY AND LANGUAGES
COMPUTER ELEMENTS AND ARCHITECTURE

The ACM has identified eighteen knowledge areas of Computer Science as:

Knowledge AreaCodeSome Topics
Algorithms and ComplexityALAlgorithmic Analysis; Algorithmic Strategies; Data Structures and Algorithms; Automata; Computability; Computational Complexity.
Architecture and OrganizationARDigital Logic and Digital Systems; Machine-level Data Representation; Assembly-Level Machine Organization; Memory System Organization and Architecture; Interfacing and Communication; Functional Organization; Multiprocessing and Alternative Architectures; Performance Enhancements.
Computational ScienceCNModeling and Simulation; Processing; Interactive Visualization; Data, Information and Knowledge; Numeric Analysis; Symbolic Computation; Mathematical Modeling; High-Performance Computing.
Discrete StructuresDSLogic; Sets, Relations, and Functions; Proof Techniques; Basics of Counting; Graphs and Trees; Discrete Probability.
Graphics and VisualizationGVMedia Applications; Digitization; Color Models; Rendering; Modeling; Animation; Visualization.
Human-Computer InteractionHCIInteractive Systems; Testing; Collaboration and Communication; Statistical Methods; Human Factors; Mixed, Augmented, and Virtual Reality.
Information Assurance and SecurityIASDefensive Programming; Threats and Attacks; Network Security; Cryptography; Web Security; Platform Security; Policy and Governance; Digital Forensics; Secure Software Engineering.
Information ManagementIMDatabase Systems; Data Modeling; Indexing; Key-Value, Document, Relational, and Graph Databases; Query Languages; Transaction Processing; Distributed Databases; Physical Database Design; Data Mining; Information Storage and Retrieval; Multimedia Systems.
Intelligent SystemsISKnowledge Representation and Reasoning; Search; Machine Learning; Reasoning Under Uncertainty; Agents; Natural Langauge Processing; Robotics; Speech Recognition and Synthesis; Perception and Computer Vision.
Networking and CommunicationNCNetworked Applications; Reliable Data Delivery; Routing and Forwarding; Local and Wide Area Networks; Reource Allocation; Mobility; Social Networking.
Operating SystemsOSOperating System Organization; Concurrency; Scheduling and Dispatch; Memory Management; Security and Protection; Virtual Machines; Device Management; File Systems; Realtime and Embedded Systems; Fault Tolerance; System Performance and Evaluation.
Platform-Based DevelopmentPBDWeb Platforms; Mobile Platforms; Industrial Platforms; game Platforms.
Parallel and Distributed ComputingPDParallel Decomposition; Communication and Coordination; Parallel Algorithms, Analysis, and Programming; Parallel Architecture; Parallel Performance; Distributed Systems; Cloud Computing; Formal Models and Semantics.
Programming LanguagesPLObject Oriented Programming; Functional Programming; Event-Driven and Reactive Programming; Type Systems; Program Representation; Language Translation and Execution; Syntax Analysis; Semantic Analysis; Code Generation; Runtime Systems; Static Analysis; Concurrency and Parallelism; Type Systems; Formal Semantics; Language Pragmatics; Logic Programming.
Software Development FundamentalsSDFAlgorithms and Design; Fundamental Programming Concepts; Fundamental Data Structures; Development Methods.
Software EngineeringSESoftware Processes; Project Management; Tools and Environments; Requirements Engineering; Software Design; Software Construction; Software Verification and Validation; Software Evolution; Software Reliability; Formal Methods.
Systems FundamentalsSFComputational Paradigms; Cross-Layer Communications; State and State Machines; Parallelism; Evaluation; Resource Allocation and Scheduling; Proximity; Virtualization and Isolation; Reliability through Redundancy; Quantitative Evaluation.
Social Issues and Professional PracticeSPSocial Context; Analytic Tools; Professional Ethics; Intellectual Property; Privacy and Civil Liberties; Professional Communication; Sustainability; History; Economics of Computing; Security Policies, Law, and Crime.

Why Do They Go Together?

We can’t really separate data structures and algorithms because:

Topics

While it’s possible to study algorithms and data structures exclusively at a theoretical level, we often study them together with introductory software engineering concepts; the major topics, grouping them roughly as follows:

Data Types

During the design of a system, before the algorithms are written and before the data structures are sketched out, we must first identify the kinds of objects we need and how they should behave.

Some terminology would be nice here. Surprisingly, not everyone agrees on official terminology. I’ll try my hand and it: A type is a collection of values, for example:

A data type is a type together with operations. Operations specify the things you can do to the values of the type. We can draw what the types and operations are, for example:

types.png

The same thing can be said without pictures, for example:

    CLUBS : Suit         TEN : Rank
    HEARTS : Suit        JACK : Rank
    DIAMONDS : Suit      QUEEN : Rank
    SPADES : Suit        KING : Rank
    ACE : Rank           makeCard : Suit × Rank → Card
    TWO : Rank           getSuit : Card → Suit
    THREE : Rank         getRank : Card → Rank
    FOUR : Rank          newDeck : Deck
    FIVE : Rank          positionOf : Deck × Card → int
    SIX : Rank           cardAt : Deck × int → Card
    SEVEN : Rank         topCard : Deck → Card
    EIGHT : Rank         shuffle : Deck → Deck
    NINE : Rank          split : Deck → Deck
Exercise: Draw a picture, and enumerate operations, for the type of integer lists. An integer list is a sequence of zero or more integers with operations such as obtaining the length, adding an item to a given position, removing the nth item, reversing, finding the position of a given item, etc.

The operations need to be specified in more detail than the pictures show, but it should give you the basic idea.

You will sometimes hear the term abstract data type (ADT). People use this term to emphasize the fact that the “internal structure and internal workings of the operations” are somehow unknown to the outside world. Only the effects of the operations matter. To me, anyway, saying data type is enough; all datatypes are “abstract” in that sense.

Collections

A collection is a group of elements treated a single object. Collection types can be:

The latter classification looks like this:

Classification Topology Examples
SET
(NON-POSITIONAL)

set.gif

Unrelated items, such as people in a class or a hand of cards.
SEQUENCE
(LINEAR)

linear.gif

One-to-one relationship, as in a printer queue, ring LAN, or a line at the deli.
TREE
(HIERARCHICAL)

tree.gif

One-to-many relationship, as in a family tree, organization chart, expression tree, or folders and files in a file system.
GRAPH
(NETWORK)

graph.gif

Many-to-many relationship, as in a road map, schedule, belief network, or communication net.

Operations common to most collections include:

Data Structures

A data structure is an arrangement of data for the purpose of being able to store and retrieve information. Usually a data structure is some physical representation of an ADT.

Example: A list (data type) can be represented by an array (data structure) or by attaching a link from each element to its successor (another kind of data structure).

Building Blocks

Data structures are built with

Choosing the Right Data Structure

The data structure chosen for a data type must meet the resource constraints of the problem. Figure out the constraints before you start coding!

Example: In an accounting system,
  • Creation and deletion of a new account can be slow
  • Lookup and update of the balance of a given account must be extremely fast.

These are the extremely important questions you must ask to determine acceptable tradeoffs:

The Classics

History is important. Over time, many data structures have become well-known (for various) reasons. Some of these are:

Algorithms

We design algorithms to carry out the desired behavior of a software system. Good programmers tend to use stepwise refinement when developing algorithms, and understand the tradeoffs between different algorithms for the same problem.

Classifying Algorithms

There are several classification schemes for algorithms:

Kinds of Problems

This classification has proved useful:

Problem TypeDescription
DecisionGiven $f$ and $x$, is it true that $f(x)$?
FunctionalGiven $f$ and $x$, find $f(x)$
ConstraintGiven $f$ and $y$, find an $x$ such that $f(x)=y$
OptimizationGiven $f$, find the $x$ such that $f(x)$ is less than or equal to $f(y)$ for all $y$.

Choosing the Right Algorithm

Choosing the right algorithm is both

The Classics

History is important. Over time, many algorithms have become well-known (for variaous) reasons. Some of these are: