How to Prepare
Students best prepare for a final when they have certain things:
- An outline of the topics that were covered in the course
- A review of the salient points, learning objectives, and why they took the course
- Some practice problems, with answers, that are similar to what would be on the exam
- An understanding of the expected exam scope
I suggest going over the outline and concept review below, making sure you understand each topic by going back over the course notes, the relevant textbook sections, and any associated course readings. But don’t just re-read, cram, and highlight; test yourself with well-crafted, meaningful questions. Hopefully you’ve been doing some spaced repetition along the way.
It may also be helpful to browse many of the course notes pages on material we did not cover. There is a ton of good stuff there. Sometimes this extra material might expose you to ways in which the basic material you did learn can be used in practice.
Glance over the course textbook, especially in the intro and summary sections for the languages we covered in class. It is filled with stuff I like to think about, and since I will be writing the final, well, it never hurts to have just a little more comfort with the things I feel are important.
Do reinforcement and practice problems.
Course Outline
Reviewing this outline will jog your memory. We had fun, didn’t we?
- DATA STRUCTURES AND ALGORITHMS BIG PICTURE
- The Study of Data Structures and Algorithms
- Why are they important
- Why are they studied together
- Connected through data types
- Data Types
- First approximation: set of values
- Better: sets of values plus operations
- All about behavior
- Data Structures
- Three building blocks: arrays, objects, links
- Arrays and objects can be nested inside each other
- Links allow the same object to be used in multiple places
- Null links require great care to manage (Billion Dollar Mistake)
- THE JAVA PROGRAMMING LANGUAGE
- Bootcamp
- Making waves (Getting started)
- Coin flipping (final, printing, random numbers)
- Egg counting (console reading, parseInt, try-catch, formatting)
- Elementary school vibes (command line args, arrays, fields and methods, static, throw)
- Day of the week (import, java.time, Locales, ?:)
- Nonsense sentences (lists, maps, joining strings)
- JShell interlude (BigInteger, string operations, code points, more on arrays and collections)
- Small countries (sets, splitting strings, loops, break)
- Taking a chance (building your own classes from scratch, multiple-file apps)
- How does this even work? (Unit testing, JUnit 5, Assertions)
- Playing cards (records and arrays)
- Talking animals (inheritance, polymorphism, abstract classes and methods)
- Basics
- Structure of a command line application
- Writing trivial apps with public static void main(String[] args)
- Types: primitives (8 of them) and reference types (5 ways to make them)
- Variables and type inference
- Printing
- Numbers
- Integers vs Floats
- Precision and accuracy
- Operations, both built in and from class Math
- Booleans
- && and ||
- ?:
- Strings
- Made up of char values
- chars are not really characters
- .charAt, .length, .substring, and .indexOf work on char values, so they often produce unexpected results
- In general, you need to work with code points
- " for single-line strings, """ for multi-line strings
- isEmpty, isBlank, strip, startsWith, endsWith, contains, repeat, replace, +
- String.valueOf, Integer.parseInt
- Characters and Unicode code points
- Objects
- What they are for
- Every value in Java is either a primitive or an object
- Fields and methods
- Dot notation
- Some methods are mutators, some are not
- Classes
- How to design software around classes
- Identity vs equality
- Java objects always accessed through references
- Record classes
- Pure awesomeness
- Immutable
- Automatically get private final fields, accessors, and value-based .equals, hashCode, and toString
- Can define a constructor if you want, which is good for validating arguments
- null and all its nastiness
- Enum classes
- Interfaces
- Basically represent behaviors
- All fields are automatically public and static and final (you don’t have to mark them as such, they just are)
- Methods are automatically public
- Methods are either (1) abstract, (2) static, or (3) default
- Generally are implemented by classes
- Statements
- Empty statements, assignment statements, calls
- If and switch
- For and while
- break, continue
- return
- try, throw
- Arrays
- Fixed size, cannot grow or shrink
- Mutable
- Indexes start at 0
- Not very many operations (e.g., no index-of, no swap)
- A few operations in Arrays, such as Arrays.equals(a, b) and Arrays.toString(a)
- Lists
- So many variations possible
- Modifiable or unmodifiable
- Fixed-size or variable-size
- Array-based or link-based
- Lots of built-in methods
- List.of makes unmodifiable lists
- For modifiable lists, use new ArrayList<type> or new LinkedList<type>
- More operations in Collections
- Generally preferred to arrays
- Cannot contain primitives, only objects
- Maps
- Key-value pairs are called entries
- Modifiable and unmodifiable versions possible
- For modifiable maps, use new HashMap<keytype, valuetype> or new TreeMap<keytype, valuetype>
- Lots of built-in operations
- Cannot contain primitives, only objects
- Exceptions
- Advanced (not covered early in class but picked up later as needed)
- When to use public and private
- Abstract classes
- Inheritance
- Polymorphism
- Generics
- Arrays are covariant, Collections are invariant
- Sealed classes and interfaces
- Nesting classes (e.g., nodes)
- DATA STRUCTURES FROM SCRATCH
- Stacks
- LIFO, with push and pop
- Implementations: bounded array, singly-linked chain, expandable array
- Push and pop operate at top of stack only
- In linked implementation, nodes can be immutable
- Time needed for push and pop not influenced by size of stack
- Queues
- FIFO, with enqueue and dequeue
- Implementations: bounded array, singly-linked circular chain, expandable array
- Enqueue at back of queue, dequeue at front
- Bounded array uses “modular indexing”
- Much more complex implementation than stacks
- Nodes in linked implementation generally not immutable
- Time needed for enqueue and dequeue not influenced by size of queue
- Lists
- Unrestricted sequences, can add and remove anywhere at all
- Large number of operations
- Good mutable implementation is circular doubly-linked
- Good immutable implementation is singly-linked using a sealed interface with two subtypes: EmptyList and ListNode
- Recursion common in immutable singly-linked version
- MATHEMATICAL FOUNDATIONS
- Math
- What math is
- Symbols and notation
- Sets
- Tuples
- Relations
- Functions
- Lambda Notation
- Numbers
- Logarithms
- Algorithm Analysis
- What exactly are we analyzing?
- Why we measure relative, not absolute, time
- The idea of abstract operations
- Choosing $n$, the input size
- Several examples of determining $T(n)$
- The common (and some not-so-common) complexity classes
- Asymptotic Analysis (O, $\Theta$, $\Omega$)
- The effects of increasing input size
- The effects of a faster computer
- Recursion
- What it is and why it is important
- Recursion in Nature, Math, Language, Art, Music, Computer Science
- Recursive algorithms and functions
- Efficiency concerns with recursion
- Tips for when using recursion is okay
- The usefulness of recursive datatypes
- Sorting
- Demos and interactive widgets
- Classification by Mechanism
- By Exchange (e.g., Bubble, Cocktail, Gnome, Comb)
- By Selection (e.g., Selection, Heap)
- By Insertion (e.g., Insertion, Shell, Tree)
- By Merge (e.g., Merge)
- By Distribution (e.g., Radix, Counting)
- Hybrids (e.g., Tim, Intro)
- Classification by Complexity
- $\Theta(n^2)$ comparison sorts
- $\Theta(n \log n)$ comparison sorts
- $\Theta(n)$-ish distribution sorts
- Concurrent sorts
- Impractical sorts
- The surprising fun of the impractical sorts
- Stooge
- Bogo and Bozo and Bogogo
- Bad
- Worst
- Miracle
- Search
- How to think about search
- Binary Search
- Sets
- Maps (Dictionaries)
- How sets and dictionaries are alike
- Representations of sets and dictionaries
- ADVANCED DATA STRUCTURES
- Trees
- Three main application areas
- Networks: for minimal connectivity
- Hierarchy Modeling
- Search: to make it logarithmic!
- Uses in networks (Free Trees)
- Minimal Connectivity applications
- Base vocabulary: Node, Edge, Size, Path, Path Length, Simple Path
- Models of hierarchies (Oriented Trees)
- Hierarchy applications: geography, books, HTML, source code, file systems
- Other representations: tables, Venn diagrams, outline, string with nested parentheses
- New vocabulary: root, leaf, internal node, parent, child, sibling, ancestor, descendant, width, height, depth, level, degree, 0-node, 1-node, 2-node, k-node
- Traversal
- Breadth-First (Implemented with Queue)
- Depth-First (Implemented with Stack or written Recursively)
- Ordered Trees
- Binomial
- Fibonacci
- k-ary
- Have a natural representation
- If dense, these can go in arrays
- Good to know about complete trees
- There is some elegant math around binary trees
- In full (perfect) binary trees, there are more nodes on the last level than on all the levels above combined
- Tree-based implementations of sets and dictionaries (Search Trees)
- Classic Search Trees
- Set elements go fully in the nodes themselves
- Elements must be mutually comparable
- Also called navigable sets/maps or sorted sets/maps
- Kinds
- (a,b), B, B+
- Naive Binary Search Trees
- Good Binary Search Trees: AVL, Red-Black, Splay, Scapegoat
- Prefix/Radix Trees (Tries)
- Set elements spread out among nodes
- Elements need NOT be mutually comparable
- Technically, elements are stored in paths (neat, right?)
- Kinds
- Tries with huge numbers of children
- Tries over a restricted alphabet (usually binary)
- “Ternary Search Trees” (not a great name, but whatever)
- Hash Tables
- For sets and dictionaries where elements are not compared with each other
- Apply a hash function to an object to find the index it should be at
- Need to be able to either avoid or deal with collisions
- Hash functions
- need to give good distribution (should not have clustering)
- need to run efficiently
- Best case for add, lookup, and remove is $\Theta(1)$
- Worst case for add, lookup, and remove is $\Theta(n)$
- On average, with good hash function and a good-sized table, essentially $\Theta(1)$
- Hash table resizing
- Expensive, but only extremely rarely done
- Rule of thumb: double hash table size when around 70% full
- Collision resolution strategies
- Linear Probing
- Quadratic Probing
- Cells store linked lists
- Cells store binary search trees
- Bitsets (sadly, not covered this semester)
- Heaps
- For priority queues
- Complete binary trees (not search trees) with each node having higher priority than its children
- Internal operations are sift up and sift down
- Add and remove are both $\Theta(\log n)$
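The outline mentions an immutable singly-linked list built from a sealed interface with two subtypes, EmptyList and ListNode. Here is a minimal sketch of that idea, not the version from the course notes. It assumes Java 17 or later. The interface name ImmutableList, the class name ImmutableListSketch, and the operations size, contains, and prepend are hypothetical choices made for illustration.

```java
import java.util.Objects;

public class ImmutableListSketch {

    // A list is either empty, or a node holding one item plus the rest of the list.
    // (The name ImmutableList and its operations are illustrative assumptions.)
    sealed interface ImmutableList<T> permits EmptyList, ListNode {
        int size();
        boolean contains(T item);

        // Returns a NEW list with the item in front; the receiver is never changed.
        default ImmutableList<T> prepend(T item) {
            return new ListNode<>(item, this);
        }
    }

    // Base case for every recursive operation.
    record EmptyList<T>() implements ImmutableList<T> {
        public int size() { return 0; }
        public boolean contains(T item) { return false; }
    }

    // Recursive case: answer for this node, then delegate to the tail.
    record ListNode<T>(T head, ImmutableList<T> tail) implements ImmutableList<T> {
        public int size() { return 1 + tail.size(); }
        public boolean contains(T item) {
            return Objects.equals(head, item) || tail.contains(item);
        }
    }

    public static void main(String[] args) {
        ImmutableList<String> empty = new EmptyList<>();
        ImmutableList<String> one = empty.prepend("b");
        ImmutableList<String> two = one.prepend("a");
        System.out.println(two.size());         // 2
        System.out.println(one.size());         // 1 (prepending did not change it)
        System.out.println(two.contains("b"));  // true
    }
}
```

Because nodes are records with no mutators, the list two can safely share the node of one as its tail. The recursion here is linear in the length of the list, so a very long list could overflow the call stack.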
Things Learned in the Course
You want value from your education, right? So think about the major themes of the course. Think about what you learned. Think about things you can take from the course to be better in everything you do.
We learned a lot of things:
- Much of the science of computation involves storing and retrieving data. We must learn and apply the best practices for arranging data for efficient retrieval and manipulation. We must learn about data structures.
- The building blocks of data structures are arrays, objects, and links.
- Java is an interesting language.
- While programming languages often have a rich set of data types including stacks, queues, and lists, learning how to implement these from scratch has an enormous number of benefits.
- For sequences such as stacks, queues, and lists, we can use array-based or link-based implementations.
- Immutability is a wonderful thing.
- Mutable and immutable versions of a data type often have radically different interfaces (e.g., doubly-linked mutable lists vs. singly-linked immutable lists).
- When implementing a data type with a data structure, we want to expose only the methods and make sure the underlying representation and implementation are completely unknowable and untouchable by any code outside the type. This is not only crucial for security but also gives us flexibility to change our representation or implementation without breaking the code that uses our type.
- There is an amazing and powerful knowledge area called Algorithm Analysis which lets us quantify time and space complexities. The Big-Θ, Big-O, and Big-Ω meanings and notations are absolutely essential for computer scientists.
- Constant time and logarithmic time are great. Linear and linearithmic (n log n) are good. Quadratic and cubic are good only on small data sets. Beyond that you need to be really careful: your data set needs to be extremely small.
- The philosophical chasm between the polynomial and the exponential is quite something to behold.
- Recursion has a basis in nature. In programming it can help us be extremely expressive, and sometimes it’s the only natural way to do something. Just be careful not to be wasteful: if a loop is better, use a loop; if the recursion repeats sub-computations, memoize it!
- There are a gazillion ways to sort, and we should know lots of representative approaches, as this gives us a sense of how different algorithms and structures are applicable in different situations.
- Search is a massive topic. As programmers we tend to use sets and dictionaries for our searching needs; for searching data outside a program (like in files or to search the entire WWW), different techniques are needed that are outside the scope of this course (though many ideas you learn here will still be useful).
- We primarily search with trees (regular or prefix) or hash tables. Other mechanisms do exist but are rarer.
- Search trees can in general have an unbounded number of children, but the binary search tree might be the most common due to its simplicity. BSTs should be balanced or otherwise support efficient operations, hence we learn about AVL, Red-Black, Splay, and Scapegoat trees. Non-binary search trees are often used in database indexing (outside the scope of this course).
- Ternary Search Trees are a super cool kind of trie and useful in spell checking and auto completion applications.
- Hashing is wild, and in some cases gives us “constant” time search. But there are tradeoffs.
- Heaps are freaking amazing and the implementation of priority queues using them is quite clever.
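To make the "memoize it" advice about recursion concrete, here is a small sketch using Fibonacci numbers as the example. Fibonacci, the class name MemoizedFibonacci, and the method names slowFib and fib are my own illustrative choices rather than something taken from the course notes.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoizedFibonacci {

    // Remembers every answer already computed, keyed by n.
    private static final Map<Integer, Long> cache = new HashMap<>();

    // Plain recursion: recomputes the same sub-problems over and over,
    // so the number of calls grows exponentially with n.
    static long slowFib(int n) {
        return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
    }

    // Memoized recursion: each value of n is computed at most once,
    // so the number of calls grows linearly with n.
    static long fib(int n) {
        if (n < 2) return n;
        Long cached = cache.get(n);
        if (cached != null) return cached;
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(slowFib(20)); // 6765
        System.out.println(fib(50));     // 12586269025
    }
}
```

The sketch uses an explicit get and put instead of computeIfAbsent, because a recursive mapping function that modifies the same HashMap can throw ConcurrentModificationException. Results beyond fib(92) overflow a long, so BigInteger would be needed for larger inputs.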
Suggested Practice Problems
Do the course practice problems that relate to the course material over the semester.
Also, test yourself on the problems from the earlier exams in the course. Copy out the problems but not the answers. The next day, test yourself on them. Practice spaced-repetition if possible.
What To Expect
There will be 20 multiple choice and multi-select problems.
The final will be “open” all week, from Monday, December 12, through Friday, December 17. You may take the exam in whatever 2-hour period in this window is convenient for you. You are bound to the LMU Honor Code.
The exam will be on BrightSpace and have multi-select and multiple-choice problems—the kinds you are used to from previous exams.
A list of the main areas of concern for each of the problems will appear on Slack, so stand by for those!