The best way to prepare for your final is to:
You will take the final on BrightSpace during the allotted time for the exam, which is Monday, May 12, 2025, from 11:00 to 1:00pm America/Los Angeles time. You will have 120 minutes to do the exam. All problems will be multiple choice, multiple select, or matching. Submit on time; the penalty for late submissions is 5 points per minute.
Expect 20-30 questions. There will be no nasty all-of-the-above or none-of-the-above options. There will, however, occasionally be questions for which one answer is solid and correct, while others may make true—or nearly-always-true—statements but do not actually fit 100% with the asked question.
The intent of the questions is not to trick you. The intent is to assess whether you have gained a fairly deep understanding of computer science fundamentals (linguistic and theoretical concepts) without having to do a one-hour web search. All students can be this kind of student. If you have read the materials (and there was indeed a ton of reading!) and have participated in the writing of your term project (a real compiler!), you are that kind of student.
As always, you MAY use books, notes, and web searches to look things up. You will not be spied on: there is no browser lock down and hence no need to hide a mobile device in a bag of potato chips. However, you MAY NOT solicit answers in any way. There is to be no asking for help, no posting on forums, no communication with other humans or chat bots in any way; you can only “look things up,” you may never “ask.” You also MAY NOT post answers or help any other test taker either. You are bound by an honor code to follow these rules.
Review the learning objectives from the syllabus now. If there is any unmet objective, let the instructor know. For reference, the objectives are repeated here:
There are several things that all recipients of a computer science degree are expected to know. Ideally, these would be verified by an oral exam with a pass/fail outcome. Don’t worry, such an oral exam is not feasible, and may even be subject to the examinee failing due to nerves, so you’re going to test yourself in the comfort of your own space. Make sure you know the following:
Do you know how to answer each and every question? Can you articulate good answers to each? If so, great! Congratulations! This is the minimal criteria for passing.
You learned some theory and some bits of knowledge. But hopefully, you got so much more. You should have:
Why did you write a compiler?
Back to the academic side of things.
Review the course notes covered during lectures.
CMSI 3801 and 3802 enable you to tell the story of the discipline of computer science and thus prepare you to succeed as a practitioner in the field and even make theoretical contributions to the field. We can understand this story roughly as follows:
The field of theoretical computer science, and its connections to programming language definition, usage, and implementation, is so much broader than what can be fully grasped in one year of college study. You’ve been given a taste of the field, thrown into writing a compiler, and challenged with difficult questions without much preparation other than “hopefully you did all the readings and gain some insight form what you read”. React to these experiences with a desire to learn those things beyond your comfort zone, and not with a depressive feeling that you should have been able to do every assignment or learn every concept.That is not how the world is.
Here is a rough outline of the course material. The outline is not exactly in the order the material was presented during lecture, since the optimal ordering of learning is not the same as the overall outline of a subject.
Note that because our course was only one semester, certain things were not covered, and are so marked with the strikeout decoration.
THEORIES OF COMPUTER SCIENCE What is a theory? Why do we have theories? Historical path to computer science as a discipline Information Computation in Antiquity (early recipes) Early ”Machines” Mathematics and Philosophy Formalization of Computation Turing Electronic Computers Programming Languages Compilers Optimization Computing for People The four major theories of computation and their concerns Language Theory Concerned with how computations are expressed Automata Theory Concerned with how computations are performed Sneak peek: Turing Machines The stunning notion of computational universality Computation Theory Concerned with what can and cannot be computed Sneak peek: the halting problem is undecidable Complexity Theory Concerned with how efficiently computations can be performed Sneak peek: P vs. NP LANGUAGE THEORY Concerned with how computations are expressed Why study language theory? Information representation Formal Language Theory Symbols, Alphabets, Strings, Languages Operators on languages: Union, Intersection, Concatenation, Kleene Star How to formally define a language? Generative Grammars Role of variables Grammar notation How strings are generated Lots of example grammars Parse Trees (aka Derivation Trees) Ambiguity Formal Definition (won’t be on the exam) Restrictions CFG: LHS is only one variable RLG: LHS is only one variable and RHS is symbols + at most one variable ENG: Never shrinks (except you can have the rule s->ε) Language Recognition Automata can be used for this Analytic Grammars Language Classification Chomsky Hierarchy for Formal Languages (original version) Regular (Type 3) Context-Free (Type 2) Type-1 (aka ”Context sensitive”) Unrestricted (Type 0) Larger Chomsky Hierarchy Finite Regular Context-Free “Context-Sensitive” Recursive Recursively Enumerable (r.e.) Finitely Describable Programming Language Theory How PLT differs from formal language theory Concerns (Just a list for now) Syntax Semantics Type Theory Static Analysis Translation Runtime Systems Verification Metaprogramming Classification SYNTAX Motivation (there is a structure underlying all programs) Many ways to express this structure as a string Definition of Syntax Syntax Diagrams Lexical vs. Phrase Syntax Why this is massively important Ways to represent the difference Tokens Parse Trees The frontier of the parse tree is the token stream Dealing with Ambiguity Precedence (and how to capture it in a grammar) Associativity (and how to capture it in a grammar) Parsing (sneak peek only) Hand-crafted, recursive descent Parser generators Analytic Grammars PEGs Ohm The Problem of Context Things you cannot capture in a context-free grammar, incomplete list: No redeclare within scope No use of possibly uninitialized variables Type checking Correct number of arguments must appear in a call Access modifiers must be correct All execution paths through a function must end in a return All abstract methods must be implemented or declared abstract All declared local variables must be used All private methods in a class must be used Is this stuff syntax or semantics? People can disagree Side note: can be formalized in theory but why bother Type inference Abstract Syntax What ASTs look Like Difference between CSTs (Parse trees) and ASTs Tree grammars to formally define ASTs Esprima Examples in JavaScript Examples in Java Aside: Different syntax formalisms in the real world LANGUAGE DESIGN Things to know Major features of existing programming languages Historical Issues What Bret Victor says about the 1960s and 1970s What Alan Kay thinks The process of language design Big picture and big questions Starter set of features Design your abstract syntax Sketch and Prototype with Ohm!!! Start working on lower-level syntax What kind of sugar do you want? Differences between syntax, semantics, pragmatics Ohm for language design Ohm grammar notation Ohm details Examples of Ohm grammars Case study: Astro Case study: Bella Case study: Carlos COMPILERS Translators vs. interpreters Compilers, assemblers, transpilers AOT vs. JIT Overall structure of translation Analysis -> Generation Analysis -> Optimization -> Generation Parsing -> Static Analysis -> Optimization -> Code Generation Lexical Analysis characters to tokens Syntax Analysis = Parsing Tokens to CSTs Semantic Analysis = Static Analysis CSTs to ASTs ASTs are pretty much DAGs, so best to call them program representations Type checking and other semantic analysis Storing types with expressions in the DAG nodes Reps have nodes not in the AST, e.g. actual functions and variables Design decisions: can we simplify the storage of numbers and types? Intermediate Representations Why have them?Sneak peek: later phases of the compilerControl Flow Analysis Data Flow Analysis Optimization of decorated AST Production of high-level language code Production of abstract intermediate structures Production of bytecode Production of abstract assembly language Machine independent optimization Modern compilers are not just one-shot translators How to architect a compiler using Ohm parser.js analyzer.js Representing context Checks, especially type checking optimizer.js generator.js core.js compiler.js <your-language-name>.js Tests for compiler, parser, analyzer, optimizer, generator Why you should write a compiler AUTOMATA THEORY Concerned with how computations can be carried out Broad classification of automata Transducers vs Recognizers/Deciders Tapes vs Registers State Machines vs Instruction Lists Harvard vs von Neumann Architecture Turing Machines How they work Many Examples Variations that neither restrict nor expand computing power Multi-track Multi-head Multi-tape Queue Variations that restrict computing power LBAs: Bounded tape PDAs: Input is read-only, read left-to-right once, memory is a stack FAs: Input is read-only, read left-to-right once, no memory Register Machines Counter machines RAMs Other “Automata-like” Formalisms String rewriting systems λ-Calculus Brainf**k Recursive Functions (not covered in class)Applications to Intermediate RepresentationsWhy have them? Analysis/Synthesis is inherent to translation Break down complex problem Retargetability For machine independent optimizations High-level vs. Medium-level vs. Low-level Styles Abstract assembly language (instructions called tuples) Stack code List of well-known IRs JVM CLR LLVM SIL CIL TuplesApplications to Virtual Machines and Real MachinesMachine Architecture How machines work (review) Intel 64 architecture Review of x86-64 Assembly Language Registers and instructions Calling conventions Parallel instructions ARM 64 (AArch64) Registers and instructions Calling conventions Parallel instructions The nastiness of conditional jumps and how to avoid them Code Generation Goals Translation to JavaScriptTranslation to Assembly LanguageNaïve Interpretive Code generator generatorsGeneration of real assembly languageAddress assignment Instruction selection Register allocation Low-level optimizationUnderstanding the runtime system for block-structured languagesStack frames Dynamic links Static links Register save area Register spilling PARSING THEORY What is parsing? Lexical vs Syntactic parsing Regular expressions In theory (type-3) In practice Common notation for Regexes in modern languages ( ) [ ] { } ^ $ . \ ? * + | Uses: validation, search, extraction, replace Groups Quantifiers Eager: * + ? {} Reluctant: *? +? ?? {}? Possessive: *+ ++ ?+ {}+ Backreferences \1 \2 ... Anchors: ^ $ \A \Z \b \B Lookarounds: ?= ?! ?<= ?<! Performance concernsApproaches to parsingTop-down, LL, Expand-Match Bottom-up, LR, Shift-ReduceRecursive DescentPEGsParsing in the Real worldCOMPUTABILITY THEORY Concerned with what can and cannot be computed History: Hilbert, Gödel, Church, Turing Bernhardt book Wadler video So many equivalent models, all Turing-complete, hence the Church-Turing Thesis There exist noncomputable Functions We show this by diagonalization Halting Problem is undecidable Limits Non-computable functions = Non-recognizable languages Non-decidable problems = Non-decidable languages Reductions Rice’s Theorem Chomsky Hierarchy: The Full Version Finite = S->a|b|c = Non-looping FAs Regular = Right Linear Grammar = Finite Automata Deterministic Context Free = LR = DPDA Context Free = CFG = (N)PDA Type-1 = Linear Bounded Automata Decidable (Recursive) = Turing Machines that always Halt Recognizable = r.e. = Turing Machines Finitely Describable (no machine out here) Beyond Finitely Describable 🤯 COMPLEXITY THEORY Concerned with how expensive certain computations are Time complexity Space complexityTheoryBig-O, Big-Theta, Big-Omega Little-O, Little-Theta, Little-Omega Asymptotic Notation P vs. NP NP-Completeness The Complexity Zoo Practice: Optimization in Compilers Code Optimization Machine independent vs. machine dependent Constant folding Strength reductions Algebraic simplifications Operand reordering Unreachable code elimination Dead code elimination Copy propagation Common subexpression elimination Loop unrollingSpecial purpose instructionse.g. muladd, range, conditional jumpLoop invariant factoringTail recursion elimination Induction variable simplificationStatic frame allocationStack frame simplificationLow-level optimizationsSpecial instructions Alignment Cache Removing conditional jumps Scheduling to remove load delays and similar things
Here are things you should be able to do before taking the final. Quiz yourself. Quiz each other.
You have to put in the time for effortful self-study. Although the exam is open resources, you will not have time to look everything up. Those who come in with a strong comfort level with the material will finish on time. I am assessing your fluency and your proficiency with the material, not your Google-Fu.
There will be 20 questions. They will be scrambled for each student. The focus of each of the questions are as follows:
An education is a long-term life journey. Education goes way beyond your chosen field and way beyond academics in general. That said, there is much to be gained by immersing oneself in the history, theory, and practice, of computer science. Our culture is primarily literary, so to that end, you have been assigned a great deal of reading? Were you able to read or skim everything? I hope so, but if not, find time to catch up (or at least please consider catching up in the near future). Among the readings that will be helpful in your journey to becoming a computer scientist, review: