CMSI 3802
Final Exam Preparation

The best way to prepare for your final is to:

Logistics

The final will be made available on BrightSpace around noon on the Monday of finals week and must be submitted by 11:59pm on the Friday of finals week. You will have 120 minutes to do the exam. Choose any two-hour period in the window to take the exam. You must complete the exam in one sitting. All problems will be multiple choice, multiple select, or matching. Submit on time; the penalty for late submissions is 5 points per minute.

Expect 20-30 questions. There will be no nasty all-of-the-above or none-of-the-above options. There will, however, occasionally be questions for which one answer is solid and correct, while others may make true, or nearly-always-true, statements but do not actually fit 100% with the asked question.

The intent of the questions is not to trick you. The intent is to ensure that students that have gained a fairly deep understanding of computer science fundamentals (linguistic and theoretical concepts) without having to do a one-hour web search, will be rewarded with a higher grade. All students can be this kind of student. If you have read the materials (and there was indeed a ton of reading!) and have participated in the writing of your term project (a real compiler!), you are that kind of student.

As always, you MAY use books, notes, and web searches to look things up. You will not be spied on: there is no browser lock down and hence no need to hide a mobile device in a bag of potato chips. However, you MAY NOT solicit answers in any way. There is to be no asking for help, no posting on forums, no communication with other humans or chat bots in any way; you can only “look things up,” you may never “ask.” You also MAY NOT post answers or help any other test taker either. You are bound by an honor code to follow these rules.

Learning Objectives Review

Review the learning objectives from the syllabus now. If there is any unmet objective, let the instructor know. For reference, the objectives are repeated here:

The Essentials

There are several things that all recipients of a computer science degree are expected to know. Ideally, these would be verified by an oral exam with a pass/fail outcome. Don’t worry, such an oral exam is not feasible, and may even be subject to the examinee failing due to nerves, so you’re going to test yourself in the comfort of your own space. Make sure you know the following:

Do you know how to answer each and every question? Can you articulate good answers to each? If so, great! Congratulations! This is the minimal criteria for passing.

Practical Learnings!

You learned some theory and some bits of knowledge. But hopefully, you got so much more. You should have:

Why did you write a compiler?

Course Notes Review

Back to the academic side of things.

Review the course notes covered during lectures.

Course Outline

Here is a rough outline of the course material.

THEORIES OF COMPUTER SCIENCE
    What is a theory?
    Why do we have theories?
    Historical path to computer science as a discipline
    The four major theories of computation and their concerns
        Language Theory
            Concerned with how computations are expressed
        Automata Theory
            Concerned with how computations are performed
            Sneak peek: Turing Machines
            The stunning notion of computational universality
        Computation Theory
            Concerned with what can and cannot be computed
            Sneak peek: the halting problem is undecidable
        Complexity Theory
            Concerned with how efficiently computations can be performed
            Sneak peek: P vs. NP

LANGUAGE THEORY
    Concerned with how computations are expressed
    Why study language theory?
    Information representation
    Formal Language Theory
        Symbols, Alphabets, Strings, Languages
        Operators on languages: Union, Intersection, Concatenation, Kleene Star
        How to formally define a language?
        Grammars
            Role of variables
            Grammar notation
            How strings are generated
            Lots of example grammars
            Parse Trees (aka Derivation Trees)
            Ambiguity
            Formal Definition
            Restrictions
                CFG: LHS is only one variable
                RLG: LHS is only one variable and RHS is symbols + at most one variable
                Type-1: RHS never shrinks (only one exception if empty string is in the language)
        Language Recognition
            Automata can be used for this
            Analytic Grammars
        Language Classification
            Chomsky Hierarchy for Formal Languages (original version)
                Regular (Type 3)
                Context-Free (Type 2)
                Type-1 (aka ”Context sensitive”)
                Unrestricted
            Larger Chomsky Hierarchy
                Finite
                Regular
                Context-Free
                “Context-Sensitive”
                Recursive
                Recursively Enumerable (r.e.)
                Finitely Describable
    Programming Language Theory
        How PLT differs from formal language theory
        Concerns (Just a list for now)
            Syntax
            Semantics
            Type Theory
            Static Analysis
            Translation
            Runtime Systems
            Verification
            Metaprogramming
            Classification

SYNTAX
    Motivation (there is a structure underlying all programs)
    Many ways to express this structure as a string
    Definition of Syntax
    Syntax Diagrams
    Lexical vs. Phrase Syntax
        Why this is massively important
        Ways to represent the difference
    Tokens
    Parse Trees
        The frontier of the parse tree is the token stream
    Dealing with Ambiguity
        Precedence (and how to capture it in a grammar)
        Associativity (and how to capture it in a grammar)
    Parsing (sneak peek only)
        Hand-crafted, recursive descent
        Parser generators
        Analytic Grammars
        PEGs
        Ohm
    The Problem of Context
        Things you cannot capture in a context-free grammar, incomplete list:
            No redeclare within scope
            No use of possibly uninitialized variables
            Type checking
            Correct number of arguments must appear in a call
            Access modifiers must be correct
            All execution paths through a function must end in a return
            All abstract methods must be implemented or declared abstract
            All declared local variables must be used
            All private methods in a class must be used
        Is this stuff syntax or semantics?
            People can disagree
        Side note: can be formalized in theory but why bother
    Type inference
    Abstract Syntax
        What ASTs look Like
        Difference between CSTs (Parse trees) and ASTs
        Tree grammars to formally define ASTs
        Esprima
        Examples in JavaScript
        Examples in Java
    Aside: Different syntax formalisms in the real world

LANGUAGE DESIGN
    Things to know
    Major features of existing programming languages
    Historical Issues
        What Bret Victor says about the 1960s and 1970s
        What Alan Kay thinks
    The process of language design
        Big picture and big questions
        Starter set of features
        Design your abstract syntax
        Sketch and Prototype with Ohm!!!
        Start working on lower-level syntax
        What kind of sugar do you want?
    Differences between syntax, semantics, pragmatics
    Ohm for language design
        Ohm grammar notation
        Ohm details
        Examples of Ohm grammars
    Case study: Astro
    Case study: Bella
    Case study: Carlos

COMPILERS
    Translators vs. interpreters
    Compilers, assemblers, transpilers
    AOT vs. JIT
    Overall structure of translation
        Analysis -> Generation
        Analysis -> Optimization -> Generation
        Parsing -> Static Analysis -> Optimization -> Code Generation
        Lexical Analysis
            characters to tokens
        Syntax Analysis = Parsing
            Tokens to CSTs
        Semantic Analysis = Static Analysis
            CSTs to ASTs
            But ASTs are not really trees
            Type checking and other semantic analysis
        Intermediate Representations
            Why have them?
        Sneak peek: later phases of the compiler
            Control Flow Analysis
            Data Flow Analysis
            Optimization of decorated AST
            Production of high-level language code
            Production of abstract intermediate structures
            Production of bytecode
            Production of abstract assembly language
            Machine independent optimization
        Modern compilers are not just one-shot translators
        How to architect a compiler using Ohm
            parser.js
            analyzer.js
                Representing context
                Checks, especially type checking
            optimizer.js
            generator.js
            core.js
            compiler.js
            <your-language-name>.js
            Tests for compiler, parser, analyzer, optimizer, generator
      Why you should write a compiler

AUTOMATA THEORY
    Concerned with how computations can be carried out
    Broad classification of automata
        Transducers vs Recognizers/Deciders
        Tapes vs Registers
        State Machines vs Instruction Lists
        Harvard vs von Neumann Architecture
    Turing Machines
        How they work
        Many Examples
        Variations that neither restrict nor expand computing power
            Multi-track
            Multi-head
            Multi-tape
            Queue
        Variations that restrict computing power
            LBAs: Bounded tape
            PDAs: Input is read-only, read left-to-right once, memory is a stack
            FAs: Input is read-only, read left-to-right once, no memory
    Register Machines
        Counter machines
        RAMs
    Other “Automata-like” Formalisms
        String rewriting systems
        λ-Calculus
        Brainf**k
        Recursive Functions (not covered in class)
    Applications to Intermediate Representations
        Why have them?
            Analysis/Synthesis is inherent to translation
            Break down complex problem
            Retargetability
            For machine independent optimizations
        High-level vs. Medium-level vs. Low-level
        Styles
            Abstract assembly language (instructions called tuples)
            Stack code
        List of well-known IRs
            JVM
            CLR
            LLVM
            SIL
            CIL
        Tuples
    Applications to Virtual Machines and Real Machines
        Machine Architecture
        How machines work (review)
        Intel 64 architecture
        Review of x86-64 Assembly Language
            Registers and instructions
            Calling conventions
            Parallel instructions
    Code Generation
        Goals
        Translation to JavaScript
        Translation to Assembly Language
            Naïve
            Interpretive
            Code generator generators
        Generation of real assembly language
            Address assignment
            Instruction selection
            Register allocation
            Low-level optimization
        Understanding the runtime system for block-structured languages
            Stack frames
            Dynamic links
            Static links
            Register save area
            Register spilling

PARSING THEORY
    What is parsing?
    Lexical vs Syntactic parsing
    Regular expressions
        In theory (type-3)
        In practice
            Common notation for Regexes in modern languages
            (   )   [   ]  {   }   ^   $   .   \   ?   *   +   |
            Uses: validation, search, extraction, replace
            Groups
            Quantifiers
                Eager: * + ? {}
                Reluctant: *? +? ?? {}?
                Possessive: *+ ++ ?+ {}+
            Backreferences  \1 \2 ...
            Anchors: ^ $ \A \Z \b \B
            Lookarounds: ?= ?! ?<= ?<!
            Performance concerns
    Approaches to parsing
        Top-down, LL, Expand-Match
        Bottom-up, LR, Shift-Reduce
    Recursive Descent
    PEGs
    Parsing in the Real world

COMPUTABILITY THEORY
    Concerned with what can and cannot be computed
    History: Hilbert, Gödel, Church, Turing
        Bernhardt book
        Wadler video
    So many equivalent models, all Turing-complete, hence the Church-Turing Thesis
    Of course there are uncomputable functions: Diagonalization
    Halting Problem is undecidable
    Limits
        Non-computable functions = Non-recognizable languages
        Non-decidable problems = Non-decidable languages
    Reductions
    Rice’s Theorem
    Chomsky Hierarchy: The Full Version
        Finite = S->a|b|c = Non-looping FAs
        Regular = Right Linear Grammar = Finite Automata
        Deterministic Context Free = LR = DPDA
        Context Free = CFG = (N)PDA
        Type-1 = Linear Bounded Automata
        Decidable (Recursive) = Turing Machines that always Halt
        Recognizable = r.e. = Turing Machines
        Finitely Describable (no machine out here)
        Beyond Finitely Describable 🤯

COMPLEXITY THEORY
    Concerned with how expensive certain computations are
    Time complexity
    Space complexity
    Theory
        Big-O, Big-Theta, Big-Omega
        Little-O, Little-Theta, Little-Omega
        Asymptotic Notation
        P vs. NP
        NP-Completeness
        The Complexity Zoo
    Practice: Optimization in Compilers
        Code Optimization
        Machine independent vs. machine dependent
        Constant folding
        Strength reductions
        Algebraic simplifications
        Operand reordering
        Unreachable code elimination
        Dead code elimination
        Copy propagation
        CSE
        Loop unrolling
        Special purpose instructions
            e.g. muladd, range, conditional jump
        Loop invariant factoring
        Tail recursion elimination
        Induction variable simplification
        Static frame allocation
        Stack frame simplification
        Low-level optimizations
            Special instructions
            Alignment
            Cache
            Removing conditional jumps
            Scheduling to remove load delays and similar things

Knowledge Check

On the course practice page, do the reinforcement problems related to the portions of the course that the exam will cover.

A Skills Check

Here are things you should be able to do before taking the final. Quiz yourself. Quiz each other.

Practice Exam

Do the Practice Final on BrightSpace! It is not as long as the real final, but making sure you know how to use BrightSpace as a platform for taking tests will help make sure you don't have a bad day and get frustrated by the technology during the real exam.

Advice for Success on the Exam

You have to put in the time for effortful self-study. Although the exam is open resources, you will not have time to look everything up. Those who come in with a strong comfort level with the material will finish on time. I am assessing your fluency and your proficiency with the material, not your Google-Fu.

Advice for Success in Computer Science

An education is a long-term life journey. Education goes way beyond your chosen field and way beyond academics in general. That said, there is much to be gained by immersing oneself in the history, theory, and practice, of computer science. Our culture is primarily literary, so to that end, you have been assigned a great deal of reading? Were you able to read or skim everything? I hope so, but if not, find time to catch up (or at least please consider catching up in the near future). Among the readings that will be helpful in your journey to becoming a computer scientist, review: