Do you like spaced repetition learning? Have you used Anki or Quizlet? Whether or not spaced repetition works, or works for you, periodically working on flash-card like questions can be a lot of fun, and just may help you retain information. Here are a few problems tied to the course material. Visit them periodically!
s → s* | "(" s ")" | "[" s "]" | "{" s "}"
define? Exp → Exp addop Term | Term Term → Term mulop Factor | Factor
break
and continue
in a loop (as this would require difference classes of statements)return
in a function body (as this would require difference classes of statements)8 * (13 + 5)
, how many source code characters are there? When tokenized, how many tokens are there? How many nodes are there in the abstract syntax tree? 8 * (13 + 5)
, assuming a grammar with categories named Expression, Term, Factor, Primary, and intlit. Expression | Term / | \ Term * Factor | / | \ Factor ( Expression ) | / | \ Primary Expression + Term | | | intlit Term Factor | | Factor Primary | | Primary intlit | intlit
8 * (13 + 5)
. * / \ 8 + / \ 13 5
/
operator of a PEG and the |
operator of a context-free grammar. (Hint: show a case in which the language defined by two “look-alike” grammars, one PEG, one CFG, are different. A ← "a" / "ab"
only matches "a"
, while the similar CFG defines the language {a, ab}.A = A "+" A --plus | "a"which looks ambiguous. Ohm actually makes the operator here be right associative, so given the fact that Ohm is an implementation and always parses strings exactly one way, then no, its grammars are not ambiguous; however, this isn’t really specified anywhere so maybe yeah, that grammar in some sense is ambiguous. See issues 55 and 56 for more information.
E = E • T --binary | T
E = T • E --binary | T
E = T • T
E = E binaryop "(" E ")" | num
a<b<c
. Some people think it should be automatically expanded to a<b && b<c
Exp7 = "-" Exp8 | Exp8 "**" Exp7 | Exp8
sqrt
into a standard library (as opposed to being wired into the language, or left to a “third party” library? A = B | C
differ from the context-free grammar rule $A \rightarrow B \mid C$? ?
, *
, and +
mean in Ohm? WhileStmt = "while" Exp Block
? How do we fix this problem? while = "while" ~idrest
and redefine the while statement rule as WhileStmt = while Exp Block
."//" (~"\n" any)* "\n"
Factor = "-" Primary -- negation | Primaryis actually an abbreviation for two separate rules. Give those two rules.
Factor = Factor_negation | Primary Factor_negation = "-" Primary
Exp = Term ("+" Term)*
fails to capture what aspect of the +
operator in the syntax? A ~B
matches an A
that is not followed by a B
. How do we match an A
that is followed by a B
(without consuming the B)? &(A B) A
g
is a grammar and s
is a string, what does g.match(s)
return? tsc
a compiler or a transpiler? Do we care? x = y * (2 + z)
. LOAD y LOAD 2 LOAD z ADD MUL STORE x
console.log(() => {if (true) return;})
this
. Also, the current set of imported packages, to resolve uses of imported entities.x ? y : z
of a typical statically-typed language, what type checking and inference rules would a compiler be required to enforce? x
must be boolean and the types of y
and z
must be compatible. Inference: the type of the entire expression is the least general type of both y
and z
.x < y < z
. So what exactly happens when the compiler encounters this code fragment (assuming all variables are in scope)? x < y
is checked. That is either a type error or a perfectly valid comparison assigned the type boolean
. Booleans can’t be compared anyway, so the entire expression is always a type error, detectable at compile time! But it is NOT a syntax error.^ $ . ? * + | ( ) [ ] { } \
^([A-Fa-f\d]).{1,6}\1$
e{25}$
]
, ^
, and -
are interpreted in a regex character class.s
flag affect a regular expression? s
is on, the dot matches the newline character.m
flag affect a regular expression? m
is on, the ^
and $
matches not only the start and end of a string, but the start and end of each line as well.\p{L}
working in JavaScript, what flag do you need? u
flag. This is needed because Unicode support for regular expressions was added late in JavaScript’s life, and had a new flag not been added to enable this interpretation, backward compatibility would have been lost.\p{L}
) or a $
, and the following characters are letters, numbers, dollar signs, or periods, but in which you cannot have two periods in a row.^
and $
stand for either the beginning and end of a line, OR the beginning and ending of the whole string, depending on a certain flag setting. How can we match the beginning and ending of a string, regardless of that flag setting? \A
and \Z
.?:
at the beginning of a parenthesized expression in a regex? (?!)
(?<=)
(?<!)
s.replace(/social(?=\s+distancing)/ig, "physical")
produce? s
with all occurrences of the phrase social distancing
—in any case, with any amount of whitespace between the two words—replaced with the phrase physical distancing
with the same whitespace between the two words.The following problems generally require some research and a bit of time to work out solutions.
Some of the problems may refer to languages you have never heard of! If so, you can try solving the same problem with a language you are familiar with, or, better, look up the basics of the unfamiliar language so that you can take your best shot at the problem.
These problems are courtesy of Phil Dorin.
s = (x y z)* -- repeat xyz 0 or more times x y = y x -- switch them up all possible ways x z = z x y x = x y y z = z y z x = x z z y = y z x = "a" -- erase the variables y = "b" z = "c"
s = (l r)? -- start with left part and right part l = "a" l? x -- generate a's on the left, counting them with x's r = "b" r "d" | y -- generate equal nums of b's and d's leaving one y in the middle x "b" = "b" x -- move the x's to the right, in order to generate c's x y = y "c" -- when the x hits the y, make a c for it y = ε -- erase the y to finish it off
s = "a" x? "b" z "c" -- initial set up x = "a" x "b" y -- if you generate an a, you must do b and c also | x? "b"? y -- generate bc or c (keeping things increasing) y "b" = "b" y -- move y's to the right y z = z "c" -- when y hits the z, make a c z = ε -- when the z is no longer needed, drop it
HELLO
.EVEN,a,a,R,EVEN EVEN,b,b,R,ODD ODD,a,a,R,ODD ODD,b,b,R,EVEN EVEN,#,#,L,ACCEPT
Don’t worry if there are languages here you don’t know. Do some research. Learn something new today.
Some of these problems refer to syntactic forms not covered in lecture, but should be answerable after studying my course notes on Syntax.
S → aab | aba | baa | aaSb | abSa | baSa | aSab | aSba | bSaa | SS
Among the reasons that I have wanted to write is that, many years ago, my colleague, Ray Toal, whom you know, passed along a set of problem solutions that he received while studying at UCLA. They were for the 181 course—his instructor at the time was a fellow named Gabriel Robins, which probably tells you how long ago this was!—and they contained an error that I had always meant to report to you. (It’s been so long now that it has probably been corrected, but I’ll sleep a lot better once I’ve sent this off.) Specifically, the problem was to give a cfg that generated the set of all strings over alphabet $\{a,b\}$ with exactly twice as many $a$s as $b$s, which, I believe, was also a problem in an earlier edition of Hopcroft and Ullman. In any event, he gave the following solution, which he attributed to Lui:
S → SS S → aaSb S → abSa S → baSa S → aSab S → aSba S → bSaa S → aab S → aba S → baa
Now, I am going feel awfully much like an idiot if I am wrong about this, but... how does this grammar produce the string $aaabbbbaaaaa$ (that is, three $a$s, followed by four $b$s, followed by five $a$s)? I have managed to prove to myself that it simply can NOT produce this string, and I wonder if I should trouble you to look at it and let me know. (Technically, the grammar is also missing a rule for producing the empty string, which is also in the language, but that’s another matter.)
I do believe that a correct grammar is:
S → [empty string] S → SS S → aSaSb S → aSbSa S → bSaSa
I’ve also worked the problem from the other direction: I constructed a npda, converted it to a cfg, and simplified it (by removing useless symbols, etc.)—but the resulting grammar doesn’t look anything like the above ones, so this didn’t provide much new insight.
Prof. Greibach gave a nice reply, and as part of it managed to state almost nonchalantly the “obvious” proof (at least to her—in a single sentence!) of non membership of $aaabbbbaaaaa$:
It does indeed fail on the example you gave since the first rule applied could not be any of those starting S → a... or S → b... and S → SS cannot be used because the example is not the concatenation of 2 words in the language.
IDLIST → ID ^ ","
This form can also be used to model a construct representing one or more $A$s, rather than using $AA^*$ or $A^*A$. Show how to do this.
Exp = Exp1 ("and" Exp1)* | Exp1 ("or" Exp1)* Exp1 = Exp2 (relop Exp2)? Exp2 = "-"? Exp3 (addop Exp3)* Exp3 = Exp4 (mulop Exp4)* Exp4 = Exp5 ("**" Exp5)? | "not" Exp5 | "abs" Exp5 comment = "--" ~"\n" any
and
and or
?
X and Y or Z
. (Assume, of course, that an Exp5 can lead to identifiers and numbers, etc.) If this is not possible, prove that it is not possible.
not
operator right associative? Why or why not?
-8 * 5
.
- Exp5
to Exp4. Give the abstract syntax tree for the expression -8 * 5
according to the new grammar.
Suppose I wanted to add a new one:
Show how to write $A \# B \# C$ using only the conventional EBNF markup.
"oo"
(or "oO"
or "Oo"
or "OO"
).
"oo"
.
"oo"
. Do this by matching against a regular expression.
[01]*(10111[01] | 11[01][01][01][01])[01]*
([bc]*a[bc]*a[bc]*)*
0*1 | 0*10
c*a[ac]*b[abc]*
"return"
nor "retry"
"exit"
nor "exec"
"exit"
nor "exec"
\b
for word boundaries) that are preceded by the word “the
”.IFSTMT → 'if' '(' EXP ')' BLOCK ('else' 'if' '(' EXP ')' BLOCK)* ('else' BLOCK)? BLOCK → '{' STMT* '}'What if we tried the same approach in a language with a syntax like Ruby (or Fortran or Modula — languages using a terminating end)? We might get a grammar like this:
IFSTMT → 'if' EXP 'then' STMT+ ('else' 'if' EXP 'then' STMT+)* ('else' STMT+)? 'end'Is this grammar left recursive? Is it $LL(k)$? Why or why not? Is this bad?
A → B C B → a | b?c? C → c | BA
If this grammar is not LL, make one that is (that defines the same language of course). Give a set of syntax diagrams for the original diagram, and if another is needed, for the new grammar as well.
S = A M M = S? A = "a" E | "b" A A E = ("a" B | "b" A)? B = "b" E | "a" B B
"abaa"
Exp ::= id ':=' Exp | Term TermTail TermTail ::= ('+' Term TermTail)? Term ::= Factor FactorTail FactorTail ::= ('*' Factor FactorTail)? Factor ::= '(' Exp ')' | id
Exp
expand to a string beginning with id
.X
, make it so that X^ = X
. If this is not possible, state why it is not possible.
X
, make it so that X.all = X
. If this is not possible, state why it is not possible.
x
of type x
such that x.x = x
, or state why this is impossible.
x
of type x
such that x.x == x
, or state why this is impossible.
(x += 7) *= z
but you can’t say this in C. Explain the reason why, using precise, technical terminology. See if this same phenomenon holds for conditional expressions, too. What other languages behave like C++ in this respect?
continue
statement of C.
type Real_To_Real is access function (Real) return Real; type Foo is access procedure (Integer; in out Boolean); Sine, Cosine: Real_To_Real; P: Foo; Q: Real_To_Real; function Integrate (F: Real_To_Real; A, B: Real); ... function Square (X: Real) return Real is begin return X * X; end; ... Put (Integrate(Square'Access, 3, 10)); Q := Cosine; if Q(Pi) > X then ...
Describe the semantic rules relating to this facility in Ada, and how you would enforce them in a compiler.
A := (3)
gives a static semantic error when $A$ is a one-element array of Integer. Why is this so? Propose a (trivial) syntactic extension to Ada that would remove this irritation.
X: Integer := X + 1; Foo: Foo; Bar: Real := Bar(Foo);
(where global declarations of X
, Foo
and Bar
are visible) are all illegal, since a declaration of an identifier hides global declarations of the same name immediately at the point it appears in the text, but the identifier may not be used until its declaration is complete. Give an alternate interpretation under which these declarations would be legal and explain the advantages and disadvantages of it from both the programmer’s and the compiler writer’s perspectives.
int f(int x, char y); void g(int x) {if (x < 0) f(2, 'c');} int f(int x, char y) {g(randomInteger());}
In C++, when f
is finally declared, the names of the formal parameters don’t have to be repeated exactly as they appeared in the incomplete specification. But in Ada they do. Explain why the Ada rule makes life much easier for the compiler writer.
DESIGNATOR → DESIGNATOR "." IDfor specifying variables made up from a record and a field of the record. But sometimes it can have the additional interpretation that the
DESIGNATOR
to the left of the dot was the name of a (visible) subprogram and the ID was an object declared immediately inside that subprogram. Show how to rearchitect the entity class hierarchy to support this.
let [x, y] = Array.repeat(10|-2, 2); function f({x, y}, ...p) { return q => `"Say ${y/p[0].x} today`; }
static protected synchronized long g(Object... m) { for (int y : f(x)) { x = p.data[0] * (3<< 7|- x---c); } }
for (int i = x-3; q<=4&m.z[r |- 4]&2-8*r>- 5/~x;) { while (a) { y; 2,y; } }
void f(int x,...) { struct e { double x; struct e *c[10]; char* (*f)(); }; struct e p; exit(p.c[1]->f()[6 |~ x+2 >> x]); }
int abc() { return x = 4&x---*&y.m[-9]; }
if (x > 2 || !String.matches(f(x))) { write(-3 * q); } else if (! here || there) { do { while (close) tryHarder(); x = x >>> 3 & 2 * x; } while (false); q[4].g(6) = person.list[2]; } else { throw up; }
package p; class C implements A { public static A x = new t[3]; Socket s () { while (x - 6>p | e || q +- p) { this.x[3] = !v+++t; } } {System.out.println("ooh");} }
(a = 3) >= m >= ! & 4 * ~ 6 || y %= 7 ^ 6 & p
double f(int x, double y) { return 4 * x + y; }
je L6 jmp L4 L6:
with the single instruction jne L4
?
X := Y;
where X
and Y
are both 32-bit integers that are one step down the static chain from the current subprogram, by a code generator which emits access code for the two values independently. Assume $X$ is at offset $-8$ and $Y$ is at offset $-12$. How many registers are used? Then generate code for this statement by hand, intelligently.
type array (21..38) of String(1..10)and happened to have offset $-42$ in the frame of the subprogram in which it was declared. Suppose further that the variable
J
was declared in the same subprogram and had offset $-26$.
A(J-1)
into register eax that would be generated naïvely. Do not forget to show the bounds checking!
A(J-1)
into register eax in which the "-1" computation is folded in to the computation of the base address of A. Note that the bounds checking code will look a little different than in part (a).
mov [ebp-8], eax mov eax, [ebp-8]
The second instruction might be able to be removed. But whether we are able to remove this instruction is undecidable. Why, exactly?
enter
instruction which automatically makes a display. Research this instruction. Suppose a Carlos program had the following structure (indentation determines nesting):
function f, parameters: [x,y], locals: [a] function g, parameters: [c], locals: [p,q,r,s] function h, parameters: [a], locals: [] function k, parameters: [], locals: [z]
enter n, 0 enter n, 1 enter n, 2 enter n, 3 ...
and so on. The second column will be the number of clock cycles required on a Pentium for the particular ENTER instruction. The third column will be code equivalent to the ENTER instruction. For example, ENTER n, 1 is equivalent to:
push ebp mov ebp, esp push ebp sub esp, n
The fourth column will be the number of clocks for the code in column 3.
x / y > (3 * x) || z || x < 3
where the "||" operator is short-circuit, and the variables $x$, $y$, and $z$ are all integer variables. Put the value of the expression in eax. Write the best possible code you can for the Pentium 4 processor.
LEA
instruction for the 3n+1 computation:
int C(int n) { int count = 0; while (n != 1) { n = (n % 2 == 0) ? n / 2 : 3 * n + 1; } return count; }
if (x % 4096 == 0) {printf("Don't say \66;\6f;\6f;!");}
Hint: you need strength reduction, too.
sar eax, 10 to divide by 1024 sar eax, 8 to divide by 256
This optimization is not safe. Explain why. Show how to make it safe, and explain both why your optimization works and why it is safe.
for j := 5 to y do y := j * 7 + c; printInteger(y - 4); end loop;
where y
and c
are local variables in the current procedure at offsets -12 and +16 respectively. Remember that the range is evaluated only once, the whole loop is skipped on the empty range, etc.). Make sure you respect the overflow semantics! Identify any induction expressions and explain how you optimized them. Compare your hand-written code with that generated by a real compiler.
fyl2x
and fldl2t
instructions.
double f(double a, double b);
mov eax, n shl eax, 23 add eax, 3f800000h mov [esp-4], eax fld dword [esp-4]
y := x * 4 + z; z := p * y; y := z; x := z / y << x;
Weekdays := Day_Set(False, True, True, True, True, True, False);
we could construct the aggregate directly in the variable Weekdays
. Give an example of an assignment statement that illustrates the necessity of constructing an aggregate in temporary storage (before copying to the target variable).
for x (a) {printf("*"); x = x++;}
in Hana.
x---y
x-----y
x+++-y
x---+y
null instanceof C
!!x
x < y < z
in Hana, where x and y are ints and z is a boolean.
x < y < z
in C, where x and y and z are all ints.
3[a]
in Hana, where a
is an array variable.
3[a] in C
, where a
is an array variable.
char x = '\a';
in Hana.
char x = '\a';
in C.
sin
standard function to an integer
x < y < z
where $x$ and $y$ are integers, and $z$ is a boolean.
L1: r0 := x z := 6 r1 := 4 - r0 r2 := 3 >= r1 if r2 == 0 goto L2 r3 := y + 4 r4 := *r3 z := r4 L2:
struct s {int x; int y; string s;} s a = new s { codepoint(getChar()), codepoint(getChar()), getString()}; while (a.x++ < a.y) {print($a.s[1]);}
for (int i = 0; i < #a; i++) { print($a[i]); print(", ") if (i != #a-1); }
With optimizations turned off, my compiler produces:
p0: copy 0, i1 L0: copy [i0-4], r0 less i1, r0, r1 jz r1, L1 assert_not_null i0 copy [i0-4], r2 assert_in_range i1, 0, r2 mul i1, 4, r3 add i0, r3, r4 copy [r4], r5 to_string r5, r6 param r6 call __print, 4 copy [i0-4], r7 sub r7, 1, r8 not_equal i1, r8, r9 jz r9, L2 param s0 call __print, 4 L2: inc i1 jump L0 L1: exit s0: [44, 32]
x || y || z
under this assumption.
PROGRAM → (DECL ';')* EXPR DECL → 'val' ID '=' EXPR | fun ID '(' PARAMS? ')' '=' EXPR EXPR → NUMLIT | ID | UOP EXPR | EXPR BOP EXPR | EXPR '?' EXPR ':' EXPR | ID '(' ARGS? ')' | '(' EXPR ')' PARAMS → ID (',' ID)* ARGS → EXPR (',' EXPR)* UOP → '-' | 'abs' | 'not' BOP → '+' | '-' | '*' | '/' | 'mod' | 'and' | 'or' | '==' | '<'
Exp → Exp Exp op | intlit op → "+" | "-" | "*" | "/"
G -> (S s G)? S -> V q e | i f E g | V x V -> i | V d i | V a E a E -> n | V
B -> (a|b)*A | bba*c A -> Ac | d
down deg color 1 0 0 left 90 forward 4 color 0 0 1 [ left 90 forward 1.5 ] right 90 forward 1.5 up
This program draws the letter T with a red vertical line of size 4 units and topped with a 3 unit blue line. A program is a sequence of instructions. The instructions are:
Instruction | Description |
---|---|
deg | switch to degree mode |
rad | switch to radians mode |
down | put the pen down so movements draw lines |
up | pick the pen up so movements don't draw anything |
left θ | turn counterclockwise by angle θ
|
right θ | turn clockwise by angle θ
|
forward n | draw a line by moving forward n units. |
backward n | draw a line by moving backward n units. |
color r g b | set color (r,g,b), values are floats in the range 0 to 1. |
[ | save current state |
] | restore previously saved state |