We’ll start very simply, with a language that just has integers, addition and subtraction, and print statements. Programs look like this:
print(8 - (3 + 21) + 34); print(55 - 3 + 2);
The concrete syntax can be given by this grammar:
Program = Stmt+ Stmt = "print" "(" Exp ")" ";" Exp = (Exp ("+" | "-"))? Term Term = num | "(" Exp ")" num = digit+
When defining a semantics we don’t care about concrete syntax, only abstract syntax. So here is the abstract syntax of this tiny language, which is all we care about moving on:
$\begin{array}{l} n: \mathsf{Nml} \\ e: \mathsf{Exp} = n\;|\;e + e\;|\;e - e \\ s: \mathsf{Stm} = \mathtt{print}\;e \\ p: \mathsf{Pro} = \mathtt{program}\;s^+ \\ \end{array}$
Need Practice?
For the example program in this section, give both the parse tree and the abstract syntax tree.
A denotational semantics assigns to each syntactic phrase, its meaning. Sounds simple right? So what’s the meaning? For now, we will take:
Hmmm, we said “domain” and not “set.” We’ll define what domains are later. You can think of them as sets for now. But “sort-of” sets. As a first pass, think of $\mathsf{Num}$ as something like $\mathbb{R}_\bot$, the “lifted” set of real numbers with a special $\bot$ value. (Yes, we’ll define lifting later.)
Stop right now and make absolutely sure you know the difference between the syntactic domains (Nml, Exp, Stm, Pro) and the semantic domains (Num, Output).
Domains vs. SetsWhile we will clarify the difference later, for now, remember not to use sets like $\mathbb{R}$ and $\mathbb{B}$ directly in denotational semantics. Instead, you will need to use domains like $\mathbb{R_{\bot}}$ and $\mathbb{B_{\bot}}$, which are almost always abbreaviated as $\mathsf{Num}$ and $\mathsf{Bool}$.
When reading a denotational semantics, it is understood that “everything is a domain,” so it is best not to use sets anywhere.
Note that domains are not used in operational semantics.
Now let’s write the semantic functions that give meaning to the abstract syntactic constructs:
Each of the semantic functions map a syntactic construct to a semantic domain:
Defining Semantic FunctionsNote that the following two lines mean the same thing:
$\mathscr{S}\,[\![ \mathtt{print}\;e ]\!]\,o = o\!\cdot\!\mathscr{E}\,e$
$\mathscr{S}\,[\![ \mathtt{print}\;e ]\!] = \lambda o. o\!\cdot\!\mathscr{E}\,e$
Feel free to use whichever style you like.
Remember Astro?
Here’s the abstract syntax:
What do we need for semantics? Well, we have variables, so expressions will have to be evaluated in the context of a memory:
Statements can affect both the memory (via assignment) and the output (via printing), so a statement transforms a memory and output into a new memory and output:
The meaning of a program is the output it produces:
The output is a sequence of numeric values. A memory stores a value for each identifier, as well as attributes of the identifier such as whether it is writable or not. Values can be numbers, in which case they can be marked read-only or read-write, or functions of $n$ numeric values producing a numeric result:
Remember, we are working with domains, not sets, so all of our domains—Num, Output, etc.—have a special $\bot$ value. The $\bot$ can represent an error in an operation, or some undefined or unknown value, or, in the case of a value looked up from memory, a value that was not found. And all of the operations on domains, such as addition and subtraction on the domain $\mathsf{Num}$, always automatically produce $\bot$ when one of their inputs is $\bot$. This allows a surprisingly concise definition of the semantic functions:
It is not as bad as it looks, but it definitely is concise.
We’ve introduce a new mathematical operator, $\mathsf{match}$ which should be familiar to Python, Swift, Rust, or OCaml programmers. The mathematical expression $\mathsf{match}\;e_1=e_2\;\mathsf{in}\;e_3$ evaluates to $e_3$ after binding variables in $e_1$ and $e_2$ with each other, or to $\bot$ if nothing matches. As in Python and other programming languages, $e_2$ can have multiple alternatives.
We will go over each rule in class.
Let’s add a while statement to Astro and try to give its semantics. The naive approach fails, though:
$\mathscr{S} [\![ \mathtt{while}\;e\;s ]\!](m,o) = \mathsf{if}\;\mathscr{E}\;e\;m = 0\;\mathsf{then}\;(m,o)\;\mathsf{else}\; \mathscr{S} [\![ \mathtt{while}\;e\;s ]\!](\mathscr{S}\;s\;(m,o))$
This definition is non-compositional. Not a good look. We can get rid of the self-reference like this:
or a little more formally:
But the problem is that there may be many such functions $w$ that satisfy the desired condition. But when we give a semantics, we want to be able to say there is one, and exactly one, such function. Let’s assume the existence of a function we will call $\mathit{fix}$ that will give us not just a fixed point of a function, but exactly the fixed point we want. If this function could actually be shown to exist, we could write:
While it may look odd at first, but once you get used to it, it’s not so bad.
But what the heck? Why in the world would we ever assume such a function $\mathit{fix}$ even exists, let alone how would we ever go about finding it? That’s a deep, but extremely important question. Short answer is “because we are using domains and not not sets” but this supposes we’re all comfortable with what domains are, and that will wait until we cover domain theory below. We are not ready though. Let’s go back into our comfort zone and look at a more complex language.
Remember Bella? We saw the official definition of the language earlier with its formal syntax and formal operational semantics.
It’s time for a denotational semantics. We’ll repeat the abstract syntax here for reference:
Let’s see what the semantic functions should look like:
Here we rolled up everything manipulated by a statement (a memory and an output) into a new domain called $\mathsf{State}$ (Turing would be proud: programs are like state machines, right?). We also need a new kind of value for Bella: the user-defined function. Such functions are a list of parameters and an expression for the body. So we define:
Here is a fragment of the definitions of the semantic functions:
TODO
Finish the semantic function definitions for Bella.
TODO
TODO
TODO
TODO
TODO
Many real programming languages have sophisticated type systems. How do we define what they mean?
Suppose we wanted to define Bella so that expressions would produce not only numbers, but numbers or booleans. Suppose we also added lists to the language. As in Python. Dynamic types, and all that. And strong typing, too: so type mismatches cause errors rather than type conversions. We begin with a slight change to the signature for the semantic function for expressions:
TODO
And then we enhance the semantic domain of values:
TODO
TODO
We made a lot of assumptions in the semantic definitions above, among them:
If we’re going to be able to use the mechanisms of denotational semantics, all this stuff has to be grounded in rigorous mathematical foundations. Fortunately, such foundations do exist. First, let’s motivate them.
TODO
TODO
TODO
We’ve covered: