Programs are constructed from entities. Entities refer to, store, and manipulate values. Some entities are pure language structures, like declarations, expressions, and statements. Literals are entities that directly represent values. But other entities can be named.
A binding is the attachment of a name to an entity.
Examples of names | Examples of entities
---|---
`x`, `count`, `employee_name`, `print`, `+`, `<=`, `>>=` | variables, constants, parameters, functions, types, classes, modules, packages
Names allow us to refer to entities.
Note how names can be alphanumeric or symbolic.
Fun fact: In most languages, the set of symbolic names is fixed, but in a few, like Scala, Swift, Haskell, and Standard ML, you can define your own.
A declaration brings a name into existence; a definition also creates an entity and establishes the initial binding.
```cpp
extern int z;               // just a declaration
class C;                    // just a declaration
double square(double);      // just a declaration

void f() {                  // declaration plus definition together
    int x;                  // also a declaration plus definition
    int y = x + z / square(3);   // so is this
}

double square(double x) {   // definition completing earlier declaration
    return x * x;
}
```
The purpose of a definition-less declaration is to allow the name to be used, even though the definition will be somewhere else (in a place you might not even know).
Here is something very tricky but very, very important. What do you think the meaning of the following is?
```
x ← 3
x ← 5
```
Is there one entity here? Or two? If you are a language designer, it’s your choice!
If one, we are changing the value of an entity, which we call assignment, in which case we call the entity an assignable. (Assignment can be dangerous, because it makes programs harder to reason about. But that’s a different story.)
Here are two languages that do things differently. JavaScript does assignment:
```javascript
let x = 3           // create entity, value is 3, bind x
const f = () => x   // f returns value of entity bound to x
x = 5               // assignment: replace value of entity
f()                 // ==> 5
```
And Standard ML does rebinding:
```sml
val x = 3;           (* create entity, bind to x *)
val f = fn () => x;  (* f returns value of entity NOW bound to x *)
val x = 5;           (* COMPLETELY NEW BINDING *)
f();                 (* ==> 3, because value of original binding *)
```
Instead of rebinding a name to a new entity, what if we allowed the same name to refer to different entities at the same time? Oooooh. And, and.... turning things around, what if we allowed multiple names to be bound to the same entity at the same time?
We have words for these things:
- **Polymorphism**: the same name bound to multiple entities at the same time
- **Aliasing**: multiple names bound to the same entity at the same time
A precise way to say “at the same time” is “within the same referencing environment.” Technically, a referencing environment is the set of bindings in effect at a given point.
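For instance, in this little Python sketch (the names are made up purely for illustration), the referencing environment at the marked point contains the bindings for `z`, `y`, `x`, and the built-in names:

```python
x = 10                 # binds x in the global scope

def outer():
    y = 20             # binds y in outer's scope
    def inner():
        z = 30         # binds z in inner's scope
        # Right here, the referencing environment contains the bindings
        # for z (local), y (enclosing), x (global), and built-ins like print.
        print(x + y + z)
    inner()

outer()                # prints 60
```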
In a few languages, multiple functions can have the same name at the same time provided their signatures (the number, types, and order of their parameters), or their receiver class, differ. Check out this C++ example:
```cpp
#include <iostream>
#include <string>
using namespace std;

bool f(string s) { return s.empty(); }
int f(int x, char c) { return x * c; }

int main() {
    cout << f("hello") << '\n';   // 0
    cout << f(75, 'w') << '\n';   // 8925
}
```
Polymorphism is interesting because, when given a name, a program has to find the right entity for the name to use! If the right entity can be determined at compile time, like in the example above, we have static polymorphism. If it can only be determined at run time, we have dynamic polymorphism, like in this Ruby example:
```ruby
class Animal; end

class Dog < Animal
  def speak() = 'woof'
end

class Rat < Animal
  def speak() = 'squeak'
end

class Duck < Animal
  def speak() = 'quack'
end

animals = (1..50).map { [Dog.new, Rat.new, Duck.new].sample }
animals.each { |a| puts a.speak }
```
Here’s another example of dynamic polymorphism, this time in Julia. It may look like static polymorphism, but Julia checks those types at run time:
```julia
function f(s::String)
    reverse(s)
end

function f(x::Int, y::Char)
    x * Int(y)
end

print(f("Hello"))   # "olleH"
print(f(75, 'π'))   # 72000
```
Evaluating that script in the console shows you Julia has “methods,” a word commonly associated with dynamic lookup (we’ll hear more about this when we cover OOP):
```
julia> function f(s::String) reverse(s) end
f (generic function with 1 method)

julia> function f(x::Int, y::Char) x * Int(y) end
f (generic function with 2 methods)

julia> f("Hello")
"olleH"

julia> f(75, 'π')
72000
```
Aliasing is pretty common in languages with explicit pointers:
```cpp
#include <iostream>
using namespace std;

int main() {
    int x = 5;
    int* p = &x;
    cout << p << '\n';    // 0x7fff5febb0fc
    cout << &x << '\n';   // 0x7fff5febb0fc
    cout << x << '\n';    // 5
    cout << *p << '\n';   // 5
}
```
A pointer is a reference whose value is a numeric address, one you can manipulate with arithmetic operations. At least, that’s the general definition; different languages tend to tweak the vocabulary.
C++ has explicit aliases! Perhaps unfortunately, they are called references. This is kind of annoying because in basic PL speak, a reference is a value that refers to another value. But C++ does not use the regular terminology. Instead it confuses you by taking a perfectly good term and using it for what everyone else calls an alias.
```cpp
#include <iostream>
using namespace std;

int main() {
    int x = 5;
    int& y = x;                      // x and y refer to the same entity
    x = 3;
    cout << x << ' ' << y << '\n';   // 3 3
    y = 8;
    cout << x << ' ' << y << '\n';   // 8 8
}
```
Bindings have (generally spatial) scope and (temporal) extent. The scope of a binding is the region in a program in which the binding is active. We say scope is “generally” spatial because:

- with static scoping, the scope really is a region of the program text; but
- with dynamic scoping, which binding is in effect depends on the flow of execution, so scope is really more temporal than spatial.
Regardless of whether scoping is static or dynamic, the deal is this: when you leave a scope, the binding is deactivated.
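A minimal Python sketch of that idea (any block- or function-scoped language behaves similarly):

```python
def f():
    x = 1        # the binding of x becomes active here
    print(x)     # fine: we are inside the scope of the binding

f()              # prints 1
print(x)         # NameError: the binding was deactivated when we left f
```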
Static scoping is determined by the structure of the source code:
```ada
procedure P (X: Integer) is
   A, B: Integer;

   procedure Q (Y: Integer) is

      function R (X, Y: Integer) return Integer is

         procedure T is
            T: Integer;
         begin
            ...
         end T;

         W: Float;
      begin
         ...
      end R;

      procedure S (Y: Integer) is
      begin
         ...
      end S;

   begin
      ...
      declare X: Integer; begin ... end;
      ...
   end Q;

   M: Integer;
begin
   ...
end P;
```
The general rule is that bindings are determined by finding the “innermost” declaration, searching outward through the enclosing scopes if needed.
Interestingly, not everything is obvious! There are reasonable questions regarding:

- where exactly a scope begins and ends;
- whether an inner declaration can hide (shadow) an outer one, and whether the outer binding remains reachable;
- whether declarations are explicit (keywords, type names) or implicit; and
- what happens when a name is redeclared within the same scope.
Where exactly a static scope begins and ends has at least four reasonable possibilities. The scope of a declaration might be:

- from the point of declaration to the end of the innermost enclosing block;
- the entire innermost enclosing block, even for uses that appear before the declaration;
- the entire innermost enclosing block, but with uses before the declaration being errors (like JavaScript `let` or `const`); or
- the whole function containing the declaration (like JavaScript `var`).
```
var x = 3
var y = 5
function f() {
  print(y)
  var x = x + 1
  var y = 7
  print(x)
}
f()
```

What is the effect of compiling and running the code under each of the four possibilities above?
In static scoping, “inner” declarations hide, or shadow, outer declarations of the same identifier, causing a hole in the scope of the outer identifier’s binding. But when we’re in the scope of the inner binding, can we still see the outer binding? Sometimes, we can:
In C++:
```cpp
#include <iostream>
using namespace std;

int x = 1;
namespace N {
    int x = 2;
    class C {
        int x = 3;
        void f() {
            int x = 4;
            cout << ::x << N::x << this->x << x << '\n';   // would print 1234
        }
    };
}
```
In Ada:
```ada
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

procedure P is
   X: Integer := 1;
   procedure Q is
      X: Integer := 2;
   begin
      Put(X);    -- writes 2
      Put(P.X);  -- writes 1
   end Q;
begin
   Q;
end P;
```
However, some languages have restrictions on hiding declarations:
```c
void f() {
    int x = 1;
    if (true) {
        int x = 2;
    }
}
```
is fine in C, but illegal in Java, because Java does not allow a local variable in an inner block to shadow another local variable of the same method.
JavaScript has a somewhat interesting take on the interplay of scopes and nested blocks, with very different semantics between declarations introduced with `var`, `const`, or `let`.
Exercise: Research the difference between `var` and `let` in JavaScript, as it pertains to scope. You should run across something called the temporal dead zone (TDZ). Explain the TDZ in your own words.
Generally, languages indicate declarations with a keyword, such as `var`, `val`, `let`, `const`, `my`, `our`, or `def`. Or sometimes we see a type name, as in `int x = 5;` (C) or `X: Integer := 3;` (Ada).
When we use these explicit declarations in a local scope, things are pretty easy to figure out, as in this JavaScript example:
```javascript
let x = 1, y = 2;
function f() {
  let x = 3;        // local declaration
  console.log(x);   // uses local x
  console.log(y);   // uses global y, it’s automatically visible
}
f();                // writes 3 and 2
```
Easy to understand, for sure, but be super careful: leaving off the declaration keyword in JavaScript (outside of strict mode) updates, and introduces if necessary, a global variable:
```javascript
let x = 1;
function f() {
  x = 2;            // DID YOU FORGET let?
}
f();                // Updated global variable! Accident or intentional?
console.log(x);     // 2
```
Forgetting the declaration-introducing keywords has happened in the real world, with very sad consequences.
But in Python, there are no declaration keywords and no type annotations on variables. Declarations are implicit. How does this work?
```python
x = -5
def f():
    print(abs(x))
f()   # prints 5
```
Sure enough, we can see the globals `abs` and `x` inside `f`. Now consider:
```python
x = -5
def f():
    x = 3
f()
print(x)   # prints -5
```
Aha, so if we assign to a variable inside a function, that variable is implicitly declared to be a local variable. Now how about:
```python
x = -5
def f():
    print(x)
    x = 3
f()
print(x)
```
This gives `UnboundLocalError: local variable 'x' referenced before assignment`. The scope of the local `x` is the whole function body!
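If you really do want to assign to the module-level `x` from inside the function, Python makes you say so explicitly with the `global` keyword; here is a small sketch:

```python
x = -5

def f():
    global x     # declare that x here means the module-level x
    print(x)     # prints -5
    x = 3        # assigns to the global entity

f()
print(x)         # prints 3
```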
Now Ruby’s declarations are also implicit. Let’s see what we can figure out here:
```ruby
x = 3
def f() = puts x   # Error when called!
f()
```
WOAH! `NameError: undefined local variable or method 'x' for main:Object`. We can’t see that outer `x` inside of `f`! If we want to play this game, we must use global variables, which in Ruby start with a `$`:
```ruby
$x = 3
def f()
  x = 5
  y = 4
  p [$x, x, y]
end
f()   # writes [3, 5, 4]
```
So assigning to a variable declares it as a local, and variables starting with a `$` are global.
CoffeeScript also has no explicit declarations. But it has a controversial design. If you assign to a variable inside a function, and that variable does not already have a binding in an outer scope, a new local variable is created. But if an outer binding does exist, you get an assignment to the outer variable.
```coffeescript
x = 3
f = () -> x = 5
f()              # Plasters the global x
console.log(x)   # Writes 5
g = () -> y = 8
g()              # Defines a local y which then goes away
console.log(y)   # Throws ReferenceError: y is not defined
```
WTF? So if I write a little utility function that uses what I think are local variables, then I had better be damn sure that function does not find itself used in a context where it might actually affect global variables?
If an identifier is redeclared within a scope, a language designer can choose to:

- make the redeclaration an error;
- have the redeclaration create a brand-new entity and a brand-new binding; or
- treat the redeclaration as nothing more than an assignment to the existing entity.
Perl and ML actually retain the old bindings; in other words, redeclarations introduce new entities:
```perl
# Perl
my $x = 3;                # declare $x and bind it
sub f { return $x + 5; }  # uses the existing binding of $x
my $x = 2;                # declare a NEW entity and thus a new binding
print f(), "\n";
```
```sml
(* SML *)
val x = 3;        (* declare x and bind it *)
fun f() = x + 5;  (* uses existing binding of x *)
val x = 2;        (* declare a NEW entity and thus a new binding *)
f();
```
Both of these programs produce 8, because the second declarations create brand-new bindings and `f` still uses the original one.
But in some languages, the second declaration is really just an assignment, rather than the creation of a new binding. In such a language, the code implementing the above example would print 7, not 8.
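Python behaves this way: it has no declaration keywords at all, so a later `x = ...` at the top level simply updates what `x` refers to, and the analogous program prints 7:

```python
x = 3                 # bind x
def f():
    return x + 5      # f looks up x when it is called
x = 2                 # just an update of x, not a new binding in a new scope
print(f())            # prints 7
```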
With dynamic scoping, the current binding for a name is the one most recently encountered during execution. That is, scope becomes temporal rather than spatial. The following script prints 2, not 1:
```perl
# Perl
our $x = 1;
sub f {
    print "$x\n";
}
sub g {
    local $x = 2;   # `local` makes dynamic scope (use `my` for static)
    f();
}
g();
print "$x\n";
```
Dynamic scopes tend to require a lot of run time overhead. The runtime system has to keep track of names in a structure that can be searched at run time to find the desired entity. There are two kinds of structures typically used to manage such scoping:
We’ll illustrate both with the following example:
```
// An example in which main calls q, then q calls p.
var i;
var j;
function p(var i) { ... }
function q(var j) {var x; ... p(j); ...}
q();
```
The association list, or A-list, works like this: the run-time system keeps one list of (name, value) bindings. Entering a scope pushes that scope’s bindings onto the front of the list; leaving the scope pops them off. To look a name up, you search from the front of the list, so the most recently created binding always wins.

Generally speaking, scope entry and exit are fast; lookup is slow.
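Here is a tiny Python simulation of the idea (the helper names and the values are invented for illustration), using the main/q/p example above:

```python
# The association list: newest bindings at the front.
a_list = []

def enter_scope(bindings):
    for name, value in bindings.items():
        a_list.insert(0, (name, value))   # pushing a binding is cheap

def exit_scope(count):
    del a_list[:count]                    # popping bindings is cheap

def lookup(name):
    for n, value in a_list:               # lookup is a linear search: slow
        if n == name:
            return value
    raise NameError(name)

enter_scope({'i': 1, 'j': 2})             # main's i and j
enter_scope({'j': 10, 'x': 0})            # q's j and x (j shadows main's j)
enter_scope({'i': 10})                    # p's i (shadows main's i)
print(lookup('i'), lookup('j'))           # 10 10
exit_scope(1)                             # p returns
print(lookup('i'), lookup('j'))           # 1 10
```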
The central reference table works like this: there is a single table with an entry for each name (think hash table), and each entry holds a stack of that name’s bindings. Entering a scope pushes new bindings onto the stacks of the names being declared; leaving the scope pops them off. To look a name up, you hash it and take the binding on top of its stack.

Generally speaking, scope entry and exit are slow; lookup is blazingly fast (assuming your hash function is cheap).
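And a matching Python sketch of a central reference table, again with invented helper names and values:

```python
from collections import defaultdict

# One entry per name; each entry is a stack of that name's bindings.
table = defaultdict(list)

def enter_scope(bindings):
    for name, value in bindings.items():  # entering a scope touches every new binding
        table[name].append(value)

def exit_scope(bindings):
    for name in bindings:                 # so does leaving it
        table[name].pop()

def lookup(name):
    return table[name][-1]                # one hash, then the top of the stack: fast

enter_scope({'i': 1, 'j': 2})             # main's i and j
enter_scope({'j': 10, 'x': 0})            # q's j and x
enter_scope({'i': 10})                    # p's i
print(lookup('i'), lookup('j'))           # 10 10
exit_scope({'i': 10})                     # p returns
print(lookup('i'), lookup('j'))           # 1 10
```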
The principal argument in favor of dynamic scoping is the way it allows functions to be customized, using an “environment variable” style that allows selective overriding of a default (more or less). Michael Scott gives this example:
```perl
# Perl
our $print_base = 10;

sub print_integer {
    ...   # some code using $print_base
}

# So generally we want to use 10, but say at one point
# we were in a hex mood. We wrap the call in a block like this:
{
    local $print_base = 16;
    print_integer($n);
}
# At the end of the block the old value is essentially restored
```
The lifetime of an entity (also known as its extent) is just what it sounds like: the period of time from its creation (allocation) to its destruction (deallocation). With regard to lifetime, we can imagine different kinds of entities:

- entities that live for the entire run of the program;
- entities that live only during an activation of the subroutine they belong to; and
- entities that are created and destroyed at arbitrary times, under the program’s control.
In the world of programming language implementation, we often see storage partitioned into regions for different types of entities based on their lifetimes:
Region | Entities are... | Examples
---|---|---
Static | Allocated at a fixed location in memory when the program starts and live forever | Global variables, `static` locals, string literals
Stack | Allocated and deallocated automatically at subroutine calls and returns | Parameters, local variables, temporaries
Heap | Allocated and deallocated dynamically | Entities created with `new` or `malloc`
Here’s an example showing each kind of storage class in action. We’ll work through it in class.
```cpp
int x = 5;                  // Static

void f() {
    static float y = 0;     // Static - this is pretty fancy, eh?
    int z;                  // Stack
    y++;
    printf("abcdef");       // Literal probably static
}                           // z deallocated here

void h(int* a) {            // a on stack, *a don’t care!
    . . .
    *a = 5;
    a[6] = 3;
    a = new int;            // doesn’t affect old entity
}                           // a goes away here, but not *a

void g() {
    int* p = new int;       // p on stack, *p on heap
    . . .
    delete p;               // explicit deallocation of *p
    int* q = new int[10];   // q on stack, *q on heap
    . . .
    h(q);
    . . .
    delete[] q;
}

int main() {
    f();
    f();
    g();
}
```
Do not confuse the lifetime of an entity (entity lifetime) with the lifetime of a binding (binding lifetime). Yes, bindings have lifetimes too: a binding lifetime is the duration in which the binding is active.
And very important: The lifetime of a binding can be different from the lifetime of the entity to which the name is bound. The next two sections will explore this in more detail.
Entities outlive bindings all the time, and this is usually normal. Here’s an example:
```cpp
void f(int& x) { x = 5; }
. . .
int y;        // create an entity and bind y to it
f(y);         // during execution of f, x is also bound to the entity
cout << y;    // x is no longer bound to the entity, but y still is, fine
```
But sometimes it can be a problem:
```c
void f() {
    char* p = malloc(1024);
}   // Ooops!
```
This function allocates space for a new entity that lives, well, in C at least, forever (i.e., until the end of the process). But the binding of `p` to that entity is gone when the function returns. In C, this is called a memory leak and it is a Bad Thing. In (most) other languages, a (built-in) garbage collector can determine when dynamically allocated entities are no longer reachable, and will destroy those entities on its own. C programmers do not have a built-in garbage collector, so they have to:

- carefully match every `malloc` with a corresponding `free`; or
- bring in a third-party garbage collection library themselves.
Okay, if you think about it, bindings outliving entities is almost always a serious bug. One way it can happen is by explicitly deleting the entity while a binding is still active. Some languages prohibit explicit deallocation, but C allows it, making possible the following problem, the dangling pointer (also known as a dangling reference):
```c
int* a;                    // create pointer entity and bind name 'a' to it
a = malloc(sizeof(int));   // create int entity and bind name '*a' to it
*a = 2;                    // assign to the int entity through the name '*a'
free(a);                   // this destroys the int entity
*a = 3;                    // NOOOOO! Binding still there; name *a attached to garbage
```
This is bad enough, but there’s another way you can get dangling pointers; it’s one of the most evil and horrible and sinister things that happen in C programs. It goes like this:
```c
char* get_some_data(FILE* f) {
    char buffer[1024];
    read_at_most_n_bytes_from_file(f, &buffer, 1024);
    return &buffer;   // NOOOOOOOOOOOOOO!!!!!!!
}
```
The problem here is that in C, when a function finishes execution, all of its internal storage, that is, all of the entities bound to its local variables, is destroyed. This code is returning a pointer to a destroyed entity. So this disaster can occur:
```c
char* text = get_some_data(myfile);
char secondCharacter = text[1];   // UH-OH, text[1] is bound to ... what?
```
Exercise: Fix `get_some_data` (Hint: define `buffer` differently).
Returning pointers to local variables is a special case of a more general problem: pointer variables in long-lived contexts pointing to data in shorter-lived contexts. Here’s another example:
```c
int* p;

void f() {
    int x;
    p = &x;   // Assigning pointer-to-local to a global pointer, BAD
}

int main() {
    f();
    // NOOOOOOOO!! At this point, what is *p?
}
```
Dangling pointers are hideous. Evil. For more ways to create them, and for more problems they can cause, see the Wikipedia article on dangling pointers.
Knowing that the primary ways dangling pointers occur are (1) via pointers to locals and (2) via permitting programmers to explicitly deallocate objects through a pointer, we can start to imagine how, as language designers, we might prevent them from ever occurring.
How can we make it impossible for a pointer variable with a long lifetime to point to an object with a shorter lifetime? Here are some approaches:

- Option 1: Outlaw pointers altogether.
- Option 2: Allow pointers (references), but only to heap-allocated objects, never to local variables.
- Option 3: Allow pointers to locals, but have the compiler reject any program in which a pointer might outlive its target.
- Option 4: Allow pointers to locals, and when the compiler sees that a local might outlive its scope, quietly allocate that local on the heap.
Woah, had you even thought of Option 4? The language Go does this!
```go
package main

import "fmt"

type Point struct{ x, y int }

func ExampleEscape() {
    var g *int // g means "global(ish)"
    f := func(x int) *Point {
        a := 5 // stays local
        b := 8
        c := &b
        g = c                   // b escapes!
        return &Point{x, a * 5} // this escapes too!
    }
    fmt.Print(*f(2))
    fmt.Println(*g)
    // Output: {2 25}8
}
```
If you only knew C, you might think this is terrible code! Pointers to both the local variable `b` and a locally allocated point object within the inner function `f` are passed out of the function: they escape. But this is not a problem in Go. The compiler recognizes this, and simply allocates `b` and the point object on the heap.
Note the thinking used by Go’s designers. They did not accept that all local variables had to be allocated on the stack, even though that’s how things were always done in the past (in C and C++, that is). Always be open to questioning the conventional wisdom, and don’t trust those who say “this is the way we do things, so shut up and conform.”
To avoid the other kind of dangling pointer creation, using a pointer variable after its target has been explicitly destroyed, you could:

- Option 1: Prohibit explicit deallocation entirely and let a garbage collector reclaim unreachable entities.
- Option 2: Keep explicit deallocation, but add run-time checks that catch any use of a destroyed entity.
- Option 3: Track ownership at compile time, so that a program that could possibly use a dangling pointer does not even compile.
Option 3 gives us safety and efficiency. Rust is famous for this. Let’s look at the most basic of basic examples only. The full story is very rich.
This Rust code does not even compile (the type `Vec` is not copyable):
```rust
fn main() {
    let a = vec![0, 1, 2];   // owned by variable a
    let b = a;               // owned by variable b
    println!("{:?}", b);     // works great, b is the owner
    println!("{:?}", a);     // WILL NOT COMPILE
}
```
Every time you assign a non-copyable value, pass it as an argument, or return it from a function, ownership is transferred. We call this a move. Non-copyable objects are implicitly pointed to. Rust prohibits non-owners from accessing the entity in any way. In addition to simply transferring ownership, Rust provides for borrowing:
```rust
fn main() {
    let a = vec![0, 1, 2];   // owned by variable a
    let b = &a;              // b is just borrowing
    println!("{:?}", b);     // works great, b is borrowing
    println!("{:?}", a);     // works great, a still owns it
}
```
Exercise: JavaScript has a `delete` operator. Research it. It’s not the same as the C++ `delete`, right? Can that cause a dangling pointer? Why or why not?
Here’s a situation that looks like a cause of dangling references:
```javascript
let x = 5;
function a(i) {
  let y = 3;
  function b() {
    console.log(i + y + x);
  }
  y = 10;
  i = 2;
  return b;
}
p = a(4);   // After this, a’s context goes away, or does it?
p();        // But wait, what does p even do?
```
This looks bad because the returned function references variables from a function which has already finished. Generally, when a function finishes, don’t all of its local variables go away? What should happen in this case? A language could:

- forbid functions from being nested inside, or returned from, other functions in the first place;
- allow it, but let the returned function’s references to the dead context dangle; or
- keep alive the local variables (and parameters) of `a`, so that the returned function (`b`) can still use its data. In this case, where we keep the data from the finished function around for the returned function to use, the returned function is called a closure.
JavaScript’s designers chose option #3 because it is incredibly powerful, but at a price: you can’t allocate the function call contexts of the enclosing function on a (super-efficient) stack. Not every language is willing to give up stack allocation! What are the options for these less adventurous languages?
Here’s a fun case to consider:
```javascript
function second(f) {
  const name = 'new'
  f()
}

function first() {
  const name = 'old'
  const printName = () => console.log(name)
  second(printName)
}

first()
```
In the function `printName`, the variable `name` is a free variable. When `printName` is passed to the function `second` and called there, it is running in a context that has its own `name` variable.
Question: what does this output? Does the function, when called, use the binding to `name` it received at the point it was defined ("old") or the one within the context in which it was finally executed ("new")?
The former behavior is known as deep binding (the binding happened early, back in the past, in the depths of time) and the latter as shallow binding (it happened later, more recently, in the shallow past).
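To make the terminology concrete, here is a small, entirely artificial Python simulation in which we choose explicitly whether a function uses the environment from its definition or from its call (the names `make_printer`, `deep`, and `shallow` are invented for illustration):

```python
def make_printer(env_at_definition):
    def deep():                          # deep binding: use the environment captured
        print(env_at_definition['name'])  # when the function was created
    def shallow(env_at_call):            # shallow binding: use the environment in
        print(env_at_call['name'])        # effect where the function is called
    return deep, shallow

deep, shallow = make_printer({'name': 'old'})   # environment at definition time
calling_env = {'name': 'new'}                   # environment at call time
deep()                                          # prints "old"
shallow(calling_env)                            # prints "new"
```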
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
- How do you update a global variable from inside a function in Python, JavaScript, and Ruby? (Python requires the `global` keyword, JavaScript updates the global variable, and Ruby requires the use of the `$` prefix.)
- What does the following program print under static scoping? Under dynamic scoping?

```
var x = 2;
function f() { print x; }
function g() { var x = 5; f(); print x; }
g();
print x;
```
- What does this program print under static scoping? Under dynamic scoping?

```
var x = 1;
function f() {return x;}
function g() {var x = 2; return f();}
print g() + x;
```
- What happens in JavaScript if the same scope contains the following?

```javascript
let x = 3;
let x = 3;
```
We’ve covered: