Binding

You really need to understand names to read and write code. Names stand for things—and you need to know exactly what they stand for to use them.

Names, Entities, and Bindings

Programs, ultimately, manipulate entities.

A binding is the attachment of a name (also called an identifier) to an entity.

Examples of namesExamples of entities
  • x
  • split
  • determine_best_move
  • determineBestMove
  • IllegalArgumentException
  • Integer
  • org.joda.time
  • /usr/bin/python
  • 先生
  • +
  • <=>
  • |-*-|
  • Types
  • Variables
  • Constants
  • Labels
  • Blocks
  • Functions
  • Fields
  • Methods
  • Classes
  • Modules
  • Operators
  • Threads
  • Tasks
  • Parameters
  • Packages

Names allow us to refer to entities.

Exercise: Imagine trying to program without naming anything. Imagine trying to refer to yourself, your friends, your things, or anything, without the benefit of names. How might you compensate for a lack of names?

Note how names can be alphanumeric or symbolic.

Fun fact: In most languages, the set of symbolic names are fixed, but in a few, like Scala, Swift, Haskell, and Standard ML, you can define your own.

Declarations vs. Definitions

A declaration brings a name into existence, but a definition creates an entity and the initial binding.

extern int z;               // just a declaration
class C;                    // just a declaration
double square(double);      // just a declaration

void f() {                  // declaration plus definition together
  int x;                    // also a declaration plus definition
  int y = x + z /square(3); // so is this
}

double square(double x) {   // definition completing earlier declaration
  return x * x;
}

The purpose of a definition-less declaration is to allow the name to used, even though the definition will be somewhere else (in a place you might not even know).

Assignments vs. Bindings

Here is something very tricky but very, very important. What do you think the meaning of the following is?

    x ← 3
    x ← 5

Is there one entity here? Or two? If you are a language designer, it’s your choice!

assignmentvsrebinding.png

If one, we are changing the value of an entity, which we call assignment, and in which case we call the entity an assignable. (Assignment can be dangerous, because it makes programs harder to reason about. But that’s a different story.)

Here are two languages that do things differently. JavaScript does assignment:

let x = 3            // create entity, value is 3, bind x
const f = () => x    // f returns value of entity bound to x
x = 5                // assignment: replace value of entity
f()                  // ==> 5

And Standard ML does rebinding:

val x = 3;           (* create entity, bind to x *)
val f = fn () => x;  (* f returns value of entity NOW bound to x *)
val x = 5;           (* COMPLETELY NEW BINDING *)
f();                 (* ==> 3, because value of original binding *)
Exercise: Do this little assignment vs. rebinding experiment in Python, Perl, Lua, Julia, Erlang, Go, Swift, Rust, etc., etc.

Polymorphism and Aliasing

Instead of rebinding a name to a new entity, what if we allowed the same name to refer to different entities at the same time? Oooooh. And, and.... turning things around, what if we allowed multiple names to be bound to the same entity at the same time?

We have words for these things:

Polymorphism

Same name bound to multiple entities at the same time

Aliasing

Multiple names bound to the same entity at the same time

And pictures:

polymorphismandaliasing.png

A precise way to say “at the same time” is “within the same referencing environment.” Technically, a referencing environment is the set of bindings in effect at a given point.

Examples of Polymorphism

In a few languages, multiple functions can have the same name at the same time provided their signatures (parameter names, types, orders, etc.), or their receiver class, differs. Check out this C++ example:

bool f(string s) { return s.empty(); }
int f(int x, char c) { return x * c; }

int main() {
  cout << f("hello") << '\n';            // 0
  cout << f(75, 'w') << '\n';            // 8925
}

Polymorphism is interesting because, when given a name, a program has to find the right entity for the name to use! If the right entity can be determined at compile time, like in the example above, we have static polymorphism. If it can only be determined at run time, we have dynamic polymorphism, like in this Ruby example:

class Animal; end
class Dog < Animal; def speak() = 'woof' end
class Rat < Animal; def speak() = 'squeak' end 
class Duck < Animal; def speak() = 'quack' end

animals = (1..50).map{[Dog.new, Rat.new, Duck.new].sample}
animals.each {|a| puts a.speak}

Here’s another example of dynamic polymorphism, this time in Julia. It may look like static polymorphism, but Julia checks those types at run time:

function f(s::String) reverse(s) end
function f(x::Int, y::Char) x * Int(y) end

print(f("Hello"))        # "olleH"
print(f(75, 'π'))        # 72000

Evaluating that script in the console shows you Julia has “methods,” a word commonly associated with dynamic lookup (we’ll hear more about this when we cover OOP):

julia> function f(s::String) reverse(s) end
f (generic function with 1 method)

julia> function f(x::Int, y::Char) x * Int(y) end
f (generic function with 2 methods)

julia> f("Hello")
"olleH"

julia> f(75, 'π')
72000

Examples of Aliasing

Aliasing is pretty common in language with explicit pointers:

#include <iostream>

using namespace std;

int main() {
  int x = 5;
  int* p = &x;

  cout << p << '\n';         // 0x7fff5febb0fc
  cout << &x << '\n';        // 0x7fff5febb0fc
  cout << x << '\n';         // 5
  cout << *p << '\n';        // 5
}

aliaswithpointers.png

C++ has explicit references (woah):

#include <iostream>

using namespace std;

int main() {
  int x = 5;
  int& y = x;  // x and y refer to the same entity

  x = 3;
  cout << x << ' ' << y << '\n';    // 3 3
  y = 8;
  cout << x << ' ' << y << '\n';    // 8 8
}

The picture for this one is easy:

xyreference.png

Lifetimes

The lifetime of an entity is just what it sounds like: the period from its creation (allocation) to its destruction (deallocation). With regard to lifetime, we can imagine different kinds of entities:

  1. Global entities that live forever (“forever” means from the time of its creation to the end of the entire program)
  2. Local entities created automatically when their enclosing subroutine/block is called and destroyed when their containing subroutine/block exits
  3. Entities explicitly created by the programmer that must be explicitly destroyed by the programmer.
  4. Entities explicitly created by the programmer and destroyed automatically when the runtime system determines they can no longer be referenced.

In the world of programming language implementation, we often see storage partitioned into regions for different types on entities based on their lifetimes:

Allocation Type Entities are... Examples Notes
Static Allocated at fixed location in memory when program starts and live forever
  • Global variables
  • String and floating point constants
  • May or may not be present in binary image of program
Stack Allocated and deallocated automatically at subroutine calls and returns
  • Locals
  • Parameters
  • Temporaries
  • Unless part of closure, can be stack-managed because of the way subroutine calls and returns behave
Heap Allocated and deallocated dynamically
  • Things made with new or malloc
  • Object and arrays made from literals or other expressions
  • Allocation can be implicit or explicit
  • Deallocation can be implicit (e.g., garbage collection) or explicit
  • Watch out for memory leaks and dangling references

Here’s an example showing each kind of storage class in action. We’ll work through it in class.

int x = 5;                      // Static

void f() {
  static float y = 0;           // Static - this is pretty fancy, eh?
  int z;                        // Stack
  y++;
  printf("abcdef");             // Literal probably static
}                               // z deallocated here

void h(int* a) {                // a on stack, *a don’t care!
  .
  .
  .
  *a = 5;
  a[6] = 3;
  a = new int;                  // doesn’t affect old entity
}                               // a goes away here, but not *a

void g() {
  int* p = new int;             // p on stack, *p on heap
  .
  .
  .
  delete p;                     // explicit deallocation of *p
  int* q = new int[10];         // q on stack, *q on heap
  .
  .
  .
  h(q);
  .
  .
  .
  delete[] q;
}

int main() {
  f(); f(); g();
}
Exercise: Draw pictures that show what’s going on in this example.

Do not confuse the lifetime of an entity (entity lifetime) with the lifetime of a binding (binding lifetime). Yes, bindings have lifetimes too: a binding lifetime is the duration in which the binding is active.

And very important: The lifetime of a binding can be different from the lifetime of the entity to which the name is bound.

Entities Outliving Bindings

Entities outlive bindings all the time, and is usually normal. Here’s an example:

void f(int& x) {x = 5;}
.
.
.
int y;                 // create an entity and bind y to it
f(y);                  // during execution of f, x is also bound to the entity
cout << y;             // x is no longer bound to the entity, but y still is, fine

But sometimes it can be a problem:

void f() {
    char* p = malloc(1024);
}                               // Ooops!

This function allocates space for a new entity that lives, well, in C at least, (i.e., until the end of the process). But the binding of p to that entity is gone when the function returns. In C, this is called a memory leak and it is a Bad Thing. In (most) other languages, a (built-in) garbage collector can determine when dynamically allocated entities are no longer reachable, and will destroy those entities on its own. C programmers do not have a built-in garbage collector, so they have to:

Bindings Outliving Entities

Okay, if you think about it, bindings outliving entities is almost always a serious bug. One way it can happen is by explicitly deleting the entity while a binding is still active. Some languages prohibit explicit deallocation, but C allows it, making possible the following problem, the dangling pointer (also known as a dangling reference):

int* a;                   // create pointer entity and bind name 'a' to it
a = malloc(sizeof(int));  // create int entity and bind name '*a' to it
*a = 2;                   // assign to the int entity through the name '*a'
free(a);                  // this destroys the int entity
*a = 3;                   // NOOOOO! Binding still there; name *a attached to garbage

This bad enough, but there’s another way you can get dangling pointers; it’s one of the most evil and horrible and sinister things that happen in C programs. It goes like this:

char* get_some_data(FILE* f) {
    char buffer[1024];
    read_at_most_n_bytes_from_file(f, &buffer, 1024);

    return &buffer;       // NOOOOOOOOOOOOOO!!!!!!!
}

The problem here is that, when a function finishes execution in C, all of its internal storage, that is, all of the entities bound to its local variables, are destroyed when the function exits. This code is returning a pointer to a destroyed entity. So this disaster can occur:

char* text = get_some_data(myfile);
char secondCharacter = text[1];         // UH-OH, text[1] is bound to ... what?
Exercise: Fix get_some_data (Hint: define buffer differently).
Exercise: Research the security exploit known as the Buffer Overflow.

Returning pointers to local variables is a special case of a more general problem: pointer variables in long-lived contexts pointing to data in shorter-lived contexts. Here’s another example:

int* p;

void f() {
    int x;
    p = &x;  // Assigning pointer-to-local to a global pointer, BAD
}

int main() {
    f();
    // NOOOOOOOO!! At this point, what is *p?
}

Dangling pointers are pretty bad. For more ways to create them, and for more problems they can cause, see the Wikipedia article on dangling pointers.

Language Design Solutions to Binding Problems

Let’s recap. We saw two ways to generate dangling pointers: (1) pointers to locals and (2) explicit deallocation of an object through a pointer.

Now suppose you are language designer. How would you prevent dangling pointers from ever occurring in your language?

Exercise: Think of ways to do this before reading on.

Avoiding Pointers to Locals

How can we make it impossible for a pointer variable with a long lifetime to point to an object with a shorter lifetime? We can assume for now that the shorter-lived object is a local variable within an “inner” function, and a pointer to it escapes the inner function. Here are some approaches to staying out of dangling pointer land:

  1. Have no explicit pointers in your language (with no explicit pointers, you can’t even make a pointer to a local variable);
  2. Disallow pointers to anything except heap-allocated storage;
  3. Allow pointers to locals, but only if a compile-time check determines it is safe to do so (a compiler can easily keep track of nesting levels); or
  4. Move to the heap any local that becomes the target of a pointer.

Woah, had you even thought of Option 4? The language Go does this!

package main

import "fmt"

type Point struct{x, y int}

func ExampleEscape() {

    var g *int                    // g means "global(ish)"

    f := func(x int) *Point {
        a := 5                    // stays local
        b := 8
        c := &b
        g = c                     // b escapes!
        return &Point{x, a * 5}   // this escapes too!
    }

    fmt.Print(*f(2))
    fmt.Println(*g)
    // Output: {2 25}8
}

If you only knew C, you might think this is terrible code! Pointers to both the local variable b, and a locally-allocated point object within the inner function f are passed out of the function: they escape. But this is not a problem in Go. The compiler recognizes this, and simply allocates b and the point object on the heap.

Note the thinking used by Go’s designers. They did not accept the fact that all local variables had to be allocated on the stack, even though that’s how things were always done in the past (in C and C++, that is). Always be amenable to questioning the conventional wisdom, and don’t trust those that say “this is the way we do things so shut up and conform.”

Preventing Explicit Deallocation

To avoid the other kind of dangling pointer creation, using a pointer variable after its target has been explicitly destroyed, you could:

  1. Disallow all forms of explicit deallocation and rely on a garbage collector.
  2. Check all memory accesses through pointers at run time, ensuring deallocated memory is never being accessed (but this can be expensive!)
  3. Allow explicit deallocation only by the owner of the resource.

Option 3 gives us safety and efficiency. Rust is famous for this. Let’s look at the most basic of basic examples only. The full story is very rich.

This Rust code does not even compile (the type Vec is not copyable):

fn main() {
    let a = vec![0, 1, 2];  // owned by variable a
    let b = a;              // owned by variable b
    println!("{:?}", b);    // works great, b is the owner
    println!("{:?}", a);    // WILL NOT COMPILE
}

Every time you assign, or pass as an argument, or return from a function, a non-copyable value, ownership is transferred. We call this a move. Non-copyable objects are implicitly pointed to. Rust prohibits non-owners from accessing the entity in any way. In addition to simply transferring ownership, Rust provides for borrowing:

fn main() {
    let a = vec![0, 1, 2];  // owned by variable a
    let b = &a;             // b is just borrowing
    println!("{:?}", b);    // works great, b is borrowing
    println!("{:?}", a);    // works great, a still owns it
}
Exercise: Speaking of explicit deallocation, JavaScript has this delete operator thing. Research it. Can that cause a dangling pointer? Why or why not?

Can Closures Cause Dangling References?

Here’s a situation that looks like a cause of dangling references:

let x = 5;
function a(i) {
  let y = 3;
  function b() { console.log(i + y + x); }
  y = 10;
  i = 2;
  return b;
}
p = a(4);    // After this, a’s context goes away, or does it?
p();         // But wait, what does p even do?

This isn’t exactly a pointer-to-local variable problem, because in JavaScript, functions, like all objects, are dynamically allocated on the heap and garbage collected later, so returning b is perfectly fine. The problem is that the returned function references variables from a function which has already finished. What should happen in this case? A language could:

  1. Check for a case in which a returned function would reference local variables that might not be around, and give a compile-time or run-time error.
  2. Destroy the contexts of all functions that return (as is normal in many languages), and let the programmer do whatever, and just crash or returned undefined garbage.
  3. Retain the context of the "outer" function, in this case a, so that the returned function (b) can still use its data. The returned function is called a closure.

JavaScript’s designers chose option #3 because it is incredibly powerful, but at a price: you can’t allocate the function call contexts of the enclosing function on a (super-efficient) stack. Not every language is willing to give up stack allocation! What are the options for these less adventurous languages?

Exercise: Rewrite this example in Perl, Python, Ruby, Scala, and ML and check the output value against the value written by the JavaScript fragment. Then write the equivalent code in Ada, and make a note of the compile-time error.

Deep and Shallow Binding

Here’s a fun case to consider:

function second(f) {
  const name = 'new'
  f()
}
function first() {
  const name = 'old'
  const printName = () => console.log(name)
  second(printName)
}
first()

In the function printName, the variable name is a free variable. When printName is passed to the function second and called there, it is running in a context that has its own name variable.

Question: what does this output? Does the function, when called, use the binding to name it received at the point it was defined ("old") or the one within the context it which it was finally executed ("new")?

The former behavior is known as deep binding (the binding happened early, back in the past, in the depths of time) and the latter as shallow binding (it happened later, more recently, in the shallow past).

Exercise: What does JavaScript use?

Summary

We’ve covered:

  • Examples of names and entities
  • Definition of entity
  • Declarations vs. Definitions
  • Assignables
  • Polymorphism, Aliasing, Overloading
  • Lifetimes
  • Language design solution to possible lifetime problems
  • Closures
  • Dangling references
  • Deep vs. Shallow Binding