Mutual Exclusion

"The primary challenge of concurrency is managing access to shared, mutable state" — R. Mark Volkmann. Volkmann is right.

Background

Mutually exclusive access to shared data is often required to avoid problematic race conditions. What is meant by this? How can we make it happen?

Requirements:

- At most one thread may be executing in its critical section at any time.
- No deadlock: if some threads are trying to enter, one of them must eventually get in.
- No starvation: no thread waits forever to enter its critical section.
- No assumptions about the number of processors or the relative speeds of the threads.

Approaches:

- Bare machine approaches: atomic hardware instructions, or pure software algorithms like Peterson's and the bakery algorithm.
- Scheduler-assisted approaches: semaphores, mutexes, and friends, in which waiting threads sleep instead of spinning.
- Language-level approaches: monitors (Java's synchronized) and Ada's protected objects, in which the shared data protects itself.

Bare Machine Approaches

Simple Shared Variables

In most cases a single one-word variable can be read or written atomically. To share such a variable between threads you also have to guarantee that reads and writes really go to memory rather than being optimized into a register. In Java, marking the variable volatile gives you this guarantee. (C's volatile was designed for memory-mapped I/O and is not sufficient for thread synchronization; modern C uses _Atomic and the <stdatomic.h> facilities instead.)

Reads and writes of multi-word variables (like non-volatile long and double in Java) are not guaranteed to be atomic. And code like x++ is essentially never atomic: it is a separate read, increment, and write, and another thread can slip in between.
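Here is a tiny demonstration of the x++ problem (the class name and iteration counts are made up for illustration). Two threads each increment a shared counter 100,000 times, but the final value is almost always less than 200,000:

public class LostUpdate {
    static int x = 0;  // volatile would not help: ++ is still three steps

    public static void main(String[] args) throws InterruptedException {
        Runnable bump = () -> { for (int i = 0; i < 100_000; i++) x++; };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(x);  // almost always < 200000
    }
}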

The Wrong Way

If multiple threads execute this code

1:    while (something) {
2:        if (!inUse) {         // THIS CODE IS ALL WRONG!
3:            inUse = true;

4:            doSomethingWithTheSharedResource();

5:            inUse = false;
6:        } else {
7:            doSomethingElse();
8:        }
9:    }

there's no mutual exclusion — two threads A and B can suffer this interleaving: A2 B2 A3 B3, etc. Now both are executing their critical sections at the same time.

Atomic Test and Set

To fix the problem above the test (line 2) and the set (line 3) have to be atomic! On many hardware architectures there exists an atomic testAndSet instruction. It examines a variable and if it is false, it sets it to true and returns true. Otherwise it leaves the variable true and returns false. This is all done as a single atomic hardware instruction. With it we could write:

1:    while (something) {
2:        if (atomicTestAndSet(&inUse)) {

3:            doSomethingWithTheSharedResource();

4:            inUse = false;
5:        } else {
6:            doSomethingElse();
7:        }
8:    }
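Java does not expose a testAndSet instruction directly, but AtomicBoolean.compareAndSet performs the same kind of atomic read-modify-write. Here is a minimal sketch of a spin lock built on it (the class name and packaging are mine):

import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean inUse = new AtomicBoolean(false);

    public void lock() {
        // Atomically: if inUse is false, set it to true and return true;
        // otherwise return false, and we retry.
        while (!inUse.compareAndSet(false, true)) {}
    }

    public void unlock() {
        inUse.set(false);
    }
}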

Other Atomic Operations

Hardware, sometimes with help from the operating system, also provides richer atomic read-modify-write operations on whole memory words, such as increment, decrement, and compare-and-swap, behind a very simple API. (Examples: Win32 InterlockedIncrement and the Java classes in java.util.concurrent.atomic.)
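For example, rewriting the racy counter from earlier with an atomic integer makes the lost updates disappear, with no locking in sight:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    static final AtomicInteger x = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable bump = () ->
            { for (int i = 0; i < 100_000; i++) x.incrementAndGet(); };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(x.get());  // always 200000
    }
}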

Peterson's Algorithm

If you don't have a hardware atomic test and set, then enforcing mutual exclusion on a bare machine that meets all the requirements above takes a surprising amount of work.

// Thread 0                         // Thread 1

claim0 = false;                     claim1 = false;
while (true) {                      while (true) {
    claim0 = true;                      claim1 = true;
    turn = 1;                           turn = 0;
    while (claim1 && turn != 0) {}      while (claim0 && turn != 1) {}
    CRITICAL SECTION                    CRITICAL SECTION
    claim0 = false;                     claim1 = false;
    NON_CRITICAL_SECTION                NON_CRITICAL_SECTION
}                                   }

Problems with this:

- The threads busy wait, burning a processor while they spin.
- It handles only two threads; generalizations to n threads exist but are messy.
- It assumes reads and writes are atomic and are performed in program order. Modern compilers and processors reorder memory operations, so on real hardware the plain-variable version is broken unless the shared variables get memory fences (volatile, in Java).
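Here is a sketch of the algorithm as a two-thread lock class in Java (the packaging is mine). Declaring all three shared variables volatile supplies the ordering guarantees the last bullet calls for:

public class Peterson {
    private volatile boolean claim0 = false, claim1 = false;
    private volatile int turn = 0;

    public void lock(int me) {               // me is 0 or 1
        if (me == 0) {
            claim0 = true;
            turn = 1;
            while (claim1 && turn != 0) {}   // busy wait
        } else {
            claim1 = true;
            turn = 0;
            while (claim0 && turn != 1) {}   // busy wait
        }
    }

    public void unlock(int me) {
        if (me == 0) claim0 = false; else claim1 = false;
    }
}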

Bakery Algorithm

The bakery algorithm solves the mutual exclusion problem for multiple threads on a bare machine.
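A sketch of Lamport's bakery algorithm in Java follows. Each thread takes a numbered ticket and waits until every thread with a smaller ticket (or an equal ticket and a smaller id) has finished. AtomicIntegerArray stands in for arrays of volatile variables, which Java lacks; the names are mine:

import java.util.concurrent.atomic.AtomicIntegerArray;

public class Bakery {
    private final int n;
    private final AtomicIntegerArray choosing; // 1 while picking a ticket
    private final AtomicIntegerArray number;   // ticket; 0 means not trying

    public Bakery(int n) {
        this.n = n;
        choosing = new AtomicIntegerArray(n);
        number = new AtomicIntegerArray(n);
    }

    public void lock(int i) {
        choosing.set(i, 1);
        int max = 0;
        for (int j = 0; j < n; j++) max = Math.max(max, number.get(j));
        number.set(i, max + 1);                // take the next ticket
        choosing.set(i, 0);
        for (int j = 0; j < n; j++) {
            while (choosing.get(j) == 1) {}    // wait while j picks a ticket
            while (number.get(j) != 0 &&       // wait while j has priority
                   (number.get(j) < number.get(i) ||
                    (number.get(j) == number.get(i) && j < i))) {}
        }
    }

    public void unlock(int i) {
        number.set(i, 0);
    }
}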

Scheduler-Assisted Mutual Exclusion

It's nice if the operating system can provide some way to make a thread go to sleep (i.e. remove it from the pool of ready threads that get scheduled on a processor) and wake it up when something happens.

Primitives to make this happen include barriers (like latches, countdowns, and cyclic barriers), mutexes, semaphores, futexes, and exchangers.

Barriers

Barriers are simple objects used to allow one or more threads to wait until other threads have completed some work. They coordinate threads rather than lock data, but they belong to the same scheduler-assisted family. Java has a few of these in its java.util.concurrent library.
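For example, a CountDownLatch lets a thread sleep until a count reaches zero (the worker count here is arbitrary):

import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                // ... do some work ...
                done.countDown();          // report completion
            }).start();
        }
        done.await();                      // block until the count hits zero
        System.out.println("All workers finished");
    }
}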

Semaphores

Semaphores are pretty much designed to implement mutual exclusion. They work like this:

Semaphore s = new Semaphore(MAX, true);
.
.
.
s.acquire();
CRITICAL_SECTION
s.release();
.
.
.

The idea is that a thread will block on the semaphore if the maximum number of threads have already successfully "acquired." When a thread does a "release," one of the blocked threads wakes up (meaning its acquire call finally returns).

The internal implementation is something like this:

Semaphore(max, fifo)
Creates a new semaphore with the given maximum value and fifo flag. If fifo is true, threads block in a FIFO queue so a release always wakes the one blocked the longest; otherwise they block in a set and a release wakes an arbitrary blocked thread.
s.acquire()
atomically {if (s.value > 0) s.value--; else block on s}
s.release()
atomically {
    if (there are threads blocked on s)
        wake one of them
    else if (s.value == s.MAX)
        fail
    else
        s.value++
}
A semaphore with a maximum count of 1 is called a binary semaphore or sometimes just a mutex.
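Here, for instance, is the shared counter from the beginning of these notes fixed with a real java.util.concurrent.Semaphore used as a binary semaphore:

import java.util.concurrent.Semaphore;

public class SemaphoreCounter {
    static int x = 0;
    static final Semaphore mutex = new Semaphore(1);  // binary semaphore

    public static void main(String[] args) throws InterruptedException {
        Runnable bump = () -> {
            try {
                for (int i = 0; i < 100_000; i++) {
                    mutex.acquire();
                    try {
                        x++;              // critical section
                    } finally {
                        mutex.release();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(x);  // always 200000
    }
}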

Semaphores are an "unstructured" low-level primitive; you might not use them directly in high-level applications:

- Nothing ties a semaphore to the data it protects; the association exists only in the programmer's head.
- Every thread that touches the shared data must remember to acquire and release, in the right order, on every code path.
- A single forgotten or misplaced release (skipped because of an early return or an exception, say) can deadlock or corrupt the whole program.

So the modern approach is for resources to protect themselves.

Data-Oriented Synchronization

A simple way to put the resource itself in charge of the protection (instead of the threads) is to have a lock associated with every object, like in Java:

public class Account {
    private BigDecimal balance = BigDecimal.ZERO;

    public synchronized BigDecimal getBalance() {
        return balance;
    }

    public synchronized void deposit(BigDecimal amount) {
        balance = balance.add(amount);
    }
}

Every object in Java has an associated monitor which a thread can lock and unlock. Only one thread at a time can hold a monitor's lock. A synchronized method or synchronized statement does an implicit lock at the beginning and an unlock at the end. An attempt to lock a monitor that is already locked causes the thread to wait until the lock is free (unless the lock owner itself is trying to lock again).

By making the account responsible for managing its own synchronization we free the gazillions of threads that wish to use accounts from having to remember to do the locking and unlocking themselves!

Note: You don't have to make the scope of the lock be an entire method:

    .
    .
    .
    synchronized (x) {
       ...
    }
    .
    .
    .

Notification

Simply using the synchronized keyword might not be sufficient. Consider a message board where threads can post messages and other threads can take them out. The message buffer has a fixed, bounded size. Adds can only be done when the buffer isn't full, and removes when not empty.

public class UnsynchronizedBuffer {
    private Object[] data;
    private int head = 0;
    private int tail = 0;
    private int count = 0;
    public UnsynchronizedBuffer(int capacity) {
        data = new Object[capacity];
    }
    public boolean add(Object item) {
        if (count < data.length) {
            data[tail] = item;
            tail = (tail + 1) % data.length;
            count++;
            return true;
        }
        return false;
    }
    public Object remove() {
        if (count > 0) {
            Object item = data[head];
            head = (head + 1) % data.length;
            count--;
            return item;
        }
        return null;
    }
    public int getSize() {
        return count;
    }
}

Notice the race conditions! The buffer could have room for only one more item, but two threads both try to add at the same time. They each execute the size test before either completes the adding of an item. There's a similar race when multiple threads try to remove from a buffer with only one message.

Exercise: Explain the race conditions a bit more precisely.

What if we just synchronize add and remove? Yeah, that'll remove the races, but threads wanting to add (or remove) before proceeding will have to busy wait! NOOOOOOOO! Can't they just go to sleep until some other thread makes room (in the wants-to-add case) or puts something in (in the wants-to-remove case)? Yep. Like this:

public class SynchronizedBuffer {
    private Object[] data;
    private int head = 0;
    private int tail = 0;
    private int count = 0;
    public SynchronizedBuffer(int capacity) {
        data = new Object[capacity];
    }
    public synchronized void add(Object item) {
        while (count == data.length) {
            try {wait();} catch (InterruptedException e) {}
        }
        data[tail] = item;
        tail = (tail + 1) % data.length;
        count++;
        notifyAll();
    }
    public synchronized Object remove() {
        while (count == 0) {
            try {wait();} catch (InterruptedException e) {}
        }
        Object item = data[head];
        head = (head + 1) % data.length;
        count--;
        notifyAll();
        return item;
    }
    public synchronized int getSize() {
        return count;
    }
}

wait() releases the buffer's lock and moves the calling thread into the buffer's wait set (making it blocked); notifyAll() makes every thread in the object's wait set runnable again, and each must reacquire the lock before its wait() call returns.

This makes the buffer a blocking queue, which is pretty nice; the API for it is quite clean. But doing things this way in Java still leaves room for improvement: all threads wait on the buffer object alone, so we're not distinguishing between threads waiting for "not full" and those waiting for "not empty," and every notifyAll wakes both kinds. See the sketch below.
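Explicit locks and condition variables from java.util.concurrent.locks solve exactly this: one lock, two separate wait queues. A sketch (the class is my own, though it mirrors what java.util.concurrent.ArrayBlockingQueue does internally):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionBuffer {
    private final Object[] data;
    private int head = 0, tail = 0, count = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    public ConditionBuffer(int capacity) {
        data = new Object[capacity];
    }

    public void add(Object item) throws InterruptedException {
        lock.lock();
        try {
            while (count == data.length) notFull.await();
            data[tail] = item;
            tail = (tail + 1) % data.length;
            count++;
            notEmpty.signal();   // wake one remover only
        } finally {
            lock.unlock();
        }
    }

    public Object remove() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) notEmpty.await();
            Object item = data[head];
            head = (head + 1) % data.length;
            count--;
            notFull.signal();    // wake one adder only
            return item;
        } finally {
            lock.unlock();
        }
    }
}

Waking a single thread with signal() is safe here because every waiter on a given condition is waiting for the same thing.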

Exercise: What if we placed two new fields within the SynchronizedBuffer class
    private Object notFull = new Object();
    private Object notEmpty = new Object();
and then rewrote the body of add like this:
        synchronized (notFull) {
            while (count == data.length) {
                try {notFull.wait();} catch (InterruptedException e) {}
            }
        }
        data[tail] = item;
        tail = (tail + 1) % data.length;
        count++;
        synchronized (notEmpty) {notEmpty.notifyAll();}
and made similar changes to remove(). Does this work? If so, is it better than the previous version? How? If not, how does it fail? (Describe any race conditions, deadlock or starvation in detail.)

By the way, Java does have its own blocking queue class, but you wanted to learn about wait and notifyAll, right?
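If you just want the functionality, ArrayBlockingQueue is the real thing:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10);
        buffer.put("hello");                // blocks while the queue is full
        System.out.println(buffer.take());  // blocks while the queue is empty
    }
}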

Ada Protected Objects

Ada's protected objects are like monitors but a bit more sophisticated: almost everything you normally do in these cases is taken care of declaratively! The bounded buffer example, assuming Index is defined as a modular type and Item is the type of objects to be stored in a buffer:

protected type Buffer is              -- SPECIFICATION
    entry Put(I: Item);               --
    entry Get(I: out Item);           --   Public part specifies:
    function Size return Natural;     --   entries, procs, fns
private
    Data: array (Index) of Item;      --   Private part holds the shared
    Head, Tail: Index := 0;           --   data (this is in the spec so
    Count: Natural := 0;              --   the compiler knows how much space
end Buffer;                           --   to allocate for an object)

protected body Buffer is              -- BODY (operation bodies ONLY; no data)

    entry Put(I: Item) when Count < Index'Modulus is
    begin
        Data(Tail) := I;
        Count := Count + 1;
        Tail := Tail + 1;
    end Put;

    entry Get(I: out Item) when Count > 0 is
    begin
        I := Data(Head);
        Count := Count - 1;
        Head := Head + 1;
    end Get;

    function Size return Natural is
    begin
        return Count;
    end Size;

end Buffer;

How it works:

- Calls to entries and procedures execute with exclusive read-write access to the object: only one may be active inside it at a time. Calls to functions, which cannot modify the data, get shared read-only access and may proceed concurrently with each other.
- Each entry has a barrier (its when condition). A caller whose barrier is false is queued on that entry.
- Barriers are re-evaluated whenever a protected procedure or entry body finishes, since only those can change the state a barrier tests; queued callers whose barriers have become true are served before new callers are admitted.

It's the programmer's responsibility to see that all barriers execute quickly and don't call a potentially blocking operation!

Advantages of protected objects (please read the Ada 95 Rationale section on protected types):

- The shared data is reachable only through the protected operations, so the compiler enforces the encapsulation that other languages merely encourage.
- Locking, queuing, and signaling are implicit; you cannot forget to lock, unlock, wait, or notify.
- The barriers state declaratively when an operation may proceed, replacing error-prone explicit wait loops and their lost-wakeup bugs.

Explicit Locking Reconsidered

Hang on, explicit locking isn't really evil, is it? After all, if it's the case that you want

- to acquire a lock in one method and release it in another, or other locking that doesn't nest neatly in block structure (like hand-over-hand traversal of a linked list),
- to poll for a lock, give up after a timeout, or back out when interrupted, or
- a fairness policy for waiting threads,

the code might be clearer with explicit locks.

Java provides a large number of locks and lock-like classes, such as barriers and exchangers, in java.util.concurrent. With an explicit Lock the release is not automatic the way it is with synchronized, so the standard idiom wraps the critical section in try ... finally to guarantee the unlock; that finally clause is also a natural place to clean up data that might be left in an inconsistent state if an exception were thrown in the critical section.
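A sketch of the standard explicit-lock idiom, using a timed tryLock, which is one of the things synchronized cannot express (the class name and timeout are made up for illustration):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class Worker {
    private final ReentrantLock lock = new ReentrantLock();

    public void work() throws InterruptedException {
        if (lock.tryLock(1, TimeUnit.SECONDS)) {  // give up after one second
            try {
                // ... critical section ...
            } finally {
                lock.unlock();                    // always release the lock
            }
        } else {
            // the lock was busy; do something else instead of waiting forever
        }
    }
}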