If a program is supposed to terminate, there are two kinds of correctness criteria:

- Partial correctness: if the program terminates, it produces the correct answer.
- Total correctness: the program terminates, and it produces the correct answer.
For concurrent programs that are supposed to terminate, we prefix these definitions with "For all possible interleaved execution sequences."
For programs that are not supposed to terminate, we have to write correctness criteria in terms of properties that must always hold (safety properties) and properties that must eventually hold (liveness properties). Both are important: a program that does nothing is perfectly safe (it never screws up), but it makes no progress.
Here's an example of how some interleavings can produce correct results and others produce incorrect results. First consider two threads:
Thread 1 | Thread 2 |
---|---|
Transfer $50 from account A to account B | Deposit $100 to account B. |
A: load value of acctA into reg R0<br>B: subtract 50 from R0<br>C: store value in R0 into acctA<br>D: load value of acctB into reg R0<br>E: add 50 to R0<br>F: store R0 into acctB | x: load value of acctB into reg R2<br>y: add 100 to R2<br>z: store R2 into acctB |
Suppose account A initially has $500 and B has $700. We expect, after the transfer and the deposit, for A to have $450 and B to have $850. Now consider the interleaving [A B x C D E y F z]:
Instruction | Account A value | Account B value | Value in R0 | Value in R2 |
---|---|---|---|---|
— | 500 | 700 | ? | ? |
A | 500 | 700 | 500 | ? |
B | 500 | 700 | 450 | ? |
x | 500 | 700 | 450 | 700 |
C | 450 | 700 | 450 | 700 |
D | 450 | 700 | 700 | 700 |
E | 450 | 700 | 750 | 700 |
y | 450 | 700 | 750 | 800 |
F | 450 | 750 | 750 | 800 |
z | 450 | 800 | 750 | 800 |
Yikes. $50 completely disappeared. Why? There was unsynchronized contention for a shared resource: Thread 2 loaded acctB (at x) before Thread 1 stored its update (at F), so Thread 2's store at z overwrote Thread 1's deposit into account B.
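Here is a minimal sketch of the same lost-update race in Java (the class and variable names are invented for illustration). Each thread performs its load, add, and store as separate steps on an unsynchronized shared variable, so one thread's store can be based on a stale load and clobber the other thread's update.

```java
// Sketch of the lost-update race: each thread does a separate
// read-modify-write on the shared balance, with no synchronization.
public class LostUpdate {
    static int accountB = 700;                 // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Thread transfer = new Thread(() -> {
            int r0 = accountB;                 // D: load acctB
            r0 += 50;                          // E: add 50
            accountB = r0;                     // F: store back
        });
        Thread deposit = new Thread(() -> {
            int r2 = accountB;                 // x: load acctB
            r2 += 100;                         // y: add 100
            accountB = r2;                     // z: store back
        });
        transfer.start();
        deposit.start();
        transfer.join();
        deposit.join();
        // Expected 850, but a bad interleaving can leave 750 or 800.
        System.out.println("accountB = " + accountB);
    }
}
```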
A race condition (a.k.a. race hazard, or just race) is a property of multithreaded code in which the result depends on the relative timing of events in the threads. If all of the possible results are okay, the race condition is benign.
Races are almost always caused by threads stepping on each other, and are fixed by making sure threads have mutually exclusive access to shared data. We need to be able to ensure a (safety) property that "multiple threads will never update the same account at the same time."
In terms of the interleaved instruction sequence model, we avoid races by chunking a subsequence of instructions into a single atomic action.
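One way to make a whole load-add-store subsequence behave like a single atomic action is to guard it with a lock. A minimal sketch in Java, using `synchronized` (the `Account` class here is hypothetical, invented for illustration):

```java
// Sketch: both methods synchronize on the same Account object, so each
// load-add-store subsequence executes as one atomic action.
public class Account {
    private int balance;

    public Account(int balance) { this.balance = balance; }

    public synchronized void deposit(int amount) {
        balance += amount;          // read, add, write: now indivisible
    }

    public synchronized void withdraw(int amount) {
        balance -= amount;
    }

    public synchronized int getBalance() {
        return balance;
    }
}
```

Note that a transfer involving two accounts would need a critical region covering both of them, which raises its own issues (such as the deadlock problem discussed below).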
For example, suppose one thread moves a ship:

```
loop
    x += dx;
    y += dy;
    ...
end loop
```

and another thread draws the ship:

```
loop
    clear();
    drawBackground();
    drawShipAt(x, y);
    swapBuffers();
    ...
end loop
```

If Thread-2's `drawShipAt` gets executed between Thread-1's adjusting of x and adjusting of y, the ship gets drawn at a bad position. We have to do both updates in a critical region.
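A sketch of that critical region in Java (the `Ship` class and its method names are invented for illustration): both coordinate updates happen inside one critical region, and the drawing thread reads both coordinates under the same lock, so it can never observe a half-updated position.

```java
// Sketch: updating and reading the ship's position are critical regions
// guarded by the same lock, so x and y always change together.
public class Ship {
    private double x, y;

    public synchronized void move(double dx, double dy) {
        x += dx;
        y += dy;                               // never visible half-done
    }

    public synchronized double[] position() {
        return new double[] { x, y };          // consistent snapshot
    }
}
```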
We often see this kind of code:
```
. . .
doNonCriticalStuff
ENTRY_PROTOCOL
doCriticalRegion
EXIT_PROTOCOL
doNonCriticalStuff
. . .
```
where the entry protocol makes the thread wait, if necessary, until it is safe to enter the critical region, and the exit protocol announces that the thread has left the critical region so that another thread may enter.
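One concrete realization of this schema in Java (a sketch; `doNonCriticalStuff` and `doCriticalRegion` stand in for application code) uses `java.util.concurrent.locks.ReentrantLock`, where `lock()` plays the role of the entry protocol and `unlock()` the exit protocol:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the schema above: lock() is the entry protocol and unlock()
// is the exit protocol around the critical region.
public class Worker {
    private final ReentrantLock lock = new ReentrantLock();

    public void run() {
        doNonCriticalStuff();
        lock.lock();                 // ENTRY_PROTOCOL
        doCriticalRegion();
        lock.unlock();               // EXIT_PROTOCOL
        doNonCriticalStuff();
    }

    private void doNonCriticalStuff() { /* ... */ }
    private void doCriticalRegion() { /* ... */ }
}
```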
Critical regions can prevent race conditions, but introduce a bunch of potential problems of their own:
Two threads are deadlocked if each is blocked waiting for an event that only the other blocked thread can generate (a sketch appears after these definitions).
Livelock occurs when no progress is made because each thread tries to enter its critical section, then times out, then tries again, then times out, and so on, forever.
A thread is starved when it is prevented from entering its critical section because the other threads conspire (or appear to be conspiring) to lock it out.
A (poorly designed) system is said to fail in the absence of contention when a thread can't enter its critical section unless another thread is waiting to enter its critical section.
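Here is a minimal deadlock sketch in Java (the class and variable names are invented for illustration): each thread acquires one lock, then blocks forever waiting for the lock the other thread already holds.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a classic deadlock: the two threads acquire the same two
// locks in opposite orders, so each ends up waiting on the other.
public class DeadlockDemo {
    public static void main(String[] args) {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();

        Thread t1 = new Thread(() -> {
            lockA.lock();
            pause(100);                 // make the bad interleaving likely
            lockB.lock();               // waits for t2, which waits for us
            lockB.unlock();
            lockA.unlock();
        });
        Thread t2 = new Thread(() -> {
            lockB.lock();
            pause(100);
            lockA.lock();               // waits for t1, which waits for us
            lockA.unlock();
            lockB.unlock();
        });
        t1.start();
        t2.start();
    }

    private static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { }
    }
}
```

One standard cure is to make every thread acquire the locks in the same agreed-upon order.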
What if a thread is interrupted, is suspended, or crashes inside its critical section? In the middle of the critical section, the system may be in an inconsistent state (remember the example of updating a ship's coordinates?). Not only that, the thread is holding a lock and if it dies no other thread waiting on that lock can proceed! Critical sections have to be treated as transactions and must always be allowed to finish.
Developers must ensure critical regions are very short and always terminate.
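A common defensive idiom, sketched here with Java's ReentrantLock, puts the exit protocol in a `finally` block so the lock is released even if the critical region throws an exception (this still can't help if the whole thread or process is killed outright, which is one more reason to keep critical regions short):

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the exit protocol sits in a finally block, so the lock is
// released even when the critical region throws.
public class SafeCriticalRegion {
    private final ReentrantLock lock = new ReentrantLock();

    public void update() {
        lock.lock();                    // entry protocol
        try {
            doCriticalRegion();         // keep this short and terminating
        } finally {
            lock.unlock();              // exit protocol always runs
        }
    }

    private void doCriticalRegion() { /* ... */ }
}
```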
Some properties we may like to prove to show a system is progressing (not starving):
Illustration of a weakly fair but not strongly fair system: