Trivial Multithreaded Examples

Here are three multithreaded programs implemented in several different languages.

The Plan

We'll look at three different simple programs; each implemented five different ways:

The First Example

The first program runs three threads in parallel:

  1. One that writes the numeral '0' 5000 times
  2. One that writes the numeral '1' 5000 times
  3. One that writes the numeral '2' 5000 times

The purpose is only to show how to set up and run threads, not how to use them effectively.

Ada

In Ada, tasks run in parallel with the unit in which they are declared. The tasks and their parent (more or less) begin together and terminate together. There are no explicit activation or termination calls.

trinary.adb
------------------------------------------------------------------------------
-- trinary.adb
--
-- This program illustrates three simple tasks running concurrently: one that
-- writes zeros, one that writes ones, and one that writes twos.  Each task
-- writes to standard output.  The result should be a large trinary number.
------------------------------------------------------------------------------

with Ada.Text_IO;
use Ada.Text_IO;

procedure Trinary is

  task Write_Zeros;
  task Write_Ones;

  task body Write_Zeros is
  begin
    for I in 1..5000 loop
      Put ('0');
    end loop;
  end Write_Zeros;

  task body Write_Ones is
  begin
    for I in 1..5000 loop
      Put ('1');
    end loop;
  end Write_Ones;

begin
    for I in 1..5000 loop
      Put ('2');
    end loop;
end Trinary;

$ gnatmake trinary && ./trinary
02102102102102102102102102102221112222222221111110010120101010101010210210210210
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021010102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021000122201210120202020220202021010101010101201201201201201201201
.
.
.

12012012012012012012012012012012012012012012012012012012012012012012012012012012
01201201201201201201201201201201201201201201201201201201201222101010101010101010
12012012012021022222221111111111111122222222221021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102110101010101010101010101010
10101010101010202020202020202020202020202020202020101010101010101010101010101010
10101010101002020210210210210210210210210210210210210210210210210210210210210202
10101201202002101010101010101010101010101010101010101010101010101010101010101010
21021021021021021021021021021021021021021021021021021021021021021021021021021021
.
.
.
20120120120120120120120120120120120120120120120120120120120120120120120120120120
12012012012012012012012012012012012012012012012212121212121212121212121212121212
11212121212121212121212121212121212121212121212121212121212121212121212121212121
22121212121212121212121212121212121212122222222221222111111111212121221212121212
21212121212121212121212121212121212121212121212121212121212121212121212121212121
22121212121212121212121212121212121212121212121212121212112121212121212121212121
12121212121212121121212121212121212121212121212121212121221212121212121212121212
11212121212121212121212121212121212121212121212121212121212121212212121212121212
21212121212121212121212121212121212121212121212121212121212121212121222212121212
21212121212121212121212121221222121212121212121212121212121212121212121212121212
21212121212212121212121212121212121212121212121212121212121212121212121212121212
12222222222222222222222222222222222222222222222222222222222222222222222222222222
2222222222222222222222222222222222222222

Java

In Java we make thread objects. We have to start them explicitly, but we don't have to do anything special to stop them. The program will not exit until all non-daemeon threads have finished (and all of the threads in this example are non-daemon threads).

Trinary.java
/**
 * Application class with a main() method that illustrates three simple threads
 * running concurrently: one that writes zeros, one that writes ones, and one
 * that writes twos. Each thread writes to standard output. The result should be
 * a large trinary number.
 */
public class Trinary {

    private static Thread zeros = new Thread() {
        public void run() {
            for (int i = 0; i < 5000; i++) {
                System.out.print("0");
            }
        }
    };

    private static Thread ones = new Thread() {
        public void run() {
            for (int i = 0; i < 5000; i++) {
                System.out.print("1");
            }
        }
    };

    public static void main(String[] args) {
        zeros.start();
        ones.start();
        for (int i = 0; i < 5000; i++) {
            System.out.print("2");
        }
    }
}
$ javac Trinary.java && java Trinary
20120120120120120120120120120120120120120120120120120120120120120120120120120120
12012012012012012012012012012012012012012012012012012012012012012012012012012012
01201201201201201201201201212121201201201201201201201201201201201201201201201201
20120120120120120120120120120120120120120120120120120120120120120120120120120120
12012012012012012012012012012012012012012012012012012012012012012012012012012012
01201201201201201201201201201201201201201201201201201201201201201201201201201201
.
.
.
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102101012012012012012012012
01201201201201201201201201201201201201201201201201201201201201201201201201201201
20120120120120120120120120120120120120120120120120120120120120120120120120120120
.
.
.
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
1021021021021021021021021021021021020000

Perl

Perl has more-than-one-way-to-do-threads (surprised?) but the modern way, using interpreter threads, is the preferred way. Threads start running as soon as they are created with create (or its alias new). You should wait for threads to finish by calling join.

trinary.pl
# ----------------------------------------------------------------------------
# trinary.pl
#
# This script illustrates three simple threads running concurrently: one that
# writes zeros, one that writes ones, and one that writes twos.  Each thread
# writes to standard output.  The result should be a large trinary number.
# ----------------------------------------------------------------------------

use strict;
use warnings;
use threads;

sub print_zeros { print "0" for (1..5000); }
sub print_ones { print "1" for (1..5000); }

my $zero_thread = threads->new(\&print_zeros);
my $one_thread = threads->new(\&print_ones);
print "2" for (1..5000);

$zero_thread->join;
$one_thread->join;

On my machine I got very, very little interleaving:

$ perl trinary.pl
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
.
.
.
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000002222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
.
.
.
11111111111111111111111111111111111111111111111111111111111111111111111111111111
11111111111111111111111111111111111111111111111111111111111111111111111111111111
11111111111111112222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
22222222222222222222222222222222222222222222222222222222222222222222222222222222
2222222222222222222222222222222222222222
Exercise: Change the code as follows: (1) write the numerals 0 and 1 a few million times, (2) write the numeral '2' only 10 times, and (3) comment out the join calls. Run the modified program. What happens?

C++ with Posix Threads

The Posix threads library, a.k.a pthreads, has been ported to many operating systems. It's quite capable.

trinary.cpp
/*****************************************************************************
*
* trinary.cpp
*
* This program illustrates three simple threads running concurrently: one that
* writes zeros, one that writes ones, and one that writes twos.  Each thread
* writes to standard output.  The result should be a large trinary number.
*
*****************************************************************************/

#include <iostream>
#include <pthread.h>

using namespace std;

void* print_zeros(void* ignored) {
    for (int i = 0; i < 5000; i++) {
        cout << '0';
    }
    return NULL;
}

void* print_ones(void* ignored) {
    for (int i = 0; i < 5000; i++) {
        cout << '1';
    }
    return NULL;
}

int main() {
    pthread_t zero_thread;
    pthread_t one_thread;

    // Create two threads and run them immediately
    int rc = pthread_create(&zero_thread, NULL, &print_zeros, NULL);
    if (rc) cout << "Cannot create the zero-writing thread", exit(rc);
    rc = pthread_create(&one_thread, NULL, &print_ones, NULL);
    if (rc) cout << "Cannot create the one-writing thread", exit(rc);

    // Run this in parallel with the other two threads
    for (int i = 0; i < 5000; i++) {
        cout << '2';
    }

    // Wait for the other two threads to finish
    int result;
    rc = pthread_join(zero_thread, (void**)&result);
    if (rc) cout << "Cannot join the zero-writing thread", exit(rc);
    rc = pthread_join(one_thread, (void**)&result);
    if (rc) cout << "Cannot join the one-writing thread", exit(rc);
}
$ g++ trinary.cpp -lpthreadGC1 && ./a.out
00120120120120120120120120120120120120120120120120120120120120120120120120120120
12012012012012012012012012012012012012012012012012012012012012012012012012012012
01201201201201201201201201010210210210210210210210210210210210210210210210210210
21021021021021021021021021021021021021021021021021021021021021021021021000120120
12010210222102102102102102102102102102102102102102102102102102102102102102102102
.
.
.
01201201201201201202101201201112221111111111111210101201201201201201201201201201
20120120120120120120120201201201201201201201201201201201201201201201201201201201
20122211111111111112111111122222222222222210222222222222221211201201201201201201
20120120120120120120120120120120120120120120120120120120120120120120120120120120
12012012012012012012012012012012012012012012012012012012012012012012012012012012
.
.
.
01201201201201201201201201201201201202020101012012012012012012012012012012012012
01201201201201201201201201201201201201201201201201201201201201201201201201201201
20120120120120120120120120121212122121212121212121212121212121212112121212121212
11121212121212121212121212212212121212121212121212121212122121212121221212121212
11121212121212212121212121211212111212121212122121121212121212121212121212121212
21121212122122212221222122222211111112222222222222111111111111111122211111222222
2222222222222222222222222222222222222222
Exercise: Ahh, interesting. The library calls have return codes! Why did the Ada, Java, and Perl codes not have them? Did I just forget them? Or are these thread implementations so confident that nothing will ever go wrong? Or are they purposely omitted from the API? Explain.

C++ with Win32

If you're using Windows and you want to go really low level, you can use the Win32 directly. The Win32 interface looks a bit like that of pthreads; the main difference being CreateThread takes six parameters instead of four. Also, Win32 threads have both a handle and a thread id.

trinary.cpp
/*****************************************************************************
*
* trinary.cpp
*
* This program illustrates three simple threads running concurrently: one that
* writes zeros, one that writes ones, and one that writes twos.  Each thread
* writes to standard output.  The result should be a large trinary number.
*
*****************************************************************************/

#include <iostream>
#include <windows.h>

using namespace std;

const char* THREAD_CREATE_ERROR = "Could not create both threads";
const char* THREAD_WAIT_ERROR = "Failure waiting for a thread to finish";
const char* CLOSE_HANDLE_ERROR = "Failure closing a thread handle";

// In Win32, functions to be run on threads must take in a LPVOID and
// return a DWORD and have calling convention WINAPI.  These are the
// rules whether you care about the parameter or the return value or not.

DWORD WINAPI print_zeros(LPVOID ignored) {
    for (int i = 0; i < 5000; i++) {
        cout << '0';
    }
    return 0;
}

DWORD WINAPI print_ones(LPVOID ignored) {
    for (int i = 0; i < 5000; i++) {
        cout << '1';
    }
    return 0;
}

int main(int argc, char** argv) {

    // CreateThread will give you back thread ids whether or not you care
    DWORD zeroThreadId, oneThreadId;

    // Create first thread and run it immediately
    HANDLE zeroThreadHandle = CreateThread(
        0,                              // ignore security
        0,                              // use default stack size
        print_zeros,                    // run the function evens
        0,                              // pass a null pointer to print_zeros
        0,                              // run immediately upon creation
        &zeroThreadId                   // will store the thread ID here
    );

    // Create second thread and run it immediately
    HANDLE oneThreadHandle = CreateThread(
        0,                              // ignore security
        0,                              // use default stack size
        print_ones,                     // run the function odds
        0,                              // pass a null pointer to print_ones
        0,                              // run immediately upon creation
        &oneThreadId                    // will store the thread ID here
    );

    // If we did not create both threads, exit with an error message
    if (!zeroThreadHandle || !oneThreadHandle) {
        cout << THREAD_CREATE_ERROR;
        return -1;
    }

    // Here is the code for the "main" thread.  It runs concurrently
    // with the two threads we just created.  It has to wait for the
    // others to finish.
    for (int i = 0; i < 5000; i++) {
        cout << '2';
    }

    // Wait for the other threads to finish
    if (WaitForSingleObject(zeroThreadHandle, INFINITE) != WAIT_OBJECT_0
    && WaitForSingleObject(oneThreadHandle, INFINITE) != WAIT_OBJECT_0) {
        cout << THREAD_WAIT_ERROR;
        return -1;
    }

    // Clean up operating system resources
    if (!CloseHandle(zeroThreadHandle) || !CloseHandle(oneThreadHandle)) {
        cout << CLOSE_HANDLE_ERROR;
        return -1;
    }

    return 0;
}
C:> g++ trinary.cpp
C:> a
02020202020202020202020202020202020202020202022101010210210210210210210210210210
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10210210210210210210210210210210210210210210210210210210210210210210210002102102
10210210210210210210210210210210210210210210211010120120120120120120120120120120
.
.
.
01201201201201201201201201201201201201201201201201201201201201201201201201201202
10210210210210210210210210210210210210210210210210210210210210210210210210210210
21021021021021021021021021021021021021022201222222222222222102222222222222222222
10210222222222222222222121112222222020120202101000010120101111111021021000011111
10210210120210210210202102102102102102102102102102102021021010120202020202021020
20202020201010210101010101012012012012012012012012012012012012012012012012012012
.
.
.
21021021021021021021021021021021021021021021021021021021021021021021021021021021
02102102102102100001202021021021021021021021021021021021021021021021021021021021
02102102102102102102102102102102102102102102102102102102102102102102102102102102
10212121212212121212121212121212121212121211212212121212121212121212121212121211
21221211211121112111211121121211211121112111211121112122121212121212121212121212
12112122121121212121212121212121212121212122121212121212121212121212121212112122
12112122122212222222122222222221111111112222222222121222121121112111211121121211
21221211212121211212121212121222121212121211111111111111111111111111111111111111
1111111111111111111111111111111111111111

The Second Example

This next example illustrates a function to sum all the values in an array, using one thread to sum up the first half of the values and the other thread to sum the rest. Here the threads must wait for each other to complete before the function can combine the results from each thread and return the sum.

Ada

Again we take advantage of implicit synchronization and termination: by nesting the tasks inside a block, we ensure the block itself is not left until both its body and its nested tasks terminate. After both tasks terminate, we can add their results.

sum_demo.adb
------------------------------------------------------------------------------
-- sum_demo.adb
--
-- This program demonstrates a function to compute the sum of an array by
-- summing each half of the array concurrently.  We have a task type and
-- create multiple tasks, each parameterized a little differently.  Tasks
-- in Ada can directly access variables in surrounding scopes.
------------------------------------------------------------------------------

with Ada.Text_IO;
use Ada.Text_IO;

procedure Sum_Demo is

  type Vector is array (Integer range <>) of Integer;

  function Sum (A: Vector) return Integer is
    First_Sum  : aliased Integer := 0;
    Second_Sum : aliased Integer := 0;
  begin
    declare
      task type Sum_Slice(First: Integer; Last: Integer; Sum: access Integer);

      task body Sum_Slice is
      begin
        for I in First..Last loop
          Sum.all := Sum.all + A(I);
        end loop;
      end Sum_Slice;

      Mid: constant Integer := (A'First + A'Last) / 2;
      First_Half_Task: Sum_Slice(A'First, Mid, First_Sum'Access);
      Second_Half_Task: Sum_Slice(Mid + 1, A'Last, Second_Sum'Access);

    begin
      null;
    end;

    -- Both tasks are guaranteed to finish before we get here.
    return First_Sum + Second_Sum;
  end Sum;

begin
  declare
    Assertion_Error: exception;
    U: Vector(103..110) := (9,9,3, others => 6);
    V: Vector(1..19) := (3,6,1,9,8,4,5,7,6,4,3,4,5,6,7,8,1,0,8);
    W: Vector := (1,2,3,4,5,6,7,8,9,10);
    X: Vector(1..20000) := (others => 7);
    Y: Vector(1..0);
  begin
    if Sum(U) /= 51 then raise Assertion_Error; end if;
    if Sum(V) /= 95 then raise Assertion_Error; end if;
    if Sum(W) /= 55 then raise Assertion_Error; end if;
    if Sum(X) /= 140000 then raise Assertion_Error; end if;
    if Sum(Y) /= 0 then raise Assertion_Error; end if;
    Put_Line("All tests passed");
  end;
end Sum_Demo;
$ gnatmake sum_demo && ./sum_demo
All tests passed

Java

Here we actually define a thread class with a constructor to take in the bounds of the slices to sum.

Summer.java
/**
 * Utility class containing a method to sum up an integer array using threads.
 */
public class Summer {

    /**
     * Returns the sum of all the values in an array. Internally it spawns two
     * threads to sum up each half of the array. Note how the body of this method (a
     * third thread!) waits for the two threads to complete using the Thread.join
     * method.
     */
    public static int sum(int[] data) throws InterruptedException {
        RangeSummer r1 = new RangeSummer(data, 0, data.length / 2);
        RangeSummer r2 = new RangeSummer(data, data.length / 2, data.length);
        r1.start();
        r2.start();
        r1.join();
        r2.join();
        return r1.getValue() + r2.getValue();
    }

    /**
     * A thread that sums up a slice of an integer array, specifically the slice
     * [low, high>. An empty slice will have a zero sum. You can call getValue() at
     * any time to get the value that has been computed so far. Calling getValue()
     * after the run() method completes will return the correct sum of the slice.
     */
    private static class RangeSummer extends Thread {
        private int[] data;
        private int low;
        private int high;
        private int value = 0;

        public RangeSummer(int[] data, int low, int high) {
            this.data = data;
            this.low = low;
            this.high = high;
        }

        public int getValue() {
            return value;
        }

        public void run() {
            for (int i = low; i < high; i++) {
                value += data[i];
            }
        }
    }

    /**
     * Makes calls to sum just for fun.
     */
    public static void main(String[] args) throws Exception {
        assert sum(new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }) == 55;
        assert sum(new int[] { 100, 200, 300, 400 }) == 1000;
        assert sum(new int[] {}) == 0;

        int[] a = new int[20000];
        java.util.Arrays.fill(a, 7);
        assert sum(a) == 140000;
        System.out.println("All tests passed");
    }
}
$ javac Summer.java && java -ea Summer
All tests passed

Perl

By default, Perl threads do not share data. Thus we can't let the spawned thread write into a variable that we expect the main thread to pick up. We have to get information out of this thread via its return value, which is obtained as the return value of join.

summer.pl
# Script illustrating a concurrent list summer.

use strict;
use warnings;
use threads;

# Returns the sum of the values in its first argument which
# should be an array reference.

sub sum {
    my $array = shift;
    my $length = scalar @$array;
    my $mid = $length/2;

    # Sum up the first half of the values on a thread
    my $first_half_thread = threads->new(
        sub {
            my $partial_sum = 0;
            $partial_sum += $array->[$_] for (0 .. $length/2-1);
            return $partial_sum;
        }
    );

    # Sum up the second half in parallel with the thread
    my $sum2 = 0;
    $sum2 += $array->[$_] for ($length/2 .. $length-1);

    # Wait until the thread finishes and capture its return value.
    my $sum1 = $first_half_thread->join;

    # Now we can leave the subroutine.
    return $sum1 + $sum2;
}

# Test
(sum [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) == 55 or die;
(sum [100, 200, 300, 400]) == 1000 or die;
(sum []) == 0 or die;
(sum [(7) x 20000]) == 140000 or die;
print "All tests passed\n";
$ perl summer.pl
All tests passed

C++ with Posix Threads

Like a lot of C APIs, pthreads uses void* a lot, so you end up casting a lot. It makes the code a bit ugly, but that's what you get with a language whose compiler insists that all the types work out but then lets to cast it all away.

Note that the pthreads API, like Perl, also lets you get the "return" value of a thread by calling the waiting (join) function.

summer.cpp
/*****************************************************************************
*
* summer.cpp
*
* Short program illustrating a function that sums up an integer array by
* summing each half concurrently.
*
*****************************************************************************/

#include <iostream>
#include <vector>
#include <cassert>
#include <pthread.h>

using namespace std;

struct Slice {
    vector<int> data;
    unsigned low;
    unsigned high;
    Slice(vector<int> d, unsigned l, unsigned h): data(d), low(l), high(h) {}
};

// Takes in a pointer to a slice and sums the values in the slice from
// the low index (inclusive) to the high index (exclusive) and returns
// the sum.
void* sum_slice(void* arg) {
    Slice* s = (Slice*)arg;
    int sum = 0;
    for (int i = s->low; i < s->high; i++) {
        sum += s->data[i];
    }
    return (void*)sum;
}

// Returns the sum of the values in vector a.  It does this internally
// using two threads; the main thread and a newly created one.
int sum_vector(const vector<int>& a) {
    int sum1;
    int sum2 = 0;

    // A thread gets only one argument, so manufacture one.
    pthread_t thread;
    Slice first_half(a, 0, a.size() / 2);

    // Sum half on a new thread; the other half on the current thread.
    int rc = pthread_create(&thread, NULL, &sum_slice, &first_half);
    if (rc) cout <<"Cannot create thread", exit(rc);
    for (int i = a.size() / 2; i < a.size(); i++) sum2 += a[i];

    // Wait until the thread finishes and capture its return value.
    rc = pthread_join(thread, (void**)&sum1);
    if (rc) cout << "Cannot join thread", exit(rc);

    return sum1 + sum2;
}

// Test vectors
int array[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 200, 300, 400};
vector<int> a(array, array + 10);
vector<int> b(array + 10, array + 14);
vector<int> c;
vector<int> d(20000, 7);

int main() {
    assert(sum_vector(a) == 55);
    assert(sum_vector(b) == 1000);
    assert(sum_vector(c) == 0);
    assert(sum_vector(d) == 140000);
    cout << "All tests passed\n";
}
$ g++ summer.cpp -lpthreadGC1 && ./a.out
All tests passed

C++ with Win32

This code looks almost identical to the pthreads example above; the main difference is that you get the return value from a thread function in a separate call from the wait function.

summer.cpp
/*****************************************************************************
*
* summer.cpp
*
* Short program illustrating a function that sums up an integer array by
* summing each half concurrently.
*
*****************************************************************************/

#include <iostream>
#include <vector>
#include <cassert>
#include <windows.h>

using namespace std;

const char* THREAD_CREATE_ERROR = "Could not create a threads";
const char* RETRIEVAL_ERROR = "Failure getting the thread's return value";
const char* THREAD_WAIT_ERROR = "Failure waiting for a thread to finish";
const char* CLOSE_HANDLE_ERROR = "Failure closing a thread handle";

struct Slice {
    vector<int> data;
    unsigned low;
    unsigned high;
    Slice(vector<int> d, unsigned l, unsigned h): data(d), low(l), high(h) {}
};

// Takes in a pointer to a slice and sums the values in the slice from
// the low index (inclusive) to the high index (exclusive) and returns
// the sum.
DWORD WINAPI sum_slice(LPVOID arg) {
    Slice* s = (Slice*)arg;
    int sum = 0;
    for (int i = s->low; i < s->high; i++) {
        sum += s->data[i];
    }
    return sum;
}

// Returns the sum of the values in vector a.  It does this internally
// using two threads; the main thread and a newly created one.
int sum_vector(const vector<int>& a) {
    int sum1;
    int sum2 = 0;

    // A thread gets only one argument, so manufacture one
    Slice first_half(a, 0, a.size() / 2);

    // Sum the first half on a new thread
    DWORD threadId;
    HANDLE threadHandle = CreateThread(0, 0, &sum_slice, &first_half, 0,
        &threadId);
    if (!threadHandle) {
        cout << THREAD_CREATE_ERROR;
        return -1;
    }

    // Sum the second half on this thread
    for (int i = a.size()/2; i < a.size(); i++) {
        sum2 += a[i];
    }

    // Wait until the thread finishes
    if (WaitForSingleObject(threadHandle, INFINITE) != WAIT_OBJECT_0) {
	    cout << THREAD_WAIT_ERROR;
	    return -1;
	}

    // Retrieve the return value into the variable sum1
    if (!GetExitCodeThread(threadHandle, (LPDWORD)&sum1)) {
	    cout << RETRIEVAL_ERROR;
	    return -1;
    }

    // Clean up operating system resources
    if (!CloseHandle(threadHandle)) {
        cout << CLOSE_HANDLE_ERROR;
        return -1;
    }

    // Now, finally, we can bring the partial results together and return
    return sum1 + sum2;
}

// Test vectors
int array[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 200, 300, 400};
vector<int> a(array, array + 10);
vector<int> b(array + 10, array + 14);
vector<int> c;
vector<int> d(20000, 7);

int main() {
    assert(sum_vector(a) == 55);
    assert(sum_vector(b) == 1000);
    assert(sum_vector(c) == 0);
    assert(sum_vector(d) == 140000);
    cout << "All tests passed\n";
}
C:> g++ summer.cpp && a.exe

The Third Example

Here we want to write, to standard output, all the prime numbers in the range 2..n, where n is a command line argument. We do this using n-1 threads, each with the responsibility of determining whether or not a different value in the range is prime and writing it if it is prime. The interesting thing here is to make sure that a thread gets exclusive access to standard output to write out the whole number; otherwise with interleaved output we could get composite numbers written out.

Ada

This little program shows off how Ada deals with dynamic threads (you allocate them with new), and how you get synchronization basically for free by declaring your resource protected.

print_primes.adb
------------------------------------------------------------------------------
-- print_primes.adb
--
-- A simple concurrent solution to the problem of finding prime numbers.  Here
-- the user enters on the command line
--
--   print_primes <n>
--
-- and the program writes out all the prime numbers up to and including n.
-- The program works by running a task for each k in 2..n to determine if
-- k is prime, and if it is, writes the value of k.
------------------------------------------------------------------------------

with Ada.Text_IO, Ada.Command_Line;
use Ada.Text_IO, Ada.Command_Line;

procedure Print_Primes is

  procedure Find_Primes (Up_To: Natural) is

    -- Printing to standard output is synchronized.

    protected Printer is
      procedure Print(I: Integer);
    end Printer;

    protected body Printer is
      procedure Print(I: Integer) is
      begin
        Put(Integer'Image(I));
      end;
    end Printer;

    -- Each task is parameterized with the number it is to test for
    -- primality.

    task type Checker(Candidate: Natural);

    task body Checker is
    begin
      for Divisor in 2 .. Candidate loop         -- look for divisors
        if Divisor = Candidate then              -- all divisors tried?
          Printer.Print(Candidate);              -- prime: print it
        elsif Candidate rem Divisor = 0 then     -- divisor found
          exit;                                  -- not prime: stop looking
        end if;
      end loop;
    end Checker;

    -- When tasks are allocated at runtime, Ada requires that a task
    -- type be declared.  Also, one cannot throw away the result of
    -- an allocation with the 'new' operator, so a useless variable
    -- must also be declared.

  begin
    declare
      type Checker_Pointer is access Checker;
      C: Checker_Pointer;
    begin
      for I in reverse 2..Up_To loop
        C := new Checker(I);
      end loop;
    end;
  end Find_Primes;

begin
  Find_Primes (Up_To => Natural'Value(Argument(1)));
exception
  when Storage_Error => Put_Line ("Bummer -- out of memory");
  when others => Put_Line ("This program requires exactly one argument");
end Print_Primes;

Java

Like Ada, Java allows very elegant data-oriented synchronization. One simply writes a synchronized statement for the object you want to "lock" for the duration of the statement.

PrimePrinter.java
/**
 * A utility class containing a method to print all primes up to a given
 * integer. The method internally invokes several threads to do its work, one
 * for each candidate prime.
 */
public class PrimePrinter {

    /**
     * Prints all primes upto and including n. Internally it spawns threads for
     * testing the primality of 2..n. Each of the threads will print the value it is
     * testing iff it is prime.
     */
    public static void printPrimes(int n) {
        for (int i = 2; i <= n; i++) {
            final int candidate = i;
            Thread thread = new Thread() {
                public void run() {
                    for (int divisor = 2; divisor <= candidate; divisor++) {
                        if (divisor == candidate) {
                            synchronized (System.out) {
                                System.out.print(" " + candidate);
                            }
                        } else if (candidate % divisor == 0) {
                            break;
                        }
                    }
                }
            };
            thread.start();
        }
    }

    /**
     * Executes printPrimes on its sole command line argument.
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Needs one integer parameter");
        } else {
            try {
                printPrimes(Integer.parseInt(args[0]));
            } catch (NumberFormatException e) {
                System.out.println("Argument must be an integer");
            }
        }
    }
}

(Actually the println method is already threadsafe — the example just serves to show how one blocks in general.)

Perl

Perl has a simple lock function in the threads module. You lock a scalar explicitly; the lock is implicitly unlocked at the end of the scope in which it was locked.

Oh, and don't forget — the lock had better be a shared variable. Obvious, I would hope.

primes.pl
# ----------------------------------------------------------------------------
# primes.pl
#
# This script runs a thread for all values in 2..n where n is the first
# command line argument.  Each thread writes its value to STDOUT if it is
# prime.
# ----------------------------------------------------------------------------

use strict;
use warnings;
use threads;
use threads::shared;

my $output_lock: shared;

sub print_primes {
    my $n = shift;
    my @checkers;
    for my $i (2 .. $n) {
        $checkers[$i-2] = threads->new(
            sub {
                for my $divisor (2 .. $i) {
                    if ($divisor == $i) {
                        lock $output_lock;
                        print $divisor . " ";
                    } elsif ($i % $divisor == 0) {
                        last;
                    }
                }
            }
        );
    }
    for my $t (@checkers) {
        $t->join;
    }
}

die "Supply exactly one argument\n" unless @ARGV == 1;
die "Supply a positive integer\n" unless $ARGV[0] =~ /^\d+$/;
print_primes $ARGV[0];

C++ with Posix Threads

Here's a first look at the pthreads mutex. Unlike Ada, Java, and Perl, unlocking is done with a library call. Don't forget it!

primes.cpp
/*****************************************************************************
*
* primes.cpp
*
* This program prints all primes up to and including its integer argument
* n.  It works by running threads for all values k in 2..n which determine
* if k is prime and print k if it is.
*
*****************************************************************************/

#include <iostream>
#include <pthread.h>

using namespace std;

#define TRY(code, message) if (code) {cout << message << '\n'; exit(-1);}

// The mutex to use to print safely.
pthread_mutex_t writing;

// Takes in an integer and prints it if and only if it is prime.
// The printing needs to be done within a critical section.
void* check(void* candidate) {
    for (int divisor = 2; divisor <= (int)candidate; divisor++) {
        if (divisor == (int)candidate) {
            TRY(pthread_mutex_lock(&writing), "Can't lock mutex");
            cout << (int)candidate << ' ';
            TRY(pthread_mutex_unlock(&writing), "Can't unlock mutex");
        } else if ((int)candidate % divisor == 0) {
            break;
        }
    }
    return NULL;
}

// Prints primes up to n.  It stores all the threads in an array so
// that it can wait on them all to finish.  Then it cleans up.
void findPrimes(int n) {
    pthread_t* checkers;

    if (n < 2) return;

    checkers = (pthread_t*)malloc((n-1) * sizeof(pthread_t));
    TRY(pthread_mutex_init(&writing, NULL), "Can't create mutex");
    for (int i = 2; i <= n; i++) {
        TRY(pthread_create(&checkers[i-2], NULL, &check, (void*)i),
	        "Cannot create one or more of the threads");
    }

    for (int i = 0; i < n-1; i++) {
        int ignored;
        TRY(pthread_join(checkers[i], (void**)&ignored), "Join failed");
    }

    TRY(pthread_mutex_destroy(&writing), "Can't destroy mutex");
    free(checkers);
}

// Invokes findPrimes with a command line argument.
int main(int argc, char** argv) {
    if (argc != 2) {
        cout << "This application requires exactly one parameter\n";
    } else {
        findPrimes(atoi(argv[1]));
    }
    return 0;
}

C++ with Win32

The CRITICAL_SECTION datatype is a cute little Win32 struct. Most C++ programmers would probably have made a class to encapsulate the critical section, but I'm leaving it here in its raw form so you can get a feel for the API.

print_primes.cpp
/*****************************************************************************
*
*  print_primes.cpp
*
*  This program prints all primes up to and including its integer argument
*  n.  It works by running threads for all values k in 2..n which determine
*  if k is prime and print k if it is.
*
*****************************************************************************/

#include <windows.h>
#include <iostream>
using namespace std;

CRITICAL_SECTION writing;

// This function takes in an integer and prints it if and only if it is prime.
// Becuase we want to run it on a thread, we have to specify the parameter
// as having type LPVOID, give it a return value that we do not care about,
// and use the WINAPI calling convention.

DWORD WINAPI check(LPVOID candidate) {
    for (int divisor = 2; divisor <= int(candidate); divisor++) {
        if (divisor == int(candidate)) {
            EnterCriticalSection(&writing);
            cout << int(candidate) << '\n';
            LeaveCriticalSection(&writing);
        } else if (int(candidate) % divisor == 0) {
            break;
        }
    }
    return 0;
}

// This is the routine that does all the work.  It stores all the thread
// handles in an array so that it can wait on them all to finish.  Then
// it closes all the handles and deallocates the array so that we don't
// have any memory leaks.

void findPrimes(int n) {
    if (n < 2) return;

    HANDLE* checkers = new HANDLE[n-1];
    DWORD junk;
    InitializeCriticalSection(&writing);
    for (int i = 2; i <= n; i++) {
        checkers[i-2] = CreateThread(0, 0, check, LPVOID(i), 0, &junk);
    }

    for (int j = 0; j < n-1; j += MAXIMUM_WAIT_OBJECTS) {
        WaitForMultipleObjects (
            min(MAXIMUM_WAIT_OBJECTS, n-j-1),
            &checkers[j], TRUE, INFINITE);
    }

    DeleteCriticalSection(&writing);
    for (int i = 2; i <= n; i++) {
        CloseHandle(checkers[i-2]);
    }
    delete[] checkers;
}

// The main program tests findPrimes with a commandline argument.

int main(int argc, char** argv) {
    if (argc != 2) {
        cout << "This application requires exactly one parameter";
    } else {
        findPrimes(atoi(argv[1]));
    }
    return 0;
}
Exercise: Remove the code that protects the output and run the modified program. On a fast machine the garbage should be very noticeable.

What Did We Learn?

Basic stuff about writing threaded code on five different platforms. On each platform you learned at least one way to: