Windows Concurrent and Distributed Programming

OLD NOTES WARNING. Someday I will have to modernize these.

Overview

It's instructive to see how processes, threads, communication and synchronization are handled in the low-level operating system API. These notes cover the Windows API.

Microsoft's reference documentation is extensive. Some required reading:

The Windows API facilities for concurrent and distributed programming allow you to:

Create, manage, set priorities of, and terminate processes
Create, suspend, resume, set priority of, and terminate threads, and manage thread local storage
Use high-performance synchronization mechanisms like the interlocked functions and critical sections
Do data-oriented synchronization with kernel objects — mutexes, semaphores and events
Perform overlapped I/O
Have threads communicate with each other directly by sending and posting messages in each other's message queues, even if the threads are in remote processes
Do interprocess communication at least nine different ways (clipboard, file mapping, data copy, pipes, mailslots, sockets, DDE, RPC, and COM)

Definitions

Process

An executing application consisting of

A private, virtual address space
Code and data
Other resources such as files, pipes, synchronization objects, brushes, dialogs, etc.
One or more threads

Thread

The basic entity to which the O.S. allocates CPU time. A thread can execute any part of an application's code.

Synchronization Object

A kernel object that is either in a signaled or non-signaled state and can be passed to a wait function for the purpose of coordinating multiple threads. Since they are kernel objects they can be shared among processes and thus used for IPC.

Processes

A process is an instance of a running program. A process contains

A private address space with code and data for an EXE and associated DLLs
File resources (and other system resources)
One or more threads
A command line
A set of environment variables
An error mode
A current drive and current directory
Various attributes, including
- Security attributes
- Execution context
- Scheduling priority
- Processor affinity

Process Creation

Create a process with an API call that takes a lot of arguments:

    BOOL result = CreateProcess(
        applicationName,
        commandLine,
        processSecurityAttributes,
        threadSecurityAttributes,
        shouldInheritHandles,
        creationFlags,
        environmentPointer,
        currentDirectoryPathname,
        startupInfo,
        &processInfo);

This returns whether or not the process was created. The actual handle of the new process is returned in the process information structure.

    typedef struct _PROCESS_INFORMATION {
        HANDLE hProcess;
        HANDLE hThread;
        DWORD dwProcessId;
        DWORD dwThreadId;
    } PROCESS_INFORMATION;

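When a process and its main thread are created with CreateProcess(), each of the two kernel objects gets an initial usage count of 2: one for the running process or thread itself, and one for the handle returned to the creator. So the creator should always call CloseHandle() on both handles once it no longer needs them.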

The processId and threadId are system-wide unique ids that provide alternate ways of manipulating processes and threads.

Creation flags: DEBUG_PROCESS, CREATE_SUSPENDED, DETACHED_PROCESS, CREATE_NEW_CONSOLE, CREATE_NO_WINDOW, ...

Startup info contains: desktop, window title, x, y, width, height, flags, showFlags, standard input handle, standard output handle, standard error handle, ...
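A minimal sketch of the whole sequence (create, wait, read the exit code, close both handles) might look like the following. The helper name run_and_wait and the command line "cmd.exe /c exit 7" are assumptions chosen for illustration; they are not part of the notes above.

```c
#include <windows.h>
#include <stdio.h>

/* Hypothetical helper: launches a child process, waits for it, and returns
   its exit code. Returns (DWORD)-1 if the process could not be created. */
static DWORD run_and_wait(const char *commandLine)
{
    STARTUPINFOA startupInfo;
    PROCESS_INFORMATION processInfo;
    char buffer[MAX_PATH * 2];
    DWORD exitCode = (DWORD)-1;

    ZeroMemory(&startupInfo, sizeof startupInfo);
    startupInfo.cb = sizeof startupInfo;        /* required: structure size */
    ZeroMemory(&processInfo, sizeof processInfo);

    /* CreateProcessW may modify the command line in place, so passing a
       writable copy is a safe habit for either version of the call. */
    lstrcpynA(buffer, commandLine, (int)sizeof buffer);

    if (!CreateProcessA(
            NULL,           /* applicationName: taken from the command line */
            buffer,         /* commandLine */
            NULL,           /* processSecurityAttributes: default */
            NULL,           /* threadSecurityAttributes: default */
            FALSE,          /* shouldInheritHandles */
            0,              /* creationFlags */
            NULL,           /* environmentPointer: inherit the parent's */
            NULL,           /* currentDirectoryPathname: inherit the parent's */
            &startupInfo,
            &processInfo)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return exitCode;
    }

    /* A process handle becomes signaled when the process terminates. */
    WaitForSingleObject(processInfo.hProcess, INFINITE);
    GetExitCodeProcess(processInfo.hProcess, &exitCode);

    /* Give up our references so the kernel can destroy both objects. */
    CloseHandle(processInfo.hThread);
    CloseHandle(processInfo.hProcess);
    return exitCode;
}

int main(void)
{
    DWORD code = run_and_wait("cmd.exe /c exit 7");
    printf("child exit code: %lu\n", code);
    return 0;
}
```

If the creator does not care about the child's main thread, it can close processInfo.hThread immediately after CreateProcess returns; the thread keeps running either way.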

Objects and Handles

Remember some objects are owned by the kernel (processes, threads, modules, files, mailslots, pipes, semaphores, mutexes, events, timers, ...), and some are owned by the process (brush, pen, bitmap, font, cursor, caret, window, ...).

Objects are manipulated via their handle. The Windows API exposes low-level functions for this. Use like this:

    HWND h = CreateWindow(...);
    MoveWindow(h, ...);

Objects are chunks of memory and handles are (more or less) pointers to the memory (better: "opaque references"). Objects owned by the process are stored inside the process memory, so they are automatically destroyed when the process dies, but you should destroy them yourself. Kernel objects are stored in kernel memory:

TODO Picture goes here

All kernel objects have a

security descriptor
name
usage count

Most have additional attributes, depending on what kind of object it is.

You never destroy kernel objects because you do not own them. They are destroyed by the O.S. when the usage count goes to 0.

    Process A                              Process B
    ---------------------                  -----------------------

    HANDLE h1 = CreateMutex(0, FALSE, "dog");

                   (* Mutex "dog" created with usage
                      count = 1 *)

                                           HANDLE h2 = OpenMutex(SYNCHRONIZE, FALSE, "dog");

                   (* Now usage count of dog is 2 *)

    CloseHandle(h1);

                   (* Usage count of dog is 1 *)

                                           CloseHandle(h2);

                   (* Usage count is now 0, the kernel will
                      destroy the mutex *)

If you forget to call CloseHandle() the system will close the handle when the process terminates. Not before, of course.

Exercise: So what's the big deal about closing handles?

Objects can be shared between processes:

By name, as in the example above
By creating it with an inheritable handle so that processes you create can get their own copy of the handle
By calling DuplicateHandle() directly

Process Termination

Three ways to terminate a process

One of its threads calls ExitProcess(). Nice and clean.
One of its threads calls TerminateProcess(). Avoid this if you can. Not a nice clean up. DLLs are not notified to detach. However you are guaranteed that: kernel object usage counts get decremented, allocated memory is freed, files get closed, user and GDI objects get freed, and some more nice things happen.
All threads die on their own, cleanly or uncleanly. If there ain't no threads to run, why keep the process around?

Threads

Each thread in a process has access to the address space, global variables, and all the resources of that process. A thread does have its very own context, and (sometimes) its very own message queue.

Windows schedules threads, not processes.

Why?

Use multiple threads within a process to do several concurrent tasks, like repagination, or spell checking, or hyphenation in a word processor.

Threads are incredibly cheap — way cheaper than Windows processes. (Traditionally Unix processes have been somewhere in the middle.) Threads are fast to start up and shut down and don't impact system resources like processes do. Use multiple threads for:

Client handling in servers, where you just fulfill simple requests and don't want the overhead of creating new processes for each client.
Applications where you wish to share a lot of data, since sharing things like database records and window handles between processes is a real security concern. Processes want to be protected from each other.

Thread Contexts

A thread has its own context which includes

its set of machine registers
kernel stack and user stack
thread environment block
program counter (I suppose that's just one of the registers)

The O.S. does the thread scheduling and context switching so you write threads that appear to be independent of each other. In old versions of Windows (up to 3.1), you had to shove in a bunch of PeekMessage() calls to satisfy the limitations of cooperative multitasking. (Old timers will remember the whole system mercilessly hanging when connecting to a busy server with the File Manager.)

Thread Creation

The API function is

    HANDLE h = CreateThread(
        securityAttributes,
        stackSize,
        function,
        parameter,
        creationFlags,
        &threadId);

(There's also a CreateRemoteThread function to create a thread in another process.)

Almost all API calls use the handle; the only functions I know that use the thread id are AttachThreadInput() and PostThreadMessage() because these functions can manipulate the message queues of other threads in remote processes.
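As a rough sketch of how the call fits together with a wait and an exit code, the program below starts one thread and collects its result. The function name sum_to_n and the value 100 are made up for the example. Programs that use the C runtime library are usually advised to call _beginthreadex instead of CreateThread; this sketch sticks to the raw API function discussed here.

```c
#include <windows.h>
#include <stdio.h>

/* Thread function (hypothetical example): sums 1..n, where n is passed
   by pointer. The return value becomes the thread's exit code. */
static DWORD WINAPI sum_to_n(LPVOID parameter)
{
    int n = *(int *)parameter;
    DWORD total = 0;
    for (int i = 1; i <= n; i++) {
        total += (DWORD)i;
    }
    return total;
}

int main(void)
{
    int n = 100;
    DWORD threadId;
    DWORD exitCode = 0;

    HANDLE h = CreateThread(
        NULL,       /* securityAttributes: default */
        0,          /* stackSize: default */
        sum_to_n,   /* function */
        &n,         /* parameter: safe, because main outlives the thread */
        0,          /* creationFlags: start running immediately */
        &threadId);
    if (h == NULL) {
        fprintf(stderr, "CreateThread failed: %lu\n", GetLastError());
        return 1;
    }

    /* A thread handle becomes signaled when the thread terminates. */
    WaitForSingleObject(h, INFINITE);
    GetExitCodeThread(h, &exitCode);
    printf("thread %lu returned %lu\n", threadId, exitCode);

    CloseHandle(h);
    return 0;
}
```

Using the exit code to carry a result is only reasonable for small values; a real program would more likely write results into a structure passed through the parameter.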

Useful Thread Functions

GetCurrentThread
GetCurrentThreadId
Sleep
SuspendThread
ResumeThread
GetThreadTimes(hThread, &creationTime, &exitTime, &kernelDuration, &userDuration)

Thread Termination

Three ways that I know of:

Thread calls ExitThread(). Nice and clean. This actually happens automatically! CreateThread() actually launches this code:

    try {
        int exitCode = yourFunction(yourParameter);
        ExitThread(exitCode);
    } catch (...) {
        do some clean up stuff
        ExitProcess(...);
    }

TerminateThread() — drastic! Avoid if possible: no DLL notification, no stack cleanup, critical sections stay permanently locked.
The process that owns it dies.

When a thread terminates, regardless of how:

Any user objects owned by the thread are freed (Owned by the thread? Yep. Most are indeed owned by the process, but the thread owns windows and hooks.)
The thread's kernel object becomes signaled.
The exit code changes from STILL_ACTIVE to whatever you set for it.
If this is the last active thread in the process, the process ends.
Kernel object usage counts are decremented.

Scheduling

Modern Windows uses preemptive multitasking. The actual algorithm is implementation dependent and subject to tweaking at Microsoft's whim. It sort of works like this:

Each thread has a dynamic priority in the range 0..31. (The system has a single thread — the zero page thread — with level 0. You cannot make another thread at level 0.)
When it's time to schedule a thread, the system picks the highest priority thread that is ready to run.
Within a priority level the threads take turns in time slices.
A high priority thread becoming ready to run immediately preempts the current thread.

Thread States

Thread Priority Computation

A thread's priority level is computed by combining its process's priority class with its own relative priority and a possible boost.

Process Priority Class:
- IDLE_PRIORITY_CLASS = 4
- NORMAL_PRIORITY_CLASS = 8
- HIGH_PRIORITY_CLASS = 13
- REALTIME_PRIORITY_CLASS = 24
Thread Relative Priority:
- THREAD_PRIORITY_LOWEST = -2
- THREAD_PRIORITY_BELOW_NORMAL = -1
- THREAD_PRIORITY_NORMAL = 0
- THREAD_PRIORITY_ABOVE_NORMAL = 1
- THREAD_PRIORITY_HIGHEST = 2
- THREAD_PRIORITY_IDLE
- THREAD_PRIORITY_TIME_CRITICAL

The computation is:

    Priority Level = base priority + boost

    where

    base priority =
        if thread relative priority == idle then
            if process priority == realtime then
                16
            else
                1
        else if thread relative priority == time critical then
            if process priority == realtime then
                31
            else
                15
        else
            process priority + thread relative priority

Base priority by thread relative priority (rows) and process priority class (columns):

    Relative Priority    Idle    Normal    High    Real Time
    -----------------    ----    ------    ----    ---------
    Idle                    1         1       1           16
    Lowest                  2         6      11           22
    Below Normal            3         7      12           23
    Normal                  4         8      13           24
    Above Normal            5         9      14           25
    Highest                 6        10      15           26
    Time Critical          15        15      15           31
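The base priority rule above can be written as a small C function. This is only a sketch of the computation as these notes describe it: the enum names are invented for the example, and the class values are the base numbers listed above (4, 8, 13, 24), not the real Windows constants, which are bit flags.

```c
#include <assert.h>

/* Hypothetical names; values are the base levels from the notes. */
enum priority_class {
    CLASS_IDLE = 4,
    CLASS_NORMAL = 8,
    CLASS_HIGH = 13,
    CLASS_REALTIME = 24
};

/* Hypothetical names; -15 and 15 are only markers for the special cases. */
enum relative_priority {
    REL_IDLE = -15,
    REL_LOWEST = -2,
    REL_BELOW_NORMAL = -1,
    REL_NORMAL = 0,
    REL_ABOVE_NORMAL = 1,
    REL_HIGHEST = 2,
    REL_TIME_CRITICAL = 15
};

/* Returns the base priority (before any boost) of a thread. */
int base_priority(enum priority_class cls, enum relative_priority rel)
{
    if (rel == REL_IDLE) {
        return cls == CLASS_REALTIME ? 16 : 1;
    }
    if (rel == REL_TIME_CRITICAL) {
        return cls == CLASS_REALTIME ? 31 : 15;
    }
    /* Ordinary case: class base plus an offset in -2..2. */
    assert(rel >= REL_LOWEST && rel <= REL_HIGHEST);
    return (int)cls + (int)rel;
}
```

For instance, base_priority(CLASS_HIGH, REL_HIGHEST) evaluates to 13 + 2 = 15, which agrees with the High column of the table.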

To get/set the priority of a process

GetPriorityClass()
SetPriorityClass()

To get/set the relative priority of a thread

GetThreadPriority()
SetThreadPriority()

Boosts

When a window message is received, I/O is completed, a wait is satisfied, etc., the priority is boosted by 2 for the next slice, then dropped by one for the following slice, then dropped back to the base level.
Priority will never drop below the base priority.
Priority of non-real-time threads will never get boosted above 15.
Priority of real-time threads (base in 16..31) is never boosted at all.
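To make the boost rules concrete, here is a toy model of them in C. It follows only the four rules as stated in these notes; the function name priority_after_boost is invented, and the real scheduler is more involved and subject to change.

```c
#include <assert.h>

/* Toy model: dynamic priority of a thread `slices` time slices after a
   boost event (slices == 0 is the first slice after the event).
   `base` is the thread's base priority in 1..31. */
int priority_after_boost(int base, int slices)
{
    assert(base >= 1 && base <= 31 && slices >= 0);

    if (base >= 16) {
        return base;            /* real-time threads are never boosted */
    }

    int boost = 2 - slices;     /* +2, then +1, then back to 0 */
    if (boost < 0) {
        boost = 0;              /* never drops below the base priority */
    }

    int priority = base + boost;
    if (priority > 15) {
        priority = 15;          /* non-real-time threads are capped at 15 */
    }
    return priority;
}
```

Under this model a thread with base priority 8 runs at 10, then 9, then 8 over three consecutive slices, while a thread with base priority 14 is held at the cap of 15 for the first two.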

The system likes to tweak priorities whenever it thinks everyone will be happier.

To turn boosting on and off

SetProcessPriorityBoost()
SetThreadPriorityBoost()

To determine whether boosting is enabled or not

GetProcessPriorityBoost()
GetThreadPriorityBoost()

Processes with NORMAL_PRIORITY_CLASS get helped (their threads either get boosts or extended quanta) when a window is brought to the foreground.

Message Queues

If a thread makes a call to a function in user32.dll, the thread gets its very own message queue.

TODO

Synchronization

The two approaches to synchronization are to use kernel objects or some other mechanism. Non-kernel object mechanisms include the interlocked functions and critical sections, which are super high-performance (WAY better than kernel objects) but can only be used in limited contexts.
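A short sketch of the two non-kernel mechanisms mentioned above, used within a single process: an interlocked function for a lone counter, and a critical section for an update that touches two variables together. The variable names, thread count, and loop count are arbitrary choices for the example.

```c
#include <windows.h>
#include <stdio.h>

#define THREAD_COUNT 4
#define ITERATIONS   100000

static volatile LONG hits = 0;          /* updated only via interlocked calls */
static CRITICAL_SECTION statsLock;      /* guards total and updates together */
static long total = 0;
static long updates = 0;

static DWORD WINAPI worker(LPVOID unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        /* A single-word update needs no lock at all. */
        InterlockedIncrement(&hits);

        /* A multi-step update needs mutual exclusion. */
        EnterCriticalSection(&statsLock);
        total += i % 10;
        updates++;
        LeaveCriticalSection(&statsLock);
    }
    return 0;
}

int main(void)
{
    HANDLE threads[THREAD_COUNT];
    DWORD threadId;

    InitializeCriticalSection(&statsLock);

    for (int i = 0; i < THREAD_COUNT; i++) {
        threads[i] = CreateThread(NULL, 0, worker, NULL, 0, &threadId);
        if (threads[i] == NULL) {
            return 1;
        }
    }

    /* Wait until every worker thread has terminated. */
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);

    printf("hits = %ld, updates = %ld, total = %ld\n",
           (long)hits, updates, total);

    for (int i = 0; i < THREAD_COUNT; i++) {
        CloseHandle(threads[i]);
    }
    DeleteCriticalSection(&statsLock);
    return 0;
}
```

A critical section is not a kernel object, so it has no name or handle and cannot be shared with another process; that is one of the "limited contexts" restrictions.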

Kernel Objects and Wait Functions

One way to do synchronization is to wait for a kernel object to be signaled. There are ten things you can wait for:

Kernel Object	Signaled When...
Process	It terminates
Thread	It terminates
Job
Console Input	There is unread input in the console's input buffer
Change Notification	The specified change occurs within a specified directory or directory tree
Memory Resource Notification
Event	You call SetEvent() or PulseEvent(). It is also possible to create an event in the signaled state.
Mutex	It is not owned by any thread. If a wait is successful then the waiting thread gets ownership and the mutex goes back to being unsignaled.
Semaphore	Its count is greater than 0. If a wait is successful then the count is decremented by 1.
Waitable Timer	The timer reaches the due time.
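As an illustration of the Mutex and Thread rows of this table, the sketch below has several threads wait on one unnamed mutex to protect a shared counter, while the main thread waits on the thread objects themselves. The names and counts are assumptions made up for the example.

```c
#include <windows.h>
#include <stdio.h>

#define THREAD_COUNT 4
#define INCREMENTS   10000

static HANDLE counterMutex;         /* unnamed mutex guarding sharedCounter */
static long sharedCounter = 0;

static DWORD WINAPI worker(LPVOID unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        /* Returns once the mutex is signaled (unowned); the wait makes
           this thread the owner and the mutex unsignaled again. */
        WaitForSingleObject(counterMutex, INFINITE);
        sharedCounter++;
        ReleaseMutex(counterMutex); /* give up ownership: signaled again */
    }
    return 0;
}

int main(void)
{
    HANDLE threads[THREAD_COUNT];
    DWORD threadId;

    /* No security attributes, not initially owned, no name. */
    counterMutex = CreateMutex(NULL, FALSE, NULL);
    if (counterMutex == NULL) {
        return 1;
    }

    for (int i = 0; i < THREAD_COUNT; i++) {
        threads[i] = CreateThread(NULL, 0, worker, NULL, 0, &threadId);
        if (threads[i] == NULL) {
            return 1;
        }
    }

    /* Thread objects become signaled on termination; wait for all. */
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);

    printf("counter = %ld\n", sharedCounter);

    for (int i = 0; i < THREAD_COUNT; i++) {
        CloseHandle(threads[i]);
    }
    CloseHandle(counterMutex);
    return 0;
}
```

Giving the mutex a name in CreateMutex would let a second process open the same object, at the cost of a kernel transition on every wait.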

Events

Mutexes

Semaphores

Waitable Timers

Other Mechanisms

Interlocked Variables

Interlocked Lists

Critical Sections

Asynchronous Procedure Calls

Overlapped I/O

Timer Queues

Interprocess Communication

Mechanisms for IPC

Clipboard
File Mapping
Data Copy (WM_COPYDATA)
Pipes
Mailslots
Windows Sockets
DDE
RPC
COM