Chapter 2. Threads and Thread Switching

Table of Contents

Threads
Switching Threads
Kernel Stack Models
Per-Processor Kernel Stack
Per-Thread Kernel Stack
Sample Code

Threads are one of the abstractions offered by the L4 microkernel. They abstract from the CPU and thus allow sharing a single CPU among multiple activities (tasks, programs, jobs, ...). The thread abstraction enables multi-programming systems: the system switches between threads whenever another activity is to be executed on the CPU.

Threads

To this end, threads must be represented by a data structure that can hold the complete CPU state, including the current instruction pointer, the stack pointer, and the contents of general purpose and special purpose registers (e.g., floating point or status registers). Additionally, to enable controlled switching between threads (scheduling), scheduling parameters such as thread priorities, time slice lengths, and the thread status (e.g., running, ready, blocked, invalid) are also required. All this information is stored in so-called thread control blocks (TCBs), which will be discussed in more detail in Chapter 4, TCBs.
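To make this concrete, a TCB might look roughly like the following C sketch. The field names, types, and sizes are purely illustrative, not L4's actual layout:

```c
#include <stdint.h>

/* Illustrative TCB layout -- the fields are examples,
 * not the actual L4 definition. */
enum thread_state { TS_INVALID, TS_RUNNING, TS_READY, TS_BLOCKED };

struct tcb {
    /* CPU state of the thread while it is not running */
    uint32_t eip;            /* instruction pointer */
    uint32_t esp;            /* stack pointer */
    uint32_t gpr[6];         /* remaining general purpose registers */
    uint32_t eflags;         /* status register */

    /* scheduling parameters */
    unsigned priority;
    unsigned timeslice_left; /* remaining time slice, e.g. in microseconds */
    enum thread_state state;
};
```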

Switching Threads

Switching threads is conceptually easy:

  • Store the CPU state of the currently running thread in its TCB
  • Load the CPU state of the thread to resume from its TCB into the CPU
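A user-level analogy for these two steps, runnable in plain C: setjmp() stores enough CPU state (instruction pointer, stack pointer, callee-saved registers) in a jmp_buf, which plays the TCB's role here, and longjmp() loads that state back and resumes at the saved point. This is only an analogy within a single thread, not a real thread switch:

```c
#include <setjmp.h>

static jmp_buf ctx;

int save_and_resume(void)
{
    volatile int phase = 0;   /* volatile: survives the longjmp */
    if (setjmp(ctx) == 0) {   /* "store the CPU state" (returns 0 on save) */
        phase = 1;
        longjmp(ctx, 1);      /* "load the CPU state" and resume */
    }
    return phase + 1;         /* reached a second time, after the resume */
}
```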

However, two problems arise: First, TCBs must be protected kernel objects so that no user-level thread can manipulate other threads' states. Furthermore, threads must not be allowed to modify even their own TCB, because the TCBs contain scheduling information (e.g., the remaining time slice length).

As a consequence, thread switches must (most of the time, see Chapter 10, Local IPC) be performed by the kernel, which means that the kernel must run in order to do so. While the kernel is running, the user-level state of the thread must be preserved somewhere in order to be able to resume it correctly. This second problem requires some hardware support on entering the kernel: the user-level CPU state can be saved on a stack (x86), or the kernel can switch to a different set of CPU registers (register banks, as found on MIPS or Itanium machines).

On stack-based machines (x86), a thread switch thus requires the following steps:

  • A thread executes in user mode
  • Some event triggers a thread switch (interrupt, exception, syscall, ...)
  • The system enters kernel mode

    • The hardware automatically loads a kernel stack pointer (on x86, from the TSS) ...
    • ... and saves the user-level instruction pointer and stack pointer on this kernel stack
    • The hardware loads a kernel instruction pointer based on the cause of the kernel entry (interrupt, exception, syscall, ...)
  • The kernel code saves the remaining CPU state into the current thread's TCB before overwriting it
  • The kernel loads the CPU state of the next thread from its TCB ...
  • ... and returns to user-mode, causing the hardware to

    • load the user-level instruction pointer and
    • the user-level stack pointer from the kernel stack
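On IA-32, the state the hardware pushes on the kernel stack during such a user-to-kernel transition has a fixed layout. Viewed as a C structure at the resulting kernel stack pointer, it looks like this (a sketch for the ring-3-to-ring-0 case without an error code):

```c
#include <stdint.h>

/* Hardware-pushed kernel entry frame on IA-32 for an interrupt,
 * exception, or trap arriving from user mode (no error code).
 * Fields are in ascending address order, i.e., eip is what the
 * kernel's new esp points to after the hardware pushes. */
struct entry_frame {
    uint32_t eip;     /* user-level instruction pointer */
    uint32_t cs;      /* user code segment selector */
    uint32_t eflags;  /* user status register */
    uint32_t esp;     /* user-level stack pointer */
    uint32_t ss;      /* user stack segment selector */
};
```

On the return path, iret pops this frame again, restoring exactly the state listed in the last two bullets above.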

Kernel Stack Models

Concerning the stack being used in kernel mode, different models are possible. The kernel stack should always be distinct from the user-level stack, for two security reasons: First, the user can set up an invalid stack pointer before entering kernel mode; loading a well-known, valid stack pointer on kernel entry is likely to be easier than validating the current value. Second, the kernel entry might itself be caused by a page fault on the user-level stack. Storing user-level state on that stack in order to handle the page fault would then raise further page faults (accessing the same unmapped memory), which can never be handled.

The presented models differ in the number of kernel-level stacks.

Per-Processor Kernel Stack

All threads executing on one CPU can use the same kernel stack. [2] As a consequence, either only one thread can execute in kernel mode at any time (i.e., threads executing in kernel mode cannot be preempted), or unusual approaches such as continuations must be used.

This model is efficient with respect to cache lines used during a context switch (only one stack), but uncomfortable from the kernel developer's point of view.
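A continuation, in this context, is a small record that captures where a preempted in-kernel operation should resume, plus the few values it needs, so that the shared stack can be unwound and reused. A hedged sketch (the field names and the example resume step are illustrative, not from any particular L4 version):

```c
/* Sketch of a continuation for a single-kernel-stack design: when an
 * in-kernel operation must block, the kernel records a resume function
 * and its saved state, then gives up the shared stack. */
struct continuation {
    void (*resume)(struct continuation *self); /* where to continue */
    unsigned long arg0, arg1;                  /* saved operation state */
};

static unsigned long last_result;

/* Example resume step: completes a hypothetical two-operand
 * operation once the thread is scheduled again. */
static void finish_add(struct continuation *self)
{
    last_result = self->arg0 + self->arg1;
}
```

The price of this model is visible here: every blocking point in the kernel must be manually split into a "before" part and a resume function, which is what makes the approach uncomfortable for the kernel developer.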

Per-Thread Kernel Stack

Alternatively, each thread can have its own kernel-level stack. This model resembles user-level code and makes kernel mode nothing special from the kernel developer's point of view. However, this model is more costly both with respect to virtual memory use (only one stack must be addressable in the previous model, whereas this model requires one stack per thread to be present in virtual memory) and with respect to cache line use on a thread switch: both the stack of the current thread and the stack of the next thread are accessed, effectively doubling the number of cache lines used for them.

Due to its simplicity, L4 employs the per-thread stack model and integrates the kernel stacks into the TCBs. (TCB plus stack fit into a single page to avoid additional TLB misses.)

Sample Code

Code to cooperatively switch threads in kernel mode on x86:

/* save all registers on the stack */
pusha

/* compute TCB address of current thread A
 * (esp points into A's TCB, as the kernel
 * stack is part of the TCB)
 */
mov     esp, ebp
and     -sizeof_tcb, ebp        /* ebp = base address of A's TCB */

/* assume: edi = address of TCB of B */
mov     esp, TCB_Off_esp(ebp)   /* save A's kernel stack pointer */
mov     TCB_Off_esp(edi), esp   /* load B's kernel stack pointer */

/* setup kernel stack for next kernel entry:
 * esp0 = end of B's TCB */
add     sizeof_tcb, edi
mov     edi, 0(esp0_ptr)

/* restore all registers from B's stack */
popa

Note:

  • TCBs comprise 2^k bytes (k in [9..12]) and are naturally aligned
  • -2^k can be used to mask out the offset inside a TCB and compute its start address: (any address inside a TCB) AND -2^k results in the 2^k-aligned base address of the TCB
  • On kernel entry, the hardware loads the stack pointer from esp0
  • Stacks on x86 grow downwards, hence on leaving the kernel, esp0 is set to the end of the TCB
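The masking trick from the first two notes can be written in C as follows; k = 11 (2 KiB TCBs) is an arbitrary example value within the stated range:

```c
#include <stdint.h>

/* With naturally aligned TCBs of 2^k bytes, ANDing any address inside
 * a TCB with -2^k (i.e., ~(2^k - 1)) clears the offset bits and yields
 * the TCB's base address. k = 11 is an example value. */
#define TCB_BITS 11
#define TCB_SIZE (1ul << TCB_BITS)          /* 2^k = 2048 bytes */

static inline uintptr_t tcb_base(uintptr_t addr_in_tcb)
{
    return addr_in_tcb & ~(uintptr_t)(TCB_SIZE - 1);  /* addr AND -2^k */
}
```

This is exactly what the `and -sizeof_tcb, ebp` instruction in the sample code computes.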


[2] Threads on different CPUs should always have separate kernel stacks in order to exploit possible parallelism and to avoid synchronization issues among the CPUs on kernel entries.