
Multiprocessors and Threads


Presentation Transcript


  1. Multiprocessors and Threads Lecture 3 CS523S: Operating Systems

  2. Motivation for Multiprocessors • Enhanced Performance - • Concurrent execution of tasks for increased throughput (between processes) • Exploit Concurrency in Tasks (Parallelism within process) • Fault Tolerance - • graceful degradation in face of failures CS523S: Operating Systems

  3. Basic MP Architectures • Single Instruction Single Data (SISD) - conventional uniprocessor designs. • Single Instruction Multiple Data (SIMD) - Vector and Array Processors • Multiple Instruction Single Data (MISD) - Not Implemented. • Multiple Instruction Multiple Data (MIMD) - conventional MP designs CS523S: Operating Systems

  4. MIMD Classifications • Tightly Coupled System - all processors share the same global memory and have the same address spaces (Typical SMP system). • Main memory for IPC and Synchronization. • Loosely Coupled System - memory is partitioned and attached to each processor. Hypercube, Clusters (Multi-Computer). • Message passing for IPC and synchronization. CS523S: Operating Systems

  5. MP Block Diagram • Each CPU has its own cache and MMU • CPUs connect through an Interconnection Network to multiple main memory (MM) modules CS523S: Operating Systems

  6. Memory Access Schemes • Uniform Memory Access (UMA) • memory is centrally located • all processors are equidistant (equal access times) • NonUniform Memory Access (NUMA) • physically partitioned but accessible by all • processors have the same address space • NO Remote Memory Access (NORMA) • physically partitioned, not accessible by all • processors have their own address spaces CS523S: Operating Systems

  7. Other Details of MP • Interconnection technology • Bus • Cross-Bar switch • Multistage Interconnect Network • Caching - Cache Coherence Problem! • Write-update • Write-invalidate • bus snooping CS523S: Operating Systems

  8. MP OS Structure - 1 • Separate Supervisor - • all processors have their own copy of the kernel • some shared data for interaction • dedicated I/O devices and file systems • good fault tolerance • bad for concurrency CS523S: Operating Systems

  9. MP OS Structure - 2 • Master/Slave Configuration • master monitors the status and assigns work to other processors (slaves) • Slaves are a schedulable pool of resources for the master • master can be bottleneck • poor fault tolerance CS523S: Operating Systems

  10. MP OS Structure - 3 • Symmetric Configuration - Most Flexible. • all processors are autonomous, treated equal • one copy of the kernel executed concurrently across all processors • Synchronize access to shared data structures: • Lock entire OS - Floating Master • Mitigated by dividing OS into segments that normally have little interaction • multithread kernel and control access to resources (continuum) CS523S: Operating Systems

  11. MP Overview • MultiProcessor: SIMD or MIMD • MIMD: Shared Memory (tightly coupled) or Distributed Memory (loosely coupled) • Shared Memory: Symmetric (SMP) or Master/Slave • Distributed Memory: Clusters CS523S: Operating Systems

  12. SMP OS Design Issues • Threads - effectiveness of parallelism depends on performance of primitives used to express and control concurrency. • Process Synchronization - disabling interrupts is not sufficient. • Process Scheduling - efficient, policy controlled, task scheduling (process/threads) • global versus per CPU scheduling • Task affinity for a particular CPU • resource accounting and intra-task thread dependencies CS523S: Operating Systems
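On an SMP, disabling interrupts only affects the local CPU; the other processors can still touch shared kernel data, so synchronization needs an atomic operation that every CPU observes. A minimal sketch using C11 atomics (the spinlock_t / spin_lock names are illustrative, not taken from any particular kernel):

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock.  The atomic exchange is visible to
 * every CPU, unlike disabling interrupts, which only affects the
 * local processor. */
typedef struct { atomic_flag locked; } spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* busy-wait until the holder clears the flag */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A real kernel would typically also disable local preemption or interrupts while the lock is held, but it is the atomic operation that makes the lock correct across processors.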

  13. SMP OS Design Issues - 2 • Memory Management - complicated since main memory is shared by possibly many processors; each processor must maintain its own map tables for each process • cache coherence • memory access synchronization • balancing overhead with increased concurrency • Reliability and Fault Tolerance - degrade gracefully in the event of failures CS523S: Operating Systems

  14. Typical SMP System • 500MHz CPUs, each with its own cache and MMU, on a shared System/Memory Bus • 50ns Main Memory • Bridge to the I/O subsystem (scsi, ether, video) and System Functions (timer, BIOS, reset, INT) • Typical I/O Bus: 33MHz/32bit (132MB/s) or 66MHz/64bit (528MB/s) • Issues: memory contention, limited bus BW, I/O contention, cache coherence CS523S: Operating Systems

  15. Some Definitions • Parallelism: degree to which a multiprocessor application achieves parallel execution • Concurrency: Maximum parallelism an application can achieve with unlimited processors • System Concurrency: kernel recognizes multiple threads of control in a program • User Concurrency: User space threads (coroutines) provide a natural programming model for concurrent applications. Concurrency not supported by system. CS523S: Operating Systems

  16. Process and Threads • Process: encompasses • set of threads (computational entities) • collection of resources • Thread: Dynamic object representing an execution path and computational state. • threads have their own computational state: PC, stack, user registers and private data • Remaining resources are shared amongst threads in a process CS523S: Operating Systems
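To make the split concrete, here is an illustrative sketch of which state is private to a thread and which is shared by the enclosing process. The type and field names are hypothetical, not taken from any real kernel:

```c
/* Illustrative only -- hypothetical names, not a real kernel's layout. */
struct thread {
    void        *pc;            /* program counter: where this thread executes */
    void        *stack;         /* private stack */
    long         regs[16];      /* saved user registers */
    void        *private_data;  /* thread-local data */
    struct proc *owner;         /* back pointer to the enclosing process */
};

struct proc {
    struct thread *threads;     /* the set of threads in this process */
    void          *addr_space;  /* shared address space */
    int           *open_files;  /* shared resources: files, signals, ... */
};
```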

  17. Threads • Effectiveness of parallel computing depends on the performance of the primitives used to express and control parallelism • Threads separate the notion of execution from the Process abstraction • Useful for expressing the intrinsic concurrency of a program regardless of resulting performance • Three types: user threads, kernel threads and Light Weight Processes (LWP) CS523S: Operating Systems

  18. User Level Threads • User level threads - supported by user level (thread) library • Benefits: • no modifications required to kernel • flexible and low cost • Drawbacks: • can not block without blocking entire process • no parallelism (not recognized by kernel) CS523S: Operating Systems
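A minimal illustration of the mechanism using the POSIX ucontext API: two execution contexts are switched entirely in user space, so the kernel sees only one schedulable entity, which is exactly why one blocking system call stalls the whole process:

```c
#include <stdio.h>
#include <ucontext.h>

/* Two user-level "threads" multiplexed on one kernel thread.  The
 * switch is a library operation (swapcontext); the kernel never
 * participates and never knows. */
static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];

static void worker(void) {
    printf("worker: running in user space\n");
    swapcontext(&worker_ctx, &main_ctx);   /* yield back to main */
    printf("worker: resumed\n");
}

int main(void) {
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;        /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);   /* first switch into worker */
    printf("main: worker yielded\n");
    swapcontext(&main_ctx, &worker_ctx);   /* let worker finish */
    printf("main: done\n");
    return 0;
}
```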

  19. Kernel Level Threads • Kernel level threads - kernel directly supports multiple threads of control in a process. Thread is the basic scheduling entity • Benefits: • coordination between scheduling and synchronization • less overhead than a process • suitable for parallel application • Drawbacks: • more expensive than user-level threads • generality leads to greater overhead CS523S: Operating Systems
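A small example, assuming a typical modern UNIX with 1:1 threading where each pthread is backed by a kernel-scheduled thread:

```c
#include <pthread.h>
#include <stdio.h>

/* With kernel-level threads, the two workers can run on two CPUs at
 * once, and either can block in a system call without stopping the
 * other -- the kernel schedules each thread individually. */
static void *work(void *arg) {
    printf("thread %ld running\n", (long)arg);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, (void *)1L);
    pthread_create(&t2, NULL, work, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```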

  20. Light Weight Processes (LWP) • Kernel supported user thread • Each LWP is bound to one kernel thread. • a kernel thread may not be bound to an LWP • LWP is scheduled by kernel • User threads scheduled by library onto LWPs • Multiple LWPs per process CS523S: Operating Systems

  21. First Class Threads (Psyche OS) • Thread operations in user space: • create, destroy, synch, context switch • kernel threads implement a virtual processor • Coarse grain in kernel - preemptive scheduling • Communication between kernel and threads library • shared data structures • software interrupts (user upcalls or signals), e.g. for scheduling decisions and preemption warnings • Kernel scheduler interface - allows dissimilar thread packages to coordinate CS523S: Operating Systems

  22. Scheduler Activations • An activation: • serves as execution context for running thread • notifies thread of kernel events (upcall) • space for kernel to save processor context of current user thread when stopped by kernel • kernel is responsible for processor allocation => preemption by kernel. • Thread package responsible for scheduling threads on available processors (activations) CS523S: Operating Systems

  23. Support for Threading • BSD: • process model only; 4.4 BSD enhancements • Solaris: provides • user threads, kernel threads and LWPs • Mach: supports • kernel threads and tasks; thread libraries provide semantics of user threads, LWPs and kernel threads • Digital UNIX: extends Mach to provide usual UNIX semantics • Pthreads library CS523S: Operating Systems

  24. Solaris Threads • Supports: • user threads (uthreads) via libthread and libpthread • LWPs, each acting as a virtual CPU for user threads • kernel threads (kthreads); every LWP is associated with one kthread, however a kthread may not have an LWP • interrupts as threads CS523S: Operating Systems

  25. Solaris kthreads • Fundamental scheduling/dispatching object • all kthreads share the same virtual address space (the kernel's) - cheap context switch • System threads - for example STREAMS, callout • kthread_t, /usr/include/sys/thread.h • scheduling info, pointers for scheduler or sleep queues, pointer to klwp_t and proc_t CS523S: Operating Systems

  26. Solaris LWP • Bound to a kthread • LWP specific fields from proc are kept in klwp_t (/usr/include/sys/klwp.h) • user-level registers, system call params, resource usage, pointer to kthread_t and proc_t • klwp_t can be swapped with LWP • LWP non-swappable info kept in kthread_t CS523S: Operating Systems

  27. Solaris LWP (cont) • All LWPs in a process share: • signal handlers • Each may have its own • signal mask • alternate stack for signal handling • No global name space for LWPs CS523S: Operating Systems
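The POSIX analogue of a per-LWP signal mask is pthread_sigmask; a small sketch of blocking a signal in just the calling thread while the handler table stays process-wide:

```c
#include <pthread.h>
#include <signal.h>

/* Handlers are shared by the whole process, but each thread/LWP has
 * its own mask: blocking SIGINT here affects only the calling thread. */
static void block_sigint_in_this_thread(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* per-thread, unlike sigprocmask */
}
```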

  28. Solaris User Threads • Implemented in user libraries • library provides synchronization and scheduling facilities • threads may be bound to LWPs • unbound threads compete for available LWPs • Manage thread specific info • thread id, saved register state, user stack, signal mask, priority*, thread local storage • Solaris provides two libraries: libthread and libpthread. • Try man thread or man pthreads CS523S: Operating Systems
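A sketch of creating one bound and one unbound thread with the Solaris thr_create interface; consult the thr_create(3T) and thr_join(3T) man pages for the exact semantics, and note that the work function here is only a placeholder:

```c
#include <thread.h>   /* Solaris libthread */

static void *work(void *arg) { return arg; }

int make_threads(void) {
    thread_t bound_tid, unbound_tid;

    /* Bound: permanently attached to its own LWP. */
    thr_create(NULL, 0, work, NULL, THR_BOUND, &bound_tid);

    /* Unbound: multiplexed by the library onto the pool of LWPs. */
    thr_create(NULL, 0, work, NULL, 0, &unbound_tid);

    thr_join(bound_tid, NULL, NULL);
    thr_join(unbound_tid, NULL, NULL);
    return 0;
}
```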

  29. Solaris Thread Data Structures • proc_t points to its list of kthreads via p_tlist; the kthreads of a process are linked through t_forw • each kthread_t points to its klwp_t via t_lwp and back to its process via t_procp • each klwp_t points back to its kthread_t via lwp_thread and to its process via lwp_procp CS523S: Operating Systems
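A simplified C sketch of those links; the real definitions live in sys/proc.h, sys/thread.h and sys/klwp.h and contain far more fields:

```c
/* Simplified sketch of the pointer relationships above. */
typedef struct proc    proc_t;
typedef struct kthread kthread_t;
typedef struct klwp    klwp_t;

struct proc {
    kthread_t *p_tlist;     /* list of kthreads belonging to this process */
};

struct kthread {
    kthread_t *t_forw;      /* next kthread in the process's list */
    klwp_t    *t_lwp;       /* the LWP bound to this kthread (if any) */
    proc_t    *t_procp;     /* back pointer to the owning process */
};

struct klwp {
    kthread_t *lwp_thread;  /* back pointer to the kthread */
    proc_t    *lwp_procp;   /* back pointer to the process */
};
```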

  30. Solaris: Processes, Threads and LWPs • user-level threads in each process are multiplexed onto LWPs (L) • each LWP is backed by a kernel thread (kthr) and dispatched onto a processor (P) • interrupt kernel threads (Int) run directly in the kernel CS523S: Operating Systems

  31. Solaris Interrupts • One system wide clock kthread • pool of 9 partially initialized kthreads per CPU for interrupts • interrupt thread can block • interrupted thread is pinned to the CPU CS523S: Operating Systems

  32. Solaris Signals and Fork • Divided into Traps (synchronous) and interrupts (asynchronous) • each thread has its own signal mask, global set of signal handlers • Each LWP can specify alternate stack • fork replicates all LWPs • fork1 only the invoking LWP/thread CS523S: Operating Systems
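A short sketch of the fork1 idiom (fork1 is Solaris-specific; the usual reason to prefer it is that the child is about to exec anyway, so the other threads would be useless baggage):

```c
#include <unistd.h>

/* fork() duplicated every LWP in the process; fork1() duplicates only
 * the calling LWP/thread -- the usual choice before exec(). */
void spawn_child(const char *path, char *const argv[]) {
    pid_t pid = fork1();          /* child contains just this thread */
    if (pid == 0) {
        execv(path, argv);        /* replace the single-threaded child */
        _exit(127);               /* exec failed */
    }
}
```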

  33. Mach • Two abstractions: • Task - static object, address space and system resources called port rights. • Thread - fundamental execution unit and runs in context of a task. • Zero or more threads per task, • kernel schedulable • kernel stack • computational state • Processor sets - available processors divided into non-intersecting sets. • permits dedicating processor sets to one or more tasks CS523S: Operating Systems

  34. Mach c-thread Implementations • Coroutine-based - multiplexes user threads onto a single-threaded task • Thread-based - one-to-one mapping from c-threads to Mach threads. Default. • Task-based - one Mach task per c-thread. CS523S: Operating Systems

  35. Digital UNIX • Based on Mach 2.5 kernel • Provides complete UNIX programmers interface • 4.3BSD code and ULTRIX code ported to Mach • u-area replaced by utask and uthread • proc structure retained CS523S: Operating Systems

  36. Digital UNIX threads • Signals divided into synchronous and asynchronous • global signal mask • each thread can define its own handlers for synchronous signals • global handlers for asynchronous signals CS523S: Operating Systems

  37. Pthreads library • One Mach thread per pthread • implements asynchronous I/O • separate thread created for synchronous I/O which in turn signals original thread • library includes signal handling, scheduling functions, and synchronization primitives. CS523S: Operating Systems
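A hypothetical sketch of that technique in POSIX threads: the helper thread performs the blocking read and then posts completion back to the requester. The aio_req / aio_start names are made up for illustration and are not the library's actual interface:

```c
#include <sys/types.h>
#include <pthread.h>
#include <unistd.h>

struct aio_req {
    int             fd;
    void           *buf;
    size_t          len;
    ssize_t         result;
    int             done;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
};

/* Helper thread: do the synchronous read, then notify the requester. */
static void *aio_worker(void *arg) {
    struct aio_req *r = arg;
    ssize_t n = read(r->fd, r->buf, r->len);   /* blocking read */
    pthread_mutex_lock(&r->lock);
    r->result = n;
    r->done = 1;
    pthread_cond_signal(&r->cond);             /* "signal" the original thread */
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Start an "asynchronous" read by handing the request to a helper thread. */
static void aio_start(struct aio_req *r) {
    pthread_t t;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->done = 0;
    pthread_create(&t, NULL, aio_worker, r);
    pthread_detach(t);
}
```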

  38. Mach Continuations • Address problem of excessive kernel stack memory requirements • process model versus interrupt model • one per process kernel stack versus a per thread kernel stack • Thread is first responsible for saving any required state (the thread structure allows up to 28 bytes) • indicate a function to be invoked when unblocked (the continuation function) • Advantage: stack can be transferred between threads eliminating copy overhead. CS523S: Operating Systems
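A conceptual sketch of continuation-style blocking, illustrative only and not the literal Mach interface: instead of sleeping with a live kernel stack, the thread saves the small amount of state it needs and names the function to run when it is woken:

```c
/* Hypothetical names -- a sketch of the idea, not the Mach API. */
struct thread {
    char saved_state[28];                  /* small per-thread save area */
    void (*continuation)(struct thread *); /* where to resume when woken */
};

void block_with_continuation(struct thread *t,
                             void (*cont)(struct thread *)) {
    t->continuation = cont;    /* remember the resume point */
    /* ... enqueue t on a wait queue and give up the kernel stack ... */
}

void wakeup(struct thread *t) {
    t->continuation(t);        /* invoke the saved resume point
                                  (a real kernel would run it on a fresh stack) */
}
```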

  39. Threads in Windows NT • Design driven by need to support a variety of OS environments • NT process implemented as an object • executable process contains >= 1 thread • process and thread objects have built-in synchronization capabilities CS523S: Operating Systems

  40. NT Threads • Support for kernel (system) threads • Threads are scheduled by the kernel and thus are similar to UNIX threads bound to an LWP (kernel thread) • fibers are threads which are not scheduled by the kernel and thus are similar to unbound user threads. CS523S: Operating Systems
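A small example using the Win32 fiber API, which makes the "not scheduled by the kernel" point concrete: nothing runs a fiber until some thread explicitly switches to it:

```c
#include <windows.h>
#include <stdio.h>

static LPVOID main_fiber;

/* Fibers yield cooperatively; the kernel never preempts or schedules them. */
static VOID CALLBACK fiber_proc(LPVOID param) {
    printf("fiber: running (%s)\n", (const char *)param);
    SwitchToFiber(main_fiber);            /* cooperative yield back to main */
}

int main(void) {
    main_fiber = ConvertThreadToFiber(NULL);      /* current thread becomes a fiber */
    LPVOID worker = CreateFiber(0, fiber_proc, (LPVOID)"hello");
    SwitchToFiber(worker);                        /* run it until it yields */
    printf("main: fiber yielded\n");
    DeleteFiber(worker);
    return 0;
}
```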

  41. 4.4 BSD UNIX • Initial support for threads implemented but not enabled in distribution • Proc structure and u-area reorganized • All threads have a unique ID • How are the proc and u areas reorganized to support threads? CS523S: Operating Systems
