
Advanced Programming


Presentation Transcript


  1. Advanced Programming Rabie A. Ramadan Lecture 7

  2. Multithreading: An Overview. Some of the slides are excerpted from a presentation by Jonathan Amsterdam.

  3. Processing Elements Architecture

  4. Processing Elements. Simple classification by Flynn (by number of instruction and data streams):
  SISD - conventional
  SIMD - data parallel, vector computing
  MISD - systolic arrays
  MIMD - very general, multiple approaches
  The current focus is on the MIMD model, using general-purpose processors (no shared memory).

  5. SISD: A Conventional Computer. Speed is limited by the rate at which the computer can transfer information internally. [Diagram: Data Input -> Processor (Instructions) -> Data Output.] Ex: PC, Macintosh, workstations.

  6. The MISD Architecture. More of an intellectual exercise than a practical configuration; a few have been built, but none are commercially available.

  7. SIMD Architecture. A single instruction stream drives multiple processors, each operating on its own data stream (Ci <= Ai * Bi). [Diagram: one Instruction Stream feeding Processors A, B, and C, each with its own data input and output streams.] Ex: CRAY machine vector processing.

  8. MIMD Architecture. Unlike SISD and MISD machines, an MIMD computer works asynchronously. Two forms: shared memory (tightly coupled) MIMD and distributed memory (loosely coupled) MIMD. [Diagram: Processors A, B, and C, each with its own instruction stream and its own data input and output streams.]

  9. Shared Memory MIMD Machine. Communication: the source PE writes data to global memory and the destination retrieves it. Easy to build, and conventional SISD OSes can easily be ported. Limitations: reliability and expandability - a memory component or processor failure affects the whole system, and increasing the number of processors leads to memory contention. Ex.: Silicon Graphics supercomputers. [Diagram: Processors A, B, and C connected by buses to a Global Memory System.]

  10. Distributed Memory MIMD. Communication is based on a high-speed network, which can be configured as a tree, mesh, cube, etc. Unlike shared-memory MIMD, it is readily expandable and highly reliable (a CPU failure does not affect the whole system). [Diagram: Processors A, B, and C, each with its own memory system and bus.]

  11. Serial vs. Parallel. [Diagram: a queue of customers ("Q Please") served by a single counter (serial) vs. two counters (parallel).]

  12. Single and Multithreaded Processes. [Diagram: a single-threaded process (single instruction stream) vs. a multithreaded process (multiple instruction streams, i.e., multiple threads of execution) sharing a common address space.]

  13. OS: Multi-Processing, Multi-Threaded. Threaded libraries and multi-threaded I/O give better response times in multiple-application environments and higher throughput for parallelizable applications. [Diagram: multiple applications scheduled across multiple CPUs.]

  14. Multi-threading, continued... A multi-threaded OS enables parallel, scalable I/O: multiple independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multi-threaded, allowing any given driver to run on multiple CPUs simultaneously. [Diagram: applications issuing requests through the OS kernel to multiple CPUs.]

  15. Program in Execution. A program in execution (a process) consists of three components: an executable program, the associated data needed by the program, and the execution context of the program (all the information the operating system needs to manage the process). An application could have one or more processes.

  16. What are Threads? A thread is a piece of code that can execute concurrently with other threads; it is a schedulable entity on a processor. Its hardware context consists of the registers, status word, and program counter. [Diagram: a thread object with local state, global/shared state, a PC, and a hardware context.]

  17. What is a Thread? A single sequential flow of control; a unit of concurrent execution. Multiple threads can exist within the same process and share its memory resources (processes, on the other hand, each have their own address space). All programs have at least one thread, called the "main thread".

  18. Thread Resources. Each thread has its own program counter (point of execution), control stack (procedure call/return), and data stack (local variables). All threads share the heap (objects: dynamically allocated memory for the process), the program code, and the class and instance variables.

  19. Threaded Process Model. [Diagram: threads within a process, each with its own stack, sharing the process's text, data, and memory.] Threads are independent executables, yet all are parts of a single process, hence communication is easier and simpler.

  20. The Multi-Threading Concept. [Diagram: task A's threads T0, T1, and T2 time-sliced on a uniprocessor.] A threading library creates threads and assigns processor time to each thread.

  21. Multi-Threading on Multi-Processors. [Diagram: task A's threads T0, T1, and T2 running in parallel on Processors 1-4.]

  22. Why Multiple Threads? Speeding up computations: two threads each solve half of the problem, then combine their results. Improving responsiveness: one thread computes while another handles the user interface; one thread loads an image from the net while another computes.

  23. Why Multiple Threads? (continued) Performing housekeeping tasks: one thread does garbage collection while another computes; one thread rebalances the search tree while another uses the tree. Performing multiuser tasks: several threads run animations simultaneously (as an example).

  24. Simple Example. main: run thread2, then forever print 1. thread2: forever print 2.
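
  As a minimal sketch, here is how slide 24's example might look in Java (the class name is illustrative; the course's own example may differ):

      public class SimpleExample {
          public static void main(String[] args) {
              Thread thread2 = new Thread(() -> {
                  while (true) System.out.println(2);   // thread2: forever print 2
              });
              thread2.start();                          // main: run thread2
              while (true) System.out.println(1);       // main: forever print 1
          }
      }

  The interleaving of 1s and 2s is up to the scheduler and varies from run to run.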

  25. Scheduling. The scheduler is the part of the operating system that determines which thread to run next. There are two types of schedulers: pre-emptive (can interrupt the running thread) and cooperative (a thread must voluntarily yield). Most modern OSes are pre-emptive.

  26. Thread Life Cycle. New state: the thread has been created but is not yet considered alive. Runnable (ready-to-run) state: entered when the start() method is invoked, but the thread is not necessarily executing yet; the scheduler is aware of the thread and may schedule it some time later. Running state: the thread is currently executing. Blocked state: the thread is waiting for resources that are held by another thread. Dead state: once a thread enters this state, it can never run again.
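
  In Java these states can be observed with Thread.getState(); a small illustrative sketch (note that Java's Thread.State names differ slightly from the slide's: RUNNABLE covers both ready and running, and TERMINATED corresponds to dead):

      public class LifeCycleDemo {
          public static void main(String[] args) throws InterruptedException {
              Thread t = new Thread(() -> {
                  try { Thread.sleep(100); } catch (InterruptedException e) { }
              });
              System.out.println(t.getState()); // NEW: created, not yet alive
              t.start();
              System.out.println(t.getState()); // RUNNABLE (may be TIMED_WAITING once it sleeps)
              t.join();                         // wait for t to finish
              System.out.println(t.getState()); // TERMINATED: can never run again
          }
      }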

  27. Software Models for Multithreaded Programming: the boss/worker model, the work crew model, the pipelining model, and combinations of models.

  28. Boss/Worker Model. One thread functions as the boss and assigns tasks to worker threads to perform. Each worker performs a different task until it has finished, at which point it notifies the boss that it is ready to receive another task. Alternatively, the boss polls workers periodically to see whether each worker is ready to receive another task. A variation of the boss/worker model is the work queue model: the boss places tasks in a queue, and workers check the queue and take tasks to perform.
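
  A sketch of the work queue variation in Java, using java.util.concurrent's ExecutorService (which is essentially a boss/worker work queue; the pool size and task count here are arbitrary):

      import java.util.concurrent.*;

      public class BossWorker {
          public static void main(String[] args) throws InterruptedException {
              // The boss submits tasks to a queue served by a fixed crew of workers.
              ExecutorService workers = Executors.newFixedThreadPool(3);
              for (int i = 0; i < 10; i++) {
                  final int task = i;
                  workers.submit(() -> System.out.println(
                      Thread.currentThread().getName() + " performs task " + task));
              }
              workers.shutdown();                            // the boss has no more tasks
              workers.awaitTermination(1, TimeUnit.MINUTES); // wait for the workers
          }
      }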

  29. Work Crew Model. Multiple threads work together on a single task. The task is divided horizontally into pieces that are performed in parallel, and each thread performs one piece. Example: a group of people cleaning a building. Each person cleans certain rooms or performs certain types of work (washing floors, polishing furniture, and so forth), and each works independently.
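
  A hedged Java sketch of the work crew model: the task (summing an array) is divided into slices, each worker sums one slice, and the partial results are combined (the sizes are arbitrary):

      public class WorkCrew {
          public static void main(String[] args) throws InterruptedException {
              int[] data = new int[1000];
              for (int i = 0; i < data.length; i++) data[i] = i;

              int crewSize = 4;
              long[] partial = new long[crewSize];   // one result slot per worker
              Thread[] crew = new Thread[crewSize];
              int slice = data.length / crewSize;

              for (int w = 0; w < crewSize; w++) {
                  final int id = w, from = w * slice,
                            to = (w == crewSize - 1) ? data.length : from + slice;
                  crew[w] = new Thread(() -> {
                      long sum = 0;
                      for (int i = from; i < to; i++) sum += data[i];
                      partial[id] = sum;             // no sharing: each worker owns its slot
                  });
                  crew[w].start();
              }
              long total = 0;
              for (int w = 0; w < crewSize; w++) { crew[w].join(); total += partial[w]; }
              System.out.println("total = " + total); // 499500
          }
      }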

  30. Pipelining Model. A task is divided vertically into steps. The steps must be performed in sequence to produce a single instance of the desired result. The work done in each step (except for the first and last) is based on the previous step and is a prerequisite for the work in the next step.
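
  A two-stage pipeline sketched in Java, with the stages connected by a BlockingQueue (the stage bodies are placeholders; a real pipeline would do real work in each step):

      import java.util.concurrent.*;

      public class Pipeline {
          public static void main(String[] args) throws InterruptedException {
              BlockingQueue<Integer> stage1to2 = new ArrayBlockingQueue<>(10);

              Thread stage1 = new Thread(() -> {     // first step: produce results
                  try {
                      for (int i = 1; i <= 5; i++) stage1to2.put(i * i);
                      stage1to2.put(-1);             // sentinel: end of the stream
                  } catch (InterruptedException e) { }
              });
              Thread stage2 = new Thread(() -> {     // next step: consume the prior step's output
                  try {
                      for (int v; (v = stage1to2.take()) != -1; )
                          System.out.println("stage 2 received " + v);
                  } catch (InterruptedException e) { }
              });
              stage1.start(); stage2.start();
              stage1.join(); stage2.join();
          }
      }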

  31. Combinations of Models. You may find it appropriate to combine the software models in a single program if your task is complex.

  32. Bad News. Multithreaded programs are hard to write, hard to understand, and incredibly hard to debug. Anyone who thinks that concurrent programming is easy should have his/her threads examined.

  33. Thread Assumptions. Threads may be executed in any order, not necessarily alternating line by line. Bugs may show up rarely and may be hard to reproduce. More than one thread may try to change memory at the same time, so assumptions about the order of execution do not apply (e.g., what is the value of i after i = 1?).

  34. Memory Conflicts. When two threads access the same memory location, they can conflict with each other, and the resulting state may be unexpected or wrong. E.g., two threads may try to increment the same counter.
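
  A small Java demonstration of such a conflict (illustrative; actual results vary from run to run): two threads increment a shared counter, and because count++ is a separate read, add, and write, updates can be lost:

      public class MemoryConflict {
          static int count = 0;                              // shared, unprotected

          public static void main(String[] args) throws InterruptedException {
              Runnable work = () -> {
                  for (int i = 0; i < 100000; i++) count++;  // read, add 1, write back
              };
              Thread a = new Thread(work), b = new Thread(work);
              a.start(); b.start();
              a.join(); b.join();
              // Expected 200000, but interleaved read-modify-write cycles lose updates.
              System.out.println("count = " + count);
          }
      }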

  35. Terminology.
  Critical section: a section of code which reads or writes shared data.
  Race condition: the potential for interleaved execution of a critical section by multiple threads; the results are non-deterministic.
  Mutual exclusion: a synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections.
  Deadlock: permanent blocking of threads.
  Starvation: one or more threads are denied resources; without those resources, the program can never finish its task.

  36. Four Requirements for Deadlock.
  Mutual exclusion: only one thread at a time can use a resource.
  Hold and wait: a thread holding at least one resource is waiting to acquire additional resources held by other threads.
  No preemption: resources are released only voluntarily by the thread holding them, after the thread is finished with them.
  Circular wait: there exists a set {T1, ..., Tn} of waiting threads such that T1 is waiting for a resource held by T2, T2 is waiting for a resource held by T3, ..., and Tn is waiting for a resource held by T1.
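
  A minimal Java sketch exhibiting all four conditions: two threads acquire the same two locks in opposite orders, producing a circular wait (this program is expected to hang):

      public class DeadlockDemo {
          static final Object lockA = new Object(), lockB = new Object();

          public static void main(String[] args) {
              new Thread(() -> {
                  synchronized (lockA) {            // hold A ...
                      pause();
                      synchronized (lockB) { }      // ... and wait for B
                  }
              }).start();
              new Thread(() -> {
                  synchronized (lockB) {            // hold B ...
                      pause();
                      synchronized (lockA) { }      // ... and wait for A: circular wait
                  }
              }).start();
          }

          static void pause() {
              try { Thread.sleep(100); } catch (InterruptedException e) { }
          }
      }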

  37. Memory Synchronization

  38. Thread Synchronization Methods: mutex locks, condition variables, and semaphores.

  39. Mutex Locks

  40. Mutex Locks. If a data item is shared by a number of threads, race conditions can occur if the shared item is not protected properly. The easiest protection mechanism is a lock. Before a thread accesses the set of shared data items, it acquires the lock. Once the lock is successfully acquired, the thread becomes the owner of that lock and the lock is locked; the owner can then access the protected items. Afterwards, the owner must release the lock, the lock becomes unlocked, and another thread can acquire it.

  41. Mutex Locks (continued). The use of a lock simply establishes a critical section. Before entering a critical section, a thread acquires a lock. If it is successful, the thread enters the critical section and the lock is locked; as a result, all subsequent acquisition requests are queued until the lock is unlocked.
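
  In Java, this acquire/access/release discipline can be sketched with java.util.concurrent.locks.ReentrantLock (note one difference from the strict mutex described here: a ReentrantLock does allow its owner to acquire it recursively):

      import java.util.concurrent.locks.ReentrantLock;

      public class ProtectedCounter {
          private final ReentrantLock lock = new ReentrantLock();
          private int shared = 0;

          public void update() {
              lock.lock();             // acquire: blocks until this thread owns the lock
              try {
                  shared++;            // critical section: access the protected item
              } finally {
                  lock.unlock();       // release, even if the critical section throws
              }
          }
      }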

  42. Mutex Lock Restrictions. Only the owner can release the lock. Imagine the following situation: thread A is the current owner of lock L, and thread B is a second thread that wants to lock the lock. If a non-owner could unlock a lock, thread B could unlock the lock that thread A owns; hence, either both threads could be executing in the same critical section, or thread B could preempt thread A and execute the instructions of the critical section. Recursive lock acquisition is not allowed: the current owner of the lock may not acquire the same lock again.

  43. Mutex Example: The Dining Philosophers Problem. Imagine five philosophers who spend their lives just thinking and eating. In the middle of the dining room is a circular table with five chairs. The table holds a big plate of spaghetti, but there are only five chopsticks available. Each philosopher thinks; when he gets hungry, he sits down and picks up the two chopsticks that are closest to him. If a philosopher can pick up both chopsticks, he eats for a while. After a philosopher finishes eating, he puts down the chopsticks and starts to think again.
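
  A hedged Java sketch of a philosopher's loop, with each chopstick modeled as a lock. The naive pick-up order shown can deadlock if every philosopher grabs his first chopstick at once; one standard fix, noted in the comment, is to impose a global order on chopstick acquisition:

      public class Philosopher extends Thread {
          private final Object left, right;   // the two closest chopsticks

          Philosopher(Object left, Object right) { this.left = left; this.right = right; }

          public void run() {
              while (true) {
                  think();
                  synchronized (left) {       // pick up the first chopstick
                      synchronized (right) {  // pick up the second chopstick
                          eat();
                      }                       // put both chopsticks back down
                  }
                  // Deadlock fix (not shown): always lock the lower-numbered chopstick first.
              }
          }

          private void think() { /* think for a while */ }
          private void eat()   { /* eat for a while */ }
      }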

  44. The Dining Philosophers Problem: Analysis. [Diagrams: the philosopher cycle and the philosopher flow.]

  45. C++ Language Support for Synchronization. Languages that support exceptions, like C++, are problematic: it is easy to make a non-local exit without releasing the lock. Consider:

      void Rtn() {
          lock.acquire();
          ...
          DoFoo();
          ...
          lock.release();
      }

      void DoFoo() {
          ...
          if (exception) throw errException;
          ...
      }

  Notice that an exception in DoFoo() will exit without releasing the lock.

  46. C++ Language Support for Synchronization (con't). Must catch all exceptions in critical sections. Catch the exception, release the lock, and re-throw the exception:

      void Rtn() {
          lock.acquire();
          try {
              ...
              DoFoo();
              ...
          } catch (...) {          // catch any exception
              lock.release();      // release the lock
              throw;               // re-throw the exception
          }
          lock.release();
      }

      void DoFoo() {
          ...
          if (exception) throw errException;
          ...
      }

  Even better: the auto_ptr<T> facility (see the C++ spec) can deallocate/free the lock regardless of the exit method.

  47. Java Language Support for Synchronization. Java has explicit support for threads and thread synchronization. Bank Account example:

      class Account {
          private int balance;

          // object constructor
          public Account(int initialBalance) {
              balance = initialBalance;
          }

          public synchronized int getBalance() {
              return balance;
          }

          public synchronized void deposit(int amount) {
              balance += amount;
          }
      }

  Every object has an associated lock, which is automatically acquired and released on entry to and exit from a synchronized method.
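
  A short usage sketch (illustrative only): several threads deposit into one Account, and the synchronized methods serialize the updates so no deposit is lost:

      public class AccountDemo {
          public static void main(String[] args) throws InterruptedException {
              Account account = new Account(0);
              Thread[] tellers = new Thread[4];
              for (int i = 0; i < tellers.length; i++) {
                  tellers[i] = new Thread(() -> {
                      for (int j = 0; j < 1000; j++) account.deposit(1);
                  });
                  tellers[i].start();
              }
              for (Thread t : tellers) t.join();
              System.out.println(account.getBalance()); // always 4000
          }
      }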

  48. Condition Variables

  49. Condition Variables (CV). A condition variable allows a thread to block its own execution until some shared data reaches a particular state. A condition variable is a synchronization object used in conjunction with a mutex: the mutex controls access to the shared data, while the condition variable allows threads to wait for that data to enter a defined state. A mutex is combined with a CV to avoid the race condition.

  50. Condition Variable Routines. Waiting and signaling on condition variables:
  pthread_cond_wait(condition, mutex): blocks the thread until the specific condition is signalled. It should be called with the mutex locked; it automatically releases the mutex while it waits, and when it returns (the condition was signalled), the mutex is locked again.
  pthread_cond_signal(condition): wakes up a thread waiting on the condition variable. It is called after the mutex is locked, and the caller must unlock the mutex afterwards.
  pthread_cond_broadcast(condition): used when multiple threads are blocked on the condition.
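
  Java's java.util.concurrent.locks.Condition mirrors these semantics: await() releases the lock while waiting and reacquires it before returning, and signalAll() plays the role of pthread_cond_broadcast. A hedged sketch:

      import java.util.concurrent.locks.*;

      public class Flag {
          private final Lock lock = new ReentrantLock();
          private final Condition ready = lock.newCondition();
          private boolean isReady = false;

          public void waitUntilReady() throws InterruptedException {
              lock.lock();             // as with pthreads, wait with the lock held
              try {
                  while (!isReady)     // re-test the condition: guards against spurious wakeups
                      ready.await();   // releases the lock while blocked, reacquires on return
              } finally {
                  lock.unlock();
              }
          }

          public void setReady() {
              lock.lock();
              try {
                  isReady = true;
                  ready.signalAll();   // like pthread_cond_broadcast(condition)
              } finally {
                  lock.unlock();
              }
          }
      }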
