
Introduction to Concurrent Programming with Threads

This lecture introduces the concept of concurrent programming using threads. It explores the advantages of using threads over processes and how threads can share memory to improve efficiency. The implementation of threads in the context of processes is also explained.





  1. Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL: http://kovan.ceng.metu.edu.tr/~erol/Courses/CENG334 CENG334 Introduction to Operating Systems — Threads Topics • Concurrent programming • Threads Some of the following slides are adapted from Matt Welsh, Harvard Univ.

  2. Concurrent Programming Many programs want to do many things “at once” Web browser: Download web pages, read cache files, accept user input, ... Web server: Handle incoming connections from multiple clients at once Scientific programs: Process different parts of a data set on different CPUs In each case, would like to share memory across these activities Web browser: Share buffer for HTML page and inlined images Web server: Share memory cache of recently-accessed pages Scientific programs: Share memory of the data set being processed Can't we simply do this with multiple processes?

  3. Why processes are not always ideal... Processes are not very efficient Each process has its own PCB and OS resources Typically high overhead for each process: e.g., 1.7 KB per task_struct on Linux! Creating a new process is often very expensive Processes don't (directly) share memory Each process has its own address space Parallel and concurrent programs often want to directly manipulate the same memory e.g., When processing elements of a large array in parallel Note: Many OS's provide some form of inter-process shared memory cf., UNIX shmget() and shmat() system calls Still, this requires more programmer work and does not address the efficiency issues.

  4. Can we do better? What can we share across all of these tasks? Same code – generally running the same or similar programs Same data Same privileges Same OS resources (files, sockets, etc.)‏ What is private to each task? Execution state: CPU registers, stack, and program counter Key idea of this lecture: Separate the concept of a process from a thread of control The process is the address space and OS resources Each thread has its own CPU execution state

  5. Processes and Threads Each process has one or more threads “within” it Each thread has its own stack, CPU registers, etc. All threads within a process share the same address space and OS resources Threads share memory, so they can communicate directly! The thread is now the unit of CPU scheduling A process is just a “container” for its threads Each thread is bound to its containing process [Diagram: one address space containing threads 0, 1, and 2]

  6. (Old) Process Address Space [Diagram: address space from 0x00000000 to 0xFFFFFFFF — code (text segment) with the program counter, initialized vars (data segment), uninitialized vars (BSS segment), heap, stack with the stack pointer, and a region at the top reserved for the OS]

  7. (New) Address Space with Threads [Diagram: the same address space, but with a separate stack and stack pointer for each of threads 0, 1, and 2 near the top, and a separate PC per thread pointing into the code (text) segment] All threads in a single process share the same address space!

  8. Implementing Threads Given what we know about processes, implementing threads is “easy” Idea: Break the PCB into two pieces: Thread-specific stuff: Processor state Process-specific stuff: Address space and OS resources (open files, etc.) [Diagram: PCB (PID 27682, user ID, group ID, address space, open files, net sockets) pointing to two TCBs — thread IDs 4 and 5, each with state “Ready”, PC, and registers]

  9. Thread Control Block (TCB) TCB contains info on a single thread Just processor state and a pointer to the corresponding PCB PCB contains information on the containing process Address space and OS resources ... but NO processor state! [Diagram: same PCB/TCB picture as the previous slide]

  10. Thread Control Block (TCB) TCBs are smaller and cheaper than PCBs Linux TCB (thread_struct) has 24 fields Linux PCB (task_struct) has 106 fields [Diagram: same PCB/TCB picture as the previous slide]

  11. Context Switching TCB is now the unit of a context switch Ready queue, wait queues, etc. now contain pointers to TCBs Context switch causes CPU state to be copied to/from the TCB Context switch between two threads in the same process: No need to change address space Context switch between two threads in different processes: Must change address space, sometimes invalidating cache This will become relevant when we talk about virtual memory. [Diagram: ready queue pointing to two TCBs — PID 4277 thread 0 and PID 4391 thread 2, each with state “Ready”, PC, and registers]

  12. User-Level Threads Early UNIX designs did not support threads at the kernel level OS only knew about processes with separate address spaces However, can still implement threads as a user-level library OS does not need to know anything about multiple threads in a process! How is this possible? Recall: All threads in a process share the same address space. So, managing multiple threads only requires switching the CPU state (PC, registers, etc.) And this can be done directly by a user program without OS help!

  13. Implementing User-Level Threads Alternative to kernel-level threads: Implement all thread functions as a user-level library e.g., libpthread.a OS thinks the process has a single thread Use the same PCB structure as in the last lecture OS need not know anything about multiple threads in a process! How to create a user-level thread? Thread library maintains a TCB for each thread in the application Just a linked list or some other data structure Allocate a separate stack for each thread (usually with malloc)‏

  14. User-level thread address space [Diagram: the address space with the original stack provided by the OS at the top, additional thread stacks (for threads #1 and #2) allocated by the process below it, then the heap, BSS, data, and text segments; each thread has its own stack pointer and PC] Stacks must be allocated carefully and managed by the thread library.

  15. User-level Context Switching How to switch between user-level threads? Need some way to swap CPU state. Fortunately, this does not require any privileged instructions! So, the threads library can use the same instructions as the OS to save or load the CPU state into the TCB. Why is it safe to let the user switch the CPU state?

  16. setjmp() and longjmp() C standard library routines for saving and restoring processor state. int setjmp(jmp_buf env); Saves the current CPU state in the “jmp_buf” structure. Returns 0 when called directly; returns a nonzero value when the return is the result of a call to longjmp(). void longjmp(jmp_buf env, int returnval); Restores the CPU state from the “jmp_buf” structure, causing the corresponding setjmp() call to return with return value “returnval”. Execution then continues as if that setjmp() had just returned. If 0 is passed as “returnval”, setjmp() behaves as if it had returned 1. jmp_buf contains CPU-specific fields for saving registers, the program counter, etc.

  17. setjmp/longjmp example

    #include <stdio.h>
    #include <setjmp.h>

    int main(void) {
        int i, restored = 0;
        jmp_buf saved;
        for (i = 0; i < 10; i++) {
            printf("Value of i is now %d\n", i);
            if (i == 5) {
                printf("OK, saving state...\n");
                if (setjmp(saved) == 0) {
                    printf("Saved CPU state and breaking from loop.\n");
                    break;
                } else {
                    printf("Restored CPU state, continuing where we saved\n");
                    restored = 1;
                }
            }
        }
        if (!restored)
            longjmp(saved, 1);
        return 0;
    }

  18. setjmp/longjmp example Value of i is now 0 Value of i is now 1 Value of i is now 2 Value of i is now 3 Value of i is now 4 Value of i is now 5 OK, saving state... Saved CPU state and breaking from loop. Restored CPU state, continuing where we saved Value of i is now 6 Value of i is now 7 Value of i is now 8 Value of i is now 9

  19. Preemptive vs. nonpreemptive threads How to prevent a single user-level thread from hogging the CPU? Strategy 1: Require threads to cooperate Called non-preemptive threads Each thread must call back into the thread library periodically This gives the thread library control over the thread's execution yield() operation: Thread voluntarily “gives up” the CPU Pop quiz: What happens when a thread calls yield() ??

  20. Preemptive vs. nonpreemptive threads How to prevent a single user-level thread from hogging the CPU? Strategy 1: Require threads to cooperate Called non-preemptive threads Each thread must call back into the thread library periodically This gives the thread library control over the thread's execution yield() operation: Thread voluntarily “gives up” the CPU Pop quiz: What happens when a thread calls yield() ?? Strategy 2: Use preemption Thread library tells OS to send it a signal periodically A signal is like a hardware interrupt Causes the process to jump into a signal handler The signal handler gives control back to the thread library Thread library then context switches to a new thread

  21. Kernel-level threads Pro: OS knows about all the threads in a process Can assign different scheduling priorities to each one Kernel can context switch between multiple threads in one process Con: Thread operations require calling the kernel Creating, destroying, or context switching require system calls

  22. User-level threads Pro: Thread operations are very fast Typically 10-100x faster than going through the kernel Pro: Thread state is very small Just CPU state and stack, no additional overhead Con: If one thread blocks, it stalls the entire process e.g., If one thread waits for file I/O, all threads in process have to wait Con: Can't use multiple CPUs! Kernel only knows about one CPU context Con: OS may not make good decisions Could schedule a process with only idle threads Could deschedule a process with a thread holding a lock

  23. Threads programming interface Standard API called POSIX threads int pthread_create(pthread_t * thread, pthread_attr_t * attr, void *(*start_routine)(void *), void * arg); thread: Receives the ID of the new thread attr: Set of attributes for the new thread Scheduling policy, etc. start_routine: Function pointer to the “main function” for the new thread arg: Argument passed to start_routine() void pthread_exit(void *retval); Exit with the given return value int pthread_join(pthread_t thread, void **thread_return); Waits for “thread” to exit and returns the thread's return value

  24. Using Pthreads

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    int sum; /* this data is shared by the thread(s) */
    void *runner(void *param); /* the thread */

    int main(int argc, char *argv[]) {
        pthread_t tid;        /* the thread identifier */
        pthread_attr_t attr;  /* set of attributes for the thread */
        pthread_attr_init(&attr);                     /* get the default attributes */
        pthread_create(&tid, &attr, runner, argv[1]); /* create the thread */
        pthread_join(tid, NULL);                      /* now wait for the thread to exit */
        printf("sum = %d\n", sum);
        return 0;
    }

    /* The thread begins control in this function */
    void *runner(void *param) {
        int i, upper = atoi(param);
        sum = 0;
        if (upper > 0) {
            for (i = 1; i <= upper; i++)
                sum += i;
        }
        pthread_exit(0);
    }

  25. Thread Issues All threads in a process share memory: What happens when two threads access the same variable? Which value does Thread 2 see when it reads “foo”? What does it depend on? [Diagram: threads 0, 1, and 2 in one address space; one thread writes variable foo while another reads it]

  26. Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY CENG334 Introduction to Operating Systems — Threads and Synchronization Topics: • Using threads • Implementation of threads • Synchronization problem • Race conditions and Critical Sections • Mutual exclusion • Locks • Spinlocks • Mutexes

  27. Single and Multithreaded Processes

  28. Synchronization • Threads cooperate in multithreaded programs in several ways: • Access to shared state • e.g., multiple threads accessing a memory cache in a Web server • To coordinate their execution • e.g., Pressing stop button on browser cancels download of current page • “stop button thread” has to signal the “download thread” • For correctness, we have to control this cooperation • Must assume threads interleave executions arbitrarily and at different rates • scheduling is not under application’s control • We control cooperation using synchronization • enables us to restrict the interleaving of executions

  29. Shared Resources • We’ll focus on coordinating access to shared resources • Basic problem: • Two concurrent threads are accessing a shared variable • If the variable is read/modified/written by both threads, then access to the variable must be controlled • Otherwise, unexpected results may occur • We’ll look at: • Mechanisms to control access to shared resources • Low-level mechanisms: locks • Higher level mechanisms: mutexes, semaphores, monitors, and condition variables • Patterns for coordinating access to shared resources • bounded buffer, producer-consumer, … • This stuff is complicated and rife with pitfalls • Details are important for completing assignments • Expect questions on the midterm/final!

  30. Shared Variable Example • Suppose we implement a function to withdraw money from a bank account:

    int withdraw(account, amount) {
        balance = get_balance(account);
        balance = balance - amount;
        put_balance(account, balance);
        return balance;
    }

• Now suppose that you and your friend share a bank account with a balance of $1500.00 • What happens if you both go to separate ATMs and simultaneously withdraw $100.00 from the account?

  31. Example continued • We represent the situation by creating a separate thread for each ATM user doing a withdrawal • Both threads run on the same bank server system, and Thread 1 and Thread 2 each execute the same code:

    int withdraw(account, amount) {
        balance = get_balance(account);
        balance -= amount;
        put_balance(account, balance);
        return balance;
    }

• What's the problem with this? • What are the possible balance values after each thread runs?

  32. Interleaved Execution • The execution of the two threads can be interleaved • Assume preemptive scheduling • Each thread can context switch after each instruction • We need to worry about the worst-case scenario! Execution sequence as seen by the CPU:

    Thread 1: balance = get_balance(account);
    Thread 1: balance -= amount;
        <context switch>
    Thread 2: balance = get_balance(account);
    Thread 2: balance -= amount;
    Thread 2: put_balance(account, balance);
        <context switch>
    Thread 1: put_balance(account, balance);

• What's the account balance after this sequence? • And who's happier, the bank or you???

  33. Interleaved Execution • The same sequence, with values. The balance starts at $1500:

    Thread 1: balance = get_balance(account);
    Thread 1: balance -= amount;              (Thread 1's local balance = $1400)
        <context switch>
    Thread 2: balance = get_balance(account);
    Thread 2: balance -= amount;              (Thread 2's local balance = $1400)
    Thread 2: put_balance(account, balance);  (Balance = $1400)
        <context switch>
    Thread 1: put_balance(account, balance);  (Balance = $1400!)

• What's the account balance after this sequence? • And who's happier, the bank or you???

  34. Race Conditions • The problem is that two concurrent threads access a shared resource without any synchronization • This is called a race condition • The result of the concurrent access is non-deterministic • Result depends on: • Timing • When context switches occurred • Which thread ran at context switch • What the threads were doing • We need mechanisms for controlling access to shared resources in the face of concurrency • This allows us to reason about the operation of programs • Essentially, we want to re-introduce determinism into the thread's execution • Synchronization is necessary for any shared data structure • buffers, queues, lists, hash tables, …

  35. Which resources are shared? • Local variables in a function are not shared • They exist on the stack, and each thread has its own stack • You can't safely pass a pointer to a local variable to another thread • Why? • Global variables are shared • Stored in the static data portion of the address space • Accessible by any thread • Dynamically-allocated data is shared • Stored in the heap, accessible by any thread [Diagram: the address space with the per-thread stacks and the OS-reserved region marked unshared, and the heap, BSS, data, and text segments marked shared]

  36. Mutual Exclusion • We want to use mutual exclusion to synchronize access to shared resources • Meaning: Only one thread can access a shared resource at a time • Code that uses mutual exclusion to synchronize its execution is called a critical section • Only one thread at a time can execute code in the critical section • All other threads are forced to wait on entry • When one thread leaves the critical section, another can enter [Diagram: Thread 1 inside the critical section (modify account balance)] Adapted from Matt Welsh’s (Harvard University) slides.

  37. Mutual Exclusion • We want to use mutual exclusion to synchronize access to shared resources • Meaning: Only one thread can access a shared resource at a time • Code that uses mutual exclusion to synchronize its execution is called a critical section • Only one thread at a time can execute code in the critical section • All other threads are forced to wait on entry • When one thread leaves the critical section, another can enter [Diagram: Thread 1 inside the critical section (modify account balance); Thread 2 must wait for the critical section to clear] Adapted from Matt Welsh’s (Harvard University) slides.

  38. Mutual Exclusion • We want to use mutual exclusion to synchronize access to shared resources • Meaning: Only one thread can access a shared resource at a time • Code that uses mutual exclusion to synchronize its execution is called a critical section • Only one thread at a time can execute code in the critical section • All other threads are forced to wait on entry • When one thread leaves the critical section, another can enter [Diagram: Thread 1 leaves the critical section (modify account balance); Thread 2 is now free to enter] Adapted from Matt Welsh’s (Harvard University) slides.

  39. Adapted from Matt Welsh’s (Harvard University) slides. Critical Section Requirements • Mutual exclusion • At most one thread is currently executing in the critical section • Progress • If thread T1 is outside the critical section, then T1 cannot prevent T2 from entering the critical section • Bounded waiting (no starvation) • If thread T1 is waiting on the critical section, then T1 will eventually enter the critical section • Assumes threads eventually leave critical sections • Performance • The overhead of entering and exiting the critical section is small with respect to the work being done within it

  40. Locks • A lock is an object (in memory) that provides the following two operations: • acquire( ): a thread calls this before entering a critical section • May require waiting to enter the critical section • release( ): a thread calls this after leaving a critical section • Allows another thread to enter the critical section • A call to acquire( ) must have a corresponding call to release( ) • Between acquire( ) and release( ), the thread holds the lock • acquire( ) does not return until the caller holds the lock • At most one thread can hold a lock at a time (usually!) • We'll talk about the exceptions later... • What can happen if acquire( ) and release( ) calls are not paired? Adapted from Matt Welsh’s (Harvard University) slides.

  41. Using Locks

    int withdraw(account, amount) {
        acquire(lock);
        balance = get_balance(account);   /* critical */
        balance -= amount;                /* section  */
        put_balance(account, balance);    /*          */
        release(lock);
        return balance;
    }

Adapted from Matt Welsh’s (Harvard University) slides.

  42. Execution with Locks • What happens when the blue thread tries to acquire the lock?

    Thread 1 runs:
        acquire(lock);
        balance = get_balance(account);
        balance -= amount;
    Thread 2 calls acquire(lock) and waits on the lock
    Thread 1 completes:
        put_balance(account, balance);
        release(lock);
    Thread 2 resumes:
        balance = get_balance(account);
        balance -= amount;
        put_balance(account, balance);
        release(lock);

Adapted from Matt Welsh’s (Harvard University) slides.

  43. Spinlocks • Very simple way to implement a lock:

    struct lock {
        int held = 0;
    }
    void acquire(lock) {
        while (lock->held);   /* the caller busy-waits for the lock to be released */
        lock->held = 1;
    }
    void release(lock) {
        lock->held = 0;
    }

• Why doesn't this work? • Where is the race condition? Adapted from Matt Welsh’s (Harvard University) slides.

  44. Implementing Spinlocks • Problem is that the internals of the lock acquire/release have critical sections too! • The acquire( ) and release( ) actions must be atomic • Atomic means that the code cannot be interrupted during execution • “All or nothing” execution

    void acquire(lock) {
        while (lock->held);
        /* What can happen if there is a context switch here? */
        lock->held = 1;
    }

Adapted from Matt Welsh’s (Harvard University) slides.

  45. Implementing Spinlocks • Problem is that the internals of the lock acquire/release have critical sections too! • The acquire( ) and release( ) actions must be atomic • Atomic means that the code cannot be interrupted during execution • “All or nothing” execution

    void acquire(lock) {
        while (lock->held);   /* this sequence (test the flag, */
        lock->held = 1;       /* then set it) needs to be atomic */
    }

Adapted from Matt Welsh’s (Harvard University) slides.

  46. Adapted from Matt Welsh’s (Harvard University) slides. Implementing Spinlocks • Problem is that the internals of the lock acquire/release have critical sections too! • The acquire( ) and release( ) actions must be atomic • Atomic means that the code cannot be interrupted during execution • “All or nothing” execution • Doing this requires help from hardware! • Disabling interrupts • Why does this prevent a context switch from occurring? • Atomic instructions – CPU guarantees entire action will execute atomically • Test-and-set • Compare-and-swap

  47. Spinlocks using test-and-set • CPU provides the following as one atomic instruction:

    bool test_and_set(bool *flag) {
        ...   // Hardware-dependent implementation
    }

• So to fix our broken spinlocks, we do this:

    struct lock {
        int held = 0;
    }
    void acquire(lock) {
        while (test_and_set(&lock->held));
    }
    void release(lock) {
        lock->held = 0;
    }

Adapted from Matt Welsh’s (Harvard University) slides.

  48. What's wrong with spinlocks? • OK, so spinlocks work (if you implement them correctly), and they are simple. So what's the catch?

    void acquire(lock) {
        while (test_and_set(&lock->held));
    }

Adapted from Matt Welsh’s (Harvard University) slides.

  49. Adapted from Matt Welsh’s (Harvard University) slides. Problems with spinlocks • Horribly wasteful! • Threads waiting to acquire locks spin on the CPU • Eats up lots of cycles, slows down progress of other threads • Note that other threads can still run ... how? • What happens if you have a lot of threads trying to acquire the lock? • Only want spinlocks as primitives to build higher-level synchronization constructs

  50. Disabling Interrupts • An alternative to spinlocks:

    struct lock {
        // Note: no state!
    }
    void acquire(lock) {
        cli();   // disable interrupts
    }
    void release(lock) {
        sti();   // reenable interrupts
    }

• Can two threads disable/reenable interrupts at the same time? • What's wrong with this approach? Adapted from Matt Welsh’s (Harvard University) slides.
