
CENG 334 – Operating Systems 03- Threads & Synchronization



  1. CENG 334 – Operating Systems 03 - Threads & Synchronization Asst. Prof. Yusuf Sahillioğlu, Computer Eng. Dept., METU, Turkey

  2. Threads • Concurrent work within the same process for efficiency. • Several activities going on as part of the same process. • Threads share memory and other process resources; each thread has its own registers and stack. • All about data synchronization.

  3. Concurrency vs. Parallelism • Concurrency: 2 processes or threads run concurrently (are concurrent) if their execution flows overlap in time. • Otherwise, they are sequential. • Examples (running on a single core): • Concurrent: A & B, A & C • Sequential: B & C • Parallelism: requires multiple resources (e.g., cores) to execute multiple processes or threads at the same time instant, e.g., A & B running in parallel.

  4. Concurrent Programming • Many programs want to do many things “at once” • Web browser: • Download web pages, read cache files, accept user input, ... • Web server: • Handle incoming connections from multiple clients at once • Scientific programs: • Process different parts of a data set on different CPUs • In each case, we would like to share memory across these activities • Web browser: Share buffer for HTML page and inlined images • Web server: Share memory cache of recently-accessed pages • Scientific programs: Share memory of the data set being processed • Can't we simply do this with multiple processes?

  5. Why are processes not always ideal? • Processes are not very efficient • Each process has its own PCB and OS resources • Typically high overhead per process: e.g., 1.7 KB per task_struct on Linux! • Creating a new process is often very expensive • Processes don't (directly) share memory • Each process has its own address space • Parallel and concurrent programs often want to directly manipulate the same memory • e.g., when processing elements of a large array in parallel • Note: many OSes provide some form of inter-process shared memory • e.g., UNIX shmget() and shmat() system calls • Still, this requires more programmer work and does not address the efficiency issues.

  6. Can we do better? • What can we share across all of these tasks? • Same code – generally running the same or similar programs • Same data • Same privileges • Same OS resources (files, sockets, etc.) • What is private to each task? • Execution state: CPU registers, stack, and program counter • Key idea of this lecture: • Separate the concept of a process from a thread of control • The process is the address space and OS resources • Each thread has its own CPU execution state

  7. Threads vs. Processes • Processes form a tree hierarchy. • Threads form a pool of peers. • Each thread can kill any other. • Each thread can wait for any other thread to terminate. • Main thread: first thread to run in a process. [Figure: process hierarchy (P0, P1, sh processes) vs. thread pool (T1–T5 sharing code, data, and kernel context)]

  8. Threads vs. Processes • Each process has one or more threads “within” it • Each thread has its own stack, CPU registers, etc. • All threads within a process share the same address space and OS resources • Threads share memory, so they can communicate directly! • The thread is now the unit of CPU scheduling • A process is just a “container” for its threads • Each thread is bound to its containing process [Figure: one address space containing Thread 0, Thread 1, and Thread 2]

  9. (Old) Process Address Space • 0xFFFFFFFF: reserved for OS • Stack (grows down from the stack pointer) • Heap • Uninitialized vars (BSS segment) • Initialized vars (data segment) • Code (text segment; program counter points here) • 0x00000000

  10. (New) Process Address Space w/ Threads

  11. Implementing Threads • Given what we know about processes, implementing threads is “easy” • Idea: Break the PCB into two pieces: • Thread-specific stuff: Processor state • Process-specific stuff: Address space and OS resources (open files, etc.)

  12. Thread Control Block (TCB) • TCB contains info on a single thread • Just processor state and pointer to corresponding PCB • PCB contains information on the containing process • Address space and OS resources ... but NO processor state!

  13. Thread Control Block (TCB) • TCB's are smaller and cheaper than processes • Linux TCB (thread_struct) has 24 fields • Linux PCB (task_struct) has 106 fields • Hence context switching threads is cheaper than context switching processes.

  14. Context Switching • TCB is now the unit of a context switch • Ready queue, wait queues, etc. now contain pointers to TCB's • Context switch causes CPU state to be copied to/from the TCB • Context switch between two threads in the same process: • No need to change address space • Context switch between two threads in different processes: • Must change address space, sometimes invalidating cache • This will become relevant when we talk about virtual memory.

  15. Thread State • State shared by all threads in process: • Memory content (global variables, heap, code, etc). • I/O (files, network connections, etc). • A change in the global variable will be seen by all other threads (unlike processes). • State private to each thread: • Kept in TCB (Thread Control Block). • CPU registers, program counter. • Stack (what functions it is calling, parameters, local variables, return addresses). • Pointer to enclosing process (PCB).

  16. Thread Behavior • Some useful applications of threads: • One thread listens for connections; others handle page requests. • One thread handles the GUI; others do computations. • One thread paints the left part, another the right part.

  17. Thread Behavior • Single-threaded: a process has a single thread of control; if it blocks on something, nothing else can be done.
main() {
  computePI();   // never finishes
  printf("hi");  // never reached
}
• Multi-threaded:
main() {
  createThread( computePI() );  // never finishes
  createThread( printf("hi") ); // reaches here
}
main() {
  createThread( scanf() );       // does not finish until the user enters input (not in CPU)
  createThread( autoSaveDoc() ); // reaches here while the other thread waits on I/O
}

  18. Thread Behavior • Execution flow:

  19. Threads on a Single CPU • Still possible. • Multitasking idea • Share one CPU among many processes (context switch). • Multithreading idea • Share the same process among many threads (thread switch). • Whenever this process has the opportunity to run on the CPU, the OS can select one of its many threads and run it for a while, and so on. • One pid, several thread ids: the number of schedulable entities increases.

  20. Threads on a Single CPU • If threads are all CPU-bound, e.g., no I/O or pure math, then we do not gain much by multithreading. • Luckily this is usually not the case, e.g., 1 thread does the I/O, .. • Select your threads carefully, one is I/O-bound, other is CPU-bound, .. • With multicores, we still gain big even if threads are all CPU-bound.

  21. Multithreading Concept • Multithreading concept: pseudo-parallel runs (pseudo: interleaved switches on 1 CPU).
funct1() { .. }
funct2() { .. }
main() {          // thread1
  ..
  createThread( funct1() );  // thread2
  ..
  createThread( funct2() );  // thread3
  ..
  createThread( funct1() );  // thread4
  ..
}

  22. Single- vs. Multi-threaded Processes • Shared and private stuff:

  23. Benefits of Threads • Responsiveness • One thread blocks, another runs. • One thread may always wait for the user. • Resource sharing • Very easy sharing (use global variables; unlike message queues, pipes, shmget). • Be careful about data synchronization though. • Economy • Thread creation is fast. • Context switching among threads may be faster • because you do not have to duplicate code and global variables (unlike processes). • Scalability • Multiprocessors can be utilized better. • A process that has created 4 threads can use all 4 cores (a single-threaded process utilizes 1 core).

  24. Naming • Why do we call it a thread anyway? • Execution flow of a program is not smooth, looks like a thread. • Execution jumps around all over the place (switches) but integrity is intact.

  25. Multithreading Example: WWW • Client (Chrome) requests a page from the server (amazon.com). • Server gives the page name to a thread and resumes listening. • The thread checks the disk cache in memory; if the page is not there, it does disk I/O; then it sends the page to the client.

  26. Threading Support • User-level threads: are threads that the OS is not aware of. They exist entirely within a process, and are scheduled to run within that process's time slices. • Kernel-level threads: The OS is aware of kernel-level threads. Kernel threads are scheduled by the OS's scheduling algorithm, and require a "lightweight" context switch to switch between (that is, registers, PC, and SP must be changed, but the memory context remains the same among kernel threads in the same process).

  27. Threading Support • User-level threads are much faster to switch between, as there is no context switch; further, a problem-domain-dependent algorithm can be used to schedule among them. CPU-bound tasks with interdependent computations, or a task that will switch among threads often, might best be handled by user-level threads.

  28. Threading Support • Kernel-level threads are scheduled by the OS, and each thread can be granted its own time slices by the scheduling algorithm. The kernel scheduler can thus make intelligent decisions among threads, and avoid scheduling processes which consist of entirely idle threads (or I/O bound threads). A task that has multiple threads that are I/O bound, or that has many threads (and thus will benefit from the additional time slices that kernel threads will receive) might best be handled by kernel threads. • Kernel-level threads require a system call for the switch to occur; user-level threads do not.

  29. Threading Support • Thread libraries provide an API for creating and managing threads. • pthreads, Java threads, Win32 threads. • Pthreads (POSIX threads) interface. • Common in Unix operating systems: Solaris, Mac OS, Linux. • Not implemented in the standard C library; link against the pthread library while compiling: gcc –o thread1 thread1.c –lpthread • Implementation-dependent; can be user- or kernel-level. • Functions in the pthread library actually make Linux system calls, e.g., pthread_create() → clone() • See sample codes to warm up on pthreads: • http://user.ceng.metu.edu.tr/~ys/ceng334-os/threadster0.c also see threadster1.c

  30. Pthreads
int main(..) {           // thread1
  ..
  pthread_create(&tid, ..., runner, ..);
  pthread_join(tid);     // wait for thread2
  printf(sum);
}
runner(..) {             // thread2
  ..
  sum = ..
  pthread_exit();
}

  31. Single- to Multi-thread Conversion • In a simple world • Identify functions as parallel activities. • Run them as separate threads. • In the real world • Single-threaded programs use global variables and library functions (e.g., malloc). • Be careful with them. • Global variables are good for easy communication but need special care.

  32. Single- to Multi-thread Conversion • Careful with global variable:

  33. Single- to Multi-thread Conversion • Careful with global variable:

  34. Single- to Multi-thread Conversion • Global, local, and thread-specific variables. • Thread-specific: global inside the thread, but not for the whole process, i.e., other threads cannot access it, but all functions of the thread can (no problem because functions within a thread execute sequentially). • No language support for this variable type; traditional C cannot do this. • The thread API has special functions to create such variables.

  35. Single- to Multi-thread Conversion • Use thread-safe (reentrant, reenterable) library routines. • Multiple malloc()s execute sequentially in single-threaded code. • Say one thread is suspended inside malloc(); another thread calls malloc() and re-enters it while the 1st call has not finished. • Library functions should be designed to be reentrant: safe for a second call to begin before an earlier call has finished. • To achieve this, avoid global and static variables.

  36. Thread Issues • All threads in a process share memory. • What happens when two threads access the same variable? • Which value does Thread 2 see when it reads “foo”? • What does it depend on? [Figure: shared address space; Thread 0 writes foo, Thread 2 reads foo]

  37. Thread Issues
/* shared */
volatile unsigned int cnt = 0;  /* see Note section below for volatile */
#define NITERS 100000000

int main() {
  pthread_t tid1, tid2;
  Pthread_create(&tid1, NULL, count, NULL);
  Pthread_create(&tid2, NULL, count, NULL);
  Pthread_join(tid1, NULL);
  Pthread_join(tid2, NULL);
  if (cnt != (unsigned)NITERS*2)
    printf("BOOM! cnt=%d\n", cnt);
  else
    printf("OK cnt=%d\n", cnt);
}

/* thread routine */
void *count(void *arg) {
  int i;
  for (i = 0; i < NITERS; i++)
    cnt++;
  return NULL;
}

linux> ./badcnt
BOOM! cnt=198841183
linux> ./badcnt
BOOM! cnt=198261801
linux> ./badcnt
BOOM! cnt=198269672

cnt should equal 200,000,000. What went wrong?

  38. Thread Issues • Assembly code for counter loop:

  39. Thread Issues • Assembly code for counter loop. • Unpredictable thread switches by the scheduler create inconsistencies in shared data, e.g., the global variable cnt. • Handling this is arguably the most important topic of this class: synchronization.

  40. Synchronization • Synchronize threads / coordinate their activities so that access to shared data (e.g., global variables) does not cause trouble. • Multiple processes sharing a file or a shared memory segment also require synchronization (= critical-section handling).

  41. Synchronization • The part of a process that accesses and changes shared data is called its critical section. [Figure: Thread 1, Thread 2, and Thread 3 code, each changing shared data X and Y] Assuming X and Y are shared data.

  42. Synchronization • Solution: no 2 processes/threads are in their critical sections at the same time, aka mutual exclusion (mutex). • Must assume processes/threads interleave their executions arbitrarily (preemptive scheduling) and at different rates. • Scheduling is not under the application’s control. • We control coordination using data synchronization. • We restrict interleaving of executions to ensure consistency. • Low-level mechanism to do this: locks. • High-level mechanisms: mutexes, semaphores, monitors, condition variables.

  43. Synchronization • General way to achieve synchronization:

  44. Synchronization • An example: race condition. [Figure: two interleavings of the same code: in one the critical section is respected (correct result), in the other it is not (wrong result)]

  45. Synchronization • Another example: race condition. • Assume we had 5 items in the buffer. • Then: • The producer has just produced a new item, put it into the buffer, and is about to do count++. • The consumer has just retrieved an item from the buffer and is about to do count--. [Figure: producer and consumer threads sharing the buffer and the count variable]

  46. Synchronization • Another example: race condition. • Critical region: where we manipulate count. • count++ could be implemented as (similarly, count--):
register1 = count;          // read value
register1 = register1 + 1;  // increase value
count = register1;          // write back

  47. Synchronization • Then, with count = 5 initially, one possible interleaving (registers in the CPU, count in main memory):
PRODUCER (count++)                     CONSUMER (count--)
register1 = count          { 5 }
register1 = register1 + 1  { 6 }
                                       register2 = count          { 5 }
                                       register2 = register2 - 1  { 4 }
count = register1          { count = 6 }
                                       count = register2          { count = 4 }

  48. Synchronization • Another example: race condition. • 2 threads executing their critical-section code. • Although 2 customers each withdrew 100TL, the balance is 900TL, not 800TL. Execution sequence as seen by the CPU (Balance = 1000TL initially):
Thread 1: balance = get_balance(account);
Thread 1: balance -= amount;              // local = 900TL
Thread 2: balance = get_balance(account);
Thread 2: balance -= amount;              // local = 900TL
Thread 1: put_balance(account, balance);  // Balance = 900TL
Thread 2: put_balance(account, balance);  // Balance = 900TL!

  49. Synchronization • Solution: mutual exclusion. • Only one thread at a time can execute code in its critical section. • All other threads are forced to wait on entry. • When one thread leaves the critical section, another can enter. [Figure: Thread 1 alone inside the critical section (modify account balance)]

  50. Synchronization • Solution: mutual exclusion. • Only one thread at a time can execute code in its critical section. • All other threads are forced to wait on entry. • When one thread leaves the critical section, another can enter. [Figure: Thread 1 inside the critical section (modify account balance); Thread 2 must wait for the critical section to clear]
