CS 3214 Introduction to Computer Systems

Presentation Transcript


  1. CS 3214 Introduction to Computer Systems Godmar Back Lecture 18

  2. Distinguished Lecture - The Impact of Virtualization on Modern Computing Environments Mendel Rosenblum, Stanford University Location: Whittemore 300 Date: Friday, October 30, 2009 Time: 11:15am-12:30pm A Meet-the-Speaker session will be held 4:00pm-5:30pm in McBryde 106. Sponsored by the Department of Computer Science and the Center for High-End Computing (CHECS) CS 3214 Fall 2009

  3. Announcements • Read Chapter 13 • Project 3 due Oct 30 • Exercise 10 due Nov 3 • 2 day extension • Project 4 & Exercise 11 not before next week • Distinguished Lecture is relevant to class • 4pm-5pm is an opportunity to chat with Dr. Rosenblum in McB 106 CS 3214 Fall 2009

  4. Some of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP) Randal E. Bryant and David R. O'Hallaron http://csapp.cs.cmu.edu/public/lectures.html Part 7 Threads and Processes CS 3214 Fall 2009

  5. Locks • A thread that enters a critical section (CS) locks it • Others can't get in and have to wait • The thread unlocks the CS when leaving it • Leaving lets in the next thread – which one? • FIFO guarantees bounded waiting • Highest priority first in priority-based systems • Can view a lock as an abstract data type • Provides (at least) init, acquire, release (see the sketch below) • (Diagram: lock/unlock bracketing the critical section) CS 3214 Fall 2009
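
  A minimal sketch of the lock ADT described on this slide, backed by a POSIX mutex. The names cs_lock, lock_acquire, etc. are illustrative, not part of any required API:

  #include <pthread.h>

  static pthread_mutex_t cs_lock;

  static void lock_init(void)    { pthread_mutex_init(&cs_lock, NULL); }
  static void lock_acquire(void) { pthread_mutex_lock(&cs_lock); }   /* blocks while another thread holds it */
  static void lock_release(void) { pthread_mutex_unlock(&cs_lock); } /* lets the next waiter in */

  static int shared_counter;     /* example shared state protected by cs_lock */

  void increment(void)
  {
      lock_acquire();            /* enter critical section */
      shared_counter++;
      lock_release();            /* leave critical section */
  }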

  6. Managing Locks • Programmers should treat locks as resources • Must ensure that they are released on all paths • In Java/C#: either use built-in “synchronized” or use try/finally • In C: requires careful reasoning about all code paths • An idiomatic structure of functions helps: • minimize the number of returns • can use goto's to create a structure that resembles a try/finally clause if a function acquires multiple locks (not usually done for a single lock, though) • In C++: use the RAII pattern CS 3214 Fall 2009

  7. Locks in Java vs. C CS 3214 Fall 2009

  // Java: explicit lock with try/finally
  void f() {
      lock(l1);
      try {
          ....
          if (some error) return;
          ....
          lock(l2);
          try {
              ....
              if (some other err) return;
              ....
          } finally {
              unlock(l2);
          }
      } finally {
          unlock(l1);
      }
  }

  // Java: same structure using synchronized blocks
  void f() {
      synchronized (l1) {
          ....
          if (some error) return;
          ....
          synchronized (l2) {
              ....
              if (some other err) return;
              ....
          }
      }
  }

  /* C: goto-based cleanup that mimics try/finally */
  void f(...) {
      lock(&l1);
      ....
      if (some error) goto l1out;
      ....
      lock(&l2);
      ....
      if (some other error) goto l2out;
      ....
  l2out:
      unlock(&l2);
  l1out:
      unlock(&l1);
      return;
  }

  8. synchronized in Java • Built-in ‘locking’ in Java • Any Java object can be used as a lock • 2 forms: block & method attribute • If block form: synchronized (o) { … } uses L = o • If instance method: uses L = ‘this’ • If static method of class C: uses L = C.class • Equivalent to (and best thought of as) try { L.lock(); … } finally { L.unlock(); } • Recursive: synchronized methods can call each other without deadlocking CS 3214 Fall 2009

  9. Locks: Ownership & Recursion • Locks semantically have a notion of ownership • Only the lock holder is allowed to unlock • Some systems allow querying lock_held_by_current_thread() • But: POSIX does not enforce this, and in fact Linux's implementation allows non-holders to unlock a lock • What if the lock holder tries to acquire a lock it already holds? • Nonrecursive locks: deadlock! • default semantics of POSIX locks • Recursive locks: each acquire by the holder increments a counter, each lock_release decrements it, and the lock is actually released when the counter reaches zero • default semantics of Java synchronized • (See the sketch below for how POSIX exposes this choice) CS 3214 Fall 2009
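
  A hedged sketch of how the ownership/recursion choice above can be selected with POSIX mutex attributes; setup() and the variable names are illustrative, and the feature-test macro may not be needed on every platform:

  #define _XOPEN_SOURCE 700   /* may be needed for the mutex-type constants on some systems */
  #include <pthread.h>

  static pthread_mutex_t m;

  void setup(void)
  {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);

      /* Recursive: the holder may re-acquire; a counter tracks the nesting depth. */
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

      /* Alternative: an error-checking mutex reports re-acquisition by the holder
       * (EDEADLK) and unlock attempts by non-holders (EPERM) instead of allowing them:
       * pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK); */

      pthread_mutex_init(&m, &attr);
      pthread_mutexattr_destroy(&attr);
  }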

  10. Thread-Safety • Property of a function to yield correct result when called from multiple threads • Attribute that must be documented as part of the API documentation of a library • Functions are not thread-safe if they • Fail to protect shared variables • Rely on persistent state across invocations • Return a pointer to a static variable • Call other functions that aren’t thread safe. CS 3214 Fall 2009

  11. Class 2: Relying on persistent state across multiple function invocations • A random number generator relies on static state • Fix: rewrite the function so that the caller passes in all necessary state (see rand_r() and the sketch below) CS 3214 Fall 2009

  static unsigned int next = 1;

  /* rand - return pseudo-random integer on 0..32767 */
  int rand(void)
  {
      next = next * 1103515245 + 12345;
      return (unsigned int)(next / 65536) % 32768;
  }

  /* srand - set seed for rand() */
  void srand(unsigned int seed)
  {
      next = seed;
  }
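
  A minimal sketch of the suggested fix in the spirit of rand_r(): the caller owns the generator state, so no static variable is shared between threads. rand_ts and worker are illustrative names; the recurrence is the one used by the slide's rand():

  /* rand_ts - thread-safe variant: the caller supplies the state */
  int rand_ts(unsigned int *nextp)
  {
      *nextp = *nextp * 1103515245 + 12345;
      return (unsigned int)(*nextp / 65536) % 32768;
  }

  /* Each thread keeps its own seed, so calls from different threads cannot interfere. */
  void *worker(void *arg)
  {
      unsigned int seed = 1;        /* per-thread state */
      int r = rand_ts(&seed);
      (void)r;                      /* ... use r ... */
      return 0;
  }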

  12. Class 3: Returning a pointer to a static variable • Fixes: • 1. Rewrite the code so the caller passes a pointer to the struct. Issue: requires changes in caller and callee. • 2. Lock-and-copy. Issue: requires only simple changes in the caller (and none in the callee); however, the caller must free the memory. CS 3214 Fall 2009

  struct hostent *gethostbyname(char *name)
  {
      static struct hostent h;
      <contact DNS and fill in h>
      return &h;
  }

  /* Fix 1, as seen by the caller: */
  hostp = malloc(...);
  gethostbyname_r(name, hostp);

  /* Fix 2: lock-and-copy wrapper */
  struct hostent *gethostbyname_ts(char *name)
  {
      struct hostent *p, *q = Malloc(...);
      lock(&mutex);              /* lock */
      p = gethostbyname(name);
      *q = *p;                   /* copy */
      unlock(&mutex);
      return q;
  }

  13. Performance Cost of Locking • Direct cost: at a minimum, 1 atomic instruction per lock acquire (uncontended) • Indirect cost if the lock is contended (i.e., held by another thread when a thread attempts to acquire it) • Context switch cost: the thread must move into the blocked state and context switch to another thread (if any is ready); later, the thread must be made ready again, and another context switch is needed to resume it • Opportunity cost due to introduced serialization/loss of parallelism: blocked threads cannot use CPUs/cores, which may lead to underutilization and/or idling of cores • Remember: nobody cares about the optimization of incorrect code. Safety is always paramount CS 3214 Fall 2009

  14. Critical Section Efficiency • As processors get faster, critical section efficiency decreases because atomic instructions become relatively more expensive (even in the uncontended case!) • (Figure omitted; source: McKenney, 2005) CS 3214 Fall 2009

  15. Atomicity Violations • Locking all accesses to shared variables does not guarantee that a program is free of concurrency bugs • Atomicity violations are another, more insidious type of concurrency violation • The bug pattern is: lock; get information I; unlock (other threads may invalidate information I); lock; act on now possibly out-of-date information I; unlock (a fixed version is sketched below) CS 3214 Fall 2009

  char *p = ....;    /* shared variable */
  lock lp;           /* protects 'p' */
  ....
  int getplen()
  {
      lock(&lp);
      int len = strlen(p);
      unlock(&lp);
      return len;
  }
  ....
  int nchars = getplen();
  char *copy = malloc(nchars + 1);
  lock(&lp);
  strcpy(copy, p);   /* 'p' may have grown since getplen() returned */
  unlock(&lp);
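
  A hedged sketch of one way to repair the pattern above: perform the length read and the copy inside a single critical section, so 'p' cannot change in between. A pthread mutex stands in for the slide's lock()/unlock() pseudo-calls; p, p_lock, and copy_p are illustrative names:

  #include <pthread.h>
  #include <stdlib.h>
  #include <string.h>

  static char *p;                                             /* shared variable */
  static pthread_mutex_t p_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects 'p' */

  char *copy_p(void)
  {
      pthread_mutex_lock(&p_lock);      /* one atomic region ...           */
      size_t len = strlen(p);
      char *copy = malloc(len + 1);
      if (copy != NULL)
          memcpy(copy, p, len + 1);     /* ... covers both read and use    */
      pthread_mutex_unlock(&p_lock);
      return copy;
  }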

  16. Atomicity Violation (Example 2) • Incorrect even though individual accesses to “sb” are synchronized (protected by a lock) • But “len” may no longer be equal to “sb.length()” in the call to getChars() • This means simply slapping lock()/unlock() around every access to a shared variable does not thread-safe code make • Found by Flanagan/Freund CS 3214 Fall 2009

  public synchronized StringBuffer append(StringBuffer sb) {
      int len = sb.length();        // note: StringBuffer.length() is synchronized
      // not holding the lock on 'sb' here – another thread may change its length
      int newcount = count + len;
      if (newcount > value.length)
          expandCapacity(newcount);
      sb.getChars(0, len, value, count);   // StringBuffer.getChars() is synchronized
      count = newcount;
      return this;
  }

  17. Atomicity Constraints • Atomicity violations disregard atomicity constraints • Information read in critical section A must not be used in a critical section B if B relies on it not having been changed CS 3214 Fall 2009

  lock();
  var x = read_var();   /* atomic */
  unlock();
  ....                  /* atomicity required across this gap to maintain consistency */
  lock();
  use(x);               /* atomic */
  unlock();

  18. How many locks should I use? • Could use one lock for all shared variables • Disadvantage: if a thread holding the lock blocks, no other thread can access any shared variable, even unrelated ones • Sometimes used when retrofitting non-threaded code into a threaded framework • Examples: • “BKL” Big Kernel Lock in Linux • the (Global) Interpreter Lock in Python • the GUI lock in gtk • Ideally, want fine-grained locking • One lock only protects one (or a small set of) variables – how to pick that set? (see the sketch below) CS 3214 Fall 2009
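
  A hedged sketch of what fine-grained locking can look like: one mutex per hash-table bucket, so threads working on different buckets do not serialize. All names (table, bucket, entry) are illustrative:

  #include <pthread.h>

  #define NBUCKETS 64

  struct entry {
      int key;
      struct entry *next;
  };

  struct bucket {
      pthread_mutex_t lock;     /* protects 'head' of this bucket only */
      struct entry *head;
  };

  static struct bucket table[NBUCKETS];

  void table_init(void)
  {
      for (int i = 0; i < NBUCKETS; i++)
          pthread_mutex_init(&table[i].lock, NULL);
  }

  void table_insert(unsigned hash, struct entry *e)
  {
      struct bucket *b = &table[hash % NBUCKETS];
      pthread_mutex_lock(&b->lock);      /* only this bucket is serialized */
      e->next = b->head;
      b->head = e;
      pthread_mutex_unlock(&b->lock);
  }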

  19. Mapping Locks To Variables • Choosing which lock should protect which shared variable(s) is not easy – must weigh: • Whether all variables are always accessed together (use one lock if so) • Whether there is an atomicity requirement if multiple variables are accessed in related sequence (must hold single lock if so) • Cost of multiple calls to lock/unlock (advantage of increased parallelism may be offset by those costs) • Whether code inside critical section may block (if not, no throughput gain from fine-grained locking on uniprocessor) CS 3214 Fall 2009

  20. Rules for Easy Locking • Every shared variable must be protected by a lock • Establish this relationship with code comments: /* protected by … <lock> */ • Acquire the lock before touching (reading or writing) the variable • Release it when done, on all paths • One lock may protect more than one variable, but not too many • If in doubt, use fewer locks (may lead to worse efficiency, but is less likely to lead to race conditions or deadlock) • If manipulating multiple variables, acquire the locks assigned to protect each • Acquire locks always in the same order (it doesn't matter which order, but it must be the same) – see the ordering sketch below • Release in the opposite order • Don't release any locks before all have been acquired (two-phase locking) CS 3214 Fall 2009
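
  A short sketch of the ordering rule: both transfer paths take the two locks in the same fixed order and release them in the opposite order, which rules out the circular-wait deadlock. The account/balance names are illustrative:

  #include <pthread.h>

  static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* protects balance_a */
  static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* protects balance_b */
  static int balance_a, balance_b;

  void transfer_a_to_b(int amount)
  {
      pthread_mutex_lock(&lock_a);      /* always A before B            */
      pthread_mutex_lock(&lock_b);
      balance_a -= amount;
      balance_b += amount;
      pthread_mutex_unlock(&lock_b);    /* release in the opposite order */
      pthread_mutex_unlock(&lock_a);
  }

  void transfer_b_to_a(int amount)
  {
      pthread_mutex_lock(&lock_a);      /* same order, even for the reverse transfer */
      pthread_mutex_lock(&lock_b);
      balance_b -= amount;
      balance_a += amount;
      pthread_mutex_unlock(&lock_b);
      pthread_mutex_unlock(&lock_a);
  }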

  21. Coordinating Multiple Threads • Aside from coordinating access to shared items, threads may need to communicate about events • "has event A already happened in another thread?" • aka a "precedence constraint" or "scheduling constraint": do B after A • Must do so • Correctly (never miss that event A has occurred when in fact it has) • Efficiently • Don't waste resources in the process • Don't unnecessarily delay notification of event A CS 3214 Fall 2009

  22. Q.: How can thread2 make sure that 'coin_flip' has occurred before printing its outcome? CS 3214 Fall 2009

  int coin_flip;

  static void *thread1(void *_)
  {
      coin_flip = rand() % 2;
      printf("Thread 1: flipped coin %d\n", coin_flip);
      return NULL;
  }

  static void *thread2(void *_)
  {
      printf("Thread 2: flipped coin %d\n", coin_flip);
      return NULL;
  }

  int main()
  {
      int i, N = 2;
      pthread_t t[N];
      srand(getpid());
      pthread_create(&t[1], NULL, thread2, NULL);
      pthread_create(&t[0], NULL, thread1, NULL);
      for (i = 0; i < N; i++)
          pthread_join(t[i], NULL);
      return 0;
  }

  23. Thread 2 could "busy-wait" – spin until thread 1 completes the coin flip. Exceptions notwithstanding, this is practically never an acceptable solution. The somewhat less wasteful variant of busy-waiting, while (!coin_flip_done) sched_yield();, is not acceptable either: it • wastes CPU cycles • is fragile (volatile needed when using -O) • does not document semantics CS 3214 Fall 2009

  int coin_flip;
  volatile bool coin_flip_done;

  static void *thread1(void *_)
  {
      coin_flip = rand() % 2;
      coin_flip_done = true;
      printf("Thread 1: flipped coin %d\n", coin_flip);
      return NULL;
  }

  static void *thread2(void *_)
  {
      /* Thread 2 spins, "busy-waits" until the coin flip is done.
       * This is an unacceptable solution. Bad for the planet, too. */
      while (!coin_flip_done)
          continue;
      printf("Thread 2: flipped coin %d\n", coin_flip);
      return NULL;
  }

  24. Semaphores (image source: inter.scoutnet.org) • Invented by Edsger Dijkstra in the 1960s • A counter S, initialized to some value, with two operations: • P(S) or “down” or “wait” – if the counter is greater than zero, decrement it; else wait until it is greater than zero, then decrement • V(S) or “up” or “signal” or “post” – increment the counter, wake up any threads stuck in P • Semaphores don't go negative: #V + InitialValue - #P >= 0 • Note: direct access to the counter value after initialization is not allowed • Counting semaphores vs. binary semaphores • Binary: counter can only be 0 or 1 • Simple to implement, yet powerful • Can be used for many synchronization problems CS 3214 Fall 2009

  25. POSIX Semaphores • Notice the 3rd argument of sem_init() – it gives the initial value of the semaphore: '0' means the semaphore is used to express a scheduling constraint CS 3214 Fall 2009

  int coin_flip;
  sem_t coin_flip_done;            /* semaphore for thread 1 to signal coin flip */

  static void *thread1(void *_)
  {
      coin_flip = rand() % 2;
      sem_post(&coin_flip_done);   /* raise semaphore, increment, 'up' */
      printf("Thread 1: flipped coin %d\n", coin_flip);
      return NULL;
  }

  static void *thread2(void *_)
  {
      /* wait until the semaphore is raised, then decrement, 'down' */
      sem_wait(&coin_flip_done);
      printf("Thread 2: flipped coin %d\n", coin_flip);
      return NULL;
  }

  int main()
  {
      ...
      sem_init(&coin_flip_done, 0, 0);
      pthread_create(&t[1], NULL, thread2, NULL);
      pthread_create(&t[0], NULL, thread1, NULL);
      ...
  }

  26. Implementing Mutual Exclusion with Semaphores • Semaphores can be used to build locks • Must initialize the semaphore with 1 to allow one thread to enter the critical section • This is not a recommended style, despite what Bryant & O'Hallaron suggest – you should use a mutex instead [Cantrill & Bonwick 2008] • Easily generalized to allow at most N simultaneous threads: the multiplex pattern (i.e., a resource can be accessed by at most N threads) – see the sketch below CS 3214 Fall 2009

  sem_t S;
  sem_init(&S, 0, 1);

  lock_acquire()
  {
      /* try to decrement, wait if 0 */
      sem_wait(&S);
  }

  lock_release()
  {
      /* increment (wake up waiters if any) */
      sem_post(&S);
  }
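
  A hedged sketch of the multiplex pattern mentioned above: initializing the semaphore to N admits at most N threads into the region at a time. MAX_USERS and the function names are illustrative:

  #include <semaphore.h>

  #define MAX_USERS 4

  static sem_t slots;

  void multiplex_init(void)
  {
      sem_init(&slots, 0, MAX_USERS);   /* N available slots */
  }

  void use_resource(void)
  {
      sem_wait(&slots);                 /* take a slot; blocks if all N are taken */
      /* ... at most MAX_USERS threads are in here at any time ... */
      sem_post(&slots);                 /* give the slot back */
  }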
