
Threads Cannot be Implemented As a Library

Andrew Hobbs. Threads Cannot be Implemented As a Library. As a library... what does that mean? The language specification doesn't say anything about threads. The specification defines what compilers should do, so the compiler doesn't know about them either. How does this affect programming?


Presentation Transcript


  1. Andrew Hobbs Threads Cannot be Implemented As a Library

  2. As a library...what does that mean?
     • The language specification doesn't say anything about threads
     • The specification defines what compilers should do
     • So the compiler doesn't know about threads either

  3. How does this affect programming?
     • The compiler transforms your code to hopefully make it as fast as possible
     • It has some restrictions, depending on the language specification
     • But if the compiler doesn't know about concurrency...
     • It can make optimizations that are valid in sequential programs, but can cause bugs in multiprocessor environments

  4. An example
     Assuming x and y are both set to 0, suppose we have 2 threads:
         Thread 1: x = 1; r1 = y;
         Thread 2: y = 1; r2 = x;
     What are the possible values of r1 and r2 at the end of both threads executing?

  5. An example
     But what if our compiler changes our code to the following?
         Thread 1: r1 = y; x = 1;
         Thread 2: r2 = x; y = 1;
     What are the possible values of r1 and r2 at the end of both threads executing?
     The results could now turn out differently...but from the compiler's view, everything is fine, because it doesn't know that each thread can interact with others.

  6. Why did this happen?
     • The compiler didn't know about concurrency, so it performed optimizations assuming sequential execution
     • Some of these don't work with concurrency!
     • In fact, the hardware itself can also do this in an attempt to speed up execution, by (for example) moving loads ahead of unrelated stores

  7. The Pthreads approach
     No thread shall read or modify memory that another thread may be modifying (such an access is called a data race).
     To restrict access, the programmer uses synchronization routines:
     • pthread_mutex_lock()
     • pthread_mutex_unlock()
     • …

  8. The Pthreads approach If the programmer uses the synchronization routines correctly to prevent data races, then they should have no issues. But this isn't quite true...

  9. Concurrent modification
     Suppose we had the following two threads (x and y are both initially 0):
         Thread 1: if (x == 1) ++y;
         Thread 2: if (y == 1) ++x;
     Is there a data race in this program?

  10. Concurrent modification
      What if our compiler modified our code a little?
          Thread 1: ++y; if (x != 1) --y;
          Thread 2: ++x; if (y != 1) --x;
      Is there a data race in this program?

  11. Adjacent data
      Suppose we had the following structure definition:
          struct { int a:17; int b:15; } x;
      There are probably no machines that have a 17-bit wide store, so if someone were to attempt to execute:
          x.a = 42;
      it would probably be done like this:
          {
              tmp = x;          // Read both fields into a 32-bit variable
              tmp &= ~0x1ffff;  // Mask off old a
              tmp |= 42;
              x = tmp;          // Overwrite all of x
          }

  12. Adjacent data
      Suppose we had the following structure definition:
          struct { char a; char b; char c; char d; char e; char f; char g; char h; } x;
      where a is the only field that needs to be protected by a lock. If that was the case, some programmer might write the following code:
          x.b = 'b'; x.c = 'c'; x.d = 'd'; x.e = 'e'; x.f = 'f'; x.g = 'g'; x.h = 'h';
      But a compiler might realize that it could just write all of the data at once as a 64-bit quantity (not exact syntax):
          x = 'hgfedcb\0' | x.a;
      That single store also rewrites x.a, without holding its lock.

  13. Register Promotion
      Suppose we had a global shared variable x, protected by a lock...but only conditionally, perhaps only if we had actually created other threads:
          for (...) {
              ...
              if (mt) pthread_mutex_lock(...);
              x = ... x ...
              if (mt) pthread_mutex_unlock(...);
          }
      If the conditionals are rarely taken, the compiler might decide to promote x to a register to increase the performance:
          r = x;
          for (...) {
              ...
              if (mt) { x = r; pthread_mutex_lock(...); r = x; }
              r = ... r ...
              if (mt) { x = r; pthread_mutex_unlock(...); r = x; }
          }
          x = r;

  14. What does this mean? Pthreads says that as long as we prevent race conditions with the synchronization functions, we will be fine. But since our compiler doesn't know about threads, it might make optimizations that break our code, even though the code looks perfectly fine to us. We can't use locks at a high level if the presence of race conditions depends on the compiler and the hardware.

  15. Performance So why are we running multiple threads? To (hopefully) get better performance out of our program. But locking is expensive! An atomic update can cost as much as a hundred or more ordinary instructions.

  16. Is synchronization always needed?
      Consider the following Sieve of Eratosthenes implementation:
          for (my_prime = start; my_prime < 10000; ++my_prime)
              if (!get(my_prime)) {
                  for (multiple = my_prime; multiple < 100000000; multiple += my_prime)
                      if (!get(multiple)) set(multiple);
              }
      What happens if we run this on multiple threads, with all of them accessing one shared data block?

  17. The conclusions? Sometimes you can gain large performance benefits by not using atomic operations directly. But if we use a library that disallows this (like Pthreads), we are throwing away this ability. And if we are allowed to do it, then we need the compiler and hardware to somehow know about it and help us.

  18. The conclusions? So how do we get the compiler and hardware to help us? We need to have the programming language itself define a memory model, so that the programmer knows whether there are races. Only if we have that can we reason about our programs.
