Summary of Boehm’s “threads … as a library” + other thoughts and class discussions. CS 5966, Feb 4, 2009, Week 4
Assignment : Dining Phil code • Some versions of Dining Phil have data races • What are races? • Why are they harmful? • Are they always harmful? • P1 : temp = shared-x • P2 : x = 1 • versus the same code inside a single lock/unlock region • In this particular case, if each access to the location is atomic, both versions yield the same computational result • Be sure of the atomicity being assumed! • (a sketch of both flavors follows)
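A minimal Pthreads sketch of the two flavors above (the name shared_x and the thread bodies are illustrative, not taken from the assignment code); compile with -pthread:

    #include <pthread.h>
    #include <stdio.h>

    int shared_x = 0;                                /* shared location */
    int observed;                                    /* what P1 read    */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* Racy version: P1 reads shared_x while P2 may write it, no lock held. */
    void *p1_racy(void *arg)   { observed = shared_x; return NULL; }
    void *p2_racy(void *arg)   { shared_x = 1;        return NULL; }

    /* Locked version: the same accesses, each inside a lock/unlock pair. */
    void *p1_locked(void *arg) {
        pthread_mutex_lock(&m);
        observed = shared_x;
        pthread_mutex_unlock(&m);
        return NULL;
    }
    void *p2_locked(void *arg) {
        pthread_mutex_lock(&m);
        shared_x = 1;
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1_locked, NULL);  /* swap in p1_racy /  */
        pthread_create(&t2, NULL, p2_locked, NULL);  /* p2_racy to compare */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("observed = %d, shared_x = %d\n", observed, shared_x);
        return 0;
    }

Either interleaving (observed 0 or 1) is possible in both versions; the difference is that the locked version has well-defined semantics, whereas the racy version's meaning depends on the atomicity the hardware and compiler actually provide.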
Why we should know memory models • Not very intuitive • Takes time to sink in • Something this important stays with you only through repeated exposure • Other classes do not give it emphasis • They attempt to sweep things under the rug • They are burying their heads in the sand! • A memory-model issue is like a grain of sand: tiny, but under an eyelid or inside a ball bearing it does real damage… • This is dangerous! • It stifles understanding • We are in a world where even basic rules are being broken • Academia is about not buying into decrees • e.g. are “goto”s always harmful?
Why we should know memory models • Clearly, success in multi-core programming depends on having high-level primitives • Unfortunately, nobody yet knows which high-level primitives “work” • are safe and predictable • are efficient • Offering an inefficient high-level primitive does more damage than good • People will swing clear back to much lower-level primitives!
Why we should know memory models • Until we form a good shared understanding of which high-level primitives work well, we must be prepared to evaluate the low-level effects of the high-level primitives we already have • The added surprises that compilers throw in can cause such non-intuitive outcomes that we had better know they exist, and be able to resolve the issues when they arise
Why we should know memory models • Locks are expensive • Performance and energy • If lock-free code runs dramatically faster, and there is an alternate (lock-free) line of reasoning that explains its behavior, one clearly must entertain such approaches • Need all tools in one’s kit • HW costs are becoming very skewed • Attend Uri Weiser’s talk Feb 12th • Finally, we need to understand what tools such as Inspect are actually doing!
Where mem models mattered • PCI bus ordering (producer/consumer broken) • Holzmann’s experience in multi-core SPIN • Our class experiments • OpenMP mem model in conflict with the gcc mem model • In understanding architectural consequences • Hit-under-miss optimization in speculative execution (in snoopy buses such as HP Runway)
On “HW / SW” split • Till the dust settles (if at all) in multi-core computing, you had better be interested in HW and SW matters • HW matters • C-like low level behavior matters • Later we will learn whether “comfortable” abstractions such as C# / Java are viable • Of course when programming in the large, we will prefer such high level views; when understanding concepts, however, we need all the “nuts and bolts” exposed…
Boehm’s points • Threads are going to be increasingly used • We focus on languages such as C/C++ where threads are not built into the language – but are provided through add-on libraries • Ability to program in C/Pthreads comes through ‘painful experience’ – not through strict adherence to standards • This paper is an attempt to ameliorate that
Page 2: Thread lib, lang, compiler … • Thread semantics cannot be argued purely within the context of the libraries • They also involve the • compiler semantics • language semantics (together, the “software” or “language” mem model) • Disciplined use of concurrency through thread APIs is OK for 98% of the users • But we need to know about the 2% of uses that fall outside … especially in a world where we rely on MP systems for performance
P2 S3: Pthread Approach to Concur. • Seq consistency is the intuitive model • Too expensive to implement as such • Thread 1: x = 1 ; r1 = y; Thread 2: y = 1 ; r2 = x; (x, y initially 0) • the final outcome r1 = r2 = 0 is allowed (and is what can happen today) • Compilers may reorder subject only to intra-thread dependencies • HW may reorder subject only to intra-thread dependencies • (a litmus-test sketch follows)
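As a sketch, the same litmus test written as a Pthreads program (thread and variable names are mine); under sequential consistency at least one of r1, r2 must end up 1, yet real compilers and hardware can produce r1 = r2 = 0:

    #include <pthread.h>
    #include <stdio.h>

    int x = 0, y = 0;            /* shared flags               */
    int r1, r2;                  /* values observed per thread */

    void *thread_a(void *arg) { x = 1; r1 = y; return NULL; }
    void *thread_b(void *arg) { y = 1; r2 = x; return NULL; }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* r1 = r2 = 0 means both reads ran "before" both writes:
           impossible under SC, but allowed once stores are buffered
           or the compiler reorders the independent statements.      */
        printf("r1 = %d, r2 = %d\n", r1, r2);
        return 0;
    }

A single run rarely shows the weak outcome; it typically takes many repeated runs under load to observe it.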
P2 S3: Pthread silent on mem model semantics ; reasons: • Many don’t understand memory models • So the designers preferred “simple rules” • Instead, the standard “decrees”: • Synchronize thread execution using mutex_lock, mutex_unlock • It is then expected that no two threads race on a single location • (Java, by contrast, gives precise semantics even to racy programs)
P2 S3: Pthread silent on mem model semantics ; reasons: • In practice, mutex_lock etc. contain memory barriers (fences) that prevent HW reordering around the call • Calls to mutex_lock etc. are treated as opaque function calls • No instructions can be moved across them • If f() calls mutex_lock(), even f() is treated as such • Unfortunately, many real systems intentionally or unknowingly violate these rules
P4 S4: Correctness Issues • Consider this program (x, y initially 0, one statement per thread) • if (x==1) ++y; • if (y==1) ++x; • Is (x==1, y==1) acceptable? Is there a race? • Not under SC! • However, if the compiler transforms the code to • ++y ; if (x != 1) --y; • ++x ; if (y != 1) --x; • then there is a race, and x==1, y==1 becomes a possible outcome (or one can say the semantics are undefined) • (a sketch of the transformed program follows)
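A hedged sketch (not the paper's code verbatim) of what the transformed program amounts to, written out as two Pthreads thread bodies:

    #include <pthread.h>
    #include <stdio.h>

    int x = 0, y = 0;

    /* Source:  if (x == 1) ++y;   after the speculating transformation: */
    void *thread1(void *arg) {
        ++y;                      /* speculative write to y              */
        if (x != 1) --y;          /* compensate if the guard was false   */
        return NULL;
    }
    /* Source:  if (y == 1) ++x;   transformed the same way: */
    void *thread2(void *arg) {
        ++x;
        if (y != 1) --x;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* The source program is race-free when x = y = 0 and can only end
           with x = y = 0; the transformed one races on x and y and can
           end with x = 1, y = 1.                                         */
        printf("x = %d, y = %d\n", x, y);
        return 0;
    }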
P5 S4.2 Rewriting of adjacent data • Bugs of this type have actually arisen • struct { int a:17; int b:15; } x; • Now realize “x.a = 42” as • { tmp = x; tmp &= ~0x1ffff; tmp |= 42; x = tmp; } • This introduces an “unintended” write of b as well! • OK for sequential code • But in a concurrent setting, a concurrent update of “b” can now race!! • The race is not “seen” at source level! • (a sketch follows)
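A sketch of the rewrite in plain C, assuming the two bit-fields pack into a single 32-bit word with a in the low 17 bits (bit-field layout is implementation-defined, so this is only an illustration); memcpy stands in for the whole-word load and store the compiler would emit:

    #include <string.h>

    struct s { int a:17; int b:15; } x;

    /* One way a compiler may realize "x.a = 42" on hardware with no
       17-bit store: read the whole word, mask, merge, write it back. */
    void write_a_42(void) {
        unsigned tmp;
        memcpy(&tmp, &x, sizeof tmp);   /* reads a AND b          */
        tmp &= ~0x1ffffu;               /* clear the 17 bits of a */
        tmp |= 42u;                     /* install the new a      */
        memcpy(&x, &tmp, sizeof tmp);   /* writes a AND b back    */
    }
    /* If another thread updates x.b between the read and the write-back,
       that update is silently lost: a race invisible at source level.   */

    int main(void) {
        x.b = 7;
        write_a_42();
        return (x.a == 42 && x.b == 7) ? 0 : 1;   /* fine single-threaded */
    }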
P5 : another example • struct { char a; char b; … char h; } x; • x.b = ‘b’; x.c = ‘c’; … ; x.h = ‘h’; can be realized as • x = ‘hgfedcb\0’ | x.a • Now if you protect “a” with one lock and “b through h” with another lock, you are hosed: there is a data race! • C should define when adjacent data may be overwritten • (a sketch follows)
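A sketch of why per-field locking does not help here (the lock names and the two-lock split are my illustration of the scenario above, not code from the paper):

    #include <pthread.h>

    struct chars { char a, b, c, d, e, f, g, h; } x;         /* 8 adjacent bytes */
    pthread_mutex_t lock_a   = PTHREAD_MUTEX_INITIALIZER;    /* protects x.a     */
    pthread_mutex_t lock_b_h = PTHREAD_MUTEX_INITIALIZER;    /* protects x.b..h  */

    /* A compiler may fuse the seven byte stores into one 8-byte store that
       also rewrites x.a with whatever (possibly stale) value it just read. */
    void *write_b_through_h(void *arg) {
        pthread_mutex_lock(&lock_b_h);
        x.b = 'b'; x.c = 'c'; x.d = 'd'; x.e = 'e';
        x.f = 'f'; x.g = 'g'; x.h = 'h';
        pthread_mutex_unlock(&lock_b_h);
        return NULL;
    }
    void *write_a(void *arg) {            /* runs concurrently in another thread   */
        pthread_mutex_lock(&lock_a);
        x.a = 'A';                        /* can be overwritten by the fused store  */
        pthread_mutex_unlock(&lock_a);    /* above, even though both threads "hold  */
        return NULL;                      /* their lock"                            */
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, write_b_through_h, NULL);
        pthread_create(&t2, NULL, write_a, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }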
P5/P6 : register promotion • Compilers must be aware of the existence of threads • Consider code written so the serial (single-threaded) case runs fast: • for(…) { • if (mt) lock(…); • x = … x …; • if (mt) unlock(…); • }
P5/P6 : register promotion • for(…) { • if (mt) lock(…); • x = … x …; • if (mt) unlock(…); } • can, under the Pthread rules, be legally transformed into • r = x; • for(…) { • if (mt) { x = r; lock(…); r = x; } • r = … r …; • if (mt) { x = r; unlock(…); r = x; } } • x = r; • Fully broken: x is read and written without holding the lock! • (a cleaned-up, compilable sketch of both versions follows)
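Cleaned up as compilable C, assuming mt is a flag saying whether the program is multithreaded and that x = … x … stands for some update of x (here x = x + i):

    #include <pthread.h>

    int x;                                   /* shared accumulator        */
    int mt;                                  /* nonzero iff multithreaded */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* Original loop: x is touched only while holding the lock (when mt). */
    void original(int n) {
        for (int i = 0; i < n; ++i) {
            if (mt) pthread_mutex_lock(&m);
            x = x + i;
            if (mt) pthread_mutex_unlock(&m);
        }
    }

    /* After register promotion, legal for a thread-unaware compiler:
       x is now read and written OUTSIDE the lock at the loop boundaries. */
    void promoted(int n) {
        int r = x;                           /* unprotected read  */
        for (int i = 0; i < n; ++i) {
            if (mt) { x = r; pthread_mutex_lock(&m);   r = x; }
            r = r + i;
            if (mt) { x = r; pthread_mutex_unlock(&m); r = x; }
        }
        x = r;                               /* unprotected write */
    }

    int main(void) { mt = 1; original(10); promoted(10); return 0; }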
avoiding expensive synch. • for (mp = start; mp < 10^4; ++mp) • if (!get(mp)) { • for (mult = mp; mult < 10^8; mult += mp) • if (!get(mult)) set(mult); • } • This is the sieve of Eratosthenes • It benefits from (benign) races!! • (a sketch of the get/set bit-array helpers follows)
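A sketch of the shared bit array and the get/set helpers this code assumes (the helper implementations and the thread setup are my illustration, not Boehm's exact code); the races on the array are benign because set is idempotent and a stale get only causes redundant re-marking:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LIMIT 100000000UL                    /* 10^8 */
    static uint8_t marked[LIMIT / 8 + 1];        /* shared, unsynchronized bit array */

    static int  get(unsigned long i) { return (marked[i / 8] >> (i % 8)) & 1; }
    static void set(unsigned long i) { marked[i / 8] |= (uint8_t)(1u << (i % 8)); }

    /* Each thread scans the same candidate range; a lost or late update to
       the bit array only means some multiples get marked more than once.  */
    void *sieve(void *arg) {
        unsigned long start = *(unsigned long *)arg;
        for (unsigned long mp = start; mp < 10000; ++mp)          /* 10^4 */
            if (!get(mp))
                for (unsigned long mult = mp; mult < LIMIT; mult += mp)
                    if (!get(mult)) set(mult);
        return NULL;
    }

    int main(void) {
        unsigned long s1 = 2, s2 = 2;
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sieve, &s1);
        pthread_create(&t2, NULL, sieve, &s2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("sieve done\n");
        return 0;
    }

Because get(mp) often observes bits already set by the other thread, the threads end up dividing the work between themselves without any explicit synchronization, which is why the lock-free version can run much faster.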