Parallel programming has become mainstream with the rise of multi-core and multi-threaded processors. As cores near their performance limits, new applications demand higher performance, challenging developers to write efficient parallel programs. This presentation explores critical aspects of parallel programming, such as threads, locks, and various programming models like OpenMP, MPI, and data-parallel languages like CUDA and OpenCL. It also discusses the complexities of shared memory, the significance of maintaining state invariants, atomic updates, and the role of pure functional languages in avoiding data races, ultimately highlighting the evolving landscape of parallel programming practices.
FT07: The State of Parallel Programming
Burton Smith, Technical Fellow, Microsoft Corporation
Parallel Computing is Now Mainstream
• Cores are reaching performance limits
  • More transistors per core just makes it hot
• New processors are multi-core
  • and maybe multithreaded as well
• Uniform shared memory within a socket
  • Multi-socket may be pretty non-uniform
• Logic cost ($ per gate-Hz) keeps falling
• New “killer apps” will doubtless need more performance
• How should we write parallel programs?
Parallel Programming Practice Today
• Threads and locks
• SPMD languages
  • OpenMP
  • Co-array Fortran, UPC, and Titanium
• Message passing languages
  • MPI, Erlang
• Data-parallel languages
  • CUDA, OpenCL
• Most of these are pretty low level
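The threads-and-locks model at the top of this list can be sketched in a few lines of Python (an illustrative example, not from the talk; the names are my own):

```python
import threading

# A shared counter protected by a lock: the classic threads-and-locks model.
counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # isolate each read-modify-write
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- without the lock, some increments could be lost
```

The lock is what makes each `counter += 1` (a load, an add, and a store) indivisible; this is exactly the low-level bookkeeping the higher-level approaches below try to take off the programmer's hands.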
Higher Level Parallel Languages
• Allow higher level data-parallel operations
  • E.g. programmer-defined reductions and scans
• Exploit architectural support for parallelism
  • SIMD instructions, inexpensive synchronization
• Provide for abstract specification of locality
• Present a transparent performance model
• Make data races impossible
For the last item, something must be done about unrestricted use of variables
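A programmer-defined reduction and scan, as mentioned above, might look like this in Python (a serial sketch; a parallel runtime would exploit the operator's associativity to combine partial results in a tree):

```python
from functools import reduce
from itertools import accumulate

# A programmer-defined combining operator: any associative function works.
def my_max(a, b):
    return a if a >= b else b

data = [3, 1, 4, 1, 5, 9, 2, 6]

reduction = reduce(my_max, data)       # one combined value
scan = list(accumulate(data, my_max))  # running (prefix) results

print(reduction)  # 9
print(scan)       # [3, 3, 4, 4, 5, 9, 9, 9]
```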
Shared Memory is not the Problem
• Shared memory has some benefits:
  • Forms a delivery vehicle for high bandwidth
  • Permits unpredictable, data-dependent sharing
  • Provides a large synchronization namespace
  • Facilitates high level language implementations
    • Language implementers like it as a target
  • Non-uniform memory can even scale
• But shared variables are an issue: stores do not commute with other loads or stores
• Shared memory isn’t a programming model
Pure Functional Languages
• Imperative languages do computations by scheduling values into variables
  • Their parallel dialects are prone to data races
  • There are far too many parallel schedules
• Pure functional languages avoid data races simply by avoiding variables entirely
  • They compute new constants from old
  • Loads commute, so data races can’t happen
  • Dead constants can be reclaimed efficiently
• But no variables implies no mutable state
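The contrast between "scheduling values into variables" and "computing new constants from old" can be illustrated even in Python (a sketch; function names are my own):

```python
# Imperative style: mutate a shared structure in place.
# Stores like this do not commute with concurrent loads or stores.
def mutate(xs):
    xs.append(0)

# Functional style: build a new value and leave the old one untouched,
# so a concurrent reader can never observe a half-updated structure.
def extend(xs):
    return xs + (0,)   # tuples are immutable in Python

shared_list = [1, 2, 3]
mutate(shared_list)          # everyone holding a reference sees the change

old = (1, 2, 3)
new = extend(old)
print(old)  # (1, 2, 3) -- unchanged: a "constant"
print(new)  # (1, 2, 3, 0) -- a new constant computed from the old one
```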
Mutable State is Crucial for Efficiency
• To let data structures inexpensively evolve
  • To avoid always copying nearly all of them
• Monads were added to pure functional languages to allow mutable state (and I/O)
  • Plain monadic updates may still have data races
• The problem is maintaining state invariants
  • These are just a program’s “conservation laws”
  • They describe the legal attributes of the state
  • As with physics, they are associated with a certain generalized type of commutativity
Maintaining Invariants
• Updates perturb, then restore an invariant
  • Program composability depends on this
  • It’s automatic for us once we learn to program
• How can we maintain invariants in parallel?
• Two requirements must be met:
  • Updates must not interfere with each other
    • That is, they must be isolated in some fashion
  • Updates must finish once they start
    • …lest the next update see the invariant false
    • We say the state updates must be atomic
• Updates that are both isolated and atomic are called transactions
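A minimal sketch of "perturb, then restore" in Python (illustrative, not from the talk): a transfer temporarily breaks the conservation law that the balances sum to a constant, and the lock guarantees no other update sees the invariant false.

```python
import threading

# Invariant ("conservation law"): balances["a"] + balances["b"] is constant.
balances = {"a": 500, "b": 500}
lock = threading.Lock()

def transfer(src, dst, amount):
    with lock:                   # isolation: no interleaving inside the block
        balances[src] -= amount  # invariant temporarily perturbed...
        balances[dst] += amount  # ...and restored before the lock is released

threads = [threading.Thread(target=transfer, args=("a", "b", 1))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(balances.values()))  # 1000: the invariant held throughout
```

Each locked block here is isolated and runs to completion, so it behaves as a small transaction in the slide's sense.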
Commutativity and Non-Determinism
• If p and q preserve invariant I and do not interfere, their parallel execution { p || q } also preserves I†
• If p and q are performed in isolation and atomically, i.e. as transactions, then they will not interfere‡
• Operations may not commute with respect to state
  • But we always get commutativity with respect to the invariant
• This leads to a weaker form of determinism
  • Long ago some of us called it “good non-determinism”
  • It’s the non-determinism operating systems rely on

†Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5), pp. 279–285, May 1976.
‡Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, and All That. ACM TOPLAS 6(2), pp. 281–296, April 1984.
Example: Hash Tables
• Hash tables implement sets of items
• The key invariant is that an item is in the set iff its insertion followed all removals
• There are also storage structure invariants, e.g. hash buckets must be well-formed linked lists
• Parallel insertions and removals need only maintain the logical AND of these invariants
• This may not result in deterministic state
  • The order of items in a bucket is unspecified
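The hash-table example can be sketched as a small Python set with one lock per bucket (an illustration under my own names, not the talk's code): each operation maintains both the membership invariant and the well-formed-bucket invariant, while the order of items within a bucket stays unspecified.

```python
import threading

class HashSet:
    """A tiny hash set: parallel inserts/removals preserve the invariants,
    but the order of items inside a bucket is left unspecified."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def _slot(self, item):
        return hash(item) % len(self.buckets)

    def insert(self, item):
        i = self._slot(item)
        with self.locks[i]:              # isolate updates to this bucket
            if item not in self.buckets[i]:
                self.buckets[i].append(item)

    def remove(self, item):
        i = self._slot(item)
        with self.locks[i]:
            if item in self.buckets[i]:
                self.buckets[i].remove(item)

    def __contains__(self, item):
        i = self._slot(item)
        with self.locks[i]:
            return item in self.buckets[i]

s = HashSet()
for x in range(100):
    s.insert(x)
s.remove(17)
print(0 in s, 17 in s)  # True False
```

Per-bucket locks are a partitioned form of isolation: operations on different buckets commute with respect to the invariant even though they do not serialize against each other.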
High Level Data Races
• Some loads and stores can be isolated and atomic but cover only a part of the invariant
  • E.g. copying data from one structure to another
  • If atomicity is violated, the data can be lost
• Another example is isolating a graph node while deleting it but then decrementing neighbors’ reference counts with LOCK DEC
  • Some of the neighbors may no longer exist
• It is challenging to see how to automate data race detection for examples like these
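The "copying data from one structure to another" case can be made concrete with a deliberately serialized Python sketch (my own illustration): each step is individually atomic, yet between the steps the whole-program invariant is false, which is exactly the window a high-level data race exploits.

```python
# Invariant: "x" is in exactly one of the two sets.
src, dst = {"x"}, set()

# Step 1 (individually atomic): remove from the source.
src.discard("x")

# A concurrent observer scheduled here would see "x" in neither set:
# the invariant is false between the two atomic steps.
in_neither = "x" not in src and "x" not in dst

# Step 2 (individually atomic): add to the destination.
dst.add("x")

print(in_neither)  # True: per-step atomicity did not give transactional safety
```

No low-level race detector flags this program; only knowledge of the invariant reveals that the two steps needed to be one transaction.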
Other Examples
• Databases and operating systems commonly mutate state in parallel
• Databases use transactions to achieve consistency via atomicity and isolation
  • SQL programming is pretty simple
  • SQL is arguably not general-purpose
• Operating systems use locks for isolation
  • Atomicity is left to the OS developer
  • Lock ordering is used to prevent deadlock
• A general purpose parallel language should easily handle applications like these
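Database-style transactions are easy to demonstrate with Python's built-in sqlite3 module (an illustration of the idea, not part of the talk): either every statement in the block commits, or a rollback restores the prior state.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
con.execute("INSERT INTO accounts VALUES ('a', 500), ('b', 500)")
con.commit()

try:
    # The connection as a context manager runs one transaction:
    # commit on success, rollback on exception.
    with con:
        con.execute(
            "UPDATE accounts SET balance = balance - 100 WHERE name = 'a'")
        # Simulated failure before the matching credit to 'b' ever runs:
        raise RuntimeError("failure mid-update")
except RuntimeError:
    pass

total = con.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
print(total)  # 1000: the partial debit was undone by the rollback
```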
Implementing Isolation
• Analysis
  • Proving concurrent state updates are isolated
• Locking
  • Deadlock must be handled somehow
• Buffering
  • Often used for wait-free updates
• Partitioning
  • Partitions can be dynamic, e.g. as in quicksort
• Serializing
• These schemes can be nested
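Partitioning, one of the schemes above, can be sketched in Python (my own example; it uses a sort-and-merge rather than quicksort's in-place partition, but the isolation idea is the same): each task owns a disjoint slice of the data, so there is nothing shared to race on and no locks are needed.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

data = list(range(1000, 0, -1))

def sort_slice(chunk):
    # Each task touches only its own partition: isolation by ownership.
    return sorted(chunk)

with ThreadPoolExecutor(max_workers=4) as pool:
    quarters = [data[i::4] for i in range(4)]        # disjoint partitions
    sorted_parts = list(pool.map(sort_slice, quarters))

result = list(heapq.merge(*sorted_parts))            # combine isolated results
print(result == sorted(data))  # True
```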
Isolation in Existing Languages
• Static in space: MPI, Erlang
• Dynamic in space: Refined C, Jade
• Static in time: Serial execution
• Dynamic in time: Single global lock
• Static in both: Dependence analysis
• Semi-static in both: Inspector-executor
• Dynamic in both: Multiple locks
Atomicity
• Atomicity means “all or nothing” execution
  • State changes must be all done or undone
• Isolation without atomicity has little value
  • But atomicity is vital even in the serial case
• Implementation techniques:
  • Compensating, i.e. reversing a computation “in place”
  • Logging, i.e. remembering and restoring the original state values
• Atomicity is challenging for distributed computing and I/O
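The logging technique can be sketched in Python (an illustrative example with names of my own choosing): record each original value before overwriting it, and on failure replay the log backwards so the update is all or nothing.

```python
def atomic_update(state, updates):
    """Apply (key, value) updates all-or-nothing, using an undo log."""
    undo_log = []
    try:
        for key, value in updates:
            undo_log.append((key, state[key]))  # remember the original value
            state[key] = value
            if value < 0:                       # stand-in for any failure
                raise ValueError("illegal value mid-update")
    except ValueError:
        for key, old in reversed(undo_log):     # restore the original state
            state[key] = old
        return False
    return True

state = {"x": 1, "y": 2}
ok = atomic_update(state, [("x", 10), ("y", -5)])
print(ok, state)  # False {'x': 1, 'y': 2}: the partial update was rolled back
```

Compensation would instead run inverse operations (e.g. re-crediting a debited account); logging is the more mechanical of the two, which is why databases favor it.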
Exceptions
• Exceptions can threaten atomicity
  • An aborted state update must be undone
• What if a state update depends on querying a remote service and the query fails?
  • The message from the remote service should send exception information in lieu of the data
  • Message arrival can then throw as usual and the partial update can be undone
Transactional Memory
• “Transactional memory” means transaction semantics within lexically scoped blocks
  • TM has been a hot topic of late
  • As usual, lexical scope seems a virtue here
• Adding TM to existing languages has problems
• There is a lot of optimization work to do
  • to make atomicity and isolation highly efficient
• Meanwhile, we shouldn’t ignore traditional ways to get transactional semantics
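The flavor of a lexically scoped atomic block can be sketched in Python as a toy write-buffering scheme (my own simplification: writes only, published under a single global lock, which is the "dynamic in time" isolation from the earlier taxonomy):

```python
import threading

_commit_lock = threading.Lock()
shared = {"x": 0, "y": 0}

class Atomic:
    """Toy lexically scoped atomic block: buffer writes, commit in one step."""
    def __enter__(self):
        self.buffer = {}
        return self.buffer
    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:            # commit only if the block completed
            with _commit_lock:
                shared.update(self.buffer)
        return False                    # an aborted block publishes nothing

with Atomic() as tx:
    tx["x"] = 1
    tx["y"] = 2                         # both writes become visible together

try:
    with Atomic() as tx:
        tx["x"] = 99
        raise RuntimeError("abort")
except RuntimeError:
    pass

print(shared)  # {'x': 1, 'y': 2}: the aborted transaction left no trace
```

A real TM must also track reads and detect conflicts between concurrent transactions; this sketch shows only the atomicity half of the contract.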
Whence Invariants?
• Can we generate invariants from code?
  • Only sometimes, and it is difficult even then
• Can we generate code from invariants?
  • Is this the same as intentional programming?
• Can we write invariants plus code and let the compiler check invariant preservation?
  • This is much easier, but may be less attractive
• Can languages make it more likely that a transaction covers the invariant’s domain?
  • E.g. leveraging objects with encapsulated state
• Can we at least debug our mistakes?
Conclusions
• Functional languages with transactions enable higher level parallel programming
  • Microsoft is heading in this general direction
• Efficient implementations of isolation and atomicity are important
  • We trust architecture will ultimately help support these things
• The von Neumann model needs replacing, and soon
YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com
Learn More On Channel 9 • Expand your PDC experience through Channel 9. • Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses. channel9.msdn.com/learn Built by Developers for Developers….