Parallel programming has become mainstream with the rise of multi-core and multi-threaded processors. As cores near their performance limits, new applications demand higher performance, challenging developers to write efficient parallel programs. This presentation explores critical aspects of parallel programming, such as threads, locks, and various programming models like OpenMP, MPI, and data-parallel languages like CUDA and OpenCL. It also discusses the complexities of shared memory, the significance of maintaining state invariants, atomic updates, and the role of pure functional languages in avoiding data races, ultimately highlighting the evolving landscape of parallel programming practices.
FT07: The State of Parallel Programming
Burton Smith, Technical Fellow, Microsoft Corporation
Parallel Computing is Now Mainstream
• Cores are reaching performance limits
  • More transistors per core just makes it hot
• New processors are multi-core
  • and maybe multithreaded as well
• Uniform shared memory within a socket
  • Multi-socket may be pretty non-uniform
• Logic cost ($ per gate-Hz) keeps falling
• New “killer apps” will doubtless need more performance
• How should we write parallel programs?
Parallel Programming Practice Today
• Threads and locks
• SPMD languages
  • OpenMP
  • Co-array Fortran, UPC, and Titanium
• Message passing languages
  • MPI, Erlang
• Data-parallel languages
  • CUDA, OpenCL
• Most of these are pretty low level
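The threads-and-locks model at the top of this list can be sketched in a few lines of Python (an illustrative example, not from the talk; the names are my own):

```python
import threading

# A shared counter protected by a lock: the classic threads-and-locks model.
counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # isolate each read-modify-write
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- without the lock, some increments could be lost
```

The lock is what makes each `counter += 1` (a load, an add, and a store) indivisible; this is exactly the low-level bookkeeping the higher-level approaches below try to take off the programmer's hands.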
Higher Level Parallel Languages
• Allow higher level data-parallel operations
  • E.g. programmer-defined reductions and scans
• Exploit architectural support for parallelism
  • SIMD instructions, inexpensive synchronization
• Provide for abstract specification of locality
• Present a transparent performance model
• Make data races impossible
For the last item, something must be done about unrestricted use of variables
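A programmer-defined reduction and scan, as mentioned above, might look like this in Python (a serial sketch; a parallel runtime would exploit the operator's associativity to combine partial results in a tree):

```python
from functools import reduce
from itertools import accumulate

# A programmer-defined combining operator: any associative function works.
def my_max(a, b):
    return a if a >= b else b

data = [3, 1, 4, 1, 5, 9, 2, 6]

reduction = reduce(my_max, data)       # one combined value
scan = list(accumulate(data, my_max))  # running (prefix) results

print(reduction)  # 9
print(scan)       # [3, 3, 4, 4, 5, 9, 9, 9]
```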
Shared Memory is not the Problem
• Shared memory has some benefits:
  • Forms a delivery vehicle for high bandwidth
  • Permits unpredictable, data-dependent sharing
  • Provides a large synchronization namespace
  • Facilitates high level language implementations
    • Language implementers like it as a target
  • Non-uniform memory can even scale
• But shared variables are an issue: stores do not commute with other loads or stores
• Shared memory isn’t a programming model
Pure Functional Languages
• Imperative languages do computations by scheduling values into variables
  • Their parallel dialects are prone to data races
  • There are far too many parallel schedules
• Pure functional languages avoid data races simply by avoiding variables entirely
  • They compute new constants from old
  • Loads commute, so data races can’t happen
  • Dead constants can be reclaimed efficiently
• But no variables implies no mutable state
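The contrast between "scheduling values into variables" and "computing new constants from old" can be illustrated even in Python (a sketch; function names are my own):

```python
# Imperative style: mutate a shared structure in place.
# Stores like this do not commute with concurrent loads or stores.
def mutate(xs):
    xs.append(0)

# Functional style: build a new value and leave the old one untouched,
# so a concurrent reader can never observe a half-updated structure.
def extend(xs):
    return xs + (0,)   # tuples are immutable in Python

shared_list = [1, 2, 3]
mutate(shared_list)          # everyone holding a reference sees the change

old = (1, 2, 3)
new = extend(old)
print(old)  # (1, 2, 3) -- unchanged: a "constant"
print(new)  # (1, 2, 3, 0) -- a new constant computed from the old one
```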
Mutable State is Crucial for Efficiency
• To let data structures inexpensively evolve
  • To avoid always copying nearly all of them
• Monads were added to pure functional languages to allow mutable state (and I/O)
  • Plain monadic updates may still have data races
• The problem is maintaining state invariants
  • These are just a program’s “conservation laws”
  • They describe the legal attributes of the state
  • As with physics, they are associated with a certain generalized type of commutativity
Maintaining Invariants
• Updates perturb, then restore an invariant
  • Program composability depends on this
  • It’s automatic for us once we learn to program
• How can we maintain invariants in parallel?
• Two requirements must be met:
  • Updates must not interfere with each other
    • That is, they must be isolated in some fashion
  • Updates must finish once they start
    • …lest the next update see the invariant false
    • We say the state updates must be atomic
• Updates that are both isolated and atomic are called transactions
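A minimal sketch of "perturb, then restore" in Python (illustrative, not from the talk): a transfer temporarily breaks the conservation law that the balances sum to a constant, and the lock guarantees no other update sees the invariant false.

```python
import threading

# Invariant ("conservation law"): balances["a"] + balances["b"] is constant.
balances = {"a": 500, "b": 500}
lock = threading.Lock()

def transfer(src, dst, amount):
    with lock:                   # isolation: no interleaving inside the block
        balances[src] -= amount  # invariant temporarily perturbed...
        balances[dst] += amount  # ...and restored before the lock is released

threads = [threading.Thread(target=transfer, args=("a", "b", 1))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(balances.values()))  # 1000: the invariant held throughout
```

Each locked block here is isolated and runs to completion, so it behaves as a small transaction in the slide's sense.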
Commutativity and Non-Determinism
• If p and q preserve invariant I and do not interfere, their parallel execution { p || q } also preserves I†
• If p and q are performed in isolation and atomically, i.e. as transactions, then they will not interfere‡
• Operations may not commute with respect to state
  • But we always get commutativity with respect to the invariant
• This leads to a weaker form of determinism
  • Long ago some of us called it “good non-determinism”
  • It’s the non-determinism operating systems rely on

†Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5), pp. 279–285, May 1976.
‡Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, and All That. ACM TOPLAS 6(2), pp. 281–296, April 1984.
Example: Hash Tables
• Hash tables implement sets of items
• The key invariant is that an item is in the set iff its insertion followed all removals
• There are also storage structure invariants, e.g. hash buckets must be well-formed linked lists
• Parallel insertions and removals need only maintain the logical AND of these invariants
• This may not result in deterministic state
  • The order of items in a bucket is unspecified
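The hash-table example can be sketched as a small Python set with one lock per bucket (an illustration under my own names, not the talk's code): each operation maintains both the membership invariant and the well-formed-bucket invariant, while the order of items within a bucket stays unspecified.

```python
import threading

class HashSet:
    """A tiny hash set: parallel inserts/removals preserve the invariants,
    but the order of items inside a bucket is left unspecified."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]

    def _slot(self, item):
        return hash(item) % len(self.buckets)

    def insert(self, item):
        i = self._slot(item)
        with self.locks[i]:              # isolate updates to this bucket
            if item not in self.buckets[i]:
                self.buckets[i].append(item)

    def remove(self, item):
        i = self._slot(item)
        with self.locks[i]:
            if item in self.buckets[i]:
                self.buckets[i].remove(item)

    def __contains__(self, item):
        i = self._slot(item)
        with self.locks[i]:
            return item in self.buckets[i]

s = HashSet()
for x in range(100):
    s.insert(x)
s.remove(17)
print(0 in s, 17 in s)  # True False
```

Per-bucket locks are a partitioned form of isolation: operations on different buckets commute with respect to the invariant even though they do not serialize against each other.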
High Level Data Races
• Some loads and stores can be isolated and atomic but cover only a part of the invariant
  • E.g. copying data from one structure to another
  • If atomicity is violated, the data can be lost
• Another example is isolating a graph node while deleting it but then decrementing neighbors’ reference counts with LOCK DEC
  • Some of the neighbors may no longer exist
• It is challenging to see how to automate data race detection for examples like these
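The "copying data from one structure to another" case can be made concrete with a deliberately serialized Python sketch (my own illustration): each step is individually atomic, yet between the steps the whole-program invariant is false, which is exactly the window a high-level data race exploits.

```python
# Invariant: "x" is in exactly one of the two sets.
src, dst = {"x"}, set()

# Step 1 (individually atomic): remove from the source.
src.discard("x")

# A concurrent observer scheduled here would see "x" in neither set:
# the invariant is false between the two atomic steps.
in_neither = "x" not in src and "x" not in dst

# Step 2 (individually atomic): add to the destination.
dst.add("x")

print(in_neither)  # True: per-step atomicity did not give transactional safety
```

No low-level race detector flags this program; only knowledge of the invariant reveals that the two steps needed to be one transaction.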
Other Examples
• Databases and operating systems commonly mutate state in parallel
• Databases use transactions to achieve consistency via atomicity and isolation
  • SQL programming is pretty simple
  • SQL is arguably not general-purpose
• Operating systems use locks for isolation
  • Atomicity is left to the OS developer
  • Lock ordering is used to prevent deadlock
• A general purpose parallel language should easily handle applications like these
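Database-style transactions are easy to demonstrate with Python's built-in sqlite3 module (an illustration of the idea, not part of the talk): either every statement in the block commits, or a rollback restores the prior state.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
con.execute("INSERT INTO accounts VALUES ('a', 500), ('b', 500)")
con.commit()

try:
    # The connection as a context manager runs one transaction:
    # commit on success, rollback on exception.
    with con:
        con.execute(
            "UPDATE accounts SET balance = balance - 100 WHERE name = 'a'")
        # Simulated failure before the matching credit to 'b' ever runs:
        raise RuntimeError("failure mid-update")
except RuntimeError:
    pass

total = con.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
print(total)  # 1000: the partial debit was undone by the rollback
```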
Implementing Isolation
• Analysis
  • Proving concurrent state updates are isolated
• Locking
  • Deadlock must be handled somehow
• Buffering
  • Often used for wait-free updates
• Partitioning
  • Partitions can be dynamic, e.g. as in quicksort
• Serializing
• These schemes can be nested
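Partitioning, one of the schemes above, can be sketched in Python (my own example; it uses a sort-and-merge rather than quicksort's in-place partition, but the isolation idea is the same): each task owns a disjoint slice of the data, so there is nothing shared to race on and no locks are needed.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

data = list(range(1000, 0, -1))

def sort_slice(chunk):
    # Each task touches only its own partition: isolation by ownership.
    return sorted(chunk)

with ThreadPoolExecutor(max_workers=4) as pool:
    quarters = [data[i::4] for i in range(4)]        # disjoint partitions
    sorted_parts = list(pool.map(sort_slice, quarters))

result = list(heapq.merge(*sorted_parts))            # combine isolated results
print(result == sorted(data))  # True
```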
Isolation in Existing Languages
• Static in space: MPI, Erlang
• Dynamic in space: Refined C, Jade
• Static in time: Serial execution
• Dynamic in time: Single global lock
• Static in both: Dependence analysis
• Semi-static in both: Inspector-executor
• Dynamic in both: Multiple locks
Atomicity
• Atomicity means “all or nothing” execution
  • State changes must be all done or undone
• Isolation without atomicity has little value
  • But atomicity is vital even in the serial case
• Implementation techniques:
  • Compensating, i.e. reversing a computation “in place”
  • Logging, i.e. remembering and restoring the original state values
• Atomicity is challenging for distributed computing and I/O
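The logging technique can be sketched in Python (an illustrative example with names of my own choosing): record each original value before overwriting it, and on failure replay the log backwards so the update is all or nothing.

```python
def atomic_update(state, updates):
    """Apply (key, value) updates all-or-nothing, using an undo log."""
    undo_log = []
    try:
        for key, value in updates:
            undo_log.append((key, state[key]))  # remember the original value
            state[key] = value
            if value < 0:                       # stand-in for any failure
                raise ValueError("illegal value mid-update")
    except ValueError:
        for key, old in reversed(undo_log):     # restore the original state
            state[key] = old
        return False
    return True

state = {"x": 1, "y": 2}
ok = atomic_update(state, [("x", 10), ("y", -5)])
print(ok, state)  # False {'x': 1, 'y': 2}: the partial update was rolled back
```

Compensation would instead run inverse operations (e.g. re-crediting a debited account); logging is the more mechanical of the two, which is why databases favor it.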
Exceptions
• Exceptions can threaten atomicity
  • An aborted state update must be undone
• What if a state update depends on querying a remote service and the query fails?
  • The message from the remote service should send exception information in lieu of the data
  • Message arrival can then throw as usual and the partial update can be undone
Transactional Memory
• “Transactional memory” means transaction semantics within lexically scoped blocks
  • TM has been a hot topic of late
  • As usual, lexical scope seems a virtue here
• Adding TM to existing languages has problems
• There is a lot of optimization work to do
  • to make atomicity and isolation highly efficient
• Meanwhile, we shouldn’t ignore traditional ways to get transactional semantics
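The flavor of a lexically scoped atomic block can be sketched in Python as a toy write-buffering scheme (my own simplification: writes only, published under a single global lock, which is the "dynamic in time" isolation from the earlier taxonomy):

```python
import threading

_commit_lock = threading.Lock()
shared = {"x": 0, "y": 0}

class Atomic:
    """Toy lexically scoped atomic block: buffer writes, commit in one step."""
    def __enter__(self):
        self.buffer = {}
        return self.buffer
    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:            # commit only if the block completed
            with _commit_lock:
                shared.update(self.buffer)
        return False                    # an aborted block publishes nothing

with Atomic() as tx:
    tx["x"] = 1
    tx["y"] = 2                         # both writes become visible together

try:
    with Atomic() as tx:
        tx["x"] = 99
        raise RuntimeError("abort")
except RuntimeError:
    pass

print(shared)  # {'x': 1, 'y': 2}: the aborted transaction left no trace
```

A real TM must also track reads and detect conflicts between concurrent transactions; this sketch shows only the atomicity half of the contract.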
Whence Invariants?
• Can we generate invariants from code?
  • Only sometimes, and it is difficult even then
• Can we generate code from invariants?
  • Is this the same as intentional programming?
• Can we write invariants plus code and let the compiler check invariant preservation?
  • This is much easier, but may be less attractive
• Can languages make it more likely that a transaction covers the invariant’s domain?
  • E.g. leveraging objects with encapsulated state
• Can we at least debug our mistakes?
Conclusions
• Functional languages with transactions enable higher level parallel programming
  • Microsoft is heading in this general direction
• Efficient implementations of isolation and atomicity are important
  • We trust architecture will ultimately help support these things
• The von Neumann model needs replacing, and soon
YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com
Learn More On Channel 9 • Expand your PDC experience through Channel 9. • Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses. channel9.msdn.com/learn Built by Developers for Developers….