
Dr. Michael McCool




Presentation Transcript


  1. Dr. Michael McCool • HPC practitioners should know how to use large-scale computers efficiently • Productive generation of efficient and maintainable parallel programs • Sufficient knowledge of computer architecture, leading to the ability to predict performance • Ability to design efficient parallel algorithms • Skills to implement these algorithms • Use of known best-practice patterns

  2. New Kinds of Errors • Race conditions • At best: non-deterministic results complicate testing • At worst: Heisenbugs delay or block shipment of software products • Deadlock • Pathological contention for resources can lock up the system and make it impossible to proceed • The combination with non-determinism is tricky • Both of these problems are very hard to track down in large systems • Thesis: it is better to avoid these problems by design
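The race-condition hazard on this slide can be sketched in plain Python. This is an illustrative example, not code from the presentation; the function names are made up, and the lost-update behavior of the unsynchronized version depends on thread scheduling, so it may not reproduce on every run.

```python
import threading

def unsafe_increments(n_threads=4, n_iters=10_000):
    # Shared counter updated without synchronization. `counter += 1` is a
    # read-modify-write, not an atomic step, so two threads can both read
    # the same old value and one update is lost: a race condition.
    counter = 0
    def work():
        nonlocal counter
        for _ in range(n_iters):
            counter += 1
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter  # may be less than n_threads * n_iters

def safe_increments(n_threads=4, n_iters=10_000):
    # Same workload, but the critical section is protected by a lock,
    # so the result is deterministic: avoiding the race by design.
    counter = 0
    lock = threading.Lock()
    def work():
        nonlocal counter
        for _ in range(n_iters):
            with lock:
                counter += 1
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(safe_increments())  # always n_threads * n_iters = 40000
```

The unsafe version illustrates why race-dependent results complicate testing: a test suite may pass for months and then fail non-deterministically under load.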

  3. Challenge: Multiple Parallelism Mechanisms Modern computers have many kinds of parallelism: • Pipelining • SIMD within a register (SWAR) vectorization • Superscalar instruction issue or VLIW • Overlapping memory access with computation (prefetch) • Simultaneous multithreading (hyperthreading) per core • Multiple cores • Multiple processors • Asynchronous host and accelerator execution • HPC adds: clusters, distributed memory, grid… How do we design for performance?

  4. Main Factors Affecting Performance • Parallelism • Use/design/choose a “good” parallel algorithm • Large amount of latent parallelism, low serial overhead • Asymptotically efficient • Should scale to a large number of processing elements • Locality • Efficient use of the memory hierarchy • More frequent use of faster local memory • Coherent use of memory and data transfer • Good alignment, predictable memory access • High arithmetic intensity
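The locality point can be illustrated with the classic loop-interchange example for matrix multiply. This is a pure-Python sketch (not from the slides): in Python the interpreter overhead hides most of the cache effect, but the access-pattern argument is exactly the one that matters in C or Fortran.

```python
import random

def matmul_ijk(A, B, n):
    # Naive loop order: the innermost loop reads B[k][j] with k varying,
    # striding down a "column" of B. In row-major storage this access is
    # non-contiguous and unpredictable for the cache/prefetcher.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B, n):
    # Interchanged loop order: the innermost loop walks C[i] and B[k]
    # row by row, contiguously, giving predictable, cache-friendly
    # access and better use of the memory hierarchy.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]
            for j in range(n):
                C[i][j] += a_ik * B[k][j]
    return C

if __name__ == "__main__":
    n = 8
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    # Both orders sum over k in the same sequence per output element,
    # so the results match exactly.
    assert matmul_ijk(A, B, n) == matmul_ikj(A, B, n)
```

The two versions do the same arithmetic; only the order of memory accesses changes. That reordering alone is often worth a large constant factor on real hardware, which is the slide's point about predictable memory access.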

  5. Patterns A parallel pattern is a commonly occurring combination of task distribution and data access • We want to use/teach/learn “good” patterns that solve recurring problems • Many common programming models support either only a small number of patterns or only low-level hardware mechanisms • As a result, common patterns are often implemented only as “conventions” • Thesis: about a dozen patterns, many of them deterministic, can support a wide range of applications

  6. Structured Serial Patterns The following patterns are the basis of “structured programming” for serial computation: • Sequence • Selection • Iteration • Recursion • Random read • Random write Compositions of these patterns can be used in place of unstructured mechanisms such as “goto.”
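The serial patterns above compose into ordinary structured code with no `goto`. As an illustrative sketch (not from the slides), a recursive binary search uses all of them except random write:

```python
def binary_search(xs, target, lo=0, hi=None):
    """Return the index of target in sorted list xs, or -1 if absent."""
    if hi is None:
        hi = len(xs)
    # Selection: choose between the base case and the recursive cases.
    if lo >= hi:
        return -1
    mid = (lo + hi) // 2      # Sequence: ordered steps, each using the last
    probe = xs[mid]           # Random read: indexed access into xs
    if probe == target:
        return mid
    elif probe < target:
        # Recursion: solve a smaller instance of the same problem.
        return binary_search(xs, target, mid + 1, hi)
    else:
        return binary_search(xs, target, lo, mid)

def fill_squares(n):
    out = [0] * n
    # Iteration plus random write: indexed assignment into out.
    for i in range(n):
        out[i] = i * i
    return out
```

Each control construct here replaces a pattern of jumps that unstructured code would express with `goto`, which is exactly the substitution the slide describes.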

  7. Structured Parallel Patterns The following additional parallel patterns can be used for “structured parallel programming”: • Superscalar sequence • Speculative selection • Map • Recurrence/scan • Reduce • Pack/expand • Nest • Pipeline • Partition • Stencil • Search/match • Gather • Merge scatter • Priority scatter • Permutation scatter • Atomic scatter
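Three of these patterns (map, reduce, scan) have direct counterparts in Python's standard library, which makes for a compact sketch. The wrapper function below is illustrative, not from the slides; a thread pool stands in for real data-parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import accumulate
import operator

def square(x):
    return x * x

def map_reduce_scan(data):
    # Map: apply an elemental function to every element independently;
    # no element depends on another, so all calls may run in parallel.
    with ThreadPoolExecutor() as pool:
        squared = list(pool.map(square, data))
    # Reduce: combine all elements with an associative operation,
    # which permits a parallel tree-shaped combination order.
    total = reduce(operator.add, squared, 0)
    # Scan: inclusive prefix combination (running sums); also
    # parallelizable for associative operators.
    prefix = list(accumulate(squared))
    return squared, total, prefix

if __name__ == "__main__":
    print(map_reduce_scan([1, 2, 3, 4]))
    # ([1, 4, 9, 16], 30, [1, 5, 14, 30])
```

Note that map, reduce, and scan are all deterministic: the result is independent of scheduling, which supports the earlier thesis that races are best avoided by design rather than debugged after the fact.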
