
Scaling Parallel Applications



Presentation Transcript


  1. Mukesh Agrawal Scaling Parallel Applications

  2. Introduction • Parallel systems are ccNUMA • ...so is ccNUMA useful? • How much faster is it? • How can we make it faster? • How hard is it?

  3. ccNUMA (review) • Multiple processors • Private physical memories • Shared address space • Hardware support for cache coherence

  4. Scenario • Scientific computation problems (SPLASH-2) • Metric: parallel efficiency (speedup / number of processors) • Simulation study (simulate Stanford FLASH) • Experimental study (SGI Origin 2000, 128 proc)

  5. Efficiency and Size • What is the smallest problem instance to achieve 60% efficiency? • Why might this be a bad metric?

  6. Efficiency and Size • What is the smallest problem instance to achieve 60% efficiency? • Why might this be a bad metric? • Assumes more efficiency for larger instances • May not happen if data is laid out poorly (cache usage) • Why might larger instances run more efficiently?

  7. Efficiency and Size • What is the smallest problem instance to achieve 60% efficiency? • Why might this be a bad metric? • Assumes more efficiency for larger instances • May not happen if data is laid out poorly (cache usage) • Why might larger instances run more efficiently? • Better communication/computation ratio (nearest neighbor) • Less load imbalance (less waiting for others) • Cache capacity (many misses on uniprocessor) • Cache sharing (small problem may share lines)
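The 60% threshold on the slides above refers to parallel efficiency, i.e. speedup divided by processor count. A minimal sketch of the metric (the timings here are invented for illustration, not taken from the study):

```python
def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup divided by processor count."""
    speedup = t_serial / t_parallel
    return speedup / n_procs

# Hypothetical timings: a serial run of 96 s and a 16-processor
# run of 10 s give a speedup of 9.6, i.e. 60% efficiency.
eff = efficiency(96.0, 10.0, 16)
assert abs(eff - 0.6) < 1e-9
```

Asking for the smallest instance reaching 60% then means sweeping problem sizes and finding where this ratio first crosses 0.6.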

  8. Efficiency and Size (results) • Depends on the problem • Some reach good efficiency at reasonable sizes (Barnes-Hut) • Others never become efficient (Radix) • Experiments show real hardware requires larger instances than simulation predicts

  9. Efficiency and Structure • Can we get higher efficiency on small instances by modifying computation structure? • What might we try?

  10. Efficiency and Structure • Can we get higher efficiency on small instances by modifying computation structure? • What might we try? • Reduce communication! • Algorithmic changes • Cache management (keep remote data in cache) • Static partitioning

  11. Efficiency and Structure • Can we get higher efficiency on small instances by modifying computation structure? • What might we try? • Reduce communication! • Algorithmic changes • Cache management (keep remote data in cache) • Static partitioning • Most programs can scale after restructuring • Bonus: changes for ccNUMA often help with SVM (cluster) systems as well
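The static-partitioning idea above can be sketched in a few lines. This is an illustrative helper, not code from the talk: it splits an index range into one contiguous block per worker, which keeps each worker's data local (matching first-touch page placement on ccNUMA machines) without sacrificing load balance.

```python
def static_partition(n_elements, n_workers):
    """Split [0, n_elements) into contiguous per-worker blocks.

    Contiguous blocks keep each worker's data local, while block
    sizes differ by at most one element, so load balance is not
    compromised.
    """
    base, extra = divmod(n_elements, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

# 10 elements over 4 workers -> block sizes 3, 3, 2, 2
assert static_partition(10, 4) == [(0, 3), (3, 6), (6, 8), (8, 10)]
```

This is the same assignment an OpenMP `schedule(static)` loop would produce; the point of doing it explicitly is that the mapping is stable across phases, so data touched by a worker stays on that worker's node.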

  12. Programming Guidelines • Partition statically; optimize for locality • Load balance should not be compromised • Separate partitions, avoid write sharing
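One concrete way to "separate partitions, avoid write sharing" is to place partition boundaries on cache-line multiples, so no two processors write into the same line (false sharing). A sketch under stated assumptions: the 128-byte line size and 8-byte elements are illustrative choices, not figures from the talk.

```python
CACHE_LINE_BYTES = 128   # assumed secondary-cache line size
ELEM_BYTES = 8           # double-precision element

def aligned_partition(n_elements, n_workers,
                      line_bytes=CACHE_LINE_BYTES, elem_bytes=ELEM_BYTES):
    """Round partition boundaries down to cache-line multiples so
    no two workers ever write into the same cache line."""
    per_line = line_bytes // elem_bytes
    bounds = []
    start = 0
    for w in range(n_workers):
        if w == n_workers - 1:
            end = n_elements          # last worker takes the remainder
        else:
            ideal = n_elements * (w + 1) // n_workers
            end = (ideal // per_line) * per_line
        bounds.append((start, end))
        start = end
    return bounds

# 1000 doubles over 3 workers, 16 doubles per 128-byte line:
# every boundary lands on a multiple of 16, so lines are never
# write-shared between workers.
parts = aligned_partition(1000, 3)
assert all(s % 16 == 0 for s, _ in parts)
```

The trade-off is a slightly uneven split (here blocks of 320, 336, and 344 elements), which is usually a good exchange for eliminating coherence traffic on boundary lines.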

  13. Conclusion • ccNUMA can deliver scalable performance for scientific computation • Restructuring the program is usually required • ccNUMA and SVM machines need similar program modifications • Simulation is good for qualitative questions, less so for quantitative ones
