
David A. Bader Electrical & Computer Engineering Department

OPAL: Open Source Parallel Algorithm Library. Designing High-Performance Algorithms for SMP Clusters. David A. Bader, Electrical & Computer Engineering Department, Albuquerque High Performance Computing Center, University of New Mexico. dbader@eece.unm.edu, http://hpc.eece.unm.edu/


Presentation Transcript


  1. OPAL: Open Source Parallel Algorithm Library. Designing High-Performance Algorithms for SMP Clusters. David A. Bader, Electrical & Computer Engineering Department, Albuquerque High Performance Computing Center, University of New Mexico. dbader@eece.unm.edu, http://hpc.eece.unm.edu/

  2. High-Performance Applications using SMP Clusters • Long-term Earth science studies using terascale remotely-sensed global satellite imagery (4 km AVHRR GAC) • Computational Ecological Studies: Self-Organization of Semi-Arid Landscapes: Test of Optimality Principles • Computational Bioinformatics: Large Scale Phylogeny Reconstruction High Performance Algorithms for SMP Clusters, Prof. David A. Bader

  3. Research Collaborators • Joseph JáJá, University of Maryland • Bernard Moret, CS (Experimental Algorithmics), University of New Mexico • Bruce Milne, Biology (Landscape Ecology), University of New Mexico • Tandy Warnow, CS, University of Texas-Austin • IBM ACTC Group (David Klepacki, John Levesque, and others) • Current Graduate Students: Mi Yan, Niranjan Prabhu, Vinila Yarlagadda • Laboratory Alumni: Kavita Balakavi (Intel), Ajith Illendula (Intel)

  4. Acknowledgment of Support • NSF CISE Postdoctoral Research Associate in Experimental Computer Science No. 96-25668 • NSF BIO Division of Environmental Biology DEB 99-10123 • Department of Energy Sandia-University New Assistant Professorship Program (SUNAPP) Award AX-3006 • IBM SUR Grant (UNM Vista-Azul Project) • NPACI/SDSC and NCSA/Alliance • NSF 00-* Algorithms for Irregular Discrete Computations on SMPs

  5. Outline • Motivation • SMP Cluster Programming (SIMPLE) • Complexity model • Message-Passing • Shared-Memory • OPAL Facets (parallel libraries) • OPAL Setting (programming framework) • Example SMP Algorithms

  6. Motivation • High performance computing has been leveraging commodity off-the-shelf (COTS) workstation technologies: commodity microprocessors; high-performance networks; operating system and compiler technology • Symmetric multiprocessor (SMP): hardware support for hierarchical memory management; multithreaded operating system kernels; optimizing compilers and runtime systems

  7. SMP Cluster Architectures • IBM SP (NPACI Blue Horizon 144x8) • Linux Clusters • Compaq AlphaServers (PSC/NSF Terascale 682x4) • Sun Ultra HPC (4x64) • Example systems: LLNL ASCI White IBM SP (512x16); UNM/Alliance LosLobos IBM Netfinity (256x2); UNM/Alliance Roadrunner Linux SuperCluster (64x2)

  8. Message-Passing Performance

  9. Shared-Memory Performance • One Sun HPC E10K processor • Contiguous array; each element read exactly once • C, X = cyclic read (stride X) of contiguous array • R = random access of array
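The access patterns named on this slide can be sketched as index orders over an array. This is a minimal illustration in Python, not code from the talk: the function names are hypothetical, and the reading of "cyclic read (stride X)" as repeated strided passes that together touch each element exactly once is an assumption.

```python
import random

def cyclic_order(n, stride):
    """Visit every index of an n-element array exactly once.

    Pass k starts at offset k and steps by `stride`, so stride 1 is a
    plain contiguous scan and larger strides defeat spatial locality.
    """
    return [i for start in range(stride) for i in range(start, n, stride)]

def random_order(n, seed=0):
    """Visit every index exactly once in a pseudo-random order (pattern R)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order
```

Timing a loop that reads an array in each of these orders is one way to reproduce the kind of comparison the slide describes; every order reads the same elements, so only the memory access pattern differs.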

  10. High Performance Algorithms for SMP Clusters • “SIMPLE” Model • Use a hybrid, natural combination of message-passing and shared-memory: message passing interface between nodes; shared-memory programming (OpenMP, POSIX Threads) on each SMP node • Methodology for adapting message-passing algorithms for SMP Clusters • Freely-available open source implementation of parallel algorithms, libraries, and programming environment, for C/C++/Fortran under the GNU General Public License (GPL)

  11. Optimizing from MPI to SIMPLE (Regular or Irregular Algorithms) • Similar Single-Program Multiple-Data (SPMD) paradigm • Replace multiple MPI tasks per node with a single task and multiple shared-memory threads • Parallelize sequential work into equivalent shared-memory algorithms • Replace MPI communication primitives with corresponding “SIMPLE” primitives
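The two-level structure described above (threads inside a node, messages between nodes) can be illustrated with a reduction. This is a sketch under stated assumptions, not the SIMPLE API: `node_reduce` and `cluster_reduce` are hypothetical names, Python threads stand in for the shared-memory threads of one SMP node, and the inter-node exchange is simulated with a plain list where a real program would call a message-passing library.

```python
from concurrent.futures import ThreadPoolExecutor

def node_reduce(local_data, r):
    """Shared-memory step: r threads each sum one contiguous slice (r >= 1)."""
    chunk = (len(local_data) + r - 1) // r
    slices = [local_data[i * chunk:(i + 1) * chunk] for i in range(r)]
    with ThreadPoolExecutor(max_workers=r) as pool:
        partials = list(pool.map(sum, slices))
    # One thread combines the r partial sums held in shared memory.
    return sum(partials)

def cluster_reduce(data_per_node, r):
    """Message-passing step (simulated): combine one value per node."""
    node_totals = [node_reduce(local, r) for local in data_per_node]
    return sum(node_totals)
```

For example, `cluster_reduce([[1, 2, 3], [4, 5], [6]], r=2)` sums three simulated nodes with two threads each. The point of the design is that only one value per node crosses the network, rather than one value per processor.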

  12. Portability: Access from User Space

  13. Parallel Complexity Models

  14. SIMPLE Complexity Model: Message Passing Primitives

  15. Comparison of PRAM to SMP • PRAM (theory): O(n) processors; global clock; synchronous shared memory; unit cost for computation or memory access; ideal read/write models (EREW, CREW, CRCW) • SMP (practice): “P” processors (2 to 64); asynchronous lock-step operation; uniform memory access to main memory (< 600 ns), faster access to local cache (10-40 ns); cache coherency at external caches; contention for shared memory

  16. OPAL Complexity Model • SMP Complexity model motivated by Helman and JáJá, Ramachandran • Complexity given by the triplet (MA, ME, TC) • MA is the number of memory accesses, • ME is the maximum volume of data exchanged between any processor and memory, • TC is the computational complexity.

  17. OPAL Facets • Common Primitives: Read/Write, Replicate, Barrier, Scan, Reduce, Broadcast, Allreduce • Techniques: Pointer-jumping, Balanced Trees (Prefix-Sums), Symmetry Breaking (3-Coloring), Parallel Prefix (List Ranking) • Graph Algorithms: Spanning Tree, Euler Tour, Tree Functions, Ear Decomposition • Combinatorics: Sorting, Selection • Bioinformatics: (Minimum Evolution) Phylogeny Trees; Computational Genomics: Breakpoints, Inversions, Translocations
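As one concrete instance of the techniques listed here, pointer jumping can be used to rank a linked list. The sketch below is the textbook technique simulated sequentially in Python, not OPAL's implementation; `list_rank` is a hypothetical name, and the convention that the tail node points to itself is an assumption made for the example.

```python
def list_rank(succ):
    """Return each node's distance to the tail of a linked list.

    succ[i] is the successor of node i; the tail points to itself.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # Each round doubles the distance every pointer covers, so about
    # log2(n) rounds are enough; n.bit_length() rounds is a safe bound
    # and extra rounds leave the result unchanged.
    for _ in range(max(1, n).bit_length()):
        # New arrays are built from the old ones to mimic one
        # synchronous parallel step over all nodes.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the list 3 → 1 → 0 → 2, encoded as `succ = [2, 0, 2, 1]`, the call returns `[1, 2, 0, 3]`. On a real SMP each list comprehension would be split across threads with a barrier between rounds.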

  18. SMP Complexity Model: SMP Node Primitives • Read/Write • Replicate • Barrier • Scan • Reduce • Broadcast • Allreduce • Etc. • SMP Complexity model motivated by Helman and JáJá • Complexity given by the triplet (MA, ME, TC) • MA is the number of memory accesses, • ME is the maximum volume of data exchanged between any processor and memory, • TC is the computational complexity.
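The Scan primitive in this list is commonly implemented on a shared-memory node in three phases over contiguous blocks. The following Python sketch shows that common pattern under stated assumptions: it is not taken from the OPAL source, `blocked_scan` is a hypothetical name, and the per-thread work is written as ordinary loops that a real implementation would run concurrently.

```python
from itertools import accumulate

def blocked_scan(values, r):
    """Inclusive prefix sums computed over at most r contiguous blocks (r >= 1)."""
    n = len(values)
    chunk = max(1, (n + r - 1) // r)
    blocks = [values[i:i + chunk] for i in range(0, n, chunk)]
    # Phase 1: each (simulated) thread reduces its own block.
    totals = [sum(block) for block in blocks]
    # Phase 2: an exclusive scan of the block totals gives every block
    # the sum of all elements that precede it.
    offsets = [0] + list(accumulate(totals))[:-1]
    # Phase 3: each thread scans its block locally and adds its offset.
    out = []
    for block, offset in zip(blocks, offsets):
        out.extend(offset + s for s in accumulate(block))
    return out
```

Each element is read a constant number of times and every thread touches only its own contiguous block, which is the kind of access pattern the MA and ME terms of the complexity triplet are meant to account for.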

  19. OPAL Setting: Programming Environment

  20. Local Context Parameters for Each Thread

  21. Control Primitives

  22. Memory Management Primitives

  23. Example Application: Radixsort • Stable sort of n integers spread evenly across a cluster of p shared-memory r-way nodes • Decompose b-bit keys into fixed-width digits • Perform one counting-sort pass per digit (b divided by the digit width passes in total), from least significant to most significant digit (LSD → MSD) • Counting Sort: • Compute histogram of local keys • Communicate: Alltoall primitive of histograms • Locally compute prefix-sums of histograms • Communicate: (Inverse) Alltoall of prefix-sums • Rank each local element • Perform a personalized communication (1-relation) rearranging elements into sorted order
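The steps on this slide can be sketched as a single-process simulation in which each "node" is a Python list. This is an illustration under stated assumptions, not the talk's implementation: `radix_sort_cluster`, `key_bits`, and `digit_bits` are hypothetical names, keys are assumed to be non-negative integers below `2**key_bits`, at least one node is assumed, and the Alltoall exchanges and the final personalized communication are replaced by direct access to every node's data.

```python
def radix_sort_cluster(keys_per_node, key_bits=16, digit_bits=4):
    """LSD radix sort of keys spread over p simulated nodes (p >= 1)."""
    p = len(keys_per_node)
    nodes = [list(keys) for keys in keys_per_node]
    n = sum(len(keys) for keys in nodes)
    per_node = max(1, (n + p - 1) // p)
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):
        # Compute the histogram of local keys on every node.
        hist = [[0] * radix for _ in range(p)]
        for i, keys in enumerate(nodes):
            for k in keys:
                hist[i][(k >> shift) & mask] += 1
        # Prefix sums in (digit, node) order: start[i][d] is the global
        # rank of node i's first key with digit d. On a cluster this is
        # where the two Alltoall exchanges happen.
        start = [[0] * radix for _ in range(p)]
        running = 0
        for d in range(radix):
            for i in range(p):
                start[i][d] = running
                running += hist[i][d]
        # Rank each local element, then route it to its position
        # (stands in for the personalized communication).
        placed = [None] * n
        for i, keys in enumerate(nodes):
            for k in keys:
                d = (k >> shift) & mask
                placed[start[i][d]] = k
                start[i][d] += 1
        nodes = [placed[i * per_node:(i + 1) * per_node] for i in range(p)]
    return nodes
```

Calling `radix_sort_cluster([[170, 45, 75], [90, 802, 24], [2, 66]])` returns `[[2, 24, 45], [66, 75, 90], [170, 802]]`. Ranking keys in node order within each digit value is what keeps every pass stable, which LSD radix sort requires.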

  26. Execution Time of Radix Sort on an SMP Cluster

  27. SMP Example: Ear Decomposition • Ear decomposition: partitions the edges of a graph, useful in parallel processing; “like peeling the layers of an onion” • Applied to scientific computing problems: computational mechanics (structural rigidity); computational biology (molecular structure, atoms in DNA chains); computational fluid dynamics • Similar to other parallel algorithms for combinatorial problems: trivial and fast sequential algorithm; efficient PRAM algorithm; but no known practical, parallel algorithm

  28. Ear Decomposition Example • Figure labels: Input, Spanning Tree, Output Ears • n = number of vertices, m = number of edges
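The slides do not spell out the algorithm, so the following is a sequential sketch of the standard spanning-tree approach, given as an assumption about what is meant: build a spanning tree, label each non-tree edge by the depth of the lowest common ancestor of its endpoints, and give every tree edge the smallest label among the non-tree edges whose cycle covers it. The function name and the input encoding (vertices 0..n-1, a list of edge pairs, n at least 1, graph connected and bridgeless) are hypothetical choices for this example.

```python
from collections import deque

def ear_decomposition(n, edges):
    """Return a list of ears; each ear is a list of indices into `edges`."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    # Step 1: breadth-first spanning tree rooted at vertex 0.
    parent = [-1] * n
    parent_edge = [-1] * n
    depth = [-1] * n
    depth[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, idx in adj[u]:
            if depth[v] == -1:
                depth[v] = depth[u] + 1
                parent[v], parent_edge[v] = u, idx
                queue.append(v)
    tree = {e for e in parent_edge if e != -1}

    def lca(u, v):
        # Repeatedly lift whichever endpoint is at least as deep.
        while u != v:
            if depth[u] < depth[v]:
                u, v = v, u
            u = parent[u]
        return u

    # Step 2: label non-tree edges by (depth of LCA, edge index).
    nontree = sorted((depth[lca(u, v)], idx)
                     for idx, (u, v) in enumerate(edges) if idx not in tree)

    # Step 3: in increasing label order, each non-tree edge claims the
    # still-unlabelled tree edges on its cycle; together they form one ear.
    labelled = set()
    ears = []
    for _, idx in nontree:
        u, v = edges[idx]
        top = lca(u, v)
        ear = [idx]
        for x in (u, v):
            while x != top:
                if parent_edge[x] not in labelled:
                    labelled.add(parent_edge[x])
                    ear.append(parent_edge[x])
                x = parent[x]
        ears.append(ear)
    return ears
```

With `edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)]` and `n = 4`, the result is `[[1, 0, 2], [4, 3]]`: the first ear is the cycle 0-1-2 and the second is the path 1-3-2 attached to it. This version walks whole tree paths for clarity, so it is not the most efficient sequential formulation.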

  29. Ear Decomposition Complexities • Message Passing: Spanning Tree, Ear Decomposition • Shared Memory: Spanning Tree, Ear Decomposition • Sequential Complexity

  30. Comparison of Ear Decomposition Algorithms

  31. Performance of SMP Ear Decomposition on a Variety of Input Graphs (n = 8192)

  32. SMP Ear Decomposition Algorithms

  33. Conclusions • New hybrid model for SMP Clusters • Open Source Parallel Algorithm Library (OPAL) • High-Performance methodology • Fastest known algorithms on SMPs and SMP clusters • Preliminary experimental results

  34. Future Work • Algorithms for SMP Clusters: validate complexity model; identify classes of efficient algorithms; library of SMP algorithms; methodology for algorithm-engineering • Clusters of Heterogeneous SMP Nodes: varying node sizes; nodes from different vendors & architectures; hierarchical clusters of SMPs • Scientific Applications: bioinformatics and genomics; landscape ecology and remote sensing; computational fluid dynamics
