
Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing



Presentation Transcript


  1. Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing Rahul S. Sampath May 9th 2007

  2. Computational Power Today…

  3. Floating Point Operations Per Second (FLOPS) • Humans doing long division: milli-FLOPS (1/1000th of one FLOP) • Cray-1 supercomputer, 1976, $8M: 80 MFLOPS • Pentium II, 400 MHz: 100 MFLOPS • Typical high-end PC today: ~1 GFLOPS • Sony PlayStation 3, 2006: 2 TFLOPS • IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS • IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS

  4. Why do we need more? • "DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft, 1980 • "640K ought to be enough for anybody." -- Bill Gates, 1981 • Bottom line: demand for computational power will continue to increase.

  5. Some Computationally Intensive Applications Today • Computer Aided Surgery • Medical Imaging • Molecular Dynamics (MD) simulations • FEM simulations with > 10^10 unknowns • Galaxy formation and evolution • 17-million-particle Cold Dark Matter cosmology simulation

  6. Any application that can be scaled up should be treated as a computationally intensive application.

  7. The Need for Parallel Computing • Memory (RAM) • There is a theoretical limit on the RAM available on your computer • 32-bit systems: 4 GB (2^32 bytes) • 64-bit systems: 16 exabytes (2^64 bytes) • Speed • Upgrading microprocessors can't help you anymore • FLOPS are not the bottleneck; memory is • What we need is more registers • Think pre-computing, a higher-bandwidth memory bus, L2/L3 caches, compiler optimizations, assembly language… that way lies the asylum • Or… • Think parallel…

  8. Hacks • If speed is not an issue… • Is an out-of-core implementation an option? (see the sketch below) • Parallel programs can be converted into out-of-core implementations easily.
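A minimal sketch of the out-of-core idea in C (the file name data.bin, the block size, and the summing task are illustrative assumptions, not from the slides): stream the data through a fixed-size buffer so that only one block resides in RAM at a time, no matter how large the file is.

```c
#include <stdio.h>
#include <stdlib.h>

/* Out-of-core sum: stream a file of doubles through a small buffer
 * so memory use stays fixed regardless of the file size. */
int main(void) {
    enum { BLOCK = 1 << 20 };           /* 1M doubles (~8 MB) in core at a time */
    double *buf = malloc(BLOCK * sizeof *buf);
    FILE *fp = fopen("data.bin", "rb"); /* hypothetical input file */
    if (!buf || !fp) return 1;

    double sum = 0.0;
    size_t n;
    while ((n = fread(buf, sizeof *buf, BLOCK, fp)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];              /* work only on the in-core block */

    printf("sum = %f\n", sum);
    fclose(fp);
    free(buf);
    return 0;
}
```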

  9. Parallel Algorithms

  10. The Key Questions • Why? • Memory • Speed • Both • What kind of platform? • Shared Memory • Distributed Computing • Typical size of the application • Small (< 32 processors) • Medium (32 - 256 processors) • Large (> 256 processors) • How much time and effort do you want to invest? • How many times will the component be used in a single execution of the program?

  11. Factors to Consider in any Parallel Algorithm Design • Give equal work to all processors at all times • Load Balancing • Give an equal amount of data to all processors • Efficient Memory Management • Processors should work independently as much as possible • Minimize communication, especially iterative communication • If communication is necessary, try to do some work in the background as well • Overlapping communication and computation (see the sketch after this list) • Try to keep the sequential part of the parallel algorithm as close as possible to the best sequential algorithm • Optimal Work Algorithm
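To make the "overlapping communication and computation" bullet concrete, here is a minimal MPI sketch in C (the ring exchange and the two work functions are hypothetical stand-ins, not from the slides): post the non-blocking calls first, do the work that does not depend on the incoming message, then wait.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-ins for the application's real work. */
static void do_interior_work(void) { /* computation independent of the incoming halo */ }
static void do_boundary_work(void) { /* computation that needs the received halo */ }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;  /* ring neighbours */
    int right = (rank + 1) % size;
    double halo_out = (double)rank, halo_in = 0.0;

    /* Post non-blocking communication first... */
    MPI_Request reqs[2];
    MPI_Irecv(&halo_in,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    do_interior_work();                    /* ...overlap it with independent work... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    do_boundary_work();                    /* ...then finish what needed the message. */

    printf("rank %d received %g from rank %d\n", rank, halo_in, left);
    MPI_Finalize();
    return 0;
}
```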

  12. Difference Between Sequential and Parallel Algorithms • Not all data is accessible at all times • All computations must be as localized as possible • Can't have random access • New dimension to the existing algorithm – division of work • Which processor does what portion of the work? • If communication cannot be avoided • How will it be initiated? • What type of communication? • What are the pre-processing and post-processing operations? • The order of operations can be critical for performance

  13. Parallel Algorithm Approaches • Data-Parallel Approach • Partition the data among the processors • Each processor will execute the same set of commands (see the sketch below) • Control-Parallel Approach • Partition the tasks to be performed among the processors • Each processor will execute different commands • Hybrid Approach • Switch between the two approaches at different stages of the algorithm • Most parallel algorithms fall in this category
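A minimal MPI sketch in C of the data-parallel approach (the array size, its contents, and the summing task are illustrative assumptions): the root partitions an array among the processors, every rank executes the same code on its own slice, and a reduction combines the partial results.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Data-parallel sketch: each rank runs this same program on its slice. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                 /* assumes size divides N evenly */
    int local_n = N / size;
    double *data = NULL;
    if (rank == 0) {                       /* root owns the full array */
        data = malloc((size_t)N * sizeof *data);
        for (int i = 0; i < N; i++) data[i] = 1.0;
    }

    /* Partition the data among the processors... */
    double *local = malloc((size_t)local_n * sizeof *local);
    MPI_Scatter(data, local_n, MPI_DOUBLE,
                local, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ...and let every rank execute the same commands on its slice. */
    double partial = 0.0, total = 0.0;
    for (int i = 0; i < local_n; i++) partial += local[i];
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("total = %f\n", total);
    free(local);
    free(data);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with, e.g., mpirun -np 4 ./a.out.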

  14. Performance Metrics • Speedup • Overhead • Scalability • Fixed size • Iso-granular • Efficiency • Speedup per processor • Iso-Efficiency • Problem size as a function of p needed to keep efficiency constant (standard definitions below)
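For reference, the standard textbook definitions behind these metrics, where $T_1$ is the best sequential time and $T_p$ the time on $p$ processors:

```latex
S(p)   = \frac{T_1}{T_p}                     \quad\text{(speedup)}
E(p)   = \frac{S(p)}{p} = \frac{T_1}{p\,T_p} \quad\text{(efficiency)}
T_o(p) = p\,T_p - T_1                        \quad\text{(total overhead)}
```

Iso-efficiency then asks how fast the problem size must grow with $p$ so that $E(p)$ stays constant.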

  15. The Take Home Message • A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm. • What model to use? – Problem dependent. • e.g. a+b+c+d+… can be regrouped as (a+b) + (c+d) + … so the partial sums can be computed in parallel • Not much choice really. • It is a big investment, but it can really be worth it.

  16. Parallel Programming

  17. How does a parallel program work? • You request a certain number of processors • You set up a communicator • Give a unique id to each processor – its rank • Every processor executes the same program • Inside the program • Query for the rank and use it to decide what to do • Exchange messages between processors using their ranks • In theory, you only need 3 functions: MPI_Isend, MPI_Irecv, and MPI_Wait (see the sketch below) • In practice, you can optimize communication depending on the underlying network topology – Message Passing Standards…
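A minimal MPI skeleton in C of the structure this slide describes (the message value is arbitrary): every process runs the same program, queries its rank, and branches on it, using only the three primitives named above.

```c
#include <mpi.h>
#include <stdio.h>

/* Every processor executes this same program; behaviour branches on rank. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                 /* join the MPI_COMM_WORLD communicator */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* unique id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes requested */

    MPI_Request req;
    int msg;
    if (rank == 0 && size > 1) {            /* rank 0 sends... */
        msg = 42;
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {                 /* ...rank 1 receives */
        MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```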

  18. Message Passing Standards • The standards define a set of primitive communication operations. • The vendors implementing these on a machine are responsible for optimizing the operations for that machine. • Popular Standards • Message Passing Interface (MPI) • OpenMP (Open Multi-Processing) – strictly a directive-based shared-memory standard rather than a message-passing one

  19. Languages that support MPI • Fortran 77 • C/C++ • Python • MATLAB

  20. MPI Implementations • MPICH • ftp://info.mcs.anl.gov/pub/mpi • LAM • http://www.mpi.nd.edu/lam/download • CHIMP • ftp://ftp.epcc.ed.ac.uk/pub/chimp/release • WinMPI (Windows) • ftp://csftp.unomaha.edu/pub/rewini/WinMPI • W32MPI (Windows) • http://dsg.dei.uc.pt/wmpi/intro.html

  21. Open Source Parallel Software • PETSc ( Linear and NonLinear Solvers ) • http://www-unix.mcs.anl.gov/petsc/petsc-as/ • ScaLAPACK ( Linear Algebra ) • http://www.netlib.org/scalapack/scalapack_home.html • SPRNG ( Random Number Generator ) • http://sprng.cs.fsu.edu/ • Paraview ( Visualization ) • http://www.paraview.org/HTML/Index.html • NAMD ( Molecular Dynamics ) • http://www.ks.uiuc.edu/Research/namd/ • Charm++ ( Parallel Objects ) • http://charm.cs.uiuc.edu/research/charm/

  22. References • Parallel Programming with MPI, Peter S. Pacheco • Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar • MPI: The Complete Reference, William Gropp et al. • http://www-unix.mcs.anl.gov/mpi/ • http://www.erc.msstate.edu/mpi • http://www.epm.ornl.gov/~walker/mpi • http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ) • comp.parallel.mpi (newsgroup) • http://www.mpi-forum.org (MPI Forum)

  23. Thank You
