This presentation explores advanced concepts in MPI, including performance measurements, point-to-point communications, datatypes, communicators, collective operations, and MPI-2 features. It also discusses the portability and tool-friendliness of MPI, as well as tuning MPI programs for peak performance.
Advanced MPI • William D. Gropp, Rusty Lusk and Rajeev Thakur • Mathematics and Computer Science Division • Argonne National Laboratory
Outline • Introduction and review of MPI concepts • Performance measurements in MPI: methods and pitfalls • MPI Point-to-point communications • MPI Datatypes • Communicators and Libraries • Collective Operations • MPI-2 Features
Outline • Background • The message-passing model • Origins of MPI and current status • Sources of further MPI information • Basics of MPI message passing • Hello, World! • Fundamental concepts • Simple examples in Fortran and C • Extended point-to-point operations • non-blocking communication • modes
Outline (continued) • Advanced MPI topics • Collective operations • More on MPI datatypes • Application topologies • The profiling interface • Toward a portable MPI environment
MPI is Simple • Many parallel programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV
Alternative set of 6 Functions for Simplified MPI • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_BCAST • MPI_REDUCE
Toward a Portable MPI Environment • MPICH is a high-performance portable implementation of MPI-1. • It runs on MPPs, clusters, and heterogeneous networks of workstations. • In a wide variety of environments, one can do:
    configure
    make
    mpicc -mpitrace myprog.c
    mpirun -np 10 myprog
    upshot myprog.log
to build, compile, run, and analyze performance.
MPI is Tool-Friendly • The MPI profiling interface can be used to write portable performance-analysis tools that interact with any MPI implementation. • Upshot is one such tool.
Still Not Covered • Process topologies • Creating groups and communicators • Attributes • Persistent requests
How Big is MPI? • MPI is large (MPI-1 contains about 125 calls). • MPI’s extensive functionality requires many functions. • The number of functions is not necessarily a measure of complexity. • MPI is small • Many useful programs can be written with just 6 of its functions. • MPI is just right • One can access flexibility when it is required. • One need not master all parts of MPI to use it.
A Final Point • MPI provides an extensive specification for message-passing programs and libraries. • Many issues required for writing portable parallel libraries have been addressed. • Efficient implementations have made it possible for library developers to write efficient, portable code for others to use. • End users may increasingly find that libraries, rather than explicit message-passing code, will be the key to developing applications.
Tuning MPI Programs for Peak Performance • William Gropp and Ewing Lusk • Argonne National Laboratory
Outline • Goals of the Tutorial • Background assumptions • How message passing works (protocols) • How protocols relate to MPI calls • Performance modeling, measurements, and tools • Diagnosing and understanding performance problems • Vendor-specific issues • MPI-2
Assumptions and Background We assume you have some familiarity with • Various MPI send/receive modes • Elementary collective operations • MPI datatypes
Performance Modeling, Measurements and Tools • Basic Model • Needed to evaluate approaches • Must be simple • Synchronization delays • Main components • Latency and Bandwidth • Other effects on performance • Understand deviations from the model
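A minimal sketch of the basic model in C, writing s for the latency (startup time) and r for the transfer time per byte, the same symbols the memory-copy example uses, so that an n-byte message takes s + r n. The function names are hypothetical, and the half-bandwidth size is a derived quantity added for illustration: at n = s/r the message takes 2s and achieves half of the peak bandwidth 1/r.

```c
#include <assert.h>

/* Latency/bandwidth model: T(n) = s + r * n, with s the startup latency
   in seconds and r the time per byte (the inverse of the bandwidth). */
double model_time(double s, double r, double nbytes)
{
    assert(s >= 0.0 && r > 0.0 && nbytes >= 0.0);
    return s + r * nbytes;
}

/* Bandwidth actually seen for an n-byte message under the model. */
double effective_bandwidth(double s, double r, double nbytes)
{
    return nbytes / model_time(s, r, nbytes);
}

/* Size at which latency and transfer time contribute equally;
   the effective bandwidth there is half of the peak 1/r. */
double half_bandwidth_size(double s, double r)
{
    assert(r > 0.0);
    return s / r;
}
```

With a hypothetical s of 10 microseconds and r of 10 ns per byte, messages shorter than 1000 bytes are dominated by latency, which is why the model is useful for deciding whether to combine small messages.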
Including Contention • Ignoring contention is the greatest limitation of the latency/bandwidth model • The hyperbolic model of Stoica, Sultan, and Keyes provides a way to estimate the effects of contention for different communication patterns; see ftp://ftp.icase.edu/pub/techreports/96/96-34.ps.Z
Other Impacts on Performance • Contention • In the network • At the processors • Memory Copies • Packet sizes and stepping
Diagnosing and understanding performance problems • Memory Copies and MPI datatypes • Effect of message packetization • Synchronization delays • Unexpected hot spots and premature synchronization • Polling and Interrupt style MPI implementations • Effect of contention • Choosing between MPI alternatives
Memory copies • Memory copies are the primary source of performance problems • Cost of non-contiguous datatypes • Single-processor memcpy is often much slower than the hardware could deliver • Measured memcpy performance (figure not reproduced)
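One rough way to obtain a single-processor memcpy figure is to time repeated copies of a large buffer. This sketch assumes a C11 compiler (for timespec_get); memcpy_bandwidth is a hypothetical helper, and the number it reports depends on buffer size, cache effects, and compiler optimization, so treat it as an estimate rather than a benchmark.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: rough memcpy bandwidth in bytes per second,
   obtained by copying nbytes reps times and dividing by elapsed time.
   Returns a negative value if the buffers cannot be allocated. */
double memcpy_bandwidth(size_t nbytes, int reps)
{
    assert(nbytes > 0 && reps > 0);
    char *src = malloc(nbytes);
    char *dst = malloc(nbytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, nbytes);   /* touch the pages before timing */
    memset(dst, 0, nbytes);

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, nbytes);
    timespec_get(&t1, TIME_UTC);

    /* Read the destination so the copies are not trivially optimized away. */
    volatile char sink = dst[nbytes - 1];
    (void)sink;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
    free(src);
    free(dst);
    return (double)nbytes * reps / secs;
}
```

Running it over a range of buffer sizes, from cache-resident to much larger than cache, is the kind of measurement that exposes how far memcpy falls short of the memory system's peak.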
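Example: Performance Impact of Memory Copies • Assume n bytes sent eagerly (and buffered): s + r n + c n, where s is the latency, r the transfer time per byte, and c the cost per byte of copying the message out of the buffer • Rendezvous, not buffered: s + s + (s + r n), i.e. a request, a reply, and then the data • Rendezvous is faster if 3s + r n < s + r n + c n, i.e. if s < cn/2 • Assumes no delays in responding to rendezvous control information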
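The two cost formulas and the crossover condition can be checked numerically. In this sketch s, r, n, and c have the meanings given on the slide, and the function names are hypothetical. Rendezvous wins when 3s + r n < s + r n + c n, which simplifies to n > 2s/c.

```c
#include <assert.h>

/* Eager, buffered: one message, then a copy out of the buffer. */
double eager_time(double s, double r, double c, double n)
{
    return s + r * n + c * n;
}

/* Rendezvous, unbuffered: request, reply, then the data itself. */
double rendezvous_time(double s, double r, double n)
{
    return s + s + (s + r * n);
}

/* Message size above which rendezvous is faster: s < c n / 2 <=> n > 2 s / c. */
double crossover_bytes(double s, double c)
{
    assert(c > 0.0);
    return 2.0 * s / c;
}
```

For example, with hypothetical values s = 10 microseconds and c = 1 ns per byte, the crossover is 20000 bytes: shorter messages favor the eager protocol, longer ones favor rendezvous. Note that r cancels out of the comparison.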
Summary • Achieving peak performance requires a model of how message-passing systems are implemented • MPI exposes message-passing semantics to give programmers more control • Experimentation is necessary to understand performance • Tools are available • Last word • Defer Synchronization!
Logging and Visualization Tools • Upshot and MPE tools • VT • Pablo, Paragraph, and Paradyn • Other vendor tools • Validation by running with coarse-grain logging
Upshot and MPE • Automatic logging • Uses PMPI interface and a special library • mpicc -mpilog … • mpirun -np 8 a.out …. • User-directed logging • MPE_Log_event calls inserted by user • MPE_Describe_state defines user state • states may be nested • Works with MPICH and vendor MPIs
Validating the logging • Logging introduces some timing differences • Can change the behavior of the computation (and not all can be filtered out, for example, if the presence of logging causes different protocols to be used) • Compare coarser grain timings to check that the detailed logging did not change the behavior of the program
Deficiency analysis/filtering techniques • Only interested in events that are • taking too long, and • too slow (compared to the model) • Filter logs with thresholds for both; use the results to decide where to instrument further • Interesting research area • P3T is an example tool
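A minimal sketch of the threshold filter, assuming a hypothetical in-memory log record that carries each state's start and end times and the time the performance model predicts for it. Neither the record layout nor the function comes from Upshot, MPE, or P3T; an event is kept if it exceeds an absolute time threshold or runs more than a given factor slower than the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record: one timed state from a trace. */
struct log_event {
    int rank;
    double start, end;    /* seconds */
    double model_time;    /* time predicted by the performance model */
};

/* Keep events that take too long (duration > max_time) or are too slow
   relative to the model (duration > max_ratio * model_time). Indices of
   kept events are written to out[], which must hold n entries; returns
   the number kept. */
size_t filter_events(const struct log_event *ev, size_t n,
                     double max_time, double max_ratio, size_t *out)
{
    assert((ev != NULL && out != NULL) || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        double d = ev[i].end - ev[i].start;
        if (d > max_time || d > max_ratio * ev[i].model_time)
            out[kept++] = i;
    }
    return kept;
}
```

The surviving indices identify the regions worth instrumenting in more detail, which keeps the fine-grained logging (and its perturbation of the run) confined to where it is needed.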