Open MPI

Open MPI - A High Performance Fault Tolerant MPI Library
Richard L. Graham
Advanced Computing Laboratory, Group Leader (acting)

Overview
• Open MPI Collaboration
• MPI
• Run-time
• Future directions

Collaborators
• Los Alamos National Laboratory (LA-MPI)
• Sandia National Laboratory
• Indiana University (LAM/MPI)
• The University of Tennessee (FT-MPI)
• High Performance Computing Center, Stuttgart (PACX-MPI)
• University of Houston
• Cisco Systems
• Mellanox
• Voltaire
• Sun
• Myricom
• IBM
• QLogic
URL: www.open-mpi.org

A Convergence of Ideas
• Open MPI: FT-MPI (U of TN), LA-MPI (LANL), LAM/MPI (IU), PACX-MPI (HLRS)
• OpenRTE: Fault Detection (LANL, Industry), FDDP (Semi. Mfg. Industry), Resilient Computing Systems, Robustness (CSU), Autonomous Computing (many), Grid (many)

Components
• Formalized interfaces
• Specifies “black box” implementation
• Different implementations available at run-time
• Can compose different systems on the fly
[Diagram: Caller, Interface 1, Interface 2, Interface 3]
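The component idea on this slide can be illustrated with a minimal sketch. This is not Open MPI's actual component architecture or API; the names `Transport`, `SharedMemoryTransport`, `TcpTransport`, `COMPONENTS`, `select_component`, and `caller` are all hypothetical and exist only for this example. It shows a caller written against one formalized interface while the concrete "black box" implementation is chosen by name at run time.

```python
from typing import Callable, Dict, Protocol


class Transport(Protocol):
    """Hypothetical formalized interface that every component implements."""

    def send(self, payload: bytes) -> str: ...


class SharedMemoryTransport:
    """One black-box implementation of the interface (illustrative only)."""

    def send(self, payload: bytes) -> str:
        return f"sm: copied {len(payload)} bytes"


class TcpTransport:
    """A second, interchangeable implementation (illustrative only)."""

    def send(self, payload: bytes) -> str:
        return f"tcp: sent {len(payload)} bytes"


# Table of available implementations, keyed by a name chosen at run time.
COMPONENTS: Dict[str, Callable[[], Transport]] = {
    "sm": SharedMemoryTransport,
    "tcp": TcpTransport,
}


def select_component(name: str) -> Transport:
    """Return an instance of the named component, or raise if it is unknown."""
    try:
        return COMPONENTS[name]()
    except KeyError:
        raise ValueError(f"no such component: {name!r}") from None


def caller(transport: Transport, payload: bytes) -> str:
    # The caller depends only on the interface, never on a concrete class.
    return transport.send(payload)
```

For example, `caller(select_component("sm"), b"abcd")` and `caller(select_component("tcp"), b"abcd")` run the same calling code over two different implementations, which is the sense in which systems can be composed on the fly.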

Performance Impact

MPI

Two Sided Communications

P2P Component Frameworks

Shared Memory - Bandwidth

Shared Memory - Latency

IB Performance - Latency

IB Performance - Bandwidth

GM Performance Data - Ping-Pong Latency (usec)

GM Performance Data - Ping-Pong Latency (usec) - Data FT

GM Performance Data - Ping-Pong Bandwidth

MX Ping-Pong Latency (usec)

MX Performance Data - Ping-Pong Bandwidth (MB/sec)

XT3 Performance - Latency

XT3 Performance - Bandwidth

Collective Operations

MPI Reduce - Performance

MPI Broadcast - Performance

MPI Reduction - II

Open RTE

Open RTE - Design Overview
Seamless, transparent environment for high-performance applications
• Inter-process communications within and across cells
• Distributed publish/subscribe registry
• Supports event-driven logic across applications, cells
• Persistent, fault tolerant
• Dynamic “spawn” of processes, applications both within and across cells
[Diagram labels: Single Computer, Cluster, Cluster, Grid]

Open RTE - Components
[Diagram labels: UNIVERSE, Single Computer, Cluster, Cluster, Grid]

General Purpose Registry
• Cached, distributed storage/retrieval system
  • All common data types plus user-defined
  • Heterogeneity between storing process and recipient automatically resolved
• Publish/subscribe
  • Support event-driven coordination and notification
  • Subscribe to individual data elements, groups of elements, wildcard collections
  • Specify actions that trigger notifications
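A minimal in-process sketch of the storage and publish/subscribe behaviour listed above. It is not the Open RTE registry API: the `Registry` class, its `put`, `get`, and `subscribe` methods, and the use of shell-style wildcards via `fnmatch` are assumptions made for illustration. The real registry is described as cached and distributed, which this toy version does not attempt.

```python
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical callback signature: (container, key, value).
Callback = Callable[[str, str, Any], None]


class Registry:
    """Toy single-process registry: containers hold key/value entries."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Any] = {}
        self._subs: List[Tuple[str, str, Callback]] = []

    def subscribe(self, container_pat: str, key_pat: str, callback: Callback) -> None:
        # Patterns are shell-style wildcards (an assumption), e.g. "job-*" or "*".
        self._subs.append((container_pat, key_pat, callback))

    def put(self, container: str, key: str, value: Any) -> None:
        """Store a value, then notify every subscriber whose patterns match."""
        self._data[(container, key)] = value
        for container_pat, key_pat, callback in self._subs:
            if fnmatchcase(container, container_pat) and fnmatchcase(key, key_pat):
                callback(container, key, value)

    def get(self, container_pat: str, key_pat: str) -> Dict[Tuple[str, str], Any]:
        """Retrieve all entries matching the container and key patterns."""
        return {
            (c, k): v
            for (c, k), v in self._data.items()
            if fnmatchcase(c, container_pat) and fnmatchcase(k, key_pat)
        }
```

Subscribing to `("job-*", "state")` and then calling `put("job-1", "state", "running")` invokes the callback, while a `put` into a container named `node-3` does not, which mirrors subscribing to a wildcard collection rather than to a single element.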

Subscription Services
• Subscribe to container and/or keyval entry
  • Can be entered before data arrives
• Specifies data elements to be monitored
  • Container tokens and/or data keys
  • Wildcards supported
• Specifies action that generates event
  • Data entered, modified, deleted
  • Number of matching elements equals, exceeds, is less than specified level
  • Number of matching elements transitions (increases/decreases) through specified level
• Events generate message to subscriber
  • Includes specified data elements
  • Asynchronously delivered to specified callback function on subscribing process
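The "transitions through specified level" trigger can be sketched as follows. This is an illustrative model only, not Open RTE code: `LevelTrigger` and `Container` are hypothetical names, wildcard matching via `fnmatch` is an assumption, and the callback is invoked synchronously here for simplicity, whereas the slide describes asynchronous delivery.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List


@dataclass
class LevelTrigger:
    """Fire when the number of keys matching `pattern` crosses `level`."""

    pattern: str
    level: int
    callback: Callable[[int, Dict[str, Any]], None]
    _last_count: int = field(default=0, init=False)

    def check(self, data: Dict[str, Any]) -> None:
        matches = {k: v for k, v in data.items() if fnmatchcase(k, self.pattern)}
        count = len(matches)
        # Upward transition: the count was below the level and now reaches it.
        went_up = self._last_count < self.level <= count
        # Downward transition: the count was at or above the level and now is below.
        went_down = count < self.level <= self._last_count
        self._last_count = count
        if went_up or went_down:
            # The event carries the matching data elements to the subscriber.
            self.callback(count, matches)


class Container:
    """Toy container that re-evaluates its triggers after every change."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._triggers: List[LevelTrigger] = []

    def subscribe(self, trigger: LevelTrigger) -> None:
        # A subscription may be entered before any data arrives.
        self._triggers.append(trigger)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._notify()

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._notify()

    def _notify(self) -> None:
        for trigger in self._triggers:
            trigger.check(self._data)
```

A trigger with `pattern="proc-*"` and `level=2` fires once when the second matching key is stored and once more when deletions bring the count back below two; it stays silent for changes that do not cross the level.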

Future Directions

Revise MPI Standard
• Clarify standard
• Standardize the interface
• Simplify standard
• Make the standard more “H/W Friendly”

Beyond Simple Performance Measures
• Performance and scalability are important, but
• What about future HPC systems?
  • Heterogeneity
  • Multi-core
  • Mix of processors
  • Mix of networks
  • Fault-tolerance

Focus on Programmability
• Performance and scalability are important, but what about
• Programmability
