Grid Middleware for High Performance Computing

Sathish Vadhiyar
Grid Applications Research Lab (GARL)
Supercomputer Education and Research Centre (SERC)
Indian Institute of Science (IISc), Bangalore - 560012
ATIP 1st Workshop on HPC in India @ SC-09

Grid Applications Research Lab
• Grid and Parallel Computing with primary focus on
  • developing grid applications,
  • building strategies for checkpointing, migration, rescheduling, and fault tolerance for parallel applications on grid systems, and
  • performance modeling of parallel applications on grids

Motivation
• Developing solutions for the deployment and use of large-scale scientific applications on grids
• Will enable the exploration of large problem sizes and long-running applications

Grid Applications: Climate Modeling
• Enable efficient execution of long-running climate modeling simulations on grid systems, with the objective of solving climate science problems
• Community Climate System Model (CCSM): a multi-component global general circulation model
• Analyzed the benefits of executing the different components, with checkpointing and rescheduling, on different batch systems of a grid using a novel execution model

Grid Applications: Climate Modeling - General Idea (IJHPCA, FGCS)
• Job submission to a batch system incurs queue waiting time
• Waiting time depends on processor requirements
• How about decomposing a job into small subjobs with small processor requirements and submitting the subjobs to multiple batch systems of a grid?
• Efficiency depends on effective system utilization using checkpointing, migration and rescheduling
• Leads to a 55% average increase in throughput
[Figure: Novel Execution Model]

Grid Applications: DNA Sequence Evolutions (JPDC, e-Science 2009)
• Predicting future sequences in an evolutionary tree is important for drug discovery, pharmaceutical research and disease control
• An ancestor sequence can transform into a progeny sequence in different ways
• Formulated as a search-space exploration problem; used computational grids to explore the huge space of possible mutations
• Used popular mutations to predict future evolutionary paths
• Performed predictions for HIV sequences and other protein sequences
• Predictions are 40% better than random methods
[Figures: Master-Worker Architecture for Analyzing Mutations; 40% Better Predictions]

Rescheduling
• It is necessary to adapt application execution to grid resource and application dynamics
• SRS: a checkpointing library for malleable applications
• Can allow processor reconfiguration between migrations
• Supports different data distributions, storage infrastructure, active migration and fault tolerance

Rescheduling Strategies
• Given a parallel application consisting of multiple phases and a set of resources, the problem is to derive a rescheduling plan
• Where to execute the different phases, and when to migrate/reschedule
• To find the intervals {I1, I2, …, ILopt} such that the overall execution cost, determined by the ti and rcost, is minimized, where Lopt: number of intervals; ti: predicted execution time of each interval; rcost: rescheduling cost
• Developed 3 novel algorithms for deriving a rescheduling plan
  • Incremental algorithm, division heuristic and genetic algorithm
[Figure: Application Phases on Cluster-1, 2, 3 across Interval 1 (t1), Interval 2 (t2), Interval 3 (t3), ..., Interval i (ti); panel: Division heuristic]
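The rescheduling-plan problem can be illustrated with a minimal sketch. It assumes the objective is the sum of the predicted interval times plus one rcost per migration; the slide does not spell out the expression, so this form is an assumption. The sketch uses a plain dynamic program over phases, which is not one of the three algorithms named on the slide (incremental, division heuristic, genetic). The function name `rescheduling_plan` and the `times` table are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int, int]  # (first_phase, last_phase, cluster)


def rescheduling_plan(times: List[List[float]], rcost: float) -> Tuple[float, List[Interval]]:
    """times[p][k] is the predicted time of phase p on cluster k (hypothetical input).

    Assumed objective: sum of phase times + rcost for every change of cluster.
    """
    n_phases, n_clusters = len(times), len(times[0])
    best = [list(times[0])]            # best[p][k]: cheapest cost with phase p on cluster k
    prev = [[-1] * n_clusters]         # prev[p][k]: cluster chosen for phase p-1
    for p in range(1, n_phases):
        row, back = [], []
        for k in range(n_clusters):
            # Either stay on cluster k, or migrate from cluster j and pay rcost.
            cost, j = min(
                (best[p - 1][j] + (0.0 if j == k else rcost), j)
                for j in range(n_clusters)
            )
            row.append(cost + times[p][k])
            back.append(j)
        best.append(row)
        prev.append(back)

    # Backtrack from the cheapest final cluster to recover the per-phase assignment.
    k = min(range(n_clusters), key=lambda c: best[-1][c])
    total = best[-1][k]
    assignment = [k]
    for p in range(n_phases - 1, 0, -1):
        k = prev[p][k]
        assignment.append(k)
    assignment.reverse()

    # Merge consecutive phases on the same cluster into intervals.
    intervals: List[Interval] = []
    start = 0
    for p in range(1, n_phases + 1):
        if p == n_phases or assignment[p] != assignment[start]:
            intervals.append((start, p - 1, assignment[start]))
            start = p
    return total, intervals


# Illustrative (made-up) predictions: 4 phases, 2 clusters.
example_times = [[10, 30], [10, 30], [40, 5], [40, 5]]
total_cost, plan = rescheduling_plan(example_times, rcost=8)
print(total_cost, plan)
```

With these made-up numbers the plan runs the first two phases on cluster 0 and then migrates once, because one rescheduling cost is cheaper than staying on either cluster throughout. An exhaustive method like this scales with phases times clusters squared; heuristics become attractive when the cost of each interval cannot be decomposed per phase.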

Rescheduling Strategies
• Performed experiments with five large-scale multi-phase parallel applications
  • Molecular dynamics, n-body simulations, astrophysical gas dynamics, crack propagation, electromagnetics
[Figure: Huge Benefits due to Rescheduling]

Performance Modeling (JPDC, CPE)
• It is imperative to automatically derive “knowledge” (performance characteristics) of applications
• Can be used for effective mapping of applications to resources
• Built techniques for automatically deriving performance model functions for predicting execution costs of parallel applications on grids
• First effort to deal with load changes during application executions
• Less than 30% modeling errors, the best reported for non-dedicated systems
• Have also developed novel scheduling algorithms that use the model functions
• Generates 80% better schedules than existing approaches
[Figure: Performance Model Accuracy for Parallel QR]
[Figure: Scheduling Results by Scheduling Method; Box Elimination (BE) [red bars] 50-80% more efficient]
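As a rough illustration of deriving a performance model function from measurements, the sketch below fits a very simple form, t = a + b * (work / procs), by ordinary least squares and reports the mean percentage modeling error. The model form, the function names and the sample data are all assumptions made for illustration; the slide does not give the actual model functions, and this sketch ignores the load changes that the authors' techniques handle.

```python
from typing import List, Tuple

Sample = Tuple[float, int, float]  # (work, procs, measured_time)


def fit_model(samples: List[Sample]) -> Tuple[float, float]:
    """Least-squares fit of t = a + b * (work / procs); returns (a, b)."""
    xs = [w / p for w, p, _ in samples]
    ts = [t for _, _, t in samples]
    n = len(samples)
    mean_x, mean_t = sum(xs) / n, sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


def predict(model: Tuple[float, float], work: float, procs: int) -> float:
    """Predicted execution time under the assumed model form."""
    a, b = model
    return a + b * work / procs


def mean_percent_error(model: Tuple[float, float], samples: List[Sample]) -> float:
    """Average of |predicted - measured| / measured, in percent."""
    errs = [abs(predict(model, w, p) - t) / t for w, p, t in samples]
    return 100.0 * sum(errs) / len(errs)


# Synthetic samples generated from t = 2 + 0.5 * (work / procs), not real measurements.
training = [(100.0, 1, 52.0), (100.0, 2, 27.0), (100.0, 4, 14.5), (200.0, 4, 27.0)]
model = fit_model(training)
print(model, mean_percent_error(model, training))
```

A scheduler could call `predict` for each candidate resource set and pick the cheapest mapping, which is the role the slide assigns to the model functions.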

Grid Middleware
• Created a grid middleware for parallel multi-phase applications with rescheduling capabilities
• Have successfully run multi-phase applications on a grid consisting of multiple batch and interactive clusters at two geographically distributed sites
• Also created a grid middleware for multi-component applications that coordinates the execution of the components on the different systems
[Figures: Grid Middleware for Multi-Component Applications; Grid Middleware for Multi-Phase Applications]

Other Research
• Checkpointing Interval Selection
  • For efficient execution in the presence of failures
  • A Markov model consisting of 3 kinds of states for performance prediction
  • Extensive simulations with 9-year real supercomputer failure traces on 8 parallel systems, 3 rescheduling policies, and 3 parallel applications
  • Our model’s checkpointing intervals lead to a high amount of useful work by the applications in the presence of failures
• Compiler-aided checkpointing instrumentation
  • A source-to-source precompiler for automatic insertion of checkpointing calls
  • Performs live-variable analysis for determining the data to checkpoint, and uses wrappers for finding data sizes
  • Can handle parallel applications with block distribution (molecular dynamics)
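To give a feel for why the checkpointing interval matters, the sketch below uses Young's classical first-order approximation, interval = sqrt(2 * C * M), where C is the cost of one checkpoint and M is the mean time between failures. This is a swapped-in textbook estimate, not the Markov model described on the slide; the function names and the numbers in the example are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order estimate of the checkpoint interval (same time unit as inputs)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost: checkpoint writes plus expected rework after a failure.

    checkpoint_cost / interval  : time spent writing checkpoints
    interval / (2 * mtbf)       : on average half an interval is redone per failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Hypothetical values: 50 s per checkpoint, one failure every 10,000 s on average.
cost, mtbf = 50.0, 10_000.0
best = young_interval(cost, mtbf)
for candidate in (best / 4, best, best * 4):
    print(round(candidate), round(overhead_fraction(candidate, cost, mtbf), 3))
```

Checkpointing too often wastes time writing state, and checkpointing too rarely wastes time recomputing lost work; the estimate balances the two terms. Models built on real failure traces, such as the one summarized above, can capture effects this approximation ignores.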

Summary
• Primary endeavor is to aid scientific advancement in different domain areas using grid systems
• Grid research in two different application areas resulted in significant application benefits from using grids
• Contributed novel scheduling and rescheduling algorithms, performance modeling strategies and robust grid middleware for use by the scientific community

Areas of Collaborations
• Scalability of large-scale and petascale applications
• Fault tolerance in high performance systems
• Setting up Indo-US grids
• Grid middleware collaborations
Thank You
