Adaptive Computing on the Grid Using AppLeS Francine Berman, Richard Wolski, Henri Casanova, Walfredo Cirne, Holly Dail, Marcio Faerman, Silvia Figueira, Jim Hayes, Graziano Obertelli, Jennifer Schopf, Gary Shao, Shava Smallen, Neil Spring, Alan Su, and Dmitrii Zagorodnov IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 5, May 2003
Agenda • Introduction • Problems • AppLeS and its components • Resulting products • Related work • Discussion • Conclusions
Introduction • What is a Grid? • A collection of resources that can be used as an ensemble • What are resources? • Computational devices, networks, online instruments, storage archives, etc.
Problems • Heterogeneity • Resources differ in performance • Inconsistency • Resources are shared • Resources can fail • Resources can be upgraded
AppLeS Project • Application Level Scheduling • Goals • Investigate adaptive scheduling for Grid computing • Apply research results to applications for validating the efficacy of the approach and extracting Grid performance for the end-user
Steps (1) Resource Discovery (2) Resource Selection (3) Schedule Generation (4) Schedule Selection (5) Application Execution (6) Schedule Adaptation
Resource Discovery • Depends on the Grid • A list of the user's logins • The resource discovery services of each Grid
Resource Selection • Simple SARA • Synthetic Aperture Radar Atlas • Developed by JPL and SDSC • Provides access to satellite images distributed across various repositories • End-to-end available bandwidth is predicted using the NWS (Network Weather Service)
Performance Modeling • Jacobi 2D • Main loop • Loop until convergence • For all matrix entries A_{i,j}: A_{i,j} = ¼(A_{i+1,j} + A_{i-1,j} + A_{i,j+1} + A_{i,j-1}) • Compute local error • Model • T_i = Area_i * Oper_i * AvailCPU_i + C_i ; 1 <= i <= p • Area - the size of the strip, Oper - execution time to compute one entry, AvailCPU - percentage of available CPU, C - communication time
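The per-processor time model above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name is mine, and I read the CPU-availability percentage as a divisor, since more available CPU should mean less compute time.

```python
# Sketch of the slide's per-processor time model for Jacobi 2D.
# Assumption: avail_cpu is the fraction of the CPU the application gets,
# so compute time scales with 1/avail_cpu. Names are illustrative.

def strip_time(area, oper, avail_cpu, comm):
    """Predicted iteration time for one processor's strip:
    area      - number of matrix entries in the strip
    oper      - time to update one entry on a dedicated CPU
    avail_cpu - fraction of CPU available (0 < avail_cpu <= 1)
    comm      - time to exchange boundary rows with neighbors
    """
    return area * oper / avail_cpu + comm

# An iteration finishes when the slowest processor does, so a good
# partition equalizes strip_time across processors.
times = [strip_time(a, 0.5, c, 2.0) for a, c in [(100, 1.0), (100, 0.5)]]
iteration_time = max(times)
```

This is why AppLeS resizes strips per processor: giving a loaded machine (low `avail_cpu`) a smaller `area` equalizes the per-strip times and minimizes the max.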
Schedule Generation • Complib • A computational biology application • Compares a library of unknown sequences against a database of "known" sequences using the FASTA scoring method • Parallelization • Master/Worker • Work unit size • Small units (self-scheduling) - high overhead • Large units - load imbalance
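The unit-size tradeoff can be made concrete with a toy master/worker simulation. This is a hypothetical sketch (function and parameter names are mine, not from the paper): each dispatched unit pays a fixed overhead, and small units let fast workers grab more of the work.

```python
# Toy simulation of the master/worker unit-size tradeoff:
# small units balance load across heterogeneous workers but pay
# per-unit overhead; large units amortize overhead but can leave
# a slow worker holding a huge chunk.
def run_master_worker(total_work, unit_size, worker_speeds, overhead):
    """Return completion time: the earliest-free worker repeatedly
    grabs the next unit; 'overhead' is paid per unit dispatched."""
    finish = [0.0] * len(worker_speeds)   # per-worker finish times
    remaining = total_work
    while remaining > 0:
        unit = min(unit_size, remaining)
        w = finish.index(min(finish))     # earliest-free worker
        finish[w] += overhead + unit / worker_speeds[w]
        remaining -= unit
    return max(finish)

# With one fast and one slow worker, small units win despite overhead:
fast_slow = [10.0, 1.0]
t_small = run_master_worker(100, 1, fast_slow, 0.01)
t_big = run_master_worker(100, 50, fast_slow, 0.01)
```

Here `t_small < t_big`: with 50-entry units the slow worker is stuck with half the work, while 1-entry units let the fast worker take most of it.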
Schedule Adaptation • MCell • A computational neuroscience application • Studies biochemical interactions within living cells at the molecular level • Many independent tasks • Shared input files
XSufferage • Based on Sufferage • Sufferage value = second-best completion time - best completion time • XSufferage additionally accounts for data replication time (zero when the input is locally available)
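The Sufferage heuristic above can be sketched as a greedy loop over a completion-time matrix. This is an illustrative sketch, not the paper's code: `ct[t][h]` is an assumed task-on-host execution-time matrix, and XSufferage would fold data-replication time into these entries (zero where the input is already local) and compute sufferage values per cluster rather than per host.

```python
# Sketch of the Sufferage heuristic: repeatedly schedule the task that
# would "suffer" most (second-best minus best completion time) if it
# did not get its best host.
def sufferage_schedule(ct):
    """ct[t][h]: execution time of task t on host h.
    Returns a dict mapping each task to its assigned host."""
    ready = [0.0] * len(ct[0])      # per-host ready times
    unscheduled = set(range(len(ct)))
    schedule = {}
    while unscheduled:
        best_task, best_host, best_suff = None, None, -1.0
        for t in unscheduled:
            # Completion times of task t on every host, best first.
            comp = sorted((ready[h] + ct[t][h], h) for h in range(len(ct[t])))
            suff = comp[1][0] - comp[0][0] if len(comp) > 1 else 0.0
            if suff > best_suff:
                best_task, best_host, best_suff = t, comp[0][1], suff
        ready[best_host] += ct[best_task][best_host]
        schedule[best_task] = best_host
        unscheduled.remove(best_task)
    return schedule

# Task 0 suffers badly (1 vs 10) without host 0, so it is placed first.
assignment = sufferage_schedule([[1.0, 10.0], [1.0, 2.0]])
```

The intuition: a task whose best and second-best hosts are nearly equal loses little if preempted, so tasks with large sufferage get priority on their best host.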
Outcome • APST - AppLeS Parameter Sweep Template • AMWAT - AppLeS Master/Worker Application Template • SA - Supercomputer AppLeS
APST • Parameter Sweep Applications • Mostly independent tasks • Provides • Transparent deployment • Automatic scheduling • Capabilities • Launching tasks • Moving and storing data • Discovering and monitoring resources
AMWAT • Master/Worker applications • Provides • APIs for • Discovering • Scheduling • Predicting • Scheduling strategies: SS - Self-Scheduling, FSC - Fixed-Size Chunking, GSS - Guided Self-Scheduling, TSS - Trapezoidal Self-Scheduling, FAC2 - Factoring
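Two of the chunking strategies listed above can be sketched with their standard chunk-size rules from the loop-scheduling literature (function and parameter names are mine; this is not AMWAT's API).

```python
# Illustrative chunk-size rules for two of the self-scheduling
# variants above, for n work units and p workers.
import math

def gss_chunks(n, p):
    """Guided Self-Scheduling: each successive chunk is
    (remaining work) / p, so chunks shrink geometrically."""
    chunks, remaining = [], n
    while remaining > 0:
        c = max(1, math.ceil(remaining / p))
        chunks.append(c)
        remaining -= c
    return chunks

def fac2_chunks(n, p):
    """Factoring (FAC2): each round hands out p equal chunks
    covering half of the remaining work."""
    chunks, remaining = [], n
    while remaining > 0:
        c = max(1, math.ceil(remaining / (2 * p)))
        for _ in range(p):
            if remaining <= 0:
                break
            take = min(c, remaining)
            chunks.append(take)
            remaining -= take
    return chunks
```

Both start with large chunks to amortize dispatch overhead and finish with small ones to smooth out load imbalance, the compromise between the two extremes noted on the Complib slide.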
SA • Supercomputer AppLeS • Targets space-shared supercomputers • Adapts moldable jobs • Reduces response times
Related Work • Environment • MARS and Dome - run-time checkpointing environments • Structure • MARS - SPMD • VDCE and SEA - task graph • IOS - real-time, fine-grained, task graph • Dome and SPP - abstract language • Dome - SPMD • SPP - task graph • Performance model • Depends on program structure • Objective • Minimize execution time
Discussion • The performance of distributed applications depends on both application-specific and platform-specific information • Storage and compute services are usually separated • Communication must be accounted for in the performance model • Multi-application environments have not yet been addressed
Conclusions • AppLeS • An application-level scheduling framework • Provides adaptive, flexible, and reusable components • Being integrated into GrADS for building next-generation Grid applications • Each component has demonstrated performance improvements