Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources • Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Karan Vahi, Jin-Soo Kim • Oracle US Inc • Korea Advanced Institute of Science and Technology • Information Sciences Institute/University of Southern California • Sungkyunkwan University
Overview • Motivation • Background • Pegasus • Virtual Grid • Pegasus-VG Proxy • Conclusion • Discussion
Motivation • Challenges in scientific application development • Data/control flow, task scheduling, data replication, fault tolerance, etc. • Challenges in resource management • Availability, performance, cost, reliability, fault tolerance, etc. • How can we leverage existing cyberinfrastructure for easy and efficient scientific computing?
Separation of Concerns • Application domain • Workflow management: application management can be conducted independently of the target execution environment • E.g., Pegasus, Askalon, Triana • Resource domain • Resource provisioning: resource management can be encapsulated underneath abstractions or virtualization • E.g., Virtual Grid, virtual cluster, cloud
Pegasus • A framework for workflow planning and execution • Workflow lifecycle • Design: describe the data/control flows of the application via an abstract workflow • Planning: map the workflow tasks onto physical resources • Execution: schedule and run the workflow tasks on the mapped resources
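The three lifecycle stages can be illustrated with a toy sketch. This is not how the Pegasus mapper works internally; it is a minimal stand-in in which the abstract workflow is a dictionary of task dependencies, planning assigns tasks to sites round-robin in dependency order, and execution calls a user-supplied callback. The task names, site names, and the functions `plan` and `execute` are all hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Design: abstract workflow as task -> set of parent tasks (hypothetical example).
abstract = {
    "preprocess": set(),
    "findrange_1": {"preprocess"},
    "findrange_2": {"preprocess"},
    "analyze": {"findrange_1", "findrange_2"},
}

def plan(workflow, sites):
    """Planning: assign each task to a site, round-robin, in dependency order."""
    order = list(TopologicalSorter(workflow).static_order())
    return [(task, sites[i % len(sites)]) for i, task in enumerate(order)]

def execute(executable, run):
    """Execution: run tasks in planned order; `run` is a caller-supplied callback."""
    return [run(task, site) for task, site in executable]

executable = plan(abstract, ["site_a", "site_b"])
log = execute(executable, lambda task, site: f"{task}@{site}")
print(log)
```

A real planner would also weigh data location, site capabilities, and transfer jobs; the round-robin choice here only keeps the example short.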
Pegasus Workflow Management • [Figure: the Pegasus mapper turns an abstract workflow into an executable workflow; Condor DAGMan runs its tasks through Condor on the computing environment (e.g., a Condor pool); monitoring information and provenance flow back]
Virtual Grid • A programmable virtualized resource provisioning framework • Components • vgDL (Virtual Grid Description Language) • Specifies resource requirements • vgES (Virtual Grid Execution System) • Compiles and coordinates resources • PC (Personal Cluster) • Provides uniform job management
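Virtual Grid Resource Abstraction • [Figure: an application workflow (tasks A, B, C, D) requests resources with vgdl = ClusterOf (node) [2] { node = [Processor == “P4”] }; the request passes through classification, selection, and binding over timeshare and batch (PBS) environments, and yields a leased VG of two P4 nodes on which the program runs]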
Pegasus on Virtual Grid • Scope • A basic integration for workflow planning and execution over provisioned resources • Issues • Resource capacity estimation • Resource specification (vgDL) synthesis for Virtual Grid • Resource information publication • Site catalog generation for Pegasus
Resource Capacity Estimation • What Virtual Grid expects from Pegasus • vgDL description • Available information • Task execution time, data transfer time, performance metrics, minimum memory capacity, cost, deadline, etc. • Unknown information • # of virtual processors • Resource capacity estimate • Find the minimum # of processors that can execute a workflow within a deadline
BTS (Balanced Time Scheduling) • [Figure: an example workflow with a table of task IDs and execution times (ET), and its schedule over time on processors p1 and p2] • How many processors do we need to run this workflow within 7 time units? • Ref: E.-K. Byun, Y.-S. Kee, et al., e-Science ’08
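The question on this slide can be explored with a minimal sketch. The code below is not the BTS algorithm from the cited paper; it swaps in plain greedy list scheduling and tries 1, 2, 3, ... processors until the simulated makespan fits the deadline. The workflow (task names, execution times, dependencies) and the function names are hypothetical, and data transfer time is ignored.

```python
import heapq

def makespan(exec_time, parents, num_procs):
    """Greedy list scheduling: run each ready task on the earliest-free processor."""
    children = {t: [] for t in exec_time}
    for task, deps in parents.items():
        for d in deps:
            children[d].append(task)
    pending = {t: len(parents.get(t, ())) for t in exec_time}
    ready_at = {t: 0 for t in exec_time}        # time at which all parents are done
    ready = [(0, t) for t in sorted(exec_time) if pending[t] == 0]
    heapq.heapify(ready)
    procs = [0] * num_procs                      # time at which each processor frees up
    end = 0
    while ready:
        release, task = heapq.heappop(ready)
        start = max(release, heapq.heappop(procs))
        finish = start + exec_time[task]
        heapq.heappush(procs, finish)
        end = max(end, finish)
        for child in children[task]:
            ready_at[child] = max(ready_at[child], finish)
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, (ready_at[child], child))
    return end

def min_processors(exec_time, parents, deadline):
    """Smallest processor count whose greedy schedule meets the deadline, else None."""
    for n in range(1, len(exec_time) + 1):
        if makespan(exec_time, parents, n) <= deadline:
            return n
    return None  # even one processor per task misses the deadline

# Hypothetical workflow: A fans out to B, C, D, which all feed E.
exec_time = {"A": 1, "B": 3, "C": 2, "D": 2, "E": 1}
parents = {"B": ["A"], "C": ["A"], "D": ["A"], "E": ["B", "C", "D"]}
print(min_processors(exec_time, parents, deadline=7))
```

Because the greedy schedule is only one valid schedule, the returned count is sufficient to meet the deadline but not guaranteed to be the true minimum.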
Example • Execution time of each task: on a Xeon processor • Data transfer time: network with 1 Gbps bandwidth • Deadline: 1 hour • [Figure: diamond workflow f.input → preprocess → findrange (×2) → analyze → f.output] • vgDL: Diamond = ClusterOf [2] (nd) [, 0:30:00] { nd = [Processor == “Xeon”] }
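Once a processor count is known, producing the vgDL text is a formatting step. The sketch below is a hypothetical helper, not part of Pegasus or vgES: it emits only the simple `ClusterOf (nd) [n] { ... }` form that appears elsewhere in this deck and leaves out the `[, 0:30:00]` field from the example above, whose meaning the slide does not spell out. The full vgDL grammar is richer than this.

```python
def synthesize_vgdl(name, num_procs, processor, node_var="nd"):
    """Format a minimal cluster request in the style shown on the slides."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    return (f'{name} = ClusterOf ({node_var}) [{num_procs}] '
            f'{{ {node_var} = [Processor == "{processor}"] }}')

print(synthesize_vgdl("Diamond", 2, "Xeon"))
# prints: Diamond = ClusterOf (nd) [2] { nd = [Processor == "Xeon"] }
```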
Resource Information Publication • What Pegasus expects from Virtual Grid • Site catalog • Virtual Grid • VG instance • Resource information publication • Devirtualize a VG instance and generate a site catalog for Pegasus
Personal Cluster • A partition of resources dedicated to a user, under the control of a user-level resource manager, for a limited time period • [Figure: personal clusters exposed through GT4/PBS interfaces] • Ref: Y.-S. Kee and C. Kesselman, HCW ’08
Site Catalog Publication
<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" …>
  …
    <profile namespace="env" key="PEGASUS_HOME">/home/globus/pegasus-2.1.0</profile>
    <profile namespace="condor" key="grid_type">gt4</profile>
    <profile namespace="condor" key="jobmanager_type">PBS</profile>
    <lrc url="rlsn://cat7.kaist.ac.kr" />
    <gridftp url="gsiftp://cat7.kaist.ac.kr:2811" storage="/home/globus" major="4" minor="0" patch="7" />
    <jobmanager universe="transfer" url="https://cat7.kaist.ac.kr:9000/wsrf/services/ManagedJobFactoryService" major="4" minor="0" patch="7" total-nodes="2" />
    <jobmanager universe="vanilla" url="https://cat7.kaist.ac.kr:9000/wsrf/services/ManagedJobFactoryService" major="4" minor="0" patch="7" total-nodes="2" />
    <workdirectory>$HOME/workdir</workdirectory>
  </site>
  …
</sitecatalog>
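Generating such an entry from a provisioned VG instance might look like the sketch below. It is an illustration under assumptions, not the proxy's actual code: `build_site` and its parameters are hypothetical, only a subset of the elements above is produced, and the `<site>` element carries no attributes because its opening tag is elided on the slide. The port numbers and service path are copied from the example above.

```python
import xml.etree.ElementTree as ET

def build_site(host, total_nodes, gt_version=(4, 0, 7), jobmanager_type="PBS"):
    """Build a <site> element from a (hypothetical) devirtualized VG instance."""
    major, minor, patch = (str(v) for v in gt_version)
    version = {"major": major, "minor": minor, "patch": patch}
    site = ET.Element("site")  # attributes are elided on the slide, so none are set
    for namespace, key, value in [
        ("condor", "grid_type", "gt4"),
        ("condor", "jobmanager_type", jobmanager_type),
    ]:
        profile = ET.SubElement(site, "profile", namespace=namespace, key=key)
        profile.text = value
    ET.SubElement(site, "gridftp", url=f"gsiftp://{host}:2811", **version)
    factory = f"https://{host}:9000/wsrf/services/ManagedJobFactoryService"
    for universe in ("transfer", "vanilla"):
        attrs = dict(universe=universe, url=factory, **version)
        attrs["total-nodes"] = str(total_nodes)  # hyphenated name, so set via dict
        ET.SubElement(site, "jobmanager", attrs)
    return site

site = build_site("cat7.kaist.ac.kr", total_nodes=2)
print(ET.tostring(site, encoding="unicode"))
```

Building the tree with `xml.etree.ElementTree` rather than string concatenation keeps the output well-formed and escapes attribute values automatically.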
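Workflow Planning over Provisioned Resources • [Figure: the VG-Pegasus Proxy sits between Pegasus and Virtual Grid] • Creation: BTS takes the abstract workflow and produces a vgDL, e.g., vgdl = ClusterOf (nd) [2] { nd = [Proc==“Xeon”] }, from which Virtual Grid creates a VG • Planning: the VG is devirtualized into a site catalog, which Pegasus uses to turn the abstract workflow into an executable workflow • Scheduling/Execution: the executable workflow runs on the provisioned resources via GT4+PBS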
Conclusion • Pegasus on Virtual Grid • Implements workflow planning and execution over on-demand captive resources • Enables easy and efficient application development and execution • Issues • Resource capacity estimation • Site catalog publication
Discussion • Effective performance • What is the cost that a user has to pay to have a successful execution? • Ongoing studies • Fine-grain planning for resource provisioning • Performance, cost, reliability • Workflow execution for virtualization • Recovery of failed tasks
Need More Information? • Pegasus • http://pegasus.isi.edu • VGrADS • Tuesday, 11:30am, RENCI booth (2633) • Wednesday, noon, GCAS booth (285) • Wednesday, 2:00pm, SDSC booth (568) • Wednesday, 4:00pm, RENCI booth (2633)
Questions & Answers