Workflow Management in Grids Ewa Deelman Center for Grid Technologies USC Information Sciences Institute
People Involved • Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Jaskaran Singh, Karan Vahi (ISI) • Jim Blythe, Yolanda Gil (ISI) • Anirban Mandal (Rice)
Outline • The GriPhyN project and Grid Applications • Workflow Management in Grids • Pegasus, Planning for Execution in Grids • Framework Description • Generation of Executable Workflows • Based on abstract workflow descriptions • Based on application-level metadata • Application of Pegasus to the LIGO pulsar search • Future Research Directions
GriPhyN Data Grid Challenge • Provide a framework that enables Virtual Organizations around the world to perform computationally demanding analysis of large, geographically distributed datasets • The Virtual Organizations are large and highly distributed • The datasets are large, currently on the order of terabytes and expected to grow to hundreds of petabytes in the next decade • Provide seamless access to data, whether experimental raw data or processed data products • Enable a user or application to ask for any domain-specific data, whether already computed or not: the concept of Virtual Data
GriPhyN Project • 5-year, $12.5M NSF ITR proposal to realize the concept of virtual data • Four Applications: • ATLAS and CMS experiments at the Large Hadron Collider at CERN • SDSS (Sloan Digital Sky Survey) • LIGO (Laser Interferometer Gravitational-wave Observatory) • Key research areas: • Virtual data technologies (information models, management of virtual data software, etc.) • Request planning and scheduling (including policy representation and enforcement) • Task execution (fault tolerance) • CS Participants: U.Chicago, USC/ISI, UW-Madison, UCSD, UCB, Indiana, Northwestern, Florida • Close collaboration with, and participation from, the physics community
Applications • Increasing level of complexity • Use of individual application components • Reuse of individual intermediate data products • Description of data products using metadata attributes • Execution environment is complex and very dynamic • Resources come and go • Data is replicated • Components can be found at various locations or staged in on demand • Separation between the application description and the actual execution description
Workflow generation in two stages: Abstract Workflow Generation, then Concrete Workflow Generation
Generating an Abstract Workflow • Available information • Specification of component capabilities • Ability to generate the desired data products • Select and configure application components to form an abstract workflow • Assign input files that exist or that can be generated by other application components • Specify the order in which the components must be executed • Components and files are referred to by their logical names • Logical transformation name • Logical file name • Both transformations and data can be replicated
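A minimal sketch of what such an abstract workflow might look like as data, assuming Python 3.9+ for `graphlib`. The transformation names and logical file names (`extract`, `sft`, `search`, `raw.lfn`, and so on) are hypothetical and are not taken from any real catalog. The execution order is derived from which component produces the file another component consumes, which is one way to "specify the order in which the components must be executed".

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: logical transformation name -> logical file names.
# Nothing here refers to a physical file, an executable path, or a host.
ABSTRACT = {
    "extract": {"inputs": ["raw.lfn"], "outputs": ["channel.lfn"]},
    "sft": {"inputs": ["channel.lfn"], "outputs": ["sft.lfn"]},
    "search": {"inputs": ["sft.lfn"], "outputs": ["candidates.lfn"]},
}


def execution_order(workflow):
    """Order components so that every producer runs before its consumers."""
    # Map each logical file to the component that generates it.
    producer = {
        lfn: name for name, job in workflow.items() for lfn in job["outputs"]
    }
    # A component depends on the producers of its inputs; inputs with no
    # producer (such as raw.lfn) must already exist somewhere.
    deps = {
        name: {producer[lfn] for lfn in job["inputs"] if lfn in producer}
        for name, job in workflow.items()
    }
    return list(TopologicalSorter(deps).static_order())


print(execution_order(ABSTRACT))
```

Because the description uses only logical names, the same abstract workflow can later be bound to different replicas and sites.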
Generating a Concrete Workflow • Information • Location of files and component instances • State of the Grid resources • Select specific • Resources • Files • Add the jobs required to form a concrete workflow that can be executed in the Grid environment • Data movement • Data registration • Each component in the abstract workflow is turned into an executable job
Why Automate Workflow Generation? • Usability: limit the Grid knowledge a user needs • Monitoring and Directory Service • Replica Location Service • Transformation Catalog • Complexity: • User needs to make choices • Alternative application components • Alternative files • Alternative locations • The user may reach a dead end • Many different interdependencies may occur among components • Solution cost: • Evaluate the alternative solution costs • Performance • Reliability • Resource usage • Global cost: • Minimizing cost within a community or a virtual organization • Requires reasoning about an individual user’s choices in light of other users’ choices
Pegasus, Planning for Execution in Grids • Performs the mapping from an abstract workflow to a concrete workflow, which can be executed on the Grid • Isolates the user from many Grid details • Automatically finds physical locations for both components (transformations) and data, via Globus RLS and the Transformation Catalog • Finds appropriate resources to execute the components (via Globus MDS) • Whenever several alternatives are possible (e.g., alternative physical files, alternative resources), it makes a random choice • Publishes newly derived data products • Reuses existing data products where applicable
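The mapping step with a random choice among alternatives can be sketched as follows. This is an illustration only: the two dictionaries stand in for the Replica Location Service and the Transformation Catalog and do not use their real interfaces, and every host name, path and site name is made up.

```python
import random

# Stand-in for the replica catalog: logical file name -> known physical copies.
REPLICA_CATALOG = {
    "sft.lfn": [
        "gsiftp://site-a.example.org/data/sft",
        "gsiftp://site-b.example.org/data/sft",
    ],
}

# Stand-in for the transformation catalog: logical transformation name ->
# (site, path of the installed executable).
TRANSFORMATION_CATALOG = {
    "search": [("site-a", "/opt/bin/search"), ("site-b", "/usr/local/bin/search")],
}


def map_job(transformation, inputs, rng):
    """Bind one abstract job to a site, an executable and physical input files."""
    # Several sites may offer the component: pick one at random.
    site, executable = rng.choice(TRANSFORMATION_CATALOG[transformation])
    # Several replicas may exist for each input: pick one at random.
    replicas = {lfn: rng.choice(REPLICA_CATALOG[lfn]) for lfn in inputs}
    return {"site": site, "executable": executable, "inputs": replicas}


print(map_job("search", ["sft.lfn"], random.Random()))
```

A random choice always yields a feasible binding when one exists, but it makes no attempt to pick a cheap one.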
Abstract Workflow Reduction • [Figure: DAG of jobs a through i. Key: original node, input transfer node, registration node, output transfer node, node deleted by the reduction algorithm] • The output jobs for the DAG are all the leaf nodes, i.e. f, h, i • Each job requires 2 input files and generates 2 output files • The user specifies the output location
Optimizing from the point of view of Virtual Data • [Figure: the same DAG of jobs a through i, with the nodes deleted by the reduction algorithm marked] • Jobs d, e, f have output files that have been found in the Replica Location Service • Additional jobs that are no longer needed are deleted as well • In all, jobs a, b, c, d, e, f are removed from the DAG
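A small sketch of a reduction of this kind, walking backwards from the requested leaf jobs. The edge structure in `PARENTS` is hypothetical, since the slide shows the DAG only as a picture; it is chosen so that the leaves are f, h, i. This is one straightforward way to obtain the result described above, not necessarily the algorithm Pegasus implements.

```python
from collections import deque

# Hypothetical dependencies (job -> jobs it consumes output from).
PARENTS = {
    "a": [], "b": [], "c": [],
    "d": ["a", "b"], "e": ["b", "c"], "f": ["c"],
    "g": ["d", "e"],
    "h": ["g"], "i": ["g"],
}


def reduce_workflow(parents, leaves, available):
    """Return the jobs that still have to run.

    `available` holds the jobs whose output files already exist as replicas.
    Such a job is skipped, and its ancestors are never visited through it.
    """
    needed = set()
    queue = deque(leaves)
    while queue:
        job = queue.popleft()
        if job in available or job in needed:
            continue
        needed.add(job)
        queue.extend(parents.get(job, ()))
    return needed


remaining = reduce_workflow(PARENTS, ["f", "h", "i"], available={"d", "e", "f"})
print(sorted(remaining))            # jobs kept in the DAG
print(sorted(set(PARENTS) - remaining))  # jobs removed by the reduction
```

With outputs of d, e, f already registered, the walk never reaches a, b or c, so only g, h and i remain.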
Planner picks execution and replica locations • [Figure: the DAG of jobs a through i. Key: original node, input transfer node, registration node, output transfer node, node deleted by the reduction algorithm] • Plans for staging data in: adding transfer nodes for the input files for the root nodes
Staging data out and registering new derived products in the RLS • [Figure: the DAG of jobs a through i. Key: original node, input transfer node, registration node, output transfer node, node deleted by the reduction algorithm] • Transferring the output files of the leaf job (f) to the output location • Staging and registering for each job that materializes data (g, h, i)
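The staging and registration steps can be sketched as edges added around the jobs that survive the reduction. The dependency structure and all node names (`stage_in_*`, `stage_out_*`, `register_*`, `replica_*`) are hypothetical labels for illustration, not names produced by Pegasus.

```python
# Hypothetical dependencies (job -> jobs it consumes output from).
PARENTS = {
    "a": [], "b": [], "c": [],
    "d": ["a", "b"], "e": ["b", "c"], "f": ["c"],
    "g": ["d", "e"],
    "h": ["g"], "i": ["g"],
}


def concretize(jobs, parents, requested):
    """Return the edges of an executable DAG for the remaining `jobs`."""
    edges = []
    for job in sorted(jobs):
        for parent in sorted(parents.get(job, ())):
            if parent in jobs:
                # The parent still runs: keep the original dependency.
                edges.append((parent, job))
            else:
                # The parent was removed because its output already exists:
                # stage that output in from a replica instead.
                edges.append((f"stage_in_{parent}_to_{job}", job))
        # Every job that materializes data stages it out and registers it.
        edges.append((job, f"stage_out_{job}"))
        edges.append((f"stage_out_{job}", f"register_{job}"))
    for leaf in sorted(set(requested) - set(jobs)):
        # A requested output that already exists is only copied to the
        # user's output location; nothing is recomputed or re-registered.
        edges.append((f"replica_{leaf}", f"stage_out_{leaf}"))
    return edges


for edge in concretize({"g", "h", "i"}, PARENTS, requested=["f", "h", "i"]):
    print(edge)
```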
The final, executable DAG • [Figure: the input DAG of jobs a through i beside the final executable DAG, which keeps jobs g, h, i. Key: original node, input transfer node, registration node, output transfer node]
Pegasus’ Initial Solution • Reused existing data products • Used in a variety of complex applications • Astronomy • Bioinformatics • High-energy physics • Cluster of 25 dual-processor Pentium machines • Computation: 7 days, 678 jobs with 250 events each • Produced ~200 GB of simulated data • Provided a feasible solution, but not necessarily a low-cost one • To improve the quality of the solution: • Need to efficiently search a large problem space and apply both local and global optimizations • Need to compose various optimization strategies • Need to reuse the resource and component models
AI Planning technologies • Provide a broad-based, generic implementation foundation • Use techniques such as backtracking, together with domain-specific and domain-independent control rules • Allow the easy addition of new system constraints and rules • Incorporate optimality and policy into the search for solutions • Can integrate the generation of workflows across users and policies within VOs • Collaboration with Yolanda Gil and Jim Blythe (ISI)
AI-based Abstract and Concrete Planner • Goal (provided by the user) • A metadata specification of the information the user requires and the desired location for the output file • Initial state (automatically extracted) • Information about the state of the Grid • Information about data location • Operators (encoded for the application domain) • Represent the execution of a component at a particular location and the generation of particular file(s) • Represent a file movement across the network • Search control rules (Grid or application specific) • Specify options that should be exclusively considered at any choice point in the search algorithm (e.g., execute “close” to the data)
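To make the goal, initial state and operator vocabulary concrete, here is a toy planner. It is a sketch under strong assumptions: plain breadth-first search stands in for the AI planner, there are no control rules, and the sites, the component and the file names are all invented. A state is a set of (file, site) facts; the two operators are "run a component at a site" and "transfer a file between sites".

```python
from collections import deque

SITES = ["isi", "caltech"]
# Hypothetical component: reads "sft", writes "candidates", installed at caltech only.
COMPONENTS = {
    "search": {"input": "sft", "output": "candidates", "sites": ["caltech"]},
}


def successors(state):
    """Yield (action, next_state) pairs for every applicable operator."""
    # Operator 1: execute a component where it is installed and its input is present.
    for name, comp in COMPONENTS.items():
        for site in comp["sites"]:
            if (comp["input"], site) in state and (comp["output"], site) not in state:
                yield f"run {name} at {site}", state | {(comp["output"], site)}
    # Operator 2: move a file across the network.
    for file_name, src in sorted(state):
        for dst in SITES:
            if (file_name, dst) not in state:
                yield f"transfer {file_name} {src}->{dst}", state | {(file_name, dst)}


def plan(initial, goal):
    """Breadth-first search for a shortest action sequence that reaches `goal`."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [action]))
    return None  # the goal cannot be reached


# Initial state: the input lives at isi. Goal: the result delivered to isi.
print(plan({("sft", "isi")}, ("candidates", "isi")))
```

The plan found stages the input to the site that has the component, runs it there, and ships the result back. A search control rule would prune choices inside `successors` rather than change the search loop.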
Techniques Used by the Planner • Uses search algorithms to explore the solution space • Uses local heuristics to choose between alternatives and prune the solution space • Execute “close” to the data • Global evaluation of alternative plans (user, community specific) • Performance • Reliability • Resource usage
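One possible reading of the "execute close to the data" heuristic is to prefer the site that requires the least input data to be moved. The sketch below assumes that reading; the site names, file names and sizes are invented, and ties are broken alphabetically only to keep the result deterministic.

```python
def closest_site(candidate_sites, input_locations, sizes):
    """Pick the candidate site that needs the fewest bytes staged in.

    input_locations: file name -> set of sites already holding a replica.
    sizes: file name -> size in bytes.
    """
    def bytes_to_move(site):
        # Only inputs with no replica at `site` have to be transferred.
        return sum(
            sizes[name]
            for name, locations in input_locations.items()
            if site not in locations
        )

    return min(sorted(candidate_sites), key=bytes_to_move)


locations = {"sft": {"caltech"}, "ephemeris": {"isi", "caltech"}}
sizes = {"sft": 200_000_000, "ephemeris": 1_000_000}
print(closest_site(["isi", "caltech", "uwm"], locations, sizes))
```

A local rule like this prunes the search cheaply, while the global evaluation of whole plans (performance, reliability, resource usage) would compare complete alternatives afterwards.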
LIGO Experiment (Laser Interferometer Gravitational-Wave Observatory) • Aims to detect gravitational waves predicted by Einstein’s general theory of relativity • Can be used to detect • binary pulsars • mergers of black holes • “starquakes” in neutron stars • Two installations: in Louisiana (Livingston) and Washington State (Hanford) • Other projects: Virgo (Italy), GEO (Germany), TAMA (Japan) • Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum • Data collected during experiments is a collection of multi-channel time series • Analysis is performed in the time and Fourier domains • [Photo: LIGO Livingston]
LIGO’s Pulsar Search (Laser Interferometer Gravitational-wave Observatory) • [Figure: processing pipeline from the interferometer to the event DB. Labels shown: interferometer, raw channels, archive, extract channel, short time frames (30 minutes), Short Fourier Transform, transpose, long time frames, extract frequency range, single frame, construct image, time-frequency image (axes: Hz, time), find candidate, store, event DB]
LIGO’s Problem Decomposition • Originally, the entire pulsar search was conducted inside the LIGO Data Analysis System (LDAS) • Resources were not sufficient to conduct all the desired analysis • The various components of the search (SFT, sSFT, pulsar search) were identified and described in terms of metadata attributes • Subsets of the code were ported to the Linux environment • An interface to LDAS was created to schedule jobs, stage data in and out, and send the results of the analysis to the LIGO event database • These formed the application-level components • High level of interest and participation from the GEO experiment in Germany • LIGO collaborators: Kent Blackburn, Albert Lazzarini (Caltech), Scott Koranda (UWM) and others • GEO collaborators: Maria Alessandra Papa (Albert Einstein Institute, Germany), Alicia Sintes (Balearic Islands University, Spain)
LIGO’s pulsar search at SC 2002 • The pulsar search conducted at SC 2002 • Used LIGO’s data collected during the first scientific run of the instrument • Targeted a set of 1000 locations of known pulsars as well as random locations in the sky • Results of the analysis were published via LDAS (LIGO Data Analysis System) to the LIGO Scientific Collaboration • Performed using LDAS and compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee • Visualization by Marcus Thiebaux
Results • SC 2002 demo: over 58 pulsar searches, a total of 330 tasks, 469 data transfers, 330 output files produced; the total runtime was 11:24:35 • To date: 185 pulsar searches, a total of 975 tasks, 1365 data transfers, 975 output files; total runtime 96:49:47
Summary so far • To date • Applied to gravitational wave physics applications • Solution based on application component composition • Reasoning at the metadata level • Simple optimizations • Challenges Tomorrow • Build and integrate models and ontologies • Application components • Grid components • Apply and extend AI planning technologies to the problem domain • More sophisticated solutions • Generic solution, but also application relevant • Investigate Reliability in Grids
Challenges • Dealing with faults in the system • Fault tolerance • Fault avoidance • Supporting a wide spectrum of applications • Optimizing application performance • Incorporating policies in the planning decisions • Providing data provenance information for derived data
Future Research Directions: Smart middleware for the Grid • Knowledge everywhere: expressive knowledge sources to support all levels of decision-making in the Grid • Intelligent reasoners: use knowledge and operate incrementally to reason about tradeoffs in task allocation • Smart workflows: augment workflows with annotations, such as constraints and alternatives, to support incremental updates by heterogeneous agents Continued Collaboration with Jim Blythe, Yolanda Gil and Hongsuda Tangmunarunkit
[Figure: smart-middleware architecture. Community users supply a high-level specification of desired results, constraints, requirements and user policies. Intelligent reasoners: Workflow Manager, workflow refinement, workflow repair, policy management, resource matching, operating on a smart workflow pool and workflow history. Pervasive knowledge sources: policy KB, application KB, resource KB, other KB. Grid services: policy information services, resource indexes, replica locators, simulation codes, other Grid services. Community distributed resources (e.g., computers, storage, network, simulation codes, data)]
Allow incremental workflow generation • [Figure: levels of abstraction plotted against time. Levels: user’s request, application-level knowledge, relevant components, logical tasks, full abstract workflow, tasks bound to resources and sent for execution. Reasoners: workflow refinement, policy reasoner, workflow repair, onto-based matchmaker. Execution status: partial execution, not yet executed, executed]
Publications • AI forum: • “The Role of Planning in Grid Computing”, Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman, Amit Agarwal, Gaurang Mehta, Karan Vahi, accepted to the International Conference on Automated Planning and Scheduling, 2003 • “Transparent Grid Computing: a Knowledge-Based Approach”, Jim Blythe, Ewa Deelman, Yolanda Gil, Carl Kesselman, accepted to the Innovative Applications of Artificial Intelligence Conference, 2003 • Grid forum: • “Workflow Management in GriPhyN”, chapter in “The Grid Resource Management” book, E. Deelman, J. Blythe, Y. Gil, Carl Kesselman, 2003 • “Mapping Abstract Complex Workflows onto Grid Environments”, E. Deelman, J. Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, Adam Arbree, Richard Cavanaugh, Kent Blackburn, Albert Lazzarini, Scott Koranda, to appear in the Journal of Grid Computing, 2003 • “Pegasus: Planning for Execution in Grids”, Ewa Deelman, Jim Blythe, Yolanda Gil, Carl Kesselman, Technical report GriPhyN 2002-20 • www.griphyn.org, www.isi.edu/~deelman/pegasus.htm