Using Grid Technologies to Support Large-Scale Astronomy Applications


1. Using Grid Technologies to Support Large-Scale Astronomy Applications
Ewa Deelman, Center for Grid Technologies, USC Information Sciences Institute
deelman@isi.edu

2. Outline
• Large-scale applications
• Mapping large-scale applications onto Grid environments
• Pegasus (developed by ISI under the GriPhyN project)
• Supporting Montage (an image mosaicking application) on the Grid
• Recent results of running on the TeraGrid
• Other applications and conclusions

3. Acknowledgements
Pegasus
• Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)
• James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
• http://pegasus.isi.edu
• Research funded as part of the NSF GriPhyN, NVO, and SCEC projects and the EU-funded GridLab
Montage
• Bruce Berriman, John Good, Anastasia Laity (IPAC)
• Joseph C. Jacob, Daniel S. Katz (JPL)
• http://montage.ipac.caltech.edu/
• Montage is funded by NASA's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

4. Grid Applications
• Increasing in level of complexity
• Use of individual application components
• Reuse of individual intermediate data products
• Description of data products using metadata attributes
• Execution environment is complex and very dynamic
• Resources are heterogeneous and distributed across the WAN
• Resources come and go because of failures or policy changes
• Data is replicated
• Components can be found at various locations or staged in on demand
• Separation between the application description and the actual execution description

5. (image-only slide; no transcript text)

6. (image-only slide; no transcript text)

7. Why Automate Workflow Generation?
Usability: limit the Grid knowledge a user needs
• Monitoring and Directory Service
• Replica Location Service
Complexity: the user needs to make choices
• Alternative application components
• Alternative files
• Alternative locations
• The user may reach a dead end
• Many different interdependencies may occur among components
Solution cost: evaluate the costs of alternative solutions
• Performance
• Reliability
• Resource usage
Global cost: minimizing cost within a community or virtual organization
• Requires reasoning about individual users' choices in light of other users' choices

8. Concrete Workflow Generation and Mapping

9. Specifying abstract workflows
• Using GriPhyN tools (Chimera) with the Chimera Virtual Data Language
• Writing the abstract workflow directly, using scripts (that write XML)
• Using high-level workflow composition tools: the Component Analysis Tool (CAT) uses ontologies to describe workflow components

TR galMorph( in redshift, in pixScale, in zeroPoint,
             in Ho, in om, in flat, in image,
             out galMorph ) { … }
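
For the "scripts that write XML" route, a small helper like the sketch below could emit an abstract workflow description. The element names (adag, job, child/parent) follow the general shape of GriPhyN's DAX format, but this is a simplified illustration, not the exact schema.

```python
# Hypothetical sketch: emit an abstract workflow as XML from a script.
import xml.etree.ElementTree as ET

def abstract_workflow_xml(jobs, deps):
    """jobs: {job id: transformation name}; deps: [(parent id, child id)]."""
    root = ET.Element("adag")
    for jid, transformation in jobs.items():
        ET.SubElement(root, "job", id=jid, name=transformation)
    for parent, child in deps:
        # dependencies are expressed as child elements listing their parents
        child_el = ET.SubElement(root, "child", ref=child)
        ET.SubElement(child_el, "parent", ref=parent)
    return ET.tostring(root, encoding="unicode")

dax = abstract_workflow_xml({"ID1": "mProject", "ID2": "mAdd"},
                            [("ID1", "ID2")])
```

The same dictionary-plus-edges input could just as well be produced by a loop over thousands of input images, which is how large Montage-style workflows get generated programmatically.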

10. Generating a Concrete Workflow
Information needed:
• Location of files and component instances
• State of the Grid resources
Steps:
• Select specific resources and files
• Add the jobs required to form a concrete workflow that can be executed in the Grid environment (data movement, data registration)
• Each component in the abstract workflow is turned into an executable job
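
The steps above can be sketched in miniature. The catalogs here are plain dictionaries standing in for the real Grid services (RLS, MDS, Transformation Catalog), and the job tuples are illustrative, not Pegasus's actual data structures.

```python
# Sketch: each abstract component becomes an executable job, with
# data-movement (stage-in) and data-registration jobs added around it.

def concretize(abstract_jobs, replica_catalog, site):
    """abstract_jobs: [(name, inputs, outputs)]; replica_catalog maps
    logical file names to URLs; site is the chosen execution resource.
    Returns an ordered list of concrete jobs."""
    concrete = []
    for name, inputs, outputs in abstract_jobs:
        # stage in any input that is not already at the chosen site
        for f in inputs:
            src = replica_catalog.get(f)
            if src and not src.startswith(site):
                concrete.append(("transfer", f, src, site))
        concrete.append(("execute", name, site))
        for f in outputs:
            # publish the newly derived data product
            concrete.append(("register", f, site))
            replica_catalog[f] = site + "/" + f
    return concrete

rc = {"raw.fits": "gridftp://siteA/raw.fits"}
plan = concretize([("mProject", ["raw.fits"], ["proj.fits"])],
                  rc, "gridftp://siteB")
```

The resulting plan interleaves transfer, execute, and register jobs, which is exactly the shape of workflow that a DAG executor can then run.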

11. Pegasus: Planning for Execution in Grids
• Maps from an abstract to a concrete workflow using algorithmic and AI-based techniques
• Automatically locates physical locations for both components (transformations) and data
• Finds appropriate resources to execute the jobs
• Reuses existing data products where applicable
• Publishes newly derived data products
• Provides provenance information via the Chimera virtual data catalog

12. Information Components Used by Pegasus
• Globus Monitoring and Discovery Service (MDS): locates available resources and finds resource properties (dynamic: load, queue length; static: location of the GridFTP server, RLS, etc.)
• Globus Replica Location Service (RLS): locates data that may be replicated; registers new data products
• Transformation Catalog: locates installed executables

13. Example Workflow Reduction
• Original abstract workflow
• If "b" already exists (as determined by a query to the RLS), the workflow can be reduced
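
The reduction idea can be sketched as a reverse walk over the DAG: starting from the jobs whose outputs the user requested, a job is kept only if its outputs do not already exist; a pruned job's ancestors are never visited through it, so whole branches disappear. The `existing` set stands in for an RLS query result; the data structures are illustrative.

```python
# Sketch of workflow reduction against a mock replica-catalog query.

def reduce_workflow(jobs, parents, existing, requested):
    """jobs: {id: [output files]}; parents: {child id: [parent ids]};
    existing: files already registered; requested: jobs whose outputs the
    user asked for. Returns the set of jobs that must still run."""
    run = set()
    stack = list(requested)
    while stack:
        j = stack.pop()
        if j in run or set(jobs[j]) <= existing:
            continue          # outputs already exist: prune this branch
        run.add(j)
        stack.extend(parents.get(j, []))
    return run

# job1 produces "b"; job2 consumes "b" and produces "c"
jobs = {"job1": ["b"], "job2": ["c"]}
parents = {"job2": ["job1"]}
```

With "b" already in the catalog, only job2 remains; without it, both jobs run.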

14. Mapping from abstract to concrete
• Query the RLS, MDS, and Transformation Catalog; schedule computation and data movement

15. Condor's DAGMan
• Developed at UW-Madison (Livny)
• Executes a concrete workflow, making sure the dependencies are followed
• Executes the jobs specified in the workflow: execution, data movement, catalog updates
• Provides a "rescue DAG" in case of failure
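
The core behavior, running each job only after all of its parents have finished, can be sketched as a topological execution loop. A real DAGMan submits jobs to Condor and writes a rescue DAG on failure; here jobs are plain Python callables, purely for illustration.

```python
# Sketch of dependency-ordered DAG execution, in the spirit of DAGMan.
from collections import deque

def run_dag(jobs, parents):
    """jobs: {id: callable}; parents: {id: [parent ids]}.
    Runs every job after its parents; returns the execution order."""
    remaining = {j: set(parents.get(j, [])) for j in jobs}
    done, order = set(), []
    ready = deque(j for j, p in remaining.items() if not p)
    while ready:
        j = ready.popleft()
        jobs[j]()                      # execute the job itself
        done.add(j)
        order.append(j)
        # release any job whose parents have now all completed
        for k, p in remaining.items():
            if k not in done and k not in ready and p <= done:
                ready.append(k)
    return order

# diamond-shaped DAG: setup -> (left, right) -> merge
order = run_dag(
    {"setup": lambda: None, "left": lambda: None,
     "right": lambda: None, "merge": lambda: None},
    {"left": ["setup"], "right": ["setup"], "merge": ["left", "right"]})
```

A rescue DAG corresponds to snapshotting `remaining` minus `done` when a job fails, so the run can resume without repeating finished work.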

16. What is Montage?
• Delivers custom, science-grade image mosaics
• User specifies projection, coordinates, spatial sampling, mosaic size, and image rotation
• Preserves astrometry and photometric accuracy
• Modular "toolbox" design: loosely coupled engines for image reprojection, background rectification, and co-addition
• Controls testing and maintenance costs
• Flexibility: e.g., a custom background algorithm, or use as a reprojection and co-registration engine
• A public service will be deployed on the TeraGrid; order mosaics through a web portal

17. Montage Portal

18. Small Montage Workflow: ~1200 nodes

19. Mosaic of M42 created on TeraGrid resources using Pegasus

20. Node Clustering for Performance (Gurmeet Singh, ISI)
• Overheads are incurred when scheduling individual nodes of the workflow
• One way to look at the workflow is by level; jobs within a level that are destined for the same host can then be clustered
• You can construct as many clusters as there are available processors
• Example Montage levels: mProject, mDiff, mFitplane, mConcatFit, mBgModel, mBackground, mAdd
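
A minimal sketch of this level-based clustering: compute each node's depth in the DAG, then pack the jobs at each level into as many clusters as there are available processors, so one clustered job is submitted per processor instead of many small ones. The `dag` dictionary is a mock, shaped like the Montage levels above.

```python
# Sketch: level-based clustering of workflow nodes.

def cluster_by_level(parents, n_clusters):
    """parents: {job: [parent jobs]} -> {level: list of job clusters}."""
    level = {}
    def depth(j):
        if j not in level:
            level[j] = 1 + max((depth(p) for p in parents.get(j, [])),
                               default=-1)
        return level[j]
    for j in parents:
        depth(j)
    clusters = {}
    for job, lvl in sorted(level.items()):
        buckets = clusters.setdefault(lvl, [[] for _ in range(n_clusters)])
        min(buckets, key=len).append(job)   # balance jobs across clusters
    return {lvl: [c for c in cs if c] for lvl, cs in clusters.items()}

# Montage-like DAG: four mProject-style jobs feed two mDiff-style jobs
dag = {"p1": [], "p2": [], "p3": [], "p4": [],
       "d1": ["p1", "p2"], "d2": ["p3", "p4"]}
clusters = cluster_by_level(dag, n_clusters=2)
```

With two processors, the four level-0 jobs collapse into two clusters of two, and each cluster runs as a single scheduled job, which is where the overhead savings come from.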

21. Total time (in minutes) for executing the concrete workflow for creating a mosaic covering a 6 × 6 degree region centered at M16.

22. Total time (in minutes) for executing the concrete workflow as the size of the desired mosaic increases from 1 × 1 degree to 10 × 10 degrees, centered at M16 (64 processors used; plotted against the number of nodes in the abstract workflow).

23. Benefits of the Workflow & Pegasus Approach
The workflow exposes:
• The structure of the application
• The maximum parallelism of the application
Pegasus can take advantage of that structure to:
• Set a planning horizon (how far into the workflow to plan)
• Cluster a set of workflow nodes to be executed as one
Pegasus shields the user from Grid details:
• Pegasus can run the workflow on a variety of resources
• Pegasus can run a single workflow across multiple resources
• Pegasus can opportunistically take advantage of available resources (through dynamic workflow mapping)
• Pegasus can take advantage of pre-existing intermediate data products
• Pegasus can improve the performance of the application

24. Applications Using Pegasus and DAGMan
GriPhyN applications:
• High-energy physics: ATLAS, CMS (many institutions)
• Astronomy: SDSS (Fermilab, ANL)
• Gravitational-wave physics: LIGO (Caltech, AEI)
Astronomy:
• Galaxy morphology (NCSA, JHU, Fermilab, many others; NVO-funded)
• Montage (IPAC, JPL; NASA-funded)
Biology:
• BLAST (ANL, PDQ-funded)
Neuroscience:
• Tomography for Telescience (SDSC, NIH-funded)
Earthquake science:
• Simulation of earthquake propagation in soil in the Southern California area (SCEC)

25. Future Directions
• Improving scheduling strategies
• Supporting the Pegasus framework through pluggable interfaces for resource and data selection
• Support for staging in executables on demand
• Supporting better space and resource management (space and compute-node reservation)
• Reliability

26. For More Information
• NVO project: www.us-vo.org
• GriPhyN project: www.griphyn.org
• Virtual Data Toolkit: www.cs.wisc.edu/vdt
• Montage: montage.ipac.caltech.edu (IRSA booth)
• Pegasus: pegasus.isi.edu
• My website: www.isi.edu/~deelman
