Design and Execution of Scientific Workflows using Web Services


Presentation Transcript


  1. Design and Execution of Scientific Workflows using Web Services • Ilkay Altintas, Ashraf Memon, Bertram Ludaescher • San Diego Supercomputer Center, University of California San Diego

  2. Outline • Introduction & Overview – Bertram • The Kepler System – Ilkay • Demonstration: Geologic Map Integration – Ashraf • From Web services to Grid services (and back ;-) – Karan SDSIC 01/29/2004

  3. The Scientific Workflow (SWF) “Business” • In silico science: • from the wet lab to the information managers, analysts, data miners, … • commercially really a big business • Scientific Workflows – Goals: • simplify and automate data management & analysis for scientists • support “knowledge discovery workflows” • Scientific Workflows – Aspects: • Capture (reverse-engineer) existing SWFs • legacy SWFs: hard-wired, hard to reuse, maintain, change, … • Design new SWFs: • reuse components and SWFs • needs: intuitive modeling paradigm, clear component interaction semantics, … • Debug SWFs (test, simulate, validate, verify, …) • Operate SWFs (deploy, execute, monitor, steer, archive, re-run, …)

  4. Scientific Workflows: Tools • Scientific Workflows – Aspects: • Data Integration • Process Integration • Application/tools Integration • Different tools and angles: • PSEs (Problem Solving Environments): SCIRun, … (quite a few) • LIMS (Laboratory Information Management Systems): … (many) • Workflow systems: … (very many) • Signal processing and dataflow systems (AVS, Khoros, Ptolemy, …) • Scientific workflow systems (DiscoveryNet/InforSense, PipelinePilot/SciTegic, … Triana, Taverna, …, Kepler, …) • often dataflow oriented (but some workflow aspects too)

  5. Web Services and Scientific Workflows in Kepler • Web services = individual components (“actors”) • “Minute-Made” Application Integration: • Plugging in and harvesting web service components is easy and fast • Rich SWF modeling semantics (“directors” and more): • Different and precise dataflow models of computation • Clear and composable component interaction semantics → Web service composition and application integration tool • Coming soon: • Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8) • SWFs with structural and semantic data types (better design support) • Grid-enabled web services (for big data, big computations, …) • Different deployment models (SWF WS, web site, applet, …)

  6. Genomics: Promoter Identification Workflow Source: Matt Coleman (LLNL)

  7. Ecology: GARP Analysis Pipeline for Invasive Species Prediction [Figure: workflow diagram. Steps: EcoGrid Query (against registered EcoGrid databases), Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, User, Generate Metadata, Archive to EcoGrid. Data products: (a) species presence & absence points, (b) environmental layers, (c) integrated layers, each for native range and invasion area; (d) training sample and test sample; (e) GARP rule set; (f) native range and invasion area prediction maps; (g) model quality parameter; (h) selected prediction maps.] Source: NSF SEEK (Deana Pennington et al., UNM)

  8. Source: NIH BIRN (Jeffrey Grethe, UCSD)

  9. Kepler Team, Projects, Sponsors • Ilkay Altintas (SDM) • Chad Berkley (SEEK) • Shawn Bowers (SEEK) • Jeffrey Grethe (BIRN) • Christopher H. Brooks (Ptolemy II) • Zhengang Cheng (SDM) • Efrat Jaeger (GEON) • Matt Jones (SEEK) • Edward A. Lee (Ptolemy II) • Kai Lin (GEON) • Ashraf Memon (GEON) • Bertram Ludaescher (BIRN, GEON, SDM, SEEK) • Steve Mock (NMI) • Steve Neuendorffer (Ptolemy II) • Mladen Vouk (SDM) • Yang Zhao (Ptolemy II) • …

  10. The KEPLER System for Scientific Workflows … • Collaboration of various projects: http://kepler.ecoinformatics.org • A framework for design, execution and deployment of scientific workflows • Caters specifically to the domain scientist • Builds on Ptolemy II (next slide... :-)

  11. … based on Ptolemy II • A set of Java packages for heterogeneous, concurrent modeling, design and execution. • Strengths include: • Precisely defined models of computation and component interaction • e.g. Process Networks (PN) – data-flow oriented • An intuitive GUI that enables rapid workflow composition • A modular, reusable and extensible object-oriented environment • An XML-based workflow definition – MoML • Workflows defined in the Ptolemy II MoML XML schema • Easily exchangeable
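Editor's sketch (not from the slides): to make the MoML point concrete, the Python snippet below reads a small, hand-written MoML-style document and lists its director, actors and connections. The document itself is hypothetical; the element names (`entity`, `property`, `relation`, `link`) follow MoML conventions, while the model name, actor names and the particular Ptolemy II class strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written MoML-style model: one director, two actors,
# one relation joining them. All names here are illustrative.
MOML = """<entity name="demo" class="ptolemy.actor.TypedCompositeActor">
  <property name="director" class="ptolemy.domains.sdf.kernel.SDFDirector"/>
  <entity name="ramp" class="ptolemy.actor.lib.Ramp"/>
  <entity name="display" class="ptolemy.actor.lib.gui.Display"/>
  <relation name="r1" class="ptolemy.actor.TypedIORelation"/>
  <link port="ramp.output" relation="r1"/>
  <link port="display.input" relation="r1"/>
</entity>
"""


def summarize(moml_text):
    """Return (director class, {actor: class}, {relation: [ports]})."""
    root = ET.fromstring(moml_text)
    director = root.find("property[@name='director']").get("class")
    actors = {e.get("name"): e.get("class") for e in root.findall("entity")}
    # Group the ports attached to each relation to recover the wiring.
    wiring = {}
    for link in root.findall("link"):
        wiring.setdefault(link.get("relation"), []).append(link.get("port"))
    return director, actors, wiring


director, actors, wiring = summarize(MOML)
```

Because the workflow is plain XML, any XML toolchain can inspect or exchange it, which is the sense in which the slide calls MoML workflows easily exchangeable.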

  12. KEPLER Core Capabilities (1/2) • Capturing scientific workflows • Accessing available workflows through the Grid • Designing scientific workflows • Composition of actors (tasks) to perform a scientific WF • Actor prototyping • Accessing heterogeneous data • Data access wizard to search and retrieve Grid-based resources • Relational DB access and query • Ability to link to EML data sources

  13. KEPLER Core Capabilities (2/2) • Data transformation actors to link heterogeneous data • Executing scientific workflows • Distributed and/or local computation • Various models for computational semantics and scheduling • SDF and PN: Most common for scientific workflows • External computing environments: • C++, Python, C (… Perl--planned ...) • Deploying scientific tasks and workflows as web services (… planned …)
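Editor's sketch (not from the slides): the Process Networks (PN) model mentioned above can be illustrated in a few lines of Python. Each actor runs in its own thread and communicates only through FIFO channels with blocking reads. This is a minimal illustration of the PN idea, not Kepler or Ptolemy II code; the `DONE` end-of-stream marker and the function names are assumptions of the sketch.

```python
import queue
import threading

DONE = object()  # end-of-stream marker (an assumption of this sketch)


def source(out, values):
    """Actor that emits each value, then signals end of stream."""
    for v in values:
        out.put(v)
    out.put(DONE)


def scale(inp, out, factor):
    """Actor that multiplies every incoming token by `factor`."""
    while True:
        token = inp.get()  # blocking read: the defining PN rule
        if token is DONE:
            out.put(DONE)
            return
        out.put(token * factor)


def run_network(values, factor):
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        token = b.get()
        if token is DONE:
            break
        results.append(token)
    for t in threads:
        t.join()
    return results


results = run_network([1, 2, 3], 10)
```

Since actors only block on reads and never test a channel for emptiness, the output does not depend on thread scheduling. An SDF director would instead compute a fixed firing schedule ahead of time from declared token rates.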

  14. The KEPLER GUI (Vergil) • Drag and drop utilities, director and actor libraries.

  15. Running the workflow

  16. Distributed SWFs in KEPLER • Web and Grid Service plug-ins • WSDL, GWSDL • ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard • WS Harvester • Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors • WS-deployment interface (…ongoing work…) • XSLT and XQuery transformers to link non-fitting services together

  17. A Generic Web Service Actor • Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that operation. • Configure: select service operation
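Editor's sketch (not from the slides): one way to picture how an actor can "customize itself" from a WSDL is to look up the chosen operation and derive input and output ports from its message parts. The Python below does this for a small, hypothetical WSDL 1.1 fragment. The service, operation and part names are invented for illustration, and this is not Kepler's actual implementation.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"w": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL 1.1 fragment; all names are illustrative.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:geo" targetNamespace="urn:example:geo">
  <message name="GetMapRequest">
    <part name="state" type="xsd:string"/>
    <part name="layer" type="xsd:string"/>
  </message>
  <message name="GetMapResponse">
    <part name="mapUrl" type="xsd:string"/>
  </message>
  <portType name="GeoPortType">
    <operation name="getMap">
      <input message="tns:GetMapRequest"/>
      <output message="tns:GetMapResponse"/>
    </operation>
  </portType>
</definitions>
"""


def ports_for_operation(wsdl_text, operation_name):
    """Map one WSDL operation to lists of input and output port names."""
    root = ET.fromstring(wsdl_text)
    # Index every message by name -> list of its part names.
    messages = {
        m.get("name"): [p.get("name") for p in m.findall("w:part", WSDL_NS)]
        for m in root.findall("w:message", WSDL_NS)
    }

    def parts(op, tag):
        ref = op.find(tag, WSDL_NS)
        if ref is None:
            return []
        local_name = ref.get("message").split(":")[-1]  # drop the prefix
        return messages.get(local_name, [])

    for op in root.findall("w:portType/w:operation", WSDL_NS):
        if op.get("name") == operation_name:
            return {"inputs": parts(op, "w:input"),
                    "outputs": parts(op, "w:output")}
    raise KeyError(operation_name)


ports = ports_for_operation(WSDL, "getMap")
```

Each message part becomes one port, so the same generic actor can present a different interface for every operation it is pointed at.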

  18. Set Parameters and Commit

  19. WS Actor after Instantiation

  20. Web Service Harvester • Imports the web services in a repository into the actor library. • Has the capability to search for web services based on a keyword.

  21. Composing 3rd-Party WSs [Figure labels: output of previous web service; user interaction & transformations; input of next web service]
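Editor's sketch (not from the slides): composing third-party services usually needs a transformation step between the output of one service and the input of the next. The Python below uses two local stand-in functions in place of remote services and a plain function in place of an XSLT or XQuery transformer. All service names and XML vocabularies are hypothetical.

```python
import xml.etree.ElementTree as ET


def service_a(state):
    """Stand-in for a remote service; answers in its own XML vocabulary."""
    return (f"<units state='{state}'>"
            "<unit age='Holocene'/><unit age='Pleistocene'/>"
            "</units>")


def transform(xml_from_a):
    """Adapter playing the role of an XSLT/XQuery step between services."""
    root = ET.fromstring(xml_from_a)
    request = ET.Element("legendRequest")
    for unit in root.findall("unit"):
        ET.SubElement(request, "term").text = unit.get("age")
    return ET.tostring(request, encoding="unicode")


def service_b(xml_request):
    """Stand-in for a second service that expects <legendRequest> input."""
    terms = [t.text for t in ET.fromstring(xml_request).findall("term")]
    return {"legend_entries": len(terms), "terms": terms}


def pipeline(state):
    # Output of the previous service -> transformation -> input of the next.
    return service_b(transform(service_a(state)))


result = pipeline("Nevada")
```

Keeping the adapter as a separate step means neither service has to change when the other's format does.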

  22. More information… • The WS and Grid standards have changed recently • Further changes are expected as these standards continue to change. • Focus for this talk: web service-based components of Kepler. For more info on other Kepler components: • http://kepler.ecoinformatics.org • http://kbis.sdsc.edu/SciDAC-SDM/ • http://ptolemy.eecs.berkeley.edu/ptolemyII/ • http://seek.ecoinformatics.org

  23. What’s next? • Ashraf Memon • GEON Geological Map Information Integration • Conceptual Workflow • WS-based Architecture and Design in Kepler • DEMO in Kepler • Karan Bhatia • Grid standards and their relations to web services • OGSI, OGSA, GWSDL, etc. • Informal discussion on WSRF

  24. Problem Description • Geologic Map Information Integration (GMMI) • Integration of heterogeneous geological datasets • Data sets • State geology map datasets (Rocky Mountain area) • State boundaries and coastlines.

  25. Heterogeneities • System • Different operating systems and vendor databases are used to store and process the data. • Representational • Different formats (shape files, BLOB, binary, spatial data objects, etc.). • Structural • Different schema (table) structures.

  26. Heterogeneities • Syntactic • Different query languages (SQL, Spatial SQL, XQuery, etc.) • Semantic • Different states use different concept maps for storing the data values. • Example: some states use the terms “Holocene” and “Pleistocene”, which are subdivisions of the “Quaternary” period in the geologic age hierarchy, while others, lacking the finer geologic detail, record only the parent period (“Quaternary”).
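Editor's sketch (not from the slides): the semantic mismatch in the example above can be resolved by walking a geologic age hierarchy, so that a query for "Quaternary" also matches records tagged with its subdivisions. The Python below is a minimal illustration with a three-entry hierarchy. The table and function names are hypothetical and do not reflect the GEON ontology service.

```python
# Child -> parent links for a tiny slice of the geologic time scale.
PARENT = {
    "Holocene": "Quaternary",
    "Pleistocene": "Quaternary",
    "Quaternary": "Cenozoic",
}


def ancestors(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def matches(query, recorded):
    """True if a record tagged `recorded` falls under the `query` age."""
    return query in ancestors(recorded)


# A state that records fine-grained ages still answers a coarse query.
records = ["Holocene", "Pleistocene", "Quaternary"]
quaternary_hits = [r for r in records if matches("Quaternary", r)]
```

The match is deliberately one-directional: a record tagged only "Quaternary" does not satisfy a query for "Holocene", because the finer detail was never recorded.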

  27. Using Web Services

  28. Continued… [Figure labels: web services for map integration (Ontology, Legend Generator, Map Assembler, …); ArcIMS and WMS services wrapped in WSDL/SOAP]

  29. GMMI WF Designed in Kepler

  30. DataMapper Sub-Workflow

  31. The result in a BrowserDisplay

  32. KEPLER and You • Kepler … • is a community-based, cross-project, open source collaboration • uses web services as basic building blocks • has a joint CVS repository, mailing lists, web site, … • is gaining momentum thanks to contributors and contributions • BSD-style license allows commercial spin-offs • a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…

  33. From Web Services to Grid Services … and back! Source: Ian Foster’s GlobusWORLD keynote talk
