Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products

UAHuntsville, The University of Alabama in Huntsville. AMSR-E Science Team Meeting, 29 June 2011.


Presentation Transcript


  1. Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products
     UAHuntsville, The University of Alabama in Huntsville
     AMSR-E Science Team Meeting, 29 June 2011

  2. Instant Karma Overview
     • Collaboration among
        • AMSR-E SIPS (MSFC Earth Science Office and UAHuntsville ITSC)
        • Provenance researchers at Indiana University’s Data to Insight Center
        • AMSR-E Sea Ice science team (GSFC)
     • Primary goal is to improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
     • Using Karma provenance tool
     • Initial focus on Sea Ice processing
     [Diagram labels: Data Center Operations; Provenance Research; Earth Science; Instant Karma Project]
     AMSR-E Science Team Meeting, Asheville, NC

  3. Team Members

  4. Key Responsibility at Data Creation
     • Archivists are traditionally responsible for curation (i.e., adding metadata and provenance) but are not present at the creation of scientific data. The moment of creation is when the most knowledge about a product is available.
     • To be effective, preservation requires tools in the earliest stages of the data’s life.
     [Diagram from Berman et al., “Sustainable Economics for a Digital Planet”: Create, Distribute, Archive, Long Term Access, with three Preservation Action labels]

  5. Instant Karma Provenance Research
     • The how-tos of provenance capture are becoming better understood
     • Collecting the right (and only the “right”) provenance is a harder challenge, and one we hope to make progress on in this project
     • Delivering on representation and use case support simultaneously for today’s users and next-generation uses is an ongoing challenge
        • Sea ice research, climate, classroom, informatics ...

  6. Types of Provenance Information
     • Data lineage: product generation “recipe” (data inputs, software and hardware)
     • Additional knowledge about science algorithms, instrument variations, etc.
     • Lots of information already available, but scattered across multiple locations
        • Processing system configuration
        • Dataset and file level metadata
        • Processing history information
        • Quality assurance information
        • Software documentation (e.g., algorithm theoretical basis documents, release notes)
        • Data documentation (e.g., guide documents, README files)
     • Instant Karma project aims to collate and organize information from multiple sources
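One way to picture collating these scattered sources is a single record per product that holds the lineage "recipe" plus pointers to each kind of supporting information. The sketch below is illustrative only: the class, field names, and example values are hypothetical and are not taken from the Karma tool or the SIPS.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Kinds of supporting information named on the slide (labels shortened here).
EXPECTED_KINDS = [
    "processing system configuration",
    "metadata",
    "processing history",
    "quality assurance",
    "software documentation",
    "data documentation",
]


@dataclass
class LineageRecord:
    """Hypothetical 'recipe' for one product: inputs, software, hardware."""

    product: str
    inputs: list[str] = field(default_factory=list)
    software: dict[str, str] = field(default_factory=dict)  # name -> version
    hardware: str = ""
    # Pointers to supporting information, keyed by kind of source.
    sources: dict[str, list[str]] = field(default_factory=dict)

    def add_source(self, kind: str, pointer: str) -> None:
        """Attach a pointer (path, URL, identifier) under the given kind."""
        self.sources.setdefault(kind, []).append(pointer)

    def missing_kinds(self, expected: list[str]) -> list[str]:
        """List the expected kinds that have no pointer yet, in order."""
        return [kind for kind in expected if kind not in self.sources]


# Example values are placeholders, not real file names or versions.
record = LineageRecord(
    product="example-sea-ice-granule",
    inputs=["example-l2a-file"],
    software={"example-algorithm": "v0"},
    hardware="example-host",
)
record.add_source("processing history", "example/history.txt")
record.add_source("quality assurance", "example/qa.txt")
print(record.missing_kinds(EXPECTED_KINDS))
```

Listing the kinds that are still missing makes gaps in the collated record visible before it is published alongside a product.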

  7. SIPS-GHRC Processing Architecture
     • Algorithm Packages from the Science Team and Team Lead of the Science Computing Facility
     • Processing automation controlled by SIPS scripts
     • Pass processing is data driven
     • L3 product generation is scheduled after nominal availability of input products
     [Diagram labels: Control Script; Pass, Daily, Pentad, Weekly and Monthly Scripts; Delivered Algorithm Package; products L2A Brightness Temps, L2B Ocean, L2B Land, L2B Rain, Ocean, Land, Snow, Sea Ice, Rain; Input Data and Metadata File(s); Ancillary File(s); Science Data, Metadata, Processing History, Quality Assurance, Browse imagery, Custom subsets]

  8. Sea Ice Product Generation
     [Diagram labels: Ice Mask; Snow Melt Mask; Perl with C, provided by SIPS; C and FORTRAN, provided by Science Team; Daily Processing Script; Bourne shell, provided by SCF; Delivered Algorithm Package; Sea Ice Algorithms; One day’s worth of Level-2A Tbs; Sea Ice Products (6, 12, and 25 km); Metadata; Processing History; Quality Assurance; Browse imagery; Custom subsets; Software Libraries; Operating System, Compilers; Hardware Platform]

  9. AMSR-E Standard Sea Ice Products
     Except for the snow depth product, all products are daily averages using (a) ascending orbits only, (b) descending orbits only, (c) all orbits.

  10. Perspectives on Processing Lineage
      [Figure panels: Data Processing View; Provenance View]

  11. AMSR-E Provenance Use Cases
      • Browse provenance graphs: convey rich information about final data granule details [Use case 1]
         • Spatial location, time of observation, algorithms employed, input data and ancillary files
         • Provenance bundle to include pointers to relevant documentation
      • Answer “Something isn’t right” question [Use case 1 variant]
         • E.g., did not receive data for several days so snow melt mask may be inaccurate
      • Compare two data granules [Use case 2]
         • Query system to get list of provenance differences (e.g., versions of software, number and versions of input files)
      • General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3]
         • Current algorithms and versions, nominal number and versions of input files, pointers to relevant documentation
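Use case 2 (comparing two data granules) amounts to a difference over two provenance records. A minimal sketch, assuming each granule's provenance has already been flattened into a dictionary; the keys and values below are made up for illustration and this is not the Karma query interface.

```python
def provenance_diff(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key present in only one record is reported with None on the other side.
    """
    diff = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            diff[key] = (left, right)
    return diff


# Hypothetical provenance summaries for two granules.
granule_a = {
    "algorithm_version": "v1",
    "input_file_count": 28,
    "ancillary_files": ["example_mask"],
}
granule_b = {
    "algorithm_version": "v2",
    "input_file_count": 27,
    "ancillary_files": ["example_mask"],
}

for key, (left, right) in provenance_diff(granule_a, granule_b).items():
    print(f"{key}: {left!r} -> {right!r}")
```

Only the differing entries are reported, which is the kind of list of provenance differences the use case describes (software versions, number and versions of input files).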

  12. Provenance Collection, Storage and Access
      • Instrumented AMSR-E SIPS processing workflow for sea ice in the testbed environment
      • Provenance information is captured in experiment run log files
      • Log files are parsed to generate provenance notifications
      • These notifications are then imported into the Karma database
      • The Karma Service Query API is used to generate OPM-compatible XML graphs, each corresponding to a processing run
      [Diagram labels: Processing Testbed (Pass, Daily, Pentad, Weekly and Monthly Scripts with their products); Provenance Collection; Karma Provenance Repository; Query API; Provenance Browser]
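The log-parsing step can be sketched as follows. The slides do not show the log format or the notification schema, so both are assumptions here: the line layout, the `USED`/`GENERATED` keywords, and the dictionary fields are hypothetical. The relation names `used` and `wasGeneratedBy` follow the Open Provenance Model edge names; nothing below calls the Karma service itself.

```python
import re

# Assumed log line layout: "<timestamp> <EVENT> process=<name> artifact=<name>"
LINE = re.compile(
    r"^(?P<time>\S+) (?P<event>USED|GENERATED) "
    r"process=(?P<process>\S+) artifact=(?P<artifact>\S+)$"
)

# Map the assumed log keywords onto OPM edge names.
RELATION = {"USED": "used", "GENERATED": "wasGeneratedBy"}


def parse_log(lines):
    """Turn matching log lines into notification dicts; ignore other lines."""
    notifications = []
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ordinary log output, not a provenance event
        notifications.append(
            {
                "time": match["time"],
                "relation": RELATION[match["event"]],
                "process": match["process"],
                "artifact": match["artifact"],
            }
        )
    return notifications


# Made-up run log for illustration.
log = [
    "2011-06-01T00:00:00 USED process=daily_sea_ice artifact=l2a_tb_example",
    "starting sea ice algorithm",
    "2011-06-01T00:05:00 GENERATED process=daily_sea_ice artifact=sea_ice_12km_example",
]

notifications = parse_log(log)
for note in notifications:
    print(note["relation"], note["process"], note["artifact"])
```

In the workflow described on the slide, notifications like these would then be imported into the Karma database; that import step is not shown here.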

  13. Additional Science-Relevant Provenance and Context Information
      • Harvesting granule information from ECS metadata
         • Also recording processing location associated with each data granule
      • Working with AMSR-E Science Computing Facility to identify algorithm and data product information
         • Algorithm versions and descriptions
         • Parameters and data fields
         • Ancillary files
         • Flag values and explanations
         • Pointers to full documentation
      • Defining how to harvest, transmit and display this information

  14. Provenance Browser for AMSR-E Products
      • Initial Prototype
      • Interactive browsing of provenance graphs and information about workflows and final data products
      • Uses Karma Service Query API to extract provenance graphs from the Karma provenance repository
      http://instantkarmatest.itsc.uah.edu/karmabrowser/view

  15. Instant Karma Major Milestones

  16. Project Schedule

  17. Near Term Plans
      • Begin routine daily Sea Ice processing with provenance collection in Testbed
      • Enhance context metadata available to Karma system
      • Refine Provenance Browser and Karma Query API
      • Develop joint Karma project / SIPS approach for instrumenting additional processing workflows
      • Work with other NASA Earth science provenance projects to develop common approaches to providing provenance information with science data files
      • Begin to engage user community
         • Present to AMSR-E Science Team
         • Outreach to Sea Ice community at GSFC
         • Work with NSIDC DAAC to identify friendly users for beta testing

  18. Conclusions
      • Instant Karma project is on track to incorporate provenance capture into the AMSR-E SIPS
      • Ready to begin testing the AMSR-E Provenance Browser with the wider user community
      • Continuing to work with NASA ESDIS, ESDSWG and ESIP Federation in defining common provenance practices across NASA data systems

  19. Instant Karma Project Contacts
      • Instant Karma Project Web Site
         • http://instantkarmatest.itsc.uah.edu/
         • http://www.pti.iu.edu/d2i/provenance_instantkarma
      • AMSR-E Provenance Browser
         • http://instantkarmatest.itsc.uah.edu/karmabrowser/view
      • Contacts
         • Michael Goodman, michael.goodman@nasa.gov
         • Helen Conover, hconover@itsc.uah.edu
         • Beth Plale, plale@cs.indiana.edu
         • Thorsten Markus, thorsten.markus-1@nasa.gov

  20. Acronyms
      AMSR-E: Advanced Microwave Scanning Radiometer for EOS
      API: Application Programming Interface
      DAAC: Distributed Active Archive Center
      ECS: EOSDIS Core System
      EOS: Earth Observing System
      EOSDIS: Earth Observing System Data and Information System
      ESDSWG: Earth Science Data Systems Working Group
      ESIP: Earth Science Information Partners
      GSFC: Goddard Space Flight Center
      ITSC: Information Technology & Systems Center
      MSFC: Marshall Space Flight Center
      MODIS: Moderate Resolution Imaging Spectroradiometer
      NASA: National Aeronautics and Space Administration
      NSIDC: National Snow and Ice Data Center
      OPM: Open Provenance Model
      SCF: Science Computing Facility
      SIPS: Science Investigator-led Processing System
      UAHuntsville: University of Alabama in Huntsville
      XML: eXtensible Markup Language
