Data Grids - Data Management Environments for e-Science

Kerstin Kleese van Dam et al., CCLRC e-Science Centre, k.kleese@dl.ac.uk, http://www.e-science.clrc.ac.uk


Presentation Transcript


  1. Environmental e-Science Challenges & Opportunities, Kerstin Kleese van Dam. Data Grids - Data Management Environments for e-Science. Kerstin Kleese van Dam et al., CCLRC e-Science Centre, k.kleese@dl.ac.uk, http://www.e-science.clrc.ac.uk

  2. Metadata. Data without further information is of only short-lived and very limited use. Metadata comes in varying degrees of detail, and there are many standards and formats. Examples: the CLRC Scientific Metadata Schema (http://www.e-science.clrc.ac.uk/Activity/ACTIVITY=DataPortal;SECTION=5;), used by the ISIS, e-Minerals and e-Materials projects; the NERC DataGrid Metadata Model.

  3. CCLRC Scientific Metadata Model - Diversity: Users & Searches. Diagram listing kinds of search (discovery, excavation) and kinds of user (experimenter, data curator, specialist user, general community, wider science community).

  4. CCLRC Scientific Metadata Model. A Metadata Object has six parts:
     • Topic: keywords providing an index on what the study is about.
     • Study Description: provenance, i.e. what the study is, who did it and when.
     • Access Conditions: conditions of use, i.e. who can access the data and how.
     • Data Description: a detailed description of how the data is organised into datasets and files.
     • Data Location: locations providing navigation to where the data on the study can be found.
     • Related Material: references into the literature and community that provide context for the study.
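
The six parts of the Metadata Object map naturally onto a record type. The Python sketch below is an illustration under assumptions only: the class name, field names and field types are hypothetical choices for this example, not the actual CCLRC Scientific Metadata Schema definition, and the sample values are invented. The `unlocated_files` helper shows one check such a structure makes easy: whether every file in the Data Description has a recorded Data Location.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataObject:
    """One study's metadata, grouped into the six parts named on the slide (hypothetical field names)."""
    topic: List[str]                          # keywords indexing what the study is about
    study_description: Dict[str, str]         # provenance: what, who, when
    access_conditions: str                    # who may access the data, and how
    data_description: Dict[str, List[str]]    # dataset name -> files in that dataset
    data_location: Dict[str, str]             # file name -> where to find it (e.g. a URL)
    related_material: List[str] = field(default_factory=list)  # literature references

    def files(self) -> List[str]:
        """All files listed in the data description, dataset by dataset."""
        return [name for names in self.data_description.values() for name in names]

    def unlocated_files(self) -> List[str]:
        """Files that are described but have no entry in data_location."""
        return [name for name in self.files() if name not in self.data_location]


# Illustrative values only; not taken from any real facility catalogue.
study = MetadataObject(
    topic=["neutron scattering", "polymer"],
    study_description={"what": "Example polymer study", "who": "A. N. Other", "when": "2003-05-01"},
    access_conditions="Registered facility users only",
    data_description={"run1": ["run1_a.raw", "run1_b.raw"]},
    data_location={"run1_a.raw": "http://example.org/data/run1_a.raw"},
)
print(study.unlocated_files())  # ['run1_b.raw']
```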

  5. NERC DataGrid Metadata and Data Model
     • Provides a clear separation of function
     • Distinguishes between data use and discovery, etc.
     • "Tunes" metadata to include the relevant detail
     • Allows increased reuse of the metadata model
     • Avoids tie-in to the details of a particular field's data formats
     • Can plug in another data model

  6. Conceptual Overview

  7. NDG Data Model
     • Dataset: a named container for a number of variables.
     • Variable: a physical parameter within the dataset, named from a controlled vocabulary, e.g. the BODC data dictionary or CF standard names.
     • Array: a multidimensional container for other arrays or for numeric data.
     • Coordinate: may be shared between multiple Arrays; 'anonymous' if not georeferenced; MappedCoordinate vs. ProductCoordinate; defined with respect to a Coordinate Reference System (ref. ISO 19111, ISO 19115).
     • GranuleDescriptor: describes a data granule in terms of file storage; enables file aggregation; SQL/OGSA-DAI for RDBMS; physical or logical (e.g. SRB) files.
     "Profiles" of the model are defined for important data types.
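
The containment relations of the NDG Data Model (Dataset holds Variables, a Variable holds an Array, an Array is laid out along shared Coordinates, a GranuleDescriptor points at files) can be sketched with Python dataclasses. This is a simplified illustration, not the NDG implementation: all class and field names are hypothetical, an Array here holds only flat numeric data rather than nested arrays, MappedCoordinate and ProductCoordinate are not distinguished, the reference system is a plain string, and SQL/OGSA-DAI access is left out. The example values are invented; `air_temperature` and `air_pressure` are CF standard names used purely as sample vocabulary terms.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional, Tuple


@dataclass
class Coordinate:
    name: str
    values: List[float]
    crs: Optional[str] = None  # coordinate reference system identifier

    @property
    def anonymous(self) -> bool:
        # A coordinate with no reference system is not georeferenced, i.e. 'anonymous'.
        return self.crs is None


@dataclass
class Array:
    coordinates: List[Coordinate]  # one per dimension; Coordinate objects may be shared
    data: List[float]              # numeric values, flattened in row-major order

    @property
    def shape(self) -> Tuple[int, ...]:
        return tuple(len(c.values) for c in self.coordinates)

    def __post_init__(self) -> None:
        # The number of values must match the product of the coordinate lengths.
        if prod(self.shape) != len(self.data):
            raise ValueError(f"shape {self.shape} needs {prod(self.shape)} values, got {len(self.data)}")


@dataclass
class Variable:
    standard_name: str  # term from a controlled vocabulary, e.g. a CF standard name
    array: Array


@dataclass
class GranuleDescriptor:
    files: List[str]  # physical or logical file names aggregated into one granule


@dataclass
class Dataset:
    name: str
    variables: List[Variable] = field(default_factory=list)
    granule: Optional[GranuleDescriptor] = None


# Two variables sharing the same latitude/longitude coordinates (illustrative values).
lat = Coordinate("latitude", [-45.0, 45.0], crs="EPSG:4326")
lon = Coordinate("longitude", [0.0, 90.0, 180.0], crs="EPSG:4326")
temp = Variable("air_temperature", Array([lat, lon], [280.0, 281.5, 279.0, 290.2, 291.0, 289.4]))
pres = Variable("air_pressure", Array([lat, lon], [1000.0, 1002.0, 998.0, 1010.0, 1011.0, 1009.0]))
ds = Dataset("example", [temp, pres], GranuleDescriptor(["example_2003.nc"]))
print(temp.array.shape)                                        # (2, 3)
print(temp.array.coordinates[0] is pres.array.coordinates[0])  # True
```

Sharing one `Coordinate` object between arrays, rather than copying it, is what lets the model state that two variables sit on the same grid.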

  8. Different Levels of Metadata Supporting Discovery and Selection
     • A-Metadata: can be derived from the data itself.
     • B-Metadata: a summary of all other types of metadata.
     • C-Metadata: all related metadata, papers, pictures and related studies.
     • D-Metadata: user-provided information on what, who and when.
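
To make the difference between the levels concrete, here is a small sketch, assuming A-metadata is limited to simple statistics computed from the values and B-metadata is assembled from the A-, C- and D-level records. The function names, dictionary keys and sample values are hypothetical; the slide does not prescribe any particular fields.

```python
from typing import Dict, List, Sequence


def derive_a_metadata(variable: str, values: Sequence[float]) -> Dict[str, object]:
    """A-metadata: facts computed from the data values alone, with no user input."""
    if len(values) == 0:
        raise ValueError("cannot derive A-metadata from an empty array")
    return {"variable": variable, "count": len(values), "min": min(values), "max": max(values)}


def build_b_metadata(a_records: List[Dict[str, object]],
                     d_record: Dict[str, str],
                     c_items: List[str]) -> Dict[str, object]:
    """B-metadata: a summary drawing on the A-, C- and D-level metadata."""
    return {
        "what": d_record["what"],
        "who": d_record["who"],
        "when": d_record["when"],
        "variables": sorted(str(r["variable"]) for r in a_records),
        "related_items": len(c_items),
    }


a = [derive_a_metadata("temperature", [271.2, 275.0, 268.9])]
d = {"what": "Example model run", "who": "A. N. Other", "when": "2003-06"}
c = ["run_description.pdf", "related_study_42"]
print(a[0])  # {'variable': 'temperature', 'count': 3, 'min': 268.9, 'max': 275.0}
print(build_b_metadata(a, d, c)["related_items"])  # 2
```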

  9. Data Discovery. Most data is currently "discovered" by word of mouth from friends and colleagues, or by sheer luck. In a grid environment these processes need to be automated so that humans and machines/processes alike can discover data. Example: the CCLRC DataPortal, http://esc.dl.ac.uk:9000/index.html. The DataPortal software is also used in the e-Minerals Mini Grid.
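
Automating discovery means a program, rather than a colleague, has to match a request against metadata. The following is a minimal keyword-matching sketch, assuming each study's metadata has already been reduced to a list of keywords (as in the Topic part of the CCLRC model). It is not how the DataPortal implements its search; the function names and study ids are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(records: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased keyword to the ids of the studies tagged with it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for study_id, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(study_id)
    return dict(index)


def discover(index: Dict[str, Set[str]], query: Iterable[str]) -> List[str]:
    """Rank studies by how many query terms they match; ties are broken by id."""
    hits: Dict[str, int] = defaultdict(int)
    for term in query:
        for study_id in index.get(term.lower(), set()):
            hits[study_id] += 1
    return sorted(hits, key=lambda s: (-hits[s], s))


records = {
    "s1": ["neutron", "diffraction"],
    "s2": ["neutron", "polymer"],
    "s3": ["ozone", "atmosphere"],
}
index = build_index(records)
print(discover(index, ["Neutron", "polymer"]))  # ['s2', 's1']
```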

  10. CCLRC DataPortal. The CCLRC DataPortal currently provides access to selected metadata and data from four facilities, the first three of which are housed by CLRC:
      • The Synchrotron Radiation Department (SRD)
      • The Neutron Spallation Source (ISIS)
      • The British Atmospheric Data Centre (BADC)
      • Max-Planck Institute for Meteorology (MPIM)
      You can assess the available data via the basic search. A Grid-enabled version of the DataPortal is available at http://esc.dl.ac.uk:9000/dataportal/index.html. The code itself can be downloaded for your project:
      • Unix: http://esc.dl.ac.uk:9000/dist/dataportal/v3/dataportal-v3.tar.gz
      • Windows: http://esc.dl.ac.uk:9000/dist/dataportal/v3/dataportal-v3.zip

  11. General CLRC DataPortal Architecture. Diagram: a CLRC DataPortal Server, linked to other instances of the CLRC DataPortal Server, accesses Facilities 1 to N; each facility has an XML wrapper over its local metadata and local data.
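
The pattern in this architecture, where a central server sends one query to many facility-side XML wrappers and merges their replies, can be sketched with Python's standard `xml.etree.ElementTree`. Everything specific here is an assumption for illustration: the `results`/`study` XML elements, the wrapper functions and the facility records are invented, and the sketch leaves out forwarding to other DataPortal server instances as well as authentication.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Wrapper = Callable[[str], str]  # takes a keyword, returns an XML reply


def make_wrapper(facility: str, local_records: List[Dict[str, str]]) -> Wrapper:
    """Hypothetical facility XML wrapper around local metadata held as dicts."""
    def query(keyword: str) -> str:
        root = ET.Element("results", facility=facility)
        for rec in local_records:
            if keyword.lower() in rec["keywords"].lower().split():
                study = ET.SubElement(root, "study", id=rec["id"])
                study.text = rec["title"]
        return ET.tostring(root, encoding="unicode")
    return query


def portal_search(wrappers: List[Wrapper], keyword: str) -> List[Dict[str, str]]:
    """Send one query to every facility wrapper and merge the XML replies."""
    merged: List[Dict[str, str]] = []
    for wrapper in wrappers:
        root = ET.fromstring(wrapper(keyword))
        for study in root.findall("study"):
            merged.append({
                "facility": root.get("facility", ""),
                "id": study.get("id", ""),
                "title": study.text or "",
            })
    return merged


isis = make_wrapper("ISIS", [{"id": "i1", "title": "Polymer film", "keywords": "neutron polymer"}])
badc = make_wrapper("BADC", [{"id": "b7", "title": "Ozone profiles", "keywords": "ozone atmosphere"}])
print(portal_search([isis, badc], "polymer"))
# [{'facility': 'ISIS', 'id': 'i1', 'title': 'Polymer film'}]
```

The point of the wrapper layer is that each facility keeps its own local metadata format; only the XML reply has to follow a shared convention.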

  12. DataPortal Architecture (2). Diagram components: DataPortal Web Interface, Session Management, Authentication & Authorisation, Certification Authority, Service Look Up, Query & Reply, Shopping Cart, DataPortal Permanent Repository, Data Transfer, Facility Administration, Facilities Access Control, Facilities XML Wrappers and External Data File Store(s).
      • The Shopping Cart allows registered users to permanently store and annotate pointers to the external data files and data sets.
      • Facility Administration allows external facilities to advertise their grid services to the DataPortal.
      As well as interacting with the DataPortal via the Web Interface, users can run queries by calling the Query & Reply service directly, provided they are properly authenticated. Other services, for example the Shopping Cart, are also externally visible.
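
The Shopping Cart behaviour described above (authenticated users storing and annotating pointers rather than copies of the data) is sketched below. This is a hypothetical, in-memory illustration, not the DataPortal's interface: the class names, method names and session tokens are invented, the authentication service is stubbed as a token-to-user dictionary, and "permanent" storage would in practice need a persistent repository such as the DataPortal Permanent Repository shown in the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class NotAuthenticated(Exception):
    """Raised when a request carries no valid session token."""


@dataclass
class CartItem:
    pointer: str    # URL or logical name of an external data file or data set
    note: str = ""  # the user's own annotation


@dataclass
class ShoppingCart:
    sessions: Dict[str, str]  # session token -> user name, issued by a separate login service
    items: Dict[str, List[CartItem]] = field(default_factory=dict)

    def _user(self, token: str) -> str:
        try:
            return self.sessions[token]
        except KeyError:
            raise NotAuthenticated("unknown or expired session token") from None

    def add(self, token: str, pointer: str, note: str = "") -> None:
        # Store a pointer only; the data itself stays in the external file store.
        self.items.setdefault(self._user(token), []).append(CartItem(pointer, note))

    def annotate(self, token: str, position: int, note: str) -> None:
        self.items[self._user(token)][position].note = note

    def contents(self, token: str) -> List[CartItem]:
        return list(self.items.get(self._user(token), []))


cart = ShoppingCart(sessions={"tok-123": "alice"})
cart.add("tok-123", "http://example.org/data/run1.raw")
cart.annotate("tok-123", 0, "calibration run, check detector 3")
print(cart.contents("tok-123"))
```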

  18. Metadata Capture. Metadata needs to be captured or harvested. Some metadata can only be obtained through interaction with the user; other metadata can be obtained automatically. User interaction needs to be reduced to the absolute minimum.
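
One way to keep user interaction to a minimum is to start from whatever was harvested automatically and prompt only for required fields that are still missing. The sketch below assumes that approach. The required-field list, the function names and the harvested values are hypothetical; in practice the harvested dictionary might come from, for example, a simulation's output headers or job logs, but that source is not modelled here.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical list of fields a record must have before it is stored.
REQUIRED_FIELDS = ("title", "creator", "date", "variables")


def capture_metadata(harvested: Dict[str, str],
                     ask_user: Callable[[str], str],
                     required: Iterable[str] = REQUIRED_FIELDS) -> Dict[str, str]:
    """Start from automatically harvested metadata; ask the user only for what is missing."""
    record = dict(harvested)
    for name in required:
        if not record.get(name):        # absent or empty
            record[name] = ask_user(name)
    return record


# Stand-in for an interactive prompt that records which questions were asked.
asked: List[str] = []

def fake_prompt(name: str) -> str:
    asked.append(name)
    return f"<user answer for {name}>"


harvested = {"creator": "batch job 17", "date": "2003-06-01", "variables": "temperature,pressure"}
record = capture_metadata(harvested, fake_prompt)
print(asked)  # ['title']
```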

  20. Automatic capture from Climate Simulation

  21. Data Management. The Grid environment provides access to a multitude of storage systems, often hiding the type of system behind service interfaces. For managing personal data in a Grid environment, two possible solutions are:
      • Globus Data Management tools, e.g. ESG (http://www.earthsystemsgrid.org)
      • Storage Resource Broker (SRB) from the San Diego Supercomputer Center (http://www.npaci.edu/DICE/SRB)

  22. Storage Resource Broker (1). A professional data storage management system, initially developed in the mid-1990s by the San Diego Supercomputer Center (http://www.npaci.edu/DICE/SRB/). The current version supports many platforms and authentication methods and offers Web services interfaces.

  23. Storage Resource Broker. Integrated access to data on PC, UNIX, Linux, database and tape storage (http://www.npaci.edu/dice/srb/mySRB/mySRB.html). SRB is currently used within CCLRC and Southampton and operated for the e-Minerals Mini Grid and Bristol PPD; it will be tested for the NERC DataGrid and e-Materials.

  24. SRB functions include ingestion, movement and replication of data, and providing access to data for others. Attributes shown: version of data; type of data; replica or original data; physical data location and type of resource.
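
The bookkeeping behind these functions can be pictured as a catalogue that maps a logical name to a versioned object with one original and any number of replicas on different resources. The following sketch only illustrates that idea; it is not SRB's actual interface or catalogue schema, and every class, method, resource name and path in it is hypothetical. "Providing access to data for others" is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhysicalCopy:
    resource: str       # storage resource name (hypothetical), e.g. "dl-tape"
    resource_type: str  # e.g. "unix-disk", "database", "tape"
    path: str           # physical location on that resource
    is_original: bool   # False for replicas


@dataclass
class LogicalObject:
    name: str
    data_type: str
    version: int = 1
    copies: List[PhysicalCopy] = field(default_factory=list)


class Catalogue:
    """Maps logical names to versioned objects and their physical copies."""

    def __init__(self) -> None:
        self.objects: Dict[str, LogicalObject] = {}

    def ingest(self, name: str, data_type: str, resource: str, resource_type: str, path: str) -> LogicalObject:
        # Re-ingesting an existing name creates a new version with a fresh original copy.
        old = self.objects.get(name)
        obj = LogicalObject(name, data_type, version=old.version + 1 if old else 1)
        obj.copies.append(PhysicalCopy(resource, resource_type, path, is_original=True))
        self.objects[name] = obj
        return obj

    def replicate(self, name: str, resource: str, resource_type: str, path: str) -> None:
        self.objects[name].copies.append(PhysicalCopy(resource, resource_type, path, is_original=False))

    def move(self, name: str, old_resource: str, resource: str, resource_type: str, path: str) -> None:
        # Movement: the logical name is unchanged; only the physical location is updated.
        for physical in self.objects[name].copies:
            if physical.resource == old_resource:
                physical.resource, physical.resource_type, physical.path = resource, resource_type, path
                return
        raise KeyError(f"{name} has no copy on {old_resource}")

    def locate(self, name: str, prefer_type: Optional[str] = None) -> PhysicalCopy:
        # Users ask by logical name; a copy on the preferred resource type wins if one exists.
        copies = self.objects[name].copies
        for physical in copies:
            if physical.resource_type == prefer_type:
                return physical
        return copies[0]


cat = Catalogue()
cat.ingest("run1.nc", "netCDF", "dl-disk", "unix-disk", "/data/run1.nc")
cat.replicate("run1.nc", "dl-tape", "tape", "/archive/run1.nc")
print(cat.locate("run1.nc").is_original)   # True
print(cat.locate("run1.nc", "tape").path)  # /archive/run1.nc
```

Because users only ever refer to the logical name, data can be moved or replicated between disk, database and tape without breaking anyone's references to it.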

  25. Current projects of the Data Management Group of the CCLRC e-Science Centre:
      • CLRC DataPortal
      • Environment from the Molecular Level
      • NERC DataGrid
      • Automatic Collection of Climate Simulation Metadata
      • Storage Resource Broker
      • e-Science Database Service
      • Hydrology Data Grid (just funded)
      • e-Science Technologies for the Simulation of Complex Materials

  26. More information can be found at: CLRC e-Science Centre Projects, http://www.e-science.clrc.ac.uk/web/projects/; Kerstin Kleese, k.kleese@dl.ac.uk, http://www.e-science.clrc.ac.uk/web/staff/kerstin_kleese
