
Towards an information model for I2S2

Towards an information model for I2S2. Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory, brian.matthews@stfc.ac.uk.


Presentation Transcript


  1. Towards an information model for I2S2
     Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory, brian.matthews@stfc.ac.uk

  2. Facilities Process
     Stages of the process:
     • Proposal: scientist submits application for beamtime
     • Approval: facility committee approves application
     • Scheduling: facility registers, trains, and schedules scientist’s visit
     • Experiment: scientist visits, facility runs experiment
     • Data storage: raw data filtered, cleansed and stored
     • Data analysis: tools for processing made available
     • Record Publication: subsequent publication registered with facility
     Characteristics:
     - formal application
     - set processes
     - central infrastructure
     - standard tools
     - hierarchical control
     - dedicated staff
       • user office
       • instrument scientists
       • Library and IT support

  3. Requirements
     • Secure access to user’s data
     • Flexible data searching
     • Scalable architecture
     • Extensible architecture
     • Integration with analysis tools
     • Access to high-performance resources
     • Linking to other scientific outputs
     • Data policy aware

  4. Principles
     • The ICAT software suite
       - Catalogues all experiment related information
       - Metadata gathered via integration with existing IT systems
         • proposal systems
         • data acquisition
       - Provides a well defined API for easy embedding into any application
     • Access data anywhere via the web
     • Annotate and search for data
     • Share data with colleagues
     • Access data via user’s own programs
     • Utilise integrated e-Science resources
     • Link to data from your publications
     Diagram labels: User Office System (User Database, Scheduling, Health and Safety, Proposal Management); Online Proposal System; Data Acquisition System; Data Access Portal; Storage Management System; Single Sign On; Account Creation and Management; Metadata Catalogue.
     Caption: ICAT Software Suite, providing the crucial integration of key functions.

  5. Component architecture

  6. ICAT Deployment
     Diagram labels: User Database System; Single Sign On; Data Storage / Delivery System; Proposal System; Publication System; ICAT API; e-Science Services; RDBMS; Software Repository; Web Services API; Command Line Tools; Fortran; C++; Java; Glassfish / JBOSS.

  7. Data Portal

  8. TopCat

  9. Towards an Information Model

  10. Methodology
      The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston.
      http://dublincore.org/documents/singapore-framework/

  11. Functional requirements

  12. A Metadata Model for Facilities Science
      • A common general format/standard for metadata on Scientific Studies and data holdings did not exist
      • By proposing a Model:
        - A specification for the types of metadata needed to capture Scientific Studies
        - Cataloguing data holdings: provide access for the Data Owner
        - Ease citation, sharing, collaboration, and integration
        - Allow easy Federation of distributed heterogeneous metadata systems into a homogeneous (virtual) Platform
      • Therefore the Core Scientific Metadata Model (CSMD) was developed.

  13. A Domain Model

  14. Modelling Scientific Activity

  15. Core Scientific Metadata Model
      Damian Flannery
      Entities and attributes (diagram labels):
      • Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
      • Investigator: User Id, Role
      • Topic: Name, Parent Id, Topic Level
      • Keyword: Name
      • Publication: Full Reference, URL, Repository
      • Sample: Name, Chemical Formula, Safety Information
      • Sample Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
      • Dataset: Name, Sample Id, Description
      • Dataset Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
      • Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
      • Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
      • Parameter: Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
      • Related Datafile: Source Datafile Id, Destination Datafile Id, Relation, S/W Application, S/W Version
      • Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader etc.), Element Type, Element Id
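      The entities on this slide can be read as a small nested data structure. The following is a minimal sketch in Python, not the ICAT schema or API: the class layout, the containment of datafiles in datasets and of datasets in an investigation, the `total_size` helper, and all example values are assumptions made for illustration. Only the attribute names are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """A named value attached to a sample, dataset, or datafile."""
    name: str
    units: str
    string_value: Optional[str] = None
    numeric_value: Optional[float] = None
    range_bottom: Optional[float] = None
    range_top: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    size: int  # assumed to be in bytes; the slide gives no unit
    checksum: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    chemical_formula: str = ""
    safety_information: str = ""
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    reference: str
    title: str
    facility: str
    instrument: str
    keywords: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def total_size(self) -> int:
        """Sum the sizes of all datafiles in all datasets (hypothetical helper)."""
        return sum(f.size for ds in self.datasets for f in ds.datafiles)


# Hypothetical example values, for illustration only.
temperature = Parameter(name="temperature", units="K", numeric_value=300.0, error=0.5)
raw = Dataset(
    name="raw",
    datafiles=[
        Datafile(name="run_0001.dat", location="/data/run_0001.dat", size=1024),
        Datafile(name="run_0002.dat", location="/data/run_0002.dat", size=2048),
    ],
    parameters=[temperature],
)
investigation = Investigation(
    reference="EXAMPLE-001",
    title="Example study",
    facility="ISIS",
    instrument="example-instrument",
    keywords=["example"],
    samples=[Sample(name="sample A", chemical_formula="H2O")],
    datasets=[raw],
)
print(investigation.total_size())  # 3072
```

      The same Parameter shape is reused for samples, datasets, and datafiles because the slide lists identical attributes for all three parameter types.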

  16. Description set profile

  17. Metadata granule
      • Topic: keywords providing an index on what the study is about.
      • Study Description: provenance about what the study is, who did it and when.
      • Access Conditions: conditions of use providing information on who can access the data and how.
      • Data Description: detailed description of the organisation of the data into datasets and files.
      • Data Location: locations providing a navigational aid to where the data on the study can be found.
      • Related Material: references into the literature and community providing context about the study.
      • Legal Note: copyright, patents and conditions of use etc. relating to the study and the data in the study.

  18. ICAT 3.3 Schema – Study (2)

  19. Syntax and metadata formats

  20. ICAT API and XML format

  21. ICAT 3.3 Database Schema

  22. CSMD History
      • First pilot of the model developed in 2001!
      • Now in ICAT 3.3
      • Serving data from STFC Facilities (ISIS, DLS)
      • Model proven robust – simple yet expressive
      • http://code.google.com/p/icatproject/

  23. I2S2 - Infrastructure for Integration in Structural Sciences
      Bridging the gap between raw and derived data
      • EPSRC National Crystallography Service
        - service provision function
        - operates across institutions
        - moderate infrastructure
      • Diamond & ISIS
        - operates on behalf of multiple institutions
        - processes for experiments
        - large infrastructure engineered to manage raw data
        - derived data taken off site on laptops / removable drives
      • “Lone” researcher scenario
        - data sharing with colleagues via email
        - little or no infrastructure
        - little management of raw or derived data

  24. Interactions between research process
      Cover the scientist’s research lifecycle as well as the facilities.
      Extend CSMD:
      • To laboratory based science
      • To secondary analysis data
      • To preservation information
      • To publication data
      • To domain specific vocabularies
      By being:
      - standardised
      - modular
      - extensible
      Diagram labels (facilities process): Proposal; Approval; Scheduling; Experiment; Data storage; Record Publication.
      Diagram labels (scientist’s research lifecycle): Literature Review; Grant Proposal; Sample Preparation; Local experiments; Simulation; Facilities Proposal; Facilities Experiment; Data cleansing; Data analysis; Analysis Tools; Publication; Record Publication.

  25. Methodology
      The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston.
      http://dublincore.org/documents/singapore-framework/

  26. Issues
      • Metadata model
        - Framework for developing metadata model
        - Modularisation mechanisms and extensions
        - Formats
      • Model supporting laboratory tools
        - How does the model fit?
        - Flexibility to handle local processes
          • Ad hoc, partial, unordered
        - What needs changing in the model?
        - What needs changing in tools?
      • Data input and maintenance???
        - Simple ways of inputting the data
        - Lab books?

  27. Extension areas:
      • Secondary analysis data
      • Preservation data
      • Publication data
      • Topic data
        - chemistry
      • Controlled lists (ontologies) for
        - Instruments
        - Facilities
        - Methods
      • Access control
      • Safety data
      • Blogs and notebooks

  28. Part of ISIS study
      Diagram labels: ISIS - ICAT; Correction data; Sample data; Calibration data; User inputs; Control file; Gudrun; Scattering function data.

  29. Derived Data
      Generalised model
      • Managing the links between data
      • Inputs of data sets
      • Associated with a software item with a set of parameters
      Managing this?
      - lab-books?
      - simple tools?
      - VRE?
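      One way to picture the generalised model is as a list of derivation steps, each linking input data sets to outputs through a software item and its parameters. The following is a minimal sketch in Python under stated assumptions: the `Derivation` record and the `trace_sources` function are hypothetical and not part of CSMD or ICAT. The first example step reuses the labels from the ISIS/Gudrun slide, but the exact wiring of inputs to outputs is assumed. The second step, its software name, and all version strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Derivation:
    """One processing step: input data -> software with parameters -> output data."""
    inputs: List[str]
    outputs: List[str]
    software: str
    software_version: str = "unknown"
    parameters: Dict[str, str] = field(default_factory=dict)


def trace_sources(target: str, steps: List[Derivation]) -> Set[str]:
    """Return every data item the target was directly or indirectly derived from."""
    # Index each output by the step that produced it.
    produced_by = {out: step for step in steps for out in step.outputs}
    sources: Set[str] = set()
    pending = [target]
    while pending:
        item = pending.pop()
        step = produced_by.get(item)
        if step is None:
            continue  # no recorded step: treat the item as raw data
        for src in step.inputs:
            if src not in sources:
                sources.add(src)
                pending.append(src)
    return sources


# Step assumed from the labels on the ISIS/Gudrun slide.
gudrun_step = Derivation(
    inputs=["sample data", "correction data", "calibration data", "control file"],
    outputs=["scattering function data"],
    software="Gudrun",
)
# Purely hypothetical follow-on analysis step.
fit_step = Derivation(
    inputs=["scattering function data"],
    outputs=["fit result"],
    software="hypothetical-fit-tool",
    parameters={"model": "example"},
)
steps = [gudrun_step, fit_step]
print(sorted(trace_sources("fit result", steps)))
```

      Keeping the software item and its parameters on the link, rather than on the data, means the same record answers both "what was this derived from?" and "how was it produced?".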
