
Publication of facility investigations




  1. Publication of facility investigations. Brian Matthews, Scientific Information Group, Scientific Computing Department, STFC Rutherford Appleton Laboratory. brian.matthews@stfc.ac.uk

  2. STFC
     • Funds and operates large-scale science for the UK research base: physics, astronomy, chemistry, materials
     • Scientific computing: develops and operates computing infrastructure (HPC, PB datastore, software, data management…)
     [Image: ESO, ALMA array]

  3. Major Science Facilities
     • Big Science: Particle Physics (exploring the very small); Space Science (exploring the very large)
     • Small Science: understanding the world around us at a molecular level. Lasers, neutron and light sources: ISIS and Diamond

  4. Facilities Support: ISIS, CLF, Diamond. Big facilities for small science.

  5. Science at STFC Facilities
     Neutrons and photons provide complementary views of matter:
     • Photons “see” electric charge (high atomic number nuclei)
     • Neutrons “see” nucleons (especially hydrogen atoms)
     [Diagram labels: beam, sample, detector, imaging, data, computing, analysis, modelling, knowledge]

  6. The science we do: structure of materials
     • ~30,000 user visitors each year in Europe:
       • physics, chemistry, biology, medicine
       • energy, environmental, materials, culture
       • pharmaceuticals, petrochemicals, microelectronics
     • Billions of € of investment
       • c. £400M for DLS
       • + running costs
     • Over 5,000 high-impact publications per year in Europe
     • But so far no integrated data repositories
     • Lacking sustainability & traceability
     Experiment steps: visit facility on research campus; place sample in beam; diffraction pattern from sample; fitting experimental data to model.
     Example applications: magnetic moments in electronic storage; hydrogen storage for zero-emission vehicles; bioactive glass for bone growth; longitudinal strain in aircraft wing; structure of cholesterol in crude oil.

  7. Similar architecture used for DLS
     • Scaling is a constant concern
     • Data rates keep increasing
     • 70TB per month and rising
     • Tailored ICAT
     • Reengineered StorageD

  8. • Secure access to users’ data
     • Flexible data searching
     • Scalable and extensible architecture
     • Integration with analysis tools
     • Access to high-performance resources
     • Linking to other scientific outputs
     • Data policy aware
     [Diagram: Central Facility. Example ISIS proposals: “H2-(zeolite) vibrational frequencies vs polarising potential of cations”; “B-lactoglobulin protein interfacial structure”. Instrument: GEM, high intensity, high resolution neutron diffractometer]
     • Proposals: Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.
     • Experiment: Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team.
     • Publication: Using ICAT you will also be able to associate publications with your experiment and even reference data from your publications.
     • Analysed Data: You will have the capability to upload any desired analysed data and associate it with your experiments.
     http://code.google.com/p/icatproject/

  9. Core Scientific Metadata Model (CSMD)
     The Core Metadata model forms the information model for ICAT. It is designed to describe facilities-based experiments in Structural Science.
     [Diagram entities: Topic, Publication, Keyword, Authorisation, Investigation, Investigator, Dataset, Sample, Sample Parameter, Datafile, Dataset Parameter, Parameter, Related Datafile, Datafile Parameter]
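The containment hierarchy named on this slide (an Investigation holding Datasets, which hold Datafiles, with Parameters attached) might be sketched as below. This is a minimal illustration only: the field names (`name`, `value`, `unit`, `location`, and so on) and the example values are assumptions made for this sketch, not taken from the CSMD specification or the ICAT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    # Hypothetical fields: a named measurement with a unit.
    name: str
    value: float
    unit: str


@dataclass
class Datafile:
    name: str
    location: str  # hypothetical: where the file is stored
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    samples: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def datafile_count(self) -> int:
        """Count the data files across all datasets of this investigation."""
        return sum(len(ds.datafiles) for ds in self.datasets)


# Usage: one investigation with a single raw dataset (all values invented).
inv = Investigation(title="Example investigation", investigators=["A. Scientist"])
raw = Dataset(name="raw")
raw.datafiles.append(Datafile(name="run0001.nxs", location="/archive/run0001.nxs"))
raw.parameters.append(Parameter(name="temperature", value=300.0, unit="K"))
inv.datasets.append(raw)
```

Keeping the Investigation as the root object mirrors the slide's point that the model describes whole experiments rather than isolated files.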

  10. TopCat

  11. DOIs for Data Publication

  12. Is this enough?
      • What we have so far is good for:
        • us to manage data
        • users to access their own data
        • citation of raw data
      • But:
        • Traceability and validation?
        • Reuse of the data?
      • Need to make context more explicit
      • Focussing on the dataset is the wrong subject of discourse

  13. Support the wider Facilities Lifecycle
      • Proposal: Scientist submits application for beamtime
      • Approval: Facility committee approves application
      • Scheduling: Facility registers, trains, and schedules scientist’s visit
      • Experiment: Scientist visits, facility runs experiment
      • Data storage: Raw data filtered and stored
      • Data analysis: Tools for processing made available
      • Record Publication: Subsequent publication registered with facility
      As in PanData-ODI D6.1 (which has much more detail)

  14. Publishing Investigations
      • So what we want is a record of EXPERIMENTS, not data.
      • Thus we want the record of the context:
        • The experimental intention and actors
        • The instruments and configurations used
        • The sample
        • The environmental parameters and context
        • The Raw Data
      • Thus we want to publish a record of the whole INVESTIGATION
        • Can get most of this from what we have
      • The Investigation becomes a “first class” research object:
        • Published
        • Identified and treated as a single entity
        • Cited and credited
        • Record of the output of the facility
        • Analogous to a Journal Article
      • Investigation as the unit of discourse for scientific facilities.
      • But also as an access point for validation and reuse
        • Because we have a record of what actually happened.

  15. Our DataCite entries are in fact Investigations (red is for “data” notion, and green is for “investigation”)

  16. “DataCite abuse”
      As we have seen, we use DataCite for Investigations, with Datasets only referred to from them. Other data curators sometimes use DataCite for Publications (“documents”) that contain data: http://data.datacite.org/10.7480/OA
      So “data” DOIs tend to resolve either into Investigations or Publications.
      • Extend the Resource Type
      • Also may not want to have a landing page for all DOIs

  17. Research Objects
      • Represent the “investigation” as a Research Object
      • Research Objects (ROs) are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations. Their goal is to create a class of artifacts that can encapsulate our digital knowledge and provide a mechanism for sharing and discovering assets of reusable research and scientific knowledge
      • www.researchobject.org and elsewhere (WorkFlow4Ever)
      • Represent Investigation as a Research Object
        • Build a graph structure for the links in the research object.
        • Using an RDF representation, URIs
        • Publish as a linked data object
      Bechhofer et al. Why Linked Data is Not Enough for Scientists. Proceedings of the 10th IEEE e-Science Conference, Brisbane, Australia (2010). http://eprints.ecs.soton.ac.uk/21587/5/research-objects-final.pdf
      Arif Shaon, Sarah Callaghan, Bryan Lawrence, Brian Matthews. Opening up Climate Research: a linked data approach to publishing data provenance. 7th Int. Digital Curation Conference (2011).

  18. RDF representation of CSMD model
      <!-- csmd:Investigation -->
      <owl:Class rdf:about="csmd:Investigation">
        <rdfs:label>Investigation</rdfs:label>
        <rdfs:comment>An investigation or experiment</rdfs:comment>
      </owl:Class>
      <!-- csmd:Facility -->
      <owl:Class rdf:about="csmd:Facility">
        <rdfs:label>Facility</rdfs:label>
        <rdfs:comment>An experimental facility</rdfs:comment>
      </owl:Class>
      <!-- csmd:Dataset -->
      <owl:Class rdf:about="csmd:Dataset">
        <rdfs:label>Dataset</rdfs:label>
        <rdfs:comment>A collection of data files and part of an investigation</rdfs:comment>
      </owl:Class>
      <!-- csmd:Datafile -->
      <owl:Class rdf:about="csmd:Datafile">
        <rdfs:label>Datafile</rdfs:label>
        <rdfs:comment>A data file</rdfs:comment>
      </owl:Class>

  19. After proposal: Initialise the Research Object
      :n a csmd:Investigation ;
         csmd:investigation_doi doi:stfc.xxx.n ;
         csmd:investigation_investigationUser :iu1 ;
         csmd:investigation_instrument :inst1 .
      :iu1 a csmd:investigationUser ;
         csmd:investigationUser_user :u1 .
      :u1 a csmd:User .
      :inst1 a csmd:Instrument .
      [Diagram: Investigation #n (DOI:STFC.xxx.n) linked to :investigator and :instrument]
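The initialisation step on this slide can be sketched in code as building a small set of triples once a proposal is accepted. This is only an illustration: the namespace URIs under `example.org`, the URI layout for users and instruments, and the function names are assumptions. The property and class names follow the Turtle fragment on the slide.

```python
# Placeholder namespaces for illustration; not official CSMD or STFC URIs.
CSMD = "http://example.org/csmd#"
BASE = "http://example.org/investigation/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def init_research_object(n, doi, user_id, instrument_id):
    """Return the initial triples for investigation number `n`."""
    inv = f"{BASE}{n}"
    inv_user = f"{BASE}{n}/iu1"
    user = f"{BASE}user/{user_id}"
    inst = f"{BASE}instrument/{instrument_id}"
    return {
        (inv, RDF_TYPE, CSMD + "Investigation"),
        (inv, CSMD + "investigation_doi", doi),
        (inv, CSMD + "investigation_investigationUser", inv_user),
        (inv, CSMD + "investigation_instrument", inst),
        (inv_user, RDF_TYPE, CSMD + "investigationUser"),
        (inv_user, CSMD + "investigationUser_user", user),
        (user, RDF_TYPE, CSMD + "User"),
        (inst, RDF_TYPE, CSMD + "Instrument"),
    }


def to_ntriples(triples):
    """Serialise triples (all terms treated as IRIs) as sorted N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


# Usage with invented identifiers.
ro = init_research_object(42, "doi:10.0000/example.42", "u1", "inst1")
text = to_ntriples(ro)
```

Later lifecycle stages (datasets, publications, software) would then add triples to the same set, so the Investigation node acts as the seed of the research object.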

  20. After the experiment
      • Own metadata format (CSMD)
      • More or less what ICAT currently supports
      • Adds extra details on parameters, datasets, formats etc.
      [Diagram: Investigation #n (DOI:STFC.xxx.n) linked to :investigator, :instrument, :sample and :dataset; labels: Experimental Data, Metadata, Data Storage]

  21. Linking Publication into Investigation
      [Diagram: Investigation #n (DOI:STFC.xxx.n) linked to :investigator, :instrument, :sample and :dataset (Raw Data Repository), with cito:cites links connecting two :publication nodes (Publication Repository, Publication Store)]
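A publication link of the kind shown on this slide could be recorded as one extra triple using the CiTO `cites` property. In this sketch the direction of the link (publication cites investigation) is an assumption, since the flattened diagram does not show arrow directions, and all `example.org` URIs and function names are hypothetical.

```python
# cito:cites from the CiTO citation ontology.
CITO_CITES = "http://purl.org/spar/cito/cites"


def link_publication(graph, publication, investigation):
    """Record that `publication` cites `investigation` (direction assumed)."""
    graph.add((publication, CITO_CITES, investigation))


def publications_citing(graph, investigation):
    """List, in sorted order, the publications that cite the investigation."""
    return sorted(s for s, p, o in graph if p == CITO_CITES and o == investigation)


# Usage with invented URIs.
graph = set()
inv = "http://example.org/investigation/42"
link_publication(graph, "http://example.org/pub/b", inv)
link_publication(graph, "http://example.org/pub/a", inv)
citing = publications_citing(graph, inv)
```

Because the link is just another triple, the publication can live in a separate repository from the raw data, as the slide's diagram suggests.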

  22. Linking the derived data into the Investigation
      • Note that derived data could be on a different site
      [Diagram: Investigation #n (DOI:STFC.xxx.n) linked to :investigator, :instrument, :sample, :dataset (Raw Data Repository), :relatedDataset (Derived Data Repository) and two :publication nodes (Publication Repository)]

  23. Linking the software into the Investigation
      • W3C Prov ontology
      • Assume that the software is in a repository
      [Diagram: Investigation #n (DOI:STFC.xxx.n) linked to :investigator, :instrument, :sample, :dataset, :relatedDataset, :application and two :publication nodes (cito:cites); Software Package 1 (Software Repository) with :inputDataset and :outputDataset links]
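The software link could be recorded with terms from the W3C PROV ontology, which the slide names. The sketch below maps the slide's :inputDataset and :outputDataset arrows onto `prov:used` and `prov:wasGeneratedBy`. That mapping, the function names, and the `example.org` URIs are assumptions for illustration.

```python
PROV = "http://www.w3.org/ns/prov#"


def record_analysis(graph, activity, software, input_dataset, output_dataset):
    """Add PROV triples describing one analysis run."""
    graph.update({
        (activity, PROV + "used", input_dataset),
        (activity, PROV + "wasAssociatedWith", software),
        (output_dataset, PROV + "wasGeneratedBy", activity),
        (output_dataset, PROV + "wasDerivedFrom", input_dataset),
    })


def lineage(graph, dataset):
    """Return every dataset that `dataset` derives from, directly or indirectly."""
    found = set()
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == PROV + "wasDerivedFrom" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


# Usage: raw data is reduced, then fitted, by two hypothetical software runs.
g = set()
ex = "http://example.org/"
record_analysis(g, ex + "run/1", ex + "sw/reduce", ex + "data/raw", ex + "data/reduced")
record_analysis(g, ex + "run/2", ex + "sw/fit", ex + "data/reduced", ex + "data/fitted")
sources = lineage(g, ex + "data/fitted")
```

Following `prov:wasDerivedFrom` back to the raw data is one way to support the traceability and validation goals raised earlier in the talk.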

  24. Generate Landing page from RO
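One possible way to generate a landing page is to group the research object's links by relation and render them as HTML. This is a sketch under assumptions: the input shape (a mapping from relation label to target URIs), the function name, and the page layout are all invented for illustration and do not describe the page shown on the slide.

```python
from html import escape


def render_landing_page(title, doi, links):
    """Render a minimal HTML landing page.

    `links` maps a relation label (e.g. "dataset") to a list of target URIs.
    """
    parts = [
        "<html><body>",
        f"<h1>{escape(title)}</h1>",
        f"<p>DOI: {escape(doi)}</p>",
    ]
    for relation in sorted(links):
        parts.append(f"<h2>{escape(relation)}</h2>")
        parts.append("<ul>")
        for target in links[relation]:
            safe = escape(target, quote=True)
            parts.append(f'<li><a href="{safe}">{safe}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Usage with invented values.
page = render_landing_page(
    "Investigation <n>",
    "10.0000/example",
    {
        "publication": ["http://example.org/pub/1"],
        "dataset": ["http://example.org/data/1"],
    },
)
```

Escaping every value keeps the page well-formed even when titles or URIs contain markup characters.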

  25. Setting the Boundary: It depends on your Point of View
      [Diagram labels: Investigations, Extended Publication, E-Portfolio]

  26. Setting a boundary: OAI-ORE
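With OAI-ORE, the boundary can be made explicit by listing the resources an aggregation contains. The sketch below uses the ORE terms `ore:Aggregation`, `ore:ResourceMap`, `ore:describes` and `ore:aggregates`. The URIs and function names are hypothetical, and the choice of which resources fall inside the boundary is only an example.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_aggregation(aggregation, resource_map, members):
    """Return ORE triples declaring which resources the aggregation contains."""
    triples = {
        (aggregation, RDF_TYPE, ORE + "Aggregation"),
        (resource_map, RDF_TYPE, ORE + "ResourceMap"),
        (resource_map, ORE + "describes", aggregation),
    }
    for member in members:
        triples.add((aggregation, ORE + "aggregates", member))
    return triples


def inside_boundary(triples, aggregation, resource):
    """True if the resource is explicitly aggregated, i.e. inside the boundary."""
    return (aggregation, ORE + "aggregates", resource) in triples


# Usage: the raw dataset and one publication are inside; derived data is not.
agg = "http://example.org/investigation/42#aggregation"
rem = "http://example.org/investigation/42/rem"
boundary = build_aggregation(
    agg, rem, ["http://example.org/data/raw", "http://example.org/pub/1"]
)
```

A different point of view (for example an extended publication) would simply be another aggregation over an overlapping set of resources.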

  27. Preserving Investigations
      • Now becomes preserving the research object.
      • Preserving a linked data graph
        • Persistency of identifiers
      • Managing integrity of external artefacts.
        • Link checking
        • Copying and mirroring, with consistency checks
      • Representation Information to give more context on the objects
        • And on the aggregate as a whole
      • PDI (Provenance, Integrity etc.) on the whole aggregate object
        • As well as components
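The integrity checks listed above could be approached by recording a checksum for each artefact the research object links to and re-checking it later. This is a minimal sketch under assumptions: `fetch` is a hypothetical callable supplied by the caller that returns the bytes behind a URI, or `None` if the link no longer resolves, and SHA-256 is chosen here only as an example fixity algorithm.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional

Fetch = Callable[[str], Optional[bytes]]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(uris: Iterable[str], fetch: Fetch) -> Dict[str, str]:
    """Record a checksum for every artefact that currently resolves."""
    manifest = {}
    for uri in uris:
        content = fetch(uri)
        if content is not None:
            manifest[uri] = sha256_hex(content)
    return manifest


def check_integrity(manifest: Dict[str, str], fetch: Fetch) -> Dict[str, str]:
    """Report each artefact as 'ok', 'changed' or 'missing'."""
    report = {}
    for uri, expected in manifest.items():
        content = fetch(uri)
        if content is None:
            report[uri] = "missing"
        elif sha256_hex(content) != expected:
            report[uri] = "changed"
        else:
            report[uri] = "ok"
    return report


# Usage: an in-memory store stands in for remote repositories.
store = {"http://example.org/a": b"raw data", "http://example.org/b": b"paper"}
manifest = build_manifest(store, store.get)
store["http://example.org/a"] = b"raw data (edited)"  # simulate silent change
del store["http://example.org/b"]                      # simulate a broken link
report = check_integrity(manifest, store.get)
```

The same report can be run against a mirror to check that a copy is consistent with the original.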

  28. Adding Preservation Information: Rep Info for various items
      • Would probably be more
      • Work into a RepInfo Repository
      • Would also have a RepInfo Network
      [Diagram: Investigation #n (DOI:STFC.xxx.n) with its links (:investigator, :instrument, :sample, :dataset, :relatedDataset, :application, :publication), annotated with: raw data format description (e.g. NeXus); parameter description (e.g. NXDL, Con Vocab); analysed data format description; software description; software classification; publication format description; sample description; instrument description (website)]

  29. Adding Preservation Information: Rep Info for the whole aggregate
      [Diagram: Investigation #n (DOI:STFC.xxx.n) with its links (:investigator, :instrument, :sample, :dataset, :relatedDataset, :application, :publication), annotated with: CSMD vocabulary description; software classification]

  30. Summary
      • Investigation is the appropriate unit of discourse for facilities science
        • Publishable, Citable, Reportable
        • Can be used as a vehicle for validation and reuse
      • Basic principles of building research objects for facilities science
        • Follow research lifecycle
        • Consider Investigation a RO “seed”
        • Apply Linked Data principles
        • Re-use existing vocabularies and ontologies
        • Share ROs via recognizable data formats and APIs
      • Applicable beyond Facilities
        • Other analogous objects: “experiments”, “observations”, “studies”
      • The subject of preservation
        • How do we maintain the integrity of Investigation objects?

  31. Thank You. Questions? brian.matthews@stfc.ac.uk www.e-science.stfc.ac.uk
