
Resource Discovery for Extreme Scale Collaboration

Sponsors: US Department of Energy.


Presentation Transcript


  1. Sponsors: US Department of Energy.

The amount of data produced in the practice of science is growing rapidly. Despite the accumulation of and demand for scientific data, relatively little is actually made available to the broader scientific community. We surmise that the root of the problem is the perceived difficulty of electronically publishing scientific data and associated metadata in a way that makes them discoverable. We propose to exploit Semantic Web technologies and practices to make (meta)data discoverable and easy to publish. We share our experiences in curating metadata to illustrate both the flexibility of our approach and the pain of discovering data in the current research environment. We also recommend, by concrete example, how data publishers can provide their (meta)data by adding a limited amount of markup to HTML pages on the Web. With little additional effort from data publishers, the difficulty of data discovery, access, and sharing can be greatly reduced and the impact of research data greatly enhanced.

Glossary:
OWL – Web Ontology Language
PNNL – Pacific Northwest National Laboratory
RDESC – Resource Discovery for Extreme Scale Collaboration
RDFS – Resource Description Framework Schema
RPI – Rensselaer Polytechnic Institute
SPARQL – an RDF query language
S2S – a faceted web browser
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Resource Discovery for Extreme Scale Collaboration
Jesse Weaver¹ (Jesse.Weaver@pnnl.gov), Alan Chappell¹ (alan.chappell@pnnl.gov), Sumit Purohit¹ (sumit.purohit@pnnl.gov), William Smith¹ (william.smith@pnnl.gov), Patrick West² (westp@rpi.edu), Benno Lee² (leeb5@rpi.edu), Karen Schuchardt¹ (karen.schuchardt@pnnl.gov), Peter Fox² (pfox@cs.rpi.edu)
(¹Pacific Northwest National Laboratory, ²Rensselaer Polytechnic Institute)

Resources:
http://rdesc.org - site developed for the RDESC project
http://rdesc.org/2014/ - The RDESC ontology

Acknowledgments: Eric Rozell, Master's student at Rensselaer Polytechnic Institute, now with Microsoft.

RDESC Architecture

TWC/RPI S2S Faceted Browser
Facets on the left allow users to constrain their search based on data resources, GCMD Keywords, Special Measured Parameters, and lat/lon coordinates. The facets changed over time based on the metadata extracted from ingesting the various data resources.

RDESC RDF Graphs
An example of an RDF description for an ARM data stream, showing how the ARM measured property hierarchy is used to link data streams to measured properties of interest.
An example description of a GCMD dataset as an RDF graph, using the initial ontology.
The current ontology. Ovals represent classes/concepts, and arrows indicate subClassOf relationships. Classes are colored so that darker classes were established in the ontology prior to lighter classes.

Whatever dataset we ingest, we can present its metadata in a search-and-browse interface, such as S2S above, and provide a splash page for each dataset with the information retrieved from the external system.

Conclusion:
We have emphasized the importance of data publishers providing their (meta)data in a way that makes structural and semantic integration a natural process. This is accomplished by following a shared vocabulary of terms embodied as an ontology, and by expressing metadata as RDF triples that utilize the ontology. Although this can sound daunting, we showed that doing so is actually quite easy in practice (section 5). We demonstrated the flexibility of this approach by curating existing metadata into the recommended format. Publishing (meta)data in this (or a similar) way will ameliorate (at least in part) the poor data sharing practices that currently pervade the practice of science.
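
To make "expressing metadata as RDF triples" concrete, the following is a minimal Python sketch, not the RDESC implementation. It assumes the widely used DCAT and Dublin Core vocabularies in place of the RDESC ontology, whose terms are not listed in this transcript, and the dataset URI, title, and keywords are hypothetical placeholders. It builds a handful of triples for one dataset and prints them in N-Triples syntax using only the standard language features.

```python
# Sketch: describe one dataset as RDF triples and serialize them as N-Triples.
# Assumption: DCAT and Dublin Core terms stand in for the RDESC ontology.
# The dataset URI, title, and keywords below are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCAT_KEYWORD = "http://www.w3.org/ns/dcat#keyword"
DCT_TITLE = "http://purl.org/dc/terms/title"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def describe_dataset(dataset_uri, title, keywords):
    """Return (subject, predicate, object) triples describing one dataset."""
    subject = iri(dataset_uri)
    triples = [
        (subject, iri(RDF_TYPE), iri(DCAT_DATASET)),
        (subject, iri(DCT_TITLE), literal(title)),
    ]
    for keyword in keywords:
        triples.append((subject, iri(DCAT_KEYWORD), literal(keyword)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example = describe_dataset(
    "http://example.org/dataset/42",            # hypothetical dataset URI
    "Hypothetical surface temperature record",  # hypothetical title
    ["atmosphere", "temperature"],              # hypothetical keywords
)
print(to_ntriples(example))
```

Because every statement shares the same subject and draws its predicates from a shared vocabulary, descriptions produced this way by different publishers can be merged into one graph and then searched by keyword or any other common property.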
