
Joining the Dots: Managing and identifying geolocated data by DOIs and IGSNs

Jens Klump | OCE Science Leader Earth Science Informatics | 20 August 2014 | Mineral Resources Flagship


Presentation Transcript


  1. Joining the Dots: Managing and identifying geolocated data by DOIs and IGSNs. Jens Klump | OCE Science Leader Earth Science Informatics | 20 August 2014 | Mineral Resources Flagship

  2. A few words to introduce myself ...
     • 1992 – 1995: B.Sc. in geology and in oceanography, Univ. Cape Town, South Africa.
     • 1995: B.Sc. (Honours) in geology (exploration geochemistry), Univ. Cape Town, South Africa.
     • 1996 – 1999: PhD in marine geology (biogeochemistry), Univ. Bremen, Germany.
     • 1999 – 2000: training in application and database development, project management.
     • 2000 – 2001: IT project manager for DIE ZEIT (weekly newspaper, Hamburg, Germany).
     • 2001 – 2014: senior research scientist at the German Research Centre for Geosciences GFZ, Potsdam, Germany.
     • Since March 2014: CSIRO OCE Science Leader Earth Science Informatics.
     Joining the dots | Jens Klump

  3. Previous Work
     • Supporting the research data value chain
     • Understanding data management in geoscience research
     • Development of project and enterprise research data solutions
     • Development and implementation of persistent identifiers (DOI, IGSN)
     • Integration of data from heterogeneous sources
     • Information models to describe data and processes
     • Semantic technologies for data interoperability
     • Adoption of new technologies
     • Studies on HPC, visualisation, 3D printing, internet of things
     • Application of information technology to geosciences
     • Sensor web enablement in environmental monitoring networks and in the laboratory
     • Data driven research on natural gas hydrates

  4. DOI: Data Publication and Citation. Making data part of the record of science

  5. HTTP Error 404

  6. History of DOI
     • “Link rot” was recognised as a problem early on and led to the development of the Handle System of persistent identifiers in 1995.
     • DOI was proposed in 1997 and has been in production since 1998.
     • The first DOI for data was minted in 2004 in the context of a DFG project.
     • A business model had to be found to expand DOI for data to an international scale.
     • DataCite was founded in 2009.
     • 31 members at present; 3.6 M datasets registered (1.2 M in the last 12 months).
     • For comparison, total journal output was estimated at 1.8 M articles for 2012.
     • Some of the datasets are very fine grained.

  7. Data in publications http://dx.doi.org/10.1594/GFZ.SDDB.1043

  8. Access to data (http://dx.doi.org/10.1594/GFZ.SDDB.1043)
     • Description
     • Citation
     • Related materials
     • Download data
     • Download metadata (ISO 19115, NASA DIF, DataCite, eSciDoc)

  9. DOI for data
     • Resolution: resolution from DOI to URL is provided by the Handle service.
     • Granularity? What is the smallest identifiable object?
     • Identity? What exactly is identified by a DOI?
     • Versioning? Updates, corrections, errata …
     • Time series? Continuing time series from environmental monitoring.
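The resolution step on slide 9 can be sketched in a few lines. This is a minimal illustration, not the Handle service's own logic: it only turns a cited DOI into the resolver URL that a client would request. The function names `normalise_doi` and `doi_to_url` and the list of accepted prefixes are assumptions made for this example; the default resolver address is the one shown on the slides.

```python
from urllib.parse import quote

# Prefixes that often precede a DOI name in citations (assumed, not exhaustive).
_PREFIXES = (
    "doi:",
    "http://dx.doi.org/",
    "https://dx.doi.org/",
    "http://doi.org/",
    "https://doi.org/",
)


def normalise_doi(text: str) -> str:
    """Strip a leading scheme or resolver prefix and return the bare DOI name."""
    doi = text.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # A DOI name has the form "10.<registrant>/<suffix>".
    registrant, _, suffix = doi.partition("/")
    if not registrant.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {text!r}")
    return doi


def doi_to_url(text: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build the URL to request; the resolver then redirects to the landing page."""
    return resolver + quote(normalise_doi(text), safe="/")


print(doi_to_url("doi:10.1594/GFZ.SDDB.1043"))
```

The sketch stops at building the URL. The actual lookup from DOI to the registered landing page happens on the resolver side.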

  10. The Ship of Theseus Paradox. Figure: the ship at Year 1, Year 2, Year 3, ... Year n, with one plank changed at each step.

  11. The Ship of Theseus Paradox. Figure: the same sequence from Year 1 to Year n, with one plank changed at each step, plus a second panel at Year n labelled “Collected planks”.

  12. The Ship of Theseus Paradox
      • Can any object be identical with another object?
      • Is it the equivalent object we are looking for?
      • What is represented by the identifier?
      Formally, the Ship of Theseus Paradox can be approached by introducing the concept of perdurantism. The perdurantist view is that an individual has distinct temporal parts throughout its existence. Perdurantism is usually presented as the antipode to endurantism, the view that an individual is wholly present at every moment of its existence.
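One possible reading of the perdurantist view in data terms is sketched below: the dataset keeps one identifier for the whole, while each frozen state is a separate temporal part with its own version number and content fingerprint. The class names, the `example:` identifier, and the use of SHA-256 are assumptions chosen for illustration; the slides do not prescribe any of them.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One temporal part of a dataset: its state at a given version."""
    series_id: str    # identifier of the dataset as a whole
    version: int      # position of this temporal part in the history
    fingerprint: str  # SHA-256 of the content at that moment


@dataclass
class Dataset:
    """A dataset whose identifier stays fixed while its content changes."""
    series_id: str
    records: list[str] = field(default_factory=list)
    history: list[Snapshot] = field(default_factory=list)

    def snapshot(self) -> Snapshot:
        """Freeze the current content as a new temporal part."""
        digest = hashlib.sha256("\n".join(self.records).encode("utf-8")).hexdigest()
        part = Snapshot(self.series_id, len(self.history) + 1, digest)
        self.history.append(part)
        return part


# Hypothetical identifier and content, for illustration only.
ship = Dataset("example:ship-of-theseus", ["plank-a", "plank-b"])
year1 = ship.snapshot()
ship.records[0] = "plank-a-replaced"  # change one plank
year2 = ship.snapshot()

print(year1.series_id == year2.series_id)      # same whole
print(year1.fingerprint == year2.fingerprint)  # different temporal parts
```

Whether a citation should point at the whole or at one temporal part is the design question the slide raises; the sketch only shows that both can be named separately.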

  13. Single item

  14. Appended time series

  15. Updated item

  16. Snapshots

  17. Collection

  18. Publication of Geodata doi:10.1594/GFZ.SDDB.1202

  19. Repositories vs. Services
      • How should data identified by DOI be disseminated?
      • File based:
          • Generic, close to the original record of science, OAIS compliant.
          • Of limited use to user agents (machines); often requires manual intervention.
      • Services:
          • Machine friendly; use can be automated.
          • Storage not OAIS compliant.
      • File based data can be transformed into services.
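The last point on slide 19, that file based data can be transformed into services, can be illustrated with a small sketch. It assumes a CSV file with two made-up columns and wraps it in a query function, which stands in for a real service interface. The column names, the values, and `make_service` are all hypothetical.

```python
import csv
import io
from typing import Callable, Iterator

# Made-up file content; in practice this would be the archived file behind a DOI.
FILE_TEXT = """depth_m,temperature_c
10,14.2
50,9.8
200,4.1
"""


def make_service(file_text: str) -> Callable[[float, float], Iterator[dict]]:
    """Parse the file once and return a query function over its rows."""
    rows = [
        {"depth_m": float(r["depth_m"]), "temperature_c": float(r["temperature_c"])}
        for r in csv.DictReader(io.StringIO(file_text))
    ]

    def query(min_depth: float, max_depth: float) -> Iterator[dict]:
        """Yield only the rows whose depth falls in the requested range."""
        return (row for row in rows if min_depth <= row["depth_m"] <= max_depth)

    return query


query = make_service(FILE_TEXT)
shallow = list(query(0, 100))
print(shallow)
```

The archived file stays untouched as the record of science, and the query function is a derived, machine friendly view on top of it.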

  20. IGSN: International Geo Sample Number. Connecting Geology to the Internet of Things

  21. Internet of Things “The Internet of Things refers to uniquely identifiable objects (things) and their virtual representations in an Internet-like structure.”

  22. Internet of Things
      • Specimens are a basic unit for geoscience observations:
          • a basic unit in data reporting;
          • a basic unit for data discovery, access, and analysis.
      • Access to information about the samples is essential for evaluation and interpretation of specimen-based data.
      • Access to physical specimens makes it possible to build more comprehensive datasets and facilitates re-use of resources.
      • No standard way to access information about specimens
      • Few online repository catalogues
      • Few disciplinary catalogues (e.g. Index of Marine & Lacustrine Geological Samples, IODP)
      • Incomplete specimen metadata in publications, if any.

  23. Why do we need identifiers for specimens? Locations of rock specimens in EarthChem called “M1”.
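The “M1” problem on slide 23 is a name collision: local specimen names are only unique within the collection that assigned them. The sketch below detects such collisions and shows how qualifying a local name with a namespace removes them. The namespaces `LAB_A` to `LAB_C`, the entries, and the `namespace:name` form are invented for illustration and are not IGSN syntax.

```python
from collections import defaultdict

# Hypothetical catalogue entries: (registering body, local specimen name).
# Neither the namespaces nor the names come from a real registry.
specimens = [
    ("LAB_A", "M1"),
    ("LAB_B", "M1"),
    ("LAB_B", "M2"),
    ("LAB_C", "M1"),
]


def find_collisions(entries):
    """Group holders by local name and keep the names used by more than one holder."""
    holders = defaultdict(set)
    for namespace, local_name in entries:
        holders[local_name].add(namespace)
    return {name: sorted(ns) for name, ns in holders.items() if len(ns) > 1}


def qualify(namespace, local_name):
    """Prefix the local name with its namespace so it is unique across holders."""
    return f"{namespace}:{local_name}"


collisions = find_collisions(specimens)
global_ids = [qualify(ns, name) for ns, name in specimens]

print(collisions)
print(global_ids)
```

A registry that hands out namespaces and checks uniqueness within each one is enough to make the combined identifier globally unique.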

  24. Globally Unique Identifiers
      • Without GUIDs for data, drill holes, or samples, verification of literature data required in-depth knowledge of the organisational structures of ocean drilling.
      • Data were available, but difficult to find.
      • The search involved PANGAEA and SEDIS (IODP).

  25. Literature, Data, Samples. Figure: literature, data, and samples linked to one another through search and identifiers (doi:10..., doi:10.1594/..., hdl: ..., IGSN).

  26. Why not use DOI for specimens?
      • DOI could be used for specimens.
      • Remember, a DOI is a digital identifier for objects, not an identifier for digital objects only.
      • Historically, TIB Hannover declined to register DOIs for specimens on formal grounds. This was prior to DataCite.
      • The use case of dealing with physical specimens called for a different set of rules, even though the structures are similar to those of DataCite.
      • Because IGSN is based on the Handle system, it can easily be merged with DataCite in the future.
