(Linked) Data Curation challenges

(Linked) Data Curation challenges. Kevin Ashley Director, Digital Curation Centre www.dcc.ac.uk Kevin.ashley@ed.ac.uk. Reusable with attribution: CC-BY. The DCC is supported by Jisc. Acknowledgements. John Wilkins & Cameron Neylon Ideas, images, slides, inspiration.

Presentation Transcript


  1. (Linked) Data Curation challenges. Kevin Ashley, Director, Digital Curation Centre. www.dcc.ac.uk, Kevin.ashley@ed.ac.uk. Reusable with attribution: CC-BY. The DCC is supported by Jisc.

  2. Acknowledgements
     • John Wilkins & Cameron Neylon
     • Ideas, images, slides, inspiration
     Kevin Ashley – CC-BY

  3. Data views and processes
     • Administration
     • Discovery
     • Work-level description
     • Discipline-level interpretation

  4. Administrative view: Data produced by the department of linguistics; data from projects funded by NERC

  5. Discovery view: Data about reproductive behaviour in freshwater fish

  6. Work-level description

  9. Data is variable
     • Not always textual
     • Not always tabular
     • Not always fixed
     • Not always clearly authored – think of archival provenance
     • Not always associated with publication

  10. 95% of research results are never published (image: http://www.flickr.com/photos/sethw/113073189/)

  11. If a million postdocs repeat a million experiments… (image: http://flickr.com/photos/heymans/480396810/)

  12. And 25% of those don’t work… (image: http://flickr.com/photos/cliche/120070310/)

  13. …how much taxpayers’ money is that? (image: http://flickr.com/photos/luismimunoznajar/2093185804/)

  14. I need that data now!!! I don’t care how messy it is – I can fix it! I’ve wasted too much of my life fixing other people’s bad data. I’m not interested until you’ve cleaned it up and documented it. Besides, I have other things to think about.

  15. Grandfather’s axe: When is my dataset a new dataset? (image: coconinoco@flickr.com, CC-BY-NC-SA)

  16. Authorship
     • Reference data – cell-level provenance versus single author data table
     • ‘Cleaned’ data – can pass through many hands
     • Synthesis…

  19. Potential wins
     • Provenance of machine-gathered data – linking observations to instrument descriptions
     • Linking data in multiple places
     • Data and publications and plans
     • Robust assertions about data versioning
     • Association of data with institutions

  20. networks of people…

  22. More wins
     • Assertions at table and variable group level
     • Linking that crosses disciplinary boundaries:
        • Biochemistry and neuroscience
        • Naval history, economics and climate science
     • Linking that crosses research and administrative boundaries

  23. IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases. (After John Wilbanks)

  24. SameAs: Tylenol, N-acetyl-p-aminophenol, Acetaminophen, Paracetamol, N-(4-hydroxyphenyl)ethanamide, N-(4-hydroxyphenyl)acetamide
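
One way to picture the SameAs links on this slide is as an equivalence relation over names. The sketch below is an illustration only: the six names come from the slide, but the way they are paired is an assumption (the transcript does not record which names the slide connects directly), and the function names are hypothetical. It uses a small union-find structure to collapse pairwise assertions into one group.

```python
# A minimal sketch: resolving pairwise "same as" assertions into one group.
# The names come from the slide; how they are paired here is an assumption.
SAME_AS_PAIRS = [
    ("Paracetamol", "Acetaminophen"),
    ("Paracetamol", "Tylenol"),
    ("Acetaminophen", "N-acetyl-p-aminophenol"),
    ("Paracetamol", "N-(4-hydroxyphenyl)acetamide"),
    ("N-(4-hydroxyphenyl)acetamide", "N-(4-hydroxyphenyl)ethanamide"),
]


def build_same_as_index(pairs):
    """Map every name to a representative of its equivalence group (union-find)."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    return {name: find(name) for name in list(parent)}


def same_thing(a, b, index):
    """True if a and b are linked by some chain of same-as assertions."""
    return index.get(a, a) == index.get(b, b)


index = build_same_as_index(SAME_AS_PAIRS)
print(same_thing("Tylenol", "N-(4-hydroxyphenyl)ethanamide", index))
print(same_thing("Tylenol", "Ibuprofen", index))
```

No pair mentions both Tylenol and the ethanamide name, yet the two resolve to the same group through a chain of assertions, so a search under one name can reach data filed under any of the others.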

  25. “I never had an idea that couldn’t be improved by sharing it with as many people as possible…” Bill Hooker (2006) http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html

  29. Challenge? Opportunity
     • Linked data can improve administration of research and research data
     • The real potential is in improving research quality and efficiency
     • The same actors can’t do both
     • The actions don’t need to be in lock-step
