
The Hiberlink Project is supported by the Andrew W. Mellon Foundation

Hiberlink – Towards Time Travel for the Scholarly Web. Martin Klein (martinklein0815@gmail.com, @mart1nkle1n), Robert Sanderson (azaroth42@gmail.com, @azaroth42), Herbert Van de Sompel (hvdsomp@gmail.com)

Presentation Transcript


  1. The Hiberlink Project is supported by the Andrew W. Mellon Foundation
     Hiberlink – Towards Time Travel for the Scholarly Web
     • Martin Klein, martinklein0815@gmail.com, @mart1nkle1n
     • Robert Sanderson, azaroth42@gmail.com, @azaroth42
     • Herbert Van de Sompel, hvdsomp@gmail.com, @hvdsomp
     http://www.hiberlink.org/
     http://www.mementoweb.org/

  2. Hiberlink Project and Partners
     • LANL
       • Herbert Van de Sompel
       • Rob Sanderson
       • Martin Klein
     • EDINA
       • Peter Burnhill
       • Christine Rees
       • Muriel Mewissen
       • Tim Strickland
       • Neil Mayo
     • U. Edinburgh
       • Claire Grover
       • Beatrix Alex
       • Richard Tobin
       • Adam Zhou
     Two-year project funded by the Andrew W. Mellon Foundation

  3. Problem Statement
     • Preservation of formal scholarly output is (relatively) well understood.
     • Preservation of the resources that make up the context for that research is not:
       • Datasets
       • Software
       • Workflows
       • Videos, Slides
       • Project and Demonstration web sites
       • AJAX
       • …

  4. Pilot Study
     To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?
     • Participants: LANL, UNT, arXiv
     • Paper: http://arxiv.org/abs/1105.3459
     • Contributions:
       • Much larger scale than any previous study: 162,052 unique URLs
       • Automatically searched multiple archives for all URLs, rather than manually for a small subset

  5. Pilot Study: Results
     • UNT
       • 72% in archives and/or still exist
       • High proportion of archived URLs, possibly due to academic level and general disciplines
     • arXiv
       • 78% in archives and/or still exist
       • 45% still exist, but are not archived! Possibly due to high-value but very discipline-specific references
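The "in archives and/or still exist" figure on this slide is the share of URLs that are not lost outright. A minimal sketch of how such a tally might be computed follows; the category names, the `availability_summary` function, and the toy observations are all hypothetical and are not taken from the study.

```python
from collections import Counter


def availability_summary(observations):
    """Return the percentage of URLs per category.

    observations: iterable of (url, is_live, is_archived) tuples.
    Category names are illustrative, not the study's own terminology.
    """
    counts = Counter()
    for _url, is_live, is_archived in observations:
        if is_live and is_archived:
            counts["live_and_archived"] += 1
        elif is_live:
            counts["live_not_archived"] += 1
        elif is_archived:
            counts["archived_only"] += 1
        else:
            counts["lost"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Made-up toy observations, NOT data from the pilot study.
toy = [
    ("http://example.org/a", True, True),
    ("http://example.org/b", True, False),
    ("http://example.org/c", False, True),
    ("http://example.org/d", False, False),
]
summary = availability_summary(toy)

# "In archives and/or still exist" is everything that is not lost.
recoverable = 100.0 - summary.get("lost", 0.0)
```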

  6. Hiberlink: Quantify Full Extent of the Problem
     To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?
     • Redo the same experiment with…
       • Even larger dataset with millions of papers and URLs
       • Text mining processes for URL extraction
       • Track location of URL (citations, footnote, text, etc.)
       • Evaluation of extraction via gold standard dataset
       • Determine type of resource referenced
       • Track type of publication (journal, thesis, report, etc.)
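The extraction and evaluation steps listed on this slide can be sketched roughly as below. This is only an illustration under stated assumptions: the regular expression, the `extract_urls` and `precision_recall` helpers, and the section labels are hypothetical and do not describe the project's actual text mining pipeline.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace or a
# common closing delimiter. Real text mining would need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(sections):
    """Return (location, url) pairs from a dict of {location: text}."""
    found = []
    for location, text in sections.items():
        for match in URL_PATTERN.finditer(text):
            # Drop sentence punctuation that trails the URL.
            url = match.group(0).rstrip(".,;:")
            found.append((location, url))
    return found


def precision_recall(extracted, gold):
    """Compare extracted URLs against a gold standard set of URLs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy paper split into labelled locations (labels are assumptions).
paper = {
    "footnote": "See http://www.hiberlink.org/.",
    "citations": "Available at http://arxiv.org/abs/1105.3459, accessed 2013.",
}
pairs = extract_urls(paper)
scores = precision_recall(
    [url for _location, url in pairs],
    ["http://www.hiberlink.org/", "http://www.mementoweb.org/"],
)
```

Keeping the location label next to each URL is what would allow later analysis by where a reference appears (citation, footnote, or running text).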

  7. Hiberlink: Propose Solutions (1)
     We propose two active archiving solutions for resources referenced from scholarly papers, to ensure that the scholarly record remains unbroken
     • 1. Active Crawling:
       • Run extraction routines at repositories, publishers, or third parties via text mining agreements or open access publications
       • Feed the URL seed list to existing web crawlers, such as the Internet Archive
       • IA (and others) are already Memento compliant
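As a rough sketch of the active crawling idea, the code below deduplicates extracted URLs into a seed list and builds (without sending) a Memento TimeGate request for one of them. The `Accept-Datetime` header comes from the Memento protocol; the TimeGate base URI, the function names, and the example date are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumption: a TimeGate that accepts the original URL appended to its
# base URI. Substitute the TimeGate of whichever archive is used.
TIMEGATE_BASE = "http://timetravel.mementoweb.org/timegate/"


def build_seed_list(extracted_urls):
    """Deduplicate URLs while keeping first-seen order."""
    seen = set()
    seeds = []
    for url in extracted_urls:
        if url not in seen:
            seen.add(url)
            seeds.append(url)
    return seeds


def timegate_request(url, when, timegate_base=TIMEGATE_BASE):
    """Build a HEAD request asking for the capture closest to `when`.

    `when` must be a timezone-aware datetime in UTC.
    """
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return Request(timegate_base + url, headers=headers, method="HEAD")


seeds = build_seed_list([
    "http://www.hiberlink.org/",
    "http://www.mementoweb.org/",
    "http://www.hiberlink.org/",
])
# Hypothetical publication date of the citing paper.
published = datetime(2011, 3, 1, tzinfo=timezone.utc)
request = timegate_request(seeds[0], published)
```

The request is only constructed here. Sending it (for example with `urllib.request.urlopen`) and following the redirect to the selected capture is left out so the sketch stays free of network access.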

  8. Hiberlink: Propose Solutions (2)
     • 2. Transactional Archiving:
       • A willing server forks its responses for resources, sending each one both to the browser and to an archive for preservation
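One way to picture the forking of responses is a small WSGI middleware that passes each response through to the client while handing a copy to an archive callback. This is a minimal sketch under stated assumptions, not the project's design: `ArchivingMiddleware`, the `archive` callback signature, and the use of `PATH_INFO` as the resource identifier are all hypothetical.

```python
class ArchivingMiddleware:
    """Wrap a WSGI app and copy each complete response to an archive."""

    def __init__(self, app, archive):
        self.app = app
        # Hypothetical callback: archive(path, status, headers, body)
        self.archive = archive

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember status and headers, then pass them on unchanged.
            captured["status"] = status
            captured["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        chunks = []
        result = self.app(environ, capturing_start_response)
        try:
            for chunk in result:
                chunks.append(chunk)  # copy kept for the archive
                yield chunk           # original goes to the browser
        finally:
            if hasattr(result, "close"):
                result.close()
        # Reached only when the whole body was delivered to the client.
        self.archive(
            environ.get("PATH_INFO", ""),
            captured.get("status"),
            captured.get("headers"),
            b"".join(chunks),
        )


def demo_app(environ, start_response):
    """Tiny stand-in for the willing server's real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hel", b"lo"]


archived = []
wrapped = ArchivingMiddleware(demo_app, lambda *record: archived.append(record))
sent = []
body = b"".join(wrapped({"PATH_INFO": "/page"}, lambda s, h, e=None: sent.append(s)))
```

A real deployment would presumably write the copy to an archive asynchronously rather than buffering it in memory; the buffering here only keeps the example short.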

  9. Summary
     • 2011 pilot study showed:
       • Significant problem!
       • Random archiving by web crawlers is not enough
     • Hiberlink project will:
       • Fully quantify the extent to which web resources that form the context of scholarly output are available and archived
       • Propose active solutions to prevent the loss of further resources
       • Use Memento for both research and access

  10. The Hiberlink Project is supported by the Andrew W. Mellon Foundation
      Hiberlink – Towards Time Travel for the Scholarly Web
      • Martin Klein, martinklein0815@gmail.com, @mart1nkle1n
      • Robert Sanderson, azaroth42@gmail.com, @azaroth42
      • Herbert Van de Sompel, hvdsomp@gmail.com, @hvdsomp
      http://www.hiberlink.org/
      http://www.mementoweb.org/
