
Memento (http://mementoweb.org/) by Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson

Memento (http://mementoweb.org/): Giant Leaps Towards Seamless Navigation of the Web of the Past. Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson. Topics: Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding.





Presentation Transcript


  1. Memento http://mementoweb.org/ Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson. Giant Leaps Towards Seamless Navigation of the Web of the Past

  2. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  3. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  4. Memento wants to make it easy to access the Web of the Past.

  5. Tate Online today • Select date: March 16 2008 • Tate Online as of March 16 2008, from National Archives

  6. Tate Online today • Select date: March 16 2008 • Tate Online as of March 16 2008, from National Archives • Labels: Dynamic, Static

  7. Memento achieves this by introducing a uniform version access capability to integrate the present and past Web.

  8. Content Management Systems: • Designed to be aware of all versions of a resource • Self-contained • Variety of proprietary version mechanisms • Versions interlinked using proprietary mechanisms • Dynamism is managed

  9. World Wide Web: • Designed to forget about prior versions of a resource • Distributed • Dynamism from a management perspective is ignored

  10. There are resource versions on the Web: • Content management systems • Web archives • Transactional archives • Search engine caches

  11. But the Web architecture has a hard time dealing with them: • Cannot talk about a resource as it used to exist • Cannot access a prior version knowing the current one • Cannot access the current version knowing a prior one • Current approaches are ad hoc and localized

  12. Memento: • Regards the Web as a big Content Management System • Introduces a uniform capability to access versions on the Web • Does not build new archives but leverages all systems that host versions: Web archives, Content Management Systems, Software Version Systems, etc.

  13. Memento’s version access approach: • Is distributed: versions may exist on several servers • Uses time as a global version indicator • Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link
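
One way to picture "time as a global version indicator" combined with content negotiation is a request that names the desired datetime in a header. The sketch below only builds such a header and sends nothing; it assumes the `Accept-Datetime` header name from the Memento specification, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Build request headers asking for a resource as it existed at `when`."""
    # HTTP dates are expressed in GMT, so convert first.
    # A naive datetime would be interpreted as local time by astimezone().
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = accept_datetime_headers(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers)  # {'Accept-Datetime': 'Sun, 16 Mar 2008 00:00:00 GMT'}
```

A client would send these headers to a TimeGate for the Original Resource and follow the response to a Memento close to that datetime.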

  14. Original Resource and Versions

  15. Bridge from Present to Past

  16. Bridge from Past to Present

  17. Memento Framework

  18. Multiple Archives

  19. Memento Client-Server Interaction

  20. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  21. Significant progress has been made towards seamless navigation of the Web of the Past.

  22. Standardization • Standardization process started via the IETF • Interest from IETF and W3C • Encouraged by major Web architects, including: Tim Berners-Lee, Mark Nottingham, Michael Hausenblas https://datatracker.ietf.org/doc/draft-vandesompel-memento/

  23. Memento Clients • Several client tools developed by us and others • Add-ons for Firefox (operational) and Internet Explorer (experimental) • Applications for Android (operational) and iPhone/iPad (in development) • Paper in next issue of Code4Lib Journal http://www.mementoweb.org/tools/

  24. Memento Server Support (1) • Memento-compliant Wayback software: • Used by Internet Archive • Available to Web archives, worldwide • Please have your favorite Web Archive install this new version 1.6! http://www.mementoweb.org/tools/

  25. Memento Server Support (2) • Plug-in for MediaWiki (operational) • Used on W3C’s main wiki • Please install it for your MediaWiki! http://www.mementoweb.org/tools/

  26. Memento Server Validator • Server side client: • Attempts to perform all Memento actions against a given URI • Reports success/failure of the interactions and warnings for optional aspects • Kept up to date with IETF Internet Draft http://www.mementoweb.org/tools/
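
As a rough illustration of the kind of check a validator might report, the sketch below inspects the response headers of a purported Memento. It is not the validator described on the slide: the function name and sample URIs are hypothetical, the `Link` parsing is deliberately simplistic, and only the `Memento-Datetime` header and the `original` and `timegate` link relations from the Memento specification are assumed.

```python
import re


def check_memento_headers(headers: dict) -> dict:
    """Report which Memento-related response headers are present."""
    lowered = {k.lower(): v for k, v in headers.items()}
    link = lowered.get("link", "")
    # Collect every relation type found in the Link header,
    # including space-separated ones such as rel="original timegate".
    rels = set()
    for value in re.findall(r'rel="([^"]*)"', link):
        rels.update(value.split())
    return {
        "memento-datetime": "memento-datetime" in lowered,
        "link rel=original": "original" in rels,
        "link rel=timegate": "timegate" in rels,
    }


sample = {
    "Memento-Datetime": "Sun, 16 Mar 2008 00:00:00 GMT",
    "Link": '<http://example.org/>; rel="original", '
            '<http://archive.example.org/timegate/http://example.org/>; rel="timegate"',
}
report = check_memento_headers(sample)
print(report)
```

A real validator would also perform the HTTP interactions themselves and distinguish failures from warnings for optional aspects.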

  27. Memento Proxy Support • Several systems that host Mementos made Memento-compliant “by proxy”: • All major Web Archives that do not yet run Memento-compliant Wayback software • 3,000+ MediaWiki systems, including Wikipedia • We want all of these to become natively Memento compliant!

  28. Memento Website • Ongoing effort to add materials that support understanding and adoption: • Introduction to Memento • How to recognize Mementos, TimeGates, Original Resources? • Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.) http://www.mementoweb.org/guide/

  29. Funding • 2007-2010: US $250K grant from Library of Congress • Approx. 50K on Memento • 2010-2011: US $1 Million follow-up grant from Library of Congress • For: Specification, outreach, tool development, further research

  30. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  31. Memento Time Travel is really powerful. Time-Series Data via HTTP follow-your-nose.

  32. Memento Framework

  33. Time Series for Humans Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png

  34. Time Travel across versions of a Picture of the Day • Data collected through HTTP navigation

  35. Thanks Christine! Reproducibility • [Figure: data and process both change over time] • But if we had static, discoverable snapshots of the data and the process…

  36. Time Series for Machines Original Resource: http://dbpedia.org/resource/France

  37. Time Travel across versions of DBpedia • Data collected through HTTP navigation • Paper at http://arxiv.org/abs/1003.3661

  38. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  39. Very few Web sites provide a “timegate” link. Need additional mechanisms to support Discovery.

  40. Batch discovery of Mementos: TimeMaps • A TimeMap minimally lists: • URI and datetime of Mementos known to an archive • URI of Original Resource • TimeMaps can be aggregated across systems that host Mementos
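
The minimal TimeMap contents listed above, and the idea of aggregating TimeMaps across systems, can be sketched as a small in-memory data structure. The class names, function names, and example URIs are hypothetical; a real TimeMap is a serialized document, which this sketch does not parse.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str
    captured: datetime


@dataclass
class TimeMap:
    original: str   # URI of the Original Resource
    mementos: list  # Mementos known to one archive


def aggregate(timemaps):
    """Merge TimeMaps for the same Original Resource, sorted by datetime."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("all TimeMaps must describe the same Original Resource")
    merged = {m for tm in timemaps for m in tm.mementos}  # drops exact duplicates
    return TimeMap(originals.pop(), sorted(merged, key=lambda m: m.captured))


def closest(timemap, when):
    """Return the Memento whose datetime is nearest to `when`."""
    return min(timemap.mementos, key=lambda m: abs(m.captured - when))


utc = timezone.utc
a = TimeMap("http://example.org/", [
    Memento("http://archive-a.example/2007/", datetime(2007, 5, 1, tzinfo=utc)),
    Memento("http://archive-a.example/2009/", datetime(2009, 1, 10, tzinfo=utc)),
])
b = TimeMap("http://example.org/", [
    Memento("http://archive-b.example/2008/", datetime(2008, 3, 1, tzinfo=utc)),
])
combined = aggregate([a, b])
best = closest(combined, datetime(2008, 3, 16, tzinfo=utc))
print(best.uri)  # http://archive-b.example/2008/
```

Using the datetime as the only sort and selection key reflects the framework's use of time as the version indicator across archives.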

  41. Batch discovery of Mementos: Feed of TimeMaps • A system that hosts Mementos exposes a feed (e.g. Atom) of TimeMaps so that applications can remain in sync with its evolving Memento collection: • One Atom entry per Original Resource for which the system hosts Mementos • The entry provides a “timemap” link to a TimeMap for the Original Resource • The datetime value of the entry's updated field changes when an additional Memento for the Original Resource becomes available (i.e. the TimeMap changes) • The ID of the entry is a tag URI based on the URI of the Original Resource • Will be proposed to IIPC
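
A feed entry of the kind described above might be assembled as follows. This is a sketch under assumptions: the slide does not say how the tag URI is derived from the Original Resource URI, so the `tag:<authority>,<date>:<original URI>` layout, the function name, and all example values are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri, timemap_uri, updated, tag_authority, tag_date):
    """Build one Atom entry describing the TimeMap of an Original Resource."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    # Hypothetical tag URI layout: tag:<authority>,<date>:<original URI>
    tag_uri = f"tag:{tag_authority},{tag_date}:{original_uri}"
    ET.SubElement(entry, f"{{{ATOM}}}id").text = tag_uri
    ET.SubElement(entry, f"{{{ATOM}}}title").text = f"TimeMap for {original_uri}"
    # `updated` should change whenever a new Memento is added to the TimeMap.
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="timemap", href=timemap_uri)
    return entry


entry = timemap_entry(
    original_uri="http://example.org/",
    timemap_uri="http://archive.example.org/timemap/http://example.org/",
    updated="2010-11-01T12:00:00Z",
    tag_authority="archive.example.org",
    tag_date="2010",
)
print(ET.tostring(entry, encoding="unicode"))
```

An application staying in sync would compare each entry's updated value with the one it last saw and re-fetch only the TimeMaps that changed.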

  42. Batch discovery of Mementos: robots.txt • The robots.txt file is used by Web servers to convey crawling policies • Add a directive to support discovery of Mementos known to the server: • A pointer to a single Memento can suffice, as the robot can crawl on from there • Mementos allow for discovery of TimeMaps via HTTP links • For example, jcdl.org hosts snapshot archives of prior JCDL conferences and adds the following to its robots.txt: Memento: jcdl.org/archive/2002/index.html • Will be promoted via an Internet Draft
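
A crawler could pick up the proposed directive with a few lines of parsing. The sketch below assumes the `Memento:` line follows the usual `field: value` shape of robots.txt, as in the jcdl.org example; the function name and the other sample lines are hypothetical.

```python
def memento_pointers(robots_txt: str) -> list:
    """Return the values of all `Memento:` lines in a robots.txt body."""
    pointers = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split the field name from its value.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "memento" and value.strip():
            pointers.append(value.strip())
    return pointers


sample = """User-agent: *
Disallow: /private/
Memento: jcdl.org/archive/2002/index.html
"""
pointers = memento_pointers(sample)
print(pointers)  # ['jcdl.org/archive/2002/index.html']
```

From a single pointer like this, a robot can fetch the Memento and follow its HTTP links to the corresponding TimeMap.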

  43. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  44. Memento can recreate pages using resources from different archives. This poses a branding challenge.

  45. Current Branding Practice for Web Archives • Page and embedded resources from same Web Archive • Branding for page and embedded resources

  46. Branding for Web Archives in Memento Mode • Page and embedded resources from various Web Archives • Figure labels: Page branding, No branding, No branding • Will be researched

  47. Overview of Memento Framework • Deployment Progress • Memento and Data • Memento and Discovery • Memento and Branding • Alternative Web Archiving Strategies

  48. Crawl-based Archives host distinct observations. Transactional Archives never miss an update.

  49. Crawl-Based Web Archives: Observations • For example: Heritrix crawler for Internet Archive

  50. Server-Side Transactional Web Archives: Change History • For example: TTApache, PageVault, Vignette Web Capture
