The Web Archiving Service

The Web Archiving Service and the Web-at-Risk NDIIPP Project. Tracy Seneca, California Digital Library. National Digital Information Infrastructure and Preservation Program, Library of Congress. California Digital Library, New York University, University of North Texas.

Presentation Transcript


  1. The Web Archiving Service and the Web-at-Risk NDIIPP Project • Tracy Seneca, California Digital Library • National Digital Information Infrastructure and Preservation Program, Library of Congress • California Digital Library • New York University • University of North Texas

  2. Overview • Web archiving: what & why • Web-at-Risk grant: scope & purpose • Web Archiving Service Sample Screens

  3. Web archiving: what & why

  4. “Web Archiving”: Assumptions • Using automated methods to gather web content • Building some kind of collection composed of more than one site • Intent on preserving captured content • Results are searchable • Public access may not be available

  5. How is the material at risk? • Vulnerability of • Digital publications • Web publications • Government web publications • Local government web publications

  6. The Ephemeral Web

  7. Issues Unique to Government and Political Web Documents • Publication & notification streams • Elections, political change • Security vs. freedom of information • Local agencies often don’t have the resources to archive their own publications

  8. Web-at-Risk grant: scope & purpose

  9. Grant Scope (Jan 2005 – Jun 2009) • Build tools to allow librarians to capture, curate and preserve web-based government and political information. • Create topical and event-based archives • Capture individual sites and documents • Assess the impact of these tools on traditional collection development practices. • Explore web archiving service sustainability.

  10. Project Partners

  11. Web-at-Risk Collections

  12. Beyond the Grant • Support web archiving for the University of California • Enable collaboration across campuses • Enable collaboration between librarians and researchers/faculty

  13. Web Archiving Service (WAS) • Tangible outcome of grant work • Being developed and released over a series of pilot tests • Pilot test 5 underway until May 23 • 2008–2009: develop rights management and public access features

  14. WAS Production • Early summer 2008: the Web Archiving Service goes into ‘limited’ production. • Available 24/7 to the curators who have taken part in the pilot tests so far • Expand the user community within UC as CDL confirms that WAS infrastructure, user support and training are sufficient.

  15. Web Archiving Service: Workflow and Sample Screens

  16. WAS workflow: Project > Site > Capture > Collection • Set up a project (usually a topic or event) • Define the sites to capture • Run single or multiple captures of each site • Choose which results to add to a single, searchable collection
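
The Project > Site > Capture > Collection hierarchy can be pictured as nested records. The following is a minimal sketch under stated assumptions: the class and field names (`Project`, `Site`, `Capture`, `Collection`, `all_urls`) are hypothetical and do not describe WAS internals. It illustrates the design point of the workflow: captures belong to sites, while a collection holds only the captures a curator chooses to add.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical data model for illustration only; not the actual WAS schema.
@dataclass
class Capture:
    """One crawl of a site at a point in time."""
    site_url: str
    started: datetime
    urls: set[str] = field(default_factory=set)


@dataclass
class Site:
    """A seed URL a curator has chosen to capture, with its capture history."""
    url: str
    captures: list[Capture] = field(default_factory=list)


@dataclass
class Collection:
    """Curator-selected captures, searchable as one unit."""
    name: str
    captures: list[Capture] = field(default_factory=list)

    def all_urls(self) -> set[str]:
        # Union of every URL across the captures added to this collection.
        result: set[str] = set()
        for capture in self.captures:
            result |= capture.urls
        return result


@dataclass
class Project:
    """Top-level container, usually a topic or event."""
    name: str
    sites: list[Site] = field(default_factory=list)
    collections: list[Collection] = field(default_factory=list)


# One project, one site, two captures; only the second capture joins the collection.
project = Project("2008 election")
site = Site("http://example.gov/")
project.sites.append(site)
first = Capture(site.url, datetime(2008, 3, 1), {"http://example.gov/", "http://example.gov/a.pdf"})
second = Capture(site.url, datetime(2008, 4, 1), {"http://example.gov/", "http://example.gov/b.pdf"})
site.captures += [first, second]
collection = Collection("Election sites")
collection.captures.append(second)
project.collections.append(collection)
print(sorted(collection.all_urls()))
```

Keeping every capture on the site, and only references in the collection, lets a curator run many captures for quality checking without all of them becoming part of the searchable collection.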

  17. Capture sites individually

  18. Set Frequency
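
Setting a capture frequency amounts to generating a series of capture times for a site. This is a minimal sketch assuming hypothetical frequency names (`once`, `daily`, `weekly`) and a hypothetical `capture_schedule` helper; the options WAS actually offers may differ.

```python
from datetime import datetime, timedelta

# Hypothetical frequency options; real WAS choices may differ.
FREQUENCIES = {
    "once": None,
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def capture_schedule(start: datetime, frequency: str, end: datetime) -> list[datetime]:
    """Return the times a site would be captured between start and end (inclusive)."""
    step = FREQUENCIES[frequency]
    if step is None:
        # A one-time capture runs only at the start time.
        return [start]
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += step
    return times


schedule = capture_schedule(datetime(2008, 6, 1), "weekly", datetime(2008, 6, 30))
print([t.date().isoformat() for t in schedule])
```

For a weekly frequency over June 2008 this yields captures on the 1st, 8th, 15th, 22nd and 29th.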

  19. Add metadata (or not)

  20. Sites can be captured in batches

  21. When Capture Finishes

  22. Display Results (QA capture effectiveness)

  23. Display Results: Overview & Reports

  24. Display Results: Full Text Search

  25. Display Results

  26. Display Results (metadata)

  27. Create Collection

  28. Build Collection (add entire captures)

  29. Build Collection

  30. WAS features for analysis • It’s impossible to know what a web site ‘contains’ until after you capture it! • Tools for understanding where the data comes from and how it has changed.
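
One simple way to see where captured data comes from is to tally captured URLs by host, since a crawl of one site can pull in files served from other hosts. This is a hypothetical sketch: `urls_by_host` is an illustrative name, not a WAS report, and it uses only Python's standard library.

```python
from collections import Counter
from urllib.parse import urlsplit


def urls_by_host(urls: list[str]) -> Counter:
    """Count captured URLs per host name (hypothetical 'where did this come from' report)."""
    return Counter(urlsplit(u).hostname for u in urls)


captured = [
    "http://www.example.gov/index.html",
    "http://www.example.gov/reports/2008.pdf",
    "http://media.example.com/video.flv",
    "http://www.example.gov/about.html",
]
for host, count in urls_by_host(captured).most_common():
    print(host, count)
```

A report like this makes it obvious when a large share of a capture actually lives on a third-party server rather than on the site the curator targeted.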

  31. What’s the nature of this content?

  32. What new publications are in this capture?
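
Finding new publications in a capture can be approximated by comparing the URL lists of two captures of the same site. The sketch assumes each capture is reduced to a set of URLs and treats a few file extensions as "publications"; both the `new_documents` name and the extension list are illustrative assumptions, not how the WAS "Compare" screen works.

```python
def new_documents(previous: set[str], current: set[str],
                  extensions: tuple[str, ...] = (".pdf", ".doc", ".html")) -> list[str]:
    """URLs present in the current capture but not the previous one, filtered by file type."""
    added = current - previous  # set difference: only URLs first seen in this capture
    # str.endswith accepts a tuple, so any listed extension matches.
    return sorted(u for u in added if u.lower().endswith(extensions))


previous = {"http://example.gov/", "http://example.gov/report-jan.pdf"}
current = {
    "http://example.gov/",
    "http://example.gov/report-jan.pdf",
    "http://example.gov/report-feb.pdf",
    "http://example.gov/logo.gif",
}
print(new_documents(previous, current))
```

Here the new image is filtered out and only the new report is listed, which is the kind of candidate list a curator might then add to a collection.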

  33. Build Collection (select files from “Compare” screen)

  34. How volatile is this site? (Not yet available)
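
Because this feature was not yet available, what follows is only one plausible way to quantify volatility, not the planned WAS metric: record a content hash per URL for each capture, then report the share of URLs that were added, removed, or changed between two captures. The `digest` and `volatility` helpers are hypothetical, and SHA-1 is used here only as a content fingerprint.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint a captured file's bytes (hypothetical helper)."""
    return hashlib.sha1(content).hexdigest()


def volatility(previous: dict[str, str], current: dict[str, str]) -> float:
    """Share of URLs, across both captures, that were added, removed, or changed."""
    all_urls = previous.keys() | current.keys()
    if not all_urls:
        return 0.0
    # A missing URL yields None, so additions and removals count as changes too.
    changed = sum(1 for u in all_urls if previous.get(u) != current.get(u))
    return changed / len(all_urls)


before = {"/": digest(b"home v1"), "/a.html": digest(b"a"), "/b.html": digest(b"b")}
after = {"/": digest(b"home v2"), "/a.html": digest(b"a"), "/c.html": digest(b"c")}
print(volatility(before, after))
```

Of the four URLs seen across both captures, three differ (one edited, one removed, one added), giving a volatility of 0.75; a site scoring near 0 could be captured less often.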

  35. Potential • We can now capture the “chit chat” – the popular reaction to historic events, in ways never before possible. • How will researchers interact with captured content once it is in an archive? • Visualization • Text analysis • What is the potential, beyond simple search and display?

  36. Web Archive Visualization: Doantam Phan, Stanford University

  37. Questions? • Web-at-Risk Wiki: http://wiki.cdlib.org/WebAtRisk • YouTube video: “Web-at-Risk Collections” • tracy.seneca@ucop.edu
