
Web Archiving Challenges and Opportunities Presentation for Web archiving Engineering position



Presentation Transcript


  1. Web Archiving Challenges and Opportunities: Presentation for Web archiving Engineering position • Ahmed AlSum, PhD Candidate, Old Dominion University

  2. Outline • Engineer • What I did • Web Archive • What I know • What I did • What I can do for SUL

  3. CCSP Project • An internal IBM support portal that gives client-facing audiences a holistic, per-client view of client situations. • Technologies: The project is built on IBM technologies (WebSphere Portal, DB2) and is deployed on zLinux machines.

  4. Responsibilities: • Software Engineer. • Administrator on production and staging. • Customer support team lead. • Software engineer team leader.

  5. Developing enterprise applications with J2EE platform technologies for the front end (Servlets, JSP, Portlet APIs), with back-end tasks based on EJB. • Lotus Sametime developer for both plugin and bot development. • Developing front-end components based on Web 2.0 technologies (AJAX based on dojo 1.0, and JavaScript). • Developing and deploying portal solutions on WebSphere Portal. • WebSphere Portal administration for standalone and clustered environments. • Administration of Linux and Windows OS. • DB2 server administration for single-instance and multiple-instance setups with HADR support. • Leading the customer support activities. • Supporting some project quality activities. • Code review and static analysis activities.

  6. Certifications: • IBM Certified System Administrator, IBM WebSphere Portal V6.0. (May. 2008) • IBM Certified Solution Developer, XML and Related Technologies. (Since Mar. 2008) • IBM Certified Solution Developer, IBM WebSphere Portal V6.0. (Since Feb. 2008) • Sun Certified Web Component Developer for the Java 2 Platform, Enterprise Edition 1.4 (Since Jan. 2008). • Sun Certified Programmer for the Java 2 Platform, SE 5.0, (Since March 2007). • IBM Rational Software Certified, RAD 6.0 Associate Developer (Since Apr. 2006) • Microsoft Certified Professional in Designing and Implementing Desktop Applications with Microsoft® Visual C++® 6.0. (Since Sep. 2002)

  7. Memento • Memento is an extension of HTTP that lets the user browse the past web the same way as the current web. [Figure: timeline of a resource's states labeled T1, T2, T3, and Now] I. Jacobs and N. Walsh. Architecture of the World Wide Web. Technical report, W3C, 2004. http://www.w3.org/TR/webarch/.
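
The datetime negotiation idea behind Memento can be sketched in a few lines. This is a minimal illustration, not the MementoFox or mcurl implementation: the client states the desired time in an `Accept-Datetime` request header, and a TimeGate answers with the capture closest to that time. The archive URIs and the capture times standing in for T1, T2, T3 are hypothetical, and the selection is done locally rather than over the network.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the request header a Memento client sends to a TimeGate."""
    # usegmt=True requires an aware datetime whose tzinfo is UTC.
    utc_time = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_time, usegmt=True)}


def closest_memento(mementos: dict, when: datetime) -> str:
    """Return the archived URI whose capture time is nearest to `when`.

    This mimics, locally, the choice a TimeGate makes on the server side.
    """
    return min(mementos.items(), key=lambda item: abs(item[1] - when))[0]


# Hypothetical captures of one resource (T1, T2, T3 on the timeline).
MEMENTOS = {
    "http://archive.example/t1/http://example.com/": datetime(2005, 6, 1, tzinfo=timezone.utc),
    "http://archive.example/t2/http://example.com/": datetime(2008, 3, 15, tzinfo=timezone.utc),
    "http://archive.example/t3/http://example.com/": datetime(2011, 9, 30, tzinfo=timezone.utc),
}

wanted = datetime(2009, 1, 1, tzinfo=timezone.utc)
print(accept_datetime_header(wanted))
print(closest_memento(MEMENTOS, wanted))
```

A real client would send the header to a TimeGate URI and follow the redirect it returns; the local dictionary only stands in for what the archive knows.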

  8. Memento • Memento Aggregator • Developer and administrator of the Aggregator

  9. Memento • Memento Client • MementoFox: Firefox add-on • mcurl: command-line client in Perl • Both were implemented based on Memento internet draft 8.0.

  10. WAT Extraction • Web Archive Transformation (WAT) is a specification for structuring metadata generated by Web crawls. • Technologies: Hadoop, Pig Latin, Java.
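
To make the idea of structured crawl metadata concrete, here is a small sketch of pulling outlinks from a WAT-style JSON payload. The sample record is hand-written for illustration, and its key names (`Envelope`, `Payload-Metadata`, `HTTP-Response-Metadata`, `HTML-Metadata`, `Links`) are an assumption modeled on commonly published WAT examples, not taken from the slides. The project itself used Hadoop and Pig Latin; plain Python is used here only to show the shape of the data.

```python
import json

# Hand-written sample; the nesting and key names are assumptions.
SAMPLE_WAT_PAYLOAD = json.dumps({
    "Envelope": {
        "WARC-Header-Metadata": {"WARC-Target-URI": "http://example.com/"},
        "Payload-Metadata": {
            "HTTP-Response-Metadata": {
                "HTML-Metadata": {
                    "Links": [
                        {"path": "A@/href", "url": "http://example.com/about"},
                        {"path": "IMG@/src", "url": "http://example.com/logo.png"},
                    ]
                }
            }
        },
    }
})


def extract_links(wat_json: str) -> list:
    """Return every link URL recorded in one WAT-style JSON payload."""
    envelope = json.loads(wat_json).get("Envelope", {})
    html_meta = (
        envelope.get("Payload-Metadata", {})
        .get("HTTP-Response-Metadata", {})
        .get("HTML-Metadata", {})
    )
    # Records without HTML metadata simply yield an empty list.
    return [link["url"] for link in html_meta.get("Links", []) if "url" in link]


print(extract_links(SAMPLE_WAT_PAYLOAD))
```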

  11. Challenges and Opportunities Web Archiving

  12. Web Archive Life Cycle Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of 3rd International Conference on Web Science. pp. 1–8.

  13. Selection • Decide what to capture • We studied what is already captured.

  14. How Much of the Web is archived? • It depends: tell me what your URI source is! • S. G. Ainsworth, A. AlSum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. “How Much of the Web Is Archived?” In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11, Ottawa, Canada, 2011.

  15. Where is it archived?

  16. What is missing?

  17. Selection • Curator • Twitter crowdsourcing: • UK Web Archive: Twittervane. • Internet Memory: Collect URIs from Twitter APIs. • VA Tech: CTRnet project.

  18. Web Archive Life Cycle Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of 3rd International Conference on Web Science. pp. 1–8.

  19. Harvesting • Services • Archive-It • WAS @ CDLib • Dedicated server • Heritrix

  20. Harvesting • Challenges • Ajax and Web 2.0/3.0 • Streaming media • URI challenges (e.g., Twitter hash-bang) • Mobile
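
The hash-bang problem can be shown with the standard library alone. The part of a URI after `#` is a fragment, which the client never sends to the server, so a crawler that fetches two different hash-bang URIs issues the same HTTP request for both and relies on JavaScript to tell them apart. The account names below are hypothetical; this is a sketch of the problem, not of any crawler's workaround.

```python
from urllib.parse import urldefrag


def request_target(uri: str) -> str:
    """Return the part of the URI that is actually sent to the server."""
    return urldefrag(uri).url


def client_side_state(uri: str) -> str:
    """Return the fragment, which only client-side JavaScript ever sees."""
    return urldefrag(uri).fragment


# Two hypothetical hash-bang URIs for different pages.
first = "https://twitter.com/#!/user_one"
second = "https://twitter.com/#!/user_two"

print(request_target(first), client_side_state(first))
print(request_target(second), client_side_state(second))
```

Both URIs resolve to the same request target, so a crawler that does not execute JavaScript captures the same response twice and misses the content the user actually sees.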

  21. Harvesting • SiteStory - Transactional Archive • Justin F. Brunelle, Michael L. Nelson, Lyudmila Balakireva, Robert Sanderson, Herbert Van de Sompel. Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool. Proceedings of TPDL 2013.

  22. Web Archive Life Cycle Hockx-Yu, H., 2011. The Past Issue of the Web. In Proceedings of 3rd International Conference on Web Science. pp. 1–8.

  23. Storage • Flat files: • WARC files (ISO standard) • NoSQL database: • Internet Memory
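
As a rough illustration of the flat-file option, the sketch below builds and re-reads a single WARC-style record: a version line, named header fields, a blank line, the payload, and two trailing CRLFs. It is a simplified sketch assuming a `resource` record with only the commonly required fields; production tools such as Heritrix write far more metadata, and the function names here are hypothetical.

```python
import uuid
from datetime import datetime, timezone

CRLF = "\r\n"


def build_warc_record(target_uri: str, payload: bytes, content_type: str = "text/html") -> bytes:
    """Serialize one simplified WARC 'resource' record."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = CRLF.join(headers).encode("utf-8")
    # A blank line separates headers from the block; two CRLFs end the record.
    return head + b"\r\n\r\n" + payload + b"\r\n\r\n"


def parse_warc_headers(record: bytes) -> dict:
    """Read the version line and header fields back from one record."""
    head, _, _ = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split(CRLF)
    fields = dict(line.split(": ", 1) for line in lines[1:])
    fields["version"] = lines[0]
    return fields


record = build_warc_record("http://example.com/", b"<html>hi</html>")
print(parse_warc_headers(record))
```

Because records are simply concatenated, a WARC file can be appended to during a crawl and scanned sequentially later, which is one reason flat files remain a common storage choice.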

  24. Storage • The wrong solution could be a disaster

  25. Access
