
CS 502 Computing Methods for Digital Libraries. Cornell University – Computer Science. Herbert Van de Sompel, herbertv@cs.cornell.edu. Lecture 5: A research perspective on Digital Libraries.



  1. CS 502 Computing Methods for Digital Libraries • Cornell University – Computer Science • Herbert Van de Sompel, herbertv@cs.cornell.edu • Lecture 5: A research perspective on Digital Libraries

  2. DL Ancestry

  3. URLs to some of these DLs ADS: http://adswww.harvard.edu/ NCSTRL: http://www.ncstrl.org UCSTRI: http://www.cs.indiana.edu:800/cstr/cover.html arXiv: http://arXiv.org LTRS: http://techreports.larc.nasa.gov/ltrs/ NTRS: http://techreports.larc.nasa.gov/cgi-bin/NTRS

  4. DL Architectural Review Assumptions made in this perspective • things start with TCP/IP connectivity • distribute full content (reports, software, etc.) • not only metadata

  5. DL Architecture History: approach 1 • Build special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only • pros: rich functionality • cons: high development cost, client distribution problem • observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!

  6. DL Architecture History: approach 2 • Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc. • con: less functionality (restricted by protocol) • pros: less development cost, uses commonly available clients • observation: this approach is now the most common • The ones listed on slide 2 fit into this category

  7. Early TCP/IP DLs • a very old one: IETF: http://www.ietf.org/ • Internet RFC’s • Very first TCP/IP DL?

  8. Early TCP/IP DLs • Netlib • http://www.netlib.org/ • begun in 1985, distributing mathematical software via e-mail (SMTP) • other access methods and protocols added (ftp, X11 client, http)

  9. Netlib 1995

  10. Netlib 2001

  11. Los Alamos arXiv • Physics pre-print server • http://xxx.lanl.gov/ == http://arXiv.org • begun in 1991 as an e-mail service to exchange TeX source of pre-prints in high energy physics • ftp, http access added shortly • Now THE communication channel in Physics • Paul Ginsparg

  12. Characteristics of early TCP/IP, non-HTTP DLs • Useful • could get the “thing” that you were looking for • Constrained by transport protocol • SMTP, FTP, etc. interface inherently “clunky” • Higher level services such as searching, sophisticated browsing, etc. difficult to implement • Small scale • would the same systems work well if the holdings went from 100’s or 1000’s to millions?

  13. Characteristics of early TCP/IP, HTTP DLs • Initial HTTP implementations / conversions pretty much provided incremental steps in DL improvement • a “nice” ftp interface, maybe with better searching and browsing • but the nature of the DLs changed little • LTRS is an example of an http DL that is really: FTP + Searching (WAIS) + Browsing • http://techreports.larc.nasa.gov/ltrs/ • Also check out the user interface of http://arXiv.org

  14. Early TCP/IP, HTTP DLs • But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it • Combine this with the expressive HTTP client (web browser), and there is a lot of potential • Dienst • (http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html) • builds an actual DL protocol on top of HTTP • 1994 -- the first to do so? • Open Archives Initiative: metadata harvesting protocol on top of HTTP
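As a rough illustration of what "a protocol on top of HTTP" means in practice, the sketch below builds a metadata harvesting request as an ordinary HTTP GET URL. It uses the `verb` and `metadataPrefix` request parameters of the later OAI-PMH 2.0 specification, which may differ from the protocol version discussed in this lecture; the base URL and the helper name `oai_request_url` are made up for the example.

```python
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Encode a harvesting request as a plain HTTP GET URL.

    The protocol-level operation (the verb) and its arguments travel
    as query parameters; HTTP itself only acts as the transport.
    """
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository address, used only for illustration.
url = oai_request_url(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

Any standard web client can issue such a request, which is the appeal of layering a DL protocol on HTTP rather than building a special client.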

  15. Sophistication increases, tracks meet [Figure: sophistication plotted against time for a library automation track and a research track; stages on the research track are labeled e-mail; ftp / gopher; http (LTRS, e-print, Netlib, etc.); http (Dienst)]

  16. A Framework for Distributed Digital Object Services Kahn/Wilensky Framework [Kahn 1995] • 1995 • A high level document • Almost a definition of key concepts, terminologies, … for next generation DLs • Foundation for a research discipline? • Not detailed enough to be a real architecture. • Architecture is independent of the type of data stored in the DL

  17. KWF: key terms • digital object (do) • A do is a data structure that contains • Digital data; data is typed (cf MIME) • Persistent Key Metadata; especially handle • Other metadata (for instance Terms and Conditions) • handle • a handle is a unique, persistent name for a do • repository • The place where do’s live • Has unique global name • Repository Access Protocol (RAP) • To deposit/access do’s in repositories

  18. KWF: flow • Originator makes a digital object, which consists of data and key-metadata (the handle) • The handle comes from a handle generator • The do can go in a repository, at which point it becomes a stored do • The repository keeps a properties record per do (key metadata: handle; other metadata: terms and conditions) and a transaction record per do • The repository registers the do’s handle with a handle server, at which point the do becomes a registered do • A client accesses/deposits the do in repositories by means of the Repository Access Protocol • What the client receives as a result of an access to a do is a dissemination

  19. Digital objects • do = data + key-metadata • data is typed; core types include: • bit-sequence / set-of-bit-sequences • digital-object / set-of-digital-objects • handle / set-of-handles • other types can be defined, and registered with a global type registry • definition and registration left undefined • ~ similar to MIME • key-metadata includes handle • possibly other metadata (left undefined in KWF)

  20. Digital objects • Composite do’s: • a do with data of type digital-object • non-composite do’s are elemental do’s • composite do’s can, for instance, be used to collect similar works together • the composite do then contains a do for each work of Shakespeare...
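The digital object model on the last two slides can be sketched as a small data structure. This is only an illustration under stated assumptions: KWF does not prescribe a representation, so the class name, field names, and the `hdl:example/...` handles are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# All names here are illustrative; KWF leaves the representation undefined.


@dataclass
class DigitalObject:
    handle: str      # key-metadata: the unique, persistent name
    data_type: str   # typed data, e.g. "bit-sequence" or "set-of-digital-objects"
    data: Any

    def is_composite(self) -> bool:
        # A composite do carries data of type digital-object (or a set of them);
        # every other do is elemental.
        return self.data_type in ("digital-object", "set-of-digital-objects")


# Two elemental do's and one composite do that collects them.
hamlet = DigitalObject("hdl:example/hamlet", "bit-sequence", b"To be, or not to be")
macbeth = DigitalObject("hdl:example/macbeth", "bit-sequence", b"When shall we three")
works = DigitalObject(
    "hdl:example/shakespeare", "set-of-digital-objects", [hamlet, macbeth]
)
```

Keeping the type as a plain string mirrors the slide's MIME comparison; a real system would check it against the global type registry that KWF mentions but does not define.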

  21. Changing digital objects • Mutable do’s can be changed once placed in a repository • key-metadata cannot be changed • the do’s handle never changes! • Immutable do’s cannot be changed once placed in a repository • however, they can be deleted
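One way to picture these rules is a wrapper that fixes the handle at creation time and only lets mutable objects replace their data. The class and exception names below are invented for this sketch and are not part of KWF.

```python
class ImmutableObjectError(Exception):
    """Raised when a change is attempted on an immutable do (hypothetical name)."""


class StoredObject:
    def __init__(self, handle: str, data: bytes, mutable: bool):
        self._handle = handle    # key-metadata: fixed for the life of the do
        self._data = data
        self._mutable = mutable

    @property
    def handle(self) -> str:
        # Read-only: no setter is defined, so the handle cannot be reassigned.
        return self._handle

    @property
    def data(self) -> bytes:
        return self._data

    def replace_data(self, new_data: bytes) -> None:
        # Only mutable do's may change after being placed in a repository.
        if not self._mutable:
            raise ImmutableObjectError(self._handle)
        self._data = new_data


draft = StoredObject("hdl:example/draft", b"v1", mutable=True)
draft.replace_data(b"v2")          # allowed: the do is mutable
final = StoredObject("hdl:example/final", b"v1", mutable=False)
```

Deletion is deliberately left out: per the slide, it is a repository-level operation that remains possible even for immutable do's.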

  22. Handles • Guest lecture by Professor Arms 02/19

  23. Repositories • A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval • A stored do is a do that resides in a repository • A registered do is a do that the repository has registered with a handle server • storing and registering can be the same or different processes

  24. Repositories • A repository keeps a properties record for each do • contains key-metadata and any other metadata the repository chooses to keep • A do may have a transaction record associated with it in a repository
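The distinction between a stored do and a registered do, together with the per-object records, might look like the following in-memory sketch. Everything here (class names, dictionary layout, the `"deposit"` and `"register"` event labels) is an assumption made for illustration; KWF specifies none of it.

```python
from dataclasses import dataclass, field


@dataclass
class HandleServer:
    registry: dict = field(default_factory=dict)   # handle -> repository name

    def register(self, handle: str, repository_name: str) -> None:
        self.registry[handle] = repository_name


@dataclass
class Repository:
    name: str                                         # unique global name
    objects: dict = field(default_factory=dict)       # handle -> data
    properties: dict = field(default_factory=dict)    # handle -> properties record
    transactions: dict = field(default_factory=dict)  # handle -> list of events

    def store(self, handle: str, data: bytes, **other_metadata) -> None:
        # After this call the do is a *stored* do.
        self.objects[handle] = data
        self.properties[handle] = {"handle": handle, **other_metadata}
        self.transactions[handle] = ["deposit"]

    def register(self, handle: str, server: HandleServer) -> None:
        # After this call the do is also a *registered* do.
        if handle not in self.objects:
            raise KeyError(handle)
        server.register(handle, self.name)
        self.transactions[handle].append("register")


server = HandleServer()
repo = Repository("repo.example")
repo.store("hdl:example/tr-1", b"report", terms="public read")
stored_only = "hdl:example/tr-1" not in server.registry   # stored, not yet registered
repo.register("hdl:example/tr-1", server)
```

Storing and registering are separate methods here to show that they can be different processes; a repository could equally call `register` from inside `store`.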

  25. Repository Access Protocol • “Protocol” may be misleading; it’s really just the concept for a protocol • RAP is designed to be simple; higher level services should come from other protocols • KWF defines 3 basic operation classes: • ACCESS_DO [metadata; key-metadata, digital object] • A dissemination of a do is the result of a request to access a do • DEPOSIT_DO [metadata; key-metadata, digital object] • ACCESS_REF • this is a means to tell the world about other ways (protocols) to access do’s in the repository.
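Since RAP is a concept rather than a wire protocol, the three operation classes can only be sketched as an interface. The method signatures, the dictionary used for a dissemination, and the example access methods below are assumptions for this sketch, not definitions from KWF.

```python
class RapRepository:
    """Toy repository exposing the three RAP operation classes (illustrative only)."""

    def __init__(self, other_access_methods):
        self._store = {}                         # handle -> (metadata, data)
        self._refs = list(other_access_methods)  # other protocols for reaching do's

    def deposit_do(self, handle, metadata, data):
        # DEPOSIT_DO: place a do, with its metadata, in the repository.
        self._store[handle] = (dict(metadata), data)

    def access_do(self, handle):
        # ACCESS_DO: the result of an access request is a dissemination.
        metadata, data = self._store[handle]
        return {"handle": handle, "metadata": dict(metadata), "data": data}

    def access_ref(self):
        # ACCESS_REF: tell the caller about other ways to access do's here.
        return list(self._refs)


repo = RapRepository(other_access_methods=["http", "ftp"])
repo.deposit_do("hdl:example/tr-1", {"title": "A report"}, b"contents")
dissemination = repo.access_do("hdl:example/tr-1")
```

The interface is kept to three methods on purpose, reflecting the slide's point that higher level services such as searching belong in other protocols.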

  26. Terms and Conditions • TC are attached to: • each do • each dissemination • each repository • TC are a precondition for any operation on the above • Repositories responsible for enforcing TC

  27. Terms and Conditions [Figure: Figure 1 from 95 TR-1593, relating repository, digital object, dissemination, terms and conditions, and data, with 1:1 and 1:N cardinalities]

  28. Digital Objects: Terms and Conditions • Set by originator and/or repository • Can be arbitrarily complex, but generally consist of: • permissions: read, write, etc. • authentication - person, group, etc. • payment • 3rd party intervention (possibly in support of the above)
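Treating terms and conditions as a precondition for an operation can be sketched as a check the repository runs before returning any data. The fields below cover only permissions and payment from the list above; the class, function, and principal names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermsAndConditions:
    permissions: dict = field(default_factory=dict)  # principal -> set of operations
    payment_required: bool = False

    def allows(self, principal: str, operation: str, paid: bool = False) -> bool:
        # Payment, when required, is checked before the permission itself.
        if self.payment_required and not paid:
            return False
        return operation in self.permissions.get(principal, set())


def access(tc: TermsAndConditions, principal: str, data: bytes, paid: bool = False) -> bytes:
    # The repository enforces the terms and conditions before any operation.
    if not tc.allows(principal, "read", paid):
        raise PermissionError(f"{principal} may not read this object")
    return data


tc = TermsAndConditions(permissions={"alice": {"read", "write"}, "bob": {"read"}})
paid_tc = TermsAndConditions(permissions={"bob": {"read"}}, payment_required=True)
```

Authentication and third-party intervention are omitted here; in this sketch the principal name is simply trusted, which a real repository could not do.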

  29. Readings • Kahn, R. & Wilensky, R. 1995. A Framework for Distributed Digital Object Services • http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html • Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine. http://www.dlib.org/dlib/July95/07arms.html • Marc VanHeyningen. 1994. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources. http://www.cs.indiana.edu/ucstri/paper/paper.html
