The Digital Library: Current Technologies and Challenges William H. Mischo w-mischo@uiuc.edu Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign SLA Global 2000 October 18, 2000
Outline • Definition of Digital Library. • Elements of a Digital Library. • Full-Text Document Technologies. • Illinois Testbed. • XML: its Role and Importance. • Distributed Repository model. • Role of Libraries and Librarians.
The Digital Library • ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time. • Implementation issues. • Digital Collections vs. Digital Library. • Must Emphasize Integration of Collections and Services.
Elements of DL • Collections. • Services. • Technologies and Standards. • Integration of All.
Full-Text Technologies • Continuum of Web-Enabled Technologies. • Evolving Technologies and Standards. • All Presently being Utilized. • Role and History of Markup. • XML: its Role and Importance. • The Smart Document.
Illinois DLI-I Project • Funded under DLI-I by NSF, DARPA, and NASA, 1994--1998. Awards made to 6 universities. • Large-Scale Testbed, Distributed Repository Models, Evaluation, Web Software. • CNRI D-Lib Test Suite Program, 1998--2001. • Collaborating Partners Program. AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier.
Illinois Testbed • American Institute of Physics--APL, JAP, RSI • 16,000+ articles, 1995--. • American Physical Society--PRL • 10,000+ articles, 1995--, weekly updates. • ASCE Journals (25 titles) • 9,000+ articles, 1995--. • IEE Proceedings and Electronics Letters • 8,500+ articles, 1993--. • ASM International (formerly American Society for Metals) Handbook. • ACM (Association for Computing Machinery). • Elsevier Science.
Project Issues • Evolution of the Document. • Information Environment. • Use of Metalanguages & Transformations (SGML, XML). • Searching over Full-Text of Journals vs. Abstract & Index Service Database. • Rendering and Styling (SGML, XML, MathML). • Dynamic Metadata for Normalization, Linking. • Breadth and Depth of Collections. • User Needs.
Accomplishments • Process & Retrieve from Multiple Publishers & Heterogeneous DTDs. • Cross-Repository Searching. • SGML to XML Conversion. • Metadata Extraction, Representation, Merging. • Transformation & Rendering Technologies. • Dynamic Linking: Forward/Backward, from/to A & I Services.
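Processing articles from publishers with heterogeneous DTDs comes down to mapping each DTD's elements onto one common metadata record and then merging records. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree`; the publisher names (`pubA`, `pubB`), their element paths, and the sample documents are all hypothetical and are not the testbed's actual DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from each publisher's DTD element paths to one common
# set of metadata fields. Real DTDs would need their own, much larger maps.
FIELD_MAP = {
    "pubA": {"title": "front/title", "authors": "front/authgrp/author", "doi": "front/doi"},
    "pubB": {"title": "header/art-title", "authors": "header/creators/name", "doi": "header/ident"},
}

def extract_metadata(xml_text, publisher):
    """Normalize one publisher's article header into a common record."""
    root = ET.fromstring(xml_text)
    record = {"publisher": publisher}
    for field, path in FIELD_MAP[publisher].items():
        values = [el.text.strip() for el in root.findall(path) if el.text]
        # Keep every author; the other fields are single-valued.
        record[field] = values if field == "authors" else (values[0] if values else None)
    return record

def merge_by_doi(records):
    """Merge normalized records into one index keyed on DOI (later values win)."""
    index = {}
    for rec in records:
        index.setdefault(rec["doi"], {}).update(rec)
    return index

# Two hypothetical article headers marked up under different DTDs.
doc_a = ("<article><front><title>Thin films</title>"
         "<authgrp><author>Smith, J.</author><author>Lee, K.</author></authgrp>"
         "<doi>10.1063/S000369519903216</doi></front></article>")
doc_b = ("<paper><header><art-title>Beam design</art-title>"
         "<creators><name>Jones, A.</name></creators>"
         "<ident>10.1234/4356</ident></header></paper>")

index = merge_by_doi([extract_metadata(doc_a, "pubA"), extract_metadata(doc_b, "pubB")])
```

Keying the merged index on DOI is one plausible choice; any stable identifier shared across sources would serve the same purpose.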
Ongoing Investigations • Support simultaneous searching of A & I Services, Distributed Repositories, enhanced navigation, expanded gateway functions. • Metadata Harvesting: Replicative or Distributed Approaches. • Z39.50 protocols, HTTP Harvesting, Spider Technology. • Archiving of Electronic Resources. • Local Resolution of Resources.
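Simultaneous searching of A & I services and distributed repositories can be pictured as firing the same query at every target in parallel and merging the hit lists. This is a minimal sketch using Python's `concurrent.futures`; both search functions are hypothetical stand-ins returning canned results, whereas a real gateway would speak Z39.50 or HTTP to each target.

```python
from concurrent.futures import ThreadPoolExecutor

def search_ai_service(query):
    # Hypothetical stand-in for an A & I service query (e.g. over Z39.50).
    return [{"source": "A&I", "title": f"{query} review",
             "doi": "10.1063/S000369519903216"}]

def search_repository(query):
    # Hypothetical stand-in for a publisher full-text repository query over HTTP.
    return [{"source": "Repository", "title": f"{query} in practice",
             "doi": "10.1063/S000369519903216"},
            {"source": "Repository", "title": f"{query} methods",
             "doi": "10.1234/4356"}]

def federated_search(query, targets):
    """Query every target concurrently, then merge hits, de-duplicating on DOI."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        # pool.map returns results in the same order as `targets`.
        result_lists = list(pool.map(lambda search: search(query), targets))
    merged, seen = [], set()
    for hits in result_lists:
        for hit in hits:
            if hit["doi"] not in seen:
                seen.add(hit["doi"])
                merged.append(hit)
    return merged

hits = federated_search("superconductivity", [search_ai_service, search_repository])
```

Because the first target to report a DOI wins, target order acts as a simple preference ranking in this sketch.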
XML (eXtensible Markup Language) • Subset of SGML, a Data Description Language (Metalanguage). • Allows fine-granularity markup of content and structure. Authors can create their own elements (extensible). • Tags define the Structure of the Document, not its Presentation Format. • Two levels of correctness: well-formed (correct document structure, no DTD required) and valid (well-formed and conforming to a DTD). • Displays natively only in IE 5.0 and Netscape 6.0. • Powers B2B data exchange, compatible with Relational DBs.
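The well-formed/valid distinction can be shown with a parser that only checks well-formedness. This sketch uses Python's standard `xml.etree.ElementTree`, which rejects broken structure but does not validate against a DTD; the element names in the samples are arbitrary, illustrating that authors may invent their own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML.

    This checks well-formedness only (matched tags, a single root element);
    ElementTree does not validate the document against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = "<article><title>Optics</title><author>Smith, J.</author></article>"
bad_nesting = "<article><title>Optics</article>"      # <title> never closed
two_roots = "<title>Optics</title><title>Lasers</title>"  # no single root element
```

Checking validity would additionally require a validating parser and the DTD itself.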
Role of XML • “If you ask 20 people in the industry ‘What is XML?’ you’ll get 20 different answers.” – Dale Fuller, CEO, Inprise Corporation. • Vendor-Neutral, Platform-Independent Structured Information Standard. • Document Representation and Interchange Standard. • Applications can externalize their data as XML. • XML for data, CSS for the presentation layer, XSL to transform the structure of the document.
Distributed Repository Model • Information Environment in which we Operate. • Web-Based and Publisher-Centric. • Multiple Relationships and Nodes. • Need for Gateway and Navigation Tools. • Need for Integration, Linking. • Publisher Repository approaches to Retrieval. • A and I Service Issues.
Distributed Repository Issues • Integration of discrete publisher repositories, local and remote A & I services, OPAC, Web resources, and local data within gateway and navigation tools. • Issues for user access: • need to identify appropriate publisher repository, but presently interfaces are different and full-text and controlled vocabulary searching often not offered. • A & Is: not full-text but offer controlled vocabulary, no links to full-text repositories.
Distributed Repository Search • Needed feature set: • A & Is: need links to full-text at article level via Digital Object Identifier (DOI), vocabulary switching between controlled vocabularies. Will we see consolidation of A & I services? Other information providers? PubMed/PubRef, PubSCIENCE (DOE/OSTI) • Publisher metadata repository for central searching; deposit metadata in conjunction with DOI. • Browser technology that fully incorporates XML, CSS.
Digital Object Identifier (DOI) • DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. • ‘The ISBN for the 21st Century’ -- Norman Paskin. • DOI system has two main parts, the identifier and a directory system, plus a third logical component, a database. • Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.
DOI Construction • First open standard for content identification. • DOI is a number that identifies a digital object: • 10.1063/S000369519903216 • 10 Directory Indicator (all DOIs begin with 10) • 1063 Publisher Prefix • S000369519903216 Suffix (Publisher-assigned ID) • Suffix can be SICI or PII. • The DOI, together with the URL pointing to the digital object, is registered with the International DOI Foundation. • 10.1234/4356 | http://www.pubsite.org/apr99/artl1.pdf
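The structure of the example DOI above can be made concrete with a small parser: split at the first "/" into prefix and suffix, then split the prefix into the leading "10" and the publisher's code. This is a sketch only; `parse_doi` and the `DOI` tuple are hypothetical names, and the checks cover just the syntax described on this slide.

```python
from typing import NamedTuple

class DOI(NamedTuple):
    directory: str   # leading "10", shared by all DOIs
    registrant: str  # publisher prefix, e.g. "1063"
    suffix: str      # publisher-assigned ID (may be a SICI or PII)

def parse_doi(doi):
    """Split a DOI into its leading '10', publisher prefix, and suffix."""
    prefix, sep, suffix = doi.partition("/")  # suffix may itself contain '/'
    if not sep or not suffix:
        raise ValueError(f"missing suffix: {doi!r}")
    directory, dot, registrant = prefix.partition(".")
    if directory != "10" or not dot or not registrant:
        raise ValueError(f"prefix must look like '10.<publisher>': {doi!r}")
    return DOI(directory, registrant, suffix)

parts = parse_doi("10.1063/S000369519903216")
```

Splitting only at the first "/" matters because suffixes such as SICIs can contain further slashes.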
Using a DOI • DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives). • Retrieval of the object is a two-step process: the link is sent to a central directory where the current Web address is stored, and the location is sent back to the browser with an HTTP redirect to that address, e.g.: • dx.doi.org/10.100/1 redirects to www.pub/art1.pdf • CrossRef Project: major Sci-Tech professional societies and commercial publishers.
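The two-step retrieval can be simulated with an in-memory directory: the persistent link carries only the DOI, and the resolver looks up the current address and answers with a redirect. This is a toy model, not the Handle System itself; `DIRECTORY`, `resolver_link`, and `resolve` are hypothetical, and the 302 status is this sketch's choice of redirect. The sample entry reuses the DOI/URL pair from the DOI Construction slide.

```python
from urllib.parse import quote

# Hypothetical central directory: DOI -> current URL of the digital object.
DIRECTORY = {
    "10.1234/4356": "http://www.pubsite.org/apr99/artl1.pdf",
}

RESOLVER_BASE = "http://dx.doi.org/"

def resolver_link(doi):
    """Build the persistent link that a citing page would carry for this DOI."""
    return RESOLVER_BASE + quote(doi, safe="/")

def resolve(doi):
    """Step 1: look up the DOI in the directory.
    Step 2: return an HTTP status and the Location the browser should follow."""
    url = DIRECTORY.get(doi)
    if url is None:
        return 404, ""
    return 302, url  # temporary redirect to the current address
```

When a publisher moves an article, only its directory entry changes; every link built by `resolver_link` keeps working.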
Reference Linking • In some fields, e.g. Physics, publishers have linking agreements already in place. • Alternatives to DOI: • PubMed/PubRef (National Library of Medicine) • PubSCIENCE (DOE/OSTI) • OpCit project • Proprietary Link Managers (AIP, APS) • System design calls for one URL per DOI; however, the underlying technology can handle multiple URLs.
Current Work • Pilot Project involving CNRI, SFX, Academic Press (IDEAL). • OpenURL Protocol. • Recent Letter to CrossRef and IDF. • Demonstration Project at Illinois and OhioLink. • Local Resolver. • Localizing Name Resolution for AIP, ASCE, Elsevier, other publishers. • Use of CrossRef Metadata Database for identifying Publisher from DOI and linking to Local Copy, A & I Services, Library Assistance.
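A local resolver works by having a citation link carry article metadata to the library's own server, which then picks the appropriate copy. The sketch below builds an OpenURL 0.1-style query (`sid`, `id=doi:...`, `genre`, `volume`, and so on) and then chooses between a local copy and the central DOI resolver based on the publisher prefix. The resolver URL, the `sid` value, and the `LOCAL_COPIES` table are hypothetical placeholders, not the Illinois configuration.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL of an institutional (local) link resolver.
LOCAL_RESOLVER = "http://resolver.example.edu/openurl"

# Hypothetical table: publisher prefix -> base URL of a locally held copy.
LOCAL_COPIES = {"1063": "http://fulltext.example.edu/aip/"}

def build_openurl(doi, volume=None, spage=None, date=None, sid="UIUC:testbed"):
    """Build an OpenURL 0.1-style link that sends article metadata to the local resolver."""
    params = {"sid": sid, "id": f"doi:{doi}", "genre": "article"}
    for key, value in (("volume", volume), ("spage", spage), ("date", date)):
        if value is not None:
            params[key] = value
    return LOCAL_RESOLVER + "?" + urlencode(params)

def choose_target(doi):
    """Resolver-side decision: prefer a local copy, else the central DOI resolver."""
    publisher_prefix = doi.split("/", 1)[0].split(".", 1)[1]
    base = LOCAL_COPIES.get(publisher_prefix)
    if base:
        return base + quote(doi, safe="")
    return "http://dx.doi.org/" + quote(doi, safe="/")

link = build_openurl("10.1063/S000369519903216", volume=75)
```

Keeping the choice of target on the library's side is what lets the same citation link lead to a local copy, A & I record, or library assistance as appropriate.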
Computer Technologies • XML Appliances: Intel XML Accelerator. • Thin Desktops: • Legacy-free PCs; • Network appliances (Sun Rays). • Ubiquitous Computing: • Pocket PCs --Windows CE machines; • PalmPilots.
Wireless Technologies • Wireless Computing • Security issues; • Bandwidth and throughput limited; • CDPD (Cellular Digital Packet Data); • Web clipping vs. portable HTML; • Cell Phone/Pocket PC combination. • Pocket devices can be used by patrons and staff for remote searching and processing.
Role of the Sci-Tech Library • Function of Library: • Collect source materials; • Organize materials; • Provide access to materials. • Change: above activities are now distributed, not confined to a specific place. • Question: How do the support services for these activities need to change?
Issues • Library as Function not Place. • Acknowledgment of and Support for the Library’s Role in the Campus Information Infrastructure. • Provide a ‘Digital Library’ out of digital collections. • Moving up on the Information Food Chain: personal collection, colleague, e-mail, Web, Library. • Archiving issues (Open Archives Initiative); an archive implies an access mechanism.
4th Generation Information System • Simultaneous Searching of Multiple Resources. • Remote Reference and Instruction (Collaboration and Whiteboard--apply Help Desk Software). • Software-Aided Search Navigation and Modification. • Dynamic Links to Full-Text. Appropriate Copy problem. • One-Stop-Shopping.
Role of the Academic Librarian • In addition to Raising money & dealing with Publishers/vendors. • Experts in Information Seeking Process, Research, and Instructional Programs. • Knowledge of Emerging Information Technologies. • Ability to Work Effectively at Campus Level. • Ability to Train, Mobilize, and Enthuse Staff. • Cooperative Endeavors with other Departments, Grant Agencies, and Government Agencies.