
William H. Mischo University of Illinois at Urbana--Champaign

The Illinois Digital Library Initiative: Processing and Access Issues for Full-Text Journals May 27, 1998 Pennsylvania State University. William H. Mischo University of Illinois at Urbana--Champaign Grainger Engineering Library Information Center. Overview.





Presentation Transcript


  1. The Illinois Digital Library Initiative: Processing and Access Issues for Full-Text Journals • May 27, 1998 • Pennsylvania State University • William H. Mischo, University of Illinois at Urbana--Champaign, Grainger Engineering Library Information Center

  2. Overview • Testbed Goals & Mission. • Testbed Issues. • Testbed Technologies. • SGML Processing Methodology. • Accomplishments. • Transaction Log Analysis • Federation Tests & Distributed Repository Model. • Future Foci. • What We Have Learned. • Questions

  3. “The Business of a University is Information…The Production and Dissemination of Information is the Work of the University.” • Tom Everhart, President, California Institute of Technology

  4. Digital Library Initiative Program • Funded by National Science Foundation (NSF), DARPA, and NASA. • Awarded grants to 6 universities (and partners), September 1994--August 1998. • The 6: Illinois, Michigan, Stanford, Berkeley, Carnegie Mellon, Santa Barbara. • Each project: $4 million over 4 years. • Illinois: Testbed, Research, Evaluation, Web Software.

  5. Scholarship, Publishing, Libraries • Changing Paradigm: Authors, Publishers, Libraries, A & I Services. • Scholarly Publishing Issues (We Pay Twice). • Publisher Costs (85% for First Copy). • Idea of Universities as Publishers. • Users’ Information Seeking Behavior (personal collection, colleagues, e-mail, Web, Library). • Archiving Issues (Depository idea GB, Canada) • Role of the Library (Function as well as Place).

  6. Scholarship • “The normal mode of scientific growth is exponential…(we are) entering a period of crisis marked by rapidly increasing concern over problems of manpower, literature, and expenditure that demand solution by reorganization.” • Derek de Solla Price, 1986. • Year and Number of Journals: 1665: 1; 1932: 6,000; 1981: 96,000; 1996: 165,000. • Avg. Price of U.S. Periodical rose 155%, 1986-96.

  7. Testbed Goals & Objectives • Construct Large-Scale, Multipublisher, SGML-Based Full-Text Testbed. • Investigate Processing, Indexing, Normalization, Retrieval and Rendering. • Study End-User Searching Behavior and Needs. • Look at One-Stop-Shopping Retrieval Models (Integration of Services). • Identify Models for Effective Retrieval in Electronic Full-Text Publishing Environment.

  8. Testbed: 54 Journals, 39K Articles. All items in SGML & 2/3 in PDF • American Institute of Physics--APL, JAP, RSI: 12,000 articles, 1995--, weekly updates. • American Physical Society--PRL: 8,800 articles, 1995--, weekly updates. • ASCE Journals (25 titles): 5,000 articles, 1995--. • IEE Proceedings and Electronics Letters: 7,400 articles, 1993--. • IEEE Computer Society (14 titles): 5,000 articles, 1996--.

  9. Issues • Toward the Holy Grail of Smart Document. • Top Menu Integration and Cross-Resource Links. • Searching over Full-Text of Journals vs. Abstract & Index Service Database. • Full-Text Display (Mathematics Rendering: SGML, HTML, PDF, XML, Math ML, TeX.). • Web-Based Problems & Connectivity. • Breadth and Depth of Collections. • User Response.

  10. Testbed Technologies • Open Text (HPUX) Search Engine / LiveLink Web. • Item Metadata for Normalization and Short-Entry Display. • TCP/IP and HTTP for Full-Text, DCOM DLLs for A&I Links, Java Applets (Wordwheels). • SGML rendering via Panorama. • Custom Processing Programs on NT and Unix Platforms (Visual Basic, C++, Perl). • Microsoft IIS (Web Retrieval, ASP for Links and Top Menu, Authentication w/ Bluestem).

  11. Accomplishments (Overview) • Distributed Repository Model (within Testbed & with AIP). • Process & Retrieve from Multiple Publishers & Heterogeneous DTDs. • Use of Aliasing (Normalization) for Cross-Repository Access from Single Client Search Argument. • Item Metadata Definition. • Dynamic Linking of Resources and Proxy A&I Service Access from / to Testbed. • Focused User Studies.
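
The aliasing (normalization) idea on slide 11 can be sketched roughly as follows: a single normalized field name is expanded into the element name each publisher's DTD uses, so one client search argument can be sent to every repository. This is only an illustration. The element names, the `ALIASES` table, and the `expand_query` helper are hypothetical; the slides do not show the actual DTD tags or the Open Text query syntax.

```python
# Hypothetical alias table: normalized field -> element name in each publisher's DTD.
# The tag names are invented for illustration; the real DTDs are not shown in the slides.
ALIASES = {
    "title": {"AIP": "TITLE", "APS": "ATL", "IEE": "ARTTITLE"},
    "author": {"AIP": "AUTHOR", "APS": "AU", "IEE": "AUT"},
}


def expand_query(field, term, publishers=None):
    """Return one (publisher, element, term) triple per repository to search."""
    mapping = ALIASES[field]
    # With no explicit selection, search every publisher that has an alias.
    selected = publishers if publishers is not None else sorted(mapping)
    return [(pub, mapping[pub], term) for pub in selected if pub in mapping]


if __name__ == "__main__":
    for triple in expand_query("title", "laser"):
        print(triple)
```

Keeping the mapping in data rather than in code means a new publisher DTD would only need a new column in the table, which is one plausible reason to normalize at the metadata level.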

  12. UIUC DLI Testbed Architectures Under Investigation • [Architecture diagram; labels recovered from the slide: Clients; Authentication; Authorization; Gateways; HTTP, Java, ASP, LiveLink; Metadata Indexes; Repositories (SGML, PDF); Urbana: IEE, IEEE CS, APS, ASCE; New York: AIP; Testbed Links to: A & I Services, Other Full Text.]

  13. DeLIver Features • Retrieval over Subset of Repositories. • Forward (Citation) & Backward (Bibliography) Links to Testbed. • Links to INSPEC, Compendex, Current Contents from Items & Bibliography. • Ovid INSPEC/Compendex Proxy. • Integration with Other Library Resources • Web-Kerberos Based Authentication. • Capability of Digital Signing. • User Transaction Logs.

  14. Toplevel Menu Transactions (Total 19738)

  15. Transaction Logs (1) 4035 total end-user sessions (September through May). 3023 end-user sessions where searches were performed.

      Top Bar               # Sessions   Total #
      About DeLIver                427       536
      Browse (all)                1585      2277
      Browse Only                 1012
      Help                         175       190
      Quicktips                    189       245
      Download Software           1001      1086
      Other Resources              230       289

  16. Transaction Logs (2) 4035 total end-user sessions (September through May). 3023 end-user sessions where searches were performed.

      Search Fields               # Sessions   Total #
      Keyword                           2083      6090
      Abstract                           194       747
      Article Title                      368       976
      Article Author                     377       926
      All Author                         185       468
      Citations                           39        74
      Body of Article                     76       336
      Figure Caption                      26        60
      Table Caption                        9        12
      Journal Title                      218       530
      Title, Headings, Caption           118       358

  17. Transaction Logs (3) 4035 total end-user sessions (September through May). 3023 end-user sessions where searches were performed. Average Length of Search: 727 seconds.

      Searching Characteristics     # Sessions   Total #
      Display Full-Text                   2079      4267
      PDFs                                 842     10104
      SGMs                                1516      4660
      Extended Citation                    578      2212
      Boolean Operators                    856      5773
      ANDs                                 682
      ORs                                  204       668
      NOTs                                  30        79
      KWIC Display                         389       780
      Links to INSPEC/Compendex            261       404
      Multiword Search Arguments          1848      6134

  18. Transaction Logs (4) 4035 total end-user sessions (September through May). 3023 end-user sessions where searches were performed.

      Publisher Choices   # Sessions   Total #
      All Publishers            2535      9185
      AIP                         65       238
      APS                         33        84
      ASCE                        96       247
      IEE                         38        98

  19. Transaction Logs (5) 4035 total end-user sessions (September through May). 3023 end-user sessions where searches were performed. Points: Not much use of Help or Quicktips; a lot of Browsing but < 50% of search sessions; Not jumping to A&I Services from DeLIver; mostly Keyword Searching, also fair amount of Author, Article Title, Journal Title; much more Display Full-Text than Extended Citation (why?); 25% of sessions use Boolean operators; Multiword Search Arguments (complex terms, not single words) being entered; Linking to INSPEC/Compendex in 20% of sessions; predominantly All Publishers being searched.
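
As a rough check on the summary points, per-session shares can be recomputed from the counts reported on slides 15 through 18. The sketch below assumes the 4035-session total given on slides 15 to 17 and uses the 3023 search sessions as the base; the `share` helper and the choice of base are assumptions made for illustration, not the method used in the original analysis.

```python
TOTAL_SESSIONS = 4035   # all end-user sessions, as reported on slides 15-17
SEARCH_SESSIONS = 3023  # sessions in which at least one search was performed

# Sessions in which each feature was used at least once (slides 16-18).
SESSIONS = {
    "Keyword": 2083,
    "Boolean Operators": 856,
    "All Publishers": 2535,
}


def share(count, base=SEARCH_SESSIONS):
    """Percentage of sessions, rounded to one decimal place."""
    return round(100 * count / base, 1)


# Sessions with no search at all; this matches the "Browse Only" row on slide 15.
browse_only = TOTAL_SESSIONS - SEARCH_SESSIONS
shares = {name: share(n) for name, n in SESSIONS.items()}

if __name__ == "__main__":
    print("Browse-only sessions:", browse_only)
    for name, pct in shares.items():
        print(f"{name}: {pct}% of search sessions")
```

The percentages quoted on the slide appear to be rounded estimates, so small differences from these recomputed values are to be expected.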

  20. Testbed User Authentication • Approach: • Authenticate Once per Session / Authorize per Use • Current Mechanism: • On 1st Request, User Referred to Bluestem Script • Upon Bluestem Authentication: • Authorization Record Written to SQL Database • Cookie Set Which Points to that Record • Need to Fix Redirection Problem with MS IE • Need to Extend Outside Cookie-Setting Domain
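
The "authenticate once per session / authorize per use" approach can be sketched as below. This is a minimal illustration, not the Testbed's implementation: Bluestem itself is not modeled, an in-memory SQLite table stands in for the SQL database, and the table layout and the function names `open_store`, `authenticate`, and `authorize` are all hypothetical.

```python
import secrets
import sqlite3


def open_store():
    """Create an in-memory table standing in for the SQL authorization database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE auth (token TEXT PRIMARY KEY, username TEXT, resources TEXT)"
    )
    return db


def authenticate(db, username, resources):
    """Run once per session, after the external login step has succeeded.

    Writes the authorization record and returns the value to store in the cookie.
    """
    token = secrets.token_hex(16)
    db.execute(
        "INSERT INTO auth VALUES (?, ?, ?)", (token, username, ",".join(resources))
    )
    return token


def authorize(db, token, resource):
    """Run on every request: look up the record that the cookie points to."""
    row = db.execute(
        "SELECT resources FROM auth WHERE token = ?", (token,)
    ).fetchone()
    return row is not None and resource in row[0].split(",")


if __name__ == "__main__":
    db = open_store()
    cookie = authenticate(db, "alice", ["AIP", "APS"])
    print(authorize(db, cookie, "AIP"), authorize(db, cookie, "IEE"))
```

Because the cookie holds only an opaque pointer, the per-use check always consults the server-side record, so a change to a user's permissions would take effect without a new login.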

  21. Future Work • Implementation of Distributed Repository Model. • Expand Breadth of Testbed (Loading Locally and Linking to other Repositories). • Use of Digital Object Identifiers and other Standards. • Rendering via HTML 4.0 & CSS, XML & XSL. • Adding Dynamic retrieval Mechanisms (Wordwheels, Co-Occurrence Matrices). • Expand Simultaneous Search Mechanisms. • Expanded User Studies.

  22. SGML vs. HTML vs. XML • SGML: • Supports Powerful Indexing, Search & Retrieval • But Client, Delivery, & Rendering Issues Remain • HTML: • Ubiquitous; Rendering Has Become More Robust • But Remains Presentation Oriented, Less Semantic • XML: • Subset Retains SGML Features of Primary Interest • But XML Is New, Untested, Under-Supported

  23. Converting DLI Testbed to XML • XML Differences from SGML: • No SHORTREF (Tag Minimization) • Tags Are Case Sensitive • Restrictions on Entities, Attributes, Link Mechanisms • Empty Tags Handled Differently • Math ML vs. ISO 12083 Math • Math ML a Major Departure -- Adds Semantics • Focus on Java / ActiveX for Initial Deployment; Long-Term Success May Hinge on XSL / DSSSL • ‘Content-Markup’ requires XSL, Dynamic HTML functionality
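
Two of the XML differences listed on slide 23, case-sensitive tags and the handling of empty elements, can be demonstrated with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`article`, `Title`, `fig`) are made up for the example and are not taken from the Testbed DTDs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Tag names are case sensitive in XML: <Title> cannot be closed by </title>.
case_mismatch = "<article><Title>Lasers</title></article>"

# An SGML DTD can declare an element EMPTY so that <fig> needs no end tag;
# XML requires the explicit <fig/> form (or <fig></fig>).
unclosed_empty = "<article><fig></article>"
closed_empty = "<article><fig/></article>"

if __name__ == "__main__":
    for sample in (case_mismatch, unclosed_empty, closed_empty):
        print(is_well_formed(sample), sample)
```

Checks of this kind suggest why a conversion from SGML to XML needs a normalization pass over existing documents, not just a new DTD.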

  24. CSS, XSL, DSSSL • CSS1 & CSS2 Have Added: • Overlapping Glyphs, Absolute & Relative Positioning • Downloadable Fonts (Platform, Browser Variable) • Styling by Attributes, 2 Levels of Hierarchy • XSL, DSSSL, DSSSL-O: • XSL Uses XML Notation, Is Extensible (ECMAScript) • Allows More Extensive Manipulation In Formatting • Supports Re-arrangement, Navigator Frames, etc. • Not Yet Implemented in Production Browsers

  25. What We Have Learned (1) • Power of SGML for Indexing & Retrieval. • Problems with rendering mathematics--SGML, TeX, HTML, XML, Math ML. • Depth and breadth of collection (TULIP/ Red Sage Syndrome; note use of Ovid client). • Local Processing Implications • Metadata needs and robustness of Distributed Model.

  26. What We Have Learned (2) • Efficacy of Full-Text (stand-alone, integrated with A & I, part of TOC Service). • The Idea of a Digital Library in the Digital Chaos--the role of the Gateway and Linking of Resources. • Changing roles of Authors, Publishers, A & I Services, Libraries. • These Technologies Will Transfer to the Web (CSS I & II, HTML 4.0, Dynamic HTML, XML).
