CS 430: Information Discovery

CS 430: Information Discovery Lecture 27 Large-scale information discovery: the NSDL

Laptop returns
Dates: Tuesday, May 8th, 9:00 - 11:00 a.m.; Monday, May 14th, 1:00 - 3:00 p.m.; Tuesday, May 15th, 9:00 - 11:00 a.m.
Place: Upson Hall 5130
Receipts: Bring a copy of your receipt to the examination

Course Administration Reading for Wednesday, May 2: Read the chapter for its general themes. Why is cluster analysis relevant to information discovery? Do not read the details of all the algorithms, but pay attention to the single link and complete link methods.

NSDL The National Science Foundation's National Digital Library for Science, Mathematics, Engineering and Technology Education [a.k.a. Smete, NSDL, Learns, ...]

The NSDL Library Project
1996 Vision articulated by NSF's Division of Undergraduate Education
1997 National Research Council workshop
1998 Preliminary grants through Digital Libraries Initiative 2
1998 SMETE-Lib workshop
1999 NSDL Solicitation
2000 6 Core Integration System projects + 23 others funded
2001 1 very large Core Integration System project to be funded

Collections and Services
• Scientific and technical information
• Materials used in education
• Materials tailored to education

Core Partners

All Partners

Collections Track
• Biology Education Online – An Interactive Electronic Journal
• A Digital Multimedia Library for Health Sciences Education
• Bioscience Education Net
• Digital Library for Earth Systems Education
• Atmospheric Visualization Collection
• MATHDL – Online Learning Materials in Mathematics
• A Digital Library Network for Engineering and Technology
• Mathematics, Science, and Technology Teacher Preparation
• Geoscience (Solid Earth) Data Sets [Muawia Barazangi]
• The Alsos Digital Library

Services Track
• Prioritizing Content Creation in Digital Libraries
• Peer Review of Digital Learning Materials
• Electronic Journal of Earth System Science Education Resources
• Breaking the Metadata Generation Bottleneck
• Discovering, Recommending, and Combining Learning Objects
• Information Pathways through NSDL
• Video Component Repository & Environment for Teaching Environments

Fundamental Question: Leverage
• How can the NSDL Library be more than the sum of its parts?
• Which separate activities can NSDL bring together?
• Which existing, fragmented activities can be combined as the initial nucleus of NSDL?
• The projects are attempting to develop a shared vision

Collection Development Policy
The NSDL partners could:
• concentrate on educational materials
• be a general purpose science library
• concentrate on open access materials
• include formally published materials, preprints, web sites and similar materials
• be a long term archive
The vision: The NSDL must have a very comprehensive collection development policy.

Audience
The NSDL could:
• concentrate on the needs of science teachers
• serve students directly
• emphasize independent learners
The vision: The NSDL should aim to serve every one of these communities and more.

Information Discovery and Quality of Materials
The NSDL could:
• help people find information
• provide catalogs and indexes
• review educational materials and validate them for scientific and educational content
The vision: The NSDL should aim to provide every one of these and more.

The Architecture
• User portals
• Central services, metadata collections, etc.
• Distributed collections

Distributed information discovery
• Technical agreements cover formats, protocols and security systems, so that messages can be exchanged.
• Content agreements cover the data and metadata, and include semantic agreements on the interpretation of the messages.
• Organizational agreements cover the ground rules for access, for changing collections and services, payment, authentication, etc.
The challenge is to create incentives for independent digital libraries to adopt agreements.

Levels of Interoperability
Level | Agreements | Example
Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z 39.50
Harvesting | Digital libraries supply basic metadata; simple protocol and registry | Open Archives
Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines

Metadata Harvesting
[Diagram: central services (metadata collections, etc.) hold central data, which is filled by a metadata harvest from the distributed collections]
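The harvesting arrangement can be sketched in a few lines of Python. This is only an in-memory illustration, not the Open Archives protocol itself: the class names Collection and CentralRepository, the method list_records, and the use of ISO date strings as datestamps are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A distributed collection that exposes its metadata (hypothetical model)."""
    name: str
    # record id -> (datestamp, metadata); ISO dates compare correctly as strings
    records: dict = field(default_factory=dict)

    def list_records(self, since=""):
        """Return the records whose datestamp is later than `since`."""
        return {rid: rec for rid, rec in self.records.items() if rec[0] > since}


class CentralRepository:
    """Central metadata store, filled by harvesting each collection in turn."""

    def __init__(self):
        self.store = {}         # (collection name, record id) -> metadata
        self.last_harvest = {}  # collection name -> latest datestamp seen

    def harvest(self, collection):
        """Copy new or changed records into the central store; return the count."""
        since = self.last_harvest.get(collection.name, "")
        new = collection.list_records(since)
        for rid, (stamp, metadata) in new.items():
            self.store[(collection.name, rid)] = metadata
            since = max(since, stamp)
        self.last_harvest[collection.name] = since
        return len(new)


earth = Collection("earth", {"e1": ("2001-04-01", {"title": "Plate tectonics"})})
central = CentralRepository()
first = central.harvest(earth)    # picks up e1
earth.records["e2"] = ("2001-04-20", {"title": "Seismic data sets"})
second = central.harvest(earth)   # picks up only e2
third = central.harvest(earth)    # nothing new
```

The point of the design is that the collection only has to answer one simple question ("what has changed since this date?"), while all the work of building services happens centrally.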

Metadata
Collections must support:
• Unqualified Dublin Core
Collections may support:
• IMS
• FGDC
• or other recognized metadata sets
Simple XML tagged format -- protocol derived from Dienst
The big question: How can we have effective information discovery with such minimal metadata?
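To give a feel for how little an unqualified Dublin Core record carries, here is a minimal Python sketch that writes and reads one as simple tagged XML. The namespace URI is the standard Dublin Core element set namespace; the wrapper element name "record", the helper names to_xml and from_xml, and the sample values are assumptions for illustration and are not the format used by the Open Archives protocol.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_xml(fields):
    """Serialize (element name, value) pairs; elements may repeat."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


def from_xml(xml_text):
    """Parse the XML back into (element name, value) pairs."""
    prefix = f"{{{DC_NS}}}"
    record = ET.fromstring(xml_text)
    return [
        (child.tag[len(prefix):], child.text)
        for child in record
        if child.tag.startswith(prefix)
    ]


fields = [
    ("title", "Atmospheric Visualization Collection"),  # sample values only
    ("subject", "meteorology"),
    ("subject", "visualization"),
    ("type", "image"),
]
xml_text = to_xml(fields)
round_trip = from_xml(xml_text)
```

Every element is optional and repeatable, and the values are uncontrolled free text, which is why the record is easy to produce but hard to search precisely.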

A User's Wish List
To discover materials and services:
• Good science
• Comprehensible to students -- effective for teaching
• Stable -- will not change or disappear
Through services that are appropriate to the user's needs:
• No uniform catalog or index to everything
• Mixture of for-profit and open access information

Virtual Collections
[Figure: links from the NSDL show the members of the virtual collection]

The Information Discovery System
Items are stored in (usually) independent repositories.
Surrogates for items and resources are stored in a central metadata repository.
Items and surrogates become part of the library by way of gathering, harvesting and federated services.
A search service allows items in the library to be discovered.
The metadata repository and search service may be distributed.
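The relationship between items, surrogates and the search service can be sketched as a small inverted index in Python. This is an illustration under stated assumptions, not the NSDL design: the class name SearchService, the choice of indexed fields, the "location" field that points back to the item's repository, and the example URLs are all hypothetical.

```python
import re
from collections import defaultdict


class SearchService:
    """Indexes surrogates (metadata records); the items themselves stay remote."""

    INDEXED_FIELDS = ("title", "subject", "description")  # assumed field names

    def __init__(self):
        self.surrogates = {}           # item id -> surrogate (dict of strings)
        self.index = defaultdict(set)  # term -> ids of surrogates containing it

    @staticmethod
    def _terms(text):
        """Lowercase the text and split it into alphanumeric terms."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, item_id, surrogate):
        """Store a surrogate and index the terms in its searchable fields."""
        self.surrogates[item_id] = surrogate
        for field_name in self.INDEXED_FIELDS:
            for term in self._terms(surrogate.get(field_name, "")):
                self.index[term].add(item_id)

    def search(self, query):
        """Return ids of surrogates that contain every query term."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


service = SearchService()
service.add("earth:1", {"title": "Plate Tectonics Animations",
                        "subject": "geology",
                        "location": "http://example.org/earth/1"})
service.add("math:7", {"title": "Calculus Applets",
                       "subject": "mathematics",
                       "location": "http://example.org/math/7"})
found = service.search("plate tectonics")
where = [service.surrogates[i]["location"] for i in found]
```

A search returns surrogates, and the user follows the location in the surrogate to reach the item in its own repository, so the central service never needs to hold the items.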

Short Term: Demonstration System
Harvesting: Open Archives alpha test with pilot registry and harvester
Collections: 10 large collections selected for varied content
Metadata: Unqualified Dublin Core, IMS, FGDC
Services: Basic searching and filtering (using minimal metadata), mockup of vocabulary support
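Filtering on minimal metadata might look like the following Python sketch. The function name filter_records, the record layout (field name mapped to a list of values) and the sample records are assumptions for the example.

```python
def filter_records(records, **criteria):
    """Keep records in which every named field has a value equal to the wanted one.

    Comparison ignores case. A record that lacks the field is dropped,
    because with optional metadata there is nothing to match against.
    """
    def matches(record):
        for field_name, wanted in criteria.items():
            values = record.get(field_name, [])
            if not any(wanted.lower() == v.lower() for v in values):
                return False
        return True

    return [r for r in records if matches(r)]


records = [  # sample records only
    {"title": ["Plate Tectonics Animations"], "subject": ["geology"], "type": ["image"]},
    {"title": ["Intro to Seismology"], "subject": ["Geology"]},
    {"title": ["Calculus Applets"], "subject": ["mathematics"], "type": ["software"]},
]
geology = filter_records(records, subject="geology")
geology_images = filter_records(records, subject="geology", type="image")
```

The second record disappears from the second filter only because its collection supplied no type element, which illustrates how missing fields and uncontrolled values limit what filtering can do.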

Long Term: Collaboration
• Harvesting and gathering provided by Cornell, using the metadata harvesting protocol of the Open Archives Initiative.
• Search service provided by the University of Illinois Emerge system.
• Integration of collections and services provided by UCAR, using the SDLIP protocol.
• User authentication and profile database provided by Columbia University.

Cornell Team for the NSDL
William Arms: Overall coordination
Elly Cramer: Programming and systems
Carl Lagoze: Architecture
Diane Hillmann: Metadata
Dean Krafft: Systems
Rich Marisa: Architecture and engineering
John Saylor: Collections development
Carol Terrizzi: Design and communications
Sarah Thomas: Cornell University Library
Herbert Van de Sompel: Architecture
