Metasearch

Metasearch • NISO Metasearch Initiative Overview • Local Uses of Metasearch • John Little, Senior Analyst, IT, Duke Libraries; Member, NISO-MI TG3; John_R_Little@notes.duke.edu • Andrew K. Pace, Head, Systems, NCSU Libraries; Co-chair, NISO-MI; andrew_pace@ncsu.edu • Tim Shearer, Library Systems, UNC Libraries; Member, NISO-MI TG1; sheat@ils.unc.edu

Rumsfeld’s Law of Metasearch You metasearch with the standard you have, not the standard you wish you had.

Credits & Thanks • NISO Metasearch Initiative Team • Jenny Walker, VP Marketing, Ex Libris Co-chair of the initiative • Mike Teets, OCLC, Task Group Chair • Juha Hakala, Nat’l Lib Finland, Task Group Chair • Sara Randall, Endeavor and Katherine Kott, DLF, Task Group chairs • All the active participants of the 3 task groups • TRLN (for co-hosting 2 critical meetings)

Why I’m Here • What is metasearch? • Talk about the history, work, and present status of the NISO Metasearch Initiative Committee • Convey the complexity of improving the standing of metasearch • Talk about the work left to be done

I wish I had time to do more of… • Convincing even the unconvinced that metasearch is a worthwhile endeavor (I will try to do this anyway) • Talking more about Google (I will do this anyway) • I do want to leave plenty of time for discussion

What’s in a name? • Federated search • Channel (RSS) search • Metasearch

[Diagram: many separate query forms, each querying diverse information resources]

Federated search (just-in-case) [Diagram: one query form → diverse information resources]

Federated search examples • ENCompass for Journals OnSite (EJOS) • SCIRUS • Google Scholar

Channel (RSS) Search (just-in-time, on request) [Diagram: query form → diverse information resources, turned on or off]

Example of Channel Search

Metasearch (just-in-time) [Diagram: query form → diverse information resources]

Metasearch [Diagram: query form → diverse information resources] • integrated searching = metasearching = cross-database searching = parallel searching = broadcast searching = …

MetaSearch Technology [Diagram: query form → metasearch agent → translators/connectors → diverse information resources]
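
The pipeline on this slide (query form, metasearch agent, translators/connectors, resources) can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the two connector functions, their "native" query formats, and the resource names are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A connector translates the common query into a resource-specific form
# and returns result titles. Both connectors below are hypothetical.
Connector = Callable[[str], List[str]]


def catalog_connector(query: str) -> List[str]:
    # Stand-in for a catalog target; the "translation" is just a prefix.
    native_query = f"title={query}"
    return [f"catalog hit for {native_query}"]


def journal_connector(query: str) -> List[str]:
    # Stand-in for an XML-gateway target.
    native_query = f"<query>{query}</query>"
    return [f"journal hit for {native_query}"]


def metasearch(query: str, connectors: Dict[str, Connector]) -> Dict[str, List[str]]:
    """Broadcast one query to every connector in parallel; group results by resource."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(conn, query) for name, conn in connectors.items()}
        return {name: future.result() for name, future in futures.items()}


results = metasearch(
    "metadata",
    {"catalog": catalog_connector, "journals": journal_connector},
)
```

The agent itself knows nothing about any one resource; everything resource-specific lives in the connector, which is why each new target usually means writing a new translator.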

Metasearch….Why bother? • Because most patrons do not care where information is or who packaged it • Present systems require users to know • How to select / access a database • How to get to them • How to use unique search options • Because Google cannot do it all • Challenge is creating a system that helps users find what they need while minimizing what they need to know

Tennant’s Tenets • Only librarians like to search, everyone else likes to find • All things being equal, one place to search is better than two or more. • “Good enough” is often just that • Users are not lazy, they’re human • Our ability to create effective one-stop searching is dependent on our ability to appropriately target user needs • The size of the result set doesn’t matter as much as how the results are presented. (‘the Google lesson’) • Services should be placed as close to the user as possible http://www.cdlib.org/inside/projects/metasearch/nsdl/

NISO-MI History • ALA (Philadelphia) Midwinter 2003 • NISO-MI Planning (Denver), Spring 2003 • NISO-MI Proselytizing (Washington, D.C.), Fall 2003 • Task Groups, 2004 - present

The NISO Metasearch Initiative • Any standards identified must help all the stakeholders: • libraries to deliver services that distinguish their offerings from other free web services • metasearch service providers to offer more effective and responsive services • content providers to deliver enhanced content and protect their intellectual property • Win – Win - Win

NISO-MI History • ALA Midwinter 2003 • Meeting called by 3 providers: Ebsco, Gale, Proquest • Concerned about impact on services • NISO offered to take leadership role and formed a planning committee • Identified key issues • Access Management (a.k.a. authentication/authorization) • Resource Identification • Metasearch Identification • The Search Itself • Results Management • Statistics • Planned another meeting

NISO-MI History • Denver, Spring 2003 • Access Management • Understand metasearch needs; find best solutions available; develop best practices • Resource Identification • Work with Dublin Core RSLP and ISO Directories group; exchange format for collection and service descriptions • Search, Retrieve, Results Management • Current environment analysis (Z39.50, SRW/SRU, proprietary APIs, XML gateways); develop best practice for APIs; continue Z39.50 profiling • Metasearch Identification • Solution: Register a practice that metasearch engines can use to identify themselves • Statistics • Work with Z39.7 and COUNTER; explain metasearch environment; adapt existing standards; publicize importance of statistics

NISO-MI History • D.C., Fall 2003 • Combined with OpenURL for 2-day workshop; briefed a larger audience on the broad issues discussed in Denver; Agreed that a focused initiative was needed • Approved Recommendations • Appointed leadership

NISO-MI Leadership • Overall Co-chairs • Jenny Walker, Ex Libris • Andrew Pace, NCSU • Access Management (TG1 / NISO BA) • Mike Teets, OCLC • Collection Description (TG2 / NISO BB) • Juha Hakala, National Library of Finland • Pete Johnston, UKOLN, Collection Description • Larry Dixson, LC, Service Description • Search and Retrieve (TG3 / NISO BC) • Sara Randall, Endeavor • Matt Goldner, OCLC (formerly of Fretwell-Downing) • Katherine Kott, Digital Library Federation

TG 1: Access Management

Active Participants • Katie Anstock – Talis Information Ltd. • Susan Campbell - CCLA • Frank Cervone – Northwestern University • Paul Cope – Auto-Graphics, Inc. • David Fiander – University of West. Ontario • Ted Koppel – The Library Corporation • Mark Needleman – SIRSI Corporation • Ed Riding - Dynix • RL Scott – US DOE, OSTI • Tim Shearer – University of North Carolina • Mike Teets – OCLC, Inc. (Chair)

TG1 – Access management • Authentication • The process where a network user establishes a right to an identity -- in essence, the right to use a name (Lynch 1998) • Are you who you say you are? • Authorization • The process whereby a network user, based on their attributes, receives entitlements or authority to use a resource • So, can you use this?
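
The two questions on this slide ("Are you who you say you are?" and "Can you use this?") can be kept apart in code as two separate steps. The sketch below is only an illustration: the user store, the attribute names, and the resource names are invented, and plain-text passwords are used purely to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class User:
    name: str
    attributes: Set[str] = field(default_factory=set)


# Hypothetical credential store, attributes, and entitlement rules.
PASSWORDS: Dict[str, str] = {"student1": "s3cret"}
ATTRIBUTES: Dict[str, Set[str]] = {"student1": {"enrolled-student"}}
REQUIRED_ATTRIBUTE: Dict[str, Optional[str]] = {
    "licensed-database": "enrolled-student",
    "open-catalog": None,  # no attribute needed
}


def authenticate(name: str, password: str) -> Optional[User]:
    """Authentication: establish the right to use an identity."""
    if PASSWORDS.get(name) == password:
        return User(name, set(ATTRIBUTES.get(name, set())))
    return None


def authorize(user: User, resource: str) -> bool:
    """Authorization: decide entitlement from the user's attributes."""
    if resource not in REQUIRED_ATTRIBUTE:
        return False  # unknown resources are denied
    required = REQUIRED_ATTRIBUTE[resource]
    return required is None or required in user.attributes


student = authenticate("student1", "s3cret")
guest = User("guest")
```

Note that `authorize` never looks at the password: once identity is established, only attributes matter.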

Access Management Charter • Gather requirements for Metasearch authentication and access needs, inventory existing processes now in place, and develop a series of formal use cases describing the needs. • Deliver • Definitions document of Access Management and Metasearch terms. • Inventory of methods and techniques in use today • Use cases describing authentication and access needs.

TG1’s Plan of Attack Inventorying Current Approaches and Technologies Breaking apart the problem Identifying (defining) all the actors Enumerating functions Developing Use Cases Analyzing Use Cases • Ranking appropriateness of solutions to use cases • Recommend standard or best practice

Situations Can Be Complex [Diagram labels: Citizen; Student; Library Auth; State Authen; Metasearch; Library Menu; Campus Authent; Databases]

Current authentication technologies. Potential solutions? • Proprietary APIs? • NCIP? SIP2? • LDAP? • Shibboleth? • Kerberos? • Athens (UK)? • PAPI? • Tequila? • Non-authenticated identification? • IP recognition? • Proxy servers? • Referring URL? • Embedded data in URL? • Vendor-provided JavaScript? • Cookies? • Shouting?

Status • Completed survey of authentication methods in use. • Developed comprehensive use cases, then simplified them to three metasearch-specific cases. • Ranked authentication methods in use by their ability to deliver on use case needs. • Introduced an environmental ranking to cover factors such as ease of use, adoption, complexity, cost, etc. • Developed a charting model to identify best solutions.
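
One way to picture combining a use-case ranking with an environmental ranking is a simple additive score. The sketch below is an assumption about how such a chart could be computed, not the task group's actual model; the method names, case names, scores, and the weighting scheme are all placeholders.

```python
from typing import Dict, List, Tuple

# Placeholder scores on a 0-5 scale; NOT the task group's real rankings.
use_case_scores: Dict[str, Dict[str, int]] = {
    "method_a": {"case_1": 5, "case_2": 2, "case_3": 3},
    "method_b": {"case_1": 3, "case_2": 4, "case_3": 4},
}
# One combined "environmental" score per method (ease of use, adoption, cost...).
environment_scores: Dict[str, int] = {"method_a": 4, "method_b": 2}


def rank_methods(
    use_cases: Dict[str, Dict[str, int]],
    environment: Dict[str, int],
    environment_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Sum use-case scores, add the weighted environmental score, sort best first."""
    totals = {
        method: sum(scores.values()) + environment_weight * environment[method]
        for method, scores in use_cases.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


with_environment = rank_methods(use_case_scores, environment_scores)
without_environment = rank_methods(use_case_scores, environment_scores, 0.0)
```

With these placeholder numbers the order flips once the environmental score is ignored, which is the kind of effect an environmental ranking is meant to expose.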

Access Management Process (the AMP, a Mike Teets invention) [Diagram columns: Objects; Processes. Flow: Asserts Credentials → Authentication → Approved → Passes Attributes → Authorization → Determines Entitlements → Passes → Certification → Delivers Certificate]

Access Management • Instances of authentication that take place in a simple metasearch transaction [Diagram labels: User; MetaSearch; Resource; numbered markers 1, 2, 3; legend: symbol = AMPS, Access Management Process Symbol]

Relative Rankings of Authentication Methods

Decisions to be Reached • Are any current approaches universally applicable? • Can/Should we develop our own authentication standard that addresses all situations? (not desirable) • Is authentication conducive to a standard at all? Possible result: a series of “best practices”?

TG1 Recommendations • Now • IP authentication • Username / Password • Potential for the future • Shibboleth
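
IP authentication, the first "now" recommendation, amounts to checking whether a request's source address falls inside a licensed range. A minimal sketch with Python's standard `ipaddress` module follows; the network ranges are reserved documentation addresses chosen for illustration, and the function name is hypothetical.

```python
import ipaddress

# Documentation-only ranges; a real site would list its own licensed networks.
LICENSED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_licensed_address(client_ip: str) -> bool:
    """Return True if the client address falls inside any licensed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LICENSED_NETWORKS)


on_campus = is_licensed_address("192.0.2.17")
off_campus = is_licensed_address("203.0.113.5")
```

The check identifies a network location, not a person, which is why the slide lists username/password alongside it.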

What’s next… RANKINGS AND RECOMMENDATIONS • Text document with comprehensive analysis of methods in use. • Recommend best practices where available. • Recommend development necessary for models with the most promise for metasearch. • Liaison with Shibboleth community started

TG2: Collection Description

The Meta-Problem (from a Discovery Standpoint) • Many database (content) providers, each with their own web presence and means of interaction • User wants to use data from many providers at the same time

User Needs • Find/discover collections that match a certain list of criteria • Obtain enough descriptive information to be able to identify a desired collection • Discover the services that provide access to the collection(s) • Interpret items retrieved from the collection in the context of the collection

TG2 Mission • Understand how portals use collection and/or service descriptions • Analyze options; recommend schemas and syntax for implementation of collection (S1) and service (S2) descriptions

TG2 Work Plan • Create data models for collections and services • Design metadata semantics for models • Design syntax for representation and data exchange • Build on existing work where possible • Ensure linkages between Collections (S1) and Services (S2) • Don’t build a whole new service • Don’t specify the architecture for a given service • Don’t specify protocols for exchange of collection and service metadata

Goals (Solutions) • Create two element sets to be used by metasearch (and other) applications • Collection descriptions: human-readable text to describe contents of database • Building on significant previous work, notably • Research Support Libraries Programme, UK, 1999-2002 • Dublin Core Collection Description Working Group, 2003+ • Service descriptions: to be used by applications to access remote database services

Relations between collections and services • A collection may have a parent, and may have multiple sub-collections (children) • Each collection description has 0-to-many service descriptions • A service may make multiple resources available • Each service description has 1 (only) collection description
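
The cardinalities on this slide can be expressed as a small data model. The sketch below is illustrative only: the class and field names are assumptions and do not come from the DC CD AP or any TG2 schema, and the example URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based comparison, avoiding recursion through
# the parent/children/services cross-references.
@dataclass(eq=False)
class CollectionDescription:
    title: str
    parent: Optional["CollectionDescription"] = None                      # at most one parent
    children: List["CollectionDescription"] = field(default_factory=list)  # sub-collections
    services: List["ServiceDescription"] = field(default_factory=list)     # 0-to-many

    def add_child(self, child: "CollectionDescription") -> None:
        child.parent = self
        self.children.append(child)


@dataclass(eq=False)
class ServiceDescription:
    protocol: str
    access_point: str
    collection: CollectionDescription  # exactly one collection description

    def __post_init__(self) -> None:
        # Register the service with its collection to keep both sides in sync.
        self.collection.services.append(self)


library = CollectionDescription("Library holdings")
journals = CollectionDescription("E-journals")
library.add_child(journals)
sru = ServiceDescription("SRU", "https://example.org/sru", journals)
```

Requiring `collection` in the constructor enforces the "1 (only)" rule, while a collection with an empty `services` list covers the "0-to-many" case.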

DC Collection Description Application Profile (DC CD AP) • A "core" set of collection description properties • For simple collection-level descriptions • Suitable for a broad range of collections • Primarily to support discovery of collections • Includes: • Collection title • Description • Size • Subject(s) • Language • Type • Intellectual Rights • Access Rights • Data Range • Collection method • Logo • Collection history • Etc.

TG2-S1 progress to date • Working with/around DC CD AP issues (some joint membership) with data model • Metasearch Initiative introduced some library-specific requirements out of scope for DC CD AP. • TG2-S1 ends up with a superset of DC CD AP

Service Description Goals • Ultimately, a mechanism to describe (and access) informational services that, in turn, provide access to collections • How? • Indicate protocol used • Provide access point(s) for service • Provide authentication/authorization guidelines • List operations/queries supported • TG2-S2 using ZeeRex as vehicle

ZeeRex: A Starting Point • Originally a Z39.50-based specification • Based on Z39.50 “Explain” service, which was never fully or particularly well implemented • Flexible enough to deliver collection descriptions, relatively easy to implement • “Z39.50 Explain, Explained and Re-Engineered in XML”

Under discussion: • Maintaining and exchanging collection description and service access information • Auto-generate descriptions? • Harvest descriptions? • Collection Identifiers • Metasearch needs globally unique and persistent identifiers for collections (and services) • Also needed by ONIX community, e-resource management systems and more

Future • Publish/promote standardized Collection and Service Description schemas • Write guidelines, best practices for implementation • Promote creation of, and facilitate sharing of, collection and service descriptions among metasearch providers • Ensure interoperability (or at least consistency) with TG1 (Authentication) and TG3 (Search and Retrieve)

TG 3: Search/Retrieve
