
Information Integration for Digital Libraries

Information Integration for Digital Libraries. August 10, 2000 Prof. Sang Ho Lee Soongsil University Seoul, Korea shlee@computing.soongsil.ac.kr. Information integration. Provision of integrated access to multiple, distributed, heterogeneous databases and other information sources





Presentation Transcript


  1. Information Integration for Digital Libraries
     • August 10, 2000
     • Prof. Sang Ho Lee
     • Soongsil University, Seoul, Korea
     • shlee@computing.soongsil.ac.kr

  2. Information integration
     • Provision of integrated access to multiple, distributed, heterogeneous databases and other information sources
     • Mediator approach
       • More up-to-date data
       • No need to copy data
       • Query needs can be unknown
     • Data warehouse approach
       • High query performance
       • Can operate when sources are unavailable
       • Extra information at warehouse
         • Modify, summarize (store aggregates), add historical information

  3. Mediator Approach
     [Diagram; labels from top to bottom: Client, Client / Mediator / Wrapper, Wrapper, Wrapper / Source, Source, Source]
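The mediator layering can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two in-memory "sources", their record formats, the common schema (`title`, `year`, `source`), and the names `wrap_a`, `wrap_b`, and `Mediator` do not come from the slides. Each wrapper translates one source's native format into the common schema, and the mediator forwards a query to every wrapper at query time, so no data is copied in advance.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical sources, each with its own native record format.
LIBRARY_A = [{"title": "Digital Libraries", "yr": 1999}]
LIBRARY_B = [("Information Integration", "2000")]


def wrap_a(keyword: str) -> List[Record]:
    """Wrapper for source A: dict records with a 'yr' field."""
    return [
        {"title": r["title"], "year": r["yr"], "source": "A"}
        for r in LIBRARY_A
        if keyword.lower() in r["title"].lower()
    ]


def wrap_b(keyword: str) -> List[Record]:
    """Wrapper for source B: (title, year-as-string) tuples."""
    return [
        {"title": title, "year": int(year), "source": "B"}
        for title, year in LIBRARY_B
        if keyword.lower() in title.lower()
    ]


class Mediator:
    """Forwards each client query to all wrappers and combines the answers."""

    def __init__(self, wrappers: List[Callable[[str], List[Record]]]) -> None:
        self.wrappers = list(wrappers)

    def query(self, keyword: str) -> List[Record]:
        results: List[Record] = []
        for wrapper in self.wrappers:
            # Sources are consulted at query time, so answers are current.
            results.extend(wrapper(keyword))
        return results


mediator = Mediator([wrap_a, wrap_b])
hits = mediator.query("integration")
```

Because the mediator reads the sources on every call, a record added to `LIBRARY_A` after the mediator is built is visible to the next query, which is the "more up-to-date data" property listed for this approach.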

  4. Data Warehouse Approach
     [Diagram; labels from top to bottom: Client, Client / Query & Analysis / Warehouse (with Metadata) / Integration / Source, Source, Source]
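For contrast with the mediator, the warehouse layering can be sketched as follows. This is only an illustration under stated assumptions: the `SOURCES` data, the `Warehouse` class, and its fields are hypothetical and not taken from the slides. The integration step copies records out of the sources and stores an aggregate (records per year), so later queries are answered from the warehouse alone.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical source contents, for illustration only.
SOURCES: Dict[str, List[Record]] = {
    "A": [{"title": "Digital Libraries", "year": 1999}],
    "B": [
        {"title": "Information Integration", "year": 2000},
        {"title": "Metasearch Engines", "year": 2000},
    ],
}


class Warehouse:
    """Holds copied records, a stored aggregate, and simple metadata."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.per_year: Counter = Counter()
        self.metadata: Dict[str, object] = {}

    def load(self, sources: Dict[str, List[Record]]) -> None:
        """Integration step: copy every record and precompute aggregates."""
        self.records = [
            dict(row, source=name)  # dict(...) makes a copy of each record
            for name, rows in sources.items()
            for row in rows
        ]
        self.per_year = Counter(r["year"] for r in self.records)
        self.metadata = {
            "sources": sorted(sources),
            "record_count": len(self.records),
        }

    def count_for_year(self, year: int) -> int:
        """Answered from the stored aggregate, without touching any source."""
        return self.per_year[year]


warehouse = Warehouse()
warehouse.load(SOURCES)
SOURCES.clear()  # simulate the sources becoming unavailable
count_2000 = warehouse.count_for_year(2000)
```

The trade-off is visible in the last three lines: the query still succeeds after the sources disappear, but the answer reflects the data as of the last `load`, not the current state of the sources.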

  5. Web Searching Practice
     • Approx. 800 million indexable Web pages (Feb. 1999)
     • Low coverage of the Web
       • No engine indexes more than 16% of indexable Web pages
     • Out of date
       • New pages take months to be indexed
     • Low metadata use
       • 34% use “keywords” or “description” metatags
       • 0.3% use the Dublin Core metadata standard
     • Simple queries
       • Most queries use 1-3 search words
     • Poor relevancy ranking and precision

  6. Metasearch Engines
     • USA
       • SavvySearch (www.savvysearch.com)
       • MetaCrawler (www.go2net.com/search.html)
       • Ask Jeeves (www.askjeeves.com)
       • ProFusion (www.profusion.com)
       • Mamma (www.mamma.com)
       • Ixquick (www.ixquick.com)
     • Korea
       • Wakano (www.wakano.co.kr)
       • Ms. DaChanni (www.mochanni.com)
     • Over 3000 metasearch engines around the world

  7. Operation Flow and Technical Issues
     [Flow diagram; steps in order:]
     • User query
     • Decompose and format queries
     • Send queries and get results
     • Post processing (ranking, clustering, etc.)
     • Output result
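The operation flow on this slide can be sketched end to end as below. The engines, their query syntaxes, and their result lists are stand-ins invented for illustration. The slides do not say how results are merged, so the post-processing step here uses a simple reciprocal-rank sum (each engine contributes 1/rank to a URL's score), chosen only because it is short.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def format_plus(terms: List[str]) -> str:
    """Hypothetical native syntax for engine one: terms joined with '+'."""
    return "+".join(terms)


def format_and(terms: List[str]) -> str:
    """Hypothetical native syntax for engine two: terms joined with AND."""
    return " AND ".join(terms)


def engine_one(native_query: str) -> List[str]:
    """Stand-in for a remote engine; returns a fixed ranked list."""
    return ["http://a.example/", "http://b.example/", "http://c.example/"]


def engine_two(native_query: str) -> List[str]:
    """Stand-in for a second remote engine."""
    return ["http://b.example/", "http://d.example/"]


Formatter = Callable[[List[str]], str]
Engine = Callable[[str], List[str]]
ENGINES: List[Tuple[Formatter, Engine]] = [
    (format_plus, engine_one),
    (format_and, engine_two),
]


def metasearch(user_query: str) -> List[str]:
    # Step 1: decompose the user query into terms.
    terms = user_query.lower().split()
    scores: Dict[str, float] = defaultdict(float)
    for formatter, engine in ENGINES:
        # Step 2: format the terms in each engine's own syntax.
        native_query = formatter(terms)
        # Step 3: send the query and collect the ranked results.
        for rank, url in enumerate(engine(native_query), start=1):
            # Step 4: post-process by summing reciprocal ranks per URL.
            scores[url] += 1.0 / rank
    # Step 5: output one merged list, best score first (URL breaks ties).
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("digital libraries")
```

With these stand-in result lists, `http://b.example/` is ranked first because it is the only URL returned by both engines. Duplicate removal falls out of keying the scores by URL.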

  8. Current Practice of Metasearch Engines
     • Tend toward a least-common-denominator interface
     • Do not fully utilize the functions of individual sources
     • Cover general areas, not a specific area
     • Little utilization of domain knowledge
     • Little consideration of personal profiles

  9. Proposed Research Topics (1)
     • Theme: focused on mediator-based integration techniques (in particular, metasearch engines)
     • Intelligent wrapper techniques
       • To extract, combine, and reconcile information from external sources
       • Exploit user profiles and utilize the functions of each source as much as possible
       • Should be flexible and adaptable as external sources change
       • Several approaches
         • Formal language based, machine learning based, heuristic based, extended CFG based, …

  10. Proposed Research Topics (2)
     • Efficiency issues
       • How to cache results and queries to provide a fast response to users
       • How to exploit parallelism when accessing external sources
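The slide poses these two efficiency issues as open questions. One straightforward baseline, offered only as a sketch and not as the author's proposal, is to cache merged results under a normalized query string and to contact all sources concurrently with a thread pool. The sources below are hypothetical stand-ins that sleep briefly to imitate network latency, and `stats` exists only to make the cache behavior observable.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Source = Callable[[str], List[str]]


def slow_source(name: str, delay: float) -> Source:
    """Build a stand-in source that waits `delay` seconds, then answers."""

    def search(query: str) -> List[str]:
        time.sleep(delay)  # imitates network latency
        return [f"{name}:{query}"]

    return search


SOURCES: List[Source] = [
    slow_source("s1", 0.05),
    slow_source("s2", 0.05),
    slow_source("s3", 0.05),
]

_cache: Dict[str, List[str]] = {}
stats = {"fanouts": 0}  # number of times the sources were actually contacted


def cached_parallel_search(query: str) -> List[str]:
    # Normalize case and whitespace so equivalent queries share a cache entry.
    key = " ".join(query.lower().split())
    if key in _cache:
        return list(_cache[key])

    stats["fanouts"] += 1
    # Contact all sources at once; map() returns results in source order.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source = list(pool.map(lambda source: source(key), SOURCES))

    merged = [hit for hits in per_source for hit in hits]
    _cache[key] = merged
    return list(merged)  # hand out a copy so callers cannot alter the cache


first = cached_parallel_search("digital libraries")
second = cached_parallel_search("  Digital   Libraries ")
```

Threads suit this case because the work is waiting on remote sources rather than computing. The sketch leaves out cache expiry and per-source timeouts, which a real system would need since cached answers go stale and a single slow source would otherwise delay the whole response.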

  11. Research/Development Strategies
     • Categorize objects and develop a specialized search mechanism for each category
     • Build a working system to experiment with theories
     • Experiment with new ranking methods
       • Google, Goto, …
