
A Knowledge-Based Search Engine Powered by Wikipedia

A Knowledge-Based Search Engine Powered by Wikipedia. David Milne, Ian H. Witten, David M. Nichols (CIKM 2007). Introduction. Whenever we seek out new knowledge — how can one describe the unknown? What knowledge seekers need is a bridge between what they know and what they wish to know.


Presentation Transcript


  1. A Knowledge-Based Search Engine Powered by Wikipedia. David Milne, Ian H. Witten, David M. Nichols (CIKM 2007)

  2. Introduction • Whenever we seek out new knowledge — how can one describe the unknown? • What knowledge seekers need is a bridge between what they know and what they wish to know. • Knowledge seekers can benefit from a thesaurus that covers the terminology of both documents and potential queries, and describes relations that bridge between them.

  3. KORU

  4. Creating A Relevant Knowledge Base • To work well, Koru relies on a large and comprehensive thesaurus. • Our own technique automatically extracts thesauri from a huge manually defined information structure. • From Wikipedia, we derive a thesaurus that is specific to each particular document collection.

  5. Creating A Relevant Knowledge Base • The basic idea is to use Wikipedia’s articles as building blocks for the thesaurus. • Each article describes a single concept. • Concepts are often referred to by multiple terms and Wikipedia handles these using “redirects”.
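
The redirect idea on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the article titles, the redirect table, and the function name `build_thesaurus` are hypothetical toy values, not Koru's actual code or data.

```python
from collections import defaultdict

# Hypothetical toy data: redirect title -> title of the article it points to.
REDIRECTS = {
    "Automobile": "Car",
    "Motor car": "Car",
    "Kiwifruit": "Kiwi (fruit)",
}
# Hypothetical set of article titles; each article stands for one concept.
ARTICLES = {"Car", "Kiwi (fruit)"}


def build_thesaurus(articles, redirects):
    """Map each concept (article title) to the set of terms that name it."""
    synonyms = defaultdict(set)
    for title in articles:
        # The article's own title is always a term for the concept.
        synonyms[title].add(title)
    for term, target in redirects.items():
        # A redirect contributes a synonym only if its target is a known article.
        if target in articles:
            synonyms[target].add(term)
    return dict(synonyms)


thesaurus = build_thesaurus(ARTICLES, REDIRECTS)
```

Here `thesaurus["Car"]` holds the article title together with both redirect terms, which is the synonymy relation the slide describes.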

  6. Measuring semantic relatedness • Semantic relatedness concerns the strength of the relations between concepts. • The measure that we use quantifies the strength of the relation between two Wikipedia articles by weighting and comparing the links found within them.

  7. Measuring semantic relatedness • Links are weighted by their probability of occurrence. • Links are less significant for judging the similarity between articles if many other articles also link to the same target. • We simply sum the weights of the links that are common to both articles.
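
A minimal sketch of the measure described on this slide follows. The slide only says that links are weighted by their probability of occurrence and that the weights of shared links are summed; the specific weight used here, the log of the inverse link probability, is an assumption, as are the function names and the toy counts.

```python
import math


def link_weight(target, inlink_counts, total_articles):
    """Weight a link target: the more articles link to it, the lower the weight.

    Assumed form: log of the inverse probability that an article links to it.
    """
    return math.log(total_articles / inlink_counts[target])


def relatedness(links_a, links_b, inlink_counts, total_articles):
    """Sum the weights of the link targets that both articles share."""
    common = set(links_a) & set(links_b)
    return sum(link_weight(t, inlink_counts, total_articles) for t in common)


# Hypothetical toy numbers: how many of 1000 articles link to each target.
inlink_counts = {"Science": 500, "Thesaurus": 10, "Library": 100}
score = relatedness(
    ["Science", "Thesaurus"],
    ["Science", "Thesaurus", "Library"],
    inlink_counts,
    total_articles=1000,
)
```

In this toy case the shared link to "Thesaurus" contributes far more to `score` than the shared link to "Science", because many other articles also link to "Science".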

  8. Disambiguating unrestricted text • To identify the concepts relevant to a particular document collection, we work through each document in turn, identifying the significant terms and matching them to individual Wikipedia articles. • Disambiguate each term using the context surrounding it.
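
One way to picture context-based disambiguation is to pick the candidate article that is, on average, most related to the concepts found around the term. The slide does not spell out the selection rule, so the averaging below is an assumption, and `disambiguate`, `toy_relatedness`, and the scores are hypothetical.

```python
def disambiguate(candidates, context, relatedness):
    """Return the candidate article most related, on average, to the context."""
    def average_score(candidate):
        if not context:
            return 0.0
        return sum(relatedness(candidate, c) for c in context) / len(context)

    return max(candidates, key=average_score)


# Hypothetical relatedness scores between candidate senses and context concepts.
TOY_SCORES = {
    ("Kiwi (bird)", "Bird"): 0.9,
    ("Kiwi (bird)", "New Zealand"): 0.7,
    ("Kiwi (fruit)", "Bird"): 0.1,
    ("Kiwi (fruit)", "New Zealand"): 0.5,
}


def toy_relatedness(a, b):
    """Look up a toy score; unknown pairs are treated as unrelated."""
    return TOY_SCORES.get((a, b), 0.0)


best = disambiguate(
    ["Kiwi (bird)", "Kiwi (fruit)"],
    ["Bird", "New Zealand"],
    toy_relatedness,
)
```

With this context the bird sense wins, since its average score (0.8) is higher than that of the fruit sense (0.3).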

  9. Identifying relations between concepts • Wikipedia contains many more links than the redirects we use to identify synonymy. • We gather all relations from article and category links, but weight them so that only the strongest are emphasized.

  10. Weighting topics, occurrences and relations • Every occurrence of every topic is weighted within the thesaurus. • 1. tf-idf: A significant topic for a document should occur many times. • 2. Average semantic relatedness measure: A significant topic should relate strongly to other topics in the document.
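
The two weighting signals on this slide can be sketched separately. The tf-idf form below is the common textbook variant (count times log inverse document frequency) and is an assumption about the exact formula; how the two scores are combined is not stated on the slide, so the sketch leaves them apart. All names and numbers are hypothetical.

```python
import math


def tf_idf(topic_count, docs_with_topic, total_docs):
    """Textbook tf-idf: frequent in this document, rare across the collection."""
    return topic_count * math.log(total_docs / docs_with_topic)


def average_relatedness(topic, doc_topics, relatedness):
    """Mean relatedness between a topic and the other topics in the document."""
    others = [t for t in doc_topics if t != topic]
    if not others:
        return 0.0
    return sum(relatedness(topic, t) for t in others) / len(others)


# Hypothetical example: a topic seen 3 times, present in 10 of 1000 documents.
frequency_score = tf_idf(3, 10, 1000)

# Hypothetical relatedness function that treats every pair as equally related.
cohesion_score = average_relatedness(
    "Thesaurus", ["Thesaurus", "Library", "Index"], lambda a, b: 0.5
)
```

A topic that scores well on both signals occurs often in the document and fits the document's other topics.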

  11. KORU in action

  12. KORU in action

  13. KORU in action

  14. Evaluation • Two versions: Topic browsing (uses the thesaurus) and keyword searching (does not use the thesaurus) • 12 participants, 10 tasks

  15. Evaluation

  16. Evaluation

  17. Evaluation

  18. Evaluation
