Ranking Definitions with Supervised Learning Methods

Ranking Definitions with Supervised Learning Methods. J. Xu, Y. Cao, H. Li and M. Zhao, WWW 2005. Presenter: Baoning Wu. Motivation: people may need to find definitions of terms on the Web, but traditional information retrieval is designed to search for relevant documents and is not suited to this task.

Presentation Transcript


  1. Ranking Definitions with Supervised Learning Methods • J. Xu, Y. Cao, H. Li and M. Zhao • WWW 2005 • Presenter: Baoning Wu

  2. Motivation • People may need to find definitions of terms on the Web. • Traditional information retrieval is designed to search for relevant documents and is not suited to this task. • Google’s definition search may suffer from its reliance on glossary pages and from ranking definitions in alphabetical order.

  3. Task for definition search • Receive a query term, usually a noun. • Extract definition candidates from the document collection. • Rank the candidates by how good each one is as a definition. • Output the ranked result.

  4. Definition search is useful

  5. Candidates are not all good definitions

  6. Three categories of definitions • Good: must contain the general notion of the term and several of its important properties. • Bad: describes neither the general notion nor the properties of the term. • Indifferent: between good and bad.

  7. First step: collecting candidates • Parse all sentences with a Base NP (base noun phrase) parser and identify <term> with the following rules: • <term> is the first Base NP of the first sentence. • Two Base NPs separated by “of” or “for” are treated together as <term>. • Extract definition candidates with the patterns: • <term> is a|an|the * • <term>, *, a|an|the * • <term> is one of *
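A minimal sketch of the pattern-matching step, assuming the Base NP parser has already identified <term> and that it equals the query term, so <term> can be matched literally at the start of a sentence; `*` is read as "any non-empty text". The function names `definition_patterns` and `extract_candidates` are hypothetical, not from the paper.

```python
import re


def definition_patterns(term: str) -> list[re.Pattern]:
    """Compile the three candidate patterns with <term> fixed to the query.

    Assumption: a literal, case-insensitive match of the term at sentence start
    stands in for the Base NP parser; '*' becomes '.+'.
    """
    t = re.escape(term)
    return [
        re.compile(rf"^{t} is (?:a|an|the) .+", re.IGNORECASE),   # <term> is a|an|the *
        re.compile(rf"^{t}, .+, (?:a|an|the) .+", re.IGNORECASE),  # <term>, *, a|an|the *
        re.compile(rf"^{t} is one of .+", re.IGNORECASE),          # <term> is one of *
    ]


def extract_candidates(term: str, sentences: list[str]) -> list[str]:
    """Return the sentences that match at least one definition pattern."""
    patterns = definition_patterns(term)
    return [s for s in sentences if any(p.match(s.strip()) for p in patterns)]


sents = [
    "SVM is a supervised learning method for classification.",
    "SVM, introduced by Vapnik, a maximum-margin classifier, is widely used.",
    "SVM is one of the most popular classifiers.",
    "SVM was used in our experiments.",
    "We trained an SVM.",
]
print(extract_candidates("SVM", sents))  # the first three sentences
```

A real system would anchor the patterns on the parser's <term> rather than on the raw query string, which also handles the "of"/"for" compound terms.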

  8. Second step: ranking candidates • Ranking based on ordinal regression (ordinal classification), using Ranking SVM. • Ranking based on binary classification, using SVM.

  9. Ranking based on Ordinal Regression • Ordinal regression is a problem in which a classifier assigns instances to a number of ordered categories. • Ranking SVM is used as the model. • For each candidate x, $U(x) = w^{T}x$, where w is a vector of weights. • The higher U(x), the better x is as a definition.
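A minimal Ranking SVM sketch under stated assumptions: ordinal labels good=2, indifferent=1, bad=0; all candidates belong to one query term; and the objective (an L2 penalty plus a pairwise hinge loss on x_i - x_j for every pair with y_i > y_j) is minimised by plain subgradient descent rather than by a dedicated solver. Function names, toy features and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical.

```python
from itertools import combinations


def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Learn w so that U(x) = w^T x orders candidates by ordinal label.

    Assumptions (illustrative only): labels are good=2, indifferent=1, bad=0;
    all rows of X are candidates for the same query term; the objective
    (lam/2)||w||^2 + sum of max(0, 1 - w^T (x_i - x_j)) is minimised by
    plain subgradient descent.
    """
    # One difference vector per pair whose labels differ, oriented better - worse.
    diffs = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue
        better, worse = (i, j) if y[i] > y[j] else (j, i)
        diffs.append([a - b for a, b in zip(X[better], X[worse])])

    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [lam * wk for wk in w]  # gradient of the L2 penalty
        for d in diffs:
            if dot(w, d) < 1.0:  # pair still inside the margin: hinge is active
                grad = [g - dk for g, dk in zip(grad, d)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


def rank(X, w):
    """Candidate indices sorted by U(x) = w^T x, best first."""
    return sorted(range(len(X)), key=lambda i: dot(w, X[i]), reverse=True)


# Toy feature vectors; the features themselves are made up for illustration.
X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.2]]
y = [0, 2, 1]  # bad, good, indifferent
w = train_ranking_svm(X, y)
print(rank(X, w))  # [1, 2, 0]
```

Only pairs with different labels generate constraints, so indifferent candidates still contribute: they must score above bad ones and below good ones.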

  10. Ranking based on Classification • Only good and bad definitions are used, so the task becomes binary classification. • SVM is used as the model. • $F(x) = w^{T}x + b$
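A sketch of the classification variant, assuming indifferent candidates are dropped from training, good and bad are mapped to +1 and -1, and a linear SVM F(x) = w^T x + b is fitted by subgradient descent on the averaged hinge loss. All candidates, including indifferent ones, are then ranked by F(x). The solver, hyperparameters, toy features and helper names are illustrative assumptions.

```python
def dot(w, x):
    """Inner product w^T x for two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def to_binary(X, labels):
    """Drop indifferent candidates; map good -> +1 and bad -> -1."""
    pairs = [(x, 1 if lab == "good" else -1)
             for x, lab in zip(X, labels) if lab != "indifferent"]
    return [x for x, _ in pairs], [t for _, t in pairs]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit F(x) = w^T x + b by subgradient descent on
    (lam/2)||w||^2 + mean of max(0, 1 - y_i * F(x_i)).
    Solver and hyperparameters are illustrative assumptions."""
    n = len(X)
    w = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        gw = [lam * wk for wk in w]  # gradient of the L2 penalty (bias is not penalised)
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + bias) < 1.0:  # margin violated: hinge is active
                gw = [g - yi * xk / n for g, xk in zip(gw, xi)]
                gb -= yi / n
        w = [wk - lr * g for wk, g in zip(w, gw)]
        bias -= lr * gb
    return w, bias


def score(x, w, bias):
    """F(x) = w^T x + b; higher means more likely a good definition."""
    return dot(w, x) + bias


X = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9]]
labels = ["good", "good", "indifferent", "bad", "bad"]
Xb, yb = to_binary(X, labels)
w, bias = train_linear_svm(Xb, yb)
# Rank every candidate, including the indifferent one, by F(x).
ranked = sorted(X, key=lambda x: score(x, w, bias), reverse=True)
```

Unlike the Ranking SVM, this variant discards the indifferent examples during training, which is the trade-off the two approaches on the slides differ in.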

  11. Features

  12. Removing redundant candidates • After ranking, duplicate definitions may remain. • Edit distance is used to detect near-duplicate candidates, and the one with the lower ranking score is removed.
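A sketch of the deduplication step, assuming the candidates arrive already sorted best-first, so keeping the first of any near-duplicate pair discards the lower-scored one. The slides give no threshold; normalising the Levenshtein distance by the longer string's length and using 0.2 as the cutoff is a hypothetical choice, as are the function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def remove_redundant(ranked: list[str], max_ratio: float = 0.2) -> list[str]:
    """Drop near-duplicates from a list already sorted by ranking score, best first.

    Keeping the first occurrence removes the lower-scored member of each
    near-duplicate pair. The normalised threshold max_ratio is an assumption.
    """
    kept: list[str] = []
    for cand in ranked:
        if all(edit_distance(cand, k) / max(len(cand), len(k), 1) > max_ratio
               for k in kept):
            kept.append(cand)
    return kept


ranked = ["SVM is a classifier.", "SVM is a classifier!", "SVM is one of the learners."]
print(remove_redundant(ranked))  # ['SVM is a classifier.', 'SVM is one of the learners.']
```

The comparison is quadratic in the number of candidates, which is acceptable for the short per-term candidate lists involved here.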

  13. Sample result

  14. Evaluation metric

  15. Results: For intranet data

  16. Results: For TREC.gov data

  17. Results: For definitional sentences

  18. Conclusions • Addresses the problem of definition search by ranking definition candidates. • Results are better than those of traditional IR methods. • An enterprise search system has been developed. • The approach is not limited to definition search.
