A semantic approach for question classification using WordNet and Wikipedia

Presenter: Cheng-Hui Chen. Authors: Santosh Kumar Ray, Shailendra Singh, B.P. Joshi. PRL, 2010. Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments.


Presentation Transcript


  1. A semantic approach for question classification using WordNet and Wikipedia Presenter: Cheng-Hui Chen Authors: Santosh Kumar Ray, Shailendra Singh, B.P. Joshi PRL, 2010

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation The question classification module of a Question Answering System plays a very important role. Web pages retrieved by search engines do not provide precise information and may contain irrelevant information even in top-ranked results. Moldovan et al. (2003) showed that 36.4% of the errors were due to incorrect question classification.

  4. Objectives • Propose a question classification method that exploits the powerful semantic features of WordNet and the vast knowledge repository of Wikipedia to describe informative terms explicitly. • Provide answers to user queries in succinct form.

  5. Methodology • Question classification algorithm to classify questions using WordNet and Wikipedia. • Detail • Question database collection • Identification of question patterns • Question classification algorithm

  6. Question database collection The question database consists of 5500 training and 500 test questions collected from English questions published by USC. All questions in the dataset have been manually labeled by Li and Roth according to coarse and fine-grained categories.

  7. Identification of question patterns

  8. Identification of question patterns

  9. Question classification algorithm • If any of the question patterns matches the given question, its entity type is determined using algorithm QC (question classification). • Figure annotations: “Where is my dog?” (Location label); “I don't know the man.” (delete “do” and return “the man”).

  10. Question classification algorithm • Takes a string as input and calls the procedure online to determine the expected entity type. • Figure annotation: “The man” (Human, Vehicle).

  11. Question classification algorithm • The procedure online takes a string as input and uses online resources, Wikipedia and WordNet, to determine the type of the expected entity. • It was observed that a typical article in Wikipedia starts like “. . . X is a Y, Z, . . .” • Y, Z, etc. are synonyms, hypernyms, hyponyms or some other semantically related terms for X, and these are considered to be possible entity types. • If a sentence written in Wikipedia is “X is Y, Z, . . .”, the procedure online takes Y, Z, . . . as possible entity types of X. • Figure annotations: Vehicle, Human, Location (TE1); Human, Individual, Vehicle (TE2); Human, Vehicle (C).
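
The “X is a Y, Z, . . .” heuristic and the TE1/TE2/C annotations on slide 11 can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the regular expression, the function names `candidate_types` and `common_types`, and the sample sentence are all assumptions, and reading C as the terms common to TE1 and TE2 is an inference from the example values on the slide. No Wikipedia or WordNet call is made; the sentence is passed in as plain text.

```python
import re


def candidate_types(first_sentence: str) -> list[str]:
    """Extract Y, Z, ... from a sentence shaped like 'X is a Y, Z, ...'.

    Hypothetical helper: the slides do not say how the sentence is parsed.
    """
    match = re.search(r"\bis\s+(?:an?\s+|the\s+)?(.+?)(?:\.|$)", first_sentence)
    if not match:
        return []
    # Split the complement on commas and on the words "or" / "and".
    parts = re.split(r",|\bor\b|\band\b", match.group(1))
    return [p.strip().lower() for p in parts if p.strip()]


def common_types(te1: list[str], te2: list[str]) -> list[str]:
    """Keep the terms that occur in both lists, in the order of te2.

    Assumption: C on the slide is the overlap of TE1 and TE2.
    """
    seen = set(te1)
    return [t for t in te2 if t in seen]


# Values taken from the slide's figure annotations.
te1 = ["vehicle", "human", "location"]
te2 = ["human", "individual", "vehicle"]
print(common_types(te1, te2))

# Made-up sentence, only to exercise the pattern.
print(candidate_types("A dog is a mammal, pet, or animal."))
```

With the slide's values, the overlap of TE1 and TE2 reproduces the two terms shown for C (Human, Vehicle).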

  12. An application of question classification: answer validation • The question: “In what year did Arundhati Roy receive a Booker Prize?” • Similarity computation • Similarity score • The question contains five tokens: “a Number”, “Arundhati Roy”, “received”, “Booker”, “Prize”. • If a candidate answer sentence, when parsed, contains two of these five tokens, it has a similarity score of 0.4. • The expanded query: “In what year did (“Arundhati Roy” OR Arundhati) (Receive OR Get) Booker (Prize OR Award)?” • The passage retrieval phase returns the top 10 answer sentences. Five of these 10 answer sentences reached the required similarity score.
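
The similarity score on slide 12 reads as the fraction of question tokens found in a candidate sentence (2 of 5 gives 0.4). A minimal sketch under that reading is shown below. The function name `similarity_score`, the case-insensitive substring matching, and the sample sentence are assumptions; the slides do not state how tokens are matched or what the required score is.

```python
def similarity_score(question_tokens: list[str], candidate_sentence: str) -> float:
    """Fraction of question tokens that occur in the candidate sentence.

    Matching is a simple case-insensitive substring test (an assumption).
    """
    sentence = candidate_sentence.lower()
    matched = sum(1 for token in question_tokens if token.lower() in sentence)
    return matched / len(question_tokens)


# The five tokens listed on the slide.
tokens = ["a Number", "Arundhati Roy", "received", "Booker", "Prize"]

# Made-up candidate sentence containing two of the five tokens.
print(similarity_score(tokens, "Arundhati Roy received a warm welcome."))
```

The sample sentence matches “Arundhati Roy” and “received” only, so it scores 2/5.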

  13. An application of question classification: answer validation • Entity type • The question classification module computes “date” as the expected entity type for this question. • Considering a date to be a number (optionally with a month name or the word “year”), four candidate answer sentences containing some number were sent to the next stage for further processing.
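
The entity-type check on slide 13 treats a date as a number, so any candidate sentence containing a number passes. A rough sketch of such a filter follows. The regular expression, the function name `filter_by_date_type`, and the sample sentences are assumptions made for illustration, not the authors' implementation.

```python
import re

# A run of digits, optionally with thousands separators (an assumed pattern).
NUMBER = re.compile(r"\d[\d,]*")


def filter_by_date_type(sentences: list[str]) -> list[str]:
    """Keep candidate sentences that contain some number."""
    return [s for s in sentences if NUMBER.search(s)]


# Made-up candidate sentences; the values 1997 and £20,000 come from slide 14.
candidates = [
    "She won the prize in 1997.",
    "The award was worth £20,000.",
    "She is a novelist.",
]
print(filter_by_date_type(candidates))
```

A filter this loose also lets a money amount such as “£20,000” through, which matches the slides: that candidate is only rejected at the later web validation stage.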

  14. An application of question classification: answer validation • World Wide Web validation • Four candidates passed the first two tests. Three contained “1997” as the answer and the fourth returned “£20,000”. Only the first answer (1997) was validated by the topmost documents returned by Google. • Hence, the three candidate answer sentences containing this answer were validated as correct answers.
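
The web validation step on slide 14 might be approximated as below. This is only a sketch under a strong assumption: an answer counts as validated when it appears in the text of at least one top-ranked document. The slides do not define the actual criterion. The function name `validated_answers` and the document strings are hypothetical, and no search engine is called; the top documents are passed in as plain strings.

```python
def validated_answers(answers: list[str], top_documents: list[str]) -> list[str]:
    """Return the answers that occur in at least one top-ranked document.

    Assumed criterion; the original validation rule is not given in the slides.
    """
    return [
        answer
        for answer in answers
        if any(answer in document for document in top_documents)
    ]


# The two distinct answers mentioned on the slide.
answers = ["1997", "£20,000"]

# Placeholder texts standing in for the topmost retrieved documents.
documents = [
    "Arundhati Roy won the Booker Prize in 1997.",
    "The 1997 Booker Prize went to Arundhati Roy.",
]
print(validated_answers(answers, documents))
```

Every candidate sentence carrying a validated answer would then be accepted, as the slide describes for the three sentences containing 1997.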

  15. Experiments (QC algorithm)

  16. Experiments (Answer validation) • Sources • TREC (Text REtrieval Conference) • WorldBook (The World Book) • Worldfactbook (CIA The World Factbook) • Other standard resources.

  17. Experiments (Answer validation)

  18. Conclusions • Question classification algorithm with high accuracy. • The proposed method seems to be promising for question classification in the field of open-domain question answering. • The proposed method combines the World Wide Web with Natural Language Processing (NLP) techniques.

  19. Comments • Advantages • The distinctive points of the algorithm lie in its dynamic and extensible properties. • The proposed method is promising for question classification. • Shortcomings • It has a few limitations. • Applications • Information retrieval
