
  1. Learn Question Focus and Dependency Relations from Web Search Results for Question Classification Wen-Hsiang Lu (盧文祥) whlu@mail.ncku.edu.tw Web Mining and Multilingual Knowledge System Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University WMMKS Lab

  2. Research Interest • Web Mining • Natural Language Processing • Information Retrieval

  3. Research Issues • Unknown Term Translation & Cross-Language Information Retrieval • A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results • Question Answering & Machine Translation • Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification • Using Phrase and Fluency to Improve Statistical Machine Translation • User Modeling & Web Search • Learning Question Structure based on Website Link Structure to Improve Natural Language Search • Improving Short-Query Web Search based on User Goal Identification • Cross-Language Medical Information Retrieval • MMODE: http://mmode.no-ip.org/

  4. Jacob's syndrome (雅各氏症候群)

  5. Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work

  6. Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work

  7. Question Answering (QA) System 1. Question Analysis: Question Classification, Keyword Extraction. 2. Document Retrieval: Retrieve related documents. 3. Answer Extraction: Extract an exact answer.
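The three-stage pipeline above can be sketched as follows. The function names and the toy keyword-matching logic are illustrative assumptions, not the system described in these slides.

```python
# Minimal sketch of a three-stage QA pipeline (question analysis,
# document retrieval, answer extraction). All logic here is a toy
# placeholder standing in for the real components.

def classify_question(question):
    """Stage 1: question analysis -- assign a coarse question type."""
    first = question.strip().lower().split()[0]
    return {"who": "Person", "when": "Date", "where": "Location"}.get(first, "Artifact")

def retrieve_documents(keywords, corpus):
    """Stage 2: return documents containing any of the keywords."""
    return [doc for doc in corpus if any(k in doc for k in keywords)]

def extract_answer(documents):
    """Stage 3: pick a candidate answer (here: just the first document)."""
    return documents[0] if documents else None
```

A real answer extractor would match candidates against the predicted question type, which is why stage 1 matters so much.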

  8. Motivation (1/3) • Importance of Question Classification • Dan Moldovan reported on the impact of question classification in QA [Moldovan 2000] WMMKS Lab

  9. Motivation (2/3) • Rule-based Question Classification • Rules are written manually, which is labor-intensive and hard to scale. • Machine Learning-based Question Classification • Support Vector Machine (SVM) • Needs a large amount of training data. • Too many features may introduce noise. WMMKS Lab

  10. Motivation (3/3) • A new method for question classification. • Observe useful features of questions. • Solve the problem of insufficient training data. WMMKS Lab

  11. Idea of Approach (1/4) • Many questions have ambiguous question words • Importance of the Question Focus (QF). • Use QF identification for question classification. WMMKS Lab

  12. Idea of Approach (2/4) • If the QF alone does not give enough information to identify the question type, use dependency features: Dependency Verb, Dependency Quantifier, and Dependency Noun. [Diagram: Question → QF + dependency features → Question Type, via the (Unigram) and (Bigram) Semantic Dependency Relation models] WMMKS Lab

  13. Idea of Approach (3/4) • Example WMMKS Lab

  14. Idea of Approach (4/4) • Use QF and dependency features to classify questions. • Learn QF and other dependency features from the Web. • Propose a Semantic Dependency Relation Model (SDRM). WMMKS Lab

  15. Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work WMMKS Lab

  16. Rule-based Question Classification • [Richard F. E. Sutcliffe 2005][Kui-Lam Kwok 2005][Ellen Riloff 2000] • 5W (Who, When, Where, What, Why): Who → Person. When → Time. Where → Location. What → difficult (ambiguous) type. Why → Reason. WMMKS Lab
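The 5W mapping above can be sketched as a lookup table. The helper name is hypothetical, and this is only the 5W subset, not the 17-rule set the authors define later.

```python
# The 5W mapping as a simple lookup. "What" is deliberately absent:
# its answer type is ambiguous and needs deeper analysis, which is
# exactly the gap the QF-based approach targets.
FIVE_W_RULES = {
    "who": "Person",
    "when": "Time",
    "where": "Location",
    "why": "Reason",
}

def rule_based_type(question):
    """Return the answer type for a 5W question, or None if ambiguous."""
    first_word = question.strip().lower().split()[0]
    return FIVE_W_RULES.get(first_word)
```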

  17. Machine Learning-based Question Classification • Several methods based on SVM. • [Zhang, 2003; Suzuki, 2003; Day, 2005] [Diagram: Question → Feature Vector → SVM (KDAG kernel) → Question Type] WMMKS Lab

  18. Web-based Question Classification • Use a Web search engine to identify the question type. [Solorio, 2004] • “Who is the President of the French Republic?” WMMKS Lab

  19. Statistics-based Question Classification • Language Model for Question Classification [Li, 2002] • Too many features may introduce noise. WMMKS Lab

  20. Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work WMMKS Lab

  21. Architecture of Question Classification WMMKS Lab

  22. Question Type • 6 types of questions • Person • Location • Organization • Number • Date • Artifact WMMKS Lab

  23. Basic Classification Rules • We define 17 basic rules for simple questions. WMMKS Lab

  24. Learning Semantic Dependency Features (1/3) • Architecture for Learning Dependency Features • Extracting Dependency Features Algorithm WMMKS Lab

  25. Learning Semantic Dependency Features (2/3) • Architecture for Learning Dependency Features WMMKS Lab

  26. Learning Semantic Dependency Features (3/3) • Extracting Dependency Features Algorithm WMMKS Lab

  27. Question Focus Identification Algorithm (1/2) • Algorithm WMMKS Lab

  28. Question Focus Identification Algorithm (2/2) • Example WMMKS Lab

  29. Semantic Dependency Relation Model (SDRM) (1/12) • Unigram-SDRM • Bigram-SDRM WMMKS Lab

  30. Semantic Dependency Relation Model (SDRM) (2/12) • Unigram-SDRM [Diagram: Question Q → Question Type C, modeled directly as P(C|Q)] • Estimating P(C|Q) directly requires many training questions. WMMKS Lab

  31. Semantic Dependency Relation Model (SDRM) (3/12) • Unigram-SDRM [Diagram: Question Type C → Web search results DC → Question Q, via P(DC|C) and P(Q|DC)] • P(DC|C): collect related search results for every question type. • P(Q|DC): use DC to determine the question type. WMMKS Lab

  32. Semantic Dependency Relation Model (SDRM) (4/12) • Unigram-SDRM WMMKS Lab

  33. Semantic Dependency Relation Model (SDRM) (5/12) DV: Dependency Verb DQ: Dependency Quantifier DN: Dependency Noun • Unigram-SDRM • Q = {QF, QD}, QD = {DV, DQ, DN}. WMMKS Lab

  34. Semantic Dependency Relation Model (SDRM) (6/12) • Unigram-SDRM • DV = {dv1, dv2, …, dvi}, DQ = {dq1, dq2, …, dqj}, DN = {dn1, dn2, …, dnk}. WMMKS Lab
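Under the decomposition Q = {QF, DV, DQ, DN}, with the features assumed conditionally independent given the class-specific search-result collection DC, the Unigram-SDRM score can be sketched as below. The probabilities are toy values, not trained parameters from the slides.

```python
import math

# Sketch of Unigram-SDRM scoring in log space:
#   log P(DC|C) + log P(QF|DC) + sum over dependency features d
#   (verbs, quantifiers, nouns) of log P(d|DC).
# The class with the highest score would be chosen as the question type.

def unigram_sdrm_score(qf, deps, p_dc_given_c, p_feat_given_dc):
    score = math.log(p_dc_given_c) + math.log(p_feat_given_dc[qf])
    for d in deps:  # DV, DQ, and DN features, pooled together
        score += math.log(p_feat_given_dc[d])
    return score
```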

  35. Semantic Dependency Relation Model (SDRM) (7/12) • Parameter Estimation of Unigram-SDRM • P(DC|C) • P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC) • N(QF): the number of occurrences of the QF in Q. • NQF(DC): the total number of all QFs collected from the search results. WMMKS Lab

  36. Semantic Dependency Relation Model (SDRM) (8/12) • Parameter Estimation of Unigram-SDRM WMMKS Lab

  37. Semantic Dependency Relation Model (SDRM) (9/12) • Bigram-SDRM WMMKS Lab

  38. Semantic Dependency Relation Model (SDRM) (10/12) • Bigram-SDRM WMMKS Lab

  39. Semantic Dependency Relation Model (SDRM) (11/12) • Parameter Estimation of Bigram-SDRM • P(DC|C): the same as Unigram-SDRM • P(QF|DC): the same as Unigram-SDRM • P(dV|QF,DC), P(dQ|QF,DC), P(dN|QF,DC) • Nsentence(dv, QF): the number of sentences containing both dv and QF. • Nsentence(QF): the total number of sentences containing QF. WMMKS Lab
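The relative-frequency estimate on this slide can be sketched directly; the tokenized sentences used below are illustrative toy data, not the actual web search results.

```python
# Relative-frequency estimate from this slide:
#   P(dv | QF, DC) = N_sentence(dv, QF) / N_sentence(QF)
# counted over sentences collected from the class's web search results.

def bigram_param(dv, qf, sentences):
    """sentences: list of token lists from one class's search results."""
    n_qf = sum(1 for s in sentences if qf in s)
    if n_qf == 0:
        return 0.0  # QF never observed: no evidence for this class
    n_both = sum(1 for s in sentences if qf in s and dv in s)
    return n_both / n_qf
```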

  40. Semantic Dependency Relation Model (SDRM) (12/12) • Parameter Estimation of Bigram-SDRM WMMKS Lab

  41. Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work WMMKS Lab

  42. Experiment • SDRM Performance Evaluation • Unigram-SDRM vs. Bigram-SDRM • Combination with different weights • SDRM vs. Language Model • Use questions as training data • Use the Web as training data • Questions vs. Web WMMKS Lab

  43. Experimental Data • Collect questions from NTCIR-5 CLQA. • 4-fold cross-validation. WMMKS Lab
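The 4-fold cross-validation setup can be sketched as follows: the question set is split into four folds, and each fold serves once as the test set while the other three form the training set.

```python
# Sketch of k-fold cross-validation over a question set.
# Folds are taken by striding, so the split is deterministic here;
# a real evaluation would typically shuffle first.

def k_fold_splits(items, k=4):
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```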

  44. Unigram-SDRM vs. Bigram-SDRM (1/2) • Result WMMKS Lab

  45. Unigram-SDRM vs. Bigram-SDRM (2/2) • Example • For the unigram model: “人”, “創下”, “駕駛” are trained successfully. • For the bigram model: “人_創下” is not trained successfully. WMMKS Lab

  46. Combination with different weights (1/3) • Different weights for different features • α: the weight of QF, β: the weight of dV, • γ: the weight of dQ, δ: the weight of dN. WMMKS Lab

  47. Combination with different weights (2/3) • Comparison of the 4 dependency features WMMKS Lab

  48. Combination with different weights (3/3) • 16 experiments • Best weighting: 0.23 (QF), 0.29 (DV), 0.48 (DQ). • Weights are derived from each feature's individual result. • Example (QF and DV): α is the weight of QF, β is the weight of DV • α = (1-0.77) / [(1-0.77)+(1-0.71)] • β = (1-0.71) / [(1-0.77)+(1-0.71)] WMMKS Lab
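The two-feature formula on this slide (each weight is the feature's complement 1 - score, normalized so the weights sum to 1) can be reproduced as below; the assumption that 0.77 and 0.71 are the individual QF and DV scores comes from the slide's example.

```python
# Normalized-complement weighting, generalized from the slide's
# two-feature example:
#   w_i = (1 - s_i) / sum_j (1 - s_j)
# With scores 0.77 (QF) and 0.71 (DV) this reproduces
# alpha = 0.23/0.52 and beta = 0.29/0.52.

def normalized_complement_weights(scores):
    complements = [1.0 - s for s in scores]
    total = sum(complements)
    return [c / total for c in complements]
```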

  49. Use questions as training data (1/2) • Result WMMKS Lab

  50. Use questions as training data (2/2) • Example • For the LM: “網球選手”, “選手為” are not trained successfully. • For the SDRM: “選手”, “奪得” are trained successfully. WMMKS Lab
