Learn Question Focus and Dependency Relations from Web Search Results for Question Classification Wen-Hsiang Lu (盧文祥) whlu@mail.ncku.edu.tw Web Mining and Multilingual Knowledge System Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University WMMKS Lab
Research Interests • Web Mining • Natural Language Processing • Information Retrieval
Research Issues • Unknown Term Translation & Cross-Language Information Retrieval • A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results • Question Answering & Machine Translation • Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification • Using Phrase and Fluency to Improve Statistical Machine Translation • User Modeling & Web Search • Learning Question Structure based on Website Link Structure to Improve Natural Language Search • Improving Short-Query Web Search based on User Goal Identification • Cross-Language Medical Information Retrieval • MMODE: http://mmode.no-ip.org/
雅各氏症候群 (Jacobs syndrome)
Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work
Question Answering (QA) System 1. Question Analysis: Question Classification, Keyword Extraction. 2. Document Retrieval: Retrieve related documents. 3. Answer Extraction: Extract an exact answer.
Motivation (1/3) • Importance of Question Classification • Dan Moldovan presented a report on this [Dan Moldovan 2000]
Motivation (2/3) • Rule-based Question Classification • Manual and impractical method. • Machine Learning-based Question Classification • Support Vector Machine (SVM) - Needs a large amount of training data. - Too many features may introduce noise.
Motivation (3/3) • A new method for question classification. • Identify useful features of questions. • Solve the problem of insufficient training data.
Idea of Approach (1/4) • Many questions have ambiguous question words. • Importance of Question Focus (QF). • Use QF identification for question classification.
Idea of Approach (2/4) • If there is not enough information to identify the type of the QF, the dependency features are used as well. [Figure: a Question is split into QF, Dependency Verb, Dependency Quantifier, and Dependency Noun, which point to the Question Type. Legend: dependency features; question type; (unigram) semantic dependency relation; (bigram) semantic dependency relation.]
Idea of Approach (3/4) • Example
Idea of Approach (4/4) • Use QF and dependency features to classify questions. • Learn the QF and other dependency features from the Web. • Propose a Semantic Dependency Relation Model (SDRM).
Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work
Rule-based Question Classification • [Richard F. E. Sutcliffe 2005][Kui-Lam Kwok 2005][Ellen Riloff 2000] • 5W (Who, When, Where, What, Why): Who → Person. When → Time. Where → Location. What → difficult to assign a type. Why → Reason.
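The 5W mapping on this slide can be sketched as a small lookup table. This is only an illustration of the rule-based idea, not the cited systems' implementation; the function name `classify_by_question_word` and the choice to return `None` for "What" (which the slide marks as hard to type) are assumptions made for the example.

```python
from typing import Optional

# Mapping taken from the slide. "What" is deliberately absent because
# the slide marks it as difficult to type from the question word alone.
QUESTION_WORD_TO_TYPE = {
    "who": "Person",
    "when": "Time",
    "where": "Location",
    "why": "Reason",
}


def classify_by_question_word(question: str) -> Optional[str]:
    """Return the type implied by the leading question word, or None."""
    tokens = question.strip().lower().split()
    if not tokens:
        return None
    # Strip trailing punctuation so that "Where?" still matches "where".
    first_word = tokens[0].strip("?,.!")
    return QUESTION_WORD_TO_TYPE.get(first_word)
```

A question such as "Who is the President of the French Republic?" maps to Person, while any "What ..." question falls through to `None`, which is the gap the QF-based approach in this talk is meant to fill.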
Machine Learning-based Question Classification • Several methods based on SVM. • [Zhang, 2003; Suzuki, 2003; Day, 2005] [Figure: pipeline with the components Question, KDAG Kernel, Feature Vector, SVM, and Question Type]
Web-based Question Classification • Use a Web search engine to identify the question type. [Solorio, 2004] • Example: “Who is the President of the French Republic?”
Statistics-based Question Classification • Language Model for Question Classification [Li, 2002] • Too many features may introduce noise.
Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work
Architecture of Question Classification
Question Type • 6 types of questions • Person • Location • Organization • Number • Date • Artifact
Basic Classification Rules • We define 17 basic rules for simple questions.
Learning Semantic Dependency Features (1/3) • Architecture for Learning Dependency Features • Algorithm for Extracting Dependency Features
Learning Semantic Dependency Features (2/3) • Architecture for Learning Dependency Features
Learning Semantic Dependency Features (3/3) • Algorithm for Extracting Dependency Features
Question Focus Identification Algorithm (1/2) • Algorithm
Question Focus Identification Algorithm (2/2) • Example
Semantic Dependency Relation Model (SDRM) (1/12) • Unigram-SDRM • Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (2/12) • Unigram-SDRM [Figure: Q (Question) → C (Question Type), modeled as P(C|Q)] • Estimating P(C|Q) directly needs many questions for training.
Semantic Dependency Relation Model (SDRM) (3/12) • Unigram-SDRM [Figure: C (Question Type) → DC (Web search results) → Q (Question), with P(DC|C) and P(Q|DC)] • P(DC|C): Collect related search results for every question type. • P(Q|DC): Use DC to determine the question type.
Semantic Dependency Relation Model (SDRM) (4/12) • Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (5/12) • Unigram-SDRM • Q = {QF, QD}, QD = {DV, DQ, DN}, where DV: Dependency Verb, DQ: Dependency Quantifier, DN: Dependency Noun.
Semantic Dependency Relation Model (SDRM) (6/12) • Unigram-SDRM • DV = {dv_1, dv_2, ⋯, dv_i}, DQ = {dq_1, dq_2, ⋯, dq_j}, DN = {dn_1, dn_2, ⋯, dn_k}.
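The formulas on these slides did not survive extraction. Based only on the definitions given here (Q = {QF, QD}, QD = {DV, DQ, DN}) and on the parameters named in the talk (P(DC|C), P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC)), one plausible unigram factorization is sketched below. The independence assumption between features and the arg-max decision rule are assumptions of this sketch, not a transcription of the original slide.

```latex
\hat{C} \;=\; \arg\max_{C}\; P(DC \mid C)\, P(Q \mid DC),
\qquad
P(Q \mid DC) \;\approx\; P(QF \mid DC)
  \prod_{a=1}^{i} P(dv_a \mid DC)
  \prod_{b=1}^{j} P(dq_b \mid DC)
  \prod_{c=1}^{k} P(dn_c \mid DC)
```

Here DC stands for the search results collected for question type C, and each feature of the question is scored independently against DC, which is what the "unigram" label suggests.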
Semantic Dependency Relation Model (SDRM) (7/12) • Parameter Estimation of Unigram-SDRM • P(DC|C) • P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC) • N(QF): The number of occurrences of the QF in Q. • N_QF(DC): The total number of QF occurrences collected from the search results.
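The two counts defined on this slide suggest a relative-frequency estimate. The equation itself is missing from the extracted text, so the form below is a reconstruction under that assumption; the analogous counts for dv, dq, and dn (written N(dv) and N_{DV}(DC)) are hypothetical names introduced here by analogy and do not appear in the source.

```latex
P(QF \mid DC) \;\approx\; \frac{N(QF)}{N_{QF}(DC)},
\qquad
P(dv \mid DC) \;\approx\; \frac{N(dv)}{N_{DV}(DC)}
\quad \text{(and likewise for } dq,\; dn\text{)}
```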
Semantic Dependency Relation Model (SDRM) (8/12) • Parameter Estimation of Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (9/12) • Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (10/12) • Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (11/12) • Parameter Estimation of Bigram-SDRM • P(DC|C): The same as Unigram-SDRM • P(QF|DC): The same as Unigram-SDRM • P(dv|QF,DC), P(dq|QF,DC), P(dn|QF,DC) • N_sentence(dv,QF): The number of sentences containing both dv and QF. • N_sentence(QF): The total number of sentences containing QF.
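The two sentence counts above point to a co-occurrence ratio, N_sentence(dv, QF) / N_sentence(QF). The following is a minimal sketch of that ratio, assuming the search results for one question type have already been split into tokenized sentences. The function name `estimate_dv_given_qf`, the list-of-tokens input format, and the sample sentences are assumptions for illustration; the original slide's exact formula (including any smoothing) is not available.

```python
from typing import List, Sequence


def estimate_dv_given_qf(sentences: Sequence[List[str]], dv: str, qf: str) -> float:
    """Relative-frequency estimate N_sentence(dv, QF) / N_sentence(QF).

    `sentences` holds the tokenized sentences from the search results (DC)
    of a single question type. Returns 0.0 when QF never occurs.
    """
    # N_sentence(QF): sentences that contain the question focus.
    with_qf = [tokens for tokens in sentences if qf in tokens]
    if not with_qf:
        return 0.0
    # N_sentence(dv, QF): of those, the sentences that also contain dv.
    with_both = sum(1 for tokens in with_qf if dv in tokens)
    return with_both / len(with_qf)


# Hypothetical tokenized sentences for one question type.
sample_sentences = [
    ["人", "創下", "紀錄"],
    ["人", "駕駛"],
    ["車", "駕駛"],
]
probability = estimate_dv_given_qf(sample_sentences, dv="創下", qf="人")
```

Because the estimate requires the pair to co-occur in the same sentence, it is sparser than the unigram counts, which is consistent with the later observation that some bigrams are not trained successfully.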
Semantic Dependency Relation Model (SDRM) (12/12) • Parameter Estimation of Bigram-SDRM
Outline • Introduction • Related Work • Approach • Experiment • Conclusion • Future Work
Experiment • SDRM Performance Evaluation - Unigram-SDRM vs. Bigram-SDRM - Combination with different weights • SDRM vs. Language Model - Use questions as training data - Use Web as training data - Questions vs. Web
Experimental Data • Questions are collected from NTCIR-5 CLQA. • Evaluation uses 4-fold cross-validation.
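For readers unfamiliar with the protocol, 4-fold cross-validation can be sketched as follows. This is a generic illustration, not the authors' evaluation script; the function name `k_fold_splits` and the round-robin assignment of items to folds are assumptions (the talk does not say how the NTCIR-5 CLQA questions were partitioned).

```python
from typing import Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: List[T], k: int = 4) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs so that each item is tested exactly once."""
    # Round-robin assignment: item i goes to fold i % k.
    folds = [items[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            item
            for index, fold in enumerate(folds)
            if index != held_out
            for item in fold
        ]
        yield train, test


# Hypothetical question identifiers standing in for the real data set.
question_ids = list(range(10))
splits = list(k_fold_splits(question_ids, k=4))
```

Each of the four rounds trains on three folds and tests on the remaining one, and the reported score is typically the average over the rounds.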
Unigram-SDRM vs. Bigram-SDRM (1/2) • Result
Unigram-SDRM vs. Bigram-SDRM (2/2) • Example • For unigram: “人” (person), “創下” (to set, as in a record), and “駕駛” (to drive) are trained successfully. • For bigram: “人_創下” (person_set) is not trained successfully.
Combination with different weights (1/3) • Different weights for different features • α: The weight of QF, β: The weight of DV, • γ: The weight of DQ, δ: The weight of DN.
Combination with different weights (2/3) • Comparison of 4 dependency features
Combination with different weights (3/3) • 16 experiments • Best weighting: 0.23 for QF, 0.29 for DV, 0.48 for DQ. • The weights are normalized so that they sum to 1. • Example with QF and DV, where α is the weight of QF and β is the weight of DV: • α = (1 − 0.77) / [(1 − 0.77) + (1 − 0.71)] • β = (1 − 0.71) / [(1 − 0.77) + (1 − 0.71)]
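The arithmetic in the QF and DV example can be checked with a few lines of code. The sketch below only reproduces the formula shown on the slide: each weight is the complement of a per-feature score divided by the sum of the complements. The slide does not say what the scores 0.77 and 0.71 measure, so the parameter name `scores` and the function name `complement_weights` are neutral placeholders chosen for this example.

```python
from typing import Dict


def complement_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Weight each feature by (1 - score), normalized to sum to 1."""
    complements = {name: 1.0 - score for name, score in scores.items()}
    total = sum(complements.values())
    if total == 0:
        raise ValueError("all scores are 1.0, so the weights are undefined")
    return {name: value / total for name, value in complements.items()}


# Values from the slide's two-feature example.
weights = complement_weights({"QF": 0.77, "DV": 0.71})
alpha, beta = weights["QF"], weights["DV"]
```

With the slide's numbers this gives α = 0.23 / 0.52 ≈ 0.44 and β = 0.29 / 0.52 ≈ 0.56, and the two weights sum to 1.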
Use questions as training data (1/2) • Result
Use questions as training data (2/2) • Example • For LM: “網球選手” (tennis player) and “選手為” (player is) are not trained successfully. • For SDRM: “選手” (player) and “奪得” (to win) are trained successfully.