
Three-level approach for Passage Retrieval in Arabic Question/Answering Systems

The 3rd International Conference on Arabic Natural Language Processing. Three-level approach for Passage Retrieval in Arabic Question/Answering Systems. Lahsen Abouenour 1, Karim Bouzoubaa 1, Paolo Rosso 2. Mohammadia School of Engineers, Rabat, Morocco - May 2009.





Presentation Transcript


  1. The 3rd International Conference on Arabic Natural Language Processing. Three-level approach for Passage Retrieval in Arabic Question/Answering Systems. Lahsen Abouenour1, Karim Bouzoubaa1, Paolo Rosso2. Mohammadia School of Engineers, Rabat, Morocco - May 2009

  2. Arabic Question/Answering Systems. Classical IR: (1) user query (keywords); (2) list of documents/links; (3) user checking; (4) answer to the user query.

  3. Arabic Question/Answering Systems. Question/Answering: (1) user query (question = keywords + structure); (2) list of documents/links and user checking; (3) answer to the user query.

  4. Arabic Question/Answering Systems. Existing Arabic Q/A Systems • QARAB (based on the Al-Raya corpus) • AQAS (extracts answers from structured texts only) • ArabiQA (deals with factoid questions, embeds an NER module) • QASAL (semi-automatic Q/A system for factoid questions). Three Modules • Question Analysis (question type, keywords, named entities, …) • Passage Retrieval (candidate passages, passage ranking, …) • Answer Extraction (answer identification, answer construction, …)

  5. Arabic Question/Answering Systems Challenges of Arabic Q/A Systems • short vowels, • absence of capital letters, • complex morphology, • etc.

  6. Arabic Question/Answering Systems. Question/Answering, step 1: User Query (question = keywords + structure). Natural language: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?) -- Keywords: أين | توجد | مدينة | مراكش (Where | is | the | city | of | Marrakech) -- Structure: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?) ≠ هل مراكش مدينة ؟ (Is Marrakech a city?)

  7. Arabic Question/Answering Systems. Question/Answering, step 2: Passage Retrieval for أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?). Passage 1: … مراكش (Marrakech) … مدينة (city) … توجد (exists in) … : no answer. Passage N: … يوجد إقليم مراكش (the region of Marrakech exists in) … المغرب (Morocco) … : the answer.

  8. Arabic Question/Answering Systems. Question/Answering, step 2: Passage Retrieval for أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?). Passage 1: … مراكش (Marrakech) … مدينة (city) … توجد (exists in) …; matched terms: توجد | مراكش | مدينة (is in | Marrakech | city). Passage N: … يوجد إقليم مراكش (the region of Marrakech exists in) … المغرب (Morocco) …; matched terms: يوجد | مراكش | إقليم (is in | Marrakech | region). Relations to the question keywords: a morphological relation (توجد / يوجد) and a hyponymy/semantic relation (مدينة / إقليم).

  9. Arabic Question/Answering Systems. Question/Answering, step 2: Passage Retrieval for أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?). Passage 1 (… مراكش … مدينة … توجد …) vs. Passage N (… يوجد إقليم مراكش … المغرب …): with respect to the morphological and semantic relations, relevance(P1) = relevance(PN). What about the question structure?
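
The tie between Passage 1 and Passage N can be illustrated with a small sketch. It assumes a hypothetical relation table (`RELATED`) that maps passage words to the question keyword they are morphologically or semantically related to; the table, the function name, and the scoring rule (fraction of question keywords covered) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical relation table: passage word -> related question keyword.
# The entries are toy data chosen to mirror the slide's example.
RELATED = {
    "توجد": "توجد",    # same word
    "يوجد": "توجد",    # morphological variant of the same verb
    "مدينة": "مدينة",  # same word
    "إقليم": "مدينة",  # semantic relation (assumed for illustration)
    "مراكش": "مراكش",  # same word
}


def keyword_relevance(question_keywords, passage_tokens):
    """Fraction of question keywords covered by related passage words."""
    covered = {RELATED[t] for t in passage_tokens if t in RELATED}
    return len(covered & set(question_keywords)) / len(question_keywords)


# Content keywords of the question (the interrogative word is left out here).
question_keywords = ["توجد", "مدينة", "مراكش"]
passage_1 = ["مراكش", "مدينة", "توجد"]
passage_n = ["يوجد", "إقليم", "مراكش", "المغرب"]

score_p1 = keyword_relevance(question_keywords, passage_1)
score_pn = keyword_relevance(question_keywords, passage_n)
```

Under these assumptions both passages cover all three keywords, so a purely keyword-based score cannot separate them, which is the motivation for looking at the question structure.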

  10. Arabic Question/Answering Systems. Question/Answering, step 2: Passage Retrieval. Question: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?). Expected answer: توجد مدينة مراكش في … (The city of Marrakech is in …). The expected answer structure is compared with the Passage 1 structures, …, Passage N structures.

  11. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Levels • Semantic Query Expansion (extending the list of keywords related to the user question) • Keyword-based level (candidate passages with related keywords) • Structure-based level (candidate passages with related structure) • Semantic reasoning level (comparing Conceptual Graph, CG, representations)

  12. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Resources & Tools • Semantic Query Expansion (Arabic WordNet, Amine Platform) • Keyword-based PR (Yahoo API) • Structure-based PR (the Java Information Retrieval System, JIRS) • Semantic reasoning level (Amine Platform)

  13. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Semantic Query Expansion: Ontology • Arabic WordNet (AWN) is a free lexical resource • AWN contains more than 20,000 Arabic words grouped into synsets • AWN is connected with SUMO (Suggested Upper Merged Ontology) • SUMO has about 2,000 general concepts • SUMO defines many relations between concepts (hyponymy, hypernymy, ...)

  14. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Semantic Query Expansion: Amine Platform • Amine is a multi-layer platform dedicated to the development of Intelligent Systems and Multi-Agent Systems • Amine is an open-source platform • Amine is a 100% Java implementation • Amine provides a set of operations related to ontologies

  15. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Semantic Query Expansion. Diagram: Arabic WordNet (content, structure, link with SUMO); temporary database (MySQL); Java program using the Amine Platform API; Amine AWN ontology.

  16. Arabic Question/Answering Systems Our Passage Retrieval Approach : Presentation Semantic Query Expansion

  17. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Semantic Query Expansion. Concept/Term; Global Expansion; Morphological Expansion; AAWN Ontology Expansion: (1) by synonyms, (2) by supertypes, (3) by definition, (4) by subtypes.
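
The four ontology expansion steps can be sketched as follows. This is a minimal illustration over a toy in-memory ontology with English placeholder terms; `Concept`, `TOY_ONTOLOGY`, and `expand_term` are hypothetical names, the entries are invented for the example, and neither the Arabic WordNet data nor the Amine Platform API is used. Morphological expansion is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Toy ontology entry; field names are assumptions for this sketch."""
    synonyms: list = field(default_factory=list)
    supertypes: list = field(default_factory=list)
    definition_terms: list = field(default_factory=list)
    subtypes: list = field(default_factory=list)


# Invented placeholder data, not taken from Arabic WordNet.
TOY_ONTOLOGY = {
    "city": Concept(
        synonyms=["town"],
        supertypes=["region"],
        definition_terms=["settlement"],
        subtypes=["capital"],
    ),
}


def expand_term(term, ontology=TOY_ONTOLOGY):
    """Expand a term in the slide's order: synonyms, supertypes,
    definition terms, subtypes. Duplicates are dropped, order is kept."""
    concept = ontology.get(term)
    if concept is None:
        return [term]  # unknown term: nothing to add
    candidates = (
        [term]
        + concept.synonyms
        + concept.supertypes
        + concept.definition_terms
        + concept.subtypes
    )
    return list(dict.fromkeys(candidates))


expanded = expand_term("city")
```

The expanded list would then replace the single keyword in the query sent to the keyword-based or structure-based retrieval level.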

  18. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Presentation. Structure-based PR: The Java Information Retrieval System (JIRS) • a language-independent PR system • adapted for many non-agglutinative European languages (English, French, Spanish, Italian, ...) • adapted for the Arabic language • re-ranking of the retrieved passages is based on a distance density n-gram model. URL: http://sourceforge.net/projects/jirs/
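
To give an intuition for structure-based re-ranking, the sketch below scores each passage by the longest question n-gram it contains. This is a deliberately simplified stand-in and not the distance density n-gram model used by JIRS; the function names and the English example sentences are assumptions made for illustration.

```python
def longest_shared_ngram(question_tokens, passage_tokens):
    """Length of the longest contiguous token sequence found in both lists."""
    best = 0
    for i in range(len(question_tokens)):
        for j in range(len(passage_tokens)):
            k = 0
            while (
                i + k < len(question_tokens)
                and j + k < len(passage_tokens)
                and question_tokens[i + k] == passage_tokens[j + k]
            ):
                k += 1
            best = max(best, k)
    return best


def rank_passages(expected_answer, passages):
    """Sort passages by the share of the expected-answer structure they contain."""
    q = expected_answer.split()
    scored = [(longest_shared_ngram(q, p.split()) / len(q), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


expected_answer = "the city of Marrakech is in"
passages = [
    "Marrakech is a city known for its markets",
    "the city of Marrakech is in Morocco",
]
ranked = rank_passages(expected_answer, passages)
```

Both passages contain the same keywords, but only the second preserves the word order of the expected answer, so it is ranked first.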

  19. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Evaluation Process. Input: CLEF questions and TREC questions. (1) Manual process; (2) automatic process. Keyword-based: Google and Yahoo, each with and without semantic QE. Structure-based: JIRS, with and without semantic QE.

  20. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Evaluation Process. The Questions • a set of 82 CLEF and TREC questions • factoid questions seeking named entities (NE) • significant coverage: questions classified into different domains

  21. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Evaluation Process. Keyword-based evaluation • Accuracy and MRR improved after using semantic QE

  22. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Evaluation Process. Structure-based evaluation • Accuracy and MRR improved after using semantic QE • Compared to the keyword-based PR, the structure-based PR gives the best Accuracy and MRR

  23. Arabic Question/Answering Systems. Our Passage Retrieval Approach: Evaluation Process. Summary • Keyword-based PR: without semantic query expansion, Acc. 1.22%, MRR 0.99; with semantic query expansion, Acc. 7.32%, MRR 3.25 • Structure-based PR: without semantic query expansion, Acc. 15.85%, MRR 5.46; with semantic query expansion, Acc. 19.51%, MRR 7.85
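
For readers unfamiliar with the two measures, here is a minimal sketch of how Accuracy and MRR can be computed from per-question ranks. It assumes that accuracy means the fraction of questions for which an answer-bearing passage was retrieved at all, which the slides do not state explicitly; the rank list is invented toy data.

```python
def accuracy(ranks):
    """Fraction of questions with an answer-bearing passage (assumed definition)."""
    return sum(r is not None for r in ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all questions; unanswered questions contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Toy data: rank of the first passage containing the answer, None if not found.
toy_ranks = [1, None, 2, None]
toy_accuracy = accuracy(toy_ranks)        # 2 of 4 questions answered
toy_mrr = mean_reciprocal_rank(toy_ranks)  # (1 + 1/2) / 4

# With 82 questions, the reported accuracies would correspond to whole
# question counts of 1, 6, 13 and 16 (an arithmetic observation only).
reported = {1: 1.22, 6: 7.32, 13: 15.85, 16: 19.51}
consistent = all(round(100 * n / 82, 2) == pct for n, pct in reported.items())
```

Because the test set has only 82 questions, each additional correctly answered question moves accuracy by about 1.22 percentage points.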

  24. Arabic Question/Answering Systems. Our Passage Retrieval Approach: The semantic reasoning level. Presentation: Question; Expected Answer; CG-EA. For each sub-passage Pi (P1, …, Pi): CGi; Generalization(CG-Pi, CG-EA); Semantic score(pi).

  25. Arabic Question/Answering Systems. Our Passage Retrieval Approach: The semantic reasoning level. Example TREC question: أين تقع أعلى نقطة على سطح الأرض؟ (Where is the highest point on the surface of the earth?) >> Using Google Search Engine

  26. Arabic Question/Answering Systems. Our Passage Retrieval Approach: The semantic reasoning level. Example TREC question: أين تقع أعلى نقطة على سطح الأرض؟ (Where is the highest point on the surface of the earth?) >> Passage ranks after LEVEL 1 (Keyword-based) and LEVEL 2 (Structure-based)

  27. Arabic Question/Answering Systems. Our Passage Retrieval Approach: The semantic reasoning level. Example TREC question: أين تقع أعلى نقطة على سطح الأرض؟ (Where is the highest point on the surface of the earth?) The expected answer is: تقع أعلى نقطة على سطح الأرض في ... CG-EA: [نقطة] - -attr->[أعلى], -ala->[الأرض], <-agnt-[تقع]-fi->[مفهوم عام]

  28. Arabic Question/Answering Systems. Our Passage Retrieval Approach: The semantic reasoning level. Example TREC question: أين تقع أعلى نقطة على سطح الأرض؟ (Where is the highest point on the surface of the earth?) Semantic Score Formula: $$\mathrm{SemanticScore}(P) = \frac{\sum_{c_i \in C} \mathrm{weight}(c_i)\,\beta(c_i, \pi(c_i))}{\sum_{c_i \in C} \mathrm{weight}(c_i)}$$
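
The formula is a weighted average, which the following sketch makes concrete. The slides do not define β or π, so the code assumes that π maps a concept of the expected-answer graph to its counterpart in the passage graph (or `None` if there is none) and that β returns a similarity in [0, 1]; the concepts, weights, and exact-match β below are invented toy values.

```python
def semantic_score(concepts, weight, beta, pi):
    """Weighted average of beta(c, pi(c)) over the concepts in C."""
    total_weight = sum(weight(c) for c in concepts)
    if total_weight == 0:
        return 0.0  # avoid division by zero for an empty or zero-weight set
    weighted = sum(weight(c) * beta(c, pi(c)) for c in concepts)
    return weighted / total_weight


# Toy example with English placeholder concepts (assumed values).
concepts = ["point", "highest", "earth"]
weights = {"point": 2.0, "highest": 1.0, "earth": 1.0}
projection = {"point": "point", "highest": "highest", "earth": None}


def exact_match(concept, projected):
    """Assumed beta: 1.0 if the projected concept is identical, else 0.0."""
    return 1.0 if concept == projected else 0.0


score = semantic_score(concepts, weights.get, exact_match, projection.get)
# (2.0 * 1 + 1.0 * 1 + 1.0 * 0) / 4.0 = 0.75
```

Dividing by the total weight keeps the score in [0, 1] whenever β does, so scores of passages with different numbers of matched concepts remain comparable.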

  29. Conclusion & Future Work. Conclusion • The keyword-based and structure-based levels of our Arabic PR approach have improved Accuracy and MRR in the context of Q/A systems • A semantic reasoning level on top of the first and second levels could improve performance even further. Future work • Covering all CLEF and TREC questions • Automating the semantic reasoning level module • Conducting the corresponding experiments • Integrating richer releases of Arabic WordNet

  30. Thank you for your attention >> Questions
