
ASQA – Academia Sinica Question Answering System



Presentation Transcript


ASQA – Academia Sinica Question Answering System: A Hybrid Architecture for Answering Chinese Factoid Questions
Cheng-Wei Lee, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang, Cheng-Wei Shih, Chia-Wei Wu, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu

Abstract. We propose an architecture for answering Chinese factoid questions in the NTCIR CLQA C-C Task. Our system outputs exact answers for factoid questions of six types. Questions are analyzed to obtain question types, keywords, and focuses. Through a simple mapping table, question types are used to constrain the possible answer types. Documents are segmented and indexed with both characters and words. After question analysis, queries are constructed from the keywords to retrieve candidate passages, which are then sent to a named entity recognition system to obtain answer candidates. Finally, answers are ranked according to the question focuses. ASQA successfully combines machine learning and knowledge-based approaches, achieving approximately 40% and 50% accuracy for the Top 1 and Top 5 answers, respectively.

1. Answering Factoid Questions
Example question: 請問台灣童謠「天黑黑」是由哪位作曲家所創作? (Which composer wrote the Taiwanese nursery rhyme "天黑黑"?) Answer: 郭芝苑.

2. Architecture
[Architecture diagram: questions enter Question Processing (InfoMap, SVM, AutoTag), which produces question segments with POS, question types, and focuses; Passage Retrieval (Lucene, AutoTag) queries character and word indices over the documents and returns passages; Answer Extraction (Mencius) produces answer candidates; Answer Ranking outputs the final answers.]

3. Question Analysis
When ASQA receives a question, the Question Processing module analyzes it to obtain question segments, question types, and question focuses. There are 6 coarse-grained and 62 fine-grained question types, identified by both InfoMap and a Support Vector Machines (SVM) model (a small classification sketch follows the transcript).

4. Passage Retrieval
We split documents into small passages at the punctuation marks ",!?" and index them with Lucene, a high-performance, full-featured text search engine library; specifically, we use DotLucene, the C# port of Lucene. Two indices are created in ASQA: one over Chinese characters, the other over Chinese words. Queries are sent to both indices separately, and the results are combined and sorted by the scores returned by Lucene. In each query, keywords receive different weights according to their POS. If the original query does not return enough passages, a relaxed version of the query is issued to obtain more (see the query-construction sketch after the transcript).

Original query: +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2
Relaxed version: "作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2

5. Answer Extraction and Ranking
Passages are sent to Mencius, our Chinese named entity recognition (NER) engine, to extract answer candidates. Mencius uses maximum entropy (ME) models to discriminate coarse-grained NE types, and it encodes templates and knowledge in InfoMap to recognize fine-grained NEs as well as other coarse-grained NEs such as TIME, NUMBER, and ARTIFACT. After filtering by a QType-to-AType mapping table, answer candidates are given ranking scores according to the question focus or cue of the question. The candidate with the highest score, i.e. the one that fits the most clues about the question, becomes the system output (see the filtering-and-ranking sketch after the transcript).

6. Evaluation
The NTCIR CLQA organizers report two metrics for ASQA: 37.5% accuracy counting only correct answers, and 44.5% counting correct plus unsupported answers.

Acknowledgement
We would like to thank CKIP for providing us with AutoTag for Chinese word segmentation.
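
Sketch 1: Question type classification (Section 3). A minimal illustration of the SVM side of question classification. The use of scikit-learn, the character n-gram features, and the tiny training set below are all assumptions for illustration; the poster only states that an SVM model and InfoMap jointly identify the 6 coarse-grained and 62 fine-grained question types.

```python
# Assumption: scikit-learn stands in for the original SVM toolkit; features are
# character unigrams/bigrams, which need no word segmentation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical labels for three coarse-grained types).
train_questions = [
    "請問台灣童謠「天黑黑」是由哪位作曲家所創作?",  # who  -> PERSON
    "阿里山位於哪個縣市?",                            # where -> LOCATION
    "中華民國是哪一年成立的?",                        # when  -> DATE
]
train_types = ["PERSON", "LOCATION", "DATE"]

clf = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 2)), LinearSVC())
clf.fit(train_questions, train_types)

# The 哪位 ("which person") cue shared with the first example should favour PERSON.
print(clf.predict(["哪位歌手演唱了這首歌?"]))
```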
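
Sketch 2: Passage splitting and query construction (Section 4). This reproduces the boosted query strings shown in the poster from (keyword, weight) pairs, and splits text into passages at the punctuation marks the poster names. The full-width punctuation variants and the 1.0 weight threshold for marking a keyword as required are assumptions inferred from the example query, where only 創作^0.7 is optional.

```python
import re

def split_passages(text):
    """Split a document into short passages at ",!?" (full-width forms added as an assumption)."""
    return [p.strip() for p in re.split(r"[,!?，！？]", text) if p.strip()]

def build_query(keywords, strict=True):
    """Build a Lucene query string; in the strict query, high-weight keywords are required ("+")."""
    terms = []
    for kw, weight in keywords:
        required = strict and weight >= 1.0  # assumed threshold
        terms.append(f'{"+" if required else ""}"{kw}"^{weight}')
    return " ".join(terms)

# POS-derived weights taken from the poster's example query.
keywords = [("作曲家", 1.2), ("台灣", 1.2), ("創作", 0.7), ("童謠", 1.2), ("天黑黑", 2)]
print(build_query(keywords))                 # original (strict) query
print(build_query(keywords, strict=False))   # relaxed query, used when too few passages return
```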
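
Sketch 3: Candidate filtering and ranking (Section 5). A sketch of the QType-to-AType filter followed by focus-based scoring. The mapping table entries and the scoring heuristic (counting focus terms co-occurring in the candidate's passage) are assumptions; the poster only names TIME, NUMBER, and ARTIFACT among the answer types and does not give the actual scoring formula.

```python
# Hypothetical QType-to-AType mapping table constraining answer types per question type.
QTYPE_TO_ATYPE = {
    "PERSON":       {"PERSON"},
    "LOCATION":     {"LOCATION"},
    "ORGANIZATION": {"ORGANIZATION"},
    "DATE":         {"TIME"},
    "NUMBER":       {"NUMBER"},
    "ARTIFACT":     {"ARTIFACT"},
}

def filter_and_rank(candidates, qtype, focus_terms):
    """Keep candidates whose NE type is allowed for the question type, then score
    each by how many focus/cue terms its supporting passage contains."""
    allowed = QTYPE_TO_ATYPE.get(qtype, set())
    kept = [c for c in candidates if c["ne_type"] in allowed]
    for c in kept:
        c["score"] = sum(term in c["passage"] for term in focus_terms)
    return sorted(kept, key=lambda c: c["score"], reverse=True)

candidates = [
    {"text": "郭芝苑", "ne_type": "PERSON", "passage": "台灣童謠天黑黑由作曲家郭芝苑創作"},
    {"text": "1952",   "ne_type": "TIME",   "passage": "天黑黑創作於1952年"},
]
# The TIME candidate is filtered out; the PERSON candidate matching both focus terms wins.
print(filter_and_rank(candidates, "PERSON", ["作曲家", "童謠"])[0]["text"])  # 郭芝苑
```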
