Summarizing Definition from Wikipedia Articles • Zhicheng Zheng, Xiaoyan Zhu • Tsinghua University
Outline • Introduction • Related work • Method • Experiments • Conclusion
Introduction • “Definition” from Wikipedia • A definition is a passage describing the meaning of a term (a word, phrase or other set of symbols), or a type of thing. • “Definition” as treated in TREC-QA • “What is XXX” or “Who is XXX” • Or rephrased as “Tell me something important or interesting about XXX”
Related work • Pattern based methods [1,2] • Using patterns to generate potential definitional sentences • Advantages: accurate • Disadvantages: require manual labor or labeled data to generate patterns
Related work • Relevance based methods • Interesting nuggets often come in the form of trivia, novel or rare facts about the topic that tend to strongly co-occur with direct mention of topic keywords [3] • Using summarization techniques • Using Wikipedia articles [4] • Using a single article
Related work • Using multiple Wikipedia articles? • Benefits: improve summarization performance • users would better understand a topic if they read more related articles
Method • Representation Model • Represent articles/sentences as vectors of words/concepts • Word Model: TF/IDF • Not accurate enough • “Jordan Hill” vs “Michael Jordan”, “Grant Hill”: different entities that share the same words • Wikipedia Concept Model
Wikipedia concept model • Wikipedia concept: title of a Wikipedia article • Represent a piece of text by the Wikipedia concepts that appear in it • Concepts are weighted with TF/IDF, in the same way as words in the word model
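The concept representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the greedy longest-match lookup, the `max_len` cutoff, the plain `tf * log(N / df)` weighting and the three-title inventory are all assumptions made for the example.

```python
import math
from collections import Counter

def extract_concepts(text, titles, max_len=3):
    """Greedy longest match of known article titles (assumed matching strategy)."""
    tokens = text.lower().split()
    concepts, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in titles:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # no title starts at this token; skip it
    return concepts

def tfidf(docs_terms):
    """docs_terms: list of term lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs_terms)
    df = Counter(t for terms in docs_terms for t in set(terms))
    vectors = []
    for terms in docs_terms:
        tf = Counter(terms)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical title inventory, taken from the slide's example names.
titles = {"jordan hill", "michael jordan", "grant hill"}
text_a = "Jordan Hill signed today"
text_b = "Michael Jordan met Grant Hill"

shared_words = set(text_a.lower().split()) & set(text_b.lower().split())
concepts_a = extract_concepts(text_a, titles)
concepts_b = extract_concepts(text_b, titles)
shared_concepts = set(concepts_a) & set(concepts_b)
```

In this toy case the two texts share the words "jordan" and "hill" but no concept, which is the confusion the concept model is meant to avoid.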
Method • Wikipedia article expansion • Extract all Wikipedia articles related to the main Wikipedia article • d and d’ are related if d and d’ link to each other
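The mutual-link criterion can be sketched in a few lines. The link graph below is hypothetical, and the `links` dictionary (title to set of outgoing link targets) is an assumed data layout rather than anything specified in the slides.

```python
def related_articles(main, links):
    """Return the articles that both link to `main` and are linked from it.

    links: {title: set of titles that article links to} (assumed layout).
    """
    outgoing = links.get(main, set())
    return {d for d in outgoing if main in links.get(d, set())}

# Hypothetical link graph: A <-> B is mutual, A -> C is one-way.
links = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"D"},
}
related = related_articles("A", links)
```

Here only "B" counts as related to "A", because "C" never links back.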
Method • Summarizing definitions • Constraint: select sentences only from the original article (not from the other related articles) • Summarization method: • Maximal Marginal Relevance (MMR) [5] • Maximum Coverage (MC) [6]
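Both selection strategies can be illustrated on sparse sentence vectors. The sketch below is hedged in several ways: the MMR trade-off `lam=0.7`, the cosine similarity, and the `target` vector (standing in for concept weights derived from the article set) are assumptions, and the coverage function is a simple greedy approximation of maximum coverage, not the global optimization formulated in [6].

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(sentences, target, k, lam=0.7):
    """Pick k sentence indices, trading relevance to `target` against redundancy."""
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return lam * cosine(sentences[i], target) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

def greedy_coverage(sentences, concept_weights, k):
    """Greedy stand-in for maximum coverage: add the sentence with most new weight."""
    covered, selected = set(), []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def gain(i):
            return sum(concept_weights.get(c, 0.0)
                       for c in sentences[i] if c not in covered)
        best = max(candidates, key=gain)
        if gain(best) <= 0.0:
            break  # nothing new left to cover
        selected.append(best)
        covered.update(sentences[best])
        candidates.remove(best)
    return selected

# Hypothetical sentences from the original article; sentence 1 duplicates sentence 0.
sentences = [
    {"nba": 1.0, "player": 1.0},
    {"nba": 1.0, "player": 1.0},
    {"college": 1.0},
]
target = {"nba": 1.0, "player": 1.0, "college": 1.0}
weights = {"nba": 2.0, "player": 1.0, "college": 1.5}

mmr_picks = mmr_select(sentences, target, k=2)
mc_picks = greedy_coverage(sentences, weights, k=2)
```

Both functions skip the duplicate sentence in favor of the one that adds a new concept. Only sentences from the original article are passed in, which matches the constraint on the slide; related articles would influence only `target` or `weights`.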
Experiment • Wikipedia articles: • Snapshots of English Wikipedia (2009) • Question sets: • Questions from TREC 13 – 15 • 215 definition questions • 190 could be found in Wikipedia • Evaluation metrics • Precision, Recall, F1, F3
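F1 and F3 are instances of the general F-measure, where a larger beta weights recall more heavily. The sketch below takes precision and recall as given and does not reproduce how the TREC evaluation derives them; the sample values are made up for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up scores for a system with higher precision than recall.
precision, recall = 0.5, 0.25
f1 = f_beta(precision, recall, beta=1)
f3 = f_beta(precision, recall, beta=3)
```

With these values F3 is lower than F1, because F3 is pulled toward the weaker recall score.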
Experiment • Results • Compare: • Word model vs. Wikipedia concept model • Single article vs. Multiple articles
Experiments • Analysis • Both algorithms benefit from the Wikipedia concept model • The related article set helps improve the performance of both algorithms • The Wikipedia concept model contributes more to precision than to recall • The related article set leads to more improvement in nugget recall than the Wikipedia concept model does
Conclusion • A method for summarizing definitions from multiple Wikipedia articles • Experiments show that Wikipedia concepts benefit the extraction of definitions • They also show that the related articles help weight concepts more effectively
Thank you Any questions?
References • [1] Xu, J., Licuanan, A., Weischedel, R.: TREC 2003 QA at BBN: Answering definitional questions. In: TREC 2003. • [2] Cui, H., Kan, M.Y., Chua, T.S.: Generic soft pattern models for definitional question answering. In: SIGIR 2005. • [3] Kor, K.W., Chua, T.S.: Interesting nuggets and their impact on definitional question answering. In: SIGIR 2007. • [4] Ye, S., Chua, T.S., Lu, J.: Summarizing definition from Wikipedia. In: ACL 2009. • [5] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: SIGIR 1998. • [6] Gillick, D., Favre, B.: A scalable global model for summarization. In: ACL 2009.