Answer Extraction as Sequence Tagging with Tree Edit Distance


Presentation Transcript


  1. Answer Extraction as Sequence Tagging with Tree Edit Distance • Xuchen Yao, Benjamin Van Durme (Hopkins), Chris Callison-Burch (UPenn) and Peter Clark (Vulcan)

  2. Treating QA as Sequence Tagging • Question: What sport does Jennifer Capriati play? • Tagged sentence: Tennis/B-ANS player/O Jennifer/O Capriati/O is/O 23/O • 3 types of hidden states: • B-ANS (beginning of answer) • I-ANS (inside of answer) • O (outside of answer, i.e., not an answer) NAACL 2013, ATLANTA

  3. A Sequence Tagging Task: linear-chain Conditional Random Field (CRF) [Figure: chain of hidden states Y(t-1), Y(t), Y(t+1) above observations X(t-1), X(t), X(t+1)] • a conditional model p(y|x) • θ: feature weights to learn • f(y(t), y(t-1), x(t)): feature function • first order (only looks back at the previous state) • can inspect the whole sequence x
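Written out with the slide's symbols, the standard linear-chain CRF takes the form below. The feature index k and the normalizer Z(x) are notation added here for illustration; they do not appear in the transcript.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t} \sum_{k} \theta_k \, f_k(y_t, y_{t-1}, x_t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\Big( \sum_{t} \sum_{k} \theta_k \, f_k(y'_t, y'_{t-1}, x_t) \Big)
```

The sum over t runs along the sentence, and Z(x) sums over every possible tag sequence y', which is what makes the model conditional on the whole observation x.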

  4. It's the Features That Matter • We aim at: • question/answer template-free • easy to design • fast to extract • We end up with: • Chunking-like features • Question type features • Edit script features • Alignment distance features

  6. Chunking-like Features [Figure: each token shown as word (POS, dep, NER) → tag: Tennis (nnp, nmod, GAME-B) → B-ANS; player (nn, nmod, PER_DESC-B) → O; Jennifer (nnp, nmod, PERSON-B) → O; Capriati (nnp, subj, PERSON-I) → O; is (vbz, root, -) → O; 23 (cd, prd, CARDINAL-B) → O] • Each word t comes with pos[t], dep[t], ner[t] • Design features within a window size of 3: • pos[t], pos[t-1], pos[t-2], pos[t+1], pos[t+2] • pos[t-2]pos[t-1], pos[t-1]pos[t], pos[t]pos[t+1], pos[t+1]pos[t+2] • pos[t-2]pos[t-1]pos[t], pos[t-1]pos[t]pos[t+1], pos[t]pos[t+1]pos[t+2] • likewise for dep[t], ner[t]
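A minimal sketch of how the unigram, bigram and trigram window features listed on this slide could be generated for one tag layer. The function name `window_ngrams`, the `<pad>` symbol and the feature-string format are assumptions made for illustration, not the authors' code.

```python
def window_ngrams(tags, t, name):
    """Unigram, bigram and trigram features over tags[t-2 .. t+2]."""
    def at(i):
        # Positions outside the sentence get a padding symbol (an assumption).
        return tags[i] if 0 <= i < len(tags) else "<pad>"

    feats = []
    for n in (1, 2, 3):
        # Every contiguous n-gram that fits inside the offsets -2 .. +2.
        for start in range(-2, 3 - n + 1):
            offsets = range(start, start + n)
            key = "|".join(f"{name}[{o:+d}]" for o in offsets)
            val = "|".join(at(t + o) for o in offsets)
            feats.append(f"{key}={val}")
    return feats


# POS tags for "Tennis player Jennifer Capriati is 23", as shown on the slide.
pos = ["nnp", "nn", "nnp", "nnp", "vbz", "cd"]
feats = window_ngrams(pos, 0, "pos")
```

Calling the same function with the `dep` and `ner` layers yields the remaining chunking-like features, giving 5 unigrams, 4 bigrams and 3 trigrams per layer.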

  7. Chunking-like Features + Question Type Features • Combining chunking-like features with question types • who, whom, when, where, how many, how much, how long
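One way the combination could work is to conjoin each chunking-like feature with the question type, sketched below. The helper names, the `other` fallback and the `qtype=...&...` string format are hypothetical; the slide only lists the question words.

```python
# Multi-word phrases come first so "how many" is not shadowed by shorter entries.
QUESTION_TYPES = ["how many", "how much", "how long", "who", "whom", "when", "where"]


def question_type(question):
    """Return the first listed question phrase found in the question, else 'other'."""
    text = " " + question.lower().rstrip("?") + " "
    for q in QUESTION_TYPES:
        if f" {q} " in text:
            return q
    # Assumption: the slide does not say how other questions are handled.
    return "other"


def conjoin(qtype, feats):
    """Prefix every chunking-like feature with the question type."""
    return [f"qtype={qtype}&{f}" for f in feats]
```

With this scheme the CRF can learn, for example, that a PERSON entity is a likely answer for `who` but not for `when`.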

  8. Chunking-like and Question-type Features capture a lot of traditional intuitions of question/answer templates

  9. But wait a minute... How do we know my Q and S are talking about the same thing?

  10. We don't know yet! • What sport does Jennifer Capriati play? • "Play" is a song by recording artist Jennifer Lopez. • Tennis player Capriati is 23. • Jennifer Lopez played softball in high school. • Answer-bearing sentences should be validated as early as answer extraction • Tree edit distance (TED) provides a means to extract knowledge of shared structure between the question and candidate sentences.

  11. Motivation for Tree Edit Distance • What sport does Jennifer Capriati play? • "Play" is a song by recording artist Jennifer Lopez. • Tennis player Capriati is 23. • Jennifer Lopez played softball in high school. • Answer-bearing sentences should be validated as early as answer extraction • Tree edit distance (TED) provides a means to extract knowledge of shared structure between the question and candidate sentences.

  12. Edit Script Features (New): Are Q and S really talking about the same thing? • Q: What sport does Jennifer Capriati play? • S: Tennis/B-ANS player/O Jennifer/O Capriati/O is/O 23/O

  13. Tree Edit Model • direction: answer tree → question tree, since the answer tree usually contains much more info • Question: What sport does Jennifer Capriati play? • Sentence: Tennis player Jennifer Capriati is 23. [Figure: dependency trees of the question (nodes do/vbz, what/wp, sport/nn, play/vbz, capriati/nnp, jennifer/nnp) and the sentence (nodes be/vbz, capriati/nnp, 23/cd, tennis/nn, player/nn, jennifer/nn), connected by TreeEditDist]

  14. TED Alignment • align(capriati/nnp/subj) • renamePos(nn, nnp) [Figure: dependency trees of the question and the sentence with aligned nodes linked]

  15. TED Alignment with WordNet • Derived Form • Hypernym [Figure: dependency trees of the question and the sentence with additional alignment links labeled Derived Form and Hypernym]

  16. TED Deletion (w/o WordNet): delLeaf / delSubTree / del • Question: What sport does Jennifer Capriati play? • Sentence: Tennis player Jennifer Capriati is 23. [Figure: dependency trees of the question and the sentence]

  17. TED Insertion: insLeaf / insSubTree / ins • ins(play/vbz/vmod) • ins(do/vbz/root) • insSubTree: Jennifer Capriati • Question: What sport does Jennifer Capriati play? [Figure: dependency tree of the question]

  18. Cost Design (renaming is only allowed when lemmas match) • Simple cost function: • If the lemmas of two nodes differ: • Deletion and insertion are encouraged • A single Del/Ins costs 3, since lemma, POS and dep are all deleted/inserted • If the lemmas of two nodes are the same: • Renaming is encouraged • Rename cost is 1 if only the POS or only the dep is renamed, or 2 if both
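The cost rules on this slide are small enough to write down directly. The sketch below follows the stated numbers; the `Node` type, the function names and the use of `None` for a disallowed rename are assumptions chosen for illustration.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    lemma: str
    pos: str
    dep: str


DEL_INS_COST = 3  # lemma, POS and dep are all removed or added


def delete_cost(node: Node) -> int:
    return DEL_INS_COST


def insert_cost(node: Node) -> int:
    return DEL_INS_COST


def rename_cost(a: Node, b: Node) -> Optional[int]:
    """Cost of turning node a into node b, or None when renaming is not allowed."""
    if a.lemma != b.lemma:
        return None  # different lemmas: the edit must be a delete plus an insert
    # 0 when nothing changes, 1 for POS or dep alone, 2 for both.
    return int(a.pos != b.pos) + int(a.dep != b.dep)
```

Because a rename never costs more than 2 while a delete plus an insert costs 6, nodes with matching lemmas are strongly preferred as alignment points.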

  19. Search Method • Heilman & Smith 2010: • greedy local search with tree kernels • slow • This work: • polynomial-time dynamic programming (Zhang & Shasha, 1989) • walks both trees in post-order and compares them • much faster: 10,000 tree pairs per second • allows fast feature extraction on the fly
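To make the dynamic-programming idea concrete, here is a short sketch of ordered tree edit distance. It is not the Zhang & Shasha table-filling algorithm: it is a simpler memoized recursion over the standard forest-distance recurrence (delete, insert, or match the rightmost roots), with unit costs and no keyroot optimization. The tree encoding and function names are assumptions.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for the cache.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert everything that remains
    if not f2:
        return size(f1)  # delete everything that remains
    (l1, c1), (l2, c2) = f1[-1], f2[-1]  # rightmost roots
    return min(
        # Delete the rightmost root of f1; its children take its place.
        forest_distance(f1[:-1] + c1, f2) + 1,
        # Insert the rightmost root of f2.
        forest_distance(f1, f2[:-1] + c2) + 1,
        # Match the two rightmost trees, renaming the root if labels differ.
        forest_distance(f1[:-1], f2[:-1]) + forest_distance(c1, c2) + (l1 != l2),
    )


def tree_distance(t1, t2):
    return forest_distance((t1,), (t2,))
```

Swapping the three unit costs for lemma-aware delete, insert and rename costs would bring this closer to the cost design described on the previous slide.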

  20. A Complete Tree Edit Script • insLeaf(sport), ins(what), delLeaf(tennis), delLeaf(player), renamePos(Jennifer), insLeaf(play), del(23), del(is), ins(does) • Question: What sport does Jennifer Capriati play? • Sentence: Tennis player Jennifer Capriati is 23. [Figure: dependency trees of the question and the sentence, connected by TreeEditDist]
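As a worked example, the edit script on this slide can be parsed and priced with the costs from the Cost Design slide. Treating the leaf and plain variants of each operation as equally expensive is an assumption, as are the names `EDIT_SCRIPT`, `COSTS` and `parse`.

```python
import re

EDIT_SCRIPT = (
    "insLeaf(sport), ins(what), delLeaf(tennis), delLeaf(player), "
    "renamePos(Jennifer), insLeaf(play), del(23), del(is), ins(does)"
)

# Assumed costs: 3 per insertion or deletion, 1 for a POS-only rename.
COSTS = {"ins": 3, "insLeaf": 3, "del": 3, "delLeaf": 3, "renamePos": 1}


def parse(script):
    """Split 'op(arg), op(arg), ...' into a list of (op, arg) pairs."""
    return re.findall(r"(\w+)\(([^)]*)\)", script)


ops = parse(EDIT_SCRIPT)
total = sum(COSTS[op] for op, _ in ops)
print(len(ops), total)
```

Under these assumptions the nine operations consist of eight insertions or deletions and one rename, so the script costs 8 × 3 + 1 = 25.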

  21. insLeaf(sport): sport Tennis player Jennifer Capriati is 23.

  22. ins(what): What sport Tennis player Jennifer Capriati is 23.

  23. delLeaf(tennis): What sport player Jennifer Capriati is 23.

  24. delLeaf(player): What sport Jennifer Capriati is 23.

  25. renamePos(Jennifer/nn, Jennifer/nnp): What sport Jennifer Capriati is 23.

  26. What sport Jennifer Capriati play is 23.

  27. ins(play): What sport Jennifer Capriati play is 23.

  28. del(23): What sport Jennifer Capriati play is

  29. del(23): What sport Jennifer Capriati play is

  30. del(be): What sport Jennifer Capriati play?

  31. ins(do): What sport does Jennifer Capriati play?

  33. Edit Script Features: answers are more likely to be in deleted info • These features could have been combined with question-type features • But adding them alone already boosted performance quite a bit
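The slide's feature table is not part of the transcript, so the following is only an illustration of the idea that each sentence token can carry the edit operation applied to it. The `edit=...` feature format and the `align` default for untouched tokens are hypothetical.

```python
sentence = ["Tennis", "player", "Jennifer", "Capriati", "is", "23"]

# Operations from the complete edit script that touch tokens of the sentence.
script = [
    ("delLeaf", "tennis"),
    ("delLeaf", "player"),
    ("renamePos", "jennifer"),
    ("del", "23"),
    ("del", "is"),
]


def edit_features(tokens, script):
    """Attach to each token the edit operation that was applied to it."""
    op_for = {arg: op for op, arg in script}
    # Tokens the script never mentions are treated as aligned (an assumption).
    return [f"edit={op_for.get(tok.lower(), 'align')}" for tok in tokens]


features = edit_features(sentence, script)
```

In this example the answer token "Tennis" receives a deletion feature, which matches the slide's observation that answers tend to sit in the deleted material.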

  34. System Architecture • Inputs: POS, DEP, NER, TED

  35. System Architecture • Inputs: POS, DEP, NER, TED • Feature extractor: 1 chunking-like, 2 question type, 3 edit script based, 4 align distance

  36. System Architecture • Inputs: POS, DEP, NER, TED • Feature extractor: 1 chunking-like, 2 question type, 3 edit script based, 4 align distance • CRFsuite • Output: sequence tags

  37. System Architecture • Inputs: POS, DEP, NER, TED • Feature extractor: 1 chunking-like, 2 question type, 3 edit script based, 4 align distance • CRFsuite • Output sequence tags: Tennis/B-ANS player/O Jennifer/O Capriati/O is/O 23/O

  38. CRF What sport does Jennifer Capriati play?

  39. Dataset • Based on the dataset from Wang et al. (2007) • training set contains about half of the TREC8-12 data • test set contains roughly half of the TREC13 (2004) data • trained on only positive examples but tested on all examples • majority vote as the final answer
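The majority-vote step can be sketched in a few lines, assuming each candidate sentence contributes at most one extracted answer string. Lowercasing and trimming before counting is an assumption, as is the function name.

```python
from collections import Counter


def majority_answer(candidates):
    """Return the most frequent extracted answer string, or None if there is none."""
    counts = Counter(c.strip().lower() for c in candidates if c and c.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For instance, `majority_answer(["Tennis", "tennis", "softball"])` picks "tennis" because two of the three sentences agree on it.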

  40. Performance

  41. Naive Baseline: force alignment • align(capriati/nnp/subj) • align(tennis, what) • renamePos(nn, nnp) [Figure: dependency trees of the question and the sentence with forced alignment links]

  45. CRF Forced • Example question: During what war did Nimitz serve? • If the CRF doesn't give an answer, mark the tokens whose probability deviates from the median probability by 50 times the median absolute deviation (MAD) • MAD = median(|X - median(X)|)
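The MAD rule can be sketched as follows. The formula for MAD comes from the slide; flagging only probabilities above the median, returning nothing when the MAD is zero, and the function names are assumptions. The probabilities in the usage lines are made-up toy values.

```python
from statistics import median


def mad(xs):
    """Median absolute deviation: median(|x - median(xs)|)."""
    xs = list(xs)
    m = median(xs)
    return median(abs(x - m) for x in xs)


def forced_answers(probs, k=50):
    """Indices of tokens whose probability exceeds the median by at least k MADs."""
    m, d = median(probs), mad(probs)
    if d == 0:
        return []  # assumption: no spread means no outliers to force
    return [i for i, p in enumerate(probs) if p - m >= k * d]


# Toy per-token answer probabilities (illustrative only).
probs = [0.001, 0.002, 0.001, 0.3, 0.002, 0.001]
forced = forced_answers(probs)
```

Using the median and MAD instead of the mean and standard deviation keeps the threshold from being dragged upward by the very outlier the rule is meant to find.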

  46. How do the new TED features help?

  47. Ablation Test

  48. if the ablation test hasn't convinced you

  49. The QA Sentence Ranking Task • Introduced by Wang et al. 2007 • Mengqiu Wang, Noah A. Smith and Teruko Mitamura, "What is the Jeopardy Model? A Quasi-Synchronous Grammar for Question Answering," in Proceedings of EMNLP '07, 2007 (nominated for Best Paper Award) • Rank candidate sentences by whether they contain an answer to the question • What sport does Jennifer Capriati play? • YES: Tennis player Jennifer Capriati is 23 • NO: Capriati, 23, was beaten in the second round on Wed. • Essentially an IR ranking task

  50. Using TED for Ranking QA Pairs • Extract features from the edit script that transforms a candidate answer tree into the question tree • Treat it as a binary classification task and train a logistic regression model on these features.
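A rough sketch of the ranking setup: turn an edit script into count features and score it with a logistic function. The slide does not list the actual features, so counting operation types is an assumption, and the weights would have to come from training, which is omitted here.

```python
import math
from collections import Counter


def script_features(ops):
    """Count each operation type in an edit script given as (op, arg) pairs."""
    return Counter(op for op, _ in ops)


def score(features, weights, bias=0.0):
    """Logistic-regression probability that the sentence answers the question."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Candidate sentences can then be sorted by `score` in descending order, so that a script dominated by cheap renames outranks one that needs many insertions once suitable weights have been learned.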
