
D3: Passage Retrieval


Presentation Transcript


  1. D3: Passage Retrieval
  Group 3: Chad Mills, Esad Suskic, Wee Teck Tan

  2. Outline
  • System and Data
  • Document Retrieval
  • Passage Retrieval
  • Results
  • Conclusion

  3. System and Data
  • System: Indri (http://www.lemurproject.org/)
  • Data:

  4. Document Retrieval
  • Baseline:
  • Remove “?”
  • Add Target String
  • MAP: 0.307
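The baseline query construction above can be sketched as follows; the function and its exact string handling are ours, assuming Indri's `#combine` operator for the query:

```python
def baseline_query(question: str, target: str) -> str:
    # Strip the question mark and prepend the target string,
    # wrapped in Indri's #combine operator (sketch of the baseline above).
    return "#combine({} {})".format(target, question.replace("?", ""))

# baseline_query("When was he born?", "Fred Durst")
# → "#combine(Fred Durst When was he born)"
```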

  5. Document Retrieval (best so far: 0.307)
  • Attempted Improvement 1:
  • Settings from Baseline
  • Rewrite “When was…” questions as “[target] was [last word] on” queries
  • MAP: 0.301

  6. Document Retrieval (best so far: 0.307)
  • Attempted Improvement 2:
  • Settings from Baseline
  • Remove “Wh” Words
  • Remove Stop Words
  • Replace Pronoun with Target String
  • MAP: 0.319
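A minimal sketch of this cleaning pass, assuming simple whitespace tokenization; the word lists here are illustrative placeholders, not the deck's actual stop word or pronoun lists:

```python
WH_WORDS = {"who", "what", "when", "where", "which", "why", "how"}
STOP_WORDS = {"is", "was", "were", "did", "the", "a", "an", "of", "in"}  # small illustrative list
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}

def improved_query(question: str, target: str) -> str:
    # Drop wh-words and stop words; substitute the target string for pronouns.
    out = []
    for tok in question.lower().rstrip("?").split():
        if tok in WH_WORDS or tok in STOP_WORDS:
            continue
        out.append(target.lower() if tok in PRONOUNS else tok)
    return "#combine({})".format(" ".join(out))

# improved_query("When was he born?", "Fred Durst")
# → "#combine(fred durst born)"
```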

  7. Document Retrieval (best so far: 0.319)
  • Attempted Improvement 3:
  • Settings from Improvement 2
  • Index Stemmed (Krovetz Stemmer)
  • MAP: 0.336
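Stemming in Indri is an index-time setting; a build-time parameter file along these lines would enable the Krovetz stemmer (paths and corpus class are placeholders, not from the slides):

```xml
<parameters>
  <index>/path/to/stemmed-index</index>
  <corpus>
    <path>/path/to/collection</path>
    <class>trectext</class>
  </corpus>
  <stemmer><name>krovetz</name></stemmer>
</parameters>
```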

  8. Document Retrieval (best so far: 0.336)
  • Attempted Improvement 4:
  • Settings from Improvement 3
  • Remove Punctuation
  • Remove Non-Alphanumeric Characters
  • MAP: 0.374
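This cleanup step amounts to a character filter over the query string; a sketch (the exact replacement rules are ours, not stated in the slides):

```python
import re

def strip_nonalnum(query: str) -> str:
    # Replace every non-alphanumeric character with a space,
    # then collapse runs of whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[^A-Za-z0-9]", " ", query)).strip()

# strip_nonalnum("fred durst's band, limp bizkit?")
# → "fred durst s band limp bizkit"
```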

  9. Document Retrieval (best so far: 0.374)
  • Attempted Improvement 5:
  • Settings from Improvement 4
  • Remove Duplicate Words
  • MAP: 0.377
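Duplicate removal (useful here since target substitution can repeat terms) can be done with a simple order-preserving pass; the function name is ours:

```python
def dedupe_terms(tokens):
    # Drop repeated query terms, keeping the first occurrence in order.
    seen, out = set(), []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

# dedupe_terms("fred durst fred durst born".split())
# → ['fred', 'durst', 'born']
```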

  10. Passage Retrieval
  • Baseline:
  • Out-of-the-box Indri
  • Same Question Formulation
  • Changed "#combine(" to "#combine[passageX:Y]("
  • Passage Window, Top 20, No Re-ranking
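The rewrite from document queries to passage queries is a one-token substitution; a sketch, with the window/increment left as parameters since the deck does not give its X:Y values:

```python
def to_passage_query(doc_query: str, window: int, increment: int) -> str:
    # Swap "#combine(" for Indri's passage-extent form "#combine[passageX:Y](",
    # where X is the window width and Y the increment between windows.
    passage_op = "#combine[passage{}:{}](".format(window, increment)
    return doc_query.replace("#combine(", passage_op, 1)

# to_passage_query("#combine(fred durst born)", 50, 25)
# → "#combine[passage50:25](fred durst born)"
```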

  11. Passage Retrieval
  • Attempted Re-ranking:
  • Mallet MaxEnt Classifier
  • Training Set: TREC 2004
  • 80% Train : 20% Dev, Split by Target to Avoid Cheating
  • e.g. Questions 1.* all in either Train or Dev
  • Labels:
  • + Passage has Correct Answer
  • − Passage doesn’t have Answer
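The target-level split described above prevents questions about the same target from leaking across the train/dev boundary. A sketch, assuming questions are represented as (target, question) pairs, which is our representation, not the deck's:

```python
import random

def split_by_target(questions, train_frac=0.8, seed=0):
    # Split 80/20 at the target level, so every question for a given
    # target (1.1, 1.2, ...) lands on the same side and the classifier
    # never trains on a target it will be evaluated on.
    targets = sorted({t for t, _ in questions})
    random.Random(seed).shuffle(targets)
    cut = int(len(targets) * train_frac)
    train_targets = set(targets[:cut])
    train = [q for q in questions if q[0] in train_targets]
    dev = [q for q in questions if q[0] not in train_targets]
    return train, dev
```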

  12. Passage Retrieval
  • Features used:
  • For both Passage and Question+Target: unigrams, bigrams, trigrams; POS tags (unigram, bigram, trigram)
  • Question/Passage Correspondence: # of Overlapping Terms (and bigrams); Distance between Overlapping Terms
  • Tried Top 20 Passages from Indri, and Expanding to Top 200 Passages
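The correspondence features reduce to set intersections over n-grams; a sketch of the two overlap counts (the distance feature is omitted for brevity, and the function name is ours):

```python
def overlap_features(q_tokens, p_tokens):
    # Question/passage correspondence: counts of shared unigrams
    # and shared bigrams between the question+target and the passage.
    bigrams = lambda toks: set(zip(toks, toks[1:]))
    return {
        "unigram_overlap": len(set(q_tokens) & set(p_tokens)),
        "bigram_overlap": len(bigrams(q_tokens) & bigrams(p_tokens)),
    }

# overlap_features(["fred", "durst", "born"], ["fred", "durst", "was", "born", "in"])
# → {'unigram_overlap': 3, 'bigram_overlap': 1}
```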

  13. Passage Retrieval
  • Result: all re-ranking attempts were worse than before
  • Example confusion matrix: many negative examples; 67–69% accuracy on all feature combinations tried

  14. Passage Re-Ranking
  • Indri was very good to start with
  • E.g. for Q10.1, our first 2 passages were wrong, and only 1 of Indri’s top 5 appeared in our top 5
  • If completely replacing the ranking, the classifier must be very good
  • Many low confidence scores (e.g. 7.6% P(Yes) was the best)
  • A slight edit to Indri’s ranking hurt less, but no good system was found
  • E.g. bump high-confidence Yes passages to the top of the list, leaving the others in Indri order
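The "slight edit" idea above can be sketched as a stable partition over Indri's ranked list; the 0.9 threshold is illustrative (the slides report classifier confidences far below it), and the function name is ours:

```python
def bump_rerank(indri_ranked, p_yes, threshold=0.9):
    # Keep Indri's ordering, but move passages the classifier marks
    # positive with high confidence to the front of the list.
    confident = [p for p in indri_ranked if p_yes.get(p, 0.0) >= threshold]
    rest = [p for p in indri_ranked if p_yes.get(p, 0.0) < threshold]
    return confident + rest

# bump_rerank(["p1", "p2", "p3", "p4"],
#             {"p1": 0.1, "p2": 0.95, "p3": 0.2, "p4": 0.92})
# → ['p2', 'p4', 'p1', 'p3']
```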

  15. Results
  • TREC 2004:
  • TREC 2005:

  16. References
  • Fang, “A Re-examination of Query Expansion Using Lexical Resources”
  • Tellex, “Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering”

  17. Conclusions
  • Cleaned input helped
  • Small, targeted stop word list
  • Minimal settings
  • Indri performs passage retrieval well out of the box
  • A re-ranking implementation needs to be really good to beat it
  • Feature selection didn’t help
  • A slight adjustment to Indri’s ranking, instead of a wholly different ranking, might help
