
3rd Answer Validation Exercise ( AVE 2008) QA subtrack at Cross-Language Evaluation Forum 2008



Presentation Transcript


  1. 3rd Answer Validation Exercise (AVE 2008), QA subtrack at Cross-Language Evaluation Forum 2008. UNED: Anselmo Peñas, Álvaro Rodrigo, Felisa Verdejo. Thanks to: Main task QA organizing committee

  2. Answer Validation Exercise 2008: validate the correctness of real systems' answers. Diagram: Question → Question Answering → Candidate answer + Supporting text → Answer Validation → "Answer is correct" or "Answer is not correct or not enough evidence"

  3. Collections: candidate answers grouped by question

     <q id="116" lang="EN">
       <q_str>What is Zanussi?</q_str>
       <a id="116_1" value="">
         <a_str>was an Italian producer of home appliances</a_str>
         <t_str doc="Zanussi">Zanussi For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought</t_str>
       </a>
       <a id="116_2" value="">
         <a_str>who had also been in Cassibile since August 31</a_str>
         <t_str doc="en/p29/2998260.xml">Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August 31.</t_str>
       </a>
       <a id="116_4" value="">
         <a_str>3</a_str>
         <t_str doc="1618911.xml">(1985) 3 Out of 5 Live (1985) What Is This?</t_str>
       </a>
     </q>

     For each question group:
     • Accept or Reject all answers
     • Select one of the accepted answers
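The answer groups in slide 3 could be read with a few lines of Python. This is a minimal sketch, assuming each `<q>` group is available as a well-formed XML string; the function name `parse_question` and the dictionary keys are hypothetical, and the sample is shortened from the slide.

```python
import xml.etree.ElementTree as ET

# Shortened version of the group shown in slide 3.
SAMPLE = """<q id="116" lang="EN">
  <q_str>What is Zanussi?</q_str>
  <a id="116_1" value="">
    <a_str>was an Italian producer of home appliances</a_str>
    <t_str doc="Zanussi">Zanussi was an Italian producer of home appliances that in 1984 was bought</t_str>
  </a>
  <a id="116_4" value="">
    <a_str>3</a_str>
    <t_str doc="1618911.xml">(1985) 3 Out of 5 Live (1985) What Is This?</t_str>
  </a>
</q>"""


def parse_question(xml_text):
    """Turn one <q> group into a plain dict (hypothetical layout)."""
    q = ET.fromstring(xml_text)
    answers = []
    for a in q.findall("a"):
        t = a.find("t_str")
        answers.append({
            "id": a.get("id"),
            "answer": (a.findtext("a_str") or "").strip(),
            "doc": t.get("doc") if t is not None else None,
            "support": (t.text or "").strip() if t is not None else "",
        })
    return {
        "id": q.get("id"),
        "lang": q.get("lang"),
        "question": q.findtext("q_str"),
        "answers": answers,
    }


group = parse_question(SAMPLE)
for ans in group["answers"]:
    print(ans["id"], "|", ans["answer"])
```

Each answer keeps its supporting snippet next to the answer string, since a validator has to judge the two together.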

  4. Collections
     • Remove duplicated answers inside the same question group
     • Discard NIL answers, void answers and answers whose supporting snippet is too long
     • This processing led to a reduction in the number of answers to be validated
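The filtering in slide 4 can be sketched as follows. The slides do not say how duplicates are detected or what counts as "too long", so two assumptions are made here: duplicates are answers with the same case- and whitespace-normalized string, and the length limit `max_snippet_chars` is a hypothetical parameter with an arbitrary default.

```python
def normalize(text):
    """Lowercase and collapse whitespace (assumed duplicate criterion)."""
    return " ".join(text.lower().split())


def filter_answers(answers, max_snippet_chars=700):
    """Keep one copy of each answer; drop NIL, void and overlong ones.

    `answers` is a list of dicts with "answer" and "support" keys.
    The 700-character default is an assumption, not the AVE value.
    """
    seen = set()
    kept = []
    for ans in answers:
        text = normalize(ans["answer"])
        if text == "" or text == "nil":
            continue  # void or NIL answer
        if len(ans["support"]) > max_snippet_chars:
            continue  # supporting snippet too long
        if text in seen:
            continue  # duplicate inside the same question group
        seen.add(text)
        kept.append(ans)
    return kept


raw = [
    {"answer": "an Italian producer", "support": "short snippet"},
    {"answer": "An  Italian producer", "support": "another snippet"},
    {"answer": "NIL", "support": ""},
    {"answer": "", "support": "some text"},
    {"answer": "a film director", "support": "x" * 1000},
]
kept = filter_answers(raw)
print(len(raw), "->", len(kept))
```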

  5. AVE Collections 2008 (# answers to validate). Available for CLEF participants at nlp.uned.es/clef-qa/ave/

  6. Evaluation
     • Not balanced collections (real world)
     • Approach: Detect if there is enough evidence to accept an answer
     • Measures: Precision, recall and F over ACCEPTED answers
     • Baseline system: Accept all answers
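The measures in slide 6 can be illustrated with a small sketch, assuming the usual definitions: precision is the share of accepted answers that are correct, recall is the share of correct answers that were accepted, and F is their harmonic mean. The function name and the toy data are hypothetical.

```python
def precision_recall_f(accepted, correct):
    """Precision, recall and F over accepted answers.

    `accepted` and `correct` are sets of answer ids.
    """
    true_pos = len(accepted & correct)
    precision = true_pos / len(accepted) if accepted else 0.0
    recall = true_pos / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy unbalanced collection: 10 answers, only 2 of them correct.
all_answers = {f"a{i}" for i in range(1, 11)}
correct = {"a1", "a2"}

# Baseline: accept every answer.
baseline = precision_recall_f(all_answers, correct)

# A hypothetical system that accepts two answers, one of them correct.
system = precision_recall_f({"a1", "a3"}, correct)

print("baseline:", baseline)
print("system:  ", system)
```

On an unbalanced collection the accept-all baseline always reaches full recall, while its precision equals the proportion of correct answers, which is why it serves as a floor for comparison.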

  7. Participants and runs

  8. Evaluation: P, R, F. Precision, recall and F measure over correct answers for English

  9. Additional measures
     • Compare AVE systems with QA systems performance
     • Count the answers SELECTED correctly
     • Reward the detection of groups in which all answers are incorrect
     • Allows a new justified attempt to answer the question (new)

  10. Additional measures
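The formulas behind these additional measures did not survive in this transcript, so the following is only one plausible reading of the bullets in slide 9. It assumes that selection accuracy is the share of questions whose selected answer is correct, that rejection accuracy is the share of questions where the system rejected everything and no answer was in fact correct, and that a rejected question earns a second attempt succeeding at the same rate as the first. All names are hypothetical.

```python
def additional_measures(selected, gold):
    """Selection-based measures over question groups (assumed definitions).

    selected: dict question_id -> selected answer id, or None if the
              system rejected every answer in the group.
    gold:     dict question_id -> set of correct answer ids (may be empty).
    """
    n = len(gold)
    correct_selections = 0
    correct_rejections = 0
    for qid, correct_ids in gold.items():
        choice = selected.get(qid)
        if choice is not None and choice in correct_ids:
            correct_selections += 1
        elif choice is None and not correct_ids:
            correct_rejections += 1
    qa_accuracy = correct_selections / n
    qa_rej_accuracy = correct_rejections / n
    # Assumption: a correctly rejected question gets a new attempt that
    # succeeds with the same probability as a first attempt.
    estimated = qa_accuracy + qa_rej_accuracy * qa_accuracy
    return qa_accuracy, qa_rej_accuracy, estimated


gold = {"q1": {"a1"}, "q2": set(), "q3": {"a6"}, "q4": {"a7"}}
selected = {"q1": "a1", "q2": None, "q3": "a5", "q4": None}
measures = additional_measures(selected, gold)
print(measures)
```

Under this reading, rejecting a whole group is rewarded only when no correct answer was available, which matches the bullet about detecting groups in which all answers are incorrect.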

  11. Evaluation: estimated performance

  12. Comparing AV systems performance with QA systems (English)

  13. Techniques reported at AVE 2007 & 2008
     • 10 reports (2007)
     • 9 reports (2008)

  14. Conclusion (of AVE)
     • Three years of evaluation in a real environment
       • Real systems outputs -> AVE input
     • Developed methodologies
       • Build collections from QA responses
       • Evaluate in chain with a QA Track
       • Compare results with QA systems
     • Introduction of RTE techniques in QA
       • More NLP
       • More Machine Learning
     • New testing collections for the QA and RTE communities
       • In 8 languages, not only English

  15. Many Thanks!!
     • CLEF
     • AVE QA Organizing Committee
     • AVE participants
     • UNED team
