Outline
• Motivations
• Objectives
• QAST 2007: tasks, participants, results
• QAST 2008
• Conclusion

QAST Organization
The evaluation campaign is jointly organized by:
• UPC, Spain (J. Turmo, P. Comas), coordinator
• ELDA, France (N. Moreau, C. Ayache, D. Mostefa)
• LIMSI, France (S. Rosset, L. Lamel)

Motivations
• Much of human interaction takes place through spoken language.
• QA research has developed techniques for written texts, which have correct syntactic and semantic structure.
• Spoken data is very different from textual data: it contains speech phenomena (false starts, speech corrections, truncated words, etc.), the grammatical structure of spontaneous speech is quite distinctive, and there is no punctuation and no capitalization.
• In meetings, interaction creates run-on sentences in which the distance between the first part and the last one can be very long.

Objectives
• In general, motivating and driving the design of novel, robust factual QA architectures for automatic speech transcriptions.
• Comparing the performance of systems dealing with both types of transcriptions and both types of questions (factual and definitional).
• Measuring the loss of each system due to ASR.
• Measuring the loss of each system due to the degradation of ASR output.

QAST 2007: Resources and tasks
Corpora:
• The CHIL corpus: 25 one-hour seminars; spontaneous speech; English spoken by non-native speakers; domain of the lectures: speech and language processing; manual transcription done by ELDA; automatic transcription provided by LIMSI.
• The AMI corpus: 168 meetings (100 hours); spontaneous speech; English; domain of the meetings: design of a television remote control; manual transcription done by AMI; automatic transcription provided by AMI.
Four tasks:
• T1: QA in manual transcriptions of lectures
• T2: QA in automatic transcriptions of lectures
• T3: QA in manual transcriptions of meetings
• T4: QA in automatic transcriptions of meetings

QAST 2007: Development and evaluation
For each task, two sets of questions were provided:
• Development set: lectures: 10 seminars, 50 questions; meetings: 50 meetings, 50 questions.
• Evaluation set: lectures: 15 seminars, 100 questions; meetings: 118 meetings, 100 questions.
Only factual questions were used (no definition questions). Expected answers are named entities (NEs) of the following types: person, location, organization, language, system/method, measure, time, color, shape, material.

QAST 2007: Human judgment
Assessors used QASTLE, an evaluation tool developed by ELDA, to evaluate the data.

Task: Scoring
Four judgments were possible: Correct, Incorrect, Non-Exact, Unsupported.
Two metrics were used:
• Mean Reciprocal Rank (MRR): measures how well a correct answer is ranked, by averaging over all questions the reciprocal of the rank of the first correct answer.
• Accuracy: the fraction of questions for which a correct answer is ranked first in the list of up to 5 answers.
Participants could submit up to 2 submissions per task and up to 5 answers per question.
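The two metrics can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the QASTLE implementation: it uses the usual definitions (reciprocal rank of the first answer judged Correct, 0 if none of the up to 5 answers is correct; accuracy looks only at the first-ranked answer), it assumes Non-Exact and Unsupported answers count as misses, and the function names and judgments are hypothetical.

```python
from typing import List, Sequence

CORRECT = "Correct"   # assumption: only this judgment counts as a hit
MAX_ANSWERS = 5       # up to 5 ranked answers per question


def reciprocal_rank(judgments: Sequence[str]) -> float:
    """1/rank of the first answer judged Correct, or 0.0 if none in the top 5."""
    for rank, judgment in enumerate(judgments[:MAX_ANSWERS], start=1):
        if judgment == CORRECT:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(run: Sequence[Sequence[str]]) -> float:
    """Average reciprocal rank over all questions of a run."""
    if not run:
        return 0.0
    return sum(reciprocal_rank(q) for q in run) / len(run)


def accuracy(run: Sequence[Sequence[str]]) -> float:
    """Fraction of questions whose first-ranked answer is judged Correct."""
    if not run:
        return 0.0
    return sum(1 for q in run if len(q) > 0 and q[0] == CORRECT) / len(run)


if __name__ == "__main__":
    # Made-up judgments for a 4-question run (not QAST results).
    run: List[List[str]] = [
        ["Correct", "Incorrect"],
        ["Incorrect", "Non-Exact", "Correct"],
        ["Unsupported", "Incorrect", "Incorrect"],
        [],  # the system returned no answer
    ]
    print(f"MRR = {mean_reciprocal_rank(run):.3f}")   # MRR = 0.333
    print(f"Accuracy = {accuracy(run):.2f}")          # Accuracy = 0.25
```

MRR rewards a system that finds the right answer anywhere in its top 5, while accuracy only credits the first position, so MRR is always at least as high as accuracy for the same run.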

Participants
Five teams submitted results for one or more QAST tasks:
• CLT, Center for Language Technology, Australia
• DFKI, Germany
• LIMSI, Laboratoire d’Informatique et de Mécanique des Sciences de l’Ingénieur, France
• TOKYO, Tokyo Institute of Technology, Japan
• UPC, Universitat Politècnica de Catalunya, Spain
In total, 28 submission files were evaluated.

Results for CHIL lectures (T1 and T2)

Results for AMI meetings (T3 and T4)

QAST 2008
Extension of QAST 2007:
• 3 languages: French, English, Spanish
• 4 domains: broadcast news, parliament speeches, lectures, meetings
• Different WER levels (10%, 20% and 30%)
• Factual and definition questions
• 5 corpora: CHIL lectures, AMI meetings, TC-STAR05 EPPS English corpus, TC-STAR05 EPPS Spanish corpus, ESTER French broadcast news corpus
• Evaluation from June 15 to June 30
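The WER levels above refer to the word error rate of the ASR output used as input to the QA systems. As a reminder of how such a figure is usually obtained, here is a minimal sketch of the standard definition, WER = (S + D + I) / N (substitutions, deletions and insertions over the number of reference words), computed with a word-level edit distance. It is a generic illustration, not the tool used to score the QAST 2008 transcripts, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words (row-by-row Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Lower-case, unpunctuated text, as in ASR output (made-up sentences).
    ref = "the remote control is yellow"
    print(word_error_rate(ref, "the remote control yellow"))       # 0.2
    print(word_error_rate(ref, "a remote control is yellow and"))  # 0.4
```

One missing word out of five reference words already gives a 20% WER, which suggests how much of the answer-bearing text can be damaged at the 10%, 20% and 30% levels.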

QAST 2008 tasks
• T1a: Question Answering in manual transcriptions of lectures (CHIL corpus)
• T1b: Question Answering in automatic transcriptions of lectures (CHIL corpus)
• T2a: Question Answering in manual transcriptions of meetings (AMI corpus)
• T2b: Question Answering in automatic transcriptions of meetings (AMI corpus)
• T3a: Question Answering in manual transcriptions of broadcast news for French (ESTER corpus)
• T3b: Question Answering in automatic transcriptions of broadcast news for French (ESTER corpus)
• T4a: Question Answering in manual transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T4b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T5a: Question Answering in manual transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)
• T5b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)

Conclusion and future work (1/2)
• We presented the framework of the Question Answering on Speech Transcripts (QAST) evaluation campaigns.
• QAST 2007: 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan), 28 runs.
• Encouraging results.
• High loss in accuracy with ASR output.

Conclusion and future work (2/2)
• QAST 2008 is an extension of QAST 2007 (3 languages, 4 domains, definition and factual questions, multiple ASR outputs with different WERs).
• There is still time to join QAST 2008 (participation is free).
• Future work aims at including cross-lingual tasks, oral questions, and other domains.

For more information
The QAST website: http://www.lsi.upc.edu/~qast/
