
Enhancements in Transcription Techniques: The AMITIES Corpus Progress Report

This report details the current status of the AMITIES project, covering the transcription of approximately 716 English dialogues received from GE Leeds, of which 642 were classified as "good". Transcribers used the Transcriber tool, version 1.4.2. Although most transcriptions adhered to the guidelines, situations such as overlapping speech, acknowledgement, and completion were not represented correctly. A new "exception" section level was introduced to encapsulate these cases, improving annotation quality; the most recent 100 dialogues are annotated with this level.







  1. The AMITIÉS Corpus: up-to-the-minute report

  2. The GE English corpus • Around 716 English dialogues have been received so far from GE Leeds, of which 642 are "good ones". • The GE transcribers use the Transcriber tool, version 1.4.2, to deliver (*.TRS) documents based on an XML syntax.
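For reference, a .TRS file is plain XML. The sketch below shows its general shape; the element names (Trans, Speakers, Episode, Section, Turn) follow the Transcriber DTD, but the speakers, timings, and attribute values are invented for illustration:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Trans SYSTEM "trans-13.dtd">
<Trans scribe="transcriber1" audio_filename="dialogue_001" version="1">
  <Speakers>
    <Speaker id="spk1" name="A"/>
    <Speaker id="spk2" name="C"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="12.5">
      <Turn speaker="spk1" startTime="0" endTime="4.2">
        <Sync time="0"/>
        my name's Louise Mr Smith and you want to change address?
      </Turn>
    </Section>
  </Episode>
</Trans>
```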

  3. Good things • Being XML-based, the TRS documents are very suitable for automatic processing and for delivering the format we are interested in (DAMSL-like, for example). • The transcribers successfully applied the AMITIES transcription guidelines.

  4. Issues • The transcribers started transcribing the audio files using the Turn and Utterance annotation levels provided by the Transcriber tool. • We noticed that some unusual situations, such as overlapping, acknowledgement, and completion, failed to be represented correctly in the received TRS documents.

  5. Solution and examples • Make use of the third logical annotation level provided by Transcriber, called Section. • The transcribers were required to create a new Section level called "exception" and to use it to encapsulate all the Turns containing one of the situations described above.
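The exception Sections can then be pulled out of a TRS document automatically. The following is a minimal sketch, assuming the "exception" marker lands in the Section's type attribute (in a real TRS file it might instead appear as a topic name); the sample dialogue content is invented:

```python
import xml.etree.ElementTree as ET

# Invented TRS-like sample: one default Section and one "exception" Section.
TRS_SAMPLE = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="10">
      <Turn speaker="spk1" startTime="0" endTime="5">And your telephone number please?</Turn>
    </Section>
    <Section type="exception" startTime="10" endTime="20">
      <Turn speaker="spk2" startTime="10" endTime="15">11111</Turn>
      <Turn speaker="spk1" startTime="15" endTime="20">Uh hmmm</Turn>
    </Section>
  </Episode>
</Trans>"""

def exception_turns(trs_xml):
    """Return (speaker, text) pairs for every Turn inside an 'exception' Section."""
    root = ET.fromstring(trs_xml)
    turns = []
    for section in root.iter("Section"):
        if section.get("type") == "exception":
            for turn in section.iter("Turn"):
                text = "".join(turn.itertext()).strip()
                turns.append((turn.get("speaker"), text))
    return turns

print(exception_turns(TRS_SAMPLE))
# → [('spk2', '11111'), ('spk1', 'Uh hmmm')]
```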

  6. Example of overlapping

  BEFORE using the "exception" section (DAMSL-like annotation):
  A: That's
  A: [lovely](1)
  C: [Hello](1)
  A: my name's Louise Mr Smith and you want to change address?

  AFTER using the "exception" section (DAMSL-like annotation):
  A: That's [lovely](1) my name's Louise Mr Smith and you want to change address?
  C: [Hello](1)

  7. Example of acknowledging similar to completion

  BEFORE using the "exception" section (DAMSL-like annotation):
  A: And your telephone number please?
  C: 11111
  A: Uh hmmm
  C: 111
  A: Uh hmmm
  C: 111111

  AFTER using the "exception" section (DAMSL-like annotation):
  A: And your telephone number please?
  C: 11111 [](1) 111 [](2) 111111
  A: [Uh hmmm](1) [Uh hmmm](2)
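The co-indexed brackets in the DAMSL-like notation above can be recovered mechanically. A small sketch (the function name and the grouping-by-index behaviour are my own illustration, not part of the AMITIÉS tooling):

```python
import re

# Matches the DAMSL-like notation used in the examples: [text](index)
BRACKET = re.compile(r"\[(.*?)\]\((\d+)\)")

def paired_spans(utterances):
    """Group co-indexed [text](n) spans across speakers.

    utterances: list of (speaker, text) pairs.
    Returns {index: [(speaker, span_text), ...]}.
    """
    groups = {}
    for speaker, text in utterances:
        for span, idx in BRACKET.findall(text):
            groups.setdefault(int(idx), []).append((speaker, span))
    return groups

dialogue = [
    ("A", "That's [lovely](1) my name's Louise Mr Smith and you want to change address?"),
    ("C", "[Hello](1)"),
]
print(paired_spans(dialogue))
# → {1: [('A', 'lovely'), ('C', 'Hello')]}
```

Empty spans such as `[](1)` in the acknowledgement example also match, marking where the other speaker's backchannel attaches.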

  8. Additional facts • The Turns that were not considered exceptions were encapsulated by the default Section. • We trained the transcribers to use this logical level, and the last 100 dialogues received are annotated with the "exception" level. • The remaining 542 good dialogues are not yet annotated with this level.

  9. A rough classification of the corpus

  10. Task distribution inside the 100 exception annotated dialogues

  11. Thank you.
