Transcription methods for consistency, volume and efficiency
Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li
Outline
• Introduction
• Manual transcription overview
  • Approaches, genres, languages
• Inter-transcriber consistency analysis
• Results
• Discussion
• Conclusions and future work
Introduction
• The Linguistic Data Consortium (LDC) supports language-related education, research and technology development
• Programs that call for manual transcription include DARPA GALE, Phanotics, and NIST LRE, SRE, and RT
• Transcription is a core component of many HLT research tasks, such as machine translation
• Manual transcription efforts have been undertaken in many languages, including Chinese, English, Modern Standard Arabic, Arabic dialects, Pashto, Urdu, Farsi, Korean, Thai, Russian, and Spanish
• LDC recommends transcription approaches for individual projects by balancing efficiency, cost, and program needs
  • Evaluation vs. training data? Genre? Timeline?
• The current consistency study is informative for transcription teams
  • Could be used to establish baseline human performance for each task
Transcription spectrum overview
• All manual transcripts share the same core elements:
  • Time alignment at some level of granularity
  • Speaker identification
  • Transcript
• Created in XTrans, LDC's in-house transcription tool
• Transcription methodologies target a range of data needs:
  • Small volumes (2 hours) vs. large volumes (thousands of hours)
  • Quick content transcript vs. meticulous transcription of all speaker utterances and noises
  • Automatic time alignment vs. manual word- or phoneme-level alignment
Transcription consistency studies
• Design overview
  • Pair-wise comparison of independent dual transcripts
  • Identical time alignment
  • Scored with NIST's SCLITE toolkit (Fiscus, 2006)
  • For the English subset, differences adjudicated with an in-house adjudication tool
• Background study: EARS RT03
  • English broadcast news (BN) and conversational telephone speech (CTS)
  • Careful transcription approach
  • All discrepancies adjudicated
Results of RT03 study
• Error is expressed as "Word Disagreement Rate", though it is derived from Word Error Rate
• Not all transcription "errors" are truly mistakes, so "disagreement" is the more accurate term (Strassel, 2004)
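Word Disagreement Rate follows the same arithmetic as Word Error Rate: align the two transcripts by minimum edit distance over words, then divide the total substitutions, insertions, and deletions by the length of one transcript. A minimal sketch of that computation (an illustrative helper, not LDC's actual SCLITE-based scoring pipeline):

```python
def word_disagreement_rate(ref, hyp):
    """Edit-distance disagreement between two transcripts of the same audio.

    Same formula as Word Error Rate, (S + D + I) / N, but since both
    transcripts are independent peers, neither is "truth" -- hence
    "disagreement" rather than "error".
    """
    r, h = ref.split(), hyp.split()
    # Standard word-level Levenshtein dynamic program.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(r)][len(h)] / len(r)
```

For example, comparing "a b c d" against "a x c" yields one substitution and one deletion over four words, a disagreement rate of 0.5.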
Current consistency study
• Basic assumptions
  • Same as the previous study
  • For the purposes of the current study, LDC ignored stylistic differences such as capitalization and punctuation
• A subset of the English transcripts was further analyzed using LDC's in-house adjudication tool:
  • Approximately 65% of the differences across all the English quick transcripts were labeled insignificant differences
  • Approximately 20% were labeled judgment calls
  • Approximately 15% were labeled transcriber errors
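Ignoring stylistic differences before scoring amounts to a normalization pass over both transcripts. A minimal sketch of one plausible such pass (the exact normalization rules LDC applied are an assumption here):

```python
import re
import string

def normalize(transcript):
    """Strip stylistic variation -- case, punctuation, extra whitespace --
    so that only word-level differences remain to be scored."""
    text = transcript.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()
```

Under this normalization, "Hello, World!" and "hello world" score as identical, so purely stylistic variants no longer inflate the disagreement rate.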
Genre effect
• Spontaneous, conversational genres show lower consistency overall than broadcast news
  • BN is often highly scripted, with limited overlapping speech
• Unplanned speech is hard!
  • Overlapping speech or cross-talk
  • Background noise
  • Fast speech
  • More regions of disfluency
  • Dialect or accent
• The meeting, interview and telephone domains also add:
  • Topic-specific jargon
  • Challenging acoustic conditions
Disfluencies
• Regions of disfluency are by far the most prevalent contributors to transcriber disagreement:
  • Hesitation sounds
  • Stammering
  • Restarts
Dialect
• Arabic transcription typically targets Modern Standard Arabic (MSA)
• Dialect poses a particular challenge in transcribing Arabic conversations
  • Real data contains significant volumes of dialectal Arabic, especially in the broadcast conversation domain
  • Transcribers may differ in their rendering of non-MSA regions
• In the following examples, non-MSA regions are underlined and discrepancies are highlighted
Conclusions and future work
• Cross-language, cross-genre inter-annotator analysis showed agreement in the 90-100% range
  • Transcripts of planned speech are generally more consistent than those of spontaneous speech
  • Careful transcription methods yield higher agreement than quick transcription methods
  • Agreement is strongly conditioned by genre and language
• When choosing a transcription approach for a project, LDC must balance efficiency, consistency, and cost
• Further investigation of the Chinese results is needed
• Future work:
  • Study on larger datasets
  • Examination of the selection sample
• The data discussed here will ultimately be compiled into one corpus and distributed via the LDC catalog
Acknowledgments
• Many thanks to the LDC transcription team for their hard work and analysis for the 2010 consistency study
• Thanks to Jonathan Fiscus for his guidance in running SCLITE
• This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this paper does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.