
Context in Multilingual Tone and Pitch Accent Recognition

Gina-Anne Levow, University of Chicago. September 7, 2005.


Presentation Transcript


  1. Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005

  2. Roadmap • Motivating Context • Data Collections & Processing • Modeling Context for Tone and Pitch Accent • Context in Recognition • Conclusion

  3. Challenges • Tone and Pitch Accent Recognition • Key component of language understanding • Lexical tone carries word meaning • Pitch accent carries semantic, pragmatic, discourse meaning • Non-canonical form (Shen 90, Shih 00, Xu 01) • Tonal coarticulation modifies surface realization • In extreme cases, fall becomes rise • Tone is relative • To speaker range • High for male may be low for female • To phrase range, other tones • E.g. downstep

  4. Strategy • Common model across languages, SVM classifier • Acoustic-prosodic model: no word label, POS, lexical stress info • No explicit tone label sequence model • English, Mandarin Chinese (also Cantonese) • Exploit contextual information • Features from adjacent syllables • Height, shape: direct, relative • Compensate for phrase contour • Analyze impact of • Context position, context encoding, context type • > 20% relative improvement over no context • Preceding context greater enhancement than following

  5. Data Collection & Processing • English: (Ostendorf et al, 95) • Boston University Radio News Corpus, f2b • Manually ToBI annotated, aligned, syllabified • Pitch accent aligned to syllables • Unaccented, High, Downstepped High, Low • (Sun 02, Ross & Ostendorf 95) • Mandarin: • TDT2 Voice of America Mandarin Broadcast News • Automatically force aligned to anchor scripts (CUSonic) • High, Mid-rising, Low, High falling, Neutral

  6. Local Feature Extraction • Uniform representation for tone, pitch accent • Motivated by Pitch Target Approximation Model • Tone/pitch accent target exponentially approached • Linear target: height, slope (Xu et al, 99) • Scalar features: • Pitch, Intensity max, mean (Praat, speaker normalized) • Pitch at 5 points across voiced region • Duration • Initial, final in phrase • Slope: • Linear fit to last half of pitch contour
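The per-syllable features listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the author's extraction code: it assumes the pitch contour of the voiced region is already available as a non-empty list of speaker-normalized values (the slides obtain pitch from Praat), and the function names `sample_points`, `linear_slope`, and `local_features` are hypothetical.

```python
from statistics import mean

def sample_points(contour, n=5):
    """Pick n evenly spaced pitch values across a non-empty voiced region."""
    if len(contour) < 2:
        return [contour[0]] * n
    last = len(contour) - 1
    return [contour[round(i * last / (n - 1))] for i in range(n)]

def linear_slope(values):
    """Least-squares slope of the values against their frame index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

def local_features(contour):
    """Scalar pitch features plus the slope of the last half of the contour."""
    last_half = contour[len(contour) // 2:]
    return {
        "pitch_max": max(contour),
        "pitch_mean": mean(contour),
        "pitch_points": sample_points(contour),
        "slope": linear_slope(last_half),
    }
```

Intensity, duration, and phrase-position features would be added to the same dictionary; they are omitted here to keep the sketch short.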

  7. Context Features • Local context: • Extended features • Pitch max, mean, adjacent points of preceding, following syllables • Difference features • Difference between • Pitch max, mean, mid, slope • Intensity max, mean • Of preceding, following and current syllable • Phrasal context: • Compute collection average phrase slope • Compute scalar pitch values, adjusted for slope
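The two context encodings and the phrasal compensation can be illustrated with a short sketch. Several details are assumptions rather than facts from the slides: the feature dictionaries and key names are hypothetical, the difference is taken as current minus neighbor (the slide does not state the sign), and the compensation simply subtracts the collection-average phrase slope times the syllable's offset into the phrase.

```python
KEYS = ("pitch_max", "pitch_mean")

def extended_features(cur, prev=None, nxt=None, keys=KEYS):
    """Extended encoding: copy neighbor values into the current feature set."""
    feats = dict(cur)
    for tag, neighbor in (("prev", prev), ("next", nxt)):
        if neighbor is not None:
            for k in keys:
                feats[f"{tag}_{k}"] = neighbor[k]
    return feats

def difference_features(cur, prev=None, nxt=None, keys=KEYS):
    """Difference encoding: current value minus neighbor value (sign assumed)."""
    feats = dict(cur)
    for tag, neighbor in (("prev", prev), ("next", nxt)):
        if neighbor is not None:
            for k in keys:
                feats[f"d_{tag}_{k}"] = cur[k] - neighbor[k]
    return feats

def compensate_phrase_slope(pitch, offset, avg_slope):
    """Remove the expected phrase-level drift at a given offset into the phrase."""
    return pitch - avg_slope * offset
```

Passing only `prev` or only `nxt` gives the preceding-only and following-only conditions; passing both or neither gives the remaining two context positions compared in the experiments.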

  8. Classification Experiments • Classifier: Support Vector Machine • Linear kernel • Multiclass formulation • SVMlight (Joachims), LibSVM (Chang & Lin 01) • 4:1 training / test splits • Experiments: Effects of • Context position: preceding, following, none, both • Context encoding: Extended/Difference • Context type: local, phrasal
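A 4:1 training/test split can be sketched in a few lines. The slide does not say how the splits were drawn, so the seeded shuffle here is an assumption, and `split_4_to_1` is a hypothetical helper name.

```python
import random

def split_4_to_1(samples, seed=0):
    """Shuffle the samples and return (train, test) in a 4:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]
```

The training portion would then be passed to a linear-kernel multiclass SVM trainer such as the ones named on the slide.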

  9. Results: Local Context

  10. Results: Local Context

  11. Results: Local Context

  12. Discussion: Local Context • Any context information improves over none • Preceding context information consistently improves over none or following context information • English: Generally more context features are better • Mandarin: Following context can degrade • Little difference in encoding (Extended vs Difference) • Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory

  13. Results & Discussion: Phrasal Context • Phrase contour compensation enhances recognition • Simple strategy • Use of non-linear slope compensation may improve results

  14. Conclusion • Employ common acoustic representation • Tone (Mandarin), pitch accent (English) • Cantonese, recent experiments • SVM classifiers - linear kernel: 76%, 81% • Local context effects: • Up to > 20% relative reduction in error • Preceding context greatest contribution • Carryover vs anticipatory • Phrasal context effects: • Compensation for phrasal contour improves recognition
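For readers unfamiliar with the term, "relative reduction in error" compares error rates rather than accuracies. The sketch below shows the arithmetic; the accuracies in the usage note are hypothetical values chosen for illustration and are not results from the talk.

```python
def relative_error_reduction(acc_baseline, acc_context):
    """Fraction of the baseline error removed by the improved system."""
    err_baseline = 1.0 - acc_baseline
    err_context = 1.0 - acc_context
    return (err_baseline - err_context) / err_baseline
```

With a hypothetical no-context accuracy of 0.60 and a with-context accuracy of 0.68, the error falls from 0.40 to 0.32, which is a relative reduction of about 20%.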

  15. Current & Future Work • Application of model to different languages • Cantonese, Dschang (Bantu family) • Cantonese: ~65% acoustic only, 85% w/segmental • Integration of additional contextual influence • Topic, turn, discourse structure • HMSVM, GHMM models • http://people.cs.uchicago.edu/~levow/projects/tai • Supported by NSF Grant #: 0414919

  16. Confusion Matrix (English)

  17. Confusion Matrix (Mandarin)

  18. Related Work • Tonal coarticulation: • Xu & Sun 02; Xu 97; Shih & Kochanski 00 • English pitch accent: • X. Sun 02; Hasegawa-Johnson et al 04; Ross & Ostendorf 95 • Lexical tone recognition: • SVM recognition of Thai tone: Thubthong 01 • Context-dependent tone models: • Wang & Seneff 00, Zhou et al 04

  19. Pitch Target Approximation Model • Pitch target: • Linear model: • Exponentially approximated: • In practice, assume target well-approximated by mid-point (Sun, 02)
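The equations on this slide did not survive in the transcript. As commonly stated in the Pitch Target Approximation literature, the model takes roughly the form below; the symbols $a$, $b$, $\beta$, and $\lambda$ are the usual ones from that literature and are an assumption here, not text recovered from the slide.

```latex
% Linear pitch target with slope a and height b
T(t) = a t + b

% Surface pitch approaches the target exponentially over the syllable,
% with \beta the initial distance from the target and \lambda \ge 0 the rate
y(t) = \beta e^{-\lambda t} + a t + b
```

As $t$ grows, the exponential term decays and $y(t)$ converges on the target line.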
