
Automatic Prosody Labeling Final Presentation






  1. Automatic Prosody Labeling: Final Presentation • Andrew Rosenberg • ELEN 6820 - Speech and Audio Processing and Recognition • 4/27/05

  2. Overview • Project Goal • ToBI standard for prosodic labeling • Previous Work • Method • Results • Conclusion

  3. Project Goal: • Automatic assignment of tones tier elements • Given the waveform, orthographic and break index tiers, predict a subset/simplification of elements in the tones tier. • Distinct experiments for determining each of pitch accents, phrase tones, and phrase boundary tones

  4. ToBI Annotation • The Tones and Break Indices (ToBI) labeling scheme consists of a speech waveform and 4 tiers: • Tones • Annotation of pitch accents and phrasal tones • Orthographic • Transcription of the text • Break Index • Perceived juncture between words, rated on a scale from 0-4 • Miscellaneous • Notes about the annotation (e.g., ambiguities, non-speech noise)
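The four tiers above can be sketched as a small data structure. This is an illustrative sketch only: the class and field names (`ToBIAnnotation`, `Tone`, `Word`, `break_index`) are invented here, and the break index tier is folded into the word entries for convenience rather than kept as a separate tier.

```python
from dataclasses import dataclass, field

@dataclass
class Tone:
    time: float   # event time in seconds, aligned to the waveform
    label: str    # e.g. "H*", "L+H*", "L-L%"

@dataclass
class Word:
    start: float
    end: float
    text: str          # orthographic tier entry
    break_index: int   # ToBI break index after this word, 0-4

@dataclass
class ToBIAnnotation:
    tones: list = field(default_factory=list)   # tones tier: pitch accents / phrasal tones
    words: list = field(default_factory=list)   # orthographic + break index tiers
    misc: list = field(default_factory=list)    # miscellaneous tier: annotator notes

# A toy two-word annotation: "made" carries an H* pitch accent,
# and the phrase ends with an L-L% boundary tone (break index 4).
ann = ToBIAnnotation(
    tones=[Tone(0.42, "H*"), Tone(1.10, "L-L%")],
    words=[Word(0.30, 0.65, "made", 1), Word(0.65, 1.15, "it", 4)],
)
```

Folding the break index into each word mirrors how the indices are used later: as word-level attributes given as input alongside the orthography.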

  5. ToBI Transcription Example

  6. ToBI Examples • Pitch Accents (made3.wav): • H*, L*, L+H* • Boundary Tones (money.wav): • L-H%, H-H%, L-L%, H-L%, (H-, L-)

  7. Previous Work • Ross: “Prediction of abstract prosodic labels for speech synthesis” 1996 • BU Radio News Corpus (~48 minutes) • Public news broadcasts spoken by 7 speakers • Uses decision tree output as input to an HMM for pitch accent identification; decision trees alone for phrase/boundary tone identification • Employs no acoustic features • Narayanan: “An Automatic Prosody Recognizer using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model” 2005 • BU Radio News Corpus • Detects stressed syllables (collapsed ToBI labels) and all boundaries • Uses a coupled HMM (CHMM) over pitch, intensity, and duration to track these “asynchronous” acoustic features, plus a trigram POS/stress-boundary language model • Wightman: “Automatic Labeling of Prosodic Patterns” 1994 • Single-speaker subset of BNC and an ambiguous-sentence corpus (read speech) • Like Ross, uses decision tree output as input to an HMM • Uses many acoustic features

  8. Method • JRip • Classification rule learner • Handles nominal attributes well • Produces easily readable output • Corpus • Boston Directions Corpus • 4 speakers • ~65 minutes of semi-spontaneous speech • Original Plan: HMMs and SVMs • SVMs took a prohibitive amount of time to train and performed worse • HMM implementation problems, and not enough time to implement my own

  9. Method - Features • Min, max, mean, std. dev. of F0 and intensity • # syllables, duration, approx. vowel length, POS • F0 slope (weighted) • z-score of max F0 and intensity • Phrase-length F0, intensity, and vowel-length features • Phrase position
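The word-level pitch statistics listed above can be sketched as follows. This is a minimal illustration of the feature types, not the actual implementation: the function names are invented, frame indexing stands in for real time stamps, and frames with F0 = 0 are assumed unvoiced (at least one voiced frame is assumed per word).

```python
import numpy as np

def word_pitch_features(f0):
    """Summarize an F0 contour over one word: min, max, mean, std. dev.,
    and a linear slope over the voiced frames. A sketch of the feature
    types listed above, not the exact features used."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0                    # assume 0 Hz frames are unvoiced
    v = f0[voiced]
    t = np.arange(len(f0))[voiced]     # frame indices stand in for time
    slope = np.polyfit(t, v, 1)[0] if len(v) > 1 else 0.0
    return {"min": v.min(), "max": v.max(),
            "mean": v.mean(), "std": v.std(), "slope": slope}

def phrase_zscore(x, phrase_mean, phrase_std):
    """z-score of a word-level value against phrase-level statistics,
    normalizing away per-phrase pitch range and level."""
    return (x - phrase_mean) / phrase_std

feats = word_pitch_features([0, 180, 190, 200, 210, 0])
```

The phrase-level z-scores are what let a single rule like "high max F0 relative to the phrase" generalize across speakers and phrases with different overall pitch ranges.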

  10. Results - Tasks • Pitch Accent • Identification • Detection • Phrase Tone identification • Boundary Tone identification • Phrase/Boundary Tone • Identification • Detection
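The detection/identification split in the task list above can be made concrete: detection is a binary accented-vs-unaccented decision, while identification predicts which label. The accent inventory below is an assumed illustrative subset, not the exact label set used in the experiments.

```python
# Assumed subset of ToBI pitch accent labels, for illustration only.
PITCH_ACCENTS = {"H*", "L*", "L+H*", "L*+H"}

def detection_target(tone_label):
    """Detection task: is the word accented at all?"""
    return tone_label in PITCH_ACCENTS

def identification_target(tone_label):
    """Identification task: which accent type (None if unaccented)."""
    return tone_label if tone_label in PITCH_ACCENTS else None
```

The same split applies to phrase/boundary tones: detection asks whether a boundary tone is present at a word, identification asks which tone it is.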

  11. Results - Pitch Accent Identification • Accuracy • Relevant Features • # syllables, duration (previous 2), vowel length (prev, next 2), POS, max & std. dev. F0, slope F0, max & std. dev. intensity, z-score of F0, phrase-level z-scores of F0 and intensity • *Ross identifies a different subset of ToBI pitch accents

  12. Results - Pitch Accent Detection • Baseline: 58.9% • On BNC, human agreement of 91%; in general, 86-88% • Identical relevant features as the identification task

  13. Results - Phrase Tone • Accuracy • Relevant Features • Duration of next word; max, min, mean F0 • Linear-slope F0, z-score of intensity, phrase z-scores of F0 and intensity

  14. Results - Boundary Tone Identification • Accuracy • Relevant Features • Quadratically weighted F0 slope
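A "quadratically weighted F0 slope" can be realized as a weighted least-squares line fit in which later frames count more, quadratically in position, which emphasizes the contour movement near the end of the word where boundary tones are realized. The exact weighting scheme used in the experiments is not specified here, so this sketch is one plausible reading.

```python
import numpy as np

def weighted_f0_slope(f0, power=2):
    """Weighted least-squares slope of an F0 contour. Weights grow as
    (position + 1) ** power, so late frames dominate the fit; power=2
    gives a quadratic weighting. The exact weighting used in the
    original experiments is an assumption here."""
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0))
    w = (t + 1.0) ** power
    # np.polyfit's `w` multiplies each residual before squaring,
    # so larger w means that frame constrains the fit more.
    return np.polyfit(t, f0, 1, w=w)[0]
```

For a perfectly linear contour the weighting is irrelevant (the fit is exact), but for a contour that rises only at the end the weighted slope comes out larger than the plain slope, which is exactly the sensitivity a boundary tone feature needs.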

  15. Results - Phrase/Boundary Tone Identification • Accuracy • Relevant Features • Duration of next two words, POS (current and next 2), max, mean, and slope (all weightings) of F0, mean intensity, phrase z-scores of F0 and intensity • z-score of the difference between the max intensity of the current word and that of the phrase

  16. Results – Phrase/Boundary Tone Detection • Accuracy • Human agreement (in general): 95% • Best agreement: 93.0% over 77% baseline • Relevant Features • Vowel length (current and next word) • POS of the next word

  17. Conclusion • Relatively low-tech acoustic features and ML algorithms can perform competitively with more complicated NLP approaches • Break index information was not as helpful as initially suspected • Potential Improvements: • Sequential modeling (HMM) • Different features • More sophisticated pitch contour features • Content-based features (similar to Ross)
