
Intonation and Multi-Language Scenarios


Presentation Transcript


  1. Intonation and Multi-Language Scenarios. Andrew Rosenberg, Candidacy Exam Presentation, October 4, 2006

  2. Talk Overview • Use and Meaning of Intonation • Automatic Analysis of Intonation • “Multi-Language Scenarios” • Second Language Learning Systems • Speech-to-Speech Translation

  3. Use and Meaning of Intonation • Why do multi-language scenarios need intonation? • Intonation indicates focus and contrast • Intonation disambiguates meaning • Intonation indicates how language is being used • Discourse structure, Speech acts, Paralinguistics

  4. Examples of Intonational Features: ToBI Examples • Categorical Features • Pitch Accent • H* - Mariana(H*) won it • L+H* - Mariana(L+H*) won it • L* - Will you have marmalade(L*) or jam(L*)? • Phrase Boundaries • Intermediate Phrase Boundary (3) • Intonational Phrase Boundary (4) • Oh I don’t know (4) it’s got oregano (3) and marjoram (3) and some fresh basil (4) • Continuous Features • Pitch • Intensity • Duration
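
The continuous features above are typically extracted directly from the waveform before any categorical (ToBI-style) labeling. Below is a minimal sketch of such extraction in Python using librosa; the file name, sample rate, and pitch range are illustrative assumptions, not details from the talk.

```python
import numpy as np
import librosa

# Load an utterance; "utterance.wav" is a placeholder path.
y, sr = librosa.load("utterance.wav", sr=16000)

# Pitch: F0 track via the pYIN algorithm (range chosen for adult speech).
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75.0, fmax=400.0, sr=sr)

# Intensity proxy: frame-level root-mean-square energy.
rms = librosa.feature.rms(y=y)[0]

# Duration of the utterance in seconds.
duration = len(y) / sr

print("mean F0 over voiced frames:", np.nanmean(f0))  # f0 is NaN when unvoiced
print("mean RMS energy:", rms.mean())
print("duration (s):", duration)
```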

  5. Use and Meaning of Intonation: Paper List • Emphasis • Accent is Predictable (If You’re a Mind Reader) Bolinger, 1972 • Prosodic Analysis and the Given/New Distinction Brown, 1983 • The Prosody of Questions in Natural Discourse Hedberg and Sosa, 2002 • Syntax • The Use of Prosody in Syntactic Disambiguation Price et al., 1991 • Discourse Structure • Prosodic Analysis of Discourse Segments in Direction Giving Monologues Hirschberg and Nakatani, 1996 • Paralinguistics • Acoustic Correlates of Emotion Dimension in View of Speech Synthesis Schröder et al., 2001

  6. Accent is Predictable (If You're a Mind Reader). Dwight Bolinger, 1972. Harvard University • Nuclear Stress Rule • Stress is assigned to the rightmost stress-able vowel in a major constituent • (Chomsky and Halle, 1968) “Once the speaker has selected a sentence with a particular syntactic structure and certain lexical items...the choice of stress contour is not a matter subject to further independent decision” • Selected Counterexamples to the NSR • Coordinated infinitives can be accented or not • I have a clock to clean and oil v. I still have most of the garden to weed and fertilize • Terminal prepositions are rarely accented • I need a light to read by • Focus v. Topic v. Comment • Why are you coming indoors? -- I’m coming indoors because the sun is shining • Predictable or less semantically rich items are less likely to be accented • I have a point to make v. I have a point to emphasize • I’ve got to go see a guy v. I’ve got to go see a friend [semi-pronouns?]

  7. Prosodic Analysis and the Given/New Distinction. Gillian Brown, 1983 • Information Structure of Discourse Entities (Prince, 1981) • Given (or Evoked) Information: “recoverable either anaphorically or situationally” (Halliday, 1967) • New Information: “non-recoverable...” • Inferable Information: e.g. driver is inferable given bus • Experiment • One subject was asked to describe a diagram to another who would reproduce it. • Entities are marked as new/brand-new, new/inferred, evoked/context (pen, paper, etc.), evoked/current (most recent mention) or evoked/displaced (previously mentioned) • Prominence realizations • 87% of new/brand-new and 79% of new/inferred entities • 2% of evoked/context, 0% of evoked/current, 4% of evoked/displaced

  8. The Prosody of Questions in Natural Discourse. Nancy Hedberg and Juan Sosa, 2002. Simon Fraser University • Accenting behavior in question types • Wh-Questions (whq) vs. Yes/No Questions (ynq) in spontaneous speech from “McLaughlin Group” and “Washington Week” • The “locus of interrogation” • Either the wh-word or the fronted auxiliary verb • Where are you? • Do you like pie? • Wh-words are often accented with L+H* and rarely deaccented • Ynqs show no consistent accenting behavior • 70.5% of positive ynqs deaccent • 88% of negative ynqs use L+H* • Nuclear Tune • Whqs are produced with falling intonation 80% of the time • Only 34% of ynqs are produced with rising intonation • Topic Pitch Accent • The topics of both whqs and ynqs are less often accented with L+H* than the locus of interrogation

  9. The Use of Prosody in Syntactic Disambiguation. Patti Price (SRI), Mari Ostendorf (Boston University), Stefanie Shattuck-Hufnagel (MIT), Cynthia Fong (Boston University), 1991 • Relationship between syntax and intonation. • Methodology • 7 types of syntactically ambiguous sentences spoken by 4 professional radio announcers • Ambiguous sentences were produced within disambiguating paragraphs. • The speakers were not informed of the sentence of interest and only produced one version per session. • Subjects selected the more appropriate surrounding context. • Subjects only rated one version per session. • Analysis • Manual labeling of phrase breaks and accents [not ToBI] • Phrase breaks and their relative size differentiate the two versions. • Characterized by lengthening, pauses and boundary tones

  10. Example Syntactic Ambiguities • Parentheticals v. non-parenthetical clause • [Mary knows many languages,][you know] • [Mary knows many languages (that) you (also) know] • Apposition v. attached NP • [Only one remembered,][the lady in red] • [Only one remembered the lady in red] • Main clauses w/ coordinating conjunction v. main and subordinate clause • [Jane rides in the van][and Ella runs] • [Jane rides in the van Ann Della runs] • Tag question v. attached NP • [Mary and I don’t believe][do we?] • [Mary and I don’t believe Dewey.]

  11. Example Syntactic Ambiguities • Far v. Near attachment of final phrase • [Raoul murdered the man][with the gun] (Raoul has a gun) • [Raoul murdered [the man with the gun]] (the man has a gun) • Left v. Right attachment of middle phrase • [When you learn gradually][you worry more] • [When you learn][gradually you worry more] • Particles v. Prepositions • [They may wear down the road] (the treads hurt the road) • [They may wear][down the road] (the treads erode)

  12. Prosodic Analysis of Discourse Segments in Direction Giving Monologues. Julia Hirschberg (AT&T Labs) and Christine Nakatani (Harvard University), 1996 • Intonation is used to indicate discourse structure • Is a speaker beginning a topic? Ending one? • Entails a broader information (linguistic, attentional, intentional) structure than given/new entity status. • Boston Directions Corpus • Manual ToBI annotation and discourse segmentation • Acoustic-prosodic correlates of discourse segment initial, medial and final phrases. • Segment Initial v. Non-initial • Higher max, mean F0 and energy. • Longer preceding and shorter following pauses • Increases in F0 and energy from the previous phrase • Segment Medial v. Final • Medial has a slower speaking rate and a shorter subsequent pause • Relative increase in F0 and energy from the previous phrase
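
A rough sketch of the phrase-level measurements the study correlates with discourse position; the function name and input format (per-phrase F0/energy arrays plus following-pause durations) are assumptions for illustration.

```python
import numpy as np

def phrase_cues(f0_by_phrase, energy_by_phrase, pause_after):
    """Compute per-phrase max/mean F0 and energy, the following pause,
    and the change relative to the previous phrase."""
    cues, prev = [], None
    for f0, en, pause in zip(f0_by_phrase, energy_by_phrase, pause_after):
        row = {"max_f0": float(np.max(f0)), "mean_f0": float(np.mean(f0)),
               "max_en": float(np.max(en)), "mean_en": float(np.mean(en)),
               "pause_after": pause}
        # Deltas are zero for the first phrase, which has no predecessor.
        row["delta_f0"] = row["mean_f0"] - prev["mean_f0"] if prev else 0.0
        row["delta_en"] = row["mean_en"] - prev["mean_en"] if prev else 0.0
        cues.append(row)
        prev = row
    return cues

# Toy input: two phrases, the second higher in F0 and energy.
print(phrase_cues([np.array([180, 200]), np.array([220, 240])],
                  [np.array([0.4, 0.5]), np.array([0.6, 0.7])],
                  [0.3, 0.8]))
```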

  13. BDC Discourse structure example • [describe green line portion of journey] and get on the Green Line • [describe direction to take on green line] we will take the Green Line south toward Park Street • [describe which green line to take (any)] we can get on any of the Green Lines at Government Center and take them south to Park Street • [describe getting off the green line] once we are at Park Street we will get off • [describe red line portion of journey] and get on the red line of the T

  14. Acoustic Correlates of Emotion Dimension in View of Speech Synthesis. Marc Schröder (University of Saarland), Roddy Cowie, Ellen Douglas-Cowie (Queen’s University), Machiel Westerdijk, Stan Gielen (University of Nijmegen), 2001 • Paralinguistic Information • Information that is transmitted via language but is not strictly “linguistic”. • E.g. emotion, humor, charisma, deception • Emotional Dimensions • Activation - degree of readiness to act • Evaluation - positive v. negative • Power - dominance v. submission • For example, • Happiness - High Activation, High Evaluation, High Power • Anger - High Activation, Low Evaluation, Low Power • Sadness - Low Activation, Low Evaluation, Very Low Power

  15. Acoustic Correlates of Emotion Dimensions (ctd.) • Manual annotation of the emotional content of spontaneous speech from 100 speakers in activation-evaluation-power space. • High Activation strongly correlates with: • High F0 mean and range, longer phrases, shorter pauses, large and fast F0 rises and falls, increased intensity, flat spectral slope • Negative Evaluation correlates with: • Fast F0 falls, long pauses, increased intensity, more pronounced intensity maxima • High Power correlates with: • Low F0 mean; shallow F0 rises and falls and reduced intensity (female); increased intensity (male)

  16. Use and Meaning of Intonation: Summary • Intonation can provide information about: • Focus • Contrast • I want the red pen (...not the blue one) • Information Status (given/new) • Speech Acts • Discourse Status • Syntax • Paralinguistics

  17. Automatic Analysis of Intonation • How can the information transmitted via intonation be understood computationally? • What computational techniques are available? • How much human annotation is needed?

  18. Automatic Analysis of Intonation Paper List (1/2) • Supervised Methods • Automatic Recognition of Intonational Features Wightman and Ostendorf, 1992 • An Automatic Prosody Recognizer Using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model Ananthakrishnan and Narayanan, 2005 • Perceptually-related Acoustic-Prosodic Features of Phrase Finals in Spontaneous Speech Ishi et al., 2003 • Alternate ways of representing Intonation • Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing Shriberg and Stolcke, 2004 • The Tilt Intonation Model Taylor, 1998

  19. Automatic Analysis of Intonation Paper List (2/2) • Unsupervised Methods • Unsupervised and Semi-supervised Learning of Tone and Pitch Accent Levow, 2006 • Reliable Prominence Identification in English Spontaneous Speech Tamburini, 2006 • Feature Analysis • Spectral Emphasis as an Additional Source of Information in Accent Detection Heldner, 2001 • Duration Features in Prosodic Classification: Why Normalization Comes Second, and What They Really Encode Batliner et al., 2001

  20. Supervised Methods • Require annotated data • Pitch Accent and Phrase Boundaries are the two main prosodic events that are detected and classified

  21. Automatic Recognition of Intonational Features. Colin Wightman and Mari Ostendorf, 1992. Boston University • Detection of boundary tones and pitch accents on syllables • Decision tree-based acoustic quantization for use with an HMM • Four-way classification {Pitch Accent, Boundary Tone, Both, Neither} • Features • Is the syllable lexically stressed? • F0 contour representation • Max, min F0 context normalization • Duration • Pause information • Mean energy • Results • Prominence: Correct 86%, false alarm 14% • Boundary Tone: Correct 77%, false alarm 3%
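
Wightman and Ostendorf use decision trees to quantize acoustic features for an HMM; the sketch below shows only a tree stage on the kind of per-syllable features listed above (the feature layout and all values are fabricated, and the HMM stage is omitted).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Per-syllable features: [lexically_stressed, max_f0, min_f0,
# duration_s, pause_after_s, mean_energy] -- fabricated data.
X = np.array([
    [1, 220, 180, 0.25, 0.00, 0.7],
    [0, 150, 140, 0.10, 0.00, 0.3],
    [1, 240, 160, 0.30, 0.40, 0.8],
    [0, 160, 120, 0.20, 0.35, 0.4],
])
# Four-way target: 0=neither, 1=pitch accent, 2=boundary tone, 3=both.
y = np.array([1, 0, 3, 2])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([[1, 230, 170, 0.28, 0.30, 0.75]]))
```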

  22. An Automatic Prosody Recognizer Using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model. Shankar Ananthakrishnan and Shrikanth Narayanan, 2005. University of Southern California • There are three asynchronous information streams that contribute to intonation • Pitch - duration and distance from mean of a piecewise linear fit of f0 • Energy - frame-level intensity normalized w.r.t. the utterance • Duration - normalized vowel duration of the current syllable and following pause duration • Coupled HMM trained on 1 hour of radio news speaker data with ASR hypotheses and POS tags • Tag syllables as long/short, stressed/unstressed, boundary/non-boundary • Includes a language model relating POS and prosodic events • Syntax alone provides the best results for boundary tone detection: • Correct 82.1%, false alarm 12.93% • Stress detection false alarm rate is nearly halved by inclusion of acoustic information • Syntax alone: 79.7% / 22.25% • Syntax + acoustics: 79.5% / 13.21%
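
One way to read the result is that acoustic and syntactic knowledge sources each score candidate prosodic tags and the scores are fused. A toy sketch of such fusion (a weighted log-linear combination, not the paper's coupled-HMM machinery; all probabilities fabricated):

```python
import numpy as np

def fuse_streams(log_p_acoustic, log_p_syntactic, weight=0.5):
    """Weighted combination of per-syllable tag scores from an acoustic
    model and a syntactic-prosodic language model."""
    return (weight * np.asarray(log_p_acoustic)
            + (1 - weight) * np.asarray(log_p_syntactic))

# Two candidate tags {non-boundary, boundary} for three syllables.
acoustic = np.log([[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]])
syntactic = np.log([[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]])
print(fuse_streams(acoustic, syntactic).argmax(axis=1))  # tag per syllable
```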

  23. Perceptually-related Acoustic-Prosodic Features of Phrase Finals in Spontaneous Speech. Carlos Toshinori Ishi, Parham Mokhtari, Nick Campbell, 2003. ATR/Human Information Science Labs • Phrase-final behavior can indicate speech act, certainty, discourse/topic structure, etc. • Classification of phrase-final behavior in Japanese • Pitch features • Mean F0 of the first and second half of the phrase final • Pitch target of the first and second half • Min, max, (pseudo-)slope, reset of the phrase final • Using a classification tree, 11 tone classes could be classified with 75.9% accuracy • Majority class baseline: 19.6%

  24. Perceptually-related Acoustic-Prosodic Features of Phrase Finals in Spontaneous Speech (ctd.) [figure slide; image not included in the transcript]

  25. Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing. Elizabeth Shriberg and Andreas Stolcke, 2004. SRI and ICSI • Do we need to explicitly model prosodic features? • Why not provide acoustic/prosodic information directly to other statistical models? • Task-based integration of features and models • Event Language Model • Augment a typical n-gram language model with prosodic event classes • Event Prosodic Model • Grow decision trees or use GMMs to generate P(Event|Signal) • Continuous Prosodic Features • Duration from ASR • Pitch, energy, voicing normalizations and stylizations • Task-specific features: e.g. number of repeat attempts
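
The combination of the two models can be sketched as a hidden-event computation: the language model supplies P(event | words), the prosodic model supplies P(signal | event), and Bayes' rule combines them. A minimal numeric illustration (all probabilities fabricated):

```python
import numpy as np

# Candidate events at one word boundary: {sentence boundary, none}.
p_event_given_words = np.array([0.3, 0.7])    # from the event language model
p_signal_given_event = np.array([0.8, 0.1])   # from the event prosodic model

# P(event | words, signal), assuming the signal depends on the words
# only through the event (the usual independence assumption).
posterior = p_event_given_words * p_signal_given_event
posterior /= posterior.sum()
print(posterior)  # -> the boundary hypothesis is now much more likely
```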

  26. Direct Modeling of Prosody Tasks • Structural Tagging • Sentence/topic boundary and disfluency (interruption point) detection • Uses Language Model + Event Prosodic Model • Sentence boundary results • Telephone: accuracy improved 7% • BN: 19% error reduction • Pragmatics/Paralinguistics • Dialog act classification and frustration detection • Uses Language Model and Dialog “grammar” + Event Prosodic Model • Results: • Statement v. Question 16% error reduction • Agreement v. Backchannel 16% error reduction • Frustration 27% error reduction (using “repeated attempt” feature)

  27. Direct Modeling of Prosody Tasks • Speaker Recognition • Typical approaches use spectral information • Use Continuous Prosodic Features • Including phone duration features can reduce error by 50% • Word Recognition • Words can be recognized simultaneously with prosodic events (Event Language Model) • Spectral and prosodic information can be used to model word hypotheses • Phone duration, pause information along with sentence and disfluency detection reduces error by 3.1%

  28. The Tilt Intonation Model. Paul Taylor, 1998. University of Edinburgh • The Tilt Model describes accent and boundary tones as “intonational events” characterized by pitch movement • Events (accent, boundary, neither, silence) are automatically detected using an HMM with pitch, energy, and first- and second-order differences of both • Accuracy ranged from 35%-47% with correct identification of events between 60.7% and 72.7% • The Tilt parameter was then extracted from the HMM hypotheses. • F0 synthesis with machine- and human-derived Tilt parameters differed by < 1 Hz RMSE on the DCIEM test set

  29. Tilt parameter [figure slide; image not included in the transcript]
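
Since the figure itself is missing, here is a brief reconstruction of what it illustrated. As defined in Taylor's Tilt papers (recalled here, so treat the exact notation as an approximation), an event with rise amplitude/duration A_rise, D_rise and fall amplitude/duration A_fall, D_fall is summarized by:

```latex
\[
t_{\mathrm{amp}} = \frac{|A_{\mathrm{rise}}| - |A_{\mathrm{fall}}|}{|A_{\mathrm{rise}}| + |A_{\mathrm{fall}}|},
\qquad
t_{\mathrm{dur}} = \frac{D_{\mathrm{rise}} - D_{\mathrm{fall}}}{D_{\mathrm{rise}} + D_{\mathrm{fall}}},
\qquad
\mathrm{tilt} = \frac{t_{\mathrm{amp}} + t_{\mathrm{dur}}}{2},
\]
```

so tilt = +1 is a pure rise, -1 a pure fall, and 0 a symmetrical rise-fall.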

  30. Unsupervised Models of Intonation • Annotating Intonation is expensive • 100x real time for full ToBI labeling • Human Annotations are errorful • Human agreement ranges from 80-90% • Unsupervised Methods are • Inexpensive • Data doesn’t require manual annotation • Consistent • Performance is not reliant on human consistency

  31. Unsupervised and Semi-supervised Learning of Tone and Pitch Accent. Gina-Anne Levow, 2006. University of Chicago • What can we do without gold-standard data? • Also, does lexical tone in Mandarin Chinese vary in similar dimensions as pitch accent? • Semi- and unsupervised speaker-dependent clustering into 4 accent classes (unaccented, high, low, downstepped) • Forced alignment-based syllable features: • Speaker-normalized f0, f0 slope and intensity • Context: previous and following syllables’ values and first-order differences • Semi-supervised: Laplacian Support Vector Machines • Tone (clean speech): 94% accuracy (99% supervised) • Pitch Accent (2-way): 81.5% accuracy (84% supervised) • Unsupervised: k-means clustering, asymmetric k-lines clustering • Tone (clean speech): 77% accuracy • Pitch Accent (4-way): 78.4% accuracy (80.1% supervised)
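
A minimal sketch of the unsupervised variant using k-means on fabricated, speaker-normalized syllable features (the feature layout is assumed; mapping clusters back to accent labels would still require a little labeled data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in features per syllable: normalized f0, f0 slope, intensity,
# plus two context differences -- random data for illustration only.
X = StandardScaler().fit_transform(rng.normal(size=(200, 5)))

# Four clusters for {unaccented, high, low, downstepped}.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # cluster sizes
```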

  32. Reliable Prominence Identification in English Spontaneous Speech. Fabio Tamburini, 2005. University of Bologna • Unsupervised metric to describe the prominence of a syllable • Calculated over the nucleus • Prom = En(500-4000) + Dur + En(overall) * (A_event + D_event) • En(500-4000): high-band (500-4000 Hz) spectral energy • Dur: nucleus duration • En(overall): full-spectrum energy • A_event, D_event: Tilt parameters (f0 movement amplitude and duration) • By tuning a threshold, 18.64% syllable error rate on TIMIT
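
The prominence function above, written out as a direct transcription (assuming the terms are pre-normalized to comparable ranges, which the slide does not state):

```python
def prominence(en_high, dur, en_overall, a_event, d_event):
    """Tamburini-style prominence of a syllable nucleus.
    en_high: energy in the 500-4000 Hz band; dur: nucleus duration;
    en_overall: full-spectrum energy; a_event, d_event: Tilt amplitude
    and duration parameters of the pitch movement."""
    return en_high + dur + en_overall * (a_event + d_event)

# A nucleus is marked prominent when its score clears a tuned threshold.
print(prominence(0.6, 0.4, 0.8, 0.3, 0.2) > 0.9)
```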

  33. Feature Analysis • Intonation is generally assumed to be realized as a modification of • Pitch • Energy • Duration • How do each of these contribute to realization of specific prosodic events?

  34. Spectral Emphasis as an Additional Source of Information in Accent Detection. Mattias Heldner, 2001. Umeå University • Close inspection of spectral emphasis as a cue for discriminating accented from non-accented syllables in read Swedish • Spectral emphasis: the difference (in dB) between the energy in the first formant region and the full spectrum • First formant energy was extracted using a dynamic low-pass filter with a cutoff that followed f0 • Classifier: the word in a phrase with the highest spectral emphasis/intensity/pitch is “focally accented”. • Results: • Spectral emphasis: 75% correct • Overall intensity: 69% correct • Pitch peak: 67% correct
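
A simplified sketch of the measurement: band energy below a cutoff relative to full-spectrum energy, in dB. Heldner's cutoff dynamically follows f0; here a fixed cutoff stands in for it.

```python
import numpy as np

def spectral_emphasis(frame, sr, cutoff_hz):
    """Low-band energy (roughly the F1 region) minus full-spectrum
    energy, in dB, for one analysis frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    low = spectrum[freqs <= cutoff_hz].sum()
    return 10 * np.log10(low / spectrum.sum())

frame = np.random.default_rng(0).normal(size=512)  # white-noise stand-in
print(spectral_emphasis(frame, sr=16000, cutoff_hz=800))
```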

  35. Duration Features in Prosodic Classification: Why Normalization Comes Second, and What They Really Encode. Anton Batliner, Elmar Nöth, Jan Buckow, Richard Huber, Volker Warnke, Heinrich Niemann, 2001. University of Erlangen-Nuremberg • When vowels are stressed, accented or phrase-final they tend to be lengthened. What’s the best way to measure the duration of a word? • Duration is normalized in three ways • DURNORM - normalized w.r.t. ‘expected’ duration • Expected duration is calculated from the mean and standard deviation of a vowel, scaled by a rate-of-speech (ROS) approximation. • DURSYLL - normalized w.r.t. the number of syllables • DURABS - raw duration • In both German and English, on boundary and accent tasks, DURABS classified best, followed by DURSYLL, followed by DURNORM • Duration inadvertently encodes semantic information • Complex words tend to have more syllables and tend to be accented more frequently; common words (particles, backchannels) tend to be shorter • DURNORM and DURSYLL are able to classify well (if worse) despite obfuscating this information
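
The three normalizations as small functions; the exact ROS scaling in the paper may differ from the plausible reading sketched here.

```python
def dur_abs(word_dur):
    """DURABS: raw word duration in seconds."""
    return word_dur

def dur_syll(word_dur, n_syllables):
    """DURSYLL: duration normalized by syllable count."""
    return word_dur / n_syllables

def dur_norm(word_dur, mean_dur, std_dur, ros):
    """DURNORM: z-score against an 'expected' duration, with the
    expectation scaled by a rate-of-speech estimate `ros`."""
    return (word_dur - mean_dur * ros) / (std_dur * ros)

print(dur_abs(0.42), dur_syll(0.42, 2), dur_norm(0.42, 0.18, 0.05, 1.1))
```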

  36. Automatic Analysis of Intonation: Summary • A variety of models can be used to analyze both pitch accents and phrase boundaries: • Supervised • Direct discriminative modeling • Semi- and unsupervised learning • Research has also examined how accents and phrase breaks are realized in constrained acoustic dimensions

  37. Second Language Learning Systems • Automated systems can be used to improve the pronunciation and intonation of second language learners. • Native intonation is rarely emphasized in classrooms and is often the last thing non-native speakers learn. • The focus here is on computational approaches (diagnosis, evaluation) rather than pedagogical concerns

  38. Second Language Learning Systems: Paper List (1/2) • Pronunciation Evaluation • The SRI EduSpeak™ System: Recognition and Pronunciation Scoring for Language Learning • Franco et al., 2000 • Automatic Localization and Diagnosis of Pronunciation Errors for Second-Language Learners of English • Herron et al., 1999 • Automatic Syllable Stress Detection Using Prosodic Features for Pronunciation Evaluation of Language Learners • Tepperman and Narayanan, 2005

  39. Second Language Learning Systems: Paper List (2/2) • Fluency, Nativeness and Intonation Evaluation • A Visual Display for the Teaching of Intonation • Spaai and Hermes, 1993 • Quantitative Assessment of Second Language Learners’ Fluency: An Automatic Approach • Cucchiarini et al., 2002 • Prosodic Features for Automatic Text-Independent Evaluation of Degree of Nativeness for Language Learners • Teixeira et al., 2000 • Modeling and Automatic Detection of English Sentence Stress for Computer-Assisted English Prosody Learning System • Imoto et al., 2002 • A Study of Sentence Stress Production in Mandarin Speakers of American English • Chen et al., 2001

  40. Pronunciation Evaluation • The segmental context and lexical stress of a production determine whether it is pronounced correctly or not.

  41. The SRI EduSpeak™ System: Recognition and Pronunciation Scoring for Language Learning. Horacio Franco, Victor Abrash, Kristin Precoda, Harry Bratt, Ramana Rao, John Butzberger, Romain Rossier, Federico Cesari, 2000. SRI • Recognition • Non-native speech recognition is errorful. • A native HMM recognizer was adapted to non-native speech. • Non-native WER was reduced by half, while not affecting native performance • Pronunciation Evaluation • Combine scores using a regression tree to generate scores that correlate with scores from human raters • Spectral Match: compare the spectrum of a candidate phone to a native, context-independent phone model. • Also used for mispronunciation detection • Phone Duration: compare the candidate duration to a model of native duration, normalized by rate of speech • Speaking rate: phones/sentence
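
The combination step (machine scores to human-like ratings via a regression tree) can be sketched as follows; the features, data, and tree settings are fabricated, not details of the EduSpeak system.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Per-utterance machine scores: [spectral_match, duration_score, rate],
# with pretend human ratings on a 1-5 scale as the regression target.
X = np.array([[0.9, 0.8, 3.2],
              [0.4, 0.5, 2.1],
              [0.7, 0.9, 2.8],
              [0.2, 0.3, 1.5]])
human = np.array([4.5, 2.0, 4.0, 1.5])

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, human)
print(reg.predict([[0.8, 0.7, 3.0]]))  # estimated human-like score
```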

  42. Automatic Localization and Diagnosis of Pronunciation Errors. Daniel Herron, Wolfgang Menzel (U. of Hamburg), Erica Atwell (U. of Leeds), Roberto Bisiani (U. of Milan-Bicocca), Fabio Deaneluzzi (Dida*El S.r.l.), Rachel Morton (Entropic), Juergen Schmidt (Ernst Klett Verlag), 1999 • Locating and describing errors is critical for instruction • Identifying segmental errors • In response to a read prompt, lax recognition followed by strict recognition • Some errors are predictable based on L1. • Vowel, pre-vocalic consonant, and word-final devoicing errors are modeled explicitly, and tested on artificial data. • Vowel - /ih/ -> /ey/ “it” -> “eet” • PV consonant - /w/ -> /v/ “was” -> “vas” • WF devoicing - /g/ -> /k/ “thinking” -> “thinkink” • Using a word-internal tri-phone model, vowel and PV consonant errors can be diagnosed with >80% accuracy with a false alarm rate of less than 5%. WF devoicing can only be diagnosed with ~40% accuracy. • Stress errors are detected by deviation from trained models of stressed and unstressed syllables

  43. Automatic Syllable Stress Detection Using Prosodic Features for Pronunciation Evaluation of Language Learners. Joseph Tepperman and Shrikanth Narayanan, 2005. University of Southern California • Lexical stress can change POS and meaning • “INsult” v. “inSULT” or “CONtent” v. “conTENT” • Detecting stress on read speech with content determined a priori • Use forced alignment to identify syllable nuclei (vowels) • Extract f0 and energy features. Duration features are manually normalized by context. • Classified using a supervised Gaussian Mixture Model • Post-processed to guarantee exactly 1 stressed classification per word (see the sketch below). • Mean f0, energy and duration discriminate with >80% accuracy on English spoken by Italian and German speakers
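
The post-processing step referenced above, sketched directly; the input format (one stress score per syllable plus the word each syllable belongs to) is an assumption.

```python
import numpy as np

def one_stress_per_word(stress_scores, word_ids):
    """Force exactly one stressed syllable per word by keeping, for each
    word, only the syllable with the highest stress score."""
    stress_scores = np.asarray(stress_scores)
    word_ids = np.asarray(word_ids)
    labels = np.zeros(len(stress_scores), dtype=int)
    for w in np.unique(word_ids):
        idx = np.where(word_ids == w)[0]
        labels[idx[stress_scores[idx].argmax()]] = 1
    return labels

# Two words: syllables 0-2 belong to word 0, syllables 3-4 to word 1.
print(one_stress_per_word([0.2, 0.9, 0.4, 0.6, 0.5], [0, 0, 0, 1, 1]))
# -> [0 1 0 1 0]
```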

  44. Nativeness, Fluency and Intonation Evaluation • Intonational information can influence the proficiency and understandability of a second-language speaker • Proficient second-language speakers often have difficulty producing native-like intonation

  45. A Visual Display for the Teaching of Intonation. Gerard Spaai and Dik Hermes, 1993. Institute for Perception Research • Tools for guided instruction of intonation • Intonation is difficult to learn • It is acquired early, so it is resistant to change • Native-language intonation expectations may impair the perception of foreign intonation • Intonation Meter • Displays the pitch contour. • Interpolates unvoiced regions • Marks vowel onsets

  46. Quantitative Assessment of Second Language Learners' Fluency: An Automatic Approach. Catia Cucchiarini, Helmer Strik and Lou Boves, 2002. University of Nijmegen • Does ‘fluency’ always mean the same thing? • Linguistic knowledge, segmental pronunciation, native-like intonation. • Three groups of raters (3 phoneticians, 6 speech therapists) assessed the fluency of read speech on a scale from 1 to 10. • With 1 exception, inter-rater agreement was > .9 • Native speakers are consistently rated as more fluent than non-native speakers • Temporal/lexical correlates of fluency • High rate of speech (segments/duration) • High phonation/time ratio (with or without pauses) • High mean length of runs • Low number & duration of pauses
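
A small sketch of the temporal measures, computed from a speech/pause segmentation (the input format is an assumption):

```python
def fluency_metrics(segments):
    """Temporal fluency measures from (kind, duration) pairs, where
    kind is "speech" or "pause" and duration is in seconds."""
    speech = [d for k, d in segments if k == "speech"]
    pauses = [d for k, d in segments if k == "pause"]
    total = sum(speech) + sum(pauses)
    return {
        "phonation_time_ratio": sum(speech) / total,
        "mean_length_of_run": sum(speech) / len(speech),
        "n_pauses": len(pauses),
        "total_pause_time": sum(pauses),
    }

print(fluency_metrics([("speech", 2.0), ("pause", 0.4), ("speech", 3.1)]))
```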

  47. Prosodic Features for Automatic Text-Independent Evaluation of Degree of Nativeness for Language Learners. Carlos Teixeira (IST-UTL/INESC and SRI), Horacio Franco, Elizabeth Shriberg, Kristin Precoda, Kemal Sönmez (SRI), 2000 • Can a model be trained to assess a speaker’s nativeness the way humans do, without text information? • Construct feature-specific decision trees • Word stress (duration of the longest vowel, duration of the lexically stressed vowel, duration of the vowel with max f0) • Speaking rate approximations (durations between vowels) • Pitch (max, slope “bigram” modeling) • Forced alignment + pitch (duration between max f0 and the longest vowel nucleus, location of max f0) • Unique events (durations of the longest pauses, longest words) • Combination (max or expectation) of “posterior probabilities” from the decision trees • Results • Pitch-based features do not generate human-like scores • Only weak correlation (< .434) between machine and human scores • Inclusion of posterior recognition scores and rate of speech helps considerably. • Correlation = ~.7

  48. Modeling and Automatic Detection of English Sentence Stress for Computer-Assisted English Prosody Learning System. Kazunori Imoto, Yasushi Tsubota, Antoine Raux, Tatsuya Kawahara and Masatake Dantsuji, 2002. Kyoto University • L1-specific errors need to be accounted for • Japanese speakers tend not to use energy and duration to indicate stress • Syllable structure “strike” -> /s-u-t-o-r-ay-k-u/ • Incorrect phrasing • Classification of stress levels • Syllable alignment was performed with a recognizer trained on English speech from native Japanese speakers (including segmental errors) • Supervised HMM training using pitch, power, 4th-order MFCCs and first- and second-order differences • Using distinct models for each stress type/syllable structure/position combination (144 HMMs), 93.7%/79.3% native/non-native accuracies were achieved • Two-stage recognition increased accuracy to 95.1%/84.1% • Primary + secondary stress v. non-stressed • Primary v. secondary stress

  49. A Study of Sentence Stress Production in Mandarin Speakers of American English. Yang Chen (University of Wyoming), Michael Robb, Harvey Gilbert and Jay Lerman (University of Connecticut), 2001 • Do native Mandarin speakers produce American English pitch accents “natively”? • Experiment • Compare native Mandarin English and native American English productions of “I bought a cat there” with varied location of pitch accent. • Pitch, energy and duration of vowels were calculated and compared across language group and gender • Vowel onsets/offsets were determined manually. • Results • Mandarin speakers produced stressed words with shorter duration than American speakers. • Female Mandarin speakers produced stressed words with a greater rise in f0.

  50. Second Language Learning Systems: Summary • Performance assessment • Pronunciation • Intonation • Error diagnosis (and instruction) • Influence of L1 on L2 instruction and evaluation
