
Optical Phonetics and Visual Perception of Lexical and Phrasal Stress in English

Patricia Keating, Marco Baroni, Sven Mattys, Rebecca Scarborough, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein.

Presentation Transcript


  1. Optical Phonetics and Visual Perception of Lexical and Phrasal Stress in English Patricia Keating, Marco Baroni, Sven Mattys, Rebecca Scarborough, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein

  2. Introduction • Phrasal (focal) stress can be perceived visually above chance, though intonation cannot (e.g. Bernstein et al. 1989). • Many studies have shown that stress is marked by longer, larger, and faster movements of the jaw, lips, and tongue; sometimes by eyebrow movements; and acoustically mainly by f0 (pitch accents), lengthening, and loudness. • Jaw lowering and acoustic duration are known to correlate with auditory perception of stress, and eyebrow movement with visual perception.

  3. Optical phonetics of stress • Extents, durations, and velocities of movements of lips, chin, and eyebrows, and of mouth opening, are all potentially visible to perceivers. • Our production (optical) measures are position and movement measures of visible fleshpoints.

  4. This study • Production experiment: Do speakers show any consistent optical correlates of phrasal and lexical stresses? • Perception experiment: Are there differences in the visual intelligibility of phrasal and lexical stress, and of the different speakers? • Production-perception comparison: Which, if any, of the optical production correlates account for visual intelligibility?

  5. Production methods: Lexical stress materials • 4 minimal pairs: DIScharge / disCHARGE, DIScount / disCOUNT, PERvert / perVERT, SUBject / subJECT • 4 non-minimal pairs: DEbit / casSETTE, INstance / conVINCE, BUSiness / subMIT, COUrage / gaZELLE • Minimal pairs read as given, and also reiterantly • Non-minimal pairs only reiterantly • 2 reiterant syllables that differ in mouth opening: “buh” = [bʌ] / [bə], “fer” = [fɝ] / [fɚ] • TOTAL 40 words

  6. Production methods: Phrasal stress materials • “So TOMMY gave Timmy a song from Debby.” • “So Tommy gave TIMMY a song from Debby.” • “So Tommy gave Timmy a song from DEBBY.” • “So Tommy gave Timmy a song from Debby.” • narrow (contrast) accent on one name, or “neutral” broad focus • these 4 stress conditions × 6 combinations of names = 24 sentences • sentences not read reiterantly

  7. Production methods: Both stress contrasts involve nuclear accent • Lexical stress items read in isolation • Phrasal stress items read with narrow focus to show contrast and/or emphasis • Figure labels: “…a song from TIMMY” (phrasal stress) and “DIScount” (lexical stress), each marked H* L-L%

  8. Production methods: Speakers • 3 male Californians differing in perceptually determined visual intelligibility for segments • low-medium = Sp-LO • medium = Sp-MID • high = Sp-HI • VISUAL INTELLIGIBILITY SCORING: • speakers video-recorded reading 320 (other) sentences • 8 expert deaf lipreaders transcribed sentences, yielding % correct visual intelligibility scores

  9. Production methods: Recording set-up and procedure • Videorecording: professional-quality teleprompter under camera; DAT recording • Facial motion using Qualisys™ system: 120 Hz sampling rate, 20 small passive retroreflectors, three cameras, infrared flash, 3D position for each retroreflector • Items blocked by stress location • Two tokens of each item

  10. Production methods: Facepoint marker locations and measurements • Left eyebrow displacement • Head displacement • Interlip maximum distance • Interlip opening displacement • Interlip closing displacement • Lower lip opening peak velocity • Lower lip closing peak velocity • Chin opening displacement • Chin opening peak velocity • Chin closing displacement • Chin closing peak velocity • Figure labels: eyebrow markers, head marker, lip markers, chin marker
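
The displacement and peak-velocity measures listed on this slide can be illustrated with a minimal sketch. It assumes a one-dimensional marker trajectory (for example, vertical chin position in mm) sampled at the 120 Hz rate given for the motion-capture system. The function names, the sample data, and the use of simple finite differences are illustrative assumptions, not the authors' processing pipeline.

```python
SAMPLE_RATE_HZ = 120.0  # sampling rate reported on the recording slide


def displacement(positions):
    """Extent of a gesture: absolute difference between its first and last sample."""
    return abs(positions[-1] - positions[0])


def peak_velocity(positions, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest absolute sample-to-sample velocity, in position units per second."""
    dt = 1.0 / sample_rate_hz
    return max(abs(b - a) / dt for a, b in zip(positions, positions[1:]))


# Hypothetical chin-lowering (opening) gesture: vertical position in mm.
chin_opening = [0.0, -0.5, -1.5, -3.0, -4.0, -4.5]

print(displacement(chin_opening))   # extent of the opening movement, in mm
print(peak_velocity(chin_opening))  # fastest movement during the gesture, in mm/s
```

In this sketch a stressed syllable would be expected to yield larger values on both measures than its unstressed counterpart, which is the kind of difference the ANOVAs in the data analysis test for.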

  11. Production methods: Data analysis • Prosody of audio speech signals checked by two transcribers (some small differences found between prompted and produced stresses, but these differences generally do not affect analyses presented here) • Here, only tokens used in perception study analyzed (1 of the 2 tokens of each item) • Effects of stress on the 11 facepoint marker measurements tested by (factorial) ANOVAs

  12. Production results: Overview • Stress is well-marked by these measures • Lexical vs. phrasal stress: more significantly different measures, and larger differences between stressed and unstressed, with phrasal stress than with lexical • Reiterant vs. nonreiterant words: both sets show stress effect

  13. Production results: Significant differences due to lexical stress • 5 of 11 measures distinguish stress: 3 opening gesture measures, Head, and Interlip Max. Distance • Generally holds across speakers and real vs. reiterant • Figure: e.g. Interlip Opening Displacement, all reiterant words, syllable 1 vs. syllable 2

  14. Production results: Significant differences due to phrasal stress • All 11 measures distinguish stress • Chin and eyebrow measures are more consistent across speakers • Figure: e.g. Chin Closing Peak Velocity, accented vs. unaccented

  15. Production results: Significant Head and Eyebrow movements • Stress in words: Head moves, eyebrow not • Stress in phrases: Head down (2 speakers), Eyebrow up • Example: “So TIMMY gave Tommy a song from Debby”

  16. Production results (an aside): Eyebrows and F0 • 40 sentences from the phrasal stress corpus • F0 from audio, and right and left eyebrow positions, at 12 ms intervals • Significant correlations between eyebrows and F0, but accounting for little variance (only 1-4%)
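
To put the "1-4% of variance" figure in terms of correlation strength: variance accounted for is the squared Pearson correlation, so 1-4% corresponds to |r| between about 0.1 and 0.2. The sketch below shows that relation; the function names and the toy data are hypothetical and are not the authors' F0 or eyebrow measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained(r):
    """Proportion of variance accounted for by a correlation r."""
    return r ** 2


# Hypothetical paired samples: F0 in Hz and eyebrow height in mm.
f0_hz = [110.0, 125.0, 140.0, 120.0, 105.0, 135.0]
brow_mm = [2.1, 1.8, 2.4, 2.6, 2.0, 2.2]

r = pearson_r(f0_hz, brow_mm)
print(r, variance_explained(r))

# Correlation magnitudes implied by 1% and 4% of variance.
print(math.sqrt(0.01), math.sqrt(0.04))
```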

  17. Perception methods • 1 token of each item from production corpus (120 words, 72 sentences), each presented twice (384 total trials) • 16 hearing perceivers (not screened for lipreading ability) • Test video clip (no sound) on right monitor, clickable response choices on left monitor • Lexical stress: Response choices were pairs of real words, even for reiterant items • Sentences: Click on one name, or on “NoStress”

  18. Perception results: Overview • Stress is perceived above chance • Lexical vs. phrasal stress: phrasal stress is perceived better • Reiterant vs. nonreiterant words: perceived equally well

  19. Perception results: Overall results, all above chance • Figure: % correct by condition, with chance at 50% and at 25%; N=2304, N=3072, N=768
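
The N values on this slide are consistent with the design described in the methods slides under one reading: 2304 responses to sentences, 3072 to reiterant words, and 768 to non-reiterant words. The sketch below works through that arithmetic. The mapping of each N to a condition, and the split of the 40 words into 8 non-reiterant and 32 reiterant items per speaker, are inferences from the numbers rather than something the slides state.

```python
SPEAKERS = 3
PERCEIVERS = 16
PRESENTATIONS = 2  # each clip presented twice

# Per-speaker item counts from the production materials.
nonreiterant_words = 8            # 4 minimal pairs read as given (assumed split)
reiterant_words = (8 + 8) * 2     # 16 source words x 2 reiterant syllables (assumed split)
sentences = 24                    # 4 stress conditions x 6 name combinations

word_clips = SPEAKERS * (nonreiterant_words + reiterant_words)   # 120
sentence_clips = SPEAKERS * sentences                            # 72
trials_per_perceiver = (word_clips + sentence_clips) * PRESENTATIONS  # 384

responses_per_clip = PRESENTATIONS * PERCEIVERS
sentence_responses = sentence_clips * responses_per_clip                   # 2304
reiterant_responses = SPEAKERS * reiterant_words * responses_per_clip      # 3072
nonreiterant_responses = SPEAKERS * nonreiterant_words * responses_per_clip  # 768

print(trials_per_perceiver, sentence_responses,
      reiterant_responses, nonreiterant_responses)
```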

  20. Perception results: Lexical vs. phrasal stress • Individual subjects’ % correct relative to levels that are significantly above chance: phrasal perceived better (significantly so by paired t-test) • Figure labels: phrasal, all, lexical
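
The slide does not say how the "significantly above chance" levels were computed. One common approach, shown here purely as an assumption, is an exact one-sided binomial test at alpha = 0.05. The per-subject trial counts (240 word trials at 50% chance, 144 sentence trials at 25% chance) follow from the perception methods slide; the function names are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n, p, alpha=0.05):
    """Smallest number correct whose one-sided tail probability is below alpha."""
    for k in range(n + 1):
        if binomial_tail(n, k, p) < alpha:
            return k
    return None  # no count reaches significance at this alpha


# Per-subject designs: (number of trials, chance level).
designs = {"lexical": (240, 0.5), "phrasal": (144, 0.25)}

for name, (n_trials, chance) in designs.items():
    k = min_correct_above_chance(n_trials, chance)
    print(name, k, round(100.0 * k / n_trials, 1))
```

Because the two tasks have different chance levels and trial counts, the % correct needed to exceed chance differs between them, which is why the slide plots subjects relative to those levels rather than as raw scores.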

  21. Perception results: Lexical stress • All lexical speech conditions equally well perceived overall: • Reiterant & non-reiterant • buh & fer • Minimal & non-minimal • Figure: % correct, minimal pairs vs. non-minimal

  22. Perception results: Speakers, lexical stress • All speakers’ lexical stress perceived above chance (50%) • Sp-LO perceived better on reiterant words • Figure: % correct for non-reiterant, reiterant minimal, and reiterant non-minimal items

  23. Perception results: Phrasal stress • 3 focal positions perceived equally well, and correct above chance for almost every item • Responses to Neutral condition at chance • Figure: % correct by position of stress in sentence

  24. Perception results: Speakers, phrasal stress • All speakers’ phrasal stress perceived above chance (25%) • Sp-MID perceived less accurately • Sp-LO best for Neutral condition (not shown here) • Figure: % correct

  25. Production-perception comparisons: Speaker differences • Prosodic intelligibility: Sp-LO highest for words, Neutral sentences; Sp-MID lowest for sentences • Re production: Sp-LO shows larger lip differences than Sp-MID on sentences, and largest Chin closing displacement on words (but Sp-HI has largest head movement differences) • Unrelated to segmental intelligibility: compare above with speakers’ names LO-MID-HI, which reflect their segmental intelligibility

  26. Production-perception comparisons: Correlational analyses of sentences • Tested relations between production measures and % correct perception of phrasal stresses • 10 of 11 measures correlated significantly with perception, with chin measures accounting for the most variance (up to 40%) • Only Interlip maximum distance (mouth opening) did not correlate with perception

  27. Production-perception comparisons: Correlational analyses of sentences • Partial correlations (controlling for contributions of various lip measures) show independent contributions to perception of • Chin opening displacement (15% of variance) • Chin peak opening velocity (11%) • Lower lip peak opening velocity (11%) • Closing gestures generally make no independent contributions to perception
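
A partial correlation measures how strongly two variables are related once a third has been controlled for. The sketch below gives the standard first-order formula, which controls for a single variable. The slide describes controlling for several lip measures at once, so this is a simplified illustration; the input correlations are made-up numbers and the function name is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical pairwise correlations:
# x = chin opening displacement, y = % correct perception, z = a lip measure.
r_chin_perception = 0.6
r_chin_lip = 0.5
r_perception_lip = 0.4

r_partial = partial_correlation(r_chin_perception, r_chin_lip, r_perception_lip)
print(r_partial, r_partial**2)  # partial r and its share of variance
```

Squaring the partial correlation gives the independent share of variance, which is the form in which the slide reports its 15% and 11% figures.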

  28. Summary • Lexical and phrasal stress are visually perceived above chance • Phrasal stress is marked by more and larger production differences, and perceived better • Chin opening accounts for most variance in perception of phrasal stress • Speakers’ visual intelligibility for prosody does not correspond to their segmental intelligibility
