The auditory and the visual percept evoked by the same audiovisual stimuli

Hartmut Traunmüller and Niklas Öhrström, Dept. of Linguistics, University of Stockholm.

Presentation Transcript


  1. The auditory and the visual percept evoked by the same audiovisual stimuli. Hartmut Traunmüller and Niklas Öhrström, Dept. of Linguistics, University of Stockholm

  2. Theoretical background It is fairly obvious that acoustic speech stimuli evoke an auditory percept, while optic speech stimuli evoke a visual percept. In phonetic terms, these percepts agree with each other in congruent AV stimuli. In incongruent AV stimuli, this is not necessarily so.

  3. Theoretical background [Diagram. Acoustic path: Acoustic signal, Auditory signal analysis, An auditory percept. Optic path: Optic signal, Visual signal analysis, A visual percept.]

  4. Theoretical background [Diagram. Acoustic path: Acoustic signal, Auditory signal analysis, An auditory percept. Optic path: Optic signal, Visual signal analysis, A visual percept. Shared: Audiovisual integration, A common percept.]

  5. Theoretical background According to the Motor Theory and the Direct Realist theory of speech perception, the ‘object’ of speech perception is gestural in nature. These theories recognize only one speech percept, which may be identified with the common AV percept in Figure 1.

  6. Theoretical background Another theory, the Modulation Theory, considers speech primarily as modulated voice. The ‘object’ of normal speech perception is vocal in nature and consists in the modulation of a voice. The theory allows for a different percept in lip reading. This is gestural and consists in the modulation of a face.

  7. Theoretical background In order to clarify the situation, it is necessary to investigate not only the effects an optic speech signal has on auditory perception, but also those an acoustic speech signal has on visual perception of speech – and to compare these effects with each other.

  8. Earlier studies In an earlier experiment, we presented congruent and incongruent AV stimuli to subjects. The AV stimuli consisted of different front vowels presented within a [g_g] frame. They were incongruent with respect to openness (height) or roundedness or both. The subjects had to report which vowel they had heard. The response alternatives consisted of the nine letters that represent the long vowel phonemes of Swedish.

  12. Earlier studies Typical result: subjects heard a vowel that combined the roundedness of the optic stimulus with the openness of the acoustic stimulus.

  14. Earlier studies Explanation: Acoustic cues to openness (F1 etc.) are salient and reliable. Optic cues to openness are less reliable because of variation due to individual habits, attitude and emotion. Optic cues to roundedness are more reliable; rounded lips are easy to distinguish from unrounded in most conditions. Acoustic cues to roundedness (higher formants) lack salience and are less reliable.

  16. Earlier studies The earlier experiment was designed to investigate perception in terms of phonemic categories. However, subjects informally reported having heard vowels whose quality differed from that of ordinary Swedish vowels. Auditorily rounded vowels appeared to be shifted backwards in the front-back dimension when presented together with optically unrounded vowels.

  18. The present study The present experiment aims to explore cross-modal perceptual effects on the finer phonetic, sub-categorical perception of vowels. It also aims to compare the auditory and the visual perception of the same AV stimuli.

  19. The present study We reused a subset of the stimuli from the previous experiment.

  20. The present study There were 4 speakers: 2 male, 2 female.

  21. The present study There were 8 perceivers, selected from a previous experiment in which they had shown sensitivity to the optic signal in incongruent audiovisual stimuli. All 8 were phonetically skilled and familiar with the IPA chart for vowels.

  22. The present study The subjects perceived the stimuli by way of headphones and a computer screen. The stimuli were presented in quasi-random order. Responses were given on electronic response sheets.

  23. The present study The subjects were instructed to rate these dimensions of the vowels:
     • Lip rounding (6 degrees), 1st: unrounded; 5th: rounded
     • Lip spreading (3 degrees)
     • Openness (18 degrees), 2nd: close vowels; 6th: close-mid vowels
     • Backness (11 degrees auditorily; 7 degrees visually), 2nd: front vowels; 6th (auditorily): central vowels

  24. The present study In a first experiment, the subjects were instructed to rate the dimensions of vowels they heard. In a second experiment, the same subjects were instructed to rate the dimensions of vowels they saw. The incongruent stimuli were the same in the two experiments.

  25. Results Openness (opn) vs. roundedness (rnd); acoustic stimuli (listening only). Symbols represent speakers.

  26. Results Openness (opn) vs. roundedness (rnd); optic stimuli (lipreading only). Symbols represent speakers.

  27. Results Heard openness of incongruent AV-stimuli vs. opn of A-stimuli (ρ = .80*). Symbols represent acoustically presented vowels.

  28. Results Heard roundedness of incongruent AV-stimuli vs. rnd of A-stimuli (ρ = -.05). Symbols represent acoustically presented vowels.

  29. Results Heard spreadness of incongruent AV-stimuli vs. spr of A-stimuli (ρ = .07). Symbols represent acoustically presented vowels.

  30. Results Heard backness of incongruent AV-stimuli vs. roundedness of A-stimuli (ρ = .71*). Symbols represent acoustically presented vowels.

  31. Results Heard openness of incongruent AV-stimuli plotted against opn of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = .03). Symbols represent acoustically presented vowels.

  32. Results Heard roundedness of incongruent AV-stimuli plotted against rnd of A-stimuli (left, ρ = -.05) and of V-stimuli (right, ρ = .79*). Symbols represent acoustically presented vowels.

  33. Results Heard spreadness of incongruent AV-stimuli plotted against spr of A-stimuli (left, ρ = .07) and of V-stimuli (right, ρ = .90*). Symbols represent acoustically presented vowels.

  34. Results Heard backness of incongruent AV-stimuli plotted against roundedness of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = -.59*). Symbols represent acoustically presented vowels.

  35. Results The results were subjected to linear regression analyses in which the average ratings obtained in each unimodal presentation were taken as candidate independent variables together with the interaction terms. A comparison of the regression equations that describe the results of the listening task and the viewing task shows that the two percepts need to be distinguished from each other.
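A regression of this general shape can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis: the ratings are synthetic and noise-free, the 0..1 scale and the generating coefficients are made up, and all candidate terms are fitted at once (the slide does not say how terms were selected). The helper names `solve`, `fit_ols` and `design_row` are hypothetical.

```python
from itertools import product


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(a, v):
    """Predictors: intercept, unimodal A rating, unimodal V rating, A x V interaction."""
    return [1.0, a, v, a * v]


# Synthetic ratings on an assumed 0..1 scale (NOT data from the study).
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
pairs = list(product(levels, levels))
rows = [design_row(a, v) for a, v in pairs]
true_coefs = [0.05, 0.6, 0.4, 0.0]  # made-up generating coefficients
y = [sum(c * x for c, x in zip(true_coefs, row)) for row in rows]

coefs = fit_ols(rows, y)
print("intercept, A, V, AxV:", [round(c, 3) for c in coefs])
```

With noise-free synthetic input the fit recovers the generating coefficients up to rounding error. With real ratings one would also report a goodness-of-fit measure such as r², as the slides do.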

  37. Results The difference is particularly clear in the dimension of openness:
     opn_heard = 0.05 + 1.00 opn_A + 0.00 opn_V (r² = 0.97)
     opn_seen = 0.05 + 0.59 opn_A + 0.42 opn_V (r² = 0.81)
     In the listening task, the estimates were based on the acoustic cues alone. In the viewing task, they were based on a weighted sum of the acoustic and the optic cues.
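To make the two openness equations concrete, the short sketch below evaluates them for one hypothetical incongruent stimulus. The coefficients are the ones reported on the slide. The input values, and the assumption that ratings are normalized to a 0..1 scale, are illustrative only and are not taken from the study.

```python
def opn_heard(opn_a, opn_v):
    """Heard openness, using the coefficients reported for the listening task."""
    return 0.05 + 1.00 * opn_a + 0.00 * opn_v


def opn_seen(opn_a, opn_v):
    """Seen openness, using the coefficients reported for the viewing task."""
    return 0.05 + 0.59 * opn_a + 0.42 * opn_v


# Hypothetical stimulus (assumed 0..1 scale): acoustically open, optically close.
a, v = 0.8, 0.2
heard = opn_heard(a, v)
seen = opn_seen(a, v)
print(f"heard: {heard:.3f}, seen: {seen:.3f}")  # heard: 0.850, seen: 0.606
```

For this input the heard value depends only on the acoustic rating, while the seen value lies between the acoustic and the optic rating, which is the contrast the slide describes.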

  38. Results In perception of roundedness and spreadness, there were only some minor differences between the results of the two tasks. In these dimensions, our subjects relied almost totally on optic cues not only when asked what they saw, but also when asked what they heard.

  40. Results There was, however, an interesting difference in perceived backness:
     bac_heard = 0.06 + 0.25 rnd_A - 0.20 rnd_AV (r² = 0.74)
     bac_seen = 0.09 + 0.42 bac_V (r² = 0.22)
     Note that bac_heard is given by cues reflecting roundedness rather than backness.

  42. Discussion There are two hypothetical explanations for an effect of roundedness on perceived backness:
     • The distance from the lips to the dorso-palatal ‘place of articulation’ is increased by lip rounding as well as by tongue retraction. This would provide an articulatory (gestural) explanation.
     • The upper formants (F2’) are lowered by lip rounding as well as by tongue retraction. This would provide an auditory explanation.
     Both explanations would be consistent with the placement of the rounded vowels to the right of their unrounded counterparts in IPA charts.

  45. Discussion Analysis of perceived backness. Conclusion: The effect is due to auditory (F2’) rather than articulatory (gestural) associations.

  46. Discussion The observed effect of lip rounding on perceived backness cannot be explained on the basis of a late-integration hypothesis. Swedish lacks non-front unrounded vowel phonemes and phones, whose existence would be required in order to apply such a hypothesis. This is clear and direct evidence for early, pre-categorical integration. The result also shows that this integration takes place in an auditory space in which roundedness and backness have an essential component in common.

  47. Discussion [Diagram. Acoustic path: Acoustic signal, Auditory signal analysis, An auditory percept. Optic path: Optic signal, Visual signal analysis, A visual percept. Shared: Audiovisual integration, A common percept.]

  48. Discussion [Diagram, two parallel paths. Acoustic path: Acoustic signal, Auditory analysis (demodulation), Modulation of voice, Integration of vocal information, Vocal percept. Optic path: Optic signal, Visual analysis (demodulation), Modulation of face, Integration of gestural information, Gestural percept.]

  49. Summary Some earlier findings:
     • In clear AV vowel stimuli, Swedes hear roundedness predominantly by eye – but openness only by ear. (The strength of the influence of a modality reflects the reliability of the information.)
     • A predominantly male minority is less sensitive to vision. (There is a significant sex difference.)
     • Presence of visible lip rounding (a ‘marked’ feature) is more influential than its absence.
     Ref: H. Traunmüller and N. Öhrström (2007). "Audiovisual perception of openness and lip rounding in front vowels." Journal of Phonetics 35: 244-258.
