
On The Robustness Of Overall F0-only Modifications To The Perception Of Emotions In Speech

Murtaza Bulut and Shrikanth Narayanan. August 15, 2008. Presented by Rio Akasaka.


Presentation Transcript


  1. On The Robustness Of Overall F0-only Modifications To The Perception Of Emotions In Speech. Murtaza Bulut and Shrikanth Narayanan. August 15, 2008. Presented by Rio Akasaka

  2. General ideas • Examines the effects of changing F0 • Notes how it can be changed without changing the perception or sound quality of a particular utterance. • Introduces the concept of emotional regions. • Performs statistical analyses on the various modifications.

  3. What? • F0 (fundamental frequency, perceived as pitch) is a good descriptor of emotion. • However, its usefulness is limited because it is less descriptive in natural speech. • Let’s introduce a new model called ‘emotional regions’ to represent utterances.

  4. Why? • Useful in making automated judgments on emotion in speech • MoodSwings (arousal content in speech), Timbre Game (F0 contours) • Can complement facial recognition research

  5. Emotion Perception • Analytic: F0 contour, range, voice quality • Contextual: sentence content, speaker

  6. Neutral

  7. Sad

  8. Joy

  9. Anger

  10. How? • Changing the F0 mean: shifting the entire contour up or down. • Changing the F0 range: multiplying the contour by a constant and shifting it so as to retain the original mean. • Stylizing: representing the F0 contour with linear segments of differing resolutions.
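
The three modifications on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the contour is taken to be a list of voiced-frame F0 values in Hz, the mean shift is expressed as a fraction of the original mean, and stylization uses evenly spaced anchor frames. The function names and the sample values are hypothetical.

```python
from statistics import mean


def shift_mean(f0, fraction):
    """Shift the whole contour up or down by a fraction of its mean.

    fraction=0.5 raises the mean by 50%, fraction=-0.5 lowers it by 50%.
    (Expressing the shift as a fraction is an assumption of this sketch.)
    """
    offset = mean(f0) * fraction
    return [v + offset for v in f0]


def scale_range(f0, k):
    """Multiply the contour by k, then shift it back to the original mean."""
    m = mean(f0)
    scaled = [k * v for v in f0]
    correction = m - mean(scaled)
    return [v + correction for v in scaled]


def stylize(f0, n_segments):
    """Approximate the contour with at most n_segments straight lines.

    Anchor frames are evenly spaced (an assumption); fewer segments give
    a coarser contour. Requires len(f0) >= 2 and n_segments >= 1.
    """
    last = len(f0) - 1
    anchors = sorted({round(i * last / n_segments) for i in range(n_segments + 1)})
    out = []
    for a, b in zip(anchors, anchors[1:]):
        for i in range(a, b):
            t = (i - a) / (b - a)  # position within the current segment
            out.append(f0[a] + t * (f0[b] - f0[a]))
    out.append(f0[last])
    return out


contour = [100.0, 120.0, 140.0, 120.0, 100.0]  # made-up F0 values in Hz
print(shift_mean(contour, 0.5))
print(scale_range(contour, 2.0))
print(stylize(contour, 1))
```

Note that `scale_range` leaves the mean untouched while widening or narrowing the excursions around it, which is what lets the mean and the range be varied independently.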

  11. Data Collection • 2 speakers × 2 sentences × 4 emotions × (29 modifications + original) = 480 files • Speakers: male, female • Sentences: “She told me what you did.” “This hat makes me look like an aardvark.” • Emotions: happy, angry, neutral, sad
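
As a quick check of the file count, the stimulus set can be enumerated as a full factorial design. The labels below (for example `modification_1`) are hypothetical placeholders; the slide does not name the individual modifications.

```python
from itertools import product

speakers = ["male", "female"]
sentences = [
    "She told me what you did.",
    "This hat makes me look like an aardvark.",
]
emotions = ["happy", "angry", "neutral", "sad"]
# The unmodified recording plus 29 modified versions (placeholder names).
versions = ["original"] + [f"modification_{i}" for i in range(1, 30)]

stimuli = list(product(speakers, sentences, emotions, versions))
print(len(stimuli))  # 2 * 2 * 4 * 30 = 480
```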

  12. Analysis • Listening test: 14 people • Listeners rated emotion and naturalness (quality)

  13. Emotional regions • 2D (F0 mean and range), not 3D • Mahalanobis distance (Gaussian) vs. Euclidean distance
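
To illustrate why the choice of distance matters for a 2D region, here is a minimal sketch that fits a Gaussian (centre and covariance) to a handful of (F0 mean, F0 range) points and compares Mahalanobis with Euclidean distance. The data points are made up, and the helper names are hypothetical; the slide does not say how the regions were actually estimated.

```python
from math import sqrt


def fit_region(points):
    """Return the centre and sample covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, centre, cov):
    """Distance scaled by the region's covariance (2x2 inverse written out)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


def euclidean(point, centre):
    """Plain straight-line distance, ignoring the region's shape."""
    return sqrt((point[0] - centre[0]) ** 2 + (point[1] - centre[1]) ** 2)


# Made-up (F0 mean in Hz, F0 range in Hz) values for one emotion.
region = [(200.0, 50.0), (220.0, 50.0), (200.0, 60.0), (220.0, 60.0)]
centre, cov = fit_region(region)

for probe in [(230.0, 55.0), (210.0, 75.0)]:
    print(probe, euclidean(probe, centre), mahalanobis(probe, centre, cov))
```

Both probes lie at the same Euclidean distance from the centre, but the Mahalanobis distance is larger for the probe displaced along the axis where the region varies less.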

  14. In-Depth • Important to realize that the emotional regions do not define how new emotions can be synthesized • Perception of emotions is based not only on F0, but on the combined effects of: prosody (rhythm, stress and intonation); spectral characteristics (speaker); linguistic content (sentence) • 4-way ANOVA with H0: emotion is perceived equally across all modifications; H0: speech quality is perceived equally across all modifications
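
The idea behind testing such a null hypothesis can be shown with a deliberately simplified example. The study used a 4-way ANOVA; the sketch below swaps in a one-way ANOVA F statistic over a single factor (modification type), with made-up ratings, purely to show how between-group and within-group variance are compared. It does not compute a p-value.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up listener ratings for three hypothetical modifications.
ratings = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
print(one_way_f(ratings))
```

A large F value relative to the F distribution with the returned degrees of freedom would be evidence against H0, meaning the modifications are not all perceived equally.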

  15. Observations • Changing the F0 mean (±50%): perception of sad and neutral increased, angry and happy decreased. • Changing the F0 range caused more variation in emotion recognition than changing F0 contours. In some cases changing the F0 range did not change the sound quality. Decreasing the F0 range caused an increase in sad. • Listeners were able to recognize emotion even with changes in F0 and distortion in sound quality. • The drop in perceived speech quality is less severe for F0 range modifications than for F0 mean modifications. • Changes in contour shape do not necessarily cause significant changes in emotion recognition.

  16. Things to retain from this presentation • Emotional regions can be used to parametrize emotions, but you also need to take linguistic content into account. • Changing F0 did not necessarily change perception of emotions. • Changing the F0 range affected emotion perception more than changing the F0 mean. • Also, the drop in speech quality was significantly smaller when modifying the F0 range.

  17. Bibliography • http://emosamples.syntheticspeech.de/
