
Presentation Transcript


Exploiting Word-level Features for Emotion Prediction
Poster by Greg Nicholas. Adapted from the paper by Greg Nicholas, Mihai Rotaru, & Diane Litman.

1. Why predict emotions?
• Affective computing – a direction for improving spoken dialogue systems
• Emotion detection (prediction)
• Emotion handling

2. Feature granularity levels
Detecting emotion: train a classifier on features extracted from user turns. Feature types: lexical, pitch, amplitude, duration.
• Turn level: most previous work computes features over the entire turn. This is efficient but offers only a coarse approximation of the pitch contour.
• Word level: [1] computes pitch features at the word level. This offers a better approximation of the pitch contour (e.g. it captures the large changes in uttering the word "great").
We concentrate on pitch features to detect uncertainty.

3. Problems classifying the overall turn emotion
• Turn level is simple: labeling granularity = turn, and there is one set of features per turn.
• Word level is more complicated: there is a label granularity mismatch (labels at the turn level, features at the word level) and a variable number of features per turn.

4. Techniques to solve this problem

Technique 1: Word-level emotion model (WLEM)
• Train: a word-level model, giving each word its turn's emotion label
• Predict: an emotion label for each word
• Combine: majority voting over the word-level predictions
Example student turn: "The force of the truck". Features are extracted for each of the five words ("the", "force", "of", "the", "truck"), yielding five word-level feature sets and five word-level predictions (e.g. Uncertain, Non-uncertain, Non-uncertain, Uncertain, Non-uncertain). Majority voting combines them into one overall turn prediction: Non-uncertain (3/5).
• Issues: the turn-to-word labeling assumption; majority voting is a very simple scheme.

Technique 2: Predefined subset of sub-turn units (PSSU)
• Combine: concatenate the features of three words (first, middle, last) into one conglomerate feature set
• Train & predict: a turn-level model with the turn's emotion label
Example student turn: "The force of the truck". The word-level feature sets for "the", "of", and "truck" are concatenated into a single PSSU feature set, which yields one overall turn prediction: Uncertain.
• Issues: might lose details from the discarded words.

5. Experimental results
Corpus:
• ITSPOKE dialogues
• Domain: qualitative physics tutoring
• Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech
Corpus comparison with the previous study [1]: [1] showed that the WLEM method works better than turn-level prediction. The PSSU idea was used in [2] at the breath-group level, but not at the word level.
Overall prediction accuracy (baseline: 77.79%):
• WLEM word-level slightly improves upon turn-level (+0.56%)
• PSSU word-level shows a much better improvement (+2.14%)
• Overall, PSSU is best according to this metric as well
Recall/precision for predicting uncertain turns:
• Turn-level: medium recall and precision
• WLEM: best recall, lowest precision – tends to over-generalize
• PSSU: good recall, best precision – much less over-generalization; overall the best choice

6. Future work
Many alterations could further improve these techniques:
• Annotate each individual word for certainty instead of whole turns
• Include the other features pictured above: lexical, amplitude, etc.
• Try predicting in a human-human dialogue context
• Better combination techniques (e.g. confidence weighting)
• More selective choices for PSSU than the middle word of the turn (e.g. the longest word in the turn, or ensuring the chosen word has domain-specific content)

References
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech, 2005.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.
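The two combination techniques described above can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' implementation: the per-word feature vectors are hypothetical numeric stand-ins for real word-level pitch features, and `classify_word` is a placeholder for whatever trained classifier a real system would use.

```python
from collections import Counter

def wlem_training_data(turns):
    """WLEM training set: every word inherits its turn's emotion label
    (the turn-to-word labeling assumption noted in the poster).
    `turns` is a list of (word_feature_vectors, turn_label) pairs."""
    X, y = [], []
    for word_features, turn_label in turns:
        for features in word_features:
            X.append(features)
            y.append(turn_label)
    return X, y

def wlem_predict(classify_word, word_features):
    """WLEM prediction: classify each word, then majority-vote the
    word-level predictions into one overall turn prediction."""
    votes = Counter(classify_word(f) for f in word_features)
    return votes.most_common(1)[0][0]

def pssu_features(word_features):
    """PSSU: concatenate the feature vectors of the first, middle, and
    last word into one conglomerate turn-level feature set, which a
    standard turn-level classifier then consumes."""
    first = word_features[0]
    middle = word_features[len(word_features) // 2]
    last = word_features[-1]
    return first + middle + last
```

For example, with a toy one-dimensional "pitch" feature per word and a stub classifier that labels positive values Uncertain, a five-word turn with predictions (Uncertain, Non-uncertain, Non-uncertain, Uncertain, Non-uncertain) yields the overall WLEM prediction Non-uncertain (3/5), mirroring the poster's walkthrough; the same turn's PSSU feature set is the three-way concatenation for its first, middle, and last words.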
