This study investigates low-level cues to emotion in phone dialog systems, with the aim of improving customer satisfaction. The research analyzed a corpus of 5,690 dialogs and 20,013 user turns, focusing on both negative and non-negative emotional cues. Using BoosTexter, a machine learning method that combines multiple weak classifiers, the study evaluates lexical, prosodic, and dialog act features. Classification accuracy improved from a 73.1% majority-class baseline to 79.0% when lexical, prosodic, dialog act, and contextual features were combined, indicating the potential for better customer interaction.
Low-Level Cues to Emotion. Julia Hirschberg, CS 4995/6998
Liscombe et al. ’05a • Domain: phone account information, How May I Help You? system • Motivation: improve customer satisfaction • Emotions examined: negative vs. non-negative (collapsed from 7 classes) • Corpus: 5,690 dialogs, 20,013 user turns • Training/test split: 75% / 25% • ML method: BoosTexter, which combines multiple weak classifiers (an illustrative sketch of a comparable setup follows below)
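The slide names BoosTexter, which is not widely packaged for Python; as a rough stand-in only, the sketch below approximates the setup with scikit-learn's AdaBoost over decision stumps (its default weak learner) applied to word n-gram counts, with the 75%/25% split from the slide. The example turns, labels, and parameter values are placeholders, not the original system or data.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder user-turn transcripts and collapsed emotion labels.
turns = [
    "i would like my account balance",
    "no that is not what i said",
    "yes please",
    "this is the third time i have called about this",
]
labels = ["non-negative", "negative", "non-negative", "negative"]

# Word unigram/bigram/trigram counts stand in for the lexical features;
# AdaBoost's default weak learners are depth-1 decision stumps, loosely
# mirroring BoosTexter's combination of many weak classifiers.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    AdaBoostClassifier(n_estimators=200),
)

# 75% / 25% train-test split, as on the slide.
X_train, X_test, y_train, y_test = train_test_split(
    turns, labels, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```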
~80 Features • Lexical: bag of words from transcripts (unigrams, bigrams, trigrams) • Prosodic (see the extraction sketch below): • Energy: min, max, median, s.d. • F0: min, max, median, s.d., mean slope • Ratio of voiced frames to total frames (rate) • F0 slope after the final vowel (turn-final pitch contour) • Mean F0 and energy over the longest normalized vowel (accent) • Syllables per second (rate) • Mean vowel length • Percent internal silence (hesitation) • Local jitter over the longest normalized vowel (voice quality)
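As a rough illustration of how a few of the listed prosodic features could be computed, the sketch below uses librosa for F0 and energy estimation. The filename, pitch range, and feature names are assumptions for the example; this is not the original feature-extraction pipeline, and features tied to vowel segmentation (accent, vowel length, jitter) are omitted.

```python
import numpy as np
import librosa

# Load one user turn (hypothetical file name).
y, sr = librosa.load("user_turn.wav", sr=16000)

# F0 track (NaN where unvoiced) and per-frame voicing decisions.
f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
f0_voiced = f0[~np.isnan(f0)]

# Frame-level energy (RMS).
energy = librosa.feature.rms(y=y)[0]

features = {
    "f0_min": float(np.min(f0_voiced)),
    "f0_max": float(np.max(f0_voiced)),
    "f0_median": float(np.median(f0_voiced)),
    "f0_sd": float(np.std(f0_voiced)),
    # Overall slope of a line fitted to the voiced F0 frames, a rough
    # stand-in for the slope features on the slide.
    "f0_mean_slope": float(
        np.polyfit(np.arange(len(f0_voiced)), f0_voiced, 1)[0]
    ),
    "energy_min": float(np.min(energy)),
    "energy_max": float(np.max(energy)),
    "energy_median": float(np.median(energy)),
    "energy_sd": float(np.std(energy)),
    # Ratio of voiced frames to total frames.
    "voiced_ratio": float(np.mean(voiced)),
}
print(features)
```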
Results • Baseline (majority class): 73.1% • Lexical + prosodic features: 76.1% • Lexical + prosodic + dialog act features: 77.0% • Lexical + prosodic + dialog act + context features: 79.0%
Dialogue act (DA) of the current turn • Context (see the sketch below): • Change in prosodic feature values from turn n-1 to n and from n to n+1 • Bag of words from the two previous turns • Edit distance between turns n-1 and n, and between n and n-2 • DAs of turns n-1 and n-2 • DAs of the system prompts eliciting turns n and n-1 • Hand-labeled emotion of turns n-1 and n-2
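To make the contextual features concrete, the sketch below assembles a few of them for turn n, assuming each turn is represented as a dict holding prosodic feature values, the transcript text, a dialogue act label, and a hand-labeled emotion. The turn structure, feature keys, and function names are hypothetical, not the paper's implementation.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two word sequences."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (wa != wb)))   # substitution
        prev = curr
    return prev[-1]


def context_features(turns, n):
    """Contextual features for turn n relative to the previous turn."""
    curr, prev1 = turns[n], turns[n - 1]
    feats = {}
    # Change in prosodic feature values from turn n-1 to n.
    for key in ("f0_median", "energy_median", "voiced_ratio"):
        feats[f"delta_{key}"] = curr[key] - prev1[key]
    # Transcript edit distance to the previous turn.
    feats["edit_dist_prev"] = word_edit_distance(curr["text"], prev1["text"])
    # Dialogue act and hand-labeled emotion of the previous turn.
    feats["prev_da"] = prev1["dialogue_act"]
    feats["prev_emotion"] = prev1["emotion"]
    return feats
```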