
Presentation Transcript


  1. Emotion Tagging – A Comparative Study on Bengali and English Blogs
     Dipankar Das and Sivaji Bandyopadhyay
     Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India
     ICON 2009

  2. Outline
     - Motivation
     - Resources
     - Word Level Tagging
       - Baseline Model
       - Morphology
       - CRF based Model
     - Sentence Level Tagging
     - Evaluation
     - Conclusion

  3. Motivation (1/3)
     In psychology and common use, emotion is an aspect of a person's mental state of being, normally based in or tied to the person's internal (physical) and external (social) sensory feeling (Zhang et al., 2008).

  4. Motivation (2/3)
     Natural Language Processing (NLP) tasks
     - Tracking users' emotion (products, events, politics)
     - Customer relationship management
     - Question Answering (QA) systems
     - Modern Information Retrieval (IR) systems

  5. Motivation (3/3)
     Blogs
     - Communicative and informative repository of text-based emotional content in the Web 2.0 (Lin et al., 2007)
     - Online diaries of the bloggers
     - Blog posts annotated by other bloggers
     - Large data suitable for machine learning
     Recognition of emotion from written text

  7. Resources (1/4)
     Bengali blog
     - Web blog archive (www.amarblog.com)
     - 14 different comic related topics and user comments
     - 1200 sentences
     English blog
     - Saima Aman and Stan Szpakowicz. 2007. Identifying Expressions of Emotion in Text. In V. Matoušek and P. Mautner (Eds.): TSD 2007, LNAI 4629, pp. 196–205
     - 1200 sentences

  8. Resources (2/4)
     English sentiment lexicon
     - SentiWordNet (Esuli et al., 2006)
     - WordNet Affect lists (WAL) (Strapparava et al., 2004)
     Updating of WAL
     - WAL has an inadequate number of emotion word entries
     - Synsets retrieved from English SentiWordNet
     - WAL updated with the retrieved synsets

  9. Resources (3/4)
     - No sentiment lexicon exists for Bengali
     - Both SentiWordNet and the WordNet Affect lists are translated into Bengali
     - Translation uses Bengali synsets from an English to Bengali bilingual synset dictionary, which is being developed as part of the English to Indian Languages Machine Translation (EILMT) project, a TDIL project undertaken by a consortium of premier institutes and sponsored by MCIT, Govt. of India
     - WAL is termed the Emotion List

  10. Resources (4/4)
      A knowledge base for emoticons

  12. Word Level Tagging
      - Semi-automatic annotation
      - An emotion tag is assigned to a word with the help of the Emotion List
      - Other, non-emotional words are tagged as neutral
      - Stemming process
      - Verified by linguists
      - 700 sentences for training; 300 and 200 sentences as development and test sets
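
The lexicon lookup behind this semi-automatic annotation can be sketched roughly as follows. This is only an illustration: the `EMOTION_LIST` entries, the `tag_words` function, and the identity default for `stem` are assumptions made here, not the authors' resources or code.

```python
# Hypothetical miniature Emotion List: stem -> emotion class (illustrative entries only).
EMOTION_LIST = {
    "delight": "happy",
    "cry": "sad",
    "furious": "anger",
}


def tag_words(tokens, emotion_list, stem=lambda w: w):
    """Tag each token with its emotion class, or 'neutral' if its stem is not listed."""
    tagged = []
    for token in tokens:
        key = stem(token.lower())  # a real stemmer would be plugged in here
        tagged.append((token, emotion_list.get(key, "neutral")))
    return tagged


print(tag_words(["She", "was", "furious"], EMOTION_LIST))
```

In the workflow on the slide, output of this kind would then be checked by linguists before being split into training, development and test sets.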

  14. Baseline Model
      - Identifies word level emotion tagging accuracies for each emotion class
      - Words carry no prior knowledge regarding word features
      - Six separate modules for the six emotion classes
      - Each word is passed through the six modules
      - Each word is tagged with the emotion class in which it appears
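
One way to picture the six-module baseline is a chain of per-class membership checks. The sketch below is an assumption-laden illustration: the word lists are invented, and the rule that the first matching module wins is a choice made here, since the slide does not say how a word listed under several classes is handled.

```python
# Hypothetical word lists, one per emotion class (not the actual Emotion List).
MODULE_WORDS = {
    "happy": {"joy", "glad"},
    "sad": {"grief", "cry"},
    "anger": {"rage", "furious"},
    "disgust": {"nasty", "gross"},
    "fear": {"scare", "terror"},
    "surprise": {"amaze", "astonish"},
}


def make_module(emotion, vocabulary):
    """Build one module: returns the emotion tag if the word is in its list, else None."""
    def module(word):
        return emotion if word.lower() in vocabulary else None
    return module


MODULES = [make_module(emotion, words) for emotion, words in MODULE_WORDS.items()]


def baseline_tag(word):
    """Pass the word through all six modules; the first module that fires decides the tag."""
    for module in MODULES:
        tag = module(word)
        if tag is not None:
            return tag
    return "neutral"


print([baseline_tag(w) for w in ["Joy", "terror", "table"]])
```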

  16. Morphology
      - Minimizes errors in recognizing emotional words
      - Bengali, like other Indian languages, is morphologically very rich
      - Different suffixes (e.g. for verbs, the features are Tense, Aspect, and Person)
      - The stemmer uses a suffix list to identify the stem form
      - For English, the Porter stemmer (Porter, 1997)
      - 3.65% and 6.03% improvement over the baseline system in average accuracy on the Bengali and English test sets
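
A suffix-list stemmer of the kind mentioned on this slide can be sketched as longest-match suffix stripping. The suffixes below are English placeholders chosen for readability, and the `min_stem_len` guard is an assumption; the actual Bengali suffix list is not given in the slides.

```python
def strip_suffix(word, suffixes, min_stem_len=2):
    """Remove the longest listed suffix, provided the remaining stem is long enough."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)]
    return word  # no listed suffix applies


# Placeholder suffix list (hypothetical; a Bengali list would hold tense/aspect/person endings).
SUFFIXES = ["ing", "ed", "s"]

print(strip_suffix("laughing", SUFFIXES))
print(strip_suffix("is", SUFFIXES))
```

Trying longer suffixes first avoids stripping a short ending when a longer one also matches, and the length guard keeps very short words intact.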

  17. Baseline vs. Morphology (Result)

  19. CRF based Model (1/4)
      10 active features (Das and Bandyopadhyay, 2009a)
      - POS information (adjective, verb, noun, adverb)
      - First sentence in a topic
      - SentiWordNet emotion word (delight…)
      - Reduplication (so-so, good-good..)
      - Question words (what, why…)
      - Colloquial / foreign words
      - Special punctuation symbols (!, @, ?..)
      - Quoted sentence ("you are 2 good man")
      - Sentence length (>=8, <15)
      - Emoticons
      Different unigram and bi-gram context features (word level as well as POS tag level) and their combinations
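
Several of the listed features can be computed per token as a feature dictionary, a format that some CRF toolkits accept. The sketch below covers only a subset of the ten features; the function name, feature keys, the `QUESTION_WORDS` set, and the boundary markers are all assumptions made for illustration.

```python
import re

QUESTION_WORDS = {"what", "why", "who", "when", "where", "how"}  # assumed list
SPECIAL_PUNCT = set("!@?")


def token_features(tokens, pos_tags, i, emotion_words, first_in_topic=False):
    """Build a feature dictionary for token i of one sentence."""
    word = tokens[i]
    lower = word.lower()
    n = len(tokens)
    return {
        "word": lower,
        "pos": pos_tags[i],
        "first_sentence_in_topic": first_in_topic,
        "emotion_word": lower in emotion_words,
        # Reduplication such as "so-so" or "good-good": the same word on both sides of a hyphen.
        "reduplication": bool(re.fullmatch(r"(\w+)-\1", lower)),
        "question_word": lower in QUESTION_WORDS,
        "special_punct": any(ch in SPECIAL_PUNCT for ch in word),
        # Sentence length bounds as given on the slide (>=8, <15).
        "length_8_to_14": 8 <= n < 15,
        # Context features at word level and POS tag level.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "prev_pos": pos_tags[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < n - 1 else "<EOS>",
        "next_pos": pos_tags[i + 1] if i < n - 1 else "<EOS>",
    }


tokens = ["It", "was", "so-so", "!"]
pos_tags = ["PRP", "VBD", "JJ", "."]
print(token_features(tokens, pos_tags, 2, {"delight"}))
```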

  20. CRF based Model (2/4)
      Feature analysis
      - Frequencies
      - Combination of multiple features vs. a single feature
      - Some features play a passive role (e.g. "First sentence in a topic", a phenomenon specific to the English blog corpus) but are active for topic, user comment, or title sentences of the Bengali blog
      - Special punctuation symbols (!, @, ? etc.), with their frequencies and attachments, yield 3% and 6% improvement for Bengali and English
      - Length of a sentence (> eight and < fifteen words per sentence)
      - Each feature is added only if its inclusion along with the pre-selected features improves accuracy
      - Accuracy improvement of 20.83% for Bengali and 24.33% for English over the baseline model
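
The rule of keeping a feature only when it improves accuracy on top of the already selected ones amounts to a greedy forward selection. A minimal sketch follows, assuming a caller-supplied `evaluate` function that trains and scores a model for a given feature set; the toy scorer and its gain values are invented for the example and are not results from the study.

```python
def forward_select(candidates, evaluate):
    """Add each candidate feature only if it improves the score of the current selection."""
    selected = []
    best = evaluate(selected)
    for feature in candidates:
        score = evaluate(selected + [feature])
        if score > best:
            selected.append(feature)
            best = score
    return selected, best


# Toy scorer (hypothetical gains in percentage points, not figures from the slides).
TOY_GAINS = {"pos": 5, "first_sentence": 0, "punct": 3}


def toy_evaluate(features):
    return 50 + sum(TOY_GAINS[f] for f in features)


print(forward_select(["pos", "first_sentence", "punct"], toy_evaluate))
```

In this toy run the feature with zero gain is skipped, which mirrors how a passive feature would be left out for one corpus while possibly being kept for another.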

  21. CRF based Model (3/4)

  22. CRF based Model (4/4)

  24. Sentence Level Tagging (1/2)
      Sense_Tag_Weight (STW)
      - The six basic words "happy", "sad", "anger", "disgust", "fear" and "surprise" are selected as seed words for the six emotions
      - Positive and negative scores are retrieved from English SentiWordNet for each synset in which a seed word appears
      - The average of the retrieved scores is fixed as the Sense_Tag_Weight (STW) of that particular emotion tag
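
The averaging step can be sketched as below. The slide does not say how a synset's positive and negative scores are combined before averaging, so this sketch assumes a signed score (positive minus negative) per synset; the score pairs are made-up inputs, not actual SentiWordNet values.

```python
def sense_tag_weight(synset_scores):
    """Average signed score (positive minus negative) over a seed word's synsets.

    synset_scores: list of (positive_score, negative_score) pairs, one per synset.
    """
    if not synset_scores:
        return 0.0
    return sum(pos - neg for pos, neg in synset_scores) / len(synset_scores)


# Hypothetical (positive, negative) pairs for the synsets of one seed word.
example_scores = [(0.5, 0.0), (0.25, 0.25)]
print(sense_tag_weight(example_scores))
```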

  25. Sentence Level Tagging (2/2)
      Sense_Weight_Score (SWS) for each emotion tag
      - $SWS_i = \dfrac{STW_i \cdot N_i}{\sum_{j=1}^{7} STW_j \cdot N_j}$, with $i \in \{1, \dots, 7\}$
      - $SWS_i$ is the sentence level Sense_Weight_Score for the emotion tag $i$
      - $N_i$ is the number of occurrences of that emotion tag in the sentence
      - Sentence level emotion tag: $SET = \max_{i=1,\dots,7}(SWS_i)$, i.e. the tag with the highest score
      - A sentence is of neutral type if $SWS_i$ is zero for all emotion tags $i$
      Post-processing for handling negative words (Das and Bandyopadhyay, 2009b)
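
The scoring on this slide can be sketched in a few lines. The weights below are hypothetical positive values chosen so the arithmetic is easy to follow, and the sketch sums over whatever tags appear in the weight table rather than a fixed set of seven; the negation post-processing is not covered.

```python
def sentence_emotion(word_tags, stw):
    """Return (sentence tag, SWS per tag) from word-level tags and STW weights."""
    counts = {tag: 0 for tag in stw}
    for tag in word_tags:
        if tag in counts:  # tags without a weight (e.g. 'neutral') are ignored
            counts[tag] += 1
    total = sum(stw[tag] * counts[tag] for tag in stw)
    if total == 0:
        # Every SWS is zero, so the sentence is of neutral type.
        return "neutral", {tag: 0.0 for tag in stw}
    sws = {tag: stw[tag] * counts[tag] / total for tag in stw}
    return max(sws, key=sws.get), sws


# Hypothetical Sense_Tag_Weights (not values from the study).
STW = {"happy": 0.5, "sad": 0.25, "anger": 0.5, "disgust": 0.25, "fear": 0.25, "surprise": 0.5}

tag, scores = sentence_emotion(["neutral", "happy", "sad", "happy"], STW)
print(tag, scores["happy"], scores["sad"])
```

Here "happy" occurs twice and "sad" once, so the weighted counts are 1.0 and 0.25, and "happy" receives the larger share of the total.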

  27. Evaluation (1/2)
      Accuracies
      - Computed by counting the number of sentences whose system-assigned emotion tag matches the emotion tag of their annotated emotion class
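
Counting matches per emotion class can be sketched as follows, assuming parallel lists of gold and system tags; the function name and the example tags are illustrative only.

```python
from collections import Counter


def per_class_accuracy(gold, predicted):
    """For each gold emotion class, the fraction of its sentences tagged correctly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {tag: correct[tag] / totals[tag] for tag in totals}


gold = ["happy", "happy", "sad", "fear"]
predicted = ["happy", "sad", "sad", "happy"]
print(per_class_accuracy(gold, predicted))
```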

  28. Evaluation (2/2)
      - Loss in accuracies: frequent use of metaphoric words in blogs
      - The Bengali blogs were collected from comic articles, so emotions such as "happy", "sad", and "surprise" are present in sufficient numbers in the blog corpus
      - An adequate number of training examples for a particular emotion tag improves the accuracy of that tag

  30. Conclusion
      - Handling of metaphors
      - Phrase level analysis concerning genre of corpus
      - Document level emotion identification
      - More emotion annotated data
        - To improve the performance
        - Suitable for machine learning approach

  31. Thank you

  32. Questions?
