NLP Group at Jadavpur University, Kolkata, India




Presentation Transcript


  1. NLP Group at Jadavpur University, Kolkata, India • Computer Science and Engineering Department • Teaching • Natural Language Processing to undergraduate and master's students in Computer Science and Engineering • Laboratory projects for students • Research and Development

  2. Research and Development in NLP • International Projects • "Strategic India-Japan Cooperative Programme" project in the area of multidisciplinary ICT. Project entitled: "Sentiment Analysis where AI meets Psychology". Research leader in Japan: Professor Manabu Okumura, Precision and Intelligence Laboratory, Tokyo Institute of Technology, Japan

  3. Research and Development in NLP • International Projects • "Indo-French Centre for the Promotion of Advanced Research (IFCPAR)", Govt. of India and France. Project entitled: "An advanced platform for question answering systems". Principal collaborator in France: Prof. Patrick Saint-Dizier, Institut de Recherche en Informatique de Toulouse, Toulouse, France

  4. Research and Development in NLP • International Projects • CONACYT-DST India project entitled: "Answer Validation through Textual Entailment". Principal collaborator in Mexico: Professor Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico

  5. Research and Development in NLP • National Projects (Consortium Mode) • Cross Lingual Information Access • Snippet and Summary Generation • Snippet Translation • English to Indian Languages Machine Translation Systems • Indian Language to Indian Language Machine Translation Systems

  6. NLP Manpower • Doctoral Students • Statistical Machine Translation • Answer Validation through Textual Entailment (joint supervision with Prof. Alexander Gelbukh) • Opinion Mining • Emotion Analysis • Event Identification and Event – Time Analysis

  7. NLP Manpower • Masters’ Students • Multi Word Expressions • Comparative and Evaluative Question Answering Systems • Undergraduate Students

  8. Emotional Expression, Holder and Topic – The Three Vertices of an Emotion Triangle Prof. Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University, Kolkata-700032, India

  9. Introduction • (Quan and Ren, 2009) “Opinion Mining and Sentiment Analyses have been attempted with more focused perspectives rather than fine-grained emotion” • Emotion - An aspect of a person's mental state of being, normally based in or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al., 2008)

  10. Introduction • Emotion - A private state, not open to any objective observation or verification (Quirk et al., 1985) - Direct affective word (“He is really happy enough”) - Indirect notion (“Dream of music is in their eyes and hearts”) - Difficult to identify emotional stance in text - Need for Syntactic, Semantic and Pragmatic analysis of text (Polanyi and Zaenen, 2006)

  11. Introduction • Natural language text contains attitudinal information of a reader or writer with respect to some subject, event or topic • Attitude may be a judgment or an evaluation, on the part of a reader or of a writer • "There is indeed a relationship between writer and reader emotions" (Yang et al., 2009)

  12. Emotion/Sentiment Triangle • Vertices: Expression, Holder, Topic • Where do we start? Lexicon and Corpus!

  13. Emotion Lexicon • Existing Resources • Development - Updating - Translation - Sense Disambiguation • Evaluation

  14. Existing Resources (English)
     • WordNet (Miller, 1995)
        - Contains no emotion-specific information
     • WordNet Affect (Strapparava and Valitutti, 2004)
        - A resource for the SemEval-2007 shared task on "Affective Text"
        - In SemEval-2007, a set of WordNet Affect words relevant to Ekman's (1993) six emotion labels (joy, fear, anger, sadness, disgust, surprise) was selected
     • SentiWordNet (Esuli and Sebastiani, 2006)
        - Assigns three sentiment scores (positive, negative and objective) to each WordNet synset
     • Subjectivity Wordlist (Banea et al., 2008)
        - Labels words as strongly or weakly subjective and assigns prior polarities (positive, negative or neutral)

  15. Emotion Lexicon • Existing Resources • Development - Updating - Translation - Sense Disambiguation • Evaluation

  16. Updating (1/4)
     • /* WordNet Affect synset */ n#10337658 fit(A) scene(B) tantrum
     • /* SentiWordNet synset A' retrieved for A */ tantrum/scene/conniption/fit/burst/fit_out/equip/outfit/tally/jibe/match/correspond/gibe/agree/check/conform_to/meet/set/primed/fit_to/fit_for/convulsion/paroxysm
     • /* SentiWordNet synset B' retrieved for B */ tantrum/scene/conniption/fit/scenery/view/prospect/vista/panorama/aspect/shot
     • /* Updated synset E' */ tantrum/scene/conniption/fit/burst/fit_out/equip/outfit/tally/jibe/match/correspond/gibe/agree/check/conform_to/meet/set/primed/fit_to/fit_for/convulsion/paroxysm/scenery/view/prospect/vista/panorama/aspect/shot

  17. Updating (2/4)
     • Updating using SentiWordNet (SW) (Esuli and Sebastiani, 2006)
        - Each word in WordNet Affect is replaced by the retrieved SentiWordNet synsets that contain that emotion word
        - Part-of-speech (POS) information is considered
        - Subjectivity scores are not considered
     • Updating using VerbNet (VN) (Kipper-Schuler, 2005)
        - The largest online verb lexicon, with explicitly stated syntactic and semantic information based on Levin's verb classification
        - VerbNet files, stored in XML format, list member verbs with similar senses
        - The member verbs of a class are sense-based synonyms, so each VerbNet class yields a verb synset
        - Each word in a verb synset (POS category "v" in the WordNet Affect lists) is updated with the corresponding VerbNet synset
     • Duplicate removal strategy
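The VerbNet step above can be sketched as follows. This is a minimal illustration, assuming a simplified XML fragment modeled on VerbNet class files (a `VNCLASS` element whose `MEMBER` children carry a `name` attribute); the class ID, its members and the helpers `verb_synsets` and `update_verbs` are hypothetical, not taken from VerbNet or from the authors' system.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment modeled on a VerbNet class file; the ID and the
# member verbs are made up for this example, not copied from VerbNet.
VERBNET_XML = """
<VNCLASS ID="example-1">
  <MEMBERS>
    <MEMBER name="anger"/>
    <MEMBER name="enrage"/>
    <MEMBER name="infuriate"/>
  </MEMBERS>
</VNCLASS>
"""


def verb_synsets(xml_text):
    """Return one verb synset (list of member verbs) per VNCLASS element."""
    root = ET.fromstring(xml_text.strip())
    # Element.iter() includes the element itself, so a bare VNCLASS root works.
    return [[m.get("name") for m in vnclass.iter("MEMBER")]
            for vnclass in root.iter("VNCLASS")]


def update_verbs(affect_verbs, synsets):
    """Expand each affect verb with every synset containing it, without duplicates."""
    updated = []
    for verb in affect_verbs:
        expansion = [verb]
        for synset in synsets:
            if verb in synset:
                expansion.extend(synset)
        for word in expansion:
            if word not in updated:
                updated.append(word)
    return updated


if __name__ == "__main__":
    synsets = verb_synsets(VERBNET_XML)
    print(update_verbs(["enrage", "fear"], synsets))
```

Verbs with no VerbNet class (here "fear") pass through unchanged, and the order-preserving duplicate check anticipates the duplicate removal strategy listed on the slide.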

  18. Updating (3/4)
     • Duplicate removal: if the words "A" and "B" in WordNet Affect entry "E" are replaced by the retrieved SentiWordNet synsets A' and B', such that $A_1, A_2, A_3, B_3 \in A'$ and $B_1, B_2, B_3, A_3 \in B'$, then the updated entry is $E' = (A' - B') \cup (B' - A') \cup (A' \cap B') = A' \cup B'$
     • $A_1$, $A_2$ and $A_3$ are the words in the retrieved synset A', and $B_1$, $B_2$ and $B_3$ are the words in the retrieved synset B', as extracted from SentiWordNet
     [Figure: Venn diagram of entry E (words A and B), the retrieved synsets A' and B', and the merged entry E']
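Because the three parts of E' partition the union, the duplicate-removal rule amounts to an order-preserving union of A' and B'. The sketch below (with `merge_synsets` as a hypothetical helper name) reproduces the E' listed on the "Updating (1/4)" slide from its A' and B'.

```python
# A' and B' as listed on the "Updating (1/4)" slide.
A_PRIME = ("tantrum/scene/conniption/fit/burst/fit_out/equip/outfit/tally/jibe/"
           "match/correspond/gibe/agree/check/conform_to/meet/set/primed/fit_to/"
           "fit_for/convulsion/paroxysm").split("/")
B_PRIME = "tantrum/scene/conniption/fit/scenery/view/prospect/vista/panorama/aspect/shot".split("/")


def merge_synsets(a_prime, b_prime):
    """E' = (A' - B') U (B' - A') U (A' & B') = A' U B', keeping first-seen order."""
    seen = set()
    merged = []
    for word in list(a_prime) + list(b_prime):
        if word not in seen:  # words shared by A' and B' are kept only once
            seen.add(word)
            merged.append(word)
    return merged


if __name__ == "__main__":
    print("/".join(merge_synsets(A_PRIME, B_PRIME)))
```

Preserving first-seen order is a presentational choice so the output lines up with the slide; the rule itself only defines a set.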

  19. Updating (4/4) Table 1: Update of English WordNet Affect using SentiWordNet and VerbNet

  20. Emotion Lexicon • Existing Resources • Development - Updating - Translation - Sense Disambiguation • Evaluation

  21. Translation (1/2)
     • A Samsad Bengali-to-English bilingual dictionary is available (http://home.uchicago.edu/~cbs2/banglainstruction.html)
     • An English-to-Bengali bilingual synset-based dictionary of approximately 102,119 entries is being developed as part of the EILMT project (English to Indian Languages Machine Translation (EILMT) is a TDIL project undertaken by a consortium of premier institutes and sponsored by MCIT, Govt. of India)
     • The Affect word lists are converted into Bengali using this dictionary, followed by manual updates
     • Word combinations and idioms are not translated automatically
     • The six emotion lists contain 210 non-translated words in total, a number small enough for manual translation
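The dictionary-based conversion with a manual-translation fallback can be sketched as below. It is an illustration only: `translate_affect_list` is a hypothetical helper, the toy dictionary is not the EILMT dictionary (its two transliterated entries, kruddha for "angry" and sukh for "pleasure", are taken from examples elsewhere in these slides), and WordNet-style underscores are assumed to mark the word combinations and idioms that are not translated automatically.

```python
# Toy English-to-Bengali dictionary (transliterated); NOT the EILMT resource.
TOY_DICT = {"angry": ["kruddha"], "pleasure": ["sukh"]}


def translate_affect_list(english_words, bilingual_dict):
    """Translate an English affect word list by dictionary lookup.

    Returns (translations, untranslated); untranslated words are left
    for manual translation.
    """
    translations = {}
    untranslated = []
    for word in english_words:
        # Assumption: underscores/spaces mark multiword entries (idioms,
        # word combinations), which are not translated automatically.
        is_multiword = "_" in word or " " in word
        if not is_multiword and word in bilingual_dict:
            translations[word] = bilingual_dict[word]
        else:
            untranslated.append(word)
    return translations, untranslated


if __name__ == "__main__":
    done, manual = translate_affect_list(
        ["angry", "pleasure", "fly_off_the_handle", "wrathful"], TOY_DICT)
    print(done)
    print("for manual translation:", manual)
```

The count of the `untranslated` list is what tells you whether manual translation is feasible (210 words across the six lists, per the slide).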

  22. Translation (2/2) • Example of a translated synset • Table 2: Results of the translation

  23. Emotion Lexicon • Existing Resources • Development - Updating - Translation - Sense Disambiguation • Evaluation

  24. Sense Disambiguation (1/3)
     • Bengali-English bilingual dictionary (http://home.uchicago.edu/~cbs2/banglainstruction.html)
     • Synonymous Word Sets (SWS): each semicolon-separated group of English equivalents in an entry forms one SWS
        - <[ kruddha ] a angry; angered, enraged; wrathful; indignant …>
        - <[ kruddha ] a SWS1; SWS2; SWS3; SWS4; …>
     • Hypothesis: "Two words belonging to the same or different translated synsets are grouped together to form a new Bengali synset if there is at least one common English equivalent word present in any of the SWSs formed for those words"

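  25. Sense Disambiguation (2/3)
     [Figure: example in which Bengali words $X_b$ and $Y_b$, each with synonymous word sets SWS1 and SWS2, share the English word $Z_e$ and form an example synset]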

  26. Sense Disambiguation (3/3)
     • $X_b$ and $Y_b$ are two Bengali words, and $C_{X_b}$ and $C_{Y_b}$ are their English equivalent classes:
        - $C_{X_b} = \{SWS_1, SWS_2, \dots, SWS_q\}$ and $C_{Y_b} = \{SWS_1, SWS_2, \dots, SWS_p\}$
     • If, for some $SWS_i \in C_{X_b}$ ($i = 1, \dots, q$) and $SWS_j \in C_{Y_b}$ ($j = 1, \dots, p$), $SWS_i \cap SWS_j \neq \emptyset$, i.e., some English word $Z_e \in SWS_i \cap SWS_j$ appears in an SWS of both $C_{X_b}$ and $C_{Y_b}$, then
        - a new Bengali synset containing $X_b$ and $Y_b$ is formed, and
        - a new English equivalent class is formed by merging the SWSs of $C_{X_b}$ and $C_{Y_b}$
     • The process continues until no word in the translated Bengali synset remains unclassified
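One possible reading of this grouping procedure is a greedy transitive merge: start a synset from an unclassified word, absorb every unclassified word whose equivalent class shares an English word with the merged class, and repeat until nothing changes. The slides do not specify the iteration order, so this is a sketch under that assumption. Only kruddha's SWSs come from the dictionary entry shown on the "Sense Disambiguation (1/3)" slide; `Yb` and `Wb` and their English equivalents are hypothetical placeholders, as are the helper names.

```python
def english_vocab(sws_list):
    """Union of all synonymous word sets (SWSs) in an English equivalent class."""
    return set().union(*sws_list)


def group_bengali_synsets(equiv_classes):
    """Greedily group Bengali words whose equivalent classes share an English word.

    equiv_classes maps each Bengali word to its list of SWSs (sets of English words).
    """
    unclassified = list(equiv_classes)
    synsets = []
    while unclassified:
        seed = unclassified.pop(0)
        members = [seed]
        merged_class = list(equiv_classes[seed])  # merged SWSs of the new synset
        changed = True
        while changed:  # repeat so that merges can chain transitively
            changed = False
            for word in list(unclassified):
                # Some SWS_i & SWS_j is non-empty iff the class vocabularies intersect.
                if english_vocab(merged_class) & english_vocab(equiv_classes[word]):
                    members.append(word)
                    merged_class.extend(equiv_classes[word])
                    unclassified.remove(word)
                    changed = True
        synsets.append(members)
    return synsets


EXAMPLE = {
    "kruddha": [{"angry"}, {"angered", "enraged"}, {"wrathful"}, {"indignant"}],
    "Yb": [{"furious"}, {"enraged"}],  # hypothetical
    "Wb": [{"glad"}, {"happy"}],       # hypothetical
}

if __name__ == "__main__":
    print(group_bengali_synsets(EXAMPLE))
```

Here kruddha and Yb share "enraged" and end up in one synset, while Wb stays on its own.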

  27. Emotion Lexicon • Existing Resources • Development - Updating - Translation - Sense Disambiguation • Evaluation

  28. Evaluation (1/2)
     • Manual agreement (Cohen's kappa)
        - Measures agreement between two raters who each classify items into mutually exclusive categories
        - Items: emotion words in the translated Bengali synonym sets
        - Binary decision (Yes/No)
        - Agreement values from 0.44 to 0.56 indicate moderate agreement
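Cohen's kappa corrects raw agreement for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's own label frequencies. A minimal sketch for Yes/No judgments follows; `cohen_kappa` is a hypothetical helper name and the ratings are toy data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    # Chance agreement: both raters pick the same label independently.
    p_expected = sum((count_a[lab] / n) * (count_b[lab] / n) for lab in labels)
    if p_expected == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch reports it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    a = ["yes", "yes", "no", "no"]
    b = ["yes", "no", "no", "no"]
    print(cohen_kappa(a, b))
```

In the toy data the raters agree on 3 of 4 items (p_o = 0.75), chance agreement is 0.5, so kappa is 0.5, in the same moderate band as the values reported on the slide.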

  29. Evaluation (2/2)

  30. Bengali WordNet Affect Lists Snapshot

  31. Resources • Emotion Lexicon
     - D. Das and S. Bandyopadhyay. 2010. Developing Bengali WordNet Affect for Analyzing Emotion. In Proceedings of the 23rd International Conference on the Computer Processing of Oriental Languages (ICCPOL-2010), pp. 35-40, California, USA.
     - Y. Torii, D. Das, S. Bandyopadhyay and M. Okumura. 2011. Developing Japanese WordNet Affect for Analyzing Emotions. In the Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011), 49th Annual Meeting of the Association for Computational Linguistics (ACL), Portland, USA. (Accepted)

  32. Emotion Corpus Guideline (1/3)
     • A random collection of 123 blog posts from a Bengali web blog archive (www.amarblog.com)
     • 12,149 sentences in total (comics, politics, sports and short stories)
     • Three annotators
     • No prior training was provided to the annotators
     • Instructions were based on a few illustrated annotated samples
     • Open-source graphical annotation tool GATE (http://gate.ac.uk/gate/doc/releases.html)

  33. Emotion Corpus Guideline (2/3) • Items for annotation
     - Emotional Expression (word / phrase)
     - Emotion Holder
     - Emotion Topic
     - Sentential Emotion: Ekman's (1993) six classes "anger", "disgust", "fear", "joy", "sad" and "surprise"
     - Sentential Intensity: Low (L), General (G) and High (H)

  34. Emotion Corpus Snapshot (1)

  35. Emotion Corpus Snapshot (2)

  36. Emotion Corpus Guideline (3/3)
     • Relaxed scheme: annotators are free to select the text spans (e.g. emotional expressions and topics)
     • Fixed scheme: annotators are given emotional items with fixed text spans (e.g. Emotion Holder, Sentential Emotion and Intensity)

  37. Agreement (1/4)
     • Emotional expressions are words or strings of words
     • Agreement is computed between the sets of text spans selected by two annotators
     • Strategies
        - MASI (Measure of Agreement on Set-valued Items), used in coreference annotation (Passonneau, 2004) and in semantic and pragmatic annotation (Passonneau, 2006)
        - agr metric (Wiebe et al., 2005) for measuring directional agreement
        - Cohen's kappa (κ) (Cohen, 1960)
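The two span-level measures can be sketched as follows. MASI multiplies the Jaccard coefficient |X ∩ Y| / |X ∪ Y| by a monotonicity weight M (1 for identical sets, 2/3 when one set contains the other, 1/3 for other overlapping sets, 0 for disjoint sets); Passonneau (2006) reports it as a distance, 1 - J·M, whereas `masi` below returns the similarity J·M. For agr, the sketch assumes the reading of Wiebe et al. (2005) in which agr(a||b) is the fraction of annotator a's spans that overlap some span of annotator b; representing spans as half-open character offsets is an assumption of this example, and all function names are hypothetical.

```python
def jaccard(x, y):
    """|X & Y| / |X | Y|; the same ratio as the emotion-holder IAA."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def masi(x, y):
    """MASI similarity J * M (distance would be 1 - J * M)."""
    x, y = set(x), set(y)
    if x == y:
        m = 1.0
    elif x <= y or y <= x:
        m = 2 / 3  # one set subsumes the other
    elif x & y:
        m = 1 / 3  # overlapping, neither subsumes the other
    else:
        m = 0.0    # disjoint
    return jaccard(x, y) * m


def agr(spans_a, spans_b):
    """agr(a||b): fraction of a's (start, end) spans overlapping any of b's spans."""
    if not spans_a:
        return 0.0

    def overlaps(s, t):
        return s[0] < t[1] and t[0] < s[1]

    hits = sum(any(overlaps(s, t) for t in spans_b) for s in spans_a)
    return hits / len(spans_a)


if __name__ == "__main__":
    print(masi({"a", "b"}, {"a"}), agr([(0, 5), (10, 15)], [(3, 8)]))
```

Since agr is directional, reporting both agr(a||b) and agr(b||a) shows which annotator marked more spans.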

  38. Agreement (2/4) • Tables: Emotional Expressions (MASI, agr); Emoticons (Kappa); Sentential Emotions and Intensities (Kappa)

  39. Agreement (3/4) • Emotion Holder
     - Cohen's kappa (κ) (Cohen, 1960) and Inter-Annotator Agreement (IAA)
     - If X is the set of emotion holders selected by the first annotator and Y is the set selected by the second annotator, then $\mathrm{IAA} = |X \cap Y| / |X \cup Y|$
     - Agreement is moderate to high for a single emotion holder and lower for multiple holders
     - Disagreement occurs mostly in satisfying implicit constraints
     - Disagreements were resolved by mutual understanding
     - Table: Emotion Holder (Kappa), [IAA]

  40. Agreement (4/4) • Emotion Topic
     • A topic consists of a single word or a string of words
     • Determining the scope of individual topics inside a target span is hard
     • MASI and the agr metric are used
     • Agreement for target span annotation is ≈ 0.9, indicating satisfactory annotation
     • Disagreement
        - is lower in sentences containing a single emotion topic
        - arises in selecting the boundaries of topic spans
        - arises in distinguishing the emotion topic from other relevant topics
     • Table: Emotion Topic (MASI), [agr]

  41. Resources • Emotion Corpus - D. Das and S. Bandyopadhyay. 2010. Labeling Emotion in Bengali Blog Corpus – A Fine Grained Tagging at Sentence Level. In the 8th Workshop on Asian Language Resources (ALR8), 23rd International Conference on Computational Linguistics (COLING 2010), pp. 47-55, August 21-22, Beijing, China

  42. Example
     • John surprisingly narrated the actual story.
        - Evaluative Expression: surprisingly
        - Emotion Holder: <John>
        - Emotion Topic: story
     • রাশেদ অনুভব করেছিল যে রামের সুখ অন্তহীন।
       (Rashed) (anubhab) (korechilo) (je) (Ramer) (sukh) (antohin)
       Rashed felt that Ram's pleasure is endless.
        - Evaluative Expression: সুখ (sukh) 'pleasure'
        - Emotion Holder: <writer, রাশেদ (Rashed), রাম (Ram)>
        - Emotion Topic: রামের সুখ (Ramer sukh) 'Ram's pleasure'

  43. Salient Vertices • Evaluative Expressions (word/phrase/sentence/document level) • Holder Identification • Topic Detection

  44. Evaluative Expressions (word/phrase/sentence/document level)
     • Evaluative expressions: subjective or objective
     • Subjective expressions
        - Positive or negative (sentiment)
        - Beyond sentiment, or fine-grained sentiment
     • Emotional expressions (words or phrases) are the fine-grained subjective counterpart, based on Ekman's (1993) six universal emotions (joy / happiness, sadness, anger, disgust, fear and surprise)

  45. Evaluative Expressions (word/phrase/sentence/document level)
     • (Ku et al., 2006)
        - Word
        - Phrase (word + context features, e.g. intensifier, negation, conjunct)
        - Sentence (syntax + semantics + pragmatics)
        - Document
     • Hierarchical forward granular approach: word → phrase → sentence → document; word → sentence → document; word → document; phrase → document

  46. Word Level Tagging
     • Baseline system
        - No prior knowledge regarding word features
        - Six separate modules for the six emotion classes
        - Each word is passed through the six modules
        - Each word is tagged with an emotion class
     • Baseline system + stemming + WordNet Affect lists
        - Stemming (suffixes of Bengali verbs depend on tense, aspect and person)
        - The Bengali stemmer uses a suffix list; for English, the Porter stemmer (Porter, 1997) or the WordNet morphological analyzer (Miller, 1990) is used
        - Evaluated using the WordNet Affect lists (Strapparava and Valitutti, 2006; Das and Bandyopadhyay, 2010)
        - Improvements of 3.65% and 6.03% in average accuracy over the baseline system on the Bengali and English test sets, respectively
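A minimal sketch of the "baseline + stemming + affect lists" idea for English: each of the six list entries plays the role of one emotion module, and a word is tagged with every class whose list contains either the word or its stem. The affect lists are tiny hypothetical stand-ins for the WordNet Affect lists, and the crude longest-suffix-first stripper is an assumption of this example, not the Porter stemmer or the Bengali suffix list used on the slide.

```python
# Hypothetical toy stand-ins for the six WordNet Affect lists.
AFFECT_LISTS = {
    "anger": {"anger", "rage"},
    "disgust": {"disgust"},
    "fear": {"fear", "dread"},
    "joy": {"delight", "happy"},
    "sadness": {"sad", "sorrow"},
    "surprise": {"surprise", "amaze"},
}
# Hypothetical suffix list, checked in order; NOT the Porter stemmer.
SUFFIXES = ("ness", "ing", "ed", "ly", "s")


def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix if a stem of min_stem characters remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def tag_word(word):
    """Pass the word through one check per emotion class; return all matches."""
    w = word.lower()
    candidates = {w, strip_suffix(w)}
    return [emotion for emotion, words in AFFECT_LISTS.items() if candidates & words]


if __name__ == "__main__":
    for token in ["delighted", "angers", "Sadness", "table"]:
        print(token, tag_word(token))
```

Without the stemming step, inflected forms such as "delighted" would miss the list entry "delight", which is the kind of gap the reported accuracy gains come from.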

  47. Word Level Tagging
     • Machine learning systems (CRF, SVM)
     • Features (Das and Bandyopadhyay, 2009)
        - POS information (adjective, verb, noun, adverb)
        - First sentence in a topic
        - SentiWordNet emotion word (delight…)
        - Reduplication (so-so, good-good…)
        - Question words (what, why…)
        - Colloquial / foreign words
        - Special punctuation symbols (!, @, ?…)
        - Quoted sentence ("you are 2 good man")
        - Sentence length (>=8, <15)
        - Emoticons
     • Different unigram and bigram context features (word level as well as POS tag level) and their combinations
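A sketch of per-token feature dictionaries covering a few of the listed features (question words, reduplication, special punctuation, sentence length) plus unigram and bigram word context. The word lists, feature names and helpers are hypothetical English stand-ins; POS, SentiWordNet, colloquial-word, quotation and emoticon features are omitted. Lists of such dicts can be turned into feature matrices, for instance with scikit-learn's `DictVectorizer`, before training an SVM.

```python
QUESTION_WORDS = {"what", "why", "how", "who", "when", "where"}  # hypothetical list
SPECIAL_PUNCT = set("!@?")


def is_reduplication(tokens, i):
    """True for hyphenated doubles ("so-so") or an immediate repeat ("good good")."""
    word = tokens[i].lower()
    parts = word.split("-")
    if len(parts) == 2 and parts[0] and parts[0] == parts[1]:
        return True
    return i > 0 and tokens[i - 1].lower() == word


def token_features(tokens, i):
    """Feature dict for token i, with sentence-boundary markers for context."""
    word = tokens[i].lower()
    prev_word = tokens[i - 1].lower() if i > 0 else "<S>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>"
    return {
        "word": word,
        "prev_word": prev_word,                      # unigram context
        "next_word": next_word,
        "bigram_prev": prev_word + "|" + word,       # bigram context
        "question_word": word in QUESTION_WORDS,
        "reduplication": is_reduplication(tokens, i),
        "special_punct": any(ch in SPECIAL_PUNCT for ch in word),
        "length_8_to_14": 8 <= len(tokens) < 15,     # the slide's >=8, <15 band
    }


if __name__ == "__main__":
    sentence = ["Why", "so-so", "day", "!"]
    for i in range(len(sentence)):
        print(token_features(sentence, i))
```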

  48. Sentence Level Tagging (1/2)
     • Sense_Tag_Weight (STW)
        - Select the six basic words "happy", "sad", "anger", "disgust", "fear" and "surprise" as seed words for the six emotions
        - Retrieve the positive and negative scores from English SentiWordNet (Esuli and Sebastiani, 2006) for each synset in which each seed word appears
        - Fix the average of the retrieved scores as the Sense_Tag_Weight (STW) of that emotion tag
     • Table 1: Sense_Tag_Weights (STW) of the six emotion tags

  49. Sentence Level Tagging (2/2)
     • Sense_Weight_Score (SWS) for each emotion type i:
        $SWS_i = \frac{STW_i \cdot N_i}{\sum_{j=1}^{7} STW_j \cdot N_j}$
        - $SWS_i$ is the sentence-level Sense_Weight_Score for emotion type i
        - $N_i$ is the number of occurrences of emotion type i in the sentence
     • Sentence-level emotion tag: $SET = \arg\max_{i \in \{1, \dots, 7\}} SWS_i$
     • A sentence is of neutral type if $SWS_i = 0$ for every emotion tag i
     • Post-processing handles negative words (Das and Bandyopadhyay, 2009)
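The sentence-level scoring can be sketched directly from the SWS formula: weight each emotion's word count by its STW, normalize by the weighted total, and take the arg-max, falling back to neutral when every score is zero. The STW values below are hypothetical placeholders (the real ones are in the slide's Table 1, which is not reproduced here); the slide's sum runs over 7 terms, while this sketch sums over whatever tags the dictionary holds, and the negation post-processing is not included. `sentence_emotion` is a hypothetical helper name.

```python
# Hypothetical STW values for illustration only; the real values come from
# averaged SentiWordNet scores (Table 1 on the previous slide).
STW = {
    "happy": 0.3,
    "sad": 0.25,
    "anger": 0.2,
    "disgust": 0.1,
    "fear": 0.15,
    "surprise": 0.05,
}


def sentence_emotion(counts, stw=STW):
    """Return (tag, scores) with SWS_i = STW_i*N_i / sum_j STW_j*N_j and tag = argmax."""
    denominator = sum(weight * counts.get(tag, 0) for tag, weight in stw.items())
    if denominator == 0:
        # No emotion words: every SWS_i is zero, so the sentence is neutral.
        return "neutral", {tag: 0.0 for tag in stw}
    scores = {tag: weight * counts.get(tag, 0) / denominator
              for tag, weight in stw.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # A sentence with two "happy" words and one "anger" word.
    print(sentence_emotion({"happy": 2, "anger": 1}))
```

With these placeholder weights, two "happy" words and one "anger" word give scores of 0.75 and 0.25, so the sentence is tagged "happy"; the STW weighting means a rarer but more heavily weighted emotion can still win.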

  50. Document Level Tagging (1/2)
     • Heuristic features
        - Emotion tags of the title sentence
        - Emotion tags of the end sentence of a topic
        - Emotion tags assigned to an overall topic
        - Emotion tags of the user comment portions of a document
        - Most frequent emotion tags identified in the document
        - Identical emotions that appear in the longest series of tagged sentences (Yang et al., 2007)
        - Emotion tags of the largest section among all user comment sections
     [Figure: General structure of a Bengali blog document]
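Two of these heuristic features are easy to compute from a document's sequence of sentence-level tags: the most frequent tag, and the tag of the longest run of consecutive sentences sharing the same emotion. The sketch below assumes the tags are already available as a list; the helper names and the example tag sequence are hypothetical.

```python
from collections import Counter
from itertools import groupby


def most_frequent_tag(sentence_tags):
    """Most frequent sentence-level emotion tag (first seen wins ties)."""
    if not sentence_tags:
        return None
    return Counter(sentence_tags).most_common(1)[0][0]


def longest_series_tag(sentence_tags):
    """Tag of the longest run of consecutive sentences with the same emotion."""
    if not sentence_tags:
        return None
    runs = [(tag, len(list(group))) for tag, group in groupby(sentence_tags)]
    return max(runs, key=lambda run: run[1])[0]  # first longest run wins ties


if __name__ == "__main__":
    tags = ["joy", "sad", "joy", "fear", "fear", "joy"]
    print(most_frequent_tag(tags), longest_series_tag(tags))
```

The two features can disagree, as in the example ("joy" is most frequent, "fear" forms the longest series), which is why they are kept as separate heuristics rather than one vote.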
