
AVAYA: Sentiment Analysis in Twitter with Self-Training and Polarity Lexicon Expansion


Presentation Transcript


  1. SemEval 2013 Task 2
  AVAYA: Sentiment Analysis in Twitter with Self-Training and Polarity Lexicon Expansion
  Lee Becker, George Erhart, David Skiba, and Valentine Matula
  Avaya Labs, June 16, 2013

  2. Participation

  3. Guiding Intuitions
  • Boost recall of positive/negative instances (A, B)
  • Don't worry about neutral instances (A, B)
  • Encode polarity cues into features (A, B)
  • Exploit the context (A)

  4. System Overview: Task B Constrained
  Sentiment-labeled tweets → Feature Extraction (with Polarity Lexicon) → Constrained Model

  5. System Overview: Task B Unconstrained
  Unlabeled tweets → Constrained Model → Auto-labeled tweets
  Auto-labeled tweets → Expanded Polarity Lexicon
  Auto-labeled tweets → Feature Extraction (with Expanded Polarity Lexicon) → Unconstrained Model

  6. Overview: Task A Models
  Constrained: Sentiment-labeled contexts → Feature Extraction (with Polarity Lexicon) → Constrained Model
  Unconstrained: Sentiment-labeled contexts → Feature Extraction (with Expanded Polarity Lexicon) → Unconstrained Model

  7. Preprocessing
  • Normalization: URLs, @mentions (see the sketch below)
  • NLP pipeline, written in the ClearTK framework with ClearNLP wrappers:
    • Tokenization (preserves emoticons and URLs)
    • POS tagging
    • Lemmatization
    • Dependency parsing
  • PTB POS → ArkTweet POS (Gimpel et al., 2011)
  • Dependencies → collapsed dependencies
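A minimal sketch of the normalization step in Python. The regexes and placeholder tokens are assumptions for illustration; the slide only says that URLs and @mentions are normalized:

```python
import re

# Illustrative patterns; the exact normalization rules in the AVAYA
# pipeline are not specified beyond "URLs" and "@mentions".
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def normalize_tweet(text: str) -> str:
    """Replace URLs and @mentions with placeholder tokens."""
    text = URL_RE.sub("<URL>", text)
    text = MENTION_RE.sub("<MENTION>", text)
    return text

print(normalize_tweet("@alice check this out http://t.co/xyz :)"))
# -> "<MENTION> check this out <URL> :)"
```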

  8. Resources
  • MPQA Subjectivity Lexicon (Wilson, Wiebe, and Hoffmann, 2005)
  • Hand-crafted negation word dictionary
  • Hand-crafted emoticon polarity dictionary
  Available at http://leebecker.com/resources/semeval-2013/

  9. Task B Features
  • Polarized bag-of-words: an easy way to double the feature space (e.g., happy vs. NOT_happy)
  • Negation window example: "I am not too happy about this, but I'm still pumped and thrilled for tomorrow."
  • Feature variants:
    • Token
    • Token + PTB POS
    • Token + Simplified POS
    • Lemma
    • Lemma + PTB POS
    • Lemma + Simplified POS
  A sketch of the negation-window transform follows this list.
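A hedged sketch of the polarized bag-of-words transform: tokens inside a fixed-size window after a negation cue are rewritten as NOT_<token>, doubling the effective feature space. The window size, negation word list, and clause-boundary handling below are assumptions, not the paper's exact rules:

```python
# Hypothetical negation lexicon and clause boundaries; the system used a
# hand-crafted negation word dictionary (slide 8).
NEGATION_WORDS = {"not", "no", "never", "n't"}
CLAUSE_BREAKS = {",", ".", ";", "but"}

def polarize_tokens(tokens, window=4):
    out, remaining = [], 0
    for tok in tokens:
        low = tok.lower()
        if low in NEGATION_WORDS:
            remaining = window        # open a negation window
            out.append(tok)
        elif low in CLAUSE_BREAKS:
            remaining = 0             # clause boundary closes the scope
            out.append(tok)
        elif remaining > 0:
            out.append("NOT_" + tok)  # polarized copy of the token
            remaining -= 1
        else:
            out.append(tok)
    return out

tokens = "I am not too happy about this , but I'm still pumped".split()
print(polarize_tokens(tokens))
# ['I', 'am', 'not', 'NOT_too', 'NOT_happy', 'NOT_about', 'NOT_this',
#  ',', 'but', "I'm", 'still', 'pumped']
```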

  10. Task B Features
  • Message polarity features:
    • Word sentiment counts (pos | neg)
    • Emoticon sentiment counts (pos | neg)
    • Net word polarity
    • Net emoticon polarity
  • Microblogging features (a counting sketch follows this list):
    • ALL CAPS word counts
    • Counts of words with repeated characters (yaaaaay, booooo)
    • Emphasis (*yes*)
    • Winning sports scores (Nuggets 15-0)
  • PTB POS tag counts
  • Collapsed dependency relations, with negation incorporated:
    • Text – Text
    • Lemma+Simplified POS – Lemma+Simplified POS
    • POS – Lemma
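An illustrative counter for three of the microblogging features named above. The patterns are assumptions standing in for the system's actual rules, and sports-score detection is omitted:

```python
import re

def microblog_features(tokens):
    """Count ALL CAPS words, elongated words, and *emphasis* markup."""
    return {
        "all_caps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
        "elongated": sum(1 for t in tokens
                         if re.search(r"(.)\1\1", t)),      # yaaaay, booooo
        "emphasis": sum(1 for t in tokens
                        if re.fullmatch(r"\*\w+\*", t)),    # *yes*
    }

print(microblog_features("YAAAAY we *won* GO team gooooo".split()))
# {'all_caps': 2, 'elongated': 2, 'emphasis': 1}
```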

  11. Task B: Constrained Model
  • LIBLINEAR with logistic regression loss function
  • Heavily boosted negative-polarity instances:
    • w_positive = 1
    • w_negative = 25
    • w_neutral = 1
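A roughly equivalent setup sketched with scikit-learn, whose liblinear solver wraps the same LIBLINEAR library. The toy texts and vectorizer are stand-ins for the actual Task B feature extraction:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the extracted Task B features.
texts = ["love this phone", "hate this phone", "the phone arrived"]
labels = ["positive", "negative", "neutral"]
X = CountVectorizer().fit_transform(texts)

# Logistic regression loss with per-class instance weights from the slide.
clf = LogisticRegression(
    solver="liblinear",
    class_weight={"positive": 1, "negative": 25, "neutral": 1},
)
clf.fit(X, labels)
```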

  12. Polarity Lexicon Expansion: Pointwise Mutual Information
  • Based on Semantic Orientation for Sentiment (Turney, 2002)
  • Intuition: use co-occurrence statistics to measure how strongly a word is associated with each polarity.

  PMI(word, sentiment) = log2 [ p(word, sentiment) / (p(word) · p(sentiment)) ]
  polarity(word) = sgn(PMI(word, positive) − PMI(word, negative))
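A worked toy computation of the two formulas in Python. The counts are invented stand-ins for statistics gathered from the auto-labeled tweets, and the smoothing term is an assumption added to avoid log(0):

```python
from math import log2

# Invented co-occurrence counts; in the real system these come from the
# ~475k auto-labeled tweets described on the next slide.
word_label = {("awesome", "positive"): 90, ("awesome", "negative"): 10}
label_total = {"positive": 50_000, "negative": 60_000}
N = sum(label_total.values())

def pmi(word, label, eps=0.5):
    p_joint = (word_label.get((word, label), 0) + eps) / N
    p_word = (sum(word_label.get((word, l), 0) for l in label_total) + eps) / N
    p_label = label_total[label] / N
    return log2(p_joint / (p_word * p_label))

diff = pmi("awesome", "positive") - pmi("awesome", "negative")
print("positive" if diff > 0 else "negative")   # sgn(...) -> positive
```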

  13. Polarity Lexicon Expansion: From Tweets to Lexicon
  • Differences from Turney (2002):
    • Classifier output instead of seed words
    • Words instead of word phrases
  • Procedure:
    • Applied the constrained classifier to ~475k unlabeled tweets
    • Filtered and balanced the corpus via classifier confidence score thresholds:
      • 50,789 positive instances (confidence > 0.9)
      • 59,029 negative instances (confidence > 0.7)
      • 70,601 neutral instances (confidence > 0.8)
    • Removed: words with f(word) < 10, neutral-polarity words, single-character words ('a', 'j', 'I', etc.), numbers (1, 20, 1000), and punctuation
    • Merged with the MPQA subjectivity lexicon
  • Final lexicon size: 11,740 entries
  A sketch of these filtering rules follows.
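A sketch of the two filtering predicates in Python. The thresholds follow the slide and the word-level rules mirror the removal list, but the exact tokenization and counting details are assumptions:

```python
import string

# Confidence thresholds from the slide.
THRESHOLDS = {"positive": 0.9, "negative": 0.7, "neutral": 0.8}

def keep_tweet(label: str, confidence: float) -> bool:
    """Keep an auto-labeled tweet only if the classifier was confident."""
    return confidence > THRESHOLDS[label]

def keep_word(word: str, freq: int, polarity: str) -> bool:
    """Apply the removal rules listed on the slide."""
    return (freq >= 10                          # drop f(word) < 10
            and polarity != "neutral"           # drop neutral-polarity words
            and len(word) > 1                   # drop single characters
            and not word.isdigit()              # drop numbers
            and not all(c in string.punctuation for c in word))
```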

  14. Task B: Unconstrained Model
  • Self-trained model (sketched below):
    • ~470k instances labeled by the constrained model
    • ~10k original instances
  • Expanded polarity lexicon
  • Heavily discounted neutral instances:
    • w_positive = 2
    • w_negative = 5
    • w_neutral = 0.1
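A schematic self-training step under stated assumptions: it reuses the `clf` and vectorizer names from the constrained-model sketch above and the confidence thresholds from slide 13. This outlines the described procedure, not the released implementation:

```python
from sklearn.linear_model import LogisticRegression

THRESHOLDS = {"positive": 0.9, "negative": 0.7, "neutral": 0.8}

def self_train(clf, vectorizer, labeled, unlabeled):
    """Label tweets with the constrained model, keep confident ones,
    then retrain with the neutral class heavily discounted."""
    auto = []
    probs = clf.predict_proba(vectorizer.transform(unlabeled))
    for tweet, p in zip(unlabeled, probs):
        label = clf.classes_[p.argmax()]
        if p.max() > THRESHOLDS[label]:          # confidence filtering
            auto.append((tweet, label))
    pairs = auto + list(labeled)                 # ~470k auto + ~10k original
    texts = [t for t, _ in pairs]
    labels = [l for _, l in pairs]
    new_clf = LogisticRegression(
        solver="liblinear",
        class_weight={"positive": 2, "negative": 5, "neutral": 0.1},
    )
    new_clf.fit(vectorizer.transform(texts), labels)
    return new_clf
```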

  15. Task B Results

  16. Task A: Features
  • Same as Task B:
    • Polarized bag-of-words
    • Contextual polarity features: word sentiment counts (pos | neg), emoticon sentiment counts (pos | neg), net word polarity, net emoticon polarity
    • Microblogging features
    • PTB POS tags
  • Additional features:
    • Scoped dependencies
    • Dependency paths

  17. Task A Features: Scoped Dependencies
  Example: "You do not want to miss this tomorrow night." (dependency arcs: root, nsubj, xcomp, tmod, neg, aux)
  • OUT_neg_nsubj(want, you)
  • OUT_neg(want, not)
  • IN_xcomp(want, miss)
  • IN_aux(miss, to)
  • OUT_tmod(miss, tomorrow)
  A feature-generation sketch follows this list.
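A hedged sketch of how such features might be generated. The IN_/OUT_ scoping and neg_ marking rules below are reverse-engineered from the slide's example (IN for arcs inside the target span, OUT otherwise, with neg_ added to out-of-span arcs whose head is negated), so treat them as assumptions:

```python
def scoped_dep_features(deps, span, negated_heads=frozenset()):
    """deps: (relation, head, dependent) triples from a collapsed parse.
    span: tokens belonging to the target phrase being classified."""
    feats = []
    for rel, head, dep in deps:
        scope = "IN" if head in span and dep in span else "OUT"
        neg = ("neg_" if scope == "OUT" and head in negated_heads
               and rel != "neg" else "")
        feats.append(f"{scope}_{neg}{rel}({head},{dep})")
    return feats

deps = [("nsubj", "want", "you"), ("neg", "want", "not"),
        ("xcomp", "want", "miss"), ("aux", "miss", "to"),
        ("tmod", "miss", "tomorrow")]
print(scoped_dep_features(deps, {"want", "to", "miss"}, {"want"}))
# ['OUT_neg_nsubj(want,you)', 'OUT_neg(want,not)', 'IN_xcomp(want,miss)',
#  'IN_aux(miss,to)', 'OUT_tmod(miss,tomorrow)']
```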

  18. Task A Features: Dependency Paths
  Example: "Criminals killed Sadat and in the process they killed Egypt." (dependency arcs: root, dobj, conj)
  • POS path: {NNP} dobj < {VBD} < conj {VBD} < root
  • Sentiment POS path: {^/neutral} < {V/negative} < {V/negative} < {root}
  • In subject: False
  • In object: True

  19. Task A Models
  • Constrained: MPQA Subjectivity Lexicon
  • Unconstrained: Expanded Polarity Lexicon
  • LIBLINEAR with class weights:
    • w_positive = 11
    • w_negative = 2
    • w_neutral = 1

  20. Task A Results

  21. Discussion
  • Dictionary expansion via supervised sentiment models is a relatively simple way to grow the feature space and increase coverage.
  • Dependency-based features provide additional context and richer information.
  • Future work: ablation studies; better tuning of self-training.

  22. Thank You!
  • Task 2 organizers and participants
  • SemEval 2013 organizers
  • Anonymous reviewers
