
Opinion mining, sentiment analysis, and beyond

Opinion mining, sentiment analysis, and beyond. Bettina Berendt, Department of Computer Science, KU Leuven, Belgium, http://people.cs.kuleuven.be/~bettina.berendt/ Summer School Foundations and Applications of Social Network Analysis & Mining, June 2-6, 2014, Athens, Greece.




  1. Opinion mining, sentiment analysis, and beyond Bettina Berendt Department of Computer Science KU Leuven, Belgium http://people.cs.kuleuven.be/~bettina.berendt/ Summer School Foundations and Applications of Social Network Analysis & Mining, June 2-6, 2014, Athens, Greece

  2. Motivation and overview • Major dimensions: Units of analysis, methods, features • Issues in aspect-/sentence-oriented SA • Social media: the case of tweets • Evaluation • Some challenges and current research directions

  3. Motivation and overview • Major dimensions: Units of analysis, methods, features • Issues in aspect-/sentence-oriented SA • Social media: the case of tweets • Evaluation • Some challenges and current research directions

  4. Meet sentiment analysis (1) (buzzilions.com)

  5. Aggregations (buzzilions.com)

  6. Meet sentiment analysis (2)

  7. A real-life scenario (1) • A distance-learning university offers a discussion forum for each course. • But students don't use it. • They opened a (public) Facebook group and discuss there. • The university wants to make sure it learns about problems with the course fast: things students don't like, don't understand, worry about, ... • Also, of course, things the students are happy about. • They consider using sentiment analysis for this. • What questions arise?

  8. Your answers • Go to their FB page • If it's not big: read it • If it is: text analysis • Access: no, it's public • First topic, then aspect • Put questions in the group • Problems: a lot of words • Is an adjective positive or negative? ("not happy" etc.) • Maybe students won't talk openly any more • Unethical not to tell you're the lecturer

  9. A field of study with many names • Opinion mining • Sentiment analysis • Sentiment mining • Subjectivity detection • ... • Often used synonymously • Some shadings in meaning • "Sentiment analysis" describes the current mainstream task best  I'll use this term.

  10. Goals for today • This is a very busy research area. • Even the number of survey articles is large. • It is impossible to describe all relevant research in an hour. • My aims: • Give you a broad overview of the field • Show "how it works" with examples (high-level!), give you pointers to review articles, datasets, tools, ... • Encourage a critical view of the topic • Get you interested in reading further!

  11. The data mining problem [Diagram: the system infers/constructs what a document "has" – topic, facet (a component of the topic), and sentiment – for a document (or its parts) within a document collection; the user/author issues the document to an audience.]

  12. What makes people happy?

  13. Happiness in blogosphere

  14. Home alone for too many hours, all week long ... screaming child, headache, tears that just won’t let themselves loose.... and now I’ve lost my wedding band. I hate this. [current mood: sad] • Well kids, I had an awesome birthday thanks to you. =D Just wanted to so thank you for coming and thanks for the gifts and junk. =) I have many pictures and I will post them later. hearts [current mood: happy] What are the characteristic words of these two moods? [Mihalcea, R. & Liu, H. (2006). In Proc. AAAI Spring Symposium CAAW.] Slides based on Rada Mihalcea's presentation.

  15. Data, data preparation and learning – or: sentiment analysis is generally a form of text mining • LiveJournal.com – optional mood annotation • 10,000 blogs: • 5,000 happy entries / 5,000 sad entries • average size 175 words / entry • pre-processing – remove SGML tags, tokenization, part-of-speech tagging • quality of automatic “mood separation” • naïve Bayes text classifier • five-fold cross validation • Accuracy: 79.13% (>> 50% baseline)
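The classifier on this slide can be sketched in a few lines. This is a minimal multinomial naïve Bayes over bag-of-words with add-one smoothing, trained on a tiny hand-made stand-in for the 10,000 mood-annotated LiveJournal entries (the class name and toy documents are illustrative, not from the study):

```python
import math
from collections import Counter, defaultdict

class NaiveBayesMoodClassifier:
    """Multinomial naive Bayes over bag-of-words, with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.label_counts = Counter(labels)       # label -> number of documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()           # trivial tokenization
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        n_docs = sum(self.label_counts.values())
        best_label, best_logp = None, float("-inf")
        for label in self.label_counts:
            logp = math.log(self.label_counts[label] / n_docs)   # prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in doc.lower().split():
                # add-one (Laplace) smoothed likelihood
                logp += math.log((self.word_counts[label][word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Toy mood-annotated "blog entries" (illustrative only).
docs = ["what an awesome birthday yay", "lovely concert so cool",
        "tears and crying so sad", "lonely and upset i cried"]
labels = ["happy", "happy", "sad", "sad"]
clf = NaiveBayesMoodClassifier().fit(docs, labels)
```

In the study this would be wrapped in five-fold cross validation; here a single fit suffices to show the mechanics.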

  16. Results: Corpus-derived happiness factors • Top happy words: yay 86.67, awesome 79.71, shopping 79.56, birthday 78.37, lovely 77.39, concert 74.85, cool 73.72, cute 73.20, lunch 73.02, books 73.02 • Top sad words: goodbye 18.81, hurt 17.39, tears 14.35, cried 11.39, upset 11.12, sad 11.11, cry 10.56, died 10.07, lonely 9.50, crying 5.50 • happiness factor of a word = the number of occurrences in the happy blogposts / the total frequency in the corpus
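The happiness-factor formula on this slide is a simple ratio of counts. A minimal sketch, using a tiny invented corpus rather than the LiveJournal data:

```python
from collections import Counter

def happiness_factors(happy_posts, sad_posts):
    """Happiness factor of a word = occurrences in happy posts / total corpus frequency."""
    happy = Counter(w for post in happy_posts for w in post.split())
    total = happy + Counter(w for post in sad_posts for w in post.split())
    return {w: happy[w] / total[w] for w in total}

# Tiny illustrative corpus (not the 10,000-entry blog dataset).
happy_posts = ["yay awesome concert", "awesome birthday"]
sad_posts = ["tears cried", "sad tears concert"]
factors = happiness_factors(happy_posts, sad_posts)
```

Words occurring only in happy posts get factor 1.0, words only in sad posts get 0.0, and shared words land in between ("concert" gets 0.5 here); the slide's values are evidently rescaled versions of such ratios.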

  17. Aspect-oriented sentiment analysis: It's not ALL good or bad Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life.

  18. Liu & Zhang's (2012) definition DEFINITION 1.3 (SENTIMENT-OPINION): A sentiment-opinion is a quintuple (entity, aspect, sentiment value, opinion holder, time).

  19. Applications • Mainstream applications • Review-oriented search engines • Market research (companies, politicians, ...) • Improve information extraction, summarization, and question answering • Discard subjective sentences • Show multiple viewpoints • Improve communication and HCI? • Detect flames in emails and forums • Nudge people to avoid "angry" Facebook posts? • Augment recommender systems: downgrade items that received a lot of negative feedback • Detect web pages with sensitive content inappropriate for ads placement • ...

  20. Data sources • Review sites • Blogs • News • Microblogs From Tsytsarau & Palpanas (2012)

  21. Motivation and overview • Major dimensions: Units of analysis, methods, features • Issues in aspect-/sentence-oriented SA • Social media: the case of tweets • Evaluation • Some challenges and current research directions

  22. The unit of analysis • community • another person • user / author • document • sentence or clause • aspect (e.g. product feature) [Callouts: the "What makes people happy" example illustrates the document level; the phone review illustrates the aspect level]

  23. The analysis method • Machine learning • Supervised • Unsupervised • Lexicon-based • Dictionary • Flat • With semantics • Corpus • Discourse analysis [Callouts: the "What makes people happy" example illustrates supervised machine learning; the phone review illustrates the lexicon-based approach]

  24. Features • Features: • Words (bag-of-words) • N-grams • Parts-of-speech (e.g. adjectives and adjective-adverb combinations) • Opinion words (lexicon-based: dictionary or corpus) • Valence intensifiers and shifters (for negation); modal verbs; ... • Syntactic dependency • Feature selection based on • frequency • information gain • odds ratio (for binary-class models) • mutual information • Feature weighting • Term presence or term frequency • Inverse document frequency (TF.IDF) • Term position: e.g. title, first and last sentence(s)
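The TF.IDF weighting mentioned under "Feature weighting" can be sketched directly from its definition, term frequency times the log of inverse document frequency (this uses the plain log(N/df) variant; real systems often add smoothing):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF.IDF weights for a list of tokenised documents.
    tf = raw count in the document; idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))   # document frequency
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()}
        for doc in docs
    ]

# Three toy "reviews" as token lists.
docs = [["good", "camera"], ["good", "battery"], ["bad", "battery"]]
weights = tfidf(docs)
```

Note how "camera" (appearing in one document) outweighs "good" (appearing in two): rarer terms are more discriminative, which is exactly why IDF complements raw term frequency.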


  26. Motivation and overview • Major dimensions: Units of analysis, methods, features • Issues in aspect-/sentence-oriented SA • Social media: the case of tweets • Evaluation • Some challenges and current research directions

  27. Objects, aspects, opinions (1) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification

  28. Objects, aspects, opinions (2) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification • Aspect extraction

  29. Find only the aspects belonging to the high-level object • Simple idea: POS and co-occurrence • find frequent nouns / noun phrases • find the opinion words associated with them (from a dictionary: e.g. for positive good, clear, amazing) • find infrequent nouns co-occurring with these opinion words • BUT: may find opinions on aspects of other things • Improvement (Popescu & Etzioni, 2005): meronymy • evaluate each noun phrase by computing a pointwise mutual information (PMI) score between the phrase and some meronymy discriminators associated with the product class • e.g., for a scanner class: "of scanner", "scanner has", "scanner comes with", etc., which are used to find components or parts of scanners by searching the Web. • PMI(a, d) = hits(a & d) / (hits(a) × hits(d))
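The PMI score at the bottom of the slide is plain arithmetic over web hit counts. A minimal sketch with hypothetical hit counts (the numbers are invented, not from Popescu & Etzioni):

```python
def pmi(hits_joint, hits_a, hits_d):
    """PMI score as on the slide: co-occurrence hits of candidate aspect a and
    meronymy discriminator d, divided by the product of their individual hits."""
    return hits_joint / (hits_a * hits_d)

# Hypothetical counts for candidate aspect "lens" and discriminator "of scanner".
score = pmi(hits_joint=120, hits_a=5000, hits_d=800)
```

A candidate whose PMI with the class's discriminators is high is likely a genuine part or aspect of that product class; low scores flag the "aspects of other things" the simple co-occurrence idea picks up.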

  30. Simultaneous Opinion Lexicon Expansion and Aspect Extraction • Double propagation (Qiu et al., 2009, 2011): bootstrap between tasks • extracting aspects using opinion words; • extracting aspects using the extracted aspects; • extracting opinion words using the extracted aspects; • extracting opinion words using both the given and the extracted opinion words. • Adaptation of dependency grammar: • direct dependency: one word depends on the other word without any additional words in their dependency path, or they both depend on a third word directly. • POS tagging: opinion words – adjectives; aspects – nouns or noun phrases. • Input: seed set of opinion words • Example: "Canon G3 produces great pictures" • Rule: 'a noun on which an opinion word directly depends through mod is taken as an aspect'  allows extraction in both directions
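The two-way rule on this slide can be sketched over pre-parsed dependency triples. This is a heavily simplified version of double propagation using only the adjectival-modifier ("amod") relation; the triples are hand-made stand-ins for real parser output, not the full rule set of Qiu et al.:

```python
def double_propagation(triples, seed_opinions, iterations=3):
    """Bootstrap aspects and opinion words from (adjective, relation, noun) triples.
    Known opinion word -> its modified noun becomes an aspect;
    known aspect -> its modifying adjective becomes an opinion word."""
    opinions, aspects = set(seed_opinions), set()
    for _ in range(iterations):
        for adj, rel, noun in triples:
            if rel != "amod":
                continue
            if adj in opinions:
                aspects.add(noun)      # opinion word anchors a new aspect
            if noun in aspects:
                opinions.add(adj)      # aspect anchors a new opinion word
    return opinions, aspects

# Hand-made parses: "great pictures", "amazing pictures", "amazing zoom".
triples = [("great", "amod", "pictures"),
           ("amazing", "amod", "pictures"),
           ("amazing", "amod", "zoom")]
opinions, aspects = double_propagation(triples, seed_opinions={"great"})
```

Starting from the single seed "great", the loop discovers "pictures" as an aspect, then "amazing" as an opinion word, then "zoom" as a further aspect – propagation in both directions, as the slide says.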

  31. Objects, aspects, opinions (3) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification • Aspect extraction • Grouping synonyms

  32. Grouping synonyms • General-purpose lexical resources provide synonym links • e.g. WordNet • But grouping is domain-dependent: • Movie reviews: movie ~ picture • Camera reviews: movie ~ video; picture ~ photos • Carenini et al. (2005): extend dictionary using the corpus • Input: taxonomy of aspects for a domain • similarity metrics defined using string similarity, synonyms and distances measured using WordNet • merge each discovered aspect expression to an aspect node in the taxonomy.
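The merging step can be sketched with just the string-similarity part of Carenini et al.'s metrics, using the standard library's `difflib.SequenceMatcher` as a stand-in for their WordNet-based measures (the taxonomy, expressions, and threshold here are invented):

```python
from difflib import SequenceMatcher

def group_aspects(expressions, taxonomy, threshold=0.7):
    """Merge each discovered aspect expression into the most similar taxonomy
    node; expressions below the similarity threshold stay unassigned."""
    groups = {node: [] for node in taxonomy}
    for expr in expressions:
        node, score = max(
            ((n, SequenceMatcher(None, expr, n).ratio()) for n in taxonomy),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            groups[node].append(expr)
    return groups

groups = group_aspects(["photos", "photo quality", "batteries"],
                       taxonomy=["photo", "battery"])
```

Here "photos" and "batteries" merge into their taxonomy nodes, while "photo quality" falls below the threshold and stays unassigned – in the full method, WordNet synonym and distance metrics would catch cases pure string similarity misses.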

  33. WordNet

  34. Objects, aspects, opinions (4a) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification • Aspect extraction • Grouping synonyms • Opinion orientation classification

  35. Objects, aspects, opinions (4b) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification • Aspect extraction • Grouping synonyms • Opinion orientation classification

  36. Opinion orientation • Start from lexicon • e.g. the dictionary SentiWordNet • Assign +1/-1 to opinion words, change according to valence shifters (e.g. negation: not etc.) • "But" clauses ("the pictures are good, but the battery life ...") • Dictionary-based: use semantic relations (e.g. synonyms, antonyms) • Corpus-based: • learn from labelled examples • Disadvantage: needs these (expensive!) • Advantage: captures domain dependence
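The lexicon-plus-shifters idea on this slide fits in a few lines. A minimal sketch, with a tiny hand-made lexicon standing in for SentiWordNet and a naive "shifter immediately before the opinion word" heuristic (real systems use scope detection):

```python
# Tiny stand-ins for a sentiment dictionary and a valence-shifter list.
LEXICON = {"good": 1, "clear": 1, "great": 1, "bad": -1, "poor": -1}
SHIFTERS = {"not", "never", "no"}

def clause_orientation(tokens):
    """Sum +1/-1 lexicon scores over a clause, flipping a score
    when a valence shifter directly precedes the opinion word."""
    score = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            if i > 0 and tokens[i - 1] in SHIFTERS:
                polarity = -polarity   # negation flips the orientation
            score += polarity
    return score

print(clause_orientation("the voice was not clear".split()))  # -1: "not" flips "clear"
print(clause_orientation("the camera was good".split()))      # +1
```

"But" clauses are exactly where this breaks down: the two halves of "the pictures are good, but the battery life ..." would need to be scored as separate clauses.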

  37. Objects, aspects, opinions (5) Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Object identification • Aspect extraction • Grouping synonyms • Opinion orientation classification • Integration / coreference resolution

  38. Coreference resolution: Special characteristics in sentiment analysis • A well-studied problem in NLP • Ding & Liu (2010): object & attribute coreference • Comparative sentences and sentiment consistency: • "The Sony camera is better than the Canon camera. It is cheap too."  It = Sony • Lightweight semantics (can be learned from corpus): • "The picture quality of the Canon camera is very good. It is not expensive either."  It = camera

  39. Not all sentences/clauses carry sentiment Yesterday, I bought a Nokia phone and my girlfriend bought a moto phone. We called each other when we got home. The voice on my phone was not clear. The camera was good. My girlfriend said the sound of her phone was clear. I wanted a phone with good voice quality. So I was satisfied and returned the phone to BestBuy yesterday. Small phone – small battery life. • Neutral sentiment

  40. Not all sentences/clauses in a review carry sentiment [the slide labels individual passages neutral, positive, or negative?/neutral?] “Headlong’s adaptation of George Orwell’s ‘Nineteen Eighty-Four’ is such a sense-overloadingly visceral experience that it was only the second time around, as it transfers to the West End, that I realised quite how political it was. Writer-directors […] have reconfigured Orwell’s plot, making it less about Stalinism, more about state-sponsored torture. Which makes great, queasy theatre, as Sam Crane’s frail Winston stumbles through 101 minutes of disorientating flashbacks, agonising reminisce, blinding lights, distorted roars, walls that explode in hails of sparks, […] and the almost-too-much-to-bear Room 101 section, which churns past like ‘The Prisoner’ relocated to Guantanamo Bay. […] Crane’s traumatised Winston lives in two strangely overlapping time zones – 1984 and an unspecified present day. The former, with its two-minute hate and its sexcrime and its Ministry of Love, clearly never happened. But the present day version, in which a shattered Winston groggily staggers through a 'normal' but entirely indifferent world, is plausible. Any individual who has crossed the state – and there are some obvious examples – could go through what Orwell’s Winston went through. Second time out, it feels like an angrier and more emotionally righteous play. Some weaknesses become more apparent second time too.”

  41. Subjectivity detection • 2-stage process: • Classify as subjective or not • Determine polarity • A problem similar to genre analysis • e.g. Naive Bayes classifier on Wall Street Journal texts: News and Business vs. Letters to the Editor – 97% accuracy (Yu & Hatzivassiloglou, 2003) • But a much more difficult problem! (Mihalcea et al., 2007) • Overview in Wiebe et al. (2004)

  42. Motivation and overview • Major dimensions: Units of analysis, methods, features • Issues in aspect-/sentence-oriented SA • Social media: the case of tweets • Evaluation • Some challenges and current research directions

  43. Special challenges in Tweets • Very popular data source • Mostly public messages • API • But: opaque sampling ("the best 1%") • Vocabulary, grammar, ... • Length restriction • Semantic enrichment • Hyperlinked context • Thread context • Social-network context

  44. The importance of knowing your data: ex. tokenization From Potts (2013), p. 22f.
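Tweet-aware tokenization, in the spirit of Potts's discussion, means keeping emoticons, hashtags, and @-mentions as single tokens instead of splitting them on punctuation. A minimal regex sketch (a simplified stand-in for full tweet tokenizers such as Potts's or NLTK's casual tokenizer):

```python
import re

# Alternatives are tried in order: emoticons and Twitter markup before plain words.
TOKEN_RE = re.compile(r"""
    [<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|@]   # emoticons such as :-) and ;P
  | [@#]\w+                                   # @mentions and #hashtags
  | https?://\S+                              # URLs
  | \w+(?:'\w+)?                              # words, incl. contractions
  | [^\w\s]                                   # any other non-space symbol
""", re.VERBOSE)

def tokenize(tweet):
    """Tweet-aware tokenization: keeps :-) , #hashtags and @mentions whole."""
    return TOKEN_RE.findall(tweet)

tokens = tokenize("@nokia the camera is great :-) #happy")
```

A whitespace-or-punctuation tokenizer would shred ":-)" and "#happy" into meaningless fragments, losing exactly the features that carry sentiment in tweets.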

  45. Combining dictionaries, corpus-based methods, and semantic enrichment Saif et al. (2014): SentiCircles • No distinction between entities, aspects and opinion words • Inference and domain adaptation with contextual and conceptual semantics of terms • tweet sentiment = median of all terms' sentiments, or via the nouns (entities or aspects) • One finding: "the opinion of the crowd" helps predict "the opinion of the individual"

  46. SentiCircles: contextual semantics [Diagram: a circle with regions Very Positive and Positive in the upper half, Very Negative and Negative in the lower half, and a Neutral Region along the positive x-axis; context terms such as "great" and "smile" are plotted as points.] Each context term Ci of a word m (e.g. "great") is placed at (xi, yi) with: • ri = TDOC(Ci) (degree of correlation) • θi = Prior_Sentiment(Ci) * π (prior from a sentiment dictionary) • xi = ri * cos(θi), yi = ri * sin(θi) • Overall sentiment of the word m = geometric median of all points
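The point-placement step of the slide's construction is just polar-to-Cartesian conversion. A minimal sketch with invented values for TDOC and the prior (the aggregation into a geometric median over all context terms is omitted):

```python
import math

def senticircle_point(tdoc, prior_sentiment):
    """Place one context term on the SentiCircle:
    radius = degree of correlation (TDOC),
    angle = prior sentiment scaled to [-pi, pi]."""
    theta = prior_sentiment * math.pi
    return tdoc * math.cos(theta), tdoc * math.sin(theta)

# Hypothetical term: strong correlation (TDOC = 0.8), mildly positive prior (+0.5).
x, y = senticircle_point(tdoc=0.8, prior_sentiment=0.5)
```

A prior of +0.5 maps to an angle of π/2, so the point lands straight up on the positive y-axis: strongly correlated, clearly positive context. Terms with prior near 0 land on the x-axis, i.e. in the Neutral Region.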

  47. SentiCircles (Example)

  48. Enriching SentiCircles with Conceptual Semantics (using the Alchemy API for extracting entities) "Cycling under a heavy rain.. What a #luck!" [Diagram: the extracted entities Wind, Snow and Humidity each influence the sentiment of the concept Weather Condition.]

  49. Sentiment is social (Tan et al., 2011) From Potts (2013), pp. 83ff.

  50. Tan et al. (2011): results • The authors also derived a predictive model for tweet and user sentiment From Potts (2013), pp. 83ff.
